idnits 2.17.1 

draft-irtf-pearg-numeric-ids-generation-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == The document doesn't use any RFC 2119 keywords, yet seems to have RFC
     2119 boilerplate text.

  -- The document date (August 23, 2019) is 1698 days in the past.  Is this
     intentional?

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>' and
     '<CODE ENDS>' lines.


  Checking references for intended status: Best Current Practice
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  ** Obsolete normative reference: RFC  793 (Obsoleted by RFC 9293)

  ** Obsolete normative reference: RFC 2460 (Obsoleted by RFC 8200)

  ** Obsolete normative reference: RFC 6528 (Obsoleted by RFC 9293)

  == Outdated reference: A later version (-11) exists of
     draft-gont-numeric-ids-sec-considerations-04


     Summary: 3 errors (**), 0 flaws (~~), 3 warnings (==), 2 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Internet Research Task Force (IRTF)                              F. Gont
3	Internet-Draft                                              SI6 Networks
4	Intended status: Best Current Practice                           I. Arce
5	Expires: February 24, 2020                                     Quarkslab
6	                                                         August 23, 2019

8	           On the Generation of Transient Numeric Identifiers
9	               draft-irtf-pearg-numeric-ids-generation-00

11	Abstract

13	   This document performs an analysis of the security and privacy
14	   implications of different types of "numeric identifiers" used in IETF
15	   protocols, and tries to categorize them based on their
16	   interoperability requirements and the associated failure severity
17	   when such requirements are not met.  Subsequently, it provides advice
18	   on possible algorithms that could be employed to satisfy the
19	   interoperability requirements of each identifier type, while
20	   minimizing the security and privacy implications, thus providing
21	   guidance to protocol designers and protocol implementers.  Finally,
22	   this describes a number of algorithms that have been employed in real
23	   implementations to generate transient numeric identifiers and
24	   analyzes their security and privacy properties.

26	Status of This Memo

28	   This Internet-Draft is submitted in full conformance with the
29	   provisions of BCP 78 and BCP 79.

31	   Internet-Drafts are working documents of the Internet Engineering
32	   Task Force (IETF).  Note that other groups may also distribute
33	   working documents as Internet-Drafts.  The list of current Internet-
34	   Drafts is at https://datatracker.ietf.org/drafts/current/.

36	   Internet-Drafts are draft documents valid for a maximum of six months
37	   and may be updated, replaced, or obsoleted by other documents at any
38	   time.  It is inappropriate to use Internet-Drafts as reference
39	   material or to cite them other than as "work in progress."

41	   This Internet-Draft will expire on February 24, 2020.

43	Copyright Notice

45	   Copyright (c) 2019 IETF Trust and the persons identified as the
46	   document authors.  All rights reserved.

48	   This document is subject to BCP 78 and the IETF Trust's Legal
49	   Provisions Relating to IETF Documents
50	   (https://trustee.ietf.org/license-info) in effect on the date of
51	   publication of this document.  Please review these documents
52	   carefully, as they describe your rights and restrictions with respect
53	   to this document.  Code Components extracted from this document must
54	   include Simplified BSD License text as described in Section 4.e of
55	   the Trust Legal Provisions and are provided without warranty as
56	   described in the Simplified BSD License.

58	Table of Contents

60	   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   3
61	   2.  Terminology . . . . . . . . . . . . . . . . . . . . . . . . .   4
62	   3.  Threat Model  . . . . . . . . . . . . . . . . . . . . . . . .   5
63	   4.  Issues with the Specification of Identifiers  . . . . . . . .   5
64	   5.  Protocol Failure Severity . . . . . . . . . . . . . . . . . .   6
65	   6.  Categorizing Identifiers  . . . . . . . . . . . . . . . . . .   6
66	   7.  Common Algorithms for Identifier Generation . . . . . . . . .   8
67	     7.1.  Category #1: Uniqueness (soft failure)  . . . . . . . . .   8
68	     7.2.  Category #2: Uniqueness (hard failure)  . . . . . . . . .  10
69	     7.3.  Category #3: Uniqueness, constant within context (soft-
70	           failure)  . . . . . . . . . . . . . . . . . . . . . . . .  10
71	     7.4.  Category #4: Uniqueness, monotonically increasing within
72	           context (hard failure)  . . . . . . . . . . . . . . . . .  12
73	   8.  Common Vulnerabilities Associated with Transient Numeric
74	       Identifiers . . . . . . . . . . . . . . . . . . . . . . . . .  17
75	     8.1.  Network Activity Correlation  . . . . . . . . . . . . . .  17
76	     8.2.  Information Leakage . . . . . . . . . . . . . . . . . . .  17
77	     8.3.  Exploitation of Semantics of Transient Numeric
78	           Identifiers . . . . . . . . . . . . . . . . . . . . . . .  19
79	     8.4.  Exploitation of Collisions of Transient Numeric
80	           Identifiers . . . . . . . . . . . . . . . . . . . . . . .  19
81	   9.  Vulnerability Analysis of Specific Transient Numeric
82	       Identifiers Categories  . . . . . . . . . . . . . . . . . . .  19
83	     9.1.  Category #1: Uniqueness (soft failure)  . . . . . . . . .  19
84	     9.2.  Category #2: Uniqueness (hard failure)  . . . . . . . . .  20
85	     9.3.  Category #3: Uniqueness, constant within context (soft
86	           failure)  . . . . . . . . . . . . . . . . . . . . . . . .  20
87	     9.4.  Category #4: Uniqueness, monotonically increasing within
88	           context (hard failure)  . . . . . . . . . . . . . . . . .  20
89	   10. IANA Considerations . . . . . . . . . . . . . . . . . . . . .  22
90	   11. Security Considerations . . . . . . . . . . . . . . . . . . .  22
91	   12. Acknowledgements  . . . . . . . . . . . . . . . . . . . . . .  23
92	   13. References  . . . . . . . . . . . . . . . . . . . . . . . . .  23
93	     13.1.  Normative References . . . . . . . . . . . . . . . . . .  23
94	     13.2.  Informative References . . . . . . . . . . . . . . . . .  24
95	   Appendix A.  Flawed Algorithms  . . . . . . . . . . . . . . . . .  27
96	     A.1.  Predictable Linear Identifiers Algorithm  . . . . . . . .  27
97	     A.2.  Random-Increments Algorithm . . . . . . . . . . . . . . .  28
98	   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .  30

100	1.  Introduction

102	   Network protocols employ a variety of numeric identifiers for
103	   different protocol entities, ranging from DNS Transaction IDs (TxIDs)
104	   to transport protocol ports (e.g.  TCP ports) or IPv6 Interface
105	   Identifiers (IIDs).  These identifiers usually have specific
106	   properties that must be satisfied such that they do not result in
107	   negative interoperability implications (e.g. uniqueness during a
108	   specified period of time), and an associated failure severity when
109	   such properties are not met, ranging from soft to hard failures.

111	   For more than 30 years, a large number of implementations of the TCP/
112	   IP protocol suite have been subject to a variety of attacks, with
113	   effects ranging from Denial of Service (DoS) or data injection, to
114	   information leakage that could be exploited for pervasive monitoring
115	   [RFC7258].  The root of these issues has been, in many cases, the
116	   poor selection of identifiers in such protocols, usually as a result
117	   of insufficient or misleading specifications.  While it is generally
118	   trivial to identify an algorithm that can satisfy the
119	   interoperability requirements of a given identifier, there exists
120	   practical evidence that doing so without negatively affecting the
121	   security and/or privacy properties of the aforementioned protocols is
122	   prone to error [I-D.gont-numeric-ids-history].

124	   For example, implementations have been subject to security and/or
125	   privacy issues resulting from:

127	   o  Predictable TCP sequence numbers

129	   o  Predictable transport protocol port numbers

131	   o  Predictable IPv4 or IPv6 Fragment Identifiers

133	   o  Predictable IPv6 Interface Identifiers (IIDs)

135	   o  Predictable DNS Transaction Identifiers (TxIDs)

137	   Recent history indicates that when new protocols are standardized or
138	   new protocol implementations are produced, the security and privacy
139	   properties of the associated identifiers tend to be overlooked and
140	   inappropriate algorithms to generate transient numeric identifiers
141	   are either suggested in the specification or selected by
142	   implementers.  As a result, we believe that advice in this area is
143	   warranted.

145	   This document contains a non-exhaustive survey of identifiers
146	   employed in various IETF protocols, and aims to categorize such
147	   identifiers based on their interoperability requirements, and the
148	   associated failure severity when such requirements are not met.
149	   Subsequently, it provides advice on possible algorithms that could be
150	   employed to satisfy the interoperability requirements of each
151	   category, while minimizing the associated security and privacy
152	   implications.  Finally, it analyzes several algorithms that have been
153	   employed in real implementations to meet such requirements and
154	   analyzes their security and privacy properties.

156	2.  Terminology

158	   Identifier:
159	      A data object in a protocol specification that can be used to
160	      definitely distinguish a protocol object (a datagram, network
161	      interface, transport protocol endpoint, session, etc) from all
162	      other objects of the same type, in a given context.  Identifiers
163	      are usually defined as a series of bits and represented using
164	      integer values.  We note that different identifiers may have
165	      additional requirements or properties depending on their specific
166	      use in a protocol.  We use the term "identifier" as a generic term
167	      to refer to any data object in a protocol specification that
168	      satisfies the identification property stated above.

170	   Failure Severity:
171	      The consequences of a failure to comply with the interoperability
172	      requirements of a given identifier.  Severity considers the worst
173	      potential consequence of a failure, determined by the system
174	      damage and/or time lost to repair the failure.  In this document
175	      we define two types of failure severity: "soft" and "hard".

177	   Hard Failure:
178	      A hard failure is a non-recoverable condition in which a protocol
179	      does not operate in the prescribed manner or it operates with
180	      excessive degradation of service.  For example, an established TCP
181	      connection that is aborted due to an error condition constitutes,
182	      from the point of view of the transport protocol, a hard failure,
183	      since it enters a state from which normal operation cannot be
184	      recovered.

186	   Soft Failure:
187	      A soft failure is a recoverable condition in which a protocol does
188	      not operate in the prescribed manner but normal operation can be
189	      resumed automatically in a short period of time.  For example, a
190	      simple packet-loss event that is subsequently recovered with a
191	      retransmission can be considered a soft failure.

193	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
194	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
195	   document are to be interpreted as described in RFC 2119 [RFC2119].

197	3.  Threat Model

199	   Throughout this document, we assume an attacker does not have
200	   physical or logical access to the device(s) being attacked.  We
201	   assume the attacker can simply send any traffic to the target
202	   devices, to e.g. sample identifiers employed by such devices.

204	4.  Issues with the Specification of Identifiers

206	   While assessing protocol specifications regarding the use of
207	   identifiers, we found that most of the issues discussed in this
208	   document arise as a result of one of the following conditions:

210	   o  Protocol specifications which under-specify the requirements for
211	      their identifiers

213	   o  Protocol specifications that over-specify their identifiers

215	   o  Protocol implementations that simply fail to comply with the
216	      specified requirements

218	   A number of protocol implementations (too many of them) simply
219	   overlook the security and privacy implications of identifiers
220	   [I-D.gont-numeric-ids-history].  Examples of them are the
221	   specification of TCP port numbers in [RFC0793], the specification of
222	   TCP sequence numbers in [RFC0793], or the specification of the DNS
223	   TxID in [RFC1035].

225	   On the other hand, there are a number of protocol specifications that
226	   over-specify some of their associated protocol identifiers.  For
227	   example, [RFC4291] essentially results in link-layer addresses being
228	   embedded in the IPv6 Interface Identifiers (IIDs) when the
229	   interoperability requirement of uniqueness could be achieved in other
230	   ways that do not result in negative security and privacy implications
231	   [RFC7721].  Similarly, [RFC2460] suggested the use of a global
232	   counter for the generation of Fragment Identification values, when
233	   the interoperability properties of uniqueness per {Src IP, Dst IP}
234	   could be achieved with other algorithms that do not result in
235	   negative security and privacy implications.

237	   Finally, there are protocol implementations that simply fail to
238	   comply with existing protocol specifications.  For example, some
239	   popular operating systems (notably Microsoft Windows) still fail to
240	   implement transport port randomization, as specified in [RFC6056].

242	5.  Protocol Failure Severity

244	   Section 2 defines the concept of "Failure Severity" and two types of
245	   failures that we employ throughout this document: soft and hard.

247	   Our analysis of the severity of a failure is performed from the point
248	   of view of the protocol in question.  However, the corresponding
249	   severity on the upper application or protocol may not be the same as
250	   that of the protocol in question.  For example, a TCP connection that
251	   is aborted may or may not result in a hard failure of the upper
252	   application: if the upper application can establish a new TCP
253	   connection without any impact on the application, a hard failure at
254	   the TCP protocol may have no severity at the application level.  On
255	   the other hand, if a hard failure of a TCP connection results in
256	   excessive degradation of service at the application layer, it will
257	   also result in a hard failure at the application.

259	6.  Categorizing Identifiers

261	   This section includes a non-exhaustive survey of identifiers, and
262	   proposes a number of categories that can accommodate these
263	   identifiers based on their interoperability requirements and their
264	   failure modes (soft or hard)

266	   +------------+--------------------------------------+---------------+
267	   | Identifier |    Interoperability Requirements     |    Failure    |
268	   |            |                                      |    Severity   |
269	   +------------+--------------------------------------+---------------+
270	   | IPv6 Frag  |   Uniqueness (for IP address pair)   | Soft/Hard (1) |
271	   |     ID     |                                      |               |
272	   +------------+--------------------------------------+---------------+
273	   |  IPv6 IID  | Uniqueness (and constant within IPv6 |    Soft (3)   |
274	   |            |             prefix) (2)              |               |
275	   +------------+--------------------------------------+---------------+
276	   |  TCP ISN   |       Monotonically-increasing       |    Hard (4)   |
277	   +------------+--------------------------------------+---------------+
278	   |  TCP eph.  |    Uniqueness (for connection ID)    |      Hard     |
279	   |    port    |                                      |               |
280	   +------------+--------------------------------------+---------------+
281	   | IPv6 Flow  |              Uniqueness              |    None (5)   |
282	   |     L.     |                                      |               |
283	   +------------+--------------------------------------+---------------+
284	   |  DNS TxID  |              Uniqueness              |    None (6)   |
285	   +------------+--------------------------------------+---------------+

287	                      Table 1: Survey of Identifiers

289	   Notes:

291	   (1)
292	      While a single collision of Fragment ID values would simply lead
293	      to a single packet drop (and hence a "soft" failure), repeated
294	      collisions at high data rates might trash the Fragment ID space,
295	      leading to a hard failure [RFC4963].

297	   (2)
298	      While the interoperability requirements are simply that the
299	      Interface ID results in a unique IPv6 address, for operational
300	      reasons it is typically desirable that the resulting IPv6 address
301	      (and hence the corresponding Interface ID) be constant within each
302	      network [RFC7217] [RFC8064] .

304	   (3)
305	      While IPv6 Interface IDs must result in unique IPv6 addresses,
306	      IPv6 Duplicate Address Detection (DAD) [RFC4862] allows for the
307	      detection of duplicate Interface IDs/addresses, and hence such
308	      Interface ID collisions can be recovered.

310	   (4)
311	      In theory there are no interoperability requirements for TCP
312	      Initial Sequence Numbers (ISNs), since the TIME-WAIT state and
313	      TCP's "quiet time" take care of old segments from previous
314	      incarnations of the connection.  However, a widespread
315	      optimization allows for a new incarnation of a previous connection
316	      to be created if the ISN of the incoming SYN is larger than the
317	      last sequence number seen in that direction for the previous
318	      incarnation of the connection.  Thus, monotonically-increasing TCP
319	      sequence numbers allow for such optimization to work as expected
320	      [RFC6528].

322	   (5)
323	      The IPv6 Flow Label is typically employed for load sharing
324	      [RFC7098], along with the Source and Destination IPv6 addresses.
325	      Reuse of a Flow Label value for the same set {Source Address,
326	      Destination Address} would typically cause both flows to be
327	      multiplexed into the same link.  However, as long as this does not
328	      occur deterministically, it will not result in any negative
329	      implications.

331	   (6)
332	      DNS TxIDs are employed, together with the Source Address,
333	      Destination Address, Source Port, and Destination Port, to match
334	      DNS requests and responses.  However, since an implementation
335	      knows which DNS requests were sent for that set of {Source
336	      Address, Destination Address, Source Port, and Destination Port,
337	      DNS TxID}, a collision of TxID would result, if anything, in a
338	      small performance penalty (the response would be discarded when it
339	      is found that it does not answer the query sent in the
340	      corresponding DNS query).

342	   Based on the survey above, we can categorize identifiers as follows:

344	   +-----+---------------------------------------+---------------------+
345	   | Cat |                Category               |   Sample Proto IDs  |
346	   |  #  |                                       |                     |
347	   +-----+---------------------------------------+---------------------+
348	   |  1  |       Uniqueness (soft failure)       |  IPv6 Flow L., DNS  |
349	   |     |                                       |        TxIDs        |
350	   +-----+---------------------------------------+---------------------+
351	   |  2  |       Uniqueness (hard failure)       |  IPv6 Frag ID, TCP  |
352	   |     |                                       |    ephemeral port   |
353	   +-----+---------------------------------------+---------------------+
354	   |  3  |  Uniqueness, constant within context  |      IPv6 IIDs      |
355	   |     |             (soft failure)            |                     |
356	   +-----+---------------------------------------+---------------------+
357	   |  4  |  Uniqueness, monotonically increasing |       TCP ISN       |
358	   |     |     within context (hard failure)     |                     |
359	   +-----+---------------------------------------+---------------------+

361	                      Table 2: Identifier Categories

363	   We note that Category #4 could be considered a generalized case of
364	   category #3, in which a monotonically increasing element is added to
365	   a constant (within context) element, such that the resulting
366	   identifiers are monotonically increasing within a specified context.
367	   That is, the same algorithm could be employed for both #3 and #4,
368	   given appropriate parameters.

370	7.  Common Algorithms for Identifier Generation

372	   The following subsections describe common algorithms found for
373	   Protocol ID generation for each of the categories above.

375	7.1.  Category #1: Uniqueness (soft failure)

377	7.1.1.  Simple Randomization Algorithm
378	       /* Ephemeral port selection function */
379	       id_range = max_id - min_id + 1;
380	       next_id = min_id + (random() % id_range);
381	       count = next_id;

383	       do {
384	           if(check_suitable_id(next_id))
385	               return next_id;

387	           if (next_id == max_id) {
388	               next_id = min_id;
389	           } else {
390	               next_id++;
391	           }

393	           count--;
394	       } while (count > 0);

396	       return ERROR;

398	   Note:
399	      random() is a function that returns a pseudo-random unsigned
400	      integer number of appropriate size.  Note that the output needs to
401	      be unpredictable, and typical implementations of POSIX random()
402	      function do not necessarily meet this requirement.  See [RFC4086]
403	      for randomness requirements for security.

405	      The function check_suitable_id() can check, when possible, whether
406	      this identifier is e.g. already in use.  When already used, this
407	      algorithm selects the next available protocol ID.

409	      All the variables (in this and all the algorithms discussed in
410	      this document) are unsigned integers.

412	   This algorithm does not suffer from any of the issues discussed in
413	   Section 8.

415	7.1.2.  Another Simple Randomization Algorithm

417	   The following pseudo-code illustrates another algorithm for selecting
418	   a random identifier in which, in the event the identifier is found to
419	   be not suitable (e.g., already in use), another identifier is
420	   selected randomly:

422	       id_range = max_id - min_id + 1;
423	       next_id = min_id + (random() % id_range);
424	       count = id_range;

426	       do {
427	           if(check_suitable_id(next_id))
428	               return next_id;

430	           next_id = min_id + (random() % id_range);
431	           count--;
432	       } while (count > 0);

434	       return ERROR;

436	   This algorithm might be unable to select an identifier (i.e., return
437	   "ERROR") even if there are suitable identifiers available, when there
438	   are a large number of identifiers "in use".

440	   This algorithm does not suffer from any of the issues discussed in
441	   Section 8.

443	7.2.  Category #2: Uniqueness (hard failure)

445	   One of the most trivial approaches for achieving uniqueness for an
446	   identifier (with a hard failure mode) is to implement a linear
447	   function.  As a result, all of the algorithms described in
448	   Section 7.4 are of use for complying the requirements of this
449	   identifier category.

451	7.3.  Category #3: Uniqueness, constant within context (soft-failure)

453	   The goal of this algorithm is to produce identifiers that are
454	   constant for a given context, but that change when the aforementioned
455	   context changes.

457	   Keeping one value for each possible "context" may in many cases be
458	   considered too onerous in terms of memory requirements.  As a
459	   workaround, the following algorithm employs a calculated technique
460	   (as opposed to keeping state in memory) to maintain the constant
461	   identifier for each given context.

463	   In the following algorithm, the function F() provides (statelessly) a
464	   constant identifier for each given context.

466	       /* Protocol ID selection function  */
467	       id_range = max_id - min_id + 1;

469	       counter = 0;

471	       do {
472	           offset = F(CONTEXT, counter, secret_key);
473	           next_id = min_id + (offset % id_range);

475	           if(check_suitable_id(next_id))
476	               return next_id;

478	           counter++;

480	       } while (counter <= MAX_RETRIES);

482	       return ERROR;

484	   The function F() provides a "per-CONTEXT" constant identifier for a
485	   given context. 'offset' may take any value within the storage type
486	   range since we are restricting the resulting identifier to be in the
487	   range [min_id, max_id] in a similar way as in the algorithm described
488	   in Section 7.1.1.  Collisions can be recovered by incrementing the
489	   'counter' variable and recomputing F().

491	   The function F() should be a cryptographic hash function like SHA-256
492	   [FIPS-SHS].  Note: MD5 [RFC1321] is considered unacceptable for F()
493	   [RFC6151].  CONTEXT is the concatenation of all the elements that
494	   define a given context.  For example, if this algorithm is expected
495	   to produce identifiers that are unique per network interface card
496	   (NIC) and SLAAC autoconfiguration prefix, the CONTEXT should be the
497	   concatenation of e.g. the interface index and the SLAAC
498	   autoconfiguration prefix (please see [RFC7217] for an implementation
499	   of this algorithm for the generation of IPv6 IIDs).

501	   The secret should be chosen to be as random as possible (see
502	   [RFC4086] for recommendations on choosing secrets).

504	   For obvious reasons, the transient network identifiers generated with
505	   this algorithm allow for network activity correlation within
506	   "CONTEXT".  However, this is essentially a design goal of this
507	   category of transient numeric identifiers.

509	7.4.  Category #4: Uniqueness, monotonically increasing within context
510	      (hard failure)

512	7.4.1.  Per-context Counter Algorithm

514	   One possible way to achieve low identifier reuse frequency while
515	   still avoiding predictable sequences would be to employ a per-context
516	   counter, as opposed to a global counter.  Such an algorithm could be
517	   described as follows:

519	       /* Initialization at system boot time. Could be random */
520	       id_inc= 1;

522	       /* Identifier selection function */
523	       count = max_id - min_id + 1;

525	       if(lookup_counter(CONTEXT) == ERROR){
526	            create_counter(CONTEXT);
527	       }

529	       next_id= lookup_counter(CONTEXT);

531	       do {
532	           if (next_id == max_id) {
533	               next_id = min_id;
534	           }
535	           else {
536	               next_id = next_id + id_inc;
537	           }

539	           if (check_suitable_id(next_id)){
540	               store_counter(CONTEXT, next_id);
541	               return next_id;
542	           }

544	           count--;

546	       } while (count > 0);

548	       store_counter(CONTEXT, next_id);
549	       return ERROR;

551	   NOTE:
552	      lookup_counter() returns the current counter for a given context,
553	      or an error condition if such a counter does not exist.

555	      create_counter() creates a counter for a given context, and
556	      initializes such counter to a random value.

558	      store_counter() saves (updates) the current counter for a given
559	      context.

561	      check_suitable_id() is a function that checks whether the
562	      resulting identifier is acceptable (e.g., whether its in use,
563	      etc.).

565	   Essentially, whenever a new identifier is to be selected, the
566	   algorithm checks whether there there is a counter for the
567	   corresponding context.  If there is, such counter is incremented to
568	   obtain the new identifier, and the new identifier updates the
569	   corresponding counter.  If there is no counter for such context, a
570	   new counter is created an initialized to a random value, and used as
571	   the new identifier.

573	   This algorithm produces a per-context counter, which results in one
574	   linear function for each context.  Since the origin of each "line" is
575	   a random value, the resulting values are unknown to an off-path
576	   attacker.

578	   This algorithm has the following drawbacks:

580	   o  If, as a result of resource management, the counter for a given
581	      context must be removed, the last identifier value used for that
582	      context will be lost.  Thus, if subsequently an identifier needs
583	      to be generated for such context, that counter will need to be
584	      recreated and reinitialized to random value, thus possibly leading
585	      to reuse/collistion of identifiers.

587	   o  If the identifiers are predictable by the destination system
588	      (e.g., the destination host represents the "context"), a
589	      vulnerable host might possibly leak to third parties the
590	      identifiers used by other hosts to send traffic to it (i.e., a
591	      vulnerable Host B could leak to Host C the identifier values that
592	      Host A is using to send packets to Host B).  Appendix A of
593	      [RFC7739] describes one possible scenario for such leakage in
594	      detail.

596	   Otherwise, the identifiers produced by this algorithm do not suffer
597	   from the other issues discussed in Section 8.

599	7.4.2.  Simple Hash-Based Algorithm

601	   The goal of this algorithm is to produce monotonically-increasing
602	   sequences, with a randomized initial value, for each given context.
603	   For example, if the identifiers being generated must be unique for
604	   each {src IP, dst IP} set, then each possible combination of {src IP,
605	   dst IP} should have a corresponding "next_id" value.

607	   Keeping one value for each possible "context" may in many cases be
608	   considered too onerous in terms of memory requirements.  As a
609	   workaround, the following algorithm employs a calculated technique
610	   (as opposed to keeping state in memory) to maintain the random offset
611	   for each possible context.

613	   In the following algorithm, the function F() provides (statelessly) a
614	   random offset for each given context.

616	       /* Initialization at system boot time. Could be random. */
617	       counter = 0;

619	       /* Protocol ID selection function  */
620	       id_range = max_id - min_id + 1;
621	       offset = F(CONTEXT, secret_key);
622	       count = id_range;

624	       do {
625	           next_id = min_id +
626	                  (counter + offset) % id_range;

628	           counter++;

630	           if(check_suitable_id(next_id))
631	               return next_id;

633	           count--;

635	       } while (count > 0);

637	       return ERROR;

639	   The function F() provides a "per-CONTEXT" fixed offset within the
640	   identifier space.  Both the 'offset' and 'counter' variables may take
641	   any value within the storage type range since we are restricting the
642	   resulting identifier to be in the range [min_id, max_id] in a similar
643	   way as in the algorithm described in Section 7.1.1.  This allows us
644	   to simply increment the 'counter' variable and rely on the unsigned
645	   integer to wrap around.

647	   The function F() should be a cryptographic hash function like SHA-256
648	   [FIPS-SHS].  Note: MD5 [RFC1321] is considered unacceptable for F()
649	   [RFC6151].  CONTEXT is the concatenation of all the elements that
650	   define a given context.  For example, if this algorithm is expected
651	   to produce identifiers that are monotonically-increasing for each set
652	   (Source IP Address, Destination IP Address), the CONTEXT should be
653	   the concatenation of these two values.

655	   The secret should be chosen to be as random as possible (see
656	   [RFC4086] for recommendations on choosing secrets).

658	   It should be noted that, since this algorithm uses a global counter
659	   ("counter") for selecting identifiers, this algorithm produces an
660	   information leakage (as described in Section 8.2).  For example, if
661	   this algorithm were used for TCP ephemeral port selection, and an
662	   attacker could force a client to periodically establish a new TCP
663	   connection to an attacker-controlled machine (or through an attacker-
664	   observable routing path), the attacker could subtract consecutive
665	   source port values to obtain the number of outgoing TCP connections
666	   established globally by the target host within that time period (up
667	   to wrap-around issues and five-tuple collisions, of course).

669	7.4.3.  Double-Hash Algorithm

671	   A trade-off between maintaining a single global 'counter' variable
672	   and maintaining 2**N 'counter' variables (where N is the width of the
673	   result of F()) could be achieved as follows.  The system would keep
674	   an array of TABLE_LENGTH integers, which would provide a separation
675	   of the increment of the 'counter' variable.  This improvement could
676	   be incorporated into the algorithm from Section 7.4.2 as follows:

678	       /* Initialization at system boot time */
679	       for(i = 0; i < TABLE_LENGTH; i++)
680	           table[i] = random();

682	       id_inc = 1;

684	       /* Protocol ID selection function */
685	       id_range = max_id - min_id + 1;
686	       offset = F(CONTEXT, secret_key1);
687	       index = G(CONTEXT, secret_key2);
688	       count = id_range;

690	       do {
691	           next_id = min_id + (offset + table[index]) % id_range;
692	           table[index] = table[index] + id_inc;

694	           if(check_suitable_id(next_id))
695	               return next_id;

697	          count--;

699	       } while (count > 0);

701	       return ERROR;

703	   'table[]' could be initialized with random values, as indicated by
704	   the initialization code in pseudo-code above.  The function G()
705	   should be a cryptographic hash function.  It should use the same
706	   CONTEXT as F(), and a secret key value to compute a value between 0
707	   and (TABLE_LENGTH-1).

709	   The array 'table[]' assures that successive identifiers for a given
710	   context will be monotonically-increasing.  However, the increments
711	   space is separated into TABLE_LENGTH different spaces, and thus
712	   identifier reuse frequency will be (probabilistically) lower than
713	   that of the algorithm in Section 7.4.2.  That is, the generation of
714	   identifier for one given context will not necessarily result in
715	   increments in the identifiers for other contexts.

717	   It is interesting to note that the size of 'table[]' does not limit
718	   the number of different identifier sequences, but rather separates
719	   the *increments* into TABLE_LENGTH different spaces.  The identifier
720	   sequence will result from adding the corresponding entry of 'table[]'
721	   to the variable 'offset', which selects the actual identifier
722	   sequence (as in the algorithm from Section 7.4.2).

724	   An attacker can perform traffic analysis for any "increment space"
725	   (i.e., context) into which the attacker has "visibility" -- namely,
726	   the attacker can force a node to generate identifiers where G(offset)
727	   identifies the target "increment space".  However, the attacker's
728	   ability to perform traffic analysis is very reduced when compared to
729	   the predictable linear identifiers (described in Appendix A.1) and
730	   the hash-based identifiers (described in Section 7.4.2).
731	   Additionally, an implementation can further limit the attacker's
732	   ability to perform traffic analysis by further separating the
733	   increment space (that is, using a larger value for TABLE_LENGTH) and/
734	   or by randomizing the increments.

736	   Otherwise, this algorithm does not suffer from the issues discussed
737	   in Section 8.

739	8.  Common Vulnerabilities Associated with Transient Numeric Identifiers

741	8.1.  Network Activity Correlation

743	   An identifier that is predictable or stable within a given context
744	   allows for network activity correlation within that context.

746	   For example, a stable IPv6 Interface Identifier allows for network
747	   activity to be correlated for the context in which that address is
748	   stable [RFC7721].  A stable-per-network (as in [RFC7217] allows for
749	   network activity correlation within a network, whereas a constant
750	   IPv6 Interface Identifier allows not only network activity
751	   correlation within the same network, but also across networks ("host
752	   tracking").

754	   Predictable transient numeric identifiers can also allow for network
755	   activity correlation.  For example, a node that generates TCP ISNs
756	   with a global counter will typically allow network activity
757	   correlation even as it roams across networks, since the communicating
758	   nodes could infer the identity of the node based on the TCP ISNs
759	   employed for subsequent communication instances.  Similarly, a node
760	   that generates predictable IPv6 Fragment Identification values could
761	   be subject to network activity correlation (see e.g.
762	   [Bellovin2002]).

764	8.2.  Information Leakage

766	   Transient numeric identifiers that are not randomized can leak out
767	   information to other communicating nodes.  For example, it is common
768	   to generate identifiers like:

770	                ID = offset(CONTEXT_1) + linear(CONTEXT_2);

772	   This generic expression generates identifiers by adding a linear
773	   function to an offset.  The offset is constant within a given
774	   context, whereas linear() is a linear function for a given context
775	   (possibly different to that of offset()).  Identifiers generated with
776	   this expression will generally be predictable within CONTEXT_1.
777	   Thus, CONTEXT_1 essentially specifies e.g. the context within which
778	   network activity correlation is possible thanks to these identifiers.
779	   When CONTEXT_1 is "global" (e.g., offset() is simply a constant
780	   value), then all the corresponding transient numeric identifiers
781	   become predictable in all contexts.

783	      NOTE: If offset() has a global context and the specific value is
784	      known, the resulting identifiers may leak even more information.
785	      For example, the if Fragment Identification values are generated
786	      with the generic function above, and CONTEXT_1 is "global", then
787	      the corresponding identifiers will leak the number of fragmented
788	      datagrams sent for CONTEXT_2.  If both CONTEXT_1 and CONTEXT_2 are
789	      "global", then Fragment Identification values would be generated
790	      with a global counter (initialized to offset()), and thus each
791	      generated Fragment Identification value would leak the number of
792	      fragmented datagrams transmitted by the node since it was
793	      bootstrapped.

795	   On the other hand, linear() will be predictable within CONTEXT_2.
796	   The predictability of linear(), irrespective of the context and/or
797	   predictability of offset(), can leak out information that is of use
798	   to attackers.  For example, a node that selects ephemeral port
799	   numbers on as in:

801	                ehemeral_port = offset(Dest_IP) + linear()

803	   that is, with a per-destination offset, but global linear() function
804	   (e.g., a global counter), will leak information about the number of
805	   outgoing connections that have been issued between any two issued
806	   outgoing connections.

808	   Similarly, a node that generates Fragment Identification values as
809	   in:

811	                Frag_ID = offset(Srd_IP, Dst_IP) + linear()

813	   will leak out information about the number of fragmented packets that
814	   have been transmitted between any two other transmitted fragmented
815	   packets.  The vulnerabilities described in [Sanfilippo1998a],
816	   [Sanfilippo1998b], and [Sanfilippo1999] are all associated with the
817	   use of a global linear() function (i.e., a global CONTEXT_2).

819	8.3.  Exploitation of Semantics of Transient Numeric Identifiers

821	   Identifiers that are not semantically opaque tend to be more
822	   predictable than semantically-opaque identifiers.  For example, a MAC
823	   address contains an OUI (Organizationally-Unique Identifier) which
824	   identifies the vendor that manufactured the underlying network
825	   interface card.  This fact may be leveraged by an attacker meaning to
826	   "predict" MAC addresses, if he has some knowledge about the possible
827	   NIC vendor.

829	   [RFC7707] discusses a number of techniques to reduce the search space
830	   when performing IPv6 address-scanning attacks by leveraging the
831	   semantics of the IIDs produced by a number by traditional IID-
832	   generation algorithms (now replaced by [RFC8064] with [RFC7217]).

834	8.4.  Exploitation of Collisions of Transient Numeric Identifiers

836	   In many cases, th collision of transient network identifiers can have
837	   a hard failure severity (or result in a hard failure severity if an
838	   attacker can cause multiple collisions deterministically, one after
839	   another).  For example, predictable Fragment Identification values
840	   open the door to Denial of Service (DoS) attacks (see e.g.
841	   [RFC5722].  Similarly, predictable TCP ISNs open the door to trivial
842	   connection-reset and data injection attacks (see e.g.
843	   [Joncheray1995]).

845	9.  Vulnerability Analysis of Specific Transient Numeric Identifiers
846	    Categories

848	   The following subsections analyze common vulnerabilities associated
849	   with the generation of identifiers for each of the categories
850	   identified in Section 6.

852	9.1.  Category #1: Uniqueness (soft failure)

854	   Possible vulnerabilities associated with identifiers of this category
855	   are:

857	   o  Use of trivial algorithms (e.g. global counters) that generate
858	      predictable identifiers

860	   o  Use of flawed PRNGs (please see e.g.  [Zalewski2001],
861	      [Zalewski2002] and [Klein2007])

863	   Since the only interoperability requirement for these identifiers is
864	   uniqueness, the obvious approach to generate them is to employ a
865	   PRNG.  An implementer should consult [RFC4086] regarding randomness
866	   requirements for security, and consult relevant documentation when
867	   employing a PRNG provided by the underlying system.

869	   Use of algorithms other than PRNGs for generating identifiers of this
870	   category is discouraged.

872	9.2.  Category #2: Uniqueness (hard failure)

874	   As noted in Section 7.2 this category typically employs the same
875	   algorithms as Category #4, since a monotonically-increasing sequence
876	   tends to minimize the identifier reuse frequency.  Therefore, the
877	   vulnerability analysis of Section 9.4 applies to this case.

879	9.3.  Category #3: Uniqueness, constant within context (soft failure)

881	   There are two main vulnerabilities that may be associated with
882	   identifiers of this category:

884	   1.  Use algorithms or sources that result in predictable identifiers

886	   2.  Employing the same identifier across contexts in which constantcy
887	       is not required

889	   At times, an implementation or specification may be tempted to employ
890	   a source for the identifier which is known to provide unique values.
891	   However, while unique, the associated identifiers may have other
892	   properties such as being predictable or leaking information about the
893	   node in question.  For example, as noted in [RFC7721], embedding
894	   link-layer addresses for generating IPv6 IIDs not only results in
895	   predictable values, but also leaks information about the manufacturer
896	   of the network interface card.

898	   On the other hand, using an identifier across contexts where
899	   constantcy is not required can be leveraged for correlation of
900	   activities.  On of the most trivial examples of this is the use of
901	   IPv6 IIDs that are constant across networks (such as IIDs that embed
902	   the underlying link-layer address).

904	9.4.  Category #4: Uniqueness, monotonically increasing within context
905	      (hard failure)

907	   A simple way to generalize algorithms employed for generating
908	   identifiers of Category #4 would be as follows:

910	       /* Identifier selection function */
911	       count = max_id - min_id + 1;

913	       do {
914	           linear(CONTEXT_2)= linear(CONTEXT_2) + increment();
915	           next_id= offset(CONTEXT_1) + linear(CONTEXT_2);

917	           if(check_suitable_id(next_id))
918	              return next_id;

920	           count--;
921	       } while (count > 0);

923	       return ERROR;

925	   Essentially, an identifier (next_id) is generated by adding a linear
926	   function (linear()) to an offset value, which is unknown to the
927	   attacker, and constant for given context (CONTEXT_1).

929	   The following aspects of the algorithm should be considered:

931	   o  For the most part, it is the offset() function that results in
932	      identifiers that are unpredictable by an off-path attacker.  While
933	      the resulting sequence will be monotonically-increasing, the use
934	      of an offset value that is unknown to the attacker makes the
935	      resulting values unknown to the attacker.

937	   o  The most straightforward "stateless" implementation of offset
938	      would be that in which offset() is the result of a
939	      cryptographically-secure hash-function that takes the values that
940	      identify the context and a "secret_key" (not shown in the figure
941	      above) as arguments.

943	   o  Another possible (but stateful) approach would be to simply
944	      generate a random "per-context" offset and store it in memory, and
945	      then look-up the corresponding context when a new identifier is to
946	      be selected.  The algorithm in Section 7.4.1 is essentially an
947	      implementation of this type.

949	   o  The linear function is incremented according to increment().  In
950	      the most trivial case increment() could always return the constant
951	      "1".  But it could also possibly return small random integers such
952	      the increments are unpredictable.

954	   Considering the generic algorithm illustrated above we can identify
955	   the following possible vulnerabilities:

957	   o  If the offset value spans more than the necessary context,
958	      identifiers could be unnecessarily predictable by other parties,
959	      since the offset value would be unnecessarily leaked to them.  For
960	      example, an implementation that means to produce a per-destination
961	      counter but replaces offset() with a constant number (i.e.,
962	      employs a global counter), will unnecessarily result in
963	      predictable identifiers.

965	   o  The function linear() could be seen as representing the number of
966	      identifiers that have so far been generated for a given context
967	      (CONTEXT_2).  If linear() spans more than the necessary context,
968	      the "increments" could be leaked to other parties, thus disclosing
969	      information about the number of identifiers that have so far been
970	      generated.  For example, an implementation in which linear() is
971	      implemented as a single global counter will unnecessarily leak
972	      information the number of identifiers that have been produced.
973	      [Fyodor2004] is one example of how such information leakages can
974	      be exploited.

976	   o  increment() determines how the linear() is incremented for each
977	      identifier that is selected.  In the most trivial case,
978	      increment() will return the integer "1".  However, an
979	      implementation may have increment() return a "small" random
980	      integer value such that even if the current value employed by the
981	      generator is guessed (see Appendix A of [RFC7739]), the exact next
982	      identifier to be selected will be slightly harder to identify.

984	10.  IANA Considerations

986	   There are no IANA registries within this document.  The RFC-Editor
987	   can remove this section before publication of this document as an
988	   RFC.

990	11.  Security Considerations

992	   The entire document is about the security and privacy implications of
993	   identifiers.  [I-D.gont-numeric-ids-sec-considerations] formally
994	   requires protocols specifications to include an appropriate analysis
995	   of the interoperability, security, and privacy implications of the
996	   transient numeric identifiers they specify, while this document
997	   analyzes possible algorithms (and their implications) that could be
998	   employed to comply with the interoperability properties of a
999	   transient numeric identifier, while mitigating the possible security
1000	   and privacy implications.

1002	12.  Acknowledgements

1004	   The authors would like to thank (in alphabetical order) Steven
1005	   Bellovin, Joseph Lorenzo Hall, Gre Norcie, and Martin Thomson, for
1006	   providing valuable comments on earlier versions of this document.

1008	   The authors would like to thank Diego Armando Maradona for his magic
1009	   and inspiration.

1011	13.  References

1013	13.1.  Normative References

1015	   [RFC0793]  Postel, J., "Transmission Control Protocol", STD 7,
1016	              RFC 793, DOI 10.17487/RFC0793, September 1981,
1017	              <https://www.rfc-editor.org/info/rfc793>.

1019	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
1020	              Requirement Levels", BCP 14, RFC 2119,
1021	              DOI 10.17487/RFC2119, March 1997,
1022	              <https://www.rfc-editor.org/info/rfc2119>.

1024	   [RFC2460]  Deering, S. and R. Hinden, "Internet Protocol, Version 6
1025	              (IPv6) Specification", RFC 2460, DOI 10.17487/RFC2460,
1026	              December 1998, <https://www.rfc-editor.org/info/rfc2460>.

1028	   [RFC4086]  Eastlake 3rd, D., Schiller, J., and S. Crocker,
1029	              "Randomness Requirements for Security", BCP 106, RFC 4086,
1030	              DOI 10.17487/RFC4086, June 2005,
1031	              <https://www.rfc-editor.org/info/rfc4086>.

1033	   [RFC4291]  Hinden, R. and S. Deering, "IP Version 6 Addressing
1034	              Architecture", RFC 4291, DOI 10.17487/RFC4291, February
1035	              2006, <https://www.rfc-editor.org/info/rfc4291>.

1037	   [RFC4862]  Thomson, S., Narten, T., and T. Jinmei, "IPv6 Stateless
1038	              Address Autoconfiguration", RFC 4862,
1039	              DOI 10.17487/RFC4862, September 2007,
1040	              <https://www.rfc-editor.org/info/rfc4862>.

1042	   [RFC5722]  Krishnan, S., "Handling of Overlapping IPv6 Fragments",
1043	              RFC 5722, DOI 10.17487/RFC5722, December 2009,
1044	              <https://www.rfc-editor.org/info/rfc5722>.

1046	   [RFC6528]  Gont, F. and S. Bellovin, "Defending against Sequence
1047	              Number Attacks", RFC 6528, DOI 10.17487/RFC6528, February
1048	              2012, <https://www.rfc-editor.org/info/rfc6528>.

1050	   [RFC7217]  Gont, F., "A Method for Generating Semantically Opaque
1051	              Interface Identifiers with IPv6 Stateless Address
1052	              Autoconfiguration (SLAAC)", RFC 7217,
1053	              DOI 10.17487/RFC7217, April 2014,
1054	              <https://www.rfc-editor.org/info/rfc7217>.

1056	   [RFC8064]  Gont, F., Cooper, A., Thaler, D., and W. Liu,
1057	              "Recommendation on Stable IPv6 Interface Identifiers",
1058	              RFC 8064, DOI 10.17487/RFC8064, February 2017,
1059	              <https://www.rfc-editor.org/info/rfc8064>.

1061	13.2.  Informative References

1063	   [Bellovin2002]
1064	              Bellovin, S., "A Technique for Counting NATted Hosts",
1065	              IMW'02 Nov. 6-8, 2002, Marseille, France, 2002.

1067	   [CPNI-TCP]
1068	              Gont, F., "Security Assessment of the Transmission Control
1069	              Protocol (TCP)",  United Kingdom's Centre for the
1070	              Protection of National Infrastructure (CPNI) Technical
1071	              Report, 2009, <https://www.gont.com.ar/papers/
1072	              tn-03-09-security-assessment-TCP.pdf>.

1074	   [FIPS-SHS]
1075	              FIPS, "Secure Hash Standard (SHS)",  Federal Information
1076	              Processing Standards Publication 180-4, March 2012,
1077	              <http://csrc.nist.gov/publications/fips/fips180-4/
1078	              fips-180-4.pdf>.

1080	   [Fyodor2004]
1081	              Fyodor, "Idle scanning and related IP ID games", 2004,
1082	              <http://www.insecure.org/nmap/idlescan.html>.

1084	   [I-D.gont-numeric-ids-history]
1085	              Gont, F. and I. Arce, "Unfortunate History of Transient
1086	              Numeric Identifiers", draft-gont-numeric-ids-history-05
1087	              (work in progress), July 2019.

1089	   [I-D.gont-numeric-ids-sec-considerations]
1090	              Gont, F. and I. Arce, "Security Considerations for
1091	              Transient Numeric Identifiers Employed in Network
1092	              Protocols", draft-gont-numeric-ids-sec-considerations-04
1093	              (work in progress), July 2019.

1095	   [Joncheray1995]
1096	              Joncheray, L., "A Simple Active Attack Against TCP", Proc.
1097	              Fifth Usenix UNIX Security Symposium, 1995.

1099	   [Klein2007]
1100	              Klein, A., "OpenBSD DNS Cache Poisoning and Multiple O/S
1101	              Predictable IP ID Vulnerability", 2007,
1102	              <http://www.trusteer.com/files/OpenBSD_DNS_Cache_Poisoning
1103	              _and_Multiple_OS_Predictable_IP_ID_Vulnerability.pdf>.

1105	   [Morris1985]
1106	              Morris, R., "A Weakness in the 4.2BSD UNIX TCP/IP
1107	              Software", CSTR 117, AT&T Bell Laboratories, Murray Hill,
1108	              NJ, 1985,
1109	              <https://pdos.csail.mit.edu/~rtm/papers/117.pdf>.

1111	   [RFC1035]  Mockapetris, P., "Domain names - implementation and
1112	              specification", STD 13, RFC 1035, DOI 10.17487/RFC1035,
1113	              November 1987, <https://www.rfc-editor.org/info/rfc1035>.

1115	   [RFC1321]  Rivest, R., "The MD5 Message-Digest Algorithm", RFC 1321,
1116	              DOI 10.17487/RFC1321, April 1992,
1117	              <https://www.rfc-editor.org/info/rfc1321>.

1119	   [RFC4963]  Heffner, J., Mathis, M., and B. Chandler, "IPv4 Reassembly
1120	              Errors at High Data Rates", RFC 4963,
1121	              DOI 10.17487/RFC4963, July 2007,
1122	              <https://www.rfc-editor.org/info/rfc4963>.

1124	   [RFC6056]  Larsen, M. and F. Gont, "Recommendations for Transport-
1125	              Protocol Port Randomization", BCP 156, RFC 6056,
1126	              DOI 10.17487/RFC6056, January 2011,
1127	              <https://www.rfc-editor.org/info/rfc6056>.

1129	   [RFC6151]  Turner, S. and L. Chen, "Updated Security Considerations
1130	              for the MD5 Message-Digest and the HMAC-MD5 Algorithms",
1131	              RFC 6151, DOI 10.17487/RFC6151, March 2011,
1132	              <https://www.rfc-editor.org/info/rfc6151>.

1134	   [RFC7098]  Carpenter, B., Jiang, S., and W. Tarreau, "Using the IPv6
1135	              Flow Label for Load Balancing in Server Farms", RFC 7098,
1136	              DOI 10.17487/RFC7098, January 2014,
1137	              <https://www.rfc-editor.org/info/rfc7098>.

1139	   [RFC7258]  Farrell, S. and H. Tschofenig, "Pervasive Monitoring Is an
1140	              Attack", BCP 188, RFC 7258, DOI 10.17487/RFC7258, May
1141	              2014, <https://www.rfc-editor.org/info/rfc7258>.

1143	   [RFC7707]  Gont, F. and T. Chown, "Network Reconnaissance in IPv6
1144	              Networks", RFC 7707, DOI 10.17487/RFC7707, March 2016,
1145	              <https://www.rfc-editor.org/info/rfc7707>.

1147	   [RFC7721]  Cooper, A., Gont, F., and D. Thaler, "Security and Privacy
1148	              Considerations for IPv6 Address Generation Mechanisms",
1149	              RFC 7721, DOI 10.17487/RFC7721, March 2016,
1150	              <https://www.rfc-editor.org/info/rfc7721>.

1152	   [RFC7739]  Gont, F., "Security Implications of Predictable Fragment
1153	              Identification Values", RFC 7739, DOI 10.17487/RFC7739,
1154	              February 2016, <https://www.rfc-editor.org/info/rfc7739>.

1156	   [Sanfilippo1998a]
1157	              Sanfilippo, S., "about the ip header id", Post to Bugtraq
1158	              mailing-list, Mon Dec 14 1998,
1159	              <http://seclists.org/bugtraq/1998/Dec/48>.

1161	   [Sanfilippo1998b]
1162	              Sanfilippo, S., "Idle scan", Post to Bugtraq mailing-list,
1163	              1998, <http://www.kyuzz.org/antirez/papers/dumbscan.html>.

1165	   [Sanfilippo1999]
1166	              Sanfilippo, S., "more ip id", Post to Bugtraq mailing-
1167	              list, 1999,
1168	              <http://www.kyuzz.org/antirez/papers/moreipid.html>.

1170	   [Shimomura1995]
1171	              Shimomura, T., "Technical details of the attack described
1172	              by Markoff in NYT", Message posted in USENET's
1173	              comp.security.misc newsgroup  Message-ID:
1174	              <3g5gkl$5j1@ariel.sdsc.edu>, 1995,
1175	              <http://www.gont.com.ar/docs/post-shimomura-usenet.txt>.

1177	   [Silbersack2005]
1178	              Silbersack, M., "Improving TCP/IP security through
1179	              randomization without sacrificing interoperability",
1180	              EuroBSDCon 2005 Conference, 2005,
1181	              <http://citeseerx.ist.psu.edu/viewdoc/
1182	              download?doi=10.1.1.91.4542&rep=rep1&type=pdf>.

1184	   [Zalewski2001]
1185	              Zalewski, M., "Strange Attractors and TCP/IP Sequence
1186	              Number Analysis", 2001,
1187	              <http://lcamtuf.coredump.cx/oldtcp/tcpseq.html>.

1189	   [Zalewski2002]
1190	              Zalewski, M., "Strange Attractors and TCP/IP Sequence
1191	              Number Analysis - One Year Later", 2001,
1192	              <http://lcamtuf.coredump.cx/newtcp/>.

1194	Appendix A.  Flawed Algorithms

1196	   The following subsections document algorithms with known negative
1197	   security and privacy implications.

1199	A.1.  Predictable Linear Identifiers Algorithm

1201	   One of the most trivial ways to achieve uniqueness with a low
1202	   identifier reuse frequency is to produce a linear sequence.

1204	   For example, the following algorithm has been employed (see e.g.
1205	   [Morris1985], [Shimomura1995], [Silbersack2005] and [CPNI-TCP]) in a
1206	   number of operating systems for selecting IP fragment IDs, TCP
1207	   ephemeral ports, etc.:

1209	       /* Initialization at system boot time. Could be random */
1210	       next_id = min_id;
1211	       id_inc= 1;

1213	       /* Identifier selection function */
1214	       count = max_id - min_id + 1;

1216	       do {
1217	           if (next_id == max_id) {
1218	               next_id = min_id;
1219	           }
1220	           else {
1221	               next_id = next_id + id_inc;
1222	           }

1224	           if (check_suitable_id(next_id))
1225	               return next_id;

1227	           count--;

1229	       } while (count > 0);

1231	       return ERROR;

1233	   Note:
1234	      check_suitable_id() is a function that checks whether the
1235	      resulting identifier is acceptable (e.g., whether its in use,
1236	      etc.).

1238	   For obvious reasons, this algorithm results in predicable sequences.
1239	   If a global counter is used (such as "next_id" in the example above),
1240	   a node that learns one protocol identifier can also learn or guess
1241	   values employed by past and future protocol instances.  On the other
1242	   hand, when the value of increments is known (such as "1" in this
1243	   case), an attacker can sample two values, and learn the number of
1244	   identifiers that were generated in-between.

1246	   Where identifier reuse would lead to a hard failure, one typical
1247	   approach to generate unique identifiers (while minimizing the
1248	   security and privacy implications of predictable identifiers) is to
1249	   obfuscate the resulting protocol IDs by either:

1251	   o  Replace the global counter with multiple counters (initialized to
1252	      a random value)

1254	   o  Randomizing the "increments"

1256	   Avoiding global counters essentially means that learning one
1257	   identifier for a given context (e.g., one TCP ephemeral port for a
1258	   given {src IP, Dst IP, Dst Port}) is of no use for learning or
1259	   guessing identifiers for a different context (e.g., TCP ephemeral
1260	   ports that involve other peers).  However, this may imply keeping one
1261	   additional variable/counter per context, which may be prohibitive in
1262	   some environments.  The choice of id_inc has implications on both the
1263	   security and privacy properties of the resulting identifiers, but
1264	   also on the corresponding interoperability properties.  On one hand,
1265	   minimizing the increments (as in "id_inc = 1" in our case) generally
1266	   minimizes the identifier reuse frequency, albeit at increased
1267	   predictability.  On the other hand, if the increments are randomized,
1268	   predictability of the resulting identifiers is reduced, and the
1269	   information leakage produced by global constant increments is
1270	   mitigated.  However, using larger increments than necessary can
1271	   result in an increased identifier reuse frequency.

1273	A.2.  Random-Increments Algorithm

1275	   This algorithm offers a middle ground between the algorithms that
1276	   select ephemeral ports randomly (such as those described in
1277	   Section 7.1.1 and Section 7.1.2), and those that offer obfuscation
1278	   but no randomization (such as those described in Section 7.4.2 and
1279	   Section 7.4.3).

1281	       /* Initialization code at system boot time. */
1282	       next_id = random();        /* Initialization value */
1283	       id_inc = 500;        /* Determines the trade-off */

1285	       /* Identifier selection function */
1286	       id_range = max_id - min_id + 1;

1288	       count = id_range;

1290	       do {
1291	           /* Random increment */
1292	           next_id = next_id + (random() % id_inc) + 1;

1294	           /* Keep the identifier within acceptable range */
1295	           next_id = min_id + (next_id % id_range);

1297	           if(check_suitable_id(next_id))
1298	              return next_id;

1300	           count--;
1301	       } while (count > 0);

1303	       return ERROR;

1305	   This algorithm aims at producing a monotonically increasing sequence
1306	   of identifiers, while avoiding the use of fixed increments, which
1307	   would lead to trivially predictable sequences.  The value "id_inc"
1308	   allows for direct control of the trade-off between the level of
1309	   obfuscation and the ID reuse frequency.  The smaller the value of
1310	   "id_inc", the more similar this algorithm is to a predicable, global
1311	   monotonically-increasing ID generation algorithm.  The larger the
1312	   value of "id_inc", the more similar this algorithm is to the
1313	   algorithm described in Section 7.1.1 of this document.

1315	   When the identifiers wrap, there is the risk of collisions of
1316	   identifiers (i.e., identifier reuse).  Therefore, "id_inc" should be
1317	   selected according to the following criteria:

1319	   o  It should maximize the wrapping time of the identifier space.

1321	   o  It should minimize identifier reuse frequency.

1323	   o  It should maximize obfuscation.

1325	   Clearly, these are competing goals, and the decision of which value
1326	   of "id_inc" to use is a trade-off.  Therefore, the value of "id_inc"
1327	   should be configurable so that system administrators can make the
1328	   trade-off for themselves.

1330	Authors' Addresses

1332	   Fernando Gont
1333	   SI6 Networks
1334	   Evaristo Carriego 2644
1335	   Haedo, Provincia de Buenos Aires  1706
1336	   Argentina

1338	   Phone: +54 11 4650 8472
1339	   Email: fgont@si6networks.com
1340	   URI:   https://www.si6networks.com

1342	   Ivan Arce
1343	   Quarkslab

1345	   Email: iarce@quarkslab.com
1346	   URI:   https://www.quarkslab.com