idnits 2.17.1 draft-irtf-pearg-numeric-ids-generation-02.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == The document doesn't use any RFC 2119 keywords, yet seems to have RFC 2119 boilerplate text. -- The document date (May 14, 2020) is 1436 days in the past. Is this intentional? -- Found something which looks like a code comment -- if you have code sections in the document, please surround them with '' and '' lines. Checking references for intended status: Informational ---------------------------------------------------------------------------- ** Obsolete normative reference: RFC 793 (Obsoleted by RFC 9293) ** Obsolete normative reference: RFC 2460 (Obsoleted by RFC 8200) ** Obsolete normative reference: RFC 6528 (Obsoleted by RFC 9293) == Outdated reference: A later version (-11) exists of draft-gont-numeric-ids-sec-considerations-04 == Outdated reference: A later version (-11) exists of draft-irtf-pearg-numeric-ids-history-02 Summary: 3 errors (**), 0 flaws (~~), 4 warnings (==), 2 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Internet Research Task Force (IRTF) F. Gont 3 Internet-Draft SI6 Networks 4 Intended status: Informational I. Arce 5 Expires: November 15, 2020 Quarkslab 6 May 14, 2020 8 On the Generation of Transient Numeric Identifiers 9 draft-irtf-pearg-numeric-ids-generation-02 11 Abstract 13 This document performs an analysis of the security and privacy 14 implications of different types of "numeric identifiers" used in IETF 15 protocols, and tries to categorize them based on their 16 interoperability requirements and the associated failure severity 17 when such requirements are not met. Subsequently, it provides advice 18 on possible algorithms that could be employed to satisfy the 19 interoperability requirements of each identifier category, while 20 minimizing the security and privacy implications, thus providing 21 guidance to protocol designers and protocol implementers. Finally, 22 this describes a number of algorithms that have been employed in real 23 implementations to generate transient numeric identifiers and 24 analyzes their security and privacy properties. 26 Status of This Memo 28 This Internet-Draft is submitted in full conformance with the 29 provisions of BCP 78 and BCP 79. 31 Internet-Drafts are working documents of the Internet Engineering 32 Task Force (IETF). Note that other groups may also distribute 33 working documents as Internet-Drafts. The list of current Internet- 34 Drafts is at https://datatracker.ietf.org/drafts/current/. 36 Internet-Drafts are draft documents valid for a maximum of six months 37 and may be updated, replaced, or obsoleted by other documents at any 38 time. It is inappropriate to use Internet-Drafts as reference 39 material or to cite them other than as "work in progress." 41 This Internet-Draft will expire on November 15, 2020. 43 Copyright Notice 45 Copyright (c) 2020 IETF Trust and the persons identified as the 46 document authors. All rights reserved. 48 This document is subject to BCP 78 and the IETF Trust's Legal 49 Provisions Relating to IETF Documents 50 (https://trustee.ietf.org/license-info) in effect on the date of 51 publication of this document. Please review these documents 52 carefully, as they describe your rights and restrictions with respect 53 to this document. 55 Table of Contents 57 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 58 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 4 59 3. Threat Model . . . . . . . . . . . . . . . . . . . . . . . . 5 60 4. Issues with the Specification of Identifiers . . . . . . . . 5 61 5. Protocol Failure Severity . . . . . . . . . . . . . . . . . . 6 62 6. Categorizing Identifiers . . . . . . . . . . . . . . . . . . 6 63 7. Common Algorithms for Transient Numeric Identifier Generation 9 64 7.1. Category #1: Uniqueness (soft failure) . . . . . . . . . 9 65 7.2. Category #2: Uniqueness (hard failure) . . . . . . . . . 11 66 7.3. Category #3: Uniqueness, stable within context (soft 67 failure) . . . . . . . . . . . . . . . . . . . . . . . . 12 68 7.4. Category #4: Uniqueness, monotonically increasing within 69 context (hard failure) . . . . . . . . . . . . . . . . . 13 70 8. Common Vulnerabilities Associated with Transient Numeric 71 Identifiers . . . . . . . . . . . . . . . . . . . . . . . . . 19 72 8.1. Network Activity Correlation . . . . . . . . . . . . . . 19 73 8.2. Information Leakage . . . . . . . . . . . . . . . . . . . 20 74 8.3. Exploitation of Semantics of Transient Numeric 75 Identifiers . . . . . . . . . . . . . . . . . . . . . . . 21 76 8.4. Exploitation of Collisions of Transient Numeric 77 Identifiers . . . . . . . . . . . . . . . . . . . . . . . 21 78 8.5. Cryptanalysis . . . . . . . . . . . . . . . . . . . . . . 21 79 9. Vulnerability Analysis of Specific Transient Numeric 80 Identifiers Categories . . . . . . . . . . . . . . . . . . . 22 81 9.1. Category #1: Uniqueness (soft failure) . . . . . . . . . 22 82 9.2. Category #2: Uniqueness (hard failure) . . . . . . . . . 23 83 9.3. Category #3: Uniqueness, stable within context (soft 84 failure) . . . . . . . . . . . . . . . . . . . . . . . . 23 85 9.4. Category #4: Uniqueness, monotonically increasing within 86 context (hard failure) . . . . . . . . . . . . . . . . . 23 87 10. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 25 88 11. Security Considerations . . . . . . . . . . . . . . . . . . . 25 89 12. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 26 90 13. References . . . . . . . . . . . . . . . . . . . . . . . . . 26 91 13.1. Normative References . . . . . . . . . . . . . . . . . . 26 92 13.2. Informative References . . . . . . . . . . . . . . . . . 27 93 Appendix A. Flawed Algorithms . . . . . . . . . . . . . . . . . 30 94 A.1. Predictable Linear Identifiers Algorithm . . . . . . . . 30 95 A.2. Random-Increments Algorithm . . . . . . . . . . . . . . . 32 97 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 34 99 1. Introduction 101 Network protocols employ a variety of numeric identifiers for 102 different protocol entities, ranging from DNS Transaction IDs (TxIDs) 103 to transport protocol ports (e.g. TCP ports) or IPv6 Interface 104 Identifiers (IIDs). These identifiers usually have specific 105 properties (e.g. uniqueness during a specified period of time) that 106 must be satisfied such that they do not result in negative 107 interoperability implications, and an associated failure severity 108 when such properties are not met, ranging from soft to hard failures. 110 For more than 30 years, a large number of implementations of the TCP/ 111 IP protocol suite have been subject to a variety of attacks, with 112 effects ranging from Denial of Service (DoS) or data injection, to 113 information leakages that could be exploited for pervasive monitoring 114 [RFC7258]. The root cause of these issues has been, in many cases, 115 the poor selection of transient numeric identifiers in such 116 protocols, usually as a result of insufficient or misleading 117 specifications. While it is generally trivial to identify an 118 algorithm that can satisfy the interoperability requirements of a 119 given identifier, empirical evidence exists that doing so without 120 negatively affecting the security and/or privacy properties of the 121 aforementioned protocols is prone to error 122 [I-D.irtf-pearg-numeric-ids-history]. 124 For example, implementations have been subject to security and/or 125 privacy issues resulting from: 127 o Predictable TCP Initial Sequence Numbers (ISNs) 129 o Predictable transport protocol ephemeral port numbers 131 o Predictable IPv4 or IPv6 Fragment Identifiers (Fragment IDs) 133 o Predictable IPv6 Interface Identifiers (IIDs) 135 o Predictable DNS Transaction Identifiers (TxIDs) 137 Recent history indicates that when new protocols are standardized or 138 new protocol implementations are produced, the security and privacy 139 properties of the associated identifiers tend to be overlooked, and 140 inappropriate algorithms to generate transient numeric identifiers 141 are either suggested in the specification or selected by 142 implementers. As a result, it should be evident that advice in this 143 area is warranted. 145 This document contains a non-exhaustive survey of identifiers 146 employed in various IETF protocols, and aims to categorize such 147 identifiers based on their interoperability requirements, and the 148 associated failure severity when such requirements are not met. 149 Subsequently, it provides advice on possible algorithms that could be 150 employed to satisfy the interoperability requirements of each 151 category, while minimizing the associated security and privacy 152 implications. Finally, it analyzes several algorithms that have been 153 employed in real implementations to meet such requirements and 154 analyzes their security and privacy properties. 156 2. Terminology 158 Transient Numeric Identifier: 159 A data object in a protocol specification that can be used to 160 definitely distinguish a protocol object (a datagram, network 161 interface, transport protocol endpoint, session, etc) from all 162 other objects of the same type, in a given context. Transient 163 numeric identifiers are usually defined as a series of bits, and 164 represented using integer values. These identifiers are typically 165 dynamically selected, as opposed to statically-assigned numeric 166 identifiers (see e.g. [IANA-PROT]). We note that different 167 identifiers may have additional requirements or properties 168 depending on their specific use in a protocol. We use the term 169 "transient numeric identifier" (or simply "numeric identifier" or 170 "identifier" as short forms) as a generic term to refer to any 171 data object in a protocol specification that satisfies the 172 identification property stated above. 174 Failure Severity: 175 The consequences of a failure to comply with the interoperability 176 requirements of a given identifier. Severity considers the worst 177 potential consequence of a failure, determined by the system 178 damage and/or time lost to repair the failure. In this document 179 we define two types of failure severity: "soft failure" and "hard 180 failure". 182 Soft Failure: 183 A soft failure is a recoverable condition in which a protocol does 184 not operate in the prescribed manner but normal operation can be 185 resumed automatically in a short period of time. For example, a 186 simple packet-loss event that is subsequently recovered with a 187 retransmission can be considered a soft failure. 189 Hard Failure: 190 A hard failure is a non-recoverable condition in which a protocol 191 does not operate in the prescribed manner or it operates with 192 excessive degradation of service. For example, an established TCP 193 connection that is aborted due to an error condition constitutes, 194 from the point of view of the transport protocol, a hard failure, 195 since it enters a state from which normal operation cannot be 196 resumed. 198 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 199 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 200 document are to be interpreted as described in RFC 2119 [RFC2119]. 202 3. Threat Model 204 Throughout this document, we assume an attacker does not have 205 physical or logical access to the device(s) being attacked. We 206 assume the attacker can simply send any traffic to the target 207 device(s), to e.g. sample identifiers employed by such device(s). 209 4. Issues with the Specification of Identifiers 211 While assessing protocol specifications regarding the use of 212 identifiers, we found that most of the issues discussed in this 213 document arise as a result of one of the following conditions: 215 o Protocol specifications which under-specify the requirements for 216 their identifiers 218 o Protocol specifications that over-specify their identifiers 220 o Protocol implementations that simply fail to comply with the 221 specified requirements 223 A number of protocol specifications (too many of them) have simply 224 overlooked the security and privacy implications of transient numeric 225 identifiers [I-D.irtf-pearg-numeric-ids-history]. Examples of them 226 are the specification of TCP port numbers in [RFC0793], the 227 specification of TCP sequence numbers in [RFC0793], or the 228 specification of the DNS TxID in [RFC1035]. 230 On the other hand, there are a number of protocol specifications that 231 over-specify some of their associated transient numeric identifiers. 232 For example, [RFC4291] essentially overloads the semantics of IPv6 233 Interface Identifiers (IIDs) by embedding link-layer addresses in the 234 IPv6 IIDs, when the interoperability requirement of uniqueness could 235 be achieved in other ways that do not result in negative security and 236 privacy implications [RFC7721]. Similarly, [RFC2460] suggested the 237 use of a global counter for the generation of Fragment Identification 238 values, when the interoperability properties of uniqueness per {Src 239 IP, Dst IP} could be achieved with other algorithms that do not 240 result in negative security and privacy implications [RFC7739]. 242 Finally, there are protocol implementations that simply fail to 243 comply with existing protocol specifications. For example, some 244 popular operating systems (notably Microsoft Windows) still fail to 245 implement transport protocol ephemeral port randomization, as 246 recommended in [RFC6056]. 248 5. Protocol Failure Severity 250 Section 2 defines the concept of "Failure Severity", along with two 251 types of failure severities that we employ throughout this document: 252 soft and hard. 254 Our analysis of the severity of a failure is performed from the point 255 of view of the protocol in question. However, the corresponding 256 severity on the upper application or protocol may not be the same as 257 that of the protocol in question. For example, a TCP connection that 258 is aborted may or may not result in a hard failure of the upper 259 application protocol: if the upper application can establish a new 260 TCP connection without any impact on the application, a hard failure 261 at the TCP protocol may have no severity at the application level. 262 On the other hand, if a hard failure of a TCP connection results in 263 excessive degradation of service at the application layer, it will 264 also result in a hard failure at the application. 266 6. Categorizing Identifiers 268 This section includes a non-exhaustive survey of transient numeric 269 identifiers, and proposes a number of categories that can accommodate 270 these identifiers based on their interoperability requirements and 271 their failure modes (soft or hard) 272 +--------------+------------------------------------+---------------+ 273 | Identifier | Interoperability Requirements | Failure | 274 | | | Severity | 275 +--------------+------------------------------------+---------------+ 276 | IPv6 Frag ID | Uniqueness (for IP address pair) | Soft/Hard (1) | 277 +--------------+------------------------------------+---------------+ 278 | IPv6 IID | Uniqueness (and stable within IPv6 | Soft (3) | 279 | | prefix) (2) | | 280 +--------------+------------------------------------+---------------+ 281 | TCP ISN | Monotonically-increasing | Hard (4) | 282 +--------------+------------------------------------+---------------+ 283 | TCP eph. | Uniqueness (for connection ID) | Hard | 284 | port | | | 285 +--------------+------------------------------------+---------------+ 286 | IPv6 Flow | Uniqueness | None (5) | 287 | Label | | | 288 +--------------+------------------------------------+---------------+ 289 | DNS TxID | Uniqueness | None (6) | 290 +--------------+------------------------------------+---------------+ 292 Table 1: Survey of Identifiers 294 Notes: 296 (1) 297 While a single collision of Fragment ID values would simply lead 298 to a single packet drop (and hence a "soft" failure), repeated 299 collisions at high data rates might trash the Fragment ID space, 300 leading to a hard failure [RFC4963]. 302 (2) 303 While the interoperability requirements are simply that the 304 Interface ID results in a unique IPv6 address, for operational 305 reasons it is typically desirable that the resulting IPv6 address 306 (and hence the corresponding Interface ID) be stable within each 307 network [RFC7217] [RFC8064]. 309 (3) 310 While IPv6 Interface IDs must result in unique IPv6 addresses, 311 IPv6 Duplicate Address Detection (DAD) [RFC4862] allows for the 312 detection of duplicate addresses, and hence such Interface ID 313 collisions can be recovered. 315 (4) 316 In theory, there are no interoperability requirements for TCP 317 Initial Sequence Numbers (ISNs), since the TIME-WAIT state and 318 TCP's "quiet time" concept take care of old segments from previous 319 incarnations of the connection. However, a widespread 320 optimization allows for a new incarnation of a previous connection 321 to be created if the ISN of the incoming SYN is larger than the 322 last sequence number seen in that direction for the previous 323 incarnation of the connection. Thus, monotonically-increasing TCP 324 sequence numbers allow for such optimization to work as expected 325 [RFC6528], since otherwise such connections-establishment attempts 326 would fail. 328 (5) 329 The IPv6 Flow Label is typically employed for load sharing 330 [RFC7098], along with the Source and Destination IPv6 addresses. 331 Reuse of a Flow Label value for the same set {Source Address, 332 Destination Address} would typically cause both flows to be 333 multiplexed onto the same link. However, as long as this does not 334 occur deterministically, it will not result in any negative 335 implications. 337 (6) 338 DNS TxIDs are employed, together with the Source Address, 339 Destination Address, Source Port, and Destination Port, to match 340 DNS requests and responses. However, since an implementation 341 knows which DNS requests were sent for that set of {Source 342 Address, Destination Address, Source Port, and Destination Port, 343 DNS TxID}, a collision of TxID would result, if anything, in a 344 small performance penalty (the response would nevertheless be 345 discarded when it is found that it does not answer the query sent 346 in the corresponding DNS query). 348 Based on the survey above, we can categorize identifiers as follows: 350 +-----+---------------------------------------+---------------------+ 351 | Cat | Category | Sample Proto IDs | 352 | # | | | 353 +-----+---------------------------------------+---------------------+ 354 | 1 | Uniqueness (soft failure) | IPv6 Flow L., DNS | 355 | | | TxIDs | 356 +-----+---------------------------------------+---------------------+ 357 | 2 | Uniqueness (hard failure) | IPv6 Frag ID, TCP | 358 | | | ephemeral port | 359 +-----+---------------------------------------+---------------------+ 360 | 3 | Uniqueness, stable within context | IPv6 IIDs | 361 | | (soft failure) | | 362 +-----+---------------------------------------+---------------------+ 363 | 4 | Uniqueness, monotonically increasing | TCP ISN | 364 | | within context (hard failure) | | 365 +-----+---------------------------------------+---------------------+ 367 Table 2: Identifier Categories 369 We note that Category #4 could be considered a generalized case of 370 category #3, in which a monotonically increasing element is added to 371 a stable (within context) element, such that the resulting 372 identifiers are monotonically increasing within a specified context. 373 That is, the same algorithm could be employed for both #3 and #4, 374 given appropriate parameters. 376 7. Common Algorithms for Transient Numeric Identifier Generation 378 The following subsections describe some sample algorithms that can be 379 employed for generating transient numeric identifiers for each of the 380 categories above. 382 7.1. Category #1: Uniqueness (soft failure) 384 The requirement of uniqueness with a soft failure mode can be 385 complied with a Pseudo-Random Number Generator (PRNG). In scenarios 386 where ongoing use of previously selected numeric IDs is possible and 387 desirable, an implementation may opt to select the next available 388 identifier in the same sequence, or select another random number. 389 Section 7.1.1 is an implementation of the former strategy, while 390 Section 7.1.2 is an implementation of the later. 392 We note that since the premise is that collisions of numeric 393 identifiers of this category only leads to soft failures, in many (if 394 not most) cases, the algorithm will not need to check the suitability 395 of a selected identifier (i.e., check_suitable_id() would always be 396 "true"). 398 7.1.1. Simple Randomization Algorithm 399 /* Numeric ID selection function */ 401 id_range = max_id - min_id + 1; 402 next_id = min_id + (random() % id_range); 403 count = next_id; 405 do { 406 if(check_suitable_id(next_id)) 407 return next_id; 409 if (next_id == max_id) { 410 next_id = min_id; 411 } else { 412 next_id++; 413 } 415 count--; 416 } while (count > 0); 418 return ERROR; 420 NOTE: 421 random() is a function that returns a pseudo-random unsigned 422 integer number of appropriate size. Note that the output needs to 423 be unpredictable, and typical implementations of the POSIX 424 random() function do not necessarily meet this requirement. See 425 [RFC4086] for randomness requirements for security. Beware that 426 that "adapting" the length of the output of random() with a modulo 427 operator (e.g., C language's "%") may change the distribution of 428 the PRNG. 430 The function check_suitable_id() can check, when possible and 431 desirable, whether this identifier is suitable (e.g. it is not 432 already in use). Depending on how/where the numeric identifier is 433 used, it may or may not be possible (or even desirable) to check 434 whether the numeric identifier is in use (or whether it has been 435 recently been employed). When an identifier is found to be 436 unsuitable, this algorithm selects the next available numeric 437 identifier in sequence. 439 All the variables (in this and all the algorithms discussed in 440 this document) are unsigned integers. 442 This algorithm does not suffer from any of the issues discussed in 443 Section 8. 445 7.1.2. Another Simple Randomization Algorithm 447 The following pseudo-code illustrates another algorithm for selecting 448 a random numeric identifier which, in the event a selected identifier 449 is found to be unsuitable (e.g., already in use), another identifier 450 is randomly selected: 452 /* Numeric ID selection function */ 454 id_range = max_id - min_id + 1; 455 next_id = min_id + (random() % id_range); 456 count = id_range; 458 do { 459 if(check_suitable_id(next_id)) 460 return next_id; 462 next_id = min_id + (random() % id_range); 463 count--; 464 } while (count > 0); 466 return ERROR; 468 This algorithm might be unable to select an identifier (i.e., return 469 "ERROR") even if there are suitable identifiers available, in cases 470 where a large number of identifiers are unsuitable (e.g. "in use"). 472 The same considerations from Section 7.1.1 with respect to the 473 properties of random() and the adaptation of its output length apply 474 to this algorithm. 476 This algorithm does not suffer from any of the issues discussed in 477 Section 8. 479 7.2. Category #2: Uniqueness (hard failure) 481 One of the most trivial approaches for achieving uniqueness for an 482 identifier (with a hard failure mode) is to reduce the identifier 483 reuse frequency by generating the numeric identifiers with a linear 484 function. As a result, all of the algorithms described in 485 Section 7.4 ("Category #4: Uniqueness, monotonically increasing 486 within context (hard failure)") can be readily employed for complying 487 with the requirements of this numeric identifier category. 489 7.3. Category #3: Uniqueness, stable within context (soft failure) 491 The goal of the following algorithm is to produce identifiers that 492 are stable for a given context (identified by "CONTEXT"), but that 493 change when the aforementioned context changes. 495 In order to avoid storing in memory the numeric identifier computed 496 for each CONTEXT value, the following algorithm employs a calculated 497 technique (as opposed to keeping state in memory) to generate a 498 stable identifier for each given context. 500 /* Numeric ID selection function */ 502 id_range = max_id - min_id + 1; 504 counter = 0; 506 do { 507 offset = F(CONTEXT, counter, secret_key); 508 next_id = min_id + (offset % id_range); 510 if(check_suitable_id(next_id)) 511 return next_id; 513 counter++; 515 } while (counter <= MAX_RETRIES); 517 return ERROR; 519 In the following algorithm, the function F() provides a stateless and 520 stable per-CONTEXT numeric identifier, where CONTEXT is the 521 concatenation of all the elements that define the given context. 523 For example, if this algorithm is expected to produce IPv6 IIDs 524 that are unique per network interface card (NIC) and SLAAC 525 autoconfiguration prefix, the CONTEXT should be the concatenation 526 of e.g. the interface index and the SLAAC autoconfiguration prefix 527 (please see [RFC7217] for an implementation of this algorithm for 528 generation of stable IPv6 IIDs). 530 F() must be a cryptographically-secure hash function (e.g. SHA-256 531 [FIPS-SHS]), that is computed over the concatenation of its 532 arguments. The result of F() is no more secure than the secret key, 533 and therefore 'secret_key' must be unknown to the attacker, and must 534 be of a reasonable length. 'secret_key' must remain stable for a 535 given CONTEXT, since otherwise the numeric identifiers generated by 536 this algorithm would not have the desired stability properties (i.e., 537 stable for a given CONTEXT). In most cases, 'secret_key' can be 538 selected with a PRNG (see [RFC4086] for recommendations on choosing 539 secrets) at an appropriate time, and stored in stable or volatile 540 storage for future use. 542 The result of F() is stored in the variable 'offset', which may take 543 any value within the storage type range, since we are restricting the 544 resulting identifier to be in the range [min_id, max_id] in a similar 545 way as in the algorithm described in Section 7.1.1. 547 check_suitable_id() checks that the candidate identifier has suitable 548 uniqueness properties. Collisions (i.e., an identifier that is not 549 unique) are recovered by incrementing the 'counter' variable and 550 recomputing F(). 552 For obvious reasons, the transient network identifiers generated with 553 this algorithm allow for network activity correlation within 554 "CONTEXT". However, this is essentially a design goal of this 555 category of transient numeric identifiers. 557 7.4. Category #4: Uniqueness, monotonically increasing within context 558 (hard failure) 560 7.4.1. Per-context Counter Algorithm 562 One possible way to achieve low identifier reuse frequency while 563 still avoiding predictable sequences would be to employ a per-context 564 counter, as opposed to a global counter. Such an algorithm could be 565 described as follows: 567 /* Initialization code */ 568 id_inc = 1; 570 /* Numeric ID selection function */ 572 count = max_id - min_id + 1; 574 if(lookup_counter(CONTEXT) == ERROR){ 575 create_counter(CONTEXT); 576 } 578 next_id = lookup_counter(CONTEXT); 580 do { 581 if (next_id == max_id) { 582 next_id = min_id; 583 } 584 else { 585 next_id = next_id + id_inc; 586 } 588 if (check_suitable_id(next_id)){ 589 store_counter(CONTEXT, next_id); 590 return next_id; 591 } 593 count--; 595 } while (count > 0); 597 store_counter(CONTEXT, next_id); 598 return ERROR; 600 NOTE: 601 lookup_counter() returns the current counter for a given context, 602 or an error condition if such a counter does not exist. 604 create_counter() creates a counter for a given context, and 605 initializes such counter to a random value. 607 store_counter() saves (updates) the current counter for a given 608 context. 610 check_suitable_id() is a function that checks whether the 611 resulting identifier is acceptable (e.g., whether it is not 612 already in use, etc.). 614 Essentially, whenever a new identifier is to be selected, the 615 algorithm checks whether there there is a counter for the 616 corresponding context. If there is, such counter is incremented to 617 obtain the new identifier, and the new identifier updates the 618 corresponding counter. If there is no counter for such context, a 619 new counter is created an initialized to a random value, and used as 620 the new identifier. This algorithm produces a per-context counter, 621 which results in one linear function for each context. Since each 622 counter is initialized to a random value, the resulting values are 623 unpredictable by an off-path attacker. 625 This algorithm has the following drawbacks: 627 o This algorithm requires an implementation to store each per- 628 CONTEXT counter in memory. If, as a result of resource 629 management, the counter for a given context must be removed, the 630 last identifier value used for that context will be lost. Thus, 631 if subsequently an identifier needs to be generated for the same 632 context, that counter will need to be recreated and reinitialized 633 to random value, thus possibly leading to reuse/collision of 634 numeric identifiers. 636 o An implementation may map more than one context to the same 637 counter, such the amount of memory required to store counters is 638 reduce, at the expense of a possible unnecessary increase in the 639 numeric identifier reuse frequency. In such cases, if the 640 identifiers are predictable by the destination system (e.g., the 641 destination host represents the "context"), a vulnerable host 642 might possibly leak to third parties the identifiers used by other 643 hosts to send traffic to it (i.e., a vulnerable Host B could leak 644 to Host C the identifier values that Host A is using to send 645 packets to Host B). Appendix A of [RFC7739] describes one 646 possible scenario for such leakage in detail. 648 Otherwise, the identifiers produced by this algorithm do not suffer 649 from the other issues discussed in Section 8. 651 7.4.2. Simple Hash-Based Algorithm 653 The goal of this algorithm is to produce monotonically-increasing 654 sequences, with a randomized initial value, for each given context. 655 For example, if the identifiers being generated must be unique for 656 each {src IP, dst IP} set, then each possible combination of {src IP, 657 dst IP} should have a corresponding "next_id" value. 659 Keeping one counter for each possible "context" may in many cases be 660 considered too onerous in terms of memory requirements. As a 661 workaround, the following algorithm employs a calculated technique 662 (as opposed to keeping state in memory) to maintain the random offset 663 for each possible context. 665 In the following algorithm, the function F() provides a (stateless) 666 unpredictable offset for each given context (as identified by 667 'CONTEXT'). 669 /* Initialization code */ 670 counter = 0; 672 /* Numeric ID selection function */ 674 id_range = max_id - min_id + 1; 675 offset = F(CONTEXT, secret_key); 676 count = id_range; 678 do { 679 next_id = min_id + 680 (counter + offset) % id_range; 682 counter++; 684 if(check_suitable_id(next_id)) 685 return next_id; 687 count--; 689 } while (count > 0); 691 return ERROR; 693 The function F() provides a "per-CONTEXT" fixed offset within the 694 numeric identifier "space". Both the 'offset' and 'counter' 695 variables may take any value within the storage type range since we 696 are restricting the resulting identifier to be in the range [min_id, 697 max_id] in a similar way as in the algorithm described in 698 Section 7.1.1. This allows us to simply increment the 'counter' 699 variable and rely on the unsigned integer to wrap around. 701 The function F() should be a cryptographically-secure hash function 702 (e.g. SHA-256 [FIPS-SHS]). CONTEXT is the concatenation of all the 703 elements that define a given context. For example, if this algorithm 704 is expected to produce identifiers that are monotonically-increasing 705 for each set (Source IP Address, Destination IP Address), CONTEXT 706 should be the concatenation of these two IP addresses. 708 The result of F() is no more secure than the secret key, and 709 therefore 'secret_key' must be unknown to the attacker, and must be 710 of a reasonable length. 'secret_key' must remain stable for a given 711 CONTEXT, since otherwise the numeric identifiers generated by this 712 algorithm would not have the desired stability properties (i.e., 713 stable for a given CONTEXT). In most cases, 'secret_key' can be 714 selected with a PRNG (see [RFC4086] for recommendations on choosing 715 secrets) at an appropriate time, and stored in stable or volatile 716 storage for future use. 718 It should be noted that, since this algorithm uses a global counter 719 ("counter") for selecting identifiers (i.e., all counters share the 720 same increments space), this algorithm produces an information 721 leakage (as described in Section 8.2). For example, if this 722 algorithm were used for selecting TCP ephemeral ports, and an 723 attacker could force a client to periodically establish a new TCP 724 connection to an attacker-controlled machine (or through an attacker- 725 observable routing path), the attacker could subtract consecutive 726 source port values to obtain the number of outgoing TCP connections 727 established globally by the target host within that time period (up 728 to wrap-around issues and five-tuple collisions, of course). 730 7.4.3. Double-Hash Algorithm 732 A trade-off between maintaining a single global 'counter' variable 733 and maintaining 2**N 'counter' variables (where N is the width of the 734 result of F()), could be achieved as follows. The system would keep 735 an array of TABLE_LENGTH integers, which would provide a separation 736 of the increment space into multiple buckets. This improvement could 737 be incorporated into the algorithm from Section 7.4.2 as follows: 739 /* Initialization code */ 741 for(i = 0; i < TABLE_LENGTH; i++) 742 table[i] = random(); 744 id_inc = 1; 746 /* Numeric ID selection function */ 748 id_range = max_id - min_id + 1; 749 offset = F(CONTEXT, secret_key1); 750 index = G(CONTEXT, secret_key2) % TABLE_LENGTH; 751 count = id_range; 753 do { 754 next_id = min_id + (offset + table[index]) % id_range; 755 table[index] = table[index] + id_inc; 757 if(check_suitable_id(next_id)) 758 return next_id; 760 count--; 762 } while (count > 0); 764 return ERROR; 766 'table[]' could be initialized with random values, as indicated by 767 the initialization code in the pseudo-code above. 769 Both F() and G() should be a cryptographically-secure hash functions 770 (e.g. SHA-256 [FIPS-SHS]) computed over the concatenation of each of 771 their respective arguments. Both F() and G() would employ the same 772 CONTEXT (the concatenation of all the elements that define a given 773 context), and would use separate secreted keys (secret_key1, and 774 secret_key2, respectively). 776 The results of F() and G() are no more secure than their respective 777 secret keys ('secret_key1' and 'secret_key2', respectively), and 778 therefore both secret keys must be unknown to the attacker, and must 779 be of a reasonable length. Both secret keys must remain stable for 780 the given CONTEXT, since otherwise the numeric identifiers generated 781 by this algorithm would not have the desired stability properties 782 (i.e., stable for a given CONTEXT). In most cases, both secret keys 783 can be selected with a PRNG (see [RFC4086] for recommendations on 784 choosing secrets) at an appropriate time, and stored in stable or 785 volatile storage for future use. 787 The array 'table[]' assures that successive identifiers for a given 788 context will be monotonically-increasing. However, the increments 789 space is separated into TABLE_LENGTH different spaces, and thus 790 identifier reuse frequency will be (probabilistically) lower than 791 that of the algorithm in Section 7.4.2. That is, the generation of 792 an identifier for one given context will not necessarily result in 793 increments in the identifier sequence for other contexts. It is 794 interesting to note that the size of 'table[]' does not limit the 795 number of different identifier sequences, but rather separates the 796 *increments* into TABLE_LENGTH different spaces. The identifier 797 sequence will result from adding the corresponding entry of 'table[]' 798 to the variable 'offset', which selects the actual identifier 799 sequence (as in the algorithm from Section 7.4.2). 801 An attacker can perform traffic analysis for any "increment space" 802 (i.e., context) into which the attacker has "visibility" -- namely, 803 the attacker can force a node to generate identifiers where 804 G(CONTEXT, secret_key2) identifies the target "increment space". 805 However, the attacker's ability to perform traffic analysis is very 806 reduced when compared to the predictable linear identifiers 807 (described in Appendix A.1) and the hash-based identifiers (described 808 in Section 7.4.2). Additionally, an implementation can further limit 809 the attacker's ability to perform traffic analysis by further 810 separating the increment space (that is, using a larger value for 811 TABLE_LENGTH) and/or by randomizing the increments. 813 Otherwise, this algorithm does not suffer from the issues discussed 814 in Section 8. 816 8. Common Vulnerabilities Associated with Transient Numeric Identifiers 818 8.1. Network Activity Correlation 820 An identifier that is predictable or stable within a given context 821 allows for network activity correlation within that context. 823 For example, a stable IPv6 Interface Identifier allows for network 824 activity to be correlated for the context in which that address is 825 stable [RFC7721]. A stable-per-network IPv6 Interface Identifier (as 826 in [RFC7217]) allows for network activity correlation within a 827 network, whereas a constant IPv6 Interface Identifier (that remains 828 the same across networks) allows not only network activity 829 correlation within the same network, but also across networks ("host 830 tracking"). 832 Similarly, a node that generates TCP ISNs with a global counter could 833 allow network activity correlation across networks, since the 834 communicating nodes could infer the identity of the node based on the 835 TCP ISNs employed for subsequent communication instances. Similarly, 836 a node that generates predictable IPv6 Fragment Identification values 837 could be subject to network activity correlation (see e.g. 838 [Bellovin2002]). 840 8.2. Information Leakage 842 Transient numeric identifiers that are not randomized can leak out 843 information to other communicating nodes. For example, it is common 844 to generate identifiers like: 846 ID = offset(CONTEXT_1) + linear(CONTEXT_2); 848 This generic expression generates identifiers by adding a linear 849 function to an offset. The offset is stable within a given context, 850 whereas linear() is a linear function for a given context (possibly 851 different to that of offset()). Identifiers generated with this 852 expression will generally be predictable within CONTEXT_1. Thus, 853 CONTEXT_1 essentially specifies the context within which information 854 will be "leaked". When both CONTEXT_1 and CONTEXT_2 are a constant 855 value, then all the corresponding transient numeric identifiers 856 become predictable in all contexts. 858 NOTE: If offset() has a global context and the specific value is 859 known, the resulting identifiers may leak even more information. 860 For example, the if Fragment Identification values are generated 861 with the generic function above, and CONTEXT_1 is "global", then 862 the corresponding identifiers will leak the number of fragmented 863 datagrams sent for CONTEXT_2. If both CONTEXT_1 and CONTEXT_2 are 864 "global", then Fragment Identification values would be generated 865 with a global counter (initialized to offset()), and thus each 866 generated Fragment Identification value would leak the number of 867 fragmented datagrams transmitted by the node since it has been 868 bootstrapped. 870 On the other hand, linear() will be predictable within CONTEXT_2. 871 The predictability of linear(), irrespective of the context and/or 872 predictability of offset(), can leak out information that is of use 873 to attackers. For example, a node that selects ephemeral port 874 numbers on as in: 876 ephemeral_port = offset(Dest_IP) + linear() 878 that is, with a per-destination offset, but global linear() function 879 (e.g., a global counter), will leak information about the number of 880 outgoing connections that have been issued between any two issued 881 outgoing connections. 883 Similarly, a node that generates Fragment Identification values as 884 in: 886 Frag_ID = offset(Srd_IP, Dst_IP) + linear() 888 will leak out information about the number of fragmented packets that 889 have been transmitted between any two other transmitted fragmented 890 packets. The vulnerabilities described in [Sanfilippo1998a], 891 [Sanfilippo1998b], and [Sanfilippo1999] are all associated with the 892 use of a global linear() function (i.e., a global CONTEXT_2). 894 8.3. Exploitation of Semantics of Transient Numeric Identifiers 896 Identifiers that are not semantically opaque tend to be more 897 predictable than semantically-opaque identifiers. For example, a MAC 898 address contains an OUI (Organizationally-Unique Identifier) which 899 identifies the vendor that manufactured the underlying network 900 interface card. This fact may be leveraged by an attacker meaning to 901 "guess" MAC addresses and who has some knowledge about the possible 902 NIC vendor. 904 [RFC7707] discusses a number of techniques to reduce the search space 905 when performing IPv6 address-scanning attacks by leveraging the 906 semantics of the IIDs produced by a number by traditional IID- 907 generation algorithms that embed MAC addresses (now replaced by 908 [RFC8064] with [RFC7217]). 910 8.4. Exploitation of Collisions of Transient Numeric Identifiers 912 In many cases, the collision of transient network identifiers can 913 have a hard failure severity (or result in a hard failure severity if 914 an attacker can cause multiple collisions deterministically, one 915 after another). For example, predictable Fragment Identification 916 values open the door to Denial of Service (DoS) attacks (see e.g. 917 [RFC5722]. Similarly, predictable TCP ISNs open the door to trivial 918 connection-reset and data injection attacks (see e.g. 919 [Joncheray1995]). 921 8.5. Cryptanalysis 923 A number of algorithms discussed in this document (such as 924 Section 7.4.2 and Section 7.4.3 rely on cryptographically-secure hash 925 functions. Implementations that employ weak hash functions and keys 926 of inappropriate size may be subject to cryptanalysis, where an 927 attacker may be able to obtain the secret key employed for the hash 928 algorithms, predict numeric identifiers, etc. 930 Futhermore, an implementation that overloads the semantics of the 931 secret key may result in more trivial cryptanalysis, possibly 932 resulting in the leakage of the value employed for the secret key. 934 NOTE: 935 [IPID-DEV] describes two vulnerable numeric ID generators that 936 employ cryptographically-weak hash functions. Additionally, one 937 of such implementations employs a 32-bits of a kernel address as 938 the secret key for a hash function, and therefore successful 939 cryptanalysis leaks the aforementioned kernel address, allowing 940 for Kernel Address Space Layout Randomization (KASLR) [KASLR] 941 bypass. 943 9. Vulnerability Analysis of Specific Transient Numeric Identifiers 944 Categories 946 The following subsections analyze common vulnerabilities associated 947 with the generation of identifiers for each of the categories 948 identified in Section 6. 950 9.1. Category #1: Uniqueness (soft failure) 952 Possible vulnerabilities associated with identifiers of this category 953 are: 955 o Use of trivial algorithms (e.g. global counters) that generate 956 predictable identifiers 958 o Use of flawed PRNGs (please see e.g. [Zalewski2001], 959 [Zalewski2002] and [Klein2007]) 961 Since the only interoperability requirement for these identifiers is 962 uniqueness (with an associated soft failure), the obvious approach to 963 generate them is to employ a PRNG. An implementer should consult 964 [RFC4086] regarding randomness requirements for security, and consult 965 relevant documentation when employing a PRNG provided by the 966 underlying system. 968 Use of algorithms other than PRNGs for generating identifiers of this 969 category is discouraged. 971 9.2. Category #2: Uniqueness (hard failure) 973 As noted in Section 7.2 this category typically employs the same 974 algorithms as Category #4, since a monotonically-increasing sequence 975 tends to minimize the identifier reuse frequency. Therefore, the 976 vulnerability analysis of Section 9.4 applies to this category. 978 9.3. Category #3: Uniqueness, stable within context (soft failure) 980 There are three main vulnerabilities that may be associated with 981 identifiers of this category: 983 1. Use algorithms or sources that result in predictable identifiers 985 2. Use cryptographically-weak hash functions, or inappropriate 986 secret key sizes that allow for cryptanalysis 988 3. Employing the same identifier across contexts in which stability 989 is not required (overloading the numeric identifier) 991 At times, an implementation or specification may be tempted to employ 992 a source for the numeric identifiers which is known to provide unique 993 values, that may have other properties such as being predictable or 994 leaking information about the node in question. For example, as 995 noted in [RFC7721], embedding link-layer addresses for generating 996 IPv6 IIDs not only results in predictable values, but also leaks 997 information about the manufacturer of the network interface card. 999 Employing cryptographically-weak hash functions or inappropriate 1000 secret key sizes may allow for cryptanalysis, which may eventually be 1001 exploited by an attacker to predict future numeric identifiers and 1002 perform a variety of attacks. 1004 On the other hand, using an identifier across contexts where 1005 stability is not required can be leveraged for correlation of 1006 activities. On of the most trivial examples of this is the use of 1007 IPv6 IIDs that are stable across networks (such as IIDs that embed 1008 the underlying link-layer address). 1010 9.4. Category #4: Uniqueness, monotonically increasing within context 1011 (hard failure) 1013 A simple way to generalize algorithms employed for generating 1014 identifiers of Category #4 would be as follows: 1016 /* Numeric ID selection function */ 1018 count = max_id - min_id + 1; 1020 do { 1021 linear(CONTEXT_2)= linear(CONTEXT_2) + increment(); 1022 next_id = offset(CONTEXT_1) + linear(CONTEXT_2); 1024 if(check_suitable_id(next_id)) 1025 return next_id; 1027 count--; 1028 } while (count > 0); 1030 return ERROR; 1032 Essentially, an identifier (next_id) is generated by adding a linear 1033 function (linear()) to an offset value, which is unknown to the 1034 attacker, and stable for given context (CONTEXT_1). 1036 The following aspects of the algorithm should be considered: 1038 o For the most part, it is the offset() function that results in 1039 identifiers that are unpredictable by an off-path attacker. While 1040 the resulting sequence will be monotonically-increasing, the use 1041 of an offset value that is unknown to the attacker makes the 1042 resulting values unknown to the attacker. 1044 o The most straightforward "stateless" implementation of offset 1045 would be that in which offset() is the result of a 1046 cryptographically-secure hash-function that takes the values that 1047 identify the context and a "secret_key" (not shown in the figure 1048 above) as arguments. 1050 o Another possible (but stateful) approach would be to simply 1051 generate a random "per-context" "counter" and store it in memory, 1052 and then look-up the corresponding context when a new identifier 1053 is to be selected, and increment the counter to obtain the 1054 transient numeric identifier. The algorithm in Section 7.4.1 is 1055 essentially an implementation of this type. 1057 o The linear function is incremented according to increment(). In 1058 the most trivial case increment() could always return the constant 1059 "1". But it could also possibly return small random integers such 1060 the increments are unpredictable. 1062 Considering the generic algorithm illustrated above we can identify 1063 the following possible vulnerabilities: 1065 o All the vulnerabilities discussed in Section 9.3 ("Category #3: 1066 Uniqueness, stable within context (soft failure)") since the 1067 algorithms for this category are similar to those of Section 9.3, 1068 with the addition of a linear function. 1070 o The function linear() could be seen as representing the number of 1071 identifiers that have so far been generated for a given context 1072 (CONTEXT_2). If linear() spans more than the necessary context, 1073 the "increments" could be leaked to other parties, thus disclosing 1074 information about the number of identifiers that have so far been 1075 generated. For example, an implementation in which linear() is 1076 implemented as a single global counter will unnecessarily leak 1077 information the number of identifiers that have been produced. 1078 [Fyodor2004] is one example of how such information leakages can 1079 be exploited. However, limiting the span of the increments space 1080 will require a larger number of counters to be stored in memory 1081 (i.e., a larger size for the TABLE_LENGTH parameter of the 1082 algorithm in Section 7.4.3. 1084 o increment() determines the increments of linear() for each 1085 identifier that is selected. In the most trivial case, 1086 increment() will return the integer "1". However, an 1087 implementation may have increment() return a "small" random 1088 integer value such that even if the current value employed by the 1089 generator is guessed (see Appendix A of [RFC7739]), the exact next 1090 identifier to be selected will be slightly harder to identify. 1092 10. IANA Considerations 1094 There are no IANA registries within this document. The RFC-Editor 1095 can remove this section before publication of this document as an 1096 RFC. 1098 11. Security Considerations 1100 The entire document is about the security and privacy implications of 1101 transient numeric identifiers. 1102 [I-D.gont-numeric-ids-sec-considerations] formally requires protocol 1103 specifications to include an appropriate analysis of the 1104 interoperability, security, and privacy implications of the transient 1105 numeric identifiers they specify and employ, while this document 1106 analyzes possible algorithms (and their implications) that could be 1107 employed to comply with the interoperability properties of a 1108 transient numeric identifier, while mitigating the possible security 1109 and privacy implications. 1111 12. Acknowledgements 1113 The authors would like to thank (in alphabetical order) Steven 1114 Bellovin, Joseph Lorenzo Hall, Gre Norcie, Shivan Sahib, and Martin 1115 Thomson, for providing valuable comments on earlier versions of this 1116 document. 1118 The authors would like to thank Diego Armando Maradona for his magic 1119 and inspiration. 1121 13. References 1123 13.1. Normative References 1125 [RFC0793] Postel, J., "Transmission Control Protocol", STD 7, 1126 RFC 793, DOI 10.17487/RFC0793, September 1981, 1127 . 1129 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1130 Requirement Levels", BCP 14, RFC 2119, 1131 DOI 10.17487/RFC2119, March 1997, 1132 . 1134 [RFC2460] Deering, S. and R. Hinden, "Internet Protocol, Version 6 1135 (IPv6) Specification", RFC 2460, DOI 10.17487/RFC2460, 1136 December 1998, . 1138 [RFC4086] Eastlake 3rd, D., Schiller, J., and S. Crocker, 1139 "Randomness Requirements for Security", BCP 106, RFC 4086, 1140 DOI 10.17487/RFC4086, June 2005, 1141 . 1143 [RFC4291] Hinden, R. and S. Deering, "IP Version 6 Addressing 1144 Architecture", RFC 4291, DOI 10.17487/RFC4291, February 1145 2006, . 1147 [RFC4862] Thomson, S., Narten, T., and T. Jinmei, "IPv6 Stateless 1148 Address Autoconfiguration", RFC 4862, 1149 DOI 10.17487/RFC4862, September 2007, 1150 . 1152 [RFC5722] Krishnan, S., "Handling of Overlapping IPv6 Fragments", 1153 RFC 5722, DOI 10.17487/RFC5722, December 2009, 1154 . 1156 [RFC6528] Gont, F. and S. Bellovin, "Defending against Sequence 1157 Number Attacks", RFC 6528, DOI 10.17487/RFC6528, February 1158 2012, . 1160 [RFC7217] Gont, F., "A Method for Generating Semantically Opaque 1161 Interface Identifiers with IPv6 Stateless Address 1162 Autoconfiguration (SLAAC)", RFC 7217, 1163 DOI 10.17487/RFC7217, April 2014, 1164 . 1166 [RFC8064] Gont, F., Cooper, A., Thaler, D., and W. Liu, 1167 "Recommendation on Stable IPv6 Interface Identifiers", 1168 RFC 8064, DOI 10.17487/RFC8064, February 2017, 1169 . 1171 13.2. Informative References 1173 [Bellovin2002] 1174 Bellovin, S., "A Technique for Counting NATted Hosts", 1175 IMW'02 Nov. 6-8, 2002, Marseille, France, 2002. 1177 [CPNI-TCP] 1178 Gont, F., "Security Assessment of the Transmission Control 1179 Protocol (TCP)", United Kingdom's Centre for the 1180 Protection of National Infrastructure (CPNI) Technical 1181 Report, 2009, . 1184 [FIPS-SHS] 1185 FIPS, "Secure Hash Standard (SHS)", Federal Information 1186 Processing Standards Publication 180-4, August 2015, 1187 . 1190 [Fyodor2004] 1191 Fyodor, "Idle scanning and related IP ID games", 2004, 1192 . 1194 [I-D.gont-numeric-ids-sec-considerations] 1195 Gont, F. and I. Arce, "Security Considerations for 1196 Transient Numeric Identifiers Employed in Network 1197 Protocols", draft-gont-numeric-ids-sec-considerations-04 1198 (work in progress), July 2019. 1200 [I-D.irtf-pearg-numeric-ids-history] 1201 Gont, F. and I. Arce, "Unfortunate History of Transient 1202 Numeric Identifiers", draft-irtf-pearg-numeric-ids- 1203 history-02 (work in progress), April 2020. 1205 [IANA-PROT] 1206 IANA, "Protocol Registries", 1207 . 1209 [IPID-DEV] 1210 Klein, A. and B. Pinkas, "From IP ID to Device ID and 1211 KASLR Bypass (Extended Version)", June 2019, 1212 . 1214 [Joncheray1995] 1215 Joncheray, L., "A Simple Active Attack Against TCP", Proc. 1216 Fifth Usenix UNIX Security Symposium, 1995. 1218 [KASLR] PaX Team, "Address Space Layout Randomization", 1219 . 1221 [Klein2007] 1222 Klein, A., "OpenBSD DNS Cache Poisoning and Multiple O/S 1223 Predictable IP ID Vulnerability", 2007, 1224 . 1227 [Morris1985] 1228 Morris, R., "A Weakness in the 4.2BSD UNIX TCP/IP 1229 Software", CSTR 117, AT&T Bell Laboratories, Murray Hill, 1230 NJ, 1985, 1231 . 1233 [RFC1035] Mockapetris, P., "Domain names - implementation and 1234 specification", STD 13, RFC 1035, DOI 10.17487/RFC1035, 1235 November 1987, . 1237 [RFC4963] Heffner, J., Mathis, M., and B. Chandler, "IPv4 Reassembly 1238 Errors at High Data Rates", RFC 4963, 1239 DOI 10.17487/RFC4963, July 2007, 1240 . 1242 [RFC6056] Larsen, M. and F. Gont, "Recommendations for Transport- 1243 Protocol Port Randomization", BCP 156, RFC 6056, 1244 DOI 10.17487/RFC6056, January 2011, 1245 . 1247 [RFC7098] Carpenter, B., Jiang, S., and W. Tarreau, "Using the IPv6 1248 Flow Label for Load Balancing in Server Farms", RFC 7098, 1249 DOI 10.17487/RFC7098, January 2014, 1250 . 1252 [RFC7258] Farrell, S. and H. Tschofenig, "Pervasive Monitoring Is an 1253 Attack", BCP 188, RFC 7258, DOI 10.17487/RFC7258, May 1254 2014, . 1256 [RFC7707] Gont, F. and T. Chown, "Network Reconnaissance in IPv6 1257 Networks", RFC 7707, DOI 10.17487/RFC7707, March 2016, 1258 . 1260 [RFC7721] Cooper, A., Gont, F., and D. Thaler, "Security and Privacy 1261 Considerations for IPv6 Address Generation Mechanisms", 1262 RFC 7721, DOI 10.17487/RFC7721, March 2016, 1263 . 1265 [RFC7739] Gont, F., "Security Implications of Predictable Fragment 1266 Identification Values", RFC 7739, DOI 10.17487/RFC7739, 1267 February 2016, . 1269 [Sanfilippo1998a] 1270 Sanfilippo, S., "about the ip header id", Post to Bugtraq 1271 mailing-list, Mon Dec 14 1998, 1272 . 1274 [Sanfilippo1998b] 1275 Sanfilippo, S., "Idle scan", Post to Bugtraq mailing-list, 1276 1998, . 1279 [Sanfilippo1999] 1280 Sanfilippo, S., "more ip id", Post to Bugtraq mailing- 1281 list, 1999, 1282 . 1285 [Shimomura1995] 1286 Shimomura, T., "Technical details of the attack described 1287 by Markoff in NYT", Message posted in USENET's 1288 comp.security.misc newsgroup Message-ID: 1289 <3g5gkl$5j1@ariel.sdsc.edu>, 1995, 1290 . 1292 [Silbersack2005] 1293 Silbersack, M., "Improving TCP/IP security through 1294 randomization without sacrificing interoperability", 1295 EuroBSDCon 2005 Conference, 2005, 1296 . 1299 [TCPT-uptime] 1300 McDanel, B., "TCP Timestamping - Obtaining System Uptime 1301 Remotely", March 2001, 1302 . 1304 [Zalewski2001] 1305 Zalewski, M., "Strange Attractors and TCP/IP Sequence 1306 Number Analysis", 2001, 1307 . 1309 [Zalewski2002] 1310 Zalewski, M., "Strange Attractors and TCP/IP Sequence 1311 Number Analysis - One Year Later", 2001, 1312 . 1314 Appendix A. Flawed Algorithms 1316 The following subsections document algorithms with known negative 1317 security and privacy implications. 1319 A.1. Predictable Linear Identifiers Algorithm 1321 One of the most trivial ways to achieve uniqueness with a low 1322 identifier reuse frequency is to produce a linear sequence. 1324 For example, the following algorithm has been employed (see e.g. 1325 [Morris1985], [Shimomura1995], [Silbersack2005] and [CPNI-TCP]) in a 1326 number of operating systems for selecting IP fragment IDs, TCP 1327 ephemeral ports, etc.: 1329 /* Initialization code */ 1331 next_id = min_id; 1332 id_inc= 1; 1334 /* Numeric ID selection function */ 1336 count = max_id - min_id + 1; 1338 do { 1339 if (next_id == max_id) { 1340 next_id = min_id; 1341 } 1342 else { 1343 next_id = next_id + id_inc; 1344 } 1346 if (check_suitable_id(next_id)) 1347 return next_id; 1349 count--; 1351 } while (count > 0); 1353 return ERROR; 1355 Note: 1356 check_suitable_id() is a function that checks whether the 1357 resulting identifier is acceptable (e.g., whether it's in use, 1358 etc.). 1360 For obvious reasons, this algorithm results in predicable sequences. 1361 If a global counter is used (such as "next_id" in the example above), 1362 a node that learns one numeric identifier can also learn or guess 1363 values employed by past and future protocol instances. On the other 1364 hand, when the value of increments is known (such as "1" in this 1365 case), an attacker can sample two values, and learn the number of 1366 identifiers that were generated in-between. Furthermore, if the 1367 counter is initialized e.g. when the system its bootstrapped to some 1368 known value, it will likely leak information (for example, the number 1369 of transmitted in the case of an IP ID generator [Sanfilippo1998a], 1370 or the system uptime in the case of TCP timestamps [TCPT-uptime]). 1372 Where identifier reuse would lead to a hard failure, one typical 1373 approach to generate unique identifiers (while minimizing the 1374 security and privacy implications of predictable identifiers) is to 1375 obfuscate the resulting numeric identifiers by either: 1377 o Replacing the global counter with multiple counters (initialized 1378 to a random value) 1380 o Randomizing the "increments" 1382 Avoiding global counters essentially means that learning one 1383 identifier for a given context (e.g., one TCP ephemeral port for a 1384 given {src IP, Dst IP, Dst Port}) is of no use for learning or 1385 guessing identifiers for a different context (e.g., TCP ephemeral 1386 ports that involve other peers). However, this may imply keeping one 1387 additional variables/counter per contexts, which may be prohibitive 1388 in some environments. 1390 The choice of id_inc has implications on both the security and 1391 privacy properties of the resulting identifiers, but also on the 1392 corresponding interoperability properties. On one hand, minimizing 1393 the increments (as in "id_inc = 1" in our case) generally minimizes 1394 the identifier reuse frequency, albeit at increased predictability. 1395 On the other hand, if the increments are randomized, predictability 1396 of the resulting identifiers is reduced, and the information leakage 1397 produced by global constant increments is mitigated. However, using 1398 larger increments than necessary can result in higher identifier 1399 reuse frequency. 1401 A.2. Random-Increments Algorithm 1403 This algorithm offers a middle ground between the algorithms that 1404 select numeric identifiers randomly (such as those described in 1405 Section 7.1.1 and Section 7.1.2), and those that offer obfuscation 1406 but no randomization (such as those described in Section 7.4.2 and 1407 Section 7.4.3). 1409 /* Initialization code */ 1411 next_id = random(); /* Initialization value */ 1412 id_inc = 500; /* Determines the trade-off */ 1414 /* Numeric ID selection function */ 1416 id_range = max_id - min_id + 1; 1418 count = id_range; 1420 do { 1421 /* Random increment */ 1422 next_id = next_id + (random() % id_inc) + 1; 1424 /* Keep the identifier within acceptable range */ 1425 next_id = min_id + (next_id % id_range); 1427 if(check_suitable_id(next_id)) 1428 return next_id; 1430 count--; 1431 } while (count > 0); 1433 return ERROR; 1435 This algorithm aims at producing a monotonically-increasing sequence 1436 of numeric identifiers, while avoiding the use of fixed increments, 1437 which would lead to trivially predictable sequences. The value 1438 "id_inc" allows for direct control of the trade-off between the level 1439 of obfuscation and the identifier reuse frequency. The smaller the 1440 value of "id_inc", the more similar this algorithm is to a 1441 predicable, global monotonically-increasing ID generation algorithm. 1442 The larger the value of "id_inc", the more similar this algorithm is 1443 to the algorithm described in Section 7.1.1 of this document. 1445 When the identifiers wrap, there is the risk of collisions of 1446 identifiers (i.e., identifier reuse). Therefore, "id_inc" should be 1447 selected according to the following criteria: 1449 o It should maximize the wrapping time of the identifier space. 1451 o It should minimize identifier reuse frequency. 1453 o It should maximize obfuscation. 1455 Clearly, these are competing goals, and the decision of which value 1456 of "id_inc" to use is a trade-off. Therefore, the value of "id_inc" 1457 should be configurable so that system administrators can make the 1458 trade-off for themselves. We note that the alternative algorithms 1459 discussed throughout this document offer better interoperability, 1460 security and privacy implications than this algorithm, and hence 1461 implementation of this algorithm is discouraged. 1463 Authors' Addresses 1465 Fernando Gont 1466 SI6 Networks 1467 Evaristo Carriego 2644 1468 Haedo, Provincia de Buenos Aires 1706 1469 Argentina 1471 Email: fgont@si6networks.com 1472 URI: https://www.si6networks.com 1474 Ivan Arce 1475 Quarkslab 1477 Email: iarce@quarkslab.com 1478 URI: https://www.quarkslab.com