idnits 2.17.1 draft-irtf-pearg-numeric-ids-generation-00.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == The document doesn't use any RFC 2119 keywords, yet seems to have RFC 2119 boilerplate text. -- The document date (August 23, 2019) is 1698 days in the past. Is this intentional? -- Found something which looks like a code comment -- if you have code sections in the document, please surround them with '' and '' lines. Checking references for intended status: Best Current Practice ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) ** Obsolete normative reference: RFC 793 (Obsoleted by RFC 9293) ** Obsolete normative reference: RFC 2460 (Obsoleted by RFC 8200) ** Obsolete normative reference: RFC 6528 (Obsoleted by RFC 9293) == Outdated reference: A later version (-11) exists of draft-gont-numeric-ids-sec-considerations-04 Summary: 3 errors (**), 0 flaws (~~), 3 warnings (==), 2 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Internet Research Task Force (IRTF) F. Gont 3 Internet-Draft SI6 Networks 4 Intended status: Best Current Practice I. Arce 5 Expires: February 24, 2020 Quarkslab 6 August 23, 2019 8 On the Generation of Transient Numeric Identifiers 9 draft-irtf-pearg-numeric-ids-generation-00 11 Abstract 13 This document performs an analysis of the security and privacy 14 implications of different types of "numeric identifiers" used in IETF 15 protocols, and tries to categorize them based on their 16 interoperability requirements and the associated failure severity 17 when such requirements are not met. Subsequently, it provides advice 18 on possible algorithms that could be employed to satisfy the 19 interoperability requirements of each identifier type, while 20 minimizing the security and privacy implications, thus providing 21 guidance to protocol designers and protocol implementers. Finally, 22 this describes a number of algorithms that have been employed in real 23 implementations to generate transient numeric identifiers and 24 analyzes their security and privacy properties. 26 Status of This Memo 28 This Internet-Draft is submitted in full conformance with the 29 provisions of BCP 78 and BCP 79. 31 Internet-Drafts are working documents of the Internet Engineering 32 Task Force (IETF). Note that other groups may also distribute 33 working documents as Internet-Drafts. The list of current Internet- 34 Drafts is at https://datatracker.ietf.org/drafts/current/. 36 Internet-Drafts are draft documents valid for a maximum of six months 37 and may be updated, replaced, or obsoleted by other documents at any 38 time. It is inappropriate to use Internet-Drafts as reference 39 material or to cite them other than as "work in progress." 41 This Internet-Draft will expire on February 24, 2020. 43 Copyright Notice 45 Copyright (c) 2019 IETF Trust and the persons identified as the 46 document authors. All rights reserved. 48 This document is subject to BCP 78 and the IETF Trust's Legal 49 Provisions Relating to IETF Documents 50 (https://trustee.ietf.org/license-info) in effect on the date of 51 publication of this document. Please review these documents 52 carefully, as they describe your rights and restrictions with respect 53 to this document. Code Components extracted from this document must 54 include Simplified BSD License text as described in Section 4.e of 55 the Trust Legal Provisions and are provided without warranty as 56 described in the Simplified BSD License. 58 Table of Contents 60 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 61 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 4 62 3. Threat Model . . . . . . . . . . . . . . . . . . . . . . . . 5 63 4. Issues with the Specification of Identifiers . . . . . . . . 5 64 5. Protocol Failure Severity . . . . . . . . . . . . . . . . . . 6 65 6. Categorizing Identifiers . . . . . . . . . . . . . . . . . . 6 66 7. Common Algorithms for Identifier Generation . . . . . . . . . 8 67 7.1. Category #1: Uniqueness (soft failure) . . . . . . . . . 8 68 7.2. Category #2: Uniqueness (hard failure) . . . . . . . . . 10 69 7.3. Category #3: Uniqueness, constant within context (soft- 70 failure) . . . . . . . . . . . . . . . . . . . . . . . . 10 71 7.4. Category #4: Uniqueness, monotonically increasing within 72 context (hard failure) . . . . . . . . . . . . . . . . . 12 73 8. Common Vulnerabilities Associated with Transient Numeric 74 Identifiers . . . . . . . . . . . . . . . . . . . . . . . . . 17 75 8.1. Network Activity Correlation . . . . . . . . . . . . . . 17 76 8.2. Information Leakage . . . . . . . . . . . . . . . . . . . 17 77 8.3. Exploitation of Semantics of Transient Numeric 78 Identifiers . . . . . . . . . . . . . . . . . . . . . . . 19 79 8.4. Exploitation of Collisions of Transient Numeric 80 Identifiers . . . . . . . . . . . . . . . . . . . . . . . 19 81 9. Vulnerability Analysis of Specific Transient Numeric 82 Identifiers Categories . . . . . . . . . . . . . . . . . . . 19 83 9.1. Category #1: Uniqueness (soft failure) . . . . . . . . . 19 84 9.2. Category #2: Uniqueness (hard failure) . . . . . . . . . 20 85 9.3. Category #3: Uniqueness, constant within context (soft 86 failure) . . . . . . . . . . . . . . . . . . . . . . . . 20 87 9.4. Category #4: Uniqueness, monotonically increasing within 88 context (hard failure) . . . . . . . . . . . . . . . . . 20 89 10. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 22 90 11. Security Considerations . . . . . . . . . . . . . . . . . . . 22 91 12. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 23 92 13. References . . . . . . . . . . . . . . . . . . . . . . . . . 23 93 13.1. Normative References . . . . . . . . . . . . . . . . . . 23 94 13.2. Informative References . . . . . . . . . . . . . . . . . 24 95 Appendix A. Flawed Algorithms . . . . . . . . . . . . . . . . . 27 96 A.1. Predictable Linear Identifiers Algorithm . . . . . . . . 27 97 A.2. Random-Increments Algorithm . . . . . . . . . . . . . . . 28 98 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 30 100 1. Introduction 102 Network protocols employ a variety of numeric identifiers for 103 different protocol entities, ranging from DNS Transaction IDs (TxIDs) 104 to transport protocol ports (e.g. TCP ports) or IPv6 Interface 105 Identifiers (IIDs). These identifiers usually have specific 106 properties that must be satisfied such that they do not result in 107 negative interoperability implications (e.g. uniqueness during a 108 specified period of time), and an associated failure severity when 109 such properties are not met, ranging from soft to hard failures. 111 For more than 30 years, a large number of implementations of the TCP/ 112 IP protocol suite have been subject to a variety of attacks, with 113 effects ranging from Denial of Service (DoS) or data injection, to 114 information leakage that could be exploited for pervasive monitoring 115 [RFC7258]. The root of these issues has been, in many cases, the 116 poor selection of identifiers in such protocols, usually as a result 117 of insufficient or misleading specifications. While it is generally 118 trivial to identify an algorithm that can satisfy the 119 interoperability requirements of a given identifier, there exists 120 practical evidence that doing so without negatively affecting the 121 security and/or privacy properties of the aforementioned protocols is 122 prone to error [I-D.gont-numeric-ids-history]. 124 For example, implementations have been subject to security and/or 125 privacy issues resulting from: 127 o Predictable TCP sequence numbers 129 o Predictable transport protocol port numbers 131 o Predictable IPv4 or IPv6 Fragment Identifiers 133 o Predictable IPv6 Interface Identifiers (IIDs) 135 o Predictable DNS Transaction Identifiers (TxIDs) 137 Recent history indicates that when new protocols are standardized or 138 new protocol implementations are produced, the security and privacy 139 properties of the associated identifiers tend to be overlooked and 140 inappropriate algorithms to generate transient numeric identifiers 141 are either suggested in the specification or selected by 142 implementers. As a result, we believe that advice in this area is 143 warranted. 145 This document contains a non-exhaustive survey of identifiers 146 employed in various IETF protocols, and aims to categorize such 147 identifiers based on their interoperability requirements, and the 148 associated failure severity when such requirements are not met. 149 Subsequently, it provides advice on possible algorithms that could be 150 employed to satisfy the interoperability requirements of each 151 category, while minimizing the associated security and privacy 152 implications. Finally, it analyzes several algorithms that have been 153 employed in real implementations to meet such requirements and 154 analyzes their security and privacy properties. 156 2. Terminology 158 Identifier: 159 A data object in a protocol specification that can be used to 160 definitely distinguish a protocol object (a datagram, network 161 interface, transport protocol endpoint, session, etc) from all 162 other objects of the same type, in a given context. Identifiers 163 are usually defined as a series of bits and represented using 164 integer values. We note that different identifiers may have 165 additional requirements or properties depending on their specific 166 use in a protocol. We use the term "identifier" as a generic term 167 to refer to any data object in a protocol specification that 168 satisfies the identification property stated above. 170 Failure Severity: 171 The consequences of a failure to comply with the interoperability 172 requirements of a given identifier. Severity considers the worst 173 potential consequence of a failure, determined by the system 174 damage and/or time lost to repair the failure. In this document 175 we define two types of failure severity: "soft" and "hard". 177 Hard Failure: 178 A hard failure is a non-recoverable condition in which a protocol 179 does not operate in the prescribed manner or it operates with 180 excessive degradation of service. For example, an established TCP 181 connection that is aborted due to an error condition constitutes, 182 from the point of view of the transport protocol, a hard failure, 183 since it enters a state from which normal operation cannot be 184 recovered. 186 Soft Failure: 187 A soft failure is a recoverable condition in which a protocol does 188 not operate in the prescribed manner but normal operation can be 189 resumed automatically in a short period of time. For example, a 190 simple packet-loss event that is subsequently recovered with a 191 retransmission can be considered a soft failure. 193 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 194 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 195 document are to be interpreted as described in RFC 2119 [RFC2119]. 197 3. Threat Model 199 Throughout this document, we assume an attacker does not have 200 physical or logical access to the device(s) being attacked. We 201 assume the attacker can simply send any traffic to the target 202 devices, to e.g. sample identifiers employed by such devices. 204 4. Issues with the Specification of Identifiers 206 While assessing protocol specifications regarding the use of 207 identifiers, we found that most of the issues discussed in this 208 document arise as a result of one of the following conditions: 210 o Protocol specifications which under-specify the requirements for 211 their identifiers 213 o Protocol specifications that over-specify their identifiers 215 o Protocol implementations that simply fail to comply with the 216 specified requirements 218 A number of protocol implementations (too many of them) simply 219 overlook the security and privacy implications of identifiers 220 [I-D.gont-numeric-ids-history]. Examples of them are the 221 specification of TCP port numbers in [RFC0793], the specification of 222 TCP sequence numbers in [RFC0793], or the specification of the DNS 223 TxID in [RFC1035]. 225 On the other hand, there are a number of protocol specifications that 226 over-specify some of their associated protocol identifiers. For 227 example, [RFC4291] essentially results in link-layer addresses being 228 embedded in the IPv6 Interface Identifiers (IIDs) when the 229 interoperability requirement of uniqueness could be achieved in other 230 ways that do not result in negative security and privacy implications 231 [RFC7721]. Similarly, [RFC2460] suggested the use of a global 232 counter for the generation of Fragment Identification values, when 233 the interoperability properties of uniqueness per {Src IP, Dst IP} 234 could be achieved with other algorithms that do not result in 235 negative security and privacy implications. 237 Finally, there are protocol implementations that simply fail to 238 comply with existing protocol specifications. For example, some 239 popular operating systems (notably Microsoft Windows) still fail to 240 implement transport port randomization, as specified in [RFC6056]. 242 5. Protocol Failure Severity 244 Section 2 defines the concept of "Failure Severity" and two types of 245 failures that we employ throughout this document: soft and hard. 247 Our analysis of the severity of a failure is performed from the point 248 of view of the protocol in question. However, the corresponding 249 severity on the upper application or protocol may not be the same as 250 that of the protocol in question. For example, a TCP connection that 251 is aborted may or may not result in a hard failure of the upper 252 application: if the upper application can establish a new TCP 253 connection without any impact on the application, a hard failure at 254 the TCP protocol may have no severity at the application level. On 255 the other hand, if a hard failure of a TCP connection results in 256 excessive degradation of service at the application layer, it will 257 also result in a hard failure at the application. 259 6. Categorizing Identifiers 261 This section includes a non-exhaustive survey of identifiers, and 262 proposes a number of categories that can accommodate these 263 identifiers based on their interoperability requirements and their 264 failure modes (soft or hard) 266 +------------+--------------------------------------+---------------+ 267 | Identifier | Interoperability Requirements | Failure | 268 | | | Severity | 269 +------------+--------------------------------------+---------------+ 270 | IPv6 Frag | Uniqueness (for IP address pair) | Soft/Hard (1) | 271 | ID | | | 272 +------------+--------------------------------------+---------------+ 273 | IPv6 IID | Uniqueness (and constant within IPv6 | Soft (3) | 274 | | prefix) (2) | | 275 +------------+--------------------------------------+---------------+ 276 | TCP ISN | Monotonically-increasing | Hard (4) | 277 +------------+--------------------------------------+---------------+ 278 | TCP eph. | Uniqueness (for connection ID) | Hard | 279 | port | | | 280 +------------+--------------------------------------+---------------+ 281 | IPv6 Flow | Uniqueness | None (5) | 282 | L. | | | 283 +------------+--------------------------------------+---------------+ 284 | DNS TxID | Uniqueness | None (6) | 285 +------------+--------------------------------------+---------------+ 287 Table 1: Survey of Identifiers 289 Notes: 291 (1) 292 While a single collision of Fragment ID values would simply lead 293 to a single packet drop (and hence a "soft" failure), repeated 294 collisions at high data rates might trash the Fragment ID space, 295 leading to a hard failure [RFC4963]. 297 (2) 298 While the interoperability requirements are simply that the 299 Interface ID results in a unique IPv6 address, for operational 300 reasons it is typically desirable that the resulting IPv6 address 301 (and hence the corresponding Interface ID) be constant within each 302 network [RFC7217] [RFC8064] . 304 (3) 305 While IPv6 Interface IDs must result in unique IPv6 addresses, 306 IPv6 Duplicate Address Detection (DAD) [RFC4862] allows for the 307 detection of duplicate Interface IDs/addresses, and hence such 308 Interface ID collisions can be recovered. 310 (4) 311 In theory there are no interoperability requirements for TCP 312 Initial Sequence Numbers (ISNs), since the TIME-WAIT state and 313 TCP's "quiet time" take care of old segments from previous 314 incarnations of the connection. However, a widespread 315 optimization allows for a new incarnation of a previous connection 316 to be created if the ISN of the incoming SYN is larger than the 317 last sequence number seen in that direction for the previous 318 incarnation of the connection. Thus, monotonically-increasing TCP 319 sequence numbers allow for such optimization to work as expected 320 [RFC6528]. 322 (5) 323 The IPv6 Flow Label is typically employed for load sharing 324 [RFC7098], along with the Source and Destination IPv6 addresses. 325 Reuse of a Flow Label value for the same set {Source Address, 326 Destination Address} would typically cause both flows to be 327 multiplexed into the same link. However, as long as this does not 328 occur deterministically, it will not result in any negative 329 implications. 331 (6) 332 DNS TxIDs are employed, together with the Source Address, 333 Destination Address, Source Port, and Destination Port, to match 334 DNS requests and responses. However, since an implementation 335 knows which DNS requests were sent for that set of {Source 336 Address, Destination Address, Source Port, and Destination Port, 337 DNS TxID}, a collision of TxID would result, if anything, in a 338 small performance penalty (the response would be discarded when it 339 is found that it does not answer the query sent in the 340 corresponding DNS query). 342 Based on the survey above, we can categorize identifiers as follows: 344 +-----+---------------------------------------+---------------------+ 345 | Cat | Category | Sample Proto IDs | 346 | # | | | 347 +-----+---------------------------------------+---------------------+ 348 | 1 | Uniqueness (soft failure) | IPv6 Flow L., DNS | 349 | | | TxIDs | 350 +-----+---------------------------------------+---------------------+ 351 | 2 | Uniqueness (hard failure) | IPv6 Frag ID, TCP | 352 | | | ephemeral port | 353 +-----+---------------------------------------+---------------------+ 354 | 3 | Uniqueness, constant within context | IPv6 IIDs | 355 | | (soft failure) | | 356 +-----+---------------------------------------+---------------------+ 357 | 4 | Uniqueness, monotonically increasing | TCP ISN | 358 | | within context (hard failure) | | 359 +-----+---------------------------------------+---------------------+ 361 Table 2: Identifier Categories 363 We note that Category #4 could be considered a generalized case of 364 category #3, in which a monotonically increasing element is added to 365 a constant (within context) element, such that the resulting 366 identifiers are monotonically increasing within a specified context. 367 That is, the same algorithm could be employed for both #3 and #4, 368 given appropriate parameters. 370 7. Common Algorithms for Identifier Generation 372 The following subsections describe common algorithms found for 373 Protocol ID generation for each of the categories above. 375 7.1. Category #1: Uniqueness (soft failure) 377 7.1.1. Simple Randomization Algorithm 378 /* Ephemeral port selection function */ 379 id_range = max_id - min_id + 1; 380 next_id = min_id + (random() % id_range); 381 count = next_id; 383 do { 384 if(check_suitable_id(next_id)) 385 return next_id; 387 if (next_id == max_id) { 388 next_id = min_id; 389 } else { 390 next_id++; 391 } 393 count--; 394 } while (count > 0); 396 return ERROR; 398 Note: 399 random() is a function that returns a pseudo-random unsigned 400 integer number of appropriate size. Note that the output needs to 401 be unpredictable, and typical implementations of POSIX random() 402 function do not necessarily meet this requirement. See [RFC4086] 403 for randomness requirements for security. 405 The function check_suitable_id() can check, when possible, whether 406 this identifier is e.g. already in use. When already used, this 407 algorithm selects the next available protocol ID. 409 All the variables (in this and all the algorithms discussed in 410 this document) are unsigned integers. 412 This algorithm does not suffer from any of the issues discussed in 413 Section 8. 415 7.1.2. Another Simple Randomization Algorithm 417 The following pseudo-code illustrates another algorithm for selecting 418 a random identifier in which, in the event the identifier is found to 419 be not suitable (e.g., already in use), another identifier is 420 selected randomly: 422 id_range = max_id - min_id + 1; 423 next_id = min_id + (random() % id_range); 424 count = id_range; 426 do { 427 if(check_suitable_id(next_id)) 428 return next_id; 430 next_id = min_id + (random() % id_range); 431 count--; 432 } while (count > 0); 434 return ERROR; 436 This algorithm might be unable to select an identifier (i.e., return 437 "ERROR") even if there are suitable identifiers available, when there 438 are a large number of identifiers "in use". 440 This algorithm does not suffer from any of the issues discussed in 441 Section 8. 443 7.2. Category #2: Uniqueness (hard failure) 445 One of the most trivial approaches for achieving uniqueness for an 446 identifier (with a hard failure mode) is to implement a linear 447 function. As a result, all of the algorithms described in 448 Section 7.4 are of use for complying the requirements of this 449 identifier category. 451 7.3. Category #3: Uniqueness, constant within context (soft-failure) 453 The goal of this algorithm is to produce identifiers that are 454 constant for a given context, but that change when the aforementioned 455 context changes. 457 Keeping one value for each possible "context" may in many cases be 458 considered too onerous in terms of memory requirements. As a 459 workaround, the following algorithm employs a calculated technique 460 (as opposed to keeping state in memory) to maintain the constant 461 identifier for each given context. 463 In the following algorithm, the function F() provides (statelessly) a 464 constant identifier for each given context. 466 /* Protocol ID selection function */ 467 id_range = max_id - min_id + 1; 469 counter = 0; 471 do { 472 offset = F(CONTEXT, counter, secret_key); 473 next_id = min_id + (offset % id_range); 475 if(check_suitable_id(next_id)) 476 return next_id; 478 counter++; 480 } while (counter <= MAX_RETRIES); 482 return ERROR; 484 The function F() provides a "per-CONTEXT" constant identifier for a 485 given context. 'offset' may take any value within the storage type 486 range since we are restricting the resulting identifier to be in the 487 range [min_id, max_id] in a similar way as in the algorithm described 488 in Section 7.1.1. Collisions can be recovered by incrementing the 489 'counter' variable and recomputing F(). 491 The function F() should be a cryptographic hash function like SHA-256 492 [FIPS-SHS]. Note: MD5 [RFC1321] is considered unacceptable for F() 493 [RFC6151]. CONTEXT is the concatenation of all the elements that 494 define a given context. For example, if this algorithm is expected 495 to produce identifiers that are unique per network interface card 496 (NIC) and SLAAC autoconfiguration prefix, the CONTEXT should be the 497 concatenation of e.g. the interface index and the SLAAC 498 autoconfiguration prefix (please see [RFC7217] for an implementation 499 of this algorithm for the generation of IPv6 IIDs). 501 The secret should be chosen to be as random as possible (see 502 [RFC4086] for recommendations on choosing secrets). 504 For obvious reasons, the transient network identifiers generated with 505 this algorithm allow for network activity correlation within 506 "CONTEXT". However, this is essentially a design goal of this 507 category of transient numeric identifiers. 509 7.4. Category #4: Uniqueness, monotonically increasing within context 510 (hard failure) 512 7.4.1. Per-context Counter Algorithm 514 One possible way to achieve low identifier reuse frequency while 515 still avoiding predictable sequences would be to employ a per-context 516 counter, as opposed to a global counter. Such an algorithm could be 517 described as follows: 519 /* Initialization at system boot time. Could be random */ 520 id_inc= 1; 522 /* Identifier selection function */ 523 count = max_id - min_id + 1; 525 if(lookup_counter(CONTEXT) == ERROR){ 526 create_counter(CONTEXT); 527 } 529 next_id= lookup_counter(CONTEXT); 531 do { 532 if (next_id == max_id) { 533 next_id = min_id; 534 } 535 else { 536 next_id = next_id + id_inc; 537 } 539 if (check_suitable_id(next_id)){ 540 store_counter(CONTEXT, next_id); 541 return next_id; 542 } 544 count--; 546 } while (count > 0); 548 store_counter(CONTEXT, next_id); 549 return ERROR; 551 NOTE: 552 lookup_counter() returns the current counter for a given context, 553 or an error condition if such a counter does not exist. 555 create_counter() creates a counter for a given context, and 556 initializes such counter to a random value. 558 store_counter() saves (updates) the current counter for a given 559 context. 561 check_suitable_id() is a function that checks whether the 562 resulting identifier is acceptable (e.g., whether its in use, 563 etc.). 565 Essentially, whenever a new identifier is to be selected, the 566 algorithm checks whether there there is a counter for the 567 corresponding context. If there is, such counter is incremented to 568 obtain the new identifier, and the new identifier updates the 569 corresponding counter. If there is no counter for such context, a 570 new counter is created an initialized to a random value, and used as 571 the new identifier. 573 This algorithm produces a per-context counter, which results in one 574 linear function for each context. Since the origin of each "line" is 575 a random value, the resulting values are unknown to an off-path 576 attacker. 578 This algorithm has the following drawbacks: 580 o If, as a result of resource management, the counter for a given 581 context must be removed, the last identifier value used for that 582 context will be lost. Thus, if subsequently an identifier needs 583 to be generated for such context, that counter will need to be 584 recreated and reinitialized to random value, thus possibly leading 585 to reuse/collistion of identifiers. 587 o If the identifiers are predictable by the destination system 588 (e.g., the destination host represents the "context"), a 589 vulnerable host might possibly leak to third parties the 590 identifiers used by other hosts to send traffic to it (i.e., a 591 vulnerable Host B could leak to Host C the identifier values that 592 Host A is using to send packets to Host B). Appendix A of 593 [RFC7739] describes one possible scenario for such leakage in 594 detail. 596 Otherwise, the identifiers produced by this algorithm do not suffer 597 from the other issues discussed in Section 8. 599 7.4.2. Simple Hash-Based Algorithm 601 The goal of this algorithm is to produce monotonically-increasing 602 sequences, with a randomized initial value, for each given context. 603 For example, if the identifiers being generated must be unique for 604 each {src IP, dst IP} set, then each possible combination of {src IP, 605 dst IP} should have a corresponding "next_id" value. 607 Keeping one value for each possible "context" may in many cases be 608 considered too onerous in terms of memory requirements. As a 609 workaround, the following algorithm employs a calculated technique 610 (as opposed to keeping state in memory) to maintain the random offset 611 for each possible context. 613 In the following algorithm, the function F() provides (statelessly) a 614 random offset for each given context. 616 /* Initialization at system boot time. Could be random. */ 617 counter = 0; 619 /* Protocol ID selection function */ 620 id_range = max_id - min_id + 1; 621 offset = F(CONTEXT, secret_key); 622 count = id_range; 624 do { 625 next_id = min_id + 626 (counter + offset) % id_range; 628 counter++; 630 if(check_suitable_id(next_id)) 631 return next_id; 633 count--; 635 } while (count > 0); 637 return ERROR; 639 The function F() provides a "per-CONTEXT" fixed offset within the 640 identifier space. Both the 'offset' and 'counter' variables may take 641 any value within the storage type range since we are restricting the 642 resulting identifier to be in the range [min_id, max_id] in a similar 643 way as in the algorithm described in Section 7.1.1. This allows us 644 to simply increment the 'counter' variable and rely on the unsigned 645 integer to wrap around. 647 The function F() should be a cryptographic hash function like SHA-256 648 [FIPS-SHS]. Note: MD5 [RFC1321] is considered unacceptable for F() 649 [RFC6151]. CONTEXT is the concatenation of all the elements that 650 define a given context. For example, if this algorithm is expected 651 to produce identifiers that are monotonically-increasing for each set 652 (Source IP Address, Destination IP Address), the CONTEXT should be 653 the concatenation of these two values. 655 The secret should be chosen to be as random as possible (see 656 [RFC4086] for recommendations on choosing secrets). 658 It should be noted that, since this algorithm uses a global counter 659 ("counter") for selecting identifiers, this algorithm produces an 660 information leakage (as described in Section 8.2). For example, if 661 this algorithm were used for TCP ephemeral port selection, and an 662 attacker could force a client to periodically establish a new TCP 663 connection to an attacker-controlled machine (or through an attacker- 664 observable routing path), the attacker could subtract consecutive 665 source port values to obtain the number of outgoing TCP connections 666 established globally by the target host within that time period (up 667 to wrap-around issues and five-tuple collisions, of course). 669 7.4.3. Double-Hash Algorithm 671 A trade-off between maintaining a single global 'counter' variable 672 and maintaining 2**N 'counter' variables (where N is the width of the 673 result of F()) could be achieved as follows. The system would keep 674 an array of TABLE_LENGTH integers, which would provide a separation 675 of the increment of the 'counter' variable. This improvement could 676 be incorporated into the algorithm from Section 7.4.2 as follows: 678 /* Initialization at system boot time */ 679 for(i = 0; i < TABLE_LENGTH; i++) 680 table[i] = random(); 682 id_inc = 1; 684 /* Protocol ID selection function */ 685 id_range = max_id - min_id + 1; 686 offset = F(CONTEXT, secret_key1); 687 index = G(CONTEXT, secret_key2); 688 count = id_range; 690 do { 691 next_id = min_id + (offset + table[index]) % id_range; 692 table[index] = table[index] + id_inc; 694 if(check_suitable_id(next_id)) 695 return next_id; 697 count--; 699 } while (count > 0); 701 return ERROR; 703 'table[]' could be initialized with random values, as indicated by 704 the initialization code in pseudo-code above. The function G() 705 should be a cryptographic hash function. It should use the same 706 CONTEXT as F(), and a secret key value to compute a value between 0 707 and (TABLE_LENGTH-1). 709 The array 'table[]' assures that successive identifiers for a given 710 context will be monotonically-increasing. However, the increments 711 space is separated into TABLE_LENGTH different spaces, and thus 712 identifier reuse frequency will be (probabilistically) lower than 713 that of the algorithm in Section 7.4.2. That is, the generation of 714 identifier for one given context will not necessarily result in 715 increments in the identifiers for other contexts. 717 It is interesting to note that the size of 'table[]' does not limit 718 the number of different identifier sequences, but rather separates 719 the *increments* into TABLE_LENGTH different spaces. The identifier 720 sequence will result from adding the corresponding entry of 'table[]' 721 to the variable 'offset', which selects the actual identifier 722 sequence (as in the algorithm from Section 7.4.2). 724 An attacker can perform traffic analysis for any "increment space" 725 (i.e., context) into which the attacker has "visibility" -- namely, 726 the attacker can force a node to generate identifiers where G(offset) 727 identifies the target "increment space". However, the attacker's 728 ability to perform traffic analysis is very reduced when compared to 729 the predictable linear identifiers (described in Appendix A.1) and 730 the hash-based identifiers (described in Section 7.4.2). 731 Additionally, an implementation can further limit the attacker's 732 ability to perform traffic analysis by further separating the 733 increment space (that is, using a larger value for TABLE_LENGTH) and/ 734 or by randomizing the increments. 736 Otherwise, this algorithm does not suffer from the issues discussed 737 in Section 8. 739 8. Common Vulnerabilities Associated with Transient Numeric Identifiers 741 8.1. Network Activity Correlation 743 An identifier that is predictable or stable within a given context 744 allows for network activity correlation within that context. 746 For example, a stable IPv6 Interface Identifier allows for network 747 activity to be correlated for the context in which that address is 748 stable [RFC7721]. A stable-per-network (as in [RFC7217] allows for 749 network activity correlation within a network, whereas a constant 750 IPv6 Interface Identifier allows not only network activity 751 correlation within the same network, but also across networks ("host 752 tracking"). 754 Predictable transient numeric identifiers can also allow for network 755 activity correlation. For example, a node that generates TCP ISNs 756 with a global counter will typically allow network activity 757 correlation even as it roams across networks, since the communicating 758 nodes could infer the identity of the node based on the TCP ISNs 759 employed for subsequent communication instances. Similarly, a node 760 that generates predictable IPv6 Fragment Identification values could 761 be subject to network activity correlation (see e.g. 762 [Bellovin2002]). 764 8.2. Information Leakage 766 Transient numeric identifiers that are not randomized can leak out 767 information to other communicating nodes. For example, it is common 768 to generate identifiers like: 770 ID = offset(CONTEXT_1) + linear(CONTEXT_2); 772 This generic expression generates identifiers by adding a linear 773 function to an offset. The offset is constant within a given 774 context, whereas linear() is a linear function for a given context 775 (possibly different to that of offset()). Identifiers generated with 776 this expression will generally be predictable within CONTEXT_1. 777 Thus, CONTEXT_1 essentially specifies e.g. the context within which 778 network activity correlation is possible thanks to these identifiers. 779 When CONTEXT_1 is "global" (e.g., offset() is simply a constant 780 value), then all the corresponding transient numeric identifiers 781 become predictable in all contexts. 783 NOTE: If offset() has a global context and the specific value is 784 known, the resulting identifiers may leak even more information. 785 For example, the if Fragment Identification values are generated 786 with the generic function above, and CONTEXT_1 is "global", then 787 the corresponding identifiers will leak the number of fragmented 788 datagrams sent for CONTEXT_2. If both CONTEXT_1 and CONTEXT_2 are 789 "global", then Fragment Identification values would be generated 790 with a global counter (initialized to offset()), and thus each 791 generated Fragment Identification value would leak the number of 792 fragmented datagrams transmitted by the node since it was 793 bootstrapped. 795 On the other hand, linear() will be predictable within CONTEXT_2. 796 The predictability of linear(), irrespective of the context and/or 797 predictability of offset(), can leak out information that is of use 798 to attackers. For example, a node that selects ephemeral port 799 numbers on as in: 801 ehemeral_port = offset(Dest_IP) + linear() 803 that is, with a per-destination offset, but global linear() function 804 (e.g., a global counter), will leak information about the number of 805 outgoing connections that have been issued between any two issued 806 outgoing connections. 808 Similarly, a node that generates Fragment Identification values as 809 in: 811 Frag_ID = offset(Srd_IP, Dst_IP) + linear() 813 will leak out information about the number of fragmented packets that 814 have been transmitted between any two other transmitted fragmented 815 packets. The vulnerabilities described in [Sanfilippo1998a], 816 [Sanfilippo1998b], and [Sanfilippo1999] are all associated with the 817 use of a global linear() function (i.e., a global CONTEXT_2). 819 8.3. Exploitation of Semantics of Transient Numeric Identifiers 821 Identifiers that are not semantically opaque tend to be more 822 predictable than semantically-opaque identifiers. For example, a MAC 823 address contains an OUI (Organizationally-Unique Identifier) which 824 identifies the vendor that manufactured the underlying network 825 interface card. This fact may be leveraged by an attacker meaning to 826 "predict" MAC addresses, if he has some knowledge about the possible 827 NIC vendor. 829 [RFC7707] discusses a number of techniques to reduce the search space 830 when performing IPv6 address-scanning attacks by leveraging the 831 semantics of the IIDs produced by a number by traditional IID- 832 generation algorithms (now replaced by [RFC8064] with [RFC7217]). 834 8.4. Exploitation of Collisions of Transient Numeric Identifiers 836 In many cases, th collision of transient network identifiers can have 837 a hard failure severity (or result in a hard failure severity if an 838 attacker can cause multiple collisions deterministically, one after 839 another). For example, predictable Fragment Identification values 840 open the door to Denial of Service (DoS) attacks (see e.g. 841 [RFC5722]. Similarly, predictable TCP ISNs open the door to trivial 842 connection-reset and data injection attacks (see e.g. 843 [Joncheray1995]). 845 9. Vulnerability Analysis of Specific Transient Numeric Identifiers 846 Categories 848 The following subsections analyze common vulnerabilities associated 849 with the generation of identifiers for each of the categories 850 identified in Section 6. 852 9.1. Category #1: Uniqueness (soft failure) 854 Possible vulnerabilities associated with identifiers of this category 855 are: 857 o Use of trivial algorithms (e.g. global counters) that generate 858 predictable identifiers 860 o Use of flawed PRNGs (please see e.g. [Zalewski2001], 861 [Zalewski2002] and [Klein2007]) 863 Since the only interoperability requirement for these identifiers is 864 uniqueness, the obvious approach to generate them is to employ a 865 PRNG. An implementer should consult [RFC4086] regarding randomness 866 requirements for security, and consult relevant documentation when 867 employing a PRNG provided by the underlying system. 869 Use of algorithms other than PRNGs for generating identifiers of this 870 category is discouraged. 872 9.2. Category #2: Uniqueness (hard failure) 874 As noted in Section 7.2 this category typically employs the same 875 algorithms as Category #4, since a monotonically-increasing sequence 876 tends to minimize the identifier reuse frequency. Therefore, the 877 vulnerability analysis of Section 9.4 applies to this case. 879 9.3. Category #3: Uniqueness, constant within context (soft failure) 881 There are two main vulnerabilities that may be associated with 882 identifiers of this category: 884 1. Use algorithms or sources that result in predictable identifiers 886 2. Employing the same identifier across contexts in which constantcy 887 is not required 889 At times, an implementation or specification may be tempted to employ 890 a source for the identifier which is known to provide unique values. 891 However, while unique, the associated identifiers may have other 892 properties such as being predictable or leaking information about the 893 node in question. For example, as noted in [RFC7721], embedding 894 link-layer addresses for generating IPv6 IIDs not only results in 895 predictable values, but also leaks information about the manufacturer 896 of the network interface card. 898 On the other hand, using an identifier across contexts where 899 constantcy is not required can be leveraged for correlation of 900 activities. On of the most trivial examples of this is the use of 901 IPv6 IIDs that are constant across networks (such as IIDs that embed 902 the underlying link-layer address). 904 9.4. Category #4: Uniqueness, monotonically increasing within context 905 (hard failure) 907 A simple way to generalize algorithms employed for generating 908 identifiers of Category #4 would be as follows: 910 /* Identifier selection function */ 911 count = max_id - min_id + 1; 913 do { 914 linear(CONTEXT_2)= linear(CONTEXT_2) + increment(); 915 next_id= offset(CONTEXT_1) + linear(CONTEXT_2); 917 if(check_suitable_id(next_id)) 918 return next_id; 920 count--; 921 } while (count > 0); 923 return ERROR; 925 Essentially, an identifier (next_id) is generated by adding a linear 926 function (linear()) to an offset value, which is unknown to the 927 attacker, and constant for given context (CONTEXT_1). 929 The following aspects of the algorithm should be considered: 931 o For the most part, it is the offset() function that results in 932 identifiers that are unpredictable by an off-path attacker. While 933 the resulting sequence will be monotonically-increasing, the use 934 of an offset value that is unknown to the attacker makes the 935 resulting values unknown to the attacker. 937 o The most straightforward "stateless" implementation of offset 938 would be that in which offset() is the result of a 939 cryptographically-secure hash-function that takes the values that 940 identify the context and a "secret_key" (not shown in the figure 941 above) as arguments. 943 o Another possible (but stateful) approach would be to simply 944 generate a random "per-context" offset and store it in memory, and 945 then look-up the corresponding context when a new identifier is to 946 be selected. The algorithm in Section 7.4.1 is essentially an 947 implementation of this type. 949 o The linear function is incremented according to increment(). In 950 the most trivial case increment() could always return the constant 951 "1". But it could also possibly return small random integers such 952 the increments are unpredictable. 954 Considering the generic algorithm illustrated above we can identify 955 the following possible vulnerabilities: 957 o If the offset value spans more than the necessary context, 958 identifiers could be unnecessarily predictable by other parties, 959 since the offset value would be unnecessarily leaked to them. For 960 example, an implementation that means to produce a per-destination 961 counter but replaces offset() with a constant number (i.e., 962 employs a global counter), will unnecessarily result in 963 predictable identifiers. 965 o The function linear() could be seen as representing the number of 966 identifiers that have so far been generated for a given context 967 (CONTEXT_2). If linear() spans more than the necessary context, 968 the "increments" could be leaked to other parties, thus disclosing 969 information about the number of identifiers that have so far been 970 generated. For example, an implementation in which linear() is 971 implemented as a single global counter will unnecessarily leak 972 information the number of identifiers that have been produced. 973 [Fyodor2004] is one example of how such information leakages can 974 be exploited. 976 o increment() determines how the linear() is incremented for each 977 identifier that is selected. In the most trivial case, 978 increment() will return the integer "1". However, an 979 implementation may have increment() return a "small" random 980 integer value such that even if the current value employed by the 981 generator is guessed (see Appendix A of [RFC7739]), the exact next 982 identifier to be selected will be slightly harder to identify. 984 10. IANA Considerations 986 There are no IANA registries within this document. The RFC-Editor 987 can remove this section before publication of this document as an 988 RFC. 990 11. Security Considerations 992 The entire document is about the security and privacy implications of 993 identifiers. [I-D.gont-numeric-ids-sec-considerations] formally 994 requires protocols specifications to include an appropriate analysis 995 of the interoperability, security, and privacy implications of the 996 transient numeric identifiers they specify, while this document 997 analyzes possible algorithms (and their implications) that could be 998 employed to comply with the interoperability properties of a 999 transient numeric identifier, while mitigating the possible security 1000 and privacy implications. 1002 12. Acknowledgements 1004 The authors would like to thank (in alphabetical order) Steven 1005 Bellovin, Joseph Lorenzo Hall, Gre Norcie, and Martin Thomson, for 1006 providing valuable comments on earlier versions of this document. 1008 The authors would like to thank Diego Armando Maradona for his magic 1009 and inspiration. 1011 13. References 1013 13.1. Normative References 1015 [RFC0793] Postel, J., "Transmission Control Protocol", STD 7, 1016 RFC 793, DOI 10.17487/RFC0793, September 1981, 1017 . 1019 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1020 Requirement Levels", BCP 14, RFC 2119, 1021 DOI 10.17487/RFC2119, March 1997, 1022 . 1024 [RFC2460] Deering, S. and R. Hinden, "Internet Protocol, Version 6 1025 (IPv6) Specification", RFC 2460, DOI 10.17487/RFC2460, 1026 December 1998, . 1028 [RFC4086] Eastlake 3rd, D., Schiller, J., and S. Crocker, 1029 "Randomness Requirements for Security", BCP 106, RFC 4086, 1030 DOI 10.17487/RFC4086, June 2005, 1031 . 1033 [RFC4291] Hinden, R. and S. Deering, "IP Version 6 Addressing 1034 Architecture", RFC 4291, DOI 10.17487/RFC4291, February 1035 2006, . 1037 [RFC4862] Thomson, S., Narten, T., and T. Jinmei, "IPv6 Stateless 1038 Address Autoconfiguration", RFC 4862, 1039 DOI 10.17487/RFC4862, September 2007, 1040 . 1042 [RFC5722] Krishnan, S., "Handling of Overlapping IPv6 Fragments", 1043 RFC 5722, DOI 10.17487/RFC5722, December 2009, 1044 . 1046 [RFC6528] Gont, F. and S. Bellovin, "Defending against Sequence 1047 Number Attacks", RFC 6528, DOI 10.17487/RFC6528, February 1048 2012, . 1050 [RFC7217] Gont, F., "A Method for Generating Semantically Opaque 1051 Interface Identifiers with IPv6 Stateless Address 1052 Autoconfiguration (SLAAC)", RFC 7217, 1053 DOI 10.17487/RFC7217, April 2014, 1054 . 1056 [RFC8064] Gont, F., Cooper, A., Thaler, D., and W. Liu, 1057 "Recommendation on Stable IPv6 Interface Identifiers", 1058 RFC 8064, DOI 10.17487/RFC8064, February 2017, 1059 . 1061 13.2. Informative References 1063 [Bellovin2002] 1064 Bellovin, S., "A Technique for Counting NATted Hosts", 1065 IMW'02 Nov. 6-8, 2002, Marseille, France, 2002. 1067 [CPNI-TCP] 1068 Gont, F., "Security Assessment of the Transmission Control 1069 Protocol (TCP)", United Kingdom's Centre for the 1070 Protection of National Infrastructure (CPNI) Technical 1071 Report, 2009, . 1074 [FIPS-SHS] 1075 FIPS, "Secure Hash Standard (SHS)", Federal Information 1076 Processing Standards Publication 180-4, March 2012, 1077 . 1080 [Fyodor2004] 1081 Fyodor, "Idle scanning and related IP ID games", 2004, 1082 . 1084 [I-D.gont-numeric-ids-history] 1085 Gont, F. and I. Arce, "Unfortunate History of Transient 1086 Numeric Identifiers", draft-gont-numeric-ids-history-05 1087 (work in progress), July 2019. 1089 [I-D.gont-numeric-ids-sec-considerations] 1090 Gont, F. and I. Arce, "Security Considerations for 1091 Transient Numeric Identifiers Employed in Network 1092 Protocols", draft-gont-numeric-ids-sec-considerations-04 1093 (work in progress), July 2019. 1095 [Joncheray1995] 1096 Joncheray, L., "A Simple Active Attack Against TCP", Proc. 1097 Fifth Usenix UNIX Security Symposium, 1995. 1099 [Klein2007] 1100 Klein, A., "OpenBSD DNS Cache Poisoning and Multiple O/S 1101 Predictable IP ID Vulnerability", 2007, 1102 . 1105 [Morris1985] 1106 Morris, R., "A Weakness in the 4.2BSD UNIX TCP/IP 1107 Software", CSTR 117, AT&T Bell Laboratories, Murray Hill, 1108 NJ, 1985, 1109 . 1111 [RFC1035] Mockapetris, P., "Domain names - implementation and 1112 specification", STD 13, RFC 1035, DOI 10.17487/RFC1035, 1113 November 1987, . 1115 [RFC1321] Rivest, R., "The MD5 Message-Digest Algorithm", RFC 1321, 1116 DOI 10.17487/RFC1321, April 1992, 1117 . 1119 [RFC4963] Heffner, J., Mathis, M., and B. Chandler, "IPv4 Reassembly 1120 Errors at High Data Rates", RFC 4963, 1121 DOI 10.17487/RFC4963, July 2007, 1122 . 1124 [RFC6056] Larsen, M. and F. Gont, "Recommendations for Transport- 1125 Protocol Port Randomization", BCP 156, RFC 6056, 1126 DOI 10.17487/RFC6056, January 2011, 1127 . 1129 [RFC6151] Turner, S. and L. Chen, "Updated Security Considerations 1130 for the MD5 Message-Digest and the HMAC-MD5 Algorithms", 1131 RFC 6151, DOI 10.17487/RFC6151, March 2011, 1132 . 1134 [RFC7098] Carpenter, B., Jiang, S., and W. Tarreau, "Using the IPv6 1135 Flow Label for Load Balancing in Server Farms", RFC 7098, 1136 DOI 10.17487/RFC7098, January 2014, 1137 . 1139 [RFC7258] Farrell, S. and H. Tschofenig, "Pervasive Monitoring Is an 1140 Attack", BCP 188, RFC 7258, DOI 10.17487/RFC7258, May 1141 2014, . 1143 [RFC7707] Gont, F. and T. Chown, "Network Reconnaissance in IPv6 1144 Networks", RFC 7707, DOI 10.17487/RFC7707, March 2016, 1145 . 1147 [RFC7721] Cooper, A., Gont, F., and D. Thaler, "Security and Privacy 1148 Considerations for IPv6 Address Generation Mechanisms", 1149 RFC 7721, DOI 10.17487/RFC7721, March 2016, 1150 . 1152 [RFC7739] Gont, F., "Security Implications of Predictable Fragment 1153 Identification Values", RFC 7739, DOI 10.17487/RFC7739, 1154 February 2016, . 1156 [Sanfilippo1998a] 1157 Sanfilippo, S., "about the ip header id", Post to Bugtraq 1158 mailing-list, Mon Dec 14 1998, 1159 . 1161 [Sanfilippo1998b] 1162 Sanfilippo, S., "Idle scan", Post to Bugtraq mailing-list, 1163 1998, . 1165 [Sanfilippo1999] 1166 Sanfilippo, S., "more ip id", Post to Bugtraq mailing- 1167 list, 1999, 1168 . 1170 [Shimomura1995] 1171 Shimomura, T., "Technical details of the attack described 1172 by Markoff in NYT", Message posted in USENET's 1173 comp.security.misc newsgroup Message-ID: 1174 <3g5gkl$5j1@ariel.sdsc.edu>, 1995, 1175 . 1177 [Silbersack2005] 1178 Silbersack, M., "Improving TCP/IP security through 1179 randomization without sacrificing interoperability", 1180 EuroBSDCon 2005 Conference, 2005, 1181 . 1184 [Zalewski2001] 1185 Zalewski, M., "Strange Attractors and TCP/IP Sequence 1186 Number Analysis", 2001, 1187 . 1189 [Zalewski2002] 1190 Zalewski, M., "Strange Attractors and TCP/IP Sequence 1191 Number Analysis - One Year Later", 2001, 1192 . 1194 Appendix A. Flawed Algorithms 1196 The following subsections document algorithms with known negative 1197 security and privacy implications. 1199 A.1. Predictable Linear Identifiers Algorithm 1201 One of the most trivial ways to achieve uniqueness with a low 1202 identifier reuse frequency is to produce a linear sequence. 1204 For example, the following algorithm has been employed (see e.g. 1205 [Morris1985], [Shimomura1995], [Silbersack2005] and [CPNI-TCP]) in a 1206 number of operating systems for selecting IP fragment IDs, TCP 1207 ephemeral ports, etc.: 1209 /* Initialization at system boot time. Could be random */ 1210 next_id = min_id; 1211 id_inc= 1; 1213 /* Identifier selection function */ 1214 count = max_id - min_id + 1; 1216 do { 1217 if (next_id == max_id) { 1218 next_id = min_id; 1219 } 1220 else { 1221 next_id = next_id + id_inc; 1222 } 1224 if (check_suitable_id(next_id)) 1225 return next_id; 1227 count--; 1229 } while (count > 0); 1231 return ERROR; 1233 Note: 1234 check_suitable_id() is a function that checks whether the 1235 resulting identifier is acceptable (e.g., whether its in use, 1236 etc.). 1238 For obvious reasons, this algorithm results in predicable sequences. 1239 If a global counter is used (such as "next_id" in the example above), 1240 a node that learns one protocol identifier can also learn or guess 1241 values employed by past and future protocol instances. On the other 1242 hand, when the value of increments is known (such as "1" in this 1243 case), an attacker can sample two values, and learn the number of 1244 identifiers that were generated in-between. 1246 Where identifier reuse would lead to a hard failure, one typical 1247 approach to generate unique identifiers (while minimizing the 1248 security and privacy implications of predictable identifiers) is to 1249 obfuscate the resulting protocol IDs by either: 1251 o Replace the global counter with multiple counters (initialized to 1252 a random value) 1254 o Randomizing the "increments" 1256 Avoiding global counters essentially means that learning one 1257 identifier for a given context (e.g., one TCP ephemeral port for a 1258 given {src IP, Dst IP, Dst Port}) is of no use for learning or 1259 guessing identifiers for a different context (e.g., TCP ephemeral 1260 ports that involve other peers). However, this may imply keeping one 1261 additional variable/counter per context, which may be prohibitive in 1262 some environments. The choice of id_inc has implications on both the 1263 security and privacy properties of the resulting identifiers, but 1264 also on the corresponding interoperability properties. On one hand, 1265 minimizing the increments (as in "id_inc = 1" in our case) generally 1266 minimizes the identifier reuse frequency, albeit at increased 1267 predictability. On the other hand, if the increments are randomized, 1268 predictability of the resulting identifiers is reduced, and the 1269 information leakage produced by global constant increments is 1270 mitigated. However, using larger increments than necessary can 1271 result in an increased identifier reuse frequency. 1273 A.2. Random-Increments Algorithm 1275 This algorithm offers a middle ground between the algorithms that 1276 select ephemeral ports randomly (such as those described in 1277 Section 7.1.1 and Section 7.1.2), and those that offer obfuscation 1278 but no randomization (such as those described in Section 7.4.2 and 1279 Section 7.4.3). 1281 /* Initialization code at system boot time. */ 1282 next_id = random(); /* Initialization value */ 1283 id_inc = 500; /* Determines the trade-off */ 1285 /* Identifier selection function */ 1286 id_range = max_id - min_id + 1; 1288 count = id_range; 1290 do { 1291 /* Random increment */ 1292 next_id = next_id + (random() % id_inc) + 1; 1294 /* Keep the identifier within acceptable range */ 1295 next_id = min_id + (next_id % id_range); 1297 if(check_suitable_id(next_id)) 1298 return next_id; 1300 count--; 1301 } while (count > 0); 1303 return ERROR; 1305 This algorithm aims at producing a monotonically increasing sequence 1306 of identifiers, while avoiding the use of fixed increments, which 1307 would lead to trivially predictable sequences. The value "id_inc" 1308 allows for direct control of the trade-off between the level of 1309 obfuscation and the ID reuse frequency. The smaller the value of 1310 "id_inc", the more similar this algorithm is to a predicable, global 1311 monotonically-increasing ID generation algorithm. The larger the 1312 value of "id_inc", the more similar this algorithm is to the 1313 algorithm described in Section 7.1.1 of this document. 1315 When the identifiers wrap, there is the risk of collisions of 1316 identifiers (i.e., identifier reuse). Therefore, "id_inc" should be 1317 selected according to the following criteria: 1319 o It should maximize the wrapping time of the identifier space. 1321 o It should minimize identifier reuse frequency. 1323 o It should maximize obfuscation. 1325 Clearly, these are competing goals, and the decision of which value 1326 of "id_inc" to use is a trade-off. Therefore, the value of "id_inc" 1327 should be configurable so that system administrators can make the 1328 trade-off for themselves. 1330 Authors' Addresses 1332 Fernando Gont 1333 SI6 Networks 1334 Evaristo Carriego 2644 1335 Haedo, Provincia de Buenos Aires 1706 1336 Argentina 1338 Phone: +54 11 4650 8472 1339 Email: fgont@si6networks.com 1340 URI: https://www.si6networks.com 1342 Ivan Arce 1343 Quarkslab 1345 Email: iarce@quarkslab.com 1346 URI: https://www.quarkslab.com