dnsop                                                        W. Hardaker
Internet-Draft                                                   USC/ISI
Updates: 7583 (if approved)                                    W. Kumari
Intended status: Standards Track                                  Google
Expires: June 22, 2018                                 December 19, 2017


             Security Considerations for RFC5011 Publishers
          draft-ietf-dnsop-rfc5011-security-considerations-10

Abstract

   This document extends the RFC5011 rollover strategy with timing
   advice that must be followed by the publisher in order to maintain
   security. Specifically, this document describes the math behind the
   minimum time-length that a DNS zone publisher must wait before
   signing exclusively with recently added DNSKEYs. This document also
   describes the minimum time-length that a DNS zone publisher must wait
   after publishing a revoked DNSKEY before assuming that all active
   RFC5011 resolvers should have seen the revocation-marked key and
   removed it from their list of trust anchors.

   This document contains much math and complicated equations, but the
   summary is that the key rollover / revocation time is much longer
   than intuition would suggest. If you are not both publishing a
   DNSSEC DNSKEY, and using RFC5011 to advertise this DNSKEY as a new
   Secure Entry Point key for use as a trust anchor, you probably don't
   need to read this document.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on June 22, 2018.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Document History and Motivation
     1.2.  Safely Rolling the Root Zone's KSK in 2017/2018
     1.3.  Requirements notation
   2.  Background
   3.  Terminology
   4.  Timing Associated with RFC5011 Processing
     4.1.  Timing Associated with Publication
     4.2.  Timing Associated with Revocation
   5.  Denial of Service Attack Walkthrough
     5.1.  Enumerated Attack Example
       5.1.1.  Attack Timing Breakdown
   6.  Minimum RFC5011 Timing Requirements
     6.1.  Equation Components
       6.1.1.  addHoldDownTime
       6.1.2.  lastSigExpirationTime
       6.1.3.  sigExpirationTime
       6.1.4.  sigExpirationTimeRemaining
       6.1.5.  activeRefresh
       6.1.6.  activeRefreshOffset
       6.1.7.  driftSafetyMargin
       6.1.8.  timingSafetyMargin
       6.1.9.  retrySafetyMargin
     6.2.  Timing Requirements For Adding a New KSK
       6.2.1.  Wait Timer Based Calculation
       6.2.2.  Wall-Clock Based Calculation
       6.2.3.  Timing Constraint Summary
       6.2.4.  Additional Considerations for RFC7583
       6.2.5.  Example Scenario Calculations
     6.3.  Timing Requirements For Revoking an Old KSK
       6.3.1.  Wait Timer Based Calculation
       6.3.2.  Wall-Clock Based Calculation
       6.3.3.  Additional Considerations for RFC7583
       6.3.4.  Example Scenario Calculations
   7.  IANA Considerations
   8.  Operational Considerations
   9.  Security Considerations
   10. Acknowledgements
   11. Normative References
   Appendix A.  Real World Example: The 2017 Root KSK Key Roll
   Authors' Addresses

1.  Introduction

   [RFC5011] defines a mechanism by which DNSSEC validators can update
   their list of trust anchors when they've seen a new key published in
   a zone or revoke a properly marked key from a trust anchor list.
   However, RFC5011 [intentionally] provides no guidance to the
   publishers of DNSKEYs about how long they must wait before switching
   to exclusively using recently published keys for signing records, or
   how long they must wait before ceasing publication of a revoked key.
   Because of this lack of guidance, zone publishers may derive
   incorrect assumptions about safe usage of the RFC5011 DNSKEY
   advertising, rolling and revocation process. This document describes
   the minimum security requirements from a publisher's point of view
   and is intended to complement the guidance offered in RFC5011 (which
   is written to provide timing guidance solely from a Validating
   Resolver's point of view).

1.1.  Document History and Motivation

   To verify that this lack of understanding is widespread, the authors
   reached out to 5 DNSSEC experts to ask them how long they thought
   they must wait before signing a zone exclusively with a new KSK
   [RFC4033] that was being introduced according to the 5011 process.
   All 5 experts answered with an insecure value, and we determined that
   this lack of mathematical understanding might cause security concerns
   in deployment. We hope that this companion document to RFC5011 will
   rectify this misunderstanding and provide better guidance to zone
   publishers that wish to make use of the RFC5011 rollover process.

1.2.  Safely Rolling the Root Zone's KSK in 2017/2018

   One important note about ICANN's (currently in process) 2017/2018 KSK
   rollover plan for the root zone: the timing values chosen for rolling
   the KSK in the root zone appear completely safe, and are not affected
   by the timing concerns introduced by this draft.

1.3.  Requirements notation

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

2.  Background

   RFC5011 describes a process by which an RFC5011 Resolver may accept
   a newly published KSK as a trust anchor for validating future DNSSEC
   signed records. It also describes the process for publicly revoking
   a published KSK. This document augments that information with
   additional constraints, from the SEP publisher's point of view. Note
   that this document does not define any other operational guidance or
   recommendations about the RFC5011 process and restricts itself
   solely to the security and operational ramifications of switching to
   exclusively using recently added keys or removing revoked keys too
   soon.

   Failure of a DNSKEY publisher to follow the minimum recommendations
   associated with this draft can result in potential denial-of-service
   attack opportunities against validating resolvers. Failure of a
   DNSKEY publisher to publish a revoked key for a long enough period of
   time may result in RFC5011 Resolvers leaving that key in their trust
   anchor storage beyond the key's expected lifetime.

3.  Terminology

   SEP Publisher  The entity responsible for publishing a DNSKEY (with
      the Secure Entry Point (SEP) bit set) that can be used as a trust
      anchor.

   Zone Signer  The owner of a zone intending to publish a new
      Key-Signing-Key (KSK) that may become a trust anchor for
      validators following the RFC5011 process.

   RFC5011 Resolver  A DNSSEC Resolver that is using the RFC5011
      processes to track and update trust anchors.

   Attacker  An entity intent on foiling the RFC5011 Resolver's ability
      to successfully adopt the Zone Signer's new DNSKEY as a new trust
      anchor or to prevent the RFC5011 Resolver from removing an old
      DNSKEY from its list of trust anchors.

   sigExpirationTime  The amount of time between the DNSKEY RRSIG's
      Signature Inception field and the Signature Expiration field.
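
   As an illustration of the sigExpirationTime definition, the
   following minimal Python sketch computes it from a DNSKEY RRSIG's
   Signature Inception and Signature Expiration values written in the
   YYYYMMDDHHmmSS (UTC) presentation form. The function name and the
   sample timestamps are hypothetical, chosen only for illustration,
   and the sketch assumes both timestamps fall in the same serial
   number window (no wraparound handling).

```python
from datetime import datetime, timedelta, timezone

# Presentation form of RRSIG timestamps: YYYYMMDDHHmmSS, interpreted as UTC.
RRSIG_TIME_FORMAT = "%Y%m%d%H%M%S"


def sig_expiration_time(inception: str, expiration: str) -> timedelta:
    """Return Signature Expiration minus Signature Inception."""
    start = datetime.strptime(inception, RRSIG_TIME_FORMAT).replace(tzinfo=timezone.utc)
    end = datetime.strptime(expiration, RRSIG_TIME_FORMAT).replace(tzinfo=timezone.utc)
    if end <= start:
        raise ValueError("Signature Expiration must be later than Signature Inception")
    return end - start


# Hypothetical timestamps, ten days apart (not taken from any real zone).
validity = sig_expiration_time("20171201000000", "20171211000000")
print(validity.days)  # 10
```

   The resulting duration is the value that the later timing equations
   refer to as sigExpirationTime.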

   Also see Section 2 of [RFC4033] and [RFC7719] for additional
   terminology.

4.  Timing Associated with RFC5011 Processing

   These sections provide a high-level overview of [RFC5011]
   processing. These steps are not sufficient for proper RFC5011
   implementation, but provide enough background for the reader to
   follow the discussion in this document. Readers need to fully
   understand [RFC5011] as well to fully comprehend the content and
   importance of this document.

4.1.  Timing Associated with Publication

   RFC5011's process of safely publishing a new DNSKEY and then assuming
   RFC5011 Resolvers have adopted it for trust falls into a number of
   high-level steps to be performed by the SEP Publisher. This document
   discusses the following scenario, which is the principal way RFC5011
   is currently being used (even though Section 6 of RFC5011 suggests
   having a stand-by key available):

   1.  Publish a new DNSKEY in a zone, but continue to sign the zone
       with the old one.

   2.  Wait a period of time.

   3.  Begin to exclusively use recently published DNSKEYs to sign the
       appropriate resource records.

   This document discusses the time required to wait during step 2 of
   the above process. Some interpretations of RFC5011 have erroneously
   determined that the wait time is equal to RFC5011's "hold down time".
   Section 5 describes an attack based on this (common) erroneous
   belief, which can result in a denial of service attack against the
   zone.

4.2.  Timing Associated with Revocation

   RFC5011's process of advertising that an old key is to be revoked
   from RFC5011 Resolvers falls into a number of high-level steps:

   1.  Set the revoke bit on the DNSKEY to be revoked.

   2.  Sign the revoked DNSKEY with itself.

   3.  Wait a period of time.

   4.  Remove the revoked key from the zone.

   This document discusses the time required to wait in step 3 of the
   above process.
   Some interpretations of RFC5011 have erroneously
   determined that the wait time is equal to RFC5011's "hold down time".
   This document describes an attack based on this (common) erroneous
   belief, which results in a revoked DNSKEY potentially remaining as a
   trust anchor in an RFC5011 Resolver long past its expected usage.

5.  Denial of Service Attack Walkthrough

   This section serves as an illustrative example of the problem being
   discussed in this document. Note that in order to keep the example
   simple enough to understand, some simplifications were made (such as
   by not creating a set of pre-signed RRSIGs and by not using values
   that result in the addHoldDownTime not being evenly divisible by the
   activeRefresh value); the mathematical formulas in Section 6 are,
   however, complete.

   If an attacker is able to provide an RFC5011 Resolver with past
   responses, such as when it is in-path or able to perform any number
   of cache poisoning attacks, the attacker may be able to leave
   compliant RFC5011 Resolvers without an appropriate DNSKEY trust
   anchor. This situation will persist until an administrator manually
   fixes it.

   The time-line below illustrates an example of this situation.

5.1.  Enumerated Attack Example

   The following example settings are used in the example scenario
   within this section:

      TTL (all records)      1 day

      sigExpirationTime      10 days

      Zone resigned every    1 day

   Given these settings, the sequence of events in Section 5.1.1 depicts
   how a SEP Publisher that waits only for the RFC5011 hold-down timer
   length of 30 days subjects its users to a potential Denial of Service
   attack. The timing schedule listed below is based on a SEP Publisher
   publishing a new Key Signing Key (KSK), with the intent that it will
   later be used as a trust anchor. We label this publication time as
   "T+0".
   All numbers in this sequence refer to days before and after
   this initial publication event. Thus, T-1 is the day before the
   introduction of the new key, and T+15 is the 15th day after the key
   was introduced into the fictitious zone being discussed.

   In this dialog, we consider two keys within the example zone:

   K_old:  An older KSK and Trust Anchor being replaced.

   K_new:  A new KSK being transitioned into active use and expected to
      become a Trust Anchor via the RFC5011 automated trust anchor
      update process.

5.1.1.  Attack Timing Breakdown

   The steps below show an attack that foils the adoption of a new
   DNSKEY by an RFC5011 Resolver when the SEP Publisher starts signing
   and publishing with the new DNSKEY too quickly.

   T-1  The K_old based RRSIGs are being published by the Zone Signer.
      [It may also be signing ZSKs as well, but they are not relevant to
      this event so we will not talk further about them; we are only
      considering the RRSIGs that cover the DNSKEYs in this document.]
      The Attacker queries for, retrieves and caches this DNSKEY set and
      corresponding RRSIG signatures.

   T+0  The Zone Signer adds K_new to their zone and signs the zone's
      key set with K_old. The RFC5011 Resolver (later to be under
      attack) retrieves this new key set and corresponding RRSIGs and
      notices the publication of K_new. The RFC5011 Resolver starts the
      (30-day) hold-down timer for K_new. [Note that in a more
      real-world scenario there will likely be a further delay between
      the point where the Zone Signer publishes a new RRSIG and the
      RFC5011 Resolver notices its publication; though not shown in this
      example, this delay is accounted for in the equation in Section 6
      below.]

   T+5  The RFC5011 Resolver queries for the zone's keyset per the
      RFC5011 Active Refresh schedule, discussed in Section 2.3 of
      RFC5011.
      Instead of the intended published keyset, the RFC5011 Resolver
      receives the keyset and associated signatures recorded at T-1,
      which the Attacker successfully replays. Because the signature
      lifetime is 10 days (in this example), the replayed signatures
      and keyset are accepted as valid (being only 6 days old, which is
      less than sigExpirationTime) and the RFC5011 Resolver cancels the
      (30-day) hold-down timer for K_new, per the RFC5011 algorithm.

   T+10  The RFC5011 Resolver queries for the zone's keyset and
      discovers a signed keyset that includes K_new (again), and is
      signed by K_old. Note: the attacker is unable to replay the
      records cached at T-1, because the signatures have now expired.
      Thus at T+10, the RFC5011 Resolver starts (anew) the hold-down
      timer for K_new.

   T+11 through T+29  The RFC5011 Resolver continues checking the zone's
      key set at the prescribed regular intervals. During this period,
      the attacker can no longer replay traffic to their benefit.

   T+30  The Zone Signer knows that this is the first time at which some
      validators might accept K_new as a new trust anchor, since the
      hold-down timer of an RFC5011 Resolver not under attack that had
      queried and retrieved K_new at T+0 would now have reached 30 days.
      However, the hold-down timer of our attacked RFC5011 Resolver is
      only at 20 days.

   T+35  The Zone Signer (mistakenly) believes that all validators
      following the Active Refresh schedule (Section 2.3 of RFC5011)
      should have accepted K_new as the new trust anchor (since the
      hold down time (30 days) + the query interval [which is just 1/2
      the signature validity period in this example] would have passed).
      However, the hold-down timer of our attacked RFC5011 Resolver is
      only at 25 days (T+35 minus T+10); thus the RFC5011 Resolver won't
      consider it a valid trust anchor addition yet, as the required 30
      days have not yet elapsed.

   T+36  The Zone Signer, believing K_new is safe to use, switches their
      active signing KSK to K_new and publishes a new RRSIG, signed with
      (only) K_new, covering the DNSKEY set. Non-attacked RFC5011
      validators, with a hold-down timer of at least 30 days, would have
      accepted K_new into their set of trusted keys. But, because our
      attacked RFC5011 Resolver now has a hold-down timer for K_new of
      only 26 days, it has never accepted K_new as a trust anchor.
      Since K_old is no longer being used to sign the zone's DNSKEYs,
      all the DNSKEY records from the zone will be treated as invalid.
      Subsequently, all of the records in the DNS tree below the zone's
      apex will be deemed invalid by DNSSEC.

6.  Minimum RFC5011 Timing Requirements

   This section defines the minimum timing requirements for making
   exclusive use of newly added DNSKEYs and timing requirements for
   ceasing the publication of DNSKEYs to be revoked. We break our
   timing solution requirements into two primary components: the
   mathematically-based security analysis of the RFC5011 publication
   process itself, and an extension of this that takes into account
   operational realities that further affect the recommended timings.

   First, we define the component terms used in all equations in
   Section 6.1.

6.1.  Equation Components

6.1.1.  addHoldDownTime

   The addHoldDownTime is defined in Section 2.4.1 of [RFC5011] as:

      The add hold-down time is 30 days or the expiration time of the
      original TTL of the first trust point DNSKEY RRSet that contained
      the new key, whichever is greater. This ensures that at least
      two validated DNSKEY RRSets that contain the new key MUST be seen
      by the resolver prior to the key's acceptance.

6.1.2.  lastSigExpirationTime

   The latest value (i.e.,
   the future-most date and time) of any RRSIG
   Signature Expiration field covering any DNSKEY RRSet containing only
   the old trust anchor(s) that are being superseded. Note that for
   organizations pre-creating signatures this time may be fairly far in
   the future unless they can be significantly assured that none of
   their pre-generated signatures can be replayed at a later date.

6.1.3.  sigExpirationTime

   The amount of time between the DNSKEY RRSIG's Signature Inception
   field and the Signature Expiration field.

6.1.4.  sigExpirationTimeRemaining

   sigExpirationTimeRemaining is defined in Section 3.

6.1.5.  activeRefresh

   The activeRefresh time is defined by RFC5011 as:

      A resolver that has been configured for an automatic update
      of keys from a particular trust point MUST query that trust
      point (e.g., do a lookup for the DNSKEY RRSet and related
      RRSIG records) no less often than the lesser of 15 days, half
      the original TTL for the DNSKEY RRSet, or half the RRSIG
      expiration interval and no more often than once per hour.

   This translates to:

      activeRefresh = MAX(1 hour,
                          MIN(sigExpirationTime / 2,
                              MAX(TTL of K_old DNSKEY RRSet) / 2,
                              15 days)
                          )

6.1.6.  activeRefreshOffset

   The activeRefreshOffset term must be added for situations where the
   activeRefresh value is not a factor of the addHoldDownTime.
   Specifically, activeRefreshOffset will be "addHoldDownTime %
   activeRefresh", where % is the mathematical mod operator (calculating
   the remainder in a division problem). This will frequently be zero,
   but could be nearly as large as activeRefresh itself.

   Note that later (in Section 6.1.8), real-world considerations will
   trump this value, which is useful only in a theoretical world with
   no network delays or other operational considerations. We leave it
   here only as an important marker in the security analysis of the base
   RFC5011 protocol.

6.1.7.  driftSafetyMargin

   Moving past the theoretical model parameters above, we note that
   clock drift, network delays and implementation differences will
   cause RFC5011 Resolver query times to drift over time. Because of
   this, a driftSafetyMargin term must be introduced that accounts for
   these real world delays. We set this value to be the same as the
   activeRefresh value, which will ensure that any timing drift in
   RFC5011 Resolver queries will be accounted for.

   Note: even a negative clock drift can actually cause an RFC5011
   Resolver to require up to an extra activeRefresh period before it
   will accept a new DNSKEY as a trust anchor.

6.1.8.  timingSafetyMargin

   Both the activeRefreshOffset and driftSafetyMargin parameters deal
   with timing delays, introduced respectively by mathematical analysis
   of RFC5011 (activeRefreshOffset) and by real world considerations
   (driftSafetyMargin). To find a safe value to extend timing, we
   define a timingSafetyMargin that is the maximum of these two values.
   Since the driftSafetyMargin is set to activeRefresh, and
   activeRefreshOffset is always less than an activeRefresh, the final
   timingSafetyMargin value will be activeRefresh.

   Explicitly expanding out the math:

      timingSafetyMargin = MAX(activeRefreshOffset, driftSafetyMargin)

      timingSafetyMargin = MAX(addHoldDownTime % activeRefresh,
                               activeRefresh)

      timingSafetyMargin = activeRefresh

6.1.9.  retrySafetyMargin

   The retrySafetyMargin is an extra period of time to account for
   caching, network delays, dropped packets, and other operational
   concerns otherwise beyond the scope of this document. The value
   operators should choose is highly dependent on the deployment
   situation associated with their zone. Note that no value of a
   retrySafetyMargin can protect against resolvers that are "down".
   Nonetheless, we do offer the following as one method for selecting
   a reasonable value.

   The following variables need to be considered when selecting an
   appropriate retrySafetyMargin value:

   successRate:  A likely success rate for client queries and retries

   numResolvers:  The number of client RFC5011 Resolvers

   Note that RFC5011 defines retryTime as:

      If the query fails, the resolver MUST repeat the query until
      satisfied no more often than once an hour and no less often
      than the lesser of 1 day, 10% of the original TTL, or 10% of
      the original expiration interval. That is,
      retryTime = MAX (1 hour, MIN (1 day, .1 * origTTL,
      .1 * expireInterval)).

   With the successRate and numResolvers values selected and the
   definition of retryTime from RFC5011, one method for determining how
   many retryTime intervals to wait in order to reduce the set of
   uncompleted servers to 0, assuming normal probability, is thus:

      x = (1/(1 - successRate))

      retryCountWait = Log_base_x(numResolvers)

   To reduce the need for readers to pull out a scientific calculator,
   we offer the following lookup table based on successRate and
   numResolvers:

                        retryCountWait lookup table
                        ---------------------------

                   Number of client RFC5011 Resolvers (numResolvers)
                   -------------------------------------------------
                   10,000  100,000  1,000,000  10,000,000  100,000,000
             0.01     917     1146       1375        1604         1833
 Probability 0.05     180      225        270         315          360
 of Success  0.10      88      110        132         153          175
 Per Retry   0.15      57       71         86         100          114
 Interval    0.25      33       41         49          57           65
(successRate) 0.50     14       17         20          24           27
             0.90       4        5          6           7            8
             0.95       4        4          5           6            7
             0.99       2        3          3           4            4
             0.999      2        2          2           3            3

   Finally, a suggested value of retrySafetyMargin can then be this
   retryCountWait number multiplied by the retryTime from RFC5011:

      retrySafetyMargin = retryCountWait * retryTime

6.2.  Timing Requirements For Adding a New KSK

   Section 6.2.1 defines a method for calculating the amount of time to
   wait until it is safe to start signing exclusively with a new DNSKEY
   (especially useful for writing code involving sleep-based timers),
   and Section 6.2.2 defines a method for calculating a wall-clock value
   after which it is safe to start signing exclusively with a new DNSKEY
   (especially useful for writing code based on clock-based event
   triggers).

6.2.1.  Wait Timer Based Calculation

   Given the attack description in Section 5, the correct minimum length
   of time required for the Zone Signer to wait after publishing K_new
   but before exclusively using it and newer keys is:

      addWaitTime = addHoldDownTime
                    + sigExpirationTimeRemaining
                    + activeRefresh
                    + timingSafetyMargin
                    + retrySafetyMargin

6.2.1.1.  Fully expanded equation

   Given the equation components defined in Section 6.1, the fully
   expanded equation is:

      addWaitTime = addHoldDownTime
                    + sigExpirationTimeRemaining
                    + 2 * MAX(1 hour,
                              MIN(sigExpirationTime / 2,
                                  MAX(TTL of K_old DNSKEY RRSet) / 2,
                                  15 days)
                              )
                    + retrySafetyMargin

6.2.2.  Wall-Clock Based Calculation

   The equations in Section 6.2.1 are defined based upon how long to
   wait from a particular moment in time. An alternative, but
   equivalent, method is to calculate the date and time before which it
   is unsafe to use a key for signing. This calculation thus becomes:

      addWallClockTime = lastSigExpirationTime
                         + addHoldDownTime
                         + activeRefresh
                         + timingSafetyMargin
                         + retrySafetyMargin

   where lastSigExpirationTime is the latest Signature Expiration value
   of any RRSIG that was created and could potentially be replayed.
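
   The wait-timer and wall-clock equations above can be sketched in a
   few lines of Python. This is a minimal illustration rather than a
   reference implementation: the function and parameter names are
   hypothetical, the retrySafetyMargin defaults to zero purely as an
   assumption for the example, and the timingSafetyMargin is taken to
   equal activeRefresh as derived in Section 6.1.8.

```python
from datetime import datetime, timedelta, timezone


def active_refresh(sig_expiration_time: timedelta, dnskey_ttl: timedelta) -> timedelta:
    """MAX(1 hour, MIN(sigExpirationTime / 2, TTL / 2, 15 days))."""
    lesser = min(sig_expiration_time / 2, dnskey_ttl / 2, timedelta(days=15))
    return max(timedelta(hours=1), lesser)


def add_wait_time(add_hold_down_time, sig_expiration_time_remaining,
                  sig_expiration_time, dnskey_ttl,
                  retry_safety_margin=timedelta(0)) -> timedelta:
    """Wait-timer form: how long to wait after publishing K_new."""
    refresh = active_refresh(sig_expiration_time, dnskey_ttl)
    timing_safety_margin = refresh  # Section 6.1.8: equals activeRefresh
    return (add_hold_down_time + sig_expiration_time_remaining
            + refresh + timing_safety_margin + retry_safety_margin)


def add_wall_clock_time(last_sig_expiration_time: datetime, add_hold_down_time,
                        sig_expiration_time, dnskey_ttl,
                        retry_safety_margin=timedelta(0)) -> datetime:
    """Wall-clock form: the time before which K_new must not be used alone."""
    refresh = active_refresh(sig_expiration_time, dnskey_ttl)
    # activeRefresh + timingSafetyMargin == 2 * activeRefresh
    return (last_sig_expiration_time + add_hold_down_time
            + 2 * refresh + retry_safety_margin)


# Section 5.1 settings; retrySafetyMargin left at zero (an assumption).
wait = add_wait_time(
    add_hold_down_time=timedelta(days=30),
    sig_expiration_time_remaining=timedelta(days=10),
    sig_expiration_time=timedelta(days=10),
    dnskey_ttl=timedelta(days=1),
)
print(wait)  # 41 days, 0:00:00

# Hypothetical Signature Expiration of the last replayable K_old-only RRSIG.
last_expiry = datetime(2017, 12, 11, tzinfo=timezone.utc)
safe_after = add_wall_clock_time(last_expiry, timedelta(days=30),
                                 timedelta(days=10), timedelta(days=1))
print(safe_after.isoformat())
```

   With a 1 day TTL the activeRefresh term works out to half a day, so
   the two half-day terms add one day to the 30 day hold-down time and
   the 10 days of remaining signature validity. An operator would
   supply a non-zero retry_safety_margin chosen for their deployment.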
   Fully expanded, the addWallClockTime equation becomes:

      addWallClockTime = lastSigExpirationTime
                         + addHoldDownTime
                         + 2 * MAX(1 hour,
                                   MIN(sigExpirationTime / 2,
                                       MAX(TTL of K_old DNSKEY RRSet) / 2,
                                       15 days)
                                   )
                         + retrySafetyMargin

6.2.3.  Timing Constraint Summary

   The important timing constraint introduced by this memo relates to
   the last point at which an RFC5011 Resolver may have received a
   replayed original DNSKEY set, containing K_old and not K_new. The
   next query of the RFC5011 Resolver at which K_new will be seen
   without the potential for a replay attack will occur after the old
   DNSKEY RRSIG's Signature Expiration Time. Thus, the latest time that
   an RFC5011 Resolver may begin its hold-down timer is an "Active
   Refresh" period after the last point that an attacker can replay the
   K_old DNSKEY set. The worst case scenario of this attack is if the
   attacker can replay K_old just seconds before the time given in the
   Signature Expiration field of the last K_old-only DNSKEY RRSIG.

6.2.4.  Additional Considerations for RFC7583

   Note: our notion of addWaitTime is called "Itrp" in Section 3.3.4.1
   of [RFC7583]. The equation for Itrp in RFC7583 is insecure as it
   does not include the sigExpirationTime listed above. The Itrp
   equation in RFC7583 also does not include the 2*TTL safety margin,
   though that is an operational consideration.

6.2.5.  Example Scenario Calculations

   For the parameters listed in Section 5.1, our resulting addWaitTime
   is:

      addWaitTime = 30
                    + 10
                    + 1 / 2
                    + 1 / 2         (days)

      addWaitTime = 41              (days)

   This addWaitTime of 41 days is 11 days longer than just the hold
   down timer, even with the needed retrySafetyMargin value being left
   out (which we exclude due to the lack of necessary operational
   parameters).

6.3.  Timing Requirements For Revoking an Old KSK

   This issue affects not just the publication of new DNSKEYs intended
   to be used as trust anchors, but also the length of time required to
   continuously publish a DNSKEY with the revoke bit set.

   Section 6.3.1 defines a method for calculating the amount of time
   operators need to wait until it is safe to cease publishing a DNSKEY
   (especially useful for writing code involving sleep-based timers),
   and Section 6.3.2 defines a method for calculating a minimal
   wall-clock value after which it is safe to cease publishing a DNSKEY
   (especially useful for writing code based on clock-based event
   triggers).

6.3.1.  Wait Timer Based Calculation

   Both of these publication timing requirements are affected by the
   attacks described in this document, but with revocation the key is
   revoked immediately and the addHoldDownTime does not apply. Thus
   the minimum amount of time that a SEP Publisher must wait before
   removing a revoked key from publication is:

      remWaitTime = sigExpirationTimeRemaining
                    + activeRefresh
                    + timingSafetyMargin
                    + retrySafetyMargin

   Fully expanded, this becomes:

      remWaitTime = sigExpirationTimeRemaining
                    + 2 * MAX(1 hour,
                              MIN(sigExpirationTime / 2,
                                  MAX(TTL of K_old DNSKEY RRSet) / 2,
                                  15 days)
                              )
                    + retrySafetyMargin

   Note also that adding retryTime intervals to the remWaitTime may be
   wise, just as it was for addWaitTime in Section 6.2.

6.3.2.  Wall-Clock Based Calculation

   Like before, the above equations are defined based upon how long to
   wait from a particular moment in time. An alternative, but
   equivalent, method is to calculate the date and time before which it
   is unsafe to cease publishing a revoked key.
   This calculation thus becomes:

   remWallClockTime = lastSigExpirationTime
                      + activeRefresh
                      + timingSafetyMargin
                      + retrySafetyMargin

   where lastSigExpirationTime is the latest value of any
   sigExpirationTime for which RRSIGs were created that could
   potentially be replayed.  Fully expanded, this becomes:

   remWallClockTime = lastSigExpirationTime
                      + MAX(1 hour,
                            MIN((sigExpirationTime) / 2,
                                MAX(TTL of K_old DNSKEY RRSet) / 2,
                                15 days))
                      + timingSafetyMargin
                      + retrySafetyMargin

6.3.3.  Additional Considerations for RFC7583

   Note that our notion of remWaitTime is called "Irev" in
   Section 3.3.4.2 of [RFC7583].  The equation for Irev in RFC7583 is
   insecure as it does not include the sigExpirationTime listed above.
   The Irev equation in RFC7583 also does not include a safety margin,
   though that is an operational consideration.

6.3.4.  Example Scenario Calculations

   For the parameters listed in Section 5.1, our example remWaitTime
   is:

   remWaitTime = 10
                 + 1 / 2          (days)

   remWaitTime = 10.5             (days)

   Note that the values in this example produce a length shorter than
   the recommended 30 days in RFC5011's section 6.6, step 3.  Other
   values of sigExpirationTime and the original TTL of the K_old DNSKEY
   RRSet, however, can produce values longer than 30 days.

   Note that because revocation happens immediately, an attacker has a
   much harder job tricking an RFC5011 Resolver into leaving a trust
   anchor in place, as the attacker must successfully replay the old
   data for every query an RFC5011 Resolver sends, not just one.

7.  IANA Considerations

   This document contains no IANA considerations.

8.  Operational Considerations

   A companion document to RFC5011 was expected to be published that
   describes the best operational practice considerations from the
   perspective of a zone publisher and SEP Publisher.  However, this
   companion document has yet to be published.
   The authors of this document hope that it will be at some point in
   the future, as RFC5011 timing can be tricky, as we have shown, and a
   BCP is clearly warranted.  This document is intended only to fill a
   single operational void which, when left misunderstood, can result
   in serious security ramifications.  This document does not attempt
   to document any other missing operational guidance for zone
   publishers.

9.  Security Considerations

   This document is solely about the security considerations with
   respect to the SEP Publisher's ability to advertise new DNSKEYs via
   the RFC5011 automated trust anchor update process.  Thus the entire
   document is a discussion of Security Considerations when adding or
   removing DNSKEYs from trust anchor storage using the RFC5011
   process.

   For simplicity, this document assumes that the SEP Publisher will
   use a consistent RRSIG validity period.  SEP Publishers that vary
   the length of RRSIG validity periods will need to adjust the
   sigExpirationTime value accordingly so that the equations in
   Section 6 and Section 6.3 use a value that coincides with the time
   after which a replay of older RRSIGs can no longer succeed.

10.  Acknowledgements

   The authors would like to especially thank Michael StJohns for his
   help and advice, for the care and thought he put into RFC5011
   itself, and for his continued reviews and suggestions for this
   document.  He also designed the math behind the suggested
   retrySafetyMargin values in Section 6.1.9.

   We would also like to thank Bob Harold, Shane Kerr, Matthijs
   Mekking, Duane Wessels, Petr Spacek, Ed Lewis, and the dnsop working
   group, who have assisted with this document.

11.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC4033]  Arends, R., Austein, R., Larson, M., Massey, D., and S.
              Rose, "DNS Security Introduction and Requirements",
              RFC 4033, DOI 10.17487/RFC4033, March 2005.

   [RFC5011]  StJohns, M., "Automated Updates of DNS Security (DNSSEC)
              Trust Anchors", STD 74, RFC 5011, DOI 10.17487/RFC5011,
              September 2007.

   [RFC7583]  Morris, S., Ihren, J., Dickinson, J., and W. Mekking,
              "DNSSEC Key Rollover Timing Considerations", RFC 7583,
              DOI 10.17487/RFC7583, October 2015.

   [RFC7719]  Hoffman, P., Sullivan, A., and K. Fujiwara, "DNS
              Terminology", RFC 7719, DOI 10.17487/RFC7719, December
              2015.

Appendix A.  Real World Example: The 2017 Root KSK Key Roll

   In 2017 and 2018, ICANN expects to roll (or has rolled, depending on
   when you're reading this) the key signing key (KSK) for the root
   zone.  The relevant parameters associated with the root zone at the
   time of this writing are as follows:

   addHoldDownTime:               30 days
   Old DNSKEY sigExpirationTime:  21 days
   Old DNSKEY TTL:                 2 days

   Thus, substituting these values into the equation in Section 6
   yields (in days from publication time):

   addWaitTime = 30
                 + 21
                 + MAX(1 hour,
                       MIN(21 / 2,         # activeRefresh
                           MAX(2) / 2,
                           15 days)
                       )
                 + activeRefresh

   addWaitTime = 30 + 21 + 1 + 1

   addWaitTime = 53 days

   Also note that we exclude the retrySafetyMargin value, which is
   calculated based on the expected client deployment size.

   Thus, ICANN must wait a minimum of 53 days before switching to the
   newly published KSK (and 26 days before removing the old revoked key
   once it is published as revoked).  ICANN's current plans involve
   waiting over 3 months before using the new key and 69 days before
   removing the old, revoked key.  Thus, their current rollover plans
   are sufficiently secure from the attack discussed in this memo.

Authors' Addresses

   Wes Hardaker
   USC/ISI
   P.O. Box 382
   Davis, CA  95617
   US

   Email: ietf@hardakers.net


   Warren Kumari
   Google
   1600 Amphitheatre Parkway
   Mountain View, CA  94043
   US

   Email: warren@kumari.net
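
   Illustrative sketch (not part of the memo's normative text): the
   Appendix A addWaitTime arithmetic can be reproduced with a few lines
   of Python.  The function and parameter names below are hypothetical.
   The sketch assumes that timingSafetyMargin equals activeRefresh, as
   the worked examples do, and leaves retrySafetyMargin at zero because
   the examples exclude it.  All values are in days.

```python
from fractions import Fraction

ONE_HOUR = Fraction(1, 24)  # one hour, expressed in days


def active_refresh(sig_expiration_days, max_ttl_days):
    """MAX(1 hour, MIN(sigExpirationTime / 2, MAX(TTL) / 2, 15 days))."""
    return max(ONE_HOUR,
               min(Fraction(sig_expiration_days) / 2,
                   Fraction(max_ttl_days) / 2,
                   Fraction(15)))


def add_wait_time(add_hold_down_days, sig_expiration_remaining_days,
                  sig_expiration_days, max_ttl_days,
                  retry_safety_margin_days=0):
    """addHoldDownTime + sigExpirationTimeRemaining + activeRefresh
    + timingSafetyMargin + retrySafetyMargin."""
    refresh = active_refresh(sig_expiration_days, max_ttl_days)
    # Assumption: timingSafetyMargin is taken to be one activeRefresh,
    # matching the "+ activeRefresh" term in the Appendix A example.
    timing_safety_margin = refresh
    return (Fraction(add_hold_down_days)
            + Fraction(sig_expiration_remaining_days)
            + refresh
            + timing_safety_margin
            + Fraction(retry_safety_margin_days))


# Root zone parameters from Appendix A: 30-day hold-down, 21-day
# signature validity (all of it assumed remaining), 2-day DNSKEY TTL.
root_wait = add_wait_time(30, 21, 21, 2)
print(f"addWaitTime = {float(root_wait)} days")
```

   Exact fractions are used so that half-day terms such as 1 / 2 do not
   pick up floating-point rounding.  A publisher with an estimate for
   retrySafetyMargin could pass it as the last argument.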