dnsop                                                        W. Hardaker
Internet-Draft                                                   USC/ISI
Updates: 7583 (if approved)                                    W. Kumari
Intended status: Standards Track                                  Google
Expires: August 5, 2018                                 February 01, 2018


             Security Considerations for RFC5011 Publishers
          draft-ietf-dnsop-rfc5011-security-considerations-11

Abstract

   This document extends the RFC5011 rollover strategy with timing
   advice that must be followed by the publisher in order to maintain
   security.  Specifically, this document describes the math behind the
   minimum time-length that a DNS zone publisher must wait before
   signing exclusively with recently added DNSKEYs.  This document also
   describes the minimum time-length that a DNS zone publisher must wait
   after publishing a revoked DNSKEY before assuming that all active
   RFC5011 resolvers should have seen the revocation-marked key and
   removed it from their list of trust anchors.  This document updates
   RFC7583.

   This document contains a good deal of math and several complicated
   equations, but the summary is that the key rollover / revocation time
   is much longer than intuition would suggest.  If you are not both
   publishing a DNSSEC DNSKEY and using RFC5011 to advertise this DNSKEY
   as a new Secure Entry Point key for use as a trust anchor, you
   probably don't need to read this document.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on August 5, 2018.
Copyright Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Document History and Motivation
     1.2.  Safely Rolling the Root Zone's KSK in 2017/2018
     1.3.  Requirements notation
   2.  Background
   3.  Terminology
   4.  Timing Associated with RFC5011 Processing
     4.1.  Timing Associated with Publication
     4.2.  Timing Associated with Revocation
   5.  Denial of Service Attack Walkthrough
     5.1.  Enumerated Attack Example
       5.1.1.  Attack Timing Breakdown
   6.  Minimum RFC5011 Timing Requirements
     6.1.  Equation Components
       6.1.1.  addHoldDownTime
       6.1.2.  lastSigExpirationTime
       6.1.3.  sigExpirationTime
       6.1.4.  sigExpirationTimeRemaining
       6.1.5.  activeRefresh
       6.1.6.  timingSafetyMargin
       6.1.7.  retrySafetyMargin
     6.2.  Timing Requirements For Adding a New KSK
       6.2.1.  Wait Timer Based Calculation
       6.2.2.  Wall-Clock Based Calculation
       6.2.3.  Timing Constraint Summary
       6.2.4.  Additional Considerations for RFC7583
       6.2.5.  Example Scenario Calculations
     6.3.  Timing Requirements For Revoking an Old KSK
       6.3.1.  Wait Timer Based Calculation
       6.3.2.  Wall-Clock Based Calculation
       6.3.3.  Additional Considerations for RFC7583
       6.3.4.  Example Scenario Calculations
   7.  IANA Considerations
   8.  Operational Considerations
   9.  Security Considerations
   10. Acknowledgements
   11. Normative References
   Appendix A.  Real World Example: The 2017 Root KSK Key Roll
   Authors' Addresses

1.  Introduction

   [RFC5011] defines a mechanism by which DNSSEC validators can update
   their list of trust anchors when they've seen a new key published in
   a zone, or revoke a properly marked key from a trust anchor list.
   However, RFC5011 [intentionally] provides no guidance to the
   publishers of DNSKEYs about how long they must wait before switching
   to exclusively using recently published keys for signing records, or
   how long they must wait before ceasing publication of a revoked key.
   Because of this lack of guidance, zone publishers may derive
   incorrect assumptions about safe usage of the RFC5011 DNSKEY
   advertising, rolling and revocation process.  This document describes
   the minimum security requirements from a publisher's point of view
   and is intended to complement the guidance offered in RFC5011 (which
   is written to provide timing guidance solely from a Validating
   Resolver's point of view).

   To explain the RFC5011 security analysis in this document better,
   Section 5 first describes an attack on a zone publisher.  Then in
   Section 6.1 we break down each of the timing components that will
   later be used to define timing requirements for adding keys in
   Section 6.2 and revoking keys in Section 6.3.

1.1.  Document History and Motivation

   To verify that this lack of understanding is widespread, the authors
   reached out to five DNSSEC experts to ask them how long they thought
   they must wait before signing a zone exclusively with a new KSK
   [RFC4033] that was being introduced according to the RFC5011
   process.  All five experts answered with an insecure value, and we
   determined that this lack of mathematical understanding might cause
   security concerns in deployment.  We hope that this companion
   document to RFC5011 will rectify this understanding and provide
   better guidance to zone publishers that wish to make use of the
   RFC5011 rollover process.

1.2.  Safely Rolling the Root Zone's KSK in 2017/2018

   One important note about ICANN's (currently in process) 2017/2018 KSK
   rollover plan for the root zone: the timing values chosen for rolling
   the KSK in the root zone appear completely safe, and are not affected
   by the timing concerns introduced by this document.

1.3.  Requirements notation

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

2.  Background

   The RFC5011 process describes how an RFC5011 Resolver may accept a
   newly published KSK as a trust anchor for validating future DNSSEC
   signed records.  It also describes the process for publicly revoking
   a published KSK.  This document augments that information with
   additional constraints from the SEP Publisher's point of view.  Note
   that this document does not define any other operational guidance or
   recommendations about the RFC5011 process and restricts itself solely
   to the security and operational ramifications of switching to
   exclusively using recently added keys or removing revoked keys too
   soon.

   Failure of a DNSKEY publisher to follow the minimum recommendations
   associated with this document can result in potential
   denial-of-service attack opportunities against validating resolvers.
   Failure of a DNSKEY publisher to publish a revoked key for a long
   enough period of time may result in RFC5011 Resolvers leaving that
   key in their trust anchor storage beyond the key's expected lifetime.

3.  Terminology

   SEP Publisher:  The entity responsible for publishing a DNSKEY (with
      the Secure Entry Point (SEP) bit set) that can be used as a trust
      anchor.

   Zone Signer:  The owner of a zone intending to publish a new
      Key-Signing Key (KSK) that may become a trust anchor for
      validators following the RFC5011 process.

   RFC5011 Resolver:  A DNSSEC Resolver that is using the RFC5011
      processes to track and update trust anchors.
   Attacker:  An entity intent on foiling the RFC5011 Resolver's ability
      to successfully adopt the Zone Signer's new DNSKEY as a new trust
      anchor, or on preventing the RFC5011 Resolver from removing an old
      DNSKEY from its list of trust anchors.

   sigExpirationTime:  The amount of time between the DNSKEY RRSIG's
      Signature Inception field and the Signature Expiration field.

   Also see Section 2 of [RFC4033] and [RFC7719] for additional
   terminology.

4.  Timing Associated with RFC5011 Processing

   These sections provide a high-level overview of [RFC5011] processing.
   These steps are not sufficient for a proper RFC5011 implementation,
   but they provide enough background for the reader to follow the
   discussion in this document.  Readers need a full understanding of
   [RFC5011] to fully comprehend the content and importance of this
   document.

4.1.  Timing Associated with Publication

   RFC5011's process of safely publishing a new DNSKEY and then assuming
   RFC5011 Resolvers have adopted it for trust falls into a number of
   high-level steps to be performed by the SEP Publisher.  This document
   discusses the following scenario, which is the principal way RFC5011
   is currently being used (even though Section 6 of RFC5011 suggests
   having a stand-by key available):

   1.  Publish a new DNSKEY in a zone, but continue to sign the zone
       with the old one.

   2.  Wait a period of time.

   3.  Begin to exclusively use recently published DNSKEYs to sign the
       appropriate resource records.

   This document discusses the time required to wait during step 2 of
   the above process.  Some interpretations of RFC5011 have erroneously
   determined that the wait time is equal to RFC5011's "hold down time".
   Section 5 describes an attack based on this (common) erroneous
   belief, which can result in a denial of service attack against the
   zone.

4.2.  Timing Associated with Revocation

   RFC5011's process of advertising that an old key is to be revoked
   from RFC5011 Resolvers falls into a number of high-level steps:

   1.  Set the revoke bit on the DNSKEY to be revoked.

   2.  Sign the revoked DNSKEY with itself.

   3.  Wait a period of time.

   4.  Remove the revoked key from the zone.

   This document discusses the time required to wait in step 3 of the
   above process.  Some interpretations of RFC5011 have erroneously
   determined that the wait time is equal to RFC5011's "hold down time".
   This document describes an attack based on this (common) erroneous
   belief, which results in a revoked DNSKEY potentially remaining as a
   trust anchor in an RFC5011 Resolver long past its expected usage.

5.  Denial of Service Attack Walkthrough

   This section serves as an illustrative example of the problem being
   discussed in this document.  Note that in order to keep the example
   simple enough to understand, some simplifications were made (such as
   not creating a set of pre-signed RRSIGs, and choosing values for
   which the addHoldDownTime is evenly divisible by the activeRefresh
   value); the mathematical formulas in Section 6 are, however,
   complete.

   If an attacker is able to provide an RFC5011 Resolver with past
   responses, such as when it is in-path or able to perform any number
   of cache poisoning attacks, the attacker may be able to leave
   compliant RFC5011 Resolvers without an appropriate DNSKEY trust
   anchor.  This situation will persist until an administrator manually
   fixes it.

   The time-line below illustrates an example of this situation.

5.1.  Enumerated Attack Example

   The following example settings are used in the example scenario
   within this section:

      TTL (all records)     1 day

      sigExpirationTime     10 days

      Zone resigned every   1 day

   Given these settings, the sequence of events in Section 5.1.1 depicts
   how a SEP Publisher that waits only for the RFC5011 hold-down timer
   length of 30 days subjects its users to a potential Denial of Service
   attack.  The timing schedule listed below is based on a SEP Publisher
   publishing a new Key Signing Key (KSK), with the intent that it will
   later be used as a trust anchor.  We label this publication time as
   "T+0".  All numbers in this sequence refer to days before and after
   this initial publication event.  Thus, T-1 is the day before the
   introduction of the new key, and T+15 is the 15th day after the key
   was introduced into the fictitious zone being discussed.

   In this example, we consider two keys within the example zone:

   K_old:  An older KSK and Trust Anchor being replaced.

   K_new:  A new KSK being transitioned into active use and expected to
      become a Trust Anchor via the RFC5011 automated trust anchor
      update process.

5.1.1.  Attack Timing Breakdown

   The following steps show an attack that foils the adoption of a new
   DNSKEY by an RFC5011 Resolver when the SEP Publisher starts signing
   and publishing with the new DNSKEY too quickly.

   T-1  The K_old-based RRSIGs are being published by the Zone Signer.
      [It may also be signing ZSKs, but they are not relevant to this
      event so we will not talk further about them; we are only
      considering the RRSIGs that cover the DNSKEYs in this document.]
      The Attacker queries for, retrieves and caches this DNSKEY set
      and corresponding RRSIG signatures.

   T+0  The Zone Signer adds K_new to their zone and signs the zone's
      key set with K_old.
      The RFC5011 Resolver (later to be under attack) retrieves this new
      key set and corresponding RRSIGs and notices the publication of
      K_new.  The RFC5011 Resolver starts the (30-day) hold-down timer
      for K_new.  [Note that in a more real-world scenario there will
      likely be a further delay between the point where the Zone Signer
      publishes a new RRSIG and the point where the RFC5011 Resolver
      notices its publication; though not shown in this example, this
      delay is accounted for in the equations in Section 6 below.]

   T+5  The RFC5011 Resolver queries for the zone's keyset per the
      RFC5011 Active Refresh schedule, discussed in Section 2.3 of
      RFC5011.  Instead of the intended published keyset, the Attacker
      successfully replays the keyset and associated signatures recorded
      at T-1 to the victim RFC5011 Resolver.  Because the signature
      lifetime is 10 days (in this example), the replayed signature and
      keyset are accepted as valid (being only 6 days old, which is less
      than sigExpirationTime) and the RFC5011 Resolver cancels the
      (30-day) hold-down timer for K_new, per the RFC5011 algorithm.

   T+10  The RFC5011 Resolver queries for the zone's keyset and
      discovers a signed keyset that includes K_new (again), and is
      signed by K_old.  Note: the Attacker is unable to replay the
      records cached at T-1, because the signatures have now expired.
      Thus at T+10, the RFC5011 Resolver starts (anew) the hold-down
      timer for K_new.

   T+11 through T+29  The RFC5011 Resolver continues checking the zone's
      key set at the prescribed regular intervals.  During this period,
      the Attacker can no longer replay traffic to their benefit.

   T+30  The Zone Signer knows that this is the first time at which some
      validators might accept K_new as a new trust anchor, since the
      hold-down timer of an RFC5011 Resolver not under attack that had
      queried and retrieved K_new at T+0 would now have reached 30 days.
      However, the hold-down timer of our attacked RFC5011 Resolver is
      only at 20 days.

   T+35  The Zone Signer (mistakenly) believes that all validators
      following the Active Refresh schedule (Section 2.3 of RFC5011)
      should have accepted K_new as the new trust anchor (since the
      hold-down time (30 days) + the query interval [which is just 1/2
      the signature validity period in this example] would have passed).
      However, the hold-down timer of our attacked RFC5011 Resolver is
      only at 25 days (T+35 minus T+10); thus the RFC5011 Resolver won't
      consider it a valid trust anchor addition yet, as the required 30
      days have not yet elapsed.

   T+36  The Zone Signer, believing K_new is safe to use, switches their
      active signing KSK to K_new and publishes a new RRSIG, signed with
      (only) K_new, covering the DNSKEY set.  Non-attacked RFC5011
      validators, with a hold-down timer of at least 30 days, would have
      accepted K_new into their set of trusted keys.  But, because our
      attacked RFC5011 Resolver now has a hold-down timer for K_new of
      only 26 days, it has not accepted K_new as a trust anchor.  Since
      K_old is no longer being used to sign the zone's DNSKEYs, all the
      DNSKEY records from the zone will be treated as invalid.
      Subsequently, all of the records in the DNS tree below the zone's
      apex will be deemed invalid by DNSSEC.

6.  Minimum RFC5011 Timing Requirements

   This section defines the minimum timing requirements for making
   exclusive use of newly added DNSKEYs and timing requirements for
   ceasing the publication of DNSKEYs to be revoked.  We break our
   timing solution requirements into two primary components: the
   mathematically-based security analysis of the RFC5011 publication
   process itself, and an extension of this analysis that takes into
   account operational realities that further affect the recommended
   timings.
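   The arithmetic of the Section 5.1.1 timeline can be checked with a
   small simulation.  The following Python sketch is illustrative only:
   the function and constant names are hypothetical, it assumes (as the
   narrative does) that the resolver queries every 5 days starting at
   T+0, and it treats a replayed K_old-only key set as cancelling the
   hold-down timer only while the signature cached at T-1 (with its
   10-day lifetime) has not yet expired.

```python
# Sketch: replay the Section 5.1.1 attack timeline.  Times are in days
# relative to the publication of K_new at T+0.  All names here are
# hypothetical and chosen for illustration only.

HOLD_DOWN = 30       # addHoldDownTime from RFC5011 (days)
SIG_LIFETIME = 10    # sigExpirationTime in the Section 5.1 example (days)
RECORDED_AT = -1     # day the Attacker cached the K_old-only DNSKEY set
QUERY_INTERVAL = 5   # query spacing assumed by the Section 5.1.1 narrative


def hold_down_start(last_day, replay_days):
    """Return the day the resolver's current hold-down timer for K_new
    started, or None if no timer is running after last_day."""
    start = None
    for day in range(0, last_day + 1, QUERY_INTERVAL):
        # A replay only works while the cached RRSIG is still unexpired.
        replay_accepted = (day in replay_days
                           and day < RECORDED_AT + SIG_LIFETIME)
        if replay_accepted:
            start = None   # key set without K_new: timer is cancelled
        elif start is None:
            start = day    # K_new seen (again): timer (re)starts
    return start


def days_held(day, replay_days):
    """Hold-down time accumulated for K_new as of the given day."""
    start = hold_down_start(day, replay_days)
    return 0 if start is None else day - start


if __name__ == "__main__":
    for label, replays in (("not attacked", set()), ("attacked", {5})):
        held = days_held(36, replays)
        print(f"{label}: {held} days held at T+36, "
              f"accepted: {held >= HOLD_DOWN}")
```

   When run, the sketch reports 36 days held (accepted) for the resolver
   that was not attacked and 26 days held (not accepted) for the
   attacked one.  Adding a second replay at T+10 changes nothing,
   because the sketch treats the signature cached at T-1 as expiring at
   T+9.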
   First, we define the components used in all of the equations in
   Section 6.1.

6.1.  Equation Components

6.1.1.  addHoldDownTime

   The addHoldDownTime is defined in Section 2.4.1 of [RFC5011] as:

      The add hold-down time is 30 days or the expiration time of the
      original TTL of the first trust point DNSKEY RRSet that contained
      the new key, whichever is greater.  This ensures that at least
      two validated DNSKEY RRSets that contain the new key MUST be seen
      by the resolver prior to the key's acceptance.

6.1.2.  lastSigExpirationTime

   The latest value (i.e., the furthest date and time in the future) of
   any RRSIG Signature Expiration field covering any DNSKEY RRSet
   containing only the old trust anchor(s) that are being superseded.
   Note that for organizations pre-creating signatures this time may be
   fairly far in the future unless they can be significantly assured
   that none of their pre-generated signatures can be replayed at a
   later date.

6.1.3.  sigExpirationTime

   The amount of time between the DNSKEY RRSIG's Signature Inception
   field and the Signature Expiration field.

6.1.4.  sigExpirationTimeRemaining

   The amount of time remaining, measured from the moment the wait
   begins, until lastSigExpirationTime (Section 6.1.2) is reached.  It
   is the wait-timer counterpart of lastSigExpirationTime, which is used
   in the wall-clock calculations of Sections 6.2.2 and 6.3.2.

6.1.5.  activeRefresh

   The activeRefresh time is defined in Section 2.3 of RFC5011 as
   follows:

      A resolver that has been configured for an automatic update
      of keys from a particular trust point MUST query that trust
      point (e.g., do a lookup for the DNSKEY RRSet and related
      RRSIG records) no less often than the lesser of 15 days, half
      the original TTL for the DNSKEY RRSet, or half the RRSIG
      expiration interval and no more often than once per hour.

   This translates to:

      activeRefresh = MAX(1 hour,
                          MIN(sigExpirationTime / 2,
                              MAX(TTL of K_old DNSKEY RRSet) / 2,
                              15 days)
                          )

6.1.6.  timingSafetyMargin

   It is easy to assume that the period of time required for SEP
   Publishers to wait after making changes to SEP-marked DNSKEY sets
   will be based entirely on the length of the addHoldDownTime.
   Unfortunately, analysis shows that both the design of the RFC5011
   protocol and the operational realities of deploying it require
   waiting an additional period of time.  In Sections 6.1.6.1 through
   6.1.6.3 below, we discuss three sources of additional delay.  In the
   end, we pick the largest of these delays as the minimum additional
   time that the SEP Publisher must wait; this is our final
   timingSafetyMargin value, which we define in Section 6.1.6.4.

6.1.6.1.  activeRefreshOffset

   Security analysis of the timing associated with the query rate of
   RFC5011 Resolvers shows that it may not perfectly align with the
   addHoldDownTime when the addHoldDownTime is not evenly divisible by
   the activeRefresh time.  Consider the example of a zone with an
   activeRefresh period of 7 days.  If an associated RFC5011 Resolver
   started its hold-down timer just after the SEP Publisher published a
   new DNSKEY (at time T), the resolver would send checking queries at
   T+7, T+14, T+21 and T+28 days and would finally accept the key at
   T+35 days, which is 5 days longer than the 30-day addHoldDownTime.

   The activeRefreshOffset term defines this time difference and
   becomes:

      activeRefreshOffset = addHoldDownTime % activeRefresh

   The % symbol denotes the mathematical mod operator (calculating the
   remainder in a division problem).  This will frequently be zero, but
   can be nearly as large as activeRefresh itself.

6.1.6.2.  clockskewDriftMargin

   Even small clock drifts can have negative impacts upon the timing of
   the RFC5011 Resolver's measurements.  Consider the simplest case,
   where the RFC5011 Resolver's clock shifts over time to be 2 seconds
   slower near the end of the RFC5011 Resolver's addHoldDownTime period.
   That is, suppose the RFC5011 Resolver first noticed a new DNSKEY at:

      firstSeen = sigExpirationTime + activeRefresh + 1 second

   The effect of a 2-second clock drift between the SEP Publisher and
   the RFC5011 Resolver may result in the RFC5011 Resolver querying
   again at:

      justBefore = sigExpirationTime + addHoldDownTime +
                   activeRefresh + 1 second - 2 seconds

   which becomes:

      justBefore = sigExpirationTime + addHoldDownTime +
                   activeRefresh - 1 second

   The net effect is that the addHoldDownTime will not have been reached
   from the perspective of the RFC5011 Resolver, but it will have been
   reached from the perspective of the SEP Publisher.  As a result, it
   may take one additional activeRefresh period for this RFC5011
   Resolver to accept the new key (at sigExpirationTime +
   addHoldDownTime + 2 * activeRefresh - 1 second).

   We note that even the smallest clock skew errors can require waiting
   an additional activeRefresh period, and thus define the
   clockskewDriftMargin as:

      clockskewDriftMargin = activeRefresh

6.1.6.3.  retryDriftMargin

   Drift associated with a lost transmission and an accompanying
   retransmission (see Section 2.3 of [RFC5011]) will cause RFC5011
   Resolvers to also change the timing associated with query times such
   that it becomes impossible to predict, from the perspective of the
   SEP Publisher, when the final important measurement query will
   arrive.  Similarly, any software that restarts/reboots without saving
   next-query timing state may also commence with a new random starting
   time.  Thus, an additional activeRefresh is needed to handle both
   these cases as well.
      retryDriftMargin = activeRefresh

   Note that we account for additional time associated with cumulative
   multiple retries, especially under high-loss conditions, in
   Section 6.1.7.

6.1.6.4.  timingSafetyMargin Value

   The activeRefreshOffset, clockskewDriftMargin, and retryDriftMargin
   parameters all deal with additional wait-periods that must be
   accounted for after analyzing the conditions under which the client
   will take longer than expected to make its last query while waiting
   for the addHoldDownTime period to pass.  These values may be merged
   into a single term by waiting the longest of them.  We define
   timingSafetyMargin as this "worst case" value:

      timingSafetyMargin = MAX(activeRefreshOffset,
                               clockskewDriftMargin,
                               retryDriftMargin)

      timingSafetyMargin = MAX(addHoldDownTime % activeRefresh,
                               activeRefresh,
                               activeRefresh)

   Since addHoldDownTime % activeRefresh is always less than
   activeRefresh, this simplifies to:

      timingSafetyMargin = activeRefresh

6.1.7.  retrySafetyMargin

   The retrySafetyMargin is an extra period of time to account for
   caching, network delays, dropped packets, and other operational
   concerns otherwise beyond the scope of this document.  The value
   operators should choose is highly dependent on the deployment
   situation associated with their zone.  Note that no value of a
   retrySafetyMargin can protect against resolvers that are "down".
   Nonetheless, we offer the following as one method for selecting a
   reasonable value.

   The following variables need to be considered when selecting an
   appropriate retrySafetyMargin value:

   successRate:  A likely success rate for client queries and retries

   numResolvers:  The number of client RFC5011 Resolvers

   Note that RFC5011 defines retryTime as:

      If the query fails, the resolver MUST repeat the query until
      satisfied no more often than once an hour and no less often
      than the lesser of 1 day, 10% of the original TTL, or 10% of
      the original expiration interval.
      That is, retryTime = MAX (1 hour, MIN (1 day, .1 * origTTL,
      .1 * expireInterval)).

   With the successRate and numResolvers values selected and the
   definition of retryTime from RFC5011, one method for determining how
   many retryTime intervals to wait is to find the number of intervals
   after which the expected number of RFC5011 Resolvers that have not
   yet completed a successful query falls to one or fewer, assuming
   each query attempt succeeds independently with probability
   successRate:

      x = (1 / (1 - successRate))

      retryCountWait = Log_base_x(numResolvers)

   To reduce the need for readers to pull out a scientific calculator,
   we offer the following lookup table based on successRate and
   numResolvers (values are rounded up to a whole number of retry
   intervals):

      retryCountWait lookup table
      ---------------------------

      successRate:   probability of success per retry interval
      numResolvers:  number of client RFC5011 Resolvers

                                       numResolvers
      successRate   10,000  100,000  1,000,000  10,000,000  100,000,000
      -----------   ------  -------  ---------  ----------  -----------
         0.01          917     1146       1375        1604         1833
         0.05          180      225        270         315          360
         0.10           88      110        132         153          175
         0.15           57       71         86         100          114
         0.25           33       41         49          57           65
         0.50           14       17         20          24           27
         0.90            4        5          6           7            8
         0.95            4        4          5           6            7
         0.99            2        3          3           4            4
         0.999           2        2          2           3            3

   Finally, a suggested value of retrySafetyMargin can then be this
   retryCountWait number multiplied by the retryTime from RFC5011:

      retrySafetyMargin = retryCountWait * retryTime

6.2.  Timing Requirements For Adding a New KSK

   Given the defined parameters and analysis from Section 6.1, we can
   now create a method for calculating the amount of time to wait until
   it is safe to start signing exclusively with a new DNSKEY (especially
   useful for writing code involving sleep-based timers) in
   Section 6.2.1, and define a method for calculating a wall-clock value
   after which it is safe to start signing exclusively with a new DNSKEY
   (especially useful for writing code based on clock-based event
   triggers) in Section 6.2.2.

6.2.1.  Wait Timer Based Calculation

   Given the attack description in Section 5, the correct minimum length
   of time required for the Zone Signer to wait after publishing K_new
   but before exclusively using it and newer keys is:

      addWaitTime = addHoldDownTime
                    + sigExpirationTimeRemaining
                    + activeRefresh
                    + timingSafetyMargin
                    + retrySafetyMargin

6.2.1.1.  Fully expanded equation

   Given the equation components defined in Section 6.1 (and since
   timingSafetyMargin equals activeRefresh), the fully expanded
   equation is:

      addWaitTime = addHoldDownTime
                    + sigExpirationTimeRemaining
                    + 2 * MAX(1 hour,
                              MIN(sigExpirationTime / 2,
                                  MAX(TTL of K_old DNSKEY RRSet) / 2,
                                  15 days)
                              )
                    + retrySafetyMargin

6.2.2.  Wall-Clock Based Calculation

   The equations in Section 6.2.1 are defined based upon how long to
   wait from a particular moment in time.  An alternative, but
   equivalent, method is to calculate the date and time before which it
   is unsafe to use a key for signing.  This calculation thus becomes:

      addWallClockTime = lastSigExpirationTime
                         + addHoldDownTime
                         + activeRefresh
                         + timingSafetyMargin
                         + retrySafetyMargin

   where lastSigExpirationTime is the latest Signature Expiration time
   of any RRSIG that was created and could potentially be replayed (see
   Section 6.1.2).  Fully expanded, this becomes:

      addWallClockTime = lastSigExpirationTime
                         + addHoldDownTime
                         + 2 * MAX(1 hour,
                                   MIN(sigExpirationTime / 2,
                                       MAX(TTL of K_old DNSKEY RRSet) / 2,
                                       15 days)
                                   )
                         + retrySafetyMargin

6.2.3.  Timing Constraint Summary

   The important timing constraint introduced by this memo relates to
   the last point at which an RFC5011 Resolver may have received a
   replayed original DNSKEY set, containing K_old and not K_new.  The
   next query of the RFC5011 Resolver at which K_new will be seen
   without the potential for a replay attack will occur after the old
   DNSKEY RRSIG's Signature Expiration Time.
   Thus, the latest time at which an RFC5011 Validator may begin its
   hold-down timer is one activeRefresh period after the last point at
   which an attacker can replay the K_old DNSKEY set.  The worst-case
   scenario for this attack is one in which the attacker replays K_old
   just seconds before the Signature Expiration time of the last RRSIG
   covering the K_old-only DNSKEY RRset.

6.2.4.  Additional Considerations for RFC7583

   Note: our notion of addWaitTime is called "Itrp" in Section 3.3.4.1
   of [RFC7583].  The equation for Itrp in RFC7583 is insecure, as it
   does not include the sigExpirationTime listed above.  The Itrp
   equation in RFC7583 also does not include the 2*TTL safety margin,
   though that is an operational consideration.

6.2.5.  Example Scenario Calculations

   For the parameters listed in Section 5.1, our resulting addWaitTime
   is:

      addWaitTime = 30
                    + 10
                    + 1 / 2
                    + 1 / 2 (days)

      addWaitTime = 41 (days)

   This addWaitTime of 41 days is 11 days longer than just the hold-down
   timer, even with the needed retrySafetyMargin value being left out
   (which we exclude due to the lack of necessary operational
   parameters).

6.3.  Timing Requirements for Revoking an Old KSK

   This issue affects not just the publication of new DNSKEYs intended
   to be used as trust anchors, but also the length of time required to
   continuously publish a DNSKEY with the revoke bit set.

   Section 6.3.1 defines a method for calculating the amount of time
   operators need to wait until it is safe to cease publishing a revoked
   DNSKEY (especially useful for writing code involving sleep-based
   timers), and Section 6.3.2 defines a method for calculating a minimal
   wall-clock value after which it is safe to cease publishing a revoked
   DNSKEY (especially useful for writing code based on clock-based event
   triggers).

6.3.1.  Wait Timer Based Calculation

   Both of these publication timing requirements are affected by the
   attacks described in this document, but with revocation the key is
   revoked immediately and addHoldDownTime does not apply.  Thus, the
   minimum amount of time that a SEP Publisher must wait before
   removing a revoked key from publication is:

      remWaitTime = sigExpirationTimeRemaining
                    + activeRefresh
                    + timingSafetyMargin
                    + retrySafetyMargin

   Fully expanded, this becomes:

      remWaitTime = sigExpirationTimeRemaining
                    + 2 * MAX(1 hour,
                              MIN(sigExpirationTime / 2,
                                  MAX(TTL of K_old DNSKEY RRSet) / 2,
                                  15 days))
                    + retrySafetyMargin

   Note also that, just as for addWaitTime in Section 6.2, adding
   retryTime intervals to remWaitTime may be wise.

6.3.2.  Wall-Clock Based Calculation

   As before, the above equations are defined in terms of how long to
   wait from a particular moment in time.  An alternative, but
   equivalent, method is to calculate the date and time before which it
   is unsafe to cease publishing a revoked key.  This calculation thus
   becomes:

      remWallClockTime = lastSigExpirationTime
                         + activeRefresh
                         + timingSafetyMargin
                         + retrySafetyMargin

   where lastSigExpirationTime is the latest value of any
   sigExpirationTime for which RRSIGs were created that could
   potentially be replayed.  Fully expanded, this becomes:

      remWallClockTime = lastSigExpirationTime
                         + 2 * MAX(1 hour,
                                   MIN(sigExpirationTime / 2,
                                       MAX(TTL of K_old DNSKEY RRSet) / 2,
                                       15 days))
                         + retrySafetyMargin

6.3.3.  Additional Considerations for RFC7583

   Note that our notion of remWaitTime is called "Irev" in
   Section 3.3.4.2 of [RFC7583].  The equation for Irev in RFC7583 is
   insecure, as it does not include the sigExpirationTime listed above.
   The Irev equation in RFC7583 also does not include a safety margin,
   though that is an operational consideration.
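   The wait-timer and wall-clock equations in Section 6.2 and
   Section 6.3 can be sketched in Python as follows.  This is a minimal,
   non-normative sketch: the function names are hypothetical, every
   duration is a datetime.timedelta, timingSafetyMargin is taken to
   equal activeRefresh (which is what the factor of 2 in the fully
   expanded equations expresses), and retrySafetyMargin defaults to zero
   because its value depends on the operational parameters discussed in
   Section 6.1.7.

```python
from datetime import datetime, timedelta
from typing import Sequence


def active_refresh(sig_expiration_time: timedelta,
                   k_old_ttls: Sequence[timedelta]) -> timedelta:
    """activeRefresh = MAX(1 hour, MIN(sigExpirationTime / 2,
    MAX(TTL of K_old DNSKEY RRSet) / 2, 15 days))."""
    return max(timedelta(hours=1),
               min(sig_expiration_time / 2,
                   max(k_old_ttls) / 2,
                   timedelta(days=15)))


def add_wait_time(add_hold_down_time: timedelta,
                  sig_expiration_time_remaining: timedelta,
                  sig_expiration_time: timedelta,
                  k_old_ttls: Sequence[timedelta],
                  retry_safety_margin: timedelta = timedelta(0)) -> timedelta:
    """Sleep-style wait after publishing K_new (Section 6.2.1)."""
    refresh = active_refresh(sig_expiration_time, k_old_ttls)
    # Assumption: timingSafetyMargin == activeRefresh, hence "2 * refresh".
    return (add_hold_down_time + sig_expiration_time_remaining
            + 2 * refresh + retry_safety_margin)


def rem_wait_time(sig_expiration_time_remaining: timedelta,
                  sig_expiration_time: timedelta,
                  k_old_ttls: Sequence[timedelta],
                  retry_safety_margin: timedelta = timedelta(0)) -> timedelta:
    """Sleep-style wait before removing a revoked key (Section 6.3.1).

    Revocation takes effect immediately, so there is no hold-down term.
    """
    refresh = active_refresh(sig_expiration_time, k_old_ttls)
    return sig_expiration_time_remaining + 2 * refresh + retry_safety_margin


def add_wall_clock_time(last_sig_expiration_time: datetime,
                        add_hold_down_time: timedelta,
                        sig_expiration_time: timedelta,
                        k_old_ttls: Sequence[timedelta],
                        retry_safety_margin: timedelta = timedelta(0)) -> datetime:
    """Earliest instant at which signing only with K_new is safe (6.2.2)."""
    refresh = active_refresh(sig_expiration_time, k_old_ttls)
    return (last_sig_expiration_time + add_hold_down_time
            + 2 * refresh + retry_safety_margin)


def rem_wall_clock_time(last_sig_expiration_time: datetime,
                        sig_expiration_time: timedelta,
                        k_old_ttls: Sequence[timedelta],
                        retry_safety_margin: timedelta = timedelta(0)) -> datetime:
    """Earliest instant at which a revoked key may be unpublished (6.3.2)."""
    refresh = active_refresh(sig_expiration_time, k_old_ttls)
    return last_sig_expiration_time + 2 * refresh + retry_safety_margin


if __name__ == "__main__":
    day = timedelta(days=1)
    # Appendix A root zone values; worst-case sigExpirationTimeRemaining
    # equals the full 21-day sigExpirationTime.
    print(add_wait_time(30 * day, 21 * day, 21 * day, [2 * day]))
    # prints "53 days, 0:00:00"
```

   With the root zone parameters from Appendix A (a 30-day
   addHoldDownTime, a 21-day sigExpirationTime, and a 2-day K_old DNSKEY
   TTL) and retrySafetyMargin left at zero, the sketch yields an
   addWaitTime of 53 days.  Callers that know their deployment's
   retrySafetyMargin should pass it explicitly rather than rely on the
   zero default.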
6.3.4.  Example Scenario Calculations

   For the parameters listed in Section 5.1, our resulting remWaitTime
   is:

      remWaitTime = 10
                    + 1 / 2
                    + 1 / 2 (days)

      remWaitTime = 11 (days)

   Note that the values in this example produce a length shorter than
   the 30 days recommended in step 3 of Section 6.6 of RFC5011.  Other
   values of sigExpirationTime and the original TTL of the K_old DNSKEY
   RRSet, however, can produce values longer than 30 days.

   Note that because revocation happens immediately, an attacker has a
   much harder job tricking an RFC5011 Resolver into leaving a trust
   anchor in place, as the attacker must successfully replay the old
   data for every query an RFC5011 Resolver sends, not just one.

7.  IANA Considerations

   This document contains no IANA considerations.

8.  Operational Considerations

   A companion document to RFC5011 was expected to be published that
   describes the best operational practice considerations from the
   perspective of a zone publisher and SEP Publisher.  However, this
   companion document has yet to be published.  The authors of this
   document hope that it will be published at some point in the future,
   as RFC5011 timing can be tricky, as we have shown, and a BCP is
   clearly warranted.  This document is intended only to fill a single
   operational void that, if misunderstood, can result in serious
   security ramifications.  This document does not attempt to document
   any other missing operational guidance for zone publishers.

9.  Security Considerations

   This document is solely about the security considerations with
   respect to the SEP Publisher's ability to advertise new DNSKEYs via
   the RFC5011 automated trust anchor update process.  Thus, the entire
   document is a discussion of Security Considerations when adding or
   removing DNSKEYs from trust anchor storage using the RFC5011 process.
   For simplicity, this document assumes that the SEP Publisher will use
   a consistent RRSIG validity period.  SEP Publishers that vary the
   length of RRSIG validity periods will need to adjust the
   sigExpirationTime value accordingly, so that the equations in
   Section 6.2 and Section 6.3 use a value that coincides with the time
   after which a replay of older RRSIGs will no longer succeed.

10.  Acknowledgements

   The authors would especially like to thank Michael StJohns for his
   help and advice, the care and thought he put into RFC5011 itself, and
   his continued reviews and suggestions for this document.  He also
   designed the math behind the suggested retrySafetyMargin values in
   Section 6.1.7.

   We would also like to thank Bob Harold, Shane Kerr, Matthijs Mekking,
   Duane Wessels, Petr Spacek, Ed Lewis, and the dnsop working group,
   who have assisted with this document.

11.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC4033]  Arends, R., Austein, R., Larson, M., Massey, D., and S.
              Rose, "DNS Security Introduction and Requirements",
              RFC 4033, DOI 10.17487/RFC4033, March 2005.

   [RFC5011]  StJohns, M., "Automated Updates of DNS Security (DNSSEC)
              Trust Anchors", STD 74, RFC 5011, DOI 10.17487/RFC5011,
              September 2007.

   [RFC7583]  Morris, S., Ihren, J., Dickinson, J., and W. Mekking,
              "DNSSEC Key Rollover Timing Considerations", RFC 7583,
              DOI 10.17487/RFC7583, October 2015.

   [RFC7719]  Hoffman, P., Sullivan, A., and K. Fujiwara, "DNS
              Terminology", RFC 7719, DOI 10.17487/RFC7719, December
              2015.

Appendix A.  Real World Example: The 2017 Root KSK Key Roll

   In 2017 and 2018, ICANN expects to (or has, depending on when you're
   reading this) roll the key signing key (KSK) for the root zone.
   The relevant parameters associated with the root zone at the time of
   this writing are as follows:

      addHoldDownTime:              30 days
      Old DNSKEY sigExpirationTime: 21 days
      Old DNSKEY TTL:               2 days

   Thus, substituting these values into the equation in Section 6.2.1
   yields (in days from publication time):

      addWaitTime = 30
                    + 21
                    + MAX(1 hour,
                          MIN(21 / 2,
                              MAX(2) / 2,
                              15 days))      # activeRefresh
                    + activeRefresh          # timingSafetyMargin

      addWaitTime = 30 + 21 + 1 + 1

      addWaitTime = 53 days

   Also note that we exclude the retrySafetyMargin value, which is
   calculated based on the expected client deployment size.

   Thus, ICANN must wait a minimum of 53 days before switching to the
   newly published KSK (and 26 days before removing the old revoked key
   once it is published as revoked).  ICANN's current plans involve
   waiting over 3 months before using the new key and 69 days before
   removing the old, revoked key.  Thus, their current rollover plans
   are sufficiently secure against the attack discussed in this memo.

Authors' Addresses

   Wes Hardaker
   USC/ISI
   P.O. Box 382
   Davis, CA 95617
   US

   Email: ietf@hardakers.net

   Warren Kumari
   Google
   1600 Amphitheatre Parkway
   Mountain View, CA 94043
   US

   Email: warren@kumari.net