idnits 2.17.1

draft-ietf-dnsop-rfc5011-security-considerations-12.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  -- The draft header indicates that this document updates RFC7583, but the
     abstract doesn't seem to mention this, which it should.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (March 23, 2018) is 2224 days in the past. Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  ** Downref: Normative reference to an Informational RFC: RFC 7583

  ** Obsolete normative reference: RFC 7719 (Obsoleted by RFC 8499)

     Summary: 2 errors (**), 0 flaws (~~), 1 warning (==), 2 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

dnsop                                                        W. Hardaker
Internet-Draft                                                   USC/ISI
Updates: 7583 (if approved)                                    W. Kumari
Intended status: Standards Track                                  Google
Expires: September 24, 2018                               March 23, 2018


             Security Considerations for RFC5011 Publishers
          draft-ietf-dnsop-rfc5011-security-considerations-12

Abstract

   This document extends the RFC5011 rollover strategy with timing
   advice that must be followed by the publisher in order to maintain
   security. Specifically, this document describes the math behind the
   minimum time-length that a DNS zone publisher must wait before
   signing exclusively with recently added DNSKEYs. This document also
   describes the minimum time-length that a DNS zone publisher must wait
   after publishing a revoked DNSKEY before assuming that all active
   RFC5011 resolvers should have seen the revocation-marked key and
   removed it from their list of trust anchors.

   This document contains much math and complicated equations, but the
   summary is that the key rollover / revocation time is much longer
   than intuition would suggest. If you are not both publishing a
   DNSSEC DNSKEY, and using RFC5011 to advertise this DNSKEY as a new
   Secure Entry Point key for use as a trust anchor, you probably don't
   need to read this document.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 24, 2018.

Copyright Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Document History and Motivation
     1.2.  Safely Rolling the Root Zone's KSK in 2017/2018
     1.3.  Requirements notation
   2.  Background
   3.  Terminology
   4.  Timing Associated with RFC5011 Processing
     4.1.  Timing Associated with Publication
     4.2.  Timing Associated with Revocation
   5.  Denial of Service Attack Walkthrough
     5.1.  Enumerated Attack Example
       5.1.1.  Attack Timing Breakdown
   6.  Minimum RFC5011 Timing Requirements
     6.1.  Equation Components
       6.1.1.  addHoldDownTime
       6.1.2.  lastSigExpirationTime
       6.1.3.  sigExpirationTime
       6.1.4.  sigExpirationTimeRemaining
       6.1.5.  activeRefresh
       6.1.6.  timingSafetyMargin
       6.1.7.  retrySafetyMargin
     6.2.  Timing Requirements For Adding a New KSK
       6.2.1.  Wait Timer Based Calculation
       6.2.2.  Wall-Clock Based Calculation
       6.2.3.  Timing Constraint Summary
       6.2.4.  Additional Considerations for RFC7583
       6.2.5.  Example Scenario Calculations
     6.3.  Timing Requirements For Revoking an Old KSK
       6.3.1.  Wait Timer Based Calculation
       6.3.2.  Wall-Clock Based Calculation
       6.3.3.  Additional Considerations for RFC7583
       6.3.4.  Example Scenario Calculations
   7.  IANA Considerations
   8.  Operational Considerations
   9.  Security Considerations
   10. Acknowledgements
   11. Normative References
   Appendix A.  Real World Example: The 2017 Root KSK Key Roll
   Authors' Addresses

1. Introduction

   [RFC5011] defines a mechanism by which DNSSEC validators can update
   their list of trust anchors when they've seen a new key published in
   a zone or revoke a properly marked key from a trust anchor list.
   However, RFC5011 [intentionally] provides no guidance to the
   publishers of DNSKEYs about how long they must wait before switching
   to exclusively using recently published keys for signing records, or
   how long they must wait before ceasing publication of a revoked key.
   Because of this lack of guidance, zone publishers may arrive at
   incorrect assumptions about safe usage of the RFC5011 DNSKEY
   advertising, rolling and revocation process. This document describes
   the minimum security requirements from a publisher's point of view
   and is intended to complement the guidance offered in RFC5011 (which
   is written to provide timing guidance solely from a Validating
   Resolver's point of view).

   To explain the RFC5011 security analysis in this document better,
   Section 5 first describes an attack on a zone publisher. Then in
   Section 6.1 we break down each of the timing components that will be
   later used to define timing requirements for adding keys in
   Section 6.2 and revoking keys in Section 6.3.

1.1. Document History and Motivation

   To confirm that this lack of understanding is widespread, the
   authors reached out to 5 DNSSEC experts to ask them how long they
   thought they must wait before signing a zone exclusively with a new
   KSK [RFC4033] that was being introduced according to the RFC5011
   process. All 5 experts answered with an insecure value, and we
   determined that this lack of understanding might cause security
   concerns in deployment. We hope that this companion document to
   RFC5011 will rectify this and provide better guidance to zone
   publishers who wish to make use of the RFC5011 rollover process.

1.2. Safely Rolling the Root Zone's KSK in 2017/2018

   One important note about ICANN's (currently in process) 2017/2018 KSK
   rollover plan for the root zone: the timing values chosen for rolling
   the KSK in the root zone appear completely safe, and are not affected
   by the timing concerns discussed in this draft.

1.3. Requirements notation

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

2. Background

   RFC5011 describes a process by which an RFC5011 Resolver may accept a
   newly published KSK as a trust anchor for validating future DNSSEC
   signed records. It also describes the process for publicly revoking
   a published KSK. This document augments that information with
   additional constraints, from the SEP Publisher's point of view.
   Note that this document does not define any other operational
   guidance or recommendations about the RFC5011 process and restricts
   itself solely to the security and operational ramifications of
   prematurely switching to exclusively using recently added keys or
   removing revoked keys.

   Failure of a DNSKEY publisher to follow the minimum recommendations
   associated with this draft can result in potential denial-of-service
   attack opportunities against validating resolvers. Failure of a
   DNSKEY publisher to publish a revoked key for a long enough period of
   time may result in RFC5011 Resolvers leaving that key in their trust
   anchor storage beyond the key's expected lifetime.

3. Terminology

   SEP Publisher  The entity responsible for publishing a DNSKEY (with
      the Secure Entry Point (SEP) bit set) that can be used as a trust
      anchor.

   Zone Signer  The owner of a zone intending to publish a new
      Key-Signing-Key (KSK) that may become a trust anchor for
      validators following the RFC5011 process.

   RFC5011 Resolver  A DNSSEC Resolver that is using the RFC5011
      processes to track and update trust anchors.

   Attacker  An entity intent on foiling the RFC5011 Resolver's ability
      to successfully adopt the Zone Signer's new DNSKEY as a new trust
      anchor or to prevent the RFC5011 Resolver from removing an old
      DNSKEY from its list of trust anchors.

   sigExpirationTime  The amount of time between the DNSKEY RRSIG's
      Signature Inception field and the Signature Expiration field.

   Also see Section 2 of [RFC4033] and [RFC7719] for additional
   terminology.

4. Timing Associated with RFC5011 Processing

   The subsections below give a high-level overview of [RFC5011]
   processing. This description is not sufficient for fully
   understanding RFC5011, but provides enough background for the reader
   to follow the discussion in this document. Readers need to fully
   understand [RFC5011] as well to fully comprehend the content and
   importance of this document.

4.1. Timing Associated with Publication

   RFC5011's process of safely publishing a new DNSKEY and then assuming
   RFC5011 Resolvers have adopted it for trust can be broken down into a
   number of high-level steps to be performed by the SEP Publisher.
   This document discusses the following scenario, which is the
   principal way RFC5011 is currently being used (even though Section 6
   of RFC5011 suggests having a stand-by key available):

   1. Publish a new DNSKEY in a zone, but continue to sign the zone
      with the old one.

   2. Wait a period of time.

   3. Begin to exclusively use recently published DNSKEYs to sign the
      appropriate resource records.

   This document discusses the time required to wait during step 2 of
   the above process. Some interpretations of RFC5011 have erroneously
   determined that the wait time is equal to RFC5011's "hold down time".
   Section 5 describes an attack based on this (common) erroneous
   belief, which can result in a denial of service attack against the
   zone.

4.2. Timing Associated with Revocation

   RFC5011's process of advertising that an old key is to be revoked
   from RFC5011 Resolvers falls into a number of high-level steps:

   1. Set the revoke bit on the DNSKEY to be revoked.

   2. Sign the revoked DNSKEY with itself.

   3. Wait a period of time.

   4. Remove the revoked key from the zone.

   This document discusses the time required to wait in step 3 of the
   above process.
   Some interpretations of RFC5011 have erroneously determined that the
   wait time is equal to RFC5011's "hold down time". This document
   describes an attack based on this (common) erroneous belief, which
   results in a revoked DNSKEY potentially remaining as a trust anchor
   in an RFC5011 Resolver long past its expected usage.

5. Denial of Service Attack Walkthrough

   This section serves as an illustrative example of the problem being
   discussed in this document. Note that in order to keep the example
   simple enough to understand, some simplifications were made (such as
   by not creating a set of pre-signed RRSIGs and by not using values
   that result in the addHoldDownTime not being evenly divisible by the
   activeRefresh value); the mathematical formulas in Section 6 are,
   however, complete.

   If an attacker is able to provide an RFC5011 Resolver with past
   responses, such as when it is on-path or able to perform any number
   of cache poisoning attacks, the attacker may be able to leave
   compliant RFC5011 Resolvers without an appropriate DNSKEY trust
   anchor. This scenario will remain until an administrator manually
   fixes the situation.

   The timeline below illustrates an example of this situation.

5.1. Enumerated Attack Example

   The following settings are used in the example scenario within this
   section:

      TTL (all records)    1 day

      sigExpirationTime    10 days

      Zone resigned every  1 day

   Given these settings, the sequence of events in Section 5.1.1 depicts
   how a SEP Publisher that waits for only the RFC5011 hold-down timer
   length of 30 days subjects its users to a potential Denial of Service
   attack. The timeline below is based on a SEP Publisher publishing a
   new Key Signing Key (KSK), with the intent that it will later be used
   as a trust anchor. We label this publication time as "T+0".
   All numbers in this timeline refer to days before and after this
   initial publication event. Thus, T-1 is the day before the
   introduction of the new key, and T+15 is the 15th day after the key
   was introduced into the example zone being discussed.

   In this exposition, we consider two keys within the example zone:

   K_old:  An older KSK and Trust Anchor being replaced.

   K_new:  A new KSK being transitioned into active use and expected to
      become a Trust Anchor via the RFC5011 automated trust anchor
      update process.

5.1.1. Attack Timing Breakdown

   Below we examine an attack that foils the adoption of a new DNSKEY by
   an RFC5011 Resolver when the SEP Publisher starts signing and
   publishing with the new DNSKEY too quickly.

   T-1  The K_old based RRSIGs are being published by the Zone Signer.
      [It may also be signing ZSKs, but they are not relevant to this
      event so we will not talk further about them; we are only
      considering the RRSIGs that cover the DNSKEYs in this document.]
      The Attacker queries for, retrieves and caches this DNSKEY set and
      corresponding RRSIG signatures.

   T+0  The Zone Signer adds K_new to their zone and signs the zone's
      key set with K_old. The RFC5011 Resolver (later to be under
      attack) retrieves this new key set and corresponding RRSIGs and
      notices the publication of K_new. The RFC5011 Resolver starts the
      (30-day) hold-down timer for K_new. [Note that in a more
      real-world scenario there will likely be a further delay between
      the point where the Zone Signer publishes a new RRSIG and the
      RFC5011 Resolver notices its publication; though not shown in this
      example, this delay is accounted for in the equation in Section 6
      below.]

   T+5  The RFC5011 Resolver queries for the zone's keyset per the
      RFC5011 Active Refresh schedule, discussed in Section 2.3 of
      RFC5011.
      Instead of receiving the intended published keyset, the RFC5011
      Resolver receives the keyset and associated signatures recorded at
      T-1, which the Attacker successfully replays to it. Because the
      signature lifetime is 10 days (in this example), the replayed
      signature and keyset are accepted as valid (being only 6 days old,
      which is less than sigExpirationTime) and the RFC5011 Resolver
      cancels the (30-day) hold-down timer for K_new, per the RFC5011
      algorithm.

   T+10  The RFC5011 Resolver queries for the zone's keyset and
      discovers a signed keyset that includes K_new (again), and is
      signed by K_old. Note: the attacker is unable to replay the
      records cached at T-1, because the signatures have now expired.
      Thus at T+10, the RFC5011 Resolver starts (anew) the hold-down
      timer for K_new.

   T+11 through T+29  The RFC5011 Resolver continues checking the zone's
      key set at the prescribed regular intervals. During this period,
      the attacker can no longer replay traffic to their benefit.

   T+30  The Zone Signer knows that this is the first time at which some
      validators might accept K_new as a new trust anchor, since the
      hold-down timer of an RFC5011 Resolver not under attack that had
      queried and retrieved K_new at T+0 would now have reached 30 days.
      However, the hold-down timer of our attacked RFC5011 Resolver is
      only at 20 days.

   T+35  The Zone Signer (mistakenly) believes that all validators
      following the Active Refresh schedule (Section 2.3 of RFC5011)
      should have accepted K_new as the new trust anchor (since the
      hold down time (30 days) + the query interval [which is just 1/2
      the signature validity period in this example] would have passed).
      However, the hold-down timer of our attacked RFC5011 Resolver is
      only at 25 days (T+35 minus T+10); thus the RFC5011 Resolver won't
      consider it a valid trust anchor addition yet, as the required 30
      days have not yet elapsed.

   T+36  The Zone Signer, believing K_new is safe to use, switches their
      active signing KSK to K_new and publishes a new RRSIG, signed with
      (only) K_new, covering the DNSKEY set. Non-attacked RFC5011
      validators, with a hold-down timer of at least 30 days, would have
      accepted K_new into their set of trusted keys. But, because our
      attacked RFC5011 Resolver now has a hold-down timer for K_new of
      only 26 days, it failed to ever accept K_new as a trust anchor.
      Since K_old is no longer being used to sign the zone's DNSKEYs,
      all the DNSKEY records from the zone will be treated as invalid.
      Subsequently, all of the records in the DNS tree below the zone's
      apex will be deemed invalid by DNSSEC.

6. Minimum RFC5011 Timing Requirements

   This section defines the minimum timing requirements for making
   exclusive use of newly added DNSKEYs and timing requirements for
   ceasing the publication of DNSKEYs to be revoked. We break our
   timing solution requirements into two primary components: the
   mathematically-based security analysis of the RFC5011 publication
   process itself, and an extension of this that takes operational
   realities into account that further affect the recommended timings.

   First, we define the component terms used in all equations in
   Section 6.1.

6.1. Equation Components

6.1.1. addHoldDownTime

   The addHoldDownTime is defined in Section 2.4.1 of [RFC5011] as:

      The add hold-down time is 30 days or the expiration time of the
      original TTL of the first trust point DNSKEY RRSet that contained
      the new key, whichever is greater. This ensures that at least
      two validated DNSKEY RRSets that contain the new key MUST be seen
      by the resolver prior to the key's acceptance.

6.1.2. lastSigExpirationTime

   The latest value (i.e.
   the date and time furthest in the future) of any RRSIG Signature
   Expiration field covering any DNSKEY RRSet containing only the old
   trust anchor(s) that are being superseded. Note that for
   organizations pre-creating signatures this time may be fairly far in
   the future unless they can be significantly assured that none of
   their pre-generated signatures can be replayed at a later date.

6.1.3. sigExpirationTime

   The amount of time between the DNSKEY RRSIG's Signature Inception
   field and the Signature Expiration field.

6.1.4. sigExpirationTimeRemaining

   sigExpirationTimeRemaining is defined in Section 3.

6.1.5. activeRefresh

   activeRefresh time is defined by RFC5011 as:

      A resolver that has been configured for an automatic update
      of keys from a particular trust point MUST query that trust
      point (e.g., do a lookup for the DNSKEY RRSet and related
      RRSIG records) no less often than the lesser of 15 days, half
      the original TTL for the DNSKEY RRSet, or half the RRSIG
      expiration interval and no more often than once per hour.

   This translates to:

     activeRefresh = MAX(1 hour,
                         MIN(sigExpirationTime / 2,
                             MAX(TTL of K_old DNSKEY RRSet) / 2,
                             15 days)
                         )

6.1.6. timingSafetyMargin

   Mentally, it is easy to assume that the period of time required for
   SEP Publishers to wait after making changes to SEP marked DNSKEY sets
   will be entirely based on the length of the addHoldDownTime.
   Unfortunately, analysis shows that both the design of the RFC5011
   protocol and the operational realities in deploying it require
   waiting an additional period of time. In Sections 6.1.6.1 to 6.1.6.3
   below, we discuss three sources of additional delay. In the end, we
   will pick the largest of these delays as the minimum additional time
   that the SEP Publisher must wait in our final timingSafetyMargin
   value, which we define in Section 6.1.6.4.

6.1.6.1. activeRefreshOffset

   A security analysis of the timing associated with the query rate of
   RFC5011 Resolvers shows that it may not perfectly align with the
   addHoldDownTime when the addHoldDownTime is not evenly divisible by
   the activeRefresh time. Consider the example of a zone with an
   activeRefresh period of 7 days. If an associated RFC5011 Resolver
   started its hold-down timer just after the SEP Publisher published a
   new DNSKEY (at time T+0), the resolver would send checking queries at
   T+7, T+14, T+21 and T+28 days and will finally accept it at T+35
   days, which is 5 days longer than the 30-day addHoldDownTime.

   The activeRefreshOffset term defines this time difference and
   becomes:

     activeRefreshOffset = addHoldDownTime % activeRefresh

   The % symbol denotes the mathematical mod operator (calculating the
   remainder in a division problem). This will frequently be zero, but
   can be nearly as large as activeRefresh itself. (In the 7-day
   example above, the remainder 30 % 7 is 2 days while the extra wait is
   7 - 2 = 5 days; both are less than activeRefresh, which is the only
   property used in Section 6.1.6.4.)

6.1.6.2. clockskewDriftMargin

   Even small clock drifts can have negative impacts upon the timing of
   the RFC5011 Resolver's measurements. Consider the simplest case
   where the RFC5011 Resolver's clock shifts over time to be 2 seconds
   slower near the end of the RFC5011 Resolver's addHoldDownTime period.
   That is, if the RFC5011 Resolver first noticed a new DNSKEY at:

     firstSeen = sigExpirationTime + activeRefresh + 1 second

   then the effect of a 2-second clock drift between the SEP Publisher
   and the RFC5011 Resolver may result in the RFC5011 Resolver querying
   again at:

     justBefore = sigExpirationTime + addHoldDownTime +
                  activeRefresh + 1 second - 2 seconds

   which becomes:

     justBefore = sigExpirationTime + addHoldDownTime +
                  activeRefresh - 1 second

   The net effect is the addHoldDownTime will not have been reached from
   the perspective of the RFC5011 Resolver, but it will have been
   reached from the perspective of the SEP Publisher.
   As a result, it may take one additional activeRefresh period for
   this RFC5011 Resolver to accept the new key (at sigExpirationTime +
   addHoldDownTime + 2 * activeRefresh - 1 second).

   We note that even the smallest clockskew errors can require waiting
   an additional activeRefresh period, and thus define the
   clockskewDriftMargin as:

     clockskewDriftMargin = activeRefresh

6.1.6.3. retryDriftMargin

   Drift associated with a lost transmission and an accompanying
   re-transmission (see Section 2.3 of [RFC5011]) will cause RFC5011
   Resolvers to also change the timing associated with query times such
   that it becomes impossible to predict, from the perspective of the
   SEP Publisher, when the conclusive measurement query will arrive.
   Similarly, any software that restarts/reboots without saving
   next-query timing state may also commence with a new random starting
   time. Thus, an additional activeRefresh is needed to handle both
   these cases as well.

     retryDriftMargin = activeRefresh

   Note that we account for additional time associated with cumulative
   multiple retries, especially under high-loss conditions, in
   Section 6.1.6.4.

6.1.6.4. timingSafetyMargin Value

   The activeRefreshOffset, clockskewDriftMargin, and retryDriftMargin
   parameters all deal with additional wait-periods that must be
   accounted for after analyzing the conditions under which the client
   will take longer than expected to make its last query while waiting
   for the addHoldDownTime period to pass. But these values may be
   merged into a single term by waiting the longest of any of them. We
   define timingSafetyMargin as this "worst case" value:

     timingSafetyMargin = MAX(activeRefreshOffset,
                              clockskewDriftMargin,
                              retryDriftMargin)

     timingSafetyMargin = MAX(addHoldDownTime % activeRefresh,
                              activeRefresh,
                              activeRefresh)

     timingSafetyMargin = activeRefresh

6.1.7. retrySafetyMargin

   The retrySafetyMargin is an extra period of time to account for
   caching, network delays, dropped packets, and other operational
   concerns otherwise beyond the scope of this document. The value
   operators should choose is highly dependent on the deployment
   situation associated with their zone. Note that no value of a
   retrySafetyMargin can protect against resolvers that are "down".
   Nonetheless, we do offer the following as one method for considering
   reasonable values to select from.

   The following list of variables needs to be considered when selecting
   an appropriate retrySafetyMargin value:

   successRate:  A likely success rate for client queries and retries

   numResolvers:  The number of client RFC5011 Resolvers

   Note that RFC5011 defines retryTime as:

      If the query fails, the resolver MUST repeat the query until
      satisfied no more often than once an hour and no less often
      than the lesser of 1 day, 10% of the original TTL, or 10% of
      the original expiration interval. That is,
      retryTime = MAX (1 hour, MIN (1 day, .1 * origTTL,
                                    .1 * expireInterval)).
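
   The quoted retryTime rule can be evaluated directly. The following
   is a minimal sketch only: the function name retry_time and the use
   of Python's timedelta type are choices made for this illustration
   and are not part of RFC5011.

```python
from datetime import timedelta


def retry_time(orig_ttl: timedelta, expire_interval: timedelta) -> timedelta:
    """retryTime = MAX(1 hour, MIN(1 day, .1 * origTTL, .1 * expireInterval))."""
    return max(
        timedelta(hours=1),
        min(timedelta(days=1), orig_ttl / 10, expire_interval / 10),
    )


# Section 5.1 parameters: TTL of 1 day, signature validity of 10 days.
print(retry_time(timedelta(days=1), timedelta(days=10)))
```

   For the Section 5.1 parameters, 10% of the 1 day TTL (2 hours and 24
   minutes) is the smallest of the three MIN terms and exceeds the 1
   hour floor, so it is the value returned.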

   With the successRate and numResolvers values selected and the
   definition of retryTime from RFC5011, one method for determining how
   many retryTime intervals to wait in order to reduce the set of
   resolvers that have not accepted the new trust anchor to 0 is thus:

     x = (1/(1 - successRate))

     retryCountWait = Log_base_x(numResolvers)

   To reduce the need for readers to pull out a scientific calculator,
   we offer the following lookup table based on successRate and
   numResolvers:

                      retryCountWait lookup table
                      ---------------------------

                   Number of client RFC5011 Resolvers (numResolvers)
                   -------------------------------------------------
                     10,000  100,000  1,000,000  10,000,000  100,000,000
              0.01      917     1146       1375        1604         1833
Probability   0.05      180      225        270         315          360
of Success    0.10       88      110        132         153          175
Per Retry     0.15       57       71         86         100          114
Interval      0.25       33       41         49          57           65
(successRate) 0.50       14       17         20          24           27
              0.90        4        5          6           7            8
              0.95        4        4          5           6            7
              0.99        2        3          3           4            4
              0.999       2        2          2           3            3

   Finally, a suggested value of retrySafetyMargin can then be this
   retryCountWait number multiplied by the retryTime from RFC5011:

     retrySafetyMargin = retryCountWait * retryTime

6.2. Timing Requirements For Adding a New KSK

   Given the defined parameters and analysis from Section 6.1, we can
   now create a method for calculating the amount of time to wait until
   it is safe to start signing exclusively with a new DNSKEY (especially
   useful for writing code involving sleep based timers) in
   Section 6.2.1, and define a method for calculating a wall-clock value
   after which it is safe to start signing exclusively with a new DNSKEY
   (especially useful for writing code based on clock-based event
   triggers) in Section 6.2.2.

6.2.1. Wait Timer Based Calculation

   Given the attack description in Section 5, the correct minimum length
   of time required for the Zone Signer to wait after publishing K_new
   but before exclusively using it and newer keys is:

     addWaitTime = addHoldDownTime
                   + sigExpirationTimeRemaining
                   + activeRefresh
                   + timingSafetyMargin
                   + retrySafetyMargin

6.2.1.1. Fully expanded equation

   Given the equation components defined in Section 6.1, the fully
   expanded equation is:

     addWaitTime = addHoldDownTime
                   + sigExpirationTimeRemaining
                   + 2 * MAX(1 hour,
                             MIN(sigExpirationTime / 2,
                                 MAX(TTL of K_old DNSKEY RRSet) / 2,
                                 15 days)
                             )
                   + retrySafetyMargin

6.2.2. Wall-Clock Based Calculation

   The equations in Section 6.2.1 are defined based upon how long to
   wait from a particular moment in time. An alternative, but
   equivalent, method is to calculate the date and time before which it
   is unsafe to use a key for signing. This calculation thus becomes:

     addWallClockTime = lastSigExpirationTime
                        + addHoldDownTime
                        + activeRefresh
                        + timingSafetyMargin
                        + retrySafetyMargin

   where lastSigExpirationTime is the latest Signature Expiration time
   of any RRSIG that was created and could potentially be replayed (see
   Section 6.1.2). Fully expanded, this becomes:

     addWallClockTime = lastSigExpirationTime
                        + addHoldDownTime
                        + 2 * MAX(1 hour,
                                  MIN(sigExpirationTime / 2,
                                      MAX(TTL of K_old DNSKEY RRSet) / 2,
                                      15 days)
                                  )
                        + retrySafetyMargin

6.2.3. Timing Constraint Summary

   The important timing constraint introduced by this memo relates to
   the last point at which an RFC5011 Resolver may have received a
   replayed original DNSKEY set, containing K_old and not K_new. The
   next query of the RFC5011 validator at which K_new will be seen
   without the potential for a replay attack will occur after the old
   DNSKEY RRSIG's Signature Expiration Time.
Thus, the latest time that 675 a RFC5011 Validator may begin their hold down timer is an "Active 676 Refresh" period after the last point that an attacker can replay the 677 K_old DNSKEY set. The worst case scenario of this attack is if the 678 attacker can replay K_old just seconds before the (DNSKEY RRSIG 679 Signature Validity) field of the last K_old only RRSIG. 681 6.2.4. Additional Considerations for RFC7583 683 Note: our notion of addWaitTime is called "Itrp" in Section 3.3.4.1 684 of [RFC7583]. The equation for Itrp in RFC7583 is insecure as it 685 does not include the sigExpirationTime listed above. The Itrp 686 equation in RFC7583 also does not include the 2*TTL safety margin, 687 though that is an operational consideration. 689 6.2.5. Example Scenario Calculations 691 For the parameters listed in Section 5.1, our resulting addWaitTime 692 is: 694 addWaitTime = 30 695 + 10 696 + 1 / 2 697 + 1 / 2 (days) 699 addWaitTime = 43 (days) 701 This addWaitTime of 42.5 days is 12.5 days longer than just the hold 702 down timer, even with the needed retrySafetyMargin value being left 703 out (which we exclude due to the lack of necessary operational 704 parameters). 706 6.3. Timing Requirements For Revoking an Old KSK 708 This issue affects not just the publication of new DNSKEYs intended 709 to be used as trust anchors, but also the length of time required to 710 continuously publish a DNSKEY with the revoke bit set. 712 Section 6.2.1 defines a method for calculating the amount of time 713 operators need to wait until it is safe to cease publishing a DNSKEY 714 (especially useful for writing code involving sleep based timers), 715 and Section 6.2.2 defines a method for calculating a minimal wall- 716 clock value after which it is safe to cease publishing a DNSKEY 717 (especially useful for writing code based on clock-based event 718 triggers). 720 6.3.1. 
Wait Timer Based Calculation

Both of these publication timing requirements are affected by the
attacks described in this document, but with revocation the key is
revoked immediately and the addHoldDownTime timer does not apply.
Thus the minimum amount of time that a SEP Publisher must wait before
removing a revoked key from publication is:

    remWaitTime = sigExpirationTimeRemaining
                  + activeRefresh
                  + timingSafetyMargin
                  + retrySafetyMargin

With activeRefresh expanded, this becomes:

    remWaitTime = sigExpirationTimeRemaining
                  + MAX(1 hour,
                        MIN((sigExpirationTime) / 2,
                            MAX(TTL of K_old DNSKEY RRSet) / 2,
                            15 days))
                  + timingSafetyMargin
                  + retrySafetyMargin

Note also that adding retryTime intervals to the remWaitTime may be
wise, just as it was for addWaitTime in Section 6.

6.3.2. Wall-Clock Based Calculation

Like before, the above equations are defined based upon how long to
wait from a particular moment in time. An alternative, but
equivalent, method is to calculate the date and time before which it
is unsafe to cease publishing a revoked key. This calculation thus
becomes:

    remWallClockTime = lastSigExpirationTime
                       + activeRefresh
                       + timingSafetyMargin
                       + retrySafetyMargin

where lastSigExpirationTime is the latest value of any
sigExpirationTime for which RRSIGs were created that could
potentially be replayed. Fully expanded, this becomes:

    remWallClockTime = lastSigExpirationTime
                       + MAX(1 hour,
                             MIN((sigExpirationTime) / 2,
                                 MAX(TTL of K_old DNSKEY RRSet) / 2,
                                 15 days))
                       + timingSafetyMargin
                       + retrySafetyMargin

6.3.3. Additional Considerations for RFC7583

Note that our notion of remWaitTime is called "Irev" in
Section 3.3.4.2 of [RFC7583]. The equation for Irev in RFC7583 is
insecure as it does not include the sigExpirationTime listed above.
The Irev equation in RFC7583 also does not include a safety margin,
though that is an operational consideration.
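
As a rough, non-normative sketch, the wait-timer equations in
Sections 6.2.1 and 6.3.1 could be computed as shown below. The
function and parameter names are hypothetical and are not defined by
this document. The sketch assumes that timingSafetyMargin defaults to
one activeRefresh (as the "2 * MAX(...)" term in Section 6.2.1.1
suggests) and that retrySafetyMargin defaults to zero, which mirrors
the worked examples in this document that leave it out. Operators
would supply their own values for both.

```python
from datetime import timedelta


def active_refresh(sig_expiration_time, max_ttl):
    """MAX(1 hour, MIN(sigExpirationTime / 2, MAX(TTL) / 2, 15 days))."""
    return max(
        timedelta(hours=1),
        min(sig_expiration_time / 2, max_ttl / 2, timedelta(days=15)),
    )


def rem_wait_time(sig_expiration_time_remaining, sig_expiration_time,
                  max_ttl, timing_safety_margin=None,
                  retry_safety_margin=timedelta(0)):
    """Minimum wait before a revoked key is removed from publication."""
    refresh = active_refresh(sig_expiration_time, max_ttl)
    if timing_safety_margin is None:
        # Assumption: one activeRefresh period is used as the margin.
        timing_safety_margin = refresh
    return (sig_expiration_time_remaining + refresh
            + timing_safety_margin + retry_safety_margin)


def add_wait_time(add_hold_down_time, sig_expiration_time_remaining,
                  sig_expiration_time, max_ttl, timing_safety_margin=None,
                  retry_safety_margin=timedelta(0)):
    """Minimum wait before signing exclusively with a newly added key.

    This is the revocation wait plus the add hold-down time.
    """
    return add_hold_down_time + rem_wait_time(
        sig_expiration_time_remaining, sig_expiration_time, max_ttl,
        timing_safety_margin, retry_safety_margin)


# Root zone parameters from Appendix A: 30 day hold-down, 21 day
# signature validity, 2 day DNSKEY TTL; retrySafetyMargin is left out.
wait = add_wait_time(timedelta(days=30), timedelta(days=21),
                     timedelta(days=21), timedelta(days=2))
print(wait.days)  # prints 53
```

Expressing addWaitTime as the hold-down time plus remWaitTime keeps
the two calculations from drifting apart, since they differ only in
the addHoldDownTime term.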
6.3.4. Example Scenario Calculations

For the parameters listed in Section 5.1, our resulting remWaitTime
is:

    remWaitTime = 10
                  + 1 / 2     (days)

    remWaitTime = 10.5        (days)

Note that the values in this example produce a length shorter than
the 30 days recommended in Section 6.6, step 3 of RFC5011. Other
values of sigExpirationTime and the original TTL of the K_old DNSKEY
RRSet, however, can produce values longer than 30 days.

Note that because revocation happens immediately, an attacker has a
much harder job tricking an RFC5011 Resolver into leaving a trust
anchor in place, as the attacker must successfully replay the old
data for every query an RFC5011 Resolver sends, not just one.

7. IANA Considerations

This document contains no IANA considerations.

8. Operational Considerations

A companion document to RFC5011 was expected to be published that
describes the best operational practice considerations from the
perspective of a zone publisher and SEP Publisher. However, this
companion document has yet to be published. The authors of this
document hope that it will be published at some point in the future,
as RFC5011 timing can be tricky, as we have shown, and a BCP is
clearly warranted. This document is intended only to fill a single
operational void which, when left misunderstood, can result in
serious security ramifications. This document does not attempt to
document any other missing operational guidance for zone publishers.

9. Security Considerations

This document is solely about the security considerations with
respect to the SEP Publisher's ability to advertise new DNSKEYs via
the RFC5011 automated trust anchor update process. Thus the entire
document is a discussion of Security Considerations when adding or
removing DNSKEYs from trust anchor storage using the RFC5011 process.

For simplicity, this document assumes that the SEP Publisher will use
a consistent RRSIG validity period. SEP Publishers that vary the
length of RRSIG validity periods will need to adjust the
sigExpirationTime value accordingly so that the equations in
Section 6 and Section 6.3 use a value that coincides with the last
time a replay of older RRSIGs will no longer succeed.

10. Acknowledgements

The authors would especially like to thank Michael StJohns for his
help and advice, the care and thought he put into RFC5011 itself,
and his continued reviews and suggestions for this document. He also
designed the math behind the suggested retrySafetyMargin values in
Section 6.1.7.

We would also like to thank Bob Harold, Shane Kerr, Matthijs Mekking,
Duane Wessels, Petr Spacek, Ed Lewis, and the dnsop working group,
who have assisted with this document.

11. Normative References

[RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
           Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC4033]  Arends, R., Austein, R., Larson, M., Massey, D., and S.
           Rose, "DNS Security Introduction and Requirements",
           RFC 4033, DOI 10.17487/RFC4033, March 2005.

[RFC5011]  StJohns, M., "Automated Updates of DNS Security (DNSSEC)
           Trust Anchors", STD 74, RFC 5011, DOI 10.17487/RFC5011,
           September 2007.

[RFC7583]  Morris, S., Ihren, J., Dickinson, J., and W. Mekking,
           "DNSSEC Key Rollover Timing Considerations", RFC 7583,
           DOI 10.17487/RFC7583, October 2015.

[RFC7719]  Hoffman, P., Sullivan, A., and K. Fujiwara, "DNS
           Terminology", RFC 7719, DOI 10.17487/RFC7719, December
           2015.

Appendix A. Real World Example: The 2017 Root KSK Key Roll

In 2017 and 2018, ICANN expects to (or has, depending on when you're
reading this) roll the key signing key (KSK) for the root zone.
The relevant parameters associated with the root zone at the time of
this writing are as follows:

    addHoldDownTime:              30 days
    Old DNSKEY sigExpirationTime: 21 days
    Old DNSKEY TTL:                2 days

Thus, sticking this information into the equation in Section 6
yields (in days from publication time):

    addWaitTime = 30
                  + 21
                  + MAX(1 hour,              # activeRefresh
                        MIN(21 / 2,
                            MAX(2) / 2,
                            15 days)
                       )
                  + activeRefresh

    addWaitTime = 30 + 21 + 1 + 1

    addWaitTime = 53 days

Also note that we exclude the retrySafetyMargin value, which is
calculated based on the expected client deployment size.

Thus, ICANN must wait a minimum of 53 days before switching to the
newly published KSK (and 26 days before removing the old revoked key
once it is published as revoked). ICANN's current plans involve
waiting over 3 months before using the new key and 69 days before
removing the old, revoked key. Thus, their current rollover plans
are sufficiently secure from the attack discussed in this memo.

Authors' Addresses

Wes Hardaker
USC/ISI
P.O. Box 382
Davis, CA 95617
US

Email: ietf@hardakers.net

Warren Kumari
Google
1600 Amphitheatre Parkway
Mountain View, CA 94043
US

Email: warren@kumari.net