dnsop                                                        W. Hardaker
Internet-Draft                                                   USC/ISI
Updates: 7583 (if approved)                                    W. Kumari
Intended status: Informational                                    Google
Expires: January 17, 2019                                  July 16, 2018


             Security Considerations for RFC5011 Publishers
          draft-ietf-dnsop-rfc5011-security-considerations-13

Abstract

   This document extends the RFC5011 rollover strategy with timing
   advice that must be followed by the publisher in order to maintain
   security.
   Specifically, this document describes the math behind the
   minimum time-length that a DNS zone publisher must wait before
   signing exclusively with recently added DNSKEYs. This document also
   describes the minimum time-length that a DNS zone publisher must wait
   after publishing a revoked DNSKEY before assuming that all active
   RFC5011 resolvers should have seen the revocation-marked key and
   removed it from their list of trust anchors.

   This document contains much math and complicated equations, but the
   summary is that the key rollover / revocation time is much longer
   than intuition would suggest. This document updates RFC7583 by
   adding additional delays (sigExpirationTime and
   timingSafetyMargin).

   If you are not both publishing a DNSSEC DNSKEY, and using RFC5011 to
   advertise this DNSKEY as a new Secure Entry Point key for use as a
   trust anchor, you probably don't need to read this document.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 17, 2019.

Copyright Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.
   Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Document History and Motivation
     1.2.  Safely Rolling the Root Zone's KSK in 2017/2018
     1.3.  Requirements notation
   2.  Background
   3.  Terminology
   4.  Timing Associated with RFC5011 Processing
     4.1.  Timing Associated with Publication
     4.2.  Timing Associated with Revocation
   5.  Denial of Service Attack Walkthrough
     5.1.  Enumerated Attack Example
       5.1.1.  Attack Timing Breakdown
   6.  Minimum RFC5011 Timing Requirements
     6.1.  Equation Components
       6.1.1.  addHoldDownTime
       6.1.2.  lastSigExpirationTime
       6.1.3.  sigExpirationTime
       6.1.4.  sigExpirationTimeRemaining
       6.1.5.  activeRefresh
       6.1.6.  timingSafetyMargin
       6.1.7.  retrySafetyMargin
     6.2.  Timing Requirements For Adding a New KSK
       6.2.1.  Wait Timer Based Calculation
       6.2.2.  Wall-Clock Based Calculation
       6.2.3.  Timing Constraint Summary
       6.2.4.  Additional Considerations for RFC7583
       6.2.5.  Example Scenario Calculations
     6.3.  Timing Requirements For Revoking an Old KSK
       6.3.1.  Wait Timer Based Calculation
       6.3.2.  Wall-Clock Based Calculation
       6.3.3.  Additional Considerations for RFC7583
       6.3.4.  Example Scenario Calculations
   7.  IANA Considerations
   8.  Operational Considerations
   9.  Security Considerations
   10. Acknowledgements
   11. Normative References
   Appendix A.  Real World Example: The 2017 Root KSK Key Roll
   Authors' Addresses

1.  Introduction

   [RFC5011] defines a mechanism by which DNSSEC validators can update
   their list of trust anchors when they've seen a new key published in
   a zone or revoke a properly marked key from a trust anchor list.
   However, RFC5011 [intentionally] provides no guidance to the
   publishers of DNSKEYs about how long they must wait before switching
   to exclusively using recently published keys for signing records, or
   how long they must wait before ceasing publication of a revoked key.
   Because of this lack of guidance, zone publishers may arrive at
   incorrect assumptions about safe usage of the RFC5011 DNSKEY
   advertising, rolling and revocation process.
   This document describes
   the minimum security requirements from a publisher's point of view
   and is intended to complement the guidance offered in RFC5011 (which
   is written to provide timing guidance solely from a Validating
   Resolver's point of view).

   To explain the RFC5011 security analysis in this document better,
   Section 5 first describes an attack on a zone publisher. Then in
   Section 6.1 we break down each of the timing components that will be
   later used to define timing requirements for adding keys in
   Section 6.2 and revoking keys in Section 6.3.

1.1.  Document History and Motivation

   To confirm that this lack of understanding is widespread, the
   authors reached out to 5 DNSSEC experts to ask them how long they
   thought they must wait before signing a zone exclusively with a new
   KSK [RFC4033] that was being introduced according to the RFC5011
   process. All 5 experts answered with an insecure value, and we
   determined that this lack of understanding might cause security
   concerns in deployment. We hope that this companion document to
   RFC5011 will rectify this and provide better guidance to zone
   publishers who wish to make use of the RFC5011 rollover process.

1.2.  Safely Rolling the Root Zone's KSK in 2017/2018

   One important note about ICANN's (currently in process) 2017/2018 KSK
   rollover plan for the root zone: the timing values chosen for rolling
   the KSK in the root zone appear completely safe, and are not affected
   by the timing concerns discussed in this draft.

1.3.  Requirements notation

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

2.  Background

   RFC5011 describes a process by which an RFC5011 Resolver may accept a
   newly published KSK as a trust anchor for validating future DNSSEC
   signed records.
   It also describes the process for publicly revoking
   a published KSK. This document augments that information with
   additional constraints, from the SEP Publisher's point of view.
   Note that this document does not define any other operational
   guidance or recommendations about the RFC5011 process and restricts
   itself solely to the security and operational ramifications of
   prematurely switching to exclusively using recently added keys or
   removing revoked keys.

   Failure of a DNSKEY publisher to follow the minimum recommendations
   associated with this draft can result in potential denial-of-service
   attack opportunities against validating resolvers. Failure of a
   DNSKEY publisher to publish a revoked key for a long enough period of
   time may result in RFC5011 Resolvers leaving that key in their trust
   anchor storage beyond the key's expected lifetime.

3.  Terminology

   SEP Publisher  The entity responsible for publishing a DNSKEY (with
      the Secure Entry Point (SEP) bit set) that can be used as a trust
      anchor.

   Zone Signer  The owner of a zone intending to publish a new Key-
      Signing-Key (KSK) that may become a trust anchor for validators
      following the RFC5011 process.

   RFC5011 Resolver  A DNSSEC Resolver that is using the RFC5011
      processes to track and update trust anchors.

   Attacker  An entity intent on foiling the RFC5011 Resolver's ability
      to successfully adopt the Zone Signer's new DNSKEY as a new trust
      anchor or to prevent the RFC5011 Resolver from removing an old
      DNSKEY from its list of trust anchors.

   sigExpirationTime  The amount of time between the DNSKEY RRSIG's
      Signature Inception field and the Signature Expiration field.

   Also see Section 2 of [RFC4033] and [RFC7719] for additional
   terminology.

4.  Timing Associated with RFC5011 Processing

   The subsections below give a high-level overview of [RFC5011]
   processing.
   This description is not sufficient for fully
   understanding RFC5011, but provides enough background for the reader
   to follow the discussion in this document. Readers need to fully
   understand [RFC5011] as well to fully comprehend the content and
   importance of this document.

4.1.  Timing Associated with Publication

   RFC5011's process of safely publishing a new DNSKEY and then assuming
   RFC5011 Resolvers have adopted it for trust can be broken down into a
   number of high-level steps to be performed by the SEP Publisher.
   This document discusses the following scenario, which is the principal
   way RFC5011 is currently being used (even though Section 6 of RFC5011
   suggests having a stand-by key available):

   1.  Publish a new DNSKEY in a zone, but continue to sign the zone
       with the old one.

   2.  Wait a period of time.

   3.  Begin to exclusively use recently published DNSKEYs to sign the
       appropriate resource records.

   This document discusses the time required to wait during step 2 of
   the above process. Some interpretations of RFC5011 have erroneously
   determined that the wait time is equal to RFC5011's "hold down time".
   Section 5 describes an attack based on this (common) erroneous
   belief, which can result in a denial of service attack against the
   zone.

4.2.  Timing Associated with Revocation

   RFC5011's process of advertising that an old key is to be revoked
   from RFC5011 Resolvers falls into a number of high-level steps:

   1.  Set the revoke bit on the DNSKEY to be revoked.

   2.  Sign the revoked DNSKEY with itself.

   3.  Wait a period of time.

   4.  Remove the revoked key from the zone.

   This document discusses the time required to wait in step 3 of the
   above process. Some interpretations of RFC5011 have erroneously
   determined that the wait time is equal to RFC5011's "hold down time".
   This document describes an attack based on this (common) erroneous
   belief, which results in a revoked DNSKEY potentially remaining as a
   trust anchor in an RFC5011 Resolver long past its expected usage.

5.  Denial of Service Attack Walkthrough

   This section serves as an illustrative example of the problem being
   discussed in this document. Note that in order to keep the example
   simple enough to understand, some simplifications were made (such as
   by not creating a set of pre-signed RRSIGs and by not using values
   that result in the addHoldDownTime not being evenly divisible by the
   activeRefresh value); the mathematical formulas in Section 6 are,
   however, complete.

   If an attacker is able to provide an RFC5011 Resolver with past
   responses, such as when it is on-path or able to perform any number
   of cache poisoning attacks, the attacker may be able to leave
   compliant RFC5011 Resolvers without an appropriate DNSKEY trust
   anchor. This scenario will remain until an administrator manually
   fixes the situation.

   The timeline below illustrates an example of this situation.

5.1.  Enumerated Attack Example

   The following settings are used in the example scenario within this
   section:

      TTL (all records)     1 day

      sigExpirationTime     10 days

      Zone resigned every   1 day

   Given these settings, the sequence of events in Section 5.1.1 depicts
   how a SEP Publisher that waits for only the RFC5011 hold-down timer
   length of 30 days subjects its users to a potential Denial of Service
   attack. The timeline below is based on a SEP Publisher publishing a
   new Key Signing Key (KSK), with the intent that it will later be used
   as a trust anchor. We label this publication time as "T+0". All
   numbers in this timeline refer to days before and after this initial
   publication event.
   Thus, T-1 is the day before the introduction of
   the new key, and T+15 is the 15th day after the key was introduced
   into the example zone being discussed.

   In this exposition, we consider two keys within the example zone:

   K_old:  An older KSK and Trust Anchor being replaced.

   K_new:  A new KSK being transitioned into active use and expected to
      become a Trust Anchor via the RFC5011 automated trust anchor
      update process.

5.1.1.  Attack Timing Breakdown

   Below we examine an attack that foils the adoption of a new DNSKEY by
   an RFC5011 Resolver when the SEP Publisher starts signing and
   publishing with the new DNSKEY too quickly.

   T-1  The K_old based RRSIGs are being published by the Zone Signer.
      [It may be signing ZSKs as well, but they are not relevant to
      this event so we will not talk further about them; we are only
      considering the RRSIGs that cover the DNSKEYs in this document.]
      The Attacker queries for, retrieves and caches this DNSKEY set and
      corresponding RRSIG signatures.

   T+0  The Zone Signer adds K_new to their zone and signs the zone's
      key set with K_old. The RFC5011 Resolver (later to be under
      attack) retrieves this new key set and corresponding RRSIGs and
      notices the publication of K_new. The RFC5011 Resolver starts the
      (30-day) hold-down timer for K_new. [Note that in a more real-
      world scenario there will likely be a further delay between the
      point where the Zone Signer publishes a new RRSIG and the RFC5011
      Resolver notices its publication; though not shown in this
      example, this delay is accounted for in the equation in Section 6
      below.]

   T+5  The RFC5011 Resolver queries for the zone's keyset per the
      RFC5011 Active Refresh schedule, discussed in Section 2.3 of
      RFC5011. Instead of receiving the intended published keyset, the
      Attacker successfully replays the keyset and associated signatures
      recorded at T-1 to the victim RFC5011 Resolver.
      Because the
      signature lifetime is 10 days (in this example), the replayed
      signature and keyset are accepted as valid (being only 6 days old,
      which is less than sigExpirationTime) and the RFC5011 Resolver
      cancels the (30-day) hold-down timer for K_new, per the RFC5011
      algorithm.

   T+10  The RFC5011 Resolver queries for the zone's keyset and
      discovers a signed keyset that includes K_new (again), and is
      signed by K_old. Note: the attacker is unable to replay the
      records cached at T-1, because the signatures have now expired.
      Thus at T+10, the RFC5011 Resolver starts (anew) the hold-down
      timer for K_new.

   T+11 through T+29  The RFC5011 Resolver continues checking the zone's
      key set at the prescribed regular intervals. During this period,
      the attacker can no longer replay traffic to their benefit.

   T+30  The Zone Signer knows that this is the first time at which some
      validators might accept K_new as a new trust anchor, since the
      hold-down timer of an RFC5011 Resolver not under attack that had
      queried and retrieved K_new at T+0 would now have reached 30 days.
      However, the hold-down timer of our attacked RFC5011 Resolver is
      only at 20 days.

   T+35  The Zone Signer (mistakenly) believes that all validators
      following the Active Refresh schedule (Section 2.3 of RFC5011)
      should have accepted K_new as the new trust anchor (since the
      hold down time (30 days) + the query interval [which is just 1/2
      the signature validity period in this example] would have passed).
      However, the hold-down timer of our attacked RFC5011 Resolver is
      only at 25 days (T+35 minus T+10); thus the RFC5011 Resolver won't
      consider it a valid trust anchor addition yet, as the required 30
      days have not yet elapsed.

   T+36  The Zone Signer, believing K_new is safe to use, switches their
      active signing KSK to K_new and publishes a new RRSIG, signed with
      (only) K_new, covering the DNSKEY set.
      Non-attacked RFC5011
      validators, with a hold-down timer of at least 30 days, would have
      accepted K_new into their set of trusted keys. But, because our
      attacked RFC5011 Resolver now has a hold-down timer for K_new of
      only 26 days, it failed to ever accept K_new as a trust anchor.
      Since K_old is no longer being used to sign the zone's DNSKEYs,
      all the DNSKEY records from the zone will be treated as invalid.
      Subsequently, all of the records in the DNS tree below the zone's
      apex will be deemed invalid by DNSSEC.

6.  Minimum RFC5011 Timing Requirements

   This section defines the minimum timing requirements for making
   exclusive use of newly added DNSKEYs and timing requirements for
   ceasing the publication of DNSKEYs to be revoked. We break our
   timing solution requirements into two primary components: the
   mathematically-based security analysis of the RFC5011 publication
   process itself, and an extension of this that takes operational
   realities into account that further affect the recommended timings.

   First, we define the component terms used in all equations in
   Section 6.1.

6.1.  Equation Components

6.1.1.  addHoldDownTime

   The addHoldDownTime is defined in Section 2.4.1 of [RFC5011] as:

      The add hold-down time is 30 days or the expiration time of the
      original TTL of the first trust point DNSKEY RRSet that contained
      the new key, whichever is greater. This ensures that at least
      two validated DNSKEY RRSets that contain the new key MUST be seen
      by the resolver prior to the key's acceptance.

6.1.2.  lastSigExpirationTime

   The latest value (i.e., the date and time furthest in the future) of
   any RRSIG Signature Expiration field covering any DNSKEY RRSet
   containing only the old trust anchor(s) that are being superseded.
   Note that for
   organizations pre-creating signatures this time may be fairly far in
   the future unless they can be significantly assured that none of
   their pre-generated signatures can be replayed at a later date.

6.1.3.  sigExpirationTime

   The amount of time between the DNSKEY RRSIG's Signature Inception
   field and the Signature Expiration field.

6.1.4.  sigExpirationTimeRemaining

   sigExpirationTimeRemaining is defined in Section 3.

6.1.5.  activeRefresh

   The activeRefresh time is defined by RFC5011 as:

      A resolver that has been configured for an automatic update
      of keys from a particular trust point MUST query that trust
      point (e.g., do a lookup for the DNSKEY RRSet and related
      RRSIG records) no less often than the lesser of 15 days, half
      the original TTL for the DNSKEY RRSet, or half the RRSIG
      expiration interval and no more often than once per hour.

   This translates to:

     activeRefresh = MAX(1 hour,
                         MIN(sigExpirationTime / 2,
                             MAX(TTL of K_old DNSKEY RRSet) / 2,
                             15 days)
                         )

6.1.6.  timingSafetyMargin

   Mentally, it is easy to assume that the period of time required for
   SEP Publishers to wait after making changes to SEP marked DNSKEY sets
   will be entirely based on the length of the addHoldDownTime.
   Unfortunately, analysis shows that both the design of the RFC5011
   protocol and the operational realities in deploying it require waiting
   an additional period of time. In Sections 6.1.6.1
   to 6.1.6.3 below, we discuss three sources of additional
   delay. In the end, we will pick the largest of these delays as the
   minimum additional time that the SEP Publisher must wait in our final
   timingSafetyMargin value, which we define in Section 6.1.6.4.

6.1.6.1.  activeRefreshOffset

   A security analysis of the timing associated with the query rate of
   RFC5011 Resolvers shows that it may not perfectly align with the
   addHoldDownTime when the addHoldDownTime is not evenly divisible by
   the activeRefresh time. Consider the example of a zone with an
   activeRefresh period of 7 days. If an associated RFC5011 Resolver
   started its hold-down timer just after the SEP Publisher published a
   new DNSKEY (at time T+0), the resolver would send checking queries at
   T+7, T+14, T+21 and T+28 days and will finally accept it at T+35 days,
   which is 5 days longer than the 30-day addHoldDownTime.

   The activeRefreshOffset term defines this time difference and
   becomes:

     activeRefreshOffset = addHoldDownTime % activeRefresh

   The % symbol denotes the mathematical mod operator (calculating the
   remainder in a division problem). This will frequently be zero, but
   can be nearly as large as activeRefresh itself.

6.1.6.2.  clockskewDriftMargin

   Even small clock drifts can have negative impacts upon the timing of
   the RFC5011 Resolver's measurements. Consider the simplest case
   where the RFC5011 Resolver's clock shifts over time to be 2 seconds
   slower near the end of the RFC5011 Resolver's addHoldDownTime period.
   That is, if the RFC5011 Resolver first noticed a new DNSKEY at:

     firstSeen = sigExpirationTime + activeRefresh + 1 second

   The effect of a 2-second clock drift between the SEP Publisher and the
   RFC5011 Resolver may result in the RFC5011 Resolver querying again
   at:

     justBefore = sigExpirationTime + addHoldDownTime +
                  activeRefresh + 1 second - 2 seconds

   which becomes:

     justBefore = sigExpirationTime + addHoldDownTime +
                  activeRefresh - 1 second

   The net effect is the addHoldDownTime will not have been reached from
   the perspective of the RFC5011 Resolver, but it will have been
   reached from the perspective of the SEP Publisher.
   As a result,
   it may take one additional activeRefresh period longer for this
   RFC5011 Resolver to accept the new key (at sigExpirationTime +
   addHoldDownTime + 2 * activeRefresh - 1 second).

   We note that even the smallest clockskew errors can require waiting
   an additional activeRefresh period, and thus define the
   clockskewDriftMargin as:

     clockskewDriftMargin = activeRefresh

6.1.6.3.  retryDriftMargin

   Drift associated with a lost transmission and an accompanying re-
   transmission (see Section 2.3 of [RFC5011]) will cause RFC5011
   Resolvers to also change the timing associated with query times such
   that it becomes impossible to predict, from the perspective of the
   SEP Publisher, when the conclusive measurement query will arrive.
   Similarly, any software that restarts/reboots without saving next-
   query timing state may also commence with a new random starting time.
   Thus, an additional activeRefresh is needed to handle both these
   cases as well.

     retryDriftMargin = activeRefresh

   Note that we account for additional time associated with cumulative
   multiple retries, especially under high-loss conditions, in
   Section 6.1.6.4.

6.1.6.4.  timingSafetyMargin Value

   The activeRefreshOffset, clockskewDriftMargin, and retryDriftMargin
   parameters all deal with additional wait-periods that must be
   accounted for after analyzing the conditions under which the client
   will take longer than expected to make its last query while waiting
   for the addHoldDownTime period to pass. But these values may be
   merged into a single term by waiting the longest of any of them. We
   define timingSafetyMargin as this "worst case" value:

     timingSafetyMargin = MAX(activeRefreshOffset,
                              clockskewDriftMargin,
                              retryDriftMargin)

     timingSafetyMargin = MAX(addHoldDownTime % activeRefresh,
                              activeRefresh,
                              activeRefresh)

     timingSafetyMargin = activeRefresh

6.1.7.  retrySafetyMargin

   The retrySafetyMargin is an extra period of time to account for
   caching, network delays, dropped packets, and other operational
   concerns otherwise beyond the scope of this document. The value
   operators should choose is highly dependent on the deployment
   situation associated with their zone. Note that no value of a
   retrySafetyMargin can protect against resolvers that are "down".
   Nonetheless, we do offer the following as one method for considering
   reasonable values to select from.

   The following variables need to be considered when selecting
   an appropriate retrySafetyMargin value:

   successRate:  A likely success rate for client queries and retries

   numResolvers:  The number of client RFC5011 Resolvers

   Note that RFC5011 defines retryTime as:

      If the query fails, the resolver MUST repeat the query until
      satisfied no more often than once an hour and no less often
      than the lesser of 1 day, 10% of the original TTL, or 10% of
      the original expiration interval. That is,
      retryTime = MAX (1 hour, MIN (1 day, .1 * origTTL,
      .1 * expireInterval)).
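
   As a rough illustration, the retryTime definition quoted above can
   be sketched in Python. The function and parameter names below are
   hypothetical (they do not come from RFC5011), and durations are
   modeled with the standard datetime.timedelta type.

```python
from datetime import timedelta


def retry_time(orig_ttl: timedelta, expire_interval: timedelta) -> timedelta:
    """Sketch of RFC5011's retryTime.

    retryTime = MAX(1 hour, MIN(1 day, .1 * origTTL, .1 * expireInterval))

    The names retry_time, orig_ttl and expire_interval are illustrative
    assumptions, not identifiers defined by RFC5011.
    """
    return max(
        timedelta(hours=1),
        # Dividing by 10 keeps the arithmetic exact for timedelta values.
        min(timedelta(days=1), orig_ttl / 10, expire_interval / 10),
    )


if __name__ == "__main__":
    # Section 5.1 settings: TTL of 1 day, signature lifetime of 10 days.
    print(retry_time(timedelta(days=1), timedelta(days=10)))
```

   The one-hour floor applies whenever 10% of the TTL or of the
   expiration interval would otherwise be shorter than an hour.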

   With the successRate and numResolvers values selected and the
   definition of retryTime from RFC5011, one method for determining how
   many retryTime intervals to wait in order to reduce the set of
   resolvers that have not accepted the new trust anchor to 0 is thus:

     x = (1/(1 - successRate))

     retryCountWait = Log_base_x(numResolvers)

   To reduce the need for readers to pull out a scientific calculator,
   we offer the following lookup table based on successRate and
   numResolvers:

                       retryCountWait lookup table
                       ---------------------------

                  Number of client RFC5011 Resolvers (numResolvers)
                  -------------------------------------------------
                   10,000   100,000  1,000,000  10,000,000  100,000,000
           0.01       917      1146       1375        1604         1833
 Probability 0.05     180       225        270         315          360
 of Success  0.10      88       110        132         153          175
 Per Retry   0.15      57        71         86         100          114
 Interval    0.25      33        41         49          57           65
 (successRate) 0.50    14        17         20          24           27
           0.90         4         5          6           7            8
           0.95         4         4          5           6            7
           0.99         2         3          3           4            4
           0.999        2         2          2           3            3

   Finally, a suggested value of retrySafetyMargin can then be this
   retryCountWait number multiplied by the retryTime from RFC5011:

     retrySafetyMargin = retryCountWait * retryTime

6.2.  Timing Requirements For Adding a New KSK

   Given the defined parameters and analysis from Section 6.1, we can
   now create a method for calculating the amount of time to wait until
   it is safe to start signing exclusively with a new DNSKEY (especially
   useful for writing code involving sleep based timers) in
   Section 6.2.1, and define a method for calculating a wall-clock value
   after which it is safe to start signing exclusively with a new DNSKEY
   (especially useful for writing code based on clock-based event
   triggers) in Section 6.2.2.

6.2.1.  Wait Timer Based Calculation

   Given the attack description in Section 5, the correct minimum length
   of time required for the Zone Signer to wait after publishing K_new
   but before exclusively using it and newer keys is:

     addWaitTime = addHoldDownTime
                   + sigExpirationTimeRemaining
                   + activeRefresh
                   + timingSafetyMargin
                   + retrySafetyMargin

6.2.1.1.  Fully expanded equation

   Given the equation components defined in Section 6.1, the full
   expanded equation is:

     addWaitTime = addHoldDownTime
                   + sigExpirationTimeRemaining
                   + 2 * MAX(1 hour,
                             MIN(sigExpirationTime / 2,
                                 MAX(TTL of K_old DNSKEY RRSet) / 2,
                                 15 days)
                             )
                   + retrySafetyMargin

6.2.2.  Wall-Clock Based Calculation

   The equations in Section 6.2.1 are defined based upon how long to
   wait from a particular moment in time. An alternative, but
   equivalent, method is to calculate the date and time before which it
   is unsafe to use a key for signing. This calculation thus becomes:

     addWallClockTime = lastSigExpirationTime
                        + addHoldDownTime
                        + activeRefresh
                        + timingSafetyMargin
                        + retrySafetyMargin

   where lastSigExpirationTime is the latest value of any
   sigExpirationTime for which RRSIGs were created that could
   potentially be replayed. Fully expanded, this becomes:

     addWallClockTime = lastSigExpirationTime
                        + addHoldDownTime
                        + 2 * MAX(1 hour,
                                  MIN(sigExpirationTime / 2,
                                      MAX(TTL of K_old DNSKEY RRSet) / 2,
                                      15 days)
                                  )
                        + retrySafetyMargin

6.2.3.  Timing Constraint Summary

   The important timing constraint introduced by this memo relates to
   the last point at which an RFC5011 Resolver may have received a
   replayed original DNSKEY set, containing K_old and not K_new. The
   next query of the RFC5011 validator at which K_new will be seen
   without the potential for a replay attack will occur after the old
   DNSKEY RRSIG's Signature Expiration Time.
Thus, the latest time that 678 a RFC5011 Validator may begin their hold down timer is an "Active 679 Refresh" period after the last point that an attacker can replay the 680 K_old DNSKEY set. The worst case scenario of this attack is if the 681 attacker can replay K_old just seconds before the (DNSKEY RRSIG 682 Signature Validity) field of the last K_old only RRSIG. 684 6.2.4. Additional Considerations for RFC7583 686 Note: our notion of addWaitTime is called "Itrp" in Section 3.3.4.1 687 of [RFC7583]. The equation for Itrp in RFC7583 is insecure as it 688 does not include the sigExpirationTime listed above. The Itrp 689 equation in RFC7583 also does not include the 2*TTL safety margin, 690 though that is an operational consideration. 692 6.2.5. Example Scenario Calculations 694 For the parameters listed in Section 5.1, our resulting addWaitTime 695 is: 697 addWaitTime = 30 698 + 10 699 + 1 / 2 700 + 1 / 2 (days) 702 addWaitTime = 43 (days) 704 This addWaitTime of 42.5 days is 12.5 days longer than just the hold 705 down timer, even with the needed retrySafetyMargin value being left 706 out (which we exclude due to the lack of necessary operational 707 parameters). 709 6.3. Timing Requirements For Revoking an Old KSK 711 This issue affects not just the publication of new DNSKEYs intended 712 to be used as trust anchors, but also the length of time required to 713 continuously publish a DNSKEY with the revoke bit set. 715 Section 6.2.1 defines a method for calculating the amount of time 716 operators need to wait until it is safe to cease publishing a DNSKEY 717 (especially useful for writing code involving sleep based timers), 718 and Section 6.2.2 defines a method for calculating a minimal wall- 719 clock value after which it is safe to cease publishing a DNSKEY 720 (especially useful for writing code based on clock-based event 721 triggers). 723 6.3.1. 
Wait Timer Based Calculation

   Both of these publication timing requirements are affected by the
   attacks described in this document, but with revocation the key is
   revoked immediately and the addHoldDownTime timer does not apply.
   Thus the minimum amount of time that a SEP Publisher must wait before
   removing a revoked key from publication is:

     remWaitTime = sigExpirationTimeRemaining
                   + activeRefresh
                   + timingSafetyMargin
                   + retrySafetyMargin

   Expanding activeRefresh, this becomes:

     remWaitTime = sigExpirationTimeRemaining
                   + MAX(1 hour,
                         MIN((sigExpirationTime) / 2,
                             MAX(TTL of K_old DNSKEY RRSet) / 2,
                             15 days))
                   + timingSafetyMargin
                   + retrySafetyMargin

   Note also that adding retryTime intervals to the remWaitTime may be
   wise, just as it was for addWaitTime in Section 6.

6.3.2.  Wall-Clock Based Calculation

   Like before, the above equations are defined based upon how long to
   wait from a particular moment in time.  An alternative, but
   equivalent, method is to calculate the date and time before which it
   is unsafe to cease publishing a revoked key.  This calculation thus
   becomes:

     remWallClockTime = lastSigExpirationTime
                        + activeRefresh
                        + timingSafetyMargin
                        + retrySafetyMargin

   where lastSigExpirationTime is the latest value of any
   sigExpirationTime for which RRSIGs were created that could
   potentially be replayed.  Fully expanded, this becomes:

     remWallClockTime = lastSigExpirationTime
                        + MAX(1 hour,
                              MIN((sigExpirationTime) / 2,
                                  MAX(TTL of K_old DNSKEY RRSet) / 2,
                                  15 days))
                        + timingSafetyMargin
                        + retrySafetyMargin

6.3.3.  Additional Considerations for RFC7583

   Note that our notion of remWaitTime is called "Irev" in
   Section 3.3.4.2 of [RFC7583].  The equation for Irev in RFC7583 is
   insecure as it does not include the sigExpirationTime listed above.
   The Irev equation in RFC7583 also does not include a safety margin,
   though that is an operational consideration.
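   The wait-timer equations for addWaitTime and remWaitTime can be
   sketched in code as follows.  This is an illustrative sketch only,
   not a reference implementation.  The function and parameter names
   are hypothetical.  It assumes that timingSafetyMargin is set equal
   to activeRefresh, which is what the "2 * MAX(...)" term of the fully
   expanded addWaitTime equation implies, and it defaults
   retrySafetyMargin to zero because the worked examples in this
   document leave it out.  The input values are the root zone
   parameters quoted in Appendix A.

```python
from datetime import timedelta

HOUR = timedelta(hours=1)
DAY = timedelta(days=1)


def active_refresh(sig_expiration_time, max_dnskey_ttl):
    """MAX(1 hour, MIN(sigExpirationTime / 2, MAX(TTL) / 2, 15 days))."""
    return max(HOUR, min(sig_expiration_time / 2,
                         max_dnskey_ttl / 2,
                         15 * DAY))


def add_wait_time(add_hold_down_time, sig_expiration_time_remaining,
                  sig_expiration_time, max_dnskey_ttl,
                  retry_safety_margin=timedelta(0)):
    """Minimum wait after publishing K_new before relying on it alone."""
    refresh = active_refresh(sig_expiration_time, max_dnskey_ttl)
    # Assumption: timingSafetyMargin is taken to equal activeRefresh.
    timing_safety_margin = refresh
    return (add_hold_down_time + sig_expiration_time_remaining
            + refresh + timing_safety_margin + retry_safety_margin)


def rem_wait_time(sig_expiration_time_remaining, sig_expiration_time,
                  max_dnskey_ttl, retry_safety_margin=timedelta(0)):
    """Minimum wait before withdrawing a key published as revoked."""
    refresh = active_refresh(sig_expiration_time, max_dnskey_ttl)
    # Same assumption as above; addHoldDownTime does not apply here.
    timing_safety_margin = refresh
    return (sig_expiration_time_remaining
            + refresh + timing_safety_margin + retry_safety_margin)


# Appendix A values: addHoldDownTime 30 days, sigExpirationTime 21 days,
# DNSKEY TTL 2 days; the full 21 days of signature validity remain.
wait = add_wait_time(30 * DAY, 21 * DAY, 21 * DAY, 2 * DAY)
print(wait.days)  # 53
```

   Using timedelta values keeps the units explicit, so the "1 hour"
   floor and the "15 days" ceiling can be compared directly against
   TTLs and signature lifetimes.  A deployment would add a non-zero
   retry_safety_margin chosen for its own client population.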
6.3.4.  Example Scenario Calculations

   For the parameters listed in Section 5.1, our example remWaitTime
   is:

     remWaitTime = 10
                   + 1 / 2
                   + 1 / 2         (days)

     remWaitTime = 11              (days)

   As in Section 6.2.5, the retrySafetyMargin value is left out.

   Note that the values in this example produce a length shorter than
   the recommended 30 days in RFC5011's section 6.6, step 3.  Other
   values of sigExpirationTime and the original TTL of the K_old DNSKEY
   RRSet, however, can produce values longer than 30 days.

   Note that because revocation happens immediately, an attacker has a
   much harder job tricking an RFC5011 Resolver into leaving a trust
   anchor in place, as the attacker must successfully replay the old
   data for every query an RFC5011 Resolver sends, not just one.

7.  IANA Considerations

   This document contains no IANA considerations.

8.  Operational Considerations

   A companion document to RFC5011 was expected to be published that
   describes the best operational practice considerations from the
   perspective of a zone publisher and SEP Publisher.  However, this
   companion document has yet to be published.  The authors of this
   document hope that it will be at some point in the future, as
   RFC5011 timing can be tricky, as we have shown, and a BCP is clearly
   warranted.  This document is intended only to fill a single
   operational void which, when left misunderstood, can result in
   serious security ramifications.  This document does not attempt to
   document any other missing operational guidance for zone publishers.

9.  Security Considerations

   This document is solely about the security considerations with
   respect to the SEP Publisher's ability to advertise new DNSKEYs via
   the RFC5011 automated trust anchor update process.  Thus the entire
   document is a discussion of Security Considerations when adding or
   removing DNSKEYs from trust anchor storage using the RFC5011 process.
   For simplicity, this document assumes that the SEP Publisher will use
   a consistent RRSIG validity period.  SEP Publishers that vary the
   length of RRSIG validity periods will need to adjust the
   sigExpirationTime value accordingly so that the equations in
   Section 6 and Section 6.3 use a value that coincides with the time
   after which a replay of older RRSIGs will no longer succeed.

10.  Acknowledgements

   The authors would like to especially thank Michael StJohns for his
   help and advice, the care and thought he put into RFC5011 itself,
   and his continued reviews and suggestions for this document.  He also
   designed the math behind the suggested retrySafetyMargin values in
   Section 6.1.7.

   We would also like to thank Bob Harold, Shane Kerr, Matthijs Mekking,
   Duane Wessels, Petr Spacek, Ed Lewis, Viktor Dukhovni, and the dnsop
   working group who have assisted with this document.

11.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC4033]  Arends, R., Austein, R., Larson, M., Massey, D., and S.
              Rose, "DNS Security Introduction and Requirements",
              RFC 4033, DOI 10.17487/RFC4033, March 2005.

   [RFC5011]  StJohns, M., "Automated Updates of DNS Security (DNSSEC)
              Trust Anchors", STD 74, RFC 5011, DOI 10.17487/RFC5011,
              September 2007.

   [RFC7583]  Morris, S., Ihren, J., Dickinson, J., and W. Mekking,
              "DNSSEC Key Rollover Timing Considerations", RFC 7583,
              DOI 10.17487/RFC7583, October 2015.

   [RFC7719]  Hoffman, P., Sullivan, A., and K. Fujiwara, "DNS
              Terminology", RFC 7719, DOI 10.17487/RFC7719, December
              2015.

Appendix A.  Real World Example: The 2017 Root KSK Key Roll

   In 2017 and 2018, ICANN expects to (or has, depending on when you're
   reading this) roll the key signing key (KSK) for the root zone.
   The relevant parameters associated with the root zone at the time of
   this writing are as follows:

        addHoldDownTime:              30 days
        Old DNSKEY sigExpirationTime: 21 days
        Old DNSKEY TTL:                2 days

   Thus, sticking this information into the equation in Section 6
   yields (in days from publication time):

     addWaitTime = 30
                   + 21
                   + MAX(1 hour,              # activeRefresh
                         MIN(21 / 2,
                             MAX(2) / 2,
                             15 days)
                         )
                   + activeRefresh            # used as timingSafetyMargin

     addWaitTime = 30 + 21 + 1 + 1

     addWaitTime = 53 days

   Also note that we exclude the retrySafetyMargin value, which is
   calculated based on the expected client deployment size.

   Thus, ICANN must wait a minimum of 53 days before switching to the
   newly published KSK (and 26 days before removing the old revoked key
   once it is published as revoked).  ICANN's current plans involve
   waiting over 3 months before using the new key and 69 days before
   removing the old, revoked key.  Thus, their current rollover plans
   are sufficiently secure from the attack discussed in this memo.

Authors' Addresses

   Wes Hardaker
   USC/ISI
   P.O. Box 382
   Davis, CA  95617
   US

   Email: ietf@hardakers.net

   Warren Kumari
   Google
   1600 Amphitheatre Parkway
   Mountain View, CA  94043
   US

   Email: warren@kumari.net