idnits 2.17.1 

draft-ietf-dnsop-rfc5011-security-considerations-10.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  -- The draft header indicates that this document updates RFC7583, but the
     abstract doesn't seem to mention this, which it should.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (December 19, 2017) is 2313 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  ** Downref: Normative reference to an Informational RFC: RFC 7583

  ** Obsolete normative reference: RFC 7719 (Obsoleted by RFC 8499)


     Summary: 2 errors (**), 0 flaws (~~), 1 warning (==), 2 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	dnsop                                                        W. Hardaker
3	Internet-Draft                                                   USC/ISI
4	Updates: 7583 (if approved)                                    W. Kumari
5	Intended status: Standards Track                                  Google
6	Expires: June 22, 2018                                 December 19, 2017

8	             Security Considerations for RFC5011 Publishers
9	          draft-ietf-dnsop-rfc5011-security-considerations-10

11	Abstract

13	   This document extends the RFC5011 rollover strategy with timing
14	   advice that must be followed by the publisher in order to maintain
15	   security.  Specifically, this document describes the math behind the
16	   minimum time-length that a DNS zone publisher must wait before
17	   signing exclusively with recently added DNSKEYs.  This document also
18	   describes the minimum time-length that a DNS zone publisher must wait
19	   after publishing a revoked DNSKEY before assuming that all active
20	   RFC5011 resolvers should have seen the revocation-marked key and
21	   removed it from their list of trust anchors.

23	   This document contains much math and complicated equations, but the
24	   summary is that the key rollover / revocation time is much longer
25	   than intuition would suggest.  If you are not both publishing a
26	   DNSSEC DNSKEY, and using RFC5011 to advertise this DNSKEY as a new
27	   Secure Entry Point key for use as a trust anchor, you probably don't
28	   need to read this document.

30	Status of This Memo

32	   This Internet-Draft is submitted in full conformance with the
33	   provisions of BCP 78 and BCP 79.

35	   Internet-Drafts are working documents of the Internet Engineering
36	   Task Force (IETF).  Note that other groups may also distribute
37	   working documents as Internet-Drafts.  The list of current Internet-
38	   Drafts is at http://datatracker.ietf.org/drafts/current/.

40	   Internet-Drafts are draft documents valid for a maximum of six months
41	   and may be updated, replaced, or obsoleted by other documents at any
42	   time.  It is inappropriate to use Internet-Drafts as reference
43	   material or to cite them other than as "work in progress."

45	   This Internet-Draft will expire on June 22, 2018.

47	Copyright Notice

49	   Copyright (c) 2017 IETF Trust and the persons identified as the
50	   document authors.  All rights reserved.

52	   This document is subject to BCP 78 and the IETF Trust's Legal
53	   Provisions Relating to IETF Documents
54	   (http://trustee.ietf.org/license-info) in effect on the date of
55	   publication of this document.  Please review these documents
56	   carefully, as they describe your rights and restrictions with respect
57	   to this document.  Code Components extracted from this document must
58	   include Simplified BSD License text as described in Section 4.e of
59	   the Trust Legal Provisions and are provided without warranty as
60	   described in the Simplified BSD License.

62	Table of Contents

64	   1.  Introduction
65	     1.1.  Document History and Motivation
66	     1.2.  Safely Rolling the Root Zone's KSK in 2017/2018
67	     1.3.  Requirements notation
68	   2.  Background
69	   3.  Terminology
70	   4.  Timing Associated with RFC5011 Processing
71	     4.1.  Timing Associated with Publication
72	     4.2.  Timing Associated with Revocation
73	   5.  Denial of Service Attack Walkthrough
74	     5.1.  Enumerated Attack Example
75	       5.1.1.  Attack Timing Breakdown
76	   6.  Minimum RFC5011 Timing Requirements
77	     6.1.  Equation Components
78	       6.1.1.  addHoldDownTime
79	       6.1.2.  lastSigExpirationTime
80	       6.1.3.  sigExpirationTime
81	       6.1.4.  sigExpirationTimeRemaining
82	       6.1.5.  activeRefresh
83	       6.1.6.  activeRefreshOffset
84	       6.1.7.  driftSafetyMargin
85	       6.1.8.  timingSafetyMargin
86	       6.1.9.  retrySafetyMargin
87	     6.2.  Timing Requirements For Adding a New KSK
88	       6.2.1.  Wait Timer Based Calculation
89	       6.2.2.  Wall-Clock Based Calculation
90	       6.2.3.  Timing Constraint Summary
91	       6.2.4.  Additional Considerations for RFC7583
92	       6.2.5.  Example Scenario Calculations
93	     6.3.  Timing Requirements For Revoking an Old KSK
94	       6.3.1.  Wait Timer Based Calculation
95	       6.3.2.  Wall-Clock Based Calculation
96	       6.3.3.  Additional Considerations for RFC7583
97	       6.3.4.  Example Scenario Calculations
98	   7.  IANA Considerations
99	   8.  Operational Considerations
100	   9.  Security Considerations
101	   10. Acknowledgements
102	   11. Normative References
103	   Appendix A.  Real World Example: The 2017 Root KSK Key Roll
104	   Authors' Addresses

106	1.  Introduction

108	   [RFC5011] defines a mechanism by which DNSSEC validators can update
109	   their list of trust anchors when they've seen a new key published in
110	   a zone or revoke a properly marked key from a trust anchor list.
111	   However, RFC5011 [intentionally] provides no guidance to the
112	   publishers of DNSKEYs about how long they must wait before switching
113	   to exclusively using recently published keys for signing records, or
114	   how long they must wait before ceasing publication of a revoked key.
115	   Because of this lack of guidance, zone publishers may derive
116	   incorrect assumptions about safe usage of the RFC5011 DNSKEY
117	   advertising, rolling and revocation process.  This document describes
118	   the minimum security requirements from a publisher's point of view
119	   and is intended to complement the guidance offered in RFC5011 (which
120	   is written to provide timing guidance solely from a Validating
121	   Resolver's point of view).

123	1.1.  Document History and Motivation

125	   To verify that this lack of understanding is widespread, the authors
126	   reached out to 5 DNSSEC experts to ask them how long they thought
127	   they must wait before signing a zone exclusively with a new KSK
128	   [RFC4033] that was being introduced according to the RFC5011 process.
129	   All 5 experts answered with an insecure value, and we determined that
130	   this lack of mathematical understanding might cause security concerns
131	   in deployment.  We hope that this companion document to RFC5011 will
132	   rectify this misunderstanding and provide better guidance to zone
133	   publishers that wish to make use of the RFC5011 rollover process.

135	1.2.  Safely Rolling the Root Zone's KSK in 2017/2018

137	   One important note about ICANN's (currently in progress) 2017/2018 KSK
138	   rollover plan for the root zone: the timing values chosen for rolling
139	   the KSK in the root zone appear completely safe, and are not affected
140	   by the timing concerns introduced by this draft.

142	1.3.  Requirements notation

144	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
145	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
146	   document are to be interpreted as described in [RFC2119].

148	2.  Background

150	   RFC5011 describes a process by which an RFC5011 Resolver may accept a
151	   newly published KSK as a trust anchor for validating future DNSSEC
152	   signed records.  It also describes the process for publicly revoking
153	   a published KSK.  This document augments that information with
154	   additional constraints, from the SEP Publisher's point of view.
155	   Note that this document does not define any other operational
156	   guidance or recommendations about the RFC5011 process and restricts
157	   itself solely to the security and operational ramifications of
158	   switching to exclusively using recently added keys or removing
159	   revoked keys too soon.

161	   Failure of a DNSKEY publisher to follow the minimum recommendations
162	   associated with this draft can result in potential denial-of-service
163	   attack opportunities against validating resolvers.  Failure of a
164	   DNSKEY publisher to publish a revoked key for a long enough period of
165	   time may result in RFC5011 Resolvers leaving that key in their trust
166	   anchor storage beyond the key's expected lifetime.

168	3.  Terminology

170	   SEP Publisher  The entity responsible for publishing a DNSKEY (with
171	      the Secure Entry Point (SEP) bit set) that can be used as a trust
172	      anchor.

174	   Zone Signer  The owner of a zone intending to publish a new Key-
175	      Signing-Key (KSK) that may become a trust anchor for validators
176	      following the RFC5011 process.

178	   RFC5011 Resolver  A DNSSEC Resolver that is using the RFC5011
179	      processes to track and update trust anchors.

181	   Attacker  An entity intent on foiling the RFC5011 Resolver's ability
182	      to successfully adopt the Zone Signer's new DNSKEY as a new trust
183	      anchor or to prevent the RFC5011 Resolver from removing an old
184	      DNSKEY from its list of trust anchors.

186	   sigExpirationTime  The amount of time between the DNSKEY RRSIG's
187	      Signature Inception field and the Signature Expiration field.

189	   Also see Section 2 of [RFC4033] and [RFC7719] for additional
190	   terminology.

192	4.  Timing Associated with RFC5011 Processing

194	   These sections define a high-level overview of [RFC5011] processing.
195	   These steps are not sufficient for proper RFC5011 implementation, but
196	   provide enough background for the reader to follow the discussion in
197	   this document.  Readers need to fully understand [RFC5011] as well to
198	   fully comprehend the content and importance of this document.

200	4.1.  Timing Associated with Publication

202	   RFC5011's process of safely publishing a new DNSKEY and then assuming
203	   RFC5011 Resolvers have adopted it for trust falls into a number of
204	   high-level steps to be performed by the SEP Publisher.  This document
205	   discusses the following scenario, which is the principal way RFC5011 is
206	   currently being used (even though Section 6 of RFC5011 suggests
207	   having a stand-by key available):

209	   1.  Publish a new DNSKEY in a zone, but continue to sign the zone
210	       with the old one.

212	   2.  Wait a period of time.

214	   3.  Begin to exclusively use recently published DNSKEYs to sign the
215	       appropriate resource records.

217	   This document discusses the time required to wait during step 2 of
218	   the above process.  Some interpretations of RFC5011 have erroneously
219	   determined that the wait time is equal to RFC5011's "hold down time".
220	   Section 5 describes an attack based on this (common) erroneous
221	   belief, which can result in a denial of service attack against the
222	   zone.

224	4.2.  Timing Associated with Revocation

226	   RFC5011's process of advertising that an old key is to be revoked
227	   from RFC5011 Resolvers falls into a number of high-level steps:

229	   1.  Set the revoke bit on the DNSKEY to be revoked.

231	   2.  Sign the revoked DNSKEY with itself.

233	   3.  Wait a period of time.

235	   4.  Remove the revoked key from the zone.

237	   This document discusses the time required to wait in step 3 of the
238	   above process.  Some interpretations of RFC5011 have erroneously
239	   determined that the wait time is equal to RFC5011's "hold down time".
240	   This document describes an attack based on this (common) erroneous
241	   belief, which results in a revoked DNSKEY potentially remaining as a
242	   trust anchor in an RFC5011 Resolver long past its expected usage.

244	5.  Denial of Service Attack Walkthrough

246	   This section serves as an illustrative example of the problem being
247	   discussed in this document.  Note that in order to keep the example
248	   simple enough to understand, some simplifications were made (such as
249	   by not creating a set of pre-signed RRSIGs and by not using values
250	   that result in the addHoldDownTime not being evenly divisible by the
251	   activeRefresh value); the mathematical formulas in Section 6 are,
252	   however, complete.

254	   If an attacker is able to provide an RFC5011 Resolver with past
255	   responses, such as when it is in-path or able to perform any number
256	   of cache poisoning attacks, the attacker may be able to leave
257	   compliant RFC5011 Resolvers without an appropriate DNSKEY trust
258	   anchor.  This scenario will remain until an administrator manually
259	   fixes the situation.

261	   The time-line below illustrates an example of this situation.

263	5.1.  Enumerated Attack Example

265	   The following example settings are used in the example scenario
266	   within this section:

268	   TTL (all records)  1 day

270	   sigExpirationTime  10 days

272	   Zone resigned every  1 day

274	   Given these settings, the sequence of events in Section 5.1.1 depicts
275	   how a SEP Publisher that waits for only the RFC5011 hold-down timer
276	   length of 30 days subjects its users to a potential Denial of Service
277	   attack.  The timing schedule listed below is based on a SEP Publisher
278	   publishing a new Key Signing Key (KSK), with the intent that it will
279	   later be used as a trust anchor.  We label this publication time as
280	   "T+0".  All numbers in this sequence refer to days before and after
281	   this initial publication event.  Thus, T-1 is the day before the
282	   introduction of the new key, and T+15 is the 15th day after the key
283	   was introduced into the fictitious zone being discussed.

285	   In this dialog, we consider two keys within the example zone:

287	   K_old:  An older KSK and Trust Anchor being replaced.

289	   K_new:  A new KSK being transitioned into active use and expected to
290	      become a Trust Anchor via the RFC5011 automated trust anchor
291	      update process.

293	5.1.1.  Attack Timing Breakdown

295	   The steps below show an attack that foils the adoption of a new
296	   DNSKEY by an RFC5011 Resolver when the SEP Publisher starts signing
297	   and publishing with the new DNSKEY too quickly.

299	   T-1  The K_old based RRSIGs are being published by the Zone Signer.
300	      [It may be signing ZSKs as well, but they are not relevant to
301	      this event so we will not talk further about them; we are only
302	      considering the RRSIGs that cover the DNSKEYs in this document.]
303	      The Attacker queries for, retrieves and caches this DNSKEY set and
304	      corresponding RRSIG signatures.

306	   T+0  The Zone Signer adds K_new to their zone and signs the zone's
307	      key set with K_old.  The RFC5011 Resolver (later to be under
308	      attack) retrieves this new key set and corresponding RRSIGs and
309	      notices the publication of K_new.  The RFC5011 Resolver starts the
310	      (30-day) hold-down timer for K_new.  [Note that in a more real-
311	      world scenario there will likely be a further delay between the
312	      point where the Zone Signer publishes a new RRSIG and the RFC5011
313	      Resolver notices its publication; though not shown in this
314	      example, this delay is accounted for in the equation in Section 6
315	      below]

317	   T+5  The RFC5011 Resolver queries for the zone's keyset per the
318	      RFC5011 Active Refresh schedule, discussed in Section 2.3 of
319	      RFC5011.  Instead of receiving the intended published keyset, the
320	      Attacker successfully replays the keyset and associated signatures
321	      recorded at T-1 to the victim RFC5011 Resolver.  Because the
322	      signature lifetime is 10 days (in this example), the replayed
323	      signature and keyset are accepted as valid (being only 6 days old,
324	      which is less than sigExpirationTime) and the RFC5011 Resolver
325	      cancels the (30-day) hold-down timer for K_new, per the RFC5011
326	      algorithm.

328	   T+10  The RFC5011 Resolver queries for the zone's keyset and
329	      discovers a signed keyset that includes K_new (again), and is
330	      signed by K_old.  Note: the attacker is unable to replay the
331	      records cached at T-1, because the signatures have now expired.

333	      Thus at T+10, the RFC5011 Resolver starts (anew) the hold-down
334	      timer for K_new.

336	   T+11 through T+29  The RFC5011 Resolver continues checking the zone's
337	      key set at the prescribed regular intervals.  During this period,
338	      the attacker can no longer replay traffic to their benefit.

340	   T+30  The Zone Signer knows that this is the first time at which some
341	      validators might accept K_new as a new trust anchor, since the
342	      hold-down timer of an RFC5011 Resolver not under attack that had
343	      queried and retrieved K_new at T+0 would now have reached 30 days.
344	      However, the hold-down timer of our attacked RFC5011 Resolver is
345	      only at 20 days.

347	   T+35  The Zone Signer (mistakenly) believes that all validators
348	      following the Active Refresh schedule (Section 2.3 of RFC5011)
349	      should have accepted K_new as the new trust anchor (since the
350	      hold down time (30 days) + the query interval [which is just 1/2
351	      the signature validity period in this example] would have passed).
352	      However, the hold-down timer of our attacked RFC5011 Resolver is
353	      only at 25 days (T+35 minus T+10); thus the RFC5011 Resolver won't
354	      consider it a valid trust anchor addition yet, as the required 30
355	      days have not yet elapsed.

357	   T+36  The Zone Signer, believing K_new is safe to use, switches their
358	      active signing KSK to K_new and publishes a new RRSIG, signed with
359	      (only) K_new, covering the DNSKEY set.  Non-attacked RFC5011
360	      validators, with a hold-down timer of at least 30 days, would have
361	      accepted K_new into their set of trusted keys.  But, because our
362	      attacked RFC5011 Resolver now has a hold-down timer for K_new of
363	      only 26 days, it failed to ever accept K_new as a trust anchor.
364	      Since K_old is no longer being used to sign the zone's DNSKEYs,
365	      all the DNSKEY records from the zone will be treated as invalid.
366	      Subsequently, all of the records in the DNS tree below the zone's
367	      apex will be deemed invalid by DNSSEC.
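
   The timeline above can be replayed with a small simulation.  The
   sketch below is illustrative only: the function name, its parameters
   and the simplified resolver model (whole days, one query every 5
   days, the T+0 query answered genuinely, and a replay accepted
   whenever the RRSIG captured at T-1 is still valid) are assumptions
   chosen to mirror the walkthrough; they are not part of RFC5011.

```python
from typing import Optional


def acceptance_day(hold_down: int = 30,
                   query_interval: int = 5,
                   sig_validity: int = 10,
                   capture_day: int = -1,
                   attacked: bool = True,
                   horizon: int = 120) -> Optional[int]:
    """Return the day (relative to T+0) on which K_new becomes a trust anchor.

    Hypothetical, simplified model of the walkthrough: all times are whole
    days and the resolver queries exactly every query_interval days.
    """
    # Day on which the RRSIG captured at capture_day expires.
    replay_expires = capture_day + sig_validity
    timer_start: Optional[int] = None

    for day in range(0, horizon + 1, query_interval):
        if attacked and 0 < day < replay_expires:
            # Replayed K_old-only keyset: K_new appears to be gone, so the
            # resolver cancels its pending hold-down timer.
            timer_start = None
            continue
        if timer_start is None:
            # Genuine keyset containing K_new: (re)start the hold-down timer.
            timer_start = day
        if day - timer_start >= hold_down:
            return day
    return None  # K_new was never accepted within the simulated horizon
```

   With the default values, a resolver that is not attacked accepts
   K_new on day 30, while the attacked resolver does so only on day 40,
   which is after the publisher's switch to K_new at T+36.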

369	6.  Minimum RFC5011 Timing Requirements

371	   This section defines the minimum timing requirements for making
372	   exclusive use of newly added DNSKEYs and timing requirements for
373	   ceasing the publication of DNSKEYs to be revoked.  We break our
374	   timing solution requirements into two primary components: the
375	   mathematically-based security analysis of the RFC5011 publication
376	   process itself, and an extension of this that takes operational
377	   realities into account that further affect the recommended timings.

379	   First, we define the term components used in all equations in
380	   Section 6.1.

382	6.1.  Equation Components

384	6.1.1.  addHoldDownTime

386	   The addHoldDownTime is defined in Section 2.4.1 of [RFC5011] as:

388	       The add hold-down time is 30 days or the expiration time of the
389	       original TTL of the first trust point DNSKEY RRSet that contained
390	       the new key, whichever is greater.  This ensures that at least
391	       two validated DNSKEY RRSets that contain the new key MUST be seen
392	       by the resolver prior to the key's acceptance.
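
   As a minimal sketch of the quoted rule, the function below returns
   the greater of 30 days and the original TTL.  The function name and
   the use of Python's timedelta type are assumptions made for
   illustration; treating "the expiration time of the original TTL" as
   the TTL duration itself is also a simplifying assumption.

```python
from datetime import timedelta


def add_hold_down_time(original_ttl: timedelta) -> timedelta:
    """Greater of 30 days and the original TTL of the DNSKEY RRSet."""
    return max(timedelta(days=30), original_ttl)
```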

394	6.1.2.  lastSigExpirationTime

396	   The latest value (i.e., the furthest future date and time) of any RRSIG
397	   Signature Expiration field covering any DNSKEY RRSet containing only
398	   the old trust anchor(s) that are being superseded.  Note that for
399	   organizations pre-creating signatures this time may be fairly far in
400	   the future unless they can be significantly assured that none of
401	   their pre-generated signatures can be replayed at a later date.

403	6.1.3.  sigExpirationTime

405	   The amount of time between the DNSKEY RRSIG's Signature Inception
406	   field and the Signature Expiration field.

408	6.1.4.  sigExpirationTimeRemaining

410	   The portion of sigExpirationTime (Section 3) that has not yet elapsed.

412	6.1.5.  activeRefresh

414	   The activeRefresh time is defined in Section 2.3 of RFC5011 as:

416	     A resolver that has been configured for an automatic update
417	     of keys from a particular trust point MUST query that trust
418	     point (e.g., do a lookup for the DNSKEY RRSet and related
419	     RRSIG records) no less often than the lesser of 15 days, half
420	     the original TTL for the DNSKEY RRSet, or half the RRSIG
421	     expiration interval and no more often than once per hour.

423	   This translates to:

425	    activeRefresh = MAX(1 hour,
426	                        MIN(sigExpirationTime / 2,
427	                            MAX(TTL of K_old DNSKEY RRSet) / 2,
428	                            15 days)
429	                        )
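
   The MAX/MIN expression above translates directly into code.  The
   following sketch assumes durations are represented as Python
   timedelta values; the function and parameter names are illustrative
   and do not come from RFC5011.

```python
from datetime import timedelta


def active_refresh(sig_expiration_time: timedelta,
                   max_dnskey_ttl: timedelta) -> timedelta:
    """activeRefresh = MAX(1 hour, MIN(sigExp / 2, TTL / 2, 15 days))."""
    return max(
        timedelta(hours=1),
        min(sig_expiration_time / 2, max_dnskey_ttl / 2, timedelta(days=15)),
    )
```

   For the example settings in Section 5.1 (sigExpirationTime of 10
   days, TTL of 1 day), this evaluates to 12 hours, because half the
   TTL is the smallest of the three MIN arguments.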

431	6.1.6.  activeRefreshOffset

433	   The activeRefreshOffset term must be added for situations where the
434	   activeRefresh value is not a factor of the addHoldDownTime.
435	   Specifically, activeRefreshOffset will be "addHoldDownTime %
436	   activeRefresh", where % is the mathematical mod operator (calculating
437	   the remainder in a division problem).  This will frequently be zero,
438	   but could be nearly as large as activeRefresh itself.
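
   A short sketch of the remainder calculation, assuming timedelta
   durations (the function name is illustrative):

```python
from datetime import timedelta


def active_refresh_offset(add_hold_down_time: timedelta,
                          active_refresh: timedelta) -> timedelta:
    """Remainder left when addHoldDownTime is divided by activeRefresh."""
    return add_hold_down_time % active_refresh
```

   A 30-day addHoldDownTime with a 12-hour activeRefresh leaves no
   remainder, whereas a 7-day activeRefresh leaves an offset of 2 days.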

440	   Note that later (in Section 6.1.8), real-world scenarios will trump
441	   this value, which is useful only in theoretical worlds with no
442	   network delays or other operational considerations.  We leave it
443	   here only as an important marker in the security analysis of the base
444	   RFC5011 protocol.

446	6.1.7.  driftSafetyMargin

448	   Moving past the theoretical model parameters above, we note that
449	   clock drift, network delays and implementation differences will cause
450	   the RFC5011 Resolver query times to drift over time.  Because of
451	   this, a driftSafetyMargin term must be introduced that accounts for
452	   these real world delays.  We set this value to be the same as the
453	   activeRefresh value, which will ensure that any timing drift in
454	   RFC5011 Resolver queries will be accounted for.

456	   Note: even a negative clock drift can actually cause RFC5011
457	   Resolvers to require up to an extra activeRefresh period before they
458	   will accept a new DNSKEY as a trust anchor.

460	6.1.8.  timingSafetyMargin

462	   Both of the activeRefreshOffset and driftSafetyMargin parameters deal
463	   with timing delays introduced by mathematical analysis of RFC5011
464	   (activeRefreshOffset) and by real world considerations
465	   (driftSafetyMargin).  To find a safe value to extend timing, we
466	   define a timingSafetyMargin that is the maximum of these two values.
467	   Since the driftSafetyMargin is set to activeRefresh, and
468	   activeRefreshOffset is always less than activeRefresh, the final
469	   timingSafetyMargin value will be activeRefresh.

471	   Explicitly expanding out the math:

473	        timingSafetyMargin = MAX(activeRefreshOffset, driftSafetyMargin)

475	        timingSafetyMargin = MAX(addHoldDownTime % activeRefresh,
476	                                 activeRefresh)

478	        timingSafetyMargin = activeRefresh
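
   The reduction above can be checked numerically.  This sketch follows
   the prose definition (the maximum of activeRefreshOffset and
   driftSafetyMargin); the function name and timedelta representation
   are assumptions for illustration.

```python
from datetime import timedelta


def timing_safety_margin(add_hold_down_time: timedelta,
                         active_refresh: timedelta) -> timedelta:
    """Maximum of activeRefreshOffset and driftSafetyMargin."""
    offset = add_hold_down_time % active_refresh   # activeRefreshOffset
    drift_safety_margin = active_refresh           # set equal to activeRefresh
    return max(offset, drift_safety_margin)
```

   Because the remainder is always strictly smaller than activeRefresh,
   the result equals activeRefresh for any inputs.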

480	6.1.9.  retrySafetyMargin

482	   The retrySafetyMargin is an extra period of time to account for
483	   caching, network delays, dropped packets, and other operational
484	   concerns otherwise beyond the scope of this document.  The value
485	   operators should choose is highly dependent on the deployment
486	   situation associated with their zone.  Note that no value of a
487	   retrySafetyMargin can protect against resolvers that are "down".
488	   Nonetheless, we offer the following as one method for selecting a
489	   reasonable value.

491	   The following variables need to be considered when selecting
492	   an appropriate retrySafetyMargin value:

494	   successRate:  A likely success rate for client queries and retries

496	   numResolvers:  The number of client RFC5011 Resolvers

498	   Note that RFC5011 defines retryTime as:

500	         If the query fails, the resolver MUST repeat the query until
501	         satisfied no more often than once an hour and no less often
502	         than the lesser of 1 day, 10% of the original TTL, or 10% of
503	         the original expiration interval.  That is,
504	         retryTime = MAX (1 hour, MIN (1 day, .1 * origTTL,
505	                                       .1 * expireInterval)).
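
   The quoted retryTime formula can be sketched as follows, again
   assuming timedelta durations and illustrative names.  Dividing by 10
   stands in for the ".1 *" factor to avoid floating-point rounding.

```python
from datetime import timedelta


def retry_time(orig_ttl: timedelta, expire_interval: timedelta) -> timedelta:
    """retryTime = MAX(1 hour, MIN(1 day, 10% of TTL, 10% of expiration))."""
    return max(
        timedelta(hours=1),
        min(timedelta(days=1), orig_ttl / 10, expire_interval / 10),
    )
```

   For the Section 5.1 settings (TTL of 1 day, expiration interval of
   10 days), this gives 2 hours and 24 minutes.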

507	   With successRate and numResolvers selected, and using the definition
508	   of retryTime from RFC5011, one way to determine how many retryTime
509	   intervals to wait until the expected number of resolvers that have
510	   not yet succeeded falls to at most one (with independent retries) is:

512	                         x = (1/(1 - successRate))

514	            retryCountWait = Log_base_x(numResolvers)

516	   To reduce the need for readers to pull out a scientific calculator,
517	   we offer the following lookup table based on successRate and
518	   numResolvers:

520	                          retryCountWait lookup table
521	                        ---------------------------

523	                       Number of client RFC5011 Resolvers (numResolvers)
524	                       -------------------------------------------------
525	                        10,000  100,000 1,000,000 10,000,000 100,000,000
526	                 0.01      917     1146      1375       1604        1833
527	   Probability   0.05      180      225       270        315         360
528	   of Success    0.10       88      110       132        153         175
529	   Per Retry     0.15       57       71        86        100         114
530	   Interval      0.25       33       41        49         57          65
531	   (successRate) 0.50       14       17        20         24          27
532	                 0.90        4        5         6          7           8
533	                 0.95        4        4         5          6           7
534	                 0.99        2        3         3          4           4
535	                 0.999       2        2         2          3           3

537	   Finally, a suggested value of retrySafetyMargin can then be this
538	   retryCountWait number multiplied by the retryTime from RFC5011:

540	                 retrySafetyMargin = retryCountWait * retryTime
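
   For values not covered by the lookup table, the calculation can be
   scripted.  The sketch below assumes that table entries are rounded up
   to the next whole retry (which matches the values shown); the
   function names are illustrative, and the small rounding step is an
   added guard against floating-point noise at exact powers.

```python
import math
from datetime import timedelta


def retry_count_wait(success_rate: float, num_resolvers: int) -> int:
    """Log base x of numResolvers, with x = 1 / (1 - successRate), rounded up."""
    if not 0.0 < success_rate < 1.0:
        raise ValueError("success_rate must be strictly between 0 and 1")
    x = 1.0 / (1.0 - success_rate)
    exact = math.log(num_resolvers) / math.log(x)
    # Round away tiny floating-point error before taking the ceiling.
    return math.ceil(round(exact, 9))


def retry_safety_margin(success_rate: float, num_resolvers: int,
                        retry_time: timedelta) -> timedelta:
    """retrySafetyMargin = retryCountWait * retryTime."""
    return retry_count_wait(success_rate, num_resolvers) * retry_time
```

   For instance, a successRate of 0.50 with 1,000,000 resolvers yields
   20 retry intervals, matching the table.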

542	6.2.  Timing Requirements For Adding a New KSK

544	   Section 6.2.1 defines a method for calculating the amount of time to
545	   wait until it is safe to start signing exclusively with a new DNSKEY
546	   (especially useful for writing code involving sleep based timers),
547	   and Section 6.2.2 defines a method for calculating a wall-clock value
548	   after which it is safe to start signing exclusively with a new DNSKEY
549	   (especially useful for writing code based on clock-based event
550	   triggers).

552	6.2.1.  Wait Timer Based Calculation

554	   Given the attack description in Section 5, the correct minimum length
555	   of time required for the Zone Signer to wait after publishing K_new
556	   but before exclusively using it and newer keys is:

558	      addWaitTime = addHoldDownTime
559	                    + sigExpirationTimeRemaining
560	                    + activeRefresh
561	                    + timingSafetyMargin
562	                    + retrySafetyMargin
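
   A sleep-timer style calculation might look like the following
   sketch.  It assumes timedelta durations, uses illustrative names, and
   substitutes activeRefresh for timingSafetyMargin as derived in
   Section 6.1.8.  The retrySafetyMargin default of zero is only a
   placeholder; operators need to choose their own value.

```python
from datetime import timedelta


def add_wait_time(add_hold_down_time: timedelta,
                  sig_expiration_time_remaining: timedelta,
                  active_refresh: timedelta,
                  retry_safety_margin: timedelta = timedelta(0)) -> timedelta:
    """Minimum wait after publishing K_new before signing only with it."""
    timing_safety_margin = active_refresh  # per Section 6.1.8
    return (add_hold_down_time
            + sig_expiration_time_remaining
            + active_refresh
            + timing_safety_margin
            + retry_safety_margin)


# Section 5.1 settings: 30-day hold-down, 10-day signatures, 12-hour refresh.
example_wait = add_wait_time(timedelta(days=30), timedelta(days=10),
                             timedelta(hours=12))
```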

564	6.2.1.1.  Fully expanded equation

566	   Given the equation components defined in Section 6.1, the full
567	   expanded equation is:

569	      addWaitTime = addHoldDownTime
570	                    + sigExpirationTimeRemaining
571	                    + 2 * MAX(1 hour,
572	                              MIN(sigExpirationTime / 2,
573	                                  MAX(TTL of K_old DNSKEY RRSet) / 2,
574	                                  15 days)
575	                              )
576	                    + retrySafetyMargin

579	6.2.2.  Wall-Clock Based Calculation

581	   The equations in Section 6.2.1 are defined based upon how long to
582	   wait from a particular moment in time.  An alternative, but
583	   equivalent, method is to calculate the date and time before which it
584	   is unsafe to use a key for signing.  This calculation thus becomes:

586	      addWallClockTime = lastSigExpirationTime
587	                       + addHoldDownTime
588	                       + activeRefresh
589	                       + timingSafetyMargin
590	                       + retrySafetyMargin

592	   where lastSigExpirationTime is the latest Signature Expiration value
593	   of any RRSIG that was created and could potentially be replayed
594	   (Section 6.1.2).  Fully expanded, this becomes:

596	      addWallClockTime = lastSigExpirationTime
597	                       + addHoldDownTime
598	                       + MAX(1 hour,
599	                             MIN(sigExpirationTime / 2,
600	                                 MAX(TTL of K_old DNSKEY RRSet) / 2,
601	                                 15 days)
602	                             )
603	                       + activeRefresh
604	                       + retrySafetyMargin
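
	   For clock-based event triggers, the wall-clock form can be sketched
	   as follows.  This is an illustrative sketch only: the function
	   names and the example timestamp are hypothetical, activeRefresh is
	   passed in precomputed, and timingSafetyMargin is assumed to equal
	   activeRefresh.

```python
from datetime import datetime, timedelta, timezone


def add_wall_clock_time(last_sig_expiration_time, add_hold_down_time,
                        active_refresh, retry_safety_margin=timedelta(0)):
    """Earliest moment at which signing only with K_new is safe."""
    timing_safety_margin = active_refresh  # assumption
    return (last_sig_expiration_time
            + add_hold_down_time
            + active_refresh
            + timing_safety_margin
            + retry_safety_margin)


def safe_to_sign_exclusively(now, earliest_safe_time):
    """True once the wall clock has reached the computed threshold."""
    return now >= earliest_safe_time


# Hypothetical values: the last K_old-only RRSIG expires on 2017-07-01 UTC,
# the hold-down time is 30 days, and activeRefresh is 1 day.
earliest = add_wall_clock_time(
    datetime(2017, 7, 1, tzinfo=timezone.utc),
    timedelta(days=30),
    timedelta(days=1),
)
print(earliest.isoformat())  # 2017-08-02T00:00:00+00:00
```

	   Timezone-aware timestamps are used so that the comparison does not
	   depend on the signer's local time zone.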

606	6.2.3.  Timing Constraint Summary

608	   The important timing constraint introduced by this memo relates to
609	   the last point at which an RFC5011 Resolver may have received a
610	   replayed original DNSKEY set, containing K_old and not K_new.  The
611	   next query of the RFC5011 Validator at which K_new will be seen
612	   without the potential for a replay attack will occur after the old
613	   DNSKEY RRSIG's Signature Expiration Time.  Thus, the latest time that
614	   an RFC5011 Validator may begin its hold down timer is an "Active
615	   Refresh" period after the last point that an attacker can replay the
616	   K_old DNSKEY set.  The worst case scenario of this attack is when the
617	   attacker can replay K_old just seconds before the Signature
618	   Expiration time of the last K_old-only DNSKEY RRSIG.

620	6.2.4.  Additional Considerations for RFC7583

622	   Note: our notion of addWaitTime is called "Itrp" in Section 3.3.4.1
623	   of [RFC7583].  The equation for Itrp in RFC7583 is insecure as it
624	   does not include the sigExpirationTime listed above.  The Itrp
625	   equation in RFC7583 also does not include the timingSafetyMargin,
626	   though that is an operational consideration.

628	6.2.5.  Example Scenario Calculations

630	   For the parameters listed in Section 5.1, our resulting addWaitTime
631	   is:

633	     addWaitTime = 30
634	                   + 10
635	                   + 1 / 2
636	                   + 1 / 2          (days)

638	     addWaitTime = 41               (days)

640	   This addWaitTime of 41 days is 11 days longer than just the hold
641	   down timer, even with the needed retrySafetyMargin value being left
642	   out (which we exclude due to the lack of necessary operational
643	   parameters).

645	6.3.  Timing Requirements For Revoking an Old KSK

647	   This issue affects not just the publication of new DNSKEYs intended
648	   to be used as trust anchors, but also the length of time required to
649	   continuously publish a DNSKEY with the revoke bit set.

651	   Section 6.3.1 defines a method for calculating the amount of time
652	   operators need to wait until it is safe to cease publishing a DNSKEY
653	   (especially useful for writing code involving sleep-based timers),
654	   and Section 6.3.2 defines a method for calculating a minimal wall-
655	   clock value after which it is safe to cease publishing a DNSKEY
656	   (especially useful for writing code based on clock-based event
657	   triggers).

659	6.3.1.  Wait Timer Based Calculation

661	   Both of these publication timing requirements are affected by the
662	   attacks described in this document, but revocation takes effect
663	   immediately, so addHoldDownTime does not apply.  Thus the minimum
664	   time that a SEP Publisher must wait before removing a revoked key
665	   from publication is, first by component and then fully expanded:

667	     remWaitTime = sigExpirationTimeRemaining
668	                   + activeRefresh
669	                   + timingSafetyMargin
670	                   + retrySafetyMargin

672	     remWaitTime = sigExpirationTimeRemaining
673	                   + MAX(1 hour,
674	                         MIN((sigExpirationTime) / 2,
675	                             MAX(TTL of K_old DNSKEY RRSet) / 2,
676	                             15 days))
677	                   + activeRefresh
678	                   + retrySafetyMargin

680	   Note also that adding retryTime intervals to the remWaitTime may be
681	   wise, just as it was for addWaitTime in Section 6.
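
	   The remWaitTime equation above, including the optional retry
	   allowance, can be sketched as follows.  The names and values here
	   are hypothetical, activeRefresh is passed in precomputed, and
	   timingSafetyMargin is assumed to equal activeRefresh.

```python
from datetime import timedelta


def rem_wait_time(sig_expiration_time_remaining, active_refresh,
                  retry_safety_margin=timedelta(0)):
    """Minimum time to keep publishing a revoked key before removing it."""
    timing_safety_margin = active_refresh  # assumption
    return (sig_expiration_time_remaining
            + active_refresh
            + timing_safety_margin
            + retry_safety_margin)


# Hypothetical values: 7 days of RRSIG validity remaining and a 1-day
# activeRefresh, first without and then with a 2-day retry allowance.
print(rem_wait_time(timedelta(days=7), timedelta(days=1)).days)  # 9
print(rem_wait_time(timedelta(days=7), timedelta(days=1),
                    timedelta(days=2)).days)  # 11
```

	   Unlike the addWaitTime calculation, no hold-down term appears,
	   because revocation takes effect as soon as a resolver sees the
	   revoked key.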

683	6.3.2.  Wall-Clock Based Calculation

685	   Like before, the above equations are defined based upon how long to
686	   wait from a particular moment in time.  An alternative, but
687	   equivalent, method is to calculate the date and time before which it
688	   is unsafe to cease publishing a revoked key.  This calculation thus
689	   becomes:

691	      remWallClockTime = lastSigExpirationTime
692	                       + activeRefresh
693	                       + timingSafetyMargin
694	                       + retrySafetyMargin

696	   where lastSigExpirationTime is the latest value of any
697	   sigExpirationTime for which RRSIGs were created that could
698	   potentially be replayed.  Fully expanded, this becomes:

700	      remWallClockTime = lastSigExpirationTime
701	                       + MAX(1 hour,
702	                             MIN((sigExpirationTime) / 2,
703	                                 MAX(TTL of K_old DNSKEY RRSet) / 2,
704	                                 15 days))
705	                       + timingSafetyMargin
706	                       + retrySafetyMargin

708	6.3.3.  Additional Considerations for RFC7583

710	   Note that our notion of remWaitTime is called "Irev" in
711	   Section 3.3.4.2 of [RFC7583].  The equation for Irev in RFC7583 is
712	   insecure as it does not include the sigExpirationTime listed above.
713	   The Irev equation in RFC7583 also does not include a safety margin,
714	   though that is an operational consideration.

716	6.3.4.  Example Scenario Calculations

718	   For the parameters in Section 5.1, our resulting remWaitTime is:

720	     remWaitTime = 10
721	                   + 1 / 2 + 1 / 2  (days)

723	     remWaitTime = 11               (days)

725	   Note that the values in this example produce a length shorter than
726	   the 30 days recommended in Section 6.6, step 3 of RFC5011.  Other
727	   values of sigExpirationTime and the original TTL of the K_old DNSKEY
728	   RRSet, however, can produce values longer than 30 days.

730	   Note that because revocation happens immediately, an attacker has a
731	   much harder job tricking an RFC5011 Resolver into leaving a trust
732	   anchor in place, as the attacker must successfully replay the old
733	   data for every query an RFC5011 Resolver sends, not just one.

735	7.  IANA Considerations

737	   This document contains no IANA considerations.

739	8.  Operational Considerations

741	   A companion document to RFC5011 was expected to be published that
742	   describes the best operational practice considerations from the
743	   perspective of a zone publisher and SEP Publisher.  However, this
744	   companion document has yet to be published.  The authors of this
745	   document hope that it will be at some point in the future, as
746	   RFC5011 timing can be tricky, as we have shown, and a BCP is clearly
747	   warranted.  This document is intended only to fill a single
748	   operational void which, when left misunderstood, can result in
749	   serious security ramifications.  This document does not attempt to
750	   document any other missing operational guidance for zone publishers.

752	9.  Security Considerations

754	   This document is solely about the security considerations with
755	   respect to the SEP Publisher's ability to advertise new DNSKEYs via
756	   the RFC5011 automated trust anchor update process.  Thus the entire
757	   document is a discussion of Security Considerations when adding or
758	   removing DNSKEYs from trust anchor storage using the RFC5011 process.

760	   For simplicity, this document assumes that the SEP Publisher will use
761	   a consistent RRSIG validity period.  SEP Publishers that vary the
762	   length of RRSIG validity periods will need to adjust the
763	   sigExpirationTime value accordingly so that the equations in
764	   Section 6.2 and Section 6.3 use a value that coincides with the last
765	   time at which a replay of older RRSIGs can still succeed.

767	10.  Acknowledgements

769	   The authors would like to especially thank Michael StJohns for his
770	   help and advice and the care and thought he put into RFC5011 itself
771	   and his continued reviews and suggestions for this document.  He also
772	   designed the math behind the suggested retrySafetyMargin values in
773	   Section 6.1.9.

775	   We would also like to thank Bob Harold, Shane Kerr, Matthijs Mekking,
776	   Duane Wessels, Petr Spacek, Ed Lewis, and the dnsop working group,
777	   who have assisted with this document.

779	11.  Normative References

781	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
782	              Requirement Levels", BCP 14, RFC 2119, March 1997.

784	   [RFC4033]  Arends, R., Austein, R., Larson, M., Massey, D., and S.
785	              Rose, "DNS Security Introduction and Requirements",
786	              RFC 4033, DOI 10.17487/RFC4033, March 2005,
787	              <http://www.rfc-editor.org/info/rfc4033>.

789	   [RFC5011]  StJohns, M., "Automated Updates of DNS Security (DNSSEC)
790	              Trust Anchors", STD 74, RFC 5011, DOI 10.17487/RFC5011,
791	              September 2007, <http://www.rfc-editor.org/info/rfc5011>.

793	   [RFC7583]  Morris, S., Ihren, J., Dickinson, J., and W. Mekking,
794	              "DNSSEC Key Rollover Timing Considerations", RFC 7583,
795	              DOI 10.17487/RFC7583, October 2015,
796	              <https://www.rfc-editor.org/info/rfc7583>.

798	   [RFC7719]  Hoffman, P., Sullivan, A., and K. Fujiwara, "DNS
799	              Terminology", RFC 7719, DOI 10.17487/RFC7719, December
800	              2015, <http://www.rfc-editor.org/info/rfc7719>.

802	Appendix A.  Real World Example: The 2017 Root KSK Key Roll

804	   In 2017 and 2018, ICANN expects to roll (or has rolled, depending on
805	   when you're reading this) the key signing key (KSK) for the root
806	   zone.  The relevant parameters associated with the root zone at the
807	   time of this writing are as follows:

809	         addHoldDownTime:                      30 days
810	         Old DNSKEY sigExpirationTime:         21 days
811	         Old DNSKEY TTL:                        2 days

813	   Thus, substituting these values into the equation in
814	   Section 6.2.1.1 yields (in days from publication time):

816	     addWaitTime = 30
817	                   + 21
818	                   + MAX(1 hour,           # activeRefresh
819	                         MIN(21 / 2,
820	                             MAX(2) / 2,
821	                             15 days)
822	                         )
823	                   + activeRefresh         # timingSafetyMargin

825	     addWaitTime = 30 + 21 + 1 + 1

827	     addWaitTime = 53 days

829	   Also note that we exclude the retrySafetyMargin value, which is
830	   calculated based on the expected client deployment size.
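
	   The arithmetic above can be checked with a few lines of Python.
	   This is only a sketch: the variable names are hypothetical,
	   timingSafetyMargin is assumed to equal activeRefresh, and
	   retrySafetyMargin is omitted, as in the calculation above.

```python
from datetime import timedelta

# Root zone parameters listed above.
add_hold_down_time = timedelta(days=30)
sig_expiration_time = timedelta(days=21)
dnskey_ttl = timedelta(days=2)

# MAX(1 hour, MIN(sigExpirationTime / 2, TTL / 2, 15 days))
active_refresh = max(
    timedelta(hours=1),
    min(sig_expiration_time / 2, dnskey_ttl / 2, timedelta(days=15)),
)
timing_safety_margin = active_refresh  # assumption

add_wait_time = (add_hold_down_time
                 + sig_expiration_time
                 + active_refresh
                 + timing_safety_margin)

print(active_refresh.days)  # 1
print(add_wait_time.days)   # 53
```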

832	   Thus, ICANN must wait a minimum of 53 days before switching to the
833	   newly published KSK (and 23 days before removing the old revoked key
834	   once it is published as revoked).  ICANN's current plans involve
835	   waiting over 3 months before using the new key and 69 days before
836	   removing the old, revoked key.  Thus, their current rollover plans
837	   are sufficiently secure from the attack discussed in this memo.

839	Authors' Addresses
840	   Wes Hardaker
841	   USC/ISI
842	   P.O. Box 382
843	   Davis, CA  95617
844	   US

846	   Email: ietf@hardakers.net

848	   Warren Kumari
849	   Google
850	   1600 Amphitheatre Parkway
851	   Mountain View, CA  94043
852	   US

854	   Email: warren@kumari.net