idnits 2.17.1 

draft-ietf-dnsop-rfc5011-security-considerations-09.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  -- The draft header indicates that this document updates RFC7583, but the
     abstract doesn't seem to mention this, which it should.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (December 07, 2017) is 2332 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  ** Downref: Normative reference to an Informational RFC: RFC 7583

  ** Obsolete normative reference: RFC 7719 (Obsoleted by RFC 8499)


     Summary: 2 errors (**), 0 flaws (~~), 1 warning (==), 2 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	dnsop                                                        W. Hardaker
3	Internet-Draft                                                   USC/ISI
4	Updates: 7583 (if approved)                                    W. Kumari
5	Intended status: Standards Track                                  Google
6	Expires: June 10, 2018                                 December 07, 2017

8	             Security Considerations for RFC5011 Publishers
9	          draft-ietf-dnsop-rfc5011-security-considerations-09

11	Abstract

13	   This document extends the RFC5011 rollover strategy with timing
14	   advice that must be followed by the publisher in order to maintain
15	   security.  Specifically, this document describes the math behind the
16	   minimum time-length that a DNS zone publisher must wait before
17	   signing exclusively with recently added DNSKEYs.  This document also
18	   describes the minimum time-length that a DNS zone publisher must wait
19	   after publishing a revoked DNSKEY before assuming that all active
20	   RFC5011 resolvers should have seen the revocation-marked key and
21	   removed it from their list of trust anchors.

23	   This document contains much math and complicated equations, but the
24	   summary is that the key rollover / revocation time is much longer
25	   than intuition would suggest.  If you are not both publishing a
26	   DNSSEC DNSKEY, and using RFC5011 to advertise this DNSKEY as a new
27	   Secure Entry Point key for use as a trust anchor, you probably don't
28	   need to read this document.

30	Status of This Memo

32	   This Internet-Draft is submitted in full conformance with the
33	   provisions of BCP 78 and BCP 79.

35	   Internet-Drafts are working documents of the Internet Engineering
36	   Task Force (IETF).  Note that other groups may also distribute
37	   working documents as Internet-Drafts.  The list of current Internet-
38	   Drafts is at http://datatracker.ietf.org/drafts/current/.

40	   Internet-Drafts are draft documents valid for a maximum of six months
41	   and may be updated, replaced, or obsoleted by other documents at any
42	   time.  It is inappropriate to use Internet-Drafts as reference
43	   material or to cite them other than as "work in progress."

45	   This Internet-Draft will expire on June 10, 2018.

47	Copyright Notice

49	   Copyright (c) 2017 IETF Trust and the persons identified as the
50	   document authors.  All rights reserved.

52	   This document is subject to BCP 78 and the IETF Trust's Legal
53	   Provisions Relating to IETF Documents
54	   (http://trustee.ietf.org/license-info) in effect on the date of
55	   publication of this document.  Please review these documents
56	   carefully, as they describe your rights and restrictions with respect
57	   to this document.  Code Components extracted from this document must
58	   include Simplified BSD License text as described in Section 4.e of
59	   the Trust Legal Provisions and are provided without warranty as
60	   described in the Simplified BSD License.

62	Table of Contents

64	   1.  Introduction
65	     1.1.  Document History and Motivation
66	     1.2.  Safely Rolling the Root Zone's KSK in 2017/2018
67	     1.3.  Requirements notation
68	   2.  Background
69	   3.  Terminology
70	   4.  Timing Associated with RFC5011 Processing
71	     4.1.  Timing Associated with Publication
72	     4.2.  Timing Associated with Revocation
73	   5.  Denial of Service Attack Walkthrough
74	     5.1.  Enumerated Attack Example
75	       5.1.1.  Attack Timing Breakdown
76	   6.  Minimum RFC5011 Timing Requirements
77	     6.1.  Equation Components
78	       6.1.1.  addHoldDownTime
79	       6.1.2.  lastSigExpirationTime
80	       6.1.3.  sigExpirationTime
81	       6.1.4.  sigExpirationTimeRemaining
82	       6.1.5.  sigExpirationTimeRemaining
83	       6.1.6.  activeRefresh
84	       6.1.7.  activeRefreshOffset
85	       6.1.8.  safetyMargin
86	     6.2.  Timing Requirements For Adding a New KSK
87	       6.2.1.  Wait Timer Based Calculation
88	       6.2.2.  Wall-Clock Based Calculation
89	       6.2.3.  Timing Constraint Summary
90	       6.2.4.  Additional Considerations for RFC7583
91	       6.2.5.  Example Scenario Calculations
92	     6.3.  Timing Requirements For Revoking an Old KSK
93	       6.3.1.  Wait Timer Based Calculation
94	       6.3.2.  Wall-Clock Based Calculation
95	       6.3.3.  Additional Considerations for RFC7583
96	       6.3.4.  Example Scenario Calculations
97	   7.  IANA Considerations
98	   8.  Operational Considerations
99	   9.  Security Considerations
100	   10. Acknowledgements
101	   11. Normative References
102	   Appendix A.  Real World Example: The 2017 Root KSK Key Roll
103	   Authors' Addresses

105	1.  Introduction

107	   [RFC5011] defines a mechanism by which DNSSEC validators can update
108	   their list of trust anchors when they've seen a new key published in
109	   a zone or revoke a properly marked key from a trust anchor list.
110	   However, RFC5011 [intentionally] provides no guidance to the
111	   publishers of DNSKEYs about how long they must wait before switching
112	   to exclusively using recently published keys for signing records, or
113	   how long they must wait before ceasing publication of a revoked key.
114	   Because of this lack of guidance, zone publishers may derive
115	   incorrect assumptions about safe usage of the RFC5011 DNSKEY
116	   advertising, rolling and revocation process.  This document describes
117	   the minimum security requirements from a publisher's point of view
118	   and is intended to complement the guidance offered in RFC5011 (which
119	   is written to provide timing guidance solely to a Validating
120	   Resolver's point of view).

122	1.1.  Document History and Motivation

124	   To verify this lack of understanding is wide-spread, the authors
125	   reached out to 5 DNSSEC experts to ask them how long they thought
126	   they must wait before signing a zone exclusively with a new KSK
127	   [RFC4033] that was being introduced according to the 5011 process.
128	   All 5 experts answered with an insecure value, and we determined that
129	   this lack of mathematical understanding might cause security concerns
130	   in deployment.  We hope that this companion document to RFC5011 will
131	   rectify this misunderstanding and provide better guidance to zone
132	   publishers that wish to make use of the RFC5011 rollover process.

134	1.2.  Safely Rolling the Root Zone's KSK in 2017/2018

136	   One important note about ICANN's (currently in process) 2017/2018 KSK
137	   rollover plan for the root zone: the timing values chosen for rolling
138	   the KSK in the root zone appear completely safe, and are not affected
139	   by the timing concerns introduced by this draft.

141	1.3.  Requirements notation

143	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
144	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
145	   document are to be interpreted as described in [RFC2119].

147	2.  Background

149	   RFC5011 describes a process by which an RFC5011 Resolver
150	   may accept a newly published KSK as a trust anchor for validating
151	   future DNSSEC signed records.  It also describes the process for
152	   publicly revoking a published KSK.  This document augments that
153	   information with additional constraints, from the SEP Publisher's
154	   point of view.  Note that this document does not define any other
155	   operational guidance or recommendations about the RFC5011 process and
156	   restricts itself to solely the security and operational ramifications
157	   of switching to exclusively using recently added keys or removing
158	   revoked keys too soon.

160	   Failure of a DNSKEY publisher to follow the minimum recommendations
161	   associated with this draft can result in potential denial-of-service
162	   attack opportunities against validating resolvers.  Failure of a
163	   DNSKEY publisher to publish a revoked key for a long enough period of
164	   time may result in RFC5011 Resolvers leaving that key in their trust
165	   anchor storage beyond the key's expected lifetime.

167	3.  Terminology

169	   SEP Publisher  The entity responsible for publishing a DNSKEY (with
170	      the Secure Entry Point (SEP) bit set) that can be used as a trust
171	      anchor.

173	   Zone Signer  The owner of a zone intending to publish a new Key-
174	      Signing-Key (KSK) that may become a trust anchor for validators
175	      following the RFC5011 process.

177	   RFC5011 Resolver  A DNSSEC Resolver that is using the RFC5011
178	      processes to track and update trust anchors.

180	   Attacker  An entity intent on foiling the RFC5011 Resolver's ability
181	      to successfully adopt the Zone Signer's new DNSKEY as a new trust
182	      anchor or to prevent the RFC5011 Resolver from removing an old
183	      DNSKEY from its list of trust anchors.

185	   sigExpirationTime  The amount of time between the DNSKEY RRSIG's
186	      Signature Inception field and the Signature Expiration field.

188	   Also see Section 2 of [RFC4033] and [RFC7719] for additional
189	   terminology.

191	4.  Timing Associated with RFC5011 Processing

193	   The following sections give a high-level overview of [RFC5011]
	   processing.
194	   These steps are not sufficient for proper RFC5011 implementation, but
195	   provide enough background for the reader to follow the discussion in
196	   this document.  Readers need to fully understand [RFC5011] as well to
197	   fully comprehend the content and importance of this document.

199	4.1.  Timing Associated with Publication

201	   RFC5011's process of safely publishing a new DNSKEY and then assuming
202	   RFC5011 Resolvers have adopted it for trust falls into a number of
203	   high-level steps to be performed by the SEP Publisher.  This document
204	   discusses the following scenario, which is the principal way RFC5011
	   is
205	   currently being used (even though Section 6 of RFC5011 suggests
206	   having a stand-by key available):

208	   1.  Publish a new DNSKEY in a zone, but continue to sign the zone
209	       with the old one.

211	   2.  Wait a period of time.

213	   3.  Begin to exclusively use recently published DNSKEYs to sign the
214	       appropriate resource records.

216	   This document discusses the time required to wait during step 2 of
217	   the above process.  Some interpretations of RFC5011 have erroneously
218	   determined that the wait time is equal to RFC5011's "hold down time".
219	   Section 5 describes an attack based on this (common) erroneous
220	   belief, which can result in a denial of service attack against the
221	   zone.

223	4.2.  Timing Associated with Revocation

225	   RFC5011's process of advertising that an old key is to be revoked
226	   from RFC5011 Resolvers falls into a number of high-level steps:

228	   1.  Set the revoke bit on the DNSKEY to be revoked.

230	   2.  Sign the revoked DNSKEY with itself.

232	   3.  Wait a period of time.

234	   4.  Remove the revoked key from the zone.

236	   This document discusses the time required to wait in step 3 of the
237	   above process.  Some interpretations of RFC5011 have erroneously
238	   determined that the wait time is equal to RFC5011's "hold down time".
239	   This document describes an attack based on this (common) erroneous
240	   belief, which results in a revoked DNSKEY potentially remaining as a
241	   trust anchor in a RFC5011 Resolver long past its expected usage.

243	5.  Denial of Service Attack Walkthrough

245	   This section serves as an illustrative example of the problem being
246	   discussed in this document.  Note that in order to keep the example
247	   simple enough to understand, some simplifications were made (such as
248	   by not creating a set of pre-signed RRSIGs and by not using values
249	   that result in the addHoldDownTime not being evenly divisible by the
250	   activeRefresh value); the mathematical formulas in Section 6 are,
251	   however, complete.

253	   If an attacker is able to provide a RFC5011 Resolver with past
254	   responses, such as when it is in-path or able to perform any number
255	   of cache poisoning attacks, the attacker may be able to leave
256	   compliant RFC5011 Resolvers without an appropriate DNSKEY trust
257	   anchor.  This scenario will remain until an administrator manually
258	   fixes the situation.

260	   The time-line below illustrates an example of this situation.

262	5.1.  Enumerated Attack Example

264	   The following example settings are used in the example scenario
265	   within this section:

267	   TTL (all records)  1 day

269	   sigExpirationTime  10 days

271	   Zone resigned every  1 day

273	   Given these settings, the sequence of events in Section 5.1.1 depicts
274	   how a SEP Publisher that waits for only the RFC5011 hold-down timer
275	   length of 30 days subjects its users to a potential Denial of Service
276	   attack.  The timing schedule listed below is based on a SEP Publisher
277	   publishing a new Key Signing Key (KSK), with the intent that it will
278	   later be used as a trust anchor.  We label this publication time as
279	   "T+0".  All numbers in this sequence refer to days before and after
280	   this initial publication event.  Thus, T-1 is the day before the
281	   introduction of the new key, and T+15 is the 15th day after the key
282	   was introduced into the fictitious zone being discussed.

284	   In this dialog, we consider two keys within the example zone:

286	   K_old:  An older KSK and Trust Anchor being replaced.

288	   K_new:  A new KSK being transitioned into active use and expected to
289	      become a Trust Anchor via the RFC5011 automated trust anchor
290	      update process.

292	5.1.1.  Attack Timing Breakdown

294	   The steps below show an attack that foils the adoption of a new
295	   DNSKEY by an RFC5011 Resolver when the SEP Publisher starts signing
296	   and publishing with the new DNSKEY too quickly.

298	   T-1  The K_old based RRSIGs are being published by the Zone Signer.
299	      [It may also be signing ZSKs as well, but they are not relevant to
300	      this event so we will not talk further about them; we are only
301	      considering the RRSIGs that cover the DNSKEYs in this document.]
302	      The Attacker queries for, retrieves and caches this DNSKEY set and
303	      corresponding RRSIG signatures.

305	   T+0  The Zone Signer adds K_new to their zone and signs the zone's
306	      key set with K_old.  The RFC5011 Resolver (later to be under
307	      attack) retrieves this new key set and corresponding RRSIGs and
308	      notices the publication of K_new.  The RFC5011 Resolver starts the
309	      (30-day) hold-down timer for K_new.  [Note that in a more real-
310	      world scenario there will likely be a further delay between the
311	      point where the Zone Signer publishes a new RRSIG and the RFC5011
312	      Resolver notices its publication; though not shown in this
313	      example, this delay is accounted for in the equation in Section 6
314	      below]

316	   T+5  The RFC5011 Resolver queries for the zone's keyset per the
317	      RFC5011 Active Refresh schedule, discussed in Section 2.3 of
318	      RFC5011.  Instead of receiving the intended published keyset, the
319	      Attacker successfully replays the keyset and associated signatures
320	      recorded at T-1 to the victim RFC5011 Resolver.  Because the
321	      signature lifetime is 10 days (in this example), the replayed
322	      signature and keyset are accepted as valid (being only 6 days old,
323	      which is less than sigExpirationTime) and the RFC5011 Resolver
324	      cancels the (30-day) hold-down timer for K_new, per the RFC5011
325	      algorithm.

327	   T+10  The RFC5011 Resolver queries for the zone's keyset and
328	      discovers a signed keyset that includes K_new (again), and is
329	      signed by K_old.  Note: the attacker is unable to replay the
330	      records cached at T-1, because the signatures have now expired.

332	      Thus at T+10, the RFC5011 Resolver starts (anew) the hold-down
333	      timer for K_new.

335	   T+11 through T+29  The RFC5011 Resolver continues checking the zone's
336	      key set at the prescribed regular intervals.  During this period,
337	      the attacker can no longer replay traffic to their benefit.

339	   T+30  The Zone Signer knows that this is the first time at which some
340	      validators might accept K_new as a new trust anchor, since the
341	      hold-down timer of a RFC5011 Resolver not under attack that had
342	      queried and retrieved K_new at T+0 would now have reached 30 days.
343	      However, the hold-down timer of our attacked RFC5011 Resolver is
344	      only at 20 days.

346	   T+35  The Zone Signer (mistakenly) believes that all validators
347	      following the Active Refresh schedule (Section 2.3 of RFC5011)
348	      should have accepted K_new as the new trust anchor (since the
349	      hold down time (30 days) + the query interval [which is just 1/2
350	      the signature validity period in this example] would have passed).
351	      However, the hold-down timer of our attacked RFC5011 Resolver is
352	      only at 25 days (T+35 minus T+10); thus the RFC5011 Resolver won't
353	      consider it a valid trust anchor addition yet, as the required 30
354	      days have not yet elapsed.

356	   T+36  The Zone Signer, believing K_new is safe to use, switches their
357	      active signing KSK to K_new and publishes a new RRSIG, signed with
358	      (only) K_new, covering the DNSKEY set.  Non-attacked RFC5011
359	      validators, with a hold-down timer of at least 30 days, would have
360	      accepted K_new into their set of trusted keys.  But, because our
361	      attacked RFC5011 Resolver now has a hold-down timer for K_new of
362	      only 26 days, it failed to ever accept K_new as a trust anchor.
363	      Since K_old is no longer being used to sign the zone's DNSKEYs,
364	      all the DNSKEY records from the zone will be treated as invalid.
365	      Subsequently, all of the records in the DNS tree below the zone's
366	      apex will be deemed invalid by DNSSEC.
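
	   The timeline above can be replayed with a small simulation.  The
	   Python sketch below is illustrative only: the function name, the
	   whole-day granularity, and the 5-day polling schedule are
	   assumptions taken from this example, not from RFC5011 itself.  It
	   shows that a single accepted replay at T+5 pushes the start of the
	   hold-down timer to T+10, so the timer has not yet reached 30 days
	   when the Zone Signer switches keys at T+36.

```python
ADD_HOLD_DOWN_DAYS = 30  # RFC5011 add hold-down time used in this example


def hold_down_start_day(poll_days, replay_days, replay_signed_day,
                        sig_lifetime_days):
    """Return the day on which the resolver's current hold-down timer began.

    poll_days: days (relative to T+0) on which the resolver queries.
    replay_days: days on which the attacker answers with the old keyset.
    """
    start = None
    for day in poll_days:
        # A replayed keyset is only accepted while its RRSIG is unexpired.
        replay_still_valid = day < replay_signed_day + sig_lifetime_days
        if day in replay_days and replay_still_valid:
            start = None   # K_new is missing from the answer: timer cancelled
        elif start is None:
            start = day    # K_new seen with a valid signature: timer starts
    return start


polls = range(0, 36, 5)  # T+0, T+5, ..., T+35
attacked = hold_down_start_day(polls, {5}, replay_signed_day=-1,
                               sig_lifetime_days=10)
normal = hold_down_start_day(polls, set(), replay_signed_day=-1,
                             sig_lifetime_days=10)

switch_day = 36  # the day the Zone Signer starts signing only with K_new
for name, start in (("attacked", attacked), ("not attacked", normal)):
    elapsed = switch_day - start
    print(name, "timer started at T+%d," % start,
          "trusts K_new:", elapsed >= ADD_HOLD_DOWN_DAYS)
```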

368	6.  Minimum RFC5011 Timing Requirements

370	   This section defines the minimum timing requirements for making
371	   exclusive use of newly added DNSKEYs and timing requirements for
372	   ceasing the publication of DNSKEYs to be revoked.  First, we define
373	   the term components used in both equations in Section 6.1.

375	6.1.  Equation Components
376	6.1.1.  addHoldDownTime

378	   The addHoldDownTime is defined in Section 2.4.1 of [RFC5011] as:

380	       The add hold-down time is 30 days or the expiration time of the
381	       original TTL of the first trust point DNSKEY RRSet that contained
382	       the new key, whichever is greater.  This ensures that at least
383	       two validated DNSKEY RRSets that contain the new key MUST be seen
384	       by the resolver prior to the key's acceptance.
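
	   As a minimal sketch of the quoted rule, the hold-down time can be
	   computed as the larger of 30 days and the original TTL.  The
	   function and parameter names are hypothetical and chosen only for
	   this illustration.

```python
from datetime import timedelta


def add_hold_down_time(original_ttl: timedelta) -> timedelta:
    """Greater of 30 days and the original TTL of the DNSKEY RRSet."""
    return max(timedelta(days=30), original_ttl)


# A 1-day TTL leaves the 30-day floor in effect.
print(add_hold_down_time(timedelta(days=1)))
```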

386	6.1.2.  lastSigExpirationTime

388	   The latest value (i.e., the furthest in the future) of any RRSIG
389	   Signature Expiration field covering any DNSKEY RRSet containing only
390	   the old trust anchor(s) that are being superseded.  Note that for
391	   organizations pre-creating signatures this time may be fairly far in
392	   the future unless they can be significantly assured that none of
393	   their pre-generated signatures can be replayed at a later date.

395	6.1.3.  sigExpirationTime

397	   The amount of time between the DNSKEY RRSIG's Signature Inception
398	   field and the Signature Expiration field.

400	6.1.4.  sigExpirationTimeRemaining

402	   The amount of time remaining before lastSigExpirationTime is reached.

404	6.1.5.  sigExpirationTimeRemaining

406	   sigExpirationTimeRemaining is defined in Section 6.1.4.

408	6.1.6.  activeRefresh

410	   activeRefresh time is defined in Section 2.3 of RFC5011 as:

412	     A resolver that has been configured for an automatic update
413	     of keys from a particular trust point MUST query that trust
414	     point (e.g., do a lookup for the DNSKEY RRSet and related
415	     RRSIG records) no less often than the lesser of 15 days, half
416	     the original TTL for the DNSKEY RRSet, or half the RRSIG
417	     expiration interval and no more often than once per hour.

419	   This translates to:

421	    activeRefresh = MAX(1 hour,
422	                        MIN(sigExpirationTime / 2,
423	                            MAX(TTL of K_old DNSKEY RRSet) / 2,
424	                            15 days)
425	                        )
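
	   The MAX/MIN expression above maps directly onto code.  The
	   following Python sketch assumes durations are passed as timedelta
	   values; the function and parameter names are hypothetical.

```python
from datetime import timedelta


def active_refresh(sig_expiration_time: timedelta,
                   max_dnskey_ttl: timedelta) -> timedelta:
    """activeRefresh per the MAX/MIN expression above.

    max_dnskey_ttl is the largest TTL of the K_old DNSKEY RRSet.
    """
    return max(timedelta(hours=1),
               min(sig_expiration_time / 2,
                   max_dnskey_ttl / 2,
                   timedelta(days=15)))


# Section 5.1 parameters: 10-day signatures, 1-day TTL.
print(active_refresh(timedelta(days=10), timedelta(days=1)))
```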

427	6.1.7.  activeRefreshOffset

429	   The activeRefreshOffset term must be added for situations where the
430	   activeRefresh value is not a factor of the addHoldDownTime.
431	   Specifically, activeRefreshOffset will be "addHoldDownTime %
432	   activeRefresh", where % is the mathematical mod operator (calculating
433	   the remainder in a division problem).  This will frequently be zero,
434	   but could be nearly as large as activeRefresh itself.  For
435	   simplicity, setting the activeRefreshOffset to the activeRefresh
436	   value itself is always safe.
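
	   A short sketch of the remainder calculation, assuming both inputs
	   are timedelta values (Python supports the % operator between two
	   timedelta objects).  The function name is hypothetical.

```python
from datetime import timedelta


def active_refresh_offset(add_hold_down_time: timedelta,
                          active_refresh: timedelta) -> timedelta:
    """Remainder of addHoldDownTime divided by activeRefresh."""
    return add_hold_down_time % active_refresh


# 30 days is a multiple of 12 hours, so the offset is zero.
print(active_refresh_offset(timedelta(days=30), timedelta(hours=12)))
# 30 days is not a multiple of 7 days; the remainder is 2 days.
print(active_refresh_offset(timedelta(days=30), timedelta(days=7)))
```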

438	6.1.8.  safetyMargin

440	   The safetyMargin is an extra period of time to account for caching,
441	   network delays, dropped packets, and other operational concerns
442	   otherwise beyond the scope of this document.  The value operators
443	   should choose is highly dependent on the deployment situation
444	   associated with their zone.  Note that no value of a safetyMargin can
445	   protect against resolvers that are "down".  Nonetheless, we do
446	   offer the following as one method for selecting a reasonable
447	   value.

449	   The following variables need to be considered when selecting
450	   an appropriate safetyMargin value:

452	   successRate:  A likely success rate for client queries and retries

454	   numResolvers:  The number of client RFC5011 Resolvers

456	   Note that RFC5011 defines retryTime as:

458	         If the query fails, the resolver MUST repeat the query until
459	         satisfied no more often than once an hour and no less often
460	         than the lesser of 1 day, 10% of the original TTL, or 10% of
461	         the original expiration interval.  That is,
462	         retryTime = MAX (1 hour, MIN (1 day, .1 * origTTL,
463	                                       .1 * expireInterval)).

465	   With the successRate and numResolvers values selected and the
466	   definition of retryTime from RFC5011, the number of retryTime
467	   intervals to wait so that the expected number of resolvers without
468	   a successful query falls to at most one can be estimated as follows
469	   (assuming each retry succeeds independently):

470	                         x = (1/(1 - successRate))

472	            retryCountWait = Log_base_x(numResolvers)

474	   To reduce the need for readers to pull out a scientific calculator,
475	   we offer the following lookup table based on successRate and
476	   numResolvers:

478	                        retryCountWait lookup table
479	                        ---------------------------

481	                       Number of client RFC5011 Resolvers (numResolvers)
482	                       -------------------------------------------------
483	                        10,000  100,000 1,000,000 10,000,000 100,000,000
484	                 0.01      917     1146      1375       1604        1833
485	 Probability     0.05      180      225       270        315         360
486	 of Success      0.10       88      110       132        153         175
487	 Per Retry       0.15       57       71        86        100         114
488	 Interval        0.25       33       41        49         57          65
489	 (successRate)   0.50       14       17        20         24          27
490	                 0.90        4        5         6          7           8
491	                 0.95        4        4         5          6           7
492	                 0.99        2        3         3          4           4
493	                 0.999       2        2         2          3           3
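
	   The table entries can be recomputed with a few lines of Python.
	   This is a sketch under two assumptions that the text does not state
	   explicitly: results are rounded up to a whole number of intervals
	   (which is what the table values suggest), and a small tolerance is
	   subtracted before rounding so that floating-point noise does not
	   push exact results such as 2.0 up to 3.  The function name is
	   hypothetical.

```python
import math


def retry_count_wait(success_rate: float, num_resolvers: int) -> int:
    """Number of retryTime intervals: log base 1/(1 - successRate)."""
    if not 0.0 < success_rate < 1.0:
        raise ValueError("success_rate must be strictly between 0 and 1")
    x = 1.0 / (1.0 - success_rate)
    raw = math.log(num_resolvers) / math.log(x)
    # Round up, with a small tolerance for floating-point error (assumption).
    return math.ceil(raw - 1e-9)


for rate in (0.01, 0.50, 0.90, 0.99):
    print(rate, [retry_count_wait(rate, n) for n in (10_000, 100_000)])
```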

495	   Finally, a suggested value of safetyMargin can then be this
496	   retryCountWait number multiplied by the retryTime from RFC5011:

498	                 safetyMargin = retryCountWait * retryTime
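
	   Combining the quoted retryTime definition with a chosen
	   retryCountWait gives the suggested margin.  The sketch below takes
	   retryCountWait as an already selected integer; the function and
	   parameter names are hypothetical.

```python
from datetime import timedelta


def retry_time(orig_ttl: timedelta, expire_interval: timedelta) -> timedelta:
    """retryTime = MAX(1 hour, MIN(1 day, .1 * origTTL, .1 * expireInterval))."""
    return max(timedelta(hours=1),
               min(timedelta(days=1), orig_ttl / 10, expire_interval / 10))


def safety_margin(retry_count_wait: int, orig_ttl: timedelta,
                  expire_interval: timedelta) -> timedelta:
    """safetyMargin = retryCountWait * retryTime."""
    return retry_count_wait * retry_time(orig_ttl, expire_interval)


# Example inputs: 1-day TTL, 10-day signature validity, 14 retry intervals.
print(safety_margin(14, timedelta(days=1), timedelta(days=10)))
```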

500	6.2.  Timing Requirements For Adding a New KSK

502	   Section 6.2.1 defines a method for calculating the amount of time to
503	   wait until it is safe to start signing exclusively with a new DNSKEY
504	   (especially useful for writing code involving sleep based timers),
505	   and Section 6.2.2 defines a method for calculating a wall-clock value
506	   after which it is safe to start signing exclusively with a new DNSKEY
507	   (especially useful for writing code based on clock-based event
508	   triggers).

510	6.2.1.  Wait Timer Based Calculation

512	   Given the attack description in Section 5, the correct minimum length
513	   of time required for the Zone Signer to wait after publishing K_new
514	   but before exclusively using it and newer keys is:

516	      addWaitTime = addHoldDownTime
517	                    + sigExpirationTimeRemaining
518	                    + activeRefresh
519	                    + activeRefreshOffset
520	                    + safetyMargin
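
	   The sum above is straightforward to encode for a sleep-based timer.
	   The following Python sketch assumes every term is supplied as a
	   timedelta and derives activeRefreshOffset from the other inputs;
	   the function name and the example zone parameters are hypothetical.

```python
from datetime import timedelta


def add_wait_time(add_hold_down_time, sig_expiration_time_remaining,
                  active_refresh, safety_margin=timedelta(0)):
    """Sum the five addWaitTime terms; all arguments are timedelta values."""
    # activeRefreshOffset: remainder of the hold-down time over activeRefresh
    active_refresh_offset = add_hold_down_time % active_refresh
    return (add_hold_down_time
            + sig_expiration_time_remaining
            + active_refresh
            + active_refresh_offset
            + safety_margin)


# Hypothetical zone: 30-day hold-down, 7 days of signature validity left,
# and a 7-day activeRefresh (so the offset is 30 % 7 = 2 days).
wait = add_wait_time(timedelta(days=30), timedelta(days=7), timedelta(days=7))
print(wait)
```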

522	6.2.1.1.  Fully expanded equation

524	   Given the equation components defined in Section 6.1, the full
525	   expanded equation is:

527	      addWaitTime = addHoldDownTime
528	                    + sigExpirationTimeRemaining
529	                    + MAX(1 hour,
530	                          MIN(sigExpirationTime / 2,
531	                              MAX(TTL of K_old DNSKEY RRSet) / 2,
532	                              15 days)
533	                          )
534	                    + (addHoldDownTime % activeRefresh)
535	                    + safetyMargin

538	6.2.2.  Wall-Clock Based Calculation

540	   The equations in Section 6.2.1 are defined based upon how long to
541	   wait from a particular moment in time.  An alternative, but
542	   equivalent, method is to calculate the date and time before which it
543	   is unsafe to use a key for signing.  This calculation thus becomes:

545	      addWallClockTime = lastSigExpirationTime
546	                       + addHoldDownTime
547	                       + activeRefresh
548	                       + activeRefreshOffset
549	                       + safetyMargin

551	   where lastSigExpirationTime is the latest Signature Expiration
552	   value of any RRSIG that was created and could potentially be
553	   replayed.  Fully expanded, this becomes:

555	    addWallClockTime = lastSigExpirationTime
556	                       + addHoldDownTime
557	                       + MAX(1 hour,
558	                             MIN(sigExpirationTime / 2,
559	                                 MAX(TTL of K_old DNSKEY RRSet) / 2,
560	                                 15 days)
561	                             )
562	                       + (addHoldDownTime % activeRefresh)
563	                       + safetyMargin
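
	   For clock-based triggers, the same terms can be added to an
	   absolute timestamp.  The Python sketch below follows the
	   unexpanded addWallClockTime equation (lastSigExpirationTime +
	   addHoldDownTime + activeRefresh + activeRefreshOffset +
	   safetyMargin); the function name and the example date are
	   hypothetical.

```python
from datetime import datetime, timedelta, timezone


def add_wall_clock_time(last_sig_expiration_time: datetime,
                        add_hold_down_time: timedelta,
                        active_refresh: timedelta,
                        safety_margin: timedelta = timedelta(0)) -> datetime:
    """Earliest moment at which signing only with the new key is safe."""
    active_refresh_offset = add_hold_down_time % active_refresh
    return (last_sig_expiration_time
            + add_hold_down_time
            + active_refresh
            + active_refresh_offset
            + safety_margin)


# Hypothetical input: the last K_old-only RRSIG expires at this instant.
last_expiry = datetime(2017, 12, 17, tzinfo=timezone.utc)
print(add_wall_clock_time(last_expiry, timedelta(days=30),
                          timedelta(hours=12)))
```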

566	6.2.3.  Timing Constraint Summary

568	   The important timing constraint introduced by this memo relates to
569	   the last point at which an RFC5011 Resolver may have received a
570	   replayed original DNSKEY set, containing K_old and not K_new.  The
571	   next query of the RFC5011 Resolver at which K_new will be seen
572	   without the potential for a replay attack will occur after the old
573	   DNSKEY RRSIG's Signature Expiration time.  Thus, the latest time that
574	   an RFC5011 Resolver may begin its hold-down timer is an "Active
575	   Refresh" period after the last point that an attacker can replay the
576	   K_old DNSKEY set.  The worst case scenario of this attack is if the
577	   attacker can replay K_old just seconds before the Signature
578	   Expiration time of the last K_old-only DNSKEY RRSIG.

580	6.2.4.  Additional Considerations for RFC7583

582	   Note: our notion of addWaitTime is called "Itrp" in Section 3.3.4.1
583	   of [RFC7583].  The equation for Itrp in RFC7583 is insecure as it
584	   does not include the sigExpirationTimeRemaining term listed above.
585	   The Itrp equation in RFC7583 also does not include the safetyMargin,
586	   though that is an operational consideration.

588	6.2.5.  Example Scenario Calculations

590	   For the parameters listed in Section 5.1, the activeRefreshOffset is
591	   0, since 30 days is evenly divisible by activeRefresh (1/2 day), and
592	   our resulting addWaitTime is:

594	     addWaitTime = 30
595	                   + 10
596	                   + 1 / 2
597	                   + 0              (days)

599	     addWaitTime = 42.5             (days)

601	   This addWaitTime of 42.5 days is 12.5 days longer than just the hold
602	   down timer, even with the needed safetyMargin value being left out
603	   (which we exclude due to the lack of necessary operational
604	   parameters).

606	6.3.  Timing Requirements For Revoking an Old KSK

608	   This issue affects not just the publication of new DNSKEYs intended
609	   to be used as trust anchors, but also the length of time required to
610	   continuously publish a DNSKEY with the revoke bit set.

612	   Section 6.3.1 defines a method for calculating the amount of time
613	   operators need to wait until it is safe to cease publishing a DNSKEY
614	   (especially useful for writing code involving sleep-based timers),
615	   and Section 6.3.2 defines a method for calculating a minimal wall-
616	   clock value after which it is safe to cease publishing a DNSKEY
617	   (especially useful for writing code based on clock-based event
618	   triggers).

620	6.3.1.  Wait Timer Based Calculation

622	   Both of these publication timing requirements are affected by the
623	   attacks described in this document, but with revocation the key is
624	   revoked immediately and the addHoldDown timer does not apply.  Thus
625	   the minimum amount of time that a SEP Publisher must wait before
626	   removing a revoked key from publication is:

628	     remWaitTime = sigExpirationTimeRemaining
629	                   + activeRefresh
630	                   + safetyMargin

632	     remWaitTime = sigExpirationTimeRemaining
633	                   + MAX(1 hour,
634	                         MIN((sigExpirationTime) / 2,
635	                             MAX(TTL of K_old DNSKEY RRSet) / 2,
636	                             15 days))
637	                   + safetyMargin

639	   Note that the activeRefreshOffset time does not apply to this
640	   equation.

642	   Note also that adding retryTime intervals to the remWaitTime may be
643	   wise, just as it was for addWaitTime in Section 6.
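
The remWaitTime equation above can be sketched in code.  This is a
minimal illustration rather than a reference implementation: the
function and parameter names are hypothetical, and the 1 day TTL used
in the usage example is an assumption chosen so that activeRefresh
comes out to the 1/2 day used in this document's example scenario.

```python
from datetime import timedelta


def active_refresh(sig_expiration_time, dnskey_ttl):
    """MAX(1 hour, MIN(sigExpirationTime / 2, TTL / 2, 15 days))."""
    return max(
        timedelta(hours=1),
        min(sig_expiration_time / 2, dnskey_ttl / 2, timedelta(days=15)),
    )


def rem_wait_time(sig_expiration_time_remaining, sig_expiration_time,
                  dnskey_ttl, safety_margin=timedelta(0)):
    """Minimum time to keep publishing a revoked key.

    safety_margin defaults to zero only because this sketch has no
    deployment data to derive it from; real use needs a real value.
    """
    return (sig_expiration_time_remaining
            + active_refresh(sig_expiration_time, dnskey_ttl)
            + safety_margin)


# Usage with a 10 day signature validity and an assumed 1 day TTL.
wait = rem_wait_time(timedelta(days=10), timedelta(days=10),
                     timedelta(days=1))
print(wait)
```

With these inputs the result is 10 days and 12 hours, before any
safetyMargin or retryTime allowance is added.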

645	6.3.2.  Wall-Clock Based Calculation

647	   As before, the above equations are defined based upon how long to
648	   wait from a particular moment in time.  An alternative, but
649	   equivalent, method is to calculate the date and time before which it
650	   is unsafe to cease publishing a revoked key.  This calculation thus
651	   becomes:

653	      remWallClockTime = lastSigExpirationTime
654	                       + activeRefresh
655	                       + safetyMargin

657	      remWallClockTime = lastSigExpirationTime
658	                       + MAX(1 hour,
659	                             MIN((sigExpirationTime) / 2,
660	                                 MAX(TTL of K_old DNSKEY RRSet) / 2,
661	                                 15 days))
662	                       + safetyMargin

664	   where lastSigExpirationTime is the latest value of any
665	   sigExpirationTime for which RRSIGs were created that could
666	   potentially be replayed.
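
For code driven by clock-based event triggers, the wall-clock form
might be sketched as follows.  The function name, the example
expiration date, and the 1 day TTL are all hypothetical values chosen
for illustration, and safetyMargin is left at zero only because it
depends on deployment data not available here.

```python
from datetime import datetime, timedelta, timezone


def rem_wall_clock_time(last_sig_expiration, sig_expiration_time,
                        dnskey_ttl, safety_margin=timedelta(0)):
    """Earliest moment at which a revoked key may stop being published."""
    # activeRefresh = MAX(1 hour, MIN(sigExpirationTime / 2, TTL / 2, 15 days))
    active_refresh = max(
        timedelta(hours=1),
        min(sig_expiration_time / 2, dnskey_ttl / 2, timedelta(days=15)),
    )
    return last_sig_expiration + active_refresh + safety_margin


# Hypothetical: the last replayable RRSIG expires at this instant.
last_expiry = datetime(2018, 1, 11, 0, 0, tzinfo=timezone.utc)
earliest = rem_wall_clock_time(last_expiry, timedelta(days=10),
                               timedelta(days=1))
print(earliest.isoformat())
```

Using timezone-aware timestamps avoids ambiguity when the result is
compared against the current time on a host in another time zone.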

668	6.3.3.  Additional Considerations for RFC7583

670	   Note that our notion of remWaitTime is called "Irev" in
671	   Section 3.3.4.2 of [RFC7583].  The equation for Irev in RFC7583 is
672	   insecure as it does not include the sigExpirationTime listed above.
673	   The Irev equation in RFC7583 also does not include a safety margin,
674	   though that is an operational consideration.

676	6.3.4.  Example Scenario Calculations

678	   For the parameters listed in Section 5.1, the remWaitTime is:

680	     remWaitTime = 10
681	                   + 1 / 2          (days)

683	     remWaitTime = 10.5             (days)

685	   Note that the values in this example produce a length shorter
686	   than the recommended 30 days in RFC5011's Section 6.6, step 3.  Other
687	   values of sigExpirationTime and the original TTL of the K_old DNSKEY
688	   RRSet, however, can produce values longer than 30 days.

690	   Note that because revocation happens immediately, an attacker has a
691	   much harder job tricking an RFC5011 Resolver into leaving a trust
692	   anchor in place, as the attacker must successfully replay the old
693	   data for every query an RFC5011 Resolver sends, not just one.

695	7.  IANA Considerations

697	   This document contains no IANA considerations.

699	8.  Operational Considerations

701	   A companion document to RFC5011 was expected to be published that
702	   describes the best operational practice considerations from the
703	   perspective of a zone publisher and SEP Publisher.  However, this
704	   companion document has yet to be published.  The authors of this
705	   document hope that it will at some point in the future, as RFC5011
706	   timing can be tricky as we have shown, and a BCP is clearly
707	   warranted.  This document is intended only to fill a single
708	   operational void which, when left misunderstood, can result in
709	   serious security ramifications.  This document does not attempt to
710	   document any other missing operational guidance for zone publishers.

712	9.  Security Considerations

714	   This document is solely about the security considerations with
715	   respect to the SEP Publisher's ability to advertise new DNSKEYs via
716	   the RFC5011 automated trust anchor update process.  Thus the entire
717	   document is a discussion of Security Considerations when adding or
718	   removing DNSKEYs from trust anchor storage using the RFC5011 process.

720	   For simplicity, this document assumes that the SEP Publisher will use
721	   a consistent RRSIG validity period.  SEP Publishers that vary the
722	   length of RRSIG validity periods will need to adjust the
723	   sigExpirationTime value accordingly so that the equations in
724	   Section 6 and Section 6.3 use a value that coincides with the time
725	   after which a replay of older RRSIGs can no longer succeed.

727	10.  Acknowledgements

729	   The authors would like to especially thank Michael StJohns for his
730	   help and advice and the care and thought he put into RFC5011 itself
731	   and his continued reviews and suggestions for this document.  He also
732	   designed the math behind the suggested safetyMargin values
733	   in Section 6.1.8.

735	   We would also like to thank Bob Harold, Shane Kerr, Matthijs Mekking,
736	   Duane Wessels, Petr Spacek, Ed Lewis, and the dnsop working
737	   group who have assisted with this document.

739	11.  Normative References

741	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
742	              Requirement Levels", BCP 14, RFC 2119,
743	              DOI 10.17487/RFC2119, March 1997, <https://www.rfc-
744	              editor.org/info/rfc2119>.

746	   [RFC4033]  Arends, R., Austein, R., Larson, M., Massey, D., and S.
747	              Rose, "DNS Security Introduction and Requirements",
748	              RFC 4033, DOI 10.17487/RFC4033, March 2005,
749	              <https://www.rfc-editor.org/info/rfc4033>.

751	   [RFC5011]  StJohns, M., "Automated Updates of DNS Security (DNSSEC)
752	              Trust Anchors", STD 74, RFC 5011, DOI 10.17487/RFC5011,
753	              September 2007, <https://www.rfc-editor.org/info/rfc5011>.

755	   [RFC7583]  Morris, S., Ihren, J., Dickinson, J., and W. Mekking,
756	              "DNSSEC Key Rollover Timing Considerations", RFC 7583,
757	              DOI 10.17487/RFC7583, October 2015, <https://www.rfc-
758	              editor.org/info/rfc7583>.

760	   [RFC7719]  Hoffman, P., Sullivan, A., and K. Fujiwara, "DNS
761	              Terminology", RFC 7719, DOI 10.17487/RFC7719, December
762	              2015, <https://www.rfc-editor.org/info/rfc7719>.

764	Appendix A.  Real World Example: The 2017 Root KSK Key Roll

766	   In 2017 and 2018, ICANN expects to roll (or has rolled, depending on
767	   when you're reading this) the key signing key (KSK) for the root zone.
768	   The relevant parameters associated with the root zone at the time of
769	   this writing are as follows:

771	         addHoldDownTime:                      30 days
772	         Old DNSKEY sigExpirationTime:         21 days
773	         Old DNSKEY TTL:                        2 days

775	   Thus, sticking this information into the equation in
776	   Section 6 yields (in days from publication time):

778	     addWaitTime = 30
779	                   + 21
780	                   + MAX(1 hour,
781	                         MIN(21 / 2,     # activeRefresh
782	                             MAX(2) / 2,
783	                             15 days)
784	                         )
785	                   + 30 % activeRefresh

787	     addWaitTime = 30 + 21
788	                   + MAX(1 hour, MIN(10.5, 1, 15))
789	                   + 30 % activeRefresh

791	     addWaitTime = 30 + 21 + 1 + 30%1

793	     addWaitTime = 30 + 21 + 1 + 0

795	     addWaitTime = 52 days

797	   Note that activeRefreshOffset ends up being 0, since 30 days is
798	   evenly divisible by activeRefresh (1 day).

800	   Also note that we exclude the safetyMargin value, which is calculated
801	   based on the expected client deployment size.
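
The arithmetic above can be checked with a short sketch.  The
variable names are hypothetical; the values are the root zone
parameters listed in this appendix, and safetyMargin is omitted as in
the text.

```python
from datetime import timedelta

# Root zone parameters from this appendix.
add_hold_down_time = timedelta(days=30)
sig_expiration_time = timedelta(days=21)
dnskey_ttl = timedelta(days=2)

# activeRefresh = MAX(1 hour, MIN(sigExpirationTime / 2, TTL / 2, 15 days))
active_refresh = max(
    timedelta(hours=1),
    min(sig_expiration_time / 2, dnskey_ttl / 2, timedelta(days=15)),
)

# activeRefreshOffset = addHoldDownTime % activeRefresh
active_refresh_offset = add_hold_down_time % active_refresh

# safetyMargin is deliberately left out, matching the text above.
add_wait_time = (add_hold_down_time
                 + sig_expiration_time
                 + active_refresh
                 + active_refresh_offset)

print(active_refresh, active_refresh_offset, add_wait_time)
```

Here the TTL term (1 day) is the smallest of the three MIN arguments,
so it alone determines activeRefresh.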

803	   Thus, ICANN must wait a minimum of 52 days before switching to the
804	   newly published KSK (and 26 days before removing the old revoked key
805	   once it is published as revoked).  ICANN's current plans involve
806	   waiting over 3 months before using the new key and 69 days before
807	   removing the old, revoked key.  Thus, their current rollover plans
808	   are sufficiently secure from the attack discussed in this memo.

810	Authors' Addresses

812	   Wes Hardaker
813	   USC/ISI
814	   P.O. Box 382
815	   Davis, CA  95617
816	   US

818	   Email: ietf@hardakers.net
819	   Warren Kumari
820	   Google
821	   1600 Amphitheatre Parkway
822	   Mountain View, CA  94043
823	   US

825	   Email: warren@kumari.net