idnits 2.17.1 

draft-ietf-dnsop-rfc5011-security-considerations-11.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  -- The draft header indicates that this document updates RFC7583, but the
     abstract doesn't seem to mention this, which it should.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (February 01, 2018) is 2275 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  ** Downref: Normative reference to an Informational RFC: RFC 7583

  ** Obsolete normative reference: RFC 7719 (Obsoleted by RFC 8499)


     Summary: 2 errors (**), 0 flaws (~~), 1 warning (==), 2 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	dnsop                                                        W. Hardaker
3	Internet-Draft                                                   USC/ISI
4	Updates: 7583 (if approved)                                    W. Kumari
5	Intended status: Standards Track                                  Google
6	Expires: August 5, 2018                                February 01, 2018

8	             Security Considerations for RFC5011 Publishers
9	          draft-ietf-dnsop-rfc5011-security-considerations-11

11	Abstract

13	   This document extends the RFC5011 rollover strategy with timing
14	   advice that must be followed by the publisher in order to maintain
15	   security.  Specifically, this document describes the math behind the
16	   minimum time-length that a DNS zone publisher must wait before
17	   signing exclusively with recently added DNSKEYs.  This document also
18	   describes the minimum time-length that a DNS zone publisher must wait
19	   after publishing a revoked DNSKEY before assuming that all active
20	   RFC5011 resolvers should have seen the revocation-marked key and
21	   removed it from their list of trust anchors.

23	   This document contains much math and complicated equations, but the
24	   summary is that the key rollover / revocation time is much longer
25	   than intuition would suggest.  If you are not both publishing a
26	   DNSSEC DNSKEY, and using RFC5011 to advertise this DNSKEY as a new
27	   Secure Entry Point key for use as a trust anchor, you probably don't
28	   need to read this document.

30	Status of This Memo

32	   This Internet-Draft is submitted in full conformance with the
33	   provisions of BCP 78 and BCP 79.

35	   Internet-Drafts are working documents of the Internet Engineering
36	   Task Force (IETF).  Note that other groups may also distribute
37	   working documents as Internet-Drafts.  The list of current Internet-
38	   Drafts is at http://datatracker.ietf.org/drafts/current/.

40	   Internet-Drafts are draft documents valid for a maximum of six months
41	   and may be updated, replaced, or obsoleted by other documents at any
42	   time.  It is inappropriate to use Internet-Drafts as reference
43	   material or to cite them other than as "work in progress."

45	   This Internet-Draft will expire on August 5, 2018.

47	Copyright Notice

49	   Copyright (c) 2018 IETF Trust and the persons identified as the
50	   document authors.  All rights reserved.

52	   This document is subject to BCP 78 and the IETF Trust's Legal
53	   Provisions Relating to IETF Documents
54	   (http://trustee.ietf.org/license-info) in effect on the date of
55	   publication of this document.  Please review these documents
56	   carefully, as they describe your rights and restrictions with respect
57	   to this document.  Code Components extracted from this document must
58	   include Simplified BSD License text as described in Section 4.e of
59	   the Trust Legal Provisions and are provided without warranty as
60	   described in the Simplified BSD License.

62	Table of Contents

64	   1.  Introduction
65	     1.1.  Document History and Motivation
66	     1.2.  Safely Rolling the Root Zone's KSK in 2017/2018
67	     1.3.  Requirements notation
68	   2.  Background
69	   3.  Terminology
70	   4.  Timing Associated with RFC5011 Processing
71	     4.1.  Timing Associated with Publication
72	     4.2.  Timing Associated with Revocation
73	   5.  Denial of Service Attack Walkthrough
74	     5.1.  Enumerated Attack Example
75	       5.1.1.  Attack Timing Breakdown
76	   6.  Minimum RFC5011 Timing Requirements
77	     6.1.  Equation Components
78	       6.1.1.  addHoldDownTime
79	       6.1.2.  lastSigExpirationTime
80	       6.1.3.  sigExpirationTime
81	       6.1.4.  sigExpirationTimeRemaining
82	       6.1.5.  activeRefresh
83	       6.1.6.  timingSafetyMargin
84	       6.1.7.  retrySafetyMargin
85	     6.2.  Timing Requirements For Adding a New KSK
86	       6.2.1.  Wait Timer Based Calculation
87	       6.2.2.  Wall-Clock Based Calculation
88	       6.2.3.  Timing Constraint Summary
89	       6.2.4.  Additional Considerations for RFC7583
90	       6.2.5.  Example Scenario Calculations
91	     6.3.  Timing Requirements For Revoking an Old KSK
92	       6.3.1.  Wait Timer Based Calculation
93	       6.3.2.  Wall-Clock Based Calculation
94	       6.3.3.  Additional Considerations for RFC7583
95	       6.3.4.  Example Scenario Calculations
96	   7.  IANA Considerations
97	   8.  Operational Considerations
98	   9.  Security Considerations
99	   10. Acknowledgements
100	   11. Normative References
101	   Appendix A.  Real World Example: The 2017 Root KSK Key Roll
102	   Authors' Addresses

104	1.  Introduction

106	   [RFC5011] defines a mechanism by which DNSSEC validators can update
107	   their list of trust anchors when they've seen a new key published in
108	   a zone or revoke a properly marked key from a trust anchor list.
109	   However, RFC5011 [intentionally] provides no guidance to the
110	   publishers of DNSKEYs about how long they must wait before switching
111	   to exclusively using recently published keys for signing records, or
112	   how long they must wait before ceasing publication of a revoked key.
113	   Because of this lack of guidance, zone publishers may derive
114	   incorrect assumptions about safe usage of the RFC5011 DNSKEY
115	   advertising, rolling and revocation process.  This document describes
116	   the minimum security requirements from a publisher's point of view
117	   and is intended to complement the guidance offered in RFC5011 (which
118	   is written to provide timing guidance solely from a Validating
119	   Resolver's point of view).

121	   To explain the RFC5011 security analysis in this document better,
122	   Section 5 first describes an attack on a zone publisher.  Then in
123	   Section 6.1 we break down each of the timing components that will be
124	   later used to define timing requirements for adding keys in
125	   Section 6.2 and revoking keys in Section 6.3.

127	1.1.  Document History and Motivation

129	   To verify that this lack of understanding is widespread, the authors
130	   reached out to 5 DNSSEC experts to ask them how long they thought
131	   they must wait before signing a zone exclusively with a new KSK
132	   [RFC4033] that was being introduced according to the 5011 process.
133	   All 5 experts answered with an insecure value, and we determined that
134	   this lack of mathematical understanding might cause security concerns
135	   in deployment.  We hope that this companion document to RFC5011 will
136	   rectify this misunderstanding and provide better guidance to zone
137	   publishers that wish to make use of the RFC5011 rollover process.

139	1.2.  Safely Rolling the Root Zone's KSK in 2017/2018

141	   One important note about ICANN's (currently in process) 2017/2018 KSK
142	   rollover plan for the root zone: the timing values chosen for rolling
143	   the KSK in the root zone appear completely safe, and are not affected
144	   by the timing concerns introduced by this draft.

146	1.3.  Requirements notation

148	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
149	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
150	   document are to be interpreted as described in [RFC2119].

152	2.  Background

154	   RFC5011 describes a process by which an RFC5011 Resolver
155	   may accept a newly published KSK as a trust anchor for validating
156	   future DNSSEC signed records.  It also describes the process for
157	   publicly revoking a published KSK.  This document augments that
158	   information with additional constraints, from the SEP Publisher's
159	   point of view.  Note that this document does not define any other
160	   operational guidance or recommendations about the RFC5011 process and
161	   restricts itself to solely the security and operational ramifications
162	   of switching to exclusively using recently added keys or removing
163	   revoked keys too soon.

165	   Failure of a DNSKEY publisher to follow the minimum recommendations
166	   associated with this draft can result in potential denial-of-service
167	   attack opportunities against validating resolvers.  Failure of a
168	   DNSKEY publisher to publish a revoked key for a long enough period of
169	   time may result in RFC5011 Resolvers leaving that key in their trust
170	   anchor storage beyond the key's expected lifetime.

172	3.  Terminology

174	   SEP Publisher  The entity responsible for publishing a DNSKEY (with
175	      the Secure Entry Point (SEP) bit set) that can be used as a trust
176	      anchor.

178	   Zone Signer  The owner of a zone intending to publish a new Key-
179	      Signing-Key (KSK) that may become a trust anchor for validators
180	      following the RFC5011 process.

182	   RFC5011 Resolver  A DNSSEC Resolver that is using the RFC5011
183	      processes to track and update trust anchors.

185	   Attacker  An entity intent on foiling the RFC5011 Resolver's ability
186	      to successfully adopt the Zone Signer's new DNSKEY as a new trust
187	      anchor or to prevent the RFC5011 Resolver from removing an old
188	      DNSKEY from its list of trust anchors.

190	   sigExpirationTime  The amount of time between the DNSKEY RRSIG's
191	      Signature Inception field and the Signature Expiration field.

193	   Also see Section 2 of [RFC4033] and [RFC7719] for additional
194	   terminology.

196	4.  Timing Associated with RFC5011 Processing

198	   These sections define a high-level overview of [RFC5011] processing.
199	   These steps are not sufficient for proper RFC5011 implementation, but
200	   provide enough background for the reader to follow the discussion in
201	   this document.  Readers need to fully understand [RFC5011] as well to
202	   fully comprehend the content and importance of this document.

204	4.1.  Timing Associated with Publication

206	   RFC5011's process of safely publishing a new DNSKEY and then assuming
207	   RFC5011 Resolvers have adopted it for trust falls into a number of
208	   high-level steps to be performed by the SEP Publisher.  This document
209	   discusses the following scenario, which is the principal way RFC5011
210	   is currently being used (even though Section 6 of RFC5011 suggests
211	   having a stand-by key available):

213	   1.  Publish a new DNSKEY in a zone, but continue to sign the zone
214	       with the old one.

216	   2.  Wait a period of time.

218	   3.  Begin to exclusively use recently published DNSKEYs to sign the
219	       appropriate resource records.

221	   This document discusses the time required to wait during step 2 of
222	   the above process.  Some interpretations of RFC5011 have erroneously
223	   determined that the wait time is equal to RFC5011's "hold down time".
224	   Section 5 describes an attack based on this (common) erroneous
225	   belief, which can result in a denial of service attack against the
226	   zone.

228	4.2.  Timing Associated with Revocation

230	   RFC5011's process of advertising that an old key is to be revoked
231	   from RFC5011 Resolvers falls into a number of high-level steps:

233	   1.  Set the revoke bit on the DNSKEY to be revoked.

235	   2.  Sign the revoked DNSKEY with itself.

237	   3.  Wait a period of time.

239	   4.  Remove the revoked key from the zone.

241	   This document discusses the time required to wait in step 3 of the
242	   above process.  Some interpretations of RFC5011 have erroneously
243	   determined that the wait time is equal to RFC5011's "hold down time".
244	   This document describes an attack based on this (common) erroneous
245	   belief, which results in a revoked DNSKEY potentially remaining as a
246	   trust anchor in a RFC5011 Resolver long past its expected usage.

248	5.  Denial of Service Attack Walkthrough

250	   This section serves as an illustrative example of the problem being
251	   discussed in this document.  Note that in order to keep the example
252	   simple enough to understand, some simplifications were made (such as
253	   by not creating a set of pre-signed RRSIGs and by not using values
254	   that result in the addHoldDownTime not being evenly divisible by the
255	   activeRefresh value); the mathematical formulas in Section 6 are,
256	   however, complete.

258	   If an attacker is able to provide a RFC5011 Resolver with past
259	   responses, such as when it is in-path or able to perform any number
260	   of cache poisoning attacks, the attacker may be able to leave
261	   compliant RFC5011 Resolvers without an appropriate DNSKEY trust
262	   anchor.  This scenario will remain until an administrator manually
263	   fixes the situation.

265	   The time-line below illustrates an example of this situation.

267	5.1.  Enumerated Attack Example

269	   The following example settings are used in the example scenario
270	   within this section:

272	   TTL (all records)  1 day

274	   sigExpirationTime  10 days

276	   Zone resigned every  1 day

278	   Given these settings, the sequence of events in Section 5.1.1 depicts
279	   how a SEP Publisher that waits for only the RFC5011 hold-down timer
280	   length of 30 days subjects its users to a potential Denial of Service
281	   attack.  The timing schedule listed below is based on a SEP Publisher
282	   publishing a new Key Signing Key (KSK), with the intent that it will
283	   later be used as a trust anchor.  We label this publication time as
284	   "T+0".  All numbers in this sequence refer to days before and after
285	   this initial publication event.  Thus, T-1 is the day before the
286	   introduction of the new key, and T+15 is the 15th day after the key
287	   was introduced into the fictitious zone being discussed.

289	   In this dialog, we consider two keys within the example zone:

291	   K_old:  An older KSK and Trust Anchor being replaced.

293	   K_new:  A new KSK being transitioned into active use and expected to
294	      become a Trust Anchor via the RFC5011 automated trust anchor
295	      update process.

297	5.1.1.  Attack Timing Breakdown

299	   The steps below show an attack that foils the adoption of a new
300	   DNSKEY by an RFC5011 Resolver when the SEP Publisher starts signing
301	   and publishing with the new DNSKEY too quickly.

303	   T-1  The K_old based RRSIGs are being published by the Zone Signer.
304	      [It may also be signing ZSKs as well, but they are not relevant to
305	      this event so we will not talk further about them; we are only
306	      considering the RRSIGs that cover the DNSKEYs in this document.]
307	      The Attacker queries for, retrieves and caches this DNSKEY set and
308	      corresponding RRSIG signatures.

310	   T+0  The Zone Signer adds K_new to their zone and signs the zone's
311	      key set with K_old.  The RFC5011 Resolver (later to be under
312	      attack) retrieves this new key set and corresponding RRSIGs and
313	      notices the publication of K_new.  The RFC5011 Resolver starts the
314	      (30-day) hold-down timer for K_new.  [Note that in a more real-
315	      world scenario there will likely be a further delay between the
316	      point where the Zone Signer publishes a new RRSIG and the RFC5011
317	      Resolver notices its publication; though not shown in this
318	      example, this delay is accounted for in the equation in Section 6
319	      below]

321	   T+5  The RFC5011 Resolver queries for the zone's keyset per the
322	      RFC5011 Active Refresh schedule, discussed in Section 2.3 of
323	      RFC5011.  Instead of receiving the intended published keyset, the
324	      Attacker successfully replays the keyset and associated signatures
325	      recorded at T-1 to the victim RFC5011 Resolver.  Because the
326	      signature lifetime is 10 days (in this example), the replayed
327	      signature and keyset are accepted as valid (being only 6 days old,
328	      which is less than sigExpirationTime) and the RFC5011 Resolver
329	      cancels the (30-day) hold-down timer for K_new, per the RFC5011
330	      algorithm.

332	   T+10  The RFC5011 Resolver queries for the zone's keyset and
333	      discovers a signed keyset that includes K_new (again), and is
334	      signed by K_old.  Note: the attacker is unable to replay the
335	      records cached at T-1, because the signatures have now expired.
336	      Thus at T+10, the RFC5011 Resolver starts (anew) the hold-down
337	      timer for K_new.

339	   T+11 through T+29  The RFC5011 Resolver continues checking the zone's
340	      key set at the prescribed regular intervals.  During this period,
341	      the attacker can no longer replay traffic to their benefit.

343	   T+30  The Zone Signer knows that this is the first time at which some
344	      validators might accept K_new as a new trust anchor, since the
345	      hold-down timer of a RFC5011 Resolver not under attack that had
346	      queried and retrieved K_new at T+0 would now have reached 30 days.
347	      However, the hold-down timer of our attacked RFC5011 Resolver is
348	      only at 20 days.

350	   T+35  The Zone Signer (mistakenly) believes that all validators
351	      following the Active Refresh schedule (Section 2.3 of RFC5011)
352	      should have accepted K_new as the new trust anchor (since the
353	      hold down time (30 days) + the query interval [which is just 1/2
354	      the signature validity period in this example] would have passed).
355	      However, the hold-down timer of our attacked RFC5011 Resolver is
356	      only at 25 days (T+35 minus T+10); thus the RFC5011 Resolver won't
357	      consider it a valid trust anchor addition yet, as the required 30
358	      days have not yet elapsed.

360	   T+36  The Zone Signer, believing K_new is safe to use, switches their
361	      active signing KSK to K_new and publishes a new RRSIG, signed with
362	      (only) K_new, covering the DNSKEY set.  Non-attacked RFC5011
363	      validators, with a hold-down timer of at least 30 days, would have
364	      accepted K_new into their set of trusted keys.  But, because our
365	      attacked RFC5011 Resolver now has a hold-down timer for K_new of
366	      only 26 days, it failed to ever accept K_new as a trust anchor.
367	      Since K_old is no longer being used to sign the zone's DNSKEYs,
368	      all the DNSKEY records from the zone will be treated as invalid.
369	      Subsequently, all of the records in the DNS tree below the zone's
370	      apex will be deemed invalid by DNSSEC.
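
   The timeline above can be replayed as a small simulation.  This is
   only an illustrative sketch: the function and constant names are
   hypothetical, the resolver is assumed to query every 5 days (as in
   the walkthrough), and the captured signature is assumed to be valid
   for sigExpirationTime starting at T-1.

```python
ADD_HOLD_DOWN_DAYS = 30    # RFC5011 add hold-down time used in the example
SIG_EXPIRATION_DAYS = 10   # sigExpirationTime from Section 5.1
CAPTURE_DAY = -1           # Attacker records the K_old-only keyset at T-1
SWITCH_DAY = 36            # Zone Signer signs only with K_new from T+36


def accepted_on(query_days, replay_days):
    """Return the day K_new becomes a trust anchor, or None if it never does."""
    timer_start = None
    for day in sorted(query_days):
        if day >= SWITCH_DAY:
            # The keyset is now signed only by K_new, which this resolver
            # does not yet trust, so it can no longer validate the keyset.
            return None
        # Assumption: the captured RRSIG is valid for SIG_EXPIRATION_DAYS
        # starting on CAPTURE_DAY, so it can be replayed through T+8.
        replayable = day < CAPTURE_DAY + SIG_EXPIRATION_DAYS
        if day in replay_days and replayable:
            timer_start = None      # K_new appears removed: timer cancelled
        elif timer_start is None:
            timer_start = day       # K_new seen: hold-down timer (re)starts
        elif day - timer_start >= ADD_HOLD_DOWN_DAYS:
            return day              # hold-down satisfied: K_new is trusted
    return None


queries = range(0, 41, 5)                         # one query every 5 days
print(accepted_on(queries, replay_days=set()))    # resolver not under attack
print(accepted_on(queries, replay_days={5}))      # keyset replayed at T+5
```

   Under these assumptions the unattacked resolver accepts K_new at
   T+30, while the attacked resolver never accepts it because the
   switch at T+36 arrives before its restarted timer completes.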

372	6.  Minimum RFC5011 Timing Requirements

374	   This section defines the minimum timing requirements for making
375	   exclusive use of newly added DNSKEYs and timing requirements for
376	   ceasing the publication of DNSKEYs to be revoked.  We break our
377	   timing solution requirements into two primary components: the
378	   mathematically-based security analysis of the RFC5011 publication
379	   process itself, and an extension of this that takes operational
380	   realities into account that further affect the recommended timings.

382	   First, we define the term components used in all equations in
383	   Section 6.1.

385	6.1.  Equation Components

387	6.1.1.  addHoldDownTime

389	   The addHoldDownTime is defined in Section 2.4.1 of [RFC5011] as:

391	       The add hold-down time is 30 days or the expiration time of the
392	       original TTL of the first trust point DNSKEY RRSet that contained
393	       the new key, whichever is greater.  This ensures that at least
394	       two validated DNSKEY RRSets that contain the new key MUST be seen
395	       by the resolver prior to the key's acceptance.

397	6.1.2.  lastSigExpirationTime

399	   The latest value (i.e., the furthest-future date and time) of any
400	   RRSIG Signature Expiration field covering any DNSKEY RRSet containing
401	   only the old trust anchor(s) that are being superseded.  Note that for
402	   organizations pre-creating signatures this time may be fairly far in
403	   the future unless they can be significantly assured that none of
404	   their pre-generated signatures can be replayed at a later date.

406	6.1.3.  sigExpirationTime

408	   The amount of time between the DNSKEY RRSIG's Signature Inception
409	   field and the Signature Expiration field.

411	6.1.4.  sigExpirationTimeRemaining

413	   sigExpirationTimeRemaining is defined in Section 3.

415	6.1.5.  activeRefresh

417	   The activeRefresh time is defined in Section 2.3 of RFC5011 as:

419	     A resolver that has been configured for an automatic update
420	     of keys from a particular trust point MUST query that trust
421	     point (e.g., do a lookup for the DNSKEY RRSet and related
422	     RRSIG records) no less often than the lesser of 15 days, half
423	     the original TTL for the DNSKEY RRSet, or half the RRSIG
424	     expiration interval and no more often than once per hour.

426	   This translates to:

428	    activeRefresh = MAX(1 hour,
429	                        MIN(sigExpirationTime / 2,
430	                            MAX(TTL of K_old DNSKEY RRSet) / 2,
431	                            15 days)
432	                        )
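
   As a rough sketch, the expression above maps directly onto Python's
   datetime.timedelta arithmetic.  The function and parameter names
   are hypothetical, and the input values are illustrative only.

```python
from datetime import timedelta


def active_refresh(sig_expiration_time, max_ttl):
    """activeRefresh = MAX(1 hour, MIN(sigExpirationTime / 2, TTL / 2, 15 days))."""
    return max(
        timedelta(hours=1),
        min(sig_expiration_time / 2, max_ttl / 2, timedelta(days=15)),
    )


# Illustrative inputs: 14-day signatures and a 2-day DNSKEY TTL.
print(active_refresh(timedelta(days=14), timedelta(days=2)))   # 1 day, 0:00:00
```

   The outer MAX keeps very short TTLs from driving the query interval
   below one hour, and the 15-day term caps it for long-lived
   signatures and TTLs.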

434	6.1.6.  timingSafetyMargin

436	   It is easy to assume that the period of time that SEP Publishers
437	   must wait after making changes to SEP-marked DNSKEY sets is based
438	   entirely on the length of the addHoldDownTime.  Unfortunately,
439	   analysis shows that both the design of the RFC5011 protocol and the
440	   operational realities of deploying it require waiting an additional
441	   period of time.  Section 6.1.6.1 through Section 6.1.6.3 below
442	   discuss three sources of additional delay.  In the end, we pick the
443	   largest of these delays as the minimum additional time that the SEP
444	   Publisher must wait; this is our final timingSafetyMargin value,
445	   which we define in Section 6.1.6.4.

447	6.1.6.1.  activeRefreshOffset

449	   Security analysis of the timing associated with the query rate of
450	   RFC5011 Resolvers shows that it may not perfectly align with the
451	   addHoldDownTime when the addHoldDownTime is not evenly divisible by
452	   the activeRefresh time.  Consider the example of a zone with an
453	   activeRefresh period of 7 days.  If an associated RFC5011 Resolver
454	   started its holdDown timer just after the SEP Publisher published a
455	   new DNSKEY (at time T), the resolver would send checking queries at
456	   T+7, T+14, T+21 and T+28 days and would finally accept the key at
457	   T+35 days, which is 5 days longer than the 30-day addHoldDownTime.

459	   The activeRefreshOffset term defines this time difference and
460	   becomes:

462	    activeRefreshOffset = addHoldDownTime % activeRefresh

464	   The % symbol denotes the mathematical mod operator (calculating the
465	   remainder in a division problem).  This will frequently be zero, but
466	   can be nearly as large as activeRefresh itself.
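
   The 7-day example can be checked by stepping through the query
   schedule.  This is a minimal sketch with a hypothetical helper
   name; it assumes queries occur exactly every activeRefresh days
   starting from the moment the key was first seen.

```python
def first_query_at_or_after(hold_down_days, refresh_days):
    """Return the day of the first scheduled query on or after the hold-down."""
    day = 0
    while day < hold_down_days:
        day += refresh_days
    return day


accept_day = first_query_at_or_after(30, 7)
print(accept_day)          # 35
print(accept_day - 30)     # 5 extra days beyond the addHoldDownTime
print(30 % 7)              # 2
```

   Note that for these inputs the modulo expression evaluates to 2
   days while the stepped schedule shows 5 extra days (7 - 2).  Both
   are strictly smaller than activeRefresh, which is the bound that
   Section 6.1.6.4 ultimately uses.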

468	6.1.6.2.  clockskewDriftMargin

470	   Even small clock drifts can have negative impacts upon the timing of
471	   the RFC5011 Resolver's measurements.  Consider the simplest case
472	   where the RFC5011 Resolver's clock shifts over time to be 2 seconds
473	   slower near the end of the RFC5011 Resolver's addHoldDownTime period.
474	   That is, suppose the RFC5011 Resolver first noticed a new DNSKEY at:

476	             firstSeen = sigExpirationTime + activeRefresh + 1 second

478	   A 2-second clock drift between the SEP Publisher and the
479	   RFC5011 Resolver may result in the RFC5011 Resolver querying again
480	   at:

482	             justBefore = sigExpirationTime + addHoldDownTime +
483	                          activeRefresh + 1 second - 2 seconds

485	             which becomes:

487	             justBefore = sigExpirationTime + addHoldDownTime +
488	                          activeRefresh - 1 second

490	   As a result, the addHoldDownTime will not have been reached from
491	   the perspective of the RFC5011 Resolver, but it will have been
492	   reached from the perspective of the SEP Publisher.  The net effect is
493	   that it may take one additional activeRefresh period for this
494	   RFC5011 Resolver to accept the new key (at sigExpirationTime +
495	   addHoldDownTime + 2 * activeRefresh - 1 second).

497	   We note that even the smallest clockskew errors can require waiting
498	   an additional activeRefresh period, and thus define the
499	   clockskewDriftMargin as:

501	       clockskewDriftMargin = activeRefresh
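
   The effect of a small drift can be illustrated with a simplified
   model.  In this sketch (hypothetical names, illustrative values)
   the resolver checks every activeRefresh and measures elapsed time
   with a clock that has fallen behind by a fixed amount; a real
   clock would drift gradually.

```python
from datetime import timedelta


def real_time_until_accept(add_hold_down, refresh, drift):
    """Real elapsed time until the resolver believes the hold-down has passed."""
    elapsed = timedelta(0)
    # The resolver's own measurement lags real time by `drift`.
    while elapsed - drift < add_hold_down:
        elapsed += refresh
    return elapsed


hold = timedelta(days=30)
refresh = timedelta(days=15)
print(real_time_until_accept(hold, refresh, timedelta(0)))           # 30 days
print(real_time_until_accept(hold, refresh, timedelta(seconds=2)))   # 45 days
```

   With a 15-day activeRefresh, a 2-second lag makes the check at day
   30 fall just short of the hold-down, so acceptance slips by one
   full activeRefresh period.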

503	6.1.6.3.  retryDriftMargin

505	   Drift associated with a lost transmission and an accompanying re-
506	   transmission (see Section 2.3 of [RFC5011]) will cause RFC5011
507	   Resolvers to also change the timing associated with query times such
508	   that it becomes impossible to predict, from the perspective of the
509	   SEP Publisher, when the final important measurement query will
510	   arrive.  Similarly, any software that restarts/reboots without saving
511	   next-query timing state may also commence with a new random starting
512	   time.  Thus, an additional activeRefresh is needed to handle both
513	   these cases as well.

515	             retryDriftMargin = activeRefresh

517	   Note that we account for additional time associated with cumulative
518	   multiple retries, especially under high-loss conditions, in
519	   Section 6.1.6.4.

521	6.1.6.4.  timingSafetyMargin Value

523	   The activeRefreshOffset, clockskewDriftMargin, and retryDriftMargin
524	   parameters all deal with additional wait periods that must be
525	   accounted for after analyzing the conditions under which the client
526	   will take longer than expected to make its last query while waiting
527	   for the addHoldDownTime period to pass.  These values may be merged
528	   into a single term by waiting the longest of any of them.  We define
529	   timingSafetyMargin as this "worst case" value:

531	        timingSafetyMargin = MAX(activeRefreshOffset,
532	                                 clockskewDriftMargin,
533	                                 retryDriftMargin)

535	        timingSafetyMargin = MAX(addHoldDownTime % activeRefresh,
536	                                 activeRefresh,
537	                                 activeRefresh)

539	        timingSafetyMargin = activeRefresh

541	6.1.7.  retrySafetyMargin

543	   The retrySafetyMargin is an extra period of time to account for
544	   caching, network delays, dropped packets, and other operational
545	   concerns otherwise beyond the scope of this document.  The value
546	   operators should choose is highly dependent on the deployment
547	   situation associated with their zone.  Note that no value of
548	   retrySafetyMargin can protect against resolvers that are "down".
549	   Nonetheless, we offer the following as one method for determining
550	   a reasonable value.

552	   The following variables need to be considered when selecting
553	   an appropriate retrySafetyMargin value:

555	   successRate:  A likely success rate for client queries and retries

557	   numResolvers:  The number of client RFC5011 Resolvers

559	   Note that RFC5011 defines retryTime as:

561	         If the query fails, the resolver MUST repeat the query until
562	         satisfied no more often than once an hour and no less often
563	         than the lesser of 1 day, 10% of the original TTL, or 10% of
564	         the original expiration interval.  That is,
565	         retryTime = MAX (1 hour, MIN (1 day, .1 * origTTL,
566	                                       .1 * expireInterval)).
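
   The quoted retryTime expression can be sketched in Python as
   follows.  The function and parameter names are hypothetical and
   the inputs are illustrative.

```python
from datetime import timedelta


def retry_time(orig_ttl, expire_interval):
    """retryTime = MAX(1 hour, MIN(1 day, 0.1 * origTTL, 0.1 * expireInterval))."""
    return max(
        timedelta(hours=1),
        min(timedelta(days=1), orig_ttl / 10, expire_interval / 10),
    )


# Illustrative inputs: a 2-day TTL and a 10-day signature validity interval.
print(retry_time(timedelta(days=2), timedelta(days=10)))   # 4:48:00
```

   Dividing by 10 rather than multiplying by 0.1 keeps the timedelta
   arithmetic exact.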

568	   With the successRate and numResolvers values selected and the
569	   definition of retryTime from RFC5011, one method for determining how
570	   many retryTime intervals to wait until the expected number of
571	   resolvers that have not yet succeeded falls to at most one is thus:

573	                         x = (1/(1 - successRate))

575	            retryCountWait = Log_base_x(numResolvers)

577	   To reduce the need for readers to pull out a scientific calculator,
578	   we offer the following lookup table based on successRate and
579	   numResolvers:

581	                          retryCountWait lookup table
582	                        ---------------------------

584	                       Number of client RFC5011 Resolvers (numResolvers)
585	                       -------------------------------------------------
586	                        10,000  100,000 1,000,000 10,000,000 100,000,000
587	                 0.01      917     1146      1375       1604        1833
588	   Probability   0.05      180      225       270        315         360
589	   of Success    0.10       88      110       132        153         175
590	   Per Retry     0.15       57       71        86        100         114
591	   Interval      0.25       33       41        49         57          65
592	   (successRate) 0.50       14       17        20         24          27
593	                 0.90        4        5         6          7           8
594	                 0.95        4        4         5          6           7
595	                 0.99        2        3         3          4           4
596	                 0.999       2        2         2          3           3
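
   For values not covered by the table, the retryCountWait formula can
   be evaluated directly.  The following Python sketch is one possible
   way to do so; rounding up to a whole number of retry intervals is an
   assumption that appears to match the table entries, and the small
   tolerance term exists only to absorb floating-point noise.

```python
import math


def retry_count_wait(success_rate: float, num_resolvers: int) -> int:
    """Log_base_x(numResolvers) with x = 1 / (1 - successRate), rounded up."""
    if not 0.0 < success_rate < 1.0:
        raise ValueError("success_rate must be strictly between 0 and 1")
    # -log1p(-p) equals ln(1 / (1 - p)), i.e. ln(x).
    intervals = math.log(num_resolvers) / -math.log1p(-success_rate)
    # Subtract a tiny tolerance so exact powers (0.90 with 10,000 resolvers)
    # are not pushed up a whole interval by rounding error.
    return math.ceil(intervals - 1e-9)


# Print the 0.10, 0.50 and 0.90 rows for 10^4 through 10^8 resolvers.
for rate in (0.10, 0.50, 0.90):
    print(rate, [retry_count_wait(rate, 10 ** k) for k in range(4, 9)])
```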

598	   Finally, a suggested value of retrySafetyMargin can then be this
599	   retryCountWait number multiplied by the retryTime from RFC5011:

601	                 retrySafetyMargin = retryCountWait * retryTime

603	6.2.  Timing Requirements For Adding a New KSK

605	   Given the defined parameters and analysis from Section 6.1, we can
606	   now create a method for calculating the amount of time to wait until
607	   it is safe to start signing exclusively with a new DNSKEY (especially
608	   useful for writing code involving sleep based timers) in
609	   Section 6.2.1, and define a method for calculating a wall-clock value
610	   after which it is safe to start signing exclusively with a new DNSKEY
611	   (especially useful for writing code based on clock-based event
612	   triggers) in Section 6.2.2.

614	6.2.1.  Wait Timer Based Calculation

616	   Given the attack description in Section 5, the correct minimum length
617	   of time required for the Zone Signer to wait after publishing K_new
618	   but before exclusively using it and newer keys is:

620	      addWaitTime = addHoldDownTime
621	                    + sigExpirationTimeRemaining
622	                    + activeRefresh
623	                    + timingSafetyMargin
624	                    + retrySafetyMargin

626	6.2.1.1.  Fully expanded equation

628	   Given the equation components defined in Section 6.1, the full
629	   expanded equation is:

631	      addWaitTime = addHoldDownTime
632	                    + sigExpirationTimeRemaining
633	                    + 2 * MAX(1 hour,
634	                          MIN(sigExpirationTime / 2,
635	                              MAX(TTL of K_old DNSKEY RRSet) / 2,
636	                              15 days)
637	                          )
638	                    + retrySafetyMargin
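
   The expanded equation can be sketched in Python as follows.  This is
   a non-authoritative illustration: the function and parameter names
   are invented for the sketch, sigExpirationTimeRemaining is assumed to
   default to the full sigExpirationTime (the worst case), and
   retrySafetyMargin defaults to zero because it depends on deployment
   parameters.  The example values are the root zone parameters given
   in Appendix A.

```python
from datetime import timedelta
from typing import Optional


def active_refresh(sig_expiration_time: timedelta,
                   max_dnskey_ttl: timedelta) -> timedelta:
    """MAX(1 hour, MIN(sigExpirationTime / 2, MAX(TTL) / 2, 15 days))."""
    return max(timedelta(hours=1),
               min(sig_expiration_time / 2,
                   max_dnskey_ttl / 2,
                   timedelta(days=15)))


def add_wait_time(add_hold_down_time: timedelta,
                  sig_expiration_time: timedelta,
                  max_dnskey_ttl: timedelta,
                  sig_expiration_time_remaining: Optional[timedelta] = None,
                  retry_safety_margin: timedelta = timedelta(0)) -> timedelta:
    if sig_expiration_time_remaining is None:
        # Assumption: the newest K_old-only RRSIG was just created.
        sig_expiration_time_remaining = sig_expiration_time
    refresh = active_refresh(sig_expiration_time, max_dnskey_ttl)
    # The factor of 2 mirrors the "2 * MAX(...)" term of the expanded equation.
    return (add_hold_down_time + sig_expiration_time_remaining
            + 2 * refresh + retry_safety_margin)


# Appendix A values: 30 day hold down, 21 day signatures, 2 day TTL.
wait = add_wait_time(timedelta(days=30), timedelta(days=21), timedelta(days=2))
print(wait.days)  # 53
```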

640	6.2.2.  Wall-Clock Based Calculation

642	   The equations in Section 6.2.1 are defined based upon how long to
643	   wait from a particular moment in time.  An alternative, but
644	   equivalent, method is to calculate the date and time before which it
645	   is unsafe to use a key for signing.  This calculation thus becomes:

647	      addWallClockTime = lastSigExpirationTime
648	                       + addHoldDownTime
649	                       + activeRefresh
650	                       + timingSafetyMargin
651	                       + retrySafetyMargin

653	   where lastSigExpirationTime is the latest value of any
654	   sigExpirationTime for which RRSIGs were created that could
655	   potentially be replayed.  Fully expanded, this becomes:

657	    addWallClockTime = lastSigExpirationTime
658	                       + addHoldDownTime
659	                       + 2 * MAX(1 hour,
660	                                 MIN(sigExpirationTime / 2,
661	                                     MAX(TTL of K_old DNSKEY RRSet) / 2,
662	                                     15 days)
663	                                 )
664	                       + retrySafetyMargin
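
   A clock-based trigger built on this equation might look like the
   following Python sketch.  The function name, the use of UTC
   datetimes, and the example date and parameters are all hypothetical;
   retrySafetyMargin defaults to zero only to keep the example short.

```python
from datetime import datetime, timedelta, timezone


def add_wall_clock_time(last_sig_expiration_time: datetime,
                        add_hold_down_time: timedelta,
                        sig_expiration_time: timedelta,
                        max_dnskey_ttl: timedelta,
                        retry_safety_margin: timedelta = timedelta(0)) -> datetime:
    """Earliest moment at which signing exclusively with K_new is safe."""
    refresh = max(timedelta(hours=1),
                  min(sig_expiration_time / 2,
                      max_dnskey_ttl / 2,
                      timedelta(days=15)))
    # 2 * refresh covers the activeRefresh and timingSafetyMargin terms.
    return (last_sig_expiration_time + add_hold_down_time
            + 2 * refresh + retry_safety_margin)


# Hypothetical: last replayable RRSIG expires 2018-03-01 00:00 UTC,
# 30 day hold down, 10 day signature validity, 1 day TTL.
last_expiry = datetime(2018, 3, 1, tzinfo=timezone.utc)
safe_after = add_wall_clock_time(last_expiry, timedelta(days=30),
                                 timedelta(days=10), timedelta(days=1))
print(safe_after.isoformat())  # 2018-04-01T00:00:00+00:00
```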

666	6.2.3.  Timing Constraint Summary

668	   The important timing constraint introduced by this memo relates to
669	   the last point at which an RFC5011 Resolver may have received a
670	   replayed original DNSKEY set, containing K_old and not K_new.  The
671	   next query of the RFC5011 validator at which K_new will be seen
672	   without the potential for a replay attack will occur after the old
673	   DNSKEY RRSIG's Signature Expiration Time.  Thus, the latest time that
674	   an RFC5011 Validator may begin its hold down timer is an "Active
675	   Refresh" period after the last point that an attacker can replay the
676	   K_old DNSKEY set.  The worst case scenario of this attack is if the
677	   attacker can replay K_old just seconds before the Signature
678	   Expiration Time of the last K_old-only DNSKEY RRSIG.

680	6.2.4.  Additional Considerations for RFC7583

682	   Note: our notion of addWaitTime is called "Itrp" in Section 3.3.4.1
683	   of [RFC7583].  The equation for Itrp in RFC7583 is insecure as it
684	   does not include the sigExpirationTime listed above.  The Itrp
685	   equation in RFC7583 also does not include the 2*TTL safety margin,
686	   though that is an operational consideration.

688	6.2.5.  Example Scenario Calculations

690	   For the parameters listed in Section 5.1, our resulting addWaitTime
691	   is:

693	     addWaitTime = 30
694	                   + 10
695	                   + 1 / 2
696	                   + 1 / 2          (days)

698	     addWaitTime = 41               (days)

700	   This addWaitTime of 41 days is 11 days longer than just the hold
701	   down timer, even with the needed retrySafetyMargin value being left
702	   out (which we exclude due to the lack of necessary operational
703	   parameters).

705	6.3.  Timing Requirements For Revoking an Old KSK

707	   This issue affects not just the publication of new DNSKEYs intended
708	   to be used as trust anchors, but also the length of time required to
709	   continuously publish a DNSKEY with the revoke bit set.

711	   Section 6.3.1 defines a method for calculating the amount of time
712	   operators need to wait until it is safe to cease publishing a DNSKEY
713	   (especially useful for writing code involving sleep based timers),
714	   and Section 6.3.2 defines a method for calculating a minimal wall-
715	   clock value after which it is safe to cease publishing a DNSKEY
716	   (especially useful for writing code based on clock-based event
717	   triggers).

719	6.3.1.  Wait Timer Based Calculation

721	   Both of these publication timing requirements are affected by the
722	   attacks described in this document, but with revocation the key is
723	   revoked immediately and the addHoldDown timer does not apply.  Thus
724	   the minimum amount of time that a SEP Publisher must wait before
725	   removing a revoked key from publication is:

727	     remWaitTime = sigExpirationTimeRemaining
728	                   + activeRefresh
729	                   + timingSafetyMargin
730	                   + retrySafetyMargin

732	     remWaitTime = sigExpirationTimeRemaining
733	                   + MAX(1 hour,
734	                         MIN((sigExpirationTime) / 2,
735	                             MAX(TTL of K_old DNSKEY RRSet) / 2,
736	                             15 days))
737	                   + timingSafetyMargin
738	                   + retrySafetyMargin
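
   The remWaitTime calculation can be sketched in Python in the same
   way.  The names and the example values (14 day signatures, a 1 day
   TTL, and a 12 hour timingSafetyMargin) are hypothetical and were
   chosen only to make the arithmetic easy to follow.

```python
from datetime import timedelta


def rem_wait_time(sig_expiration_time_remaining: timedelta,
                  sig_expiration_time: timedelta,
                  max_dnskey_ttl: timedelta,
                  timing_safety_margin: timedelta,
                  retry_safety_margin: timedelta = timedelta(0)) -> timedelta:
    """Minimum time to keep publishing a revoked key (no hold down term)."""
    active_refresh = max(timedelta(hours=1),
                         min(sig_expiration_time / 2,
                             max_dnskey_ttl / 2,
                             timedelta(days=15)))
    return (sig_expiration_time_remaining + active_refresh
            + timing_safety_margin + retry_safety_margin)


# Hypothetical values: 14 days + 12 hours + 12 hours = 15 days.
wait = rem_wait_time(timedelta(days=14), timedelta(days=14),
                     timedelta(days=1), timedelta(hours=12))
print(wait.days)  # 15
```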

740	   Note also that adding retryTime intervals to the remWaitTime may be
741	   wise, just as it was for addWaitTime in Section 6.

743	6.3.2.  Wall-Clock Based Calculation

745	   Like before, the above equations are defined based upon how long to
746	   wait from a particular moment in time.  An alternative, but
747	   equivalent, method is to calculate the date and time before which it
748	   is unsafe to cease publishing a revoked key.  This calculation thus
749	   becomes:

751	      remWallClockTime = lastSigExpirationTime
752	                       + activeRefresh
753	                       + timingSafetyMargin
754	                       + retrySafetyMargin

756	   where lastSigExpirationTime is the latest value of any
757	   sigExpirationTime for which RRSIGs were created that could
758	   potentially be replayed.  Fully expanded, this becomes:

760	      remWallClockTime = lastSigExpirationTime
761	                       + MAX(1 hour,
762	                             MIN((sigExpirationTime) / 2,
763	                                 MAX(TTL of K_old DNSKEY RRSet) / 2,
764	                                 15 days))
765	                       + timingSafetyMargin
766	                       + retrySafetyMargin

768	6.3.3.  Additional Considerations for RFC7583

770	   Note that our notion of remWaitTime is called "Irev" in
771	   Section 3.3.4.2 of [RFC7583].  The equation for Irev in RFC7583 is
772	   insecure as it does not include the sigExpirationTime listed above.
773	   The Irev equation in RFC7583 also does not include a safety margin,
774	   though that is an operational consideration.

776	6.3.4.  Example Scenario Calculations

778	   For the parameters listed in Section 5.1, our example remWaitTime is:

780	     remWaitTime = 10
781	                   + 1 / 2          (days)

783	     remWaitTime = 10.5             (days)

785	   Note that the values in this example produce a length shorter than
786	   the recommended 30 days in RFC5011's section 6.6, step 3.  Other
787	   values of sigExpirationTime and the original TTL of the K_old DNSKEY
788	   RRSet, however, can produce values longer than 30 days.

790	   Note that because revocation happens immediately, an attacker has a
791	   much harder job tricking an RFC5011 Resolver into leaving a trust
792	   anchor in place, as the attacker must successfully replay the old
793	   data for every query an RFC5011 Resolver sends, not just one.

795	7.  IANA Considerations

797	   This document contains no IANA considerations.

799	8.  Operational Considerations

801	   A companion document to RFC5011 was expected to be published that
802	   describes the best operational practice considerations from the
803	   perspective of a zone publisher and SEP Publisher.  However, this
804	   companion document has yet to be published.  The authors of this
805	   document hope that it will be at some point in the future, as RFC5011
806	   timing can be tricky as we have shown, and a BCP is clearly
807	   warranted.  This document is intended only to fill a single
808	   operational void which, when left misunderstood, can result in
809	   serious security ramifications.  This document does not attempt to
810	   document any other missing operational guidance for zone publishers.

812	9.  Security Considerations

814	   This document is solely about the security considerations with
815	   respect to the SEP Publisher's ability to advertise new DNSKEYs via
816	   the RFC5011 automated trust anchor update process.  Thus the entire
817	   document is a discussion of Security Considerations when adding or
818	   removing DNSKEYs from trust anchor storage using the RFC5011 process.

820	   For simplicity, this document assumes that the SEP Publisher will use
821	   a consistent RRSIG validity period.  SEP Publishers that vary the
822	   length of RRSIG validity periods will need to adjust the
823	   sigExpirationTime value accordingly so that the equations in
824	   Section 6 and Section 6.3 use a value that coincides with the point
825	   after which a replay of older RRSIGs can no longer succeed.

827	10.  Acknowledgements

829	   The authors would especially like to thank Michael StJohns for his
830	   help and advice, the care and thought he put into RFC5011 itself,
831	   and his continued reviews and suggestions for this document.  He also
832	   designed the math behind the suggested retrySafetyMargin values in
833	   Section 6.1.7.

835	   We would also like to thank Bob Harold, Shane Kerr, Matthijs Mekking,
836	   Duane Wessels, Petr Spacek, Ed Lewis, and the dnsop working group,
837	   who have assisted with this document.

839	11.  Normative References

841	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
842	              Requirement Levels", BCP 14, RFC 2119,
843	              DOI 10.17487/RFC2119, March 1997, <https://www.rfc-
844	              editor.org/info/rfc2119>.

846	   [RFC4033]  Arends, R., Austein, R., Larson, M., Massey, D., and S.
847	              Rose, "DNS Security Introduction and Requirements",
848	              RFC 4033, DOI 10.17487/RFC4033, March 2005,
849	              <https://www.rfc-editor.org/info/rfc4033>.

851	   [RFC5011]  StJohns, M., "Automated Updates of DNS Security (DNSSEC)
852	              Trust Anchors", STD 74, RFC 5011, DOI 10.17487/RFC5011,
853	              September 2007, <https://www.rfc-editor.org/info/rfc5011>.

855	   [RFC7583]  Morris, S., Ihren, J., Dickinson, J., and W. Mekking,
856	              "DNSSEC Key Rollover Timing Considerations", RFC 7583,
857	              DOI 10.17487/RFC7583, October 2015, <https://www.rfc-
858	              editor.org/info/rfc7583>.

860	   [RFC7719]  Hoffman, P., Sullivan, A., and K. Fujiwara, "DNS
861	              Terminology", RFC 7719, DOI 10.17487/RFC7719, December
862	              2015, <https://www.rfc-editor.org/info/rfc7719>.

864	Appendix A.  Real World Example: The 2017 Root KSK Key Roll

866	   In 2017 and 2018, ICANN expects to roll (or has rolled, depending on
867	   when you're reading this) the key signing key (KSK) for the root
868	   zone.  The relevant parameters associated with the root zone at the
869	   time of this writing are as follows:

871	         addHoldDownTime:                      30 days
872	         Old DNSKEY sigExpirationTime:         21 days
873	         Old DNSKEY TTL:                        2 days

875	   Thus, substituting this information into the equation in
876	   Section 6 yields (in days from publication time):

878	     addWaitTime = 30
879	                   + 21
880	                   + MAX(1 hour,         # activeRefresh
881	                         MIN(21 / 2,
882	                             MAX(2) / 2,
883	                             15 days)
884	                         )
885	                   + timingSafetyMargin  # same value as activeRefresh

887	     addWaitTime = 30 + 21 + 1 + 1

889	     addWaitTime = 53 days

891	   Also note that we exclude the retrySafetyMargin value, which is
892	   calculated based on the expected client deployment size.

894	   Thus, ICANN must wait a minimum of 53 days before switching to the
895	   newly published KSK (and 26 days before removing the old revoked key
896	   once it is published as revoked).  ICANN's current plans involve
897	   waiting over 3 months before using the new key and 69 days before
898	   removing the old, revoked key.  Thus, their current rollover plans
899	   are sufficiently secure from the attack discussed in this memo.

901	Authors' Addresses

903	   Wes Hardaker
904	   USC/ISI
905	   P.O. Box 382
906	   Davis, CA  95617
907	   US

909	   Email: ietf@hardakers.net

911	   Warren Kumari
912	   Google
913	   1600 Amphitheatre Parkway
914	   Mountain View, CA  94043
915	   US

917	   Email: warren@kumari.net