idnits 2.17.1 

draft-ietf-dnsop-rfc5011-security-considerations-13.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  -- The abstract seems to indicate that this document updates RFC5011, but
     the header doesn't have an 'Updates:' line to match this.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (July 16, 2018) is 2110 days in the past.  Is this
     intentional?


  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  ** Obsolete normative reference: RFC 7719 (Obsoleted by RFC 8499)


     Summary: 1 error (**), 0 flaws (~~), 1 warning (==), 2 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	dnsop                                                        W. Hardaker
3	Internet-Draft                                                   USC/ISI
4	Updates: 7583 (if approved)                                    W. Kumari
5	Intended status: Informational                                    Google
6	Expires: January 17, 2019                                  July 16, 2018

8	             Security Considerations for RFC5011 Publishers
9	          draft-ietf-dnsop-rfc5011-security-considerations-13

11	Abstract

13	   This document extends the RFC5011 rollover strategy with timing
14	   advice that must be followed by the publisher in order to maintain
15	   security.  Specifically, this document describes the math behind the
16	   minimum time-length that a DNS zone publisher must wait before
17	   signing exclusively with recently added DNSKEYs.  This document also
18	   describes the minimum time-length that a DNS zone publisher must wait
19	   after publishing a revoked DNSKEY before assuming that all active
20	   RFC5011 resolvers should have seen the revocation-marked key and
21	   removed it from their list of trust anchors.

23	   This document contains much math and complicated equations, but the
24	   summary is that the key rollover / revocation time is much longer
25	   than intuition would suggest.  This document updates RFC7583 by
26	   adding additional delays (sigExpirationTime and
27	   timingSafetyMargin).

29	   If you are not both publishing a DNSSEC DNSKEY, and using RFC5011 to
30	   advertise this DNSKEY as a new Secure Entry Point key for use as a
31	   trust anchor, you probably don't need to read this document.

33	Status of This Memo

35	   This Internet-Draft is submitted in full conformance with the
36	   provisions of BCP 78 and BCP 79.

38	   Internet-Drafts are working documents of the Internet Engineering
39	   Task Force (IETF).  Note that other groups may also distribute
40	   working documents as Internet-Drafts.  The list of current Internet-
41	   Drafts is at https://datatracker.ietf.org/drafts/current/.

43	   Internet-Drafts are draft documents valid for a maximum of six months
44	   and may be updated, replaced, or obsoleted by other documents at any
45	   time.  It is inappropriate to use Internet-Drafts as reference
46	   material or to cite them other than as "work in progress."

48	   This Internet-Draft will expire on January 17, 2019.

50	Copyright Notice

52	   Copyright (c) 2018 IETF Trust and the persons identified as the
53	   document authors.  All rights reserved.

55	   This document is subject to BCP 78 and the IETF Trust's Legal
56	   Provisions Relating to IETF Documents
57	   (https://trustee.ietf.org/license-info) in effect on the date of
58	   publication of this document.  Please review these documents
59	   carefully, as they describe your rights and restrictions with respect
60	   to this document.  Code Components extracted from this document must
61	   include Simplified BSD License text as described in Section 4.e of
62	   the Trust Legal Provisions and are provided without warranty as
63	   described in the Simplified BSD License.

65	Table of Contents

67	   1.  Introduction
68	     1.1.  Document History and Motivation
69	     1.2.  Safely Rolling the Root Zone's KSK in 2017/2018
70	     1.3.  Requirements notation
71	   2.  Background
72	   3.  Terminology
73	   4.  Timing Associated with RFC5011 Processing
74	     4.1.  Timing Associated with Publication
75	     4.2.  Timing Associated with Revocation
76	   5.  Denial of Service Attack Walkthrough
77	     5.1.  Enumerated Attack Example
78	       5.1.1.  Attack Timing Breakdown
79	   6.  Minimum RFC5011 Timing Requirements
80	     6.1.  Equation Components
81	       6.1.1.  addHoldDownTime
82	       6.1.2.  lastSigExpirationTime
83	       6.1.3.  sigExpirationTime
84	       6.1.4.  sigExpirationTimeRemaining
85	       6.1.5.  activeRefresh
86	       6.1.6.  timingSafetyMargin
87	       6.1.7.  retrySafetyMargin
88	     6.2.  Timing Requirements For Adding a New KSK
89	       6.2.1.  Wait Timer Based Calculation
90	       6.2.2.  Wall-Clock Based Calculation
91	       6.2.3.  Timing Constraint Summary
92	       6.2.4.  Additional Considerations for RFC7583
93	       6.2.5.  Example Scenario Calculations
94	     6.3.  Timing Requirements For Revoking an Old KSK
95	       6.3.1.  Wait Timer Based Calculation
96	       6.3.2.  Wall-Clock Based Calculation
97	       6.3.3.  Additional Considerations for RFC7583
98	       6.3.4.  Example Scenario Calculations
99	   7.  IANA Considerations
100	   8.  Operational Considerations
101	   9.  Security Considerations
102	   10. Acknowledgements
103	   11. Normative References
104	   Appendix A.  Real World Example: The 2017 Root KSK Key Roll
105	   Authors' Addresses

107	1.  Introduction

109	   [RFC5011] defines a mechanism by which DNSSEC validators can update
110	   their list of trust anchors when they've seen a new key published in
111	   a zone or revoke a properly marked key from a trust anchor list.
112	   However, RFC5011 [intentionally] provides no guidance to the
113	   publishers of DNSKEYs about how long they must wait before switching
114	   to exclusively using recently published keys for signing records, or
115	   how long they must wait before ceasing publication of a revoked key.
116	   Because of this lack of guidance, zone publishers may arrive at
117	   incorrect assumptions about safe usage of the RFC5011 DNSKEY
118	   advertising, rolling and revocation process.  This document describes
119	   the minimum security requirements from a publisher's point of view
120	   and is intended to complement the guidance offered in RFC5011 (which
121	   is written to provide timing guidance solely from a Validating
122	   Resolver's point of view).

124	   To explain the RFC5011 security analysis in this document better,
125	   Section 5 first describes an attack on a zone publisher.  Then in
126	   Section 6.1 we break down each of the timing components that will be
127	   later used to define timing requirements for adding keys in
128	   Section 6.2 and revoking keys in Section 6.3.

130	1.1.  Document History and Motivation

132	   To confirm that this lack of understanding is widespread, the
133	   authors reached out to 5 DNSSEC experts to ask them how long they
134	   thought they must wait before signing a zone exclusively with a new
135	   KSK [RFC4033] that was being introduced according to the 5011
136	   process.  All 5 experts answered with an insecure value, and we
137	   determined that this lack of understanding might cause security
138	   concerns in deployment.  We hope that this companion document to
139	   RFC5011 will rectify this and provide better guidance to zone
140	   publishers who wish to make use of the RFC5011 rollover process.

142	1.2.  Safely Rolling the Root Zone's KSK in 2017/2018

144	   One important note about ICANN's (currently in process) 2017/2018 KSK
145	   rollover plan for the root zone: the timing values chosen for rolling
146	   the KSK in the root zone appear completely safe, and are not affected
147	   by the timing concerns discussed in this draft.

149	1.3.  Requirements notation

151	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
152	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
153	   document are to be interpreted as described in [RFC2119].

155	2.  Background

157	   RFC5011 describes a process by which an RFC5011 Resolver may accept a
158	   newly published KSK as a trust anchor for validating future DNSSEC
159	   signed records.  It also describes the process for publicly revoking
160	   a published KSK.  This document augments that information with
161	   additional constraints from the SEP Publisher's point of view.
162	   Note that this document does not define any other operational
163	   guidance or recommendations about the RFC5011 process and restricts
164	   itself solely to the security and operational ramifications of
165	   prematurely switching to exclusively using recently added keys or
166	   removing revoked keys.

168	   Failure of a DNSKEY publisher to follow the minimum recommendations
169	   associated with this draft can result in potential denial-of-service
170	   attack opportunities against validating resolvers.  Failure of a
171	   DNSKEY publisher to publish a revoked key for a long enough period of
172	   time may result in RFC5011 Resolvers leaving that key in their trust
173	   anchor storage beyond the key's expected lifetime.

175	3.  Terminology

177	   SEP Publisher  The entity responsible for publishing a DNSKEY (with
178	      the Secure Entry Point (SEP) bit set) that can be used as a trust
179	      anchor.

181	   Zone Signer  The owner of a zone intending to publish a new Key-
182	      Signing-Key (KSK) that may become a trust anchor for validators
183	      following the RFC5011 process.

185	   RFC5011 Resolver  A DNSSEC Resolver that is using the RFC5011
186	      processes to track and update trust anchors.

188	   Attacker  An entity intent on foiling the RFC5011 Resolver's ability
189	      to successfully adopt the Zone Signer's new DNSKEY as a new trust
190	      anchor or to prevent the RFC5011 Resolver from removing an old
191	      DNSKEY from its list of trust anchors.

193	   sigExpirationTime  The amount of time between the DNSKEY RRSIG's
194	      Signature Inception field and the Signature Expiration field.

196	   Also see Section 2 of [RFC4033] and [RFC7719] for additional
197	   terminology.

199	4.  Timing Associated with RFC5011 Processing

201	   The subsections below give a high-level overview of [RFC5011]
202	   processing.  This description is not sufficient for fully
203	   understanding RFC5011, but provides enough background for the reader
204	   to follow the discussion in this document.  Readers need to fully
205	   understand [RFC5011] as well to fully comprehend the content and
206	   importance of this document.

208	4.1.  Timing Associated with Publication

210	   RFC5011's process of safely publishing a new DNSKEY and then assuming
211	   RFC5011 Resolvers have adopted it for trust can be broken down into a
212	   number of high-level steps to be performed by the SEP Publisher.
213	   This document discusses the following scenario, which is the principal
214	   way RFC5011 is currently being used (even though Section 6 of RFC5011
215	   suggests having a stand-by key available):

217	   1.  Publish a new DNSKEY in a zone, but continue to sign the zone
218	       with the old one.

220	   2.  Wait a period of time.

222	   3.  Begin to exclusively use recently published DNSKEYs to sign the
223	       appropriate resource records.

225	   This document discusses the time required to wait during step 2 of
226	   the above process.  Some interpretations of RFC5011 have erroneously
227	   determined that the wait time is equal to RFC5011's "hold down time".
228	   Section 5 describes an attack based on this (common) erroneous
229	   belief, which can result in a denial of service attack against the
230	   zone.

232	4.2.  Timing Associated with Revocation

234	   RFC5011's process of advertising that an old key is to be revoked
235	   from RFC5011 Resolvers falls into a number of high-level steps:

237	   1.  Set the revoke bit on the DNSKEY to be revoked.

239	   2.  Sign the revoked DNSKEY with itself.

241	   3.  Wait a period of time.

243	   4.  Remove the revoked key from the zone.

245	   This document discusses the time required to wait in step 3 of the
246	   above process.  Some interpretations of RFC5011 have erroneously
247	   determined that the wait time is equal to RFC5011's "hold down time".
248	   This document describes an attack based on this (common) erroneous
249	   belief, which results in a revoked DNSKEY potentially remaining as a
250	   trust anchor in an RFC5011 Resolver long past its expected usage.

252	5.  Denial of Service Attack Walkthrough

254	   This section serves as an illustrative example of the problem being
255	   discussed in this document.  Note that in order to keep the example
256	   simple enough to understand, some simplifications were made (such as
257	   by not creating a set of pre-signed RRSIGs and by not using values
258	   that result in the addHoldDownTime not being evenly divisible by the
259	   activeRefresh value); the mathematical formulas in Section 6 are,
260	   however, complete.

262	   If an attacker is able to provide an RFC5011 Resolver with past
263	   responses, such as when it is on-path or able to perform any number
264	   of cache poisoning attacks, the attacker may be able to leave
265	   compliant RFC5011 Resolvers without an appropriate DNSKEY trust
266	   anchor.  This scenario will remain until an administrator manually
267	   fixes the situation.

269	   The time-line below illustrates an example of this situation.

271	5.1.  Enumerated Attack Example

273	   The following settings are used in the example scenario within this
274	   section:

276	   TTL (all records)  1 day

278	   sigExpirationTime  10 days

280	   Zone resigned every  1 day

282	   Given these settings, the sequence of events in Section 5.1.1 depicts
283	   how a SEP Publisher that waits for only the RFC5011 hold time timer
284	   length of 30 days subjects its users to a potential Denial of Service
285	   attack.  The timeline below is based on a SEP Publisher publishing a
286	   new Key Signing Key (KSK), with the intent that it will later be used
287	   as a trust anchor.  We label this publication time as "T+0".  All
288	   numbers in this timeline refer to days before and after this initial
289	   publication event.  Thus, T-1 is the day before the introduction of
290	   the new key, and T+15 is the 15th day after the key was introduced
291	   into the example zone being discussed.

293	   In this exposition, we consider two keys within the example zone:

295	   K_old:  An older KSK and Trust Anchor being replaced.

297	   K_new:  A new KSK being transitioned into active use and expected to
298	      become a Trust Anchor via the RFC5011 automated trust anchor
299	      update process.

301	5.1.1.  Attack Timing Breakdown

303	   Below we examine an attack that foils the adoption of a new DNSKEY by
304	   an RFC5011 Resolver when the SEP Publisher starts signing and
305	   publishing with the new DNSKEY too quickly.

307	   T-1  The K_old based RRSIGs are being published by the Zone Signer.
308	      [It may also be signing ZSKs, but they are not relevant to
309	      this event so we will not talk further about them; we are only
310	      considering the RRSIGs that cover the DNSKEYs in this document.]
311	      The Attacker queries for, retrieves and caches this DNSKEY set and
312	      corresponding RRSIG signatures.

314	   T+0  The Zone Signer adds K_new to their zone and signs the zone's
315	      key set with K_old.  The RFC5011 Resolver (later to be under
316	      attack) retrieves this new key set and corresponding RRSIGs and
317	      notices the publication of K_new.  The RFC5011 Resolver starts the
318	      (30-day) hold-down timer for K_new.  [Note that in a more real-
319	      world scenario there will likely be a further delay between the
320	      point where the Zone Signer publishes a new RRSIG and the RFC5011
321	      Resolver notices its publication; though not shown in this
322	      example, this delay is accounted for in the equation in Section 6
323	      below]

325	   T+5  The RFC5011 Resolver queries for the zone's keyset per the
326	      RFC5011 Active Refresh schedule, discussed in Section 2.3 of
327	      RFC5011.  Instead of receiving the intended published keyset, the
328	      Attacker successfully replays the keyset and associated signatures
329	      recorded at T-1 to the victim RFC5011 Resolver.  Because the
330	      signature lifetime is 10 days (in this example), the replayed
331	      signature and keyset are accepted as valid (being only 6 days old,
332	      which is less than sigExpirationTime) and the RFC5011 Resolver
333	      cancels the (30-day) hold-down timer for K_new, per the RFC5011
334	      algorithm.

336	   T+10  The RFC5011 Resolver queries for the zone's keyset and
337	      discovers a signed keyset that includes K_new (again), and is
338	      signed by K_old.  Note: the attacker is unable to replay the
339	      records cached at T-1, because the signatures have now expired.
340	      Thus at T+10, the RFC5011 Resolver starts (anew) the hold-timer
341	      for K_new.

343	   T+11 through T+29  The RFC5011 Resolver continues checking the zone's
344	      key set at the prescribed regular intervals.  During this period,
345	      the attacker can no longer replay traffic to their benefit.

347	   T+30  The Zone Signer knows that this is the first time at which some
348	      validators might accept K_new as a new trust anchor, since the
349	      hold-down timer of an RFC5011 Resolver not under attack that had
350	      queried and retrieved K_new at T+0 would now have reached 30 days.
351	      However, the hold-down timer of our attacked RFC5011 Resolver is
352	      only at 20 days.

354	   T+35  The Zone Signer (mistakenly) believes that all validators
355	      following the Active Refresh schedule (Section 2.3 of RFC5011)
356	      should have accepted K_new as the new trust anchor (since the
357	      hold down time (30 days) + the query interval [which is just 1/2
358	      the signature validity period in this example] would have passed).
359	      However, the hold-down timer of our attacked RFC5011 Resolver is
360	      only at 25 days (T+35 minus T+10); thus the RFC5011 Resolver won't
361	      consider it a valid trust anchor addition yet, as the required 30
362	      days have not yet elapsed.

364	   T+36  The Zone Signer, believing K_new is safe to use, switches their
365	      active signing KSK to K_new and publishes a new RRSIG, signed with
366	      (only) K_new, covering the DNSKEY set.  Non-attacked RFC5011
367	      validators, with a hold-down timer of at least 30 days, would have
368	      accepted K_new into their set of trusted keys.  But, because our
369	      attacked RFC5011 Resolver now has a hold-down timer for K_new of
370	      only 26 days, it failed to ever accept K_new as a trust anchor.
371	      Since K_old is no longer being used to sign the zone's DNSKEYs,
372	      all the DNSKEY records from the zone will be treated as invalid.
373	      Subsequently, all of the records in the DNS tree below the zone's
374	      apex will be deemed invalid by DNSSEC.
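
The timeline above can be replayed with a small simulation.  This is a
simplified sketch, not an RFC5011 implementation: it only tracks the day
on which the resolver's hold-down timer for K_new last started.  The 5-day
query spacing, the constant names, and the function names are assumptions
chosen to match the example.

```python
from typing import Iterable, Optional, Set

ADD_HOLD_DOWN_DAYS = 30    # RFC5011 add hold-down time
SIG_LIFETIME_DAYS = 10     # sigExpirationTime from Section 5.1
RECORDED_DAY = -1          # attacker captured the K_old-only keyset at T-1
SWITCH_DAY = 36            # publisher starts signing only with K_new
QUERY_DAYS = range(0, 36, 5)   # assumed resolver queries: T+0, T+5, ... T+35


def replay_still_valid(day: int) -> bool:
    """The captured RRSIG is usable until its lifetime has elapsed."""
    return day - RECORDED_DAY < SIG_LIFETIME_DAYS


def hold_down_age(query_days: Iterable[int], replay_days: Set[int],
                  at_day: int) -> Optional[int]:
    """Days the K_new hold-down timer has been running at `at_day`."""
    started: Optional[int] = None
    for day in query_days:
        replayed = day in replay_days and replay_still_valid(day)
        if replayed:
            started = None      # keyset without K_new: timer is cancelled
        elif started is None:
            started = day       # K_new seen: timer starts (or restarts)
    return None if started is None else at_day - started


attacked = hold_down_age(QUERY_DAYS, {5}, SWITCH_DAY)
unattacked = hold_down_age(QUERY_DAYS, set(), SWITCH_DAY)
print(attacked, attacked >= ADD_HOLD_DOWN_DAYS)      # 26 False
print(unattacked, unattacked >= ADD_HOLD_DOWN_DAYS)  # 36 True
```

A single replay at T+5 is enough to leave the attacked resolver 4 days
short of the hold-down time when the publisher switches keys at T+36.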

376	6.  Minimum RFC5011 Timing Requirements

378	   This section defines the minimum timing requirements for making
379	   exclusive use of newly added DNSKEYs and timing requirements for
380	   ceasing the publication of DNSKEYs to be revoked.  We break our
381	   timing solution requirements into two primary components: the
382	   mathematically-based security analysis of the RFC5011 publication
383	   process itself, and an extension of this that takes operational
384	   realities into account that further affect the recommended timings.

386	   First, we define the component terms used in all equations in
387	   Section 6.1.

389	6.1.  Equation Components

391	6.1.1.  addHoldDownTime

393	   The addHoldDownTime is defined in Section 2.4.1 of [RFC5011] as:

395	       The add hold-down time is 30 days or the expiration time of the
396	       original TTL of the first trust point DNSKEY RRSet that contained
397	       the new key, whichever is greater.  This ensures that at least
398	       two validated DNSKEY RRSets that contain the new key MUST be seen
399	       by the resolver prior to the key's acceptance.

401	6.1.2.  lastSigExpirationTime

403	   The latest value (i.e., the date and time furthest in the future) of
404	   any RRSIG Signature Expiration field covering any DNSKEY RRSet
405	   containing only the old trust anchor(s) that are being superseded.
406	   Note that for organizations pre-creating signatures, this time may be
407	   far in the future unless they can be significantly assured that none
408	   of their pre-generated signatures can be replayed at a later date.

410	6.1.3.  sigExpirationTime

412	   The amount of time between the DNSKEY RRSIG's Signature Inception
413	   field and the Signature Expiration field.

415	6.1.4.  sigExpirationTimeRemaining

417	   sigExpirationTimeRemaining is defined in Section 3.

419	6.1.5.  activeRefresh

421	   The activeRefresh time is defined in Section 2.3 of [RFC5011] as:

423	     A resolver that has been configured for an automatic update
424	     of keys from a particular trust point MUST query that trust
425	     point (e.g., do a lookup for the DNSKEY RRSet and related
426	     RRSIG records) no less often than the lesser of 15 days, half
427	     the original TTL for the DNSKEY RRSet, or half the RRSIG
428	     expiration interval and no more often than once per hour.

430	   This translates to:

432	    activeRefresh = MAX(1 hour,
433	                        MIN(sigExpirationTime / 2,
434	                            MAX(TTL of K_old DNSKEY RRSet) / 2,
435	                            15 days)
436	                        )
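
As a rough illustration, the activeRefresh expression above can be written
in Python.  The function and parameter names are illustrative assumptions;
the inputs are the DNSKEY RRSIG lifetime (sigExpirationTime) and the TTL
of the DNSKEY RRSet.

```python
from datetime import timedelta


def active_refresh(sig_expiration_time: timedelta,
                   dnskey_ttl: timedelta) -> timedelta:
    """MAX(1 hour, MIN(sigExpirationTime / 2, TTL / 2, 15 days))."""
    return max(
        timedelta(hours=1),
        min(sig_expiration_time / 2, dnskey_ttl / 2, timedelta(days=15)),
    )


# Settings from Section 5.1: 10-day signatures and a 1-day TTL.
print(active_refresh(timedelta(days=10), timedelta(days=1)))  # 12:00:00
```

With these inputs the TTL term dominates, so the resolver must query at
least every 12 hours.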

438	6.1.6.  timingSafetyMargin

440	   It is easy to assume that the period of time required for SEP
441	   Publishers to wait after making changes to SEP-marked DNSKEY sets
442	   will be entirely based on the length of the addHoldDownTime.
443	   Unfortunately, analysis shows that both the design of the RFC5011
444	   protocol and the operational realities of deploying it require
445	   waiting an additional period of time.  In Sections 6.1.6.1 to
446	   6.1.6.3 below, we discuss three sources of additional delay.  In the
447	   end, we will pick the largest of these delays as the minimum
448	   additional time that the SEP Publisher must wait in our final
449	   timingSafetyMargin value, which we define in Section 6.1.6.4.

451	6.1.6.1.  activeRefreshOffset

453	   A security analysis of the timing associated with the query rate of
454	   RFC5011 Resolvers shows that it may not perfectly align with the
455	   addHoldDownTime when the addHoldDownTime is not evenly divisible by
456	   the activeRefresh time.  Consider the example of a zone with an
457	   activeRefresh period of 7 days.  If an associated RFC5011 Resolver
458	   started its hold-down timer just after the SEP Publisher published a
459	   new DNSKEY (at time T+0), the resolver would send checking queries at
460	   T+7, T+14, T+21 and T+28 days and will finally accept it at T+35
461	   days, which is 5 days longer than the 30-day addHoldDownTime.

463	   The activeRefreshOffset term defines this time difference and
464	   becomes:

466	    activeRefreshOffset = addHoldDownTime % activeRefresh

468	   The % symbol denotes the mathematical mod operator (calculating the
469	   remainder in a division problem).  This will frequently be zero, but
470	   can be nearly as large as activeRefresh itself.
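
The following sketch evaluates both the modulo expression and the 7-day
walk-through above.  The function names are illustrative assumptions.  For
this example the modulo expression yields 2 days, while the walk-through
shows the first query at or after the hold-down time arriving 5 days late;
both values stay below one activeRefresh period, which is the bound that
Section 6.1.6.4 ultimately adopts.

```python
from datetime import timedelta


def active_refresh_offset(add_hold_down: timedelta,
                          active_refresh: timedelta) -> timedelta:
    """activeRefreshOffset = addHoldDownTime % activeRefresh."""
    return add_hold_down % active_refresh


def first_query_at_or_after(add_hold_down: timedelta,
                            active_refresh: timedelta) -> timedelta:
    """Time of the first periodic query that is not before the hold-down
    time, assuming queries at exact multiples of activeRefresh from T+0."""
    periods = -((-add_hold_down) // active_refresh)   # integer ceiling
    return periods * active_refresh


hold_down = timedelta(days=30)
refresh = timedelta(days=7)
print(active_refresh_offset(hold_down, refresh).days)               # 2
print(first_query_at_or_after(hold_down, refresh).days)             # 35
print((first_query_at_or_after(hold_down, refresh) - hold_down).days)  # 5
```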

472	6.1.6.2.  clockskewDriftMargin

474	   Even small clock drifts can have negative impacts upon the timing of
475	   the RFC5011 Resolver's measurements.  Consider the simplest case
476	   where the RFC5011 Resolver's clock shifts over time to be 2 seconds
477	   slower near the end of the RFC5011 Resolver's addHoldDownTime period.
478	   That is, if the RFC5011 Resolver first noticed a new DNSKEY at:

480	             firstSeen = sigExpirationTime + activeRefresh + 1 second

482	   The effect of a 2-second clock drift between the SEP Publisher and the
483	   RFC5011 Resolver may result in the RFC5011 Resolver querying again
484	   at:

486	             justBefore = sigExpirationTime + addHoldDownTime +
487	                          activeRefresh + 1 second - 2 seconds

489	             which becomes:

491	             justBefore = sigExpirationTime + addHoldDownTime +
492	                          activeRefresh - 1 second

494	   The net effect is that the addHoldDownTime will not have been reached
495	   from the perspective of the RFC5011 Resolver, but it will have been
496	   reached from the perspective of the SEP Publisher.  As a result, it
497	   may take one additional activeRefresh period for this RFC5011
498	   Resolver to accept the new key (at sigExpirationTime +
499	   addHoldDownTime + 2 * activeRefresh - 1 second).

501	   We note that even the smallest clockskew errors can require waiting
502	   an additional activeRefresh period, and thus define the
503	   clockskewDriftMargin as:

505	       clockskewDriftMargin = activeRefresh

507	6.1.6.3.  retryDriftMargin

509	   Drift associated with a lost transmission and an accompanying re-
510	   transmission (see Section 2.3 of [RFC5011]) will cause RFC5011
511	   Resolvers to also change the timing associated with query times such
512	   that it becomes impossible to predict, from the perspective of the
513	   SEP Publisher, when the conclusive measurement query will arrive.
514	   Similarly, any software that restarts/reboots without saving next-
515	   query timing state may also commence with a new random starting time.
516	   Thus, an additional activeRefresh is needed to handle both these
517	   cases as well.

519	             retryDriftMargin = activeRefresh

521	   Note that we account for additional time associated with cumulative
522	   multiple retries, especially under high-loss conditions, in
523	   Section 6.1.6.4.

525	6.1.6.4.  timingSafetyMargin Value

527	   The activeRefreshOffset, clockskewDriftMargin, and retryDriftMargin
528	   parameters all deal with additional wait periods that must be
529	   accounted for after analyzing the conditions under which the client
530	   will take longer than expected to make its last query while waiting
531	   for the addHoldDownTime period to pass.  These values may be merged
532	   into a single term by waiting the longest of any of them.  We define
533	   timingSafetyMargin as this "worst case" value:

535	        timingSafetyMargin = MAX(activeRefreshOffset,
536	                                 clockskewDriftMargin,
537	                                 retryDriftMargin)

539	        timingSafetyMargin = MAX(addHoldDownTime % activeRefresh,
540	                                 activeRefresh,
541	                                 activeRefresh)

543	        timingSafetyMargin = activeRefresh

545	6.1.7.  retrySafetyMargin

547	   The retrySafetyMargin is an extra period of time to account for
548	   caching, network delays, dropped packets, and other operational
549	   concerns otherwise beyond the scope of this document.  The value
550	   operators should choose is highly dependent on the deployment
551	   situation associated with their zone.  Note that no value of a
552	   retrySafetyMargin can protect against resolvers that are "down".
553	   Nonetheless, we do offer the following as one method for choosing a
554	   reasonable value.

556	   The following variables need to be considered when selecting an
557	   appropriate retrySafetyMargin value:

559	   successRate:  A likely success rate for client queries and retries

561	   numResolvers:  The number of client RFC5011 Resolvers

563	   Note that RFC5011 defines retryTime as:

565	         If the query fails, the resolver MUST repeat the query until
566	         satisfied no more often than once an hour and no less often
567	         than the lesser of 1 day, 10% of the original TTL, or 10% of
568	         the original expiration interval.  That is,
569	         retryTime = MAX (1 hour, MIN (1 day, .1 * origTTL,
570	                                       .1 * expireInterval)).

572	   With the successRate and numResolvers values selected and the
573	   definition of retryTime from RFC5011, one method for determining how
574	   many retryTime intervals to wait in order to reduce the set of
575	   resolvers that have not accepted the new trust anchor to 0 is thus:

577	                         x = (1/(1 - successRate))

579	            retryCountWait = Log_base_x(numResolvers)

581	   To reduce the need for readers to pull out a scientific calculator,
582	   we offer the following lookup table of retryCountWait values, rounded
583	   up to whole intervals, based on successRate and numResolvers:

585	                          retryCountWait lookup table
586	                        ---------------------------

588	                       Number of client RFC5011 Resolvers (numResolvers)
589	                       -------------------------------------------------
590	                        10,000  100,000 1,000,000 10,000,000 100,000,000
591	                 0.01      917     1146      1375       1604        1833
592	   Probability   0.05      180      225       270        315         360
593	   of Success    0.10       88      110       132        153         175
594	   Per Retry     0.15       57       71        86        100         114
595	   Interval      0.25       33       41        49         57          65
596	   (successRate) 0.50       14       17        20         24          27
597	                 0.90        4        5         6          7           8
598	                 0.95        4        4         5          6           7
599	                 0.99        2        3         3          4           4
600	                 0.999       2        2         2          3           3

602	   Finally, a suggested value of retrySafetyMargin can then be this
603	   retryCountWait number multiplied by the retryTime from RFC5011:

605	                 retrySafetyMargin = retryCountWait * retryTime
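
   The calculation above can be sketched in Python as follows.  This is
   an illustrative sketch, not a reference implementation: the function
   names are hypothetical, and rounding up to a whole number of retry
   intervals is an assumption chosen to match how the lookup table
   appears to have been produced.

```python
import math
from datetime import timedelta


def retry_time(orig_ttl: timedelta, expire_interval: timedelta) -> timedelta:
    """retryTime = MAX(1 hour, MIN(1 day, .1 * origTTL, .1 * expireInterval))."""
    return max(timedelta(hours=1),
               min(timedelta(days=1), orig_ttl / 10, expire_interval / 10))


def retry_count_wait(success_rate: float, num_resolvers: int) -> int:
    """retryCountWait = Log_base_x(numResolvers), with x = 1 / (1 - successRate)."""
    if not 0.0 < success_rate < 1.0:
        raise ValueError("success_rate must be strictly between 0 and 1")
    if num_resolvers < 1:
        raise ValueError("num_resolvers must be at least 1")
    x = 1.0 / (1.0 - success_rate)
    raw = math.log(num_resolvers) / math.log(x)
    # Strip floating-point noise, then round up to whole retry intervals
    # (assumption: the lookup table rounds up).
    return math.ceil(round(raw, 9))


def retry_safety_margin(success_rate: float, num_resolvers: int,
                        orig_ttl: timedelta,
                        expire_interval: timedelta) -> timedelta:
    """retrySafetyMargin = retryCountWait * retryTime."""
    return (retry_count_wait(success_rate, num_resolvers)
            * retry_time(orig_ttl, expire_interval))
```

   For instance, retry_count_wait(0.5, 1_000_000) evaluates to 20,
   which agrees with the corresponding table entry.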

607	6.2.  Timing Requirements For Adding a New KSK

609	   Given the defined parameters and analysis from Section 6.1, we can
610	   now create a method for calculating the amount of time to wait until
611	   it is safe to start signing exclusively with a new DNSKEY (especially
612	   useful for writing code involving sleep based timers) in
613	   Section 6.2.1, and define a method for calculating a wall-clock value
614	   after which it is safe to start signing exclusively with a new DNSKEY
615	   (especially useful for writing code based on clock-based event
616	   triggers) in Section 6.2.2.

618	6.2.1.  Wait Timer Based Calculation

620	   Given the attack description in Section 5, the correct minimum length
621	   of time required for the Zone Signer to wait after publishing K_new
622	   but before exclusively using it and newer keys is:

624	      addWaitTime = addHoldDownTime
625	                    + sigExpirationTimeRemaining
626	                    + activeRefresh
627	                    + timingSafetyMargin
628	                    + retrySafetyMargin

630	6.2.1.1.  Fully expanded equation

632	   Given the equation components defined in Section 6.1, the full
633	   expanded equation is:

635	      addWaitTime = addHoldDownTime
636	                    + sigExpirationTimeRemaining
637	                    + 2 * MAX(1 hour,
638	                          MIN(sigExpirationTime / 2,
639	                              MAX(TTL of K_old DNSKEY RRSet) / 2,
640	                              15 days)
641	                          )
642	                    + retrySafetyMargin
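
   As a hedged illustration for sleep-timer based code, the expanded
   equation might be written in Python as below.  The function and
   parameter names are hypothetical, and the sketch assumes (as the
   expanded equation does) that timingSafetyMargin equals
   activeRefresh.  The sample values are illustrative only.

```python
from datetime import timedelta


def active_refresh(sig_expiration_time: timedelta,
                   max_dnskey_ttl: timedelta) -> timedelta:
    """MAX(1 hour, MIN(sigExpirationTime / 2, MAX(TTL) / 2, 15 days))."""
    return max(timedelta(hours=1),
               min(sig_expiration_time / 2,
                   max_dnskey_ttl / 2,
                   timedelta(days=15)))


def add_wait_time(add_hold_down_time: timedelta,
                  sig_expiration_time_remaining: timedelta,
                  sig_expiration_time: timedelta,
                  max_dnskey_ttl: timedelta,
                  retry_safety_margin: timedelta = timedelta(0)) -> timedelta:
    """Minimum wait after publishing K_new before signing only with it."""
    refresh = active_refresh(sig_expiration_time, max_dnskey_ttl)
    # 2 * refresh covers activeRefresh plus timingSafetyMargin
    # (assumed equal to activeRefresh, as in the expanded equation).
    return (add_hold_down_time
            + sig_expiration_time_remaining
            + 2 * refresh
            + retry_safety_margin)


# Illustrative values: 30-day hold-down, 21-day signature validity,
# 2-day DNSKEY TTL, retrySafetyMargin left at zero.
wait = add_wait_time(timedelta(days=30), timedelta(days=21),
                     timedelta(days=21), timedelta(days=2))
print(wait)  # 53 days, 0:00:00
```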

644	6.2.2.  Wall-Clock Based Calculation

646	   The equations in Section 6.2.1 are defined based upon how long to
647	   wait from a particular moment in time.  An alternative, but
648	   equivalent, method is to calculate the date and time before which it
649	   is unsafe to use a key for signing.  This calculation thus becomes:

651	      addWallClockTime = lastSigExpirationTime
652	                       + addHoldDownTime
653	                       + activeRefresh
654	                       + timingSafetyMargin
655	                       + retrySafetyMargin

657	   where lastSigExpirationTime is the latest value of any
658	   sigExpirationTime for which RRSIGs were created that could
659	   potentially be replayed.  Fully expanded, this becomes:

661	    addWallClockTime = lastSigExpirationTime
662	                       + addHoldDownTime
663	                       + 2 * MAX(1 hour,
664	                                 MIN(sigExpirationTime / 2,
665	                                     MAX(TTL of K_old DNSKEY RRSet) / 2,
666	                                     15 days)
667	                                 )
668	                       + retrySafetyMargin
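
   For code driven by clock-based event triggers, a minimal Python
   sketch of the same calculation follows.  The names and the sample
   dates are hypothetical, and timingSafetyMargin is again assumed to
   equal activeRefresh.

```python
from datetime import datetime, timedelta, timezone


def active_refresh(sig_expiration_time: timedelta,
                   max_dnskey_ttl: timedelta) -> timedelta:
    """MAX(1 hour, MIN(sigExpirationTime / 2, MAX(TTL) / 2, 15 days))."""
    return max(timedelta(hours=1),
               min(sig_expiration_time / 2,
                   max_dnskey_ttl / 2,
                   timedelta(days=15)))


def add_wall_clock_time(last_sig_expiration_time: datetime,
                        add_hold_down_time: timedelta,
                        sig_expiration_time: timedelta,
                        max_dnskey_ttl: timedelta,
                        retry_safety_margin: timedelta = timedelta(0)
                        ) -> datetime:
    """Earliest instant at which signing only with K_new is safe."""
    refresh = active_refresh(sig_expiration_time, max_dnskey_ttl)
    # 2 * refresh covers activeRefresh plus timingSafetyMargin.
    return (last_sig_expiration_time
            + add_hold_down_time
            + 2 * refresh
            + retry_safety_margin)


# Hypothetical scenario: the last K_old-only RRSIG is created at
# publication time with a 14-day validity period and a 1-day TTL.
published = datetime(2018, 7, 16, tzinfo=timezone.utc)
validity = timedelta(days=14)
last_expiry = published + validity
safe_after = add_wall_clock_time(last_expiry, timedelta(days=30),
                                 validity, timedelta(days=1))
print(safe_after.isoformat())
```

   In this scenario the result lies 45 days after publication, the same
   interval the wait-timer form yields when sigExpirationTimeRemaining
   is the full 14-day validity period.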

670	6.2.3.  Timing Constraint Summary

672	   The important timing constraint introduced by this memo relates to
673	   the last point at which a RFC5011 Resolver may have received a
674	   replayed original DNSKEY set, containing K_old and not K_new.  The
675	   next query of the RFC5011 validator at which K_new will be seen
676	   without the potential for a replay attack will occur after the old
677	   DNSKEY RRSIG's Signature Expiration Time.  Thus, the latest time that
678	   a RFC5011 Validator may begin its hold down timer is an "Active
679	   Refresh" period after the last point that an attacker can replay the
680	   K_old DNSKEY set.  The worst case scenario of this attack is if the
681	   attacker can replay K_old just seconds before the Signature
682	   Expiration time of the last K_old-only DNSKEY RRSIG.

684	6.2.4.  Additional Considerations for RFC7583

686	   Note: our notion of addWaitTime is called "Itrp" in Section 3.3.4.1
687	   of [RFC7583].  The equation for Itrp in RFC7583 is insecure as it
688	   does not include the sigExpirationTime listed above.  The Itrp
689	   equation in RFC7583 also does not include the 2*TTL safety margin,
690	   though that is an operational consideration.

692	6.2.5.  Example Scenario Calculations

694	   For the parameters listed in Section 5.1, our resulting addWaitTime
695	   is:

697	     addWaitTime = 30
698	                   + 10
699	                   + 1 / 2
700	                   + 1 / 2          (days)

702	     addWaitTime = 41               (days)

704	   This addWaitTime of 41 days is 11 days longer than just the hold
705	   down timer, even with the needed retrySafetyMargin value being left
706	   out (which we exclude due to the lack of necessary operational
707	   parameters).

709	6.3.  Timing Requirements For Revoking an Old KSK

711	   This issue affects not just the publication of new DNSKEYs intended
712	   to be used as trust anchors, but also the length of time required to
713	   continuously publish a DNSKEY with the revoke bit set.

715	   Section 6.2.1 defines a method for calculating the amount of time
716	   operators need to wait until it is safe to cease publishing a DNSKEY
717	   (especially useful for writing code involving sleep based timers),
718	   and Section 6.2.2 defines a method for calculating a minimal wall-
719	   clock value after which it is safe to cease publishing a DNSKEY
720	   (especially useful for writing code based on clock-based event
721	   triggers).

723	6.3.1.  Wait Timer Based Calculation

725	   Both of these publication timing requirements are affected by the
726	   attacks described in this document, but with revocation the key is
727	   revoked immediately and the addHoldDown timer does not apply.  Thus
728	   the minimum amount of time that a SEP Publisher must wait before
729	   removing a revoked key from publication is:

731	     remWaitTime = sigExpirationTimeRemaining
732	                   + activeRefresh
733	                   + timingSafetyMargin
734	                   + retrySafetyMargin

736	     remWaitTime = sigExpirationTimeRemaining
737	                   + MAX(1 hour,
738	                         MIN((sigExpirationTime) / 2,
739	                             MAX(TTL of K_old DNSKEY RRSet) / 2,
740	                             15 days))
741	                   + timingSafetyMargin
742	                   + retrySafetyMargin

744	   Note also that adding retryTime intervals to the remWaitTime may be
745	   wise, just as it was for addWaitTime in Section 6.
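
   A corresponding Python sketch for the revocation case follows.  The
   names are hypothetical, timingSafetyMargin is assumed to equal
   activeRefresh, and the sample values are chosen only to show the
   1 hour floor on activeRefresh taking effect.

```python
from datetime import timedelta


def active_refresh(sig_expiration_time: timedelta,
                   max_dnskey_ttl: timedelta) -> timedelta:
    """MAX(1 hour, MIN(sigExpirationTime / 2, MAX(TTL) / 2, 15 days))."""
    return max(timedelta(hours=1),
               min(sig_expiration_time / 2,
                   max_dnskey_ttl / 2,
                   timedelta(days=15)))


def rem_wait_time(sig_expiration_time_remaining: timedelta,
                  sig_expiration_time: timedelta,
                  max_dnskey_ttl: timedelta,
                  retry_safety_margin: timedelta = timedelta(0)) -> timedelta:
    """Minimum time to keep publishing a revoked key (no hold-down term)."""
    refresh = active_refresh(sig_expiration_time, max_dnskey_ttl)
    timing_safety_margin = refresh  # assumption: margin equals activeRefresh
    return (sig_expiration_time_remaining
            + refresh
            + timing_safety_margin
            + retry_safety_margin)


# Hypothetical values: 14-day signature validity and a 1-hour DNSKEY
# TTL.  Half the TTL is 30 minutes, so the 1 hour floor applies.
wait = rem_wait_time(timedelta(days=14), timedelta(days=14),
                     timedelta(hours=1))
print(wait)  # 14 days, 2:00:00
```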

747	6.3.2.  Wall-Clock Based Calculation

749	   Like before, the above equations are defined based upon how long to
750	   wait from a particular moment in time.  An alternative, but
751	   equivalent, method is to calculate the date and time before which it
752	   is unsafe to cease publishing a revoked key.  This calculation thus
753	   becomes:

755	      remWallClockTime = lastSigExpirationTime
756	                       + activeRefresh
757	                       + timingSafetyMargin
758	                       + retrySafetyMargin

760	      remWallClockTime = lastSigExpirationTime
761	                       + MAX(1 hour,
762	                             MIN((sigExpirationTime) / 2,
763	                                 MAX(TTL of K_old DNSKEY RRSet) / 2,
764	                                 15 days))
765	                       + timingSafetyMargin
766	                       + retrySafetyMargin

768	   where lastSigExpirationTime is the latest value of any
769	   sigExpirationTime for which RRSIGs were created that could
770	   potentially be replayed.  The second form expands activeRefresh.

772	6.3.3.  Additional Considerations for RFC7583

774	   Note that our notion of remWaitTime is called "Irev" in
775	   Section 3.3.4.2 of [RFC7583].  The equation for Irev in RFC7583 is
776	   insecure as it does not include the sigExpirationTime listed above.
777	   The Irev equation in RFC7583 also does not include a safety margin,
778	   though that is an operational consideration.

780	6.3.4.  Example Scenario Calculations

782	   For the parameters listed in Section 5.1, our example remWaitTime is:

784	     remWaitTime = 10
785	                   + 1 / 2          (days)

787	     remWaitTime = 10.5             (days)

789	   Note that the values in this example produce a length shorter
790	   than the recommended 30 days in RFC5011's section 6.6, step 3.  Other
791	   values of sigExpirationTime and the original TTL of the K_old DNSKEY
792	   RRSet, however, can produce values longer than 30 days.

794	   Note that because revocation happens immediately, an attacker has a
795	   much harder job tricking a RFC5011 Resolver into leaving a trust
796	   anchor in place, as the attacker must successfully replay the old
797	   data for every query a RFC5011 Resolver sends, not just one.

799	7.  IANA Considerations

801	   This document contains no IANA considerations.

803	8.  Operational Considerations

805	   A companion document to RFC5011 was expected to be published that
806	   describes the best operational practice considerations from the
807	   perspective of a zone publisher and SEP Publisher.  However, this
808	   companion document has yet to be published.  The authors of this
809	   document hope that it will at some point in the future, as RFC5011
810	   timing can be tricky as we have shown, and a BCP is clearly
811	   warranted.  This document is intended only to fill a single
812	   operational void which, when left misunderstood, can result in
813	   serious security ramifications.  This document does not attempt to
814	   document any other missing operational guidance for zone publishers.

816	9.  Security Considerations

818	   This document is solely about the security considerations with
819	   respect to the SEP Publisher's ability to advertise new DNSKEYs via
820	   the RFC5011 automated trust anchor update process.  Thus the entire
821	   document is a discussion of Security Considerations when adding or
822	   removing DNSKEYs from trust anchor storage using the RFC5011 process.

824	   For simplicity, this document assumes that the SEP Publisher will use
825	   a consistent RRSIG validity period.  SEP Publishers that vary the
826	   length of RRSIG validity periods will need to adjust the
827	   sigExpirationTime value accordingly so that the equations in
828	   Section 6 and Section 6.3 use a value that coincides with the last
829	   time a replay of older RRSIGs can still succeed.

831	10.  Acknowledgements

833	   The authors would like to especially thank Michael StJohns for his
834	   help and advice and the care and thought he put into RFC5011 itself
835	   and his continued reviews and suggestions for this document.  He also
836	   designed the math behind the suggested retrySafetyMargin values in
837	   Section 6.1.7.

839	   We would also like to thank Bob Harold, Shane Kerr, Matthijs Mekking,
840	   Duane Wessels, Petr Spacek, Ed Lewis, Viktor Dukhovni, and the
841	   dnsop working group who have assisted with this document.

843	11.  Normative References

845	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
846	              Requirement Levels", BCP 14, RFC 2119,
847	              DOI 10.17487/RFC2119, March 1997,
848	              <https://www.rfc-editor.org/info/rfc2119>.

850	   [RFC4033]  Arends, R., Austein, R., Larson, M., Massey, D., and S.
851	              Rose, "DNS Security Introduction and Requirements",
852	              RFC 4033, DOI 10.17487/RFC4033, March 2005,
853	              <https://www.rfc-editor.org/info/rfc4033>.

855	   [RFC5011]  StJohns, M., "Automated Updates of DNS Security (DNSSEC)
856	              Trust Anchors", STD 74, RFC 5011, DOI 10.17487/RFC5011,
857	              September 2007, <https://www.rfc-editor.org/info/rfc5011>.

859	   [RFC7583]  Morris, S., Ihren, J., Dickinson, J., and W. Mekking,
860	              "DNSSEC Key Rollover Timing Considerations", RFC 7583,
861	              DOI 10.17487/RFC7583, October 2015,
862	              <https://www.rfc-editor.org/info/rfc7583>.

864	   [RFC7719]  Hoffman, P., Sullivan, A., and K. Fujiwara, "DNS
865	              Terminology", RFC 7719, DOI 10.17487/RFC7719, December
866	              2015, <https://www.rfc-editor.org/info/rfc7719>.

868	Appendix A.  Real World Example: The 2017 Root KSK Key Roll

870	   In 2017 and 2018, ICANN expects to roll (or has rolled, depending on
871	   when you're reading this) the key signing key (KSK) for the root
872	   zone.  The relevant parameters associated with the root zone at the
873	   time of this writing are as follows:

875	         addHoldDownTime:                      30 days
876	         Old DNSKEY sigExpirationTime:         21 days
877	         Old DNSKEY TTL:                        2 days

879	   Thus, substituting this information into the equation in
880	   Section 6 yields (in days from publication time):

882	     addWaitTime = 30
883	                   + 21
884	                   + MAX(1 hour,         # activeRefresh
885	                         MIN(21 / 2,
886	                             MAX(2) / 2,
887	                             15 days)
888	                         )
889	                   + activeRefresh       # timingSafetyMargin

891	     addWaitTime = 30 + 21 + 1 + 1

893	     addWaitTime = 53 days

895	   Also note that we exclude the retrySafetyMargin value, which is
896	   calculated based on the expected client deployment size.

898	   Thus, ICANN must wait a minimum of 53 days before switching to the
899	   newly published KSK (and 26 days before removing the old revoked key
900	   once it is published as revoked).  ICANN's current plans involve
901	   waiting over 3 months before using the new key and 69 days before
902	   removing the old, revoked key.  Thus, their current rollover plans
903	   are sufficiently secure from the attack discussed in this memo.

905	Authors' Addresses

907	   Wes Hardaker
908	   USC/ISI
909	   P.O. Box 382
910	   Davis, CA  95617
911	   US

913	   Email: ietf@hardakers.net

915	   Warren Kumari
916	   Google
917	   1600 Amphitheatre Parkway
918	   Mountain View, CA  94043
919	   US

921	   Email: warren@kumari.net