idnits 2.17.1 

draft-barwood-dnsext-fr-resolver-mitigations-08.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate.  You should update this
     to the boilerplate described in the IETF Trust License Policy document
     (see https://trustee.ietf.org/license-info), which is required now.

  -- Found old boilerplate from RFC 3978, Section 5.1 on line 15.

  -- Found old boilerplate from RFC 3978, Section 5.5, updated by RFC 4748 on
     line 439.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 450.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 457.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 463.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** There are 5 instances of too long lines in the document, the longest one
     being 1 character in excess of 72.

  == There are 2 instances of lines with non-RFC2606-compliant FQDNs in the
     document.

  == There are 2 instances of lines with non-RFC6890-compliant IPv4 addresses
     in the document.  If these are example addresses, they should be changed.

  ** The document seems to lack a both a reference to RFC 2119 and the
     recommended RFC 2119 boilerplate, even if it appears to use RFC 2119
     keywords. 

     RFC 2119 keyword, line 133: '...eneral purpose resolver MUST implement...'
     RFC 2119 keyword, line 224: '...purpose resolver MUST not rely on port...'
     RFC 2119 keyword, line 229: '...   MUST use the same source port when ...'


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust Copyright Line does not match the
     current year

  == Line 275 has weird spacing: '...ecision  is th...'

  == Line 330 has weird spacing: '...ults be  prope...'

  == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD',
     or 'RECOMMENDED' is not an accepted usage according to RFC 2119.  Please
     use uppercase 'NOT' together with RFC 2119 keywords (if that is what you
     mean).
     
     Found 'MUST not' in this paragraph:
     
     Unfortunately it is impractical for a program to reliably determine
     whether a resolver is currently situated behind a NAT device that may
     undo port randomization ( and this can change for each packet sent ), so
     a general purpose resolver MUST not rely on port randomization for
     security.

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (October 26, 2008) is 5661 days in the past.  Is this
     intentional?


  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Unused Reference: 'RFC2181' is defined on line 410, but no explicit
     reference was found in the text


     Summary: 3 errors (**), 0 flaws (~~), 7 warnings (==), 7 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	DNS Extensions Working Group                                  G. Barwood
3	Internet-Draft
4	Intended status: Informational                          October 26, 2008
5	Expires: April 2009

7	                       Resolver side mitigations
8	              draft-barwood-dnsext-fr-resolver-mitigations-08

10	Status of This Memo

12	   By submitting this Internet-Draft, each author represents that any
13	   applicable patent or other IPR claims of which he or she is aware
14	   have been or will be disclosed, and any of which he or she becomes
15	   aware will be disclosed, in accordance with Section 6 of BCP 79.

17	   Internet-Drafts are working documents of the Internet Engineering
18	   Task Force (IETF), its areas, and its working groups.  Note that
19	   other groups may also distribute working documents as Internet-
20	   Drafts.

22	   Internet-Drafts are draft documents valid for a maximum of six months
23	   and may be updated, replaced, or obsoleted by other documents at any
24	   time.  It is inappropriate to use Internet-Drafts as reference
25	   material or to cite them other than as "work in progress."

27	   The list of current Internet-Drafts can be accessed at
28	   http://www.ietf.org/ietf/1id-abstracts.txt

30	   The list of Internet-Draft Shadow Directories can be accessed at
31	   http://www.ietf.org/shadow.html.

33	   This Internet-Draft will expire in March 2009.

35	Abstract

37	   Describes mitigations against spoofing attacks on DNS, including:

39	   (1) Repeating the query, including techniques for handling
40	       non-deterministic responses.

42	   (2) Prepending a random nonce to the question where a referral is
43	       probable.

45	   (3) Estimating the entropy available, taking into account
46	      (a) Observed packets with incorrect IDs.
47	      (b) The content of the cache.

49	Table of Contents

51	   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  3

53	   2.  Criteria . . . . . . . . . . . . . . . . . . . . . . . . . . .  3

55	   3.  Mitigations  . . . . . . . . . . . . . . . . . . . . . . . . .  4
56	     3.1.  Query repetition . . . . . . . . . . . . . . . . . . . . .  4
57	     3.2.  Randomize the case of the question (0x20). . . . . . . . .  5
58	     3.3.  Use a randomly chosen source port  . . . . . . . . . . . .  6
59	     3.4.  Prepend a random nonce label to the question.  . . . . . .  6
60	     3.5.  Maintain a count of observed Bad IDs . . . . . . . . . . .  7
61	     3.6.  Use of calculated entropy  . . . . . . . . . . . . . . . .  7

63	   4.  Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . .  8
64	     4.1.  Query repetition . . . . . . . . . . . . . . . . . . . . .  8
65	     4.2.  Impact on Root and TLD servers . . . . . . . . . . . . . .  8
66	     4.3.  Impact on other levels . . . . . . . . . . . . . . . . . .  9
67	     4.4.  Lame servers and the random nonce. . . . . . . . . . . . .  9
68	     4.5.  Security level . . . . . . . . . . . . . . . . . . . . . .  9

70	   5.  Security Considerations  . . . . . . . . . . . . . . . . . . . 10

72	   6.  IANA Considerations  . . . . . . . . . . . . . . . . . . . . . 10

74	   7.  Acknowledgments  . . . . . . . . . . . . . . . . . . . . . . . 10

76	   8.  Informative References . . . . . . . . . . . . . . . . . . . . 10

78	1.  Introduction

80	   This document describes mitigations that a resolver can currently
81	   deploy to resist spoofing attacks on DNS, without server software
82	   being updated.

84	   The context in which these solutions were explored is CERT
85	   Vulnerability Note VU#800113, "Multiple DNS implementations
86	   vulnerable to cache poisoning".

88	   The Kaminsky attack proceeds by asking a recursive DNS server
89	   a series of questions, each with a different random prefix,
90	   and then sending spoof packets to the server, containing
91	   additional records with genuine owner names but invalid data.
92	   For example:

94	   Query:
95	   Question <nonce>.com A

97	   Spoof response:
98	   Question <nonce>.com A
99	   Authority: com NS ns.evil.com

101	   The effect is to inject an invalid record into the cache.

103	   Since the ID field in the DNS packet header is only 16 bits, a
104	   DNS server that does not deploy any mitigations can be
105	   compromised in a matter of seconds.

107	   [ An implementation of the techniques described can be accessed at
108	     http://www.george-barwood.pwp.blueyonder.co.uk/DnsServer/ ]

110	2.  Criteria

112	   These are resolver side solutions, thus only the resolver needs to be
113	   redeployed, or the software updated.  This allows updated resolvers
114	   to be deployed immediately.

116	   The solutions have to follow the DNS protocol.

118	   The solutions have to be practical, non disruptive, and not
119	   anti-social.

121	3.  Mitigations

123	   Below, the resolver side mitigations are described.

125	   Query repetition (3.1) is necessary and sufficient; the other
126	   mitigations reduce the number of queries needed for good security.

128	3.1.  Query repetition

130	   By repeating the query, additional entropy may be obtained.

132	   Repetition is the only method of obtaining suitable entropy under
133	   all conditions, so a general purpose resolver MUST implement
134	   repetition.

136	   A practical problem occurs when responses are non-deterministic, that
137	   is, when many different responses are obtained for the same question.

139	   In this case, the resolver will need to perform an analysis to
140	   produce a converged result, or to report server failure (or a
141	   security warning, if this is possible) if convergence has not
142	   been achieved after some iteration limit.

144	   The suggested method is to accumulate entropy for various attributes
145	   of the response, specifically non-zero Rcodes (including an internal
146	   representation of no Data ), the Resource Records (RRs), and the
147	   cardinality of each Resource Record Set (RRset).

149	   Each Response can have a counter that represents the number of
150	   attributes that have not reached the required threshold. When the
151	   counter reaches zero, that response is considered fully checked,
152	   and is used as the converged result.

154	   For example, suppose the question is MX records for example.com.

156	   First response:
157	   example.com MX mail1.example.com
158	   example.com MX mail2.example.com

160	   Second response:
161	   example.com MX mail2.example.com  ( mail2.example.com confirmed)
162	   example.com MX mail3.example.com

164	   Also confirmed : example.com MX has 2 alternatives.

166	   Third response:
167	   example.com MX mail3.example.com ( mail3.example.com confirmed )
168	   example.com MX mail4.example.com

170	   The result is the second response.
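
   The counting scheme in the example above can be sketched as follows.
   This is a minimal illustration, not the author's implementation: it
   assumes every response contributes a fixed 16 bits to each attribute
   it contains and that an attribute is checked at 32 bits (that is,
   once it has been seen twice). The names BITS_PER_RESPONSE,
   THRESHOLD_BITS, attributes and converge are hypothetical, and Rcodes
   are left out for brevity.

```python
from collections import defaultdict

BITS_PER_RESPONSE = 16  # assumption: only the 16-bit ID is checked
THRESHOLD_BITS = 32     # assumption: an attribute must be seen twice


def attributes(response):
    """Return the checkable attributes of one response.

    A response is modelled as a list of (owner, type, data) tuples.
    The attributes are each RR plus the cardinality of each RRset.
    """
    attrs = {("rr", rr) for rr in response}
    sizes = defaultdict(int)
    for owner, rtype, _data in response:
        sizes[(owner, rtype)] += 1
    attrs |= {("card", key, n) for key, n in sizes.items()}
    return attrs


def converge(responses):
    """Return the index of the first fully checked response, or None."""
    entropy = defaultdict(int)  # accumulated bits per attribute
    seen = []                   # attribute sets of responses so far
    for response in responses:
        attrs = attributes(response)
        seen.append(attrs)
        for attr in attrs:
            entropy[attr] += BITS_PER_RESPONSE
        for index, candidate in enumerate(seen):
            # Counter of attributes still below the threshold.
            unchecked = sum(
                1 for a in candidate if entropy[a] < THRESHOLD_BITS
            )
            if unchecked == 0:
                return index
    return None
```

   Fed the three MX responses above in order, converge returns index 1,
   the second response. Returning None corresponds to reaching the
   iteration limit without convergence.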

172	   Note that it is possible for an attacker to break RRset integrity
173	   with a single forged response in the non-deterministic case.
174	   For example, the second response in the example could be forged.
175	   However this appears to be a very weak achievement.

177	   Where convergence is very slow, some records may be omitted from the
178	   convergence test, and discarded ( if not acceptable as described
179	   in section 3.6 ), to be fetched later as required.

181	   The records that are always kept are

183	   (E1) Records where the owner name and type exactly match the question.
184	   (E2) NS records where the query question ends with the owner name.

186	   Other records may be discarded ( normally glue A records ).

188	   For example, if the question is www.example.com A, then in a response

190	   www.example.com A 1.2.3.4 : is always kept by (E1)

192	   example.com NS ns.example.com : is always kept by (E2)

194	   ns.example.com A 1.2.3.4 : may be discarded
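
   Rules (E1) and (E2) can be expressed as a small predicate. The
   sketch below is illustrative only: it assumes names are compared
   label by label and case-insensitively, and the helper names labels
   and always_kept are hypothetical.

```python
def labels(name):
    """Split a domain name into lower-case labels; the root is []."""
    return [part for part in name.lower().rstrip(".").split(".") if part]


def always_kept(question, rr):
    """Return True if rr is always kept under (E1) or (E2).

    question is (name, type); rr is (owner, type, data).
    """
    qname, qtype = question
    owner, rtype, _data = rr
    q, o = labels(qname), labels(owner)
    # (E1) owner name and type exactly match the question.
    if q == o and qtype.upper() == rtype.upper():
        return True
    # (E2) NS record whose owner name is a suffix of the question name.
    if rtype.upper() == "NS" and (not o or q[-len(o):] == o):
        return True
    return False
```

   For the question www.example.com A, the predicate is true for the
   first two records listed above and false for the glue record
   ns.example.com A.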

196	   There is a possibility that combinations of resource records may
197	   result that would not occur normally. In the Akamai case, this could
198	   in principle result in a loss of resilience: instead of 9 distinct
199	   IP addresses for the name servers, some might be duplicated.

201	   However no examples have yet been identified where a significant
202	   problem arises, and discarding records is only found to be necessary
203	   for the Akamai case, where full convergence might otherwise need about
204	   100 queries. Stopping after about 10 queries typically results in one
205	   or two glue A records being discarded, and 9 NS records and the
206	   remaining 7 glue records being accepted.

208	   In other cases, convergence generally occurs after at most 3 or 4
209	   queries.

211	3.2.  Randomize the case of the question (0x20)

213	   Most authoritative servers preserve the case of the question in the
214	   response, so some additional entropy may usually be obtained by
215	   randomizing the case of the question.
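
   A minimal sketch of case randomization follows. It assumes that one
   bit is gained per ASCII letter in the question name and that the
   response is accepted only if the question is echoed byte for byte.
   The function names are hypothetical.

```python
import secrets


def randomize_case(name):
    """Return name with the case of each ASCII letter chosen at random."""
    out = []
    for ch in name:
        if ch.isascii() and ch.isalpha():
            ch = ch.upper() if secrets.randbits(1) else ch.lower()
        out.append(ch)
    return "".join(out)


def case_bits(name):
    """Number of 0x20 bits available: one per ASCII letter."""
    return sum(1 for ch in name if ch.isascii() and ch.isalpha())


def case_matches(sent, received):
    """True if the server preserved the case of the question."""
    return sent == received
```

   For example, www.example.com contains 13 letters and so offers 13
   bits, while a name made mostly of digits offers very few.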

217	3.3.  Use a randomly chosen source port

219	   This is a well-known method of obtaining extra entropy.

221	   Unfortunately it is impractical for a program to reliably determine
222	   whether a resolver is currently situated behind a NAT device that
223	   may undo port randomization ( and this can change for each packet
224	   sent ), so a general purpose resolver MUST not rely on port
225	   randomization for security.

227	   To avoid problems where authoritative servers may be behind firewalls
228	   that enforce very low limits on incoming UDP connections, resolvers
229	   MUST use the same source port when repeating a query ( 3.1 ).

231	3.4. Prepend a random nonce label to the question.

233	   This may be used where a referral is probable.

235	   It allows an amount of entropy to be encoded, limited only by the
236	   255-octet limit on a domain name, provided the authority server
237	   returns a copy of the question in the response.

239	   If the response is not a referral*, the response should be discarded,
240	   and the query repeated without the nonce.

242	   * That is, any of the following is observed:
243	     (a) The response is Authoritative ( AA bit is set in the header ).
244	     (b) There is an error ( RCODE is not zero ).
245	     (c) The answer section is not empty.
246	     (d) The authority section is empty.
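
   Conditions (a) to (d) translate directly into a test on the parsed
   response. The Response class below is a hypothetical stand-in for
   whatever message structure a resolver uses; only the four fields
   named in the list are modelled.

```python
from dataclasses import dataclass, field


@dataclass
class Response:
    aa: bool = False                               # AA bit in the header
    rcode: int = 0                                 # 0 means no error
    answer: list = field(default_factory=list)     # answer section
    authority: list = field(default_factory=list)  # authority section


def is_referral(r):
    """Return False if any of conditions (a) to (d) is observed."""
    not_referral = (
        r.aa                        # (a) response is authoritative
        or r.rcode != 0             # (b) there is an error
        or len(r.answer) > 0        # (c) answer section is not empty
        or len(r.authority) == 0    # (d) authority section is empty
    )
    return not not_referral
```

   When is_referral returns False for a query that carried a nonce, the
   response is discarded and the query is repeated without the nonce.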

248	   A simple heuristic for deciding where a referral is probable is:

250	   (1) If the Bailiwick is Root or a TLD, and the question is not equal
251	       to the Bailiwick a referral is probable.

253	   (2) Otherwise a referral is not probable.
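
   The heuristic above, together with nonce generation, might look
   like the sketch below. It assumes a bailiwick of zero labels is the
   Root and one label is a TLD, and it draws a 64-bit nonce encoded as
   16 hexadecimal characters; both the nonce size and the function
   names are illustrative choices, not taken from this document.

```python
import secrets


def labels(name):
    """Split a domain name into lower-case labels; the root is []."""
    return [part for part in name.lower().rstrip(".").split(".") if part]


def referral_probable(bailiwick, qname):
    """Rule (1): bailiwick is Root or a TLD and differs from qname."""
    b = labels(bailiwick)
    return len(b) <= 1 and labels(qname) != b


def prepend_nonce(qname, nbytes=8):
    """Prepend a random label carrying nbytes * 8 bits of entropy."""
    nonce = secrets.token_hex(nbytes)  # cryptographically strong source
    return nonce + "." + qname.lstrip(".")
```

   For instance, a query for www.example.com sent to a com server
   would carry the nonce, while the same query sent to the example.com
   servers would not.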

255	3.5.  Maintain a count of observed Bad IDs

257	   The approximate number of incorrect IDs observed in some fixed
258	   time period, for example the last 20 seconds, may be kept.

260	   This value may be used to decide when to deploy mitigations, such
261	   as extra query repetition, and allows a smooth response to attacks,
262	   while maximising performance under normal conditions where no
263	   attack is observed.
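
   One simple way to keep such a count is a sliding window of
   timestamps, sketched below. This keeps an exact count, whereas the
   text only requires an approximate one; a cheaper approximation
   would do equally well. The class name and the injectable clock
   parameter are hypothetical.

```python
import time
from collections import deque


class BadIdCounter:
    """Count responses with incorrect IDs seen in the last window."""

    def __init__(self, window=20.0, clock=time.monotonic):
        self.window = window   # seconds; 20 is the example in the text
        self.clock = clock     # injectable so it can be tested
        self.events = deque()  # arrival times of bad-ID packets

    def _expire(self, now):
        # Drop events that have fallen out of the window.
        while self.events and now - self.events[0] >= self.window:
            self.events.popleft()

    def record(self):
        """Note one response whose ID did not match."""
        now = self.clock()
        self._expire(now)
        self.events.append(now)

    def count(self):
        """Number of bad IDs observed within the window."""
        self._expire(self.clock())
        return len(self.events)
```

   The value returned by count can then drive the decision to apply
   extra query repetition.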

265	3.6.  Use of calculated entropy

267	   When a response is received, an entropy calculation may be performed
268	   to estimate how many bits have been checked.

270	   It will typically include 16 bits for the ID, 0x20 bits,
271	   bits from the prepended nonce, and a discount for unusual /
272	   non-standard features (such as IP mismatch, question not copied).

274	   The entropy is accumulated for each response attribute, as described
275	   in 3.1, and a decision  is then made to decide whether a value is
276	   to be accepted as valid, which in turn affects whether the query needs
277	   to be repeated as described in 3.1.

279	   For example, the test for whether a value is valid could be

281	   E + C > 50 + 2*K

283	   where
284	     E is the accumulated entropy
285	     C is zero if the value is not in the cache, otherwise 30
286	     K is the logarithm (base 2) of the Bad Id count (3.5)
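
   The example test can be written out as below. The sketch assumes
   that K is taken as zero when the Bad ID count is 0 or 1, since the
   logarithm is otherwise undefined or zero; that choice and the
   function names are not specified by this document.

```python
import math


def bad_id_penalty(bad_id_count):
    """K: log2 of the Bad ID count, taken as 0 for counts of 0 or 1."""
    return math.log2(bad_id_count) if bad_id_count > 1 else 0.0


def is_valid(entropy_bits, in_cache, bad_id_count):
    """Apply the example test E + C > 50 + 2*K."""
    c = 30 if in_cache else 0
    k = bad_id_penalty(bad_id_count)
    return entropy_bits + c > 50 + 2 * k
```

   As an illustration, suppose each response yields 26 bits (16 for
   the ID plus 10 from 0x20). With no Bad IDs observed, a value not in
   the cache fails after one response (26) and passes after two (52),
   whereas a value already in the cache passes after one (26 + 30).
   With 1024 Bad IDs in the window, K is 10 and the bar rises to 70.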

288	   Cache entries may be retained in the cache for some period ( say 1
289	   day ) after their normal TTL expiry time, to reduce the number of
290	   queries when the value needs to be refreshed after TTL expiry.

292	4. Analysis

294	  This section is intended to be less formal, to give some insight
295	  into the rationale for the recommendations given in section 3,
296	  and to discuss possible adverse effects.

298	  The intention is that these mitigations have minimal effects, other
299	  than to make DNS spoof attacks impractical.

301	4.1.  Query repetition
302	  Query repetition should have no impact other than on server load.
303	  Servers do not normally retain any state information about clients
304	  after the query/response transaction completes.

306	4.2.  Impact on Root and TLD servers

308	  The random nonce (3.4) is valuable because it means that no
309	  extra queries to Root and top level servers are needed in normal
310	  operation. This is important because these servers constitute
311	  the shared public base of the DNS, so the stability of these
312	  servers is very important.

314	  The exceptions are the initial root "priming" query and queries
315	  for non-existent domains. For the root domain, by assuming
316	  that every child domain has an SOA record, Name Errors need not
317	  be retried ( by checking the owner name for the SOA record ).
318	  While this assumption is currently correct (and is also observed
319	  to be true for net and com domains), implementors need to carefully
320	  weigh any performance advantage with the risk that the assumption
321	  may not be valid in future.

323	  Clients in general should implement user interfaces that make it
324	  unlikely that users will enter invalid domain names, and that
325	  errors are properly notified, so they can be corrected. However
326	  this is outside the scope of this document.

328	  In practice, most root server queries emanate from mis-configured
329	  software, so the proportional effect on root servers will be
330	  small. It is important that negative results be  properly cached.

332	4.3.  Impact on other levels

334	  For the example test given in 3.6, two queries are usually
335	  required the first time a record is fetched. However when the
336	  TTL expires, the refresh operation only requires a single query.

338	  It is expected that such refresh operations dominate proper
339	  DNS traffic, so the impact should be minimal.

341	  Operators of authoritative servers have several options if
342	  the query repetition may cause overload.

344	  (a) Increase unreasonably low TTLs.
345	  (b) Use names with more alpha characters (to take advantage of 0x20).
346	  (c) Implement support for the proposed AL record or equivalent.

348	  The latter implies that agreeing a specification for the proposed
349	  AL record type (or EDNS Ping equivalent) would be useful.

351	4.4.  Lame servers and the random nonce

353	  In order to resolve domain names where servers are incorrectly
354	  configured, it may be necessary to use a query without the nonce.

356	  A current example is resolving the IP addresses for the name servers
357	  for www.iahc.org, which are ns2.ar.com and ns3.ar.com.

359	  The com nameservers generate a referral for the question
360	  <nonce>.ns2.ar.com, which leads only to lame name servers, but return
361	  the IP address for a non-lame server when the nonce is omitted.

363	  Thus when lame servers are detected, special logic to allow name
364	  resolution to still occur is needed.

366	  Of course a resolver may choose to merely report failure in this
367	  case; however, this may not be practical.

369	4.5.  Security level

371	  The 50 bits suggested in 3.6 should provide a good margin of
372	  safety. An attack sending one spoof packet every 20 seconds at a
373	  particular target will take about 50 million years to succeed.

375	  Taking Bad IDs into consideration (3.5) implies that an attacker gains
376	  nothing from sending attacks at a faster rate.

378	  As a test, the resolver was run with the security level set to 200 bits
379	  with no perceptible decrease in performance ( the required number of
380	  packets can be calculated in advance and sent in parallel, except in
381	  the non-deterministic case ).
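
  Calculating the number of packets in advance is simple arithmetic
  when every response is assumed to contribute the same number of
  bits. The sketch below makes that assumption and uses the strict
  inequality from the example test in 3.6; the function name is
  hypothetical.

```python
def queries_needed(required_bits, bits_per_response):
    """Smallest n such that n * bits_per_response > required_bits.

    Both arguments are whole numbers of bits.
    """
    if bits_per_response <= 0:
        raise ValueError("bits_per_response must be positive")
    return required_bits // bits_per_response + 1
```

  With an assumed 26 bits per response, a 50 bit level needs 2
  queries and a 200 bit level needs 8, all of which can be sent in
  parallel in the deterministic case.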

383	5.  Security Considerations

385	   All of the mitigations aim to provide more security. Query repetition
386	   has an obvious adverse effect on performance and bandwidth.

388	   Each query repetition provides an extra attack opportunity, so the
389	   total entropy requirement may be adjusted to reflect this.

391	   The random nonce may expose internal state to an attacker who
392	   controls a name server. It is essential that a cryptographically
393	   strong source of random numbers be used to generate IDs, 0x20 bits
394	   and prepended nonces. This must be seeded from data that cannot be
395	   guessed by an attacker, such as thermal noise or other random
396	   physical fluctuations.

398	6.  IANA Considerations

400	   No direct considerations.
401	   Indirectly, the TYPE code for AL record described in 4.3.

403	7.  Acknowledgments

405	   Thanks to Nicholas Weaver (ICSI Berkeley) and Wouter Wijngaards (NLnet
406	   Labs). The idea of prepending a nonce may be due to Paul Vixie (ISC).

408	8.  Informative References

410	   [RFC2181]  Elz, R. and R. Bush, "Clarifications to the DNS
411	              Specification", RFC 2181, July 1997.

413	Author's Address

415	   George Barwood
416	   33 Sandpiper Close
417	   Gloucester
418	   GL2 4LZ
419	   United Kingdom

421	   Phone: +44 452 722670
422	   EMail: george.barwood@blueyonder.co.uk
423	   Skype: george.barwood

425	Full Copyright Statement

427	   Copyright (C) The IETF Trust (2008).

429	   This document is subject to the rights, licenses and restrictions
430	   contained in BCP 78, and except as set forth therein, the authors
431	   retain all their rights.

433	   This document and the information contained herein are provided on an
434	   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
435	   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
436	   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
437	   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
438	   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
439	   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

441	Intellectual Property

443	   The IETF takes no position regarding the validity or scope of any
444	   Intellectual Property Rights or other rights that might be claimed to
445	   pertain to the implementation or use of the technology described in
446	   this document or the extent to which any license under such rights
447	   might or might not be available; nor does it represent that it has
448	   made any independent effort to identify any such rights.  Information
449	   on the procedures with respect to rights in RFC documents can be
450	   found in BCP 78 and BCP 79.

452	   Copies of IPR disclosures made to the IETF Secretariat and any
453	   assurances of licenses to be made available, or the result of an
454	   attempt made to obtain a general license or permission for the use of
455	   such proprietary rights by implementers or users of this
456	   specification can be obtained from the IETF on-line IPR repository at
457	   http://www.ietf.org/ipr.

459	   The IETF invites any interested party to bring to its attention any
460	   copyrights, patents or patent applications, or other proprietary
461	   rights that may cover technology that may be required to implement
462	   this standard.  Please address the information to the IETF at
463	   ietf-ipr@ietf.org.