2	DNS Extensions Working Group                               W. Wijngaards
3	Internet-Draft                                                NLnet Labs
4	Intended status: Informational                         February 24, 2009
5	Expires: August 28, 2009

7	                       Resolver side mitigations
8	          draft-wijngaards-dnsext-resolver-side-mitigation-01

10	Status of This Memo

12	   This Internet-Draft is submitted to IETF in full conformance with the
13	   provisions of BCP 78 and BCP 79.

15	   Internet-Drafts are working documents of the Internet Engineering
16	   Task Force (IETF), its areas, and its working groups.  Note that
17	   other groups may also distribute working documents as Internet-
18	   Drafts.

20	   Internet-Drafts are draft documents valid for a maximum of six months
21	   and may be updated, replaced, or obsoleted by other documents at any
22	   time.  It is inappropriate to use Internet-Drafts as reference
23	   material or to cite them other than as "work in progress."

25	   The list of current Internet-Drafts can be accessed at
26	   http://www.ietf.org/ietf/1id-abstracts.txt.

28	   The list of Internet-Draft Shadow Directories can be accessed at
29	   http://www.ietf.org/shadow.html.

31	   This Internet-Draft will expire on August 28, 2009.

33	Copyright Notice

35	   Copyright (c) 2009 IETF Trust and the persons identified as the
36	   document authors.  All rights reserved.

38	   This document is subject to BCP 78 and the IETF Trust's Legal
39	   Provisions Relating to IETF Documents
40	   (http://trustee.ietf.org/license-info) in effect on the date of
41	   publication of this document.  Please review these documents
42	   carefully, as they describe your rights and restrictions with respect
43	   to this document.

45	Abstract

47	   This document describes a set of mitigations that stop the known
48	   variations of the Kaminsky cache poisoning attacks against the DNS
49	   system, for which only resolver side deployment is necessary.

51	Table of Contents

53	   1.  Introduction

55	   2.  Criteria

57	   3.  Mitigations
58	     3.1.  Add Entropy
59	     3.2.  Use Care with the Cache
60	     3.3.  Obtain Authoritative Data
61	     3.4.  Detection

63	   4.  Variants to Protect against

65	   5.  Security Considerations

67	   6.  IANA Considerations

69	   7.  Acknowledgments

71	   8.  Informative References

73	1.  Introduction

75	   [WW: These are the countermeasures for the Kaminsky attack scenarios
76	   that I envision for the Unbound resolver (http://unbound.net).  They
77	   require resolver side deployment only.  Depending on working group
78	   input this document could remain an Unbound-specific informational
79	   document or be made more generic and move towards a BCP.]

82	   This document describes the mitigations that a resolver can deploy on
83	   its own in the meantime, while a more comprehensive (read: DNSSEC)
84	   solution is being rolled out.  Among countermeasures that require
85	   changes to authoritative and recursive servers everywhere, DNSSEC
86	   provides the most protection, followed by nonce-based approaches
87	   (e.g. EDNS PING), followed by transport protocol games.  Because
88	   Unbound implements DNSSEC validation already, and DNSSEC provides the
89	   most protection (e.g. against new unknown variations and also against
90	   full man-in-the-middle attacks), DNSSEC is a good long-term choice.

92	   The solutions in this document aim to cover all of the variations in
93	   the recent Kaminsky-style attacks.  However, it seems likely that
94	   other variations besides the ones described in this document are
95	   going to be discovered.  For that reason a number of generic
96	   protections are included, chief amongst them the use of extra
97	   entropy.

99	   Since this document focuses on Unbound, it is worth noting that
100	   although current versions implement these mitigations, they are not
101	   all turned on by default.  Unbound should support the mitigations
102	   that the community considers 'best', without adding weird,
103	   ill-considered mitigations of its own.  Hence this document.

105	   It is assumed the reader is aware of, and implementing, the forgery-
106	   resilience [RFC5452] recommendations.

108	   In Section 2 the criteria are listed.  In Section 3 the various
109	   measures that can be used to mitigate threats are described.  Section
110	   4 enumerates Kaminsky-style attack variations, and shows what
111	   measures provide protection against each one of them.  Section 5
112	   discusses consequences caused by the mitigations.

114	2.  Criteria

116	   The first and foremost criterion is that these are resolver side
117	   solutions: only the resolver needs to be redeployed, or its
118	   software updated, for them to work.  This makes short-term
119	   deployment possible.  The idea is to provide some (partial)
120	   protection in the short term.  In the long term it is possible to
121	   redeploy both authority servers and recursors, and the solution
122	   space is greatly increased (e.g. options range from EDNS PING, using
123	   TCP or SCTP, to DNSSEC deployment).

125	   Many solutions in this document could also be used in stub resolvers.
126	   Stub resolvers are not mentioned specifically further on; the main
127	   focus is on the caching recursive server.

129	   The solutions have to follow the DNS protocol.

131	   The solutions have to be non-disruptive and must not be anti-social.
132	   Specifically, they must not shift the costs of the solution onto
133	   third parties.  For example, large scale fallback to TCP both uses a
134	   limited resource (TCP connections to authority servers), and disrupts
135	   deployment behind many middle boxes.

137	   Solutions without an 'attack mode' are preferred.  An 'attack mode'
138	   is a different state of behaviour that the resolver enters into after
139	   something anomalous is detected.  It may be for only a subset of
140	   operations or only a limited time.  One reason to avoid such modal
141	   design is that paranoia dictates that maximal protection should
142	   always be used.  A second reason is that if a protection measure
143	   cannot be used always, it is likely to be disruptive (see above).
144	   Such an 'attack mode' complicates implementation, testing and
145	   especially security analysis.

147	3.  Mitigations

149	   Below, the resolver side mitigations are described.

151	3.1.  Add Entropy

153	   The mitigations in this section increase the transaction entropy
154	   above the 16 bits in the ID number.  This is close to the
155	   forgery-resilience [RFC5452] text; the differences are in the rtt
156	   banding text and the 0x20 consideration.

158	   o  port randomisation

160	      Use as many ports as possible; only 1000 or 2000 ports (as some
161	      commercial DNS products use) is not enough.  A range of 59000 port
162	      numbers (15.8 bits) can be usefully achieved.  This causes
163	      operational problems (NAT boxes using predictable port numbers),
164	      portability problems (bugs, features not available), and volume
165	      problems (every port in use consumes a limited resource).

167	   o  0x20.

169	      This breaks queries to some authorities, but more than 99.9% work.
170	      It resembles a proposal that needs authority server deployment,
171	      except that the authority servers are already deployed to a large
172	      extent [I-D.vixie-dnsext-dns0x20].

174	   o  rtt banding

176	      RTT banding refers to the method of picking a random nameserver
177	      for the query out of the set of nameservers that are within an RTT
178	      band (say at most 200 msec slower) of the fastest nameserver.

180	      An attacker can create new attack opportunities at will by sending
181	      a new fake question for the resolver to resolve.  Therefore the
182	      actual size of the roundtrip time window is not as important as
183	      the additional entropy gained by selecting randomly from a set of
184	      servers.

186	   o  IPv4 - IPv6

188	      When both IPv4 and IPv6 are available, the protocol can be chosen
189	      randomly together with rtt banding to provide more entropy.

191	   o  source address randomisation

193	      If the resolver has multiple public IP addresses, the source
194	      address of each query can be chosen randomly among them.
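
   Two of the entropy sources listed above, 0x20 and rtt banding, can be
   sketched as follows.  This is a minimal illustration, not Unbound's
   implementation: the function names, the use of random.SystemRandom
   and the layout of the RTT table are assumptions made for the example.
   The 200 msec band is the value suggested in the text.

```python
import random

# OS-backed randomness; the choice of generator is an assumption of this sketch.
_rng = random.SystemRandom()


def encode_0x20(qname):
    """Randomise the case of every ASCII letter in the query name."""
    out = []
    for c in qname:
        if c.isascii() and c.isalpha():
            out.append(c.upper() if _rng.getrandbits(1) else c.lower())
        else:
            out.append(c)
    return "".join(out)


def reply_matches_0x20(sent_qname, reply_qname):
    """Accept a reply only if it echoes the exact case pattern that was sent."""
    return sent_qname == reply_qname


def pick_server(rtt_ms, band_ms=200.0):
    """Pick uniformly among servers at most band_ms slower than the fastest.

    rtt_ms maps a server address (IPv4 or IPv6) to its measured RTT in msec.
    """
    fastest = min(rtt_ms.values())
    in_band = [addr for addr, rtt in rtt_ms.items() if rtt <= fastest + band_ms]
    return _rng.choice(in_band)


if __name__ == "__main__":
    qname = encode_0x20("www.example.com.")
    servers = {"192.0.2.1": 20.0, "2001:db8::1": 150.0, "192.0.2.3": 400.0}
    print(qname, "->", pick_server(servers))
```

   Mixing IPv4 and IPv6 addresses in the same table, as in the usage
   lines, gives the combined protocol and server choice described in the
   "IPv4 - IPv6" item.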

196	   If all the above entropy settings are in use, it is estimated that
197	   Unbound can provide about 44 bits of entropy (16 ID, 15.8 port bits,
198	   about 8 0x20 bits, about 2 rtt banding + protocol bits and about 2
199	   source address bits).  Without user configuration or queries amenable
200	   to 0x20, 34 bits of entropy are likely, or even 18 if a NAT box kills
201	   the port randomisation.  Entropy thus provides only limited
202	   protection.
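
   The arithmetic behind these estimates can be reproduced with a short
   calculation.  The per-source bit counts are the ones given above; the
   decomposition of the 34 and 18 bit cases (which sources drop out in
   each case) is this sketch's reading of the text, and the variable
   names are illustrative.

```python
import math

# Per-source estimates taken from the text above, in bits.
ID_BITS = 16.0
PORT_BITS = math.log2(59000)   # about 15.8 bits for 59000 ports
X20_BITS = 8.0                 # depends on the number of letters in the name
BANDING_BITS = 2.0             # rtt banding plus protocol choice
SOURCE_ADDR_BITS = 2.0         # needs several configured public addresses

# Assumed reading of the three scenarios in the text.
all_enabled = ID_BITS + PORT_BITS + X20_BITS + BANDING_BITS + SOURCE_ADDR_BITS
defaults_only = ID_BITS + PORT_BITS + BANDING_BITS   # no 0x20, no extra addresses
behind_nat = ID_BITS + BANDING_BITS                  # port randomisation undone


def guess_probability(bits):
    """Chance that one blind forged reply matches all randomised fields."""
    return 2.0 ** -bits


if __name__ == "__main__":
    for label, bits in [("all enabled", all_enabled),
                        ("defaults only", defaults_only),
                        ("behind NAT", behind_nat)]:
        print(f"{label}: {bits:.1f} bits, "
              f"{guess_probability(bits):.3g} per forged reply")
```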

204	3.2.  Use Care with the Cache

206	   o  RFC 2181 adherence

208	      This means that RRsets are ranked in trustworthiness depending on
209	      whether they come from the answer section or from another part of
210	      the message.  Authoritative answers are preferred [RFC2181].

212	      In addition, do not give data obtained from authority or
213	      additional sections in answer sections to clients.

215	   o  CNAME chain.

217	      Only use the first entry in the answer section.  Perform new
218	      lookups for the remainder.

220	   o  DNAME chain.

222	      Only use the first DNAME entry and its synthesized CNAME from the
223	      answer section.  Perform new lookups for the remainder.

225	   o  no DNAME from cache

227	      Do not pick a DNAME RR out of the cache for a query for which that
228	      DNAME RR was not returned.  Thus, a DNAME is only used for query
229	      names for which answers have been received from the authority
230	      server.

232	      When the DNAME is signed with DNSSEC, the resolver is allowed to
233	      synthesize new CNAMEs from it to answer new queries.  This is
234	      because the zone owner whose zone is redirected is signing away
235	      his own zone.
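
   The ranking rule in the first item above can be sketched as a cache
   that refuses to replace an RRset with a less trusted copy.  The three
   trust levels are a deliberate simplification of the ranking in
   [RFC2181], TTL handling is left out, and the class and method names
   are hypothetical.

```python
from enum import IntEnum


class Rank(IntEnum):
    """Simplified trust levels, lowest first (an assumption of this sketch)."""
    ADDITIONAL = 1   # data from the additional section
    AUTHORITY = 2    # data from the authority section, e.g. referral NS
    ANSWER = 3       # data from the answer section


class RankedCache:
    def __init__(self):
        self._store = {}   # (name, rrtype) -> (rank, rdata)

    def offer(self, name, rrtype, rdata, rank):
        """Store the RRset unless a more trusted copy is already cached.

        Data of equal rank replaces the cached copy; that choice is
        specific to this sketch.
        """
        key = (name.lower(), rrtype)
        current = self._store.get(key)
        if current is not None and current[0] > rank:
            return False
        self._store[key] = (rank, rdata)
        return True

    def answer_for_client(self, name, rrtype):
        """Return data for an answer section only if it came from one."""
        entry = self._store.get((name.lower(), rrtype))
        if entry is None or entry[0] < Rank.ANSWER:
            return None
        return entry[1]


if __name__ == "__main__":
    cache = RankedCache()
    cache.offer("example.com.", "NS", ["ns1.example.com."], Rank.ANSWER)
    # A spoofed authority section cannot overwrite the answer-section data.
    print(cache.offer("example.com.", "NS", ["evil.example.net."], Rank.AUTHORITY))
    print(cache.answer_for_client("example.com.", "NS"))
```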

237	3.3.  Obtain Authoritative Data

239	   o  Authority query for NS after referral

241	      The idea is to obtain authoritative data for the NS RRset instead
242	      of using data tacked onto another message.  Care must be taken to
243	      avoid DoSing parent nameservers, and to avoid breaking resolution
244	      in common cases where the NS RRsets in parent and child differ.

246	      On a referral, the data from the referral may be used to continue
247	      answering the current query, but it is not stored in the cache.
248	      If the question equals the referred zone name and has qtype NS,
249	      then the NS RRset from the referral does get stored in the cache.

251	      If the question is not that already, a new lookup is performed for
252	      the referred zone name with qtype NS.  The results from that
253	      lookup are cached normally.  The lookup has to start at a parent
254	      of the referred zone, so that a new referral is obtained.

256	      The upshot is that RFC2181 adherence pins the NS RRset data in the
257	      cache because it is seen in the answer section, and tacked-on data
258	      from other messages is ignored until the TTL expires.  It should
259	      be noted that most infrastructure TTLs for NS records are very
260	      large.

262	      This approach does not break existing disjoint RRsets, servers
263	      that do not answer for qtype NS at all, or servers that are
264	      offline, because the referral is cached when making the qtype NS
265	      query.  This is why the qtype NS query has to be made in such a
266	      way that it elicits a fresh referral from the parent server.  This
267	      gives a once per TTL opportunity for spoofing the referral.

269	      The NS RRset answered from the child side of the zone cut
270	      overrides the NS RRset picked up from the referral.  This causes
271	      the same data to be used as today, where the authority section NS
272	      set sent along by the child server overrides the NS set seen from
273	      the referral.

275	      Additional queries are sent for this solution.  This increases
276	      resolver and authority server load and bandwidth usage.

278	   o  Authority queries for nameserver addresses, A and AAAA.

280	      This is the same idea as the NS query above: ask for A or AAAA
281	      records directly at the authoritative server.  It is not necessary
282	      to elicit the referral again; the query can be directed at the
283	      best server.

285	      Additional queries are sent for this solution.  This increases
286	      resolver and authority server load and bandwidth usage.

288	   A bonus when using the above methods to obtain authoritative data is
289	   that when using DNSSEC, the data can be validated, and thus spoofed
290	   infrastructure data can be detected and handled appropriately.  This
291	   protects DNSSEC deployments, where the referral contains unsigned
292	   NS, A and AAAA records, from spoofed infrastructure data.  Of course,
293	   DNSSEC is designed to protect end-user data anyway, whether or not
294	   the referral data was poisoned.  The authoritative lookups simply
295	   add another layer of defense.
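
   The referral handling described in the first item of this section can
   be sketched as below.  The cache is a plain dictionary and the set of
   pending lookups stands in for the resolver's work queue; both, along
   with the function name, are hypothetical.  Skipping the extra lookup
   when an NS RRset is already cached is an assumption of the sketch,
   made to limit the additional query load.

```python
def handle_referral(cache, pending, qname, qtype, zone, ns_rrset):
    """Process a referral to `zone` seen while resolving (qname, qtype).

    Returns the NS RRset to use for continuing the current query.
    """
    key = (zone.lower(), "NS")
    if qname.lower() == zone.lower() and qtype == "NS":
        # The question is the explicit NS lookup for the referred zone:
        # this is the one case in which the referral itself is cached.
        cache[key] = ns_rrset
        pending.discard(key)
    elif key not in cache:
        # Use the referral for this query only and schedule an explicit
        # NS lookup.  That lookup has to start at a parent of the zone
        # so that a fresh referral is obtained.
        pending.add(key)
    return ns_rrset


if __name__ == "__main__":
    cache, pending = {}, set()
    handle_referral(cache, pending, "bad123.example.com.", "A",
                    "example.com.", ["ns1.example.com."])
    print("cached:", cache, "pending:", pending)
    handle_referral(cache, pending, "example.com.", "NS",
                    "example.com.", ["ns1.example.com."])
    print("cached:", cache, "pending:", pending)
```

   An answer to the NS query from the child side of the zone cut would
   then overwrite the cached entry, as described above; that step is not
   shown.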

297	3.4.  Detection

299	   o  trouble counter

301	      This is a simple detection method.  It counts all packets that
302	      were not asked for.  The only thing noted about the packet is that
303	      it is a query reply (QR bit) and was not asked for.

305	      This may show false positives due to UDP packet duplicates and
306	      delayed responses (delayed for longer than the implementation
307	      cares to keep track of what it asks for).  The idea is that false
308	      positives are probably rare.  Conversely, some unasked-for packets
309	      may not be noticed, for example because the implementation is not
310	      listening on particular ports.

312	      When a particular threshold is met, the cache is wiped clean.

314	      The threshold is set so that denial of service does not become
315	      much easier, and so that false positives do not (often) result in
316	      cache wipes.  A threshold in the range of 10 million is proposed.
317	      That many packets is already a sizable denial of service attack
318	      in itself.  Also, the amount of data the attacker has to send
319	      gets close to the cache size of the resolver, which keeps the
320	      amplification towards the authority servers low.

322	      Since this mitigation is meant to protect against hitherto unknown
323	      variations, it does not help to examine the packets any further
324	      than the QR bit (and the fact that they were not used for regular
325	      processing).

327	      The result of this is that the probability that there is a
328	      poisoned item present in the cache is capped at some maximum.  The
329	      exact value depends on the entropy per message and the threshold.
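
   The count-and-wipe mechanism and the resulting cap can be sketched as
   follows.  The class and function names are hypothetical, resetting
   the counter after a wipe is an assumption, and the bound assumes that
   every counted packet is an independent blind guess against the given
   number of entropy bits (which must be greater than zero).

```python
import math


class TroubleCounter:
    """Counts unsolicited replies and wipes the cache at a threshold."""

    def __init__(self, cache, threshold=10_000_000):
        self.cache = cache
        self.threshold = threshold
        self.unsolicited = 0

    def on_packet(self, qr_bit, matched_outstanding_query):
        """Account for one received packet; return True if the cache was wiped."""
        if not qr_bit or matched_outstanding_query:
            return False   # a query, or a reply that was asked for
        self.unsolicited += 1
        if self.unsolicited < self.threshold:
            return False
        self.cache.clear()
        self.unsolicited = 0   # assumed: counting restarts after a wipe
        return True


def poison_probability_bound(entropy_bits, threshold):
    """Chance that at least one of `threshold` forged replies matched."""
    p = 2.0 ** -entropy_bits
    return -math.expm1(threshold * math.log1p(-p))


if __name__ == "__main__":
    for bits in (18, 34, 44):
        bound = poison_probability_bound(bits, 10_000_000)
        print(f"{bits} bits: bound {bound:.3g}")
```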

331	4.  Variants to Protect against

333	   In the descriptions below a short title summarizes the exploit.  The
334	   query 'q:' is the fake question that the attacker sends to the
335	   resolver.  The answer, authority 'auth:' and additional 'add:'
336	   sections list the content that the spoofer provides.  The mitigation
337	   strategy, and sometimes discussion, is provided in the 'protected:'
338	   line.

340	   The real target is example.com or www.example.com or ns1.example.com,
341	   which is the real nameserver for example.com here.  The domain
342	   evil.example.net is under control of the attacker and
343	   192.0.2.66(evil) is an IP address under control of the attacker.  The
344	   label 'bad123' is used in place of a label that the attacker varies
345	   on every attempt to obtain new spoofing windows.

347	   Glue with new DNS server
348	   q: bad123.example.com.
349	   answer: bad123.example.com. A whatever
350	   auth: example.com. NS evil.example.com.
351	   add: evil.example.com. A 192.0.2.66(evil)
352	   protected: 2181 adherence plus NS record pinned by NS query.
353	   Also name error or no data answers could be used, instead of
354	   this answer section.

356	   Glue for DNS server
357	   q: bad123.example.com.
358	   answer: bad123.example.com. A whatever
359	   auth: example.com. NS ns1.example.com. (normal entry)
360	   add: ns1.example.com. A 192.0.2.66(evil)
361	   protected: 2181 adherence plus NS record pinned by NS query,
362	   plus A record pinned by glue query.
363	   Also name error or no data answers could be used, instead of
364	   this answer section.

366	   Glue for Web server
367	   q: bad123.example.com.
368	   answer: bad123.example.com. A whatever
369	   auth: example.com. NS www.example.com.
370	   add: www.example.com. A 192.0.2.66(evil)
371	   protected: 2181 adherence plus NS record pinned by NS query.

373	   Glue smaller
374	   q: bad123.example.com.
375	   answer: bad123.example.com. A 192.0.2.66(evil)
376	   auth: example.com. NS bad123.example.com.
377	   protected: 2181 adherence plus NS record pinned by NS query.

379	   NS change
380	   q: bad123.example.com.
381	   answer: bad123.example.com. A whatever
382	   auth: example.com. NS evil.example.net.
383	   protected: 2181 adherence plus NS record pinned by NS query.

385	   NS server migration
386	   q: bad123.example.com.
387	   answer: bad123.example.com. A whatever
388	   auth: example.com. NS ns1.example.com. (normal entry)
389	   auth: example.com. NS ns2.example.com.evil.example.net.
390	         (evil, looks like typo in server migration)
391	   protected: 2181 adherence plus NS record pinned by NS query.

393	   CNAME
394	   q: bad123.example.com.
395	   answer: bad123.example.com. CNAME www.example.com.
396	   answer: www.example.com. A 192.0.2.66(evil)
397	   protected: CNAME chain cutoff.
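
   Applied to the CNAME variant above, the chain cutoff keeps only the
   first answer record and turns the CNAME target into a new lookup.
   The tuple layout (owner, type, rdata) and the function name are
   assumptions of this sketch, as is the rule that a first record not
   owned by the query name is rejected.

```python
def cut_cname_chain(qname, answer_section):
    """Return (accepted_record, follow_up_name) for a reply to `qname`.

    Only the first answer record is accepted.  If it is a CNAME, its
    target is returned as a name to look up separately; any further
    records in the answer section are ignored.
    """
    if not answer_section:
        return None, None
    owner, rrtype, rdata = answer_section[0]
    if owner.lower() != qname.lower():
        return None, None   # assumed: first record must match the question
    if rrtype == "CNAME":
        return (owner, rrtype, rdata), rdata
    return (owner, rrtype, rdata), None


if __name__ == "__main__":
    spoofed = [
        ("bad123.example.com.", "CNAME", "www.example.com."),
        ("www.example.com.", "A", "192.0.2.66"),   # dropped by the cutoff
    ]
    accepted, follow_up = cut_cname_chain("bad123.example.com.", spoofed)
    print("accepted:", accepted)
    print("new lookup for:", follow_up)
```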

399	   DNAME one message
400	   q: www.bad123.example.com.
401	   answer: bad123.example.com. DNAME example.com.
402	   answer: www.bad123.example.com. CNAME www.example.com.
403	   answer: www.example.com. A 192.0.2.66(evil)
404	   protected: DNAME chain cutoff.

406	   DNAME whole zone
407	   q: bad123.example.com.
408	   answer: example.com. DNAME evil.example.net.
410	   answer: bad123.example.com. CNAME bad123.evil.example.net.
411	   answer: bad123.evil.example.net. A whatever
412	   protected: no DNAME from cache.
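
   For the variant above, the "no DNAME from cache" rule means that even
   if the spoofed reply for bad123.example.com is accepted, the cached
   DNAME is not applied to other names such as www.example.com.  A
   minimal sketch, with a hypothetical cache entry type that remembers
   the query names the DNAME was returned for:

```python
class DnameEntry:
    """Cached DNAME plus the query names it was actually returned for."""

    def __init__(self, owner, target, validated=False):
        self.owner = owner.lower()
        self.target = target.lower()
        self.validated = validated   # True if signed and DNSSEC-validated
        self.seen_for = set()

    def record_reply(self, qname):
        """Note that the authority returned this DNAME for `qname`."""
        self.seen_for.add(qname.lower())


def may_use_cached_dname(entry, qname):
    """Decide whether a cached DNAME may be used to answer `qname`."""
    qname = qname.lower()
    if not qname.endswith("." + entry.owner):
        return False   # a DNAME only covers names below its owner
    return entry.validated or qname in entry.seen_for


if __name__ == "__main__":
    spoofed = DnameEntry("example.com.", "evil.example.net.")
    spoofed.record_reply("bad123.example.com.")
    print(may_use_cached_dname(spoofed, "bad123.example.com."))
    print(may_use_cached_dname(spoofed, "www.example.com."))
```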

414	   New Delegation - rigged
415	   q: bad123.www.example.com.
416	   answer: (empty)
417	   auth: www.example.com. NS www.example.com.
418	   add: www.example.com. A 192.0.2.66(evil)
419	   protected: the NS queries that ask referral confirmation
420	   together with glue queries.

422	   New Delegation - looks normal
423	   q: bad123.www.example.com.
424	   answer: (empty)
425	   auth: www.example.com. NS ns1.evil.example.net.
426	   auth: www.example.com. NS ns2.evil.example.net.
427	   protected: the NS queries that ask referral confirmation
428	   together with glue queries.

430	   New Delegation - for glue
431	   q: bad123.example.com.
432	   answer: (empty)
433	   auth: bad123.example.com. NS ns1.example.com.
434	   add: ns1.example.com. A 192.0.2.66(evil)
435	   protected: rfc2181 adherence.

437	   Another hitherto unknown variation
438	   These are a lot of variations and it is very likely that other
439	   people can come up with better, different ideas.
440	   protected: by entropy measures, by the count-and-wipe measure.
441	   Long term solutions (PING, TCP, DNSSEC) also aim to protect
442	   against these much more thoroughly.

444	5.  Security Considerations

446	   All of the mitigations aim to provide more security.  However,
447	   several of them have adverse effects on performance and bandwidth.

449	   The CNAME, DNAME, NS and nameserver address mitigations all require
450	   that additional lookups be performed.  The CNAME and DNAME target
451	   lookups cause the answer to the client to be delayed.  The NS set and
452	   nameserver address lookups cause a higher load on both authority and
453	   resolver servers.

455	   The detection mechanism is susceptible to denial of service attacks.
456	   A small, calculated amount of additional DoS leverage is provided.
457	   This changes some spoof attacks into a denial of service.

459	   The NS set and nameserver address lookups cause the NS, A and AAAA
460	   RRsets to be pinned in the cache until the TTL expires.  This
461	   provides cache overwriting protection, but at the cost of not picking
462	   up updates to these RRsets in the course of normal resolution.
463	   Changes to these RRsets are then no longer seen on the next query,
464	   but only after the TTL times out.  This adversely affects the
465	   coherency of the DNS server infrastructure, as it becomes more likely
466	   that resolvers operate using out of date nameserver data.

468	6.  IANA Considerations

470	   None.

472	7.  Acknowledgments

474	   Thanks to Nicholas Weaver (ICSI Berkeley) and Olaf Kolkman (NLnet
475	   Labs).

477	8.  Informative References

479	   [I-D.vixie-dnsext-dns0x20]  Vixie, P. and D. Dagon, "Use of Bit 0x20
480	                               in DNS Labels to Improve Transaction
481	                               Identity", draft-vixie-dnsext-dns0x20-00
482	                               (work in progress), March 2008.

484	   [RFC2181]                   Elz, R. and R. Bush, "Clarifications to
485	                               the DNS Specification", RFC 2181,
486	                               July 1997.

488	   [RFC5452]                   Hubert, A. and R. van Mook, "Measures for
489	                               Making DNS More Resilient against Forged
490	                               Answers", RFC 5452, January 2009.

492	Author's Address

494	   Wouter Wijngaards
495	   NLnet Labs
496	   Science Park 140
497	   Amsterdam  1098 XG
498	   The Netherlands

500	   Phone: +31-20-888-4551
501	   EMail: wouter@nlnetlabs.nl