DNS Extensions Working Group                                   N. Weaver
Internet-Draft                            International Computer Science
Intended status: Informational                                 Institute
Expires: April 3, 2009                                September 30, 2008


      Comprehensive DNS Resolver Defenses Against Cache Poisoning
             draft-weaver-dnsext-comprehensive-resolver-00

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on April 3, 2009.

Abstract

   DNS resolvers are vulnerable to many attacks on their network
   communication, ranging from blind attacks to full men-in-the-middle.
   Although a full man-in-the-middle can only be countered with
   cryptography, there are many layers of defenses which apply to less
   powerful attackers.  Of particular interest are defenses which only
   require changing the DNS resolvers, not the authoritative servers or
   the DNS protocols.  This document begins with a taxonomy of attacker
   capabilities and desires, and then discusses defenses against classes
   of attackers, including detecting non-disruptive attacks, entropy
   budgeting, detecting entropy stripping, semantics of duplication, and
   cache policies to eliminate "race-until-win" conditions.  Proposed
   defenses were evaluated with traces of network behavior.


Weaver                    Expires April 3, 2009                 [Page 1]

Internet-Draft     Comprehensive DNS Resolver Defenses    September 2008


Table of Contents

   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  3
   2.  A Taxonomy of Attacks  . . . . . . . . . . . . . . . . . . . .  3
     2.1.  Passive or Active Attacks  . . . . . . . . . . . . . . . .  3
     2.2.  Blind or Aware Attacks . . . . . . . . . . . . . . . . . .  4
     2.3.  Non-Disruptive or Disruptive Attacks . . . . . . . . . . .  4
     2.4.  Transaction or Cache Attacks . . . . . . . . . . . . . . .  4
     2.5.  Direct Mapping or Ancillary Data Attacks . . . . . . . . .  4
   3.  Race-Until-Win Blind Attacks . . . . . . . . . . . . . . . . .  4
   4.  Requirements for Blind Transaction Attacks . . . . . . . . . .  5
   5.  General Evaluation Strategy  . . . . . . . . . . . . . . . . .  6
   6.  Directly Detecting Non-Disruptive Attacks  . . . . . . . . . .  6
   7.  Entropy Budgeting  . . . . . . . . . . . . . . . . . . . . . .  8
   8.  Entropy Stripping  . . . . . . . . . . . . . . . . . . . . . .  9
   9.  On Duplication for Entropy Increase  . . . . . . . . . . . . .  9
   10. Cache Policy, Scoping, and Ancillary Data Attacks  . . . . . . 11
     10.1. Preliminary Estimates of Performance Impact  . . . . . . . 13
     10.2. Optimization: Only One A Record for the NS RRSet . . . . . 14
     10.3. Optimization: Object Scope . . . . . . . . . . . . . . . . 14
     10.4. Optimization: Lazily Fetching the NS RRSet . . . . . . . . 15
     10.5. Accepting Requests for Change  . . . . . . . . . . . . . . 16
   11. Conclusions  . . . . . . . . . . . . . . . . . . . . . . . . . 16
   12. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 17
   13. IANA Considerations  . . . . . . . . . . . . . . . . . . . . . 17
   14. Security Considerations  . . . . . . . . . . . . . . . . . . . 17
   15. References . . . . . . . . . . . . . . . . . . . . . . . . . . 18
     15.1. Normative References . . . . . . . . . . . . . . . . . . . 18
     15.2. Informative References . . . . . . . . . . . . . . . . . . 18
   Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 18
   Intellectual Property and Copyright Statements . . . . . . . . . . 20


Weaver                    Expires April 3, 2009                 [Page 2]

Internet-Draft     Comprehensive DNS Resolver Defenses    September 2008


1.  Introduction

   DNS resolvers are susceptible to many attacks on their network
   traffic, ranging from an attacker performing blind packet injection
   to a full man-in-the-middle, capable of controlling all traffic the
   resolver receives.

   With the recent discovery of the Kaminski Attack [kaminski], new
   attention has been focused on securing DNS from adversaries.  This
   document focuses on a subset of the problem: securing DNS resolvers
   without changing the DNS authoritative servers or protocols,
   including authorities that do not actively follow the DNS
   specification.

   This document begins with a taxonomy of attacker properties
   (Section 2), observations on race-until-win blind attacks
   (Section 3), the limitations of blind transaction attacks
   (Section 4), the evaluation strategy used to study possible defenses
   (Section 5), directly detecting non-disruptive attacks (Section 6),
   entropy budgeting (Section 7), detecting entropy stripping
   (Section 8), the effects of duplication on protecting the transaction
   and cache while maintaining compatibility (Section 9), and finally,
   the effects of cache policies which resist race-until-win attacks
   (Section 10).

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].


2.  A Taxonomy of Attacks

   Not all attacker capabilities are equal, and different defenses are
   able to block some classes of attackers.  By forming a taxonomy, one
   can describe the classes of attackers which each defense can detect
   or mitigate.

2.1.  Passive or Active Attacks

   A passive attacker must wait for the targeted resolver to make
   requests, while an active attacker is able to trigger specific
   requests by the resolver.  In general, there are numerous mechanisms
   which an attacker can use to trigger requests.  One must assume that,
   should the attacker desire it, the attacker can be active.


Weaver                    Expires April 3, 2009                 [Page 3]

Internet-Draft     Comprehensive DNS Resolver Defenses    September 2008


2.2.  Blind or Aware Attacks

   A blind attacker does not know any of the entropy sources (e.g. the
   Transaction ID, port selection, capitalization) of the request.  An
   aware attacker knows this information because he can directly observe
   the request of the resolver.  Increasing the entropy of a request
   increases the difficulty for blind attackers, but has no effect on
   aware attackers.

2.3.  Non-Disruptive or Disruptive Attacks

   A non-disruptive attacker is unable to block either the resolver's
   request or the authority's response, while a disruptive attacker can
   halt either the legitimate request or the legitimate response.
   Disruptions can take the shape of a DOS attack (if the attacker is
   not on the packet path), taking advantage of a failure of the
   authoritative DNS server, mechanisms in the physical layer which
   enable sequelching a sender, or dropping packets if the attacker is a
   man-in-the-middle.

2.4.  Transaction or Cache Attacks

   A transaction attack targets the individual transaction requested by
   the end-client: the attacker needs to get his response immediately
   accepted by the victim application, while a cache attack requires
   that the resolver cache the attacker's result for future use.

   Transaction attacks are often less powerful than cache attacks: A
   transaction attack targets the single victim request, while cache
   attacks can have lasting effects until the TTL expires.  For blind
   attacks, as discussed in Section 4, blind transaction attacks are
   strictly less powerful than blind cache attacks.

2.5.  Direct Mapping or Ancillary Data Attacks

   A direct attack targets the immediate answer to the question asked by
   the resolver.  An ancillary data attack requires that the resolver
   accept some data, be it an NS RRSet, a CNAME, an A record, or other
   result, which is not the direct answer to the question.  A
   transaction attack must target the direct mapping, while cache
   attacks can target ancillary data.


3.  Race-Until-Win Blind Attacks

   The Kaminski attack is a blind, cache attack targeting ancilllatory
   data.  The additional power of the Kaminski attack is not a reduction
   in the number of packets required for a successful blind attack, but


Weaver                    Expires April 3, 2009                 [Page 4]

Internet-Draft     Comprehensive DNS Resolver Defenses    September 2008


   a reduction in time.  Rather than only running one race-condition
   attack every TTL, the Kaminski attack and variations all rely on
   ancillary information to achieve a "race-until-win" property: when
   the attack fails, it can be immediately retried without delay instead
   of waiting for the TTL to expire.

   If a blind attacker targets an actual record, that race can only be
   run once every TTL, as subsequent requests will simply hit in the
   cache.  Thus, creating any race-until-win attack on a single name
   requires targeting ancillary data.


4.  Requirements for Blind Transaction Attacks

   A blind transaction attack is strictly harder than a blind cache
   attack.  In a blind cache attack, the attacker only needs to coerce
   the victim into generating DNS requests.  A blind transaction attack
   not only requires coercing the victim into generating a DNS request,
   but also requires that the victim act upon the result of that
   request.

   Blind transaction attacks are also less powerful than blind cache
   attacks.  In order to target a name with a race-until-win attack, the
   attacker must be able to not only get the victim to generate a DNS
   request, and act on that request, but that request must be valid for
   any arbitrary subname within the target domain.  If the blind
   transaction attack is targeting a single name, it can never be run
   with a race-until-win property, as it must target the direct mapping
   and not ancillatory data.

   However, there does exist at least one significant blind transaction
   attack which can be conducted with a "race-until-win" property:
   targeting cookies contained in a web browser.  In this attack, the
   attacker coerces the victim into visiting a site of the attacker's
   choosing.  This site opens up an iFrame which points to
   1.www.target.com, which the attacker attempts to poison with a blind
   attack.  If it fails, the Javascript on the site creates a second
   iFrame, 2.www.target.com.  This process iterates until success,
   whereupon the victim's browser contacts the attacker's web site,
   presenting all relevant cookies.  Since each attempt uses a different
   name, the attacker can try continuously until successful.

   Although such cookie stealing is noteworthy, any site which allows
   cookies to be stolen in this manner is also trivially vulnerable to
   many attack tools such as Cookie Monster [perry], and any web service
   which resists such tools also resists this attack.  It is unclear
   whether the DNS infrastructure should be concerned with this
   particular attack.


Weaver                    Expires April 3, 2009                 [Page 5]

Internet-Draft     Comprehensive DNS Resolver Defenses    September 2008


   It is also unclear what other blind transaction attacks are possible
   with this "race-until-win" property, but it relies on the end-
   application trusting arbitrary subnames and subdomains for attacker-
   triggered requests.  Without this property, the attacker cannot
   perform a "race-until-win" blind transaction attack.  Thus, blind
   transaction attacks, although they need to be considered, are a far
   narrower threat than blind cache attacks.


5.  General Evaluation Strategy

   We evaluate multiple defenses in this document using network traces
   analyzed by "Bro", a network analysis environment primarily used for
   intrusion detection.  Bro includes a detailed analyzer for DNS
   behavior, coupled to a scriptable analysis language.

   The primary evaluation was conducted on traces of the ICSI border,
   capturing 18,329,249 UDP external DNS requests generated by ICSI's
   resolvers over a period of 19 days.  This initial evaluation ignored
   PTR lookups.


6.  Directly Detecting Non-Disruptive Attacks

   Any non-disruptive attack will, with high probability, have the
   resolver receive two separate and valid responses: one from the
   attacker and one from the authoritative server.  This will occur on
   blind attacks where the attacker does not suppress the correct
   operation of the authoritative resolver, such as by flooding it to
   achieve a denial-of-service.  It will also occur if an attacker is
   acting as a packet injector on a broadcast media if the attacker is
   not able to squelch the sender.

   The presence of a changed duplicate response, where the second
   response is different from the first response within a short period
   of time can thus be used to directly detect that a non-disruptive
   attack has occurred on a transaction, and mitigate any poisoning of
   the cache.

   The algorithm is simple: the resolver maintains a list of all
   answered responses for a timeout which should be on the order of one
   or two seconds.  If a second response for a request arrives whose
   transaction and other entropy matches the accepted response but whose
   value is different, this should be treated as an attack, and all
   cache entries set or dependent on the first result should be voided,
   mitigating the attack's effect on the cache.  Thus a transaction
   attack is detected but not mitigated (as the final victim may have
   acted on the result), but a cache attack is both detected and


Weaver                    Expires April 3, 2009                 [Page 6]

Internet-Draft     Comprehensive DNS Resolver Defenses    September 2008


   mitigated

   Furthermore, if the resolver is a recursive resolver, it should also
   forward the second response to the originator with the TTL changed to
   0 seconds, which allows the initial questioner, if running the same
   algorithm, to also know that the result was compromised, enabling the
   end system to detect the attack on the transaction and mitigate any
   effects on a local cache.

   We developed a Bro analysis script to detect this condition to
   evaluate the false positive rate.  We observed only one such site at
   ICSI, a realtime DNS blackhole service.  A run at Ohio State did
   found a few other sites, which we haven't yet assessed.

   As importantly, if resolvers implemented this approach, they would
   fail to cache only a handful of authorities which present this
   anomaly on normal results.  Until the stub resolvers are updated to
   treat this condition as an attack, these sites would still resolve
   successfully but would not be cached.

   The state-holding requirements, although nontrivial, are also
   reasonable.  Consider a recursive resolver (or a cluster node in a
   clustered resolver implementation) that generates 10,000 outstanding
   external queries per second.  If the timeout is 1 second, and the
   state-holding required involves 1 KB of data, the additional memory
   requirements would only be 10 MB of data for such a resolver.  Even
   with a longer timeout or a more active resolver, the state-holding
   requirements should be reasonable.

   Furthermore, although the replies themselves need to be maintained,
   for the most part this information is already retained in the cache.
   For any data element that is cached, only a pointer to that element
   needs to be stored to perform this consistency check, rather than the
   full answer.

   This should also not affect any resolver composed of a cluster of
   systems as long as the load balancer is deterministic: always sending
   the second response back to the cluster node which received the first
   response.  This enables each cluster node to maintain its own list of
   replies to check for changed responses.

   This policy would have no effect on the latency of resolvers.  Until
   end-user stub resolvers and applications are updated to treat this as
   an attack, even sites with anomalous DNS authorities will still
   resolve properly, but may experience slightly higher load due to lack
   of cacheability.


Weaver                    Expires April 3, 2009                 [Page 7]

Internet-Draft     Comprehensive DNS Resolver Defenses    September 2008


7.  Entropy Budgeting

   An important question is "how much entropy is necessary" to prevent
   blind attacks.  It's obvious that 16 bits (16b) of entropy in a query
   is insufficient, both to protect transactions and to protect the
   cache. 32b of entropy may in practice currently suffice to mitigate
   most attacks today, as attackers will prefer victims that do not
   implement proper randomization.

   Thus the question, assuming all resolvers implement increased entropy
   defenses, what is sufficient query entropy to protect the cache?  We
   argue that 40b of entropy is more than sufficient.  As the
   probability of attack success with K tries and N bits of entropy is
   1-(1-1/2^N)^K, 40b of entropy requires nearly a terapacket of attacks
   to be successful with reasonable probability.

   40b of entropy is easily obtainable for resolvers not behind entropy-
   stripping devices such as NATs (Section 8).  With 16b of entropy for
   the transaction ID, over 15b of entropy for source-port
   randomization, and names of at least 9 characters using 0x20
   randomization (observing that most DNS authoritative servers maintain
   capitalization, thus random capitolization can provide a nonce
   value)[0x20] achieves the necessary 40b threshold without resorting
   to duplication in most cases (Section 9).

   It is possible for a resolver to use a simple count of failures,
   responses which match requests and appear to come from the proper
   authority but have different entropy values, to know that it is under
   attack and respond with duplication Section 9 while the attack is
   ongoing, well before an attacker could have an effect on the cache
   with large entropy.  In our observations on ICSI traces we noticed
   that some authoritative servers do this naturally, but the rate is
   low.

   If the resolver responded to 1k PPS of attacks with duplication, and
   the entropy budget is 40b, an attacker attempting to go below the
   threshold by sending 990 attacks per second would need over 2 hours
   to have even a .001% chance of success, or over 80 days of a
   continual attack to have just a 1% chance.  Yet by requiring the
   attacker to send 1k packets-per-second to trigger duplication in the
   resolver, duplication which only needs to be in place while under
   attack, this shouldn't prove to be an effective DOS.

   However, the author initially advocated clearing the cache at a
   threshold of attack.  This was an error, as voiding the cache does
   not provide a benefit as the attacker knows when his attack is
   successful, and could accept the voided cache and just keep trying
   until successful.


Weaver                    Expires April 3, 2009                 [Page 8]

Internet-Draft     Comprehensive DNS Resolver Defenses    September 2008


8.  Entropy Stripping

   Many network devices, such as NATs, firewalls, and mandatory proxies,
   can exist between a resolver and the remote authority it is querying.
   Some of these devices will rewrite important information, such as IP
   addresses, UDP port numbers, or even DNS transaction IDs, fields that
   the resolver relies on as sources of query entropy.  These devices
   are likely to be located near one or the other endpoint system.

   If a DNS resolver is attempting to increase its request entropy by
   using one or more of these sources, the resolver must know if these
   entropy sources are being stripped from its network communication.

   The simple solution is to query a specialized authority which returns
   the entropy value it receives as an A record.  An example of such an
   authority has been temporarily operating at
   {port,id,server}.nettest.icir.org, where the transaction ID and port
   are returned as the least significant two bytes of the address, while
   server returns the IP address of the contacting DNS resolver.  An
   alternate version, entropy.nettest.icir.org, returns a CNAME to a
   human readable form of all three values, while respecting the
   incoming capitalization to verify 0x20 operation.

   When a DNS resolver starts, and when a resolver notices that its IP
   address has changed, it should query such an authority, provided by
   either the resolver's software developer or a third party, to
   determine if it is located behind a NAT or other entropy-stripping
   device.  If the resolver is behind a NAT, it must then use
   duplication (Section 9) to protect the cache if the remaining entropy
   is not sufficient to meet the security goals (Section 7).

   A case where this does not provide protection is if the authority and
   the attacker, not the resolver, is behind an entropy-stripping
   network device.  In such a case, an attacker capable of forging
   packets from within the authority's network is likely able to perform
   other activities far more damaging than a blind attack on DNS
   requests from that authority.  Also, the entropy stripping is only
   affecting queries to this authority, not all queries performed by a
   resolver.  This also assumes than any entropy-stripper is not
   malicious and would therefore not benefit from actively whitelisting
   the test.


9.  On Duplication for Entropy Increase

   If 40b of entropy are available on the request and the resolver is
   not under a significant attack, duplication is not necessary.  Thus
   duplication should be viewed as a fallback position for resolvers


Weaver                    Expires April 3, 2009                 [Page 9]

Internet-Draft     Comprehensive DNS Resolver Defenses    September 2008


   which are behind a NAT or other entropy stripping device, accessing
   authoritative servers which don't respond to 0x20 (per Dagon et. al's
   proposal [0x20], or is known to be under an active blind attack.

   Most authorities are deterministic to multiple queries: if two
   requests for the same name are received in a short period of time,
   the returned values will be identical.  By issuing two identical
   requests with different entropy values, this nearly doubles the
   entropy if we require that both responses are identical before
   acceptance.  Specifically, if the request has K bits of entropy, two
   requests which are accepted if the responses are identical has 2K-1
   bits of entropy.

   Dagon et. al.'s 0x20 proposal [0x20] uses this to increase query
   entropy when capitalization is not preserved by an authority, as
   without 0x20 the available entropy is only around 32b per query.
   Similarly, duplication can be used in any case where there is
   insufficient entropy, such as the impact of an entropy-stripping
   network device, or if the resolver knows it is currently under a
   blind attack.

   Unfortunately, not all authorities are deterministic in this manner,
   including some critical authoritative servers belonging to Content
   Distrbution Networks such as Akamai[akamai] and others that use
   frequently changing DNS responses for load-balancing.  If two
   distinct responses are received, and the resolver randomly selects
   one, this reduces by 1 bit the entropy of the request when compared
   with no duplication.  Thus, for purposes of the cache, it is clear
   that a resolver must not cache a response when the two responses are
   different.

   However, it appears reasonable to return one of the values as the
   answer to the transaction, as blind transaction attacks are less
   powerful than blind cache attacks (Section 4).  For a blind
   transaction attack to work, the attacker must target a domain served
   by such a volatile authority as well as coerce the victim to act on
   this.  Although this does leave a small window of vulnerability open,
   it is proabably preferable to the alternative of not resolving thees
   names.

   By setting the TTL to 0 and returning a randomly selected response,
   this enables compatibility with nondeterministic authorities without
   compromising the integrity of either the resolver's cache or the
   final client's cache.  Since it also only reduces the transactional
   entropy by 1b, it does not make transactions significantly less
   secure than without duplication.

   The alternate approach, iterate until convergence as proposed by


Weaver                    Expires April 3, 2009                [Page 10]

Internet-Draft     Comprehensive DNS Resolver Defenses    September 2008


   Barwood [barwood], will succeed with very high probability if the
   remote resolver returns single answers, but fails when the authority
   returns RRSets which contain more than one element where the RRSets
   are randomly chosen, if a complete match on the RRSet is required.
   If "return one of N of the RRSets" is employed it works well.


10.  Cache Policy, Scoping, and Ancillary Data Attacks

   RFC 2181 section 5.4.1 [RFC2181] is the current specification for how
   to accept ancillatory data.  It allows ancillatory data, if "in
   bailywick" (from a server which should be an authority for this
   name), to be cached for subsequent use, although it should not be
   returned as an answer.  It is this caching of ancillatory data that
   enables the "race-until-win" Kaminski blind cache attack.  Likewise,
   before bailywick-checking was deployed, ancillatory data was used for
   classic glue poisoning.

   The ancillatory data includes all the data beyond the direct answer
   to the query (including the NS RRSet, A records associated with the
   NS RRSet, A records associated with a CNAME for the direct answer, or
   any other data).  This data serves four purposes:

   o  The NS RRSet and associated A records needed to resolve the
      current request.

   o  Items, such as an A record for a CNAME alias, which if accepted
      will speed the current request's processing by removing the need
      to fetch additional records.

   o  Items which if placed in the cache will speed subsequent lookups.

   o  An indication that an item in the cache is now obsolete.

   This suggests that items have a different role in two scopes,
   analogous to how programming languages view scope: the local scope of
   the current recursive query and the global scope of the cache.
   Within the local scope, for purposes of resolving the direct
   question, ancillary data often must be trusted: otherwise
   authoritative nameservers may not be reachable when they exist within
   the domain being queried or in other cases where domains host each-
   others nameservers.  Yet for the purposes of resolving a query, if an
   authority lies about the ancillatory data, it could just as easily
   lie about the direct answer, making this data no less trustworthy for
   processing this answer.

   Yet for purposes of inclusion into the global scope, or for returning
   as the response to a query, ancillary data must not be trusted.  It


Weaver                    Expires April 3, 2009                [Page 11]

Internet-Draft     Comprehensive DNS Resolver Defenses    September 2008


   is, by definition, unsolicited information and not an authoritative
   response, and lies at the heart of the Kaminski attack.  If a
   recursive resolver never accepts ancillary data into the cache, it
   becomes impossible to target a single name with a race-until-win
   blind attack.

   However, a resolver may safely perform an independent fetch for any
   piece of ancillary data.  This second fetch must not reuse the local
   scope of the previous fetch, but instead be fetched using a new local
   context.  This enables both the use of ancillary data in responses
   (such as A records involved in CNAMES) and to speed subsequent
   responses (such as obtaining the NS RRSet for subsequent lookups).

   As an example, suppose the query for 'www.example.com' returns
   'www.example.com CNAME server.example.com, example.com NS
   ns.example.com, ns.example.com A 12.34.56.78, server.example.com A
   12.34.56.90'.  The resolver can succesfully cache and return
   'www.example.com CNAME server.example.com', but must perform
   independent queries to fetch the NS RRSet and the A records for
   ns.example.com and server.example.com, and it can't return
   'www.example.com CNAME server.example.com, server.example.com A
   12.34.56.90' to the final client until the lookup for
   server.example.com completes.

   As a result, all information which is either placed in the global
   scope or returned to the final client will be validated directly by
   querying the proper authoritative server.  There are no race-until-
   win attacks possible for including a name into the cache, as
   inserting a name only would take place if it doesn't exist in the
   cache which limits any attempt to once every TTL.

   This is more restrictive than the policy in RFC 2181 section 5.4.1
   [RFC2181].  The RFC policy allows ancillary data to be accepted into
   the global scope (cache) for purposes of subsequent query processing,
   which enables race-until-win attacks.

   Although the RFC states that the final client should also not accept
   these authority or additional records, it is unclear whether the stub
   resolver follow the RFC.  Thus a well structured recursive resolver
   should return results which are safe even if the client does cache
   additional records in violation of the RFC.

   The simplest mechanism for validation is to perform an explicit
   fetch, a policy being implemented in Unbound [unbound]: For CNAMEs
   and A records that are not the direct answer to the query, such
   records are not accepted directly but are instead fetched
   independently.  For the NS record and associated A records, the
   separate global and local scope is approximated by validating the NS


Weaver                    Expires April 3, 2009                [Page 12]

Internet-Draft     Comprehensive DNS Resolver Defenses    September 2008


   RRSet and all elements within it using subsequent separate fetches.

   There is some minor uncertanty if this is a close-enough
   approximation for the NS RRSet.  There exists a short-term time
   window where the NS RRSet and A records would be in the cache but
   unvalidated, which may provide a narrow opportunity for abuse.
   However, this policy does not require new data structures to maintain
   the local scope of a recursive query, and the abuse window is
   probably sufficiently small.

   Note that if an explicit local scope is maintained, such a policy of
   fetching all ancillary data rather than including it subsumes
   bailiwick checking for accepting data into the cache, as no data is
   included into the cache for any use that was not explicitly requested
   from an authoritative server.

10.1.  Preliminary Estimates of Performance Impact

   Such a policy, although protecting from race-until-win conditions,
   does impose a higher load on the DNS infrastructure.  For queries
   whose direct answer is not a CNAME, it does not add any latency to
   query processing if the resolver returns an empty authoritative NS
   RRset when the NS RRset has yet to be validated, but the number of
   outstanding queries is significantly increased.

   We developed a Bro analysis script to estimate the impact of this
   policy.  This policy accepted a trace of DNS requests and replies and
   estimated the number of additional queries that would be generated to
   follow CNAME chains, to cache the NS RRSet and associated A records,
   and to cache any A records not associated with either a CNAME chain
   or the NS RRSet.  It creates a model of what the resolver's own cache
   looks like and uses this to estimate the load from additional
   requests.

   For ICSI's 18,329,249 queries, this simple policy would require an
   additional 16% additional fetches for NS records, an additional 10%
   of fetches for A records associated with the NS RRSets, .7%
   additional fetches of A records not associated with NS RRSets and a
   trivial number of CNAMEs.  Thus this policy increases the number of
   requests by 27%, a nontrivial but still modest overhead.  Later on,
   we discuss some policies that can significantly reduce the number of
   fetches while maintaining safety.

   Only a few queries would have their latency increased if the resolver
   does not return the NS RRset to the client for the first query
   because it is still unvalidated.  Thus the only cases where a query
   would see increased latency is when the answer is a CNAME and the
   record pointed to by this CNAME is in the record.  Only .2% of ICSI's


Weaver                    Expires April 3, 2009                [Page 13]

Internet-Draft     Comprehensive DNS Resolver Defenses    September 2008


   queries showed this behavior.

   If the resolver does include the NS RRSet it needs to wait for it to
   be validated in case the final client caches the names for other
   purposes.  This would increase the latency on 16% of the queries
   which require accessing an external authority to obtain the NS RRSet.

10.2.  Optimization: Only One A Record for the NS RRSet

   It is obviously excessive to fetch all A records for the NS RRSet:
   unless there is a failure, only one nameserver needs to be contacted
   to return future results, although that resolver might be sub-optimal
   from a latency viewpoint.  If only one A record is fetched for the NS
   RRSet rather than all A records, this reduces from 10% to 6% the
   number of additional records fetched for the purpose of providing a
   valid name associated with each NS RRset.

   However, this may increase the latency of subsequent responses if the
   chosen authoritative server is not the closest.  This latency could
   be mitigated by performing additional fetches of alternate
   nameservers as requests for a domain continue to be made.  One
   suggestion is to use the "closest name", to ranodmly select the name
   which most closely matches the domain.  Thus if "example.com NS
   ns.example.com, ns.example2.com, ns.example3.org", needed to be
   cached, ns.example.com would be fetched.  If this lookup failed,
   ns.example2.com would be fetched, followed by ns.example3.org:
   selecting from the most matching to least-matching name.  It is
   unclear what the latency penalty would be for this heuristic.

10.3.  Optimization: Object Scope

   There does exist an additional data scope: object local scope, that
   can act as an optimization.  For example, the A records associated
   with an NS RRSet cannot be accepted directly.  Yet since they were
   returned in association with a specific query for the NS RRSet, they
   can be trusted solely for purposes of evaluating the NS RRset.

   As an example, if the response for a query for the nameservers of
   example.com is 'example.com NS ns.example.com, ns.example.com A
   12.34.56.78', it is acceptable to use the A record solely for the
   purposes of a subsequent lookup of a value in example.com, but not
   for any other purposes.  This, in programming language terms,
   represents object-local scope: a data value that can only be used in
   the context of another data value.

   This arrises from a simple observation: if the response is corrupted,
   any value in the response could have been corrupted, including the
   legitimate answer.  Thus there is no risk in using the ancillatory


Weaver                    Expires April 3, 2009                [Page 14]

Internet-Draft     Comprehensive DNS Resolver Defenses    September 2008


   data in a context where the direct answer was trusted.  But
   ancillatory data can not be trusted outside the context where the
   direct answer is trusted, because this enables race-until-win
   attacks.

   There are two mechanisms where support for object-local scope can
   enhance resolver performance without compromising safety.  The first
   is to collapse CNAMES and return the result, per Ohta's suggestion
   (cite).  In this, an authority that returns "www.example.com CNAME
   a.example2.com, a.example2.com A 10.34.56.78" could be treated by the
   resolver (and returned to the final client) as "www.example.com A
   10.34.45.78".  This acts to eliminate the latency penalty for
   fetching CNAME chains.

   The second area where object-local scope provides significant savings
   is for the NS RRset.  When the NS RRset is requested, the A records
   can't be considered as authoritative in general, but MAY be
   considered as valid only for the use of the NS RRset for subsequent
   name lookups.  This completely eliminates the need to look up the A
   records associated with the NS RRSet, while still preventing auxilary
   data from poisoning the cache.  However, it does require changes to
   cache architectures to support this notion: the NS RRset records must
   include inernal A records which are not exported to the rest of the
   cache.

   The savings are significant: support for object-local scope allows
   the resolver to not fetch the A records for the NS entries, reducing
   the total overhead from 27% to 17% for ICSI's traffic

10.4.  Optimization: Lazily Fetching the NS RRSet

   There exists a latency/performance tradeoff in fetching the NS RRset.
   Observe that, for many sites, only a single name is used.  Thus,
   fetching (and caching) the NS RRset represents wasted effort.
   Instead of eagerly fetching the NS RRset, fetching the NS RRset and
   any associated A records can be delayed until a second query is
   generated.

   The traffic savings are substantial.  In the ICSI traces, instead of
   requiring 16% additional fetches of the NS RRset, only 4% additional
   fetches would have been required, with a similar reduction in the
   number of A records which need to be fetched.  As a consequence,
   however, the lookup of the second name will be delayed as the NS
   RRset needs to be fetched first.  Yet the penalty is fairly modest,
   as only the second (and not subsequent) fetches would incur this
   latency penalty.  Thus only 4% of the queries would see this latency
   penalty.


Weaver                    Expires April 3, 2009                [Page 15]

Internet-Draft     Comprehensive DNS Resolver Defenses    September 2008


10.5.  Accepting Requests for Change

   The final use of ancillary records is to indicate a change request: a
   statement that a previously cached value is no longer valid despite
   still being within the TTL of the item.  The use of ancillatory data
   to indicate changes is somewhat out of the protocol specification,
   but is often considered essential behavior.

   It is clear that a resolver must not overwrite an item in the cache
   with an ancillary item for any reason, as otherwise ancillary data
   can be used for a race-until-win attack using data replacement
   instead of data inclusion.

   It is also clear that a resolver must not void a cache entry upon a
   change request for an ancillary item when the response is not in
   bailiwick.  Otherwise, an attacker who controls an arbitrary
   authority could construct a race-until-win attack by alternating
   between attacking the name and performing an unrelated query which
   uses the attacker's authority uses to void the name.

   However, a resolver may (and probably should) safely respond to an
   in-bailiwick request for change by voiding the cache entry associated
   with the ancillary item.  A blind attacker cannot use this behavior
   to create a race-until-win condition, as the attacker would have to
   win a race against an arbitrary name in the same domain to void the
   cache entry and then win the next subsequent race to set the cache
   entry to the attacker's desired value.  As winning two back-to-back
   races is exponentially harder than winning a single race, the policy
   is safe from attack while still enabling ancillary data to act as a
   notification of change.


11.  Conclusions

   There are many defenses which can be layered to provide robust
   defenses for recursive DNS resolvers.

   By looking for duplicate responses to the same transaction which have
   different value, all non-disruptive attacks can be directly detected,
   and their effects on the cache mitigated.  By forwarding the second
   response, the final victim can also be notified of the attack.  This
   requires ~10MB of state for a resolver performing 10,000 external
   queries a second.

   By setting an entropy budget of 40b, blind attacks are infeasible,
   requiring terapackets to have a high probability.  By querying
   special authoritative DNS servers, a resolver can detect any process
   which reduces this entropy, and can use duplicate requests to restore


Weaver                    Expires April 3, 2009                [Page 16]

Internet-Draft     Comprehensive DNS Resolver Defenses    September 2008


   entropy.

   For the few sites where duplication produces different results, it is
   probably safe to return a randomly selected result.  Although this
   still enables transactional cache attacks, it is probably better to
   accept this very narrow window of vulnerability to enable resolution
   of key sites.

   Finally, cache policy can eliminate 'race-until-win' attacks while
   subsuming most bailywick checks.  By only accepting and returning the
   direct answer to requests, an attacker can no longer conduct any
   race-until-win attacks targeting specific names.  The overhead is
   reasonable as even without optimizations, a study of our resolvers
   showed a 26% increase in requestes before any optimizations are
   applied.

   These changes all involve only resolver behavior, and they also
   combine to provide better protection than any one defense alone.
   Increasing entropy increases the blind attacker's work in packets,
   while eliminating race-until-win increases the work in time.  With a
   sufficient entropy budget, a resolver can detect that it is under
   attack and act according.  While directly detecting non-disruptive
   attacks can detect both packet injectors and many blind injectors.


12.  Acknowledgements

   This work is sponsored in part by NSF Grants ITR/ANI-0205519 and NSF-
   0433702.  All opinions are those of the author, not the funding
   institution.

   Feedback from Vern Paxson, Robin Sommer, Christian Kreibich, Paul
   Vixie (who also suggested the "closest name" heuristic), Seth Hall,
   Wooter Wijngaards, Dan Kaminski, and David Dagon


13.  IANA Considerations

   None


14.  Security Considerations

   This text is focused on security concerns for DNS resolvers.  The
   security aspects of each defense are discussed as part of each
   section.


Weaver                    Expires April 3, 2009                [Page 17]

Internet-Draft     Comprehensive DNS Resolver Defenses    September 2008


15.  References

15.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2181]  Elz, R. and R. Bush, "Clarifications to the DNS
              Specification", RFC 2181, July 1997.

15.2.  Informative References

   [0x20]     Vixie, P. and D. Dagon, "Use of Bit 0x20 in DNS Labels to
              Improve Transaction Identity", March 2008,
              <http://tools.ietf.org/html/
              draft-vixie-dnsext-dns0x20-00>.

   [akamai]   Akamai Inc, "The Akamai CDN", 2008,
              <http://www.akamai.com>.

   [barwood]  Barwood, G., "The Birthday Defense (IETF NameDroppers
              Mailing List)", September 2008, <http://www.ops.ietf.org/
              lists/namedroppers/namedroppers.2008/msg01647.html>.

   [kaminski]
              US-CERT, "Multiple DNS implementations vulnerable to cache
              poisoning", July 2008,
              <http://www.kb.cert.org/vuls/id/800113>.

   [perry]    Perry, M., "Fully Automated Active HTTPS Cookie
              Hijacking", August 2008, <http://fscked.org/blog/
              fully-automated-active-https-cookie-hijacking>.

   [unbound]  Wijngaards, W., "Resolver Side Mitigations", August 2008,
              <http://tools.ietf.org/html/
              draft-wijngaards-dnsext-resolver-side-mitigation-00>.


Weaver                    Expires April 3, 2009                [Page 18]

Internet-Draft     Comprehensive DNS Resolver Defenses    September 2008


Author's Address

   Nicholas Weaver
   International Computer Science Institute
   1947 Center Street suite 600
   Berkeley, CA  94704
   USA

   Phone: +1 510 666 2903
   Email: nweaver@icsi.berkeley.edu


Weaver                    Expires April 3, 2009                [Page 19]

Internet-Draft     Comprehensive DNS Resolver Defenses    September 2008


Full Copyright Statement

   Copyright (C) The IETF Trust (2008).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.


Intellectual Property

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.


Weaver                    Expires April 3, 2009                [Page 20]