idnits 2.17.1

draft-barwood-dnsext-fr-resolver-mitigations-08.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate. You should update this
     to the boilerplate described in the IETF Trust License Policy document
     (see https://trustee.ietf.org/license-info), which is required now.

  -- Found old boilerplate from RFC 3978, Section 5.1 on line 15.

  -- Found old boilerplate from RFC 3978, Section 5.5, updated by RFC 4748 on
     line 439.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 450.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 457.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 463.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** There are 5 instances of too long lines in the document, the longest one
     being 1 character in excess of 72.

  == There are 2 instances of lines with non-RFC2606-compliant FQDNs in the
     document.

  == There are 2 instances of lines with non-RFC6890-compliant IPv4 addresses
     in the document. If these are example addresses, they should be changed.

  ** The document seems to lack a both a reference to RFC 2119 and the
     recommended RFC 2119 boilerplate, even if it appears to use RFC 2119
     keywords.

     RFC 2119 keyword, line 133: '...eneral purpose resolver MUST implement...'

     RFC 2119 keyword, line 224: '...purpose resolver MUST not rely on port...'

     RFC 2119 keyword, line 229: '... MUST use the same source port when ...'
  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust Copyright Line does not match the
     current year

  == Line 275 has weird spacing: '...ecision is th...'

  == Line 330 has weird spacing: '...ults be prope...'

  == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD',
     or 'RECOMMENDED' is not an accepted usage according to RFC 2119. Please
     use uppercase 'NOT' together with RFC 2119 keywords (if that is what you
     mean).

     Found 'MUST not' in this paragraph:

     Unfortunately it is impractical for a program to reliably determine
     whether a resolver is currently situated behind a NAT device that may
     undo port randomization ( and this can change for each packet sent ), so
     a general purpose resolver MUST not rely on port randomization for
     security.

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008. If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment. If not, you may need to add the pre-RFC5378 disclaimer.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (October 26, 2008) is 5661 days in the past. Is this
     intentional?

  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Unused Reference: 'RFC2181' is defined on line 410, but no explicit
     reference was found in the text

     Summary: 3 errors (**), 0 flaws (~~), 7 warnings (==), 7 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

2 DNS Extensions Working Group G.
Barwood
3 Internet-Draft
4 Intended status: Informational October 26, 2008
5 Expires: April 2009

7 Resolver side mitigations
8 draft-barwood-dnsext-fr-resolver-mitigations-08

10 Status of This Memo

12 By submitting this Internet-Draft, each author represents that any
13 applicable patent or other IPR claims of which he or she is aware
14 have been or will be disclosed, and any of which he or she becomes
15 aware will be disclosed, in accordance with Section 6 of BCP 79.

17 Internet-Drafts are working documents of the Internet Engineering
18 Task Force (IETF), its areas, and its working groups. Note that
19 other groups may also distribute working documents as Internet-
20 Drafts.

22 Internet-Drafts are draft documents valid for a maximum of six months
23 and may be updated, replaced, or obsoleted by other documents at any
24 time. It is inappropriate to use Internet-Drafts as reference
25 material or to cite them other than as "work in progress."

27 The list of current Internet-Drafts can be accessed at
28 http://www.ietf.org/ietf/1id-abstracts.txt

30 The list of Internet-Draft Shadow Directories can be accessed at
31 http://www.ietf.org/shadow.html.

33 This Internet-Draft will expire in March 2009.

35 Abstract

37 Describes mitigations against spoofing attacks on DNS, including:

39 (1) Repeating the query, including techniques for handling
40     non-deterministic responses.

42 (2) Prepending a random nonce to the question where a referral is
43     probable.

45 (3) Estimating the entropy available, taking into account
46     (a) Observed packets with incorrect IDs.
47     (b) The content of the cache.

49 Table of Contents

51 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3

53 2. Criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

55 3. Mitigations . . . . . . . . . . . . . . . . . . . . . . . . . 4
56 3.1. Query repetition . . . . . . . . . . . . . . . . . . . . 4
57 3.2. Randomize the case of the question (0x20). . . . . . . . . 5
58 3.3.
Use a randomly chosen source port . . . . . . . . . . . . 6
59 3.4. Prepend a random nonce label to the question. . . . . . . 6
60 3.5. Maintain a count of observed Bad IDs . . . . . . . . . . . 7
61 3.6. Use of calculated entropy . . . . . . . . . . . . . . . . 7

63 4. Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
64 4.1. Query repetition . . . . . . . . . . . . . . . . . . . . . 8
65 4.2. Impact on Root and TLD . . . . . . . . . . . . . . . . . . 8
66 4.3. Impact on other levels . . . . . . . . . . . . . . . . . . 9
67 4.4. Lame servers and the random nonce. . . . . . . . . . . . . 9
68 4.5. Security level . . . . . . . . . . . . . . . . . . . . . . 9

70 5. Security Considerations . . . . . . . . . . . . . . . . . . . 10

72 6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 10

74 7. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 10

76 8. Informative References . . . . . . . . . . . . . . . . . . . . 10

78 1. Introduction

80 This document describes mitigations that a resolver can currently
81 deploy to resist spoofing attacks on DNS, without server software
82 being updated.

84 The context in which these solutions were explored is CERT
85 Vulnerability Note VU#800113, "Multiple DNS implementations
86 vulnerable to cache poisoning".

88 The Kaminsky attack proceeds by asking a recursive DNS server
89 a series of questions, each with a different random prefix,
90 and then sending spoof packets to the server, containing
91 additional records with genuine owner names but invalid data.
92 For example:

94 Query:
95    Question .com A

97 Spoof response:
98    Question .com A
99    Authority: com NS ns.evil.com

101 The effect is to inject an invalid record into the cache.

103 Since the ID field in the DNS packet header is only 16 bits, a
104 DNS server that does not deploy any mitigations can be
105 compromised in a matter of seconds.
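
The arithmetic behind "a matter of seconds" can be sketched as follows.
This is a rough illustration and not part of the draft's own analysis: it
assumes that every spoof packet is an independent, uniformly random guess
at the 16-bit ID, that the attacker already knows the source port, and
that a race window is always open. The function names and the rate of
10,000 spoof packets per second are hypothetical, chosen only for
illustration.

```python
import math

ID_BITS = 16  # width of the ID field in the DNS packet header


def spoofs_for_success(probability, entropy_bits=ID_BITS):
    """Spoof packets needed to reach the given cumulative success
    probability, assuming each packet is an independent uniform guess."""
    p_single = 2.0 ** -entropy_bits
    # Solve 1 - (1 - p_single)**n >= probability for n.
    # log1p stays accurate when p_single is very small.
    return math.ceil(math.log1p(-probability) / math.log1p(-p_single))


def seconds_to_success(probability, spoofs_per_second, entropy_bits=ID_BITS):
    """Time needed at a constant spoofing rate (hypothetical parameter)."""
    return spoofs_for_success(probability, entropy_bits) / spoofs_per_second


if __name__ == "__main__":
    n = spoofs_for_success(0.5)
    print("spoof packets for a 50% chance:", n)
    print("seconds at 10,000 packets/s:", seconds_to_success(0.5, 10_000))
```

Under these assumptions roughly 45,000 spoof packets give an even chance
of success, which is a few seconds of traffic at the assumed rate. Each
additional bit of entropy doubles the number of packets required.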
107 [ An implementation of the techniques described can be accessed at
108 http://www.george-barwood.pwp.blueyonder.co.uk/DnsServer/ ]

110 2. Criteria

112 These are resolver side solutions, thus only the resolver needs to be
113 redeployed, or the software updated. This allows updated resolvers
114 to be deployed immediately.

116 The solutions have to follow the DNS protocol.

118 The solutions have to be practical, non-disruptive, and not
119 anti-social.

121 3. Mitigations

123 Below, the resolver side mitigations are described.

125 Query repetition (3.1) is necessary and sufficient; the other
126 mitigations reduce the number of queries needed for good security.

128 3.1. Query repetition

130 By repeating the query, additional entropy may be obtained.

132 Repetition is the only method of obtaining suitable entropy under
133 all conditions, so a general purpose resolver MUST implement
134 repetition.

136 A practical problem occurs when responses are non-deterministic,
137 that is, many different responses are obtained for the same question.

139 In this case, the resolver will need to perform an analysis to
140 produce a converged result, or to report server failure (or a
141 security warning, if this is possible) if convergence has not
142 been achieved after some iteration limit.

144 The suggested method is to accumulate entropy for various attributes
145 of the response, specifically non-zero Rcodes (including an internal
146 representation of no Data ), the Resource Records (RRs), and the
147 cardinality of each Resource Record Set (RRset).

149 Each response can have a counter that represents the number of
150 attributes that have not reached the required threshold. When the
151 counter reaches zero, that response is considered fully checked,
152 and is used as the converged result.

154 For example, suppose the question is MX records for example.com.
156 First response:
157    example.com MX mail1.example.com
158    example.com MX mail2.example.com

160 Second response:
161    example.com MX mail2.example.com ( mail2.example.com confirmed )
162    example.com MX mail3.example.com

164 Also confirmed: example.com MX has 2 alternatives.

166 Third response:
167    example.com MX mail3.example.com ( mail3.example.com confirmed )
168    example.com MX mail4.example.com

170 The result is the second response.

172 Note that it is possible for an attacker to break RRset integrity
173 with a single forged response in the non-deterministic case.
174 For example, the second response in the example could be forged.
175 However, this appears to be a very weak achievement.

177 Where convergence is very slow, some records may be omitted from the
178 convergence test, and discarded ( if not acceptable as described
179 in section 3.6 ), to be fetched later as required.

181 The records that are always kept are:

183 (E1) Records where the owner name and type exactly match the question.
184 (E2) NS records where the query question ends with the owner name.

186 Other records may be discarded ( normally glue A records ).

188 For example, if the question is www.example.com A, then in a response

190    www.example.com A 1.2.3.4 : is always kept by (E1)

192    example.com NS ns.example.com : is always kept by (E2)

194    ns.example.com A 1.2.3.4 : may be discarded

196 There is a possibility that combinations of resource records may
197 result that would not occur normally. In the Akamai case, this could
198 in principle result in a loss of resilience: instead of 9 distinct
199 IP addresses for the name servers, some might be duplicated.

201 However, no examples have yet been identified where a significant
202 problem arises, and discarding records is only found to be necessary
203 for the Akamai case, where full convergence might otherwise need about
204 100 queries.
Stopping after about 10 queries typically results in one
205 or two glue A records being discarded, and 9 NS records and the
206 remaining 7 glue records being accepted.

208 In other cases, convergence generally occurs after at most 3 or 4
209 queries.

211 3.2. Randomize the case of the question (0x20)

213 Most authoritative servers preserve the case of the question in the
214 response, so some additional entropy may usually be obtained by
215 randomizing the case of the question.

217 3.3. Use a randomly chosen source port

219 This is a well-known method of obtaining extra entropy.

221 Unfortunately it is impractical for a program to reliably determine
222 whether a resolver is currently situated behind a NAT device that
223 may undo port randomization ( and this can change for each packet
224 sent ), so a general purpose resolver MUST not rely on port
225 randomization for security.

227 To avoid problems where authoritative servers may be behind firewalls
228 that enforce very low limits on incoming UDP connections, resolvers
229 MUST use the same source port when repeating a query ( 3.1 ).

231 3.4. Prepend a random nonce label to the question.

233 This may be used where a referral is probable.

235 It allows an amount of entropy to be encoded limited only by the 256
236 character limit on a question, provided the authority server returns
237 a copy of the question in the response.

239 If the response is not a referral*, the response should be discarded,
240 and the query repeated without the nonce.

242 * That is, any of the following is observed:
243    (a) The response is Authoritative ( AA bit is set in the header ).
244    (b) There is an error ( RCODE is not zero ).
245    (c) The answer section is not empty.
246    (d) The authority section is empty.

248 A simple heuristic for deciding where a referral is probable is:

250 (1) If the Bailiwick is Root or a TLD, and the question is not equal
251     to the Bailiwick, a referral is probable.
253 (2) Otherwise a referral is not probable.

255 3.5. Maintain a count of observed Bad IDs

257 The approximate number of incorrect IDs observed in some fixed
258 time period, for example the last 20 seconds, may be kept.

260 This value may be used to decide when to deploy mitigations, such
261 as extra query repetition, and allows a smooth response to attacks,
262 while maximising performance under normal conditions where no
263 attack is observed.

265 3.6. Use of calculated entropy

267 When a response is received, an entropy calculation may be performed
268 to estimate how many bits have been checked.

270 It will typically include 16 bits for the ID, 0x20 bits,
271 bits from the prepended nonce, and a discount for unusual /
272 non-standard features (such as IP mismatch, question not copied).

274 The entropy is accumulated for each response attribute, as described
275 in 3.1, and a decision is then made whether a value is
276 to be accepted as valid, which in turn affects whether the query needs
277 to be repeated as described in 3.1.

279 For example, the test for whether a value is valid could be

281    E + C > 50 + 2*K

283 where
284    E is the accumulated entropy
285    C is zero if the value is not in the cache, otherwise 30
286    K is the logarithm (base 2) of the Bad ID count (3.5)

288 Cache entries may be retained in the cache for some period ( say 1
289 day ) after their normal TTL expiry time, to reduce the number of
290 queries when the value needs to be refreshed after TTL expiry.

292 4. Analysis

294 This section is intended to be less formal, to give some insight
295 into the rationale for the recommendations given in section 3,
296 and to discuss possible adverse effects.

298 The intention is that these mitigations have minimal effects, other
299 than to make DNS spoof attacks impractical.

301 4.1. Query repetition
302 Query repetition should have no impact other than on server load.
303 Servers do not normally retain any state information about clients
304 after the query/response transaction completes.

306 4.2. Impact on Root and TLD servers

308 The random nonce (3.4) is valuable because it means that no
309 extra queries to Root and top level servers are needed in normal
310 operation. This is important because these servers constitute
311 the shared public base of the DNS, so the stability of these
312 servers is very important.

314 The exceptions are the initial root "priming" query and queries
315 for non-existent domains. For the root domain, by assuming
316 that every child domain has an SOA record, Name Errors need not
317 be retried ( by checking the owner name for the SOA record ).
318 While this assumption is currently correct (and is also observed
319 to be true for net and com domains), implementors need to carefully
320 weigh any performance advantage against the risk that the assumption
321 may not be valid in future.

323 Clients in general should implement user interfaces that make it
324 unlikely that users will enter invalid domain names, and that
325 report errors properly, so they can be corrected. However,
326 this is outside the scope of this document.

328 In practice, most root server queries emanate from mis-configured
329 software, so in any case the proportional effect on root servers will be
330 small. It is important that negative results be properly cached.

332 4.3. Impact on other levels

334 For the example test given in 3.6, two queries are usually
335 required the first time a record is fetched. However, when the
336 TTL expires, the refresh operation only requires a single query.

338 It is expected that such refresh operations dominate proper
339 DNS traffic, so the impact should be minimal.

341 Operators of authoritative servers have several options if
342 query repetition may cause overload.

344 (a) Increase unreasonably low TTLs.
345 (b) Use names with more alpha characters (to take advantage of 0x20).
346 (c) Implement support for the proposed AL record or equivalent.

348 The last option implies that agreeing on a specification for the proposed
349 AL record type (or EDNS Ping equivalent) would be useful.

351 4.4. Lame servers and the random nonce

353 In order to resolve domain names where servers are incorrectly
354 configured, it may be necessary to use a query without the nonce.

356 A current example is resolving the IP addresses for the name servers
357 for www.iahc.org, which are ns2.ar.com and ns3.ar.com.

359 The com nameservers generate a referral for the question
360 .ns2.ar.com, which leads only to lame name servers, but return the
361 IP address for a non-lame server when the nonce is omitted.

363 Thus, when lame servers are detected, special logic is needed to
364 allow name resolution to still occur.

366 Of course a resolver may choose to merely report failure in this
367 case; however, this may not be practical.

369 4.5. Security Level

371 The 50 bits suggested in 3.6 should provide a good margin of
372 safety. An attack sending one spoof packet every 20 seconds at a
373 particular target will take about 50 million years to succeed.

375 Taking Bad IDs into consideration (3.5) implies that an attacker gains
376 nothing from sending attacks at a faster rate.

378 As a test, the resolver was run with the security level set to 200 bits
379 with no perceptible decrease in performance ( the required number of
380 packets can be calculated in advance and sent in parallel, except in
381 the non-deterministic case ).

383 5. Security Considerations

385 All of the mitigations aim to provide more security. Query repetition
386 has an obvious adverse effect on performance and bandwidth.

388 Each query repetition provides an extra attack opportunity, so the
389 total entropy requirement may be adjusted to reflect this.

391 The random nonce may expose internal state to an attacker who
392 controls a name server.
It is essential that a cryptographically
393 strong source of random numbers be used to generate IDs, 0x20 bits
394 and prepended nonces. This must be seeded from data that cannot be
395 guessed by an attacker, such as thermal noise or other random
396 physical fluctuations.

398 6. IANA Considerations

400 No direct considerations.
401 Indirectly, the TYPE code for the AL record mentioned in 4.3.

403 7. Acknowledgments

405 Thanks to Nicholas Weaver (ICSI Berkeley) and Wouter Wijngaards (NLnet
406 Labs). The idea of prepending a nonce may be due to Paul Vixie (ISC).

408 8. Informative References

410 [RFC2181] Elz, R. and R. Bush, "Clarifications to the DNS
411           Specification", RFC 2181, July 1997.

413 Author's Address

415 George Barwood
416 33 Sandpiper Close
417 Gloucester
418 GL2 4LZ
419 United Kingdom

421 Phone: +44 452 722670
422 EMail: george.barwood@blueyonder.co.uk
423 Skype: george.barwood

425 Full Copyright Statement

427 Copyright (C) The IETF Trust (2008).

429 This document is subject to the rights, licenses and restrictions
430 contained in BCP 78, and except as set forth therein, the authors
431 retain all their rights.

433 This document and the information contained herein are provided on an
434 "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
435 OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
436 THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
437 OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
438 THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
439 WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
441 Intellectual Property

443 The IETF takes no position regarding the validity or scope of any
444 Intellectual Property Rights or other rights that might be claimed to
445 pertain to the implementation or use of the technology described in
446 this document or the extent to which any license under such rights
447 might or might not be available; nor does it represent that it has
448 made any independent effort to identify any such rights. Information
449 on the procedures with respect to rights in RFC documents can be
450 found in BCP 78 and BCP 79.

452 Copies of IPR disclosures made to the IETF Secretariat and any
453 assurances of licenses to be made available, or the result of an
454 attempt made to obtain a general license or permission for the use of
455 such proprietary rights by implementers or users of this
456 specification can be obtained from the IETF on-line IPR repository at
457 http://www.ietf.org/ipr.

459 The IETF invites any interested party to bring to its attention any
460 copyrights, patents or patent applications, or other proprietary
461 rights that may cover technology that may be required to implement
462 this standard. Please address the information to the IETF at
463 ietf-ipr@ietf.org.