Network Working Group                                         M. Andrews
Internet-Draft                                                       ISC
Expires: November 7, 2014                                    May 6, 2014


    A Common Operational Problem in DNS Servers - Failure To Respond.
               draft-andrews-dns-no-response-issue-03.txt

Abstract

   The DNS is a query / response protocol.
   Failure to respond to queries causes both immediate operational
   problems and long term problems with protocol development.

   This document identifies a number of common classes of queries that
   some servers fail to respond to.  This document also suggests
   procedures for TLD and other similar zone operators to apply to
   reduce / eliminate the problem.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on November 7, 2014.

Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Common query classes that result in non responses.
     2.1.  EDNS Queries - Version Independent
     2.2.  EDNS Queries - Version Specific
     2.3.  Unknown / Unsupported Type Queries
     2.4.  TCP Queries
   3.  Remediating
   4.  Firewalls and Load Balancers
   5.  Response Code Selection
   6.  Normative References
   Author's Address

1.  Introduction

   The DNS [RFC1034], [RFC1035] is a query / response protocol.  Failure
   to respond to queries causes both immediate operational problems and
   long term problems with protocol development.

   Failure to respond to a query is indistinguishable from packet loss
   without doing an analysis of query response patterns, and results in
   unnecessary additional queries being made by DNS clients and
   unnecessary delays being introduced to the resolution process.

   Due to the inability to distinguish between packet loss and
   nameservers dropping EDNS [RFC6891] queries, packet loss is sometimes
   misclassified as lack of EDNS support, which can lead to DNSSEC
   validation failures.

   Allowing servers which fail to respond to queries to remain in
   operation results in developers being afraid to deploy
   implementations of recent standards.  Such servers need to be
   identified and corrected / replaced.

   The DNS has response codes that cover almost any conceivable query
   response.  A nameserver should be able to respond to any conceivable
   query using them.

   Unless a nameserver is under attack, it should respond to all queries
   directed to it as a result of following delegations.  Additionally,
   code should not assume that there isn't a delegation to the server
   even if it is not configured to serve the zone.
   Broken delegations are a common occurrence in the DNS and receiving
   queries for zones that you are not configured for is not necessarily
   an indication that you are under attack.  Parent zone operators are
   supposed to regularly check that the delegating NS records are
   consistent with those of the delegated zone and to correct them when
   they are not [RFC1034].  If this were being done regularly the
   instances of broken delegations would be much lower.

   When a nameserver is under attack it may wish to drop packets.  A
   common attack is to use a nameserver as an amplifier by sending
   spoofed packets.  This is done because response packets are bigger
   than the queries and big amplification factors are available,
   especially if EDNS is supported.  Limiting the rate of responses is
   reasonable when this is occurring and the client should retry.  This
   however only works if legitimate clients are not being forced to
   guess whether EDNS queries are accepted or not.  While there is still
   a pool of servers that don't respond to EDNS requests, clients have
   no way to know if the lack of response is due to packet loss, EDNS
   packets not being supported or rate limiting due to the server being
   under attack.  Mis-classifications of server characteristics are
   unavoidable when rate limiting is done.

2.  Common query classes that result in non responses.

   There are three common query classes that result in non responses
   today.  These are EDNS queries, queries for unknown (unallocated) or
   unsupported types and TCP queries that are filtered.

2.1.  EDNS Queries - Version Independent

   Identifying servers that fail to respond to EDNS queries can be done
   by first identifying that the server responds to regular DNS queries
   then making a series of otherwise identical queries using EDNS,
   then making the original query again.
   A series of EDNS queries is needed as at least one DNS
   implementation responds to the first EDNS query with FORMERR but
   fails to respond to subsequent queries from the same address for a
   period until a regular DNS query is made.  The EDNS query should
   specify a UDP buffer size of 512 bytes to avoid false classification
   of not supporting EDNS due to response packet size.

   If the server responds to the first and last queries but fails to
   respond to most or all of the EDNS queries it is probably faulty.
   The test should be repeated a number of times to reduce the
   likelihood of a false positive due to packet loss.

   Firewalls may also block larger EDNS responses but there is no easy
   way to check authoritative servers to see if the firewall is
   misconfigured.

2.2.  EDNS Queries - Version Specific

   Some servers respond correctly to EDNS version 0 queries but fail to
   respond to EDNS queries with version numbers that are higher than
   zero.  Servers should respond with BADVERS to EDNS queries with
   version numbers that they do not support.

2.3.  Unknown / Unsupported Type Queries

   Identifying servers that fail to respond to unknown or unsupported
   types can be done by making an initial DNS query for an A record,
   making a number of queries for an unallocated type, then making a
   query for an A record again.  IANA maintains a registry of allocated
   types.

   If the server responds to the first and last queries but fails to
   respond to the queries for the unallocated type it is probably
   faulty.  The test should be repeated a number of times to reduce
   the likelihood of a false positive due to packet loss.

2.4.  TCP Queries

   All DNS servers are supposed to respond to queries over TCP
   [RFC5966].
   Firewalls that drop TCP connection attempts rather than
   resetting the connection attempt or sending an ICMP/ICMPv6
   administratively prohibited message introduce excessive delays to
   the resolution process.

   Whether a server accepts TCP connections can be tested by first
   checking that it responds to UDP queries to confirm that it is up and
   operating then attempting the same query over TCP.  An additional
   query should be made over UDP if the TCP connection attempt fails to
   confirm that the server under test is still operating.

3.  Remediating

   While the first step in remediating this problem is to get the
   offending nameserver code corrected, there is a very long tail
   problem with DNS servers in that it can often take over a decade
   between the code being corrected and a nameserver being upgraded with
   corrected code.  With that in mind it is requested that TLD, and
   other similar zone operators, take steps to identify and inform their
   customers, directly or indirectly through registrars, that they are
   running such servers and that the customers need to correct the
   problem.

   TLD operators should construct a list of servers child zones are
   delegated to along with a delegated zone name.  This name shall be
   the query name used to test the server as it is supposed to exist.

   For each server the TLD operator shall make an SOA query for the
   delegated zone name.  This should result in the SOA record being
   returned in the answer section.  If the SOA record is not returned
   but some other response is returned this is an indication of a bad
   delegation and the TLD operator should take whatever steps it
   normally takes to rectify a bad delegation.  If more than one zone is
   delegated to the server it should choose another zone until it finds
   a zone which responds correctly or it exhausts the list of zones
   delegated to the server.
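
   The SOA check described above can be sketched as follows.  This is
   a minimal illustration only, not part of the procedure itself: the
   function names (encode_name, build_soa_query, skip_name,
   classify_soa_reply, send_udp), the outcome strings, the 3 second
   timeout and the IPv4-only socket are all assumptions made for the
   example.  It looks only at the first record of the answer section,
   which a real scanner would probably want to refine.

```python
import socket
import struct

QTYPE_SOA = 6   # RR type code for SOA (RFC 1035)
CLASS_IN = 1


def encode_name(name):
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    wire = b""
    for label in name.rstrip(".").split("."):
        if label:
            raw = label.encode("ascii")
            wire += bytes([len(raw)]) + raw
    return wire + b"\x00"


def build_soa_query(zone, query_id=0x1234):
    """Build a plain (non-EDNS) SOA query for the delegated zone name."""
    header = struct.pack("!HHHHHH", query_id, 0, 1, 0, 0, 0)
    return header + encode_name(zone) + struct.pack("!HH", QTYPE_SOA, CLASS_IN)


def skip_name(msg, offset):
    """Return the offset just past a (possibly compressed) name."""
    while True:
        length = msg[offset]
        if length & 0xC0 == 0xC0:      # compression pointer, two bytes
            return offset + 2
        if length == 0:                # root label ends the name
            return offset + 1
        offset += 1 + length


def classify_soa_reply(query, reply):
    """Map a reply (or None on timeout) to one of the outcomes in the text."""
    if reply is None:
        return "no-response"           # try an A query or another zone next
    try:
        if reply[:2] != query[:2]:     # message ID must match the query
            return "malformed"
        flags, qdcount, ancount = struct.unpack("!HHH", reply[2:8])
        offset = 12
        for _ in range(qdcount):
            offset = skip_name(reply, offset) + 4   # skip QTYPE and QCLASS
        if flags & 0x000F == 0 and ancount > 0:
            offset = skip_name(reply, offset)
            (rtype,) = struct.unpack("!H", reply[offset:offset + 2])
            if rtype == QTYPE_SOA:
                return "soa-returned"
    except (IndexError, struct.error):
        return "malformed"
    return "bad-delegation"            # some other response came back


def send_udp(server, query, timeout=3.0):
    """Send one query over UDP (IPv4 only here); return the reply or None."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(query, (server, 53))
        try:
            return sock.recvfrom(4096)[0]
        except socket.timeout:
            return None
```

   A scan would call send_udp() with the query from build_soa_query()
   for each server and zone name pair and pass the result to
   classify_soa_reply().  A "no-response" outcome should be retried a
   number of times before it is treated as anything other than packet
   loss.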

   If the TLD operator fails to get a response to an SOA query it
   should make an A query as some nameservers fail to respond to SOA
   queries but respond to A queries.  If it gets no response to the A
   query another delegated zone should be queried for as some
   nameservers fail to respond to queries for zones they are not
   configured for.  If subsequent queries find a responding zone all
   delegations to this server need to be checked and rectified using
   the TLD's normal procedures.

   Having identified a working (server, query name) tuple the TLD
   operator should now check that the server responds to EDNS, Unknown
   Query Type and TCP tests as described above.  If the TLD operator
   finds that the server fails any of the tests, the TLD operator shall
   take steps to inform the operator of the server that they are
   running a faulty nameserver and that they need to take steps to
   correct the matter.  The TLD operator shall also record the
   (server, query name) tuple for followup testing.

   If repeated attempts to inform and get the customer to correct /
   replace the faulty server are unsuccessful the TLD operator shall
   remove all delegations to said server from the zone.

   It will also be necessary for TLD operators to repeat the scans
   periodically.  It is recommended that this be performed monthly
   backing off to bi-annually once the number of faulty servers found
   drops off to less than 1 in 100000 servers tested.  Follow up tests
   for faulty servers still need to be performed monthly.

   Some operators claim that they can't perform checks at registration
   time.  If a check is not performed at registration time it needs to
   be performed within a week of registration in order to detect faulty
   servers swiftly.

   Checking of delegations by TLD operators should be nothing new as
   they have been required from the very beginnings of DNS to do this
   [RFC1034].
   Checking for compliance of nameserver operations should
   just be an extension of such testing.

   It is recommended that TLD operators set up a test web page which
   performs the tests the TLD operator performs as part of their regular
   audits to allow nameserver operators to test that they have correctly
   fixed their servers.  Such tests should be rate limited to avoid
   these pages being a denial of service vector.

4.  Firewalls and Load Balancers

   Firewalls and load balancers can affect the externally visible
   behaviour of a nameserver.  Tests for conformance need to be done
   from outside of any firewall so that the system as a whole is tested.

   Firewalls and load balancers should not drop DNS packets that they
   don't understand.  They should either pass through the packets or
   generate an appropriate error response.

   Requests for unknown query types are not attacks and should not be
   treated as such.

5.  Response Code Selection

   Choosing the correct response code when fixing a nameserver is
   important.  Just because a type is not implemented does not mean that
   NOTIMP is the correct response code to return.  Response codes need
   to be chosen considering how clients will handle them.

   For unimplemented opcodes NOTIMP is the expected response code.

   In general, for unimplemented type codes Name Error (NXDOMAIN) and
   NOERROR (no data) are the expected response codes.  A server is not
   supposed to serve a zone which contains unsupported types ([RFC1034])
   so the only thing left is to indicate whether the QNAME exists or
   not.  NOTIMP and REFUSED are not useful responses as they force the
   clients to try all the authoritative servers for a zone looking for a
   server which will answer the query.

   Meta query types may be the exception but these need to be thought
   about on a case by case basis.
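
   The opcode and type rules above can be sketched as a small decision
   function.  This is an illustration only: select_rcode and its
   parameters are hypothetical names, the caller is assumed to already
   know whether the QNAME exists in a zone it serves, and meta query
   types are deliberately left out because they need case by case
   treatment.

```python
# RCODE and OPCODE values as assigned in RFC 1035.
RCODE_NOERROR = 0    # name exists; an empty answer section means "no data"
RCODE_NXDOMAIN = 3   # Name Error: the QNAME does not exist
RCODE_NOTIMP = 4     # not implemented

OPCODE_QUERY = 0


def select_rcode(opcode, qname_exists, supported_opcodes=(OPCODE_QUERY,)):
    """Pick a response code for a query whose type is not implemented.

    Hypothetical helper: 'qname_exists' is assumed to come from the
    server's own zone lookup.  Meta query types are not handled here.
    """
    if opcode not in supported_opcodes:
        return RCODE_NOTIMP          # unimplemented opcode
    if not qname_exists:
        return RCODE_NXDOMAIN        # the name itself is absent
    return RCODE_NOERROR             # name exists, no data of that type
```

   Note that the query type never selects NOTIMP or REFUSED here; only
   the opcode can lead to NOTIMP, which keeps clients from having to
   try every authoritative server for the zone.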

   If you support EDNS and get a query with an unsupported EDNS version
   the correct response is BADVERS [RFC6891].

   If you do not support EDNS at all FORMERR and NOTIMP are the expected
   error codes.  That said a minimal EDNS server implementation just
   requires parsing the OPT records and responding with an empty OPT
   record.  There is no need to interpret any EDNS options present in
   the request as unsupported options are expected to be ignored
   [RFC6891].

6.  Normative References

   [RFC1034]  Mockapetris, P., "DOMAIN NAMES - CONCEPTS AND FACILITIES",
              STD 13, RFC 1034, November 1987.

   [RFC1035]  Mockapetris, P., "DOMAIN NAMES - IMPLEMENTATION AND
              SPECIFICATION", STD 13, RFC 1035, November 1987.

   [RFC5966]  Bellis, R., "DNS Transport over TCP - Implementation
              Requirements", RFC 5966, August 2010.

   [RFC6891]  Damas, J., Graff, M., and P. Vixie, "Extension Mechanisms
              for DNS (EDNS(0))", STD 75, RFC 6891, April 2013.

Author's Address

   M. Andrews
   Internet Systems Consortium
   950 Charter Street
   Redwood City, CA  94063
   US

   Email: marka@isc.org