Network Working Group                                         M. Andrews
Internet-Draft                                                       ISC
Expires: November 23, 2013                                  May 22, 2013


   A Common Operational Problem in DNS Servers - Failure To Respond.
             draft-andrews-dns-no-response-issue-01.txt

Abstract

   The DNS is a query / response protocol.  Failure to respond to
   queries causes both immediate operational problems and long term
   problems with protocol development.

   This document identifies a number of common classes of queries to
   which some servers fail to respond.  It also suggests procedures for
   TLD and other similar zone operators to apply to reduce or eliminate
   the problem.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on November 23, 2013.

Copyright Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Common query classes that result in non-responses
     2.1.  EDNS Queries
     2.2.  Unknown / Unsupported Type Queries
     2.3.  TCP Queries
   3.  Remediating
   4.  Firewalls and Load Balancers
   5.  Normative References
   Author's Address

1.  Introduction

   The DNS [RFC1034], [RFC1035] is a query / response protocol.  Failure
   to respond to queries causes both immediate operational problems and
   long term problems with protocol development.

   Failure to respond to a query is indistinguishable from packet loss
   without an analysis of query response patterns.  It results in
   unnecessary additional queries being made by DNS clients and
   unnecessary delays being introduced into the resolution process.

   Due to the inability to distinguish between packet loss and
   nameservers dropping EDNS queries, packet loss is sometimes
   misclassified as lack of EDNS support, which can lead to DNSSEC
   validation failures.

   Allowing servers which fail to respond to queries to remain in the
   DNS hierarchy for extended periods results in developers being afraid
   to deploy new type codes.  Such servers need to be identified and
   corrected or replaced.

   The DNS has response codes that cover almost any conceivable query
   response.  A nameserver should be able to respond to any conceivable
   query using them.

   Unless a nameserver is under attack, it should respond to all queries
   directed to it as a result of following delegations.  Additionally,
   code should not assume that there isn't a delegation to the server
   even if it is not configured to serve the zone.
   Broken delegations are a common occurrence in the DNS, and receiving
   queries for zones that you are not configured for is not necessarily
   an indication that you are under attack.

2.  Common query classes that result in non-responses

   There are three common query classes that result in non-responses
   today.  These are EDNS [RFC2671] queries, queries for unknown
   (unallocated) or unsupported types, and TCP queries that are
   filtered.

2.1.  EDNS Queries

   Identifying servers that fail to respond to EDNS queries can be done
   by first identifying that the server responds to regular DNS queries,
   then making a series of otherwise identical queries using EDNS, then
   making the original query again.  A series of EDNS queries is needed
   as at least one DNS implementation responds to the first EDNS query
   with FORMERR but fails to respond to subsequent queries from the same
   address for a period until a regular DNS query is made.  The EDNS
   query should specify a UDP buffer size of 512 bytes to avoid false
   classification of not supporting EDNS due to response packet size.

   If the server responds to the first and last queries but fails to
   respond to most or all of the EDNS queries, it is probably faulty.
   The test should be repeated a number of times to reduce the
   likelihood of a false positive due to packet loss.

   Firewalls may also block larger EDNS responses, but there is no easy
   way to check authoritative servers to see if the firewall is
   misconfigured.

2.2.  Unknown / Unsupported Type Queries

   Identifying servers that fail to respond to unknown or unsupported
   types can be done by making an initial DNS query for an A record,
   making a number of queries for an unallocated type, then making a
   query for an A record again.  IANA maintains a registry of allocated
   types.
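
   The probe sequence just described can be sketched in Python as
   follows.  This is a minimal illustration under stated assumptions,
   not a reference implementation: the names build_query and
   probe_sequence are hypothetical, the queries use class IN with all
   header flags clear, and the caller is assumed to supply a type code
   that the IANA registry currently lists as unallocated.  Sending the
   packets and handling timeouts is left out.

```python
import struct

QTYPE_A = 1      # A record type code (RFC 1035)
QCLASS_IN = 1    # Internet class (RFC 1035)


def build_query(qname, qtype, query_id=0):
    """Return a DNS query for qname/qtype in RFC 1035 wire format."""
    # Header: ID, flags (all clear), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0.
    header = struct.pack("!HHHHHH", query_id, 0, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, ended by the zero-length root.
    labels = [label for label in qname.split(".") if label]
    encoded = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels)
    return header + encoded + b"\x00" + struct.pack("!HH", qtype, QCLASS_IN)


def probe_sequence(qname, test_qtype, probes=3):
    """Return the packets for one test run, in the order they are sent.

    The run is: one A query, `probes` queries for test_qtype, one A query.
    The default of 3 probes is an arbitrary assumption, not from the text.
    Sequential IDs keep the sketch deterministic; a real tester would
    normally pick unpredictable IDs.
    """
    packets = [build_query(qname, QTYPE_A, query_id=1)]
    for i in range(probes):
        packets.append(build_query(qname, test_qtype, query_id=2 + i))
    packets.append(build_query(qname, QTYPE_A, query_id=2 + probes))
    return packets
```

   A tester would send each packet in turn to the server under test and
   note which ones draw a response.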

   If the server responds to the first and last queries but fails to
   respond to the queries for the unallocated type, it is probably
   faulty.  The test should be repeated a number of times to reduce the
   likelihood of a false positive due to packet loss.

2.3.  TCP Queries

   All DNS servers are supposed to respond to queries over TCP
   [RFC5966].  Firewalls that drop TCP connection attempts, rather than
   resetting the connection attempt or sending an ICMP/ICMPv6
   administratively prohibited message, introduce excessive delays to
   the resolution process.

   Whether a server accepts TCP connections can be tested by first
   checking that it responds to UDP queries, to confirm that it is up
   and operating, then attempting the same query over TCP.  An
   additional query should be made over UDP if the TCP connection
   attempt fails, to confirm that the server under test is still
   operating.

3.  Remediating

   While the first step in remediating this problem is to get the
   offending nameserver code corrected, there is a very long tail
   problem with DNS servers in that it can often take over a decade
   between the code being corrected and a nameserver being upgraded with
   corrected code.  With that in mind, it is requested that TLD and
   other similar zone operators take steps to identify and inform their
   customers, directly or indirectly through registrars, that they are
   running such servers and that the customers need to correct the
   problem.

   TLD operators should construct a list of the servers to which child
   zones are delegated, along with a delegated zone name for each
   server.  This name shall be the query name used to test the server,
   as it is supposed to exist.

   For each server the TLD operator shall make an SOA query for the
   delegated zone name.  This should result in the SOA record being
   returned in the answer section.
   If the SOA record is not returned but some other response is
   returned, this is an indication of a bad delegation and the TLD
   operator should take whatever steps it normally takes to rectify a
   bad delegation.  If more than one zone is delegated to the server,
   the TLD operator should choose another zone until it finds a zone
   which responds correctly or it exhausts the list of zones delegated
   to the server.

   If the TLD operator fails to get a response to an SOA query, it
   should make an A query, as some nameservers fail to respond to SOA
   queries but respond to A queries.  If it gets no response to the A
   query, another delegated zone should be queried for, as some
   nameservers fail to respond for zones they are not configured for.
   If subsequent queries find a responding zone, all delegations to this
   server need to be checked and rectified using the TLD's normal
   procedures.

   Having identified a working (server, zone) tuple, the TLD operator
   should now check that the server responds to the EDNS, Unknown Query
   Type and TCP tests as described above.  If the TLD operator finds
   that the server fails any of the tests, the TLD operator shall take
   steps to inform the operator of the server that they are running a
   faulty nameserver and that they need to take steps to correct the
   matter.  The TLD operator shall also record the (server, zone) tuple
   for follow-up testing.

   If repeated attempts to inform the customer and get the customer to
   correct or replace the faulty server are unsuccessful, the TLD
   operator shall remove all delegations to said server from the zone.

   It will also be necessary for TLD operators to repeat the scans
   periodically.  It is recommended that this be performed monthly,
   backing off to bi-annually once the number of faulty servers found
   drops to less than 1 in 100000 servers tested.  Follow-up tests for
   faulty servers still need to be performed monthly.
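
   One possible reading of the procedure earlier in this section for
   finding a working (server, zone) tuple is sketched below in Python.
   It is an illustration only.  The names find_baseline, Baseline,
   ANSWER, OTHER and SILENT are hypothetical, and the query function is
   assumed to be supplied by the caller and to report one of three
   outcomes for each query; how it sends packets, applies timeouts or
   retries is outside the sketch.

```python
from typing import Callable, List, NamedTuple, Optional, Tuple

# Outcomes reported by the caller-supplied query function (assumed names).
ANSWER = "answer"   # the requested record was in the answer section
OTHER = "other"     # some other response came back
SILENT = None       # no response at all


class Baseline(NamedTuple):
    working: Optional[Tuple[str, str]]  # (server, zone), or None if not found
    qtype: Optional[str]                # query type that drew a response
    bad_delegations: List[str]          # zones whose SOA query drew OTHER
    silent_zones: List[str]             # zones where SOA and A both went unanswered


def find_baseline(server: str, zones: List[str],
                  query: Callable[[str, str, str], Optional[str]]) -> Baseline:
    """Walk the zones delegated to server until one gives a usable baseline."""
    bad: List[str] = []
    silent: List[str] = []
    for zone in zones:
        soa = query(server, zone, "SOA")
        if soa == ANSWER:
            return Baseline((server, zone), "SOA", bad, silent)
        if soa is not SILENT:
            # A response without the SOA record suggests a bad delegation.
            bad.append(zone)
            continue
        # No SOA response: some servers still respond to A queries.
        if query(server, zone, "A") is not SILENT:
            return Baseline((server, zone), "A", bad, silent)
        # Nothing for this zone; try the next zone delegated to the server.
        silent.append(zone)
    return Baseline(None, None, bad, silent)
```

   The zones collected in bad_delegations and silent_zones are the ones
   a TLD operator would then handle through its normal procedures for
   bad delegations.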

   Some operators claim that they can't perform checks at registration
   time.  If a check is not performed at registration time, it needs to
   be performed within a week of registration in order to detect faulty
   servers swiftly.

   Checking of delegations by TLD operators should be nothing new, as
   they have been required from the very beginnings of DNS to do this
   [RFC1034].  Checking for compliance of nameserver operations should
   just be an extension of such testing.

   It is recommended that TLD operators set up a test web page which
   performs the tests the TLD operator performs as part of its regular
   audits, to allow nameserver operators to test that they have
   correctly fixed their servers.  Such tests should be rate limited to
   avoid these pages being a denial of service vector.

4.  Firewalls and Load Balancers

   Firewalls and load balancers can affect the externally visible
   behaviour of a nameserver.  Tests for conformance need to be done
   from outside of any firewall so that the system as a whole is tested.

   Firewalls and load balancers should not drop DNS packets that they
   don't understand.  They should either pass through the packets or
   generate an appropriate error response.

   Requests for unknown query types are not attacks and should not be
   treated as such.

5.  Normative References

   [RFC1034]  Mockapetris, P., "DOMAIN NAMES - CONCEPTS AND FACILITIES",
              STD 13, RFC 1034, November 1987.

   [RFC1035]  Mockapetris, P., "DOMAIN NAMES - IMPLEMENTATION AND
              SPECIFICATION", STD 13, RFC 1035, November 1987.

   [RFC2671]  Vixie, P., "Extension Mechanisms for DNS (EDNS0)",
              RFC 2671, August 1999.

   [RFC5966]  Bellis, R., "DNS Transport over TCP - Implementation
              Requirements", RFC 5966, August 2010.

Author's Address

   M. Andrews
   Internet Systems Consortium
   950 Charter Street
   Redwood City, CA  94063
   US

   Email: marka@isc.org