Network Working Group                                         M. Andrews
Internet-Draft                                                       ISC
Expires: August 19, 2015                               February 15, 2015


    A Common Operational Problem in DNS Servers - Failure To Respond.
               draft-andrews-dns-no-response-issue-07.txt

Abstract

   The DNS is a query / response protocol. Failure to respond to
   queries causes both immediate operational problems and long term
   problems with protocol development.

   This document identifies a number of common classes of queries that
   some servers fail to respond to. This document also suggests
   procedures for TLD and other similar zone operators to apply to
   reduce / eliminate the problem.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on August 19, 2015.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Common query classes that result in non-responses
     2.1.  EDNS Queries - Version Independent
     2.2.  EDNS Queries - Version Specific
     2.3.  EDNS Options
     2.4.  EDNS Flags
     2.5.  Unknown / Unsupported Type Queries
     2.6.  TCP Queries
   3.  Remediating
   4.  Firewalls and Load Balancers
   5.  Scrubbing Services
   6.  Response Code Selection
   7.  Testing
   8.  Normative References
   Author's Address

1.  Introduction

   The DNS [RFC1034], [RFC1035] is a query / response protocol. Failure
   to respond to queries causes both immediate operational problems and
   long term problems with protocol development.

   Failure to respond to a query is indistinguishable from packet loss
   without an analysis of query response patterns, and it results in
   unnecessary additional queries being made by DNS clients and
   unnecessary delays being introduced to the resolution process.

   Due to the inability to distinguish between packet loss and
   nameservers dropping EDNS [RFC6891] queries, packet loss is sometimes
   misclassified as lack of EDNS support, which can lead to DNSSEC
   validation failures.

   Allowing servers which fail to respond to queries to remain in
   service results in developers being afraid to deploy implementations
   of recent standards. Such servers need to be identified and
   corrected / replaced.

   The DNS has response codes that cover almost any conceivable query
   response.
   A nameserver should be able to respond to any conceivable query
   using them.

   Unless a nameserver is under attack, it should respond to all queries
   directed to it as a result of following delegations. Additionally,
   code should not assume that there isn't a delegation to the server
   even if it is not configured to serve the zone. Broken delegations
   are a common occurrence in the DNS, and receiving queries for zones
   that you are not configured for is not necessarily an indication
   that you are under attack. Parent zone operators are supposed to
   regularly check that the delegating NS records are consistent with
   those of the delegated zone and to correct them when they are not
   [RFC1034]. If this were done regularly, instances of broken
   delegations would be much rarer.

   When a nameserver is under attack it may wish to drop packets. A
   common attack is to use a nameserver as an amplifier by sending
   spoofed packets. This is done because response packets are bigger
   than the queries and big amplification factors are available,
   especially if EDNS is supported. Limiting the rate of responses is
   reasonable when this is occurring, and the client should retry. This
   however only works if legitimate clients are not being forced to
   guess whether EDNS queries are accepted or not. While there is still
   a pool of servers that don't respond to EDNS requests, clients have
   no way to know if the lack of response is due to packet loss, EDNS
   packets not being supported, or rate limiting due to the server being
   under attack. Misclassifications of server characteristics are
   unavoidable when rate limiting is done.

2.  Common query classes that result in non-responses

   There are three common query classes that result in non-responses
   today. These are EDNS queries, queries for unknown (unallocated) or
   unsupported types, and TCP queries that are filtered.

2.1.  EDNS Queries - Version Independent

   Identifying servers that fail to respond to EDNS queries can be done
   by first identifying that the server responds to regular DNS queries,
   then making a series of otherwise identical queries using EDNS,
   then making the original query again. A series of EDNS queries is
   needed as at least one DNS implementation responds to the first EDNS
   query with FORMERR but fails to respond to subsequent queries from
   the same address for a period until a regular DNS query is made. The
   EDNS query should specify a UDP buffer size of 512 bytes to avoid
   false classification of not supporting EDNS due to response packet
   size.

   If the server responds to the first and last queries but fails to
   respond to most or all of the EDNS queries, it is probably faulty.
   The test should be repeated a number of times to reduce the
   likelihood of a false positive due to packet loss.

   Firewalls may also block larger EDNS responses, but there is no easy
   way to check authoritative servers to see if the firewall is
   misconfigured.

2.2.  EDNS Queries - Version Specific

   Some servers respond correctly to EDNS version 0 queries but fail to
   respond to EDNS queries with version numbers that are higher than
   zero. Servers should respond with BADVERS to EDNS queries with
   version numbers that they do not support.

   Some servers respond correctly to EDNS version 0 queries but fail to
   set QR=1 when responding to EDNS versions they do not support. Such
   answers are discarded or treated as requests.

2.3.  EDNS Options

   Some servers fail to respond to EDNS queries with EDNS options set.
   Unknown EDNS options are supposed to be ignored by the server
   [RFC6891].

2.4.  EDNS Flags

   Some servers fail to respond to EDNS queries with EDNS flags set.
   Servers should ignore EDNS flags they do not understand and not add
   them to the response [RFC6891].

2.5.  Unknown / Unsupported Type Queries

   Identifying servers that fail to respond to unknown or unsupported
   types can be done by making an initial DNS query for an A record,
   making a number of queries for an unallocated type, then making a
   query for an A record again. IANA maintains a registry of allocated
   types.

   If the server responds to the first and last queries but fails to
   respond to the queries for the unallocated type, it is probably
   faulty. The test should be repeated a number of times to reduce
   the likelihood of a false positive due to packet loss.

2.6.  TCP Queries

   All DNS servers are supposed to respond to queries over TCP
   [RFC5966]. Firewalls that drop TCP connection attempts rather than
   resetting the connection attempt or sending an ICMP/ICMPv6
   administratively prohibited message introduce excessive delays to the
   resolution process.

   Whether a server accepts TCP connections can be tested by first
   checking that it responds to UDP queries, to confirm that it is up
   and operating, then attempting the same query over TCP. If the TCP
   connection attempt fails, an additional query should be made over UDP
   to confirm that the server under test is still operating.

3.  Remediating

   While the first step in remediating this problem is to get the
   offending nameserver code corrected, there is a very long tail
   problem with DNS servers in that it can often take over a decade
   between the code being corrected and a nameserver being upgraded with
   corrected code. With that in mind it is requested that TLD, and
   other similar zone operators, take steps to identify and inform their
   customers, directly or indirectly through registrars, that they are
   running such servers and that the customers need to correct the
   problem.

   TLD operators should construct a list of servers child zones are
   delegated to along with a delegated zone name.
   This name shall be the query name used to test the server as it is
   supposed to exist.

   For each server the TLD operator shall make an SOA query for the
   delegated zone name. This should result in the SOA record being
   returned in the answer section. If the SOA record is not returned
   but some other response is returned, this is an indication of a bad
   delegation and the TLD operator should take whatever steps it
   normally takes to rectify a bad delegation. If more than one zone is
   delegated to the server, it should choose another zone until it finds
   a zone which responds correctly or it exhausts the list of zones
   delegated to the server.

   If the TLD operator fails to get a response to an SOA query, it
   should make an A query, as some nameservers fail to respond to SOA
   queries but respond to A queries. If it gets no response to the A
   query, another delegated zone should be queried for, as some
   nameservers fail to respond to queries for zones they are not
   configured for. If subsequent queries find a responding zone, all
   delegations to this server need to be checked and rectified using the
   TLD's normal procedures.

   Having identified a working (server, query name) tuple, the TLD
   operator should now check that the server responds to EDNS, Unknown
   Query Type and TCP tests as described above. If the TLD operator
   finds that the server fails any of the tests, the TLD operator shall
   take steps to inform the operator of the server that they are running
   a faulty nameserver and that they need to take steps to correct the
   matter. The TLD operator shall also record the (server, query name)
   tuple for followup testing.

   If repeated attempts to inform and get the customer to correct /
   replace the faulty server are unsuccessful, the TLD operator shall
   remove all delegations to said server from the zone.

   It will also be necessary for TLD operators to repeat the scans
   periodically.
   It is recommended that this be performed monthly, backing off to
   bi-annually once the number of faulty servers found drops off to
   less than 1 in 100000 servers tested. Follow up tests for faulty
   servers still need to be performed monthly.

   Some operators claim that they can't perform checks at registration
   time. If a check is not performed at registration time it needs to
   be performed within a week of registration in order to detect faulty
   servers swiftly.

   Checking of delegations by TLD operators should be nothing new as
   they have been required from the very beginnings of DNS to do this
   [RFC1034]. Checking for compliance of nameserver operations should
   just be an extension of such testing.

   It is recommended that TLD operators set up a test web page which
   performs the tests the TLD operator performs as part of their regular
   audits to allow nameserver operators to test that they have correctly
   fixed their servers. Such tests should be rate limited to avoid
   these pages being a denial of service vector.

4.  Firewalls and Load Balancers

   Firewalls and load balancers can affect the externally visible
   behaviour of a nameserver. Tests for conformance need to be done
   from outside of any firewall so that the system as a whole is tested.

   Firewalls and load balancers should not drop DNS packets that they
   don't understand. They should either pass through the packets or
   generate an appropriate error response.

   Requests for unknown query types are not attacks and should not be
   treated as such.

   Requests with unassigned flags set (DNS or EDNS) are not attacks and
   should not be treated as such. The behaviour for unassigned flags is
   to ignore them in the request and to not set them in the response.
   All dropping DNS / EDNS packets with unassigned flags does is make it
   harder to deploy extensions that make use of them due to the need to
   reconfigure / update firewalls.
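
   As a rough illustration of checking this behaviour from outside a
   firewall, the sketch below sends a query with the unassigned DNS
   header flag set and reports only whether any reply came back. It
   assumes a dig recent enough to accept +zflag and relies on dig's
   documented exit status 9 ("no reply from server"). The function name
   probe_responds and the DIG override variable are hypothetical names
   chosen for this example.

```shell
# Hypothetical helper: succeed if the server sent any reply to a query
# with the unassigned header flag set, fail if it stayed silent.
# DIG may be overridden, for example to point at a stub when testing.
DIG=${DIG:-dig}

probe_responds() {
    # $1 = zone name, $2 = server address
    rc=0
    "$DIG" +noedns +noad +norec +zflag +tries=1 +time=2 \
        soa "$1" "@$2" >/dev/null 2>&1 || rc=$?
    # dig exits with status 9 when no reply was received from the server.
    [ "$rc" -ne 9 ]
}
```

   A silent result from a single probe is not proof of a fault, since
   the packet may simply have been lost; the probe would need to be
   repeated and bracketed by plain queries that are known to work.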

   Requests with unknown EDNS options are not an attack and should not
   be treated as such. The correct behaviour for unknown EDNS options
   is to ignore them.

   Requests with unknown EDNS versions are not an attack and should not
   be treated as such. The correct behaviour for unknown EDNS versions
   is to return BADVERS along with the highest EDNS version the server
   supports. All dropping EDNS packets does is break EDNS version
   negotiation.

5.  Scrubbing Services

   Scrubbing services, like firewalls, can affect the externally visible
   behaviour of a nameserver. If you use a scrubbing service you should
   check that legitimate queries are not being blocked.

   Scrubbing services, unlike firewalls, are also turned on and off in
   response to denial of service attacks. One needs to take care when
   choosing a scrubbing service and ask questions like:

      Do they pass unknown DNS query types?
      Do they pass unknown EDNS versions?
      Do they pass unknown EDNS options?
      Do they pass unknown EDNS flags?

   None of these are attack vectors, but some scrubbing services treat
   them as such.

6.  Response Code Selection

   Choosing the correct response code when fixing a nameserver is
   important. Just because a type is not implemented does not mean that
   NOTIMP is the correct response code to return. Response codes need
   to be chosen considering how clients will handle them.

   For unimplemented opcodes NOTIMP is the expected response code.

   In general, for unimplemented type codes Name Error (NXDOMAIN) and
   NOERROR (no data) are the expected response codes. A server is not
   supposed to serve a zone which contains unsupported types ([RFC1034])
   so the only thing left is to return whether the QNAME exists or not.
   NOTIMP and REFUSED are not useful responses as they force the clients
   to try all the authoritative servers for a zone looking for a server
   which will answer the query.
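
   The choice described above for unimplemented opcodes and type codes
   can be written down as a small decision function. This is only a
   sketch of the rule as stated in this section; the function name
   expected_rcode and its argument conventions are hypothetical.

```shell
# Hypothetical helper: print the response code this section expects.
# $1 = what is unimplemented: "opcode" or "type"
# $2 = for "type" only: "exists" if the QNAME exists, anything else if not
expected_rcode() {
    case "$1" in
        opcode)
            echo NOTIMP
            ;;
        type)
            if [ "${2:-}" = exists ]; then
                echo NOERROR      # name exists, no data of that type
            else
                echo NXDOMAIN     # name does not exist
            fi
            ;;
        *)
            echo "unknown case: $1" >&2
            return 1
            ;;
    esac
}
```

   For example, "expected_rcode type exists" prints NOERROR, while
   "expected_rcode opcode" prints NOTIMP.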

   Meta query types may be the exception, but these need to be thought
   about on a case by case basis.

   If you support EDNS and get a query with an unsupported EDNS version,
   the correct response is BADVERS [RFC6891].

   If you do not support EDNS at all, FORMERR and NOTIMP are the
   expected error codes. That said, a minimal EDNS server
   implementation just requires parsing the OPT records and responding
   with an empty OPT record. There is no need to interpret any EDNS
   options present in the request as unsupported options are expected to
   be ignored [RFC6891].

7.  Testing

   Verify the server is configured for the zone:

   dig +noedns +noad +norec soa $zone @$server

   expect: status: NOERROR
   expect: SOA record

   Check that TCP queries work:

   dig +noedns +noad +norec +tcp soa $zone @$server

   expect: status: NOERROR
   expect: SOA record

   Check that queries for an unknown type work:

   dig +noedns +noad +norec type1000 $zone @$server

   expect: status: NOERROR
   expect: an empty answer section.
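
   The "expect:" lines in this section can be checked mechanically.
   The sketch below reads dig output on standard input and tests the
   two expectations of the unknown type check: a NOERROR status and an
   empty answer section. It assumes dig's usual ";; ->>HEADER<<-" and
   ";; flags:" comment lines are present in the output; the function
   names are hypothetical.

```shell
# Hypothetical helpers that read dig output on standard input.

# Print the value after "status:" in the ->>HEADER<<- line.
dig_status() {
    sed -n 's/^;; ->>HEADER<<-.*status: \([A-Z0-9]*\),.*/\1/p'
}

# Print the ANSWER count from the ";; flags:" line.
dig_answer_count() {
    sed -n 's/^;; flags:.*ANSWER: \([0-9]*\),.*/\1/p'
}

# Succeed if the output shows NOERROR with an empty answer section,
# which is what the unknown type test expects.
check_unknown_type() {
    out=$(cat)
    [ "$(printf '%s\n' "$out" | dig_status)" = NOERROR ] &&
        [ "$(printf '%s\n' "$out" | dig_answer_count)" = 0 ]
}
```

   A possible use is to pipe the test command into the checker:
   dig +noedns +noad +norec type1000 $zone @$server | check_unknown_type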

   Check that queries with CD=1 work:

   dig +noedns +noad +norec +cd soa $zone @$server

   expect: status: NOERROR
   expect: SOA record to be present

   Check that queries with AD=1 work:

   dig +noedns +norec +ad soa $zone @$server

   expect: status: NOERROR
   expect: SOA record to be present

   Check that queries with the last unassigned DNS header flag work:

   dig +noedns +noad +norec +zflag soa $zone @$server

   expect: status: NOERROR
   expect: SOA record to be present
   expect: MBZ to not be in the response

   Check that plain EDNS queries work:

   dig +edns=0 +noad +norec soa $zone @$server

   expect: status: NOERROR
   expect: SOA record to be present
   expect: OPT record to be present
   expect: EDNS Version 0 in response

   Check that EDNS version 1 queries work (EDNS supported):

   dig +edns=1 +noednsneg +noad +norec soa $zone @$server

   expect: status: BADVERS
   expect: SOA record to not be present
   expect: OPT record to be present
   expect: EDNS Version 0 in response
   (this will change when EDNS version 1 is defined)

   Check that EDNS queries with an unknown option work (EDNS supported):

   dig +edns=0 +noad +norec +ednsopt=100 soa $zone @$server

   expect: status: NOERROR
   expect: SOA record to be present
   expect: OPT record to be present
   expect: OPT=100 to not be present
   expect: EDNS Version 0 in response

   Check that EDNS queries with unknown flags work (EDNS supported):

   dig +edns=0 +noad +norec +ednsflags=0x40 soa $zone @$server

   expect: status: NOERROR
   expect: SOA record to be present
   expect: OPT record to be present
   expect: MBZ to not be present
   expect: EDNS Version 0 in response

   Check that EDNS version 1 queries with unknown flags work (EDNS
   supported):

   dig +edns=1 +noednsneg +noad +norec +ednsflags=0x40 soa \
       $zone @$server

   expect: status: BADVERS
   expect: SOA record to not be present
   expect: OPT record to be present
   expect: MBZ to not be present
   expect: EDNS Version 0 in response

   Check that EDNS version 1 queries with unknown options work (EDNS
   supported):

   dig +edns=1 +noednsneg +noad +norec +ednsopt=100 soa $zone @$server

   expect: status: BADVERS
   expect: SOA record to not be present
   expect: OPT record to be present
   expect: OPT=100 to not be present
   expect: EDNS Version 0 in response

   Check that DNSSEC queries work (EDNS supported):

   dig +edns=0 +noad +norec +dnssec soa $zone @$server

   expect: status: NOERROR
   expect: SOA record to be present
   expect: OPT record to be present
   expect: DO=1 to be present if an RRSIG is in the response
   expect: EDNS Version 0 in response

   DO=1 as per [RFC3225].

   Check that EDNS version 1 DNSSEC queries work (EDNS supported):

   dig +edns=1 +noednsneg +noad +norec +dnssec soa \
       $zone @$server

   expect: status: BADVERS
   expect: SOA record to not be present
   expect: OPT record to be present
   expect: DO=1 to be present if the EDNS version 0 DNSSEC query test
           returned DO=1
   expect: EDNS Version 0 in response

   If EDNS is not supported by the nameserver, we still expect a
   response to all the above queries. That response may be a FORMERR or
   NOTIMP error response, or the OPT record may just be ignored.

   It is advisable to run all the above tests in parallel so as to
   minimise the delays due to multiple timeouts when the servers do not
   respond.

   The above tests use dig from BIND 9.11.0 which is still in
   development.

8.  Normative References

   [RFC1034]  Mockapetris, P., "Domain names - concepts and facilities",
              STD 13, RFC 1034, November 1987.

   [RFC1035]  Mockapetris, P., "Domain names - implementation and
              specification", STD 13, RFC 1035, November 1987.

   [RFC3225]  Conrad, D., "Indicating Resolver Support of DNSSEC",
              RFC 3225, December 2001.

   [RFC5966]  Bellis, R., "DNS Transport over TCP - Implementation
              Requirements", RFC 5966, August 2010.

   [RFC6891]  Damas, J., Graff, M., and P. Vixie, "Extension Mechanisms
              for DNS (EDNS(0))", STD 75, RFC 6891, April 2013.

Author's Address

   M. Andrews
   Internet Systems Consortium
   950 Charter Street
   Redwood City, CA 94063
   US

   Email: marka@isc.org