Network Working Group                                         M. Andrews
Internet-Draft                                                       ISC
Intended status: Informational                          November 2, 2015
Expires: May 5, 2016


   A Common Operational Problem in DNS Servers - Failure To Respond.
                 draft-andrews-dns-no-response-issue-13

Abstract

   The DNS is a query / response protocol.  Failure to respond to
   queries causes both immediate operational problems and long term
   problems with protocol development.

   This document identifies a number of common classes of queries that
   some servers fail to respond to.  This document also suggests
   procedures for TLD and other similar zone operators to apply to help
   reduce / eliminate the problem.
Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on May 5, 2016.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Common Query Classes That Result in Non-Responses
     2.1.  EDNS Queries - Version Independent
     2.2.  EDNS Queries - Version Specific
     2.3.  EDNS Options
     2.4.  EDNS Flags
     2.5.  DNS Flags
     2.6.  Unknown / Unsupported Type Queries
     2.7.  Unknown DNS opcodes
     2.8.  TCP Queries
   3.  Remediating
   4.  Firewalls and Load Balancers
   5.  Scrubbing Services
   6.  Whole Answer Caches
   7.  Response Code Selection
   8.  Testing
   9.  Security Considerations
   10. IANA Considerations
   11. Normative References
   Author's Address

1.  Introduction

   The DNS [RFC1034], [RFC1035] is a query / response protocol.  Failure
   to respond to queries causes both immediate operational problems and
   long term problems with protocol development.

   Without an analysis of query / response patterns, a failure to
   respond to a query is indistinguishable from packet loss.  It results
   in DNS clients making unnecessary additional queries and in
   unnecessary delays being introduced into the resolution process.

   Because packet loss cannot be distinguished from nameservers dropping
   EDNS [RFC6891] queries, packet loss is sometimes misclassified as a
   lack of EDNS support, which can lead to DNSSEC validation failures.

   Allowing servers that fail to respond to queries to remain in service
   makes developers afraid to deploy implementations of recent
   standards.  Such servers need to be identified and corrected /
   replaced.

   The DNS has response codes that cover almost any conceivable query
   response.  A nameserver should be able to respond to any conceivable
   query using them.
   Unless a nameserver is under attack, it should respond to all queries
   directed to it as a result of following delegations.  Additionally,
   code should not assume that there is no delegation to the server even
   if it is not configured to serve the zone.  Broken delegations are a
   common occurrence in the DNS, and receiving queries for zones that you
   are not configured for is not necessarily an indication that you are
   under attack.  Parent zone operators are supposed to regularly check
   that the delegating NS records are consistent with those of the
   delegated zone and to correct them when they are not [RFC1034].  If
   this were being done regularly, the instances of broken delegations
   would be much lower.

   When a nameserver is under attack it may wish to drop packets.  A
   common attack is to use a nameserver as an amplifier by sending it
   spoofed packets.  This is done because response packets are bigger
   than the queries, and big amplification factors are available,
   especially if EDNS is supported.  Limiting the rate of responses is
   reasonable when this is occurring, and the client should retry.  This,
   however, only works if legitimate clients are not being forced to
   guess whether EDNS queries are accepted or not.  While there is still
   a pool of servers that don't respond to EDNS requests, clients have no
   way to know if the lack of response is due to packet loss, EDNS
   packets not being supported, or rate limiting due to the server being
   under attack.  Misclassifications of server characteristics are
   unavoidable when rate limiting is done.

2.  Common Query Classes That Result in Non-Responses

   There are three common query classes that result in non-responses
   today.  These are EDNS queries, queries for unknown (unallocated) or
   unsupported types, and filtering of TCP queries.

2.1.  EDNS Queries - Version Independent

   Identifying servers that fail to respond to EDNS queries can be done
   by first identifying that the server responds to regular DNS queries,
   followed by a series of otherwise identical queries using EDNS, then
   making the original query again.  A series of EDNS queries is needed
   as at least one DNS implementation responds to the first EDNS query
   with FORMERR but fails to respond to subsequent queries from the same
   address for a period until a regular DNS query is made.  The EDNS
   query should specify a UDP buffer size of 512 bytes to avoid falsely
   classifying a server as not supporting EDNS because of response
   packet size.

   If the server responds to the first and last queries but fails to
   respond to most or all of the EDNS queries, it is probably faulty.
   The test should be repeated a number of times to eliminate the
   likelihood of a false positive due to packet loss.

   Firewalls may also block larger EDNS responses, but there is no easy
   way to check authoritative servers to see if the firewall is
   misconfigured.

2.2.  EDNS Queries - Version Specific

   Some servers respond correctly to EDNS version 0 queries but fail to
   respond to EDNS queries with version numbers that are higher than
   zero.  Servers should respond with BADVERS to EDNS queries with
   version numbers that they do not support.

   Some servers respond correctly to EDNS version 0 queries but fail to
   set QR=1 when responding to EDNS versions they do not support.  Such
   answers are discarded or treated as requests.

2.3.  EDNS Options

   Some servers fail to respond to EDNS queries with EDNS options set.
   Unknown EDNS options are supposed to be ignored by the server
   [RFC6891].

2.4.  EDNS Flags

   Some servers fail to respond to EDNS queries with EDNS flags set.
   Servers should ignore EDNS flags they do not understand and should
   not add them to the response [RFC6891].

2.5.  DNS Flags

   Some servers fail to respond to DNS queries with various DNS flags
   set, regardless of whether they are defined or still reserved.  At
   the time of writing there are servers that fail to respond to queries
   with the AD bit set to 1 and servers that fail to respond to queries
   with the last reserved flag bit set.

2.6.  Unknown / Unsupported Type Queries

   Identifying servers that fail to respond to unknown or unsupported
   types can be done by making an initial DNS query for an A record,
   making a number of queries for an unallocated type, then making a
   query for an A record again.  IANA maintains a registry of allocated
   types.

   If the server responds to the first and last queries but fails to
   respond to the queries for the unallocated type, it is probably
   faulty.  The test should be repeated a number of times to eliminate
   the likelihood of a false positive due to packet loss.

2.7.  Unknown DNS opcodes

   The use of previously undefined opcodes is to be expected.  Since the
   DNS was first defined, two new opcodes have been added: UPDATE and
   NOTIFY.

   NOTIMP is the expected rcode for an unknown / unimplemented opcode.

   Note: while new opcodes will most probably use the current layout
   structure for the rest of the message, there is no requirement that
   anything other than the DNS header match.

2.8.  TCP Queries

   All DNS servers are supposed to respond to queries over TCP
   [RFC5966].  Firewalls that drop TCP connection attempts, rather than
   resetting the connection attempt or sending an ICMP/ICMPv6
   administratively prohibited message, introduce excessive delays to
   the resolution process.

   Whether a server accepts TCP connections can be tested by first
   checking that it responds to UDP queries to confirm that it is up and
   operating, then attempting the same query over TCP.  If the TCP
   connection attempt fails, an additional query should be made over UDP
   to confirm that the server under test is still operating.

3.  Remediating

   While the first step in remediating this problem is to get the
   offending nameserver code corrected, there is a very long tail
   problem with DNS servers, in that it can often take over a decade
   between the code being corrected and a nameserver being upgraded with
   the corrected code.  With that in mind, it is requested that TLD and
   other similar zone operators take steps to identify and inform their
   customers, directly or indirectly through registrars, that they are
   running such servers and that the customers need to correct the
   problem.

   TLD operators are being asked to do this because, due to the nature
   of running a TLD and the hierarchical nature of the DNS, they have
   access to a large number of nameserver names as well as contact
   details for the registrants of those nameservers.  One can construct
   lists of nameservers from other sources, and that has been done to
   survey the state of the Internet, but that doesn't give you the
   contact details necessary to inform the operators.  The SOA RNAME is
   often invalid, and whois data is obscured and / or not available,
   which makes it infeasible for others to do this.

   TLD operators should construct a list of the servers that child zones
   are delegated to, along with a delegated zone name for each server.
   This name shall be the query name used to test the server, as it is
   supposed to exist.

   For each server, the TLD operator shall make an SOA query for the
   delegated zone name.  This should result in the SOA record being
   returned in the answer section.  If the SOA record is not returned
   but some other response is returned, this is an indication of a bad
   delegation, and the TLD operator should take whatever steps it
   normally takes to rectify a bad delegation.  If more than one zone is
   delegated to the server, it should choose another zone until it finds
   a zone which responds correctly or it exhausts the list of zones
   delegated to the server.

   If the server fails to respond to the SOA query, the TLD operator
   should make an A query, as some nameservers fail to respond to SOA
   queries but respond to A queries.  If it gets no response to the A
   query, another delegated zone should be queried, as some nameservers
   fail to respond for zones they are not configured for.  If subsequent
   queries find a responding zone, all delegations to this server need
   to be checked and rectified using the TLD's normal procedures.

   Having identified a working <server, query name> tuple, the TLD
   operator should now check that the server responds to the EDNS,
   Unknown Query Type, and TCP tests described above.  If the TLD
   operator finds that the server fails any of the tests, the TLD
   operator shall take steps to inform the operator of the server that
   they are running a faulty nameserver and that they need to take steps
   to correct the matter.  The TLD operator shall also record the
   <server, query name> tuple for follow-up testing.

   If repeated attempts to inform and get the customer to correct /
   replace the faulty server are unsuccessful, the TLD operator shall
   remove all delegations to said server from the zone.

   It will also be necessary for TLD operators to repeat the scans
   periodically.  It is recommended that this be performed monthly,
   backing off to bi-annually once the number of faulty servers found
   drops to less than 1 in 100000 servers tested.  Follow-up tests for
   faulty servers still need to be performed monthly.

   Some operators claim that they can't perform checks at registration
   time.  If a check is not performed at registration time, it needs to
   be performed within a week of registration in order to detect faulty
   servers swiftly.
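   The delegation checks described in this section can be sketched in
   Python as follows.  This is a minimal, non-normative illustration
   rather than a procedure defined by this document: the probe callable
   query(server, zone, qtype) and the Result values are hypothetical
   stand-ins for whatever query tooling a TLD operator already uses, the
   compliance tests (EDNS, unknown type, TCP) are passed in as opaque
   callables, and reading "bi-annually" as twice a year is an
   assumption.

```python
from enum import Enum
from typing import Callable, Dict, Iterable, List, Optional, Tuple


class Result(Enum):
    ANSWER = "answer"    # the expected record came back in the answer section
    OTHER = "other"      # some response, but not the expected record
    TIMEOUT = "timeout"  # no response at all


# Hypothetical probe (not defined by the draft): query(server, zone, qtype) -> Result
Probe = Callable[[str, str, str], Result]


def find_working_tuple(server: str, zones: Iterable[str], query: Probe
                       ) -> Tuple[Optional[Tuple[str, str]], List[str], List[str]]:
    """Return (working <server, zone> tuple or None, bad delegations, unresponsive zones)."""
    bad, unresponsive = [], []
    for zone in zones:
        soa = query(server, zone, "SOA")
        if soa is Result.ANSWER:
            return (server, zone), bad, unresponsive
        if soa is Result.OTHER:
            bad.append(zone)  # a response without the SOA record: bad delegation
            continue
        # No SOA response: some servers answer A queries but not SOA queries.
        if query(server, zone, "A") is not Result.TIMEOUT:
            return (server, zone), bad, unresponsive
        # Possibly a zone the server is not configured for; try the next one.
        unresponsive.append(zone)
    return None, bad, unresponsive


def failed_tests(server: str, zone: str,
                 tests: Dict[str, Callable[[str, str], bool]]) -> List[str]:
    """Run each compliance test against the working tuple and list the failures."""
    return [name for name, passes in tests.items() if not passes(server, zone)]


def scan_interval_months(faulty: int, tested: int) -> int:
    """Monthly scans until fewer than 1 in 100000 tested servers are faulty,
    then every six months (reading "bi-annually" as twice a year is an assumption)."""
    return 6 if tested and faulty * 100_000 < tested else 1
```

   Keeping the probe injectable lets the same logic be replayed against
   recorded results, which may help when re-running the monthly
   follow-up tests for servers already known to be faulty.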
   Checking of delegations by TLD operators should be nothing new, as
   they have been required to do this since the very beginning of the
   DNS [RFC1034].  Checking for compliance of nameserver operations
   should just be an extension of such testing.

   It is recommended that TLD operators set up a test web page which
   performs the tests the TLD operator performs as part of their regular
   audits, to allow nameserver operators to test that they have
   correctly fixed their servers.  Such tests should be rate limited to
   avoid these pages being a denial of service vector.

4.  Firewalls and Load Balancers

   Firewalls and load balancers can affect the externally visible
   behaviour of a nameserver.  Tests for conformance need to be done
   from outside of any firewall so that the system as a whole is tested.

   Firewalls and load balancers should not drop DNS packets that they
   don't understand.  They should either pass the packets through or
   generate an appropriate error response.

   Requests for unknown query types are not attacks and should not be
   treated as such.

   Requests with unassigned flags set (DNS or EDNS) are not attacks and
   should not be treated as such.  The correct behaviour for unassigned
   flags is to ignore them in the request and to not set them in the
   response.  All that dropping DNS / EDNS packets with unassigned flags
   achieves is to make it harder to deploy extensions that make use of
   them, due to the need to reconfigure / update firewalls.

   Requests with unknown EDNS options are not attacks and should not be
   treated as such.  The correct behaviour for unknown EDNS options is
   to ignore them.

   Requests with unknown EDNS versions are not attacks and should not be
   treated as such.  The correct behaviour for unknown EDNS versions is
   to return BADVERS along with the highest EDNS version the server
   supports.  All that dropping such EDNS packets achieves is to break
   EDNS version negotiation.
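   As a non-normative sketch of the server-side behaviour described
   above and in [RFC6891], the following Python function decides how a
   server implementing EDNS up to max_version might treat an incoming
   OPT record: BADVERS with its highest supported version for unknown
   versions, unknown options ignored, and only the DO flag (and only if
   DNSSEC is supported) carried into the response.  The OptIn and
   EdnsDecision types and the known_options parameter are hypothetical
   names chosen for illustration; a real server also acts on DO and on
   the options it understands when building the answer.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet

BADVERS = 16      # extended RCODE defined in RFC 6891
DO_FLAG = 0x8000  # DNSSEC OK, the only EDNS flag assigned when this was written


@dataclass
class OptIn:
    """EDNS fields taken from the request's OPT record (hypothetical type)."""
    version: int
    flags: int
    options: Dict[int, bytes] = field(default_factory=dict)


@dataclass
class EdnsDecision:
    rcode: int                            # BADVERS, or 0 meaning "continue normal processing"
    version: int                          # VERSION for the response OPT record (highest supported)
    flags: int                            # EDNS flags to set in the response OPT record
    options_to_process: Dict[int, bytes]  # only options the server understands


def edns_response(req: OptIn, max_version: int = 0, supports_dnssec: bool = True,
                  known_options: FrozenSet[int] = frozenset()) -> EdnsDecision:
    # Copy DO only when DNSSEC is supported; unknown (MBZ) flags are never copied.
    flags = req.flags & DO_FLAG if supports_dnssec else 0
    if req.version > max_version:
        # Unknown version: BADVERS plus the highest version supported, no answer data.
        return EdnsDecision(BADVERS, max_version, flags, {})
    # Unknown options are ignored rather than treated as an error or an attack.
    known = {code: data for code, data in req.options.items() if code in known_options}
    return EdnsDecision(0, max_version, flags, known)
```

   The same rules apply to a firewall or load balancer that chooses to
   answer on the server's behalf: none of these inputs is a reason to
   drop the query.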
   Firewalls should not assume that there will only be a single response
   message to a request.  There have been proposals to use EDNS to
   signal that multiple DNS messages be returned rather than a single
   UDP message that is fragmented at the IP layer.

5.  Scrubbing Services

   Scrubbing services, like firewalls, can affect the externally visible
   behaviour of a nameserver.  If you use a scrubbing service, you
   should check that legitimate queries are not being blocked.

   Scrubbing services, unlike firewalls, are also turned on and off in
   response to denial of service attacks.  One needs to take care when
   choosing a scrubbing service and ask questions like:

      Do they pass unknown DNS query types?

      Do they pass unknown EDNS versions?

      Do they pass unknown EDNS options?

      Do they pass unknown EDNS flags?

      Do they pass requests with unknown DNS opcodes?

      Do they pass requests with the remaining reserved DNS header flag
      bit set?

   None of these are attack vectors, but some scrubbing services treat
   them as such.

6.  Whole Answer Caches

   Whole answer caches can return the wrong response to a query if they
   do not take all of the query into account.  This has implications
   when testing and for overall protocol compliance.

   For example, there are whole answer caches that ignore the EDNS
   version field, which results in incorrect answers being returned to
   queries with an EDNS version other than 0 if they were preceded by an
   EDNS version 0 query for the same name and type.

7.  Response Code Selection

   Choosing the correct response code when fixing a nameserver is
   important.  Just because a type is not implemented does not mean that
   NOTIMP is the correct response code to return.  Response codes need
   to be chosen considering how clients will handle them.

   For unimplemented opcodes NOTIMP is the expected response code.
   Additionally, a new opcode could change the message format by
   extending the header or changing the structure of the records, etc.
   This may result in FORMERR being returned, though NOTIMP would be
   more correct.

   In general, for unimplemented type codes, Name Error (NXDOMAIN) and
   NOERROR (no data) are the expected response codes.  A server is not
   supposed to serve a zone which contains unsupported types [RFC1034],
   so the only thing left is to return whether the QNAME exists or not.
   NOTIMP and REFUSED are not useful responses, as they force clients to
   try all the authoritative servers for a zone looking for a server
   which will answer the query.

   Meta-query types may be the exception, but these need to be thought
   about on a case by case basis.

   If you support EDNS and get a query with an unsupported EDNS version,
   the correct response is BADVERS [RFC6891].

   If you do not support EDNS at all, FORMERR and NOTIMP are the
   expected error codes.  That said, a minimal EDNS server
   implementation just requires parsing the OPT records and responding
   with an empty OPT record.  There is no need to interpret any EDNS
   options present in the request, as unsupported options are expected
   to be ignored [RFC6891].

8.  Testing

   Verify the server is configured for the zone:

      dig +noedns +noad +norec soa $zone @$server

      expect: status: NOERROR
      expect: SOA record

   Check that TCP queries work:

      dig +noedns +noad +norec +tcp soa $zone @$server

      expect: status: NOERROR
      expect: SOA record

   Check that queries for an unknown type work:

      dig +noedns +noad +norec type1000 $zone @$server

      expect: status: NOERROR
      expect: an empty answer section.
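   The queries above can also be generated without dig.  The following
   Python sketch builds the raw wire-format query used by the unknown
   type test (TYPE1000 with RD and AD clear and no OPT record, roughly
   matching +norec +noad +noedns) and shows the two-octet length prefix
   that DNS over TCP uses [RFC1035].  The function names and the fixed
   message ID are illustrative assumptions; sending the message and
   matching the reply are left out.

```python
import struct

# Flag bits in the second 16-bit word of the DNS header.
RD = 0x0100  # recursion desired (cleared by dig +norec)
Z = 0x0040   # last reserved bit (set by dig +zflag)
AD = 0x0020  # authentic data (cleared by dig +noad)
CD = 0x0010  # checking disabled (set by dig +cd)


def encode_name(name: str) -> bytes:
    """Encode a name as length-prefixed ASCII labels plus the root label.
    No validation: labels over 63 octets are not rejected in this sketch."""
    out = b""
    for label in name.rstrip(".").split("."):
        if label:  # skips the empty label produced by the root name "."
            raw = label.encode("ascii")
            out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_query(qname: str, qtype: int, msg_id: int = 0x1234, opcode: int = 0,
                flags: int = 0, qclass: int = 1) -> bytes:
    """One-question query with no OPT record in the additional section."""
    word = ((opcode & 0xF) << 11) | flags  # QR=0 (query); RD and AD clear by default
    header = struct.pack("!6H", msg_id, word, 1, 0, 0, 0)  # QDCOUNT=1, other counts 0
    return header + encode_name(qname) + struct.pack("!2H", qtype, qclass)


def tcp_frame(message: bytes) -> bytes:
    """DNS over TCP prefixes each message with its length as two octets."""
    return struct.pack("!H", len(message)) + message


# The unknown type test, and the same query framed for the TCP test.
query = build_query("example.com", 1000)
framed = tcp_frame(query)
```

   The same helper covers the opcode test (opcode=15) and the header
   flag tests (flags=Z, AD or CD).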
   Check that queries with CD=1 work:

      dig +noedns +noad +norec +cd soa $zone @$server

      expect: status: NOERROR
      expect: SOA record to be present

   Check that queries with AD=1 work:

      dig +noedns +norec +ad soa $zone @$server

      expect: status: NOERROR
      expect: SOA record to be present

   Check that queries with the last unassigned DNS header flag set work:

      dig +noedns +noad +norec +zflag soa $zone @$server

      expect: status: NOERROR
      expect: SOA record to be present
      expect: MBZ to not be in the response

   MBZ (Must Be Zero) presence indicates the flag bit has been copied.

   Check that plain EDNS queries work:

      dig +edns=0 +noad +norec soa $zone @$server

      expect: status: NOERROR
      expect: SOA record to be present
      expect: OPT record to be present
      expect: EDNS Version 0 in response

   Check that EDNS version 1 queries work (EDNS supported):

      dig +edns=1 +noednsneg +noad +norec soa $zone @$server

      expect: status: BADVERS
      expect: SOA record to not be present
      expect: OPT record to be present
      expect: EDNS Version 0 in response

   (Only EDNS version 0 is currently defined, so the response should
   always carry version 0.  This will change when EDNS version 1 is
   defined.)

   Check that EDNS queries with an unknown option work (EDNS supported):

      dig +edns=0 +noad +norec +ednsopt=100 soa $zone @$server

      expect: status: NOERROR
      expect: SOA record to be present
      expect: OPT record to be present
      expect: OPT=100 to not be present
      expect: EDNS Version 0 in response

   Check that EDNS queries with unknown flags work (EDNS supported):

      dig +edns=0 +noad +norec +ednsflags=0x40 soa $zone @$server

      expect: status: NOERROR
      expect: SOA record to be present
      expect: OPT record to be present
      expect: MBZ not to be present
      expect: EDNS Version 0 in response

   MBZ (Must Be Zero) presence indicates the flag bit has been copied.
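   The DNS header part of these checks amounts to inspecting a few bits
   of the response header.  A minimal Python sketch follows; it also
   covers the QR=1 requirement from Section 2.2.  header_checks is a
   hypothetical helper, and it assumes the caller already holds a
   complete response message (for TCP, with the two-octet length prefix
   removed).

```python
import struct

QR = 0x8000  # must be set in every response
Z = 0x0040   # reserved "must be zero" header bit


def header_checks(response: bytes, query_id: int) -> dict:
    """Check the fixed 12-octet header of a DNS response (hypothetical helper)."""
    if len(response) < 12:
        raise ValueError("response shorter than a DNS header")
    msg_id, word = struct.unpack("!2H", response[:4])
    return {
        "id_matches": msg_id == query_id,
        "is_response": bool(word & QR),  # QR=0 replies are discarded or treated as requests
        "mbz_copied": bool(word & Z),    # reserved bit echoed back to the client
        "rcode": word & 0x000F,          # lower 4 bits only; EDNS supplies the upper 8
    }
```

   The EDNS flag checks (MBZ in the OPT record) need the OPT TTL field
   instead, since EDNS flags live there rather than in the header.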
   Check that EDNS version 1 queries with unknown flags work (EDNS
   supported):

      dig +edns=1 +noednsneg +noad +norec +ednsflags=0x40 soa \
          $zone @$server

      expect: status: BADVERS
      expect: SOA record to NOT be present
      expect: OPT record to be present
      expect: MBZ not to be present
      expect: EDNS Version 0 in response

   +noednsneg disables EDNS version negotiation in DiG; MBZ (Must Be
   Zero) presence indicates the flag bit has been copied.

   Check that EDNS version 1 queries with unknown options work (EDNS
   supported):

      dig +edns=1 +noednsneg +noad +norec +ednsopt=100 soa $zone @$server

      expect: status: BADVERS
      expect: SOA record to NOT be present
      expect: OPT record to be present
      expect: OPT=100 to NOT be present
      expect: EDNS Version 0 in response

   +noednsneg disables EDNS version negotiation in DiG.

   Check that DNSSEC queries work (EDNS supported):

      dig +edns=0 +noad +norec +dnssec soa $zone @$server

      expect: status: NOERROR
      expect: SOA record to be present
      expect: OPT record to be present
      expect: DO=1 to be present if an RRSIG is in the response
      expect: EDNS Version 0 in response

   DO=1 should be present if RRSIGs are returned, as they indicate that
   the server supports DNSSEC.  Servers that support DNSSEC are supposed
   to copy the DO bit from the request to the response as per [RFC3225].

   Check that EDNS version 1 DNSSEC queries work (EDNS supported):

      dig +edns=1 +noednsneg +noad +norec +dnssec soa \
          $zone @$server

      expect: status: BADVERS
      expect: SOA record to not be present
      expect: OPT record to be present
      expect: DO=1 to be present if the EDNS version 0 DNSSEC query test
              returned DO=1
      expect: EDNS Version 0 in response

   +noednsneg disables EDNS version negotiation in DiG.
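   For readers building their own test tools, the following Python
   sketch shows how the EDNS fields exercised by the version 1 tests are
   laid out in the OPT pseudo-RR [RFC6891].  The TTL field carries the
   upper 8 bits of the extended RCODE, the VERSION, and the 16 flag bits
   (DO being the top one), so BADVERS (16) appears as an extended RCODE
   of 1 combined with a header RCODE of 0.  The function names are
   illustrative, and the 512 octet default payload size follows
   Section 2.1 rather than any dig default.

```python
import struct

OPT_TYPE = 41  # OPT pseudo-RR type
BADVERS = 16   # extended RCODE: unsupported EDNS version
DO = 0x8000    # DNSSEC OK flag


def encode_opt(version: int = 0, udp_size: int = 512, flags: int = 0,
               options: tuple = ()) -> bytes:
    """Encode an OPT pseudo-RR for a query's additional section."""
    rdata = b"".join(struct.pack("!2H", code, len(data)) + data for code, data in options)
    # TTL field: extended RCODE (8 bits, zero in queries) | VERSION (8) | flags (16)
    ttl = (version & 0xFF) << 16 | (flags & 0xFFFF)
    # Owner name is the root; CLASS carries the advertised UDP payload size.
    return b"\x00" + struct.pack("!2HIH", OPT_TYPE, udp_size, ttl, len(rdata)) + rdata


def decode_opt_ttl(ttl: int, header_rcode: int) -> dict:
    """Split the TTL field of a response OPT record and rebuild the full RCODE."""
    return {
        "rcode": ((ttl >> 24) & 0xFF) << 4 | (header_rcode & 0xF),
        "version": (ttl >> 16) & 0xFF,
        "do": bool(ttl & DO),
        "mbz": ttl & 0x7FFF,  # non-zero means unknown flags were copied
    }


# Roughly the OPT record behind "+edns=1 +ednsflags=0x40 +ednsopt=100".
opt = encode_opt(version=1, flags=0x40, options=((100, b""),))
```

   Appending this record to a query built elsewhere also requires
   setting ARCOUNT to 1 in the header.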
   Check that new opcodes are handled:

      dig +noedns +noad +opcode=15 +norec soa $zone @$server

      expect: status: NOTIMP
      expect: SOA record to not be present

   If EDNS is not supported by the nameserver, we expect a response to
   all the above queries.  That response may be a FORMERR or NOTIMP
   error response, or the OPT record may just be ignored.

   It is advisable to run all the above tests in parallel so as to
   minimise the delays due to multiple timeouts when the servers do not
   respond.

   The above tests use dig from BIND 9.11.0, which is still in
   development.

9.  Security Considerations

   Testing protocol compliance can potentially result in false reports
   of attempts to break services from Intrusion Detection Systems and
   firewalls.  None of the tests listed above should break nominally
   EDNS compliant servers.  None of the tests above should break
   non-EDNS servers.  All the tests above are well formed, though not
   necessarily common, DNS queries.

   Relaxing firewall settings to ensure EDNS compliance could
   potentially expose a critical implementation flaw in the nameserver.
   Nameservers should be tested for conformance before relaxing firewall
   settings.

10.  IANA Considerations

   IANA / ICANN needs to consider which of the above tests, if any, it
   should add to the zone maintenance procedures for zones under its
   control, including pre-delegation checks.  Otherwise this document
   has no actions for IANA.

11.  Normative References

   [RFC1034]  Mockapetris, P., "Domain names - concepts and facilities",
              STD 13, RFC 1034, DOI 10.17487/RFC1034, November 1987.

   [RFC1035]  Mockapetris, P., "Domain names - implementation and
              specification", STD 13, RFC 1035, DOI 10.17487/RFC1035,
              November 1987.

   [RFC3225]  Conrad, D., "Indicating Resolver Support of DNSSEC",
              RFC 3225, DOI 10.17487/RFC3225, December 2001.

   [RFC5966]  Bellis, R., "DNS Transport over TCP - Implementation
              Requirements", RFC 5966, DOI 10.17487/RFC5966, August 2010.

   [RFC6891]  Damas, J., Graff, M., and P. Vixie, "Extension Mechanisms
              for DNS (EDNS(0))", STD 75, RFC 6891,
              DOI 10.17487/RFC6891, April 2013.

Author's Address

   M. Andrews
   Internet Systems Consortium
   950 Charter Street
   Redwood City, CA 94063
   US

   Email: marka@isc.org