| < draft-ietf-dnsop-serve-stale-05.txt | draft-ietf-dnsop-serve-stale-06.txt > | |||
|---|---|---|---|---|
| DNSOP Working Group D. Lawrence | DNSOP Working Group D. Lawrence | |||
| Internet-Draft Oracle | Internet-Draft Oracle | |||
| Updates: 1034, 1035 (if approved) W. Kumari | Updates: 1034, 1035 (if approved) W. Kumari | |||
| Intended status: Standards Track P. Sood | Intended status: Standards Track P. Sood | |||
| Expires: October 18, 2019 Google | Expires: February 9, 2020 Google | |||
| April 16, 2019 | August 08, 2019 | |||
| Serving Stale Data to Improve DNS Resiliency | Serving Stale Data to Improve DNS Resiliency | |||
| draft-ietf-dnsop-serve-stale-05 | draft-ietf-dnsop-serve-stale-06 | |||
| Abstract | Abstract | |||
| This draft defines a method (serve-stale) for recursive resolvers to | This draft defines a method (serve-stale) for recursive resolvers to | |||
| use stale DNS data to avoid outages when authoritative nameservers | use stale DNS data to avoid outages when authoritative nameservers | |||
| cannot be reached to refresh expired data. It updates the definition | cannot be reached to refresh expired data. It updates the definition | |||
| of TTL from [RFC1034], [RFC1035], and [RFC2181] to make it clear that | of TTL from [RFC1034], [RFC1035], and [RFC2181] to make it clear that | |||
| data can be kept in the cache beyond the TTL expiry and used for | data can be kept in the cache beyond the TTL expiry and used for | |||
| responses when a refreshed answer is not readily available. One of | responses when a refreshed answer is not readily available. One of | |||
| the motivations for serve-stale is to make the DNS more resilient to | the motivations for serve-stale is to make the DNS more resilient to | |||
| DoS attacks, and thereby make them less attractive as an attack | DoS attacks, and thereby make them less attractive as an attack | |||
| vector. | vector. | |||
| Ed note | ||||
| Text inside square brackets ([]) is additional background | ||||
| information, answers to frequently asked questions, general musings, | ||||
| etc. They will be removed before publication. This document is | ||||
| being collaborated on in GitHub at <https://github.com/vttale/serve- | ||||
| stale>. The most recent version of the document, open issues, etc | ||||
| should all be available here. The authors gratefully accept pull | ||||
| requests. | ||||
| Status of This Memo | Status of This Memo | |||
| This Internet-Draft is submitted in full conformance with the | This Internet-Draft is submitted in full conformance with the | |||
| provisions of BCP 78 and BCP 79. | provisions of BCP 78 and BCP 79. | |||
| Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
| Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
| working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
| Drafts is at https://datatracker.ietf.org/drafts/current/. | Drafts is at https://datatracker.ietf.org/drafts/current/. | |||
| Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
| and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
| time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
| material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
| This Internet-Draft will expire on October 18, 2019. | ||||
| This Internet-Draft will expire on February 9, 2020. | ||||
| Copyright Notice | Copyright Notice | |||
| Copyright (c) 2019 IETF Trust and the persons identified as the | Copyright (c) 2019 IETF Trust and the persons identified as the | |||
| document authors. All rights reserved. | document authors. All rights reserved. | |||
| This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
| Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
| (https://trustee.ietf.org/license-info) in effect on the date of | (https://trustee.ietf.org/license-info) in effect on the date of | |||
| publication of this document. Please review these documents | publication of this document. Please review these documents | |||
| skipping to change at page 2, line 31 | skipping to change at page 2, line 21 | |||
| Table of Contents | Table of Contents | |||
| 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 | |||
| 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 3 | 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 3 | |||
| 3. Background . . . . . . . . . . . . . . . . . . . . . . . . . 3 | 3. Background . . . . . . . . . . . . . . . . . . . . . . . . . 3 | |||
| 4. Standards Action . . . . . . . . . . . . . . . . . . . . . . 4 | 4. Standards Action . . . . . . . . . . . . . . . . . . . . . . 4 | |||
| 5. Example Method . . . . . . . . . . . . . . . . . . . . . . . 4 | 5. Example Method . . . . . . . . . . . . . . . . . . . . . . . 4 | |||
| 6. Implementation Considerations . . . . . . . . . . . . . . . . 6 | 6. Implementation Considerations . . . . . . . . . . . . . . . . 6 | |||
| 7. Implementation Caveats . . . . . . . . . . . . . . . . . . . 8 | 7. Implementation Caveats . . . . . . . . . . . . . . . . . . . 8 | |||
| 8. Implementation Status . . . . . . . . . . . . . . . . . . . . 9 | 8. Implementation Status . . . . . . . . . . . . . . . . . . . . 9 | |||
| 9. EDNS Option . . . . . . . . . . . . . . . . . . . . . . . . . 10 | 9. EDNS Option . . . . . . . . . . . . . . . . . . . . . . . . . 9 | |||
| 10. Security Considerations . . . . . . . . . . . . . . . . . . . 10 | 10. Security Considerations . . . . . . . . . . . . . . . . . . . 10 | |||
| 11. Privacy Considerations . . . . . . . . . . . . . . . . . . . 10 | 11. Privacy Considerations . . . . . . . . . . . . . . . . . . . 10 | |||
| 12. NAT Considerations . . . . . . . . . . . . . . . . . . . . . 11 | 12. NAT Considerations . . . . . . . . . . . . . . . . . . . . . 10 | |||
| 13. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 11 | 13. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 10 | |||
| 14. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 11 | 14. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 10 | |||
| 15. References . . . . . . . . . . . . . . . . . . . . . . . . . 11 | 15. References . . . . . . . . . . . . . . . . . . . . . . . . . 11 | |||
| 15.1. Normative References . . . . . . . . . . . . . . . . . . 11 | 15.1. Normative References . . . . . . . . . . . . . . . . . . 11 | |||
| 15.2. Informative References . . . . . . . . . . . . . . . . . 12 | 15.2. Informative References . . . . . . . . . . . . . . . . . 11 | |||
| Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 12 | Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 12 | |||
| 1. Introduction | 1. Introduction | |||
| Traditionally the Time To Live (TTL) of a DNS resource record has | Traditionally the Time To Live (TTL) of a DNS resource record has | |||
| been understood to represent the maximum number of seconds that a | been understood to represent the maximum number of seconds that a | |||
| record can be used before it must be discarded, based on its | record can be used before it must be discarded, based on its | |||
| description and usage in [RFC1035] and clarifications in [RFC2181]. | description and usage in [RFC1035] and clarifications in [RFC2181]. | |||
| This document proposes that the definition of the TTL be explicitly | This document proposes that the definition of the TTL be explicitly | |||
| skipping to change at page 3, line 25 | skipping to change at page 3, line 19 | |||
| "OPTIONAL" in this document are to be interpreted as described in BCP | "OPTIONAL" in this document are to be interpreted as described in BCP | |||
| 14 [RFC2119] [RFC8174] when, and only when, they appear in all | 14 [RFC2119] [RFC8174] when, and only when, they appear in all | |||
| capitals, as shown here. | capitals, as shown here. | |||
| For a comprehensive treatment of DNS terms, please see [RFC7719]. | For a comprehensive treatment of DNS terms, please see [RFC7719]. | |||
| 3. Background | 3. Background | |||
| There are a number of reasons why an authoritative server may become | There are a number of reasons why an authoritative server may become | |||
| unreachable, including Denial of Service (DoS) attacks, network | unreachable, including Denial of Service (DoS) attacks, network | |||
| issues, and so on. If the recursive server is unable to contact the | issues, and so on. If a recursive server is unable to contact the | |||
| authoritative servers for a query but still has relevant data that | authoritative servers for a query but still has relevant data that | |||
| has aged past its TTL, that information can still be useful for | has aged past its TTL, that information can still be useful for | |||
| generating an answer under the metaphorical assumption that "stale | generating an answer under the metaphorical assumption that "stale | |||
| bread is better than no bread." | bread is better than no bread." | |||
| [RFC1035] Section 3.2.1 says that the TTL "specifies the time | [RFC1035] Section 3.2.1 says that the TTL "specifies the time | |||
| interval that the resource record may be cached before the source of | interval that the resource record may be cached before the source of | |||
| the information should again be consulted", and Section 4.1.3 further | the information should again be consulted", and Section 4.1.3 further | |||
| says the TTL, "specifies the time interval (in seconds) that the | says the TTL, "specifies the time interval (in seconds) that the | |||
| resource record may be cached before it should be discarded." | resource record may be cached before it should be discarded." | |||
| skipping to change at page 4, line 5 | skipping to change at page 3, line 46 | |||
| [RFC2181] aimed to provide "the precise definition of the Time to | [RFC2181] aimed to provide "the precise definition of the Time to | |||
| Live", but in Section 8 was mostly concerned with the numeric range | Live", but in Section 8 was mostly concerned with the numeric range | |||
| of values and the possibility that very large values should be | of values and the possibility that very large values should be | |||
| capped. (It also has the curious suggestion that a value in the | capped. (It also has the curious suggestion that a value in the | |||
| range 2147483648 to 4294967295 should be treated as zero.) It closes | range 2147483648 to 4294967295 should be treated as zero.) It closes | |||
| that section by noting, "The TTL specifies a maximum time to live, | that section by noting, "The TTL specifies a maximum time to live, | |||
| not a mandatory time to live." This is again not [RFC2119]-normative | not a mandatory time to live." This is again not [RFC2119]-normative | |||
| language, but does convey the natural language connotation that data | language, but does convey the natural language connotation that data | |||
| becomes unusable past TTL expiry. | becomes unusable past TTL expiry. | |||
| Several major recursive resolver operators currently use stale data | Several recursive resolver operators currently use stale data for | |||
| for answers in some way, including Akamai (in three different | answers in some way, including Akamai. A number of recursive | |||
| resolver implementations), BIND, Knot, OpenDNS, and Unbound. Apple | resolver packages (including BIND, Knot, OpenDNS, Unbound) provide | |||
| MacOS can also use stale data as part of the Happy Eyeballs | options to use stale data. Apple MacOS can also use stale data as | |||
| algorithms in mDNSResponder. The collective operational experience | part of the Happy Eyeballs algorithms in mDNSResponder. The | |||
| is that it provides significant benefit with minimal downside. | collective operational experience is that it provides significant | |||
| benefit with minimal downside. | ||||
| 4. Standards Action | 4. Standards Action | |||
| The definition of TTL in [RFC1035] Sections 3.2.1 and 4.1.3 is | The definition of TTL in [RFC1035] Sections 3.2.1 and 4.1.3 is | |||
| amended to read: | amended to read: | |||
| TTL a 32-bit unsigned integer number of seconds that specifies the | TTL a 32-bit unsigned integer number of seconds that specifies the | |||
| duration that the resource record MAY be cached before the source | duration that the resource record MAY be cached before the source | |||
| of the information MUST again be consulted. Zero values are | of the information MUST again be consulted. Zero values are | |||
| interpreted to mean that the RR can only be used for the | interpreted to mean that the RR can only be used for the | |||
| transaction in progress, and should not be cached. Values SHOULD | transaction in progress, and should not be cached. Values SHOULD | |||
| be capped on the orders of days to weeks, with a recommended cap | be capped on the orders of days to weeks, with a recommended cap | |||
| of 604,800 seconds. If the data is unable to be authoritatively | of 604,800 seconds (seven days). If the data is unable to be | |||
| refreshed when the TTL expires, the record MAY be used as though | authoritatively refreshed when the TTL expires, the record MAY be | |||
| it is unexpired. | used as though it is unexpired. | |||
| Interpreting values which have the high order bit set as being | Interpreting values which have the high order bit set as being | |||
| positive, rather than 0, is a change from [RFC2181]. Suggesting a | positive, rather than 0, is a change from [RFC2181]. Suggesting a | |||
| cap of seven days, rather than the 68 years allowed by [RFC2181], | cap of seven days, rather than the 68 years allowed by [RFC2181], | |||
| reflects the current practice of major modern DNS resolvers. | reflects the current practice of major modern DNS resolvers. | |||
| When returning a response containing stale records, the recursive | When returning a response containing stale records, a recursive | |||
| resolver MUST set the TTL of each expired record in the message to a | resolver MUST set the TTL of each expired record in the message to a | |||
| value greater than 0, with 30 seconds RECOMMENDED. | value greater than 0, with 30 seconds RECOMMENDED. | |||
| Answers from authoritative servers that have a DNS Response Code of | Answers from authoritative servers that have a DNS Response Code of | |||
| either 0 (NoError) or 3 (NXDomain) and the Authoritative Answers (AA) | either 0 (NoError) or 3 (NXDomain) and the Authoritative Answers (AA) | |||
| bit set MUST be considered to have refreshed the data at the | bit set MUST be considered to have refreshed the data at the | |||
| resolver. Answers from authoritative servers that have any other | resolver. Answers from authoritative servers that have any other | |||
| response code SHOULD be considered a failure to refresh the data and | response code SHOULD be considered a failure to refresh the data and | |||
| therefore leave any previous state intact. | therefore leave any previous state intact. | |||
| 5. Example Method | 5. Example Method | |||
| There is conceivably more than one way a recursive resolver could | There is more than one way a recursive resolver could responsibly | |||
| responsibly implement this resiliency feature while still respecting | implement this resiliency feature while still respecting the intent | |||
| the intent of the TTL as a signal for when data is to be refreshed. | of the TTL as a signal for when data is to be refreshed. | |||
| In this example method four notable timers drive considerations for | In this example method four notable timers drive considerations for | |||
| the use of stale data, as follows: | the use of stale data: | |||
| o A client response timer, which is the maximum amount of time a | o A client response timer, which is the maximum amount of time a | |||
| recursive resolver should allow between the receipt of a | recursive resolver should allow between the receipt of a | |||
| resolution request and sending its response. | resolution request and sending its response. | |||
| o A query resolution timer, which caps the total amount of time a | o A query resolution timer, which caps the total amount of time a | |||
| recursive resolver spends processing the query. | recursive resolver spends processing the query. | |||
| o A failure recheck timer, which limits the frequency at which a | o A failure recheck timer, which limits the frequency at which a | |||
| failed lookup will be attempted again. | failed lookup will be attempted again. | |||
| o A maximum stale timer, which caps the amount of time that records | o A maximum stale timer, which caps the amount of time that records | |||
| will be kept past their expiration. | will be kept past their expiration. | |||
| Most recursive resolvers already have the query resolution timer, and | Most recursive resolvers already have the query resolution timer, and | |||
| effectively some kind of failure recheck timer. The client response | effectively some kind of failure recheck timer. The client response | |||
| timer and maximum stale timer are new concepts for this mechanism. | timer and maximum stale timer are new concepts for this mechanism. | |||
| When a request is received by the recursive resolver, it should start | When a request is received by a recursive resolver, it should start | |||
| the client response timer. This timer is used to avoid client | the client response timer. This timer is used to avoid client | |||
| timeouts. It should be configurable, with a recommended value of 1.8 | timeouts. It should be configurable, with a recommended value of 1.8 | |||
| seconds as being just under a common timeout value of 2 seconds while | seconds as being just under a common timeout value of 2 seconds while | |||
| still giving the resolver a fair shot at resolving the name. | still giving the resolver a fair shot at resolving the name. | |||
| The resolver then checks its cache for any unexpired data that | The resolver then checks its cache for any unexpired records that | |||
| satisfies the request and of course returns them if available. If it | satisfy the request and returns them if available. If it finds no | |||
| finds no relevant unexpired data and the Recursion Desired flag is | relevant unexpired data and the Recursion Desired flag is not set in | |||
| not set in the request, it should immediately return the response | the request, it should immediately return the response without | |||
| without consulting the cache for expired records. Typically this | consulting the cache for expired records. Typically this response | |||
| response would be a referral to authoritative nameservers covering | would be a referral to authoritative nameservers covering the zone, | |||
| the zone, but the specifics are implementation dependent. | but the specifics are implementation dependent. | |||
| If iterative lookups will be done, then the failure recheck timer is | If iterative lookups will be done, then the failure recheck timer is | |||
| consulted. Attempts to refresh from non-responsive or otherwise | consulted. Attempts to refresh from non-responsive or otherwise | |||
| failing authoritative nameservers are recommended to be done no more | failing authoritative nameservers are recommended to be done no more | |||
| frequently than every 30 seconds. If this request was received | frequently than every 30 seconds. If this request was received | |||
| within this period, the cache may be immediately consulted for stale | within this period, the cache may be immediately consulted for stale | |||
| data to satisfy the request. | data to satisfy the request. | |||
| Outside the period of the failure recheck timer, the resolver should | Outside the period of the failure recheck timer, the resolver should | |||
| start the query resolution timer and begin the iterative resolution | start the query resolution timer and begin the iterative resolution | |||
| process. This timer bounds the work done by the resolver when | process. This timer bounds the work done by the resolver when | |||
| contacting external authorities, and is commonly around 10 to 30 | contacting external authorities, and is commonly around 10 to 30 | |||
| seconds. If this timer expires on an attempted lookup that is still | seconds. If this timer expires on an attempted lookup that is still | |||
| being processed, the resolution effort is abandoned. | being processed, the resolution effort is abandoned. | |||
| If the answer has not been completely determined by the time the | If the answer has not been completely determined by the time the | |||
| client response timer has elapsed, the resolver should then check its | client response timer has elapsed, the resolver should then check its | |||
| cache to see whether there is expired data that would satisfy the | cache to see whether there is expired data that would satisfy the | |||
| request. If so, it adds that data to the response message with a TTL | request. If so, it adds that data to the response message with a TTL | |||
| greater than 0 per Section 4. The response is then sent to the | greater than 0 (as specified in Section 4). The response is then | |||
| client while the resolver continues its attempt to refresh the data. | sent to the client while the resolver continues its attempt to | |||
| refresh the data. | ||||
| When no authorities are able to be reached during a resolution | When no authorities are able to be reached during a resolution | |||
| attempt, the resolver should attempt to refresh the delegation and | attempt, the resolver should attempt to refresh the delegation and | |||
| restart the iterative lookup process with the remaining time on the | restart the iterative lookup process with the remaining time on the | |||
| query resolution timer. This resumption should be done only once | query resolution timer. This resumption should be done only once | |||
| during one resolution effort. | during one resolution effort. | |||
| Outside the resolution process, the maximum stale timer is used for | Outside the resolution process, the maximum stale timer is used for | |||
| cache management and is independent of the query resolution process. | cache management and is independent of the query resolution process. | |||
| This timer is conceptually different from the maximum cache TTL that | This timer is conceptually different from the maximum cache TTL that | |||
| exists in many resolvers, the latter being a clamp on the value of | exists in many resolvers, the latter being a clamp on the value of | |||
| TTLs as received from authoritative servers and recommended to be 7 | TTLs as received from authoritative servers and recommended to be | |||
| days in the TTL definition above. The maximum stale timer should be | seven days in the TTL definition in Section 4. The maximum stale | |||
| configurable, and defines the length of time after a record expires | timer should be configurable, and defines the length of time after a | |||
| that it should be retained in the cache. The suggested value is | record expires that it should be retained in the cache. The | |||
| between 1 and 3 days. | suggested value is between 1 and 3 days. | |||
| 6. Implementation Considerations | 6. Implementation Considerations | |||
| This document mainly describes the issues behind serving stale data | This document mainly describes the issues behind serving stale data | |||
| and intentionally does not provide a formal algorithm. The concept | and intentionally does not provide a formal algorithm. The concept | |||
| is not overly complex, and the details are best left to resolver | is not overly complex, and the details are best left to resolver | |||
| authors to implement in their codebases. The processing of serve- | authors to implement in their codebases. The processing of serve- | |||
| stale is a local operation, and consistent variables between | stale is a local operation, and consistent variables between | |||
| deployments are not needed for interoperability. However, we would | deployments are not needed for interoperability. However, we would | |||
| like to highlight the impact of various implementation choices, | like to highlight the impact of various implementation choices, | |||
| skipping to change at page 8, line 11 | skipping to change at page 8, line 4 | |||
| where a failed lookup (say, during pre-fetching) doesn't impact the | where a failed lookup (say, during pre-fetching) doesn't impact the | |||
| existing cache state. Some authoritative server operators have said | existing cache state. Some authoritative server operators have said | |||
| that they would prefer stale answers to be used in the event that | that they would prefer stale answers to be used in the event that | |||
| their servers are responding with errors like ServFail instead of | their servers are responding with errors like ServFail instead of | |||
| giving true authoritative answers. Implementers MAY decide to return | giving true authoritative answers. Implementers MAY decide to return | |||
| stale answers in this situation. | stale answers in this situation. | |||
| Since the goal of serve-stale is to provide resiliency for all | Since the goal of serve-stale is to provide resiliency for all | |||
| obvious errors to refresh data, these other RCODEs are treated as | obvious errors to refresh data, these other RCODEs are treated as | |||
| though they are equivalent to not getting an authoritative response. | though they are equivalent to not getting an authoritative response. | |||
| Although NXDomain for a previously existing name might well be an | Although NXDomain for a previously existing name might well be an | |||
| error, it is not handled that way because there is no effective way | error, it is not handled that way because there is no effective way | |||
| to distinguish operator intent for legitimate cases versus error | to distinguish operator intent for legitimate cases versus error | |||
| cases. | cases. | |||
| During discussion in dnsop it was suggested that Refused from all | During discussion in the IETF, it was suggested that, if all | |||
| authorities should be treated, from a serve-stale perspective, as | authorities return responses with RCODE of Refused, it may be an | |||
| though it were equivalent to NXDomain because it represents an | ||||
| explicit signal to take down the zone from servers that still have | explicit signal to take down the zone from servers that still have | |||
| the zone's delegation pointed to them. Refused, however, is also | the zone's delegation pointed to them. Refused, however, is also | |||
| overloaded to mean multiple possible failures which could represent | overloaded to mean multiple possible failures which could represent | |||
| transient configuration failures. Operational experience has shown | transient configuration failures. Operational experience has shown | |||
| that purposely returning Refused is a poor way to achieve an explicit | that purposely returning Refused is a poor way to achieve an explicit | |||
| takedown of a zone compared to either updating the delegation or | takedown of a zone compared to either updating the delegation or | |||
| returning NXDomain with a suitable SOA for extended negative caching. | returning NXDomain with a suitable SOA for extended negative caching. | |||
| Implementers MAY nonetheless consider whether to treat all | Implementers MAY nonetheless consider whether to treat all | |||
| authorities returning Refused as preempting the use of stale data. | authorities returning Refused as preempting the use of stale data. | |||
| 7. Implementation Caveats | 7. Implementation Caveats | |||
| Stale data is used only when refreshing has failed in order to adhere | Stale data is used only when refreshing has failed in order to adhere | |||
| to the original intent of the design of the DNS and the behaviour | to the original intent of the design of the DNS and the behaviour | |||
| expected by operators. If stale data were to always be used | expected by operators. If stale data were to always be used | |||
| immediately and then a cache refresh attempted after the client | immediately and then a cache refresh attempted after the client | |||
| response has been sent, the resolver would frequently be sending data | response has been sent, the resolver would frequently be sending data | |||
| that it would have had no trouble refreshing. As modern resolvers | that it would have had no trouble refreshing. Because modern | |||
| use techniques like pre-fetching and request coalescing for | resolvers use techniques like pre-fetching and request coalescing for | |||
| efficiency, it is not necessary that every client request needs to | efficiency, it is not necessary that every client request needs to | |||
| trigger a new lookup flow in the presence of stale data, but rather | trigger a new lookup flow in the presence of stale data, but rather | |||
| that a good-faith effort has been recently made to refresh the stale | that a good-faith effort has been recently made to refresh the stale | |||
| data before it is delivered to any client. | data before it is delivered to any client. | |||
| It is important to continue the resolution attempt after the stale | It is important to continue the resolution attempt after the stale | |||
| response has been sent, until the query resolution timeout, because | response has been sent, until the query resolution timeout, because | |||
| some pathological resolutions can take many seconds to succeed as | some pathological resolutions can take many seconds to succeed as | |||
| they cope with unavailable servers, bad networks, and other problems. | they cope with unavailable servers, bad networks, and other problems. | |||
| Stopping the resolution attempt when the response with expired data | Stopping the resolution attempt when the response with expired data | |||
| has been sent would mean that answers in these pathological cases | has been sent would mean that answers in these pathological cases | |||
| would never be refreshed. | would never be refreshed. | |||
| The continuing prohibition against using data with a 0 second TTL | The continuing prohibition against using data with a 0 second TTL | |||
| beyond the current transaction explicitly extends to it being | beyond the current transaction explicitly extends to it being | |||
| unusable even for stale fallback, as it is not to be cached at all. | unusable even for stale fallback, as it is not to be cached at all. | |||
| Be aware that Canonical Name (CNAME) records mingled in the expired | Be aware that Canonical Name (CNAME) and DNAME [RFC6672] records | |||
| cache with other records at the same owner name can cause surprising | mingled in the expired cache with other records at the same owner | |||
| results. This was observed with an initial implementation in BIND | name can cause surprising results. This was observed with an initial | |||
| when a hostname changed from having an IPv4 Address (A) record to a | implementation in BIND when a hostname changed from having an IPv4 | |||
| CNAME. The version of BIND being used did not evict other types in | Address (A) record to a CNAME. The version of BIND being used did | |||
| the cache when a CNAME was received, which in normal operations is | not evict other types in the cache when a CNAME was received, which | |||
| not a significant issue. However, after both records expired and the | in normal operations is not a significant issue. However, after both | |||
| authorities became unavailable, the fallback to stale answers | records expired and the authorities became unavailable, the fallback | |||
| returned the older A instead of the newer CNAME. | to stale answers returned the older A instead of the newer CNAME. | |||
| 8. Implementation Status | 8. Implementation Status | |||
| [RFC Editor: per RFC 6982 this section should be removed prior to | [RFC Editor: per RFC 6982 this section should be removed prior to | |||
| publication.] | publication.] | |||
| The algorithm described in Section 5 was originally implemented as a | The algorithm described in Section 5 was originally implemented as a | |||
| patch to BIND 9.7.0. It has been in production on Akamai's | patch to BIND 9.7.0. It has been in production on Akamai's | |||
| production network since 2011, and effectively smoothed over | production network since 2011, and effectively smoothed over | |||
| transient failures and longer outages that would have resulted in | transient failures and longer outages that would have resulted in | |||
| skipping to change at page 10, line 7 ¶ | skipping to change at page 9, line 41 ¶ | |||
| In the research paper "When the Dike Breaks: Dissecting DNS Defenses | In the research paper "When the Dike Breaks: Dissecting DNS Defenses | |||
| During DDoS" [DikeBreaks], the authors detected some use of stale | During DDoS" [DikeBreaks], the authors detected some use of stale | |||
| answers by resolvers when authorities came under attack. Their | answers by resolvers when authorities came under attack. Their | |||
| research results suggest that more widespread adoption of the | research results suggest that more widespread adoption of the | |||
| technique would significantly improve resiliency for the large number | technique would significantly improve resiliency for the large number | |||
| of requests that fail or experience abnormally long resolution times | of requests that fail or experience abnormally long resolution times | |||
| during an attack. | during an attack. | |||
| 9. EDNS Option | 9. EDNS Option | |||
| During the discussion of serve-stale in the IETF dnsop working group, | During the discussion of serve-stale in the IETF, it was suggested | |||
| it was suggested that an EDNS option should be available to either | that an EDNS option should be available to either explicitly opt-in | |||
| explicitly opt-in to getting data that is possibly stale, or at least | to getting data that is possibly stale, or at least as a debugging | |||
| as a debugging tool to indicate when stale data has been used for a | tool to indicate when stale data has been used for a response. | |||
| response. | ||||
| The opt-in use case was rejected as the technique was meant to be | The opt-in use case was rejected as the technique was meant to be | |||
| immediately useful in improving DNS resiliency for all clients. | immediately useful in improving DNS resiliency for all clients. | |||
| The reporting case was ultimately also rejected as working group | The reporting case was ultimately also rejected because even the | |||
| participants determined that even the simpler version of a proposed | simpler version of a proposed option was still too much bother to | |||
| option was still too much bother to implement for too little | implement for too little perceived value. | |||
| perceived value. | ||||
| 10. Security Considerations | 10. Security Considerations | |||
| The most obvious security issue is the increased likelihood of DNSSEC | The most obvious security issue is the increased likelihood of DNSSEC | |||
| validation failures when using stale data because signatures could be | validation failures when using stale data because signatures could be | |||
| returned outside their validity period. This would only be an issue | returned outside their validity period. This would only be an issue | |||
| if the authoritative servers are unreachable, the only time the | if the authoritative servers are unreachable, the only time the | |||
| techniques in this document are used, and thus does not introduce a | techniques in this document are used, and thus does not introduce a | |||
| new failure in place of what would have otherwise been success. | new failure in place of what would have otherwise been success. | |||
| Additionally, bad actors have been known to use DNS caches to keep | Additionally, bad actors have been known to use DNS caches to keep | |||
| records alive even after their authorities have gone away. This | records alive even after their authorities have gone away. This | |||
| potentially makes that easier, although without introducing a new | potentially makes that easier, although without introducing a new | |||
| risk. | risk. | |||
| In [CloudStrife] it was demonstrated how stale DNS data, namely | In [CloudStrife], it was demonstrated how stale DNS data, namely | |||
| hostnames pointing to addresses that are no longer in use by the | hostnames pointing to addresses that are no longer in use by the | |||
| owner of the name, can be used to co-opt security such as to get | owner of the name, can be used to co-opt security such as to get | |||
| domain-validated certificates fraudulently issued to an attacker. | domain-validated certificates fraudulently issued to an attacker. | |||
| While this RFC does not create a new vulnerability in this area, it | While this document does not create a new vulnerability in this area, | |||
| does potentially enlarge the window in which such an attack could be | it does potentially enlarge the window in which such an attack could | |||
| made. An obvious mitigation is that not only should a certificate | be made. A proposed mitigation is that certificate authorities | |||
| authority not use a resolver that has this feature enabled, it should | should fully look up each name starting at the DNS root for every | |||
| probably not use a caching resolver at all and instead fully look up | name lookup. Alternatively, CAs should use a resolver that is not | |||
| each name freshly from the root. | serving stale data. | |||
| 11. Privacy Considerations | 11. Privacy Considerations | |||
| This document does not add any practical new privacy issues. | This document does not add any practical new privacy issues. | |||
| 12. NAT Considerations | 12. NAT Considerations | |||
| The method described here is not affected by the use of NAT devices. | The method described here is not affected by the use of NAT devices. | |||
| 13. IANA Considerations | 13. IANA Considerations | |||
| There are no IANA considerations. | There are no IANA considerations. | |||
| 14. Acknowledgements | 14. Acknowledgements | |||
| The authors wish to thank Robert Edmonds, Tony Finch, Bob Harold, | The authors wish to thank Robert Edmonds, Tony Finch, Bob Harold, | |||
| Tatuya Jinmei, Matti Klock, Jason Moreau, Giovane Moura, Jean Roy, | Tatuya Jinmei, Matti Klock, Jason Moreau, Giovane Moura, Jean Roy, | |||
| Mukund Sivaraman, Davey Song, Paul Vixie, Ralf Weber and Paul Wouters | Mukund Sivaraman, Davey Song, Paul Vixie, Ralf Weber and Paul Wouters | |||
| for their review and feedback. | for their review and feedback. | |||
| Paul Hoffman deserves special thanks for submitting a number of Pull | ||||
| Requests. | ||||
| 15. References | 15. References | |||
| 15.1. Normative References | 15.1. Normative References | |||
| [RFC1034] Mockapetris, P., "Domain names - concepts and facilities", | [RFC1034] Mockapetris, P., "Domain names - concepts and facilities", | |||
| STD 13, RFC 1034, DOI 10.17487/RFC1034, November 1987, | STD 13, RFC 1034, DOI 10.17487/RFC1034, November 1987, | |||
| <https://www.rfc-editor.org/info/rfc1034>. | <https://www.rfc-editor.org/info/rfc1034>. | |||
| [RFC1035] Mockapetris, P., "Domain names - implementation and | [RFC1035] Mockapetris, P., "Domain names - implementation and | |||
| specification", STD 13, RFC 1035, DOI 10.17487/RFC1035, | specification", STD 13, RFC 1035, DOI 10.17487/RFC1035, | |||
| skipping to change at page 12, line 23 ¶ | skipping to change at page 12, line 5 ¶ | |||
| content/uploads/2018/02/ | content/uploads/2018/02/ | |||
| ndss2018_06A-4_Borgolte_paper.pdf>. | ndss2018_06A-4_Borgolte_paper.pdf>. | |||
| [DikeBreaks] | [DikeBreaks] | |||
| Moura, G., Heidemann, J., Mueller, M., Schmidt, R., and M. | Moura, G., Heidemann, J., Mueller, M., Schmidt, R., and M. | |||
| Davids, "When the Dike Breaks: Dissecting DNS Defenses | Davids, "When the Dike Breaks: Dissecting DNS Defenses | |||
| During DDoS", ACM 2018 Internet Measurement Conference, | During DDoS", ACM 2018 Internet Measurement Conference, | |||
| DOI 10.1145/3278532.3278534, October 2018, | DOI 10.1145/3278532.3278534, October 2018, | |||
| <https://www.isi.edu/~johnh/PAPERS/Moura18b.pdf>. | <https://www.isi.edu/~johnh/PAPERS/Moura18b.pdf>. | |||
| [RFC6672] Rose, S. and W. Wijngaards, "DNAME Redirection in the | ||||
| DNS", RFC 6672, DOI 10.17487/RFC6672, June 2012, | ||||
| <https://www.rfc-editor.org/info/rfc6672>. | ||||
| [RFC7719] Hoffman, P., Sullivan, A., and K. Fujiwara, "DNS | [RFC7719] Hoffman, P., Sullivan, A., and K. Fujiwara, "DNS | |||
| Terminology", RFC 7719, DOI 10.17487/RFC7719, December | Terminology", RFC 7719, DOI 10.17487/RFC7719, December | |||
| 2015, <https://www.rfc-editor.org/info/rfc7719>. | 2015, <https://www.rfc-editor.org/info/rfc7719>. | |||
| Authors' Addresses | Authors' Addresses | |||
| David C Lawrence | David C Lawrence | |||
| Oracle | Oracle | |||
| Email: tale@dd.org | Email: tale@dd.org | |||
| End of changes. 27 change blocks. | ||||
| 79 lines changed or deleted | 77 lines changed or added | |||
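The implementation caveats that precede Section 8 in both revisions (data with a 0 second TTL is never eligible for stale fallback, and a CNAME that arrives at an owner name should not leave older records of other types behind) can be illustrated with a small cache sketch. This is not code from the draft or from BIND. The class `StaleCache`, its methods, and the `max_stale` parameter are hypothetical names chosen for illustration, and DNAME handling and DNSSEC record types are deliberately left out.

```python
class StaleCache:
    """Toy resolver cache with stale fallback (illustrative only)."""

    def __init__(self, max_stale=86400.0):
        # max_stale: assumed limit, in seconds, on how long past TTL
        # expiry a record may still be used as a stale answer.
        self.max_stale = max_stale
        self._store = {}  # owner name -> {rtype: (rdata, expires_at)}

    def put(self, name, rtype, rdata, ttl, now):
        if ttl <= 0:
            # A 0 second TTL is good for the current transaction only,
            # so it is never stored and can never be served stale.
            return
        types = self._store.setdefault(name, {})
        if rtype == "CNAME":
            # Evict every other type at this owner name so an older A
            # record cannot resurface once everything has expired.
            types.clear()
        else:
            # Symmetric case: new data of another type replaces a CNAME.
            types.pop("CNAME", None)
        types[rtype] = (rdata, now + ttl)

    def get(self, name, rtype, now, allow_stale=False):
        types = self._store.get(name, {})
        # A CNAME at the owner name answers a query for any type.
        found = "CNAME" if "CNAME" in types else rtype
        if found not in types:
            return None
        rdata, expires_at = types[found]
        overdue = now - expires_at
        if overdue < 0:
            return (found, rdata)          # still fresh
        if allow_stale and overdue <= self.max_stale:
            return (found, rdata)          # expired, usable as stale
        return None
```

A caller would pass `allow_stale=True` only after a refresh attempt has failed or timed out. With the eviction in `put`, a name that changed from an A record to a CNAME yields the CNAME during stale fallback rather than the older A record.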