| draft-ietf-dnsop-serve-stale-04.txt | draft-ietf-dnsop-serve-stale-05.txt | |||
|---|---|---|---|---|
| DNSOP Working Group D. Lawrence | DNSOP Working Group D. Lawrence | |||
| Internet-Draft Oracle | Internet-Draft Oracle | |||
| Updates: 1034, 1035 (if approved) W. Kumari | Updates: 1034, 1035 (if approved) W. Kumari | |||
| Intended status: Standards Track P. Sood | Intended status: Standards Track P. Sood | |||
| Expires: September 10, 2019 Google | Expires: October 18, 2019 Google | |||
| March 09, 2019 | April 16, 2019 | |||
| Serving Stale Data to Improve DNS Resiliency | Serving Stale Data to Improve DNS Resiliency | |||
| draft-ietf-dnsop-serve-stale-04 | draft-ietf-dnsop-serve-stale-05 | |||
| Abstract | Abstract | |||
| This draft defines a method (serve-stale) for recursive resolvers to | This draft defines a method (serve-stale) for recursive resolvers to | |||
| use stale DNS data to avoid outages when authoritative nameservers | use stale DNS data to avoid outages when authoritative nameservers | |||
| cannot be reached to refresh expired data. It updates the definition | cannot be reached to refresh expired data. It updates the definition | |||
| of TTL from [RFC1034], [RFC1035], and [RFC2181] to make it clear that | of TTL from [RFC1034], [RFC1035], and [RFC2181] to make it clear that | |||
| data can be kept in the cache beyond the TTL expiry and used for | data can be kept in the cache beyond the TTL expiry and used for | |||
| responses when a refreshed answer is not readily available. One of | responses when a refreshed answer is not readily available. One of | |||
| the motivations for serve-stale is to make the DNS more resilient to | the motivations for serve-stale is to make the DNS more resilient to | |||
| skipping to change at page 2, line 4 | skipping to change at page 2, line 4 | |||
| Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
| Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
| working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
| Drafts is at https://datatracker.ietf.org/drafts/current/. | Drafts is at https://datatracker.ietf.org/drafts/current/. | |||
| Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
| and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
| time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
| material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
| This Internet-Draft will expire on September 10, 2019. | This Internet-Draft will expire on October 18, 2019. | |||
| Copyright Notice | Copyright Notice | |||
| Copyright (c) 2019 IETF Trust and the persons identified as the | Copyright (c) 2019 IETF Trust and the persons identified as the | |||
| document authors. All rights reserved. | document authors. All rights reserved. | |||
| This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
| Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
| (https://trustee.ietf.org/license-info) in effect on the date of | (https://trustee.ietf.org/license-info) in effect on the date of | |||
| publication of this document. Please review these documents | publication of this document. Please review these documents | |||
| skipping to change at page 2, line 31 | skipping to change at page 2, line 31 | |||
| Table of Contents | Table of Contents | |||
| 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 | |||
| 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 3 | 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 3 | |||
| 3. Background . . . . . . . . . . . . . . . . . . . . . . . . . 3 | 3. Background . . . . . . . . . . . . . . . . . . . . . . . . . 3 | |||
| 4. Standards Action . . . . . . . . . . . . . . . . . . . . . . 4 | 4. Standards Action . . . . . . . . . . . . . . . . . . . . . . 4 | |||
| 5. Example Method . . . . . . . . . . . . . . . . . . . . . . . 4 | 5. Example Method . . . . . . . . . . . . . . . . . . . . . . . 4 | |||
| 6. Implementation Considerations . . . . . . . . . . . . . . . . 6 | 6. Implementation Considerations . . . . . . . . . . . . . . . . 6 | |||
| 7. Implementation Caveats . . . . . . . . . . . . . . . . . . . 8 | 7. Implementation Caveats . . . . . . . . . . . . . . . . . . . 8 | |||
| 8. Implementation Status . . . . . . . . . . . . . . . . . . . . 9 | 8. Implementation Status . . . . . . . . . . . . . . . . . . . . 9 | |||
| 9. EDNS Option . . . . . . . . . . . . . . . . . . . . . . . . . 9 | 9. EDNS Option . . . . . . . . . . . . . . . . . . . . . . . . . 10 | |||
| 10. Security Considerations . . . . . . . . . . . . . . . . . . . 10 | 10. Security Considerations . . . . . . . . . . . . . . . . . . . 10 | |||
| 11. Privacy Considerations . . . . . . . . . . . . . . . . . . . 10 | 11. Privacy Considerations . . . . . . . . . . . . . . . . . . . 10 | |||
| 12. NAT Considerations . . . . . . . . . . . . . . . . . . . . . 10 | 12. NAT Considerations . . . . . . . . . . . . . . . . . . . . . 11 | |||
| 13. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 10 | 13. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 11 | |||
| 14. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 10 | 14. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 11 | |||
| 15. References . . . . . . . . . . . . . . . . . . . . . . . . . 11 | 15. References . . . . . . . . . . . . . . . . . . . . . . . . . 11 | |||
| 15.1. Normative References . . . . . . . . . . . . . . . . . . 11 | 15.1. Normative References . . . . . . . . . . . . . . . . . . 11 | |||
| 15.2. Informative References . . . . . . . . . . . . . . . . . 11 | 15.2. Informative References . . . . . . . . . . . . . . . . . 12 | |||
| Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 12 | Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 12 | |||
| 1. Introduction | 1. Introduction | |||
| Traditionally the Time To Live (TTL) of a DNS resource record has | Traditionally the Time To Live (TTL) of a DNS resource record has | |||
| been understood to represent the maximum number of seconds that a | been understood to represent the maximum number of seconds that a | |||
| record can be used before it must be discarded, based on its | record can be used before it must be discarded, based on its | |||
| description and usage in [RFC1035] and clarifications in [RFC2181]. | description and usage in [RFC1035] and clarifications in [RFC2181]. | |||
| This document proposes that the definition of the TTL be explicitly | This document proposes that the definition of the TTL be explicitly | |||
| skipping to change at page 5, line 22 | skipping to change at page 5, line 22 | |||
| o A failure recheck timer, which limits the frequency at which a | o A failure recheck timer, which limits the frequency at which a | |||
| failed lookup will be attempted again. | failed lookup will be attempted again. | |||
| o A maximum stale timer, which caps the amount of time that records | o A maximum stale timer, which caps the amount of time that records | |||
| will be kept past their expiration. | will be kept past their expiration. | |||
| Most recursive resolvers already have the query resolution timer, and | Most recursive resolvers already have the query resolution timer, and | |||
| effectively some kind of failure recheck timer. The client response | effectively some kind of failure recheck timer. The client response | |||
| timer and maximum stale timer are new concepts for this mechanism. | timer and maximum stale timer are new concepts for this mechanism. | |||
| When a request is received by the recursive resolver, it SHOULD start | When a request is received by the recursive resolver, it should start | |||
| the client response timer. This timer is used to avoid client | the client response timer. This timer is used to avoid client | |||
| timeouts. It SHOULD be configurable, with a recommended value of 1.8 | timeouts. It should be configurable, with a recommended value of 1.8 | |||
| seconds as being just under a common timeout value of 2 seconds while | seconds as being just under a common timeout value of 2 seconds while | |||
| still giving the resolver a fair shot at resolving the name. | still giving the resolver a fair shot at resolving the name. | |||
| The resolver then checks its cache for any unexpired data that | The resolver then checks its cache for any unexpired data that | |||
| satisfies the request and of course returns them if available. If it | satisfies the request and of course returns them if available. If it | |||
| finds no relevant unexpired data and the Recursion Desired flag is | finds no relevant unexpired data and the Recursion Desired flag is | |||
| not set in the request, it SHOULD immediately return the response | not set in the request, it should immediately return the response | |||
| without consulting the cache for expired records. Typically this | without consulting the cache for expired records. Typically this | |||
| response would be a referral to authoritative nameservers covering | response would be a referral to authoritative nameservers covering | |||
| the zone, but the specifics are implementation dependent. | the zone, but the specifics are implementation dependent. | |||
| If iterative lookups will be done, then the failure recheck timer is | If iterative lookups will be done, then the failure recheck timer is | |||
| consulted. Attempts to refresh from non-responsive or otherwise | consulted. Attempts to refresh from non-responsive or otherwise | |||
| failing authoritative nameservers are recommended to be done no more | failing authoritative nameservers are recommended to be done no more | |||
| frequently than every 30 seconds. If this request was received | frequently than every 30 seconds. If this request was received | |||
| within this period, the cache may be immediately consulted for stale | within this period, the cache may be immediately consulted for stale | |||
| data to satisfy the request. | data to satisfy the request. | |||
| Outside the period of the failure recheck timer, the resolver SHOULD | Outside the period of the failure recheck timer, the resolver should | |||
| start the query resolution timer and begin the iterative resolution | start the query resolution timer and begin the iterative resolution | |||
| process. This timer bounds the work done by the resolver when | process. This timer bounds the work done by the resolver when | |||
| contacting external authorities, and is commonly around 10 to 30 | contacting external authorities, and is commonly around 10 to 30 | |||
| seconds. If this timer expires on an attempted lookup that is still | seconds. If this timer expires on an attempted lookup that is still | |||
| being processed, the resolution effort is abandoned. | being processed, the resolution effort is abandoned. | |||
| If the answer has not been completely determined by the time the | If the answer has not been completely determined by the time the | |||
| client response timer has elapsed, the resolver SHOULD then check its | client response timer has elapsed, the resolver should then check its | |||
| cache to see whether there is expired data that would satisfy the | cache to see whether there is expired data that would satisfy the | |||
| request. If so, it adds that data to the response message with a TTL | request. If so, it adds that data to the response message with a TTL | |||
| greater than 0 per Section 4. The response is then sent to the | greater than 0 per Section 4. The response is then sent to the | |||
| client while the resolver continues its attempt to refresh the data. | client while the resolver continues its attempt to refresh the data. | |||
| When no authorities are able to be reached during a resolution | When no authorities are able to be reached during a resolution | |||
| attempt, the resolver SHOULD attempt to refresh the delegation and | attempt, the resolver should attempt to refresh the delegation and | |||
| restart the iterative lookup process with the remaining time on the | restart the iterative lookup process with the remaining time on the | |||
| query resolution timer. This resumption should be done only once | query resolution timer. This resumption should be done only once | |||
| during one resolution effort. | during one resolution effort. | |||
| Outside the resolution process, the maximum stale timer is used for | Outside the resolution process, the maximum stale timer is used for | |||
| cache management and is independent of the query resolution process. | cache management and is independent of the query resolution process. | |||
| This timer is conceptually different from the maximum cache TTL that | This timer is conceptually different from the maximum cache TTL that | |||
| exists in many resolvers, the latter being a clamp on the value of | exists in many resolvers, the latter being a clamp on the value of | |||
| TTLs as received from authoritative servers and recommended to be 7 | TTLs as received from authoritative servers and recommended to be 7 | |||
| days in the TTL definition above. The maximum stale timer SHOULD be | days in the TTL definition above. The maximum stale timer should be | |||
| configurable, and defines the length of time after a record expires | configurable, and defines the length of time after a record expires | |||
| that it SHOULD be retained in the cache. The suggested value is 7 | that it should be retained in the cache. The suggested value is | |||
| days, which gives time for monitoring to notice the resolution | between 1 and 3 days. | |||
| problem and for human intervention to fix it. | ||||
| 6. Implementation Considerations | 6. Implementation Considerations | |||
| This document mainly describes the issues behind serving stale data | This document mainly describes the issues behind serving stale data | |||
| and intentionally does not provide a formal algorithm. The concept | and intentionally does not provide a formal algorithm. The concept | |||
| is not overly complex, and the details are best left to resolver | is not overly complex, and the details are best left to resolver | |||
| authors to implement in their codebases. The processing of serve- | authors to implement in their codebases. The processing of serve- | |||
| stale is a local operation, and consistent variables between | stale is a local operation, and consistent variables between | |||
| deployments are not needed for interoperability. However, we would | deployments are not needed for interoperability. However, we would | |||
| like to highlight the impact of various implementation choices, | like to highlight the impact of various implementation choices, | |||
| starting with the timers involved. | starting with the timers involved. | |||
| The most obvious of these is the maximum stale timer. If this | The most obvious of these is the maximum stale timer. If this | |||
| variable is too large it could cause excessive cache memory usage, | variable is too large it could cause excessive cache memory usage, | |||
| but if it is too small, the serve-stale technique becomes less | but if it is too small, the serve-stale technique becomes less | |||
| effective, as the record may not be in the cache to be used if | effective, as the record may not be in the cache to be used if | |||
| needed. Increased memory consumption could be mitigated by | needed. Shorter values, even less than a day, can effectively handle | |||
| prioritizing removal of stale records over non-expired records during | the vast majority of outages. Longer values, as much as a week, give | |||
| cache exhaustion. Implementations may also wish to consider whether | time for monitoring systems to notice a resolution problem and for | |||
| to track the names in requests for their last time of use or their | human intervention to fix it; operational experience has been that | |||
| sometimes the right people can be hard to track down and | ||||
| unfortunately slow to remedy the situation. | ||||
| Increased memory consumption could be mitigated by prioritizing | ||||
| removal of stale records over non-expired records during cache | ||||
| exhaustion. Implementations may also wish to consider whether to | ||||
| track the names in requests for their last time of use or their | ||||
| popularity, using that as an additional factor when considering cache | popularity, using that as an additional factor when considering cache | |||
| eviction. A feature to manually flush only stale records could also | eviction. A feature to manually flush only stale records could also | |||
| be useful. | be useful. | |||
| The client response timer is another variable which deserves | The client response timer is another variable which deserves | |||
| consideration. If this value is too short, there exists the risk | consideration. If this value is too short, there exists the risk | |||
| that stale answers may be used even when the authoritative server is | that stale answers may be used even when the authoritative server is | |||
| actually reachable but slow; this may result in sub-optimal answers | actually reachable but slow; this may result in sub-optimal answers | |||
| being returned. Conversely, waiting too long will negatively impact | being returned. Conversely, waiting too long will negatively impact | |||
| user experience. | user experience. | |||
| End of changes. 15 change blocks. | ||||
| 23 lines changed or deleted | 29 lines changed or added | |||
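
The timer interplay described in Section 5 of the -05 text can be sketched roughly as follows. This is only an illustrative sketch, not the draft's algorithm (the draft intentionally gives none). The timer values come from the text above; every name (`CacheEntry`, `decide`, `stale_answer`, `STALE_ANSWER_TTL`), the choice to track the last failure per cache entry, and the stale answer TTL of 30 seconds are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Timer values quoted in the draft text; the constant names are hypothetical.
CLIENT_RESPONSE_TIMER = 1.8      # seconds; recommended, just under a common 2 s timeout
FAILURE_RECHECK_TIMER = 30.0     # seconds; minimum spacing of retries to failing servers
MAX_STALE_TIMER = 3 * 86400.0    # seconds; -05 suggests between 1 and 3 days
STALE_ANSWER_TTL = 30            # assumption: the text only requires a TTL greater than 0


@dataclass
class CacheEntry:
    data: str
    expires_at: float                        # absolute time at which the TTL runs out
    last_failure_at: Optional[float] = None  # assumption: failures tracked per entry


def decide(entry: Optional[CacheEntry], now: float, recursion_desired: bool) -> str:
    """Return the next step for a request, following the order in Section 5."""
    # Unexpired data always wins.
    if entry is not None and now < entry.expires_at:
        return "answer-fresh"
    # Without Recursion Desired, respond without consulting expired records.
    if not recursion_desired:
        return "answer-without-stale"
    # Inside the failure recheck period, stale data may be consulted immediately.
    if (entry is not None and entry.last_failure_at is not None
            and now - entry.last_failure_at < FAILURE_RECHECK_TIMER):
        return "consult-stale"
    # Otherwise start the query resolution timer and iterate; if the client
    # response timer elapses first, the caller falls back to stale_answer().
    return "resolve"


def stale_answer(entry: Optional[CacheEntry], now: float) -> Optional[Tuple[str, int]]:
    """Return (data, ttl) if the entry is expired but within the maximum stale timer."""
    if entry is None:
        return None
    if entry.expires_at <= now < entry.expires_at + MAX_STALE_TIMER:
        return entry.data, STALE_ANSWER_TTL
    return None
```

In this sketch a resolver would call `decide()` when a request arrives and call `stale_answer()` either on `"consult-stale"` or once `CLIENT_RESPONSE_TIMER` seconds have passed during `"resolve"`, while continuing the refresh in the background. What to return when no usable stale data exists is left to the implementation, as in the draft.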