Network Working Group                                        K. Fujiwara
Internet-Draft                                                      JPRS
Intended status: Best Current Practice                          P. Vixie
Expires: January 29, 2021                                       Farsight
                                                           July 28, 2020


                     Fragmentation Avoidance in DNS
                draft-ietf-dnsop-avoid-fragmentation-01

Abstract

   EDNS0 enables a DNS server to send large responses using UDP and is widely deployed. Path MTU discovery remains widely undeployed due to security issues, and IP fragmentation has exposed weaknesses in application protocols. Currently, DNS is known to be the largest user of IP fragmentation. It is possible to avoid IP fragmentation in DNS by limiting response size where possible, and by signaling the need to upgrade from UDP to TCP transport where necessary. This document proposes methods to avoid IP fragmentation in DNS.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 29, 2021.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document.
   Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Proposal to avoid IP fragmentation in DNS
   4.  Maximum DNS/UDP payload size
   5.  Incremental deployment
   6.  Request to zone operators and DNS server operators
   7.  Considerations
     7.1.  Protocol compliance
   8.  IANA Considerations
   9.  Security Considerations
   10. Acknowledgments
   11. References
     11.1.  Normative References
     11.2.  Informative References
   Appendix A.  How to retrieve path MTU value to a destination from applications
   Appendix B.  Minimal-responses
   Authors' Addresses

1.  Introduction

   DNS has the EDNS0 [RFC6891] mechanism, which enables a DNS server to send large responses using UDP. EDNS0 is now widely deployed, and DNS (over UDP) is said to be the biggest user of IP fragmentation.

   However, "Fragmentation Considered Poisonous" [Herzberg2013] proposed effective off-path DNS cache poisoning attack vectors using IP fragmentation. "IP fragmentation attack on DNS" [Hlavacek2013] and "Domain Validation++ For MitM-Resilient PKI" [Brandt2018] described how off-path attackers can intervene in path MTU discovery [RFC1191] to induce intentionally fragmented responses from authoritative servers. [RFC7739] describes the security implications of predictable fragment identification values.

   DNSSEC is a countermeasure against cache poisoning attacks that use IP fragmentation. However, DNS delegation responses are not signed with DNSSEC, and DNSSEC does not have a mechanism to get the correct response if an incorrect delegation is injected. This is a denial-of-service vulnerability that can yield failed name resolutions. If cache poisoning attacks can be avoided, DNSSEC validation failures will be avoided.

   Section 3.2 (Message Size Guidelines) of the UDP Usage Guidelines [RFC8085] states that an application SHOULD NOT send UDP datagrams that result in IP packets that exceed the Maximum Transmission Unit (MTU) along the path to the destination.

   A DNS message receiver cannot trust fragmented UDP datagrams, primarily due to the small amount of entropy provided by UDP port numbers and DNS message identifiers. Each of these is only 16 bits in size, and both are likely to be in the first fragment of a packet if fragmentation occurs, so an attacker who spoofs a later fragment does not need to guess either of them. By comparison, the TCP protocol stack controls packet size and avoids IP fragmentation under ICMP NEEDFRAG attacks. In TCP, fragmentation should be avoided for performance reasons, whereas for UDP, fragmentation should be avoided for resiliency and authenticity reasons.

   [I-D.ietf-intarea-frag-fragile] summarizes how IP fragmentation introduces fragility to Internet communication. The transport of DNS messages over UDP should take account of the observations stated in that document.

   This document proposes methods to avoid IP fragmentation in DNS/UDP.

2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

   "Requestor" refers to the side that sends a request. "Responder" refers to an authoritative, recursive resolver or other DNS component that responds to questions. (Quoted from EDNS0 [RFC6891])

   "Path MTU" is the minimum link MTU of all the links in a path between a source node and a destination node. (Quoted from [RFC8201])

   Many of the specialized terms used in this document are defined in DNS Terminology [RFC8499].

3.  Proposal to avoid IP fragmentation in DNS

   TCP avoids fragmentation using its Maximum Segment Size (MSS) parameter. Each transmitted segment is header-size aware: the sizes of the IP and TCP headers are known, as are the far end's MSS parameter and the interface or path MTU, so the segment size can be chosen to keep each IP datagram below a target size. This takes advantage of the elasticity of TCP's packetizing process in deciding how much queued data will fit into the next segment. In contrast, DNS over UDP has little datagram size elasticity and lacks insight into IP header and option size, and so must make more conservative estimates about available UDP payload space.

   The minimum MTU for an IPv4 interface is 68 octets, and all receivers must be able to receive and reassemble datagrams at least 576 octets in size (see Section 2.1, NOTE 1 of [I-D.ietf-intarea-frag-fragile]). The minimum MTU for an IPv6 interface is 1280 octets (see Section 5 of [RFC8200]). These are theoretical limits, and no modern networks implement them.
   In practice, the smallest MTU witnessed in the operational DNS community is 1500 octets, the Ethernet maximum payload size. While many networks with other MTUs exist, such as Packet over SONET (PoS), Fiber Distributed Data Interface (FDDI), and Ethernet jumbo frames, there is currently no reliable way of discovering such links in an IP transmission path. Absent some kind of path MTU discovery result or a static configuration by the server or system operator, a conservative estimate must be chosen, even if it is less efficient than the path MTU would have been had it been discoverable.

   The methods to avoid IP fragmentation in DNS are described below:

   o  UDP requestors and responders SHOULD send DNS responses with IP_DONTFRAG / IPV6_DONTFRAG [RFC3542] options, which will yield either a silent timeout or a network (ICMP) error if the path MTU is exceeded. Upon a timeout, UDP requestors may retry using TCP or UDP, per local policy.

   o  The estimated maximum DNS/UDP payload size SHOULD be the discovered or estimated path MTU minus the estimated header space. Path MTU discovery [RFC1191], [RFC8201] and [I-D.ietf-tsvwg-datagram-plpmtud] may discover the real path MTU to a destination. One method to retrieve the path MTU value is described in Appendix A. When discovered path MTU information is not available, a message sender SHOULD use the default maximum DNS/UDP payload size described in the following section.

   o  The maximum buffer size offered by an EDNS0 initiator SHOULD be no larger than the estimated maximum DNS/UDP payload size. If the desired response cannot reasonably be expected to fit into a buffer of that size, the initiator should use TCP instead of UDP.

   o  Responders SHOULD compose UDP responses that result in IP packets that do not exceed the path MTU to the requestor.
      Thus, if the requestor offers a buffer size larger than the responder's discovered or estimated maximum DNS/UDP payload size, then the responder will behave as though the requestor had specified a buffer size equal to the responder's estimated maximum DNS/UDP payload size.

   o  Fragmented DNS/UDP messages may be dropped without IP reassembly. An ICMP error should be sent in this case, with rate limiting to prevent this logic from becoming a DDoS amplification vector. If rate limiting is not possible, then no ICMP error should be sent. (This is a countermeasure against DNS spoofing attacks using IP fragmentation.)

   The cause and effect of the TC bit are unchanged from EDNS0 [RFC6891].

4.  Maximum DNS/UDP payload size

   o  Most of the Internet, and especially the inner core, has an MTU of at least 1500 octets. An operator of a full resolver would be well advised to measure their path MTU to several authoritative name servers and to a random sample of their expected stub resolver client networks, to find the upper boundary on IP/UDP packet size in the average case. This limit should not be exceeded by most messages received or transmitted by a full resolver, or else fallback to TCP will occur too often. An operator of authoritative servers would also be well advised to measure their path MTU to several full-service resolvers. The Linux tool "tracepath" can be used to measure the path MTU to well-known authoritative name servers such as [a-m].root-servers.net or [a-m].gtld-servers.net. If the reported path MTU is, for example, no smaller than 1460, then the maximum DNS/UDP payload would be 1432 for IPv4 (1460 - 20-octet IPv4 header - 8-octet UDP header) and 1412 for IPv6 (1460 - 40-octet IPv6 header - 8-octet UDP header). To allow for possible IP options and distant tunnel overhead, a useful default for the maximum DNS/UDP payload size would be 1400.
   o  [RFC4035] states that "A security-aware name server MUST support the EDNS0 message size extension, MUST support a message size of at least 1220 octets". Therefore, the smallest acceptable value for the maximum DNS/UDP payload size is 1220.

   o  DNS flag day 2020 [DNSFlagDay2020] proposed 1232 as an EDNS buffer size. By the above reasoning, this value is either too small (compared with the 1400-octet default) or too large (compared with the 1220-octet minimum).

5.  Incremental deployment

   The proposed method supports incremental deployment.

   When a full-service resolver implements the proposed method, its stub resolvers (clients) and the authoritative servers it queries will no longer observe IP fragmentation or reassembly from that resolver, and will fall back to TCP when necessary.

   When an authoritative server implements the proposed method, its full-service resolvers (clients) will no longer observe IP fragmentation or reassembly from that server, and will fall back to TCP when necessary.

6.  Request to zone operators and DNS server operators

   Large DNS responses are the result of zone configuration. Zone operators SHOULD seek configurations resulting in small responses. For example:

   o  Use a smaller number of name servers (13 may be too large).

   o  Use a smaller number of A/AAAA RRs for a domain name.

   o  Use the 'minimal-responses' configuration: some implementations have a 'minimal responses' configuration that causes DNS servers to make response packets smaller, containing only mandatory and required data (Appendix B).

   o  Use DNSSEC algorithms with smaller signature and public key sizes. Notably, ECDSA and EdDSA signatures are smaller than RSA signatures.

7.  Considerations

7.1.  Protocol compliance

   Prior research ([Fujiwara2018] and dns-operations mailing list discussions) found that some authoritative servers ignore the EDNS0 requestor's UDP payload size and return large UDP responses.
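   As an illustration only, the size rules of Sections 3 and 4, together with a check a requestor could apply to notice the non-compliant behavior described above, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not part of this proposal: header sizes assume no IPv4 options and no IPv6 extension headers, the 512-octet floor follows the EDNS0 rule that smaller advertised values are treated as 512 [RFC6891], and the function names and the "expected_len" estimate are hypothetical.

```python
"""Sketch of DNS/UDP payload size rules (illustrative, not normative)."""

from typing import Optional

IPV4_HEADER = 20            # assumption: no IPv4 options
IPV6_HEADER = 40            # assumption: no IPv6 extension headers
UDP_HEADER = 8
DEFAULT_MAX_PAYLOAD = 1400  # default from Section 4 when no path MTU is known
EDNS0_FLOOR = 512           # RFC 6891: advertised sizes below 512 count as 512


def max_udp_payload(path_mtu: Optional[int] = None, ipv6: bool = False) -> int:
    """Estimated maximum DNS/UDP payload: path MTU minus IP and UDP headers."""
    if path_mtu is None:
        # No discovered path MTU: fall back to the conservative default.
        return DEFAULT_MAX_PAYLOAD
    header = (IPV6_HEADER if ipv6 else IPV4_HEADER) + UDP_HEADER
    return path_mtu - header


def responder_bufsize(requestor_bufsize: int, responder_max: int) -> int:
    """Buffer size a responder honors: never larger than its own estimate."""
    offered = max(requestor_bufsize, EDNS0_FLOOR)
    return min(offered, responder_max)


def choose_transport(expected_len: int, bufsize: int) -> str:
    """Requestor side: use TCP if the response is not expected to fit."""
    return "udp" if expected_len <= bufsize else "tcp"


def exceeds_advertised(response_len: int, advertised_bufsize: int) -> bool:
    """True if a UDP response is larger than the requestor advertised."""
    return response_len > max(advertised_bufsize, EDNS0_FLOOR)


if __name__ == "__main__":
    print(max_udp_payload(1460))             # 1432
    print(max_udp_payload(1460, ipv6=True))  # 1412
    print(max_udp_payload())                 # 1400
    print(responder_bufsize(4096, 1400))     # 1400
```

   With a reported path MTU of 1460, max_udp_payload() reproduces the 1432 (IPv4) and 1412 (IPv6) figures from Section 4. How a requestor reacts when exceeds_advertised() is true is a local policy decision that this sketch does not model.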
   It is also well known that some authoritative servers do not support TCP transport.

   Such non-compliant behavior cannot become an implementation or configuration constraint for the rest of the DNS. If failure is the result, then that failure must be localized to the non-compliant servers.

8.  IANA Considerations

   This document has no IANA actions.

9.  Security Considerations

10.  Acknowledgments

   The authors would like to specifically thank Paul Wouters and Mukund Sivaraman for their extensive review and comments.

11.  References

11.1.  Normative References

   [I-D.ietf-intarea-frag-fragile]
              Bonica, R., Baker, F., Huston, G., Hinden, R., Troan, O., and F. Gont, "IP Fragmentation Considered Fragile", draft-ietf-intarea-frag-fragile-17 (work in progress), September 2019.

   [I-D.ietf-tsvwg-datagram-plpmtud]
              Fairhurst, G., Jones, T., Tuexen, M., Ruengeler, I., and T. Voelker, "Packetization Layer Path MTU Discovery for Datagram Transports", draft-ietf-tsvwg-datagram-plpmtud-22 (work in progress), June 2020.

   [RFC1191]  Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191, DOI 10.17487/RFC1191, November 1990.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997.

   [RFC3542]  Stevens, W., Thomas, M., Nordmark, E., and T. Jinmei, "Advanced Sockets Application Program Interface (API) for IPv6", RFC 3542, DOI 10.17487/RFC3542, May 2003.

   [RFC4035]  Arends, R., Austein, R., Larson, M., Massey, D., and S. Rose, "Protocol Modifications for the DNS Security Extensions", RFC 4035, DOI 10.17487/RFC4035, March 2005.

   [RFC5155]  Laurie, B., Sisson, G., Arends, R., and D. Blacka, "DNS Security (DNSSEC) Hashed Authenticated Denial of Existence", RFC 5155, DOI 10.17487/RFC5155, March 2008.

   [RFC6891]  Damas, J., Graff, M., and P. Vixie, "Extension Mechanisms for DNS (EDNS(0))", STD 75, RFC 6891, DOI 10.17487/RFC6891, April 2013.

   [RFC7739]  Gont, F., "Security Implications of Predictable Fragment Identification Values", RFC 7739, DOI 10.17487/RFC7739, February 2016.

   [RFC8085]  Eggert, L., Fairhurst, G., and G. Shepherd, "UDP Usage Guidelines", BCP 145, RFC 8085, DOI 10.17487/RFC8085, March 2017.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017.

   [RFC8200]  Deering, S. and R. Hinden, "Internet Protocol, Version 6 (IPv6) Specification", STD 86, RFC 8200, DOI 10.17487/RFC8200, July 2017.

   [RFC8201]  McCann, J., Deering, S., Mogul, J., and R. Hinden, Ed., "Path MTU Discovery for IP version 6", STD 87, RFC 8201, DOI 10.17487/RFC8201, July 2017.

   [RFC8499]  Hoffman, P., Sullivan, A., and K. Fujiwara, "DNS Terminology", BCP 219, RFC 8499, DOI 10.17487/RFC8499, January 2019.

11.2.  Informative References

   [Brandt2018]
              Brandt, M., Dai, T., Klein, A., Shulman, H., and M. Waidner, "Domain Validation++ For MitM-Resilient PKI", Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, 2018.

   [DNSFlagDay2020]
              "DNS flag day 2020", n.d.

   [Fujiwara2018]
              Fujiwara, K., "Measures against cache poisoning attacks using IP fragmentation in DNS", OARC 30 Workshop, 2019.

   [Herzberg2013]
              Herzberg, A. and H. Shulman, "Fragmentation Considered Poisonous", IEEE Conference on Communications and Network Security, 2013.

   [Hlavacek2013]
              Hlavacek, T., "IP fragmentation attack on DNS", RIPE 67 Meeting, 2013.

Appendix A.  How to retrieve path MTU value to a destination from applications

   Socket options: "IP_MTU (since Linux 2.2) Retrieve the current known path MTU of the current socket. Valid only when the socket has been connected. Returns an integer. Only valid as a getsockopt(2)." (Quoted from Debian GNU/Linux manual: ip(7))

   "IPV6_MTU getsockopt(): Retrieve the current known path MTU of the current socket. Only valid when the socket has been connected. Returns an integer." (Quoted from Debian GNU/Linux manual: ipv6(7))

Appendix B.  Minimal-responses

   Some implementations have a 'minimal responses' configuration that causes a DNS server to make response packets smaller, containing only mandatory and required data.

   Under the minimal-responses configuration, DNS servers compose response messages using only the RRsets corresponding to queries. In the case of delegation, DNS servers compose response packets with the delegation NS RRset in the authority section and in-domain (in-zone and below-zone) glue in the additional data section. In the case of a non-existent domain name or non-existent type, the start of authority (SOA) RR will be placed in the authority section.

   In addition, if the zone is DNSSEC signed and a query has the DNSSEC OK bit set, signatures are added in the answer section, or the corresponding DS RRset and signatures are added in the authority section. Details are defined in [RFC4035] and [RFC5155].

Authors' Addresses

   Kazunori Fujiwara
   Japan Registry Services Co., Ltd.
   Chiyoda First Bldg. East 13F, 3-8-1 Nishi-Kanda
   Chiyoda-ku, Tokyo  101-0065
   Japan

   Phone: +81 3 5215 8451
   Email: fujiwara@jprs.co.jp

   Paul Vixie
   Farsight Security Inc
   177 Bovet Road, Suite 180
   San Mateo, CA  94402
   United States of America

   Phone: +1 650 393 3994
   Email: vixie@fsi.io