Internet Area WG                                               R. Bonica
Internet-Draft                                          Juniper Networks
Intended status: Best Current Practice                          F. Baker
Expires: September 5, 2018                                  Unaffiliated
                                                               G. Huston
                                                                   APNIC
                                                               R. Hinden
                                                    Check Point Software
                                                                O. Troan
                                                                   Cisco
                                                                 F. Gont
                                                            SI6 Networks
                                                           March 4, 2018


                  IP Fragmentation Considered Fragile
                  draft-bonica-intarea-frag-fragile-01

Abstract

   This document provides an overview of IP fragmentation.  It explains
   how IP fragmentation works and why it is required.  As part of that
   explanation, this document also explains how IP fragmentation reduces
   the reliability of Internet communication.

   This document also proposes alternatives to IP fragmentation.
   Finally, it provides recommendations for application developers and
   network operators.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 5, 2018.

Copyright Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  IP Fragmentation
     2.1.  Links, Paths, MTU and PMTU
     2.2.  Upper-layer Protocols
   3.  Requirements Language
   4.  IP Fragmentation Reduces Reliability
     4.1.  Middle Box Failures
     4.2.  Partial Filtering
     4.3.  Suboptimal Load Balancing
     4.4.  Security Vulnerabilities
     4.5.  Blackholing Due to ICMP Loss
     4.6.  Blackholing Due To Filtering
   5.  Alternatives to IP Fragmentation
     5.1.  Transport Layer Solutions
     5.2.  Application Layer Solutions
   6.  Applications That Rely on IPv6 Fragmentation
     6.1.  DNS
     6.2.  OSPFv3
     6.3.  IP Encapsulations
   7.  Recommendation
     7.1.  For Application Developers
     7.2.  For Network Operators
   8.  IANA Considerations
   9.  Security Considerations
   10. Acknowledgements
   11. References
     11.1.  Normative References
     11.2.  Informative References
   Appendix A.  Contributors' Address
   Authors' Addresses

1.  Introduction

   Operational experience [RFC7872] [Huston] reveals that IP
   fragmentation reduces the reliability of Internet communication.
   This document provides an overview of IP fragmentation.  It explains
   how IP fragmentation works and why it is required.  As part of that
   explanation, this document also explains how IP fragmentation reduces
   the reliability of Internet communication.

   This document also proposes alternatives to IP fragmentation.
   Finally, it provides recommendations for application developers and
   network operators.

2.  IP Fragmentation

2.1.  Links, Paths, MTU and PMTU

   An Internet path connects a source node to a destination node.  A
   path can contain links and intermediate systems.  If a path contains
   more than one link, the links are connected in series and an
   intermediate system connects each link to the next.  An intermediate
   system can be a router or a middle box.

   Internet paths are dynamic.  Assume that the path from one node to
   another contains a set of links and intermediate systems.  If the
   network topology changes, that path can also change so that it
   includes a different set of links and intermediate systems.

   Each link is constrained by the number of bytes that it can convey in
   a single IP packet.  This constraint is called the link Maximum
   Transmission Unit (MTU).  IPv4 [RFC0791] requires every link to have
   an MTU of 68 bytes or greater.  IPv6 [RFC8200] requires every link to
   have an MTU of 1280 bytes or greater.  These are called the IPv4 and
   IPv6 minimum link MTUs.

   Each Internet path is constrained by the number of bytes that it can
   convey in a single IP packet.  This constraint is called the Path MTU
   (PMTU).  For any given path, the PMTU is equal to the smallest of its
   link MTUs.  Because Internet paths are dynamic, PMTU is also dynamic.
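   The relationship between link MTUs and the PMTU can be illustrated
   with a short Python sketch.  It is not part of any specification: the
   function names and the link MTU values are hypothetical, chosen only
   to show that the PMTU is the smallest link MTU on the path and that
   it changes when the path moves onto a different set of links.

```python
from typing import Sequence

IPV4_MIN_LINK_MTU = 68     # IPv4 minimum link MTU (RFC 791)
IPV6_MIN_LINK_MTU = 1280   # IPv6 minimum link MTU (RFC 8200)


def path_mtu(link_mtus: Sequence[int]) -> int:
    """Return the PMTU of a path whose links are connected in series."""
    if not link_mtus:
        raise ValueError("a path must contain at least one link")
    # The PMTU is the smallest MTU of any link on the path.
    return min(link_mtus)


def meets_minimum(link_mtus: Sequence[int], ip_version: int) -> bool:
    """True if every link meets the minimum link MTU for the IP version."""
    minimum = IPV4_MIN_LINK_MTU if ip_version == 4 else IPV6_MIN_LINK_MTU
    return all(mtu >= minimum for mtu in link_mtus)


# Hypothetical link MTUs for one source/destination pair, before and
# after a topology change moves the path onto a different set of links.
path_before = [1500, 1480, 1500]
path_after = [9000, 1400, 1500]

print(path_mtu(path_before))          # 1480
print(path_mtu(path_after))           # 1400
print(meets_minimum([1500, 576], 4))  # True
print(meets_minimum([1500, 576], 6))  # False
```

   A source node that lacks a better estimate could use
   IPV6_MIN_LINK_MTU in place of path_mtu() for an IPv6 path; that value
   never exceeds the actual PMTU, but it can be much smaller.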
   For reasons described below, source nodes estimate the PMTU between
   themselves and destination nodes.  A source node can produce
   extremely conservative PMTU estimates in which:

   o  The estimate for each IPv4 path is equal to the IPv4 minimum link
      MTU (68 bytes).

   o  The estimate for each IPv6 path is equal to the IPv6 minimum link
      MTU (1280 bytes).

   While these conservative estimates are guaranteed to be less than or
   equal to the actual PMTU, they are likely to be much less than the
   actual PMTU.  This may adversely affect upper-layer protocol
   performance.

   By executing Path MTU Discovery (PMTUD) [RFC1191] [RFC8201]
   procedures, a source node can maintain a less conservative, running
   estimate of the PMTU between itself and a destination node.
   According to these procedures, the source node produces an initial
   PMTU estimate.  This initial estimate is equal to the MTU of the
   first link along the path to the destination node.  It can be greater
   than the actual PMTU.

   Having produced an initial PMTU estimate, the source node sends
   non-fragmentable IP packets to the destination node.  If one of these
   packets is larger than the actual PMTU, a downstream router will not
   be able to forward the packet through the next link along the path.
   Therefore, the downstream router drops the packet and sends an
   Internet Control Message Protocol (ICMP) [RFC0792] [RFC4443] Packet
   Too Big (PTB) message to the source node.  The ICMP PTB message
   indicates the MTU of the link through which the packet could not be
   forwarded.  The source node uses this information to refine its PMTU
   estimate.

   PMTUD produces a running estimate of the PMTU between a source node
   and a destination node.  Because PMTU is dynamic, at any given time,
   the PMTU estimate can differ from the actual PMTU.  In order to
   detect PMTU increases, PMTUD occasionally resets the PMTU estimate to
   the MTU of the first link along the path to the destination node.  It
   then repeats the procedure described above.

   Furthermore, PMTUD has the following characteristics:

   o  It relies on the network's ability to deliver ICMP PTB messages to
      the source node.

   o  It is susceptible to attack because ICMP messages are easily
      forged [RFC5927].

   FOOTNOTE: According to RFC 791, every IPv4 host must be capable of
   receiving a packet whose length is equal to 576 bytes.  However, the
   IPv4 minimum link MTU is not 576.  Section 3.2 of RFC 791 explicitly
   states that the IPv4 minimum link MTU is 68 bytes.

   FOOTNOTE: In the paragraphs above, the term "non-fragmentable packet"
   is introduced.  A non-fragmentable packet can be fragmented at its
   source.  However, it cannot be fragmented by a downstream node.  An
   IPv4 packet whose DF-bit is set to zero is fragmentable.  An IPv4
   packet whose DF-bit is set to one is non-fragmentable.  All IPv6
   packets are also non-fragmentable.

   FOOTNOTE: In the paragraphs above, the term "ICMP PTB message" is
   introduced.  The ICMP PTB message has two instantiations.  In ICMPv4
   [RFC0792], the ICMP PTB message is a Destination Unreachable message
   with Code 4 (fragmentation needed and DF set).  This message was
   augmented by [RFC1191] to indicate the MTU of the link through which
   the packet could not be forwarded.  In ICMPv6 [RFC4443], the ICMP PTB
   message is a Packet Too Big message with Code 0.  This message also
   indicates the MTU of the link through which the packet could not be
   forwarded.

2.2.  Upper-layer Protocols

   When an upper-layer protocol submits data to the underlying IP
   module, and the resulting IP packet's length is greater than the
   PMTU, IP fragmentation may be required.  IP fragmentation divides a
   packet into fragments.
   Each fragment includes an IP header and a portion of the original
   packet.

   [RFC0791] describes IPv4 fragmentation procedures.  IPv4 packets
   whose DF-bit is set to one cannot be fragmented by downstream
   routers.  IPv4 packets whose DF-bit is set to zero can be fragmented
   at the source node or by any downstream router.  [RFC8200] describes
   IPv6 fragmentation procedures.  IPv6 packets can be fragmented at the
   source node only.

   IPv4 fragmentation differs slightly from IPv6 fragmentation.
   However, in both IP versions, the upper-layer header appears in the
   first fragment only.  It does not appear in subsequent fragments.

   Upper-layer protocols can operate in the following modes:

   o  Do not rely on IP fragmentation.

   o  Rely on IP source fragmentation only (i.e., fragmentation at the
      source node).

   o  Rely on IP source fragmentation and downstream fragmentation
      (i.e., fragmentation at any node along the path).

   Upper-layer protocols running over IPv4 can operate in the first and
   third modes (above).  Upper-layer protocols running over IPv6 can
   operate in the first and second modes (above).

   Upper-layer protocols that operate in the first two modes (above)
   require access to the PMTU estimate.  In order to fulfill this
   requirement, they can:

   o  Estimate the PMTU to be equal to the IPv4 or IPv6 minimum link
      MTU.

   o  Access the estimate that PMTUD produced.

   o  Execute PMTUD procedures themselves.

   o  Execute Packetization Layer PMTUD (PLPMTUD) [RFC4821]
      [I-D.fairhurst-tsvwg-datagram-plpmtud] procedures.

   According to PLPMTUD procedures, the upper-layer protocol maintains a
   running PMTU estimate.  It does so by sending probe packets of
   various sizes to its peer and receiving acknowledgements.  This
   strategy differs from PMTUD in that it relies on acknowledgement of
   received messages, as opposed to ICMP PTB messages concerning dropped
   messages.
   Therefore, PLPMTUD does not rely on the network's ability to deliver
   ICMP PTB messages to the source.

   An upper-layer protocol that does not rely on IP fragmentation never
   causes the underlying IP module to emit:

   o  A fragmentable IP packet (i.e., an IPv4 packet with the DF-bit set
      to zero).

   o  An IP fragment.

   o  A packet whose length is greater than the PMTU estimate.

   However, when the PMTU estimate is greater than the actual PMTU, the
   upper-layer protocol can cause the underlying IP module to emit a
   packet whose length is greater than the actual PMTU.  When this
   occurs, a downstream router drops the packet and the source node
   refines its PMTU estimate, employing either PMTUD or PLPMTUD
   procedures.

   When an upper-layer protocol that relies on IP source fragmentation
   only submits data to the underlying IP module, and the resulting
   packet is larger than the PMTU estimate, the underlying IP module
   fragments the packet and emits the fragments.  However, the
   upper-layer protocol never causes the underlying IP module to emit:

   o  A fragmentable IP packet.

   o  A packet whose length is greater than the PMTU estimate.

   When the PMTU estimate is greater than the actual PMTU, the
   upper-layer protocol can cause the underlying IP module to emit a
   packet whose length is greater than the actual PMTU.  When this
   occurs, a downstream router drops the packet and the source node
   refines its PMTU estimate, employing either PMTUD or PLPMTUD
   procedures.

   An upper-layer protocol that relies on IP source fragmentation and
   downstream fragmentation can cause the underlying IP module to emit:

   o  A fragmentable IP packet.

   o  An IP fragment.

   o  A packet whose length is greater than the PMTU estimate.

   A protocol that relies on IP source fragmentation and downstream
   fragmentation does not require access to the PMTU estimate.
   For these protocols, the underlying IP module:

   o  Fragments all packets whose length exceeds the MTU of the first
      link along the path to the destination.

   o  Sets the DF-bit to zero, so that downstream nodes can fragment the
      packet.

3.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

4.  IP Fragmentation Reduces Reliability

   This section explains how IP fragmentation reduces the reliability of
   Internet communication.

4.1.  Middle Box Failures

   Many middle boxes require access to the transport-layer header.
   However, when a packet is divided into fragments, the transport-layer
   header appears in the first fragment only.  It does not appear in
   subsequent fragments.  This omission can prevent middle boxes from
   delivering their intended services.

   For example, assume that a router diverts selected packets from their
   normal path towards network appliances that support deep packet
   inspection and lawful intercept.  The router selects packets for
   diversion based upon the following 5-tuple:

   o  IP Source Address.

   o  IP Destination Address.

   o  IPv4 Protocol or IPv6 Next Header.

   o  Transport-layer source port.

   o  Transport-layer destination port.

   IP fragmentation causes this selection algorithm to behave
   suboptimally, because the transport-layer header appears only in the
   first fragment of each packet.  Subsequent fragments carry no port
   numbers and therefore cannot be matched against the 5-tuple.

   In another example, a middle box remarks a packet's Differentiated
   Services Code Point [RFC2474] based upon the above-mentioned 5-tuple.
   IP fragmentation causes this process to behave suboptimally, because
   the transport-layer header appears only in the first fragment of each
   packet.
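   The effect described above can be sketched in Python.  The sketch
   assumes IPv4, where every fragment carries the original Protocol
   value, and models each packet with a hypothetical Packet record; the
   addresses come from the documentation ranges and the diversion rule
   is invented for illustration.  Only the fragment with offset zero
   carries the transport-layer ports, so only that fragment matches the
   5-tuple rule.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Packet:
    """Hypothetical view of the header fields a middle box can read."""
    src: str
    dst: str
    protocol: int                    # IPv4 Protocol field
    frag_offset: int = 0             # 0: unfragmented packet or first fragment
    src_port: Optional[int] = None   # None when no transport header is present
    dst_port: Optional[int] = None


def flow_key(pkt: Packet) -> Tuple:
    """Return the 5-tuple if the transport header is visible, else a 3-tuple."""
    if pkt.frag_offset == 0 and pkt.src_port is not None and pkt.dst_port is not None:
        return (pkt.src, pkt.dst, pkt.protocol, pkt.src_port, pkt.dst_port)
    return (pkt.src, pkt.dst, pkt.protocol)


UDP = 17

# Hypothetical diversion rule: divert UDP traffic from 192.0.2.1:5000
# to 198.51.100.7:53 towards an inspection appliance.
DIVERT = {("192.0.2.1", "198.51.100.7", UDP, 5000, 53)}


def should_divert(pkt: Packet) -> bool:
    return flow_key(pkt) in DIVERT


first = Packet("192.0.2.1", "198.51.100.7", UDP, frag_offset=0,
               src_port=5000, dst_port=53)
# A subsequent fragment of the same packet carries no transport header.
second = Packet("192.0.2.1", "198.51.100.7", UDP, frag_offset=185)

print(should_divert(first))   # True
print(should_divert(second))  # False: this fragment escapes diversion
```

   A load-balancer that hashed flow_key() (see Section 4.3) would
   likewise compute two different keys for two fragments of the same
   packet.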
   In all of the above-mentioned examples, the middle box cannot deliver
   its intended service without reassembling fragmented packets.

4.2.  Partial Filtering

   IP fragments cause problems for firewalls whose filter rules include
   decision making based on TCP and UDP ports.  Because the port
   information is not present in the trailing fragments, the firewall
   can accept all trailing fragments, which may admit certain classes of
   attack; block all trailing fragments, which may block otherwise
   legitimate traffic; or reassemble all fragmented packets, which may
   be inefficient and negatively affect performance.

4.3.  Suboptimal Load Balancing

   Many stateless load-balancers require access to the transport-layer
   header.  Assume that a load-balancer distributes flows among parallel
   links.  In order to optimize load balancing, the load-balancer sends
   every packet or packet fragment belonging to a flow through the same
   link.

   In order to assign a packet or packet fragment to a link, the
   load-balancer executes an algorithm.  If the packet or packet
   fragment contains a transport-layer header, the load-balancing
   algorithm accepts the following 5-tuple as input:

   o  IP Source Address.

   o  IP Destination Address.

   o  IPv4 Protocol or IPv6 Next Header.

   o  Transport-layer source port.

   o  Transport-layer destination port.

   However, if the packet or packet fragment does not contain a
   transport-layer header, the load-balancing algorithm accepts only the
   following 3-tuple as input:

   o  IP Source Address.

   o  IP Destination Address.

   o  IPv4 Protocol or IPv6 Next Header.

   Therefore, non-fragmented packets belonging to a flow can be assigned
   to one link while fragmented packets belonging to the same flow can
   be divided between that link and another.  This can cause suboptimal
   load balancing.

4.4.  Security Vulnerabilities

   Security researchers have documented several attacks that rely on IP
   fragmentation.  The following are examples:

   o  Overlapping fragment attack [RFC1858] [RFC5722]

   o  Resource exhaustion attacks (such as the Rose Attack)

   o  Attacks based on predictable fragment Identification values
      [RFC7739]

   o  Attacks based on bugs in the implementation of the fragment
      reassembly algorithm

   o  Evasion of Network Intrusion Detection Systems (NIDS) [Ptacek1998]

   In the overlapping fragment attack, an attacker constructs a series
   of packet fragments.  The first fragment contains an IP header, a
   transport-layer header, and some transport-layer payload.  This
   fragment complies with local security policy and is allowed to pass
   through a stateless firewall.  A second fragment, having a non-zero
   offset, overlaps with the first fragment.  The second fragment also
   passes through the stateless firewall.  When the packet is
   reassembled, the transport-layer header from the first fragment is
   overwritten by data from the second fragment.  The reassembled packet
   does not comply with local security policy.  Had it traversed the
   firewall in one piece, the firewall would have rejected it.

   A stateless firewall cannot protect against the overlapping fragment
   attack.  However, destination nodes can protect against the
   overlapping fragment attack by implementing the reassembly procedures
   described in RFC 1858 and RFC 8200.  These reassembly procedures
   detect the overlap and discard the packet.

   The fragment reassembly algorithm is a stateful procedure for an
   otherwise stateless protocol.  As such, it can be exploited for
   resource exhaustion attacks.  An attacker can construct a series of
   fragmented packets, with one fragment missing from each packet such
   that the reassembly process cannot complete.
   Thus, this attack causes resource exhaustion on the destination node,
   possibly denying reassembly services to other flows.  This type of
   attack can be mitigated by flushing fragment reassembly buffers when
   necessary, at the expense of possibly dropping legitimate fragments.

   An IP fragment contains an "Identification" field that, together with
   the IP Source Address and Destination Address of a packet, identifies
   fragments that correspond to the same original datagram, such that
   they can be reassembled together by the receiving host.  Many
   implementations have employed predictable values for the
   Identification field, thus making it easy for an attacker to forge
   malicious IP fragments that would cause the reassembly procedure for
   legitimate packets to fail.

   Over the years, multiple IPv4 and IPv6 implementations have been
   found to have flaws in their implementation of the IP fragment
   reassembly algorithm, typically resulting in buffer overflows.  These
   buffer overflows have been exploitable for denial-of-service and
   remote code execution attacks.

   A NIDS aims to identify malicious activity by analyzing network
   traffic.  Ambiguity in the possible result of the fragment reassembly
   process may allow an attacker to evade these systems.  Many of these
   systems try to mitigate some of these evasion techniques by, for
   example, computing all possible outcomes of the fragment reassembly
   process, at the expense of increased processing requirements.

4.5.  Blackholing Due to ICMP Loss

   As stated above, an upper-layer protocol requires access to the PMTU
   estimate if it:

   o  Does not rely on IP fragmentation.

   o  Relies on IP source fragmentation only (i.e., fragmentation at the
      source node).

   In order to satisfy this requirement, the upper-layer protocol can:

   o  Estimate the PMTU to be equal to the IPv4 or IPv6 minimum link
      MTU.

   o  Access the estimate that PMTUD produced.
   o  Execute PMTUD procedures itself.

   o  Execute PLPMTUD procedures.

   PMTUD relies upon the network's ability to deliver ICMP PTB messages
   to the source node.  Therefore, if an upper-layer protocol relies on
   PMTUD for its PMTU estimate, it also relies on the network's ability
   to deliver ICMP PTB messages to the source node.

   [RFC4890] states that PTB messages must not be filtered.  However,
   ICMP delivery is not reliable.  It is subject to transient loss and,
   in some configurations, to more persistent delivery issues.

   ICMP rate limiting, network congestion and packet corruption can
   cause transient loss.  The effect of rate limiting may be severe, as
   RFC 4443 requires IPv6 nodes to limit the rate of the ICMPv6 error
   messages that they originate.

   While transient loss causes PMTUD to perform less efficiently, it
   does not cause PMTUD to fail completely.  When the conditions
   contributing to transient loss abate, the network regains its ability
   to deliver ICMP PTB messages and PMTUD regains its ability to
   function.

   By contrast, more persistent delivery issues cause PMTUD to fail
   completely.  Consider the following example:

   A DNS client sends a request to an anycast address.  The network
   routes that DNS request to the nearest instance of that anycast
   address (i.e., a DNS server).  The DNS server generates a response
   and sends it back to the DNS client.  While the response does not
   exceed the DNS server's PMTU estimate, it does exceed the actual
   PMTU.

   A downstream router drops the packet and sends an ICMP PTB message to
   the packet's source (i.e., the anycast address).  The network routes
   the ICMP PTB message to the anycast instance closest to the
   downstream router.  However, that anycast instance may not be the DNS
   server that originated the DNS response.  It may be another DNS
   server with the same anycast address.
   The DNS server that originated the response may never receive the
   ICMP PTB message and may never update its PMTU estimate.

   The problem described in this section is specific to PMTUD.  It does
   not occur when the upper-layer protocol obtains its PMTU estimate
   from PLPMTUD or any other source.

   Furthermore, the problem described in this section occurs when the
   upper-layer protocol does not rely on IP fragmentation, as well as
   when the upper-layer protocol relies on IP source fragmentation only.

4.6.  Blackholing Due To Filtering

   In RFC 7872, researchers sampled Internet paths to determine whether
   they would convey packets that contain IPv6 extension headers.
   Sampled paths terminated at popular Internet sites (e.g., popular
   web, mail and DNS servers).

   The study revealed that at least 28% of the sampled paths did not
   convey packets containing the IPv6 Fragment extension header.  In
   most cases, fragments were dropped in the destination autonomous
   system.  In other cases, the fragments were dropped in transit
   autonomous systems.

   Another recent study [Huston] confirmed this finding.  It reported
   that 37% of sampled endpoints used IPv6-capable DNS resolvers that
   were incapable of receiving a fragmented IPv6 response.

   It is difficult to determine why network operators drop fragments.
   In some cases, packet drop may be caused by misconfiguration.  In
   other cases, network operators may consciously choose to drop IPv6
   fragments, in order to address the issues raised in Section 4.1
   through Section 4.5, above.

5.  Alternatives to IP Fragmentation

5.1.  Transport Layer Solutions

   The Transmission Control Protocol (TCP) [RFC0793] can be operated in
   a mode that does not require IP fragmentation.

   Applications submit a stream of data to TCP.  TCP divides that stream
   of data into segments, with no segment exceeding the TCP Maximum
   Segment Size (MSS).
   Each segment is encapsulated in a TCP header and submitted to the
   underlying IP module.  The underlying IP module prepends an IP header
   and forwards the resulting packet.

   If the TCP MSS is sufficiently small, the underlying IP module never
   produces a packet whose length is greater than the actual PMTU.
   Therefore, IP fragmentation is not required.

   TCP offers the following mechanisms for MSS management:

   o  Manual configuration

   o  PMTUD

   o  PLPMTUD

   For IPv6 nodes, manual configuration is always applicable.  If the
   MSS is manually configured to 1220 bytes and the packet contains
   neither IPv6 extension headers nor TCP options, the IP layer will
   never produce a packet whose length is greater than the IPv6 minimum
   link MTU (1220 bytes of payload, plus a 20-byte TCP header, plus a
   40-byte IPv6 header, equals 1280 bytes).  However, manual
   configuration prevents TCP from taking advantage of larger link MTUs.

   RFC 8200 strongly recommends that IPv6 nodes implement PMTUD, in
   order to discover and take advantage of path MTUs greater than 1280
   bytes.  However, as mentioned in Section 2.1, PMTUD relies upon the
   network's ability to deliver ICMP PTB messages.  Therefore, PMTUD is
   applicable only in environments where the risk of ICMP PTB loss is
   acceptable.

   By contrast, PLPMTUD does not rely upon the network's ability to
   deliver ICMP PTB messages.  However, in many loss-based TCP
   congestion control algorithms, the dropping of a packet may cause the
   TCP sender to reduce its congestion window, or even to restart the
   entire slow start process.  For high-capacity, long-RTT, large-volume
   TCP streams, deliberately probing with large packets, and the
   consequent packet drops, may impose too harsh a penalty on total TCP
   throughput for PLPMTUD to be a viable approach.  [RFC4821] defines
   PLPMTUD procedures for TCP.
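   The MSS arithmetic above, and a PLPMTUD-style search, can be sketched
   in Python.  The sketch assumes IPv6 with no extension headers and a
   TCP header without options.  The search is a plain binary search over
   probe sizes, which is only one possible search strategy and not the
   exact procedure specified in [RFC4821]; the probe callback and the
   simulated PMTU of 1400 bytes are hypothetical.

```python
from typing import Callable, List

IPV6_HEADER_LEN = 40     # fixed IPv6 header, no extension headers (assumed)
TCP_HEADER_LEN = 20      # TCP header without options (assumed)
IPV6_MIN_LINK_MTU = 1280


def mss_for_mtu(mtu: int) -> int:
    """Largest TCP payload that fits in one IPv6 packet of `mtu` bytes."""
    return mtu - IPV6_HEADER_LEN - TCP_HEADER_LEN


def plpmtud_search(probe: Callable[[int], bool], first_hop_mtu: int) -> int:
    """Binary-search the largest packet size whose probe is acknowledged.

    probe(size) is assumed to send a probe packet of `size` bytes and to
    return True only if the peer acknowledges it.  No ICMP PTB message
    is consulted: a lost probe simply means "too big".
    """
    low = IPV6_MIN_LINK_MTU   # assumed deliverable on any IPv6 path
    high = first_hop_mtu      # the PMTU cannot exceed the first-hop MTU
    while low < high:
        candidate = (low + high + 1) // 2
        if probe(candidate):
            low = candidate          # acknowledged: PMTU >= candidate
        else:
            high = candidate - 1     # lost: PMTU < candidate
    return low


# Simulated path whose actual PMTU is 1400 bytes (hypothetical value).
ACTUAL_PMTU = 1400
probes_sent: List[int] = []


def simulated_probe(size: int) -> bool:
    probes_sent.append(size)
    return size <= ACTUAL_PMTU


estimate = plpmtud_search(simulated_probe, first_hop_mtu=1500)
print(mss_for_mtu(IPV6_MIN_LINK_MTU))   # 1220
print(estimate, mss_for_mtu(estimate))  # 1400 1340
print(len(probes_sent))                 # 8
```

   In this simulation the search sends eight probes, four of which
   (1445, 1417, 1403 and 1401 bytes) are lost.  Each lost probe is a
   deliberate packet drop of the kind whose cost to congestion control
   is discussed above.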
   While TCP will never cause the underlying IP module to emit a packet
   that is larger than the PMTU estimate, it can cause the underlying IP
   module to emit a packet that is larger than the actual PMTU.  If this
   occurs, the packet is dropped, the PMTU estimate is updated, the
   segment is divided into smaller segments and each smaller segment is
   submitted to the underlying IP module.

   The Datagram Congestion Control Protocol (DCCP) [RFC4340] and the
   Stream Control Transmission Protocol (SCTP) [RFC4960] can also be
   operated in a mode that does not require IP fragmentation.  They both
   accept data from an application and divide that data into segments,
   with no segment exceeding a maximum size.  Both DCCP and SCTP offer
   manual configuration, PMTUD and PLPMTUD as mechanisms for managing
   that maximum size.  [I-D.fairhurst-tsvwg-datagram-plpmtud] proposes
   PLPMTUD procedures for DCCP and SCTP.

5.2.  Application Layer Solutions

   [RFC8085] recognizes that IP fragmentation reduces the reliability of
   Internet communication.  Therefore, it offers the following advice
   regarding applications that run over the User Datagram Protocol (UDP)
   [RFC0768]:

      "An application SHOULD NOT send UDP datagrams that result in IP
      packets that exceed the Maximum Transmission Unit (MTU) along the
      path to the destination.  Consequently, an application SHOULD
      either use the path MTU information provided by the IP layer or
      implement Path MTU Discovery (PMTUD) itself to determine whether
      the path to a destination will support its desired message size
      without fragmentation."

   RFC 8085 continues:

      "Applications that do not follow the recommendation to do
      PMTU/PLPMTUD discovery SHOULD still avoid sending UDP datagrams
      that would result in IP packets that exceed the path MTU.
      Because the actual path MTU is unknown, such applications SHOULD
      fall back to sending messages that are shorter than the default
      effective MTU for sending (EMTU_S in [RFC1122]).  For IPv4, EMTU_S
      is the smaller of 576 bytes and the first-hop MTU.  For IPv6,
      EMTU_S is 1280 bytes.  The effective PMTU for a directly connected
      destination (with no routers on the path) is the configured
      interface MTU, which could be less than the maximum link payload
      size.  Transmission of minimum-sized UDP datagrams is inefficient
      over paths that support a larger PMTU, which is a second reason to
      implement PMTU discovery."

   RFC 8085 assumes that for IPv4, an EMTU_S of 576 is sufficiently
   small, even though the IPv4 minimum link MTU is 68 bytes.

   This advice applies equally to applications that run directly over
   IP.

6.  Applications That Rely on IPv6 Fragmentation

   The following applications rely on IPv6 fragmentation:

   o  DNS [RFC1035]

   o  OSPFv3 [RFC5340]

   o  IP Encapsulations

   Each of these applications relies on IPv6 fragmentation to a varying
   degree.  In some cases, that reliance is essential, and cannot be
   broken without fundamentally changing the protocol.  In other cases,
   that reliance is incidental, and most implementations already take
   appropriate steps to avoid fragmentation.

   This list is not comprehensive, and other protocols that rely on IPv6
   fragmentation may exist.  They are not specifically considered in the
   context of this document.

6.1.  DNS

   DNS can obtain transport services from either UDP or TCP.  Superior
   performance and scaling characteristics are observed when DNS runs
   over UDP.

   DNS servers that execute DNSSEC [RFC4035] procedures are more likely
   to generate large responses.  Therefore, when running over UDP, they
   are more likely to cause the generation of IPv6 fragments.
DNS's reliance upon IPv6 fragmentation is fundamental and cannot be broken without changing the DNS specification.

DNS is an essential part of the Internet architecture. Therefore, this issue is for further study and must be resolved before DNSSEC can be deployed successfully in IPv6-only networks.

6.2. OSPFv3

OSPFv3 implementations can emit messages large enough to cause IPv6 fragmentation. However, in keeping with the recommendations of [RFC8200], and in order to optimize performance, most OSPFv3 implementations restrict their maximum message size to the IPv6 minimum link MTU.

6.3. IP Encapsulations

In this document, IP encapsulations include IP-in-IP [RFC2003], Generic Routing Encapsulation (GRE) [RFC2784], GRE-in-UDP [RFC8086], and Generic Packet Tunneling in IPv6 [RFC2473]. The fragmentation strategy described for GRE in [RFC7588] has been deployed for all of the above-mentioned IP encapsulations. This strategy does not rely on IPv6 fragmentation except in one corner case (see Section 3.3.2.2 of [RFC7588] and Section 7.1 of [RFC2473]). Section 3.3 of [RFC7676] further describes this corner case.

7. Recommendation

7.1. For Application Developers

Application developers SHOULD NOT develop applications that rely on IPv6 fragmentation.

Application-layer protocols that depend upon IPv6 fragmentation SHOULD be updated to break that dependency.

7.2. For Network Operators

As per [RFC4890], network operators MUST NOT filter ICMPv6 Packet Too Big (PTB) messages unless they are known to be forged or otherwise illegitimate. As stated in Section 4.5, filtering ICMPv6 PTB messages causes PMTUD to fail. Many upper-layer protocols rely on PMTUD.

8. IANA Considerations

This document makes no request of IANA.

9. Security Considerations

This document mitigates some of the security considerations associated with IP fragmentation by discouraging the use of IP fragmentation. It does not introduce any new security vulnerabilities, because it does not introduce any new alternatives to IP fragmentation. Instead, it recommends well-understood alternatives.

10. Acknowledgements

TBD

11. References

11.1. Normative References

[RFC0768] Postel, J., "User Datagram Protocol", STD 6, RFC 768, DOI 10.17487/RFC0768, August 1980.

[RFC0791] Postel, J., "Internet Protocol", STD 5, RFC 791, DOI 10.17487/RFC0791, September 1981.

[RFC0792] Postel, J., "Internet Control Message Protocol", STD 5, RFC 792, DOI 10.17487/RFC0792, September 1981.

[RFC0793] Postel, J., "Transmission Control Protocol", STD 7, RFC 793, DOI 10.17487/RFC0793, September 1981.

[RFC1035] Mockapetris, P., "Domain names - implementation and specification", STD 13, RFC 1035, DOI 10.17487/RFC1035, November 1987.

[RFC1191] Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191, DOI 10.17487/RFC1191, November 1990.

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997.

[RFC4443] Conta, A., Deering, S., and M. Gupta, Ed., "Internet Control Message Protocol (ICMPv6) for the Internet Protocol Version 6 (IPv6) Specification", STD 89, RFC 4443, DOI 10.17487/RFC4443, March 2006.

[RFC4821] Mathis, M. and J. Heffner, "Packetization Layer Path MTU Discovery", RFC 4821, DOI 10.17487/RFC4821, March 2007.

[RFC8085] Eggert, L., Fairhurst, G., and G. Shepherd, "UDP Usage Guidelines", BCP 145, RFC 8085, DOI 10.17487/RFC8085, March 2017.
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017.

[RFC8200] Deering, S. and R. Hinden, "Internet Protocol, Version 6 (IPv6) Specification", STD 86, RFC 8200, DOI 10.17487/RFC8200, July 2017.

[RFC8201] McCann, J., Deering, S., Mogul, J., and R. Hinden, Ed., "Path MTU Discovery for IP version 6", STD 87, RFC 8201, DOI 10.17487/RFC8201, July 2017.

11.2. Informative References

[Anderson2001] Anderson, J., "An Analysis of Fragmentation Attacks", 2001.

[Huston] Huston, G., "IPv6, Large UDP Packets and the DNS (http://www.potaroo.net/ispcol/2017-08/xtn-hdrs.html)", August 2017.

[I-D.fairhurst-tsvwg-datagram-plpmtud] Fairhurst, G., Jones, T., Tuexen, M., and I. Ruengeler, "Packetization Layer Path MTU Discovery for Datagram Transports", draft-fairhurst-tsvwg-datagram-plpmtud-02 (work in progress), December 2017.

[Ptacek1998] Ptacek, T. and T. Newsham, "Insertion, Evasion and Denial of Service: Eluding Network Intrusion Detection", 1998.

[RFC1122] Braden, R., Ed., "Requirements for Internet Hosts - Communication Layers", STD 3, RFC 1122, DOI 10.17487/RFC1122, October 1989.

[RFC1858] Ziemba, G., Reed, D., and P. Traina, "Security Considerations for IP Fragment Filtering", RFC 1858, DOI 10.17487/RFC1858, October 1995.

[RFC2003] Perkins, C., "IP Encapsulation within IP", RFC 2003, DOI 10.17487/RFC2003, October 1996.

[RFC2473] Conta, A. and S. Deering, "Generic Packet Tunneling in IPv6 Specification", RFC 2473, DOI 10.17487/RFC2473, December 1998.

[RFC2474] Nichols, K., Blake, S., Baker, F., and D. Black, "Definition of the Differentiated Services Field (DS Field) in the IPv4 and IPv6 Headers", RFC 2474, DOI 10.17487/RFC2474, December 1998.
[RFC2784] Farinacci, D., Li, T., Hanks, S., Meyer, D., and P. Traina, "Generic Routing Encapsulation (GRE)", RFC 2784, DOI 10.17487/RFC2784, March 2000.

[RFC4035] Arends, R., Austein, R., Larson, M., Massey, D., and S. Rose, "Protocol Modifications for the DNS Security Extensions", RFC 4035, DOI 10.17487/RFC4035, March 2005.

[RFC4340] Kohler, E., Handley, M., and S. Floyd, "Datagram Congestion Control Protocol (DCCP)", RFC 4340, DOI 10.17487/RFC4340, March 2006.

[RFC4890] Davies, E. and J. Mohacsi, "Recommendations for Filtering ICMPv6 Messages in Firewalls", RFC 4890, DOI 10.17487/RFC4890, May 2007.

[RFC4960] Stewart, R., Ed., "Stream Control Transmission Protocol", RFC 4960, DOI 10.17487/RFC4960, September 2007.

[RFC5340] Coltun, R., Ferguson, D., Moy, J., and A. Lindem, "OSPF for IPv6", RFC 5340, DOI 10.17487/RFC5340, July 2008.

[RFC5722] Krishnan, S., "Handling of Overlapping IPv6 Fragments", RFC 5722, DOI 10.17487/RFC5722, December 2009.

[RFC5927] Gont, F., "ICMP Attacks against TCP", RFC 5927, DOI 10.17487/RFC5927, July 2010.

[RFC7588] Bonica, R., Pignataro, C., and J. Touch, "A Widely Deployed Solution to the Generic Routing Encapsulation (GRE) Fragmentation Problem", RFC 7588, DOI 10.17487/RFC7588, July 2015.

[RFC7676] Pignataro, C., Bonica, R., and S. Krishnan, "IPv6 Support for Generic Routing Encapsulation (GRE)", RFC 7676, DOI 10.17487/RFC7676, October 2015.

[RFC7739] Gont, F., "Security Implications of Predictable Fragment Identification Values", RFC 7739, DOI 10.17487/RFC7739, February 2016.

[RFC7872] Gont, F., Linkova, J., Chown, T., and W. Liu, "Observations on the Dropping of Packets with IPv6 Extension Headers in the Real World", RFC 7872, DOI 10.17487/RFC7872, June 2016.

[RFC8086] Yong, L., Ed., Crabbe, E., Xu, X., and T. Herbert, "GRE-in-UDP Encapsulation", RFC 8086, DOI 10.17487/RFC8086, March 2017.

Appendix A. Contributors' Address

Authors' Addresses

Ron Bonica
Juniper Networks
2251 Corporate Park Drive
Herndon, Virginia 20171
USA

Email: rbonica@juniper.net

Fred Baker
Unaffiliated
Santa Barbara, California 93117
USA

Email: FredBaker.IETF@gmail.com

Geoff Huston
APNIC
6 Cordelia St
Brisbane, 4101 QLD
Australia

Email: gih@apnic.net

Robert M. Hinden
Check Point Software
959 Skyway Road
San Carlos, California 94070
USA

Email: bob.hinden@gmail.com

Ole Troan
Cisco
Philip Pedersens vei 1
N-1366 Lysaker
Norway

Email: ot@cisco.com

Fernando Gont
SI6 Networks
Evaristo Carriego 2644
Haedo, Provincia de Buenos Aires
Argentina

Email: fgont@si6networks.com