Internet Engineering Task Force                             G. Fairhurst
Internet-Draft                                                  T. Jones
Updates: 4821, 4960, 8085 (if approved)           University of Aberdeen
Intended status: Standards Track                               M. Tuexen
Expires: 23 July 2020                                       I. Ruengeler
                                                              T. Voelker
                                 Muenster University of Applied Sciences
                                                         20 January 2020

     Packetization Layer Path MTU Discovery for Datagram Transports
                  draft-ietf-tsvwg-datagram-plpmtud-13

Abstract

   This document describes a robust method for Path MTU Discovery
   (PMTUD) for datagram Packetization Layers (PLs).  It describes an
   extension to RFC 1191 and RFC 8201, which specify ICMP-based Path
   MTU Discovery for IPv4 and IPv6.  The method allows a PL, or a
   datagram application that uses a PL, to discover whether a network
   path can support the current size of datagram.  This can be used to
   detect and reduce the message size when a sender encounters a packet
   black hole (where packets are discarded).  The method can probe a
   network path with progressively larger packets to discover whether
   the maximum packet size can be increased.  This allows a sender to
   determine an appropriate packet size, providing functionality for
   datagram transports that is equivalent to the Packetization Layer
   PMTUD specification for TCP, specified in RFC 4821.

   The document updates RFC 4821 to specify the method for datagram PLs,
   and updates RFC 8085 as the method to use in place of RFC 4821 with
   UDP datagrams.  Section 7.3 of RFC 4960 recommends an endpoint apply
   the techniques in RFC 4821 on a per-destination-address basis.
   RFC 4960 is updated to recommend that SCTP uses the method specified
   in this document instead of the method in RFC 4821.
   The document also provides implementation notes for incorporating
   Datagram PMTUD into IETF datagram transports or applications that use
   datagram transports.

   When published, this specification updates RFC 4821, RFC 4960, and
   RFC 8085.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 23 July 2020.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components
   extracted from this document must include Simplified BSD License text
   as described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Classical Path MTU Discovery
     1.2.  Packetization Layer Path MTU Discovery
     1.3.  Path MTU Discovery for Datagram Services
   2.  Terminology
   3.  Features Required to Provide Datagram PLPMTUD
   4.  DPLPMTUD Mechanisms
     4.1.  PLPMTU Probe Packets
     4.2.  Confirmation of Probed Packet Size
     4.3.  Black Hole Detection
     4.4.  The Maximum Packet Size (MPS)
     4.5.  Disabling the Effect of PMTUD
     4.6.  Response to PTB Messages
       4.6.1.  Validation of PTB Messages
       4.6.2.  Use of PTB Messages
   5.  Datagram Packetization Layer PMTUD
     5.1.  DPLPMTUD Components
       5.1.1.  Timers
       5.1.2.  Constants
       5.1.3.  Variables
       5.1.4.  Overview of DPLPMTUD Phases
     5.2.  State Machine
     5.3.  Search to Increase the PLPMTU
       5.3.1.  Probing for a larger PLPMTU
       5.3.2.  Selection of Probe Sizes
       5.3.3.  Resilience to Inconsistent Path Information
     5.4.  Robustness to Inconsistent Paths
   6.  Specification of Protocol-Specific Methods
     6.1.  Application support for DPLPMTUD with UDP or UDP-Lite
       6.1.1.  Application Request
       6.1.2.  Application Response
       6.1.3.  Sending Application Probe Packets
       6.1.4.  Initial Connectivity
       6.1.5.  Validating the Path
       6.1.6.  Handling of PTB Messages
     6.2.  DPLPMTUD for SCTP
       6.2.1.  SCTP/IPv4 and SCTP/IPv6
       6.2.2.  DPLPMTUD for SCTP/UDP
       6.2.3.  DPLPMTUD for SCTP/DTLS
     6.3.  DPLPMTUD for QUIC
       6.3.1.  Initial Connectivity
       6.3.2.  Sending QUIC Probe Packets
       6.3.3.  Validating the Path with QUIC
       6.3.4.  Handling of PTB Messages by QUIC
   7.  Acknowledgements
   8.  IANA Considerations
   9.  Security Considerations
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Appendix A.  Revision Notes
   Authors' Addresses

1.  Introduction

   The IETF has specified datagram transport using UDP, SCTP, and DCCP,
   as well as protocols layered on top of these transports (e.g.,
   SCTP/UDP, DCCP/UDP, QUIC/UDP), and direct datagram transport over the
   IP network layer.  This document describes a robust method for Path
   MTU Discovery (PMTUD) that can be used with these transport protocols
   (or the applications that use their transport service) to discover an
   appropriate size of packet to use across an Internet path.

1.1.  Classical Path MTU Discovery

   Classical Path Maximum Transmission Unit Discovery (PMTUD) can be
   used with any transport that is able to process ICMP Packet Too Big
   (PTB) messages (e.g., [RFC1191] and [RFC8201]).  In this document,
   the term PTB message is applied to both IPv4 ICMP Destination
   Unreachable messages (Type 3) that carry the error Fragmentation
   Needed (Type 3, Code 4) [RFC0792] and ICMPv6 Packet Too Big messages
   (Type 2) [RFC4443].  When a sender receives a PTB message, it reduces
   the effective MTU to the value reported as the Link MTU in the PTB
   message.  A method from time to time increases the packet size in an
   attempt to discover an increase in the supported PMTU.  The packets
   sent with a size larger than the current effective PMTU are known as
   probe packets.

   Packets not intended as probe packets are either fragmented to the
   current effective PMTU, or the attempt to send fails with an error
   code.  Applications can be provided with a primitive to let them read
   the Maximum Packet Size (MPS), derived from the current effective
   PMTU.

   Classical PMTUD is subject to protocol failures.  One failure arises
   when traffic using a packet size larger than the actual PMTU is
   black-holed (all datagrams sent with this size, or larger, are
   discarded).  This could arise when the PTB messages are not delivered
   back to the sender for some reason (see for example [RFC2923]).

   Examples where PTB messages are not delivered include:

   *  The generation of ICMP messages is usually rate limited.  This
      could result in no PTB messages being generated to the sender (see
      Section 2.4 of [RFC4443]).

   *  ICMP messages can be filtered by middleboxes (including firewalls)
      [RFC4890].  A stateful firewall could be configured with a policy
      to block incoming ICMP messages, which would prevent reception of
      PTB messages by a sending endpoint behind this firewall.
   *  When the router issuing the ICMP message drops a tunneled packet,
      the resulting ICMP message will be directed to the tunnel ingress.
      This tunnel endpoint is responsible for forwarding the ICMP
      message and also processing the quoted packet within the payload
      field to remove the effect of the tunnel, and return a correctly
      formatted ICMP message to the sender [I-D.ietf-intarea-tunnels].
      Failure to do this prevents the PTB message reaching the original
      sender.

   *  Asymmetry in forwarding can result in there being no return route
      to the original sender, which would prevent an ICMP message being
      delivered to the sender.  This issue can also arise when
      policy-based routing is used, Equal Cost Multipath (ECMP) routing
      is used, or a middlebox acts as an application load balancer.  An
      example is where the path towards the server is chosen by ECMP
      routing depending on bytes in the IP payload.  In this case, when
      a packet sent by the server encounters a problem after the ECMP
      router, then any resulting ICMP message also needs to be directed
      by the ECMP router towards the original sender.

   *  There are additional cases where the next-hop destination fails to
      receive a packet because of its size.  This could be due to
      misconfiguration of the layer 2 path between nodes, for instance
      the MTU configured in a layer 2 switch, or misconfiguration of the
      Maximum Receive Unit (MRU).  If a packet is dropped by the link,
      this will not cause a PTB message to be sent to the original
      sender.

   Another failure could result if a node that is not on the network
   path sends a PTB message that attempts to force a sender to change
   the effective PMTU [RFC8201].
   A sender can protect itself from reacting to such messages by
   utilizing the quoted packet within a PTB message payload to validate
   that the received PTB message was generated in response to a packet
   that had actually originated from the sender.  However, there are
   situations where a sender would be unable to provide this
   validation.  Examples where validation of the PTB message is not
   possible include:

   *  When a router issuing the ICMP message implements RFC 792
      [RFC0792], it is only required to include the first 64 bits of the
      IP payload of the packet within the quoted payload.  There could
      be insufficient bytes remaining for the sender to interpret the
      quoted transport information.

      Note: The recommendation in RFC 1812 [RFC1812] is that IPv4
      routers return a quoted packet with as much of the original
      datagram as possible without the length of the ICMP datagram
      exceeding 576 bytes.  IPv6 routers include as much of the invoking
      packet as possible without the ICMPv6 packet exceeding 1280 bytes
      [RFC4443].

   *  The use of tunnels/encryption can reduce the size of the quoted
      packet returned to the original source address, increasing the
      risk that there could be insufficient bytes remaining for the
      sender to interpret the quoted transport information.

   *  Even when the PTB message includes sufficient bytes of the quoted
      packet, the network layer could lack sufficient context to
      validate the message, because validation depends on information
      about the active transport flows at an endpoint node (e.g., the
      socket/address pairs being used, and other protocol header
      information).

   *  When a packet is encapsulated/tunneled over an encrypted
      transport, the tunnel/encapsulation ingress might have
      insufficient context, or computational power, to reconstruct the
      transport header that would be needed to perform validation.
   *  A Network Address Translation (NAT) device that translates a
      packet header ought to also translate ICMP messages and update the
      ICMP quoted packet [RFC5508] in that message.  If this is not
      correctly translated, then the sender would not be able to
      associate the message with the PL that originated the packet, and
      hence this ICMP message cannot be validated.

1.2.  Packetization Layer Path MTU Discovery

   The term Packetization Layer (PL) has been introduced to describe the
   layer that is responsible for placing data blocks into the payload of
   IP packets and selecting an appropriate MPS.  This function is often
   performed by a transport protocol (e.g., DCCP, RTP, SCTP, QUIC), but
   can also be performed by other encapsulation methods working above
   the transport layer.

   In contrast to PMTUD, Packetization Layer Path MTU Discovery
   (PLPMTUD) [RFC4821] introduced a method that does not rely upon
   reception and validation of PTB messages.  It is therefore more
   robust than Classical PMTUD.  This has become the recommended
   approach for implementing discovery of the PMTU [RFC8085].

   PLPMTUD uses a general strategy where the PL sends probe packets to
   search for the largest size of unfragmented datagram that can be sent
   over a network path.  Probe packets are sent to explore using a
   larger packet size.  If a probe packet is successfully delivered (as
   determined by the PL), then the PLPMTU is raised to the size of the
   successful probe.  If no response is received to a probe packet, the
   method then reduces the PLPMTU.

   Datagram PLPMTUD introduces flexibility in implementation.  At one
   extreme, it can be configured to only perform Black Hole Detection
   and recovery with increased robustness compared to Classical PMTUD.
   At the other extreme, all PTB processing can be disabled, and PLPMTUD
   replaces Classical PMTUD.
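   The general probing strategy described above can be illustrated with
   a minimal, synchronous Python sketch.  It is not the DPLPMTUD state
   machine of Section 5.2.  The callback `path_delivers` is a
   hypothetical stand-in for sending a probe and waiting for the PL's
   confirmation, and the sketch assumes that a lost probe means the
   probed size is too large, which a real PL cannot conclude from a
   single loss.  Bisection is used here only as one possible way to
   choose probe sizes.

```python
from typing import Callable


def search_plpmtu(base_plpmtu: int, max_pmtu: int,
                  path_delivers: Callable[[int], bool]) -> int:
    """Bisect between a confirmed size and MAX_PMTU for the largest size
    whose probe is confirmed.

    base_plpmtu: a size already known to work (the starting PLPMTU).
    max_pmtu: the largest size worth probing (e.g., the local Link MTU).
    path_delivers: hypothetical stand-in for "send a probe of this size
        and report whether the PL confirmed its reception".
    """
    plpmtu = base_plpmtu       # largest confirmed size so far
    too_big = max_pmtu + 1     # smallest size assumed to be undeliverable
    while too_big - plpmtu > 1:
        probe_size = (plpmtu + too_big) // 2
        if path_delivers(probe_size):
            plpmtu = probe_size    # probe confirmed: raise the PLPMTU
        else:
            too_big = probe_size   # probe lost: search below this size
    return plpmtu


# Usage: simulate a path whose actual PMTU is 1400 bytes.
print(search_plpmtu(1280, 1500, lambda size: size <= 1400))  # 1400
```

   Searching to single-byte precision keeps the example short; a
   practical implementation would likely use coarser probe steps and
   retry lost probes before concluding that a size is unsupported.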
   PLPMTUD can also include additional consistency checks without
   increasing the risk that data is lost when probing to discover the
   Path MTU.  For example, information available at the PL, or higher
   layers, enables received PTB messages to be validated before being
   utilized.

1.3.  Path MTU Discovery for Datagram Services

   Section 5 of this document presents a set of algorithms for datagram
   protocols to discover the largest size of unfragmented datagram that
   can be sent over a network path.  The method relies upon features of
   the PL described in Section 3 and applies to transport protocols
   operating over IPv4 and IPv6.  It does not require cooperation from
   the lower layers, although it can utilize PTB messages when these
   received messages are made available to the PL.

   The message size guidelines in Section 3.2 of the UDP Usage
   Guidelines [RFC8085] state "an application SHOULD either use the Path
   MTU information provided by the IP layer or implement Path MTU
   Discovery (PMTUD)", but do not provide a mechanism for discovering
   the largest size of unfragmented datagram that can be used on a
   network path.  The present document updates RFC 8085 to specify this
   method in place of PLPMTUD [RFC4821] and provides a mechanism for
   sharing the discovered largest size as the Maximum Packet Size (MPS)
   (see Section 4.4).

   Section 10.2 of [RFC4821] recommended a PLPMTUD probing method for
   the Stream Control Transmission Protocol (SCTP).  SCTP utilizes probe
   packets consisting of a minimal-sized HEARTBEAT chunk bundled with a
   PAD chunk as defined in [RFC4820].  However, RFC 4821 did not provide
   a complete specification.  The present document replaces this by
   providing a complete specification.
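   As an illustration of the HEARTBEAT-plus-PAD probe mentioned above,
   the following Python sketch computes the chunk bytes for a probe of a
   given IP packet size.  It assumes the chunk formats of [RFC4960]
   (HEARTBEAT chunk type 4 carrying a Heartbeat Info parameter of type
   1) and [RFC4820] (PAD chunk type 0x84), an IP header without options
   or extension headers, and probe sizes that are a multiple of 4 bytes.
   The SCTP common header (ports, verification tag, checksum) is assumed
   to be built elsewhere and is only counted here; `build_probe_chunks`
   is a hypothetical helper, not an existing API.

```python
import struct

SCTP_COMMON_HEADER_LEN = 12     # ports, verification tag, checksum
CHUNK_HEADER_LEN = 4            # type, flags, length
HEARTBEAT_CHUNK_TYPE = 4        # [RFC4960]
HEARTBEAT_INFO_PARAM_TYPE = 1   # [RFC4960]
PAD_CHUNK_TYPE = 0x84           # [RFC4820]


def pad4(n: int) -> int:
    """Round n up to a multiple of 4 bytes, as SCTP chunks require."""
    return (n + 3) & ~3


def build_probe_chunks(probe_size: int, ip_header_len: int,
                       hb_info: bytes) -> bytes:
    """Return a HEARTBEAT chunk plus a PAD chunk sized so that the whole
    IP packet (IP header + SCTP common header + chunks) is probe_size."""
    param_len = 4 + len(hb_info)  # parameter header + heartbeat info
    hb = struct.pack("!BBH", HEARTBEAT_CHUNK_TYPE, 0,
                     CHUNK_HEADER_LEN + param_len)
    hb += struct.pack("!HH", HEARTBEAT_INFO_PARAM_TYPE, param_len) + hb_info
    hb += b"\x00" * (pad4(len(hb)) - len(hb))  # not counted in length

    pad_len = probe_size - ip_header_len - SCTP_COMMON_HEADER_LEN - len(hb)
    if pad_len < CHUNK_HEADER_LEN or pad_len % 4:
        raise ValueError("probe size not reachable with one PAD chunk")
    pad = struct.pack("!BBH", PAD_CHUNK_TYPE, 0, pad_len)
    pad += b"\x00" * (pad_len - CHUNK_HEADER_LEN)  # padding data
    return hb + pad


# Usage: a 1400-byte IPv4 probe carrying 8 bytes of heartbeat info.
chunks = build_probe_chunks(1400, 20, b"\x00" * 8)
print(len(chunks))  # 1368
```

   In this example the HEARTBEAT chunk occupies 16 bytes and the PAD
   chunk fills the remaining 1352 bytes, so that 20 + 12 + 1368 = 1400.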
   The Datagram Congestion Control Protocol (DCCP) [RFC4340] requires
   implementations to support Classical PMTUD and states that a DCCP
   sender "MUST maintain the MPS allowed for each active DCCP session".
   It also defines the current congestion control MPS (CCMPS) supported
   by a network path.  RFC 4340 recommends the use of PMTUD and suggests
   using control packets (DCCP-Sync) as path probe packets, because they
   do not risk application data loss.  The method defined in this
   specification can be used with DCCP.

   Section 6 specifies the method for datagram transports and provides
   information to enable the implementation of PLPMTUD with other
   datagram transports and applications that use datagram transports.

2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

   Other terminology is copied directly from [RFC4821] and from the
   definitions in [RFC1122].

   Actual PMTU:  The Actual PMTU is the PMTU of a network path between a
      sender PL and a destination PL, which the DPLPMTUD algorithm seeks
      to determine.

   Black Hole:  A Black Hole is encountered when a sender is unaware
      that packets are not being delivered to the destination endpoint.
      Two types of Black Hole are relevant to DPLPMTUD:

      *  Packets encounter a packet Black Hole when packets are not
         delivered to the destination endpoint (e.g., when the sender
         transmits packets of a particular size with a previously known
         effective PMTU and they are discarded by the network).

      *  An ICMP Black Hole is encountered when the sender is unaware
         that packets are not delivered to the destination endpoint
         because PTB messages are not received by the originating PL
         sender.
   Classical Path MTU Discovery:  Classical PMTUD is a process described
      in [RFC1191] and [RFC8201], in which nodes rely on PTB messages to
      learn the largest size of unfragmented packet that can be used
      across a network path.

   Datagram:  A datagram is a transport-layer protocol data unit,
      transmitted in the payload of an IP packet.

   Effective PMTU:  The Effective PMTU is the current estimated value
      for PMTU that is used by a PMTUD method.  This is equivalent to
      the PLPMTU derived by PLPMTUD.

   EMTU_S:  The Effective MTU for sending (EMTU_S) is defined in
      [RFC1122] as "the maximum IP datagram size that may be sent, for a
      particular combination of IP source and destination addresses...".

   EMTU_R:  The Effective MTU for receiving (EMTU_R) is designated in
      [RFC1122] as the largest datagram size that can be reassembled by
      the receiver.

   Link:  A Link is a communication facility or medium over which nodes
      can communicate at the link layer, i.e., a layer below the IP
      layer.  Examples are Ethernet LANs and Internet-layer (or
      higher-layer) tunnels.

   Link MTU:  The Link Maximum Transmission Unit (MTU) is the size in
      bytes of the largest IP packet, including the IP header and
      payload, that can be transmitted over a link.  Note that this
      could more properly be called the IP MTU, to be consistent with
      how other standards organizations use the acronym.  This includes
      the IP header, but excludes link layer headers and other framing
      that is not part of IP or the IP payload.  Other standards
      organizations generally define the link MTU to include the link
      layer headers.  This specification continues the requirement in
      [RFC4821], which states "All links MUST enforce their MTU: links
      that might non-deterministically deliver packets that are larger
      than their rated MTU MUST consistently discard such packets."
   MAX_PMTU:  The MAX_PMTU is the largest size of PLPMTU that DPLPMTUD
      will attempt to use.

   MPS:  The Maximum Packet Size (MPS) is the largest size of
      application data block that can be sent across a network path by a
      PL.  In DPLPMTUD this quantity is derived from the PLPMTU by
      taking into consideration the size of the lower protocol layer
      headers.  Probe packets generated by DPLPMTUD can have a size
      larger than the MPS.

   MIN_PMTU:  The MIN_PMTU is the smallest size of PLPMTU that DPLPMTUD
      will attempt to use.

   Packet:  A Packet is the IP header plus the IP payload.

   Packetization Layer (PL):  The Packetization Layer (PL) is a layer of
      the network stack that places data into packets and performs
      transport protocol functions.  Examples of a PL include TCP, SCTP,
      SCTP over DTLS, and QUIC.

   Path:  The Path is the set of links and routers traversed by a packet
      of a particular flow between a source node and a destination node.

   Path MTU (PMTU):  The Path MTU (PMTU) is the minimum of the Link MTU
      of all the links forming a network path between a source node and
      a destination node.

   PTB_SIZE:  The PTB_SIZE is a value reported in a validated PTB
      message that indicates the next-hop link MTU of a router along the
      path.

   PLPMTU:  The Packetization Layer PMTU is an estimate of the actual
      PMTU provided by the DPLPMTUD algorithm.

   PLPMTUD:  Packetization Layer Path MTU Discovery (PLPMTUD), the
      method described in this document for datagram PLs, which is an
      extension to Classical PMTU Discovery.

   Probe packet:  A probe packet is a datagram sent with a purposely
      chosen size (typically the current PLPMTU or larger) to detect if
      packets of this size can be successfully sent end-to-end across
      the network path.

3.  Features Required to Provide Datagram PLPMTUD

   The principles expressed in [RFC4821] apply to the use of the
   technique with any PL.
   TCP PLPMTUD has been defined using standard TCP protocol mechanisms.
   Unlike TCP, datagram PLs require additional mechanisms and
   considerations to implement PLPMTUD.

   The requirements for datagram PLPMTUD are:

   1.   PLPMTU: The PLPMTU (specified as the effective PMTU in Section 1
        of [RFC1191]) is equivalent to the EMTU_S (specified in
        [RFC1122]).  For datagram PLs, the PLPMTU is managed by
        DPLPMTUD.  A PL MUST NOT send a packet (other than a probe
        packet) with a size larger than the current PLPMTU at the
        network layer.

   2.   Probe packets: On request, a DPLPMTUD sender is REQUIRED to be
        able to transmit a packet larger than the PLPMTU.  This is used
        to send a probe packet.  In IPv4, a probe packet MUST be sent
        with the Don't Fragment (DF) bit set in the IP header, and
        without network layer endpoint fragmentation.  In IPv6, a probe
        packet is always sent without source fragmentation (as specified
        in Section 5.4 of [RFC8201]).

   3.   Reception feedback: The destination PL endpoint is REQUIRED to
        provide a feedback method that indicates to the DPLPMTUD sender
        when a probe packet has been received by the destination PL
        endpoint.

   4.   Probe loss recovery: It is RECOMMENDED to use probe packets that
        do not carry any user data that would require retransmission if
        lost.  Most datagram transports permit this.  If a probe packet
        contains user data requiring retransmission in case of loss, the
        PL (or layers above) is REQUIRED to arrange any
        retransmission/repair of any resulting loss.  The PL is REQUIRED
        to be robust in the case where probe packets are lost due to
        other reasons (including link transmission errors and
        congestion).

   5.   PMTU parameters: A DPLPMTUD sender is RECOMMENDED to utilize
        information about the maximum size of packet that can be
        transmitted by the sender on the local link (e.g., the local
        Link MTU).
        It MAY utilize similar information about the receiver when this
        is supplied (note this could be less than EMTU_R).  This avoids
        implementations trying to send probe packets that cannot be
        transmitted by the local link.  Too high a value could reduce
        the efficiency of the search algorithm.  Some applications also
        have a maximum transport protocol data unit (PDU) size, in which
        case there is no benefit from probing for a size larger than
        this (unless a transport allows multiplexing multiple
        application PDUs into the same datagram).

   6.   Processing PTB messages: A DPLPMTUD sender MAY optionally
        utilize PTB messages received from the network layer to help
        identify when a network path does not support the current size
        of probe packet.  Any received PTB message MUST be validated
        before it is used to update the PLPMTU discovery information
        [RFC8201].  This validation confirms that the PTB message was
        sent in response to a packet originated by the sender, and
        needs to be performed before the PLPMTU discovery method reacts
        to the PTB message.  A PTB message MUST NOT be used to increase
        the PLPMTU [RFC8201], but could trigger a probe to test for a
        larger PLPMTU.  A PTB_SIZE greater than the currently probed
        size MUST be ignored.

   7.   Probing and congestion control: The decision about when to send
        a probe packet does not need to be limited by the congestion
        controller.  When not controlled by the congestion controller,
        the interval between probe packets MUST be at least one RTT.  If
        transmission of probe packets is limited by the congestion
        controller, this could result in transmission of probe packets
        being delayed.

   8.   Loss of a probe packet SHOULD NOT be treated as an indication of
        congestion and SHOULD NOT trigger a congestion control reaction
        [RFC4821], because this could result in unnecessary reduction of
        the sending rate.

   9.   An update to the PLPMTU (or MPS) MUST NOT modify the congestion
        window measured in bytes [RFC4821].  Therefore, an increase in
        the packet size does not cause an increase in the data rate in
        bytes per second.

   10.  Probing and flow control: Flow control at the PL concerns the
        end-to-end flow of data using the PL service.  This does not
        apply to DPLPMTUD when probe packets use a design that does not
        carry user data to the remote application.

   11.  Shared PLPMTU state: The PLPMTU value MAY also be stored with
        the corresponding entry associated with the destination in the
        IP layer cache, and used by other PL instances.  The
        specification of PLPMTUD [RFC4821] states: "If PLPMTUD updates
        the MTU for a particular path, all Packetization Layer sessions
        that share the path representation (as described in Section 5.2
        of [RFC4821]) SHOULD be notified to make use of the new MTU".
        Such methods MUST be robust to the wide variety of underlying
        network forwarding behaviors.  Section 5.2 of [RFC8201] provides
        guidance on the caching of PMTU information and also the
        relation to IPv6 flow labels.

   In addition, the following principles are stated for design of a
   DPLPMTUD method:

   *  Maximum Packet Size (MPS): A PL MAY be designed to segment data
      blocks larger than the MPS into multiple datagrams.  However, not
      all datagram PLs support segmentation of data blocks.  It is
      RECOMMENDED that methods avoid forcing an application to use an
      arbitrarily small MPS for transmission while the method is
      searching for the currently supported PLPMTU.  A reduced MPS can
      adversely impact the performance of an application.

   *  To assist applications in choosing a suitable data block size, the
      PL is RECOMMENDED to provide a primitive that returns the MPS
      derived from the PLPMTU to the higher layer using the PL.  The
      value of the MPS can change following a change in the path, or
      loss of probe packets.
   *  Path validation: It is RECOMMENDED that methods are robust to path
      changes that could have occurred since the path characteristics
      were last confirmed, and to the possibility of inconsistent path
      information being received.

   *  Datagram reordering: A method is REQUIRED to be robust to the
      possibility that a flow encounters reordering, or the traffic
      (including probe packets) is divided over more than one network
      path.

   *  Datagram delay and duplication: The feedback mechanism is REQUIRED
      to be robust to the possibility that packets could be
      significantly delayed or duplicated along a network path.

   *  When to probe: It is RECOMMENDED that methods determine whether
      the path has changed since they last measured the path.  This can
      help determine when to probe the path again.

4.  DPLPMTUD Mechanisms

   This section lists the protocol mechanisms used in this
   specification.

4.1.  PLPMTU Probe Packets

   The DPLPMTUD method relies upon the PL sender being able to generate
   probe packets with a specific size.  TCP is able to generate these
   probe packets by choosing to appropriately segment data being sent
   [RFC4821].  In contrast, a datagram PL that constructs a probe packet
   has to either request an application to send a data block that is
   larger than that generated by an application, or utilize padding
   functions to extend a datagram beyond the size of the application
   data block.  Protocols that permit exchange of control messages
   (without an application data block) can generate a probe packet by
   extending a control message with padding data.

   A receiver is REQUIRED to be able to distinguish an in-band data
   block from any added padding.  This is needed to ensure that any
   added padding is not passed on to an application at the receiver.
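   One way for a receiver to distinguish an in-band data block from
   padding is for the datagram to carry an explicit length for the data
   block.  The Python sketch below assumes a hypothetical framing: a
   2-byte length field, then the data block, then zero-valued padding.
   Real PLs use their own mechanisms (for example, an SCTP receiver
   discards a PAD chunk [RFC4820]); `encode_probe` and `decode_payload`
   are illustrative names only.

```python
import struct

# Hypothetical framing: 2-byte data-block length, data, then padding.
LENGTH_FIELD = struct.Struct("!H")


def encode_probe(data: bytes, payload_size: int) -> bytes:
    """Frame a data block and append zero padding up to payload_size."""
    framed = LENGTH_FIELD.pack(len(data)) + data
    if len(framed) > payload_size:
        raise ValueError("data block does not fit in the probe payload")
    return framed + b"\x00" * (payload_size - len(framed))


def decode_payload(payload: bytes) -> bytes:
    """Return only the in-band data block; trailing padding is dropped."""
    (length,) = LENGTH_FIELD.unpack_from(payload, 0)
    end = LENGTH_FIELD.size + length
    if end > len(payload):
        raise ValueError("length field exceeds the received payload")
    return payload[LENGTH_FIELD.size:end]


# Usage: a 1200-byte probe payload that carries a 5-byte data block.
probe = encode_probe(b"hello", 1200)
print(len(probe), decode_payload(probe))  # 1200 b'hello'
```

   Calling `encode_probe` with an empty data block yields a datagram
   that carries only padding, which the receiver reduces to an empty
   block rather than passing padding to the application.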
   This results in three possible ways that a sender can create a probe packet:

   Probing using padding data:  A probe packet that contains only control information, together with any padding needed to inflate it to the size of the probe packet. Since these probe packets do not carry an application-supplied data block, they do not typically require retransmission, although they do still consume network capacity and incur endpoint processing.

   Probing using application data and padding data:  A probe packet that contains a data block supplied by an application, combined with padding to inflate the length of the datagram to the size of the probe packet. If the application/transport needs protection from the loss of this probe packet, the application/transport could perform transport-layer retransmission/repair of the data block (e.g., by retransmission after loss is detected or by duplicating the data block in a datagram without the padding data).

   Probing using application data:  A probe packet that contains a data block supplied by an application that matches the size of the probe packet. This method requests the application to issue a data block of the desired probe size. If the application/transport needs protection from the loss of an unsuccessful probe packet, the application/transport then needs to perform transport-layer retransmission/repair of the data block (e.g., by retransmission after loss is detected).

   A PL that uses a probe packet carrying an application data block could need to retransmit this application data block if the probe fails, possibly using a smaller PLPMTU. This could require the PL to use a smaller packet size to traverse the end-to-end path (which could utilize endpoint network-layer fragmentation or a PL that can re-segment the data block into multiple datagrams).
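   The retransmission choice described above can be sketched as follows. This is an illustrative Python sketch, not part of the specification: retransmit_plan and its can_segment flag are hypothetical names, and the new MPS is assumed to have already been derived from the reduced PLPMTU.

```python
def retransmit_plan(block: bytes, mps: int, can_segment: bool) -> list[bytes]:
    """Plan the retransmission of a data block carried by a failed probe.

    Returns the datagram payloads to send. Raises ValueError when the block
    exceeds the (new) MPS and the PL cannot re-segment it; such a sender
    would instead discard the block or fall back to network-layer
    fragmentation.
    """
    if mps <= 0:
        raise ValueError("MPS must be positive")
    if len(block) <= mps:
        return [block]  # still fits within the new MPS: resend unchanged
    if not can_segment:
        raise ValueError("block exceeds MPS and the PL cannot re-segment it")
    # Re-segment into pieces of at most mps bytes each.
    return [block[i:i + mps] for i in range(0, len(block), mps)]
```

   For example, a 3000-byte block retransmitted with an MPS of 1200 bytes would be sent as three datagrams carrying 1200, 1200, and 600 bytes.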
   DPLPMTUD MAY choose to use only one of these methods to simplify the implementation.

   Probe messages sent by a PL MUST contain enough information to uniquely identify the probe within the Maximum Segment Lifetime, while being robust to reordering and replay of probe response and PTB messages.

4.2.  Confirmation of Probed Packet Size

   The PL needs a method to determine (confirm) when probe packets have been successfully received end-to-end across a network path.

   Transport protocols can include end-to-end methods that detect and report reception of specific datagrams that they send (e.g., DCCP and SCTP provide keep-alive/heartbeat features). When supported, this mechanism MAY also be used by DPLPMTUD to acknowledge reception of a probe packet.

   A PL that does not acknowledge data reception (e.g., UDP and UDP-Lite) is unable itself to detect when the packets that it sends are discarded because their size is greater than the actual PMTU. These PLs need to rely on an application protocol to detect this loss.

   Section 6 specifies this function for a set of IETF-specified protocols.

4.3.  Black Hole Detection

   Black Hole Detection is triggered by an indication that the network path could be unable to support the current PLPMTU size.

   There are three ways to detect black holes:

   *  A validated PTB message can be received that indicates a PTB_SIZE less than the current PLPMTU. A DPLPMTUD method MUST NOT rely solely on this method.

   *  A PL can use the DPLPMTUD probing mechanism to periodically generate probe packets of the size of the current PLPMTU (e.g., using the CONFIRMATION_TIMER; see Section 5.1.1). A timer tracks whether acknowledgments are received.
      Successive loss of probes is an indication that the current path no longer supports the PLPMTU (e.g., when the number of probe packets sent without receiving an acknowledgment, PROBE_COUNT, becomes greater than MAX_PROBES).

   *  A PL can utilize an event that indicates the network path no longer sustains the sender's PLPMTU size. This could use a mechanism implemented within the PL to detect excessive loss of data sent with a specific packet size and then conclude that this excessive loss could be a result of an invalid PLPMTU (as in PLPMTUD for TCP [RFC4821]).

   A PL MAY inhibit sending probe packets when no application data has been sent since the previous probe packet. A PL preferring to use an up-to-date PLPMTU once user data is sent again MAY choose to continue PLPMTU discovery for each path. However, this could result in additional packets being sent.

   When the method detects that the current PLPMTU is not supported, DPLPMTUD sets a lower PLPMTU and a lower MPS. The PL then confirms that the new PLPMTU can be successfully used across the path. A probe packet could need to have a size less than the size of the data block generated by the application.

4.4.  The Maximum Packet Size (MPS)

   The result of probing determines a usable PLPMTU, which is used to set the MPS used by the application. The MPS is smaller than the PLPMTU because of the presence of PL headers and any IP options or extensions added to the PL packet. The relationship between the MPS and the PLPMTU is illustrated in Figure 1.

                        any additional
                            headers    .----- MPS -----.
                               |       |               |
                               v       v               v
                        +------------------------------+
                        | IP | ** | PL | protocol data |
                        +------------------------------+

                        <----------- PLPMTU ----------->

                Figure 1: Relationship between MPS and PLPMTU

   A PL is unable to send a packet (other than a probe packet) with a size larger than the current PLPMTU at the network layer. To avoid this, a PL MAY be designed to segment data blocks larger than the MPS into multiple datagrams.

   DPLPMTUD seeks to avoid IP fragmentation. An attempt to send a data block larger than the MPS will therefore fail if a PL is unable to segment data. To determine the largest data block that can be sent, a PL SHOULD provide applications with a primitive that returns the Maximum Packet Size (MPS), derived from the current PLPMTU.

   If DPLPMTUD results in a change to the MPS, the application needs to adapt to the new MPS. A particular case can arise when packets have been sent with a size less than the MPS and the PLPMTU was subsequently reduced. If these packets are lost, the PL MAY segment the data using the new MPS. If a PL is unable to re-segment a previously sent datagram (e.g., [RFC4960]), then the sender either discards the datagram or could perform retransmission using network-layer fragmentation to form multiple IP packets not larger than the PLPMTU. For IPv4, the use of endpoint fragmentation by the sender is preferred over clearing the DF-bit in the IPv4 header. Operational experience reveals that IP fragmentation can reduce the reliability of Internet communication [I-D.ietf-intarea-frag-fragile], which may reduce the success of retransmission.

4.5.  Disabling the Effect of PMTUD

   A PL implementing this specification MUST suspend network layer processing of outgoing packets that enforces a PMTU [RFC1191][RFC8201] for each flow utilizing DPLPMTUD, and instead use DPLPMTUD to control the size of packets that are sent by a flow. This removes the need for the network layer to drop or fragment sent packets that have a size greater than the PMTU.

4.6.  Response to PTB Messages

   This method requires the DPLPMTUD sender to validate any received PTB message before using the PTB information. The response to a PTB message depends on the PTB_SIZE indicated in the PTB message, the state of the PLPMTUD state machine, and the IP protocol being used.

   Section 4.6.1 first describes validation for both IPv4 ICMP Destination Unreachable messages (type 3, code 4) and ICMPv6 Packet Too Big messages (type 2), both of which are referred to as PTB messages in this document.

4.6.1.  Validation of PTB Messages

   This section specifies utilization of PTB messages.

   *  A simple implementation MAY ignore received PTB messages, and in this case the PLPMTU is not updated when a PTB message is received.

   *  An implementation that supports PTB messages MUST validate messages before they are further processed.

   A PL that receives a PTB message from a router or middlebox performs ICMP validation as specified in Section 5.2 of [RFC8085] and in [RFC8201]. Because DPLPMTUD operates at the PL, the PL needs to check that each received PTB message is received in response to a packet transmitted by the endpoint PL performing DPLPMTUD.

   The PL MUST check the protocol information in the quoted packet carried in an ICMP PTB message payload to validate that the quoted packet originated from the sending node.
   This validation includes determining that the combination of the IP addresses, the protocol, the source port, and the destination port matches that returned in the quoted packet; this is also necessary for the PTB message to be passed to the corresponding PL.

   The validation SHOULD utilize information that is not simple for an off-path attacker to determine [RFC8085], for example, by checking the value of a protocol header field known only to the two PL endpoints. A datagram application that uses well-known source and destination ports ought to also rely on other information to complete this validation.

   These checks are intended to provide protection from packets that originate from a node that is not on the network path. A PTB message that does not complete the validation MUST NOT be further utilized by the DPLPMTUD method.

   PTB messages that have been validated MAY be utilized by the DPLPMTUD algorithm, but MUST NOT be used directly to set the PLPMTU. A method that utilizes these PTB messages can improve the speed at which the algorithm detects an appropriate PLPMTU by triggering an immediate probe for the PTB_SIZE, compared to one that relies solely on probing using a timer-based search algorithm. Section 4.6.2 describes this processing.

4.6.2.  Use of PTB Messages

   A set of checks is intended to provide protection from a router that reports an unexpected PTB_SIZE. The PL also needs to check that the indicated PTB_SIZE is less than the size used by probe packets and at least the minimum size accepted.

   This section provides a summary of how PTB messages can be utilized. This processing depends on the PTB_SIZE and the current value of a set of variables:

   PTB_SIZE < MIN_PMTU
      *  Invalid PTB_SIZE, see Section 4.6.1.

      *  PTB message ought to be discarded without further processing (e.g., PLPMTU not modified).
      *  The information could be utilized as an input to trigger enabling a resilience mode.

   MIN_PMTU < PTB_SIZE < BASE_PMTU
      *  A robust PL MAY enter an error state (see Section 5.2) for an IPv4 path when the PTB_SIZE reported in the PTB message is larger than or equal to 68 bytes [RFC0791] and when this is less than the BASE_PMTU.

      *  A robust PL MAY enter an error state (see Section 5.2) for an IPv6 path when the PTB_SIZE reported in the PTB message is larger than or equal to 1280 bytes [RFC8200] and when this is less than the BASE_PMTU.

   PTB_SIZE = PLPMTU
      *  Completes the search for a larger PLPMTU.

   PTB_SIZE > PROBED_SIZE
      *  Inconsistent network signal.

      *  PTB message ought to be discarded without further processing (e.g., PLPMTU not modified).

      *  The information could be utilized as an input to trigger enabling a resilience mode.

   BASE_PMTU <= PTB_SIZE < PLPMTU
      *  This could be an indication of a black hole. The PLPMTU SHOULD be set to BASE_PMTU (the PLPMTU is reduced to the BASE_PMTU to avoid unnecessary packet loss when a black hole is encountered).

      *  The PL ought to start a search to quickly discover the new PLPMTU. The PTB_SIZE reported in the PTB message can be used to initialize a search algorithm.

   PLPMTU < PTB_SIZE < PROBED_SIZE
      *  The PLPMTU continues to be valid, but the last PROBED_SIZE searched was larger than the actual PMTU.

      *  The PLPMTU is not updated.

      *  The PL can use the reported PTB_SIZE from the PTB message as the next search point when it resumes the search algorithm.

5.  Datagram Packetization Layer PMTUD

   This section specifies Datagram PLPMTUD (DPLPMTUD). The method can be introduced at various points (as indicated with * in Figure 2) in the IP protocol stack to discover the PLPMTU so that an application can utilize an appropriate MPS for the current network path.
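   The PTB_SIZE cases listed in Section 4.6.2 can be summarized as a single decision function. The following Python sketch is illustrative only: it assumes the PTB message has already passed the validation of Section 4.6.1, the enum names are hypothetical, and it treats a PTB_SIZE equal to MIN_PMTU as part of the error-state range (consistent with the "larger than or equal to" wording for IPv4 and IPv6). A PTB_SIZE equal to PROBED_SIZE matches none of the listed cases and is reported as such.

```python
from enum import Enum


class PtbAction(Enum):
    DISCARD = "discard; PLPMTU not modified"
    MAY_ENTER_ERROR = "a robust PL MAY enter the error state"
    SEARCH_DONE = "completes the search for a larger PLPMTU"
    BLACK_HOLE = "set PLPMTU to BASE_PMTU and search again"
    NEXT_SEARCH_POINT = "keep PLPMTU; resume the search from PTB_SIZE"
    UNLISTED = "not covered by the cases in Section 4.6.2"


def classify_ptb(ptb_size: int, min_pmtu: int, base_pmtu: int,
                 plpmtu: int, probed_size: int) -> PtbAction:
    """Map the PTB_SIZE of an already validated PTB message onto the cases."""
    if ptb_size < min_pmtu or ptb_size > probed_size:
        return PtbAction.DISCARD            # invalid or inconsistent signal
    if ptb_size < base_pmtu:
        return PtbAction.MAY_ENTER_ERROR    # MIN_PMTU <= PTB_SIZE < BASE_PMTU
    if ptb_size == plpmtu:
        return PtbAction.SEARCH_DONE        # PTB_SIZE = PLPMTU
    if ptb_size < plpmtu:
        return PtbAction.BLACK_HOLE         # BASE_PMTU <= PTB_SIZE < PLPMTU
    if ptb_size < probed_size:
        return PtbAction.NEXT_SEARCH_POINT  # PLPMTU < PTB_SIZE < PROBED_SIZE
    return PtbAction.UNLISTED               # PTB_SIZE = PROBED_SIZE
```

   With IPv6 values (MIN_PMTU = BASE_PMTU = 1280), a PLPMTU of 1400 and a PROBED_SIZE of 1500, a PTB_SIZE of 1450 would be used as the next search point, while 1300 would indicate a possible black hole.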
   DPLPMTUD SHOULD NOT be used by an upper PL or application if it is already used in a lower layer; DPLPMTUD SHOULD only be performed once between a pair of endpoints. A PL MUST adjust the MPS indicated by DPLPMTUD to account for any additional overhead introduced by the PL.

                    +----------------------+
                    |     Application*     |
                    +-+-------+----+----+--+
                      |       |    |    |
                  +---+--+ +--+--+ | +--+--+
                  | QUIC*| |UDPO*| | |SCTP*|
                  +---+--+ +--+--+ | +-+---+
                      |       |    |   |   |
                      +-------+--+ |   |   |
                                 | |   |   |
                               +-+-+---+   |
                               | UDP   |   |
                               +---+---+   |
                                   |       |
                    +--------------+-------+
                    |  Network Interface   |
                    +----------------------+

          Figure 2: Examples where DPLPMTUD can be implemented

   The central idea of DPLPMTUD is probing by a sender. Probe packets are sent to find the maximum size of user message that can be completely transferred across the network path from the sender to the destination.

   The following sections identify the components needed for implementation, provide an overview of the phases of operation, and specify the state machine and search algorithm.

5.1.  DPLPMTUD Components

   This section describes the timers, constants, and variables of DPLPMTUD.

5.1.1.  Timers

   The method utilizes up to three timers:

   PROBE_TIMER:  The PROBE_TIMER is configured to expire after a period longer than the maximum time to receive an acknowledgment to a probe packet. This value MUST NOT be smaller than 1 second, and SHOULD be larger than 15 seconds. Guidance on selection of the timer value is provided in Section 3.1.1 of the UDP Usage Guidelines [RFC8085].

   PMTU_RAISE_TIMER:  The PMTU_RAISE_TIMER is configured to the period a sender will continue to use the current PLPMTU, after which it re-enters the Search Phase. This timer has a period of 600 seconds, as recommended by PLPMTUD [RFC4821].
      DPLPMTUD MAY inhibit sending probe packets when no application data has been sent since the previous probe packet. A PL preferring to use an up-to-date PMTU once user data is sent again can choose to continue PMTU discovery for each path. However, this could result in sending additional packets.

   CONFIRMATION_TIMER:  When an acknowledged PL is used, this timer MUST NOT be used. For other PLs, the CONFIRMATION_TIMER is configured to the period a PL sender waits before confirming the current PLPMTU is still supported. This is less than the PMTU_RAISE_TIMER and used to decrease the PLPMTU (e.g., when a black hole is encountered). Confirmation needs to be frequent enough when data is flowing that the sending PL does not black hole extensive amounts of traffic. Guidance on selection of the timer value is provided in Section 3.1.1 of the UDP Usage Guidelines [RFC8085].

      DPLPMTUD MAY inhibit sending probe packets when no application data has been sent since the previous probe packet. A PL preferring to use an up-to-date PMTU once user data is sent again can choose to continue PMTU discovery for each path. However, this could result in sending additional packets.

   An implementation could implement the various timers using a single timer.

5.1.2.  Constants

   The following constants are defined:

   MAX_PROBES:  The MAX_PROBES is the maximum value of the PROBE_COUNT counter (see Section 5.1.3). MAX_PROBES represents the limit for the number of consecutive probe attempts of any size. The default value of MAX_PROBES is 3. This value is greater than 1 to provide robustness to isolated packet loss.

   MIN_PMTU:  The MIN_PMTU is the smallest allowed probe packet size. For IPv6, this value is 1280 bytes, as specified in [RFC8200]. For IPv4, the minimum value is 68 bytes.
      Note: An IPv4 router is required to be able to forward a datagram of 68 bytes without further fragmentation. This is the combined size of a maximum-length IPv4 header (60 bytes) and the minimum fragment size of 8 bytes. In addition, receivers are required to be able to reassemble fragmented datagrams at least up to 576 bytes, as stated in Section 3.3.3 of [RFC1122].

   MAX_PMTU:  The MAX_PMTU is the largest size of PLPMTU. This has to be less than or equal to the minimum of the local MTU of the outgoing interface and the destination PMTU for receiving. An application, or PL, MAY choose a smaller MAX_PMTU when there is no need to send packets larger than a specific size.

   BASE_PMTU:  The BASE_PMTU is a configured size expected to work for most paths. The size is equal to or larger than the MIN_PMTU and smaller than the MAX_PMTU. In the case of IPv6, this value is 1280 bytes [RFC8200]. When using IPv4, a size of 1200 bytes is RECOMMENDED.

5.1.3.  Variables

   This method utilizes a set of variables:

   PROBED_SIZE:  The PROBED_SIZE is the size of the current probe packet. This is a tentative value for the PLPMTU, which is awaiting confirmation by an acknowledgment.

   PROBE_COUNT:  The PROBE_COUNT is a count of the number of successive unsuccessful probe packets that have been sent. Each time a probe packet is acknowledged, the value is set to zero. (Some probe loss is expected while searching; therefore, loss of a single probe is not an indication of a PMTU problem.)

   Figure 3 illustrates the relationship between the packet size constants and variables at a point in time when the DPLPMTUD algorithm performs path probing to increase the size of the PLPMTU. A probe packet has been sent of size PROBED_SIZE.
   Once this is acknowledged, the PLPMTU will rise to PROBED_SIZE, allowing the DPLPMTUD algorithm to further increase PROBED_SIZE towards the actual PMTU.

    MIN_PMTU                                         MAX_PMTU
    <------------------------------------------------------->
              |              |        |              |
              v              |        |              v
          BASE_PMTU          |        v         Actual PMTU
                             |    PROBED_SIZE
                             v
                          PLPMTU

       Figure 3: Relationships between packet size constants and variables

5.1.4.  Overview of DPLPMTUD Phases

   This section provides a high-level informative view of the DPLPMTUD method, by describing the movement of the method through several phases of operation. More detail is available in the state machine (Section 5.2).

                      +------+
              +------>| Base |-----------------+ Connectivity
              |       +------+                 | or BASE_PMTU
              |          |                     | confirmation failed
              |          |                     v
              |          | Connectivity    +-------+
              |          | and BASE_PMTU   | Error |
              |          | confirmed       +-------+
              |          |                     |  Consistent
              |          v                     |  connectivity
       PLPMTU |      +----------+              |  and BASE_PMTU
 confirmation |      |  Search  |<-------------+  confirmed
       failed |      +----------+
              |         ^     |
              |         |     |
              |   Raise |     | Search
              |   timer |     | algorithm
              | expired |     | completed
              |         |     |
              |         |     v
              |   +-----------------+
              +---| Search Complete |
                  +-----------------+

                         Figure 4: DPLPMTUD Phases

   Base:  The Base Phase confirms connectivity to the remote peer using packets of the BASE_PMTU. This phase is implicit for a connection-oriented PL (where it can be performed in a PL connection handshake). A connectionless PL sends an acknowledged probe packet to confirm that the remote peer is reachable. The sender also confirms that BASE_PMTU is supported across the network path.

      A PL that does not wish to support a path with a PLPMTU less than BASE_PMTU can simplify the phase into a single step by performing the connectivity checks with a probe of the BASE_PMTU size.
      Once confirmed, DPLPMTUD enters the Search Phase. If this phase fails to confirm, DPLPMTUD enters the Error Phase.

   Search:  The Search Phase utilizes a search algorithm to send probe packets to seek to increase the PLPMTU. The algorithm concludes when it has found a suitable PLPMTU, by entering the Search Complete Phase.

      A PL could respond to PTB messages using the PTB to advance or terminate the search; see Section 4.6.

   Search Complete:  The Search Complete Phase is entered when the PLPMTU is supported across the network path. A PL can use a CONFIRMATION_TIMER to periodically repeat a probe packet for the current PLPMTU size. If the sender is unable to confirm reachability (e.g., if the CONFIRMATION_TIMER expires) or the PL signals a lack of reachability, DPLPMTUD enters the Base Phase.

      The PMTU_RAISE_TIMER is used to periodically resume the Search Phase to discover if the PLPMTU can be raised. Black Hole Detection causes the sender to enter the Base Phase.

   Error:  The Error Phase is entered when there is conflicting or invalid PLPMTU information for the path (e.g., a failure to support the BASE_PMTU) that causes DPLPMTUD to be unable to progress, and the PLPMTU is lowered.

      DPLPMTUD remains in the Error Phase until a consistent view of the path can be discovered and it has also been confirmed that the path supports the BASE_PMTU (or DPLPMTUD is suspended).

   An implementation that only reduces the PLPMTU to a suitable size would be sufficient to ensure reliable operation, but can be very inefficient when the actual PMTU changes or when the method (for whatever reason) makes a suboptimal choice for the PLPMTU.
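   The phase transitions drawn in Figure 4 can be captured in a small lookup table. The following Python sketch is illustrative only: the event names are hypothetical labels for the arcs in Figure 4, and any event not drawn there leaves the phase unchanged (the finer-grained behavior belongs to the state machine in Section 5.2).

```python
from enum import Enum, auto


class Phase(Enum):
    BASE = auto()
    SEARCH = auto()
    SEARCH_COMPLETE = auto()
    ERROR = auto()


class Event(Enum):
    BASE_CONFIRMED = auto()       # connectivity and BASE_PMTU confirmed
    BASE_FAILED = auto()          # connectivity or BASE_PMTU confirmation failed
    SEARCH_COMPLETED = auto()     # search algorithm completed
    RAISE_TIMER_EXPIRED = auto()  # PMTU_RAISE_TIMER expired
    PLPMTU_FAILED = auto()        # PLPMTU confirmation failed (black hole)


# Arcs of Figure 4: (current phase, event) -> next phase.
TRANSITIONS = {
    (Phase.BASE, Event.BASE_CONFIRMED): Phase.SEARCH,
    (Phase.BASE, Event.BASE_FAILED): Phase.ERROR,
    (Phase.ERROR, Event.BASE_CONFIRMED): Phase.SEARCH,
    (Phase.SEARCH, Event.SEARCH_COMPLETED): Phase.SEARCH_COMPLETE,
    (Phase.SEARCH_COMPLETE, Event.RAISE_TIMER_EXPIRED): Phase.SEARCH,
    (Phase.SEARCH_COMPLETE, Event.PLPMTU_FAILED): Phase.BASE,
}


def next_phase(phase: Phase, event: Event) -> Phase:
    """Return the next phase; events not drawn in Figure 4 change nothing."""
    return TRANSITIONS.get((phase, event), phase)
```

   A table-driven form keeps each arc of the figure visible as one entry, which makes it straightforward to extend towards the full state machine of Section 5.2.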
   A full implementation of DPLPMTUD provides an algorithm enabling the DPLPMTUD sender to increase the PLPMTU following a change in the characteristics of the path, such as when a link is reconfigured with a larger MTU, or when there is a change in the set of links traversed by an end-to-end flow (e.g., after a routing or path fail-over decision).

5.2.  State Machine

   A state machine for DPLPMTUD is depicted in Figure 5. If multipath or multihoming is supported, a state machine is needed for each path.

   Note: To simplify the diagram, not all changes are shown.

         |       |
         | Start | PL indicates loss
         |       | of connectivity
         v       v
     +---------------+                               +---------------+
     |   DISABLED    |                               |     ERROR     |
     +---------------+           PROBE_TIMER expiry: +---------------+
             | PL indicates PROBE_COUNT = MAX_PROBES or  ^       |
             | connectivity PTB: PTB_SIZE < BASE_PMTU    |       |
             +--------------------+      +---------------+       |
                                  |      |                       |
                                  v      | BASE_PMTU Probe       |
                               +---------------+  acked          |
                               |     BASE      |-----------------+
                               +---------------+                 |
                                 ^ |    ^  ^                     |
             Black hole detected | |    |  | Black hole detected |
           +---------------------+ |    |  +-----------------+   |
           |                       +----+                    |   |
           |                PROBE_TIMER expiry:              |   |
           |             PROBE_COUNT < MAX_PROBES            |   |
           |                                                 |   |
           |             PMTU_RAISE_TIMER expiry             |   |
           |    +----------------------------------------+   |   |
           |    |                                        |   |   |
           |    |                                        v   |   v
     +---------------+                               +---------------+
     |SEARCH_COMPLETE|                               |   SEARCHING   |
     +---------------+                               +---------------+
        |    ^    ^                                     |      |   ^
        |    |    |                                     |      |   |
        |    |    +-------------------------------------+      |   |
        |    |            MAX_PMTU Probe acked or              |   |
        |    | PROBE_TIMER expiry: PROBE_COUNT = MAX_PROBES or |   |
        +----+             PTB: PTB_SIZE = PLPMTU              +---+
CONFIRMATION_TIMER expiry:                         PROBE_TIMER expiry:
PROBE_COUNT < MAX_PROBES or                PROBE_COUNT < MAX_PROBES or
    PLPMTU Probe acked                             Probe acked or PTB:
                                       PLPMTU < PTB_SIZE < PROBED_SIZE

                Figure 5: State machine for Datagram PLPMTUD

   The following states are defined:

   DISABLED:  The DISABLED state is the initial state before probing has started. It is also entered from any other state when the PL indicates loss of connectivity. This state is left once the PL indicates connectivity to the remote PL.

   BASE:  The BASE state is used to confirm that the BASE_PMTU size is supported by the network path and is designed to allow an application to continue working when there are transient reductions in the actual PMTU. It also seeks to avoid long periods when a sender searching for a larger PLPMTU is unaware that packets are not being delivered due to a packet or ICMP black hole.

      On entry, the PROBED_SIZE is set to the BASE_PMTU size and the PROBE_COUNT is set to zero.

      Each time a probe packet is sent, the PROBE_TIMER is started. The state is exited when the probe packet is acknowledged, and the PL sender enters the SEARCHING state.

      The state is also left when the PROBE_COUNT reaches MAX_PROBES or a received PTB message is validated. This causes the PL sender to enter the ERROR state.

   SEARCHING:  The SEARCHING state is the main probing state. This state is entered when probing for the BASE_PMTU was successful.

      Each time a probe packet is acknowledged, the PROBE_COUNT is set to zero, the PLPMTU is set to the PROBED_SIZE, and then the PROBED_SIZE is increased using the search algorithm.

      When a probe packet is sent and not acknowledged within the period of the PROBE_TIMER, the PROBE_COUNT is incremented and a new probe packet is transmitted.
      The state is exited to enter SEARCH_COMPLETE when the PROBE_COUNT reaches MAX_PROBES, a validated PTB is received that corresponds to the last successfully probed size (PTB_SIZE = PLPMTU), or a probe of size MAX_PMTU is acknowledged (PLPMTU = MAX_PMTU).

      When a black hole is detected in the SEARCHING state, this causes the PL sender to enter the BASE state.

   SEARCH_COMPLETE:  The SEARCH_COMPLETE state indicates a successful end to the SEARCHING state. DPLPMTUD remains in this state until either the PMTU_RAISE_TIMER expires or a black hole is detected.

      When DPLPMTUD uses an unacknowledged PL and is in the SEARCH_COMPLETE state, a CONFIRMATION_TIMER periodically resets the PROBE_COUNT and schedules a probe packet with the size of the PLPMTU. If MAX_PROBES successive PLPMTU-sized probes fail to be acknowledged, the method enters the BASE state. When used with an acknowledged PL (e.g., SCTP), DPLPMTUD SHOULD NOT continue to generate PLPMTU probes in this state.

   ERROR:  The ERROR state represents the case where either the network path is not known to support a PLPMTU of at least the BASE_PMTU size or there is contradictory information about the network path that would otherwise result in excessive variation in the MPS signaled to the higher layer. The state implements a method to mitigate oscillation in the state-event engine. In this state, the PL signals a conservative value of the MPS to the higher layer. The state is exited when probe packets no longer detect the error or when the PL indicates that connectivity has been lost. The PL sender then enters the SEARCHING state.

      Implementations are permitted to enable endpoint fragmentation if the DPLPMTUD is unable to validate MIN_PMTU within PROBE_COUNT probes.
      If DPLPMTUD is unable to validate MIN_PMTU, the implementation will transition to the DISABLED state.

      Note: MIN_PMTU could be identical to BASE_PMTU, simplifying the actions in this state.

5.3.  Search to Increase the PLPMTU

   This section describes the algorithms used by DPLPMTUD to search for a larger PLPMTU.

5.3.1.  Probing for a larger PLPMTU

   Implementations use a search algorithm across the search range to determine whether a larger PLPMTU can be supported across a network path.

   The method discovers the search range by confirming the minimum PLPMTU and then using the probe method to select a PROBED_SIZE less than or equal to MAX_PMTU. MAX_PMTU is the minimum of the local MTU and EMTU_R (learned from the remote endpoint). The MAX_PMTU MAY be reduced by an application that sets a maximum to the size of datagrams it will send.

   The PROBE_COUNT is initialized to zero when the first probe with a size greater than or equal to the PLPMTU is sent. A timer is used to trigger the sending of probe packets of size PROBED_SIZE, larger than the PLPMTU. Each probe packet successfully sent to the remote peer is confirmed by acknowledgment at the PL (see Section 4.1).

   Each time a probe packet is sent to the destination, the PROBE_TIMER is started. The timer is canceled when the PL receives acknowledgment that the probe packet has been successfully sent across the path (Section 4.1). This confirms that the PROBED_SIZE is supported, and the PROBED_SIZE value is then assigned to the PLPMTU. The search algorithm can continue to send subsequent probe packets of an increasing size.

   If the timer expires before a probe packet is acknowledged, the probe has failed to confirm the PROBED_SIZE.
   Each time the PROBE_TIMER expires, the PROBE_COUNT is incremented, the PROBE_TIMER is reinitialized, and a new probe of the same size or any other size (determined by the search algorithm) can be sent. The maximum number of consecutive failed probes is configured (MAX_PROBES). If the value of the PROBE_COUNT reaches MAX_PROBES, probing will stop, and the PL sender enters the SEARCH_COMPLETE state.

5.3.2.  Selection of Probe Sizes

   The search algorithm determines a minimum useful gain in PLPMTU. It would not be constructive for a PL sender to attempt to probe for all sizes. This would incur unnecessary load on the path. Implementations SHOULD select the set of probe packet sizes to maximize the gain in PLPMTU from each search step.

   Implementations could optimize the search procedure by selecting step sizes from a table of common PMTU sizes. When selecting the appropriate next size to search, an implementer ought to also consider that there can be common sizes of MPS that applications seek to use, and there could be common sizes of MTU used within the network.

5.3.3.  Resilience to Inconsistent Path Information

   A decision to increase the PLPMTU needs to be resilient to the possibility that information learned about the network path is inconsistent. A path is inconsistent when, for example, probe packets are lost for reasons other than their size, or due to frequent path changes. Frequent path changes could occur through unexpected "flapping", where some packets from a flow pass along one path, but other packets follow a different path with different properties.

   A PL sender is able to detect inconsistency from the sequence of PLPMTU probes that are acknowledged or the sequence of PTB messages that it receives.
   When inconsistent path information is detected, a PL sender could use an alternate search mode that clamps the offered MPS to a smaller value for a period of time. This avoids unnecessary loss of packets.

5.4.  Robustness to Inconsistent Paths

   Some paths could be unable to sustain packets of the BASE_PMTU size. To be robust to these paths, an implementation could implement the Error State. This allows fallback to a smaller than desired PLPMTU, rather than suffer connectivity failure. This could utilize methods such as endpoint IP fragmentation to enable the PL sender to communicate using packets smaller than the BASE_PMTU.

6.  Specification of Protocol-Specific Methods

   DPLPMTUD requires protocol-specific details to be specified for each PL that is used.

   The first subsection provides guidance on how to implement the DPLPMTUD method as a part of an application using UDP or UDP-Lite. The guidance also applies to other datagram services that do not include a specific transport protocol (such as a tunnel encapsulation). The following subsections describe how DPLPMTUD can be implemented as a part of the transport service when using SCTP and QUIC, allowing applications using the service to benefit from discovery of the PLPMTU without themselves needing to implement this method.

6.1.  Application support for DPLPMTUD with UDP or UDP-Lite

   The current specifications of UDP [RFC0768] and UDP-Lite [RFC3828] do not define a method in the RFC series that supports PLPMTUD. In particular, the UDP transport does not provide the transport features needed to implement datagram PLPMTUD.

   The DPLPMTUD method can be implemented as a part of an application built directly or indirectly on UDP or UDP-Lite, but relies on higher-layer protocol features to implement the method [RFC8085].
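   For an application built on UDP, the search of Section 5.3 is driven by the application itself. The following Python sketch is illustrative only: send_probe is a hypothetical callback that sends one probe of the given size and reports whether it was acknowledged before the PROBE_TIMER expired, real timers and sockets are omitted, and any candidate table passed in is an example rather than a recommended set of sizes.

```python
from typing import Callable, Sequence

MAX_PROBES = 3  # default value from Section 5.1.2


def search_plpmtu(plpmtu: int, candidates: Sequence[int], max_pmtu: int,
                  send_probe: Callable[[int], bool]) -> int:
    """Probe increasing candidate sizes and return the confirmed PLPMTU.

    send_probe(size) is a hypothetical callback: it sends one probe of
    `size` bytes and returns True if it is acknowledged before PROBE_TIMER
    expiry (timers are not modelled here).
    """
    for probed_size in sorted(s for s in candidates if plpmtu < s <= max_pmtu):
        probe_count = 0
        while probe_count < MAX_PROBES:
            if send_probe(probed_size):
                plpmtu = probed_size  # acknowledged: PROBED_SIZE becomes PLPMTU
                break
            probe_count += 1          # PROBE_TIMER expired without an ack
        else:
            return plpmtu             # PROBE_COUNT reached MAX_PROBES: complete
    return plpmtu                     # MAX_PMTU reached or table exhausted
```

   On a path whose actual PMTU is 1450 bytes, starting from a PLPMTU of 1280 with candidates 1400, 1450, 1492, and 1500, this sketch confirms 1400 and 1450, then stops after three unacknowledged probes of 1492 bytes, leaving the PLPMTU at 1450.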

   Some primitives used by DPLPMTUD might not be available via the Datagram API (e.g., the ability to access the PLPMTU from the IP layer cache, or to interpret received PTB messages).

   In addition, it is desirable that PMTU discovery is not performed by multiple protocol layers. An application SHOULD avoid using DPLPMTUD when the underlying transport system provides this capability. Using a common method for managing the PLPMTU has benefits, both in the ability to share state between different processes and in the opportunity to coordinate probing.

6.1.1.  Application Request

   An application needs an application-layer protocol mechanism (such as a message acknowledgement method) that solicits a response from a destination endpoint. The method SHOULD allow the sender to check the value returned in the response to provide additional protection from off-path insertion of data [RFC8085]. Suitable methods include a parameter known only to the two endpoints, such as a session ID or initialized sequence number.

6.1.2.  Application Response

   An application needs an application-layer protocol mechanism to communicate the response from the destination endpoint. This response could indicate successful reception of the probe across the path, but could also indicate that some (or all) packets have failed to reach the destination.

6.1.3.  Sending Application Probe Packets

   A probe packet could carry an application data block, but the successful transmission of this data is at risk when the packet is used for probing. Some applications might prefer to use a probe packet that does not carry an application data block to avoid disruption to data transfer.

6.1.4.  Initial Connectivity

   An application that does not have other higher-layer information confirming connectivity with the remote peer SHOULD implement a connectivity mechanism using acknowledged probe packets before entering the BASE state.

6.1.5.  Validating the Path

   An application that does not have other higher-layer information confirming correct delivery of datagrams SHOULD implement the CONFIRMATION_TIMER to periodically send probe packets while in the SEARCH_COMPLETE state.

6.1.6.  Handling of PTB Messages

   An application that is able and wishes to receive PTB messages MUST perform ICMP validation as specified in Section 5.2 of [RFC8085]. This requires the application to check each received PTB message to validate that it was received in response to transmitted traffic and that the reported PTB_SIZE is less than the current probed size (see Section 4.6.2). A validated PTB message MAY be used as input to the DPLPMTUD algorithm, but MUST NOT be used directly to set the PLPMTU.

6.2.  DPLPMTUD for SCTP

   Section 10.2 of [RFC4821] specified a recommended PLPMTUD probing method for SCTP. Section 7.3 of [RFC4960] recommended that an endpoint apply the techniques in [RFC4821] on a per-destination-address basis. The specification for DPLPMTUD continues the practice of using the PL to discover the PMTU, but updates [RFC4960] with a recommendation to use the method specified in this document. The RECOMMENDED method for generating probes is to add a chunk consisting only of padding to an SCTP message. The PAD chunk defined in [RFC4820] SHOULD be attached to a minimum-length HEARTBEAT (HB) chunk to build a probe packet. This enables probing without affecting the transfer of user messages and without being limited by congestion control or flow control.
   This is preferred to using DATA chunks (with padding as required) as path probes.

   Section 6.9 of [RFC4960] describes dividing the user messages into DATA chunks sent by the PL when using SCTP. It notes that once an SCTP message has been sent, it cannot be re-segmented. [RFC4960] describes the method to retransmit DATA chunks when the MPS has been reduced, and the use of IP fragmentation for this case.

6.2.1.  SCTP/IPv4 and SCTP/IPv6

6.2.1.1.  Initial Connectivity

   The base protocol is specified in [RFC4960]. This provides an acknowledged PL. A sender can therefore enter the BASE state as soon as connectivity has been confirmed.

6.2.1.2.  Sending SCTP Probe Packets

   Probe packets consist of an SCTP common header followed by a HEARTBEAT chunk and a PAD chunk. The PAD chunk is used to control the length of the probe packet. The HEARTBEAT chunk is used to trigger the sending of a HEARTBEAT ACK chunk. The reception of the HEARTBEAT ACK chunk acknowledges reception of a successful probe. A successful probe updates the association and path counters, but an unsuccessful probe is discounted (assumed to be a result of choosing too large a PLPMTU).

   The HEARTBEAT chunk carries a Heartbeat Information parameter. Besides the information suggested in [RFC4960], this parameter includes the probe size, which is the size of the complete datagram. The size of the PAD chunk is therefore computed by subtracting the following from the probe size:

   *  the IPv4 or IPv6 header;

   *  the SCTP common header;

   *  the HEARTBEAT request;

   *  the PAD chunk header.

   The payload of the PAD chunk contains arbitrary data.

   Probing starts directly after the PL handshake, before data is sent. Assuming this behavior (i.e., the PMTU is smaller than or equal to the interface MTU), this process will take a few round-trip times, depending on the number of PMTU probes sent.
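   The PAD chunk sizing described for SCTP probe packets reduces to simple arithmetic. The following Python sketch makes these assumptions:

   *  IP headers carry no options or extension headers (20 bytes for IPv4, 40 bytes for IPv6);

   *  the SCTP common header is 12 bytes, and chunk and parameter headers are 4 bytes, as in [RFC4960] and [RFC4820];

   *  heartbeat_info_len is a hypothetical length for the sender-specific part of the Heartbeat Information parameter.

   Setting udp_encap accounts for the extra 8-byte UDP header of SCTP/UDP (Section 6.2.2.2). The sketch is an illustration only, not a normative procedure.

```python
# Header sizes in bytes. IP headers are assumed to carry no options or
# extension headers.
IPV4_HEADER = 20
IPV6_HEADER = 40
UDP_HEADER = 8            # present only for SCTP/UDP encapsulation
SCTP_COMMON_HEADER = 12   # ports, verification tag, checksum
CHUNK_HEADER = 4          # chunk type, flags, length
PARAM_HEADER = 4          # parameter type, length


def pad4(n: int) -> int:
    """Round n up to the 4-byte boundary to which SCTP chunks are padded."""
    return (n + 3) & ~3


def pad_chunk_data_len(probe_size: int, ipv6: bool, heartbeat_info_len: int,
                       udp_encap: bool = False) -> int:
    """Return how many bytes of arbitrary data the PAD chunk must carry so
    that the complete datagram is exactly `probe_size` bytes long.

    `heartbeat_info_len` is the (hypothetical) length of the sender-specific
    data in the Heartbeat Information parameter, which includes the probe size.
    """
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    # HEARTBEAT chunk = chunk header + parameter header + info, 4-byte padded.
    heartbeat_chunk = pad4(CHUNK_HEADER + PARAM_HEADER + heartbeat_info_len)
    overhead = ip_header + SCTP_COMMON_HEADER + heartbeat_chunk + CHUNK_HEADER
    if udp_encap:
        overhead += UDP_HEADER
    data_len = probe_size - overhead
    if data_len < 0:
        raise ValueError("probe size is smaller than the probe packet overhead")
    if data_len % 4:
        # The PAD chunk would itself be padded, so the datagram could not
        # be exactly probe_size bytes long.
        raise ValueError("probe size must keep the PAD chunk 4-byte aligned")
    return data_len


if __name__ == "__main__":
    # IPv4, a 1400-byte probe, and 20 bytes of heartbeat information:
    # 1400 - (20 + 12 + 28 + 4) = 1336 bytes of PAD data.
    print(pad_chunk_data_len(1400, ipv6=False, heartbeat_info_len=20))  # 1336
    print(pad_chunk_data_len(1400, ipv6=True, heartbeat_info_len=20))   # 1316
    print(pad_chunk_data_len(1400, ipv6=False, heartbeat_info_len=20,
                             udp_encap=True))                           # 1328
```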
   The Heartbeat timer can be used to implement the PROBE_TIMER.

6.2.1.3.  Validating the Path with SCTP

   Since SCTP provides an acknowledged PL, a sender MUST NOT implement the CONFIRMATION_TIMER while in the SEARCH_COMPLETE state.

6.2.1.4.  PTB Message Handling by SCTP

   Normal ICMP validation MUST be performed as specified in Appendix C of [RFC4960]. This requires that the first 8 bytes of the SCTP common header are quoted in the payload of the PTB message. This can be the case for ICMPv4 and is normally the case for ICMPv6.

   When a PTB message has been validated, the PTB_SIZE reported in the PTB message SHOULD be used with the DPLPMTUD algorithm, provided that the reported PTB_SIZE is less than the current probe size (see Section 4.6).

6.2.2.  DPLPMTUD for SCTP/UDP

   The UDP encapsulation of SCTP is specified in [RFC6951].

6.2.2.1.  Initial Connectivity

   A sender can enter the BASE state as soon as SCTP connectivity has been confirmed.

6.2.2.2.  Sending SCTP/UDP Probe Packets

   Packet probing can be performed as specified in Section 6.2.1.2. The maximum payload is reduced by 8 bytes (the size of the UDP header), which has to be considered when filling the PAD chunk.

6.2.2.3.  Validating the Path with SCTP/UDP

   Since SCTP provides an acknowledged PL, a sender MUST NOT implement the CONFIRMATION_TIMER while in the SEARCH_COMPLETE state.

6.2.2.4.  Handling of PTB Messages by SCTP/UDP

   ICMP validation MUST be performed for PTB messages as specified in Appendix C of [RFC4960]. This requires that the first 8 bytes of the SCTP common header are contained in the PTB message. This can be the case for ICMPv4, although the UDP header also consumes part of the quoted packet header, and is normally the case for ICMPv6.
   When the validation is completed, the PTB_SIZE indicated in the PTB message SHOULD be used with the DPLPMTUD algorithm, provided that the reported PTB_SIZE is less than the current probe size.

6.2.3.  DPLPMTUD for SCTP/DTLS

   The Datagram Transport Layer Security (DTLS) encapsulation of SCTP is specified in [RFC8261]. This is used for data channels in WebRTC implementations.

6.2.3.1.  Initial Connectivity

   A sender can enter the BASE state as soon as SCTP connectivity has been confirmed.

6.2.3.2.  Sending SCTP/DTLS Probe Packets

   Packet probing can be performed as specified in Section 6.2.1.2.

6.2.3.3.  Validating the Path with SCTP/DTLS

   Since SCTP provides an acknowledged PL, a sender MUST NOT implement the CONFIRMATION_TIMER while in the SEARCH_COMPLETE state.

6.2.3.4.  Handling of PTB Messages by SCTP/DTLS

   [RFC4960] does not specify a way to validate the payload of ICMP messages for SCTP/DTLS. This can prevent processing of PTB messages at the PL.

6.3.  DPLPMTUD for QUIC

   QUIC [I-D.ietf-quic-transport] is a UDP-based transport that provides reception feedback. The UDP payload includes the QUIC packet header, protected payload, and any authentication fields. QUIC depends on a PMTU of at least 1280 bytes.

   Section 14 of [I-D.ietf-quic-transport] describes the path considerations when sending QUIC packets. It recommends the use of PADDING frames to build the probe packet. Pure probe-only packets are constructed with PADDING frames and PING frames to create a padding-only packet that will elicit an acknowledgement. Such padding-only packets enable probing without affecting the transfer of other QUIC frames.

   The recommendation for QUIC endpoints implementing DPLPMTUD is that an MPS is maintained for each combination of local and remote IP addresses [I-D.ietf-quic-transport].
   If a QUIC endpoint determines that the PMTU between any pair of local and remote IP addresses has fallen below an acceptable MPS, it immediately ceases to send QUIC packets on the affected path. This could result in termination of the connection if an alternative path cannot be found [I-D.ietf-quic-transport].

6.3.1.  Initial Connectivity

   The base protocol is specified in [I-D.ietf-quic-transport]. This provides an acknowledged PL. A sender can therefore enter the BASE state as soon as connectivity has been confirmed.

6.3.2.  Sending QUIC Probe Packets

   A probe packet consists of a QUIC header and a payload containing PADDING frames and a PING frame. Each PADDING frame is a single octet (0x00), and several of these can be used to create a probe packet of size PROBED_SIZE. QUIC provides an acknowledged PL; a sender can therefore enter the BASE state as soon as connectivity has been confirmed.

   The current specification of QUIC sets the following:

   *  BASE_PMTU: 1280 bytes. A QUIC sender pads initial packets to confirm the path can support packets of the required size.

   *  MIN_PMTU: 1280 bytes. A QUIC sender that determines the PLPMTU has fallen below 1280 bytes MUST immediately stop sending on the affected path.

6.3.3.  Validating the Path with QUIC

   QUIC provides an acknowledged PL. A sender therefore MUST NOT implement the CONFIRMATION_TIMER while in the SEARCH_COMPLETE state.

6.3.4.  Handling of PTB Messages by QUIC

   QUIC validates ICMP PTB messages. In addition to UDP port validation, QUIC can validate an ICMP message by using other PL information (e.g., validation of connection IDs in the quoted packet of any received ICMP message).

7.  Acknowledgements

   This work was partially funded by the European Union's Horizon 2020 research and innovation programme under grant agreement No. 644334 (NEAT).
   The views expressed are solely those of the author(s).

   Thanks to all who have commented or contributed, including the TSVWG and QUIC working groups, and to Mathew Calder and Julius Flohr for providing early implementations.

8.  IANA Considerations

   This memo includes no request to IANA.

   If there are no requirements for IANA, the section will be removed during conversion into an RFC by the RFC Editor.

9.  Security Considerations

   The security considerations for the use of UDP and SCTP are provided in the referenced RFCs.

   To avoid excessive load, the interval between individual probe packets MUST be at least one RTT, and the interval between rounds of probing is determined by the PMTU_RAISE_TIMER.

   A PL sender needs to ensure that the method used to confirm reception of probe packets protects against off-path attackers injecting packets into the path. IETF-defined protocols (e.g., TCP, SCTP) provide this protection using a randomly initialized sequence number. One way to do this when using UDP is described in Section 5.1 of [RFC8085].

   There are cases where ICMP Packet Too Big (PTB) messages are not delivered due to policy, configuration, or equipment design (see Section 1.1). This method therefore does not rely upon PTB messages being received, but is able to use them when they are received by the sender. PTB messages could potentially be used to cause a node to inappropriately reduce the PLPMTU. A node supporting DPLPMTUD MUST therefore appropriately validate the payload of PTB messages to ensure these are received in response to transmitted traffic (i.e., a reported error condition that corresponds to a datagram actually sent by the path layer; see Section 4.6.1).

   An on-path attacker able to create a PTB message could forge PTB messages that include a valid quoted IP packet.
   Such an attack could be used to drive down the PLPMTU. There are two ways such attacks can be mitigated:

   *  First, a PL sender never reduces the PLPMTU below the base size solely in response to receiving a PTB message. This is achieved by first entering the BASE state when such a message is received.

   *  Second, the design does not require processing of PTB messages, so a PL sender could suspend their processing. For example, it could do so in a robustness mode after detecting that subsequent probes actually confirm that the path supports a size larger than the PTB_SIZE.

   The successful processing of an ICMP message can trigger a probe when the reported PTB size is valid, but this does not directly update the PLPMTU for the path. This prevents a message from black-holing data by indicating a size larger than the path supports.

   Parallel forwarding paths SHOULD be considered. Section 5.4 identifies the need for robustness in the method because the path information might be inconsistent.

   A node performing DPLPMTUD could experience conflicting information about the size of supported probe packets. This could occur when multiple paths are concurrently in use and these exhibit different PMTUs. If not considered, this could result in packets not being delivered (black-holed) when the PLPMTU is larger than the smallest actual PMTU.

10.  References

10.1.  Normative References

   [I-D.ietf-quic-transport]
              Iyengar, J. and M. Thomson, "QUIC: A UDP-Based Multiplexed and Secure Transport", draft-ietf-quic-transport-20 (work in progress), 23 April 2019.

   [RFC0768]  Postel, J., "User Datagram Protocol", STD 6, RFC 768, DOI 10.17487/RFC0768, August 1980.

   [RFC0791]  Postel, J., "Internet Protocol", STD 5, RFC 791, DOI 10.17487/RFC0791, September 1981.

   [RFC1191]  Mogul, J.C. and S.E. Deering, "Path MTU discovery", RFC 1191, DOI 10.17487/RFC1191, November 1990.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997.

   [RFC3828]  Larzon, L-A., Degermark, M., Pink, S., Jonsson, L-E., Ed., and G. Fairhurst, Ed., "The Lightweight User Datagram Protocol (UDP-Lite)", RFC 3828, DOI 10.17487/RFC3828, July 2004.

   [RFC4820]  Tuexen, M., Stewart, R., and P. Lei, "Padding Chunk and Parameter for the Stream Control Transmission Protocol (SCTP)", RFC 4820, DOI 10.17487/RFC4820, March 2007.

   [RFC4960]  Stewart, R., Ed., "Stream Control Transmission Protocol", RFC 4960, DOI 10.17487/RFC4960, September 2007.

   [RFC6951]  Tuexen, M. and R. Stewart, "UDP Encapsulation of Stream Control Transmission Protocol (SCTP) Packets for End-Host to End-Host Communication", RFC 6951, DOI 10.17487/RFC6951, May 2013.

   [RFC8085]  Eggert, L., Fairhurst, G., and G. Shepherd, "UDP Usage Guidelines", BCP 145, RFC 8085, DOI 10.17487/RFC8085, March 2017.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017.

   [RFC8200]  Deering, S. and R. Hinden, "Internet Protocol, Version 6 (IPv6) Specification", STD 86, RFC 8200, DOI 10.17487/RFC8200, July 2017.

   [RFC8201]  McCann, J., Deering, S., Mogul, J., and R. Hinden, Ed., "Path MTU Discovery for IP version 6", STD 87, RFC 8201, DOI 10.17487/RFC8201, July 2017.

   [RFC8261]  Tuexen, M., Stewart, R., Jesup, R., and S. Loreto, "Datagram Transport Layer Security (DTLS) Encapsulation of SCTP Packets", RFC 8261, DOI 10.17487/RFC8261, November 2017.

10.2.  Informative References

   [I-D.ietf-intarea-frag-fragile]
              Bonica, R., Baker, F., Huston, G., Hinden, R., Troan, O., and F. Gont, "IP Fragmentation Considered Fragile", draft-ietf-intarea-frag-fragile-17 (work in progress), 30 September 2019.

   [I-D.ietf-intarea-tunnels]
              Touch, J. and M. Townsley, "IP Tunnels in the Internet Architecture", draft-ietf-intarea-tunnels-10 (work in progress), 12 September 2019.

   [RFC0792]  Postel, J., "Internet Control Message Protocol", STD 5, RFC 792, DOI 10.17487/RFC0792, September 1981.

   [RFC1122]  Braden, R., Ed., "Requirements for Internet Hosts - Communication Layers", STD 3, RFC 1122, DOI 10.17487/RFC1122, October 1989.

   [RFC1812]  Baker, F., Ed., "Requirements for IP Version 4 Routers", RFC 1812, DOI 10.17487/RFC1812, June 1995.

   [RFC2923]  Lahey, K., "TCP Problems with Path MTU Discovery", RFC 2923, DOI 10.17487/RFC2923, September 2000.

   [RFC4340]  Kohler, E., Handley, M., and S. Floyd, "Datagram Congestion Control Protocol (DCCP)", RFC 4340, DOI 10.17487/RFC4340, March 2006.

   [RFC4443]  Conta, A., Deering, S., and M. Gupta, Ed., "Internet Control Message Protocol (ICMPv6) for the Internet Protocol Version 6 (IPv6) Specification", STD 89, RFC 4443, DOI 10.17487/RFC4443, March 2006.

   [RFC4821]  Mathis, M. and J. Heffner, "Packetization Layer Path MTU Discovery", RFC 4821, DOI 10.17487/RFC4821, March 2007.

   [RFC4890]  Davies, E. and J. Mohacsi, "Recommendations for Filtering ICMPv6 Messages in Firewalls", RFC 4890, DOI 10.17487/RFC4890, May 2007.

   [RFC5508]  Srisuresh, P., Ford, B., Sivakumar, S., and S. Guha, "NAT Behavioral Requirements for ICMP", BCP 148, RFC 5508, DOI 10.17487/RFC5508, April 2009.

Appendix A.  Revision Notes

   Note to RFC-Editor: please remove this entire section prior to publication.

   Individual draft -00:

   *  Comments and corrections are welcome directly to the authors or via the IETF TSVWG working group mailing list.

   *  This update is proposed for WG comments.

   Individual draft -01:

   *  Contains the first representation of the algorithm, showing the states and timers

   *  This update is proposed for WG comments.

   Individual draft -02:

   *  Contains updated representation of the algorithm, and textual corrections.

   *  The text describing when to set the effective PMTU has not yet been validated by the authors

   *  To determine security to off-path attacks: We need to decide whether a received PTB message SHOULD/MUST be validated? The text on how to handle a PTB message indicating a link MTU larger than the probe has not yet been validated by the authors

   *  No text currently describes how to handle inconsistent results from arbitrary re-routing along different parallel paths

   *  This update is proposed for WG comments.

   Working Group draft -00:

   *  This draft follows a successful adoption call for TSVWG

   *  There is still work to complete; please comment on this draft.

   Working Group draft -01:

   *  This draft includes an improved introduction.

   *  The draft is updated to require ICMP validation prior to accepting PTB messages - this to be confirmed by WG

   *  Section added to discuss Selection of Probe Size - methods to be evaluated and recommendations to be considered

   *  Section added to align with work proposed in the QUIC WG.

   Working Group draft -02:

   *  The draft was updated based on feedback from the WG, and a detailed review by Magnus Westerlund.

   *  The document updates RFC 4821.

   *  Requirements list updated.

   *  Added more explicit discussion of a simpler black-hole detection mode.

   *  This draft includes reorganisation of the section on IETF protocols.

   *  Added more discussion of implementation within an application.

   *  Added text on flapping paths.

   *  Replaced 'effective MTU' with new term PLPMTU.

   Working Group draft -03:

   *  Updated figures

   *  Added more discussion on blackhole detection

   *  Added figure describing just blackhole detection

   *  Added figure relating MPS sizes

   Working Group draft -04:

   *  Described phases and named these consistently.

   *  Corrected transition from confirmation directly to the search phase (Base has been checked).

   *  Redrawn state diagrams.

   *  Renamed BASE_MTU to BASE_PMTU (because it is a base for the PMTU).

   *  Clarified Error state.

   *  Clarified suspending DPLPMTUD.

   *  Verified normative text in requirements section.

   *  Removed duplicate text.

   *  Changed all text to refer to /packet probe/probe packet/ /validation/verification/ added term /Probe Confirmation/ and clarified BlackHole detection.

   Working Group draft -05:

   *  Updated security considerations.

   *  Feedback after speaking with Joe Touch helped improve UDP-Options description.

   Working Group draft -06:

   *  Updated description of ICMP issues in section 1.1

   *  Update to description of QUIC.

   Working group draft -07:

   *  Moved description of the PTB processing method from the PTB requirements section.

   *  Clarified what is performed in the PTB validation check.

   *  Updated security consideration to explain PTB security without needing to read the rest of the document.

   *  Reformatted state machine diagram

   Working group draft -08:

   *  Moved to rfcxml v3+

   *  Rendered diagrams to svg in html version.

   *  Removed Appendix A. Event-driven state changes.

   *  Removed section on DPLPMTUD with UDP Options.

   *  Shortened the description of phases.

   Working group draft -09:

   *  Remove final mention of UDP Options

   *  Add Initial Connectivity sections to each PL

   *  Add to disable outgoing pmtu enforcement of packets

   Working group draft -10:

   *  Address comments from Lars Eggert

   *  Reinforce that PROBE_COUNT is successive attempts to probe for any size

   *  Redefine MAX_PROBES to 3

   *  Address PTB_SIZE of 0 or less than MIN_PMTU

   Working group draft -11:

   *  Restore a sentence removed in previous rev

   *  De-acronymise QUIC

   *  Address some nits

   Working group draft -12:

   *  Add TSVWG, QUIC and implementers to acknowledgements

   *  Shorten a diagram line.

   *  Address nits from Julius and Wes.

   *  Be clearer when talking about IP layer caches

Authors' Addresses

   Godred Fairhurst
   University of Aberdeen
   School of Engineering, Fraser Noble Building
   Aberdeen
   AB24 3UE
   United Kingdom

   Email: gorry@erg.abdn.ac.uk

   Tom Jones
   University of Aberdeen
   School of Engineering, Fraser Noble Building
   Aberdeen
   AB24 3UE
   United Kingdom

   Email: tom@erg.abdn.ac.uk

   Michael Tuexen
   Muenster University of Applied Sciences
   Stegerwaldstrasse 39
   48565 Steinfurt
   Germany

   Email: tuexen@fh-muenster.de

   Irene Ruengeler
   Muenster University of Applied Sciences
   Stegerwaldstrasse 39
   48565 Steinfurt
   Germany

   Email: i.ruengeler@fh-muenster.de

   Timo Voelker
   Muenster University of Applied Sciences
   Stegerwaldstrasse 39
   48565 Steinfurt
   Germany

   Email: timo.voelker@fh-muenster.de