idnits 2.17.1

draft-ietf-tsvwg-datagram-plpmtud-19.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  -- The abstract seems to indicate that this document updates RFC8201, but
     the header doesn't have an 'Updates:' line to match this.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does
     not match the current year

  (Using the creation date from RFC4821, updated by this document, for
  RFC5378 checks: 2003-10-21)

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008. If you
     have contacted all the original authors and they are all willing to
     grant the BCP78 rights to the IETF Trust, then this is fine, and you can
     ignore this comment. If not, you may need to add the pre-RFC5378
     disclaimer. (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (3 April 2020) is 1483 days in the past. Is this
     intentional?
  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Outdated reference: A later version (-34) exists of
     draft-ietf-quic-transport-27

  ** Obsolete normative reference: RFC 4960 (Obsoleted by RFC 9260)

  == Outdated reference: A later version (-13) exists of
     draft-ietf-intarea-tunnels-10


     Summary: 1 error (**), 0 flaws (~~), 3 warnings (==), 3 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


Internet Engineering Task Force                             G. Fairhurst
Internet-Draft                                                  T. Jones
Updates: 4821, 4960, 6951, 8085, 8261 (if         University of Aberdeen
         approved)                                             M. Tuexen
Intended status: Standards Track                            I. Ruengeler
Expires: 5 October 2020                                       T. Voelker
                                 Muenster University of Applied Sciences
                                                            3 April 2020


     Packetization Layer Path MTU Discovery for Datagram Transports
                  draft-ietf-tsvwg-datagram-plpmtud-19

Abstract

   This document describes a robust method for Path MTU Discovery (PMTUD) for datagram Packetization Layers (PLs). It describes an extension to RFC 1191 and RFC 8201, which specify ICMP-based Path MTU Discovery for IPv4 and IPv6. The method allows a PL, or a datagram application that uses a PL, to discover whether a network path can support the current size of datagram. This can be used to detect and reduce the message size when a sender encounters a packet black hole (where packets are discarded). The method can probe a network path with progressively larger packets to discover whether the maximum packet size can be increased.
   This allows a sender to determine an appropriate packet size, providing functionality for datagram transports that is equivalent to the Packetization Layer PMTUD specification for TCP, specified in RFC 4821.

   This document updates RFC 4821 to specify the method for datagram PLs, and updates RFC 8085 to specify this method for use in place of RFC 4821 with UDP datagrams. Section 7.3 of RFC 4960 recommends that an endpoint apply the techniques in RFC 4821 on a per-destination-address basis. RFC 4960, RFC 6951, and RFC 8261 are updated to recommend that SCTP, SCTP encapsulated in UDP, and SCTP encapsulated in DTLS use the method specified in this document instead of the method in RFC 4821.

   The document also provides implementation notes for incorporating Datagram PMTUD into IETF datagram transports or applications that use datagram transports.

   When published, this specification updates RFC 4821, RFC 4960, RFC 6951, RFC 8085, and RFC 8261.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 5 October 2020.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Classical Path MTU Discovery
     1.2.  Packetization Layer Path MTU Discovery
     1.3.  Path MTU Discovery for Datagram Services
   2.  Terminology
   3.  Features Required to Provide Datagram PLPMTUD
   4.  DPLPMTUD Mechanisms
     4.1.  PLPMTU Probe Packets
     4.2.  Confirmation of Probed Packet Size
     4.3.  Black Hole Detection and Reducing the PLPMTU
     4.4.  The Maximum Packet Size (MPS)
     4.5.  Disabling the Effect of PMTUD
     4.6.  Response to PTB Messages
       4.6.1.  Validation of PTB Messages
       4.6.2.  Use of PTB Messages
   5.  Datagram Packetization Layer PMTUD
     5.1.  DPLPMTUD Components
       5.1.1.  Timers
       5.1.2.  Constants
       5.1.3.  Variables
       5.1.4.  Overview of DPLPMTUD Phases
     5.2.  State Machine
     5.3.  Search to Increase the PLPMTU
       5.3.1.  Probing for a larger PLPMTU
       5.3.2.  Selection of Probe Sizes
       5.3.3.  Resilience to Inconsistent Path Information
     5.4.  Robustness to Inconsistent Paths
   6.  Specification of Protocol-Specific Methods
     6.1.  Application support for DPLPMTUD with UDP or UDP-Lite
       6.1.1.  Application Request
       6.1.2.  Application Response
       6.1.3.  Sending Application Probe Packets
       6.1.4.  Initial Connectivity
       6.1.5.  Validating the Path
       6.1.6.  Handling of PTB Messages
     6.2.  DPLPMTUD for SCTP
       6.2.1.  SCTP/IPv4 and SCTP/IPv6
         6.2.1.1.  Initial Connectivity
         6.2.1.2.  Sending SCTP Probe Packets
         6.2.1.3.  Validating the Path with SCTP
         6.2.1.4.  PTB Message Handling by SCTP
       6.2.2.  DPLPMTUD for SCTP/UDP
         6.2.2.1.  Initial Connectivity
         6.2.2.2.  Sending SCTP/UDP Probe Packets
         6.2.2.3.  Validating the Path with SCTP/UDP
         6.2.2.4.  Handling of PTB Messages by SCTP/UDP
       6.2.3.  DPLPMTUD for SCTP/DTLS
         6.2.3.1.  Initial Connectivity
         6.2.3.2.  Sending SCTP/DTLS Probe Packets
         6.2.3.3.  Validating the Path with SCTP/DTLS
         6.2.3.4.  Handling of PTB Messages by SCTP/DTLS
     6.3.  DPLPMTUD for QUIC
       6.3.1.  Initial Connectivity
       6.3.2.  Sending QUIC Probe Packets
       6.3.3.  Validating the Path with QUIC
       6.3.4.  Handling of PTB Messages by QUIC
   7.  Acknowledgments
   8.  IANA Considerations
   9.  Security Considerations
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Appendix A.  Revision Notes
   Authors' Addresses

1.  Introduction

   The IETF has specified datagram transport using UDP, SCTP, and DCCP, as well as protocols layered on top of these transports (e.g., SCTP/UDP, DCCP/UDP, QUIC/UDP), and direct datagram transport over the IP network layer. This document describes a robust method for Path MTU Discovery (PMTUD) that can be used with these transport protocols (or the applications that use their transport service) to discover an appropriate size of packet to use across an Internet path.

1.1.  Classical Path MTU Discovery

   Classical Path Maximum Transmission Unit Discovery (PMTUD) can be used with any transport that is able to process ICMP Packet Too Big (PTB) messages (e.g., [RFC1191] and [RFC8201]). In this document, the term PTB message is applied to both IPv4 ICMP Unreachable messages (Type 3) that carry the error Fragmentation Needed (Type 3, Code 4) [RFC0792] and ICMPv6 Packet Too Big messages (Type 2) [RFC4443].
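   To make the definition of a PTB message concrete, the following Python sketch inspects a raw ICMP (IPv4) or ICMPv6 message and returns the MTU it reports (the PTB_SIZE defined in Section 2), or None when the message is not a PTB message. It assumes the receiving stack hands the PL the message starting at its ICMP header, with the IP header already stripped; the function name `ptb_size` and this interface are hypothetical. The field offsets follow the message formats in RFC 1191 (Next-Hop MTU in the low-order 16 bits of the second 32-bit word) and RFC 4443 (a 32-bit MTU field). The sketch deliberately performs no validation of the quoted packet, which a sender still needs to do before acting on the value.

```python
import struct
from typing import Optional

# ICMP type/code values that this document groups together as "PTB messages".
ICMPV4_DEST_UNREACHABLE = 3  # IPv4 ICMP Destination Unreachable (RFC 792)
ICMPV4_FRAG_NEEDED = 4       # code: Fragmentation Needed and DF set
ICMPV6_PACKET_TOO_BIG = 2    # ICMPv6 Packet Too Big (RFC 4443)


def ptb_size(ip_version: int, icmp_message: bytes) -> Optional[int]:
    """Return the MTU reported by a PTB message, or None if it is not one.

    Hypothetical helper: icmp_message starts at the ICMP/ICMPv6 header.
    The returned value is unvalidated and still includes all headers
    below the PL (it is a PTB_SIZE, not a PL_PTB_SIZE).
    """
    if len(icmp_message) < 8:
        return None  # shorter than the fixed 8-byte ICMP header
    icmp_type, icmp_code = icmp_message[0], icmp_message[1]

    if ip_version == 4:
        if icmp_type == ICMPV4_DEST_UNREACHABLE and icmp_code == ICMPV4_FRAG_NEEDED:
            # RFC 1191: bytes 4-5 are unused, bytes 6-7 hold the Next-Hop MTU.
            # Routers that predate RFC 1191 leave this field set to zero.
            (mtu,) = struct.unpack("!H", icmp_message[6:8])
            return mtu
    elif ip_version == 6:
        if icmp_type == ICMPV6_PACKET_TOO_BIG:
            # RFC 4443: bytes 4-7 hold the 32-bit MTU of the next-hop link.
            (mtu,) = struct.unpack("!I", icmp_message[4:8])
            return mtu
    return None


if __name__ == "__main__":
    # IPv4 Fragmentation Needed reporting a next-hop MTU of 1400 (0x0578).
    v4 = bytes([3, 4, 0, 0, 0, 0, 0x05, 0x78])
    # ICMPv6 Packet Too Big reporting an MTU of 1280.
    v6 = bytes([2, 0, 0, 0]) + (1280).to_bytes(4, "big")
    print(ptb_size(4, v4), ptb_size(6, v6))  # 1400 1280
```

   Keeping parsing separate from validation mirrors the text that follows: checking that the quoted packet belongs to an active flow, and converting a PTB_SIZE into a PL_PTB_SIZE, both need per-flow PL context that a generic parser does not have.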

   When a sender receives a PTB message, it reduces the effective MTU to the value reported as the Link MTU in the PTB message. From time to time, a method increases the packet size in an attempt to discover an increase in the supported PMTU. Packets sent with a size larger than the current effective PMTU are known as probe packets.

   Packets not intended as probe packets are either fragmented to the current effective PMTU, or the attempt to send fails with an error code. Applications can be provided with a primitive to let them read the Maximum Packet Size (MPS), derived from the current effective PMTU.

   Classical PMTUD is subject to protocol failures. One failure arises when traffic using a packet size larger than the actual PMTU is black-holed (all datagrams sent with this size, or larger, are discarded). This could arise when the PTB messages are not delivered back to the sender for some reason (see for example [RFC2923]).

   Examples where PTB messages are not delivered include:

   *  The generation of ICMP messages is usually rate limited. This could result in no PTB messages being generated for the sender (see Section 2.4 of [RFC4443]).

   *  ICMP messages can be filtered by middleboxes (including firewalls) [RFC4890]. A stateful firewall could be configured with a policy to block incoming ICMP messages, which would prevent reception of PTB messages by a sending endpoint behind this firewall.

   *  When the router issuing the ICMP message drops a tunneled packet, the resulting ICMP message will be directed to the tunnel ingress. This tunnel endpoint is responsible for forwarding the ICMP message, for processing the quoted packet within the payload field to remove the effect of the tunnel, and for returning a correctly formatted ICMP message to the sender [I-D.ietf-intarea-tunnels]. Failure to do this prevents the PTB message from reaching the original sender.

   *  Asymmetry in forwarding can result in there being no return route to the original sender, which would prevent an ICMP message from being delivered to the sender. This issue can also arise when policy-based routing is used, Equal Cost Multipath (ECMP) routing is used, or a middlebox acts as an application load balancer. An example is where the path towards the server is chosen by ECMP routing depending on bytes in the IP payload. In this case, when a packet sent by the server encounters a problem after the ECMP router, any resulting ICMP message also needs to be directed by the ECMP router towards the original sender.

   *  There are additional cases where the next-hop destination fails to receive a packet because of its size. This could be due to misconfiguration of the layer 2 path between nodes, for instance the MTU configured in a layer 2 switch, or misconfiguration of the Maximum Receive Unit (MRU). If a packet is dropped by the link, this will not cause a PTB message to be sent to the original sender.

   Another failure could result if a node that is not on the network path sends a PTB message that attempts to force a sender to change the effective PMTU [RFC8201]. A sender can protect itself from reacting to such messages by utilizing the quoted packet within a PTB message payload to validate that the received PTB message was generated in response to a packet that had actually originated from the sender. However, there are situations where a sender would be unable to provide this validation. Examples where validation of the PTB message is not possible include:

   *  When a router issuing the ICMP message implements RFC 792 [RFC0792], it is only required to include the first 64 bits of the IP payload of the packet within the quoted payload. There could be insufficient bytes remaining for the sender to interpret the quoted transport information.

      Note: The recommendation in RFC 1812 [RFC1812] is that IPv4 routers return a quoted packet with as much of the original datagram as possible without the length of the ICMP datagram exceeding 576 bytes. IPv6 routers include as much of the invoking packet as possible without the ICMPv6 packet exceeding 1280 bytes [RFC4443].

   *  The use of tunnels/encryption can reduce the size of the quoted packet returned to the original source address, increasing the risk that there could be insufficient bytes remaining for the sender to interpret the quoted transport information.

   *  Even when the PTB message includes sufficient bytes of the quoted packet, the network layer could lack sufficient context to validate the message, because validation depends on information about the active transport flows at an endpoint node (e.g., the socket/address pairs being used, and other protocol header information).

   *  When a packet is encapsulated/tunneled over an encrypted transport, the tunnel/encapsulation ingress might have insufficient context, or computational power, to reconstruct the transport header that would be needed to perform validation.

   *  When an ICMP message is generated by a router in a network segment that has inserted a header into a packet, the quoted packet could contain additional protocol header information that was not included in the original sent packet, and which the PL sender does not process or may not know how to process. This could disrupt the ability of the sender to validate this PTB message.

   *  A Network Address Translation (NAT) device that translates a packet header ought to also translate ICMP messages and update the ICMP quoted packet [RFC5508] in that message.
      If this is not correctly translated, then the sender would not be able to associate the message with the PL that originated the packet, and hence this ICMP message cannot be validated.

1.2.  Packetization Layer Path MTU Discovery

   The term Packetization Layer (PL) has been introduced to describe the layer that is responsible for placing data blocks into the payload of IP packets and selecting an appropriate MPS. This function is often performed by a transport protocol (e.g., DCCP, RTP, SCTP, QUIC), but can also be performed by other encapsulation methods working above the transport layer.

   In contrast to PMTUD, Packetization Layer Path MTU Discovery (PLPMTUD) [RFC4821] introduced a method that does not rely upon reception and validation of PTB messages. It is therefore more robust than Classical PMTUD. This has become the recommended approach for implementing discovery of the PMTU [RFC8085].

   PLPMTUD uses a general strategy where the PL sends probe packets to search for the largest size of unfragmented datagram that can be sent over a network path. Probe packets are sent to explore using a larger packet size. If a probe packet is successfully delivered (as determined by the PL), then the PLPMTU is raised to the size of the successful probe. If a black hole is detected (e.g., where packets of size PLPMTU are consistently not received), the method reduces the PLPMTU.

   Datagram PLPMTUD introduces flexibility in implementation. At one extreme, it can be configured to only perform Black Hole Detection and recovery with increased robustness compared to Classical PMTUD. At the other extreme, all PTB processing can be disabled, and PLPMTUD replaces Classical PMTUD.

   PLPMTUD can also include additional consistency checks without increasing the risk that data is lost when probing to discover the Path MTU.
   For example, information available at the PL, or higher layers, enables received PTB messages to be validated before being utilized.

1.3.  Path MTU Discovery for Datagram Services

   Section 5 of this document presents a set of algorithms for datagram protocols to discover the largest size of unfragmented datagram that can be sent over a network path. The method relies upon features of the PL described in Section 3 and applies to transport protocols operating over IPv4 and IPv6. It does not require cooperation from the lower layers, although it can utilize PTB messages when these received messages are made available to the PL.

   The message size guidelines in Section 3.2 of the UDP Usage Guidelines [RFC8085] state "an application SHOULD either use the Path MTU information provided by the IP layer or implement Path MTU Discovery (PMTUD)", but do not provide a mechanism for discovering the largest size of unfragmented datagram that can be used on a network path. The present document updates RFC 8085 to specify this method in place of PLPMTUD [RFC4821] and provides a mechanism for sharing the discovered largest size as the MPS (see Section 4.4).

   Section 10.2 of [RFC4821] recommended a PLPMTUD probing method for the Stream Control Transmission Protocol (SCTP). SCTP utilizes probe packets consisting of a minimal-sized HEARTBEAT chunk bundled with a PAD chunk as defined in [RFC4820]. However, RFC 4821 did not provide a complete specification. The present document replaces this by providing a complete specification.

   The Datagram Congestion Control Protocol (DCCP) [RFC4340] requires implementations to support Classical PMTUD and states that a DCCP sender "MUST maintain the MPS allowed for each active DCCP session". It also defines the current congestion control MPS (CCMPS) supported by a network path.
   The DCCP specification recommends use of PMTUD and suggests use of control packets (DCCP-Sync) as path probe packets, because they do not risk application data loss. The method defined in this specification can be used with DCCP.

   Section 4 and Section 5 define the protocol mechanisms and specification for Datagram Packetization Layer Path MTU Discovery (DPLPMTUD).

   Section 6 specifies the method for datagram transports and provides information to enable the implementation of PLPMTUD with other datagram transports and applications that use datagram transports.

   Section 6 also provides updated recommendations for [RFC6951] and [RFC8261].

2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

   The following terminology is defined. Relevant terms are directly copied from [RFC4821] and from the definitions in [RFC1122].

   Acknowledged PL:  A PL that includes a mechanism that can confirm successful delivery of datagrams to the remote PL endpoint (e.g., SCTP). Typically, the PL receiver returns acknowledgments corresponding to the received datagrams, which can be utilized to detect black-holing of packets (cf. Unacknowledged PL).

   Actual PMTU:  The Actual PMTU is the PMTU of a network path between a sender PL and a destination PL, which the DPLPMTUD algorithm seeks to determine.

   Black Hole:  A Black Hole is encountered when a sender is unaware that packets are not being delivered to the destination endpoint.
      Two types of Black Hole are relevant to DPLPMTUD:

      *  Packets encounter a packet Black Hole when packets are not delivered to the destination endpoint (e.g., when the sender transmits packets of a particular size with a previously known effective PMTU and they are discarded by the network).

      *  An ICMP Black Hole is encountered when the sender is unaware that packets are not delivered to the destination endpoint because PTB messages are not received by the originating PL sender.

   Classical Path MTU Discovery:  Classical PMTUD is a process described in [RFC1191] and [RFC8201], in which nodes rely on PTB messages to learn the largest size of unfragmented packet that can be used across a network path.

   Datagram:  A datagram is a transport-layer protocol data unit, transmitted in the payload of an IP packet.

   Effective PMTU:  The Effective PMTU is the current estimated value for PMTU that is used by PMTUD. This is equivalent to the PLPMTU derived by PLPMTUD plus the size of any headers added below the PL, including the IP layer headers.

   EMTU_S:  The Effective MTU for sending (EMTU_S) is defined in [RFC1122] as "the maximum IP datagram size that may be sent, for a particular combination of IP source and destination addresses...".

   EMTU_R:  The Effective MTU for receiving (EMTU_R) is designated in [RFC1122] as "the largest datagram size that can be reassembled".

   Link:  A Link is a communication facility or medium over which nodes can communicate at the link layer, i.e., a layer below the IP layer. Examples are Ethernet LANs and Internet (or higher) layer tunnels.

   Link MTU:  The Link Maximum Transmission Unit (MTU) is the size in bytes of the largest IP packet, including the IP header and payload, that can be transmitted over a link.
      Note that this could more properly be called the IP MTU, to be consistent with how other standards organizations use the acronym. This includes the IP header, but excludes link layer headers and other framing that is not part of IP or the IP payload. Other standards organizations generally define the link MTU to include the link layer headers. This specification continues the requirement in [RFC4821], which states "All links MUST enforce their MTU: links that might non-deterministically deliver packets that are larger than their rated MTU MUST consistently discard such packets."

   MAX_PLPMTU:  The MAX_PLPMTU is the largest size of PLPMTU that DPLPMTUD will attempt to use.

   MIN_PLPMTU:  The MIN_PLPMTU is the smallest size of PLPMTU that DPLPMTUD will attempt to use.

   MPS:  The Maximum Packet Size (MPS) is the largest size of application data block that can be sent across a network path by a PL using a single Datagram.

   MSL:  The Maximum Segment Lifetime (MSL) is the maximum delay a packet is expected to experience across a path, taken as 2 minutes [RFC8085].

   Packet:  A Packet is the IP header plus the IP payload.

   Packetization Layer (PL):  The PL is a layer of the network stack that places data into packets and performs transport protocol functions. Examples of a PL include: TCP, SCTP, SCTP over DTLS, or QUIC.

   Path:  The Path is the set of links and routers traversed by a packet between a source node and a destination node by a particular flow.

   Path MTU (PMTU):  The Path MTU (PMTU) is the minimum of the Link MTU of all the links forming a network path between a source node and a destination node, as used by PMTUD.

   PTB_SIZE:  The PTB_SIZE is a value reported in a validated PTB message that indicates the next-hop link MTU of a router along the path.

   PL_PTB_SIZE:  The size reported in a validated PTB message, reduced by the size of all headers added by layers below the PL.

   PLPMTU:  The Packetization Layer PMTU is an estimate of the largest size of PL datagram that can be sent over a path, controlled by PLPMTUD.

   PLPMTUD:  Packetization Layer Path MTU Discovery (PLPMTUD), the method described in this document for datagram PLs, which is an extension to Classical PMTU Discovery.

   Probe packet:  A probe packet is a datagram sent with a purposely chosen size (typically the current PLPMTU or larger) to detect if packets of this size can be successfully sent end-to-end across the network path.

   Unacknowledged PL:  A PL that does not itself provide a mechanism to confirm delivery of datagrams to the remote PL endpoint (e.g., UDP), and therefore requires DPLPMTUD to provide a mechanism to detect black-holing of packets (cf. Acknowledged PL).

3.  Features Required to Provide Datagram PLPMTUD

   The principles expressed in [RFC4821] apply to the use of the technique with any PL. TCP PLPMTUD has been defined using standard TCP protocol mechanisms. Unlike TCP, a datagram PL requires additional mechanisms and considerations to implement PLPMTUD.

   The requirements for datagram PLPMTUD are:

   1.  Managing the PLPMTU: For datagram PLs, the PLPMTU is managed by DPLPMTUD. A PL MUST NOT send a datagram (other than a probe packet) with a size at the PL that is larger than the current PLPMTU.

   2.  Probe packets: The network interface below the PL is REQUIRED to provide a way to transmit a probe packet that is larger than the PLPMTU. In IPv4, a probe packet MUST be sent with the Don't Fragment (DF) bit set in the IP header, and without network layer endpoint fragmentation. In IPv6, a probe packet is always sent without source fragmentation (as specified in Section 5.4 of [RFC8201]).

   3.  Reception feedback: The destination PL endpoint is REQUIRED to provide a feedback method that indicates to the DPLPMTUD sender when a probe packet has been received by the destination PL endpoint. Section 6 provides examples of how a PL can provide this acknowledgment of received probe packets.

   4.  Probe loss recovery: It is RECOMMENDED to use probe packets that do not carry any user data that would require retransmission if lost. Most datagram transports permit this. If a probe packet contains user data requiring retransmission in case of loss, the PL (or the layers above it) is REQUIRED to arrange any retransmission/repair of any resulting loss. The PL is REQUIRED to be robust in the case where probe packets are lost for other reasons (including link transmission errors and congestion).

   5.  PMTU parameters: A DPLPMTUD sender is RECOMMENDED to utilize information about the maximum size of packet that can be transmitted by the sender on the local link (e.g., the local Link MTU). It MAY utilize similar information about the maximum size a receiver can accept when this is supplied (note this could be less than EMTU_R). This avoids implementations trying to send probe packets that cannot be transferred by the local link. Setting this maximum too high could reduce the efficiency of the search algorithm. Some applications also have a maximum transport protocol data unit (PDU) size, in which case there is no benefit from probing for a size larger than this (unless a transport allows multiplexing multiple application PDUs into the same datagram).

   6.  Processing PTB messages: A DPLPMTUD sender MAY optionally utilize PTB messages received from the network layer to help identify when a network path does not support the current size of probe packet. Any received PTB message MUST be validated before it is used to update the PLPMTU discovery information [RFC8201]. This validation confirms that the PTB message was sent in response to a packet originated by the sender, and needs to be performed before the PLPMTU discovery method reacts to the PTB message. A PTB message MUST NOT be used to increase the PLPMTU [RFC8201], but could trigger a probe to test for a larger PLPMTU. A PL_PTB_SIZE that is greater than that currently probed MUST be ignored. A valid PTB_SIZE is converted to a PL_PTB_SIZE before it is used in the DPLPMTUD state machine.

   7.  Probing and congestion control: The decision about when to send a probe packet does not need to be limited by the congestion controller. When not controlled by the congestion controller, the interval between probe packets MUST be at least one RTT. If transmission of probe packets is limited by the congestion controller, this could result in transmission of probe packets being delayed or suspended during congestion.

   8.  Loss of a probe packet SHOULD NOT be treated as an indication of congestion and SHOULD NOT trigger a congestion control reaction [RFC4821], because this could result in unnecessary reduction of the sending rate.

   9.  An update to the PLPMTU (or MPS) MUST NOT increase the congestion window measured in bytes [RFC4821]. Therefore, an increase in the packet size does not cause an increase in the data rate in bytes per second.

   10. A PL that maintains the congestion window in terms of a limit to the number of outstanding fixed-size packets SHOULD adapt this limit to compensate for the size of the actual packets.

   11. Probing and flow control: Flow control at the PL concerns the end-to-end flow of data using the PL service. This does not apply to DPLPMTUD when probe packets use a design that does not carry user data to the remote application.

   12. Shared PLPMTU state: The PMTU value calculated from the PLPMTU MAY also be stored with the corresponding entry associated with the destination in the IP layer cache, and used by other PL instances. The specification of PLPMTUD [RFC4821] states: "If PLPMTUD updates the MTU for a particular path, all Packetization Layer sessions that share the path representation (as described in Section 5.2 of [RFC4821]) SHOULD be notified to make use of the new MTU". Such methods MUST be robust to the wide variety of underlying network forwarding behaviors. Section 5.2 of [RFC8201] provides guidance on the caching of PMTU information and also the relation to IPv6 flow labels.

   In addition, the following principles are stated for design of a DPLPMTUD method:

   *  A PL MAY be designed to segment data blocks larger than the MPS into multiple datagrams. However, not all datagram PLs support segmentation of data blocks. It is RECOMMENDED that methods avoid forcing an application to use an arbitrarily small MPS for transmission while the method is searching for the currently supported PLPMTU. A reduced MPS can adversely impact the performance of an application.

   *  To assist applications in choosing a suitable data block size, the PL is RECOMMENDED to provide a primitive that returns the MPS derived from the PLPMTU to the higher layer using the PL. The value of the MPS can change following a change in the path, or loss of probe packets.

   *  Path validation: It is RECOMMENDED that methods are robust to path changes that could have occurred since the path characteristics were last confirmed, and to the possibility of inconsistent path information being received.

   *  Datagram reordering: A method is REQUIRED to be robust to the possibility that a flow encounters reordering, or the traffic (including probe packets) is divided over more than one network path.

* Datagram delay and duplication: The feedback mechanism is REQUIRED to be robust to the possibility that packets could be significantly delayed or duplicated along a network path.

* When to probe: It is RECOMMENDED that methods determine whether the path has changed since it last measured the path. This can help determine when to probe the path again.

4. DPLPMTUD Mechanisms

This section lists the protocol mechanisms used in this specification.

4.1. PLPMTU Probe Packets

The DPLPMTUD method relies upon the PL sender being able to generate probe packets with a specific size. TCP is able to generate these probe packets by choosing to appropriately segment data being sent [RFC4821]. In contrast, a datagram PL that constructs a probe packet has to either request an application to send a data block that is larger than that generated by an application, or to utilize padding functions to extend a datagram beyond the size of the application data block. Protocols that permit exchange of control messages (without an application data block) can generate a probe packet by extending a control message with padding data. The total size of a probe packet includes all headers and padding added to the payload data being sent (e.g., including protocol option fields, security-related fields such as an Authenticated Encryption with Associated Data (AEAD) tag and TLS record layer padding).

A receiver is REQUIRED to be able to distinguish an in-band data block from any added padding. This is needed to ensure that any added padding is not passed on to an application at the receiver.

This results in three possible ways that a sender can create a probe packet:

Probing using padding data: A probe packet that contains only control information, together with any padding needed to inflate it to the size of the probe packet.
Since these probe packets do not carry an application-supplied data block, they do not typically require retransmission, although they do still consume network capacity and incur endpoint processing.

Probing using application data and padding data: A probe packet that contains a data block supplied by an application that is combined with padding to inflate the length of the datagram to the size of the probe packet.

Probing using application data: A probe packet that contains a data block supplied by an application that matches the size of the probe packet. This method requests the application to issue a data block of the desired probe size.

A PL that uses a probe packet carrying application data and needs protection from the loss of this probe packet could perform transport-layer retransmission/repair of the data block (e.g., by retransmission after loss is detected or by duplicating the data block in a datagram without the padding data). This retransmitted data block might need to be sent using a smaller PLPMTU, which could require the PL to use a smaller packet size to traverse the end-to-end path. (This could utilize endpoint network-layer fragmentation or a PL that can re-segment the data block into multiple datagrams.)

DPLPMTUD MAY choose to use only one of these methods to simplify the implementation.

Probe messages sent by a PL MUST contain enough information to uniquely identify the probe within the Maximum Segment Lifetime (e.g., including a unique identifier from the PL or the DPLPMTUD implementation), while being robust to reordering and replay of probe response and PTB messages.

4.2. Confirmation of Probed Packet Size

The PL needs a method to determine (confirm) when probe packets have been successfully received end-to-end across a network path.
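
One way to picture this confirmation step is as a sender-side table of outstanding probes. The following is a minimal Python sketch, not a normative design, and rests on several assumptions. The class and method names are hypothetical. The probe identifier is assumed to be a random 64-bit value carried somewhere in the probe packet. The default MSL of 120 seconds is also an assumption, borrowed from the traditional TCP value. The sketch removes each probe from the table on its first acknowledgment, so a duplicated or replayed acknowledgment confirms nothing. This reflects the Section 4.1 requirement that probes be uniquely identifiable and robust to reordering and replay.

```python
import secrets
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Probe:
    probe_id: int   # hypothetical identifier carried in the probe packet
    size: int       # total probe packet size (all headers and padding)
    sent_at: float  # send time, in seconds


class ProbeTracker:
    """Hypothetical sender-side table of outstanding DPLPMTUD probes."""

    def __init__(self, msl: float = 120.0) -> None:
        self.msl = msl  # assumed Maximum Segment Lifetime, in seconds
        self._outstanding: Dict[int, Probe] = {}

    def new_probe(self, size: int, now: float) -> Probe:
        # A random 64-bit value is unique in practice and hard to guess.
        probe_id = secrets.randbits(64)
        while probe_id in self._outstanding:
            probe_id = secrets.randbits(64)
        probe = Probe(probe_id, size, now)
        self._outstanding[probe_id] = probe
        return probe

    def on_ack(self, probe_id: int, now: float) -> Optional[Probe]:
        """Return the confirmed probe, or None if unknown, stale or duplicate."""
        self.expire(now)
        # pop() removes the entry, so a second (duplicated or replayed)
        # acknowledgment for the same identifier confirms nothing.
        return self._outstanding.pop(probe_id, None)

    def expire(self, now: float) -> None:
        # Forget probes older than the MSL; they can no longer be confirmed.
        stale = [pid for pid, p in self._outstanding.items()
                 if now - p.sent_at > self.msl]
        for pid in stale:
            del self._outstanding[pid]
```

The first call to `tracker.on_ack(probe.probe_id, now)` returns the `Probe`, and any repeat returns `None`. On a successful return, the caller would assign `probe.size` to the PLPMTU.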

Transport protocols can include end-to-end methods that detect and report reception of specific datagrams that they send (e.g., DCCP and SCTP provide keep-alive/heartbeat features). When supported, this mechanism MAY also be used by DPLPMTUD to acknowledge reception of a probe packet.

A PL that does not acknowledge data reception (e.g., UDP and UDP-Lite) is unable itself to detect when the packets that it sends are discarded because their size is greater than the actual PMTU. These PLs need to rely on an application protocol to detect this loss.

Section 6 specifies this function for a set of IETF-specified protocols.

4.3. Black Hole Detection and Reducing the PLPMTU

The description that follows uses the set of constants defined in Section 5.1.2 and variables defined in Section 5.1.3.

Black Hole Detection is triggered by an indication that the network path could be unable to support the current PLPMTU size.

There are three indicators that can detect black holes:

* A validated PTB message can be received that indicates a PL_PTB_SIZE less than the current PLPMTU. A DPLPMTUD method MUST NOT rely solely on this method.

* A PL can use the DPLPMTUD probing mechanism to periodically generate probe packets of the size of the current PLPMTU (e.g., using the confirmation timer, see Section 5.1.1). A timer tracks whether acknowledgments are received. Successive loss of probes is an indication that the current path no longer supports the PLPMTU (e.g., when the number of probe packets sent without receiving an acknowledgment, PROBE_COUNT, reaches MAX_PROBES).

* A PL can utilize an event that indicates the network path no longer sustains the sender's PLPMTU size.
This could use a mechanism implemented within the PL to detect excessive loss of data sent with a specific packet size and then conclude that this excessive loss could be a result of an invalid PLPMTU (as in PLPMTUD for TCP [RFC4821]).

The three methods can result in different transmission patterns for packet probes and are expected to result in different responsiveness following a change in the actual PMTU.

A PL MAY inhibit sending probe packets when no application data has been sent since the previous probe packet. A PL that resumes sending user data MAY continue PLPMTU discovery for each path. This allows it to use an up-to-date PLPMTU. However, this could result in additional packets being sent.

When the method detects the current PLPMTU is not supported, DPLPMTUD sets a lower PLPMTU, and sets a lower MPS. The PL then confirms that the new PLPMTU can be successfully used across the path. A probe packet could need to have a size less than the size of the data block generated by the application.

4.4. The Maximum Packet Size (MPS)

The result of probing determines a usable PLPMTU, which is used to set the MPS used by the application. The MPS is smaller than the PLPMTU because it is reduced by the size of PL headers (including the overhead of security-related fields such as an AEAD tag and TLS record layer padding). The relationship between the MPS and the PLPMTU is illustrated in Figure 1.

            any additional
               headers    .--- MPS -----.
                  |       |             |
                  v       v             v
          +------------------------------+
          | IP | ** | PL | protocol data |
          +------------------------------+

                    <------ PLPMTU ------>
          <---------- PMTU -------------->

Figure 1: Relationship between MPS and PLPMTU

A PL is unable to send a packet (other than a probe packet) with a size larger than the current PLPMTU at the network layer.
To avoid this, a PL MAY be designed to segment data blocks larger than the MPS into multiple datagrams.

DPLPMTUD seeks to avoid IP fragmentation. An attempt to send a data block larger than the MPS will therefore fail if a PL is unable to segment data. To determine the largest data block that can be sent, a PL SHOULD provide applications with a primitive that returns the MPS, derived from the current PLPMTU.

If DPLPMTUD results in a change to the MPS, the application needs to adapt to the new MPS. A particular case can arise when packets have been sent with a size less than the MPS and the PLPMTU was subsequently reduced. If these packets are lost, the PL MAY segment the data using the new MPS. If a PL is unable to re-segment a previously sent datagram (e.g., [RFC4960]), then the sender either discards the datagram or could perform retransmission using network-layer fragmentation to form multiple IP packets not larger than the PLPMTU. For IPv4, the use of endpoint fragmentation by the sender is preferred over clearing the DF bit in the IPv4 header. Operational experience reveals that IP fragmentation can reduce the reliability of Internet communication [I-D.ietf-intarea-frag-fragile], which may reduce the success of retransmission.

4.5. Disabling the Effect of PMTUD

A PL implementing this specification MUST suspend network layer processing of outgoing packets that enforces a PMTU [RFC1191] [RFC8201] for each flow utilizing DPLPMTUD, and instead use DPLPMTUD to control the size of packets that are sent by a flow. This removes the need for the network layer to drop or fragment sent packets that have a size greater than the PMTU.

4.6. Response to PTB Messages

This method requires the DPLPMTUD sender to validate any received PTB message before using the PTB information.
The response to a PTB message depends on the PL_PTB_SIZE calculated from the PTB_SIZE in the PTB message, the state of the DPLPMTUD state machine, and the IP protocol being used.

Section 4.6.1 first describes validation for both IPv4 ICMP Destination Unreachable messages (type 3, code 4) and ICMPv6 Packet Too Big messages, both of which are referred to as PTB messages in this document.

4.6.1. Validation of PTB Messages

This section specifies utilization and validation of PTB messages.

* A simple implementation MAY ignore received PTB messages, and in this case the PLPMTU is not updated when a PTB message is received.

* A PL that supports PTB messages MUST validate these messages before they are further processed.

A PL that receives a PTB message from a router or middlebox performs ICMP validation as specified in Section 5.2 of [RFC8085] and [RFC8201]. Because DPLPMTUD operates at the PL, the PL needs to check that each received PTB message is received in response to a packet transmitted by the endpoint PL performing DPLPMTUD.

The PL MUST check the protocol information in the quoted packet carried in an ICMP PTB message payload to validate that the message originated from the sending node. This validation includes determining that the combination of the IP addresses, the protocol, the source port and destination port match those returned in the quoted packet; this is also necessary for the PTB message to be passed to the corresponding PL.

The validation SHOULD utilize information that is not simple for an off-path attacker to determine [RFC8085]. For example, it could check the value of a protocol header field known only to the two PL endpoints. A datagram application that uses well-known source and destination ports ought to also rely on other information to complete this validation.
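
The two validation steps above can be illustrated with a short Python sketch under stated assumptions. It assumes the ICMP or ICMPv6 payload has already been parsed into the hypothetical `QuotedPacket` fields. It also assumes the PL carries some per-association value (`pl_token`, for example a verification tag or connection identifier) known only to the two endpoints. These names are illustrative and are not defined by this document or by any particular stack.

```python
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class QuotedPacket:
    """Fields parsed from the packet quoted in a PTB message (hypothetical)."""
    src_addr: str
    dst_addr: str
    protocol: int
    src_port: int
    dst_port: int
    pl_token: bytes  # PL header value known only to the two endpoints


@dataclass(frozen=True)
class FlowState:
    """Local state for one flow that is performing DPLPMTUD (hypothetical)."""
    local_addr: str
    remote_addr: str
    protocol: int
    local_port: int
    remote_port: int
    pl_token: bytes


def ptb_is_valid(quoted: QuotedPacket, flow: FlowState) -> bool:
    # Step 1: the quoted packet must have been sent by this endpoint on
    # this flow (addresses, protocol, and ports all match).
    sent_tuple = (flow.local_addr, flow.remote_addr, flow.protocol,
                  flow.local_port, flow.remote_port)
    quoted_tuple = (quoted.src_addr, quoted.dst_addr, quoted.protocol,
                    quoted.src_port, quoted.dst_port)
    if quoted_tuple != sent_tuple:
        return False
    # Step 2: compare a value an off-path attacker is unlikely to know.
    # compare_digest avoids leaking the token through timing differences.
    return hmac.compare_digest(quoted.pl_token, flow.pl_token)
```

A PTB message for which `ptb_is_valid()` returns False would be discarded without further processing by DPLPMTUD.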

These checks are intended to provide protection from packets that originate from a node that is not on the network path. A PTB message that does not complete the validation MUST NOT be further utilized by the DPLPMTUD method, as discussed in the Security Considerations section.

PTB messages that have been validated MAY be utilized by the DPLPMTUD algorithm, but MUST NOT be used directly to set the PLPMTU. The PL_PTB_SIZE is smaller than the PTB_SIZE because it is reduced by headers below the PL, including any IP options or extensions added to the PL packet. A method that utilizes these PTB messages can improve the speed at which the algorithm detects an appropriate PLPMTU by triggering an immediate probe for the PL_PTB_SIZE (resulting in a network-layer packet of size PTB_SIZE), compared to one that relies solely on probing using a timer-based search algorithm. Section 4.6.2 describes this processing.

4.6.2. Use of PTB Messages

Before the size reported in the PTB message is used, it must first be converted to a PL_PTB_SIZE. A set of checks is intended to provide protection from a router that reports an unexpected PTB_SIZE. The PL also needs to check that the indicated PL_PTB_SIZE is less than the size used by probe packets and at least the minimum size accepted.

This section provides a summary of how PTB messages can be utilized, using the set of constants defined in Section 5.1.2. This processing depends on the PL_PTB_SIZE and the current value of a set of variables:

PL_PTB_SIZE < MIN_PLPMTU
   * Invalid PL_PTB_SIZE, see Section 4.6.1.

   * PTB message ought to be discarded without further processing (i.e., PLPMTU is not modified).

   * The information could be utilized as an input that triggers enabling a resilience mode (see Section 5.3.3).

MIN_PLPMTU <= PL_PTB_SIZE < BASE_PLPMTU
   * A robust PL MAY enter an error state (see Section 5.2) for an IPv4 path when the PL_PTB_SIZE reported in the PTB message is larger than or equal to 68 bytes [RFC0791] and when this is less than the BASE_PLPMTU.

   * A robust PL MAY enter an error state (see Section 5.2) for an IPv6 path when the PL_PTB_SIZE reported in the PTB message is larger than or equal to 1280 bytes [RFC8200] and when this is less than the BASE_PLPMTU.

BASE_PLPMTU <= PL_PTB_SIZE < PLPMTU
   * This could be an indication of a black hole. The PLPMTU SHOULD be set to BASE_PLPMTU (the PLPMTU is reduced to the BASE_PLPMTU to avoid unnecessary packet loss when a black hole is encountered).

   * The PL ought to start a search to quickly discover the new PLPMTU. The PL_PTB_SIZE reported in the PTB message can be used to initialize a search algorithm.

PL_PTB_SIZE = PLPMTU
   * Completes the search for a larger PLPMTU.

PLPMTU < PL_PTB_SIZE < PROBED_SIZE
   * The PLPMTU continues to be valid, but the size of a packet used to search (PROBED_SIZE) was larger than the actual PMTU.

   * The PLPMTU is not updated.

   * The PL can use the reported PL_PTB_SIZE from the PTB message as the next search point when it resumes the search algorithm.

PL_PTB_SIZE > PROBED_SIZE
   * Inconsistent network signal.

   * PTB message ought to be discarded without further processing (i.e., PLPMTU is not modified).

   * The information could be utilized as an input to trigger enabling a resilience mode.

5. Datagram Packetization Layer PMTUD

This section specifies Datagram PLPMTUD (DPLPMTUD). The method can be introduced at various points in the IP protocol stack (as indicated with * in Figure 2) to discover the PLPMTU so that an application can utilize an appropriate MPS for the current network path.
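
The size arithmetic reduces to subtracting header overhead. This covers both the MPS offered to an application (see Figure 1 in Section 4.4) and the PTB_SIZE to PL_PTB_SIZE conversion (Section 4.6). The minimal Python sketch below makes these assumptions:

* IPv6 with no extension headers (a 40-byte fixed header).
* IPv4 with no options (20 bytes).
* UDP as the PL, with its 8-byte header.

A PL that adds further overhead, such as an AEAD tag, would subtract that as well. The function and constant names are illustrative only.

```python
IPV4_HEADER = 20  # bytes, assuming no IPv4 options
IPV6_HEADER = 40  # bytes, assuming no IPv6 extension headers
UDP_HEADER = 8    # bytes


def _subtract(size: int, overhead: int) -> int:
    if overhead < 0 or size <= overhead:
        raise ValueError("overhead must be non-negative and smaller than size")
    return size - overhead


def plpmtu_from_pmtu(pmtu: int, below_pl: int) -> int:
    """PLPMTU: the PMTU less headers below the PL (IP header, options)."""
    return _subtract(pmtu, below_pl)


def mps_from_plpmtu(plpmtu: int, pl_overhead: int) -> int:
    """MPS: the PLPMTU less the PL's own headers and trailers."""
    return _subtract(plpmtu, pl_overhead)


def pl_ptb_size(ptb_size: int, below_pl: int) -> int:
    """PL_PTB_SIZE: the PTB_SIZE less headers below the PL."""
    return _subtract(ptb_size, below_pl)
```

For example, a 1500-byte PMTU over IPv6 gives `plpmtu_from_pmtu(1500, IPV6_HEADER) == 1460`. A UDP PL would then offer an MPS of `mps_from_plpmtu(1460, UDP_HEADER) == 1452` bytes. A PTB message reporting 1400 bytes on the same path corresponds to a PL_PTB_SIZE of 1360.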

DPLPMTUD SHOULD NOT be used by an upper PL or application if it is already used in a lower layer. DPLPMTUD SHOULD only be performed once between a pair of endpoints. A PL MUST adjust the MPS indicated by DPLPMTUD to account for any additional overhead introduced by the PL.

             +----------------------+
             |     Application*     |
             +-----+------------+---+
                   |            |
               +---+--+      +--+--+
               | QUIC*|      |SCTP*|
               +---+--+      +-+-+-+
                   |           | |
                   +---+  +----+ |
                       |  |      |
                     +-+--+-+    |
                     | UDP  |    |
                     +---+--+    |
                         |       |
             +-----------+-------+--+
             |  Network Interface   |
             +----------------------+

Figure 2: Examples where DPLPMTUD can be implemented

The central idea of DPLPMTUD is probing by a sender. Probe packets are sent to find the maximum size of user message that can be completely transferred across the network path from the sender to the destination.

The following sections identify the components needed for implementation, provide an overview of the phases of operation, and specify the state machine and search algorithm.

5.1. DPLPMTUD Components

This section describes the timers, constants, and variables of DPLPMTUD.

5.1.1. Timers

The method utilizes up to three timers:

PROBE_TIMER: The PROBE_TIMER is configured to expire after a period longer than the maximum time to receive an acknowledgment to a probe packet. This value MUST NOT be smaller than 1 second, and SHOULD be larger than 15 seconds. Guidance on selection of the timer value is provided in Section 3.1.1 of the UDP Usage Guidelines [RFC8085].

PMTU_RAISE_TIMER: The PMTU_RAISE_TIMER is configured to the period a sender will continue to use the current PLPMTU, after which it re-enters the Search Phase. This timer has a period of 600 seconds, as recommended by PLPMTUD [RFC4821].

DPLPMTUD MAY inhibit sending probe packets when no application data has been sent since the previous probe packet.
A PL preferring to use an up-to-date PMTU once user data is sent again can choose to continue PMTU discovery for each path. However, this could result in sending additional packets.

CONFIRMATION_TIMER: When an acknowledged PL is used, this timer MUST NOT be used. For other PLs, the CONFIRMATION_TIMER is configured to the period a PL sender waits before confirming the current PLPMTU is still supported. This is less than the PMTU_RAISE_TIMER and used to decrease the PLPMTU (e.g., when a black hole is encountered). Confirmation needs to be frequent enough when data is flowing that the sending PL does not black hole extensive amounts of traffic. Guidance on selection of the timer value is provided in Section 3.1.1 of the UDP Usage Guidelines [RFC8085].

DPLPMTUD MAY inhibit sending probe packets when no application data has been sent since the previous probe packet. A PL preferring to use an up-to-date PMTU once user data is sent again can choose to continue PMTU discovery for each path. However, this could result in sending additional packets.

The various timers could be implemented using a single timer.

5.1.2. Constants

The following constants are defined:

MAX_PROBES: The MAX_PROBES is the maximum value of the PROBE_COUNT counter (see Section 5.1.3). MAX_PROBES represents the limit for the number of consecutive probe attempts of any size. Search algorithms benefit from a MAX_PROBES value greater than 1 because this can provide robustness to isolated packet loss. The default value of MAX_PROBES is 3.

MIN_PLPMTU: The MIN_PLPMTU is the smallest allowed probe packet size. For IPv6, this value is 1280 bytes, as specified in [RFC8200]. For IPv4, the minimum value is 68 bytes.

Note: An IPv4 router is required to be able to forward a datagram of 68 bytes without further fragmentation.
This is the combined size of a maximum-length IPv4 header (60 bytes) and the minimum fragment size of 8 bytes. In addition, receivers are required to be able to reassemble fragmented datagrams at least up to 576 bytes, as stated in Section 3.3.3 of [RFC1122].

MAX_PLPMTU: The MAX_PLPMTU is the largest size of PLPMTU. This has to be less than or equal to the maximum size of the PL packet that can be sent on the outgoing interface (constrained by the local interface MTU). When known, this also ought to be less than the maximum size of PL packet that can be received by the remote endpoint (constrained by EMTU_R). It can be limited by the design or configuration of the PL being used. An application, or PL, MAY choose a smaller MAX_PLPMTU when there is no need to send packets larger than a specific size.

BASE_PLPMTU: The BASE_PLPMTU is a configured size expected to work for most paths. The size is equal to or larger than the MIN_PLPMTU and smaller than the MAX_PLPMTU. In the case of IPv6, this value is derived from the IPv6 minimum link MTU of 1280 bytes [RFC8200]. When using IPv4, there is currently no equivalent size specified, and a default BASE_PLPMTU of 1200 bytes is RECOMMENDED.

5.1.3. Variables

This method utilizes a set of variables:

PROBED_SIZE: The PROBED_SIZE is the size of the current probe packet. This is a tentative value for the PLPMTU, which is awaiting confirmation by an acknowledgment.

PROBE_COUNT: The PROBE_COUNT is a count of the number of successive unsuccessful probe packets that have been sent. Each time a probe packet is acknowledged, the value is set to zero. (Some probe loss is expected while searching; therefore, loss of a single probe is not an indication of a PMTU problem.)
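
The constants and variables above can be grouped as in the following Python sketch. The numeric defaults come from this section: a MAX_PROBES of 3, a MIN_PLPMTU of 68 bytes for IPv4 and 1280 bytes for IPv6, and a recommended IPv4 BASE_PLPMTU of 1200 bytes. Taking the IPv6 BASE_PLPMTU as exactly 1280 bytes is an assumption, because the text only says it is derived from the IPv6 minimum link MTU. The class and function names are hypothetical.

```python
from dataclasses import dataclass

MAX_PROBES = 3  # default from Section 5.1.2


@dataclass(frozen=True)
class DplpmtudConstants:
    min_plpmtu: int
    base_plpmtu: int
    max_plpmtu: int
    max_probes: int = MAX_PROBES

    def __post_init__(self) -> None:
        # BASE_PLPMTU >= MIN_PLPMTU and BASE_PLPMTU < MAX_PLPMTU.
        if not self.min_plpmtu <= self.base_plpmtu < self.max_plpmtu:
            raise ValueError("need MIN_PLPMTU <= BASE_PLPMTU < MAX_PLPMTU")


def default_constants(ip_version: int, max_plpmtu: int) -> DplpmtudConstants:
    """max_plpmtu comes from the local interface MTU (and EMTU_R if known)."""
    if ip_version == 4:
        return DplpmtudConstants(min_plpmtu=68, base_plpmtu=1200,
                                 max_plpmtu=max_plpmtu)
    if ip_version == 6:
        # Assumption: BASE_PLPMTU taken as the 1280-byte IPv6 minimum.
        return DplpmtudConstants(min_plpmtu=1280, base_plpmtu=1280,
                                 max_plpmtu=max_plpmtu)
    raise ValueError("ip_version must be 4 or 6")


@dataclass
class DplpmtudVariables:
    plpmtu: int
    probed_size: int
    probe_count: int = 0

    def probe_acked(self) -> None:
        # An acknowledged probe confirms PROBED_SIZE as the new PLPMTU.
        self.plpmtu = self.probed_size
        self.probe_count = 0

    def probe_timer_expired(self, consts: DplpmtudConstants) -> bool:
        """Count one unacknowledged probe; True once MAX_PROBES is reached."""
        self.probe_count += 1
        return self.probe_count >= consts.max_probes
```

In this sketch, each PROBE_TIMER expiry increments PROBE_COUNT, and the caller would leave the current state once `probe_timer_expired()` returns True. Any acknowledgment resets the count to zero.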

Figure 3 illustrates the relationship between the packet size constants and variables at a point in time when the DPLPMTUD algorithm performs path probing to increase the size of the PLPMTU. A probe packet has been sent of size PROBED_SIZE. Once this is acknowledged, the PLPMTU will be raised to PROBED_SIZE, allowing the DPLPMTUD algorithm to further increase PROBED_SIZE toward sending a probe with the size of the actual PMTU.

            MIN_PLPMTU                         MAX_PLPMTU
            <------------------------------------------->
                      |             |         |
                      v             |         |
                 BASE_PLPMTU        |         v
                                    |    PROBED_SIZE
                                    v
                                 PLPMTU

Figure 3: Relationships between packet size constants and variables

5.1.4. Overview of DPLPMTUD Phases

This section provides a high-level informative view of the DPLPMTUD method, by describing the movement of the method through several phases of operation. More detail is available in the state machine (Section 5.2).

                          +------+
                 +------->| Base |-----------------+ Connectivity
                 |        +------+                 | or BASE_PLPMTU
                 |           |                     | confirmation failed
                 |           |                     v
                 |           | Connectivity    +-------+
                 |           | and BASE_PLPMTU | Error |
                 |           | confirmed       +-------+
                 |           |                     | Consistent
                 |           v                     | connectivity
      Black Hole |       +--------+                | and BASE_PLPMTU
        detected |       | Search |<---------------+ confirmed
                 |       +--------+
                 |          ^   |
                 |          |   |
                 |    Raise |   | Search
                 |    timer |   | algorithm
                 |  expired |   | completed
                 |          |   |
                 |          |   v
                 |   +-----------------+
                 +---| Search Complete |
                     +-----------------+

Figure 4: DPLPMTUD Phases

Base: The Base Phase confirms connectivity to the remote peer using packets of the BASE_PLPMTU. This phase is implicit for a connection-oriented PL (where it can be performed in a PL connection handshake). A connectionless PL sends a probe packet and uses acknowledgment of this probe packet to confirm that the remote peer is reachable.

The sender also confirms that BASE_PLPMTU is supported across the network path. This may be achieved using a PL mechanism (e.g., using a handshake packet of size BASE_PLPMTU), or by sending a probe packet of size BASE_PLPMTU and confirming that this is received.

A probe packet of size BASE_PLPMTU can be sent immediately on the initial entry to the Base Phase (following a connectivity check). A PL that does not wish to support a path with a PLPMTU less than BASE_PLPMTU can simplify the phase into a single step by performing the connectivity checks with a probe of the BASE_PLPMTU size.

Once confirmed, DPLPMTUD enters the Search Phase. If the Base Phase fails to confirm the BASE_PLPMTU, DPLPMTUD enters the Error Phase.

Search: The Search Phase utilizes a search algorithm to send probe packets to seek to increase the PLPMTU. The algorithm concludes when it has found a suitable PLPMTU, by entering the Search Complete Phase.

A PL could respond to PTB messages using the PTB to advance or terminate the search; see Section 4.6.

Search Complete: The Search Complete Phase is entered when the PLPMTU is supported across the network path. A PL can use a CONFIRMATION_TIMER to periodically repeat a probe packet for the current PLPMTU size. If the sender is unable to confirm reachability (e.g., if the CONFIRMATION_TIMER expires) or the PL signals a lack of reachability, a black hole has been detected and DPLPMTUD enters the Base Phase.

The PMTU_RAISE_TIMER is used to periodically resume the Search Phase to discover if the PLPMTU can be raised. Black Hole Detection causes the sender to enter the Base Phase.

Error: The Error Phase is entered when there is conflicting or invalid PLPMTU information for the path (e.g., a failure to support the BASE_PLPMTU) that causes DPLPMTUD to be unable to progress, and the PLPMTU is lowered.

DPLPMTUD remains in the Error Phase until a consistent view of the path can be discovered and it has also been confirmed that the path supports the BASE_PLPMTU (or DPLPMTUD is suspended).

A method that only reduces the PLPMTU to a suitable size would be sufficient to ensure reliable operation, but can be very inefficient when the actual PMTU changes or when the method (for whatever reason) makes a suboptimal choice for the PLPMTU.

A full implementation of DPLPMTUD provides an algorithm enabling the DPLPMTUD sender to increase the PLPMTU following a change in the characteristics of the path, such as when a link is reconfigured with a larger MTU, or when there is a change in the set of links traversed by an end-to-end flow (e.g., after a routing or path fail-over decision).

5.2. State Machine

A state machine for DPLPMTUD is depicted in Figure 5. If multipath or multihoming is supported, a state machine is needed for each path.

Note: Not all transitions are shown, to simplify the diagram.

      |       |
      | Start | PL indicates loss
      |       |  of connectivity
      v       v
  +---------------+                                   +---------------+
  |    DISABLED   |                                   |     ERROR     |
  +---------------+              PROBE_TIMER expiry:  +---------------+
          | PL indicates    PROBE_COUNT = MAX_PROBES or   ^       |
          | connectivity PTB: PL_PTB_SIZE < BASE_PLPMTU   |       |
          +---------------------+       +-----------------+       |
                                |       |                         |
                                v       |    BASE_PLPMTU Probe    |
                            +---------------+      acked          |
                            |      BASE     |---------------------+
                            +---------------+                     |
                              ^   |  ^   ^                        |
          Black hole detected |   |  |   | Black hole detected    |
    +-------------------------+   |  |   +--------------------+   |
    |                             +--+                        |   |
    |                   PROBE_TIMER expiry:                   |   |
    |                   PROBE_COUNT < MAX_PROBES              |   |
    |                                                         |   |
    |               PMTU_RAISE_TIMER expiry                   |   |
    |   +-------------------------------------------------+   |   |
    |   |                                                 |   |   |
    |   |                                                 v   |   v
  +---------------+                                   +---------------+
  |SEARCH_COMPLETE|                                   |   SEARCHING   |
  +---------------+                                   +---------------+
    |   ^   ^                                             |   |   ^
    |   |   |                                             |   |   |
    |   |   +---------------------------------------------+   |   |
    |   |              MAX_PLPMTU Probe acked or              |   |
    |   |  PROBE_TIMER expiry: PROBE_COUNT = MAX_PROBES or    |   |
    +---+              PTB: PL_PTB_SIZE = PLPMTU              +---+
CONFIRMATION_TIMER expiry:                  PROBE_TIMER expiry:
PROBE_COUNT < MAX_PROBES or                 PROBE_COUNT < MAX_PROBES or
PLPMTU Probe acked                          Probe acked or PTB:
                                    PLPMTU < PL_PTB_SIZE < PROBED_SIZE

Figure 5: State machine for Datagram PLPMTUD

The following states are defined:

DISABLED: The DISABLED state is the initial state before probing has started. It is also entered from any other state when the PL indicates loss of connectivity. This state is left once the PL indicates connectivity to the remote PL. When transitioning to the BASE state, a probe packet of size BASE_PLPMTU can be sent immediately.

BASE: The BASE state is used to confirm that the BASE_PLPMTU size is supported by the network path and is designed to allow an application to continue working when there are transient reductions in the actual PMTU. It also seeks to avoid long periods when a sender searching for a larger PLPMTU is unaware that packets are not being delivered due to a packet or ICMP Black Hole.

On entry, the PROBED_SIZE is set to the BASE_PLPMTU size and the PROBE_COUNT is set to zero.

Each time a probe packet is sent, the PROBE_TIMER is started. The state is exited when the probe packet is acknowledged, and the PL sender enters the SEARCHING state.

The state is also left when the PROBE_COUNT reaches MAX_PROBES or a validated PTB message indicates a PL_PTB_SIZE less than the BASE_PLPMTU. This causes the PL sender to enter the ERROR state.

SEARCHING: The SEARCHING state is the main probing state. This state is entered when probing for the BASE_PLPMTU completes.

Each time a probe packet is acknowledged, the PROBE_COUNT is set to zero, the PLPMTU is set to the PROBED_SIZE, and then the PROBED_SIZE is increased using the search algorithm (as described in Section 5.3).

When a probe packet is sent and not acknowledged within the period of the PROBE_TIMER, the PROBE_COUNT is incremented and a new probe packet is transmitted.

The state is exited to enter SEARCH_COMPLETE when the PROBE_COUNT reaches MAX_PROBES, a validated PTB is received that corresponds to the last successfully probed size (PL_PTB_SIZE = PLPMTU), or a probe of size MAX_PLPMTU is acknowledged (PLPMTU = MAX_PLPMTU).

When a black hole is detected in the SEARCHING state, this causes the PL sender to enter the BASE state.

SEARCH_COMPLETE: The SEARCH_COMPLETE state indicates that a search has completed. This is the normal maintenance state, where the PL is not probing to update the PLPMTU.
DPLPMTUD remains in this state until either the PMTU_RAISE_TIMER expires or a black hole is detected.

When DPLPMTUD uses an unacknowledged PL and is in the SEARCH_COMPLETE state, a CONFIRMATION_TIMER periodically resets the PROBE_COUNT and schedules a probe packet with the size of the PLPMTU. If MAX_PROBES successive PLPMTU-sized probes fail to be acknowledged, the method enters the BASE state. When used with an acknowledged PL (e.g., SCTP), DPLPMTUD SHOULD NOT continue to generate PLPMTU probes in this state.

ERROR: The ERROR state represents the case where either the network path is not known to support a PLPMTU of at least the BASE_PLPMTU size or when there is contradictory information about the network path that would otherwise result in excessive variation in the MPS signaled to the higher layer. The state implements a method to mitigate oscillation in the state-event engine. The PL signals a conservative value of the MPS to the higher layer. The state is exited when packet probes no longer detect the error. The PL sender then enters the SEARCHING state.

Implementations are permitted to enable endpoint fragmentation if DPLPMTUD is unable to validate MIN_PLPMTU within MAX_PROBES probes. If DPLPMTUD is unable to validate MIN_PLPMTU, the implementation will transition to the DISABLED state.

Note: MIN_PLPMTU could be identical to BASE_PLPMTU, simplifying the actions in this state.

5.3. Search to Increase the PLPMTU

This section describes the algorithms used by DPLPMTUD to search for a larger PLPMTU.

5.3.1. Probing for a larger PLPMTU

Implementations use a search algorithm across the search range to determine whether a larger PLPMTU can be supported across a network path.
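
A search of this kind can be sketched in Python as shown below. This is one possible algorithm, not one mandated by this document, and it works as follows:

* It walks upward through a table of candidate sizes. The values in `CANDIDATE_SIZES` are illustrative assumptions, not taken from this specification.
* It always includes MAX_PLPMTU as the final step.
* It retries a size until PROBE_COUNT reaches MAX_PROBES.
* It can jump to a validated PL_PTB_SIZE hint, as in Section 4.6.2.

The hypothetical `path_delivers` callback stands in for sending a probe and then either receiving its acknowledgment or seeing the PROBE_TIMER expire.

```python
from typing import Callable, Iterable, Optional

# Illustrative candidate PLPMTU values in bytes (an assumption, not normative).
CANDIDATE_SIZES = (1280, 1400, 1452, 1472, 1492, 1500, 4000, 9000)


def next_probe_size(plpmtu: int, max_plpmtu: int,
                    candidates: Iterable[int] = CANDIDATE_SIZES,
                    ptb_hint: Optional[int] = None) -> Optional[int]:
    """Smallest useful size above the PLPMTU, or None when the search is done."""
    if ptb_hint is not None and plpmtu < ptb_hint <= max_plpmtu:
        return ptb_hint  # use a validated PL_PTB_SIZE as the next search point
    for size in sorted(set(candidates) | {max_plpmtu}):
        if plpmtu < size <= max_plpmtu:
            return size
    return None


def search(plpmtu: int, max_plpmtu: int,
           path_delivers: Callable[[int], bool], max_probes: int = 3) -> int:
    """Raise the PLPMTU by probing; stop when PROBE_COUNT reaches max_probes."""
    probe_count = 0
    probed_size = next_probe_size(plpmtu, max_plpmtu)
    while probed_size is not None:
        if path_delivers(probed_size):   # probe acknowledged
            plpmtu, probe_count = probed_size, 0
            probed_size = next_probe_size(plpmtu, max_plpmtu)
        else:                            # PROBE_TIMER expired
            probe_count += 1
            if probe_count >= max_probes:
                break                    # would enter SEARCH_COMPLETE
    return plpmtu
```

Consider a path whose actual limit is 1450 bytes. The call `search(1200, 1500, lambda size: size <= 1450)` confirms 1280 and 1400, then stops after three unacknowledged 1452-byte probes and returns 1400. Retrying the same size before giving up reflects the MAX_PROBES rationale in Section 5.1.2: a single isolated loss does not end the search.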

   The method discovers the search range by confirming the minimum PLPMTU and then using the probe method to select a PROBED_SIZE less than or equal to MAX_PLPMTU.  MAX_PLPMTU is the minimum of the local MTU and EMTU_R (when this is learned from the remote endpoint).  The MAX_PLPMTU MAY be reduced by an application that sets a maximum for the size of datagrams it will send.

   The PROBE_COUNT is initialized to zero when the first probe with a size greater than or equal to PLPMTU is sent.  A timer is used to trigger the sending of probe packets of size PROBED_SIZE, larger than the PLPMTU.  Each probe packet successfully sent to the remote peer is confirmed by acknowledgment at the PL (see Section 4.1).

   Each time a probe packet is sent to the destination, the PROBE_TIMER is started.  The timer is canceled when the PL receives acknowledgment that the probe packet has been successfully sent across the path (see Section 4.1).  This confirms that the PROBED_SIZE is supported, and the PROBED_SIZE value is then assigned to the PLPMTU.  The search algorithm can continue to send subsequent probe packets of an increasing size.

   If the timer expires before a probe packet is acknowledged, the probe has failed to confirm the PROBED_SIZE.  Each time the PROBE_TIMER expires, the PROBE_COUNT is incremented, the PROBE_TIMER is reinitialized, and a new probe of the same size or any other size (determined by the search algorithm) can be sent.  The maximum number of consecutive failed probes is configured (MAX_PROBES).  If the value of the PROBE_COUNT reaches MAX_PROBES, probing will stop, and the PL sender enters the SEARCH_COMPLETE state.

5.3.2.  Selection of Probe Sizes

   The search algorithm determines a minimum useful gain in PLPMTU.  It would not be constructive for a PL sender to attempt to probe for all sizes.  This would incur unnecessary load on the path.
   Implementations SHOULD select the set of probe packet sizes to maximize the gain in PLPMTU from each search step.

   Implementations could optimize the search procedure by selecting step sizes from a table of common PMTU sizes.  When selecting the appropriate next size to search, an implementer ought to also consider that there can be common sizes of MPS that applications seek to use, and there could be common sizes of MTU used within the network.

5.3.3.  Resilience to Inconsistent Path Information

   A decision to increase the PLPMTU needs to be resilient to the possibility that information learned about the network path is inconsistent.  A path is inconsistent when, for example, probe packets are lost for reasons other than packet size, or due to frequent path changes.  Frequent path changes could occur by unexpected "flapping", where some packets from a flow pass along one path, but other packets follow a different path with different properties.

   A PL sender is able to detect inconsistency from the sequence of PLPMTU probes that are acknowledged or the sequence of PTB messages that it receives.  When inconsistent path information is detected, a PL sender could use an alternate search mode that clamps the offered MPS to a smaller value for a period of time.  This avoids unnecessary loss of packets.

5.4.  Robustness to Inconsistent Paths

   Some paths could be unable to sustain packets of the BASE_PLPMTU size.  The ERROR state could be implemented to provide robustness to such paths.  This allows fallback to a smaller than desired PLPMTU, rather than suffering connectivity failure.  This could utilize methods such as endpoint IP fragmentation to enable the PL sender to communicate using packets smaller than the BASE_PLPMTU.

6.  Specification of Protocol-Specific Methods

   DPLPMTUD requires protocol-specific details to be specified for each PL that is used.

   The first subsection provides guidance on how to implement the DPLPMTUD method as a part of an application using UDP or UDP-Lite.  The guidance also applies to other datagram services that do not include a specific transport protocol (such as a tunnel encapsulation).  The following subsections describe how DPLPMTUD can be implemented as a part of the transport service when using SCTP and QUIC, allowing applications using the service to benefit from discovery of the PLPMTU without themselves needing to implement this method.

6.1.  Application support for DPLPMTUD with UDP or UDP-Lite

   The current specifications of UDP [RFC0768] and UDP-Lite [RFC3828] do not define a method in the RFC series that supports PLPMTUD.  In particular, the UDP transport does not provide the transport features needed to implement datagram PLPMTUD.

   The DPLPMTUD method can be implemented as a part of an application built directly or indirectly on UDP or UDP-Lite, but relies on higher-layer protocol features to implement the method [RFC8085].

   Some primitives used by DPLPMTUD might not be available via the Datagram API (e.g., the ability to access the PLPMTU from the IP-layer cache, or to interpret received PTB messages).

   In addition, it is recommended that PMTU discovery is not performed by multiple protocol layers.  An application SHOULD avoid using DPLPMTUD when the underlying transport system provides this capability.  Using a common method for managing the PLPMTU has benefits, both in the ability to share state between different processes and in opportunities to coordinate probing.

6.1.1.  Application Request

   An application needs an application-layer protocol mechanism (such as a message acknowledgment method) that solicits a response from a destination endpoint.  The method SHOULD allow the sender to check the value returned in the response to provide additional protection from off-path insertion of data [RFC8085].  Suitable methods include a parameter known only to the two endpoints, such as a session ID or initialized sequence number.

6.1.2.  Application Response

   An application needs an application-layer protocol mechanism to communicate the response from the destination endpoint.  This response could indicate successful reception of the probe across the path, but could also indicate that some (or all) packets have failed to reach the destination.

6.1.3.  Sending Application Probe Packets

   A probe packet can carry an application data block, but the successful transmission of this data is at risk when used for probing.  Some applications might prefer to use a probe packet that does not carry an application data block to avoid disruption to data transfer.

6.1.4.  Initial Connectivity

   An application that does not have other higher-layer information confirming connectivity with the remote peer SHOULD implement a connectivity mechanism using acknowledged probe packets before entering the BASE state.

6.1.5.  Validating the Path

   An application that does not have other higher-layer information confirming correct delivery of datagrams SHOULD implement the CONFIRMATION_TIMER to periodically send probe packets while in the SEARCH_COMPLETE state.

6.1.6.  Handling of PTB Messages

   An application that is able and wishes to receive PTB messages MUST perform ICMP validation as specified in Section 5.2 of [RFC8085].
   This requires that the application checks each received PTB message to validate that it was received in response to transmitted traffic and that the reported PL_PTB_SIZE is less than the current probed size (see Section 4.6.2).  A validated PTB message MAY be used as input to the DPLPMTUD algorithm, but MUST NOT be used directly to set the PLPMTU.

6.2.  DPLPMTUD for SCTP

   Section 10.2 of [RFC4821] specified a recommended PLPMTUD probing method for SCTP, and Section 7.3 of [RFC4960] recommended that an endpoint apply the techniques in [RFC4821] on a per-destination-address basis.  The specification for DPLPMTUD continues the practice of using the PL to discover the PMTU, but updates [RFC4960] with a recommendation to use the method specified in this document: The RECOMMENDED method for generating probes is to add a chunk consisting only of padding to an SCTP message.  The PAD chunk defined in [RFC4820] SHOULD be attached to a minimum-length HEARTBEAT (HB) chunk to build a probe packet.  This enables probing without affecting the transfer of user messages and without being limited by congestion control or flow control.  This is preferred to using DATA chunks (with padding as required) as path probes.

   Section 6.9 of [RFC4960] describes dividing the user messages into data chunks sent by the PL when using SCTP.  It notes that once an SCTP message has been sent, it cannot be re-segmented.  [RFC4960] describes the method to retransmit data chunks when the MPS has been reduced, and the use of IP fragmentation for this case.

6.2.1.  SCTP/IPv4 and SCTP/IPv6

6.2.1.1.  Initial Connectivity

   The base protocol is specified in [RFC4960].  This provides an acknowledged PL.  A sender can therefore enter the BASE state as soon as connectivity has been confirmed.

6.2.1.2.  Sending SCTP Probe Packets

   Probe packets consist of an SCTP common header followed by a HEARTBEAT chunk and a PAD chunk.  The PAD chunk is used to control the length of the probe packet.  The HEARTBEAT chunk is used to trigger the sending of a HEARTBEAT ACK chunk.  The reception of the HEARTBEAT ACK chunk acknowledges reception of a successful probe.  A successful probe updates the association and path counters, but an unsuccessful probe is discounted (assumed to be a result of choosing too large a PLPMTU).

   The HEARTBEAT chunk carries a Heartbeat Information parameter that includes, besides the information suggested in [RFC4960], the probe size, which is the size of the complete datagram.  The size of the PAD chunk is therefore computed by reducing the probing size by the size of the IPv4 or IPv6 header, the SCTP common header, the HEARTBEAT request, and the PAD chunk header.  The payload of the PAD chunk contains arbitrary data.

   Probing starts directly after the PL handshake, before data is sent.  Assuming this behavior (i.e., the PMTU is smaller than or equal to the interface MTU), this process will take several round-trip time periods, dependent on the number of DPLPMTUD probes sent.  The Heartbeat timer can be used to implement the PROBE_TIMER.

6.2.1.3.  Validating the Path with SCTP

   Since SCTP provides an acknowledged PL, a sender MUST NOT implement the CONFIRMATION_TIMER while in the SEARCH_COMPLETE state.

6.2.1.4.  PTB Message Handling by SCTP

   Normal ICMP validation MUST be performed as specified in Appendix C of [RFC4960].  This requires that the first 8 bytes of the SCTP common header are quoted in the payload of the PTB message, which can be the case for ICMPv4 and is normally the case for ICMPv6.
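
   The check on the quoted SCTP common header can be sketched in Python as below.  This minimal sketch covers only the comparison of the quoted ports and Verification Tag with the association; it is not a complete implementation of Appendix C of [RFC4960], and special cases described there (such as a quoted packet carrying a zero Verification Tag for an INIT chunk) are not handled.  The names Association, quoted_sctp_offset, and ptb_quote_matches are hypothetical.  The sketch assumes the caller passes the quoted packet from the ICMP payload, starting at its IP header, and that a quoted IPv6 packet carries no extension headers.

```python
import struct
from dataclasses import dataclass
from typing import Optional

IPPROTO_SCTP = 132   # IP protocol number assigned to SCTP
SCTP_QUOTE_LEN = 8   # source port (2) + destination port (2) + Verification Tag (4)


@dataclass
class Association:
    """Hypothetical per-association state used for the check."""
    local_port: int
    remote_port: int
    peer_vtag: int  # the peer's tag, carried in packets this endpoint sends


def quoted_sctp_offset(quoted: bytes) -> Optional[int]:
    """Return the offset of the SCTP common header in a quoted IP packet."""
    if not quoted:
        return None
    version = quoted[0] >> 4
    if version == 4 and len(quoted) >= 20:
        ihl = (quoted[0] & 0x0F) * 4  # IPv4 header length in bytes
        proto = quoted[9]             # IPv4 Protocol field
        return ihl if ihl >= 20 and proto == IPPROTO_SCTP else None
    if version == 6 and len(quoted) >= 40:
        next_header = quoted[6]       # assumes no IPv6 extension headers
        return 40 if next_header == IPPROTO_SCTP else None
    return None


def ptb_quote_matches(quoted: bytes, assoc: Association) -> bool:
    """True if the quote holds the first 8 bytes of the SCTP common header
    and its ports and Verification Tag match the association."""
    off = quoted_sctp_offset(quoted)
    if off is None or len(quoted) < off + SCTP_QUOTE_LEN:
        return False  # too short to validate: treat the PTB as unvalidated
    src, dst, vtag = struct.unpack_from("!HHI", quoted, off)
    # The quoted packet was sent by this endpoint: its source port is the
    # local port and its Verification Tag is the peer's tag.
    return (src, dst, vtag) == (assoc.local_port, assoc.remote_port,
                                assoc.peer_vtag)
```

   For SCTP/UDP (Section 6.2.2.4), the same comparison would apply after additionally skipping the 8-byte UDP header in the quoted packet.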

   When a PTB message has been validated, the PL_PTB_SIZE calculated from the PTB_SIZE reported in the PTB message SHOULD be used with the DPLPMTUD algorithm, provided that the reported PL_PTB_SIZE is less than the current probe size (see Section 4.6).

6.2.2.  DPLPMTUD for SCTP/UDP

   The UDP encapsulation of SCTP is specified in [RFC6951].

   This specification updates the reference to RFC 4821 in Section 5.6 of RFC 6951 to refer to XXXTHISRFCXXX.  RFC 6951 is updated by adding the following sentence at the end of Section 5.6: "The RECOMMENDED method for determining the MTU of the path is specified in XXXTHISRFCXXX".

   XXX RFC EDITOR - please replace XXXTHISRFCXXX when published XXX

6.2.2.1.  Initial Connectivity

   A sender can enter the BASE state as soon as SCTP connectivity has been confirmed.

6.2.2.2.  Sending SCTP/UDP Probe Packets

   Packet probing can be performed as specified in Section 6.2.1.2.  The maximum payload is reduced by 8 bytes (the size of the UDP header), which has to be considered when filling the PAD chunk.

6.2.2.3.  Validating the Path with SCTP/UDP

   Since SCTP provides an acknowledged PL, a sender MUST NOT implement the CONFIRMATION_TIMER while in the SEARCH_COMPLETE state.

6.2.2.4.  Handling of PTB Messages by SCTP/UDP

   ICMP validation MUST be performed for PTB messages as specified in Appendix C of [RFC4960].  This requires that the first 8 bytes of the SCTP common header are contained in the PTB message, which can be the case for ICMPv4 (but note that the UDP header also consumes a part of the quoted packet header) and is normally the case for ICMPv6.  When the validation is completed, the PL_PTB_SIZE calculated from the PTB_SIZE in the PTB message SHOULD be used with the DPLPMTUD algorithm, provided that the reported PL_PTB_SIZE is less than the current probe size.

6.2.3.  DPLPMTUD for SCTP/DTLS

   The Datagram Transport Layer Security (DTLS) encapsulation of SCTP is specified in [RFC8261].  This is used for data channels in WebRTC implementations.  This specification updates the reference to RFC 4821 in Section 5 of RFC 8261 to refer to XXXTHISRFCXXX.

   XXX RFC EDITOR - please replace XXXTHISRFCXXX when published XXX

6.2.3.1.  Initial Connectivity

   A sender can enter the BASE state as soon as SCTP connectivity has been confirmed.

6.2.3.2.  Sending SCTP/DTLS Probe Packets

   Packet probing can be done as specified in Section 6.2.1.2.

6.2.3.3.  Validating the Path with SCTP/DTLS

   Since SCTP provides an acknowledged PL, a sender MUST NOT implement the CONFIRMATION_TIMER while in the SEARCH_COMPLETE state.

6.2.3.4.  Handling of PTB Messages by SCTP/DTLS

   [RFC4960] does not specify a way to validate the payload of an ICMP message for SCTP/DTLS.  This can prevent processing of PTB messages at the PL.

6.3.  DPLPMTUD for QUIC

   QUIC [I-D.ietf-quic-transport] is a UDP-based transport that provides reception feedback.  The UDP payload includes the QUIC packet header, protected payload, and any authentication fields.  QUIC depends on a PMTU of at least 1280 bytes.

   Section 14 of [I-D.ietf-quic-transport] describes the path considerations when sending QUIC packets.  It recommends the use of PADDING frames to build the probe packet.  Pure probe-only packets are constructed with PADDING frames and PING frames to create a padding-only packet that will elicit an acknowledgment.  Such padding-only packets enable probing without affecting the transfer of other QUIC frames.

   The recommendation for QUIC endpoints implementing DPLPMTUD is that an MPS is maintained for each combination of local and remote IP addresses [I-D.ietf-quic-transport].
   If a QUIC endpoint determines that the PMTU between any pair of local and remote IP addresses has fallen below the size required for an acceptable MPS, it immediately ceases to send QUIC packets on the affected path.  This could result in termination of the connection if an alternative path cannot be found [I-D.ietf-quic-transport].

6.3.1.  Initial Connectivity

   The base protocol is specified in [I-D.ietf-quic-transport].  This provides an acknowledged PL.  A sender can therefore enter the BASE state as soon as connectivity has been confirmed.

6.3.2.  Sending QUIC Probe Packets

   A probe packet consists of a QUIC header and a payload containing PADDING frames and a PING frame.  PADDING frames are a single octet (0x00), and several of these can be used to create a probe packet of size PROBED_SIZE.  Because QUIC provides an acknowledged PL, a sender can enter the BASE state as soon as connectivity has been confirmed.

   The current specification of QUIC sets the following:

   *  BASE_PLPMTU: A QUIC sender pads initial packets to confirm the path can support packets of the required size, which sets the BASE_PLPMTU and MIN_PLPMTU.

   *  MIN_PLPMTU: A QUIC sender that determines that the PMTU has fallen below the MIN_PLPMTU MUST immediately stop sending on the affected path.

6.3.3.  Validating the Path with QUIC

   QUIC provides an acknowledged PL.  A sender therefore MUST NOT implement the CONFIRMATION_TIMER while in the SEARCH_COMPLETE state.

6.3.4.  Handling of PTB Messages by QUIC

   QUIC validates ICMP PTB messages.  In addition to UDP port validation, QUIC can validate an ICMP message by using other PL information (e.g., validation of connection identifiers (CIDs) in the quoted packet of any received ICMP message).

7.  Acknowledgments

   This work was partially funded by the European Union's Horizon 2020 research and innovation programme under grant agreement No. 644334 (NEAT).  The views expressed are solely those of the author(s).

   Thanks to all who have commented or contributed, the TSVWG and QUIC working groups, and Mathew Calder and Julius Flohr for providing early implementations.

8.  IANA Considerations

   This memo includes no request to IANA.

   If there are no requirements for IANA, the section will be removed during conversion into an RFC by the RFC Editor.

9.  Security Considerations

   The security considerations for the use of UDP and SCTP are provided in the referenced RFCs.

   To avoid excessive load, the interval between individual probe packets MUST be at least one RTT, and the interval between rounds of probing is determined by the PMTU_RAISE_TIMER.

   A PL sender needs to ensure that the method used to confirm reception of probe packets protects from off-path attackers injecting packets into the path.  This protection is provided in IETF-defined protocols (e.g., TCP, SCTP) using a randomly initialized sequence number.  A description of one way to do this when using UDP is provided in Section 5.1 of [RFC8085].

   There are cases where ICMP Packet Too Big (PTB) messages are not delivered due to policy, configuration, or equipment design (see Section 1.1).  This method therefore does not rely upon PTB messages being received, but is able to utilize these when they are received by the sender.  PTB messages could potentially be used to cause a node to inappropriately reduce the PLPMTU.
   A node supporting DPLPMTUD MUST therefore appropriately validate the payload of PTB messages to ensure these are received in response to transmitted traffic (i.e., a reported error condition that corresponds to a datagram actually sent by the path layer; see Section 4.6.1).

   An on-path attacker able to create a PTB message could forge PTB messages that include a valid quoted IP packet.  Such an attack could be used to drive down the PLPMTU.  There are two ways this method can mitigate such attacks.  First, a PL sender never reduces the PLPMTU below the base size solely in response to receiving a PTB message; this is achieved by first entering the BASE state when such a message is received.  Second, since the design does not require processing of PTB messages, a PL sender could suspend processing of PTB messages (e.g., in a robustness mode after detecting that subsequent probes actually confirm that a size larger than the PTB_SIZE is supported by a path).

   Parsing the quoted packet inside a PTB message can introduce additional per-packet processing at the PL sender.  This processing SHOULD be limited to avoid a denial-of-service attack when arbitrary headers are included.  Rate-limiting the processing could result in PTB messages not being received by a PL; however, the DPLPMTUD method is robust to such loss.

   The successful processing of an ICMP message can trigger a probe when the reported PTB size is valid, but this does not directly update the PLPMTU for the path.  This prevents a message attempting to black hole data by indicating a size larger than supported by the path.

   It is possible that the information about a path is not stable.  This could be a result of forwarding across more than one path, each with a different actual PMTU, or of a single path that presents a varying PMTU.
   The design of a PLPMTUD implementation SHOULD consider how to mitigate the effects of varying path information.  One possible mitigation is to provide robustness (see Section 5.4) in the method that avoids oscillation in the MPS.

   A node performing DPLPMTUD could experience conflicting information about the size of supported probe packets.  This could occur when multiple paths are concurrently in use and these exhibit a different PMTU.  If not considered, this could result in packets not being delivered (black holed) when the PLPMTU results in a packet larger than the smallest actual PMTU.

   DPLPMTUD methods can introduce padding data to inflate the length of the datagram to the total size required for a probe packet.  The total size of a probe packet includes all headers and padding added to the payload data being sent (e.g., including security-related fields such as an AEAD tag and TLS record layer padding).  The value of the padding data does not influence the DPLPMTUD search algorithm, and therefore needs to be set consistently with the policy of the PL.

   If a PL can make use of cryptographic confidentiality or data-integrity mechanisms, then the design ought to avoid adding anything (e.g., padding) to DPLPMTUD probe packets that is not also protected by those cryptographic mechanisms.

10.  References

10.1.  Normative References

   [I-D.ietf-quic-transport]  Iyengar, J. and M. Thomson, "QUIC: A UDP-Based Multiplexed and Secure Transport", Work in Progress, Internet-Draft, draft-ietf-quic-transport-27, 21 February 2020.

   [RFC0768]  Postel, J., "User Datagram Protocol", STD 6, RFC 768, DOI 10.17487/RFC0768, August 1980.

   [RFC0791]  Postel, J., "Internet Protocol", STD 5, RFC 791, DOI 10.17487/RFC0791, September 1981.

   [RFC1191]  Mogul, J.C. and S.E. Deering, "Path MTU discovery", RFC 1191, DOI 10.17487/RFC1191, November 1990.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997.

   [RFC3828]  Larzon, L-A., Degermark, M., Pink, S., Jonsson, L-E., Ed., and G. Fairhurst, Ed., "The Lightweight User Datagram Protocol (UDP-Lite)", RFC 3828, DOI 10.17487/RFC3828, July 2004.

   [RFC4820]  Tuexen, M., Stewart, R., and P. Lei, "Padding Chunk and Parameter for the Stream Control Transmission Protocol (SCTP)", RFC 4820, DOI 10.17487/RFC4820, March 2007.

   [RFC4960]  Stewart, R., Ed., "Stream Control Transmission Protocol", RFC 4960, DOI 10.17487/RFC4960, September 2007.

   [RFC6951]  Tuexen, M. and R. Stewart, "UDP Encapsulation of Stream Control Transmission Protocol (SCTP) Packets for End-Host to End-Host Communication", RFC 6951, DOI 10.17487/RFC6951, May 2013.

   [RFC8085]  Eggert, L., Fairhurst, G., and G. Shepherd, "UDP Usage Guidelines", BCP 145, RFC 8085, DOI 10.17487/RFC8085, March 2017.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017.

   [RFC8200]  Deering, S. and R. Hinden, "Internet Protocol, Version 6 (IPv6) Specification", STD 86, RFC 8200, DOI 10.17487/RFC8200, July 2017.

   [RFC8201]  McCann, J., Deering, S., Mogul, J., and R. Hinden, Ed., "Path MTU Discovery for IP version 6", STD 87, RFC 8201, DOI 10.17487/RFC8201, July 2017.

   [RFC8261]  Tuexen, M., Stewart, R., Jesup, R., and S. Loreto, "Datagram Transport Layer Security (DTLS) Encapsulation of SCTP Packets", RFC 8261, DOI 10.17487/RFC8261, November 2017.

10.2.  Informative References

   [I-D.ietf-intarea-frag-fragile]  Bonica, R., Baker, F., Huston, G., Hinden, R., Troan, O., and F. Gont, "IP Fragmentation Considered Fragile", Work in Progress, Internet-Draft, draft-ietf-intarea-frag-fragile-17, 30 September 2019.

   [I-D.ietf-intarea-tunnels]  Touch, J. and M. Townsley, "IP Tunnels in the Internet Architecture", Work in Progress, Internet-Draft, draft-ietf-intarea-tunnels-10, 12 September 2019.

   [RFC0792]  Postel, J., "Internet Control Message Protocol", STD 5, RFC 792, DOI 10.17487/RFC0792, September 1981.

   [RFC1122]  Braden, R., Ed., "Requirements for Internet Hosts - Communication Layers", STD 3, RFC 1122, DOI 10.17487/RFC1122, October 1989.

   [RFC1812]  Baker, F., Ed., "Requirements for IP Version 4 Routers", RFC 1812, DOI 10.17487/RFC1812, June 1995.

   [RFC2923]  Lahey, K., "TCP Problems with Path MTU Discovery", RFC 2923, DOI 10.17487/RFC2923, September 2000.

   [RFC4340]  Kohler, E., Handley, M., and S. Floyd, "Datagram Congestion Control Protocol (DCCP)", RFC 4340, DOI 10.17487/RFC4340, March 2006.

   [RFC4443]  Conta, A., Deering, S., and M. Gupta, Ed., "Internet Control Message Protocol (ICMPv6) for the Internet Protocol Version 6 (IPv6) Specification", STD 89, RFC 4443, DOI 10.17487/RFC4443, March 2006.

   [RFC4821]  Mathis, M. and J. Heffner, "Packetization Layer Path MTU Discovery", RFC 4821, DOI 10.17487/RFC4821, March 2007.

   [RFC4890]  Davies, E. and J. Mohacsi, "Recommendations for Filtering ICMPv6 Messages in Firewalls", RFC 4890, DOI 10.17487/RFC4890, May 2007.

   [RFC5508]  Srisuresh, P., Ford, B., Sivakumar, S., and S. Guha, "NAT Behavioral Requirements for ICMP", BCP 148, RFC 5508, DOI 10.17487/RFC5508, April 2009.

Appendix A.  Revision Notes

   Note to RFC-Editor: please remove this entire section prior to publication.

   Individual draft -00:

   *  Comments and corrections are welcome directly to the authors or via the IETF TSVWG working group mailing list.

   *  This update is proposed for WG comments.

   Individual draft -01:

   *  Contains the first representation of the algorithm, showing the states and timers

   *  This update is proposed for WG comments.

   Individual draft -02:

   *  Contains updated representation of the algorithm, and textual corrections.

   *  The text describing when to set the effective PMTU has not yet been validated by the authors

   *  To determine security to off-path-attacks: We need to decide whether a received PTB message SHOULD/MUST be validated?  The text on how to handle a PTB message indicating a link MTU larger than the probe has not yet been validated by the authors

   *  No text currently describes how to handle inconsistent results from arbitrary re-routing along different parallel paths

   *  This update is proposed for WG comments.

   Working Group draft -00:

   *  This draft follows a successful adoption call for TSVWG
   *  There is still work to complete, please comment on this draft.

   Working Group draft -01:

   *  This draft includes improved introduction.

   *  The draft is updated to require ICMP validation prior to accepting PTB messages - this to be confirmed by WG

   *  Section added to discuss Selection of Probe Size - methods to be evaluated and recommendations to be considered

   *  Section added to align with work proposed in the QUIC WG.

   Working Group draft -02:

   *  The draft was updated based on feedback from the WG, and a detailed review by Magnus Westerlund.

   *  The document updates RFC 4821.

   *  Requirements list updated.

   *  Added more explicit discussion of a simpler black-hole detection mode.

   *  This draft includes reorganisation of the section on IETF protocols.

   *  Added more discussion of implementation within an application.

   *  Added text on flapping paths.

   *  Replaced 'effective MTU' with new term PLPMTU.

   Working Group draft -03:

   *  Updated figures

   *  Added more discussion on blackhole detection

   *  Added figure describing just blackhole detection

   *  Added figure relating MPS sizes

   Working Group draft -04:

   *  Described phases and named these consistently.

   *  Corrected transition from confirmation directly to the search phase (Base has been checked).

   *  Redrawn state diagrams.

   *  Renamed BASE_MTU to BASE_PMTU (because it is a base for the PMTU).

   *  Clarified Error state.

   *  Clarified suspending DPLPMTUD.

   *  Verified normative text in requirements section.

   *  Removed duplicate text.

   *  Changed all text to refer to /packet probe/probe packet/ /validation/verification/ added term /Probe Confirmation/ and clarified BlackHole detection.

   Working Group draft -05:

   *  Updated security considerations.

   *  Feedback after speaking with Joe Touch helped improve UDP-Options description.

   Working Group draft -06:

   *  Updated description of ICMP issues in section 1.1

   *  Update to description of QUIC.

   Working group draft -07:

   *  Moved description of the PTB processing method from the PTB requirements section.

   *  Clarified what is performed in the PTB validation check.

   *  Updated security consideration to explain PTB security without needing to read the rest of the document.

   *  Reformatted state machine diagram

   Working group draft -08:

   *  Moved to rfcxml v3+
   *  Rendered diagrams to svg in html version.

   *  Removed Appendix A. Event-driven state changes.

   *  Removed section on DPLPMTUD with UDP Options.

   *  Shortened the description of phases.

   Working group draft -09:

   *  Remove final mention of UDP Options

   *  Add Initial Connectivity sections to each PL

   *  Add to disable outgoing pmtu enforcement of packets

   Working group draft -10:

   *  Address comments from Lars Eggert

   *  Reinforce that PROBE_COUNT is successive attempts to probe for any size

   *  Redefine MAX_PROBES to 3

   *  Address PTB_SIZE of 0 or less than MIN_PLPMTU

   Working group draft -11:

   *  Restore a sentence removed in previous rev

   *  De-acronymise QUIC

   *  Address some nits

   Working group draft -12:

   *  Add TSVWG, QUIC and implementers to acknowledgments

   *  Shorten a diagram line.

   *  Address nits from Julius and Wes.

   *  Be clearer when talking about IP layer caches

   Working group draft -13, -14:

   *  Updated after WGLC.

   Working group draft -15:

   *  Updated after AD evaluation and prepared for IETF-LC.

   Working group draft -16:

   *  Updated text after SECDIR review.

   Working group draft -17:

   *  Updated text after GENART and IETF-LC.

   *  Renamed BASE_MTU to BASE_PLPMTU, and MIN and MAX PMTU to PLPMTU (because these are about a base for the PLPMTU), and ensured consistent separation of PMTU and PLPMTU.

   *  Adopted US-style English throughout.

   Working group draft -18:

   *  Updated text and address nits from OPSDIR, ART and IESG reviews.

   *  Order PTB processing based on PL_PTB_SIZE

   Working group draft -19:

   *  Updated text and address nits based on comments from Tim Chown and Murray S. Kucherawy.

Authors' Addresses

   Godred Fairhurst
   University of Aberdeen
   School of Engineering
   Fraser Noble Building
   Aberdeen
   AB24 3UE
   United Kingdom

   Email: gorry@erg.abdn.ac.uk

   Tom Jones
   University of Aberdeen
   School of Engineering
   Fraser Noble Building
   Aberdeen
   AB24 3UE
   United Kingdom

   Email: tom@erg.abdn.ac.uk

   Michael Tuexen
   Muenster University of Applied Sciences
   Stegerwaldstrasse 39
   48565 Steinfurt
   Germany

   Email: tuexen@fh-muenster.de

   Irene Ruengeler
   Muenster University of Applied Sciences
   Stegerwaldstrasse 39
   48565 Steinfurt
   Germany

   Email: i.ruengeler@fh-muenster.de

   Timo Voelker
   Muenster University of Applied Sciences
   Stegerwaldstrasse 39
   48565 Steinfurt
   Germany

   Email: timo.voelker@fh-muenster.de