idnits 2.17.1

draft-ietf-tsvwg-datagram-plpmtud-06.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  -- The abstract seems to indicate that this document updates RFC8201, but
     the header doesn't have an 'Updates:' line to match this.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == The document seems to lack the recommended RFC 2119 boilerplate, even if
     it appears to use RFC 2119 keywords -- however, there's a paragraph with
     a matching beginning. Boilerplate error?

     (The document does seem to have the reference to RFC 2119 which the
     ID-Checklist requires).

     (Using the creation date from RFC4821, updated by this document, for
     RFC5378 checks: 2003-10-21)

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (November 20, 2018) is 1978 days in the past.  Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Outdated reference: A later version (-34) exists of
     draft-ietf-quic-transport-16

  == Outdated reference: A later version (-32) exists of
     draft-ietf-tsvwg-udp-options-05

  ** Obsolete normative reference: RFC 2460 (Obsoleted by RFC 8200)

  ** Obsolete normative reference: RFC 4960 (Obsoleted by RFC 9260)

  == Outdated reference: A later version (-13) exists of
     draft-ietf-intarea-tunnels-09

     Summary: 2 errors (**), 0 flaws (~~), 5 warnings (==), 3 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

Internet Engineering Task Force                             G. Fairhurst
Internet-Draft                                                  T. Jones
Updates: 4821 (if approved)                       University of Aberdeen
Intended status: Standards Track                               M. Tuexen
Expires: May 24, 2019                                       I. Ruengeler
                                 Muenster University of Applied Sciences
                                                       November 20, 2018

     Packetization Layer Path MTU Discovery for Datagram Transports
                  draft-ietf-tsvwg-datagram-plpmtud-06

Abstract

   This document describes a robust method for Path MTU Discovery
   (PMTUD) for datagram Packetization Layers (PLs).  The document
   describes an extension to RFC 1191 and RFC 8201, which specify
   ICMP-based Path MTU Discovery for IPv4 and IPv6.  The method allows a
   PL, or a datagram application that uses a PL, to discover whether a
   network path can support the current size of datagram.  This can be
   used to detect and reduce the message size when a sender encounters a
   network black hole (where packets are discarded, and no ICMP message
   is received).  The method can also probe a network path with
   progressively larger packets to find whether the maximum packet size
   can be increased.
   This allows a sender to determine an appropriate
   packet size, providing functionality for datagram transports that is
   equivalent to the Packetization Layer PMTUD specification for TCP,
   specified in RFC 4821.

   The document also provides implementation notes for incorporating
   Datagram PMTUD into IETF datagram transports or applications that use
   datagram transports.

   When published, this specification updates RFC 4821.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on May 24, 2019.

Copyright Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Classical Path MTU Discovery
     1.2.  Packetization Layer Path MTU Discovery
     1.3.  Path MTU Discovery for Datagram Services
   2.  Terminology
   3.  Features Required to Provide Datagram PLPMTUD
   4.  DPLPMTUD Mechanisms
     4.1.  PLPMTU Probe Packets
     4.2.  Confirmation of Probed Packet Size
     4.3.  Detection of Black Holes
     4.4.  Response to PTB Messages
       4.4.1.  Validation of PTB Messages
       4.4.2.  Use of PTB Messages
   5.  Datagram Packetization Layer PMTUD
     5.1.  DPLPMTUD Components
       5.1.1.  Timers
       5.1.2.  Constants
       5.1.3.  Variables
     5.2.  DPLPMTUD Phases
       5.2.1.  Path Confirmation Phase
       5.2.2.  Search Phase
         5.2.2.1.  Resilience to inconsistent path information
       5.2.3.  Search Complete Phase
       5.2.4.  PROBE_BASE Phase
       5.2.5.  ERROR Phase
         5.2.5.1.  Robustness to inconsistent path
       5.2.6.  DISABLED Phase
     5.3.  State Machine
     5.4.  Search to Increase the PLPMTU
       5.4.1.  Probing for a larger PLPMTU
       5.4.2.  Selection of Probe Sizes
       5.4.3.  Resilience to inconsistent Path information
   6.  Specification of Protocol-Specific Methods
     6.1.  Application support for DPLPMTUD with UDP or UDP-Lite
       6.1.1.  Application Request
       6.1.2.  Application Response
       6.1.3.  Sending Application Probe Packets
       6.1.4.  Validating the Path
       6.1.5.  Handling of PTB Messages
     6.2.  DPLPMTUD with UDP Options
       6.2.1.  UDP Probe Request Option
       6.2.2.  UDP Probe Response Option
     6.3.  DPLPMTUD for SCTP
       6.3.1.  SCTP/IPv4 and SCTP/IPv6
         6.3.1.1.  Sending SCTP Probe Packets
         6.3.1.2.  Validating the Path with SCTP
         6.3.1.3.  PTB Message Handling by SCTP
       6.3.2.  DPLPMTUD for SCTP/UDP
         6.3.2.1.  Sending SCTP/UDP Probe Packets
         6.3.2.2.  Validating the Path with SCTP/UDP
         6.3.2.3.  Handling of PTB Messages by SCTP/UDP
       6.3.3.  DPLPMTUD for SCTP/DTLS
         6.3.3.1.  Sending SCTP/DTLS Probe Packets
         6.3.3.2.  Validating the Path with SCTP/DTLS
         6.3.3.3.  Handling of PTB Messages by SCTP/DTLS
     6.4.  DPLPMTUD for QUIC
       6.4.1.  Sending QUIC Probe Packets
       6.4.2.  Validating the Path with QUIC
       6.4.3.  Handling of PTB Messages by QUIC
   7.  Acknowledgements
   8.  IANA Considerations
   9.  Security Considerations
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Appendix A.  Event-driven state changes
   Appendix B.  Revision Notes
   Authors' Addresses

1.  Introduction

   The IETF has specified datagram transport using UDP, SCTP, and DCCP,
   as well as protocols layered on top of these transports (e.g.,
   SCTP/UDP, DCCP/UDP, QUIC/UDP), and direct datagram transport over the
   IP network layer.  This document describes a robust method for Path
   MTU Discovery (PMTUD) that may be used with these transport protocols
   (or the applications that use their transport service) to discover an
   appropriate size of packet to use across an Internet path.

1.1.  Classical Path MTU Discovery

   Classical Path Maximum Transmission Unit Discovery (PMTUD) can be
   used with any transport that is able to process ICMP Packet Too Big
   (PTB) messages (e.g., [RFC1191] and [RFC8201]).  The term PTB message
   is applied to both IPv4 ICMP Unreachable messages (Type 3) that carry
   the error Fragmentation Needed (Type 3, Code 4) [RFC0792] and ICMPv6
   Packet Too Big messages (Type 2) [RFC4443].  When a sender receives a
   PTB message, it reduces the effective MTU to the value reported as
   the Link MTU in the PTB message.  From time to time, the method also
   increases the packet size in an attempt to discover an increase in
   the supported PMTU.  The packets sent with a size larger than the
   current effective PMTU are known as probe packets.

   Packets not intended as probe packets are either fragmented to the
   current effective PMTU, or the attempt to send fails with an error
   code.
   Applications are sometimes provided with a primitive to let
   them read the Maximum Packet Size (MPS), derived from the current
   effective PMTU.

   Classical PMTUD is subject to protocol failures.  One failure arises
   when traffic using a packet size larger than the actual PMTU is
   black-holed (all datagrams sent with this size, or larger, are
   silently discarded without the sender receiving ICMP PTB messages).
   This could arise when the PTB messages are not delivered back to the
   sender for some reason [RFC2923].

   Examples where PTB messages are not delivered include:

   o  The generation of ICMP messages is usually rate limited.  This may
      result in no PTB messages being sent to the sender (see section
      2.4 of [RFC4443]).

   o  ICMP messages are increasingly filtered by middleboxes (including
      firewalls) [RFC4890].  A stateful firewall could be configured
      with a policy to block incoming ICMP messages, which would prevent
      reception of PTB messages by endpoints behind this firewall.

   o  When the router issuing the ICMP message drops a tunneled packet,
      the resulting ICMP message will be directed to the tunnel ingress.
      This tunnel endpoint is responsible for forwarding the ICMP
      message and also processing the quoted packet within the payload
      field to remove the effect of the tunnel, and return a correctly
      formatted ICMP message to the sender [I-D.ietf-intarea-tunnels].
      Failure to do this results in black-holing.

   o  Asymmetry in forwarding can result in there being no route back to
      the original sender, which would prevent an ICMP message being
      delivered to the sender.  This can also be an issue when
      policy-based routing is used, Equal Cost Multipath (ECMP) routing
      is used, or a middlebox acts as an application load balancer.  An
      example is where the path towards the server is chosen by ECMP
      routing depending on bytes in the IP payload.
      In this case, when
      a packet sent by the server encounters a problem after the ECMP
      router, then any resulting ICMP message needs to also be directed
      by the ECMP router towards the same server (i.e., ICMP messages
      need to follow the same path as the flows to which they
      correspond).  Failure to do this results in black-holing.

   o  There are cases where the next hop destination fails to receive a
      packet because of its size.  This could be due to misconfiguration
      of the layer 2 path between nodes, for instance the MTU configured
      in a layer 2 switch, or misconfiguration of the Maximum Receive
      Unit (MRU).  If the packet is dropped by the link, this will not
      cause a PTB message to be sent, and black-holing results.

   Another failure could result if a node that is not on the network
   path sends a PTB message that attempts to force the sender to change
   the effective PMTU [RFC8201].  A sender can protect itself from
   reacting to such messages by utilising the quoted packet within a PTB
   message payload to validate that the received PTB message was
   generated in response to a packet that had actually originated from
   the sender.  However, there are situations where a sender would be
   unable to provide this validation.

   Examples where validation of the PTB message is not possible include:

   o  When a router issuing the ICMP message implements RFC792
      [RFC0792], it is only required to include the first 64 bits of the
      IP payload of the packet within the quoted payload.  This may be
      insufficient to perform the tunnel processing described above.
      There could be insufficient bytes remaining for the sender to
      interpret the quoted transport information.  The recommendation in
      RFC1812 [RFC1812] is that IPv4 routers return a quoted packet with
      as much of the original datagram as possible without the length of
      the ICMP datagram exceeding 576 bytes.
      (IPv6 routers include as much of the invoking packet as possible
      without the ICMPv6 packet exceeding 1280 bytes [RFC4443].)

   o  The use of tunnels/encryption can reduce the size of the quoted
      packet returned to the original source address, increasing the
      risk that there could be insufficient bytes remaining for the
      sender to interpret the quoted transport information.

   o  Even when the PTB message includes sufficient bytes of the quoted
      packet, the network layer could lack sufficient context to
      validate the message, because validation depends on information
      about the active transport flows at an endpoint node (e.g., the
      socket/address pairs being used, and other protocol header
      information).

   o  When a packet is encapsulated/tunneled over an encrypted
      transport, the tunnel/encapsulation ingress might have
      insufficient context, or computational power, to reconstruct the
      transport header that would be needed to perform validation.

1.2.  Packetization Layer Path MTU Discovery

   The term Packetization Layer (PL) has been introduced to describe the
   layer that is responsible for placing data blocks into the payload of
   IP packets and selecting an appropriate MPS.  This function is often
   performed by a transport protocol, but can also be performed by other
   encapsulation methods working above the transport layer.

   In contrast to PMTUD, Packetization Layer Path MTU Discovery
   (PLPMTUD) [RFC4821] does not rely upon reception and validation of
   PTB messages.  It is therefore more robust than Classical PMTUD.
   This has become the recommended approach for implementing PMTU
   discovery with TCP.

   PLPMTUD uses a general strategy where the PL sends probe packets to
   search for the largest size of unfragmented datagram that can be sent
   over a network path.  The probe packets are sent with a progressively
   larger packet size.
   If a probe packet is successfully delivered (as
   determined by the PL), then the PLPMTU is raised to the size of the
   successful probe.  If no response is received to a probe packet, the
   method reduces the probe size.  This PLPMTU is used to set the
   application MPS.

   PLPMTUD introduces flexibility in the implementation of PMTU
   discovery.  At one extreme, it can be configured to only perform PTB
   black hole detection and recovery to increase the robustness of
   Classical PMTUD, or at the other extreme, all PTB processing can be
   disabled and PLPMTUD can completely replace Classical PMTUD.

   PLPMTUD can also include additional consistency checks without
   increasing the risk of black-holing.  For instance, the information
   available at the PL, or higher layers, makes PTB validation more
   straightforward.

1.3.  Path MTU Discovery for Datagram Services

   Section 5 of this document presents a set of algorithms for datagram
   protocols to discover the largest size of unfragmented datagram that
   can be sent over a network path.  The method described relies on
   features of the PL described in Section 3 and applies to transport
   protocols operating over IPv4 and IPv6.  It does not require
   cooperation from the lower layers, although it can utilise ICMP PTB
   messages when these received messages are made available to the PL.

   The UDP Usage Guidelines [RFC8085] state "an application SHOULD
   either use the Path MTU information provided by the IP layer or
   implement Path MTU Discovery (PMTUD)", but do not provide a
   mechanism for discovering the largest size of unfragmented datagram
   that can be used on a network path.  Prior to this document, PLPMTUD
   had not been specified for UDP.

   Section 10.2 of [RFC4821] recommends a PLPMTUD probing method for the
   Stream Control Transmission Protocol (SCTP).
   SCTP utilises heartbeat
   messages as probe packets, but RFC4821 does not provide a complete
   specification.  The present document provides the details to complete
   that specification.

   The Datagram Congestion Control Protocol (DCCP) [RFC4340] requires
   implementations to support Classical PMTUD and states that a DCCP
   sender "MUST maintain the MPS allowed for each active DCCP session".
   It also defines the current congestion control MPS (CCMPS) supported
   by a network path.  The DCCP specification recommends use of PMTUD,
   and suggests use of control packets (DCCP-Sync) as path probe
   packets, because they do not risk application data loss.  The method
   defined in this specification could be used with DCCP.

   Section 6 specifies the method for a set of transports, and provides
   information to enable the implementation of PLPMTUD with other
   datagram transports and applications that use datagram transports.

2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

   Other terminology is taken directly from [RFC4821] and from the
   definitions in [RFC1122].

   Actual PMTU:  The Actual PMTU is the PMTU of a network path between a
      sender PL and a destination PL, which the DPLPMTUD algorithm seeks
      to determine.

   Black Holed:  Packets are Black Holed when the sender is unaware that
      packets are not delivered to the destination endpoint (e.g., when
      the sender transmits packets of a particular size with a
      previously known effective PMTU and they are silently discarded by
      the network, but the sender is not made aware, by ICMP messages,
      of a change to the path that resulted in a smaller PLPMTU).

   Classical Path MTU Discovery:  Classical PMTUD is a process described
      in [RFC1191] and [RFC8201], in which nodes rely on PTB messages to
      learn the largest size of unfragmented datagram that can be used
      across a network path.

   Datagram:  A datagram is a transport-layer protocol data unit,
      transmitted in the payload of an IP packet.

   Effective PMTU:  The Effective PMTU is the current estimated value
      for PMTU that is used by PMTUD.  This is equivalent to the
      PLPMTU derived by PLPMTUD.

   EMTU_S:  The Effective MTU for sending (EMTU_S) is defined in
      [RFC1122] as "the maximum IP datagram size that may be sent, for a
      particular combination of IP source and destination addresses...".

   EMTU_R:  The Effective MTU for receiving (EMTU_R) is designated in
      [RFC1122] as the largest datagram size that can be reassembled
      ("Effective MTU to receive").

   Link:  A Link is a communication facility or medium over which nodes
      can communicate at the link layer, i.e., a layer below the IP
      layer.  Examples are Ethernet LANs and Internet (or higher) layer
      tunnels.

   Link MTU:  The Link Maximum Transmission Unit (MTU) is the size in
      bytes of the largest IP packet, including the IP header and
      payload, that can be transmitted over a link.  Note that this
      could more properly be called the IP MTU, to be consistent with
      how other standards organizations use the acronym.  The Link MTU
      includes the IP header, but excludes link layer headers and other
      framing that is not part of IP or the IP payload.  Other standards
      organizations generally define the link MTU to include the link
      layer headers.

   MPS:  The Maximum Packet Size (MPS) is the largest size of
      application data block that can be sent across a network path.  In
      DPLPMTUD this quantity is derived from the PLPMTU by taking into
      consideration the size of the lower protocol layer headers.

   MIN_PMTU:  The MIN_PMTU is the smallest size of PLPMTU that DPLPMTUD
      will attempt to use.

   Packet:  A Packet is the IP header plus the IP payload.

   Packetization Layer (PL):  The Packetization Layer (PL) is the layer
      of the network stack that places data into packets and performs
      transport protocol functions.

   Path:  The Path is the set of links and routers traversed by a packet
      of a particular flow between a source node and a destination node.

   Path MTU (PMTU):  The Path MTU (PMTU) is the minimum of the Link MTU
      of all the links forming a network path between a source node and
      a destination node.

   PTB_SIZE:  The PTB_SIZE is a value reported in a validated PTB
      message that indicates the next hop Link MTU of a router along the
      path.

   PLPMTU:  The Packetization Layer PMTU is an estimate of the actual
      PMTU provided by the DPLPMTUD algorithm.

   PLPMTUD:  Packetization Layer Path MTU Discovery (PLPMTUD) is the
      method described in this document for datagram PLs, which is an
      extension to Classical PMTU Discovery.

   Probe packet:  A probe packet is a datagram sent with a purposely
      chosen size (typically the current PLPMTU or larger) to detect if
      packets of this size can be successfully sent end-to-end across
      the network path.

3.  Features Required to Provide Datagram PLPMTUD

   TCP PLPMTUD has been defined using standard TCP protocol mechanisms.
   All of the requirements in [RFC4821] also apply to the use of the
   technique with a datagram PL.  Unlike TCP, some datagram PLs require
   additional mechanisms to implement PLPMTUD.

   There are eight requirements for performing the datagram PLPMTUD
   method described in this specification:

   1.  PMTU parameters: A DPLPMTUD sender is RECOMMENDED to provide
       information about the maximum size of packet that can be
       transmitted by the sender on the local link (the local Link MTU).
       It MAY utilize similar information about the receiver when this
       is supplied (note this could be less than EMTU_R).  This avoids
       implementations trying to send probe packets that cannot be
       transmitted by the local link.  Too high a value could reduce
       the efficiency of the search algorithm.  Some applications also
       have a maximum transport protocol data unit (PDU) size, in which
       case there is no benefit from probing for a size larger than this
       (unless a transport allows multiplexing multiple application
       PDUs into the same datagram).

   2.  PLPMTU: A datagram application is REQUIRED to be able to choose
       the size of datagrams sent to the network, up to the PLPMTU, or a
       smaller value (such as the MPS) derived from this.  This value is
       managed by the DPLPMTUD method.  The PLPMTU (specified as the
       effective PMTU in Section 1 of [RFC1191]) is equivalent to the
       EMTU_S (specified in [RFC1122]).

   3.  Probe packets: On request, a DPLPMTUD sender is REQUIRED to be
       able to transmit a packet larger than the PLPMTU.  This is used
       to send a probe packet.  In IPv4, a probe packet MUST be sent
       with the Don't Fragment (DF) bit set in the IP header, and
       without network layer endpoint fragmentation.  In IPv6, a probe
       packet is always sent without source fragmentation (as specified
       in section 5.4 of [RFC8201]).

   4.  Processing PTB messages: A DPLPMTUD sender MAY utilize PTB
       messages received from the network layer to help identify when a
       network path does not support the current size of probe packet.
       Any received PTB message MUST be validated before it is used to
       update the PLPMTU discovery information [RFC8201].  This
       validation confirms that the PTB message was sent in response to
       a packet originated by the sender, and needs to be performed
       before the PLPMTU discovery method reacts to the PTB message.
       When the PTB_SIZE is indicated in the PTB message, this MAY be
       used by DPLPMTUD to reduce the probe size but MUST NOT be used to
       increase the PLPMTU [RFC8201].  Validation of a PTB message
       SHOULD utilise information that cannot be simply determined by an
       off-path attacker, for example, by checking the value of a
       protocol header field known only to the two PL endpoints.  (Some
       datagram applications use well-known source and destination ports
       and therefore this check needs to rely on other information.)

   5.  Reception feedback: The destination PL endpoint is REQUIRED to
       provide a feedback method that indicates to the DPLPMTUD sender
       when a probe packet has been received by the destination PL
       endpoint.  The mechanism needs to be robust to the possibility
       that packets could be significantly delayed along a network path.
       The local PL endpoint at the sending node is REQUIRED to pass
       this feedback to the sender-side DPLPMTUD method.

   6.  Probing and congestion control: The isolated loss of a probe
       packet SHOULD NOT be treated as an indication of congestion and
       its loss SHOULD NOT directly trigger a congestion control
       reaction [RFC4821].

   7.  Probe loss recovery: If the data block carried by a probe packet
       needs to be sent reliably, the PL (or layers above) is REQUIRED
       to arrange any retransmission/repair of any resulting loss.  This
       method is REQUIRED to be robust in the case where probe packets
       are lost due to other reasons (including link transmission error
       and congestion).  The DPLPMTUD sender treats isolated loss of a
       probe packet (with or without a PTB message) as a potential
       indication of a PMTU limit for the path, but not as an indication
       of congestion (see requirement 6).

   8.  Shared PLPMTU state: The PLPMTU value could also be stored with
       the corresponding entry in the destination cache and used by
       other PL instances.
       The specification of PLPMTUD [RFC4821]
       states: "If PLPMTUD updates the MTU for a particular path, all
       Packetization Layer sessions that share the path representation
       (as described in Section 5.2 of [RFC4821]) SHOULD be notified to
       make use of the new MTU and make the required congestion control
       adjustments".  Such methods MUST be robust to the wide variety of
       underlying network forwarding behaviours.  PLPMTU adjustments
       based on shared PLPMTU values should be incorporated in the
       search algorithms.  Section 5.2 of [RFC8201] provides guidance on
       the caching of PMTU information and also the relation to IPv6
       flow labels.

   In addition, the following principles are stated for design of a
   DPLPMTUD method:

   o  MPS: A method is REQUIRED to signal an appropriate MPS to the
      higher layer using the PL.  The value of the MPS can change
      following a change to the path.  It is RECOMMENDED that methods
      avoid forcing an application to use an arbitrarily small MPS
      (PLPMTU) for transmission while the method is searching for the
      currently supported PLPMTU.  Datagram PLs do not necessarily
      support fragmentation of PDUs larger than the PLPMTU.  A reduced
      MPS can adversely impact the performance of a datagram
      application.

   o  Path validation: It is RECOMMENDED that methods are robust to path
      changes that could have occurred since the path characteristics
      were last confirmed, and to the possibility of inconsistent path
      information being received.

   o  Datagram reordering: A method is REQUIRED to be robust to the
      possibility that a flow encounters reordering, or the traffic
      (including probe packets) is divided over more than one network
      path.

   o  When to probe: It is RECOMMENDED that methods determine whether
      the path capacity has increased since they last measured the path.
      This determines when the path should again be probed.

4.  DPLPMTUD Mechanisms

   This section lists the protocol mechanisms used in this
   specification.

4.1.  PLPMTU Probe Packets

   The DPLPMTUD method relies upon the PL sender being able to generate
   probe packets with a specific size.  TCP is able to generate these
   probe packets by choosing to appropriately segment data being sent
   [RFC4821].  In contrast, a datagram PL that needs to construct a
   probe packet has to either request an application to send a data
   block that is larger than the application would otherwise generate,
   or utilise padding functions to extend a datagram beyond the size of
   the application data block.  Protocols that permit exchange of
   control messages (without an application data block) could
   alternatively prefer to generate a probe packet by extending a
   control message with padding data.

   A receiver needs to be able to distinguish an in-band data block from
   any added padding.  This is needed to ensure that any added padding
   is not passed on to an application at the receiver.

   This results in three possible ways that a sender can create a probe
   packet, listed in order of preference:

   Probing using padding data:  A probe packet that contains only
      control information together with any padding needed to inflate
      it to the size required for the probe packet.  Since these probe
      packets do not carry an application-supplied data block, they do
      not typically require retransmission, although they do still
      consume network capacity and incur endpoint processing.

   Probing using application data and padding data:  A probe packet that
      contains a data block supplied by an application that is combined
      with padding to inflate the length of the datagram to the size
      required for the probe packet.
      If the application/transport needs protection from the loss of
      this probe packet, the application/transport could perform
      transport-layer retransmission/repair of the data block (e.g.,
      by retransmission after loss is detected or by duplicating the
      data block in a datagram without the padding data).

   Probing using application data: A probe packet that contains a data
      block supplied by an application that matches the size required
      for the probe packet. This method requests the application to
      issue a data block of the desired probe size. If the
      application/transport needs protection from the loss of an
      unsuccessful probe packet, the application/transport then needs
      to perform transport-layer retransmission/repair of the data
      block (e.g., by retransmission after loss is detected).

   A PL that uses a probe packet carrying an application data block
   could need to retransmit this application data block if the probe
   fails. This could need the PL to re-fragment the data block to a
   smaller packet size that is expected to traverse the end-to-end path
   (which could utilise endpoint network-layer or PL fragmentation when
   these are available).

   DPLPMTUD MAY choose to use only one of these methods to simplify the
   implementation.

   Probe messages sent by a PL MUST contain enough information to
   uniquely identify the probe within Maximum Segment Lifetime, while
   being robust to reordering and replay of probe response and ICMP PTB
   messages.

4.2.  Confirmation of Probed Packet Size

   The PL needs a method to determine (confirm) when probe packets have
   been successfully received end-to-end across a network path.

   Transport protocols can include end-to-end methods that detect and
   report reception of specific datagrams that they send (e.g., DCCP and
   SCTP provide keep-alive/heartbeat features).
   When supported, this mechanism SHOULD also be used by DPLPMTUD to
   acknowledge reception of a probe packet.

   A PL that does not acknowledge data reception (e.g., UDP and
   UDP-Lite) is unable itself to detect when the packets that it sends
   are discarded because their size is greater than the actual PMTU.
   These PLs need to either rely on an application protocol to detect
   this loss, or make use of an additional transport method such as
   UDP-Options [I-D.ietf-tsvwg-udp-options].

   Section 5 specifies this function for a set of IETF-specified
   protocols.

4.3.  Detection of Black Holes

   A PL sender needs to reduce the PLPMTU when it discovers the actual
   PMTU supported by a network path is less than the PLPMTU (i.e., when
   it detects that traffic is being black holed). This can be triggered
   when a validated PTB message is received, or by another event that
   indicates the network path no longer sustains the current packet
   size, such as a loss report from the PL or repeated lack of response
   to probe packets sent to confirm the PLPMTU. Detection is followed
   by a reduction of the PLPMTU.

   Black Hole detection is performed by periodically sending packet
   probes of size PLPMTU to verify that a network path still supports
   the last acknowledged PLPMTU size. There are two ways a DPLPMTUD
   sender can detect that the current PLPMTU is not sustained by the
   path (i.e., detect a black hole):

   o  A PL can rely upon a mechanism implemented within the PL protocol
      to detect excessive loss of data sent with a specific packet size
      and then conclude that this excessive loss could be a result of
      an invalid PMTU (as in PLPMTUD for TCP [RFC4821]).

   o  A PL can use the probing mechanism to send confirmation probe
      packets of the size of the current PLPMTU and a timer to track
      whether acknowledgments are received (e.g., the number of probe
      packets sent without receiving an acknowledgement, PROBE_COUNT,
      becomes greater than MAX_PROBES). These messages need to be
      generated periodically (e.g., using the confirmation timer, see
      Section 5.1.1), and should be suppressed when the PL is not
      actively sending data. Successive loss of probes is an
      indication that the current path no longer supports the PLPMTU.

   When the method detects the current PLPMTU is not supported (a black
   hole is found), DPLPMTUD sets a lower MPS. The PL then confirms that
   the updated PLPMTU can be successfully used across the path. This
   can require the PL to send a probe packet with a size less than the
   size of the data block generated by an application. In this case,
   the PL could provide a way to fragment a datagram at the PL, or
   could instead utilise a control packet with padding.

4.4.  Response to PTB Messages

   This method requires the DPLPMTUD sender to validate any received PTB
   message before using the PTB information. The response to a PTB
   message depends on the PTB_SIZE indicated in the PTB message, the
   state of the PLPMTUD state machine, and the IP protocol being used.

   Section 4.4.1 first describes validation for both IPv4 ICMP
   Unreachable messages (type 3) and ICMPv6 packet too big messages,
   both of which are referred to as PTB messages in this document.

4.4.1.  Validation of PTB Messages

   A PL that receives a PTB message from a router or middlebox MUST
   perform ICMP validation as specified in Section 5.2 of [RFC8085].
   This needs the PL to check the protocol information in the quoted
   payload to validate the message originated from the sending node.
   This check includes determining the appropriate port and IP
   information - necessary for the PTB message to be passed to the PL.
   In addition, the PL SHOULD validate information from the ICMP payload
   to determine that the quoted packet was sent by the PL. These checks
   are intended to provide protection from packets that originate from a
   node that is not on the network path. PTB messages are discarded if
   they fail to pass these checks, or where there is insufficient ICMP
   payload to perform the checks.

   PTB messages that have been validated can be utilised by the DPLPMTUD
   algorithm. A method that utilises these PTB messages can improve the
   speed at which the algorithm detects an appropriate PLPMTU, compared
   to one that relies solely on probing.

4.4.2.  Use of PTB Messages

   A set of checks is intended to provide protection from a router that
   reports an unexpected PTB_SIZE. The PL needs to check that the
   indicated PTB_SIZE is less than the size used by probe packets and
   larger than the minimum size accepted.

   This section provides an informative summary of how PTB messages can
   be utilised.

   Validating PTB Messages:

      *  A simple implementation is permitted to ignore received PTB
         messages and therefore the PLPMTU is not updated when a PTB
         message is received.

      *  An implementation that supports PTB messages MUST validate
         messages before they are processed.

   MIN_PMTU < PTB_SIZE < BASE_PMTU

      *  A robust PL MAY enter the PROBE_ERROR state for an IPv4 path
         when the PTB_SIZE reported in the PTB message >= 576B and when
         this is less than the BASE_PMTU.

      *  A robust PL MAY enter the PROBE_ERROR state for an IPv6 path
         when the PTB_SIZE reported in the PTB message >= 1280B and when
         this is less than the BASE_PMTU.

   PTB_SIZE = PLPMTU

      *  Transition to SEARCH_COMPLETE.

   PTB_SIZE > PROBED_SIZE

      *  The PTB_SIZE is greater than the PROBED_SIZE, which is an
         inconsistent network signal.
         These PTB messages ought to be discarded without further
         processing (the PLPMTU is not updated).

      *  The information could be utilised as an input to trigger
         enabling a resilience mode.

   BASE_PMTU <= PTB_SIZE < PLPMTU

      *  Black hole detection is triggered and the PLPMTU ought to be
         set to BASE_PMTU.

      *  The PL could use PTB_SIZE reported in the PTB message to
         initialise a search algorithm.

   PLPMTU < PTB_SIZE < PROBED_SIZE

      *  The PLPMTU continues to be valid, but the last PROBED_SIZE
         searched was larger than the actual PMTU.

      *  The PLPMTU is not updated.

      *  The PL can use the reported PTB_SIZE from the PTB message as
         the next search point when it resumes the search algorithm.

5.  Datagram Packetization Layer PMTUD

   This section specifies Datagram PLPMTUD (DPLPMTUD). The method can
   be introduced at various points in the IP protocol stack to discover
   the PLPMTU so that an application can utilise an appropriate MPS for
   the current network path.

                   +----------------------+
                   |         APP*         |
                   +-+-------+----+---+---+
                     |       |    |   |
                 +---+--+ +--+--+ | +-+---+
                 | QUIC*| |UDPO*| | |SCTP*|
                 +---+--+ +--+--+ | ++--+-+
                     |       |    |  |  |
                     +-------+-+  |  |  |
                               |  |  |  |
                              ++-+--++  |
                              | UDP  |  |
                              +---+--+  |
                                  |     |
                   +--------------+-----+-+
                   |  Network Interface   |
                   +----------------------+

          Figure 1: Examples where DPLPMTUD can be implemented

   The central idea of DPLPMTUD is probing by a sender. Probe packets
   are sent to find the maximum size of user message that is completely
   transferred across the network path from the sender to the
   destination.

   This section identifies the components needed for implementation, the
   phases of operation, the state machine and search algorithm.

5.1.  DPLPMTUD Components

   This section describes components of DPLPMTUD.

5.1.1.
Timers

   The method utilises three timers:

   PROBE_TIMER: The PROBE_TIMER is configured to expire after a period
      longer than the maximum time to receive an acknowledgment to a
      probe packet. This value MUST be larger than 1 second, and
      SHOULD be larger than 15 seconds. Guidance on selection of the
      timer value is provided in Section 3.1.1 of the UDP Usage
      Guidelines [RFC8085].

      If the PL has a path Round Trip Time (RTT) estimate and timely
      acknowledgements, the PROBE_TIMER can be derived from the PL RTT
      estimate.

   PMTU_RAISE_TIMER: The PMTU_RAISE_TIMER is configured to the period a
      sender will continue to use the current PLPMTU, after which it
      re-enters the Search phase. This timer has a period of 600
      secs, as recommended by PLPMTUD [RFC4821].

      DPLPMTUD SHOULD inhibit sending probe packets when no application
      data has been sent since the previous probe packet.

   CONFIRMATION_TIMER: The CONFIRMATION_TIMER is configured to the
      period a PL sender waits before confirming the current PLPMTU is
      still supported. This is less than the PMTU_RAISE_TIMER and used
      to decrease the PLPMTU (e.g., when a black hole is encountered).
      Confirmation needs to be frequent enough when data is flowing that
      the sending PL does not black hole extensive amounts of traffic.
      Guidance on selection of the timer value is provided in Section
      3.1.1 of the UDP Usage Guidelines [RFC8085].

      DPLPMTUD SHOULD inhibit sending probe packets when no application
      data has been sent since the previous probe packet.

   An implementation could implement the various timers using a single
   timer process.

5.1.2.  Constants

   The following constants are defined:

   MAX_PROBES: MAX_PROBES is the maximum value of the
      PROBE_ERROR_COUNTER. The default value of MAX_PROBES is 10.

   MIN_PMTU: The MIN_PMTU is the smallest allowed probe packet size.
      For IPv6, this value is 1280 bytes, as specified in [RFC2460].
      For IPv4, the minimum value is 68 bytes. (An IPv4 router is
      required to be able to forward a datagram of 68 octets without
      further fragmentation. This is the combined size of an IPv4
      header and the minimum fragment size of 8 octets. In addition,
      receivers are required to be able to reassemble fragmented
      datagrams at least up to 576B, as stated in Section 3.3.3 of
      [RFC1122].)

   MAX_PMTU: The MAX_PMTU is the largest size of PLPMTU. This has to
      be less than or equal to the minimum of the local MTU of the
      outgoing interface and the destination PMTU for receiving. An
      application or PL MAY reduce the MAX_PMTU when there is no need to
      send packets larger than a specific size.

   BASE_PMTU: The BASE_PMTU is a configured size expected to work for
      most paths. The size is equal to or larger than the MIN_PMTU and
      smaller than the MAX_PMTU. In the case of IPv6, this value is
      1280 bytes [RFC2460]. When using IPv4, a size of 1200 bytes is
      RECOMMENDED.

5.1.3.  Variables

   This method utilises a set of variables:

   PROBED_SIZE: The PROBED_SIZE is the size of the current probe
      packet. This is a tentative value for the PLPMTU, which is
      awaiting confirmation by an acknowledgment.

   PROBE_COUNT: The PROBE_COUNT is a count of the number of
      unsuccessful probe packets that have been sent with a size of
      PROBED_SIZE. The value is initialised to zero when a particular
      size of PROBED_SIZE is first attempted.

   The figure below illustrates the relationship between the packet size
   constants and variables, in this case when the DPLPMTUD algorithm
   performs path probing to increase the size of the PLPMTU. The MPS is
   less than the PLPMTU. A probe packet has been sent of size
   PROBED_SIZE. When this is acknowledged, the PLPMTU will be raised to
   PROBED_SIZE, allowing the PROBED_SIZE to be increased towards the
   actual PMTU.

      MIN_PMTU                                            PMTU_MAX
          <------------------------------------------------------>
          |        |             |             |               |
          V        |             |             |               V
      BASE_PMTU    V             |             V          Actual PMTU
                  MPS            |        PROBED_SIZE
                                 V
                              PLPMTU

         Figure 2: Relationships between probe and packet sizes

5.2.  DPLPMTUD Phases

   The Datagram PLPMTUD algorithm moves through several phases of
   operation.

   An implementation that only reduces the PLPMTU to a suitable size
   would be sufficient to ensure reliable operation, but can be very
   inefficient when the actual PMTU changes or when the method (for
   whatever reason) makes a suboptimal choice for the PLPMTU.

   A full implementation of DPLPMTUD provides an algorithm enabling the
   DPLPMTUD sender to increase the PLPMTU following a change in the
   characteristics of the path, such as when a link is reconfigured with
   a larger MTU, or when there is a change in the set of links traversed
   by an end-to-end flow (e.g., after a routing or path fail-over
   decision).

   Black hole detection (see Section 4.3) and PTB processing (see
   Section 4.4) proceed in parallel with these phases of operation.

                        +-------------------+
                        | Path Confirmation +--  Connectivity
                        +--------+----------+  \----- or BASE_PMTU
                                 |      /\          \/ Confirmation Fails
             Connectivity and    |      |        +-------+
             BASE_PMTU confirmed |      ---------+ Error |
                                 |               +-------+
                                 |         CONFIRMATION_TIMER
                                 |               Fires
                                 \/
   +----------------+          +--------------+
   | Search Complete|<---------+    Search    |
   +----------------+          +--------------+
                  Search Algorithm
                     Completes

                        Figure 3: DPLPMTUD Phases

   Path Confirmation

      *  Connectivity is confirmed.

      *  DPLPMTUD confirms the BASE_PMTU is supported across the network
         path.

      *  DPLPMTUD then enters the search phase.

   Search

      *  DPLPMTUD performs probing to increase the PLPMTU.

      *  DPLPMTUD then enters the search complete or an error phase.

   Search Complete

      *  DPLPMTUD has found a suitable PLPMTU that is supported across
         the network path.

      *  Black hole detection will confirm this PLPMTU continues to be
         supported.

      *  On a longer time-frame, DPLPMTUD will re-enter the search phase
         to discover if the PLPMTU can be raised.

   Error

      *  Inconsistent or invalid network signals cause DPLPMTUD to be
         unable to progress.

      *  This causes the algorithm to lower the MPS until the path is
         shown to support the BASE_PMTU, or to suspend DPLPMTUD.

5.2.1.  Path Confirmation Phase

   DPLPMTUD starts in the Path confirmation phase. Path confirmation is
   performed in two stages:

   1.  Connectivity to the remote peer is first confirmed. When a
       connection-oriented PL is used, this stage is implicit. It is
       performed as part of the normal PL connection handshake. In
       contrast, a connectionless PL MUST send an acknowledged probe
       packet to confirm that the remote peer is reachable.

   2.  In the second stage, the PL confirms it can successfully send a
       datagram of the BASE_PMTU size across the current path.

   A PL that does not wish to support a network path with a PLPMTU less
   than BASE_PMTU can simplify the phase into a single step by
   performing connectivity checks with probes of the BASE_PMTU size.

   A PL MAY respond to PTB messages while in this phase, see
   Section 4.4.

   Once path confirmation has completed, DPLPMTUD can advertise an MPS
   to an upper layer.

   If DPLPMTUD fails to complete these tests, it enters the
   PROBE_DISABLED phase, see Section 5.2.6, and ceases using DPLPMTUD.

5.2.2.  Search Phase

   The search phase utilises a search algorithm in an attempt to
   increase the PLPMTU (see Section 5.4.1). The PL sender increases the
   MPS each time a packet probe confirms a larger PLPMTU is supported by
   the path. The algorithm concludes by entering the SEARCH_COMPLETE
   phase, see Section 5.2.3.
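
   As an informal illustration of the search phase, the sketch below
   raises the PLPMTU each time a larger probe is acknowledged and stops
   once a size has failed MAX_PROBES times. It is not part of the
   specification. The list of candidate sizes and the probe_acked()
   callback (which stands in for sending one probe and waiting for the
   PROBE_TIMER) are assumptions made for this example, and PTB handling
   and black hole detection are omitted.

```python
from typing import Callable, Sequence

MAX_PROBES = 10  # default value given for MAX_PROBES


def search_plpmtu(
    base_pmtu: int,
    max_pmtu: int,
    candidate_sizes: Sequence[int],
    probe_acked: Callable[[int], bool],
    max_probes: int = MAX_PROBES,
) -> int:
    """Return the largest confirmed PLPMTU, starting from base_pmtu.

    probe_acked(size) is a hypothetical callback: it sends one probe of
    `size` bytes and returns True if an acknowledgement arrives before
    the PROBE_TIMER expires.
    """
    plpmtu = base_pmtu
    for probed_size in sorted(candidate_sizes):
        if probed_size <= plpmtu or probed_size > max_pmtu:
            continue  # outside the remaining search range
        probe_count = 0  # reset when a new PROBED_SIZE is first tried
        confirmed = False
        while probe_count < max_probes:
            if probe_acked(probed_size):
                confirmed = True
                break
            probe_count += 1  # timer expired without an acknowledgement
        if not confirmed:
            break  # search ends here (SEARCH_COMPLETE)
        plpmtu = probed_size  # confirmed: raise the PLPMTU
    return plpmtu
```

   For example, with a path that forwards at most 1400 bytes, calling
   search_plpmtu(1200, 9000, [1280, 1400, 1500, 9000], lambda s: s <=
   1400) settles on 1400 after the 1500-byte probe has failed
   MAX_PROBES times.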

   A PL MAY respond to PTB messages while in this phase, using the PTB
   to advance or terminate the search, see Section 4.4. Similarly, black
   hole detection can terminate the search by entering the PROBE_BASE
   phase, see Section 5.2.4.

5.2.2.1.  Resilience to inconsistent path information

   Sometimes a PL sender is able to detect inconsistent results from the
   sequence of PLPMTU probes that it sends or the sequence of PTB
   messages that it receives. This could be manifested as excessive
   fluctuation of the MPS.

   When inconsistent path information is detected, a PL sender can
   enable an alternate search mode that clamps the offered MPS to a
   smaller value for a period of time. This avoids unnecessary
   black-holing of packets.

5.2.3.  Search Complete Phase

   On entry to the search complete phase, the DPLPMTUD sender starts the
   PMTU_RAISE_TIMER. In this phase, the PLPMTU remains at the value
   confirmed by the last successful probe packet.

   In this phase, the PL MUST periodically confirm that the PLPMTU is
   still supported by the path. If the PL is designed in a way that is
   unable to confirm reachability to the destination endpoint after
   probing has completed, the method uses a CONFIRMATION_TIMER to
   periodically repeat a probe packet for the current PLPMTU size.

   If the DPLPMTUD sender is unable to confirm reachability for packets
   with a size of the current PLPMTU (e.g., if the CONFIRMATION_TIMER
   expires) or the PL signals a lack of reachability, the method exits
   the phase and enters the PROBE_BASE phase, see Section 5.2.4.

   If the PMTU_RAISE_TIMER expires, the DPLPMTUD sender re-enters the
   Search phase, see Section 5.2.2, and resumes probing for a larger
   PLPMTU.

   Black hole detection can be used in parallel to check that a network
   path continues to support a previously confirmed PLPMTU.
   If a black hole is detected, the algorithm moves to the PROBE_BASE
   phase, see Section 5.2.4.

   The phase can also be exited when a validated PTB message is received
   (see Section 4.4.1).

5.2.4.  PROBE_BASE Phase

   This phase is entered when black hole detection or a PTB message
   indicates that the PLPMTU is not supported by the path.

   On entry to this phase, the PLPMTU is set to the BASE_PMTU, and a
   corresponding reduced MPS is advertised.

   PROBED_SIZE is then set to the PLPMTU (i.e., the BASE_PMTU), to
   confirm this size is supported across the path. If confirmed,
   DPLPMTUD enters the Search Phase to determine whether the PL sender
   can use a larger PLPMTU.

   If the path cannot be confirmed to support the BASE_PMTU after
   sending MAX_PROBES, DPLPMTUD moves to the Error phase, see
   Section 5.2.5.

5.2.5.  ERROR Phase

   The ERROR phase is entered when there is conflicting or invalid
   PLPMTU information for the path (e.g., a failure to support the
   BASE_PMTU). In this phase, the MPS is set to a value less than the
   BASE_PMTU, but at least the size of the MIN_PMTU.

   DPLPMTUD remains in the ERROR phase until a consistent view of the
   path can be discovered and it has also been confirmed that the path
   supports the BASE_PMTU.

   Note: MIN_PMTU may be identical to BASE_PMTU, simplifying the actions
   in this phase.

   If no acknowledgement is received for PROBE_COUNT probes of size
   MIN_PMTU, the method suspends DPLPMTUD, see Section 5.2.6.

5.2.5.1.  Robustness to inconsistent path

   Some paths could be unable to sustain packets of the BASE_PMTU size.
   For these paths, an implementation could use an alternate algorithm
   for the PROBE_ERROR phase that allows fallback to a smaller than
   desired PLPMTU, rather than suffer connectivity failure.

   This could also utilise methods such as endpoint IP fragmentation to
   enable the PL sender to communicate using packets smaller than the
   BASE_PMTU.

5.2.6.  DISABLED Phase

   This phase suspends operation of DPLPMTUD. It disables probing for
   the PLPMTU until action is taken by the PL or application using the
   PL.

5.3.  State Machine

   A state machine for DPLPMTUD is depicted in Figure 4. If multihoming
   is supported, a state machine is needed for each active path.

   (State machine diagram not reproduced; the states and their
   transitions are defined below.)

     Figure 4: State machine for Datagram PLPMTUD. Note: Some state
            changes are not shown to simplify the diagram.

   The following states are defined:

   PROBE_START: The PROBE_START state is the initial state before
      probing has started. The state confirms connectivity to the
      remote PL.

      The PLPMTU is set to the BASE_PMTU size. Probing ought to start
      immediately after connection setup to prevent the loss of user
      data. PLPMTUD is not performed in this state.
      The state transitions to PROBE_SEARCH, when a network path has
      been confirmed, i.e., when a sent packet has been acknowledged on
      this network path and the BASE_PMTU is confirmed to be supported.
      If the network path cannot be confirmed this state transitions to
      PROBE_DISABLED.

   PROBE_SEARCH: The PROBE_SEARCH state is the main probing state.
      This state is entered when probing for the BASE_PMTU was
      successful.

      The PROBE_COUNT is set to zero when the first probe packet is sent
      for each probe size. Each time a probe packet is acknowledged,
      the PLPMTU is set to the PROBED_SIZE, and then the PROBED_SIZE is
      increased using the search algorithm.

      When a probe packet is sent and not acknowledged within the period
      of the PROBE_TIMER, the PROBE_COUNT is incremented and the probe
      packet is retransmitted. The state is exited when the PROBE_COUNT
      reaches MAX_PROBES; a PTB message is validated; a probe of size
      PMTU_MAX is acknowledged or black hole detection is triggered.

   SEARCH_COMPLETE: The SEARCH_COMPLETE state indicates a successful
      end to the PROBE_SEARCH state. DPLPMTUD remains in this state
      until either the PMTU_RAISE_TIMER expires; a received PTB message
      is validated; or black hole detection is triggered.

      When DPLPMTUD uses an unacknowledged PL and is in the
      SEARCH_COMPLETE state, a CONFIRMATION_TIMER periodically resets
      the PROBE_COUNT and schedules a probe packet with the size of the
      PLPMTU. If the probe packet fails to be acknowledged after
      MAX_PROBES attempts, the method enters the PROBE_BASE state. When
      used with an acknowledged PL (e.g., SCTP), DPLPMTUD SHOULD NOT
      continue to generate PLPMTU probes in this state.

   PROBE_BASE: The PROBE_BASE state is used to confirm whether the
      BASE_PMTU size is supported by the network path and is designed to
      allow an application to continue working when there are transient
      reductions in the actual PMTU. It also seeks to avoid long
      periods where traffic is black holed while searching for a larger
      PLPMTU.

      On entry, the PROBED_SIZE is set to the BASE_PMTU size and the
      PROBE_COUNT is set to zero.

      Each time a probe packet is sent, the PROBE_TIMER is started.
      The state is exited when the probe packet is acknowledged, and the
      PL sender enters the PROBE_SEARCH state.

      The state is also left when the PROBE_COUNT reaches MAX_PROBES or
      a PTB message is validated. This causes the PL sender to enter
      the PROBE_ERROR state.

   PROBE_ERROR: The PROBE_ERROR state represents the case where the
      network path is not known to support a PLPMTU of at least the
      BASE_PMTU size. It is entered when either a probe of size
      BASE_PMTU has not been acknowledged or a validated PTB message
      indicates a PTB_SIZE smaller than the BASE_PMTU.

      On entry, the PROBE_COUNT is set to zero, the PROBED_SIZE is set
      to the MIN_PMTU size, and the PLPMTU is reset to the MIN_PMTU
      size. In this state, a probe packet is sent, and the PROBE_TIMER
      is started. The state transitions to the PROBE_SEARCH state when
      a probe packet of at least size BASE_PMTU is acknowledged. Robust
      implementations may validate the BASE_PMTU several times before
      transitioning to PROBE_SEARCH.

      Implementations are permitted to enable endpoint fragmentation if
      the DPLPMTUD is unable to validate MIN_PMTU within PROBE_COUNT
      probes. If DPLPMTUD is unable to validate MIN_PMTU, the
      implementation should transition to PROBE_DISABLED.

   PROBE_DISABLED: The PROBE_DISABLED state indicates that connectivity
      could not be established. DPLPMTUD MUST NOT probe in this state.
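
   The state definitions above can be summarised as a transition table.
   The sketch below is illustrative only. The event names are
   hypothetical labels invented for this example, and it covers only
   the transitions stated in the text; transitions triggered by
   validated PTB messages depend on the reported PTB_SIZE (see
   Section 4.4.2) and are left out.

```python
from enum import Enum, auto


class State(Enum):
    PROBE_START = auto()
    PROBE_SEARCH = auto()
    SEARCH_COMPLETE = auto()
    PROBE_BASE = auto()
    PROBE_ERROR = auto()
    PROBE_DISABLED = auto()


class Event(Enum):
    # Hypothetical event names; the specification does not define them.
    PATH_CONFIRMED = auto()       # connectivity and BASE_PMTU confirmed
    PATH_NOT_CONFIRMED = auto()   # path could not be confirmed
    SEARCH_FINISHED = auto()      # MAX_PROBES reached or largest size acked
    RAISE_TIMER_EXPIRED = auto()  # PMTU_RAISE_TIMER expiry
    BLACK_HOLE = auto()           # black hole detection triggered
    BASE_PROBE_ACKED = auto()     # probe of BASE_PMTU size acknowledged
    BASE_PROBE_FAILED = auto()    # MAX_PROBES of BASE_PMTU size unanswered
    MIN_PROBE_FAILED = auto()     # MIN_PMTU could not be validated


TRANSITIONS = {
    (State.PROBE_START, Event.PATH_CONFIRMED): State.PROBE_SEARCH,
    (State.PROBE_START, Event.PATH_NOT_CONFIRMED): State.PROBE_DISABLED,
    (State.PROBE_SEARCH, Event.SEARCH_FINISHED): State.SEARCH_COMPLETE,
    (State.PROBE_SEARCH, Event.BLACK_HOLE): State.PROBE_BASE,
    (State.SEARCH_COMPLETE, Event.RAISE_TIMER_EXPIRED): State.PROBE_SEARCH,
    (State.SEARCH_COMPLETE, Event.BLACK_HOLE): State.PROBE_BASE,
    (State.PROBE_BASE, Event.BASE_PROBE_ACKED): State.PROBE_SEARCH,
    (State.PROBE_BASE, Event.BASE_PROBE_FAILED): State.PROBE_ERROR,
    (State.PROBE_ERROR, Event.BASE_PROBE_ACKED): State.PROBE_SEARCH,
    (State.PROBE_ERROR, Event.MIN_PROBE_FAILED): State.PROBE_DISABLED,
}


def next_state(state: State, event: Event) -> State:
    """Return the next state; events with no listed transition are ignored."""
    return TRANSITIONS.get((state, event), state)
```

   Keeping the transitions in a table makes it easy to check that
   PROBE_DISABLED has no outgoing transitions, which matches the
   requirement that DPLPMTUD does not probe in that state.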

   Appendix A contains an informative description of key events.

5.4.  Search to Increase the PLPMTU

   This section describes the algorithms used by DPLPMTUD to search for
   a larger PLPMTU.

5.4.1.  Probing for a larger PLPMTU

   Implementations use a search algorithm across the search range to
   determine whether a larger PLPMTU can be supported across a network
   path.

   The method discovers the search range by confirming the minimum
   PLPMTU and then using the probe method to select a PROBED_SIZE less
   than or equal to PMTU_MAX. PMTU_MAX is the minimum of the local MTU
   and EMTU_R (learned from the remote endpoint). The PMTU_MAX MAY be
   reduced by an application that sets a maximum to the size of
   datagrams it will send.

   The PROBE_COUNT is initialised to zero when a probe packet is first
   sent with a particular size. A timer is used by the search algorithm
   to trigger the sending of probe packets of size PROBED_SIZE, larger
   than the PLPMTU. Each probe packet successfully sent to the remote
   peer is confirmed by acknowledgement at the PL, see Section 4.1.

   Each time a probe packet is sent to the destination, the PROBE_TIMER
   is started. The timer is cancelled when the PL receives
   acknowledgment that the probe packet has been successfully sent
   across the path (see Section 4.1). This confirms that the
   PROBED_SIZE is supported, and the PROBED_SIZE value is then assigned
   to the PLPMTU. The search algorithm can continue to send subsequent
   probe packets of an increasing size.

   If the timer expires before a probe packet is acknowledged, the probe
   has failed to confirm the PROBED_SIZE. Each time the PROBE_TIMER
   expires, the PROBE_COUNT is incremented, the PROBE_TIMER is
   reinitialised, and a probe packet of the same size is retransmitted
   (the replicated probe improves the resilience to loss).
   The maximum number of retransmissions for a particular size is
   configured (MAX_PROBES). If the value of the PROBE_COUNT reaches
   MAX_PROBES, probing will stop, and the PL sender enters the
   SEARCH_COMPLETE state.

5.4.2.  Selection of Probe Sizes

   The search algorithm needs to determine a minimum useful gain in
   PLPMTU. It would not be constructive for a PL sender to attempt to
   probe for all sizes - this would incur unnecessary load on the path
   and has the undesirable effect of slowing the time to reach a more
   optimal MPS. Implementations SHOULD select the set of probe packet
   sizes to maximise the gain in PLPMTU from each search step.

   Implementations could optimize the search procedure by selecting step
   sizes from a table of common PMTU sizes. When selecting the
   appropriate next size to search, an implementor ought to also
   consider that there can be common sizes of MPS that applications seek
   to use.

   xxx Author Note: A future version of this section will detail example
   methods for selecting probe size values, but does not plan to mandate
   a single method. xxx

5.4.3.  Resilience to inconsistent Path information

   A decision to increase the PLPMTU needs to be resilient to the
   possibility that information learned about the network path is
   inconsistent (this could happen when probe packets are lost due to
   other reasons, or some of the packets in a flow are forwarded along a
   portion of the path that supports a different actual PMTU).

   Frequent path changes could occur due to unexpected "flapping" -
   where some packets from a flow pass along one path, but other packets
   follow a different path with different properties. DPLPMTUD can be
   made resilient to these anomalies by introducing hysteresis into the
   search decision to increase the MPS.

6.
Specification of Protocol-Specific Methods

   This section specifies protocol-specific details for datagram PLPMTUD
   for IETF-specified transports.

   The first subsection provides guidance on how to implement the
   DPLPMTUD method as a part of an application using UDP or UDP-Lite.
   The guidance also applies to other datagram services that do not
   include a specific transport protocol (such as a tunnel
   encapsulation). The following subsections describe how DPLPMTUD can
   be implemented as a part of the transport service, allowing
   applications using the service to benefit from discovery of the
   PLPMTU without themselves needing to implement this method.

6.1.  Application support for DPLPMTUD with UDP or UDP-Lite

   The current specifications of UDP [RFC0768] and UDP-Lite [RFC3828] do
   not define a method in the RFC-series that supports PLPMTUD. In
   particular, the UDP transport does not provide the transport layer
   features needed to implement datagram PLPMTUD.

   The DPLPMTUD method can be implemented as a part of an application
   built directly or indirectly on UDP or UDP-Lite, but relies on
   higher-layer protocol features to implement the method [RFC8085].

   Some primitives used by DPLPMTUD might not be available via the
   Datagram API (e.g., the ability to access the PLPMTU cache, or
   interpret received ICMP PTB messages).

   In addition, it is desirable that PMTU discovery is not performed by
   multiple protocol layers. An application SHOULD avoid implementing
   DPLPMTUD when the underlying transport system provides this
   capability. Using a common method for managing the PLPMTU has
   benefits, both in the ability to share state between different
   processes and opportunities to coordinate probing.

6.1.1.
Application Request 1349 An application needs an application-layer protocol mechanism (such as 1350 a message acknowledgement method) that solicits a response from a 1351 destination endpoint. The method SHOULD allow the sender to check 1352 the value returned in the response to provide additional protection 1353 from off-path insertion of data [RFC8085]. Suitable methods include a 1354 parameter known only to the two endpoints, such as a session ID or 1355 initialised sequence number. 1357 6.1.2. Application Response 1359 An application needs an application-layer protocol mechanism to 1360 communicate the response from the destination endpoint. This 1361 response may indicate successful reception of the probe across the 1362 path, but could also indicate that some (or all) packets have failed 1363 to reach the destination. 1365 6.1.3. Sending Application Probe Packets 1367 A probe packet may carry an application data block, but the 1368 successful transmission of this data is at risk when used for 1369 probing. Some applications may prefer to use a probe packet that 1370 does not carry an application data block to avoid disruption to 1371 normal data transfer. 1373 6.1.4. Validating the Path 1375 An application that does not have other higher-layer information 1376 confirming correct delivery of datagrams SHOULD implement the 1377 CONFIRMATION_TIMER to periodically send probe packets while in the 1378 SEARCH_COMPLETE state. 1380 6.1.5. Handling of PTB Messages 1382 An application that is able and wishes to receive PTB messages MUST 1383 perform ICMP validation as specified in Section 5.2 of [RFC8085]. 1384 This requires the application to check each received PTB 1385 message to validate that it is received in response to transmitted 1386 traffic and that the reported PTB_SIZE is less than the current 1387 probed size. A validated PTB message MAY be used as input to the 1388 DPLPMTUD algorithm, but MUST NOT be used directly to set the PLPMTU. 1390 6.2.
DPLPMTUD with UDP Options 1392 UDP Options [I-D.ietf-tsvwg-udp-options] can supply the additional 1393 functionality required to implement DPLPMTUD within the UDP transport 1394 service. Implementing DPLPMTUD using UDP Options avoids the need for 1395 each application to implement the DPLPMTUD method. 1397 Section 5.6 of [I-D.ietf-tsvwg-udp-options] defines the Maximum 1398 Segment Size (MSS) option, which allows the local sender to indicate 1399 the EMTU_R to the peer. The value received in this option can be 1400 used to initialise PMTU_MAX. 1402 UDP Options enables padding to be added to UDP datagrams that are 1403 used as Probe Packets. Feedback confirming reception of each Probe 1404 Packet is provided by two new UDP Options: 1406 o The Probe Request Option (Section 6.2.1) is set by a sending PL to 1407 solicit a response from a remote endpoint. A four-byte token 1408 identifies each request. 1410 o The Probe Response Option (Section 6.2.2) is generated by the UDP 1411 Options receiver in response to reception of a previously received 1412 Probe Request Option. Each Probe Response Option echoes a 1413 previously received four-byte token. 1415 The token value allows implementations to distinguish between 1416 acknowledgements for initial probe packets and acknowledgements 1417 confirming receipt of subsequent probe packets (e.g., travelling 1418 along alternate paths with a larger RTT). Each probe packet needs to 1419 be uniquely identifiable by the UDP Options sender within the Maximum 1420 Segment Lifetime (MSL). The UDP Options sender therefore needs to 1421 avoid recycling token values until they have expired or have been 1422 acknowledged. A four-byte value for the token field provides sufficient 1423 space for multiple unique probes to be made within the MSL. 1425 The initial value of the four-byte token field SHOULD be assigned 1426 a randomised value (as described in Section 5.1 of [RFC8085]) to 1427 enhance protection from off-path attacks.
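The token rules above (a randomised initial value, uniqueness within the MSL, and no reuse until a token has expired or been acknowledged) could be sketched as follows. This is only an illustration under stated assumptions: the class name ProbeTokenTracker, its method names, and the MSL value of 120 seconds are hypothetical and are not defined by this document or by the UDP Options specification.

```python
import secrets
import time

# Assumed Maximum Segment Lifetime in seconds; the text does not fix a value.
MSL_SECONDS = 120.0


class ProbeTokenTracker:
    """Hypothetical helper that tracks outstanding four-byte probe tokens."""

    def __init__(self, now=time.monotonic):
        self._now = now
        # Randomised initial value, to make off-path guessing harder.
        self._next = secrets.randbits(32)
        self._outstanding = {}  # token -> (probe_size, send_time)

    def _expire(self):
        # Forget tokens that have been outstanding for at least one MSL.
        deadline = self._now() - MSL_SECONDS
        expired = [t for t, (_, sent) in self._outstanding.items()
                   if sent <= deadline]
        for token in expired:
            del self._outstanding[token]

    def new_token(self, probe_size):
        """Allocate a token for a probe of probe_size bytes."""
        self._expire()
        # Skip values still in flight so a token is never recycled early.
        while self._next in self._outstanding:
            self._next = (self._next + 1) & 0xFFFFFFFF
        token = self._next
        self._next = (self._next + 1) & 0xFFFFFFFF
        self._outstanding[token] = (probe_size, self._now())
        return token

    def on_response(self, token):
        """Return the acknowledged probe size, or None for an unknown token."""
        self._expire()
        entry = self._outstanding.pop(token, None)
        return entry[0] if entry else None
```

Because each token maps to one probe size, a late response that arrives over a slower alternate path is still attributed to the probe that solicited it, and a response carrying an unknown or expired token is ignored.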
1429 Implementations ought to only send a probe packet with a Probe 1430 Request Option when required by their local state machine, i.e., when 1431 probing to grow the PLPMTU or to confirm the current PLPMTU. The 1432 procedure to handle the loss of a response packet is the 1433 responsibility of the sender of the request. Implementations are 1434 allowed to track multiple requests and respond to them with a single 1435 packet. 1437 A PL needs to determine that the path can still support the size of 1438 datagram that the application is currently sending in the DPLPMTUD 1439 SEARCH_COMPLETE state (i.e., to detect black-holing of data). One way to 1440 achieve this is to send probe packets of size PLPMTU or to utilise a 1441 higher-layer method that provides explicit feedback indicating any 1442 packet loss. Another possibility is to utilise data packets that 1443 carry a Timestamp Option. Reception of a valid timestamp that was 1444 echoed by the remote endpoint can be used to infer connectivity. 1445 This can provide useful feedback even over paths with asymmetric 1446 capacity and/or that carry UDP Option flows that have very asymmetric 1447 datagram rates, because an echo of the most recent timestamp still 1448 indicates reception of at least one packet of the transmitted size. 1449 This is sufficient to confirm there is no black hole. 1451 In contrast, when sending a probe to increase the PLPMTU, a timestamp 1452 might be unable to unambiguously identify that a specific probe 1453 packet has been received. Timestamp mechanisms cannot be used to 1454 confirm the reception of individual probe messages and cannot be used 1455 to stimulate a response from the remote peer. 1457 6.2.1. UDP Probe Request Option 1459 The Probe Request Option allows a sending endpoint to solicit a 1460 response from a destination endpoint. 1462 The Probe Request Option carries a four-byte token set by the sender.
1463 This token can be set to a value that is likely to be known only to 1464 the sender (and is sent along the end-to-end path). The initial 1465 value of the token SHOULD be assigned a randomised value (as 1466 described in Section 5.1 of [RFC8085]) to enhance protection from 1467 off-path attacks. 1469 The sender needs to then check the value returned in the UDP Probe 1470 Response Option. The value of the Token field uniquely identifies a 1471 probe within the maximum segment lifetime. 1473 +----------+--------+-----------------+ 1474 | Kind=9* | Len=6 | Token | 1475 +----------+--------+-----------------+ 1476 1 byte 1 byte 4 bytes 1478 * To be confirmed by IANA. 1480 Figure 5: UDP Probe REQ Option Format 1482 6.2.2. UDP Probe Response Option 1484 The Probe Response Option is generated in response to reception of a 1485 previously received Probe Request Option. This response is generated 1486 by the UDP Option processing. 1488 The Probe Response Option carries a four-byte token field. The Token 1489 field associates the response with the Token value carried in the 1490 most recently received Probe Request. The rate of generation of UDP 1491 packets carrying a Probe Response Option is expected to be less than 1492 once per RTT and SHOULD be rate-limited (see Section 9). 1494 +----------+--------+-----------------+ 1495 | Kind=10* | Len=6 | Token | 1496 +----------+--------+-----------------+ 1497 1 byte 1 byte 4 bytes 1499 * To be confirmed by IANA. 1501 Figure 6: UDP Probe RES Option Format 1503 6.3. DPLPMTUD for SCTP 1505 Section 10.2 of [RFC4821] specifies a recommended PLPMTUD probing 1506 method for SCTP. It recommends the use of the PAD chunk, defined in 1507 [RFC4820], to be attached to a minimum length HEARTBEAT chunk to build 1508 a probe packet. This enables probing without affecting the transfer 1509 of user messages and without interfering with congestion control.
1510 This is preferred to using DATA chunks (with padding as required) as 1511 path probes. 1513 XXX Author Note: Future versions of this document might define a 1514 parameter contained in the INIT and INIT ACK chunk to indicate the 1515 remote peer MTU to the local peer. However, multihoming makes this a 1516 bit complex, so it might not be worth doing. XXX 1518 6.3.1. SCTP/IPv4 and SCTP/IPv6 1520 The base protocol is specified in [RFC4960]. This provides an 1521 acknowledged PL. A sender can therefore enter the PROBE_BASE state 1522 as soon as connectivity has been confirmed. 1524 6.3.1.1. Sending SCTP Probe Packets 1526 Probe packets consist of an SCTP common header followed by a 1527 HEARTBEAT chunk and a PAD chunk. The PAD chunk is used to control 1528 the length of the probe packet. The HEARTBEAT chunk is used to 1529 trigger the sending of a HEARTBEAT ACK chunk. The reception of the 1530 HEARTBEAT ACK chunk acknowledges reception of a successful probe. 1532 The HEARTBEAT chunk carries a Heartbeat Information parameter which 1533 should include, besides the information suggested in [RFC4960], the 1534 probe size, which is the size of the complete datagram. The size of 1535 the PAD chunk is therefore computed by reducing the probing size by 1536 the IPv4 or IPv6 header size, the SCTP common header, the HEARTBEAT 1537 request and the PAD chunk header. The payload of the PAD chunk 1538 contains arbitrary data. 1540 To avoid fragmentation of retransmitted data, probing starts right 1541 after the handshake, before data is sent. Assuming normal behaviour 1542 (i.e., the PMTU is smaller than or equal to the interface MTU), this 1543 process will take a few round trip time periods depending on the 1544 number of PMTU sizes probed. The Heartbeat timer can be used to 1545 implement the PROBE_TIMER. 1547 6.3.1.2. 
Validating the Path with SCTP 1549 Since SCTP provides an acknowledged PL, a sender MUST NOT implement 1550 the CONFIRMATION_TIMER while in the SEARCH_COMPLETE state. 1552 6.3.1.3. PTB Message Handling by SCTP 1554 Normal ICMP validation MUST be performed as specified in Appendix C 1555 of [RFC4960]. This requires that the first 8 bytes of the SCTP 1556 common header are quoted in the payload of the PTB message, which can 1557 be the case for ICMPv4 and is normally the case for ICMPv6. 1559 When a PTB message has been validated, the PTB_SIZE reported in the 1560 PTB message SHOULD be used with the DPLPMTUD algorithm, providing 1561 that the reported PTB_SIZE is less than the current probe size. 1563 6.3.2. DPLPMTUD for SCTP/UDP 1565 The UDP encapsulation of SCTP is specified in [RFC6951]. 1567 6.3.2.1. Sending SCTP/UDP Probe Packets 1569 Packet probing can be performed as specified in Section 6.3.1.1. The 1570 maximum payload is reduced by 8 bytes, which has to be considered 1571 when filling the PAD chunk. 1573 6.3.2.2. Validating the Path with SCTP/UDP 1575 Since SCTP provides an acknowledged PL, a sender MUST NOT implement 1576 the CONFIRMATION_TIMER while in the SEARCH_COMPLETE state. 1578 6.3.2.3. Handling of PTB Messages by SCTP/UDP 1580 Normal ICMP validation MUST be performed for PTB messages as 1581 specified in Appendix C of [RFC4960]. This requires that the first 8 1582 bytes of the SCTP common header are contained in the PTB message, 1583 which can be the case for ICMPv4 (but note the UDP header also 1584 consumes a part of the quoted packet header) and is normally the case 1585 for ICMPv6. When the validation is completed, the PTB_SIZE indicated 1586 in the PTB message SHOULD be used with the DPLPMTUD providing that 1587 the reported PTB_SIZE is less than the current probe size. 1589 6.3.3. DPLPMTUD for SCTP/DTLS 1591 The Datagram Transport Layer Security (DTLS) encapsulation of SCTP is 1592 specified in [RFC8261]. 
It is used for data channels in WebRTC 1593 implementations. 1595 6.3.3.1. Sending SCTP/DTLS Probe Packets 1597 Packet probing can be done as specified in Section 6.3.1.1. 1599 6.3.3.2. Validating the Path with SCTP/DTLS 1601 Since SCTP provides an acknowledged PL, a sender MUST NOT implement 1602 the CONFIRMATION_TIMER while in the SEARCH_COMPLETE state. 1604 6.3.3.3. Handling of PTB Messages by SCTP/DTLS 1606 It is not possible to perform normal ICMP validation as specified in 1607 [RFC4960], since even if the ICMP message payload contains sufficient 1608 information, the reflected SCTP common header would be encrypted. 1609 Therefore it is not possible to process PTB messages at the PL. 1611 6.4. DPLPMTUD for QUIC 1613 Quick UDP Internet Connection (QUIC) [I-D.ietf-quic-transport] is a 1614 UDP-based transport that provides reception feedback. The UDP 1615 payload includes the QUIC packet header, protected payload, and any 1616 authentication fields. QUIC depends on a PMTU of at least 1280 1617 bytes. 1619 Section 9.2 of [I-D.ietf-quic-transport] describes the path 1620 considerations when sending QUIC packets. It recommends the use of 1621 PADDING frames to build the probe packet. Pure probe-only packets 1622 are constructed with PADDING frames and PING frames to create a 1623 padding-only packet that will elicit an acknowledgement. Padding-only 1624 frames enable probing without affecting the transfer of 1625 other QUIC frames. 1627 The recommendation for QUIC endpoints implementing DPLPMTUD is 1628 therefore that an MPS is maintained for each combination of local and 1629 remote IP addresses [I-D.ietf-quic-transport]. If a QUIC endpoint 1630 determines that the PMTU between any pair of local and remote IP 1631 addresses has fallen below an acceptable MPS, it needs to immediately 1632 cease sending QUIC packets on the affected path. This could result 1633 in termination of the connection if an alternative path cannot be 1634 found [I-D.ietf-quic-transport].
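The per-path bookkeeping recommended above could be sketched as follows. This is a minimal illustration, not QUIC's own data structure: the class name PathMpsTable, its methods, and the use of 1200 bytes as the smallest acceptable size are assumptions made for the example.

```python
# Assumed smallest acceptable size in bytes for a usable path.
MIN_ACCEPTABLE = 1200


class PathMpsTable:
    """Hypothetical table holding one MPS per (local, remote) address pair."""

    def __init__(self, initial_mps=MIN_ACCEPTABLE):
        self._initial = initial_mps
        self._mps = {}  # (local_addr, remote_addr) -> current MPS in bytes

    def mps(self, local, remote):
        """Return the MPS for a path, or the initial value if unknown."""
        return self._mps.get((local, remote), self._initial)

    def update(self, local, remote, new_mps):
        """Record a new MPS; return False if sending on the path must stop."""
        self._mps[(local, remote)] = new_mps
        return new_mps >= MIN_ACCEPTABLE

    def usable_paths(self):
        """List the recorded address pairs that can still carry packets."""
        return [path for path, size in self._mps.items()
                if size >= MIN_ACCEPTABLE]
```

A caller that receives False from update() would cease sending on that address pair, and would terminate the connection only if usable_paths() offers no alternative.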
1636 6.4.1. Sending QUIC Probe Packets 1638 A probe packet consists of a QUIC Header and a payload containing 1639 PADDING Frames and a PING Frame. PADDING Frames are a single octet 1640 (0x00) and several of these can be used to create a probe packet of 1641 size PROBED_SIZE. QUIC provides an acknowledged PL. A sender can 1642 therefore enter the PROBE_BASE state as soon as connectivity has been 1643 confirmed. 1645 The current specification of QUIC sets the following: 1647 o BASE_PMTU: 1200 bytes. A QUIC sender needs to pad initial packets to 1648 1200 bytes to confirm the path can support packets of a useful 1649 size. 1651 o MIN_PMTU: 1200 bytes. A QUIC sender that determines the PMTU has 1652 fallen below 1200 bytes MUST immediately stop sending on the 1653 affected path. 1655 6.4.2. Validating the Path with QUIC 1657 QUIC provides an acknowledged PL. A sender therefore MUST NOT 1658 implement the CONFIRMATION_TIMER while in the SEARCH_COMPLETE state. 1660 6.4.3. Handling of PTB Messages by QUIC 1662 QUIC operates over the UDP transport, and the guidelines on ICMP 1663 validation as specified in Section 5.2 of [RFC8085] therefore apply. 1664 In addition to UDP Port validation, QUIC can validate an ICMP message 1665 by looking for valid Connection IDs in the quoted packet. 1667 7. Acknowledgements 1669 This work was partially funded by the European Union's Horizon 2020 1670 research and innovation programme under grant agreement No. 644334 1671 (NEAT). The views expressed are solely those of the author(s). 1673 8. IANA Considerations 1675 This memo includes no request to IANA. 1677 XXX If new UDP Options are specified in this document, a request to 1678 IANA will be included here. XXX 1680 If there are no requirements for IANA, the section will be removed 1681 during conversion into an RFC by the RFC Editor. 1683 9. Security Considerations 1685 The security considerations for the use of UDP and SCTP are provided 1686 in the referenced RFCs.
Security guidance for applications using UDP 1687 is provided in the UDP Usage Guidelines [RFC8085]. Specifically, the 1688 generation of probe packets is regarded as a "Low Data-Volume 1689 Application", described in Section 3.1.3 of that document. This 1690 recommends that a sender limit generation of probe packets to an 1691 average rate lower than one probe per 3 seconds. 1693 A PL sender needs to ensure that the method used to confirm reception 1694 of probe packets offers protection from off-path attackers injecting 1695 packets into the path. This protection is provided in IETF-defined 1696 protocols (e.g., TCP, SCTP) using a randomly-initialised sequence 1697 number. A description of one way to do this when using UDP is 1698 provided in Section 5.1 of [RFC8085]. 1700 There are cases where PTB messages are not delivered due to policy, 1701 configuration or equipment design (see Section 1.1). This method 1702 therefore does not rely upon PTB messages being received, but is able 1703 to utilise these when they are received by the sender. PTB messages 1704 could potentially be used to cause a node to inappropriately reduce 1705 the PLPMTU. A node supporting DPLPMTUD MUST therefore appropriately 1706 validate the payload of PTB messages to ensure these are received in 1707 response to transmitted traffic (i.e., a reported error condition 1708 that corresponds to a datagram actually sent by the path layer). 1710 An on-path attacker able to create a PTB message could forge PTB 1711 messages that include a valid quoted IP packet. Such an attack could 1712 be used to drive down the PLPMTU. There are two ways this method can 1713 mitigate such attacks: First, by ensuring that a PL 1714 sender never reduces the PLPMTU below the base size solely in 1715 response to receiving a PTB message. This is achieved by first 1716 entering the PROBE_BASE state when such a message is received.
1717 Second, the design does not require processing of PTB messages; a PL 1718 sender could therefore suspend processing of PTB messages (e.g., in a 1719 robustness mode after detecting that subsequent probes actually 1720 confirm that a size larger than the PTB_SIZE is supported by a path). 1722 Parallel forwarding paths SHOULD be considered. Section 5.2.5.1 1723 identifies the need for robustness in the method when the path 1724 information may be inconsistent. 1726 A node performing DPLPMTUD could experience conflicting information 1727 about the size of supported probe packets. This could occur when 1728 multiple paths are concurrently in use and these exhibit 1729 different PMTUs. If not considered, this could result in data being 1730 black holed when the PLPMTU is larger than the smallest PMTU across 1731 the current paths. 1733 10. References 1735 10.1. Normative References 1737 [I-D.ietf-quic-transport] 1738 Iyengar, J. and M. Thomson, "QUIC: A UDP-Based Multiplexed 1739 and Secure Transport", draft-ietf-quic-transport-16 (work 1740 in progress), October 2018. 1742 [I-D.ietf-tsvwg-udp-options] 1743 Touch, J., "Transport Options for UDP", draft-ietf-tsvwg- 1744 udp-options-05 (work in progress), July 2018. 1746 [RFC0768] Postel, J., "User Datagram Protocol", STD 6, RFC 768, 1747 DOI 10.17487/RFC0768, August 1980, 1748 . 1750 [RFC1191] Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191, 1751 DOI 10.17487/RFC1191, November 1990, 1752 . 1754 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1755 Requirement Levels", BCP 14, RFC 2119, 1756 DOI 10.17487/RFC2119, March 1997, 1757 . 1759 [RFC2460] Deering, S. and R. Hinden, "Internet Protocol, Version 6 1760 (IPv6) Specification", RFC 2460, DOI 10.17487/RFC2460, 1761 December 1998, . 1763 [RFC3828] Larzon, L-A., Degermark, M., Pink, S., Jonsson, L-E., Ed., 1764 and G.
Fairhurst, Ed., "The Lightweight User Datagram 1765 Protocol (UDP-Lite)", RFC 3828, DOI 10.17487/RFC3828, July 1766 2004, . 1768 [RFC4820] Tuexen, M., Stewart, R., and P. Lei, "Padding Chunk and 1769 Parameter for the Stream Control Transmission Protocol 1770 (SCTP)", RFC 4820, DOI 10.17487/RFC4820, March 2007, 1771 . 1773 [RFC4960] Stewart, R., Ed., "Stream Control Transmission Protocol", 1774 RFC 4960, DOI 10.17487/RFC4960, September 2007, 1775 . 1777 [RFC6951] Tuexen, M. and R. Stewart, "UDP Encapsulation of Stream 1778 Control Transmission Protocol (SCTP) Packets for End-Host 1779 to End-Host Communication", RFC 6951, 1780 DOI 10.17487/RFC6951, May 2013, 1781 . 1783 [RFC8085] Eggert, L., Fairhurst, G., and G. Shepherd, "UDP Usage 1784 Guidelines", BCP 145, RFC 8085, DOI 10.17487/RFC8085, 1785 March 2017, . 1787 [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 1788 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, 1789 May 2017, . 1791 [RFC8201] McCann, J., Deering, S., Mogul, J., and R. Hinden, Ed., 1792 "Path MTU Discovery for IP version 6", STD 87, RFC 8201, 1793 DOI 10.17487/RFC8201, July 2017, 1794 . 1796 [RFC8261] Tuexen, M., Stewart, R., Jesup, R., and S. Loreto, 1797 "Datagram Transport Layer Security (DTLS) Encapsulation of 1798 SCTP Packets", RFC 8261, DOI 10.17487/RFC8261, November 1799 2017, . 1801 10.2. Informative References 1803 [I-D.ietf-intarea-tunnels] 1804 Touch, J. and M. Townsley, "IP Tunnels in the Internet 1805 Architecture", draft-ietf-intarea-tunnels-09 (work in 1806 progress), July 2018. 1808 [RFC0792] Postel, J., "Internet Control Message Protocol", STD 5, 1809 RFC 792, DOI 10.17487/RFC0792, September 1981, 1810 . 1812 [RFC1122] Braden, R., Ed., "Requirements for Internet Hosts - 1813 Communication Layers", STD 3, RFC 1122, 1814 DOI 10.17487/RFC1122, October 1989, 1815 . 1817 [RFC1812] Baker, F., Ed., "Requirements for IP Version 4 Routers", 1818 RFC 1812, DOI 10.17487/RFC1812, June 1995, 1819 . 
1821 [RFC2923] Lahey, K., "TCP Problems with Path MTU Discovery", 1822 RFC 2923, DOI 10.17487/RFC2923, September 2000, 1823 . 1825 [RFC4340] Kohler, E., Handley, M., and S. Floyd, "Datagram 1826 Congestion Control Protocol (DCCP)", RFC 4340, 1827 DOI 10.17487/RFC4340, March 2006, 1828 . 1830 [RFC4443] Conta, A., Deering, S., and M. Gupta, Ed., "Internet 1831 Control Message Protocol (ICMPv6) for the Internet 1832 Protocol Version 6 (IPv6) Specification", STD 89, 1833 RFC 4443, DOI 10.17487/RFC4443, March 2006, 1834 . 1836 [RFC4821] Mathis, M. and J. Heffner, "Packetization Layer Path MTU 1837 Discovery", RFC 4821, DOI 10.17487/RFC4821, March 2007, 1838 . 1840 [RFC4890] Davies, E. and J. Mohacsi, "Recommendations for Filtering 1841 ICMPv6 Messages in Firewalls", RFC 4890, 1842 DOI 10.17487/RFC4890, May 2007, 1843 . 1845 Appendix A. Event-driven state changes 1847 This appendix contains an informative description of key events: 1849 Path Setup: When a new path is initiated, the state is set to 1850 PROBE_START. This sends a probe packet with the size of the 1851 BASE_PMTU. As soon as the path is confirmed, the state changes to 1852 PROBE_SEARCH. 1854 Arrival of an Acknowledgment: Depending on the probing state, the 1855 reaction differs according to Figure 7, which is a simplification 1856 of Figure 4 focusing on this event. 
1858 +--------------+ +----------------+ 1859 | PROBE_START | --3------------------------------> | PROBE_DISABLED | 1860 +--------------+ --4---------------- ------------> +----------------+ 1861 \/ 1862 +--------------+ /\ +--------------+ 1863 | PROBE_ERROR | -------------------- \ ----------> | PROBE_BASE | 1864 +--------------+ --4--------------/ \ +--------------+ 1865 \ 1866 +--------------+ --1 -------- \ +--------------+ 1867 | PROBE_BASE | \ --- \ ------> | PROBE_ERROR | 1868 +--------------+ --3--------- \ -----/ \ +--------------+ 1869 \ \ 1870 +--------------+ \ -----> +--------------+ 1871 | PROBE_SEARCH | --2--- -----------------> | PROBE_SEARCH | 1872 +--------------+ \ ------------------> +--------------+ 1873 \ ---- / 1874 +---------------+ / \ +---------------+ 1875 |SEARCH_COMPLETE| -1--- \ |SEARCH_COMPLETE| 1876 +---------------+ -5-- -----------------------> +---------------+ 1877 \ 1878 \ +--------------+ 1879 --------------------------> | PROBE_BASE | 1880 +--------------+ 1882 Condition 1: The maximum PMTU size has not yet been reached. 1883 Condition 2: The maximum PMTU size has been reached. Condition 3: 1884 Probe Timer expires and PROBE_COUNT = MAX_PROBEs. Condition 4: 1885 PROBE_ACK received. Condition 5: Black hole detected. 1887 Figure 7: State changes at the arrival of an acknowledgment 1889 Probing timeout: The PROBE_COUNT is initialised to zero each time 1890 the value of PROBED_SIZE is changed and when a acknowledgment 1891 confirming delivery of a probe packet. The PROBE_TIMER is started 1892 each time a probe packet is sent. It is stopped when an 1893 acknowledgment arrives that confirms delivery of a probe packet of 1894 PROBED_SIZE. If the probe packet is not acknowledged before the 1895 PROBE_TIMER expires, the PROBE_COUNT is incremented. When the 1896 PROBE_COUNT equals the value MAX_PROBES, the state is changed, 1897 otherwise a new probe packet of the same size (PROBED_SIZE) is 1898 resent. 
The state transitions are illustrated in Figure 8. This 1899 shows a simplification of Figure 4 with a focus only on this 1900 event. 1902 +--------------+ +----------------+ 1903 | PROBE_START | --2------------------------------->| PROBE_DISABLED | 1904 +--------------+ +----------------+ 1906 +--------------+ +--------------+ 1907 | PROBE_ERROR | -----------------> | PROBE_ERROR | 1908 +--------------+ / +--------------+ 1909 / 1910 +--------------+ --2----------/ +--------------+ 1911 | PROBE_BASE | --1------------------------------> | PROBE_BASE | 1912 +--------------+ +--------------+ 1914 +--------------+ +--------------+ 1915 | PROBE_SEARCH | --1------------------------------> | PROBE_SEARCH | 1916 +--------------+ --2--------- +--------------+ 1917 \ 1918 +---------------+ \ +---------------+ 1919 |SEARCH_COMPLETE| -------------------> |SEARCH_COMPLETE| 1920 +---------------+ +---------------+ 1922 Condition 1: The maximum number of probe packets has not been 1923 reached. Condition 2: The maximum number of probe packets has been 1924 reached. XXX This diagram has not been validated. 1926 Figure 8: State changes at the expiration of the probe timer 1928 PMTU raise timer timeout: DPLPMTUD periodically sends a probe packet 1929 to detect whether a larger PMTU is possible. This probe packet is 1930 generated by the PMTU_RAISE_TIMER. 1932 Arrival of a PTB message: The active probing of the path can be 1933 supported by the arrival of a PTB message indicating the PTB_SIZE. 1934 Two examples are: 1936 1. The PTB_SIZE is between the PLPMTU and the probe that 1937 triggered the PTB message. 1939 2. The PTB_SIZE is smaller than the PLPMTU. 1941 In first case, the PROBE_BASE state transitions to the PROBE_ERROR 1942 state. In the PROBE_SEARCH state, a new probe packet is sent with 1943 the size reported by the PTB message. 1945 In second case, the probing starts again with a value of 1946 PROBE_BASE. 1948 Appendix B. 
Revision Notes 1950 Note to RFC-Editor: please remove this entire section prior to 1951 publication. 1953 Individual draft -00: 1955 o Comments and corrections are welcome directly to the authors or 1956 via the IETF TSVWG working group mailing list. 1958 o This update is proposed for WG comments. 1960 Individual draft -01: 1962 o Contains the first representation of the algorithm, showing the 1963 states and timers 1965 o This update is proposed for WG comments. 1967 Individual draft -02: 1969 o Contains updated representation of the algorithm, and textual 1970 corrections. 1972 o The text describing when to set the effective PMTU has not yet 1973 been validated by the authors 1975 o To determine security to off-path-attacks: We need to decide 1976 whether a received PTB message SHOULD/MUST be validated? The text 1977 on how to handle a PTB message indicating a link MTU larger than 1978 the probe has not yet been validated by the authors 1980 o No text currently describes how to handle inconsistent results 1981 from arbitrary re-routing along different parallel paths 1983 o This update is proposed for WG comments. 1985 Working Group draft -00: 1987 o This draft follows a successful adoption call for TSVWG 1989 o There is still work to complete, please comment on this draft. 1991 Working Group draft -01: 1993 o This draft includes improved introduction. 1995 o The draft is updated to require ICMP validation prior to accepting 1996 PTB messages - this to be confirmed by WG 1998 o Section added to discuss Selection of Probe Size - methods to be 1999 evaluated and recommendations to be considered 2001 o Section added to align with work proposed in the QUIC WG. 2003 Working Group draft -02: 2005 o The draft was updated based on feedback from the WG, and a 2006 detailed review by Magnus Westerlund. 2008 o The document updates RFC 4821. 2010 o Requirements list updated. 2012 o Added more explicit discussion of a simpler black-hole detection 2013 mode.
2015 o This draft includes reorganisation of the section on IETF 2016 protocols. 2018 o Added more discussion of implementation within an application. 2020 o Added text on flapping paths. 2022 o Replaced 'effective MTU' with new term PLPMTU. 2024 Working Group draft -03: 2026 o Updated figures 2028 o Added more discussion on blackhole detection 2030 o Added figure describing just blackhole detection 2032 o Added figure relating MPS sizes 2034 Working Group draft -04: 2036 o Described phases and named these consistently. 2038 o Corrected transition from confirmation directly to the search 2039 phase (Base has been checked). 2041 o Redrawn state diagrams. 2043 o Renamed BASE_MTU to BASE_PMTU (because it is a base for the PMTU). 2045 o Clarified Error state. 2047 o Clarified suspending DPLPMTUD. 2049 o Verified normative text in requirements section. 2051 o Removed duplicate text. 2053 o Changed all text to refer to /packet probe/probe packet/ 2054 /validation/verification/ added term /Probe Confirmation/ and 2055 clarified BlackHole detection. 2057 Working Group draft -05: 2059 o Updated security considerations. 2061 o Feedback after speaking with Joe Touch helped improve UDP-Options 2062 description. 2064 Working Group draft -06: 2066 o Updated description of ICMP issues in section 1.1 2068 o Update to description of QUIC.
2070 Authors' Addresses 2072 Godred Fairhurst 2073 University of Aberdeen 2074 School of Engineering 2075 Fraser Noble Building 2076 Aberdeen AB24 3UE 2077 UK 2079 Email: gorry@erg.abdn.ac.uk 2081 Tom Jones 2082 University of Aberdeen 2083 School of Engineering 2084 Fraser Noble Building 2085 Aberdeen AB24 3UE 2086 UK 2088 Email: tom@erg.abdn.ac.uk 2089 Michael Tuexen 2090 Muenster University of Applied Sciences 2091 Stegerwaldstrasse 39 2092 Steinfurt 48565 2093 DE 2095 Email: tuexen@fh-muenster.de 2097 Irene Ruengeler 2098 Muenster University of Applied Sciences 2099 Stegerwaldstrasse 39 2100 Steinfurt 48565 2101 DE 2103 Email: i.ruengeler@fh-muenster.de