idnits 2.17.1

draft-ietf-tsvwg-datagram-plpmtud-07.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** There is 1 instance of too long lines in the document, the longest one
     being 1 character in excess of 72.

  -- The abstract seems to indicate that this document updates RFC8201, but
     the header doesn't have an 'Updates:' line to match this.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

     (Using the creation date from RFC4821, updated by this document, for
     RFC5378 checks: 2003-10-21)

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008. If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment. If not, you may need to add the pre-RFC5378 disclaimer.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (February 18, 2019) is 1895 days in the past. Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Outdated reference: A later version (-34) exists of
     draft-ietf-quic-transport-16

  == Outdated reference: A later version (-32) exists of
     draft-ietf-tsvwg-udp-options-05

  ** Obsolete normative reference: RFC 2460 (Obsoleted by RFC 8200)

  ** Obsolete normative reference: RFC 4960 (Obsoleted by RFC 9260)

  == Outdated reference: A later version (-13) exists of
     draft-ietf-intarea-tunnels-09

     Summary: 3 errors (**), 0 flaws (~~), 4 warnings (==), 3 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

Internet Engineering Task Force                             G. Fairhurst
Internet-Draft                                                  T. Jones
Updates: 4821 (if approved)                       University of Aberdeen
Intended status: Standards Track                               M. Tuexen
Expires: August 22, 2019                                    I. Ruengeler
                                                              T. Voelker
                                 Muenster University of Applied Sciences
                                                       February 18, 2019

     Packetization Layer Path MTU Discovery for Datagram Transports
                  draft-ietf-tsvwg-datagram-plpmtud-07

Abstract

   This document describes a robust method for Path MTU Discovery
   (PMTUD) for datagram Packetization Layers (PLs). The document
   describes an extension to RFC 1191 and RFC 8201, which specify
   ICMP-based Path MTU Discovery for IPv4 and IPv6. The method allows a
   PL, or a datagram application that uses a PL, to discover whether a
   network path can support the current size of datagram. This can be
   used to detect and reduce the message size when a sender encounters a
   network black hole (where packets are discarded, and no ICMP message
   is received). The method can also probe a network path with
   progressively larger packets to find whether the maximum packet size
   can be increased.
   This allows a sender to determine an appropriate
   packet size, providing functionality for datagram transports that is
   equivalent to the Packetization Layer PMTUD specification for TCP,
   specified in RFC 4821.

   The document also provides implementation notes for incorporating
   Datagram PMTUD into IETF datagram transports or applications that use
   datagram transports.

   When published, this specification updates RFC 4821.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on August 22, 2019.

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Classical Path MTU Discovery
     1.2.  Packetization Layer Path MTU Discovery
     1.3.  Path MTU Discovery for Datagram Services
   2.  Terminology
   3.  Features Required to Provide Datagram PLPMTUD
   4.  DPLPMTUD Mechanisms
     4.1.  PLPMTU Probe Packets
     4.2.  Confirmation of Probed Packet Size
     4.3.  Detection of Black Holes
     4.4.  Response to PTB Messages
       4.4.1.  Validation of PTB Messages
       4.4.2.  Use of PTB Messages
   5.  Datagram Packetization Layer PMTUD
     5.1.  DPLPMTUD Components
       5.1.1.  Timers
       5.1.2.  Constants
       5.1.3.  Variables
     5.2.  DPLPMTUD Phases
       5.2.1.  BASE_PMTU Confirmation Phase
       5.2.2.  Search Phase
         5.2.2.1.  Resilience to Inconsistent Path Information
       5.2.3.  Search Complete Phase
       5.2.4.  PROBE_BASE Phase
       5.2.5.  ERROR Phase
         5.2.5.1.  Robustness to Inconsistent Path
       5.2.6.  DISABLED Phase
     5.3.  State Machine
     5.4.  Search to Increase the PLPMTU
       5.4.1.  Probing for a Larger PLPMTU
       5.4.2.  Selection of Probe Sizes
       5.4.3.  Resilience to Inconsistent Path Information
   6.  Specification of Protocol-Specific Methods
     6.1.  Application support for DPLPMTUD with UDP or UDP-Lite
       6.1.1.  Application Request
       6.1.2.  Application Response
       6.1.3.  Sending Application Probe Packets
       6.1.4.  Validating the Path
       6.1.5.  Handling of PTB Messages
     6.2.  DPLPMTUD with UDP Options
       6.2.1.  UDP Probe Request Option
       6.2.2.  UDP Probe Response Option
     6.3.  DPLPMTUD for SCTP
       6.3.1.  SCTP/IPv4 and SCTP/IPv6
         6.3.1.1.  Sending SCTP Probe Packets
         6.3.1.2.  Validating the Path with SCTP
         6.3.1.3.  PTB Message Handling by SCTP
       6.3.2.  DPLPMTUD for SCTP/UDP
         6.3.2.1.  Sending SCTP/UDP Probe Packets
         6.3.2.2.  Validating the Path with SCTP/UDP
         6.3.2.3.  Handling of PTB Messages by SCTP/UDP
       6.3.3.  DPLPMTUD for SCTP/DTLS
         6.3.3.1.  Sending SCTP/DTLS Probe Packets
         6.3.3.2.  Validating the Path with SCTP/DTLS
         6.3.3.3.  Handling of PTB Messages by SCTP/DTLS
     6.4.  DPLPMTUD for QUIC
       6.4.1.  Sending QUIC Probe Packets
       6.4.2.  Validating the Path with QUIC
       6.4.3.  Handling of PTB Messages by QUIC
   7.  Acknowledgements
   8.  IANA Considerations
   9.  Security Considerations
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Appendix A.  Event-driven state changes
   Appendix B.  Revision Notes
   Authors' Addresses

1.  Introduction

   The IETF has specified datagram transport using UDP, SCTP, and DCCP,
   as well as protocols layered on top of these transports (e.g.,
   SCTP/UDP, DCCP/UDP, QUIC/UDP), and direct datagram transport over the
   IP network layer. This document describes a robust method for Path
   MTU Discovery (PMTUD) that may be used with these transport protocols
   (or the applications that use their transport service) to discover an
   appropriate size of packet to use across an Internet path.

1.1.  Classical Path MTU Discovery

   Classical Path Maximum Transmission Unit Discovery (PMTUD) can be
   used with any transport that is able to process ICMP Packet Too Big
   (PTB) messages (e.g., [RFC1191] and [RFC8201]). The term PTB message
   is applied to both IPv4 ICMP Unreachable messages (Type 3) that carry
   the error Fragmentation Needed (Type 3, Code 4) [RFC0792] and ICMPv6
   Packet Too Big messages (Type 2) [RFC4443]. When a sender receives a
   PTB message, it reduces the effective MTU to the value reported as
   the Link MTU in the PTB message. From time to time, a method then
   increases the packet size in an attempt to discover an increase in
   the supported PMTU. The packets sent with a size larger than the
   current effective PMTU are known as probe packets.

   Packets not intended as probe packets are either fragmented to the
   current effective PMTU, or the attempt to send fails with an error
   code.
   Applications are sometimes provided with a primitive to let
   them read the Maximum Packet Size (MPS), derived from the current
   effective PMTU.

   Classical PMTUD is subject to protocol failures. One failure arises
   when traffic using a packet size larger than the actual PMTU is
   black-holed (all datagrams sent with this size, or larger, are
   silently discarded without the sender receiving PTB messages). This
   could arise when the PTB messages are not delivered back to the
   sender for some reason (see for example [RFC2923]).

   Examples where PTB messages are not delivered include:

   o  The generation of ICMP messages is usually rate limited. This may
      result in no PTB messages being sent to the sender (see Section
      2.4 of [RFC4443]).

   o  ICMP messages are increasingly filtered by middleboxes (including
      firewalls) [RFC4890]. A stateful firewall could be configured
      with a policy to block incoming ICMP messages, which would prevent
      reception of PTB messages by endpoints behind this firewall.

   o  When the router issuing the ICMP message drops a tunneled packet,
      the resulting ICMP message will be directed to the tunnel ingress.
      This tunnel endpoint is responsible for forwarding the ICMP
      message and also processing the quoted packet within the payload
      field to remove the effect of the tunnel, and returning a
      correctly formatted ICMP message to the sender
      [I-D.ietf-intarea-tunnels]. Failure to do this results in
      black-holing.

   o  Asymmetry in forwarding can result in there being no route back to
      the original sender, which would prevent an ICMP message being
      delivered to the sender. This can also be an issue when
      policy-based routing is used, Equal Cost Multipath (ECMP) routing
      is used, or a middlebox acts as an application load balancer. An
      example is where the path towards the server is chosen by ECMP
      routing depending on bytes in the IP payload.
      In this case, when a packet sent by the server encounters a
      problem after the ECMP router, then any resulting ICMP message
      needs to also be directed by the ECMP router towards the same
      server (i.e., ICMP messages need to follow the same path as the
      flows to which they correspond). Failure to do this results in
      black-holing.

   o  There are cases where the next hop destination fails to receive a
      packet because of its size. This could be due to misconfiguration
      of the layer 2 path between nodes, for instance the MTU configured
      in a layer 2 switch, or misconfiguration of the Maximum Receive
      Unit (MRU). If the packet is dropped by the link, this will not
      cause a PTB message to be sent, and results in consequent
      black-holing.

   Another failure could result if a node that is not on the network
   path sends a PTB message that attempts to force the sender to change
   the effective PMTU [RFC8201]. A sender can protect itself from
   reacting to such messages by utilising the quoted packet within a PTB
   message payload to validate that the received PTB message was
   generated in response to a packet that had actually originated from
   the sender. However, there are situations where a sender would be
   unable to provide this validation.

   Examples where validation of the PTB message is not possible include:

   o  When a router issuing the ICMP message implements RFC 792
      [RFC0792], it is only required to include the first 64 bits of the
      IP payload of the packet within the quoted payload. This may be
      insufficient to perform the tunnel processing described in the
      previous list. There could be insufficient bytes remaining for
      the sender to interpret the quoted transport information. The
      recommendation in RFC 1812 [RFC1812] is that IPv4 routers return a
      quoted packet with as much of the original datagram as possible
      without the length of the ICMP datagram exceeding 576 bytes.
      (IPv6 routers include as much of the invoking packet as possible
      without the ICMPv6 packet exceeding 1280 bytes [RFC4443].)

   o  The use of tunnels/encryption can reduce the size of the quoted
      packet returned to the original source address, increasing the
      risk that there could be insufficient bytes remaining for the
      sender to interpret the quoted transport information.

   o  Even when the PTB message includes sufficient bytes of the quoted
      packet, the network layer could lack sufficient context to
      validate the message, because validation depends on information
      about the active transport flows at an endpoint node (e.g., the
      socket/address pairs being used, and other protocol header
      information).

   o  When a packet is encapsulated/tunneled over an encrypted
      transport, the tunnel/encapsulation ingress might have
      insufficient context, or computational power, to reconstruct the
      transport header that would be needed to perform validation.

1.2.  Packetization Layer Path MTU Discovery

   The term Packetization Layer (PL) has been introduced to describe the
   layer that is responsible for placing data blocks into the payload of
   IP packets and selecting an appropriate MPS. This function is often
   performed by a transport protocol, but can also be performed by other
   encapsulation methods working above the transport layer.

   In contrast to PMTUD, Packetization Layer Path MTU Discovery
   (PLPMTUD) [RFC4821] does not rely upon reception and validation of
   PTB messages. It is therefore more robust than Classical PMTUD.
   This has become the recommended approach for implementing PMTU
   discovery with TCP.

   PLPMTUD uses a general strategy where the PL sends probe packets to
   search for the largest size of unfragmented datagram that can be sent
   over a network path. The probe packets are sent with a progressively
   larger packet size.
   If a probe packet is successfully delivered (as
   determined by the PL), then the PLPMTU is raised to the size of the
   successful probe. If no response is received to a probe packet, the
   method reduces the probe size. This PLPMTU is used to set the
   application MPS.

   PLPMTUD introduces flexibility in the implementation of PMTU
   discovery. At one extreme, it can be configured to only perform PTB
   black hole detection and recovery to increase the robustness of
   Classical PMTUD, or at the other extreme, all PTB processing can be
   disabled and PLPMTUD can completely replace Classical PMTUD.

   PLPMTUD can also include additional consistency checks without
   increasing the risk of black-holing. For instance, the information
   available at the PL, or higher layers, makes PTB message validation
   more straightforward.

1.3.  Path MTU Discovery for Datagram Services

   Section 5 of this document presents a set of algorithms for datagram
   protocols to discover the largest size of unfragmented datagram that
   can be sent over a network path. The method described relies on
   features of the PL described in Section 3 and applies to transport
   protocols operating over IPv4 and IPv6. It does not require
   cooperation from the lower layers, although it can utilise PTB
   messages when these received messages are made available to the PL.

   The UDP Usage Guidelines [RFC8085] state "an application SHOULD
   either use the Path MTU information provided by the IP layer or
   implement Path MTU Discovery (PMTUD)", but do not provide a
   mechanism for discovering the largest size of unfragmented datagram
   that can be used on a network path. Prior to this document, PLPMTUD
   had not been specified for UDP.

   Section 10.2 of [RFC4821] recommends a PLPMTUD probing method for the
   Stream Control Transmission Protocol (SCTP).
   SCTP utilises probe
   packets consisting of a minimal-sized HEARTBEAT chunk bundled with a
   PAD chunk as defined in [RFC4820], but RFC 4821 does not provide a
   complete specification. The present document provides the details to
   complete that specification.

   The Datagram Congestion Control Protocol (DCCP) [RFC4340] requires
   implementations to support Classical PMTUD and states that a DCCP
   sender "MUST maintain the MPS allowed for each active DCCP session".
   It also defines the current congestion control MPS (CCMPS) supported
   by a network path. It recommends use of PMTUD, and suggests use of
   control packets (DCCP-Sync) as path probe packets, because they do
   not risk application data loss. The method defined in this
   specification could be used with DCCP.

   Section 6 specifies the method for a set of transports, and provides
   information to enable the implementation of PLPMTUD with other
   datagram transports and applications that use datagram transports.

2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

   Other terminology is directly copied from [RFC4821] and from the
   definitions in [RFC1122].

   Actual PMTU: The Actual PMTU is the PMTU of a network path between a
      sender PL and a destination PL, which the DPLPMTUD algorithm seeks
      to determine.

   Black Holed: Packets are Black Holed when the sender is unaware that
      packets are not delivered to the destination endpoint (e.g., when
      the sender transmits packets of a particular size with a
      previously known effective PMTU and they are silently discarded by
      the network, but the sender is not made aware by ICMP messages of
      a change to the path that resulted in a smaller PLPMTU).

   Classical Path MTU Discovery: Classical PMTUD is a process described
      in [RFC1191] and [RFC8201], in which nodes rely on PTB messages to
      learn the largest size of unfragmented datagram that can be used
      across a network path.

   Datagram: A datagram is a transport-layer protocol data unit,
      transmitted in the payload of an IP packet.

   Effective PMTU: The Effective PMTU is the current estimated value
      for PMTU that is used by PMTUD. This is equivalent to the PLPMTU
      derived by PLPMTUD.

   EMTU_S: The Effective MTU for sending (EMTU_S) is defined in
      [RFC1122] as "the maximum IP datagram size that may be sent, for a
      particular combination of IP source and destination addresses...".

   EMTU_R: The Effective MTU for receiving (EMTU_R) is designated in
      [RFC1122] as the largest datagram size that can be reassembled
      ("Effective MTU to receive").

   Link: A Link is a communication facility or medium over which nodes
      can communicate at the link layer, i.e., a layer below the IP
      layer. Examples are Ethernet LANs and Internet (or higher) layer
      tunnels.

   Link MTU: The Link Maximum Transmission Unit (MTU) is the size in
      bytes of the largest IP packet, including the IP header and
      payload, that can be transmitted over a link. Note that this
      could more properly be called the IP MTU, to be consistent with
      how other standards organizations use the acronym. This includes
      the IP header, but excludes link layer headers and other framing
      that is not part of IP or the IP payload.
      Other standards
      organizations generally define the link MTU to include the link
      layer headers.

   MAX_PMTU: The MAX_PMTU is the largest size of PLPMTU that DPLPMTUD
      will attempt to use.

   MPS: The Maximum Packet Size (MPS) is the largest size of
      application data block that can be sent across a network path. In
      DPLPMTUD this quantity is derived from the PLPMTU by taking into
      consideration the size of the lower protocol layer headers.

   MIN_PMTU: The MIN_PMTU is the smallest size of PLPMTU that DPLPMTUD
      will attempt to use.

   Packet: A Packet is the IP header plus the IP payload.

   Packetization Layer (PL): The Packetization Layer (PL) is the layer
      of the network stack that places data into packets and performs
      transport protocol functions.

   Path: The Path is the set of links and routers traversed by a packet
      between a source node and a destination node by a particular flow.

   Path MTU (PMTU): The Path MTU (PMTU) is the minimum of the Link MTUs
      of all the links forming a network path between a source node and
      a destination node.

   PTB_SIZE: The PTB_SIZE is a value reported in a validated PTB
      message that indicates the next hop link MTU of a router along the
      path.

   PLPMTU: The Packetization Layer PMTU is an estimate of the actual
      PMTU provided by the DPLPMTUD algorithm.

   PLPMTUD: Packetization Layer Path MTU Discovery (PLPMTUD), the
      method described in this document for datagram PLs, which is an
      extension to Classical PMTU Discovery.

   Probe packet: A probe packet is a datagram sent with a purposely
      chosen size (typically the current PLPMTU or larger) to detect if
      packets of this size can be successfully sent end-to-end across
      the network path.

3.  Features Required to Provide Datagram PLPMTUD

   TCP PLPMTUD has been defined using standard TCP protocol mechanisms.
   All of the requirements in [RFC4821] also apply to the use of the
   technique with a datagram PL.
   Unlike TCP, some datagram PLs require
   additional mechanisms to implement PLPMTUD.

   There are eight requirements for performing the datagram PLPMTUD
   method described in this specification:

   1.  PMTU parameters: A DPLPMTUD sender is RECOMMENDED to provide
       information about the maximum size of packet that can be
       transmitted by the sender on the local link (the local Link MTU).
       It MAY utilize similar information about the receiver when this
       is supplied (note this could be less than EMTU_R). This avoids
       implementations trying to send probe packets that cannot be
       transmitted by the local link. Too high a value could reduce
       the efficiency of the search algorithm. Some applications also
       have a maximum transport protocol data unit (PDU) size, in which
       case there is no benefit from probing for a size larger than this
       (unless a transport allows multiplexing multiple application
       PDUs into the same datagram).

   2.  PLPMTU: A datagram application using a transport layer not
       supporting fragmentation is REQUIRED to be able to choose the
       size of datagrams sent to the network, up to the PLPMTU, or a
       smaller value (such as the MPS) derived from this. This value is
       managed by the DPLPMTUD method. The PLPMTU (specified as the
       effective PMTU in Section 1 of [RFC1191]) is equivalent to the
       EMTU_S (specified in [RFC1122]).

   3.  Probe packets: On request, a DPLPMTUD sender is REQUIRED to be
       able to transmit a packet larger than the PLPMTU. This is used
       to send a probe packet. In IPv4, a probe packet MUST be sent
       with the Don't Fragment (DF) bit set in the IP header, and
       without network layer endpoint fragmentation. In IPv6, a probe
       packet is always sent without source fragmentation (as specified
       in Section 5.4 of [RFC8201]).

   4.  Processing PTB messages: A DPLPMTUD sender MAY utilize PTB
       messages received from the network layer to help identify when a
       network path does not support the current size of probe packet.
       Any received PTB message MUST be validated before it is used to
       update the PLPMTU discovery information [RFC8201]. This
       validation confirms that the PTB message was sent in response to
       a packet originated by the sender, and needs to be performed
       before the PLPMTU discovery method reacts to the PTB message. A
       PTB message MUST NOT be used to increase the PLPMTU [RFC8201].

   5.  Reception feedback: The destination PL endpoint is REQUIRED to
       provide a feedback method that indicates to the DPLPMTUD sender
       when a probe packet has been received by the destination PL
       endpoint. The mechanism needs to be robust to the possibility
       that packets could be significantly delayed along a network path.

       The local PL endpoint at the sending node is REQUIRED to pass
       this feedback to the sender-side DPLPMTUD method.

   6.  Probe loss recovery: It is RECOMMENDED to use probe packets that
       do not carry any user data. Most datagram transports permit
       this. If a probe packet contains user data requiring
       retransmission in case of loss, the PL (or layers above) are
       REQUIRED to arrange any retransmission/repair of any resulting
       loss. DPLPMTUD is REQUIRED to be robust in the case where probe
       packets are lost due to other reasons (including link
       transmission error, congestion).

   7.  Probing and congestion control: The DPLPMTUD sender treats
       isolated loss of a probe packet (with or without a corresponding
       PTB message) as a potential indication of a PMTU limit for the
       path. Loss of a probe packet SHOULD NOT be treated as an
       indication of congestion and the loss SHOULD NOT directly trigger
       a congestion control reaction [RFC4821].

   8.  Shared PLPMTU state: The PLPMTU value could also be stored with
       the corresponding entry in the destination cache and used by
       other PL instances. The specification of PLPMTUD [RFC4821]
       states: "If PLPMTUD updates the MTU for a particular path, all
       Packetization Layer sessions that share the path representation
       (as described in Section 5.2 of [RFC4821]) SHOULD be notified to
       make use of the new MTU". Such methods MUST be robust to the
       wide variety of underlying network forwarding behaviours. PLPMTU
       adjustments based on shared PLPMTU values should be incorporated
       in the search algorithms. Section 5.2 of [RFC8201] provides
       guidance on the caching of PMTU information and also the relation
       to IPv6 flow labels.

   In addition, the following principles are stated for the design of a
   DPLPMTUD method:

   o  MPS: A method is REQUIRED to signal an appropriate MPS to the
      higher layer using the PL. The value of the MPS can change
      following a change to the path. It is RECOMMENDED that methods
      avoid forcing an application to use an arbitrarily small MPS
      (PLPMTU) for transmission while the method is searching for the
      currently supported PLPMTU. Datagram PLs do not necessarily
      support fragmentation of PDUs larger than the PLPMTU. A reduced
      MPS can adversely impact the performance of a datagram
      application.

   o  Path validation: It is RECOMMENDED that methods are robust to path
      changes that could have occurred since the path characteristics
      were last confirmed, and to the possibility of inconsistent path
      information being received.

   o  Datagram reordering: A method is REQUIRED to be robust to the
      possibility that a flow encounters reordering, or the traffic
      (including probe packets) is divided over more than one network
      path.

   o  When to probe: It is RECOMMENDED that methods determine whether
      the path capacity has increased since they last measured the path.
      This determines when the path should again be probed.

4.  DPLPMTUD Mechanisms

   This section lists the protocol mechanisms used in this
   specification.

4.1.  PLPMTU Probe Packets

   The DPLPMTUD method relies upon the PL sender being able to generate
   probe packets with a specific size. TCP is able to generate these
   probe packets by choosing to appropriately segment data being sent
   [RFC4821]. In contrast, a datagram PL that needs to construct a
   probe packet has to either request an application to send a data
   block that is larger than that generated by an application, or
   utilise padding functions to extend a datagram beyond the size of the
   application data block. Protocols that permit exchange of control
   messages (without an application data block) could alternatively
   prefer to generate a probe packet by extending a control message with
   padding data.

   A receiver needs to be able to distinguish an in-band data block from
   any added padding. This is needed to ensure that any added padding
   is not passed on to an application at the receiver.

   This results in three possible ways that a sender can create a probe
   packet, listed in order of preference:

   Probing using padding data: A probe packet that contains only
      control information together with any padding needed to inflate
      it to the size required for the probe packet. Since these probe
      packets do not carry an application-supplied data block, they do
      not typically require retransmission, although they do still
      consume network capacity and incur endpoint processing.

   Probing using application data and padding data: A probe packet that
      contains a data block supplied by an application that is combined
      with padding to inflate the length of the datagram to the size
      required for the probe packet.
      If the application/transport needs protection from the loss of
      this probe packet, the application/transport could perform
      transport-layer retransmission/repair of the data block (e.g., by
      retransmission after loss is detected or by duplicating the data
      block in a datagram without the padding data).

   Probing using application data: A probe packet that contains a data
      block supplied by an application that matches the size required
      for the probe packet. This method requests the application to
      issue a data block of the desired probe size. If the
      application/transport needs protection from the loss of an
      unsuccessful probe packet, the application/transport then needs
      to perform transport-layer retransmission/repair of the data
      block (e.g., by retransmission after loss is detected).

   A PL that uses a probe packet carrying an application data block
   could need to retransmit this application data block if the probe
   fails. This could require the PL to re-fragment the data block to a
   smaller packet size that is expected to traverse the end-to-end path
   (which could utilise endpoint network-layer or PL fragmentation when
   these are available).

   DPLPMTUD MAY choose to use only one of these methods to simplify the
   implementation.

   Probe messages sent by a PL MUST contain enough information to
   uniquely identify the probe within Maximum Segment Lifetime, while
   being robust to reordering and replay of probe response and PTB
   messages.

4.2.  Confirmation of Probed Packet Size

   The PL needs a method to determine (confirm) when probe packets have
   been successfully received end-to-end across a network path.

   Transport protocols can include end-to-end methods that detect and
   report reception of specific datagrams that they send (e.g., DCCP
   and SCTP provide keep-alive/heartbeat features).
   When supported, this mechanism SHOULD also be used by DPLPMTUD to
   acknowledge reception of a probe packet.

   A PL that does not acknowledge data reception (e.g., UDP and
   UDP-Lite) is unable itself to detect when the packets that it sends
   are discarded because their size is greater than the actual PMTU.
   These PLs need to either rely on an application protocol to detect
   this loss, or make use of an additional transport method such as
   UDP-Options [I-D.ietf-tsvwg-udp-options].

   Section 5 specifies this function for a set of IETF-specified
   protocols.

4.3.  Detection of Black Holes

   A PL sender needs to reduce the PLPMTU when it discovers the actual
   PMTU supported by a network path is less than the PLPMTU (i.e., to
   detect that traffic is being black holed). This can be triggered
   when a validated PTB message is received, or by another event that
   indicates the network path no longer sustains the current packet
   size, such as a loss report from the PL or repeated lack of response
   to probe packets sent to confirm the PLPMTU. Detection is followed
   by a reduction of the PLPMTU.

   Black Hole detection is performed by periodically sending probe
   packets of size PLPMTU to verify that a network path still supports
   the last acknowledged PLPMTU size. There are two ways a DPLPMTUD
   sender can detect that the current PLPMTU is not sustained by the
   path (i.e., to detect a black hole):

   o  A PL can rely upon a mechanism implemented within the PL protocol
      to detect excessive loss of data sent with a specific packet size
      and then conclude that this excessive loss could be a result of
      an invalid PMTU (as in PLPMTUD for TCP [RFC4821]).
   o  A PL can use the probing mechanism to send confirmation probe
      packets of the size of the current PLPMTU and a timer to track
      whether acknowledgments are received (e.g., the number of probe
      packets sent without receiving an acknowledgement, PROBE_COUNT,
      becomes greater than the MAX_PROBES). These messages need to be
      generated periodically (e.g., using the CONFIRMATION_TIMER, see
      Section 5.1.1). A PL MAY inhibit sending probe packets when no
      application data has been sent since the previous probe packet.
      A PL preferring to use an up-to-date PMTU once user data is sent
      again MAY choose to continue PMTU discovery for each path.
      However, this may result in additional packets being sent.
      Successive loss of probes is an indication that the current path
      no longer supports the PLPMTU.

   When the method detects the current PLPMTU is not supported (a black
   hole is found), DPLPMTUD sets a lower MPS. The PL then confirms that
   the updated PLPMTU can be successfully used across the path. This
   can require the PL to send a probe packet with a size less than the
   size of the data block generated by an application. In this case,
   the PL could provide a way to fragment a datagram at the PL, or
   could instead utilise a control packet with padding.

4.4.  Response to PTB Messages

   This method requires the DPLPMTUD sender to validate any received
   PTB message before using the PTB information. The response to a PTB
   message depends on the PTB_SIZE indicated in the PTB message, the
   state of the DPLPMTUD state machine, and the IP protocol being used.

   Section 4.4.1 first describes validation for both IPv4 ICMP
   Unreachable messages (type 3) and ICMPv6 Packet Too Big messages,
   both of which are referred to as PTB messages in this document.

4.4.1.  Validation of PTB Messages

   This section specifies utilisation of PTB messages.
   o  A simple implementation MAY ignore received PTB messages and in
      this case the PLPMTU is not updated when a PTB message is
      received.

   o  An implementation that supports PTB messages MUST validate
      messages before they are further processed.

   A PL that receives a PTB message from a router or middlebox performs
   ICMP validation as specified in Section 5.2 of [RFC8085][RFC8201].
   Because DPLPMTUD operates at the PL, the PL needs to check that each
   received PTB message is received in response to a packet transmitted
   by the endpoint PL performing DPLPMTUD.

   The PL MUST check the protocol information in the quoted packet
   carried in the ICMP PTB message payload to validate that the message
   originated from the sending node. This validation includes
   determining that the combination of the IP addresses, the protocol,
   the source port and destination port match those returned in the
   quoted packet - this is also necessary for the PTB message to be
   passed to the corresponding PL.

   The validation SHOULD utilise information that it is not simple for
   an off-path attacker to determine, for example by checking the value
   of a protocol header field known only to the two PL endpoints. A
   datagram application that uses well-known source and destination
   ports ought to also rely on other information to complete this
   validation.

   These checks are intended to provide protection from packets that
   originate from a node that is not on the network path.

   A PTB message that does not complete the validation MUST NOT be
   further utilised by the DPLPMTUD method.

   PTB messages that have been validated MAY be utilised by the
   DPLPMTUD algorithm, but MUST NOT be used directly to set the PLPMTU.
   A method that utilises these PTB messages can improve the speed at
   which the algorithm detects an appropriate PLPMTU, compared to one
   that relies solely on probing.
   Section 4.4.2 describes this processing.

4.4.2.  Use of PTB Messages

   A set of checks is intended to provide protection from a router that
   reports an unexpected PTB_SIZE. The PL needs to check that the
   indicated PTB_SIZE is less than the size used by probe packets and
   larger than the minimum size accepted.

   This section provides a summary of how PTB messages can be utilised.
   This processing depends on the PTB_SIZE and the current value of a
   set of variables:

   MIN_PMTU < PTB_SIZE < BASE_PMTU

      *  A robust PL MAY enter the PROBE_ERROR state for an IPv4 path
         when the PTB_SIZE reported in the PTB message >= 68 bytes and
         when this is less than the BASE_PMTU.

      *  A robust PL MAY enter the PROBE_ERROR state for an IPv6 path
         when the PTB_SIZE reported in the PTB message >= 1280 bytes
         and when this is less than the BASE_PMTU.

   PTB_SIZE = PLPMTU

      *  Transition to SEARCH_COMPLETE.

   PTB_SIZE > PROBED_SIZE

      *  A PTB_SIZE greater than the PROBED_SIZE is an inconsistent
         network signal. These PTB messages ought to be discarded
         without further processing (the PLPMTU is not updated).

      *  The information could be utilised as an input to trigger
         enabling a resilience mode.

   BASE_PMTU <= PTB_SIZE < PLPMTU

      *  Black hole detection is triggered and the PLPMTU ought to be
         set to BASE_PMTU.

      *  The PL could use PTB_SIZE reported in the PTB message to
         initialise a search algorithm.

   PLPMTU < PTB_SIZE < PROBED_SIZE

      *  The PLPMTU continues to be valid, but the last PROBED_SIZE
         searched was larger than the actual PMTU.

      *  The PLPMTU is not updated.

      *  The PL can use the reported PTB_SIZE from the PTB message as
         the next search point when it resumes the search algorithm.

   xxx Author Note: Do we want to specify how to handle PTB Message
   with PTB_SIZE = 0? xxx

5.  Datagram Packetization Layer PMTUD

   This section specifies Datagram PLPMTUD (DPLPMTUD).
   The method can be introduced at various points (as indicated with *
   in the figure below) in the IP protocol stack to discover the PLPMTU
   so that an application can utilise an appropriate MPS for the
   current network path. DPLPMTUD SHOULD NOT be used by an application
   if it is already used in a lower layer.

     +----------------------+
     |     Application*     |
     +-+-------+----+---+---+
       |       |    |   |
   +---+--+ +--+--+ | +-+---+
   | QUIC*| |UDPO*| | |SCTP*|
   +---+--+ +--+--+ | ++--+-+
       |       |    |  |  |
       +-------+-+  |  |  |
                 |  |  |  |
                 ++-+--++ |
                 | UDP  | |
                 +---+--+ |
                     |    |
     +--------------+-----+-+
     |  Network Interface   |
     +----------------------+

         Figure 1: Examples where DPLPMTUD can be implemented

   The central idea of DPLPMTUD is probing by a sender. Probe packets
   are sent to find the maximum size of a user message that can be
   completely transferred across the network path from the sender to
   the destination.

   This section identifies the components needed for implementation,
   the phases of operation, the state machine and search algorithm.

5.1.  DPLPMTUD Components

   This section describes components of DPLPMTUD.

5.1.1.  Timers

   The method utilises up to three timers:

   PROBE_TIMER: The PROBE_TIMER is configured to expire after a period
      longer than the maximum time to receive an acknowledgment to a
      probe packet. This value MUST NOT be smaller than 1 second, and
      SHOULD be larger than 15 seconds. Guidance on selection of the
      timer value is provided in section 3.1.1 of the UDP Usage
      Guidelines [RFC8085].

      If the PL has a path Round Trip Time (RTT) estimate and timely
      acknowledgements, the PROBE_TIMER can be derived from the PL RTT
      estimate.

   PMTU_RAISE_TIMER: The PMTU_RAISE_TIMER is configured to the period a
      sender will continue to use the current PLPMTU, after which it
      re-enters the Search phase.
      This timer has a period of 600 seconds, as recommended by PLPMTUD
      [RFC4821].

      DPLPMTUD MAY inhibit sending probe packets when no application
      data has been sent since the previous probe packet. A PL
      preferring to use an up-to-date PMTU once user data is sent
      again can choose to continue PMTU discovery for each path.
      However, this could result in sending additional packets.

   CONFIRMATION_TIMER: When an acknowledged PL is used, this timer MUST
      NOT be used. For other PLs, the CONFIRMATION_TIMER is configured
      to the period a PL sender waits before confirming the current
      PLPMTU is still supported. This is less than the
      PMTU_RAISE_TIMER and used to decrease the PLPMTU (e.g., when a
      black hole is encountered). Confirmation needs to be frequent
      enough when data is flowing that the sending PL does not black
      hole extensive amounts of traffic. Guidance on selection of the
      timer value is provided in section 3.1.1 of the UDP Usage
      Guidelines [RFC8085].

      DPLPMTUD MAY inhibit sending probe packets when no application
      data has been sent since the previous probe packet. A PL
      preferring to use an up-to-date PMTU once user data is sent
      again can choose to continue PMTU discovery for each path.
      However, this may result in sending additional packets.

   An implementation could implement the various timers using a single
   timer.

5.1.2.  Constants

   The following constants are defined:

   MAX_PROBES: MAX_PROBES is the maximum value of the PROBE_COUNT
      counter. The default value of MAX_PROBES is 10.

   MIN_PMTU: The MIN_PMTU is the smallest allowed probe packet size.
      For IPv6, this value is 1280 bytes, as specified in [RFC2460].
      For IPv4, the minimum value is 68 bytes. (An IPv4 router is
      required to be able to forward a datagram of 68 bytes without
      further fragmentation. This is the combined size of an IPv4
      header and the minimum fragment size of 8 bytes.
      In addition, receivers are required to be able to reassemble
      fragmented datagrams at least up to 576 bytes, as stated in
      section 3.3.3 of [RFC1122].)

   MAX_PMTU: The MAX_PMTU is the largest size of PLPMTU. This has to
      be less than or equal to the minimum of the local MTU of the
      outgoing interface and the destination PMTU for receiving. An
      application or PL MAY reduce the MAX_PMTU when there is no need
      to send packets larger than a specific size.

   BASE_PMTU: The BASE_PMTU is a configured size expected to work for
      most paths. The size is equal to or larger than the MIN_PMTU and
      smaller than the MAX_PMTU. In the case of IPv6, this value is
      1280 bytes [RFC2460]. When using IPv4, a size of 1200 bytes is
      RECOMMENDED.

5.1.3.  Variables

   This method utilises a set of variables:

   PROBED_SIZE: The PROBED_SIZE is the size of the current probe
      packet. This is a tentative value for the PLPMTU, which is
      awaiting confirmation by an acknowledgment.

   PROBE_COUNT: The PROBE_COUNT is a count of the number of
      unsuccessful probe packets that have been sent with a size of
      PROBED_SIZE. The value is initialised to zero when a particular
      size of PROBED_SIZE is first attempted.

   The figure below illustrates the relationship between the packet
   size constants and variables, in this case when the DPLPMTUD
   algorithm performs path probing to increase the size of the PLPMTU.
   The MPS is less than the PLPMTU. A probe packet has been sent of
   size PROBED_SIZE. When this is acknowledged, the PLPMTU will be
   raised to PROBED_SIZE allowing the PROBED_SIZE to be increased
   towards the actual PMTU.

   MIN_PMTU                                            MAX_PMTU
       <-------------------------------------------------->
          |            |              |                 |
          V            |              |                 V
      BASE_PMTU        |              V            Actual PMTU
                       |         PROBED_SIZE
                       V
                     PLPMTU

        Figure 2: Relationships between probe and packet sizes

5.2.
DPLPMTUD Phases

   The Datagram PLPMTUD algorithm moves through several phases of
   operation.

   An implementation that only reduces the PLPMTU to a suitable size
   would be sufficient to ensure reliable operation, but can be very
   inefficient when the actual PMTU changes or when the method (for
   whatever reason) makes a suboptimal choice for the PLPMTU.

   A full implementation of DPLPMTUD provides an algorithm enabling the
   DPLPMTUD sender to increase the PLPMTU following a change in the
   characteristics of the path, such as when a link is reconfigured
   with a larger MTU, or when there is a change in the set of links
   traversed by an end-to-end flow (e.g., after a routing or path
   fail-over decision).

   Black hole detection (Section 4.3) and PTB processing (Section 4.4)
   proceed in parallel with these phases of operation.

                   +------------------------+
                   | BASE_PMTU Confirmation +-- Connectivity
                   +------------+-----------+ \----+ or BASE_PMTU
                                |  ^               V Confirmation Fails
               Connectivity and |  |         +-------+
            BASE_PMTU confirmed |  +---------+ Error |
                                |            +-------+
                                |    CONFIRMATION_TIMER
                                |          Fires
                                V
   +----------------+          +--------------+
   | Search Complete|<---------+    Search    |
   +----------------+          +--------------+
                  Search Algorithm
                     Completes

                      Figure 3: DPLPMTUD Phases

   BASE_PMTU Confirmation

      *  Connectivity is confirmed.

      *  DPLPMTUD confirms the BASE_PMTU is supported across the
         network path.

      *  DPLPMTUD then enters the search phase.

   Search

      *  DPLPMTUD performs probing to increase the PLPMTU.

      *  DPLPMTUD then enters the search complete or an error phase.

   Search Complete

      *  DPLPMTUD has found a suitable PLPMTU that is supported across
         the network path.

      *  Black hole detection will confirm this PLPMTU continues to be
         supported.

      *  On a longer time-frame, DPLPMTUD will re-enter the search
         phase to discover if the PLPMTU can be raised.
   Error

      *  Inconsistent or invalid network signals cause DPLPMTUD to be
         unable to progress.

      *  This causes the algorithm to lower the MPS until the path is
         shown to support the BASE_PMTU, or to suspend DPLPMTUD.

5.2.1.  BASE_PMTU Confirmation Phase

   DPLPMTUD starts in the BASE_PMTU confirmation phase. BASE_PMTU
   confirmation is performed in two stages:

   1.  Connectivity to the remote peer is first confirmed. When a
       connection-oriented PL is used, this stage is implicit. It is
       performed as part of the normal PL connection handshake. In
       contrast, a connectionless PL MUST send an acknowledged probe
       packet to confirm that the remote peer is reachable.

   2.  In the second stage, the PL confirms it can successfully send a
       datagram of the BASE_PMTU size across the current path.

   A PL that does not wish to support a network path with a PLPMTU less
   than BASE_PMTU can simplify the phase into a single step by
   performing connectivity checks with probes of the BASE_PMTU size.

   A PL MAY respond to PTB messages while in this phase, see
   Section 4.4.

   Once BASE_PMTU confirmation has completed, DPLPMTUD can advertise an
   MPS to an upper layer.

   If DPLPMTUD fails to complete these tests it enters the
   PROBE_DISABLED phase, see Section 5.2.6, and ceases using DPLPMTUD.

5.2.2.  Search Phase

   The search phase utilises a search algorithm in an attempt to
   increase the PLPMTU (see Section 5.4.1). The PL sender increases the
   MPS each time a probe packet confirms a larger PLPMTU is supported
   by the path. The algorithm concludes by entering the SEARCH_COMPLETE
   phase, see Section 5.2.3.

   A PL MAY respond to PTB messages while in this phase, using the PTB
   to advance or terminate the search, see Section 4.4. Similarly,
   black hole detection can terminate the search by entering the
   PROBE_BASE phase, see Section 5.2.4.

5.2.2.1.
Resilience to Inconsistent Path Information

   Sometimes a PL sender is able to detect inconsistent results from
   the sequence of PLPMTU probes that it sends or the sequence of PTB
   messages that it receives. This could be manifested as excessive
   fluctuation of the MPS.

   When inconsistent path information is detected, a PL sender can
   enable an alternate search mode that clamps the offered MPS to a
   smaller value for a period of time. This avoids unnecessary
   black-holing of packets.

5.2.3.  Search Complete Phase

   On entry to the search complete phase, the DPLPMTUD sender starts
   the PMTU_RAISE_TIMER. In this phase, the PLPMTU remains at the value
   confirmed by the last successful probe packet.

   In this phase, the PL MUST periodically confirm that the PLPMTU is
   still supported by the path. If the PL is designed in a way that is
   unable to confirm reachability to the destination endpoint after
   probing has completed, the method uses a CONFIRMATION_TIMER to
   periodically repeat a probe packet for the current PLPMTU size.

   If the DPLPMTUD sender is unable to confirm reachability for packets
   with a size of the current PLPMTU (e.g., if the CONFIRMATION_TIMER
   expires) or the PL signals a lack of reachability, the method exits
   the phase and enters the PROBE_BASE phase, see Section 5.2.4.

   If the PMTU_RAISE_TIMER expires, the DPLPMTUD sender re-enters the
   Search phase, see Section 5.2.2, and resumes probing for a larger
   PLPMTU.

   Black hole detection can be used in parallel to check that a network
   path continues to support a previously confirmed PLPMTU. If a black
   hole is detected, the algorithm moves to the PROBE_BASE phase, see
   Section 5.2.4.

   The phase can also be exited when a validated PTB message is
   received (see Section 4.4.1).

5.2.4.
PROBE_BASE Phase

   This phase is entered when black hole detection or a PTB message
   indicates that the PLPMTU is not supported by the path.

   On entry to this phase, the PLPMTU is set to the BASE_PMTU, and a
   corresponding reduced MPS is advertised.

   PROBED_SIZE is then set to the PLPMTU (i.e., the BASE_PMTU), to
   confirm this size is supported across the path. If confirmed,
   DPLPMTUD enters the Search Phase to determine whether the PL sender
   can use a larger PLPMTU.

   If the path cannot be confirmed to support the BASE_PMTU after
   sending MAX_PROBES, DPLPMTUD moves to the Error phase, see
   Section 5.2.5.

5.2.5.  ERROR Phase

   The ERROR phase is entered when there is conflicting or invalid
   PLPMTU information for the path (e.g., a failure to support the
   BASE_PMTU). In this phase, the MPS is set to a value less than the
   BASE_PMTU, but at least the size of the MIN_PMTU.

   DPLPMTUD remains in the ERROR phase until a consistent view of the
   path can be discovered and it has also been confirmed that the path
   supports the BASE_PMTU.

   Note: MIN_PMTU may be identical to BASE_PMTU, simplifying the
   actions in this phase.

   If no acknowledgement is received for PROBE_COUNT probes of size
   MIN_PMTU, the method suspends DPLPMTUD, see Section 5.2.6.

5.2.5.1.  Robustness to Inconsistent Path

   Some paths could be unable to sustain packets of the BASE_PMTU size.
   A PL sender using such a path could use an alternate algorithm to
   implement the PROBE_ERROR phase that allows fallback to a smaller
   than desired PLPMTU, rather than suffer connectivity failure.

   This could also utilise methods such as endpoint IP fragmentation to
   enable the PL sender to communicate using packets smaller than the
   BASE_PMTU.

5.2.6.  DISABLED Phase

   This phase suspends operation of DPLPMTUD.
   It disables probing for the PLPMTU until action is taken by the PL
   or application using the PL.

5.3.  State Machine

   A state machine for DPLPMTUD is depicted in Figure 4. If multihoming
   is supported, a state machine is needed for each path.

      |       |
      | Start | PL indicates loss
      |       |  of connectivity
      V       V
   +---------------+                            +---------------+
   |    DISABLED   |                            |     ERROR     |
   +---------------+                            +---------------+
         | PL indicates    PROBE_TIMER expiry:    ^          |
         | connectivity  PROBE_COUNT = MAX_PROBES |          |
         +--------------------+   +---------------+          |
                              |   |                          |
                              V   |     BASE_PMTU Probe      |
                      +---------------+      acked           |
                      |      BASE     |----------------------+
                      +---------------+                      |
   Black hole detected or ^ |    ^ ^ Black hole detected or  |
   PTB_SIZE < PLPMTU      | |    | |  PTB_SIZE < PLPMTU      |
     +--------------------+ |    | +--------------------+    |
     |                      +----+                      |    |
     |               PROBE_TIMER expiry:                |    |
     |             PROBE_COUNT < MAX_PROBES             |    |
     |                                                  |    |
     |             PMTU_RAISE_TIMER expiry              |    |
     |   +-----------------------------------------+    |    |
     |   |                                         |    |    |
     |   |                                         V    |    V
   +---------------+                            +---------------+
   |SEARCH_COMPLETE|                            |   SEARCHING   |
   +---------------+                            +---------------+
    |    ^    ^                                         | |    ^
    |    |    |                                         | |    |
    |    |    +-----------------------------------------+ |    |
    |    |              MAX_PMTU Probe acked or           |    |
    |    |  PTB (BASE_PMTU <= PTB_SIZE < PROBED_SIZE) or  |    |
    +----+            PROBE_COUNT = MAX_PROBES            +----+
 CONFIRMATION_TIMER expiry:                  PROBE_TIMER expiry:
 PROBE_COUNT < MAX_PROBES or                 PROBE_COUNT < MAX_PROBES or
 PLPMTU Probe acked                          Probe acked

   Figure 4: State machine for Datagram PLPMTUD. Note: Some state
   changes are not shown to simplify the diagram.

   The following states are defined:

   DISABLED: The DISABLED state is the initial state before probing has
      started. It is also entered from any other state, when the PL
      indicates loss of connectivity.
      This state is left once the PL indicates connectivity to the
      remote PL.

   BASE: The BASE state is used to confirm that the BASE_PMTU size is
      supported by the network path and is designed to allow an
      application to continue working when there are transient
      reductions in the actual PMTU. It also seeks to avoid long
      periods where traffic is black holed while searching for a larger
      PLPMTU.

      On entry, the PROBED_SIZE is set to the BASE_PMTU size and the
      PROBE_COUNT is set to zero.

      Each time a probe packet is sent, the PROBE_TIMER is started.
      The state is exited when the probe packet is acknowledged, and
      the PL sender enters the SEARCHING state.

      The state is also left when the PROBE_COUNT reaches MAX_PROBES or
      a PTB message is validated. This causes the PL sender to enter
      the ERROR state.

   SEARCHING: The SEARCHING state is the main probing state. This
      state is entered when probing for the BASE_PMTU was successful.

      The PROBE_COUNT is set to zero when the first probe packet is
      sent for each probe size. Each time a probe packet is
      acknowledged, the PLPMTU is set to the PROBED_SIZE, and then the
      PROBED_SIZE is increased using the search algorithm.

      When a probe packet is sent and not acknowledged within the
      period of the PROBE_TIMER, the PROBE_COUNT is incremented and the
      probe packet is retransmitted. The state is exited when the
      PROBE_COUNT reaches MAX_PROBES; a PTB message is validated; a
      probe of size MAX_PMTU is acknowledged; or black hole detection
      is triggered.

   SEARCH_COMPLETE: The SEARCH_COMPLETE state indicates a successful
      end to the PROBE_SEARCH state. DPLPMTUD remains in this state
      until either the PMTU_RAISE_TIMER expires; a received PTB message
      is validated; or black hole detection is triggered.
      When DPLPMTUD uses an unacknowledged PL and is in the
      SEARCH_COMPLETE state, a CONFIRMATION_TIMER periodically resets
      the PROBE_COUNT and schedules a probe packet with the size of the
      PLPMTU. If the probe packet fails to be acknowledged after
      MAX_PROBES attempts, the method enters the BASE state. When used
      with an acknowledged PL (e.g., SCTP), DPLPMTUD SHOULD NOT
      continue to generate PLPMTU probes in this state.

   ERROR: The ERROR state represents the case where either the network
      path is not known to support a PLPMTU of at least the BASE_PMTU
      size or when there is contradictory information about the network
      path that would otherwise result in excessive variation in the
      MPS signalled to the higher layer. The state implements a method
      to mitigate oscillation in the state-event engine. It signals a
      conservative value of the MPS to the higher layer by the PL. The
      state is exited when probe packets no longer detect the error or
      when the PL indicates that connectivity has been lost.

      Implementations are permitted to enable endpoint fragmentation if
      DPLPMTUD is unable to validate MIN_PMTU within PROBE_COUNT
      probes. If DPLPMTUD is unable to validate MIN_PMTU, the
      implementation should transition to PROBE_DISABLED.

   Appendix A contains an informative description of key events.

5.4.  Search to Increase the PLPMTU

   This section describes the algorithms used by DPLPMTUD to search for
   a larger PLPMTU.

5.4.1.  Probing for a Larger PLPMTU

   Implementations use a search algorithm across the search range to
   determine whether a larger PLPMTU can be supported across a network
   path.

   The method discovers the search range by confirming the minimum
   PLPMTU and then using the probe method to select a PROBED_SIZE less
   than or equal to MAX_PMTU. MAX_PMTU is the minimum of the local MTU
   and EMTU_R (learned from the remote endpoint).
   The MAX_PMTU MAY be reduced by an application that sets a maximum to
   the size of datagrams it will send.

   The PROBE_COUNT is initialised to zero when a probe packet is first
   sent with a particular size. A timer is used by the search algorithm
   to trigger the sending of probe packets of size PROBED_SIZE, larger
   than the PLPMTU. Each probe packet successfully sent to the remote
   peer is confirmed by acknowledgement at the PL, see Section 4.1.

   Each time a probe packet is sent to the destination, the PROBE_TIMER
   is started. The timer is cancelled when the PL receives
   acknowledgment that the probe packet has been successfully sent
   across the path (see Section 4.1). This confirms that the
   PROBED_SIZE is supported, and the PROBED_SIZE value is then assigned
   to the PLPMTU. The search algorithm can continue to send subsequent
   probe packets of an increasing size.

   If the timer expires before a probe packet is acknowledged, the
   probe has failed to confirm the PROBED_SIZE. Each time the
   PROBE_TIMER expires, the PROBE_COUNT is incremented, the PROBE_TIMER
   is reinitialised, and a probe packet of the same size is
   retransmitted (the replicated probe improves the resilience to
   loss). The maximum number of retransmissions for a particular size
   is configured (MAX_PROBES). If the value of the PROBE_COUNT reaches
   MAX_PROBES, probing will stop, and the PL sender enters the
   SEARCH_COMPLETE state.

5.4.2.  Selection of Probe Sizes

   The search algorithm needs to determine a minimum useful gain in
   PLPMTU. It would not be constructive for a PL sender to attempt to
   probe for all sizes - this would incur unnecessary load on the path
   and has the undesirable effect of slowing the time to reach a more
   optimal MPS. Implementations SHOULD select the set of probe packet
   sizes to maximise the gain in PLPMTU from each search step.
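
   The probing behaviour described in Section 5.4.1 can be sketched as
   follows. This is a minimal, non-normative illustration and not a
   method mandated by this document. The function name, the fixed
   `step` increment, and the `send_probe` callback (assumed to return
   True when a probe of the given size is acknowledged before the
   PROBE_TIMER expires, and False otherwise) are all assumptions made
   for the example. Timers, PTB handling and black hole detection are
   deliberately left out.

```python
from typing import Callable

MAX_PROBES = 10  # default value given in Section 5.1.2


def search_plpmtu(base_pmtu: int, max_pmtu: int,
                  send_probe: Callable[[int], bool],
                  step: int = 32) -> int:
    """Return the largest confirmed size, starting from a confirmed base_pmtu.

    `step` is a hypothetical fixed increment; a real implementation would
    choose probe sizes to maximise the gain from each search step.
    """
    plpmtu = base_pmtu
    probed_size = min(plpmtu + step, max_pmtu)
    while probed_size > plpmtu:
        probe_count = 0  # reset when a new size is first attempted
        acked = False
        while probe_count < MAX_PROBES:
            if send_probe(probed_size):  # acknowledged in time
                acked = True
                break
            probe_count += 1  # models one PROBE_TIMER expiry
        if not acked:
            break  # PROBE_COUNT reached MAX_PROBES: search is complete
        plpmtu = probed_size  # confirmed size becomes the PLPMTU
        probed_size = min(plpmtu + step, max_pmtu)
    return plpmtu
```

   For example, with a BASE_PMTU of 1200, a MAX_PMTU of 1500 and a
   simulated path that acknowledges probes of up to 1400 bytes
   (`send_probe = lambda size: size <= 1400`), this sketch confirms
   1392 bytes and then stops after MAX_PROBES unacknowledged probes of
   1424 bytes.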
   Implementations could optimize the search procedure by selecting
   step sizes from a table of common PMTU sizes. When selecting the
   appropriate next size to search, an implementor ought to also
   consider that there can be common sizes of MPS that applications
   seek to use.

   xxx Author Note: A future version of this section will detail
   example methods for selecting probe size values, but does not plan
   to mandate a single method. xxx

5.4.3.  Resilience to Inconsistent Path Information

   A decision to increase the PLPMTU needs to be resilient to the
   possibility that information learned about the network path is
   inconsistent (this could happen when probe packets are lost due to
   other reasons, or some of the packets in a flow are forwarded along
   a portion of the path that supports a different actual PMTU).

   Frequent path changes could occur due to unexpected "flapping" -
   where some packets from a flow pass along one path, but other
   packets follow a different path with different properties. DPLPMTUD
   can be made resilient to these anomalies by introducing hysteresis
   into the search decision to increase the MPS.

6.  Specification of Protocol-Specific Methods

   This section specifies protocol-specific details for datagram
   PLPMTUD for IETF-specified transports.

   The first subsection provides guidance on how to implement the
   DPLPMTUD method as a part of an application using UDP or UDP-Lite.

   The guidance also applies to other datagram services that do not
   include a specific transport protocol (such as a tunnel
   encapsulation). The following subsections describe how DPLPMTUD can
   be implemented as a part of the transport service, allowing
   applications using the service to benefit from discovery of the
   PLPMTU without themselves needing to implement this method.

6.1.
Application support for DPLPMTUD with UDP or UDP-Lite 1341 The current specifications of UDP [RFC0768] and UDP-Lite [RFC3828] do 1342 not define a method in the RFC-series that supports PLPMTUD. In 1343 particular, the UDP transport does not provide the transport layer 1344 features needed to implement datagram PLPMTUD. 1346 The DPLPMTUD method can be implemented as a part of an application 1347 built directly or indirectly on UDP or UDP-Lite, but relies on 1348 higher-layer protocol features to implement the method [RFC8085]. 1350 Some primitives used by DPLPMTUD might not be available via the 1351 Datagram API (e.g., the ability to access the PLPMTU cache, or 1352 interpret received PTB messages). 1354 In addition, it is desirable that PMTU discovery is not performed by 1355 multiple protocol layers. An application SHOULD avoid implementing 1356 DPLPMTUD when the underlying transport system provides this 1357 capability. Using a common method for managing the PLPMTU has 1358 benefits, both in the ability to share state between different 1359 processes and opportunities to coordinate probing. 1361 6.1.1. Application Request 1363 An application needs an application-layer protocol mechanism (such as 1364 a message acknowledgement method) that solicits a response from a 1365 destination endpoint. The method SHOULD allow the sender to check 1366 the value returned in the response to provide additional protection 1367 from off-path insertion of data [RFC8085], suitable methods include a 1368 parameter known only to the two endpoints, such as a session ID or 1369 initialised sequence number. 1371 6.1.2. Application Response 1373 An application needs an application-layer protocol mechanism to 1374 communicate the response from the destination endpoint. This 1375 response may indicate successful reception of the probe across the 1376 path, but could also indicate that some (or all packets) have failed 1377 to reach the destination. 1379 6.1.3. 
Sending Application Probe Packets 1381 A probe packet that may carry an application data block, but the 1382 successful transmission of this data is at risk when used for 1383 probing. Some applications may prefer to use a probe packet that 1384 does not carry an application data block to avoid disruption to 1385 normal data transfer. 1387 6.1.4. Validating the Path 1389 An application that does not have other higher-layer information 1390 confirming correct delivery of datagrams SHOULD implement the 1391 CONFIRMATION_TIMER to periodically send probe packets while in the 1392 SEARCH_COMPLETE state. 1394 6.1.5. Handling of PTB Messages 1396 An application that is able and wishes to receive PTB messages MUST 1397 perform ICMP validation as specified in Section 5.2 of [RFC8085]. 1398 This requires that the application to check each received PTB 1399 messages to validate it is received in response to transmitted 1400 traffic and that the reported PTB_SIZE is less than the current 1401 probed size (see Section 4.4.2). A validated PTB message MAY be used 1402 as input to the DPLPMTUD algorithm, but MUST NOT be used directly to 1403 set the PLPMTU. 1405 6.2. DPLPMTUD with UDP Options 1407 UDP Options[I-D.ietf-tsvwg-udp-options] can supply the additional 1408 functionality required to implement DPLPMTUD within the UDP transport 1409 service. Implementing DPLPMTUD using UDP Options avoids the need for 1410 each application to implement the DPLPMTUD method. 1412 Section 5.6 of[I-D.ietf-tsvwg-udp-options] defines the Maximum 1413 Segment Size (MSS) option, which allows the local sender to indicate 1414 the EMTU_R to the peer. The value received in this option can be 1415 used to initialise MAX_PMTU. 1417 UDP Options enables padding to be added to UDP datagrams that are 1418 used as Probe Packets. 
   Feedback confirming reception of each Probe Packet is provided by
   two new UDP Options:

   o  The Probe Request Option (Section 6.2.1) is set by a sending PL to
      solicit a response from a remote endpoint. A four-byte token
      identifies each request.

   o  The Probe Response Option (Section 6.2.2) is generated by the UDP
      Options receiver in response to a received Probe Request Option.
      Each Probe Response Option echoes a previously received four-byte
      token.

   The token value allows implementations to distinguish between
   acknowledgments for initial probe packets and acknowledgments
   confirming receipt of subsequent probe packets (e.g., travelling
   along alternate paths with a larger RTT). Each probe packet needs to
   be uniquely identifiable by the UDP Options sender within the Maximum
   Segment Lifetime (MSL). The UDP Options sender therefore needs to
   avoid recycling token values until they have expired or have been
   acknowledged. A four-byte value for the token field provides
   sufficient space for multiple unique probes to be made within the
   MSL.

   The initial value of the four-byte token field SHOULD be assigned to
   a randomised value (as described in Section 5.1 of [RFC8085]) to
   enhance protection from off-path attacks.

   Implementations ought to only send a probe packet with a Probe
   Request Option when required by their local state machine, i.e., when
   probing to grow the PLPMTU or to confirm the current PLPMTU. The
   procedure to handle the loss of a response packet is the
   responsibility of the sender of the request. Implementations are
   allowed to track multiple requests and respond to them with a single
   packet.

   A PL needs to determine that the path can still support the size of
   datagram that the application is currently sending in the DPLPMTUD
   SEARCH_COMPLETE state (i.e., to detect black-holing of data).
   One way to achieve this is to send probe packets of size PLPMTU or to
   utilise a higher-layer method that provides explicit feedback
   indicating any packet loss. Another possibility is to utilise data
   packets that carry a Timestamp Option. Reception of a valid
   timestamp that was echoed by the remote endpoint can be used to infer
   connectivity. This can provide useful feedback even over paths with
   asymmetric capacity and/or that carry UDP Option flows that have very
   asymmetric datagram rates, because an echo of the most recent
   timestamp still indicates reception of at least one packet of the
   transmitted size. This is sufficient to confirm there is no black
   hole.

   In contrast, when sending a probe to increase the PLPMTU, a timestamp
   might be unable to unambiguously identify that a specific probe
   packet has been received. Timestamp mechanisms cannot be used to
   confirm the reception of individual probe messages and cannot be used
   to stimulate a response from the remote peer.

6.2.1.  UDP Probe Request Option

   The Probe Request Option allows a sending endpoint to solicit a
   response from a destination endpoint.

   The Probe Request Option carries a four-byte token set by the sender.
   This token can be set to a value that is likely to be known only to
   the sender (and is sent along the end-to-end path). The initial
   value of the token SHOULD be assigned to a randomised value (as
   described in Section 5.1 of [RFC8085]) to enhance protection from
   off-path attacks.

   The sender needs to then check the value returned in the UDP Probe
   Response Option. The value of the Token field uniquely identifies a
   probe within the maximum segment lifetime.

                +----------+--------+-----------------+
                | Kind=9*  | Len=6  |      Token      |
                +----------+--------+-----------------+
                   1 byte    1 byte       4 bytes

                       * To be confirmed by IANA.

                 Figure 5: UDP Probe REQ Option Format

6.2.2.  UDP Probe Response Option

   The Probe Response Option is generated in response to reception of a
   Probe Request Option. This response is generated by the UDP Option
   processing.

   The Probe Response Option carries a four-byte token field. The Token
   field associates the response with the Token value carried in the
   most recently received Probe Request Option. The rate of generation
   of UDP packets carrying a Probe Response Option is expected to be
   less than once per RTT and SHOULD be rate-limited (see Section 9).

                +----------+--------+-----------------+
                | Kind=10* | Len=6  |      Token      |
                +----------+--------+-----------------+
                   1 byte    1 byte       4 bytes

                       * To be confirmed by IANA.

                 Figure 6: UDP Probe RES Option Format

6.3.  DPLPMTUD for SCTP

   Section 10.2 of [RFC4821] specifies a recommended PLPMTUD probing
   method for SCTP. It recommends the use of the PAD chunk, defined in
   [RFC4820], to be attached to a minimum length HEARTBEAT chunk to
   build a probe packet. This enables probing without affecting the
   transfer of user messages and without interfering with congestion
   control. This is preferred to using DATA chunks (with padding as
   required) as path probes.

   XXX Author Note: Future versions of this document might define a
   parameter contained in the INIT and INIT ACK chunk to indicate the
   remote peer MTU to the local peer. However, multihoming makes this a
   bit complex, so it might not be worth doing. XXX

6.3.1.  SCTP/IPv4 and SCTP/IPv6

   The base protocol is specified in [RFC4960]. This provides an
   acknowledged PL. A sender can therefore enter the PROBE_BASE state
   as soon as connectivity has been confirmed.

6.3.1.1.  Sending SCTP Probe Packets

   Probe packets consist of an SCTP common header followed by a
   HEARTBEAT chunk and a PAD chunk.
   The PAD chunk is used to control the length of the probe packet. The
   HEARTBEAT chunk is used to trigger the sending of a HEARTBEAT ACK
   chunk. The reception of the HEARTBEAT ACK chunk acknowledges
   reception of a successful probe.

   The HEARTBEAT chunk carries a Heartbeat Information parameter which
   should include, besides the information suggested in [RFC4960], the
   probe size, which is the size of the complete datagram. The size of
   the PAD chunk is therefore computed by reducing the probing size by
   the IPv4 or IPv6 header size, the SCTP common header, the HEARTBEAT
   request and the PAD chunk header. The payload of the PAD chunk
   contains arbitrary data.

   To avoid fragmentation of retransmitted data, probing starts right
   after the handshake, before data is sent. Assuming normal behaviour
   (i.e., the PMTU is smaller than or equal to the interface MTU), this
   process will take a few round trip time periods depending on the
   number of PMTU sizes probed. The Heartbeat timer can be used to
   implement the PROBE_TIMER.

6.3.1.2.  Validating the Path with SCTP

   Since SCTP provides an acknowledged PL, a sender MUST NOT implement
   the CONFIRMATION_TIMER while in the SEARCH_COMPLETE state.

6.3.1.3.  PTB Message Handling by SCTP

   Normal ICMP validation MUST be performed as specified in Appendix C
   of [RFC4960]. This requires that the first 8 bytes of the SCTP
   common header are quoted in the payload of the PTB message, which can
   be the case for ICMPv4 and is normally the case for ICMPv6.

   When a PTB message has been validated, the PTB_SIZE reported in the
   PTB message SHOULD be used with the DPLPMTUD algorithm, providing
   that the reported PTB_SIZE is less than the current probe size.

6.3.2.  DPLPMTUD for SCTP/UDP

   The UDP encapsulation of SCTP is specified in [RFC6951].

6.3.2.1.  Sending SCTP/UDP Probe Packets

   Packet probing can be performed as specified in Section 6.3.1.1. The
   maximum payload is reduced by 8 bytes (the size of the UDP header),
   which has to be considered when filling the PAD chunk.

6.3.2.2.  Validating the Path with SCTP/UDP

   Since SCTP provides an acknowledged PL, a sender MUST NOT implement
   the CONFIRMATION_TIMER while in the SEARCH_COMPLETE state.

6.3.2.3.  Handling of PTB Messages by SCTP/UDP

   Normal ICMP validation MUST be performed for PTB messages as
   specified in Appendix C of [RFC4960]. This requires that the first 8
   bytes of the SCTP common header are contained in the PTB message,
   which can be the case for ICMPv4 (but note the UDP header also
   consumes a part of the quoted packet header) and is normally the case
   for ICMPv6. When the validation is completed, the PTB_SIZE indicated
   in the PTB message SHOULD be used with the DPLPMTUD algorithm,
   providing that the reported PTB_SIZE is less than the current probe
   size.

6.3.3.  DPLPMTUD for SCTP/DTLS

   The Datagram Transport Layer Security (DTLS) encapsulation of SCTP is
   specified in [RFC8261]. It is used for data channels in WebRTC
   implementations.

6.3.3.1.  Sending SCTP/DTLS Probe Packets

   Packet probing can be done as specified in Section 6.3.1.1.

6.3.3.2.  Validating the Path with SCTP/DTLS

   Since SCTP provides an acknowledged PL, a sender MUST NOT implement
   the CONFIRMATION_TIMER while in the SEARCH_COMPLETE state.

6.3.3.3.  Handling of PTB Messages by SCTP/DTLS

   It is not possible to perform normal ICMP validation as specified in
   [RFC4960], since even if the ICMP message payload contains sufficient
   information, the reflected SCTP common header would be encrypted.
   Therefore it is not possible to process PTB messages at the PL.

6.4.  DPLPMTUD for QUIC

   Quick UDP Internet Connection (QUIC) [I-D.ietf-quic-transport] is a
   UDP-based transport that provides reception feedback. The UDP
   payload includes the QUIC packet header, protected payload, and any
   authentication fields. QUIC depends on a PMTU of at least 1280
   bytes.

   Section 9.2 of [I-D.ietf-quic-transport] describes the path
   considerations when sending QUIC packets. It recommends the use of
   PADDING frames to build the probe packet. Pure probe-only packets
   are constructed with PADDING frames and PING frames to create a
   padding-only packet that will elicit an acknowledgment. Such
   padding-only packets enable probing without affecting the transfer of
   other QUIC frames.

   The recommendation for QUIC endpoints implementing DPLPMTUD is
   therefore that an MPS is maintained for each combination of local and
   remote IP addresses [I-D.ietf-quic-transport]. If a QUIC endpoint
   determines that the PMTU between any pair of local and remote IP
   addresses has fallen below an acceptable MPS, it needs to immediately
   cease sending QUIC packets on the affected path. This could result
   in termination of the connection if an alternative path cannot be
   found [I-D.ietf-quic-transport].

6.4.1.  Sending QUIC Probe Packets

   A probe packet consists of a QUIC Header and a payload containing
   PADDING Frames and a PING Frame. PADDING Frames are a single octet
   (0x00) and several of these can be used to create a probe packet of
   size PROBED_SIZE. QUIC provides an acknowledged PL. A sender can
   therefore enter the PROBE_BASE state as soon as connectivity has been
   confirmed.

   The current specification of QUIC sets the following:

   o  BASE_PMTU: 1200 bytes. A QUIC sender needs to pad initial packets
      to 1200 bytes to confirm the path can support packets of a useful
      size.

   o  MIN_PMTU: 1200 bytes.
      A QUIC sender that determines the PMTU has fallen below 1200
      bytes MUST immediately stop sending on the affected path.

6.4.2.  Validating the Path with QUIC

   QUIC provides an acknowledged PL. A sender therefore MUST NOT
   implement the CONFIRMATION_TIMER while in the SEARCH_COMPLETE state.

6.4.3.  Handling of PTB Messages by QUIC

   QUIC operates over the UDP transport, and the guidelines on ICMP
   validation as specified in Section 5.2 of [RFC8085] therefore apply.
   In addition to UDP Port validation, QUIC can validate an ICMP message
   by looking for valid Connection IDs in the quoted packet.

7.  Acknowledgements

   This work was partially funded by the European Union's Horizon 2020
   research and innovation programme under grant agreement No. 644334
   (NEAT). The views expressed are solely those of the author(s).

8.  IANA Considerations

   This memo includes no request to IANA.

   XXX If new UDP Options are specified in this document, a request to
   IANA will be included here. XXX

   If there are no requirements for IANA, the section will be removed
   during conversion into an RFC by the RFC Editor.

9.  Security Considerations

   The security considerations for the use of UDP and SCTP are provided
   in the referenced RFCs. Security guidance for applications using UDP
   is provided in the UDP Usage Guidelines [RFC8085]. Specifically, the
   generation of probe packets is regarded as a "Low Data-Volume
   Application", described in Section 3.1.3 of that document. This
   recommends that a sender limits generation of probe packets to an
   average rate lower than one probe per 3 seconds.

   A PL sender needs to ensure that the method used to confirm reception
   of probe packets offers protection from off-path attackers injecting
   packets into the path.
   This protection is provided in IETF-defined protocols (e.g., TCP,
   SCTP) using a randomly-initialised sequence number. A description of
   one way to do this when using UDP is provided in Section 5.1 of
   [RFC8085].

   There are cases where ICMP Packet Too Big (PTB) messages are not
   delivered due to policy, configuration or equipment design (see
   Section 1.1). This method therefore does not rely upon PTB messages
   being received, but is able to utilise these when they are received
   by the sender. PTB messages could potentially be used to cause a
   node to inappropriately reduce the PLPMTU. A node supporting
   DPLPMTUD MUST therefore appropriately validate the payload of PTB
   messages to ensure these are received in response to transmitted
   traffic (i.e., a reported error condition that corresponds to a
   datagram actually sent by the path layer, see Section 4.4.1).

   An on-path attacker able to create a PTB message could forge PTB
   messages that include a valid quoted IP packet. Such an attack could
   be used to drive down the PLPMTU. This method mitigates such attacks
   in two ways. First, a PL sender never reduces the PLPMTU below the
   base size solely in response to receiving a PTB message. This is
   achieved by first entering the PROBE_BASE state when such a message
   is received. Second, the design does not require processing of PTB
   messages; a PL sender could therefore suspend processing of PTB
   messages (e.g., in a robustness mode after detecting that subsequent
   probes actually confirm that a size larger than the PTB_SIZE is
   supported by a path).

   Parallel forwarding paths SHOULD be considered. Section 5.2.5.1
   identifies the need for robustness in the method when the path
   information may be inconsistent.
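
   The PTB handling rules discussed earlier in this section can be
   sketched informally in Python. This is a minimal sketch under stated
   assumptions rather than a definitive implementation: the names
   handle_ptb and PtbDecision, the quoted_packet_matches flag (assumed
   to be the result of the ICMP validation already described) and the
   BASE_PMTU value of 1200 are hypothetical. The sketch returns a
   decision about what to probe next and never assigns the PLPMTU from
   a PTB message.

```python
from dataclasses import dataclass

BASE_PMTU = 1200  # assumed base size for illustration; PL-specific


@dataclass
class PtbDecision:
    action: str      # "ignore", "probe_base" or "probe_size"
    probe_size: int  # size of the next probe packet; 0 when ignored


def handle_ptb(ptb_size: int, plpmtu: int, probed_size: int,
               quoted_packet_matches: bool,
               base_pmtu: int = BASE_PMTU) -> PtbDecision:
    """Decide how a received PTB message feeds the search algorithm."""
    if not quoted_packet_matches:
        # Not a response to traffic this sender transmitted.
        return PtbDecision("ignore", 0)
    if ptb_size >= probed_size:
        # The reported size must be less than the current probed size.
        return PtbDecision("ignore", 0)
    if ptb_size < plpmtu:
        # Do not lower the PLPMTU directly: re-confirm the base size.
        return PtbDecision("probe_base", base_pmtu)
    # PTB_SIZE lies between the PLPMTU and the probe: probe that size.
    return PtbDecision("probe_size", ptb_size)
```

   For example, with a PLPMTU of 1300 and a probe of 1500 bytes, a
   validated PTB message reporting 1400 leads to a new probe of 1400
   bytes, while one reporting 1000 leads back to probing the base size.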

   A node performing DPLPMTUD could experience conflicting information
   about the size of supported probe packets. This could occur when
   multiple paths are concurrently in use and these exhibit a different
   PMTU. If not considered, this could result in data being black holed
   when the PLPMTU is larger than the smallest PMTU across the current
   paths.

10.  References

10.1.  Normative References

   [I-D.ietf-quic-transport]
              Iyengar, J. and M. Thomson, "QUIC: A UDP-Based Multiplexed
              and Secure Transport", draft-ietf-quic-transport-16 (work
              in progress), October 2018.

   [I-D.ietf-tsvwg-udp-options]
              Touch, J., "Transport Options for UDP", draft-ietf-tsvwg-
              udp-options-05 (work in progress), July 2018.

   [RFC0768]  Postel, J., "User Datagram Protocol", STD 6, RFC 768,
              DOI 10.17487/RFC0768, August 1980.

   [RFC1191]  Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191,
              DOI 10.17487/RFC1191, November 1990.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC2460]  Deering, S. and R. Hinden, "Internet Protocol, Version 6
              (IPv6) Specification", RFC 2460, DOI 10.17487/RFC2460,
              December 1998.

   [RFC3828]  Larzon, L-A., Degermark, M., Pink, S., Jonsson, L-E., Ed.,
              and G. Fairhurst, Ed., "The Lightweight User Datagram
              Protocol (UDP-Lite)", RFC 3828, DOI 10.17487/RFC3828, July
              2004.

   [RFC4820]  Tuexen, M., Stewart, R., and P. Lei, "Padding Chunk and
              Parameter for the Stream Control Transmission Protocol
              (SCTP)", RFC 4820, DOI 10.17487/RFC4820, March 2007.

   [RFC4960]  Stewart, R., Ed., "Stream Control Transmission Protocol",
              RFC 4960, DOI 10.17487/RFC4960, September 2007.

   [RFC6951]  Tuexen, M. and R. Stewart, "UDP Encapsulation of Stream
              Control Transmission Protocol (SCTP) Packets for End-Host
              to End-Host Communication", RFC 6951,
              DOI 10.17487/RFC6951, May 2013.

   [RFC8085]  Eggert, L., Fairhurst, G., and G. Shepherd, "UDP Usage
              Guidelines", BCP 145, RFC 8085, DOI 10.17487/RFC8085,
              March 2017.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017.

   [RFC8201]  McCann, J., Deering, S., Mogul, J., and R. Hinden, Ed.,
              "Path MTU Discovery for IP version 6", STD 87, RFC 8201,
              DOI 10.17487/RFC8201, July 2017.

   [RFC8261]  Tuexen, M., Stewart, R., Jesup, R., and S. Loreto,
              "Datagram Transport Layer Security (DTLS) Encapsulation of
              SCTP Packets", RFC 8261, DOI 10.17487/RFC8261, November
              2017.

10.2.  Informative References

   [I-D.ietf-intarea-tunnels]
              Touch, J. and M. Townsley, "IP Tunnels in the Internet
              Architecture", draft-ietf-intarea-tunnels-09 (work in
              progress), July 2018.

   [RFC0792]  Postel, J., "Internet Control Message Protocol", STD 5,
              RFC 792, DOI 10.17487/RFC0792, September 1981.

   [RFC1122]  Braden, R., Ed., "Requirements for Internet Hosts -
              Communication Layers", STD 3, RFC 1122,
              DOI 10.17487/RFC1122, October 1989.

   [RFC1812]  Baker, F., Ed., "Requirements for IP Version 4 Routers",
              RFC 1812, DOI 10.17487/RFC1812, June 1995.

   [RFC2923]  Lahey, K., "TCP Problems with Path MTU Discovery",
              RFC 2923, DOI 10.17487/RFC2923, September 2000.

   [RFC4340]  Kohler, E., Handley, M., and S. Floyd, "Datagram
              Congestion Control Protocol (DCCP)", RFC 4340,
              DOI 10.17487/RFC4340, March 2006.

   [RFC4443]  Conta, A., Deering, S., and M. Gupta, Ed., "Internet
              Control Message Protocol (ICMPv6) for the Internet
              Protocol Version 6 (IPv6) Specification", STD 89,
              RFC 4443, DOI 10.17487/RFC4443, March 2006.

   [RFC4821]  Mathis, M. and J. Heffner, "Packetization Layer Path MTU
              Discovery", RFC 4821, DOI 10.17487/RFC4821, March 2007.

   [RFC4890]  Davies, E. and J. Mohacsi, "Recommendations for Filtering
              ICMPv6 Messages in Firewalls", RFC 4890,
              DOI 10.17487/RFC4890, May 2007.

Appendix A.  Event-driven state changes

   This appendix contains an informative description of key events:

   Path Setup:  When a new path is initiated, the state is set to
      PROBE_START. This sends a probe packet with the size of the
      BASE_PMTU. As soon as the path is confirmed, the state changes to
      PROBE_SEARCH.

   Arrival of an Acknowledgment:  Depending on the probing state, the
      reaction differs according to Figure 7, which is a simplification
      of Figure 4 focusing on this event.

      [State-transition diagram: the left-hand states PROBE_START,
      PROBE_ERROR, PROBE_BASE, PROBE_SEARCH and SEARCH_COMPLETE are
      joined by arrows, labelled with the conditions below, to the
      right-hand states PROBE_DISABLED, PROBE_BASE, PROBE_ERROR,
      PROBE_SEARCH, SEARCH_COMPLETE and PROBE_BASE.]

      Condition 1: The maximum PMTU size has not yet been reached.
      Condition 2: The maximum PMTU size has been reached.
      Condition 3: Probe Timer expires and PROBE_COUNT = MAX_PROBES.
      Condition 4: PROBE_ACK received.
      Condition 5: Black hole detected.

      Figure 7: State changes at the arrival of an acknowledgment

   Probing timeout:  The PROBE_COUNT is initialised to zero each time
      the value of PROBED_SIZE is changed and when an acknowledgment
      confirms delivery of a probe packet. The PROBE_TIMER is started
      each time a probe packet is sent. It is stopped when an
      acknowledgment arrives that confirms delivery of a probe packet of
      PROBED_SIZE. If the probe packet is not acknowledged before the
      PROBE_TIMER expires, the PROBE_COUNT is incremented. When the
      PROBE_COUNT equals the value MAX_PROBES, the state is changed,
      otherwise a new probe packet of the same size (PROBED_SIZE) is
      resent.
      The state transitions are illustrated in Figure 8. This shows a
      simplification of Figure 4 with a focus only on this event.

   +--------------+                                  +----------------+
   | PROBE_START  | --2----------------------------> | PROBE_DISABLED |
   +--------------+                                  +----------------+

   +--------------+                                  +--------------+
   | PROBE_ERROR  |                 ---------------> | PROBE_ERROR  |
   +--------------+                /                 +--------------+
                                  /
   +--------------+ --2----------/                   +--------------+
   | PROBE_BASE   | --1----------------------------> | PROBE_BASE   |
   +--------------+                                  +--------------+

   +--------------+                                  +--------------+
   | PROBE_SEARCH | --1----------------------------> | PROBE_SEARCH |
   +--------------+ --2---------                     +--------------+
                                \
   +---------------+             \                   +---------------+
   |SEARCH_COMPLETE|              -----------------> |SEARCH_COMPLETE|
   +---------------+                                 +---------------+

      Condition 1: The maximum number of probe packets has not been
      reached.
      Condition 2: The maximum number of probe packets has been
      reached.

      XXX This diagram has not been validated.

      Figure 8: State changes at the expiration of the probe timer

   PMTU raise timer timeout:  DPLPMTUD periodically sends a probe packet
      to detect whether a larger PMTU is possible. The sending of this
      probe packet is triggered by the PMTU_RAISE_TIMER.

   Arrival of a PTB message:  The active probing of the path can be
      supported by the arrival of a PTB message indicating the PTB_SIZE.
      Two examples are:

      1.  The PTB_SIZE is between the PLPMTU and the probe that
          triggered the PTB message.

      2.  The PTB_SIZE is smaller than the PLPMTU.

      In the first case, the PROBE_BASE state transitions to the
      PROBE_ERROR state. In the PROBE_SEARCH state, a new probe packet
      is sent with the size reported by the PTB message.

      In the second case, the probing starts again with a value of
      PROBE_BASE.

Appendix B.  Revision Notes

   Note to RFC-Editor: please remove this entire section prior to
   publication.

   Individual draft -00:

   o  Comments and corrections are welcome directly to the authors or
      via the IETF TSVWG working group mailing list.

   o  This update is proposed for WG comments.

   Individual draft -01:

   o  Contains the first representation of the algorithm, showing the
      states and timers.

   o  This update is proposed for WG comments.

   Individual draft -02:

   o  Contains updated representation of the algorithm, and textual
      corrections.

   o  The text describing when to set the effective PMTU has not yet
      been validated by the authors.

   o  To determine security to off-path-attacks: We need to decide
      whether a received PTB message SHOULD/MUST be validated. The text
      on how to handle a PTB message indicating a link MTU larger than
      the probe has not yet been validated by the authors.

   o  No text currently describes how to handle inconsistent results
      from arbitrary re-routing along different parallel paths.

   o  This update is proposed for WG comments.

   Working Group draft -00:

   o  This draft follows a successful adoption call for TSVWG.

   o  There is still work to complete, please comment on this draft.

   Working Group draft -01:

   o  This draft includes improved introduction.

   o  The draft is updated to require ICMP validation prior to accepting
      PTB messages - this to be confirmed by WG.

   o  Section added to discuss Selection of Probe Size - methods to be
      evaluated and recommendations to be considered.

   o  Section added to align with work proposed in the QUIC WG.

   Working Group draft -02:

   o  The draft was updated based on feedback from the WG, and a
      detailed review by Magnus Westerlund.

   o  The document updates RFC 4821.

   o  Requirements list updated.

   o  Added more explicit discussion of a simpler black-hole detection
      mode.

   o  This draft includes reorganisation of the section on IETF
      protocols.

   o  Added more discussion of implementation within an application.

   o  Added text on flapping paths.

   o  Replaced 'effective MTU' with new term PLPMTU.

   Working Group draft -03:

   o  Updated figures.

   o  Added more discussion on blackhole detection.

   o  Added figure describing just blackhole detection.

   o  Added figure relating MPS sizes.

   Working Group draft -04:

   o  Described phases and named these consistently.

   o  Corrected transition from confirmation directly to the search
      phase (Base has been checked).

   o  Redrawn state diagrams.

   o  Renamed BASE_MTU to BASE_PMTU (because it is a base for the PMTU).

   o  Clarified Error state.

   o  Clarified suspending DPLPMTUD.

   o  Verified normative text in requirements section.

   o  Removed duplicate text.

   o  Changed all text to refer to /packet probe/probe packet/
      /validation/verification/ added term /Probe Confirmation/ and
      clarified BlackHole detection.

   Working Group draft -05:

   o  Updated security considerations.

   o  Feedback after speaking with Joe Touch helped improve UDP-Options
      description.

   Working Group draft -06:

   o  Updated description of ICMP issues in section 1.1.

   o  Update to description of QUIC.

   Working Group draft -07:

   o  Moved description of the PTB processing method from the PTB
      requirements section.

   o  Clarified what is performed in the PTB validation check.

   o  Updated security consideration to explain PTB security without
      needing to read the rest of the document.

   o  Reformatted state machine diagram.

Authors' Addresses

   Godred Fairhurst
   University of Aberdeen
   School of Engineering
   Fraser Noble Building
   Aberdeen  AB24 3UE
   UK

   Email: gorry@erg.abdn.ac.uk

   Tom Jones
   University of Aberdeen
   School of Engineering
   Fraser Noble Building
   Aberdeen  AB24 3UE
   UK

   Email: tom@erg.abdn.ac.uk

   Michael Tuexen
   Muenster University of Applied Sciences
   Stegerwaldstrasse 39
   Steinfurt  48565
   DE

   Email: tuexen@fh-muenster.de

   Irene Ruengeler
   Muenster University of Applied Sciences
   Stegerwaldstrasse 39
   Steinfurt  48565
   DE

   Email: i.ruengeler@fh-muenster.de

   Timo Voelker
   Muenster University of Applied Sciences
   Stegerwaldstrasse 39
   Steinfurt  48565
   DE

   Email: timo.voelker@fh-muenster.de