Internet Engineering Task Force                             G. Fairhurst
Internet-Draft                                                  T. Jones
Updates: 4821 (if approved)                       University of Aberdeen
Intended status: Standards Track                               M. Tuexen
Expires: March 9, 2019                                      I. Ruengeler
                                 Muenster University of Applied Sciences
                                                       September 5, 2018


     Packetization Layer Path MTU Discovery for Datagram Transports
                  draft-ietf-tsvwg-datagram-plpmtud-04

Abstract

   This document describes a robust method for Path MTU Discovery
   (PMTUD) for datagram Packetization Layers (PLs). The document
   describes an extension to RFC 1191 and RFC 8201, which specify
   ICMP-based Path MTU Discovery for IPv4 and IPv6. The method allows a
   PL, or a datagram application that uses a PL, to discover whether a
   network path can support the current size of datagram. This can be
   used to detect and reduce the message size when a sender encounters a
   network black hole (where packets are discarded, and no ICMP message
   is received). The method can also probe a network path with
   progressively larger packets to find whether the maximum packet size
   can be increased.
   This allows a sender to determine an appropriate packet size,
   providing functionality for datagram transports that is equivalent to
   the Packetization Layer PMTUD specification for TCP, specified in
   RFC 4821.

   The document also provides implementation notes for incorporating
   Datagram PMTUD into IETF datagram transports or applications that use
   datagram transports.

   When published, this specification updates RFC 4821 when used with
   datagram transports.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on March 9, 2019.

Copyright Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Classical Path MTU Discovery
     1.2.  Packetization Layer Path MTU Discovery
     1.3.  Path MTU Discovery for Datagram Services
   2.  Terminology
   3.  Features Required to Provide Datagram PLPMTUD
   4.  DPLPMTUD Mechanisms
     4.1.  PLPMTU Probe Packets
     4.2.  Confirmation of Probed Packet Size
     4.3.  Detection of Black Holes
     4.4.  Response to PTB Messages
       4.4.1.  Validation of PTB Messages
       4.4.2.  Use of PTB Messages
   5.  Datagram Packetization Layer PMTUD
     5.1.  DPLPMTUD Components
       5.1.1.  Timers
       5.1.2.  Constants
       5.1.3.  Variables
     5.2.  DPLPMTUD Phases
       5.2.1.  Path Confirmation Phase
       5.2.2.  Search Phase
         5.2.2.1.  Resilience to inconsistent path information
       5.2.3.  Search Complete Phase
       5.2.4.  PROBE_BASE Phase
       5.2.5.  ERROR Phase
         5.2.5.1.  Robustness to inconsistent path
       5.2.6.  DISABLED Phase
     5.3.  State Machine
     5.4.  Search to Increase the PLPMTU
       5.4.1.  Probing for a larger PLPMTU
       5.4.2.  Selection of Probe Sizes
       5.4.3.  Resilience to inconsistent Path information
   6.  Specification of Protocol-Specific Methods
     6.1.  Application support for DPLPMTUD with UDP or UDP-Lite
       6.1.1.  Application Request
       6.1.2.  Application Response
       6.1.3.  Sending Application Probe Packets
       6.1.4.  Validating the Path
       6.1.5.  Handling of PTB Messages
     6.2.  DPLPMTUD with UDP Options
       6.2.1.  UDP Probe Request Option
       6.2.2.  UDP Probe Response Option
     6.3.  DPLPMTUD for SCTP
       6.3.1.  SCTP/IPv4 and SCTP/IPv6
         6.3.1.1.  Sending SCTP Probe Packets
         6.3.1.2.  Validating the Path with SCTP
         6.3.1.3.  PTB Message Handling by SCTP
       6.3.2.  DPLPMTUD for SCTP/UDP
         6.3.2.1.  Sending SCTP/UDP Probe Packets
         6.3.2.2.  Validating the Path with SCTP/UDP
         6.3.2.3.  Handling of PTB Messages by SCTP/UDP
       6.3.3.  DPLPMTUD for SCTP/DTLS
         6.3.3.1.  Sending SCTP/DTLS Probe Packets
         6.3.3.2.  Validating the Path with SCTP/DTLS
         6.3.3.3.  Handling of PTB Messages by SCTP/DTLS
     6.4.  DPLPMTUD for QUIC
       6.4.1.  Sending QUIC Probe Packets
       6.4.2.  Validating the Path with QUIC
       6.4.3.  Handling of PTB Messages by QUIC
   7.  Acknowledgements
   8.  IANA Considerations
   9.  Security Considerations
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Appendix A.  Event-driven state changes
   Appendix B.  Revision Notes
   Authors' Addresses

1.  Introduction

   The IETF has specified datagram transport using UDP, SCTP, and DCCP,
   as well as protocols layered on top of these transports (e.g.,
   SCTP/UDP, DCCP/UDP, QUIC/UDP), and direct datagram transport over the
   IP network layer. This document describes a robust method for Path
   MTU Discovery (PMTUD) that may be used with these transport protocols
   (or the applications that use their transport service) to discover an
   appropriate size of packet to use across an Internet path.

   This specification clarifies the PLPMTUD method for SCTP described in
   Section 10.2 of [RFC4821] by specifying the procedure in Section 6.3
   of this document.

1.1.  Classical Path MTU Discovery

   Classical Path Maximum Transmission Unit Discovery (PMTUD) can be
   used with any transport that is able to process ICMP Packet Too Big
   (PTB) messages (e.g., [RFC1191] and [RFC8201]). The term PTB message
   is applied to both IPv4 ICMP Unreachable messages (Type 3) that carry
   the error Fragmentation Needed (Type 3, Code 4) and ICMPv6 Packet Too
   Big messages (Type 2). When a sender receives a PTB message, it
   reduces the effective MTU to the value reported in the PTB message
   (in this document called the PTB_SIZE). A method increases the packet
   size from time to time in an attempt to discover an increase in the
   supported PMTU.
   The packets sent with a size larger than the current effective PMTU
   are known as probe packets.

   Packets not intended as probe packets are either fragmented to the
   current effective PMTU, or an attempt to send a packet larger than
   the current effective PMTU fails with an error code. Applications are
   sometimes provided with a primitive to let them read the maximum
   packet size, derived from the current effective PMTU.

   Classical PMTUD is subject to protocol failures. One failure arises
   when traffic using a packet size larger than the actual PMTU is black
   holed (all datagrams sent with this size, or larger, are silently
   discarded without the sender receiving ICMP PTB messages). This
   could arise when the PTB messages are not delivered back to the
   sender for some reason [RFC2923]. For example, ICMP messages are
   increasingly filtered by middleboxes (including firewalls) [RFC4890].
   A stateful firewall could be configured with a policy to block
   incoming ICMP messages, which would prevent reception of PTB messages
   by endpoints behind this firewall. Other examples include cases
   where PTB messages are not correctly processed/generated by tunnel
   endpoints.

   Another failure could result if a node that is not on the network
   path sends a PTB message that attempts to force the sender to change
   the effective PMTU [RFC8201]. A sender can protect itself from
   reacting to such messages by utilising the quoted packet within a PTB
   message payload to validate that the received PTB message was
   generated in response to a packet that had actually originated from
   the sender. However, there are situations where a sender would be
   unable to provide this validation.

   Examples where validation of the PTB message is not possible include:

   o  When the router issuing the ICMP message is acting on a tunneled
      packet, the ICMP message will be directed to the tunnel endpoint.
      This tunnel endpoint is responsible for forwarding the ICMP
      message and also processing the quoted packet within the payload
      field to remove the effect of the tunnel, and return a correctly
      formatted ICMP message to the sender. Failure to do appropriate
      processing therefore results in black-holing.

   o  When a router issuing the ICMP message implements RFC 792
      [RFC0792], it is only required to include (quote) the first 64
      bits of the IP payload of the packet within the ICMP payload.
      This could be insufficient to perform the tunnel processing
      described in the previous bullet. Even if the decapsulated
      message is processed by the tunnel endpoint, there could be
      insufficient bytes remaining for the sender to interpret the
      quoted transport information. RFC 1812 [RFC1812] requires routers
      to return the full packet if possible. This can result in
      black-holing when the path includes tunnels.

   o  When a router issuing the ICMP message quotes a packet with an
      encrypted transport, it may lack sufficient context to determine
      the original transport header.

   o  Even when the PTB message includes sufficient bytes of the quoted
      packet, the network layer could lack sufficient context to
      validate the ICMP message, because this depends on information
      about the active transport flows at an endpoint node (e.g., the
      socket/address pairs being used, and other protocol header
      information).

1.2.  Packetization Layer Path MTU Discovery

   The term Packetization Layer (PL) has been introduced to describe the
   layer that is responsible for placing data blocks into the payload of
   IP packets and selecting an appropriate Maximum Packet Size (MPS).
   This function is often performed by a transport protocol, but can
   also be performed by other encapsulation methods working above the
   transport layer.
   In contrast to PMTUD, Packetization Layer Path MTU Discovery
   (PLPMTUD) [RFC4821] does not rely upon reception and validation of
   PTB messages. It is therefore more robust than Classical PMTUD.

   This has become the recommended approach for implementing PMTU
   discovery with TCP.

   It uses a general strategy where the PL sends probe packets to search
   for the largest size of unfragmented datagram that can be sent over a
   network path. The probe packets are sent with a progressively larger
   packet size. If a probe packet is successfully delivered (as
   determined by the PL), then the PLPMTU is raised to the size of the
   successful probe. If no response is received to a probe packet, the
   method reduces the probe size. This PLPMTU is used to set the
   application MPS.

   PLPMTUD introduces flexibility in the implementation of PMTU
   discovery. At one extreme, it can be configured to only perform PTB
   black hole detection and recovery to increase the robustness of
   Classical PMTUD, or at the other extreme, all PTB processing can be
   disabled and PLPMTUD can completely replace Classical PMTUD.

   PLPMTUD can also include additional consistency checks without
   increasing the risk of black-holing. For instance, the information
   available at the PL, or higher layers, makes PTB validation more
   straightforward.

1.3.  Path MTU Discovery for Datagram Services

   Section 5 of this document presents a set of algorithms for datagram
   protocols to discover the largest size of unfragmented datagram that
   can be sent over a network path. The method described relies on
   features of the PL described in Section 3 and applies to transport
   protocols operating over IPv4 and IPv6. It does not require
   cooperation from the lower layers, although it can utilise ICMP PTB
   messages when these received messages are made available to the PL.
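
   As an informal illustration of the probe-based strategy (raise the
   PLPMTU when a probe is confirmed, reduce the probe size when it is
   not), the following Python sketch searches between a base size that
   is assumed to work and a ceiling such as the local Link MTU. It is
   not the algorithm specified in Section 5: the bisection step, the
   function name search_plpmtu, and the probe_ok callback are
   assumptions made only for this example.

```python
from typing import Callable


def search_plpmtu(probe_ok: Callable[[int], bool], base: int, ceiling: int) -> int:
    """Return the largest confirmed packet size in [base, ceiling].

    probe_ok(size) is a hypothetical callback that sends one probe packet
    of `size` bytes and reports whether the remote PL confirmed reception.
    `base` is assumed to be a size that the path already supports.
    """
    plpmtu = base            # largest size confirmed so far
    too_big = ceiling + 1    # smallest size assumed not to be supported
    while too_big - plpmtu > 1:
        probe = (plpmtu + too_big) // 2
        if probe_ok(probe):
            plpmtu = probe   # probe confirmed: raise the PLPMTU
        else:
            too_big = probe  # no response: try a smaller probe next
    return plpmtu
```

   For example, with a simulated path that delivers packets of up to
   1400 bytes, search_plpmtu(lambda size: size <= 1400, 1200, 1500)
   settles on 1400. A real implementation also has to cope with probe
   loss caused by congestion or link errors rather than by packet size.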

   The UDP Usage Guidelines [RFC8085] state "an application SHOULD
   either use the Path MTU information provided by the IP layer or
   implement Path MTU Discovery (PMTUD)", but do not provide a
   mechanism for discovering the largest size of unfragmented datagram
   that can be used on a network path. Prior to this document, PLPMTUD
   had not been specified for UDP.

   Section 10.2 of [RFC4821] recommends a PLPMTUD probing method for the
   Stream Control Transmission Protocol (SCTP). SCTP utilises heartbeat
   messages as probe packets, but [RFC4821] does not provide a complete
   specification. The present document provides the details to complete
   that specification.

   The Datagram Congestion Control Protocol (DCCP) [RFC4340] requires
   implementations to support Classical PMTUD and states that a DCCP
   sender "MUST maintain the MPS allowed for each active DCCP session".
   It also defines the current congestion control MPS (CCMPS) supported
   by a network path. The DCCP specification recommends use of PMTUD,
   and suggests use of control packets (DCCP-Sync) as path probe
   packets, because they do not risk application data loss. The method
   defined in this specification could be used with DCCP.

   Section 6 specifies the method for a set of transports, and provides
   information to enable the implementation of PLPMTUD with other
   datagram transports and applications that use datagram transports.

2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

   Other terminology is directly copied from [RFC4821], and the
   definitions in [RFC1122].

   Actual PMTU:  The Actual PMTU is the PMTU of a network path between a
      sender PL and a destination PL, which the DPLPMTUD algorithm seeks
      to determine.

   Black Holed:  Packets are Black Holed when the sender is unaware that
      packets are not delivered to the destination endpoint (e.g., when
      the sender transmits packets of a particular size with a
      previously known effective PMTU and they are silently discarded by
      the network, but the sender is not made aware by ICMP messages of
      a change to the path that resulted in a smaller PLPMTU).

   Classical Path MTU Discovery:  Classical PMTUD is a process described
      in [RFC1191] and [RFC8201], in which nodes rely on PTB messages to
      learn the largest size of unfragmented datagram that can be used
      across a network path.

   Datagram:  A datagram is a transport-layer protocol data unit,
      transmitted in the payload of an IP packet.

   Effective PMTU:  The Effective PMTU is the current estimated value
      for PMTU that is used by PMTUD. This is equivalent to the
      PLPMTU derived by PLPMTUD.

   EMTU_S:  The Effective MTU for sending (EMTU_S) is defined in
      [RFC1122] as "the maximum IP datagram size that may be sent, for a
      particular combination of IP source and destination addresses...".

   EMTU_R:  The Effective MTU for receiving (EMTU_R) is the name given
      in [RFC1122] to the largest datagram size that can be reassembled
      ("Effective MTU to receive").

   Link:  A Link is a communication facility or medium over which nodes
      can communicate at the link layer, i.e., a layer below the IP
      layer. Examples are Ethernet LANs and Internet (or higher) layer
      tunnels.

   Link MTU:  The Link Maximum Transmission Unit (MTU) is the size in
      bytes of the largest IP packet, including the IP header and
      payload, that can be transmitted over a link. Note that this
      could more properly be called the IP MTU, to be consistent with
      how other standards organizations use the acronym. This includes
      the IP header, but excludes link layer headers and other framing
      that is not part of IP or the IP payload.
      Other standards organizations generally define the link MTU to
      include the link layer headers.

   MPS:  The Maximum Packet Size (MPS) is the largest size of
      application data block that can be sent across a network path. In
      DPLPMTUD this quantity is derived from the PLPMTU by taking into
      consideration the size of the lower protocol layer headers.

   MIN_PMTU:  The MIN_PMTU is the smallest size of PLPMTU that DPLPMTUD
      will attempt to use.

   Packet:  A Packet is the IP header plus the IP payload.

   Packetization Layer (PL):  The Packetization Layer (PL) is the layer
      of the network stack that places data into packets and performs
      transport protocol functions.

   Path:  The Path is the set of links and routers traversed by a packet
      between a source node and a destination node by a particular flow.

   Path MTU (PMTU):  The Path MTU (PMTU) is the minimum of the Link MTU
      of all the links forming a network path between a source node and
      a destination node.

   PTB_SIZE:  The PTB_SIZE is a value reported in a validated PTB
      message that indicates the next-hop Link MTU of a router along
      the path.

   PLPMTU:  The Packetization Layer PMTU is an estimate of the actual
      PMTU provided by the DPLPMTUD algorithm.

   PLPMTUD:  Packetization Layer Path MTU Discovery (PLPMTUD) is the
      method described in this document for datagram PLs, which is an
      extension to Classical PMTU Discovery.

   Probe packet:  A probe packet is a datagram sent with a purposely
      chosen size (typically the current PLPMTU or larger) to detect if
      packets of this size can be successfully sent end-to-end across
      the network path.

3.  Features Required to Provide Datagram PLPMTUD

   TCP PLPMTUD has been defined using standard TCP protocol mechanisms.
   All of the requirements in [RFC4821] also apply to the use of the
   technique with a datagram PL. Unlike TCP, some datagram PLs require
   additional mechanisms to implement PLPMTUD.
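
   The relationship between the Link MTU, PMTU, PLPMTU and MPS terms
   defined in Section 2 can be illustrated with a short Python sketch.
   The function names are hypothetical and the header sizes assume an
   IPv4 header without options, an IPv6 header without extension
   headers, and a plain UDP header; a PL that adds its own header or
   uses IP options would need to subtract those as well.

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options (assumed)
IPV6_HEADER = 40  # bytes, IPv6 header without extension headers (assumed)
UDP_HEADER = 8    # bytes


def path_mtu(link_mtus):
    """PMTU: the minimum Link MTU of all links forming the path."""
    return min(link_mtus)


def max_packet_size(plpmtu, ip_header, pl_header):
    """MPS: the PLPMTU less the lower protocol layer headers."""
    mps = plpmtu - ip_header - pl_header
    if mps <= 0:
        raise ValueError("PLPMTU is too small for the protocol headers")
    return mps


# A path whose links have MTUs of 1500, 1480 and 9000 bytes.
pmtu = path_mtu([1500, 1480, 9000])
mps_udp_ipv6 = max_packet_size(pmtu, IPV6_HEADER, UDP_HEADER)
```

   In this example the PMTU is 1480 bytes and, if the PLPMTU equals the
   PMTU, a UDP application using IPv6 could send data blocks of up to
   1432 bytes.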

   There are eight requirements for performing the datagram PLPMTUD
   method described in this specification:

   1.  PMTU parameters: A DPLPMTUD sender is RECOMMENDED to provide
       information about the maximum size of packet that can be
       transmitted by the sender on the local link (the local Link MTU).
       It MAY utilize similar information about the receiver when this
       is supplied (note this could be less than EMTU_R). This avoids
       implementations trying to send probe packets that cannot be
       transmitted by the local link. Too high a value could reduce
       the efficiency of the search algorithm. Some applications also
       have a maximum transport protocol data unit (PDU) size, in which
       case there is no benefit from probing for a size larger than this
       (unless a transport allows multiplexing multiple application
       PDUs into the same datagram).

   2.  PLPMTU: A datagram application is REQUIRED to be able to choose
       the size of datagrams sent to the network, up to the PLPMTU, or a
       smaller value (such as the MPS) derived from this. This value is
       managed by the DPLPMTUD method. The PLPMTU (specified as the
       effective PMTU in Section 1 of [RFC1191]) is equivalent to the
       EMTU_S (specified in [RFC1122]).

   3.  Probe packets: On request, a DPLPMTUD sender is REQUIRED to be
       able to transmit a packet larger than the PLPMTU. This is used
       to send a probe packet. In IPv4, a probe packet MUST be sent
       with the Don't Fragment (DF) bit set in the IP header, and
       without network layer endpoint fragmentation. In IPv6, a probe
       packet is always sent without source fragmentation (as specified
       in Section 5.4 of [RFC8201]).

   4.  Processing PTB messages: A DPLPMTUD sender MAY utilize PTB
       messages received from the network layer to help identify
       when a network path does not support the current size of probe
       packet.
       Any received PTB message MUST be validated before it is
       used to update the PLPMTU discovery information [RFC8201]. This
       validation confirms that the PTB message was sent in response to
       a packet originated by the sender, and needs to be performed
       before the PLPMTU discovery method reacts to the PTB message.
       When the PTB_SIZE is indicated in the PTB message, this MAY be
       used by DPLPMTUD to reduce the probe size but MUST NOT be used to
       increase the PLPMTU ([RFC8201]). This validation SHOULD utilise
       information that cannot be simply determined by an off-path
       attacker, for example, by checking the value of a protocol header
       field known only to the two PL endpoints. (Some datagram
       applications use well-known source and destination ports and
       therefore this check needs to rely on other information.)

   5.  Reception feedback: The destination PL endpoint is REQUIRED to
       provide a feedback method that indicates to the DPLPMTUD sender
       when a probe packet has been received by the destination PL
       endpoint. The mechanism needs to be robust to the possibility
       that packets could be significantly delayed along a network path.
       The local PL endpoint at the sending node is REQUIRED to pass
       this feedback to the sender-side DPLPMTUD method.

   6.  Probing and congestion control: The isolated loss of a probe
       packet SHOULD NOT be treated as an indication of congestion and
       its loss SHOULD NOT directly trigger a congestion control
       reaction [RFC4821].

   7.  Probe loss recovery: If the data block carried by a probe packet
       needs to be sent reliably, the PL (or layers above) is REQUIRED
       to arrange any retransmission/repair of any resulting loss. This
       method is REQUIRED to be robust in the case where probe packets
       are lost due to other reasons (including link transmission error,
       congestion).
       The DPLPMTUD sender treats isolated loss of a probe
       packet (with or without a PTB message) as a potential indication
       of a PMTU limit for the path, but not as an indication of
       congestion (see item 6).

   8.  Shared PLPMTU state: The PLPMTU value could also be stored with
       the corresponding entry in the destination cache and used by
       other PL instances. The specification of PLPMTUD [RFC4821]
       states: "If PLPMTUD updates the MTU for a particular path, all
       Packetization Layer sessions that share the path representation
       (as described in Section 5.2 of [RFC4821]) SHOULD be notified to
       make use of the new MTU and make the required congestion control
       adjustments". Such methods MUST be robust to the wide variety of
       underlying network forwarding behaviours. PLPMTU adjustments
       based on shared PLPMTU values should be incorporated in the
       search algorithms. Section 5.2 of [RFC8201] provides guidance on
       the caching of PMTU information and also the relation to IPv6
       flow labels.

   In addition, the following principles are stated for design of a
   DPLPMTUD method:

   o  MPS: A method is REQUIRED to signal an appropriate MPS to the
      higher layer using the PL. The value of the MPS can change
      following a change to the path. It is RECOMMENDED that methods
      avoid forcing an application to use an arbitrarily small MPS
      (PLPMTU) for transmission while the method is searching for the
      currently supported PLPMTU. Datagram PLs do not necessarily
      support fragmentation of PDUs larger than the PLPMTU. A reduced
      MPS can adversely impact the performance of a datagram
      application.

   o  Path validation: It is RECOMMENDED that methods are robust to path
      changes that could have occurred since the path characteristics
      were last confirmed, and to the possibility of inconsistent path
      information being received.

   o  Datagram reordering: A method is REQUIRED to be robust to the
      possibility that a flow encounters reordering, or the traffic
      (including probe packets) is divided over more than one network
      path.

   o  When to probe: It is RECOMMENDED that methods determine whether
      the path capacity has increased since they last measured the path.
      This determines when the path should again be probed.

4.  DPLPMTUD Mechanisms

   This section lists the protocol mechanisms used in this
   specification.

4.1.  PLPMTU Probe Packets

   The DPLPMTUD method relies upon the PL sender being able to generate
   probe packets with a specific size. TCP is able to generate these
   probe packets by choosing to appropriately segment data being sent
   [RFC4821]. In contrast, a datagram PL that needs to construct a
   probe packet has to either request an application to send a data
   block that is larger than that generated by an application, or to
   utilise padding functions to extend a datagram beyond the size of the
   application data block. Protocols that permit exchange of control
   messages (without an application data block) could alternatively
   prefer to generate a probe packet by extending a control message with
   padding data.

   A receiver needs to be able to distinguish an in-band data block from
   any added padding. This is needed to ensure that any added padding
   is not passed on to an application at the receiver.

   This results in three possible ways that a sender can create a probe
   packet, listed in order of preference:

   Probing using padding data:  A probe packet that contains only
      control information together with any padding needed to inflate
      the packet to the size required for the probe packet. Since
      these probe packets do not carry an application-supplied data
      block, they do not typically require retransmission, although they
      do still consume network capacity and incur endpoint processing.

   Probing using application data and padding data:  A probe packet that
      contains a data block supplied by an application that is combined
      with padding to inflate the length of the datagram to the size
      required for the probe packet. If the application/transport needs
      protection from the loss of this probe packet, the
      application/transport could perform transport-layer
      retransmission/repair of the data block (e.g., by retransmission
      after loss is detected or by duplicating the data block in a
      datagram without the padding data).

   Probing using application data:  A probe packet that contains a data
      block supplied by an application that matches the size required
      for the probe packet. This method requests the application to
      issue a data block of the desired probe size. If the
      application/transport needs protection from the loss of an
      unsuccessful probe packet, the application/transport then needs
      to perform transport-layer retransmission/repair of the data
      block (e.g., by retransmission after loss is detected).

   A PL that uses a probe packet carrying an application data block
   could need to retransmit this application data block if the probe
   fails. This could require the PL to re-fragment the data block to a
   smaller packet size that is expected to traverse the end-to-end path
   (which could utilise endpoint network-layer or PL fragmentation when
   these are available).

   DPLPMTUD MAY choose to use only one of these methods to simplify the
   implementation.

   Probe messages sent by a PL MUST contain enough information to
   uniquely identify the probe within the Maximum Segment Lifetime,
   while being robust to reordering and replay of probe response and
   ICMP PTB messages.

4.2.  Confirmation of Probed Packet Size

   The PL needs a method to determine (confirm) when probe packets have
   been successfully received end-to-end across a network path.
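
   One way to picture the probe identification and confirmation
   requirements is the following Python sketch. It is illustrative
   only: the ProbeTracker class, the 8-byte random token, and the
   120-second lifetime are assumptions made for this example, and the
   way a token is carried in a probe and echoed by the receiver depends
   on the PL.

```python
import secrets


class ProbeTracker:
    """Track outstanding probes by a random token (hypothetical helper)."""

    def __init__(self, lifetime=120.0):
        # Assumed lifetime, in seconds, after which a response is ignored.
        self.lifetime = lifetime
        self.outstanding = {}  # token -> (probe size, time sent)

    def new_probe(self, size, now):
        """Record a probe of `size` bytes and return its unique token."""
        token = secrets.token_bytes(8)
        self.outstanding[token] = (size, now)
        return token

    def confirm(self, token, now):
        """Return the confirmed probe size, or None if the response is
        unknown, replayed, or older than the lifetime."""
        entry = self.outstanding.pop(token, None)
        if entry is None:
            return None  # never sent, or already confirmed (replay)
        size, sent_at = entry
        if now - sent_at > self.lifetime:
            return None  # response arrived too late to be trusted
        return size
```

   Because each token is removed when it is first matched, a replayed
   or duplicated response is ignored, and responses that arrive in a
   different order from the probes are still matched to the correct
   probe size.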

   Transport protocols can include end-to-end methods that detect and
   report reception of specific datagrams that they send (e.g., DCCP and
   SCTP provide keep-alive/heartbeat features). When supported, this
   mechanism SHOULD also be used by DPLPMTUD to acknowledge reception of
   a probe packet.

   A PL that does not acknowledge data reception (e.g., UDP and
   UDP-Lite) is unable itself to detect when the packets that it sends
   are discarded because their size is greater than the actual PMTU.
   These PLs need to either rely on an application protocol to detect
   this loss, or make use of an additional transport method such as
   UDP Options [I-D.ietf-tsvwg-udp-options].

   Section 6 specifies this function for a set of IETF-specified
   protocols.

4.3.  Detection of Black Holes

   A PL sender needs to reduce the PLPMTU when it discovers the actual
   PMTU supported by a network path is less than the PLPMTU (i.e., to
   detect that traffic is being black holed). This can be triggered
   when a validated PTB message is received, or by another event that
   indicates the network path no longer sustains the current packet
   size, such as a loss report from the PL or repeated lack of response
   to probe packets sent to confirm the PLPMTU. Detection is followed
   by a reduction of the PLPMTU.

   Black hole detection is performed by periodically sending probe
   packets of size PLPMTU to verify that a network path still supports
   the last acknowledged PLPMTU size. There are two ways a DPLPMTUD
   sender can detect that the current PLPMTU is not sustained by the
   path (i.e., to detect a black hole):

   o  A PL can rely upon a mechanism implemented within the PL protocol
      to detect excessive loss of data sent with a specific packet size
      and then conclude that this excessive loss could be a result of an
      invalid PMTU (as in PLPMTUD for TCP [RFC4821]).

   o  A PL can use the probing mechanism to send confirmation probe
      packets of the size of the current PLPMTU and use a timer to
      track whether acknowledgments are received (e.g., the number of
      probe packets sent without receiving an acknowledgment,
      PROBE_COUNT, becomes greater than MAX_PROBES). These messages
      need to be generated periodically (e.g., using the
      CONFIRMATION_TIMER, see Section 5.1.1), and should be suppressed
      when the PL is not actively sending data. Successive loss of
      probes is an indication that the current path no longer supports
      the PLPMTU.

   When the method detects the current PLPMTU is not supported (a black
   hole is found), DPLPMTUD sets a lower MPS. The PL then confirms that
   the updated PLPMTU can be successfully used across the path. This
   can require the PL to send a probe packet with a size less than the
   size of the data block generated by an application. In this case,
   the PL could provide a way to fragment a datagram at the PL, or could
   instead utilise a control packet with padding.

4.4.  Response to PTB Messages

   This method requires the DPLPMTUD sender to validate any received PTB
   message before using the PTB information. The response to a PTB
   message depends on the PTB_SIZE indicated in the PTB message, the
   state of the PLPMTUD state machine, and the IP protocol being used.

   Section 4.4.1 first describes validation for both IPv4 ICMP
   Unreachable messages (type 3) and ICMPv6 Packet Too Big messages,
   both of which are referred to as PTB messages in this document.

4.4.1.  Validation of PTB Messages

   A PL that receives a PTB message from a router or middlebox MUST
   perform ICMP validation as specified in Section 5.2 of [RFC8085].
   This requires the PL to check the protocol information in the quoted
   payload to validate that the message originated from the sending
   node.
   This check includes determining the appropriate port and IP
   information, which is necessary for the PTB message to be passed to
   the PL. In addition, the PL SHOULD validate information from the
   ICMP payload to determine that the quoted packet was sent by the PL.
   These checks are intended to provide protection from packets that
   originate from a node that is not on the network path. PTB messages
   are discarded if they fail to pass these checks, or where there is
   insufficient ICMP payload to perform the checks.

   PTB messages that have been validated can be utilised by the DPLPMTUD
   algorithm. A method that utilises these PTB messages can improve the
   speed at which the algorithm detects an appropriate PLPMTU, compared
   to one that relies solely on probing.

4.4.2.  Use of PTB Messages

   A set of checks is intended to provide protection from a router that
   reports an unexpected PTB_SIZE. The PL needs to check that the
   indicated PTB_SIZE is less than the size used by probe packets and
   larger than the minimum size accepted.

   This section provides an informative summary of how PTB messages can
   be utilised.

   Validating PTB Messages:

      *  A simple implementation is permitted to ignore received PTB
         messages and therefore the PLPMTU is not updated when a PTB
         message is received.

      *  An implementation that supports PTB messages MUST validate
         messages before they are processed.

   MIN_PMTU < PTB_SIZE < BASE_PMTU

      *  A robust PL MAY enter the PROBE_ERROR state for an IPv4 path
         when the PTB_SIZE reported in the PTB message is >= 576 bytes
         and is less than the BASE_PMTU.

      *  A robust PL MAY enter the PROBE_ERROR state for an IPv6 path
         when the PTB_SIZE reported in the PTB message is >= 1280 bytes
         and is less than the BASE_PMTU.

   PTB_SIZE = PLPMTU

      *  Transition to SEARCH_COMPLETE.

   PTB_SIZE > PROBED_SIZE

      *  The PTB_SIZE is greater than the PROBED_SIZE, which is an
         inconsistent network signal.
         These PTB messages ought to be discarded without further
         processing (the PLPMTU is not updated).

      *  The information could be utilised as an input to trigger
         enabling a resilience mode.

   BASE_PMTU <= PTB_SIZE < PLPMTU

      *  Black hole detection is triggered and the PLPMTU ought to be
         set to BASE_PMTU.

      *  The PL could use the PTB_SIZE reported in the PTB message to
         initialise a search algorithm.

   PLPMTU < PTB_SIZE < PROBED_SIZE

      *  The PLPMTU continues to be valid, but the last PROBED_SIZE
         searched was larger than the actual PMTU.

      *  The PLPMTU is not updated.

      *  The PL can use the reported PTB_SIZE from the PTB message as
         the next search point when it resumes the search algorithm.

5.  Datagram Packetization Layer PMTUD

   This section specifies Datagram PLPMTUD (DPLPMTUD). The method can
   be introduced at various points in the IP protocol stack to discover
   the PLPMTU so that an application can utilise an appropriate MPS for
   the current network path.

        +----------------------+
        |         APP*         |
        +-+-------+----+---+---+
          |       |    |   |
      +---+--+ +--+--+ | +-+---+
      | QUIC*| |UDPO*| | |SCTP*|
      +---+--+ +--+--+ | ++--+-+
          |       |    |  |  |
          +-------+-+  |  |  |
                    |  |  |  |
                   ++-+--++  |
                   | UDP  |  |
                   +---+--+  |
                       |     |
        +--------------+-----+-+
        |  Network Interface   |
        +----------------------+

   Figure 1: Examples where DPLPMTUD can be implemented

   The central idea of DPLPMTUD is probing by a sender. Probe packets
   are sent to find the maximum size of user message that is completely
   transferred across the network path from the sender to the
   destination.

   This section identifies the components needed for implementation, the
   phases of operation, the state machine and search algorithm.

5.1.  DPLPMTUD Components

   This section describes components of DPLPMTUD.

5.1.1.  Timers

   The method utilises three timers:

   PROBE_TIMER: The PROBE_TIMER is configured to expire after a period
      longer than the maximum time to receive an acknowledgment to a
      probe packet. This value MUST be larger than 1 second, and SHOULD
      be larger than 15 seconds. Guidance on selection of the timer
      value is provided in Section 3.1.1 of the UDP Usage Guidelines
      [RFC8085].

      If the PL has a path Round Trip Time (RTT) estimate and timely
      acknowledgements, the PROBE_TIMER can be derived from the PL RTT
      estimate.

   PMTU_RAISE_TIMER: The PMTU_RAISE_TIMER is configured to the period a
      sender will continue to use the current PLPMTU, after which it
      re-enters the Search phase. This timer has a period of 600
      seconds, as recommended by PLPMTUD [RFC4821].

      DPLPMTUD SHOULD inhibit sending probe packets when no application
      data has been sent since the previous probe packet.

   CONFIRMATION_TIMER: The CONFIRMATION_TIMER is configured to the
      period a PL sender waits before confirming the current PLPMTU is
      still supported. This period is less than the PMTU_RAISE_TIMER
      period, and the timer is used to decrease the PLPMTU (e.g., when a
      black hole is encountered). Confirmation needs to be frequent
      enough when data is flowing that the sending PL does not black
      hole extensive amounts of traffic. Guidance on selection of the
      timer value is provided in Section 3.1.1 of the UDP Usage
      Guidelines [RFC8085].

      DPLPMTUD SHOULD inhibit sending probe packets when no application
      data has been sent since the previous probe packet.

   An implementation could implement the various timers using a single
   timer process.

5.1.2.  Constants

   The following constants are defined:

   MAX_PROBES: MAX_PROBES is the maximum value of the PROBE_COUNT. The
      default value of MAX_PROBES is 10.

   MIN_PMTU: The MIN_PMTU is the smallest allowed probe packet size.
      For IPv6, this value is 1280 bytes, as specified in [RFC2460].
      For IPv4, the minimum value is 68 bytes. (An IPv4 router is
      required to be able to forward a datagram of 68 octets without
      further fragmentation. This is the combined size of an IPv4
      header and the minimum fragment size of 8 octets. In addition,
      receivers are required to be able to reassemble fragmented
      datagrams at least up to 576 bytes, as stated in Section 3.3.3 of
      [RFC1122].)

   MAX_PMTU: The MAX_PMTU is the largest size of PLPMTU. This has to
      be less than or equal to the minimum of the local MTU of the
      outgoing interface and the destination PMTU for receiving. An
      application or PL MAY reduce the MAX_PMTU when there is no need to
      send packets larger than a specific size.

   BASE_PMTU: The BASE_PMTU is a configured size expected to work for
      most paths. The size is equal to or larger than the MIN_PMTU and
      smaller than the MAX_PMTU. In the case of IPv6, this value is
      1280 bytes [RFC2460]. When using IPv4, a size of 1200 bytes is
      RECOMMENDED.

5.1.3.  Variables

   This method utilises a set of variables:

   PROBED_SIZE: The PROBED_SIZE is the size of the current probe
      packet. This is a tentative value for the PLPMTU, which is
      awaiting confirmation by an acknowledgment.

   PROBE_COUNT: The PROBE_COUNT is a count of the number of
      unsuccessful probe packets that have been sent with a size of
      PROBED_SIZE. The value is initialised to zero when a particular
      size of PROBED_SIZE is first attempted.

   The figure below illustrates the relationship between the packet size
   constants and variables, in this case when the DPLPMTUD algorithm
   performs path probing to increase the size of the PLPMTU. The MPS is
   less than the PLPMTU. A probe packet has been sent of size
   PROBED_SIZE. When this is acknowledged, the PLPMTU will be raised to
   PROBED_SIZE, allowing the PROBED_SIZE to be increased towards the
   actual PMTU.

   MIN_PMTU                                          MAX_PMTU
     <------------------------------------------------------>
        |             |         |            |            |
        V             |         |            |            V
    BASE_PMTU         V         |            V       Actual PMTU
                     MPS        |       PROBED_SIZE
                                V
                             PLPMTU

   Figure 2: Relationships between probe and packet sizes

5.2.  DPLPMTUD Phases

   The Datagram PLPMTUD algorithm moves through several phases of
   operation.

   An implementation that only reduces the PLPMTU to a suitable size
   would be sufficient to ensure reliable operation, but can be very
   inefficient when the actual PMTU changes or when the method (for
   whatever reason) makes a suboptimal choice for the PLPMTU.

   A full implementation of DPLPMTUD provides an algorithm enabling the
   DPLPMTUD sender to increase the PLPMTU following a change in the
   characteristics of the path, such as when a link is reconfigured with
   a larger MTU, or when there is a change in the set of links traversed
   by an end-to-end flow (e.g., after a routing or path fail-over
   decision).

   Black hole detection (see Section 4.3) and PTB processing (see
   Section 4.4) proceed in parallel with these phases of operation.

                             +-------------------+
                             | Path Confirmation +-- Connectivity
                             +--------+----------+  \----- or BASE_PMTU
                                      |     /\          \/ Confirmation Fails
                    Connectivity and  |     |        +-------+
                 BASE_PMTU confirmed  |     ---------+ Error |
                                      |              +-------+
                                      |         CONFIRMATION_TIMER
                                      |               Fires
                                      \/
   +----------------+          +--------------+
   | Search Complete|<---------+    Search    |
   +----------------+          +--------------+
                  Search Algorithm
                     Completes

   Figure 3: DPLPMTUD Phases

   Path Confirmation

      *  Connectivity is confirmed.

      *  DPLPMTUD confirms the BASE_PMTU is supported across the network
         path.

      *  DPLPMTUD then enters the search phase.

   Search

      *  DPLPMTUD performs probing to increase the PLPMTU.

      *  DPLPMTUD then enters the search complete or an error phase.

   Search Complete

      *  DPLPMTUD has found a suitable PLPMTU that is supported across
         the network path.

      *  Black hole detection will confirm this PLPMTU continues to be
         supported.

      *  On a longer time-frame, DPLPMTUD will re-enter the search phase
         to discover if the PLPMTU can be raised.

   Error

      *  Inconsistent or invalid network signals cause DPLPMTUD to be
         unable to progress.

      *  This causes the algorithm to lower the MPS until the path is
         shown to support the BASE_PMTU, or to suspend DPLPMTUD.

5.2.1.  Path Confirmation Phase

   DPLPMTUD starts in the Path Confirmation phase. Path confirmation is
   performed in two stages:

   1.  Connectivity to the remote peer is first confirmed. When a
       connection-oriented PL is used, this stage is implicit. It is
       performed as part of the normal PL connection handshake. In
       contrast, a connectionless PL MUST send an acknowledged probe
       packet to confirm that the remote peer is reachable.

   2.  In the second stage, the PL confirms it can successfully send a
       datagram of the BASE_PMTU size across the current path.

   A PL that does not wish to support a network path with a PLPMTU less
   than BASE_PMTU can simplify the phase into a single step by
   performing connectivity checks with probes of the BASE_PMTU size.

   A PL MAY respond to PTB messages while in this phase, see
   Section 4.4.

   Once path confirmation has completed, DPLPMTUD can advertise an MPS
   to an upper layer.

   If DPLPMTUD fails to complete these tests, it enters the
   PROBE_DISABLED phase, see Section 5.2.6, and ceases using DPLPMTUD.

5.2.2.  Search Phase

   The search phase utilises a search algorithm in an attempt to
   increase the PLPMTU (see Section 5.4.1). The PL sender increases the
   MPS each time a probe packet confirms a larger PLPMTU is supported by
   the path. The algorithm concludes by entering the SEARCH_COMPLETE
   phase, see Section 5.2.3.
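
   As an illustration of the search phase, the following Python sketch
   steps through a table of candidate sizes and raises the PLPMTU each
   time a probe is acknowledged. It is a minimal sketch under stated
   assumptions, not the algorithm of this specification: the function
   `search`, the callable `send_probe` (assumed to return True when a
   probe of the given size is acknowledged before the PROBE_TIMER
   expires) and the candidate sizes are hypothetical. The retry limit
   follows the MAX_PROBES behaviour described in Section 5.4.1.

```python
MAX_PROBES = 10  # default value given in Section 5.1.2


def search(plpmtu, max_pmtu, candidates, send_probe, max_probes=MAX_PROBES):
    """Try increasing candidate sizes; return the last confirmed PLPMTU.

    `candidates` is an illustrative table of probe sizes (an assumption).
    `send_probe(size)` is a hypothetical callable that returns True when
    the probe of that size was acknowledged.
    """
    for probed_size in sorted(candidates):
        # Only probe sizes above the current PLPMTU and within MAX_PMTU.
        if probed_size <= plpmtu or probed_size > max_pmtu:
            continue
        probe_count = 0  # reset for each new PROBED_SIZE
        acked = False
        while probe_count < max_probes:
            if send_probe(probed_size):
                acked = True
                break
            probe_count += 1  # PROBE_TIMER expired without an acknowledgment
        if not acked:
            break  # stop searching; keep the last confirmed PLPMTU
        plpmtu = probed_size  # confirmed: raise the PLPMTU
    return plpmtu
```

   For example, on a path that carries at most 1400 bytes, calling
   `search(1200, 1500, [1280, 1400, 1492, 1500], send_probe)` confirms
   1280 and 1400, fails MAX_PROBES times at 1492, and returns 1400.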

   A PL MAY respond to PTB messages while in this phase, using the PTB
   to advance or terminate the search, see Section 4.4. Similarly,
   black hole detection can terminate the search by entering the
   PROBE_BASE phase, see Section 5.2.4.

5.2.2.1.  Resilience to inconsistent path information

   Sometimes a PL sender is able to detect inconsistent results from the
   sequence of PLPMTU probes that it sends or the sequence of PTB
   messages that it receives. This could be manifested as excessive
   fluctuation of the MPS.

   When inconsistent path information is detected, a PL sender can
   enable an alternate search mode that clamps the offered MPS to a
   smaller value for a period of time. This avoids unnecessary
   black-holing of packets.

5.2.3.  Search Complete Phase

   On entry to the search complete phase, the DPLPMTUD sender starts the
   PMTU_RAISE_TIMER. In this phase, the PLPMTU remains at the value
   confirmed by the last successful probe packet.

   In this phase, the PL MUST periodically confirm that the PLPMTU is
   still supported by the path. If the PL is designed in a way that is
   unable to confirm reachability to the destination endpoint after
   probing has completed, the method uses a CONFIRMATION_TIMER to
   periodically repeat a probe packet for the current PLPMTU size.

   If the DPLPMTUD sender is unable to confirm reachability for packets
   with a size of the current PLPMTU (e.g., if the CONFIRMATION_TIMER
   expires) or the PL signals a lack of reachability, the method exits
   the phase and enters the PROBE_BASE phase, see Section 5.2.4.

   If the PMTU_RAISE_TIMER expires, the DPLPMTUD sender re-enters the
   Search phase, see Section 5.2.2, and resumes probing for a larger
   PLPMTU.

   Black hole detection can be used in parallel to check that a network
   path continues to support a previously confirmed PLPMTU.
   If a black hole is detected, the algorithm moves to the PROBE_BASE
   phase, see Section 5.2.4.

   The phase can also be exited when a validated PTB message is received
   (see Section 4.4.1).

5.2.4.  PROBE_BASE Phase

   This phase is entered when black hole detection or a PTB message
   indicates that the PLPMTU is not supported by the path.

   On entry to this phase, the PLPMTU is set to the BASE_PMTU, and a
   corresponding reduced MPS is advertised.

   PROBED_SIZE is then set to the PLPMTU (i.e., the BASE_PMTU), to
   confirm this size is supported across the path. If confirmed,
   DPLPMTUD enters the Search phase to determine whether the PL sender
   can use a larger PLPMTU.

   If the path cannot be confirmed to support the BASE_PMTU after
   sending MAX_PROBES, DPLPMTUD moves to the Error phase, see
   Section 5.2.5.

5.2.5.  ERROR Phase

   The ERROR phase is entered when there is conflicting or invalid
   PLPMTU information for the path (e.g., a failure to support the
   BASE_PMTU). In this phase, the MPS is set to a value less than the
   BASE_PMTU, but at least the size of the MIN_PMTU.

   DPLPMTUD remains in the ERROR phase until a consistent view of the
   path can be discovered and it has also been confirmed that the path
   supports the BASE_PMTU.

   Note: MIN_PMTU may be identical to BASE_PMTU, simplifying the actions
   in this phase.

   If no acknowledgement is received for PROBE_COUNT probes of size
   MIN_PMTU, the method suspends DPLPMTUD, see Section 5.2.6.

5.2.5.1.  Robustness to inconsistent path

   This section considers robustness to paths that are unable to sustain
   the BASE_PMTU. Some paths could be unable to sustain packets of the
   BASE_PMTU size. These paths could use an alternate algorithm to
   implement the PROBE_ERROR phase that allows fallback to a smaller
   than desired PLPMTU, rather than suffer connectivity failure.

   This could also utilise methods such as endpoint IP fragmentation to
   enable the PL sender to communicate using packets smaller than the
   BASE_PMTU.

5.2.6.  DISABLED Phase

   This phase suspends operation of DPLPMTUD. It disables probing for
   the PLPMTU until action is taken by the PL or application using the
   PL.

5.3.  State Machine

   A state machine for DPLPMTUD is depicted in Figure 4. If multihoming
   is supported, a state machine is needed for each active path.

   (State diagram not reproduced: the drawing is incomplete in the
   available text. The states and their transitions are defined
   below.)

   Figure 4: State machine for Datagram PLPMTUD. Note: Some state
   changes are not shown to simplify the diagram.

   The following states are defined:

   PROBE_START: The PROBE_START state is the initial state before
      probing has started. The state confirms connectivity to the
      remote PL.

      The PLPMTU is set to the BASE_PMTU size. Probing ought to start
      immediately after connection setup to prevent the loss of user
      data. PLPMTUD is not performed in this state.
      The state transitions to PROBE_SEARCH when a network path has been
      confirmed, i.e., when a sent packet has been acknowledged on this
      network path and the BASE_PMTU is confirmed to be supported. If
      the network path cannot be confirmed, this state transitions to
      PROBE_DISABLED.

   PROBE_SEARCH: The PROBE_SEARCH state is the main probing state.
      This state is entered when probing for the BASE_PMTU was
      successful.

      The PROBE_COUNT is set to zero when the first probe packet is sent
      for each probe size. Each time a probe packet is acknowledged,
      the PLPMTU is set to the PROBED_SIZE, and then the PROBED_SIZE is
      increased using the search algorithm.

      When a probe packet is sent and not acknowledged within the period
      of the PROBE_TIMER, the PROBE_COUNT is incremented and the probe
      packet is retransmitted. The state is exited when the PROBE_COUNT
      reaches MAX_PROBES; a PTB message is validated; a probe of size
      MAX_PMTU is acknowledged; or black hole detection is triggered.

   SEARCH_COMPLETE: The SEARCH_COMPLETE state indicates a successful
      end to the PROBE_SEARCH state. DPLPMTUD remains in this state
      until either the PMTU_RAISE_TIMER expires; a received PTB message
      is validated; or black hole detection is triggered.

      When DPLPMTUD uses an unacknowledged PL and is in the
      SEARCH_COMPLETE state, a CONFIRMATION_TIMER periodically resets
      the PROBE_COUNT and schedules a probe packet with the size of the
      PLPMTU. If the probe packet fails to be acknowledged after
      MAX_PROBES attempts, the method enters the PROBE_BASE state. When
      used with an acknowledged PL (e.g., SCTP), DPLPMTUD SHOULD NOT
      continue to generate PLPMTU probes in this state.

   PROBE_BASE: The PROBE_BASE state is used to confirm whether the
      BASE_PMTU size is supported by the network path and is designed to
      allow an application to continue working when there are transient
      reductions in the actual PMTU. It also seeks to avoid long
      periods where traffic is black holed while searching for a larger
      PLPMTU.

      On entry, the PROBED_SIZE is set to the BASE_PMTU size and the
      PROBE_COUNT is set to zero.

      Each time a probe packet is sent, the PROBE_TIMER is started.
      The state is exited when the probe packet is acknowledged, and the
      PL sender enters the PROBE_SEARCH state.

      The state is also left when the PROBE_COUNT reaches MAX_PROBES or
      a PTB message is validated. This causes the PL sender to enter
      the PROBE_ERROR state.

   PROBE_ERROR: The PROBE_ERROR state represents the case where the
      network path is not known to support a PLPMTU of at least the
      BASE_PMTU size. It is entered when either a probe of size
      BASE_PMTU has not been acknowledged or a validated PTB message
      indicates a PTB_SIZE smaller than the BASE_PMTU.

      On entry, the PROBE_COUNT is set to zero, the PROBED_SIZE is set
      to the MIN_PMTU size, and the PLPMTU is reset to the MIN_PMTU
      size. In this state, a probe packet is sent, and the PROBE_TIMER
      is started. The state transitions to the PROBE_SEARCH state when
      a probe packet of at least size BASE_PMTU is acknowledged. Robust
      implementations may validate the BASE_PMTU several times before
      transitioning to the PROBE_SEARCH state.

      Implementations are permitted to enable endpoint fragmentation if
      DPLPMTUD is unable to validate MIN_PMTU within PROBE_COUNT
      probes. If DPLPMTUD is unable to validate MIN_PMTU, the
      implementation should transition to PROBE_DISABLED.

   PROBE_DISABLED: The PROBE_DISABLED state indicates that connectivity
      could not be established. DPLPMTUD MUST NOT probe in this state.
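
   The state definitions above can be summarised as a transition table.
   The following Python sketch is one hedged reading of those
   definitions, not a normative encoding: the state names come from
   this section, but the `Event` names, the `TRANSITIONS` table and the
   `next_state` helper are illustrative assumptions, and the handling of
   validated PTB messages (which depends on the reported PTB_SIZE, see
   Section 4.4.2) is deliberately left out.

```python
from enum import Enum, auto


class State(Enum):
    PROBE_START = auto()
    PROBE_DISABLED = auto()
    PROBE_BASE = auto()
    PROBE_SEARCH = auto()
    SEARCH_COMPLETE = auto()
    PROBE_ERROR = auto()


class Event(Enum):
    # Event names are illustrative; they are not defined by the draft.
    PATH_CONFIRMED = auto()       # connectivity and BASE_PMTU confirmed
    PATH_NOT_CONFIRMED = auto()   # network path could not be confirmed
    SEARCH_ENDED = auto()         # MAX_PMTU acked, or PROBE_COUNT hit MAX_PROBES
    RAISE_TIMER_EXPIRED = auto()  # PMTU_RAISE_TIMER expired
    BLACK_HOLE = auto()           # black hole detection triggered
    BASE_PROBE_ACKED = auto()     # probe of at least BASE_PMTU acknowledged
    BASE_PROBE_FAILED = auto()    # MAX_PROBES probes of BASE_PMTU unacknowledged
    MIN_PROBE_FAILED = auto()     # MIN_PMTU could not be validated


TRANSITIONS = {
    (State.PROBE_START, Event.PATH_CONFIRMED): State.PROBE_SEARCH,
    (State.PROBE_START, Event.PATH_NOT_CONFIRMED): State.PROBE_DISABLED,
    (State.PROBE_SEARCH, Event.SEARCH_ENDED): State.SEARCH_COMPLETE,
    (State.PROBE_SEARCH, Event.BLACK_HOLE): State.PROBE_BASE,
    (State.SEARCH_COMPLETE, Event.RAISE_TIMER_EXPIRED): State.PROBE_SEARCH,
    (State.SEARCH_COMPLETE, Event.BLACK_HOLE): State.PROBE_BASE,
    (State.PROBE_BASE, Event.BASE_PROBE_ACKED): State.PROBE_SEARCH,
    (State.PROBE_BASE, Event.BASE_PROBE_FAILED): State.PROBE_ERROR,
    (State.PROBE_ERROR, Event.BASE_PROBE_ACKED): State.PROBE_SEARCH,
    (State.PROBE_ERROR, Event.MIN_PROBE_FAILED): State.PROBE_DISABLED,
}


def next_state(state, event):
    """Return the state that follows `state` on `event`.

    Pairs that are not listed leave the state unchanged, so
    PROBE_DISABLED is never left by this table.
    """
    return TRANSITIONS.get((state, event), state)
```

   A table keeps each (state, event) pair explicit, which makes it easy
   to compare an implementation against the prose definitions; when
   multihoming is supported, one current state would be kept per active
   path.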

   Appendix A contains an informative description of key events.

5.4.  Search to Increase the PLPMTU

   This section describes the algorithms used by DPLPMTUD to search for
   a larger PLPMTU.

5.4.1.  Probing for a larger PLPMTU

   Implementations use a search algorithm across the search range to
   determine whether a larger PLPMTU can be supported across a network
   path.

   The method discovers the search range by confirming the minimum
   PLPMTU and then using the probe method to select a PROBED_SIZE less
   than or equal to MAX_PMTU. MAX_PMTU is the minimum of the local MTU
   and EMTU_R (learned from the remote endpoint). The MAX_PMTU MAY be
   reduced by an application that sets a maximum to the size of
   datagrams it will send.

   The PROBE_COUNT is initialised to zero when a probe packet is first
   sent with a particular size. A timer is used by the search algorithm
   to trigger the sending of probe packets of size PROBED_SIZE, larger
   than the PLPMTU. Each probe packet successfully sent to the remote
   peer is confirmed by acknowledgement at the PL, see Section 4.1.

   Each time a probe packet is sent to the destination, the PROBE_TIMER
   is started. The timer is cancelled when the PL receives
   acknowledgment that the probe packet has been successfully sent
   across the path (see Section 4.1). This confirms that the
   PROBED_SIZE is supported, and the PROBED_SIZE value is then assigned
   to the PLPMTU. The search algorithm can continue to send subsequent
   probe packets of an increasing size.

   If the timer expires before a probe packet is acknowledged, the probe
   has failed to confirm the PROBED_SIZE. Each time the PROBE_TIMER
   expires, the PROBE_COUNT is incremented, the PROBE_TIMER is
   reinitialised, and a probe packet of the same size is retransmitted
   (the replicated probe improves the resilience to loss).
The maximum 1245 number of retransmissions for a particular size is configured 1246 (MAX_PROBES). If the value of the PROBE_COUNT reaches MAX_PROBES, 1247 probing will stop, and the PL sender enters the SEARCH_COMPLETE 1248 state. 1250 5.4.2. Selection of Probe Sizes 1252 The search algorithm needs to determine a minimum useful gain in 1253 PLPMTU. It would not be constructive for a PL sender to attempt to 1254 probe for all sizes - this would incur unnecessary load on the path 1255 and has the undesirable effect of slowing the time to reach a more 1256 optimal MPS. Implementations SHOULD select the set of probe packet 1257 sizes to maximise the gain in PLPMTU from each search step. 1259 Implementations could optimize the search procedure by selecting step 1260 sizes from a table of common PMTU sizes. When selecting the 1261 appropriate next size to search, an implementor ought to also 1262 consider that there can be common sizes of MPS that applications seek 1263 to use. 1265 xxx Author Note: A future version of this section will detail example 1266 methods for selecting probe size values, but does not plan to mandate 1267 a single method. xxx 1269 5.4.3. Resilience to inconsistent Path information 1271 A decision to increase the PLPMTU needs to be resilient to the 1272 possibility that information learned about the network path is 1273 inconsistent (this could happen when probe packets are lost due to 1274 other reasons, or some of the packets in a flow are forwarded along a 1275 portion of the path that supports a different actual PMTU). 1277 Frequent path changes could occur due to unexpected "flapping" - 1278 where some packets from a flow pass along one path, but other packets 1279 follow a different path with different properties. DPLPMTUD can be 1280 made resilient to these anomalies by introducing hysteresis into the 1281 search decision to increase the MPS. 1283 6. 
Specification of Protocol-Specific Methods 1285 This section specifies protocol-specific details for datagram PLPMTUD 1286 for IETF-specified transports. 1288 The first subsection provides guidance on how to implement the 1289 DPLPMTUD method as a part of an application using UDP or UDP-Lite. 1290 The guidance also applies to other datagram services that do not 1291 include a specific transport protocol (such as a tunnel 1292 encapsulation). The following subsection describe how DPLPMTUD can 1293 be implemented as a part of the transport service, allowing 1294 applications using the service to benefit from discovery of the 1295 PLPMTU without themselves needing to implement this method. 1297 6.1. Application support for DPLPMTUD with UDP or UDP-Lite 1299 The current specifications of UDP [RFC0768] and UDP-Lite [RFC3828] do 1300 not define a method in the RFC-series that supports PLPMTUD. In 1301 particular, the UDP transport does not provide the transport layer 1302 features needed to implement datagram PLPMTUD. 1304 The DPLPMTUD method can be implemented as a part of an application 1305 built directly or indirectly on UDP or UDP-Lite, but relies on 1306 higher-layer protocol features to implement the method [RFC8085]. 1308 Some primitives used by DPLPMTUD might not be available via the 1309 Datagram API (e.g., the ability to access the PLPMTU cache, or 1310 interpret received ICMP PTB messages). 1312 In addition, it is desirable that PMTU discovery is not performed by 1313 multiple protocol layers. An application SHOULD avoid implementing 1314 DPLPMTUD when the underlying transport system provides this 1315 capability. Using a common method for managing the PLPMTU has 1316 benefits, both in the ability to share state between different 1317 processes and opportunities to coordinate probing. 1319 6.1.1. 
Application Request 1321 An application needs an application-layer protocol mechanism (such as 1322 a message acknowledgement method) that solicits a response from a 1323 destination endpoint. The method SHOULD allow the sender to check 1324 the value returned in the response to provide additional protection 1325 from off-path insertion of data [RFC8085], suitable methods include a 1326 parameter known only to the two endpoints, such as a session ID or 1327 initialised sequence number. 1329 6.1.2. Application Response 1331 An application needs an application-layer protocol mechanism to 1332 communicate the response from the destination endpoint. This 1333 response may indicate successful reception of the probe across the 1334 path, but could also indicate that some (or all packets) have failed 1335 to reach the destination. 1337 6.1.3. Sending Application Probe Packets 1339 A probe packet that may carry an application data block, but the 1340 successful transmission of this data is at risk when used for 1341 probing. Some applications may prefer to use a probe packet that 1342 does not carry an application data block to avoid disruption to 1343 normal data transfer. 1345 6.1.4. Validating the Path 1347 An application that does not have other higher-layer information 1348 confirming correct delivery of datagrams SHOULD implement the 1349 CONFIRMATION_TIMER to periodically send probe packets while in the 1350 SEARCH_COMPLETE state. 1352 6.1.5. Handling of PTB Messages 1354 An application that is able and wishes to receive PTB messages MUST 1355 perform ICMP validation as specified in Section 5.2 of [RFC8085]. 1356 This requires that the application to check each received PTB 1357 messages to validate it is received in response to transmitted 1358 traffic and that the reported PTB_SIZE is less than the current 1359 probed size. A validated PTB message MAY be used as input to the 1360 DPLPMTUD algorithm, but MUST NOT be used directly to set the PLPMTU. 1362 6.2. 
DPLPMTUD with UDP Options 1364 UDP Options [I-D.ietf-tsvwg-udp-options] can supply the additional 1365 functionality required to implement DPLPMTUD within the UDP transport 1366 service. Implementing DPLPMTUD using UDP Options avoids the need for 1367 each application to implement the DPLPMTUD method. 1369 Section 5.6 of [I-D.ietf-tsvwg-udp-options] defines the MSS option, 1370 which allows the local sender to indicate the EMTU_R to the peer. 1371 The value received in this option can be used to initialise PMTU_MAX. 1373 UDP Options enable padding to be added to UDP datagrams that are 1374 used as Probe Packets. Feedback confirming reception of each Probe 1375 Packet is provided by two new UDP Options: 1377 o The Probe Request Option (Section 6.2.1) is set by a sending PL to 1378 solicit a response from a remote endpoint. A four-byte token 1379 identifies each request. 1381 o The Probe Response Option (Section 6.2.2) is generated by the UDP 1382 Options receiver in response to reception of a 1383 Probe Request Option. Each Probe Response Option echoes a 1384 previously received four-byte token. 1386 The token value allows implementations to distinguish between 1387 acknowledgements for initial probe packets and acknowledgements 1388 confirming receipt of subsequent probe packets (e.g., travelling 1389 along alternate paths with a larger RTT). Each probe packet needs to 1390 be uniquely identifiable by the UDP Options sender within the Maximum 1391 Segment Lifetime (MSL). The UDP Options sender therefore needs to 1392 avoid recycling token values until they have expired or have been 1393 acknowledged. A four-byte value for the token field provides sufficient 1394 space for multiple unique probes to be made within the MSL. 1396 Implementations ought to only send a probe packet with a Probe 1397 Request Option when required by their local state machine, i.e., when 1398 probing to grow the PLPMTU or to confirm the current PLPMTU.
The 1399 procedure to handle the loss of a response packet is the 1400 responsibility of the sender of the request. 1402 A PL needs to determine that the path can still support the size of 1403 datagram that the application is currently sending in the DPLPMTUD 1404 SEARCH_COMPLETE state (i.e., to detect black-holing of data). One way to 1405 achieve this is to send probe packets of size PLPMTU or to utilise a 1406 higher-layer method that provides explicit feedback indicating any 1407 packet loss. Another possibility is to utilise data packets that 1408 carry a Timestamp Option. Reception of a valid timestamp that was 1409 echoed by the remote endpoint can be used to infer connectivity. 1411 This can provide useful feedback even over paths with asymmetric 1412 capacity and/or that carry UDP Option flows that have very asymmetric 1413 datagram rates, because an echo of the most recent timestamp still 1414 indicates reception of at least one packet of the transmitted size. 1415 This is sufficient to confirm there is no black hole. 1417 In contrast, when sending a probe to increase the PLPMTU, a timestamp 1418 may be unable to unambiguously identify that a specific probe packet 1419 has been received. Timestamp mechanisms cannot be used to confirm 1420 the reception of individual probe messages and cannot be used to 1421 stimulate a response from the remote peer. 1423 6.2.1. UDP Probe Request Option 1425 The Probe Request Option allows a sending endpoint to solicit a 1426 response from a destination endpoint. 1428 The Probe Request Option carries a four-byte token set by the sender. 1429 This token can be set to a value that is likely to be known only to 1430 the sender (and is sent along the end-to-end path). The sender can 1431 then check the value returned in the UDP Probe Response Option.
The 1432 value of the Token field uniquely identifies a probe within the 1433 maximum segment lifetime and can also provide additional protection 1434 from off-path insertion of data [RFC8085]. 1436 +---------+--------+-----------------+ 1437 | Kind=9 | Len=6 | Token | 1438 +---------+--------+-----------------+ 1439 1 byte 1 byte 4 bytes 1441 Figure 5: UDP Probe REQ Option Format 1443 6.2.2. UDP Probe Response Option 1445 The Probe Response Option is generated in response to reception of a 1446 Probe Request Option. 1448 The Probe Response Option carries a four-byte token field. The Token 1449 field associates the response with the Token value carried in the 1450 most recently received Probe Request Option. The rate of generation of UDP 1451 packets carrying a Probe Response Option MAY be rate-limited. 1453 +---------+--------+-----------------+ 1454 | Kind=10 | Len=6 | Token | 1455 +---------+--------+-----------------+ 1456 1 byte 1 byte 4 bytes 1458 Figure 6: UDP Probe RES Option Format 1460 6.3. DPLPMTUD for SCTP 1462 Section 10.2 of [RFC4821] specifies a recommended PLPMTUD probing 1463 method for SCTP. It recommends that the PAD chunk, defined in 1464 [RFC4820], be attached to a minimum length HEARTBEAT chunk to build 1465 a probe packet. This enables probing without affecting the transfer 1466 of user messages and without interfering with congestion control. 1467 This is preferred to using DATA chunks (with padding as required) as 1468 path probes. 1470 XXX Author Note: Future versions of this document might define a 1471 parameter contained in the INIT and INIT ACK chunk to indicate the 1472 remote peer MTU to the local peer. However, multihoming makes this a 1473 bit complex, so it might not be worth doing. XXX 1475 6.3.1. SCTP/IPv4 and SCTP/IPv6 1477 The base protocol is specified in [RFC4960]. This provides an 1478 acknowledged PL. A sender can therefore enter the PROBE_BASE state 1479 as soon as connectivity has been confirmed.
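As an aside on the UDP Probe Request and Probe Response Options of Sections 6.2.1 and 6.2.2, the six-byte layouts in Figures 5 and 6 can be illustrated with a minimal Python sketch. The Kind and Len values are taken from those figures. The function names are hypothetical, and encoding the token in network byte order is an assumption, since the figures do not state a byte order. This is not a definitive implementation of UDP Options.

```python
import struct

PROBE_REQ_KIND = 9   # Kind value shown in Figure 5
PROBE_RES_KIND = 10  # Kind value shown in Figure 6
PROBE_OPT_LEN = 6    # 1 byte Kind + 1 byte Len + 4 byte Token


def encode_probe_option(kind: int, token: int) -> bytes:
    """Build a 6-byte option: Kind, Len, Token.

    Network byte order for the token is an assumption of this sketch.
    """
    return struct.pack("!BBI", kind, PROBE_OPT_LEN, token)


def decode_probe_option(data: bytes) -> tuple:
    """Return (kind, token), or raise ValueError if the option is malformed."""
    if len(data) != PROBE_OPT_LEN:
        raise ValueError("probe option must be exactly 6 bytes")
    kind, length, token = struct.unpack("!BBI", data)
    if length != PROBE_OPT_LEN:
        raise ValueError("unexpected Len field")
    if kind not in (PROBE_REQ_KIND, PROBE_RES_KIND):
        raise ValueError("not a probe option")
    return kind, token


def make_response(request: bytes) -> bytes:
    """Receiver side: echo the token of a Probe Request in a Probe Response."""
    kind, token = decode_probe_option(request)
    if kind != PROBE_REQ_KIND:
        raise ValueError("expected a Probe Request Option")
    return encode_probe_option(PROBE_RES_KIND, token)


def response_matches(response: bytes, outstanding: set) -> bool:
    """Sender side: accept a response only if it echoes an outstanding token."""
    kind, token = decode_probe_option(response)
    return kind == PROBE_RES_KIND and token in outstanding
```

A sender following this sketch would keep the set of tokens for probes still in flight, and would treat a Probe Response as confirming a probe only when response_matches() returns True for that set.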
1481 6.3.1.1. Sending SCTP Probe Packets 1483 Probe packets consist of an SCTP common header followed by a 1484 HEARTBEAT chunk and a PAD chunk. The PAD chunk is used to control 1485 the length of the probe packet. The HEARTBEAT chunk is used to 1486 trigger the sending of a HEARTBEAT ACK chunk. The reception of the 1487 HEARTBEAT ACK chunk acknowledges reception of a successful probe. 1489 The HEARTBEAT chunk carries a Heartbeat Information parameter which 1490 should include, besides the information suggested in [RFC4960], the 1491 probe size, which is the size of the complete datagram. The size of 1492 the PAD chunk is therefore computed by reducing the probing size by 1493 the IPv4 or IPv6 header size, the SCTP common header, the HEARTBEAT 1494 request and the PAD chunk header. The payload of the PAD chunk 1495 contains arbitrary data. 1497 To avoid fragmentation of retransmitted data, probing starts right 1498 after the handshake, before data is sent. Assuming normal behaviour 1499 (i.e., the PMTU is smaller than or equal to the interface MTU), this 1500 process will take a few round trip time periods depending on the 1501 number of PMTU sizes probed. The Heartbeat timer can be used to 1502 implement the PROBE_TIMER. 1504 6.3.1.2. Validating the Path with SCTP 1506 Since SCTP provides an acknowledged PL, a sender MUST NOT implement 1507 the CONFIRMATION_TIMER while in the SEARCH_COMPLETE state. 1509 6.3.1.3. PTB Message Handling by SCTP 1511 Normal ICMP validation MUST be performed as specified in Appendix C 1512 of [RFC4960]. This requires that the first 8 bytes of the SCTP 1513 common header are quoted in the payload of the PTB message, which can 1514 be the case for ICMPv4 and is normally the case for ICMPv6. 1516 When a PTB message has been validated, the PTB_SIZE reported in the 1517 PTB message SHOULD be used with the DPLPMTUD algorithm, providing 1518 that the reported PTB_SIZE is less than the current probe size. 1520 6.3.2. 
DPLPMTUD for SCTP/UDP 1522 The UDP encapsulation of SCTP is specified in [RFC6951]. 1524 6.3.2.1. Sending SCTP/UDP Probe Packets 1526 Packet probing can be performed as specified in Section 6.3.1.1. The 1527 maximum payload is reduced by 8 bytes, which has to be considered 1528 when filling the PAD chunk. 1530 6.3.2.2. Validating the Path with SCTP/UDP 1532 Since SCTP provides an acknowledged PL, a sender MUST NOT implement 1533 the CONFIRMATION_TIMER while in the SEARCH_COMPLETE state. 1535 6.3.2.3. Handling of PTB Messages by SCTP/UDP 1537 Normal ICMP validation MUST be performed for PTB messages as 1538 specified in Appendix C of [RFC4960]. This requires that the first 8 1539 bytes of the SCTP common header are contained in the PTB message, 1540 which can be the case for ICMPv4 (but note the UDP header also 1541 consumes a part of the quoted packet header) and is normally the case 1542 for ICMPv6. When the validation is completed, the PTB_SIZE indicated 1543 in the PTB message SHOULD be used with the DPLPMTUD providing that 1544 the reported PTB_SIZE is less than the current probe size. 1546 6.3.3. DPLPMTUD for SCTP/DTLS 1548 The Datagram Transport Layer Security (DTLS) encapsulation of SCTP is 1549 specified in [RFC8261]. It is used for data channels in WebRTC 1550 implementations. 1552 6.3.3.1. Sending SCTP/DTLS Probe Packets 1554 Packet probing can be done as specified in Section 6.3.1.1. 1556 6.3.3.2. Validating the Path with SCTP/DTLS 1558 Since SCTP provides an acknowledged PL, a sender MUST NOT implement 1559 the CONFIRMATION_TIMER while in the SEARCH_COMPLETE state. 1561 6.3.3.3. Handling of PTB Messages by SCTP/DTLS 1563 It is not possible to perform normal ICMP validation as specified in 1564 [RFC4960], since even if the ICMP message payload contains sufficient 1565 information, the reflected SCTP common header would be encrypted. 1566 Therefore it is not possible to process PTB messages at the PL. 1568 6.4. 
DPLPMTUD for QUIC 1570 Quick UDP Internet Connection (QUIC) [I-D.ietf-quic-transport] is a 1571 UDP-based transport that provides reception feedback. 1573 Section 9.2 of [I-D.ietf-quic-transport] describes the path 1574 considerations when sending QUIC packets. It recommends the use of 1575 PADDING frames to build the probe packet. This enables probing 1576 without affecting the transfer of other QUIC frames. 1578 QUIC provides an acknowledged PL. A sender can therefore enter the 1579 PROBE_BASE state as soon as connectivity has been confirmed. 1581 6.4.1. Sending QUIC Probe Packets 1583 A probe packet consists of a QUIC Header and a payload containing 1584 only PADDING Frames. Each PADDING Frame is a single octet (0x00) and 1585 several of these can be used to create a probe packet of size 1586 PROBED_SIZE. 1590 The current specification of QUIC sets the following: 1592 o BASE_PMTU: 1200 bytes. A QUIC sender needs to pad initial packets to 1593 1200 bytes to confirm the path can support packets of a useful 1594 size. 1596 o MIN_PMTU: 1200 bytes. A QUIC sender that determines the PMTU has 1597 fallen below 1200 bytes MUST immediately stop sending on the 1598 affected path. 1600 6.4.2. Validating the Path with QUIC 1602 QUIC provides an acknowledged PL. A sender therefore MUST NOT 1603 implement the CONFIRMATION_TIMER while in the SEARCH_COMPLETE state. 1605 6.4.3. Handling of PTB Messages by QUIC 1607 QUIC operates over the UDP transport, and the guidelines on ICMP 1608 validation as specified in Section 5.2 of [RFC8085] therefore apply. 1609 Although QUIC does not currently specify a method for validating ICMP 1610 responses, it does provide some guidelines to make it harder for an 1611 off-path attacker to inject ICMP messages.
1613 o Set the IPv4 Don't Fragment (DF) bit on a small proportion of 1614 packets, so that most invalid ICMP messages arrive when there are 1615 no DF packets outstanding, and can therefore be identified as 1616 spurious. 1618 o Store additional information from the IP or UDP headers from DF 1619 packets (for example, the IP ID or UDP checksum) to further 1620 authenticate incoming Datagram Too Big messages. 1622 o Any reduction in PMTU due to a report contained in an ICMP packet 1623 is provisional until QUIC's loss detection algorithm determines 1624 that the packet is actually lost. 1626 XXX The above list was pulled whole from quic-transport - input is 1627 invited from QUIC contributors. XXX 1629 7. Acknowledgements 1631 This work was partially funded by the European Union's Horizon 2020 1632 research and innovation programme under grant agreement No. 644334 1633 (NEAT). The views expressed are solely those of the author(s). 1635 8. IANA Considerations 1637 This memo includes no request to IANA. 1639 XXX If new UDP Options are specified in this document, a request to 1640 IANA will be included here. XXX 1642 If there are no requirements for IANA, the section will be removed 1643 during conversion into an RFC by the RFC Editor. 1645 9. Security Considerations 1647 The security considerations for the use of UDP and SCTP are provided 1648 in the references RFCs. Security guidance for applications using UDP 1649 is provided in the UDP Usage Guidelines [RFC8085]. 1651 There are cases where PTB messages are not delivered due to policy, 1652 configuration or equipment design (see Section 1.1), this method 1653 therefore does not rely upon PTB messages being received, but is able 1654 to utilise these when they are received by the sender. PTB messages 1655 could potentially be used to cause a node to inappropriately reduce 1656 the PLPMTU. 
A node supporting DPLPMTUD MUST therefore appropriately 1657 validate the payload of PTB messages to ensure these are received in 1658 response to transmitted traffic (i.e., a reported error condition 1659 that corresponds to a datagram actually sent by the path layer). 1661 Parallel forwarding paths may need to be considered. Section 5.2.5.1 1662 identifies the need for robustness in the method when the path 1663 information may be inconsistent. 1665 A node performing DPLPMTUD could experience conflicting information 1666 about the size of supported probe packets. This could occur when 1667 multiple paths are concurrently in use and these exhibit 1668 different PMTUs. If not considered, this could result in data being 1669 black holed when the PLPMTU is larger than the smallest PMTU across 1670 the current paths. 1672 An on-path attacker could forge PTB messages to drive down the PLPMTU. 1674 10. References 1676 10.1. Normative References 1678 [I-D.ietf-quic-transport] 1679 Iyengar, J. and M. Thomson, "QUIC: A UDP-Based Multiplexed 1680 and Secure Transport", draft-ietf-quic-transport-14 (work 1681 in progress), August 2018. 1683 [I-D.ietf-tsvwg-udp-options] 1684 Touch, J., "Transport Options for UDP", draft-ietf-tsvwg- 1685 udp-options-05 (work in progress), July 2018. 1687 [RFC0768] Postel, J., "User Datagram Protocol", STD 6, RFC 768, 1688 DOI 10.17487/RFC0768, August 1980, 1689 . 1691 [RFC0792] Postel, J., "Internet Control Message Protocol", STD 5, 1692 RFC 792, DOI 10.17487/RFC0792, September 1981, 1693 . 1695 [RFC1122] Braden, R., Ed., "Requirements for Internet Hosts - 1696 Communication Layers", STD 3, RFC 1122, 1697 DOI 10.17487/RFC1122, October 1989, 1698 . 1700 [RFC1812] Baker, F., Ed., "Requirements for IP Version 4 Routers", 1701 RFC 1812, DOI 10.17487/RFC1812, June 1995, 1702 . 1704 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1705 Requirement Levels", BCP 14, RFC 2119, 1706 DOI 10.17487/RFC2119, March 1997, 1707 .
1709 [RFC2460] Deering, S. and R. Hinden, "Internet Protocol, Version 6 1710 (IPv6) Specification", RFC 2460, DOI 10.17487/RFC2460, 1711 December 1998, . 1713 [RFC3828] Larzon, L-A., Degermark, M., Pink, S., Jonsson, L-E., Ed., 1714 and G. Fairhurst, Ed., "The Lightweight User Datagram 1715 Protocol (UDP-Lite)", RFC 3828, DOI 10.17487/RFC3828, July 1716 2004, . 1718 [RFC4820] Tuexen, M., Stewart, R., and P. Lei, "Padding Chunk and 1719 Parameter for the Stream Control Transmission Protocol 1720 (SCTP)", RFC 4820, DOI 10.17487/RFC4820, March 2007, 1721 . 1723 [RFC4960] Stewart, R., Ed., "Stream Control Transmission Protocol", 1724 RFC 4960, DOI 10.17487/RFC4960, September 2007, 1725 . 1727 [RFC6951] Tuexen, M. and R. Stewart, "UDP Encapsulation of Stream 1728 Control Transmission Protocol (SCTP) Packets for End-Host 1729 to End-Host Communication", RFC 6951, 1730 DOI 10.17487/RFC6951, May 2013, 1731 . 1733 [RFC8085] Eggert, L., Fairhurst, G., and G. Shepherd, "UDP Usage 1734 Guidelines", BCP 145, RFC 8085, DOI 10.17487/RFC8085, 1735 March 2017, . 1737 [RFC8201] McCann, J., Deering, S., Mogul, J., and R. Hinden, Ed., 1738 "Path MTU Discovery for IP version 6", STD 87, RFC 8201, 1739 DOI 10.17487/RFC8201, July 2017, 1740 . 1742 [RFC8261] Tuexen, M., Stewart, R., Jesup, R., and S. Loreto, 1743 "Datagram Transport Layer Security (DTLS) Encapsulation of 1744 SCTP Packets", RFC 8261, DOI 10.17487/RFC8261, November 1745 2017, . 1747 10.2. Informative References 1749 [RFC1191] Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191, 1750 DOI 10.17487/RFC1191, November 1990, 1751 . 1753 [RFC2923] Lahey, K., "TCP Problems with Path MTU Discovery", 1754 RFC 2923, DOI 10.17487/RFC2923, September 2000, 1755 . 1757 [RFC4340] Kohler, E., Handley, M., and S. Floyd, "Datagram 1758 Congestion Control Protocol (DCCP)", RFC 4340, 1759 DOI 10.17487/RFC4340, March 2006, 1760 . 1762 [RFC4821] Mathis, M. and J. 
Heffner, "Packetization Layer Path MTU 1763 Discovery", RFC 4821, DOI 10.17487/RFC4821, March 2007, 1764 . 1766 [RFC4890] Davies, E. and J. Mohacsi, "Recommendations for Filtering 1767 ICMPv6 Messages in Firewalls", RFC 4890, 1768 DOI 10.17487/RFC4890, May 2007, 1769 . 1771 Appendix A. Event-driven state changes 1773 This appendix contains an informative description of key events: 1775 Path Setup: When a new path is initiated, the state is set to 1776 PROBE_START. This sends a probe packet with the size of the 1777 BASE_PMTU. As soon as the path is confirmed, the state changes to 1778 PROBE_SEARCH. 1780 Arrival of an Acknowledgment: Depending on the probing state, the 1781 reaction differs according to Figure 7, which is a simplification 1782 of Figure 4 focusing on this event. 1784 +--------------+ +----------------+ 1785 | PROBE_START | --3------------------------------> | PROBE_DISABLED | 1786 +--------------+ --4---------------- ------------> +----------------+ 1787 \/ 1788 +--------------+ /\ +--------------+ 1789 | PROBE_ERROR | -------------------- \ ----------> | PROBE_BASE | 1790 +--------------+ --4--------------/ \ +--------------+ 1791 \ 1792 +--------------+ --1 -------- \ +--------------+ 1793 | PROBE_BASE | \ --- \ ------> | PROBE_ERROR | 1794 +--------------+ --3--------- \ -----/ \ +--------------+ 1795 \ \ 1796 +--------------+ \ -----> +--------------+ 1797 | PROBE_SEARCH | --2--- -----------------> | PROBE_SEARCH | 1798 +--------------+ \ ------------------> +--------------+ 1799 \ ---- / 1800 +---------------+ / \ +---------------+ 1801 |SEARCH_COMPLETE| -1--- \ |SEARCH_COMPLETE| 1802 +---------------+ -5-- -----------------------> +---------------+ 1803 \ 1804 \ +--------------+ 1805 --------------------------> | PROBE_BASE | 1806 +--------------+ 1808 Condition 1: The maximum PMTU size has not yet been reached. 1809 Condition 2: The maximum PMTU size has been reached. Condition 3: 1810 Probe Timer expires and PROBE_COUNT = MAX_PROBEs. 
Condition 4: 1811 PROBE_ACK received. Condition 5: Black hole detected. 1813 Figure 7: State changes at the arrival of an acknowledgment 1815 Probing timeout: The PROBE_COUNT is initialised to zero each time 1816 the value of PROBED_SIZE is changed and when an acknowledgment 1817 confirms delivery of a probe packet. The PROBE_TIMER is started 1818 each time a probe packet is sent. It is stopped when an 1819 acknowledgment arrives that confirms delivery of a probe packet of 1820 PROBED_SIZE. If the probe packet is not acknowledged before the 1821 PROBE_TIMER expires, the PROBE_COUNT is incremented. When the 1822 PROBE_COUNT equals the value MAX_PROBES, the state is changed, 1823 otherwise a new probe packet of the same size (PROBED_SIZE) is 1824 sent. The state transitions are illustrated in Figure 8. This 1825 shows a simplification of Figure 4 with a focus only on this 1826 event. 1828 +--------------+ +----------------+ 1829 | PROBE_START | --2------------------------------->| PROBE_DISABLED | 1830 +--------------+ +----------------+ 1832 +--------------+ +--------------+ 1833 | PROBE_ERROR | -----------------> | PROBE_ERROR | 1834 +--------------+ / +--------------+ 1835 / 1836 +--------------+ --2----------/ +--------------+ 1837 | PROBE_BASE | --1------------------------------> | PROBE_BASE | 1838 +--------------+ +--------------+ 1840 +--------------+ +--------------+ 1841 | PROBE_SEARCH | --1------------------------------> | PROBE_SEARCH | 1842 +--------------+ --2--------- +--------------+ 1843 \ 1844 +---------------+ \ +---------------+ 1845 |SEARCH_COMPLETE| -------------------> |SEARCH_COMPLETE| 1846 +---------------+ +---------------+ 1848 Condition 1: The maximum number of probe packets has not been 1849 reached. Condition 2: The maximum number of probe packets has been 1850 reached. XXX This diagram has not been validated.
1852 Figure 8: State changes at the expiration of the probe timer 1854 PMTU raise timer timeout: DPLPMTUD periodically sends a probe packet 1855 to detect whether a larger PMTU is possible. This probe packet is 1856 generated by the PMTU_RAISE_TIMER. 1858 Arrival of a PTB message: The active probing of the path can be 1859 supported by the arrival of a PTB message indicating the PTB_SIZE. 1860 Two examples are: 1862 1. The PTB_SIZE is between the PLPMTU and the probe that 1863 triggered the PTB message. 1865 2. The PTB_SIZE is smaller than the PLPMTU. 1867 In the first case, the PROBE_BASE state transitions to the PROBE_ERROR 1868 state. In the PROBE_SEARCH state, a new probe packet is sent with 1869 the size reported by the PTB message. 1871 In the second case, the probing starts again with a value of 1872 PROBE_BASE. 1874 Appendix B. Revision Notes 1876 Note to RFC-Editor: please remove this entire section prior to 1877 publication. 1879 Individual draft -00: 1881 o Comments and corrections are welcome directly to the authors or 1882 via the IETF TSVWG working group mailing list. 1884 o This update is proposed for WG comments. 1886 Individual draft -01: 1888 o Contains the first representation of the algorithm, showing the 1889 states and timers 1891 o This update is proposed for WG comments. 1893 Individual draft -02: 1895 o Contains updated representation of the algorithm, and textual 1896 corrections. 1898 o The text describing when to set the effective PMTU has not yet 1899 been validated by the authors 1901 o To determine security to off-path-attacks: We need to decide 1902 whether a received PTB message SHOULD/MUST be validated? The text 1903 on how to handle a PTB message indicating a link MTU larger than 1904 the probe has not yet been validated by the authors 1906 o No text currently describes how to handle inconsistent results 1907 from arbitrary re-routing along different parallel paths 1909 o This update is proposed for WG comments.
1911 Working Group draft -00: 1913 o This draft follows a successful adoption call for TSVWG 1915 o There is still work to complete, please comment on this draft. 1917 Working Group draft -01: 1919 o This draft includes improved introduction. 1921 o The draft is updated to require ICMP validation prior to accepting 1922 PTB messages - this to be confirmed by WG 1924 o Section added to discuss Selection of Probe Size - methods to be 1925 evaluated and recommendations to be considered 1927 o Section added to align with work proposed in the QUIC WG. 1929 Working Group draft -02: 1931 o The draft was updated based on feedback from the WG, and a 1932 detailed review by Magnus Westerlund. 1934 o The document updates RFC 4821. 1936 o Requirements list updated. 1938 o Added more explicit discussion of a simpler black-hole detection 1939 mode. 1941 o This draft includes reorganisation of the section on IETF 1942 protocols. 1944 o Added more discussion of implementation within an application. 1946 o Added text on flapping paths. 1948 o Replaced 'effective MTU' with new term PLPMTU. 1950 Working Group draft -03: 1952 o Updated figures 1954 o Added more discussion on blackhole detection 1956 o Added figure describing just blackhole detection 1958 o Added figure relating MPS sizes 1960 Working Group draft -04: 1962 o Described phases and named these consistently. 1964 o Corrected transition from confirmation directly to the search 1965 phase (Base has been checked). 1967 o Redrawn state diagrams. 1969 o Renamed BASE_MTU to BASE_PMTU (because it is a base for the PMTU). 1971 o Clarified Error state. 1973 o Clarified suspending DPLPMTUD. 1975 o Verified normative text in requirements section. 1977 o Removed duplicate text. 1979 o Changed all text to refer to /packet probe/probe packet/ 1980 /validation/verification/ added term /Probe Confirmation/ and 1981 clarified BlackHole detection.
1983 Authors' Addresses 1985 Godred Fairhurst 1986 University of Aberdeen 1987 School of Engineering 1988 Fraser Noble Building 1989 Aberdeen AB24 3U 1990 UK 1992 Email: gorry@erg.abdn.ac.uk 1994 Tom Jones 1995 University of Aberdeen 1996 School of Engineering 1997 Fraser Noble Building 1998 Aberdeen AB24 3U 1999 UK 2001 Email: tom@erg.abdn.ac.uk 2003 Michael Tuexen 2004 Muenster University of Applied Sciences 2005 Stegerwaldstrasse 39 2006 Steinfurt 48565 2007 DE 2009 Email: tuexen@fh-muenster.de 2010 Irene Ruengeler 2011 Muenster University of Applied Sciences 2012 Stegerwaldstrasse 39 2013 Steinfurt 48565 2014 DE 2016 Email: i.ruengeler@fh-muenster.de