This document has been reviewed as part of the transport area review team's ongoing effort to review key IETF documents. These comments were written primarily for the transport area directors, but are copied to the document's authors and WG to allow them to address any issues raised and also to the IETF discussion list for information.

When done at the time of IETF Last Call, the authors should consider this review as part of the last-call comments they receive. Please always CC tsv-art@ietf.org if you reply to or forward this review.

TSV-ART Review of draft-ietf-tls-dtls13-41

Reviewer: Bernard Aboba

Summary:

The timeout and retransmission scheme looks workable for common cases, but could use some refinement to make it more robust.

Technical Comments

4.5.2. Handling Invalid Records

Unlike TLS, DTLS is resilient in the face of invalid records (e.g., invalid formatting, length, MAC, etc.). In general, invalid records SHOULD be silently discarded, thus preserving the association; however, an error MAY be logged for diagnostic purposes.

[BA] How does silent discard of invalid records interact with retransmission timers?

Implementations which choose to generate an alert instead, MUST generate error alerts to avoid attacks where the attacker repeatedly probes the implementation to see how it responds to various types of error. Note that if DTLS is run over UDP, then any implementation which does this will be extremely susceptible to denial-of-service (DoS) attacks because UDP forgery is so easy. Thus, this practice is NOT RECOMMENDED for such transports, both to increase the reliability of DTLS service and to avoid the risk of spoofing attacks sending traffic to unrelated third parties.

[BA] "this practice" refers to "generate an alert instead", correct?

5.8.2. Timer Values

Though timer values are the choice of the implementation, mishandling of the timer can lead to serious congestion problems, for example if many instances of a DTLS time out early and retransmit too quickly on a congested link.

[BA] Saying "timer values are the choice of the implementation" seems odd, because it is followed by normative language. I would delete this and start the sentence with "Mishandling...".

Implementations SHOULD use an initial timer value of 100 msec (the minimum defined in RFC 6298 [RFC6298]) and double the value at each retransmission, up to no less than 60 seconds (the RFC 6298 maximum). Application specific profiles, such as those used for the Internet of Things environment, may recommend longer timer values. Note that a 100 msec timer is recommended rather than the 3-second RFC 6298 default in order to improve latency for time-sensitive applications. Because DTLS only uses retransmission for handshake and not dataflow, the effect on congestion should be minimal.

Implementations SHOULD retain the current timer value until a message is transmitted and acknowledged without having to be retransmitted, at which time the value may be reset to the initial value.

[BA] Is it always possible to distinguish a retransmission from a late arrival of an original packet? This seems like it could result in wrongly resetting the timer in some situations.

5.8.3. Large Flight Sizes

DTLS does not have any built-in congestion control or rate control; in general this is not an issue because messages tend to be small. However, in principle, some messages - especially Certificate - can be quite large. If all the messages in a large flight are sent at once, this can result in network congestion. A better strategy is to send out only part of the flight, sending more when messages are acknowledged.
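The quoted strategy of sending part of a flight and releasing more as ACKs arrive can be pictured as a simple window over the flight's records. The Python sketch below is a rough illustration of that idea only; the draft does not specify it. The class name FlightSender and the initial_window parameter are hypothetical. Picking that window is the part the quoted text leaves open. Loss recovery via the retransmission timer is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class FlightSender:
    """Release one handshake flight a few records at a time (hypothetical sketch)."""

    records: list                 # records making up one flight, in send order
    initial_window: int = 2       # max unacknowledged records (assumed value)
    next_index: int = 0           # index of the first record not yet sent
    unacked: set = field(default_factory=set)  # sent but not yet ACKed

    def sendable(self) -> list:
        """Mark as sent, and return, the record indices allowed out now."""
        released = []
        while (self.next_index < len(self.records)
               and len(self.unacked) < self.initial_window):
            self.unacked.add(self.next_index)
            released.append(self.next_index)
            self.next_index += 1
        return released

    def on_ack(self, acked: list) -> list:
        """Drop ACKed indices, then release as many new records as the window allows."""
        self.unacked.difference_update(acked)
        return self.sendable()

    def done(self) -> bool:
        """True once every record has been sent and acknowledged."""
        return self.next_index == len(self.records) and not self.unacked


sender = FlightSender(records=[b"rec0", b"rec1", b"rec2", b"rec3", b"rec4"])
print(sender.sendable())      # [0, 1]
print(sender.on_ack([0]))     # [2]
print(sender.on_ack([1, 2]))  # [3, 4]
```

A fixed window like this only bounds the burst size. It is not congestion control, and a real implementation would still need a rule for sizing the window.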
DTLS offers a number of mechanisms for minimizing the size of the certificate message, including the cached information extension [RFC7924] and certificate compression [RFC8879].

[BA] How does the implementation know how much of the flight to send?

Not sure how prevalent large certs are for DTLS (e.g., compared with the self-signed certs of WebRTC), but in EAP-TLS deployments, large certs have caused problems. The EAP-TLS cert document draft-ietf-emu-eaptlscert cites some additional mechanisms for reducing certificate sizes. These include draft-ietf-tls-ctls and the "client_certificate_url" extension defined in [RFC6066], which allows TLS clients to send a sequence of Uniform Resource Locators (URLs) instead of the client certificate.

5.11. Alert Messages

Note that Alert messages are not retransmitted at all, even when they occur in the context of a handshake. However, a DTLS implementation which would ordinarily issue an alert SHOULD generate a new alert message if the offending record is received again (e.g., as a retransmitted handshake message). Implementations SHOULD detect when a peer is persistently sending bad messages and terminate the local connection state after such misbehavior is detected.

Note that alerts are not reliably transmitted; implementation SHOULD NOT depend on receiving alerts in order to signal errors or connection closure.

[BA] For the fatal alert case, it does seem like retransmission would be a good idea; otherwise the peer can be left hanging.

Section 7.1

"Disruptions" such as reordering do not affect timers, correct?

ACKs SHOULD NOT be sent for these flights unless generating the responding flight takes significant time.

What is "significant time"?

Editorial Comments (NITs)

Section 2

The reader is also as to be familiar with

[BA] "as" -> "assumed"

Section 3

The basic design philosophy of DTLS is to construct "TLS over datagram transport". Datagram transport does not require nor provide reliable or in-order delivery of data.
The DTLS protocol preserves this property for application data. Applications such as media streaming, Internet telephony, and online gaming use datagram transport for communication due to the delay-sensitive nature of transported data. The behavior of such applications is unchanged when the DTLS protocol is used to secure communication, since the DTLS protocol does not compensate for lost or reordered data traffic.

[BA] Low-latency streaming and gaming do use DTLS to protect data (e.g., the WebRTC data channel). Telephony and RTC audio/video, however, use DTLS-SRTP for key derivation only, and SRTP for protection of data. So you might want to make a distinction.

Section 3.1

Note that timeout and retransmission do not apply to the HelloRetryRequest since this would require creating state on the server. The HelloRetryRequest is designed to be small enough that it will not itself be fragmented, thus avoiding concerns about interleaving multiple HelloRetryRequests.

[BA] I would add "For more detail on timeouts and retransmission, see Section 5.8."

4.3. Transport Layer Mapping

DTLS messages MAY be fragmented into multiple DTLS records. Each DTLS record MUST fit within a single datagram. In order to avoid IP fragmentation, clients of the DTLS record layer SHOULD attempt to size records so that they fit within any PMTU estimates obtained from the record layer.

[BA] You might reference PMTU considerations described in Section 4.4.

5. Post-handshake client authentication

Messages of each category can be sent independently, and reliability is established via independent state machines each of which behaves as described in Section 5.8.1. For example, if a server sends a NewSessionTicket and a CertificateRequest message, two independent state machines will be created.
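The per-category independence quoted above can be sketched by keying outstanding messages on their category. That way, an ACK that completes one category leaves the others untouched. This is a hypothetical Python illustration that makes three simplifications:

- Each Section 5.8.1 state machine is reduced to "idle" versus "awaiting ACK", and its timers are omitted.
- The category strings are chosen for illustration.
- msg_id is an illustrative message identifier, not a DTLS field.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()     # nothing outstanding in this category
    WAITING = auto()  # at least one message sent and not yet ACKed


class PostHandshakeMachines:
    """One simplified reliability machine per message category (hypothetical sketch)."""

    def __init__(self) -> None:
        # category name -> identifiers of messages still awaiting an ACK
        self.outstanding: dict[str, set[int]] = {}

    def send(self, category: str, msg_id: int) -> None:
        self.outstanding.setdefault(category, set()).add(msg_id)

    def ack(self, category: str, msg_id: int) -> None:
        # Only this category's machine is touched; others are unaffected.
        self.outstanding.get(category, set()).discard(msg_id)

    def state(self, category: str) -> State:
        return State.WAITING if self.outstanding.get(category) else State.IDLE


machines = PostHandshakeMachines()
machines.send("NewSessionTicket", 1)
machines.send("CertificateRequest", 2)
machines.ack("NewSessionTicket", 1)
print(machines.state("NewSessionTicket"))    # State.IDLE
print(machines.state("CertificateRequest"))  # State.WAITING
```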
As explained in the corresponding sections, sending multiple instances of messages of a given category without having completed earlier transmissions is allowed for some categories, but not for others. Specifically, a server MAY send multiple NewSessionTicket messages at once without awaiting ACKs for earlier NewSessionTicket first. Likewise, a server MAY send multiple CertificateRequest messages at once without having completed earlier client authentication requests before. In contrast, implementations MUST NOT have send KeyUpdate, NewConnectionId or RequestConnectionId

[BA] "send" -> "sent"

6. Example of Handshake with Timeout and Retransmission

The following is an example of a handshake with lost packets and retransmissions. Note that the client sends an empty ACK message because it can only acknowledge Record 1 sent by the server once it has processed messages in Record 0 needed to establish epoch 2 keys, which are needed to encrypt to decrypt messages found in Record 1.

[BA] "encrypt to decrypt" -> "encrypt or decrypt"?

Section 7.3

In the first case the use of the ACK message is optional because the peer will retransmit in any case and therefore the ACK just allows for selective retransmission, as opposed to the whole flight retransmission in previous versions of DTLS. For instance in the flow shown in Figure 11 if the client does not send the ACK message

[BA] Figure 11 is the DTLS State Machine. Are you referring to another figure?

The use of the ACK for the second case is mandatory for the proper functioning of the protocol. For instance, the ACK message sent by the client in Figure 13, acknowledges receipt and processing of record 4 (containing the NewSessionTicket message) and if it is not sent the server will continue retransmission of the NewSessionTicket indefinitely until its transmission cap is reached.

[BA] Do you mean "maximum retransmission timeout value"?
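The last comment turns on two different limits, and the Python sketch below keeps them separate.

- **Timer ceiling.** This follows the Section 5.8.2 rules: a 100 msec initial value, doubled on each retransmission. The sketch caps it at exactly 60 seconds, which satisfies "no less than 60 seconds".
- **Retransmission count cap.** The count cap (max_retransmissions) and its default of 10 are hypothetical, not taken from the draft.

The reset rule keys off the sender's own record of whether it retransmitted the current flight, in the spirit of Karn's algorithm. Under this rule, an ACK received after any retransmission never resets the timer, whichever copy it acknowledges.

```python
from dataclasses import dataclass

INITIAL_TIMEOUT_MS = 100  # Section 5.8.2 initial value (RFC 6298 minimum)
MAX_TIMEOUT_MS = 60_000   # ceiling; the draft asks for "no less than 60 seconds"


@dataclass
class RetransmitTimer:
    timeout_ms: int = INITIAL_TIMEOUT_MS
    retransmissions: int = 0        # resends of the current flight so far
    max_retransmissions: int = 10   # hypothetical count cap, not from the draft

    def on_timeout(self) -> bool:
        """Double the timer for a resend; return False once the count cap is hit."""
        if self.retransmissions >= self.max_retransmissions:
            return False            # give up on this flight
        self.retransmissions += 1
        self.timeout_ms = min(self.timeout_ms * 2, MAX_TIMEOUT_MS)
        return True

    def on_flight_acked(self) -> None:
        """Keep the backed-off value unless the flight needed no retransmission."""
        if self.retransmissions == 0:
            self.timeout_ms = INITIAL_TIMEOUT_MS
        self.retransmissions = 0    # next flight starts with a clean count


timer = RetransmitTimer()
timer.on_timeout()
timer.on_timeout()
timer.on_flight_acked()
print(timer.timeout_ms)  # 400: retained, the flight was retransmitted
timer.on_flight_acked()
print(timer.timeout_ms)  # 100: reset after a flight ACKed on first transmission
```

With these assumptions, a flight that is never ACKed stops being resent after max_retransmissions timeouts, while timeout_ms itself never exceeds 60 seconds. These are the two readings of "transmission cap".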