idnits 2.17.1 draft-liu-multipath-quic-04.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- == The page length should not exceed 58 lines per page, but there was 1 longer page, the longest (page 1) being 1192 lines Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The abstract seems to contain references ([QUIC-TRANSPORT]), which it shouldn't. Please replace those with straight textual mentions of the documents in question. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document date (5 September 2021) is 965 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) -- Looks like a reference, but probably isn't: '0' on line 272 == Missing Reference: 'X' is mentioned on line 360, but not defined == Missing Reference: 'Y' is mentioned on line 362, but not defined -- Looks like a reference, but probably isn't: '1' on line 367 == Missing Reference: 'U' is mentioned on line 365, but not defined -- Looks like a reference, but probably isn't: '2' on line 365 == Missing Reference: 'V' is mentioned on line 367, but not defined Summary: 1 error (**), 0 flaws (~~), 6 warnings (==), 4 comments (--). Run idnits with the --verbose option for more detailed information about the items above. 
-------------------------------------------------------------------------------- 2 QUIC Y. Liu 3 Internet-Draft Y. Ma 4 Intended status: Standards Track Alibaba Inc. 5 Expires: 9 March 2022 C. Huitema 6 Private Octopus Inc. 7 Q. An 8 Alibaba Inc. 9 Z. Li 10 ICT-CAS 11 5 September 2021 13 Multipath Extension for QUIC 14 draft-liu-multipath-quic-04 16 Abstract 18 This document specifies multipath extension for the QUIC protocol to 19 enable the simultaneous usage of multiple paths for a single 20 connection. The extension is compliant with the single-path QUIC 21 design. The design principle is to support multipath by adding 22 limited extension to [QUIC-TRANSPORT]. 24 Status of This Memo 26 This Internet-Draft is submitted in full conformance with the 27 provisions of BCP 78 and BCP 79. 29 Internet-Drafts are working documents of the Internet Engineering 30 Task Force (IETF). Note that other groups may also distribute 31 working documents as Internet-Drafts. The list of current Internet- 32 Drafts is at https://datatracker.ietf.org/drafts/current/. 34 Internet-Drafts are draft documents valid for a maximum of six months 35 and may be updated, replaced, or obsoleted by other documents at any 36 time. It is inappropriate to use Internet-Drafts as reference 37 material or to cite them other than as "work in progress." 39 This Internet-Draft will expire on 9 March 2022. 41 Copyright Notice 43 Copyright (c) 2021 IETF Trust and the persons identified as the 44 document authors. All rights reserved. 46 This document is subject to BCP 78 and the IETF Trust's Legal 47 Provisions Relating to IETF Documents (https://trustee.ietf.org/ 48 license-info) in effect on the date of publication of this document. 49 Please review these documents carefully, as they describe your rights 50 and restrictions with respect to this document. 
Code Components 51 extracted from this document must include Simplified BSD License text 52 as described in Section 4.e of the Trust Legal Provisions and are 53 provided without warranty as described in the Simplified BSD License. 55 Table of Contents 57 1. Introduction 58 2. Conventions and Definitions 59 3. Enable Multipath QUIC - Handshake 60 4. Path Management 61 4.1. Path Identifier and Connection ID 62 4.2. Path Packet Number Spaces 63 4.3. Path Initiation 64 4.4. Path State Management 65 4.5. Path Close 66 4.5.1. Use PATH_STATUS frame to close a path 67 4.5.2. Effect of RETIRE_CONNECTION_ID frame 68 4.5.3. Idle timeout 69 5. Using TLS to Secure QUIC Multipath 70 5.1. Packet protection for QUIC Multipath 71 5.2. Key Update for QUIC Multipath 72 6. Using Multipath QUIC with load balancers 73 7. Packet scheduling 74 7.1. Basic Scheduling 75 7.2. Scheduling with QoE Feedback 76 7.3. Per-stream Policy 77 8. Congestion control and loss detection 78 8.1. Congestion control 79 8.2. Packet number space and acknowledgements 80 8.3. Flow control 81 9. New frames 82 9.1. PATH_STATUS frame 83 9.2. ACK_MP frame 84 9.3. QOE_CONTROL_SIGNALS frame 85 10. Implementation Considerations 86 10.1. Management of acknowledgements delay 87 10.2. Handling of 0-RTT packets 88 11. Security Considerations 89 12. IANA Considerations 90 13. Changelog 91 14. Appendix.A Scenarios related to migration 92 15. Appendix.B Considerations on RTT estimate and loss detection 93 16. Appendix.C Difference from past proposals 94 17. References 95 17.1. Normative References 96 17.2. Informative References 97 Authors' Addresses 99 1. Introduction 101 In this document, we propose an extension to the current QUIC design 102 to enable the simultaneous usage of multiple paths for a single 103 connection. 105 This proposal is based on several basic design points: 107 * Re-use as much as possible mechanisms of QUIC-v1, which has 108 supported connection migration and path validation. 
110 * To avoid the risk of packets being dropped by middleboxes (which 111 may only support QUIC-v1), use the same packet header formats as 112 QUIC-v1. 114 * Endpoints need a Path Identifier for each different path, which is 115 used to track the state of packets. As we want to keep the packet 116 header formats unchanged [QUIC-TRANSPORT], Connection IDs (and the 117 sequence number of Connection IDs) would be a good choice of Path 118 Identifier. 120 * For the convenience of packet loss detection and recovery, 121 endpoints use a different packet number space for each Path 122 Identifier. 124 * Congestion Control, RTT measurements and PMTU discovery should be 125 per-path (following [QUIC-TRANSPORT]). 127 This document is organized as follows. It first provides definitions 128 for multipath QUIC in Section 2. It then specifies how to enable 129 multipath QUIC during the handshake in Section 3, and path management in 130 Section 4. It discusses packet scheduling in Section 7, and 131 congestion control in Section 8. The new frames are defined in 132 Section 9. 134 2. Conventions and Definitions 136 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 137 "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and 138 "OPTIONAL" in this document are to be interpreted as described in BCP 139 14 [RFC2119] [RFC8174] when, and only when, they appear in all 140 capitals, as shown here. 142 We assume that the reader is familiar with the terminology used in 143 [QUIC-TRANSPORT]. In addition, we define the following terms: 145 * Path Identifier (Path ID): An identifier that is used to identify a 146 path in a QUIC connection at an endpoint. Path Identifier is used 147 in multi-path control frames (e.g., the PATH_STATUS frame) to identify 148 a path. 
By default, it is defined as the sequence number of the 149 destination Connection ID used for sending packets on that 150 particular path (see Section 4.1), but alternative definitions can be 151 used if the length of that connection ID is zero. 153 * Packet Number Space Identifier (PN Space ID): An identifier that is 154 used to distinguish packet number spaces for different paths. It 155 is used in 1-RTT packets and ACK_MP frames. Each node maintains a 156 list of "Received Packets" for each of the CIDs that it provided to 157 the peer, which is used for acknowledging packets received with 158 that CID (see Section 4.2). 160 The difference between Path Identifier and Packet Number Space 161 Identifier is that Path Identifier is used in multi-path control 162 frames to identify a path, and Packet Number Space Identifier is used 163 in 1-RTT packets and ACK_MP frames to distinguish packet number 164 spaces for different paths. 166 3. Enable Multipath QUIC - Handshake 168 This extension defines a new transport parameter, used to negotiate 169 the use of the multipath extension during the connection handshake, 170 as specified in [QUIC-TRANSPORT]. The new transport parameter is 171 defined as follows: 173 * name: enable_multipath (TBD - experiments use 0xbaba) 175 * value: 0 (default) for disabled, 1 for enabled 177 If the peer does not send the enable_multipath (TBD - experiments use 178 0xbaba) transport parameter, the peer does not support 179 multipath; the endpoint MUST fall back to [QUIC-TRANSPORT] with a single 180 path, MUST NOT send any MP frames in subsequent packets, and 181 MUST NOT use the multipath-specific AEAD algorithm defined in 182 Section 5.1. 184 Note that the transport parameter "active_connection_id_limit" 185 [QUIC-TRANSPORT] limits the number of usable Connection IDs, and also 186 limits the number of concurrent paths. 188 4. 
Path Management 190 After endpoints have negotiated during the handshake that both endpoints 191 enable the multipath feature, endpoints can start using multiple paths. 193 This proposal adds one multi-path control frame for path management: 195 * PATH_STATUS frame for the receiver side to declare the path state 196 and preference 198 All the new MP frames are sent in 1-RTT packets [QUIC-TRANSPORT]. 200 4.1. Path Identifier and Connection ID 202 Endpoints need a Path Identifier for each different path, which is 203 used to track the state of packets. Endpoints use Connection IDs in 204 1-RTT packet headers as Path Identifiers in each direction, and use 205 the sequence number of Connection IDs in MP frames to identify the 206 path referred to. 208 Following [QUIC-TRANSPORT], each endpoint uses NEW_CONNECTION_ID 209 frames to announce usable Connection IDs for itself. Before an 210 endpoint adds a new path, it SHOULD check whether there is at least 211 one unused available Connection ID for each side. 213 Endpoints can find which path a received packet belongs to according 214 to the Destination Connection ID of the 1-RTT packet. Endpoints can 215 find the context of a path by its Connection ID or the sequence 216 number of that Connection ID. 218 The Identifier Type field in Path Identifier is used to distinguish 219 the following 3 different types of Path Identifiers: 221 * Type 0: Refer to the connection identifier used by the sender of 222 the control frame when sending data over the specified path. This 223 method SHOULD be used if this connection identifier is non-zero 224 length. This method MUST NOT be used if this connection 225 identifier is zero-length. 227 * Type 1: Refer to the connection identifier used by the receiver of 228 the control frame when sending data over the specified path. This 229 method MUST NOT be used if this connection identifier is zero- 230 length. 232 * Type 2: Refer to the path over which the control frame is sent or 233 received. 235 4.2. 
Path Packet Number Spaces 237 For the convenience of packet loss detection and recovery, endpoints 238 use a different packet number space for each Packet Number Space 239 Identifier. The sending peer chooses the connection identifier used 240 in 1-RTT packets. As much as possible, 1-RTT packets sent to 241 different paths SHOULD carry different connection identifiers, but 242 there is an exception if the peer uses a 0-length CID. In all cases, 243 the packet number space for 1-RTT packets is specific to the 244 connection ID in these packets. 246 Packet Number Space Identifier (PN Space ID) is an identifier that is 247 used to distinguish packet number spaces for different paths. It is 248 used in 1-RTT packets and ACK_MP frames. In 1-RTT packets we use the 249 sequence number of Destination Connection ID as the Packet Number 250 Space Identifier, and we add a Packet Number Space Identifier field 251 in ACK_MP frames. 253 Note: If a peer uses a zero-length CID, then all packets sent to that 254 peer MUST be numbered in a single number space, because the packet 255 level decryption implementation will only see one Connection ID 256 sequence number (the default number 0). 258 4.3. Path Initiation 260 Figure 1 illustrates an example of new path establishment. 262 Client Server 264 (Exchanges start on default path) 265 1-RTT[]: NEW_CONNECTION_ID[C1, Seq=1] --> 266 <-- 1-RTT[]: NEW_CONNECTION_ID[S1, Seq=1] 267 <-- 1-RTT[]: NEW_CONNECTION_ID[S2, Seq=2] 268 ... 269 (starts new path) 270 1-RTT[0]: DCID=S2, PATH_CHALLENGE[X] --> 271 Checks AEAD using nonce(CID sequence 2, PN 0) 272 <-- 1-RTT[0]: DCID=C1, PATH_RESPONSE[X], PATH_CHALLENGE[Y], 273 ACK_MP[Seq=2,PN=0] 274 Checks AEAD using nonce(CID sequence 1, PN 0) 275 1-RTT[1]: DCID=S2, PATH_RESPONSE[Y], 276 ACK_MP[Seq=1, PN=0], ... 
--> 278 Figure 1: Example of new path establishment 280 As shown in Figure 1, the client provides one unused available Connection 281 ID (C1 with sequence number 1), and the server provides two available 282 Connection IDs (S1 with sequence number 1, and S2 with sequence 283 number 2). When the client wants to start a new path, it checks whether 284 there are unused available Connection IDs for each side, and chooses an 285 available Connection ID S2 as the Destination Connection ID in the 286 new path. 288 Endpoints need to exchange unused available Connection IDs with the 289 NEW_CONNECTION_ID frame before an endpoint starts a new path. For 290 example, if the goal is to maintain 2 paths, each endpoint should 291 provide at least 3 CIDs to its peer: 2 in use, and one spare. If the 292 client has used all the allocated CIDs, it is supposed to retire those 293 that are not used anymore, and the server is supposed to provide 294 replacements, as specified in [QUIC-TRANSPORT]. 296 If the transport parameter "active_connection_id_limit" is negotiated 297 as N, and the server has provided N Connection IDs and the client has 298 started N paths, the limit is reached. If the client wants to start 299 a new path, it has to retire one of the established paths. 301 Path validation uses the PATH_CHALLENGE and PATH_RESPONSE frames 302 defined in [QUIC-TRANSPORT]. 304 4.4. Path State Management 306 An endpoint uses PATH_STATUS frames to inform the peer that it should 307 send packets according to the preference expressed by these frames. An 308 endpoint uses the sequence number of the CID used by the peer for 309 PATH_STATUS frames (describing the sender's path identifier). 311 In the example in Figure 1, if the client wants to send a PATH_STATUS 312 frame to tell the server that it prefers the path with CID sequence 313 number 1 (of the server's side), the client should use the identifier 314 of the server (sequence 1) in the PATH_STATUS frame. 
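The Connection ID bookkeeping that Section 4.3 describes (exchange CIDs with NEW_CONNECTION_ID, check that both sides still have an unused CID before starting a path, respect active_connection_id_limit) can be sketched as follows. This is a minimal illustration, not part of the draft: the names CidPool and can_start_path are hypothetical, and picking the lowest-numbered unused CID is an assumption (in Figure 1 the client picks S2; any unused CID satisfies the text).

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class CidPool:
    """Connection IDs issued by one endpoint, keyed by sequence number.

    Hypothetical helper for illustration; not defined by the draft.
    """
    limit: int  # negotiated active_connection_id_limit
    unused: Dict[int, bytes] = field(default_factory=dict)
    in_use: Dict[int, bytes] = field(default_factory=dict)

    def add(self, seq: int, cid: bytes) -> None:
        """Record a CID announced in a NEW_CONNECTION_ID frame."""
        if len(self.unused) + len(self.in_use) >= self.limit:
            raise ValueError("active_connection_id_limit exceeded")
        self.unused[seq] = cid

    def take(self) -> Optional[int]:
        """Bind the lowest-numbered unused CID to a new path, if any."""
        if not self.unused:
            return None
        seq = min(self.unused)
        self.in_use[seq] = self.unused.pop(seq)
        return seq

    def retire(self, seq: int) -> None:
        """Drop a CID after RETIRE_CONNECTION_ID, freeing one slot."""
        self.in_use.pop(seq, None)
        self.unused.pop(seq, None)


def can_start_path(peer_cids: CidPool, own_cids: CidPool) -> bool:
    """Section 4.3 check: each side needs at least one unused CID."""
    return bool(peer_cids.unused) and bool(own_cids.unused)
```

With a limit of 3 and two paths in use, one spare CID remains per side, which matches the "2 in use, and one spare" example in the text; once the spare is taken, a further path requires retiring a CID first.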
316 The PATH_STATUS frame can express 4 kinds of path state information: 318 * Abandon a path, and release the corresponding resource. 320 * Mark a path as "available", i.e., allow the peer to use its own 321 logic to split traffic among available paths. 323 * Mark a path as "standby", i.e., suggest that no traffic should be 324 sent on that path if another path is available. 326 * Mark the priority of a path, e.g., if path 1 has weight 8 and path 2 has 327 weight 2, this suggests that path 1 has higher priority than path 2 and 328 that the peer should try to send more data on path 1. 330 A PATH_STATUS frame can be sent via a different path, instead of the 331 path identified by the Path Identifier field. 333 4.5. Path Close 335 An endpoint that wants to delete a path SHOULD NOT rely on implicit 336 signals like idle time or packet losses, but instead SHOULD 337 explicitly ask to abandon the path by sending a PATH_STATUS frame. 339 4.5.1. Use PATH_STATUS frame to close a path 341 Both client and server can close a path by sending a PATH_STATUS frame 342 which abandons the path with the corresponding Path Identifier. Once a 343 path is marked as "abandon", the resources related to 344 the path can be released. 346 Figure 2 illustrates an example of path closing. In this case, we 347 are going to close the first path. For the first path, the server's 348 1-RTT packets use DCID C1, which has a sequence number of 1; the 349 client's 1-RTT packets use DCID S2, which has a sequence number of 2. 350 For the second path, the server's 1-RTT packets use DCID C2, which 351 has a sequence number of 2; the client's 1-RTT packets use DCID S3, 352 which has a sequence number of 3. Note that the two paths use different 353 packet number spaces. (For the convenience of distinguishing the CID 354 sequence number and PATH_STATUS sequence number, we call the 355 "PATH_STATUS sequence number" "PSSN".) 
357 Client Server 359 (client tells server to abandon a path) 360 1-RTT[X]: DCID=S2 PATH_STATUS[id=1, PSSN1, status=abandon, pri.=0] -> 361 (server tells client to abandon a path) 362 <- 1-RTT[Y]: DCID=C1 PATH_STATUS[id=2, PSSN2, status=abandon, pri.=0], 363 ACK_MP[Seq=2, PN=X] 364 (client abandons the path that it is using) 365 1-RTT[U]: DCID=S3 RETIRE_CONNECTION_ID[2], ACK_MP[Seq=1, PN=Y] -> 366 (server abandons the path that it is using) 367 <- 1-RTT[V]: DCID=C2 RETIRE_CONNECTION_ID[1], ACK_MP[Seq=3, PN=U] 369 Figure 2: Example of closing a path 371 In scenarios such as client detects the network environment change 372 (client's 4G/Wi-Fi is turned off, Wi-Fi signal is fading to a 373 threshold), or endpoints detect that the quality of RTT or loss rate 374 is becoming worse, client or server can terminate a path immediately. 376 4.5.2. Effect of RETIRE_CONNECTION_ID frame 378 Receiving a RETIRE_CONNECTION_ID frame causes the endpoint to discard 379 the resources associated with that connection ID. If the connection 380 ID was used by the peer to identify a path from the peer to this 381 endpoint, the resources include the list of received packets used to 382 send acknowledgements. The peer MAY decide to keep sending data 383 using the same IP addresses and UDP ports previously associated with 384 the connection ID, but MUST use a different connection ID when doing 385 so. 387 4.5.3. Idle timeout 389 [QUIC-TRANSPORT] allows for closing of connections if they stay idle 390 for too long. The connection idle timeout in multipath QUIC is 391 defined as "no packet received on any path for the duration of the 392 idle timeout". It means that if all paths remain idle for the idle 393 timeout, the connection is implicitly closed. 395 5. Using TLS to Secure QUIC Multipath 397 In order to facilitate loss detection and recovery when sending data 398 over multiple paths, this specification defines how packets sent over 399 multiple paths use different packet number spaces. 
This requires 400 changes in the way AEAD is applied for packet protection, as 401 explained in Section 5.1, and tighter constraints for key updates, as 402 explained in Section 5.2. 404 5.1. Packet protection for QUIC Multipath 406 Packet protection for QUIC V1 is specified in Section 5 of 407 [QUIC-TLS]. The general principles of packet protection are not 408 changed for QUIC Multipath. No changes are needed for setting packet 409 protection keys, initial secrets, header protection, use of 0-RTT 410 keys, receiving out-of-order protected packets, receiving protected 411 packets, or retry packet integrity. However, the use of multiple 412 number spaces for 1-RTT packets requires changes in AEAD usage. 414 Section 5.3 of [QUIC-TLS] specifies AEAD usage, and in particular the 415 use of a nonce, N, formed by combining the packet protection IV with 416 the packet number. QUIC multipath uses multiple packet number 417 spaces, and thus the packet number alone would not guarantee the 418 uniqueness of the nonce. In order to guarantee this uniqueness, we 419 construct the nonce N by combining the packet protection IV with the 420 packet number and with the identifier of the path, which for 1-RTT 421 packets is the Sequence Number of the Destination Connection ID 422 present in the packet header, as defined in Section 5.1.1 of 423 [QUIC-TRANSPORT], or zero if the Connection ID is zero-length. 424 Section 19 of [QUIC-TRANSPORT] encodes this Connection ID Sequence 425 Number as a variable-length integer, allowing values up to 2^62-1; 426 for QUIC multipath, we require that a range of no more than 2^32-1 427 values be used without updating the packet protection key. 429 For QUIC multipath, the construction of the nonce starts with the 430 construction of a 96-bit path-and-packet-number, composed of the 32-bit 431 Connection ID Sequence Number in network byte order, two zero bits, and 432 the 62 bits of the reconstructed QUIC packet number in network byte 433 order. 
If the IV is larger than 96 bits, the path-and-packet-number is 434 left-padded with zeros to the size of the IV. The exclusive OR of 435 the padded path-and-packet-number and the IV forms the AEAD nonce. 437 For example, assuming the IV value is "6b26114b9cba2b63a9e8dd4f", the 438 connection ID sequence number is "3", and the packet number is 439 "aead", the nonce will be set to "6b2611489cba2b63a9e873e2". 441 5.2. Key Update for QUIC Multipath 443 The Key Phase bit update process for QUIC V1 is specified in 444 Section 6 of [QUIC-TLS]. The general principles of key update are 445 not changed for Multipath QUIC. Following QUIC V1, the Key Phase bit 446 is used to indicate which packet protection keys are used to protect 447 the packet. The Key Phase bit is toggled to signal each subsequent 448 key update. Because of network delays, packets protected with the 449 older key might arrive later than the packets protected with the new 450 key. Therefore, the endpoint needs to retain old packet keys to 451 allow these delayed packets to be processed and it must distinguish 452 between the new key and the old key. In QUIC V1, this is done using 453 packet numbers so that the rule is made simple: Use the older key if 454 packet number is lower than any packet number from the current key 455 phase. 457 In QUIC multipath, some care is needed when initiating the Key Update 458 process. Because different paths use different packet number spaces 459 but share a single key, when a key update is initiated on one path, 460 the other paths need to know when the transition is 461 complete. Otherwise, it is possible that the other paths send 462 packets with the old keys, but skip sending any packets in the 463 current key phase and directly jump to sending packets in the next key 464 phase. 
When that happens, as the endpoint can only retain two sets 465 of packet protection keys with the 1-bit Key Phase, the other 466 paths cannot distinguish which key should be used to decode received 467 packets, which results in a key rotation synchronization problem. 469 To address such a synchronization issue, in QUIC multipath, if a key 470 update is initiated on one path, the sender should send at least one 471 packet with the new key on all active paths. When 472 responding to a key update, the endpoint MUST NOT initiate a 473 subsequent key update until a packet with the current key has been 474 acknowledged on each path. 476 Following Section 5.4 of [QUIC-TLS], the Key Phase bit is 477 protected, so sending multiple packets with Key Phase bit flipping at 478 the same time should not cause linkability issues. 480 6. Using Multipath QUIC with load balancers 482 This specification follows the Connection ID negotiation defined in 483 [QUIC-TRANSPORT]. For stateless or low-state load balancers 484 supporting Multipath QUIC, implementations SHOULD use the 485 specification of Connection ID generation and Load balancer routing 486 defined in [QUIC-LB], to guarantee that packets with Connection IDs 487 belonging to the same connection can be routed to the same server. 489 7. Packet scheduling 491 7.1. Basic Scheduling 493 For an outgoing packet, the packet scheduler decides on which path the 494 packet shall be transmitted. A basic static scheduling strategy 495 consists of four major components: 497 1. Path state: A scheduler may want to decide which path shall be 498 activated to transmit data. For instance, a scheduler can choose 499 to use only one of the two paths and completely ignore the other 500 one. A scheduler marks the selected paths to be in the 501 "available" state and the un-selected ones in the "standby" 502 state. 504 2. Path priority: The costs of transmitting data 505 over different paths are not always equal. 
For example, the 506 energy (battery) costs over a 5G path and a Wi-Fi path are very 507 different. In another example, transmissions over a Wi-Fi path 508 and a cellular path may incur different charges per packet. Note 509 that a user's preference may change over time. For instance, 510 certain mobile carriers offer unlimited free data for a 511 particular streaming app. Therefore, the path priority should be 512 made available to the scheduler. 514 3. Path selection algorithm: A selection algorithm splits packets 515 across different paths and determines the order of paths to be 516 selected. The selection algorithm takes congestion controller 517 states as inputs, such as smoothed RTTs (sRTTs), estimated 518 bandwidths (eBWs) and congestion window sizes (CWNDs) as well as 519 application-defined information such as path priorities and path 520 states. The output of the algorithm is an ordered list of paths 521 to put a packet on. To name a few, some of the commonly used 522 algorithms are: - Round-Robin: There is no priority. It selects 523 paths one by one in order to transmit data. - Lowest-RTT: It 524 first chooses the path with the lowest RTT and feeds packets to 525 it until that path's congestion window is full. Then it chooses 526 the path with the second lowest RTT. - Highest-Bandwidth: It 527 first chooses the path with the highest bandwidth and feeds 528 packets to it until that path's congestion window is full. Then 529 it chooses the path with the second largest bandwidth. 531 4. Packet re-injection: One major challenge in multi-path 532 transmission is the multi-path head-of-line (MP-HoL) blocking 533 [XLINK]. The blocking happens when the packets sent earlier on 534 the slow path arrive later than the packets on the fast path, 535 causing out-of-order arrival; the out-of-order packets are not 536 eligible to be submitted to applications, so the fast paths have 537 to wait. 
Significant heterogeneity over Wi-Fi, LTE, and 5G, as 538 well as frequent handoffs of mobile terminals will further 539 aggravate this issue. One MAY implement packet re-injection to 540 overcome MP-HoL blocking at the expense of redundant traffic 541 overhead. In re-injection, the sender keeps track of a queue for 542 unacknowledged packets. When there are no more packets to send, 543 the sender can send duplicates of the unacknowledged packets on 544 other paths without waiting for the loss recovery on the original 545 path, allowing the receiver to continue consuming data without 546 suffering from the blocking effect. 548 The path state and path priority are managed by the PATH_STATUS frame. 549 The path selection algorithm and packet redundancy are application 550 related and should be controlled by the application. 552 7.2. Scheduling with QoE Feedback 554 Applications may have completely different QoE requirements: 555 interactive applications are delay sensitive, while video 556 streaming applications are more throughput sensitive. There is thus 557 a trend of cross-layer design that takes applications' demands into 558 account when managing paths or scheduling packets. The QoE feedback 559 is used to fully support application-awareness in multipath 560 scheduling and is carried in the QOE_CONTROL_SIGNALS frames (Figure 7). 561 The QOE_CONTROL_SIGNALS frames can include general application-level 562 information that is needed by the schedulers. The frequency of such 563 feedback should be controlled to limit the amount of extra packets. 564 The QoE control signal allows a synchronization of viewpoints between 565 two endhosts. It is up to the application to determine the 566 interpretation of QoE control signals. 568 To illustrate the effectiveness of QOE_CONTROL_SIGNALS, we show how 569 to use it to control traffic redundancy overhead when packet re- 570 injection is implemented to improve multi-path transport performance 571 [XLINK]. 
As discussed above, the problem with packet re-injection is 572 that it may introduce a lot of redundant packets, increasing traffic 573 cost. Indeed, redundant packets are not always needed as the video 574 player may cache video chunks. Therefore, if the number of cached 575 frames is large in the video player, the play-time left until the 576 next possible re-buffering is long, and hence, the urgency of using 577 re-injection is low. On the contrary, if the number of cached frames 578 is small in the video player, the time left until the next possible 579 re-buffering is short and, hence, the urgency of using re-injection 580 is high. Knowing that the client video player's buffer occupancy 581 level is an indicator of the user-perceived QoE, one can capture the 582 related information (such as number of cached frames and framerate) 583 at the client, encapsulate the information in a QOE_CONTROL_SIGNALS frame and send 584 it back to the server to decide when to turn on or turn off its re- 585 injection usage. 587 7.3. Per-stream Policy 589 As QUIC supports stream multiplexing, streams can be 590 associated with stream priorities to express the application's intent. For 591 instance, objects in a web page may be dependent on others and thus 592 have different priorities in the multipath QUIC scheduler. A stream 593 priority-aware packet scheduling algorithm can improve 594 performance notably. 596 High priority /\ +---------+ 597 || | | 598 || +---------+ 599 || +---------+ 600 || | | 601 || +---------+ 602 || ... User-defined stream priority 603 || +---------+ 604 Low priority || | | 605 || +---------+ 606 ----------------------------------------------------------- 607 High priority /\ +---------+ 608 || | | 609 || +---------+ 610 || +---------+ 611 || | | 612 || +---------+ 613 || ... Default stream priority 614 || +---------+ 615 Low priority || | | 616 || +---------+ 618 Figure 3: Stream priority 620 The priority management scheme comprises two separate priority 621 ranges. 
The user-defined priority range includes those streams to which 622 the applications explicitly assign priorities, while the default 623 priority range includes the streams with no priorities set by the 624 applications. The streams in the default priority range can send only 625 when the streams in the user-defined range have no data to send. 626 In the same range, one can use weighted round-robin for 627 scheduling: the higher-priority streams get more quota for data to 628 send in each round. One can also dynamically set/change the 629 priorities of the streams in the default priority range to enable 630 short-stream-first scheduling if needed. 632 8. Congestion control and loss detection 634 8.1. Congestion control 636 Implementations MAY support coupled congestion controllers such as 637 LIA [MPTCP-LIA], OLIA [MPTCP-OLIA], etc., or support decoupled 638 congestion controllers in environments using disjoint network paths. 640 In decoupled congestion control, each path runs its own congestion 641 controller without interacting with the congestion controllers of 642 other paths. That is to say, in the aspect of congestion control, a 643 path behaves exactly the same as a normal QUIC connection over the 644 same network path. 646 Each path MAY choose its congestion control algorithm independently. 648 8.2. Packet number space and acknowledgements 650 Each path has its own packet number space for transmitting 1-RTT 651 packets. 653 Acknowledgements of Initial and Handshake packets MUST be carried 654 using ACK frames, as specified in [QUIC-TRANSPORT]. The ACK frames, 655 as defined in [QUIC-TRANSPORT], do not carry path identifiers. If 656 for some reason ACK frames are received in 1-RTT packets while the 657 state of multipath negotiation is ambiguous, they MUST be interpreted 658 as acknowledging packets sent on path number 0. 
After endpoints successfully negotiate multipath support, they
SHOULD use ACK_MP frames instead of ACK frames to acknowledge 1-RTT
packets, and also 0-RTT packets as specified in Section 10.2.

An ACK_MP frame (Section 9.2) can be returned either on a different
path or on the path identified by the Path Identifier, depending on
the strategy used for sending ACK_MP frames.

8.3.  Flow control

TBD.

9.  New frames

All the new frames MUST be sent in 1-RTT packets and MUST NOT be
sent at other encryption levels.

If an endpoint receives MP frames in packets of other encryption
levels, it MUST close the connection with a connection error of type
MP_PROTOCOL_VIOLATION.

9.1.  PATH_STATUS frame

PATH_STATUS frames are used by endpoints to inform the peer of the
current status of one path, and the peer should send packets
according to the preference expressed in these frames. Endpoints
use the sequence number of the CID used by the peer for PATH_STATUS
frames (describing the sender's path identifier). PATH_STATUS
frames are formatted as shown in Figure 4.

   PATH_STATUS Frame {
     Type (i) = TBD-03 (experiments use 0xbaba03),
     Path Identifier (..),
     Path Status sequence number (i),
     Path Status (i),
     Path Priority (i),
   }

                Figure 4: PATH_STATUS Frame Format

PATH_STATUS frames contain the following fields:

Path Identifier: An identifier of the path, which is formatted as
shown in Figure 5.

*  Identifier Type: Indicates the type of path identifier.

   -  Type 0: Refers to the connection identifier used by the sender
      of the control frame when sending data over the specified
      path. This method SHOULD be used if this connection
      identifier is non-zero length. This method MUST NOT be used
      if this connection identifier is zero-length.
   -  Type 1: Refers to the connection identifier used by the
      receiver of the control frame when sending data over the
      specified path. This method MUST NOT be used if this
      connection identifier is zero-length.

   -  Type 2: Refers to the path over which the control frame is
      sent or received.

*  Path Identifier Content: A variable-length integer specifying the
   path identifier.

   Path Identifier {
     Identifier Type (i) = 0x00..0x02,
     Path Identifier Content (i),
   }

                Figure 5: Path Identifier Format

Note: If the receiver of the PATH_STATUS frame is using a non-zero
length Connection ID on that path, endpoints SHOULD use type 0x00
for the path identifier in the control frame. If the receiver of
the PATH_STATUS frame is using a zero-length Connection ID, but the
peer is using a non-zero length Connection ID on that path,
endpoints SHOULD use type 0x01 for the path identifier. If both
endpoints are using zero-length Connection IDs on that path,
endpoints SHOULD only use type 0x02 for the path identifier.

Path Status sequence number: A variable-length integer specifying
the sequence number assigned to this PATH_STATUS frame. Each path
has its own path status sequence number space.

Available values of the Path Status field are:

*  0: Abandon

*  1: Standby

*  2: Available

If the value of the Path Status field is 2 (Available), the receiver
side can use the Path Priority field to express the priority weight
of a path for the peer.

Frames may be received out of order. A peer MUST ignore an incoming
PATH_STATUS frame if it previously received another PATH_STATUS
frame for the same Path Identifier with a sequence number equal to
or higher than the sequence number of the incoming frame.

PATH_STATUS frames SHOULD be acknowledged.
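The out-of-order rule above can be sketched as follows. This is a
minimal illustration, not part of the specification: the names
PathStatus, PathStatusTable and on_path_status are hypothetical, and
keying the table by the pair (Identifier Type, Path Identifier
Content) is an assumption about how an implementation might identify
a path.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Hypothetical key: (Identifier Type, Path Identifier Content).
PathKey = Tuple[int, int]


@dataclass(frozen=True)
class PathStatus:
    sequence_number: int
    status: int    # 0 = Abandon, 1 = Standby, 2 = Available
    priority: int


@dataclass
class PathStatusTable:
    """Remembers the most recent PATH_STATUS accepted for each path."""
    latest: Dict[PathKey, PathStatus] = field(default_factory=dict)

    def on_path_status(self, path: PathKey, frame: PathStatus) -> bool:
        """Apply a received frame; return False if it must be ignored."""
        current = self.latest.get(path)
        # Ignore the frame if an equal or higher sequence number was
        # already received for the same Path Identifier.
        if current is not None and frame.sequence_number <= current.sequence_number:
            return False
        self.latest[path] = frame
        return True
```

Because each path has its own sequence number space, a frame for one
path never causes a frame for another path to be ignored.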
If a packet containing a PATH_STATUS frame is considered lost, the
peer should only repeat it if it was the last status sent for that
path, as indicated by the sequence number.

9.2.  ACK_MP frame

The ACK_MP frame allows acknowledgements to be sent on different
paths. It is formatted by adding a Packet Number Space Identifier
field to the ACK frame defined in [QUIC-TRANSPORT], as shown in
Figure 6.

   ACK_MP Frame {
     Type (i) = TBD-00..TBD-01 (experiments use 0xbaba00..0xbaba01),
     Packet Number Space Identifier (i),
     Largest Acknowledged (i),
     ACK Delay (i),
     ACK Range Count (i),
     First ACK Range (i),
     ACK Range (..) ...,
     [ECN Counts (..)],
   }

                  Figure 6: ACK_MP Frame Format

Packet Number Space Identifier: An identifier of the path packet
number space, which is the sequence number of the Destination
Connection ID of the 1-RTT packets that are acknowledged by the
ACK_MP frame. If the endpoint receives 1-RTT packets with a
zero-length Connection ID, it SHOULD use Packet Number Space
Identifier 0 in ACK_MP frames.

Type (i) = TBD-00 (experiments use 0xbaba00), with no ECN Counts
Type (i) = TBD-01 (experiments use 0xbaba01), with ECN Counts

9.3.  QOE_CONTROL_SIGNALS frame

The QOE_CONTROL_SIGNALS frame is used to carry quality of experience
(QoE) information. A typical use of such information is to provide
feedback to help application-aware scheduling. Because different
applications may have very different needs, the interpretation of
the QoE control signals is left to the users. QOE_CONTROL_SIGNALS
frames are formatted as shown in Figure 7.

   QOE_CONTROL_SIGNALS Frame {
     Type (i) = TBD-02 (experiments use 0xbaba02),
     Path Identifier (..),
     QoE Control Signals Length (8),
     QoE Control Signals (..)
   }

            Figure 7: QOE_CONTROL_SIGNALS Frame Format

Path Identifier: An identifier of the path, which is formatted as
shown in Figure 5.
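As an illustration of the layouts in Figure 5 and Figure 7, the
following sketch serializes a QOE_CONTROL_SIGNALS frame. It assumes
that fields marked (i) use the variable-length integer encoding of
[QUIC-TRANSPORT] and that the 8-bit QoE Control Signals Length counts
bytes; this document does not state the unit, so that is an
assumption. The function names and the example payload are
hypothetical.

```python
QOE_CONTROL_SIGNALS_TYPE = 0xBABA02  # experimental value from Figure 7


def encode_varint(value: int) -> bytes:
    """Encode a QUIC variable-length integer (1, 2, 4 or 8 bytes)."""
    if value < 0:
        raise ValueError("varint must be non-negative")
    if value < 1 << 6:
        return value.to_bytes(1, "big")
    if value < 1 << 14:
        return (value | 0x4000).to_bytes(2, "big")
    if value < 1 << 30:
        return (value | 0x8000_0000).to_bytes(4, "big")
    if value < 1 << 62:
        return (value | 0xC000_0000_0000_0000).to_bytes(8, "big")
    raise ValueError("varint out of range")


def encode_path_identifier(identifier_type: int, content: int) -> bytes:
    """Serialize the Path Identifier structure of Figure 5."""
    if identifier_type not in (0, 1, 2):
        raise ValueError("Identifier Type must be 0x00..0x02")
    return encode_varint(identifier_type) + encode_varint(content)


def encode_qoe_control_signals(identifier_type: int, content: int,
                               signals: bytes) -> bytes:
    """Serialize a QOE_CONTROL_SIGNALS frame as laid out in Figure 7."""
    if len(signals) > 0xFF:
        # Assumption: the 8-bit length field counts payload bytes.
        raise ValueError("QoE Control Signals do not fit an 8-bit length")
    return (encode_varint(QOE_CONTROL_SIGNALS_TYPE)
            + encode_path_identifier(identifier_type, content)
            + bytes([len(signals)])
            + signals)


# Hypothetical payload: 5 cached frames and a frame rate of 30.
frame = encode_qoe_control_signals(0, 1, bytes([5, 30]))
print(frame.hex())
```

The payload layout is purely illustrative, since the interpretation
of the signals is left to the application.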
QOE_CONTROL_SIGNALS frames may be received out of order; peers
SHOULD pass them to the application as they arrive. Although
QOE_CONTROL_SIGNALS frames are not retransmitted upon loss
detection, they are ack-eliciting [QUIC-RECOVERY].

10.  Implementation Considerations

10.1.  Management of acknowledgements delay

If an implementation uses the ACK_FREQUENCY frame defined in
[QUIC-DELAYED-ACK] to let senders control the frequency of
acknowledgements, the same mechanism can be used in multipath QUIC.
There are two parameters in the ACK_FREQUENCY frame, "Packet
Tolerance" and "Update Max Ack Delay".

Those two parameters are typically computed in real time based on
observed performance:

*  "Packet Tolerance" is set to a fraction of the congestion window.

*  "Update Max Ack Delay" is set to a fraction of the RTT, but not
   smaller than the specified min delay.

In multipath QUIC, there are multiple paths with different RTTs and
different congestion windows. This draft suggests that
implementations use the smallest RTT of the available paths to
compute the delay, and the sum of the congestion windows of all
available paths (not including paths in the standby or abandon
state) to compute the packet tolerance.

10.2.  Handling of 0-RTT packets

The draft specifies a packet number space for each path. Because
multipath is enabled only after the handshake negotiation completes,
there will be a separate context for each Connection ID after
multipath is negotiated. 0-RTT packets are sent before these
per-path contexts are established. To avoid confusion, this draft
provides a way for implementations to deal with 0-RTT packets that
is both easy to implement and compatible with [QUIC-TRANSPORT]:

*  All 0-RTT packets are initially tracked in the "global"
   application context.

*  On the client side, 0-RTT packets are initially sent in the
   "global" application context.
   The handshake concludes before any 1-RTT packet can be sent or
   received. When the handshake completes, if multipath is
   negotiated, the tracking of 0-RTT packets moves from the "global"
   application context to the "path 0" application context. That
   means the packet number of the first 1-RTT packet sent by the
   client will follow the packet number of the last 0-RTT packet.

*  On the server side, the negotiation completes after the client
   first flight is received and the server first flight is sent.
   0-RTT packets are received after that. If multipath is
   negotiated, they are considered received on "path 0".

In conclusion, 0-RTT packets are tracked and processed with path
identifier 0.

11.  Security Considerations

TBD.

12.  IANA Considerations

This document defines a new transport parameter to negotiate the use
of multiple paths for QUIC, and three new frame types. The draft
defines provisional values for experiments, but we expect IANA to
allocate short values if the draft is approved.

The following entry in Table 1 should be added to the "QUIC
Transport Parameters" registry under the "QUIC Protocol" heading.

   +==============================+==================+===============+
   | Value                        | Parameter Name   | Specification |
   +==============================+==================+===============+
   | TBD (experiments use 0xbaba) | enable_multipath | Section 3     |
   +------------------------------+------------------+---------------+

       Table 1: Addition to QUIC Transport Parameters Entries

The following frame types defined in Table 2 should be added to the
"QUIC Frame Types" registry under the "QUIC Protocol" heading.
   +====================+=====================+===============+
   | Value              | Frame Name          | Specification |
   +====================+=====================+===============+
   | TBD-00 - TBD-01    | ACK_MP              | Section 9.2   |
   | (experiments use   |                     |               |
   | 0xbaba00-0xbaba01) |                     |               |
   +--------------------+---------------------+---------------+
   | TBD-02             | QOE_CONTROL_SIGNALS | Section 9.3   |
   | (experiments use   |                     |               |
   | 0xbaba02)          |                     |               |
   +--------------------+---------------------+---------------+
   | TBD-03             | PATH_STATUS         | Section 9.1   |
   | (experiments use   |                     |               |
   | 0xbaba03)          |                     |               |
   +--------------------+---------------------+---------------+

          Table 2: Addition to QUIC Frame Types Entries

13.  Changelog

14.  Appendix.A Scenarios related to migration

In QUIC V1, there are four scenarios related to migration: CID
renewal, NAT rebinding, controlled migration, and migration to the
server preferred address. It would be useful to explain exactly how
these four scenarios are supported or changed with multipath QUIC.
For V1, these scenarios are described as follows:

*  CID Renewal happens when the client starts using a new CID for
   1-RTT packets, while still using the same four-tuple. This is
   typically done for privacy, for example after a long period of
   silence. The expected result is that the server will also use a
   new CID for its next packets. In that scenario, RTT and
   congestion control parameters remain the same before and after
   migration.

*  NAT Rebinding happens when a NAT on the path changes its
   mappings. The server receives packets that bear the same CID as
   previously, but arrive on a different four-tuple. The
   complication is that this could be an attack in which the
   attacker captures a packet from the client and resends it from a
   different address.
   The server is expected to perform continuity tests for both the
   old and the new path, typically using a different CID for the new
   path. If the continuity test on the new path succeeds before the
   one on the old path, the server migrates to the new path;
   otherwise it continues using the old path and ignores the new
   path.

*  Controlled migration happens when a client tests a new path. The
   server receives packets that bear a new CID and arrive on a new
   four-tuple. The server responds to the path challenge and
   performs its own continuity test on the new path. If the client
   sends non-path-validation packets on the new path, the server
   switches to sending on the new path and discards the old path.

*  Preferred address migration happens when the server sends the
   preferred address transport parameter during the exchange. The
   client performs a controlled migration to the new path, and if
   that is successful discards the old path.

These scenarios are summarized in the following table:

   +=====+=========+===================+====================+
   | CID | 4-tuple | preferred address | result             |
   +=====+=========+===================+====================+
   | Old | Old     | -                 | Not a migration.   |
   +-----+---------+-------------------+--------------------+
   | Old | New     | -                 | NAT Rebinding.     |
   +-----+---------+-------------------+--------------------+
   | New | Old     | -                 | CID Renewal.       |
   +-----+---------+-------------------+--------------------+
   | New | New     | matches PFA       | Migration to       |
   |     |         |                   | Preferred Address. |
   +-----+---------+-------------------+--------------------+
   | New | New     | other             | Controlled         |
   |     |         |                   | Migration.         |
   +-----+---------+-------------------+--------------------+

            Table 3: Scenarios related to migration

The expectation in those scenarios is:

   +==============+============================================+
   | Scenario     | Expectation                                |
   +==============+============================================+
   | Not a        | Continue using existing path               |
   | migration    |                                            |
   +--------------+--------------------------------------------+
   | NAT          | After validation, use new path and discard |
   | Rebinding    | previous path.                             |
   +--------------+--------------------------------------------+
   | CID Renewal  | Create new path with new CIDs, discard old |
   |              | path. Reuse RTT and CC parameters.         |
   +--------------+--------------------------------------------+
   | Controlled   | Create new path with new CIDs. Server      |
   | Migration    | creates a new path, ready to use both      |
   |              | paths. Client may later discard old path.  |
   +--------------+--------------------------------------------+
   | Migration to | Same as Controlled Migration, but the      |
   | Preferred    | client is expected to abandon the old path |
   | Address      |                                            |
   +--------------+--------------------------------------------+

     Table 4: Expectation in scenarios related to migration

In multipath QUIC, the client or server creates a new path and
abandons the old path to achieve exactly the same result as
connection migration in the previous scenarios.

15.  Appendix.B Considerations on RTT estimate and loss detection

QUIC implementations use RTT estimates in many ways:

*  For loss detection, RTT estimates are used to evaluate how long
   to wait for an acknowledgement before a packet is declared lost.

*  Several congestion control algorithms (e.g., LEDBAT, VEGAS,
   HYSTART) use variations of the RTT above the minimum value to
   detect the beginning of congestion.
*  BBR uses the minimal RTT to compute the minimal size of the
   congestion window for a target data rate.

*  ACK delays are often set as a fraction of the RTT.

In a multipath environment, the RTT can be estimated each time a new
packet is acknowledged. However, the observed RTT will vary not
only with the state of the send path, but also with the choice of
the return path used for acknowledgements. Each RTT measurement
will be the sum of the one-way delay on the send path and the
one-way delay on the return path. This has a number of implications
for the different ways of using the RTT presented above:

*  If the goal is to detect possible losses, it is probably
   sufficient to consider all RTT measurements for a given path.
   Classic formulas, such as adding a number of deviations to the
   smoothed RTT, aim at estimating a reasonable upper bound of the
   acknowledgement delays. Statistics on observed acknowledgement
   delays will provide a valid estimate, regardless of the selection
   of the return path by the peer.

*  If the goal is to detect the onset of congestion and tune a
   congestion control algorithm, the variations of delays due to the
   choice of return path will be a source of errors.
   Implementations will need to pick a strategy, for example only
   considering acknowledgements received through the "fastest"
   return path, or those received through the four-tuple matching
   the sending path. An alternative would be to use timestamps to
   directly estimate variations of the one-way delays.
   [QUIC-Timestamp] provides good support for such one-way delay
   computation.

*  If BBR is in use and ACKs are returned on different paths, it may
   cause an ambiguity issue with the computation of the
   bandwidth-delay product (BDP). In BBR, BDP is used to limit the
   number of inflight packets.
   One may choose to use the smallest RTT measured to compute BDP.
   However, if the majority of ACKs are returned from a high-latency
   path, cwnd = cwnd_gain * bandwidth * min_rtt may be lower than
   what is needed to achieve good performance. One possible
   solution is to transmit a new packet and its ACK on the same
   path. Other possible solutions include transmitting ACKs on the
   shortest path with a relative increase of cwnd_gain. For the
   time being, this remains a research problem and it is up to
   implementers to pick the best solution.

16.  Appendix.C Difference from past proposals

This proposal differs from past proposals
[I-D.deconinck-quic-multipath] in two fundamental respects:

*  Multipath QUIC is built on top of the concept of bidirectional
   paths, which readily fits the nature of both cellular and Wi-Fi
   links that cover the majority of multipath applications in QUIC,
   while keeping the design simple and easy to implement. In doing
   so, we are able to reuse most of the current QUIC transport
   design with the sole addition of three new frames.

*  The multipath QUIC design enables a feedback-based dynamic
   scheduling strategy. The major goal of multipath QUIC is to
   enhance performance in mobile applications, where the sender and
   receiver may have different viewpoints about the fast-changing
   wireless connectivity, especially in high-mobility scenarios.
   The proposed design therefore allows the sender and receiver to
   synchronize their viewpoints via message exchange in ACK packets
   in order to maximize performance.

17.  References

17.1.  Normative References

[QUIC-DELAYED-ACK]
   Iyengar, J., Ed. and I. Swett, Ed., "Sender Control of
   Acknowledgement Delays in QUIC", Work in Progress,
   Internet-Draft, draft-iyengar-quic-delayed-ack-02.

[QUIC-LB]
   Duke, M., Ed. and N. Banks, Ed., "QUIC-LB: Generating Routable
   QUIC Connection IDs", Work in Progress, Internet-Draft,
   draft-ietf-quic-load-balancers.

[QUIC-RECOVERY]
   Iyengar, J., Ed. and I. Swett, Ed., "QUIC Loss Detection and
   Congestion Control", Work in Progress, Internet-Draft,
   draft-ietf-quic-recovery.

[QUIC-TLS]
   Thomson, M., Ed. and S. Turner, Ed., "Using TLS to Secure QUIC",
   Work in Progress, Internet-Draft, draft-ietf-quic-tls.

[QUIC-TRANSPORT]
   Iyengar, J., Ed. and M. Thomson, Ed., "QUIC: A UDP-Based
   Multiplexed and Secure Transport", Work in Progress,
   Internet-Draft, draft-ietf-quic-transport.

[RFC2119]
   Bradner, S., "Key words for use in RFCs to Indicate Requirement
   Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997.

[RFC8174]
   Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key
   Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017.

17.2.  Informative References

[I-D.deconinck-quic-multipath]
   Coninck, Q. D. and O. Bonaventure, "Multipath Extensions for QUIC
   (MP-QUIC)", Work in Progress, Internet-Draft,
   draft-deconinck-quic-multipath-07, 3 May 2021.

[MPTCP-LIA]
   Raiciu, C., Handly, M., and D. Wischik, "Coupled Congestion
   Control for Multipath Transport Protocols", October 2011.

[MPTCP-OLIA]
   Khalili, R., Gast, N., and J. Boudec, "Opportunistic
   Linked-Increases Congestion Control Algorithm for MPTCP",
   July 2014.

[QUIC-Timestamp]
   Huitema, C., "Quic Timestamps For Measuring One-Way Delays",
   August 2020.

[XLINK]
   Zheng, Z., Ma, Y., Liu, Y., Yang, F., Li, Z., Zhang, Y., Shi, W.,
   Chen, W., Li, D., An, Q., Hong, H., Liu, H., and M. Zhang,
   "XLINK: QoE-driven multi-path QUIC transport in large-scale video
   services", August 2021.

Authors' Addresses

Yanmei Liu
Alibaba Inc.
Email: miaoji.lym@alibaba-inc.com

Yunfei Ma
Alibaba Inc.

Email: yunfei.ma@alibaba-inc.com

Christian Huitema
Private Octopus Inc.

Email: huitema@huitema.net

Qing An
Alibaba Inc.

Email: anqing.aq@alibaba-inc.com

Zhenyu Li
ICT-CAS

Email: zyli@ict.ac.cn