Internet Congestion Control Research Group                      M. Welzl
Internet-Draft                                                  S. Islam
Intended status: Experimental                                  K. Hiorth
Expires: September 22, 2016                           University of Oslo
                                                                  J. You
                                                                  Huawei
                                                          March 21, 2016


                               TCP in UDP
                  draft-welzl-irtf-iccrg-tcp-in-udp-00

Abstract

   This document specifies a method to encapsulate multiple TCP
   connections using only one UDP port number pair. Doing so allows for
   a relatively easy implementation of coupled congestion control for
   the TCP connections. This can have several performance benefits, and
   it makes it possible to precisely assign a share of the congestion
   window to the connections based on priorities. It also enables use
   of UDP-based NAT traversal techniques, and it can act as a framework
   for experimentation with novel changes to the TCP standard.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 22, 2016.

Copyright Notice

   Copyright (c) 2016 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  More related work
   3.  Specification
   4.  Protocol operation and implementation notes
   5.  Coupled congestion control
     5.1.  Example algorithm
   6.  Usage considerations
   7.  Implementation status
   8.  IANA Considerations
   9.  Security Considerations
   10. Acknowledgement
   11. References
     11.1.  Normative References
     11.2.  Informative References
   Authors' Addresses

1.  Introduction

   Note that this document is written in a style that should facilitate
   quick reading by focusing on the key changes from prior similar
   proposals.
   A future version of this document will provide more
   details about the parts that are "inherited" from such prior work.

   TCP-in-UDP (TiU) is based on [Che13]. It differs from [Che13] in
   that:

   o  Unlike [Che13], TiU encapsulates multiple TCP connections using
      the same UDP port number pair. TCP port numbers are preserved; a
      single well-known UDP port is used for TiU. If TiU is implemented
      in the kernel, this allows using normal TCP sockets, where
      enabling the usage of TiU could be done via a socket option, for
      example.

   o  The header format is slightly different: a TCP connection is
      represented by a Connection ID, which is encoded in a few bits
      spread across the original TCP header's "Reserved" field and the
      URG (Urgent) flag. With this encoding, similar to the
      encapsulation in [Che13], the total TiU header size does not
      exceed the original TCP header size.

   o  A (TiU-encapsulated) TCP SYN uses a newly defined TCP option to
      establish the mapping between a Connection ID and the original
      TCP port number pair.

   o  A method to couple the congestion controls of the TCP connections
      is presented. This coupling can have various performance benefits
      (explained in detail in Section 6) and allows a desired share to
      be allocated precisely to each of the coupled TCP connections
      based on a priority from the application. Coupled congestion
      control is possible in TiU because the common preceding UDP
      header makes it reasonable to assume that the connections
      traverse the same network bottleneck. This is not necessarily a
      correct assumption when the outer header's port numbers differ
      due to mechanisms like Equal-Cost Multi-Path (ECMP). Note that
      ECMP can have performance benefits which TiU eliminates. This
      trade-off is also discussed in Section 6.

   o  This document provides some new and/or somewhat different
      explanations: Section 4 discusses how TiU support can work with
      preceding extra information such as a SPUD header
      ([I-D.hildebrand-spud-prototype]) without exceeding the MTU and
      elaborates on a possible method of implementing TiU, including
      robust "Happy Eyeballing".

   TiU inherits all the benefits of [Che13] and a preceding similar
   proposal, [Den08]. It adds potential benefits that are due to
   coupled congestion control, and it adds the potential disadvantage of
   not being able to benefit from ECMP. In short, the benefits and
   features of TiU that are already explained in detail in [Che13] and
   [Den08] are:

   o  To establish direct communication between two devices that are
      both behind NAT gateways, Interactive Connectivity Establishment
      (ICE) [RFC5245] is used to create the necessary mappings in both
      NAT gateways, and ICE can have higher success rates using UDP
      [RFC5128].

   o  TCP options, as required for Multipath TCP [RFC6824], for example,
      are expected to work more reliably because middleboxes will be
      less able to interfere with them.

   o  Because the packet format allows the most significant four bits
      of the first octet to be in the range 0x0-0x3 (as is the case for
      a STUN [RFC5389] packet, where the most significant two bits are
      always zero), the UDP port number pair used by TiU can be used to
      exchange STUN packets with a STUN server that is unaware of TiU.

   o  Following the method described in [Che13] and [Den08], transport
      protocols other than TCP (e.g., SCTP) could be UDP-encapsulated
      in a similar fashion. With TiU, the same outer UDP port number
      pair could be used for different encapsulated protocols at the
      same time.

   [Che13] also lists a disadvantage of UDP-encapsulating TCP packets:
   because NAT gateways typically use shorter timeouts for UDP port
   mappings than they do for TCP port mappings, long-lived
   UDP-encapsulated TCP connections will need to send more frequent
   keepalive packets than native TCP connections. TiU inherits this
   problem too, although using a single five-tuple for multiple TCP
   connections alleviates it by reducing the chance of experiencing long
   periods of silence.

2.  More related work

   The TCPMUX mechanism in [RFC1078] multiplexes TCP connections under
   the same outer transport port number; however, it does not preserve
   the port numbers of the original TCP connections, and no method to
   couple congestion controls is described in [RFC1078].

   TiU's congestion control coupling follows the style of RTP
   application congestion control coupling in
   [I-D.ietf-rmcat-coupled-cc], which is designed to be easy to
   implement and to minimize the number of changes that need to be made
   to the underlying congestion control mechanisms. This method was
   shown to yield several benefits in [fse]. TiU's congestion control
   requires slightly deeper changes to TCP's congestion control, making
   it harder to implement than [I-D.ietf-rmcat-coupled-cc], but it is
   still a much smaller code change than the Congestion Manager
   [RFC3124].

   Combining congestion controls as TiU does has some similarities with
   Ensemble Sharing in [RFC2140], which however only concerns initial
   values of variables used by new connections and does not share the
   congestion window (cwnd), which is the variable of interest in TiU.
   The cwnd variable is shared across ongoing connections in [ETCP] and
   [EFCM], and the mechanism described in Section 5 resembles the
   mechanisms in these works, but neither [ETCP] nor [EFCM] addresses
   the problem of ECMP.

   Coupled congestion control has also been specified for Multipath TCP
   [RFC6356]. MPTCP's coupled congestion control combines the
   congestion controls of subflows that may traverse different paths,
   whereas TiU builds on the assumption that all its encapsulated TCP
   connections traverse the same path. This makes the two methods for
   coupled congestion control very different, even though they both aim
   at emulating the behavior of a single TCP connection in the case
   where all flows traverse the same network bottleneck.

3.  Specification

   TiU uses a header that is very similar to the header format in
   [Den08] and [Che13], where it is explained in greater detail. It
   consists of a UDP header that is followed by a slightly altered TCP
   header. The UDP source and destination ports are semantically
   different from [Den08] and [Che13]: TiU uses a single well-known UDP
   port, and multiple TCP connections use the same UDP port number pair.
   The encapsulated TCP header is changed to fit into a UDP packet
   without increasing the MSS; this is achieved by removing the TCP
   source and destination ports, the Urgent Pointer and the (now
   unnecessary) TCP checksum. Moreover, the order of fields is changed
   to move the Data Offset field to the beginning of the UDP payload.
   This allows using it to identify other encapsulated content such as a
   STUN packet: for TCP, the Data Offset must be at least 5, i.e., the
   most-significant four bits of the first octet of the UDP payload are
   in the range 0x5-0xF, whereas this is not the case for other
   protocols (e.g., STUN requires the two most-significant bits to be
   0, so these four bits are in the range 0x0-0x3).
   The altered TCP header for TiU is shown below:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Source Port          |       Destination Port        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Length             |           Checksum            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Data |  Conn |C|E|C|A|P|R|S|F|                               |
   | Offset|  ID   |W|C|I|C|S|S|Y|I|            Window             |
   |       |       |R|E|D|K|H|T|N|N|                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Sequence Number                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     Acknowledgment Number                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      (Optional) Options                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 1: Encapsulated TCP-in-UDP Header Format (the first 8 bytes
                          are the UDP header)

   Unlike in [Den08] and [Che13], the least-significant four bits of
   the first octet and a bit that replaces the URG bit in the next
   octet together form a five-bit "Connection ID" (Conn ID). TiU
   maintains the port numbers of the TCP connections that it
   encapsulates; the Connection ID is a way to encode the port number
   information with a few unused header bits. It uniquely identifies a
   port number pair of a TCP connection that is encapsulated with TiU.
   Using these five bits, TiU can combine up to 32 TCP connections with
   one UDP port number pair.
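
   The bit layout of Figure 1 can be illustrated with a minimal Python
   sketch that classifies a UDP payload by its first octet and packs
   or unpacks the Connection ID. The function names are hypothetical.
   The text above does not say which of the five bits is the least
   significant one, so the sketch assumes that the four bits in the
   first octet are the high-order bits and that the CID flag is the
   low-order bit; a real implementation would have to follow whatever
   ordering the specification eventually fixes.

```python
from typing import Tuple

CID_FLAG = 0x20  # flag-octet bit that holds URG in a regular TCP header


def classify_first_octet(octet: int) -> str:
    """Classify a UDP payload by the upper four bits of its first octet."""
    upper = octet >> 4
    if upper >= 5:
        return "tiu"    # a valid TCP Data Offset (5..15)
    if upper <= 3:
        return "stun"   # the two most-significant bits are zero
    return "unknown"


def pack_conn_id(data_offset: int, conn_id: int, flags: int) -> bytes:
    """Build the first two octets of the TiU header shown in Figure 1."""
    if not 5 <= data_offset <= 15:
        raise ValueError("Data Offset must be in 5..15")
    if not 0 <= conn_id <= 31:
        raise ValueError("Connection ID must be in 0..31")
    if not 0 <= flags <= 0xFF or flags & CID_FLAG:
        raise ValueError("URG cannot be carried: its bit holds the Conn ID")
    # ASSUMPTION (not stated in the draft): the upper four ID bits go into
    # the first octet and the lowest ID bit goes into the CID flag.
    first = (data_offset << 4) | (conn_id >> 1)
    second = flags | ((conn_id & 1) << 5)
    return bytes([first, second])


def unpack_conn_id(payload: bytes) -> Tuple[int, int, int]:
    """Return (data_offset, conn_id, flags) from the first two octets."""
    first, second = payload[0], payload[1]
    conn_id = ((first & 0x0F) << 1) | ((second & CID_FLAG) >> 5)
    return first >> 4, conn_id, second & ~CID_FLAG & 0xFF
```

   Under these assumptions, packing and then unpacking any Connection
   ID from 0 to 31 returns the same ID, and a flags octet with the URG
   position set is rejected because that bit is no longer available.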

   The TiU-TCP SYN and SYN/ACK packets look slightly different, because
   they need to establish the mapping between the Connection ID and the
   port numbers that are used by TiU-encapsulated TCP connections:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Source Port          |       Destination Port        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Length             |           Checksum            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Data |Re-    |C|E| |A|P|R|S|F|                               |
   | Offset|served |W|C|0|C|S|S|Y|I|            Window             |
   |       |       |R|E| |K|H|T|N|N|                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Sequence Number                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     Acknowledgment Number                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Encapsulated Source Port    | Encapsulated Destination Port |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            Options                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

     Figure 2: Encapsulated TCP-in-UDP SYN and SYN/ACK Packet Header
                                 Format

   The Encapsulated Source Port and Encapsulated Destination Port are
   the port numbers of the TCP connection. To create this header, an
   implementation can simply swap the position of the original TCP
   header's port number fields with the position of the Data Offset /
   Reserved / Flags / Window fields.

   Every TiU SYN or TiU SYN/ACK packet also carries at least the
   TiU-Setup TCP option. This option contains a Connection ID number.
   On a SYN packet, it is the Connection ID that the sender intends to
   use in future packets to represent the Encapsulated Source Port and
   Encapsulated Destination Port. On a SYN/ACK packet, it confirms that
   such usage is accepted by the recipient of the SYN.
   A special value of 255 is used to signify an error, upon which TiU
   will no longer be used (i.e., the next packet is expected to be a
   non-encapsulated TCP packet). The TiU-Setup TCP option is defined as
   follows:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Kind      |    Length     |             ExID              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Connection ID |
   +-+-+-+-+-+-+-+-+

                    Figure 3: TiU Setup TCP Option

   The option follows the format for Experimental TCP Options defined in
   [RFC6994]. It has Kind=253, Length=5, an ExID whose value is TBD
   (see Section 8) and the Connection ID. The Connection ID is an 8-bit
   field for easier parsing, but only values 0-31 are valid Connection
   IDs (because the Connection ID in TiU packets other than SYN and
   SYN/ACK is only 5 bits long).

4.  Protocol operation and implementation notes

   There can be several ways to implement TCP-in-UDP. The following
   gives an overview of how a TiU implementation can operate. This
   description matches the implementation described in Section 7.

   A goal of TiU is to achieve congestion control coupling with a simple
   implementation that minimizes changes to existing code. It is thus
   recommended to implement TiU in the kernel, as a change to the
   existing kernel TCP code. The changes fall into two basic
   categories:

   o  Encapsulation and decapsulation: this is code that should, in the
      simplest case, operate just before a TCP segment is transmitted.
      Based on e.g. a socket option that enables/disables TiU, the TCP
      segment is changed into the TiU header format (Figure 1). In case
      it is a TCP SYN or TCP SYN/ACK packet, the header format is
      defined as in Figure 2, and the TiU-Setup TCP option is appended.
      This packet is then transmitted.
      For decapsulation, the reverse
      mechanism applies, upon reception of a UDP packet that uses
      destination port XXX (TBD, see Section 8). Both hosts keep a list
      of encapsulated TCP port numbers and their corresponding
      Connection IDs. In case a SYN packet requests using a Connection
      ID that is already reserved, an error (Connection ID value 255 in
      the TiU-Setup TCP option) must be signified to the other end in a
      TiU-encapsulated TCP SYN/ACK, and encapsulation must be disabled
      on all further TCP packets. Similarly, when receiving a TiU
      SYN/ACK with an error, a TCP sender must stop encapsulating TCP
      packets.

   o  Coupled congestion control: this is code that influences the
      congestion control of TCP. Section 5 describes a simple coupled
      congestion control algorithm that can be applied to couple TCP
      connections and assign them a share of the total congestion window
      that is based on a priority.

   The TCP port number space usage on the host is left unchanged: the
   original code can reserve TCP ports as it always did. Except for the
   TiU encapsulation compressing the port numbers into a Connection ID
   field, TCP ports should be used in the same way as in normal TCP
   operation. A TCP port that is in use by a TiU-encapsulated TCP
   connection must therefore not be made available to non-encapsulated
   TCP connections, and vice versa.

   For each TCP connection, two variables must be configured: 1)
   TiU-ENABLE, which is a boolean, deciding whether to use TiU or not,
   and 2) Priority, which is a value, e.g. from 1 to 10, that is used by
   the coupled congestion control algorithm to assign an appropriate
   share of the total cwnd to the connection. Priority values are local
   and their range does not matter for this algorithm: the algorithm
   works with the ratio of a flow's priority to the sum of all priority
   values.
   The configuration of the two per-connection variables can be
   implemented in various ways, e.g. through an API option.

   With these code changes in place, TiU can operate as follows,
   assuming no previous TiU connections have been made between a
   specific host pair and a client tries to connect to a server:

   o  An application uses an API option to request TiU operation. The
      kernel then sends out a TiU TCP SYN that contains a TiU-Setup TCP
      option. This packet header contains the encapsulated TCP port
      numbers (source port A and destination port B) and the Connection
      ID X.

   o  The server listens on UDP port XXX (TBD, see Section 8). Upon
      receiving a packet on this port, it knows that it is a TiU packet
      and decodes it, handing the resulting TCP packet over to "normal"
      TCP processing. The TiU-Setup TCP option allows the server to
      associate future TiU packets containing Connection ID X with ports
      A and B. The server sends its response as a TiU SYN/ACK.

   o  TCP operates as normal from here on, but packets are
      TiU-encapsulated before sending them out and decapsulated upon
      reception, using Connection ID X. Both hosts associate TiU
      packets carrying Connection ID X with a local identifier that
      matches ports A and B, just like they would associate
      non-encapsulated TCP packets with the same local identifier when
      seeing ports A and B in the TCP header.

   o  If an application on either side of the TiU connection wants to
      connect to a destination host on the other side and requests TiU
      operation, the kernel sends out another TiU TCP SYN, this time
      containing a different TCP source port number and either the same
      or a different destination port number (C and D), and a TiU-Setup
      TCP option with Connection ID Y. From now on, packets carrying
      Connection ID Y will be associated with ports C and D on both
      hosts. Otherwise, TiU operation continues as described above.

   o  Now, because there are two or more connections available between
      the same host pair, coupled congestion control begins to operate
      for all outgoing TiU packets (see Section 5 for details). This is
      a local operation, applying the priority values that were
      configured for the TiU-encapsulated TCP connections.

   Unless it is known that UDP packets with destination port number XXX
   (TBD, see Section 8) can be used without problems on the path between
   two communicating hosts, it is advisable for TiU implementations to
   contain methods to fall back to non-encapsulated ("raw") TCP
   communication. Such fall-back must be supported for the case of
   Connection ID collisions anyway. Middleboxes have been known to
   track TCP connections [Honda11], and falling back to communication
   with raw TCP packets without ever using a raw TCP SYN / SYN/ACK
   handshake may lead to problems with such devices. The following
   method is recommended to efficiently fall back to raw TCP
   communication:

   o  After sending out a TiU SYN packet, additionally send a raw TCP
      SYN packet.

   o  After sending out a TiU SYN/ACK packet, additionally send a raw
      TCP SYN/ACK packet.

   o  Upon receiving a TiU SYN packet, after responding with a TiU
      SYN/ACK packet and a raw TCP SYN/ACK packet, immediately store
      the encapsulated port numbers and Connection ID. As long as a TiU
      connection is ongoing, ignore any additional incoming TCP SYN or
      TCP SYN/ACK packets from the same host that carry port numbers
      matching the stored encapsulated port numbers. Otherwise, process
      TCP SYN or TCP SYN/ACK packets as normal.

   This method ensures that the TCP SYN / SYN/ACK handshake is visible
   to middleboxes and allows switching back to raw TCP communication
   immediately in case of failures.
   If implemented on both sides as
   described above and no TiU SYN or TiU SYN/ACK packet arrives, yet a
   TCP SYN or TCP SYN/ACK packet does, this can only mean that the other
   host does not support TiU, a UDP packet was dropped, or the UDP and
   TCP packets were reordered in transit. Reordering in the host (e.g.,
   a server responding to a TCP SYN before it responds to a TiU SYN) can
   be a problem for similar methods (e.g. [RFC6555]), but it can be
   eliminated by prescribing the processing order as above.

   Because TCP does not preserve message boundaries and the size of the
   TCP header can vary depending on the options that are used, it is
   also no problem to precede the TCP header in the UDP packet with a
   different header (e.g. SPUD [I-D.hildebrand-spud-prototype]) without
   exceeding the known MTU limit. When creating a TCP segment, a TCP
   sender needs to consider the length of this header when calculating
   the segment size, just like it would consider the length of a TCP
   option. For this to work, the usage of other headers such as SPUD
   in between the UDP header and the TiU header must therefore be known
   to both the sender-side and receiver-side code that processes TiU.

5.  Coupled congestion control

   For each TCP connection c, the algorithm described below receives
   cwnd and ssthresh as input and stores the following information:

   o  The Connection ID.

   o  A priority P(c) -- e.g., an integer value in the range from 1
      (unimportant) to 10 (very important).

   o  The cwnd previously used by the connection c, ccc_cwnd(c).

   o  The ssthresh previously used by the connection c,
      ccc_ssthresh(c).

   Three global variables S_CWND, S_SSTHRESH and S_P are used to
   represent the sum of all the ccc_cwnd values, ccc_ssthresh values and
   priorities of all TCP connections, respectively. S_CWND and
   S_SSTHRESH are used to update the cwnd and ssthresh values for all
   connections.
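
   As an illustration only, the stored state described above could be
   kept in a structure like the following Python sketch. The class and
   method names are hypothetical, and the helper that derives a
   priority-proportional share is one reading of the statement in
   Section 4 that the algorithm works with a flow's priority relative
   to the sum of all priorities. When the sums are actually refreshed
   is decided by the algorithm itself; this sketch only shows the
   bookkeeping.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class TiuConnection:
    """Per-connection state (hypothetical name)."""
    conn_id: int                # Connection ID
    priority: int               # P(c), e.g. 1 (unimportant) to 10
    ccc_cwnd: float = 0.0       # cwnd previously used by the connection
    ccc_ssthresh: float = 0.0   # ssthresh previously used by the connection


@dataclass
class CoupledState:
    """Global state shared by all connections of one UDP port pair."""
    connections: Dict[int, TiuConnection] = field(default_factory=dict)
    s_cwnd: float = 0.0       # S_CWND: sum of all ccc_cwnd values
    s_ssthresh: float = 0.0   # S_SSTHRESH: sum of all ccc_ssthresh values
    s_p: int = 0              # S_P: sum of all priorities

    def add(self, conn: TiuConnection) -> None:
        self.connections[conn.conn_id] = conn
        self.refresh_sums()

    def refresh_sums(self) -> None:
        """Recompute the three sums from the stored per-connection values."""
        conns = list(self.connections.values())
        self.s_cwnd = sum(c.ccc_cwnd for c in conns)
        self.s_ssthresh = sum(c.ccc_ssthresh for c in conns)
        self.s_p = sum(c.priority for c in conns)

    def priority_share(self, conn_id: int, total: float) -> float:
        """Portion of 'total' matching the connection's priority ratio."""
        return self.connections[conn_id].priority * total / self.s_p
```

   For example, with two connections of priority 1 and 3 and an
   aggregate cwnd of 40 segments, the shares under this reading are 10
   and 30 segments.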

5.1.  Example algorithm

   This algorithm emulates the behavior of a single TCP connection by
   choosing one connection as the connection that dictates the increase
   / decrease behavior for the aggregate. It was designed to be as
   simple as possible. In the algorithm description below,
   abbreviations are used to refer to the phases of TCP congestion
   control as defined in [RFC5681]: SS refers to Slow Start, CA refers
   to Congestion Avoidance and FR refers to Fast Recovery.

   For simplicity, this algorithm refrains from changing cwnd when a
   connection is in FR. SS should not happen as long as ACKs arrive.
   Hence, the algorithm ensures that the aggregate's behavior is only
   dictated by SS when all connections are in the SS phase.

   (1)  When a connection c starts, it adds its priority P(c) to S_P.
        If it is the very first connection that uses the outer UDP port
        number pair, it also sets S_CWND to its own cwnd. After that,
        the connection's globally known cwnd and ssthresh values
        (ccc_cwnd(c) and ccc_ssthresh(c)) are updated, and the
        connection updates its own cwnd and ssthresh values to be equal
        to ccc_cwnd(c) and ccc_ssthresh(c).

           S_P = S_P + P(c)
           ccc_cwnd(c) = P(c) * S_CWND / S_P
           ccc_ssthresh(c) = ssthresh
           if (S_SSTHRESH > 0)
              ccc_ssthresh(c) = P(c) * S_SSTHRESH / S_P
           end if
           // Update c's own cwnd and ssthresh for immediate use:
           send ccc_cwnd(c) and ccc_ssthresh(c) to the connection c

   (2)  When a connection c stops, its entry is removed. S_P is
        recalculated.

   (3)  Every time the congestion controller of a connection c
        calculates a new cwnd, the connection calls UPDATE, which
        carries out the tasks listed below to derive the new cwnd and
        ssthresh values for all the connections. Since we intend to
        emulate the behavior of one connection, we designate one of the
        connections as the "Coordinating Connection" (CoCo).
        Whenever the coordinating connection calls UPDATE, S_CWND and
        S_SSTHRESH are additionally updated to reflect the current sum
        of all stored ccc_cwnd and ccc_ssthresh values. Initially,
        there is only one connection and this connection automatically
        becomes the CoCo. It updates S_CWND to its own cwnd and sets
        S_SSTHRESH to 0.

   (4)  When a non-CoCo connection c calls UPDATE:

           if (all other connections, including the CoCo, are in CA
               but c is in FR)
              c becomes the new CoCo.
           else if (c is in CA or SS)
              c's cwnd is assigned its previously stored ccc_cwnd value.
           end if

   (5)  When the CoCo connection c calls UPDATE:

           if (c is in CA)
              if (cwnd >= ccc_cwnd(c)) // cwnd has increased
                 S_CWND = S_CWND + cwnd - ccc_cwnd(c)
              else                     // cwnd has decreased
                 S_CWND = S_CWND * cwnd / ccc_cwnd(c)
              end if
              ccc_cwnd(c) = P(c) * S_CWND / S_P
              ccc_ssthresh(c) = ssthresh
              if (S_SSTHRESH > 0)
                 ccc_ssthresh(c) = P(c) * S_SSTHRESH / S_P
              end if
              // Update c's own cwnd and ssthresh for immediate use:
              send ccc_cwnd(c) and ccc_ssthresh(c) to the connection c

           else if (c is in FR)
              S_SSTHRESH = S_CWND/2

           else if (c is in SS)
              if (all other connections are in SS)
                 S_SSTHRESH = S_CWND/2
                 S_CWND = S_CWND * cwnd / ccc_cwnd(c)
                 ccc_cwnd(c) = P(c) * S_CWND / S_P
                 // Update c's own cwnd for immediate use:
                 send ccc_cwnd(c) to the connection c
              else
                 make any other connection which is not in SS the CoCo
              end if
           end if

6.  Usage considerations

   TiU cannot work with applications that require the Urgent pointer
   (which is not recommended for use by new applications anyway
   [RFC6093], but should be considered if TiU is implemented in a way
   that allows it to be applied to existing applications; telnet is a
   well-known example of an application that uses this functionality).
   TiU enables use of TCP with methods such as SPUD
   [I-D.hildebrand-spud-prototype].
   It can also be used as a method to
   experimentally test new TCP functionality in the presence of
   middleboxes that would otherwise create problems (as some have been
   known to do [Honda11]). TCP option space is getting scarce, in
   particular on TCP SYN and TCP SYN/ACK packets. Rather than
   stretching the Data Offset field on TCP SYN / TCP SYN/ACK packets
   (which was considered for the TiU design), it is recommended to use
   one of the other proposed mechanisms to stretch option space, e.g.
   "Inner Space" [I-D.briscoe-tcpm-inner-space].

   Reasons to use TiU include the benefits of [Che13] and [Den08] that
   were discussed in Section 1. TiU has the disadvantage of disabling
   ECMP for the TCP connections that it encapsulates. This can reduce
   the capacity usage of these TCP connections. It has the advantage of
   being able to apply coupled congestion control, which can provide
   precise congestion window assignment based on a priority. Other
   benefits of TiU's coupled congestion control are:

   o  Reduced average loss and queuing delay (because the competition
      between the encapsulated TCP connections is avoided)

   o  Even in the absence of prioritization, better fairness between the
      TiU-encapsulated TCP connections

   o  No need for new TiU connections to slow start up to a reasonable
      cwnd value that ongoing TiU connections already have: a connection
      can immediately be assigned its share of the aggregate's total
      cwnd. This can significantly reduce the completion time of short
      connections.

   All of these benefits only play out when there is more than one TCP
   connection. Some of the benefits in the list above are more
   significant when some transfers are short. Moreover, short transfers
   are less likely than long ones to saturate the capacity of a path,
   reducing the chance to benefit from ECMP (which TiU eliminates).
   This makes the usage of TiU especially attractive in situations
   where some transfers are short.

7.  Implementation status

   The University of Oslo is currently working on a FreeBSD kernel
   implementation of TCP-in-UDP.

8.  IANA Considerations

   This document specifies a new TCP option that uses the shared
   experimental options format [RFC6994].  No value has yet been
   assigned for ExID.

   This document requires a well-known UDP port (referred to as port
   XXX in this document).  Due to the highly experimental nature of
   TiU, this document is being shared with the community to solicit
   comments before requesting such a port number.

9.  Security Considerations

   We have not thought about security yet.  This will surely be fun!

10.  Acknowledgement

   This work has received funding from Huawei Technologies Co., Ltd.,
   and the European Union's Horizon 2020 research and innovation
   programme under grant agreement No. 644334 (NEAT).  The views
   expressed are solely those of the author(s).

11.  References

11.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

11.2.  Informative References

   [Che13]    Cheshire, S., Graessley, J., and R. McGuire,
              "Encapsulation of TCP and other Transport Protocols over
              UDP", Internet-draft draft-cheshire-tcp-over-udp-00, June
              2013.

   [Den08]    Denis-Courmont, R., "UDP-Encapsulated Transport
              Protocols", Internet-draft draft-denis-udp-transport-00,
              July 2008.

   [EFCM]     Savoric, M., Karl, H., Schlager, M., Poschwatta, T., and
              A. Wolisz, "Analysis and performance evaluation of the
              EFCM common congestion controller for TCP connections",
              Computer Networks, 2005.

   [ETCP]     Eggert, L., Heidemann, J., and J. Joe, "Effects of
              ensemble-TCP", ACM SIGCOMM Computer Communication Review,
              2000.

   [fse]      Islam, S., Welzl, M., Gjessing, S., and N. Khademi,
              "Coupled Congestion Control for RTP Media", ACM SIGCOMM
              Capacity Sharing Workshop (CSWS 2014) and ACM SIGCOMM CCR
              44(4) 2014; extended version available as a technical
              report from
              http://safiquli.at.ifi.uio.no/paper/fse-tech-report.pdf ,
              2014.

   [Honda11]  Honda, M., Nishida, Y., Raiciu, C., Greenhalgh, A.,
              Handley, M., and H. Tokuda, "Is it still possible to
              extend TCP?", Proc. of ACM Internet Measurement
              Conference (IMC) '11, November 2011.

   [I-D.briscoe-tcpm-inner-space]
              Briscoe, B., "Inner Space for TCP Options", draft-
              briscoe-tcpm-inner-space-01 (work in progress), October
              2014.

   [I-D.hildebrand-spud-prototype]
              Hildebrand, J. and B. Trammell, "Substrate Protocol for
              User Datagrams (SPUD) Prototype", draft-hildebrand-spud-
              prototype-03 (work in progress), March 2015.

   [I-D.ietf-rmcat-coupled-cc]
              Islam, S., Welzl, M., and S. Gjessing, "Coupled
              congestion control for RTP media", draft-ietf-rmcat-
              coupled-cc-00 (work in progress), September 2015.

   [RFC1078]  Lottor, M., "TCP port service Multiplexer (TCPMUX)",
              RFC 1078, DOI 10.17487/RFC1078, November 1988.

   [RFC2140]  Touch, J., "TCP Control Block Interdependence", RFC 2140,
              DOI 10.17487/RFC2140, April 1997.

   [RFC3124]  Balakrishnan, H. and S. Seshan, "The Congestion Manager",
              RFC 3124, DOI 10.17487/RFC3124, June 2001.

   [RFC5681]  Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
              Control", RFC 5681, DOI 10.17487/RFC5681, September 2009.

   [RFC6093]  Gont, F. and A. Yourtchenko, "On the Implementation of
              the TCP Urgent Mechanism", RFC 6093,
              DOI 10.17487/RFC6093, January 2011.

   [RFC6356]  Raiciu, C., Handley, M., and D. Wischik, "Coupled
              Congestion Control for Multipath Transport Protocols",
              RFC 6356, DOI 10.17487/RFC6356, October 2011.

   [RFC6555]  Wing, D. and A.
              Yourtchenko, "Happy Eyeballs: Success with Dual-Stack
              Hosts", RFC 6555, DOI 10.17487/RFC6555, April 2012.

   [RFC6824]  Ford, A., Raiciu, C., Handley, M., and O. Bonaventure,
              "TCP Extensions for Multipath Operation with Multiple
              Addresses", RFC 6824, DOI 10.17487/RFC6824, January 2013.

   [RFC6994]  Touch, J., "Shared Use of Experimental TCP Options",
              RFC 6994, DOI 10.17487/RFC6994, August 2013.

Authors' Addresses

   Michael Welzl
   University of Oslo
   PO Box 1080 Blindern
   Oslo  N-0316
   Norway

   Email: michawe@ifi.uio.no


   Safiqul Islam
   University of Oslo
   PO Box 1080 Blindern
   Oslo  N-0316
   Norway

   Phone: +47 22 84 08 37
   Email: safiquli@ifi.uio.no


   Kristian Hiorth
   University of Oslo
   PO Box 1080 Blindern
   Oslo  N-0316
   Norway

   Email: kristahi@ifi.uio.no


   Jianjie You
   Huawei
   101 Software Avenue, Yuhua District
   Nanjing  210012
   China

   Email: youjianjie@huawei.com