TCP Maintenance and Minor Extensions (tcpm)                   B. Briscoe
Internet-Draft                                                        BT
Updates: 793 (if approved)                            September 22, 2014
Intended status: Experimental
Expires: March 26, 2015

     Extended TCP Option Space in the Payload of an Alternative SYN
                    draft-briscoe-tcpm-syn-op-sis-02

Abstract

   This document describes an experimental method to extend the option
   space for connection parameters within the initial TCP SYN segment at
   the start of a TCP connection. In this method the TCP client sends
   two alternative SYNs: one intended for legacy servers and one
   intended for upgraded servers. Once it establishes which type of
   server has responded, it continues the connection appropriate to that
   server type and aborts the other. The SYN intended for upgraded
   servers includes additional options at the end of the payload. It is
   designed to traverse all known middleboxes. In the longer term,
   clients will be able to send only the SYN intended for upgraded
   servers.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).
   Note that other groups may also distribute working documents as
   Internet-Drafts. The list of current Internet-Drafts is at
   http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on March 26, 2015.

Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction
     1.1. Motivation for Adoption (to be removed before publication)
     1.2. Scope
     1.3. Experiment Goals
     1.4. Terminology
   2. Protocol Specification
     2.1. Dual 3-Way Handshake
     2.2. Retransmission Behaviour
     2.3. Segment Structure
       2.3.1. SYN-U Structure (Non-Deterministic)
       2.3.2. SYN/ACK-U Structure
     2.4. TCP Option Processing
       2.4.1. Writing TCP Options
       2.4.2. Reading TCP Options
       2.4.3. Forwarding TCP Options
   3. Discussion of Non-Determinism
   4. Migration to Single Handshake
   5. Interaction with Pre-Existing TCP
   6. Dual Handshake: The Explicit Variant
     6.1. Retransmission Behaviour - Explicit Variant
     6.2. SYN-L Structure
     6.3. Corner Cases
   7. IANA Considerations
   8. Security Considerations
   9. Acknowledgements
   10. References
     10.1. Normative References
     10.2. Informative References
   Appendix A. Alternative Protocol Specifications
     A.1. SYN-U Structure (Deterministic)
   Appendix B. Comparison of Alternatives
     B.1. Implicit vs Explicit Dual Handshake
     B.2. Non-Deterministic vs Deterministic SYN-U
     B.3. Comparison with Other Proposals
   Appendix C. Protocol Design Issues (to be Deleted before Publication)
   Appendix D. Change Log (to be Deleted before Publication)
   Author's Address

1. 
Introduction

   This document describes an experimental method to extend the TCP
   option space available in the initial SYN segment of a TCP connection
   (i.e. SYN set and ACK not set) [RFC0793]. This extension is
   required to support some combinations of TCP options, notably large
   ones such as TCP AO [RFC5925] (16B), Multipath TCP [RFC6824] (12B),
   and TCP Fast Open [I-D.ietf-tcpm-fastopen] (6-18B) as well as other
   options already typically used in TCP connections, such as SACK-ok
   (2B), Timestamp (10B), Window Scale (3B), MSS (4B).

   In this method the TCP client sends two alternative SYNs: one
   intended for legacy servers and one intended for upgraded servers.
   Once it establishes which type of server has responded, it continues
   the connection appropriate to that server type and aborts the other.
   The SYN intended for upgraded servers includes additional options at
   the end of the payload. It is designed to traverse all known
   middleboxes.

   The ambition of this specification is more than just a low latency
   way to extend the TCP option space using two SYNs for parallel
   capability negotiation. A larger goal is to enable evolution
   towards:

   o  a single TCP initial segment with more space for control options
      and

   o  a more structured way for TCP to determine which control options
      might interact with middleboxes and which are intended solely for
      end-system interaction.

1.1. Motivation for Adoption (to be removed before publication)

   It is recognised that there could be potential for compressing
   together multiple options in order to mitigate the option space
   problem. However, it seems inevitable that ultimately more option
   space will be needed, particularly given that many of the TCP options
   introduced recently consume large numbers of bits in order to provide
   sufficient information entropy, which is not amenable to compression.

   Extension of TCP option space on a SYN requires support from both
   ends. This means it will take many years before the facility is
   functional for most pairs of end-points. Therefore, given the
   problem is already becoming pressing, a solution needs to start being
   deployed now.

1.2. Scope

   This experimental specification extends the TCP wire protocol. It is
   independent of the dynamic behaviour of TCP and it is independent of
   (and thus compatible with) any protocol that encapsulates TCP,
   including IPv4 and IPv6.

1.3. Experiment Goals

   TCP is critical to the robust functioning of the Internet, therefore
   any proposed modifications to TCP need to be thoroughly tested. The
   present specification describes an experimental protocol that
   provides extra option space on the initial TCP SYN segment. The
   intention is to specify the protocol sufficiently so that more than
   one implementation can be built in order to test its function,
   robustness and interoperability (with itself, with previous versions
   of TCP, and with various commonly deployed middleboxes).

   Success criteria: The experimental protocol will be considered
      successful if it satisfies the following requirements in the
      consensus opinion of the IETF tcpm working group. {ToDo: describe
      success criteria}

   Duration: To be credible, the experiment will need to last at least
      12 months from publication of the present specification. If
      successful, a report on the experiment will be written up. It
      would then be appropriate to work on a standards track
      specification, in which the experiment report may be included.

1.4. Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].
   In this document, these words will appear with that interpretation
   only when in ALL CAPS. Lower case uses of these words are not to be
   interpreted as carrying RFC-2119 significance.

   TCP header: As defined in RFC 793 [RFC0793]. Even though the
      present specification places TCP options at the end of the
      payload, the term 'TCP header' is still used to mean only those
      fields at the head of the segment, delimited by the TCP Data
      Offset.

   Extra TCP options: TCP options placed in the space that the present
      specification makes available beyond the Data Offset, and beyond
      any optional payload.

   TCP payload: User-data to be passed to the layer above TCP. The
      present document redefines the TCP payload so that it does not
      include the extra TCP options placed at the end of the payload.

   Legacy connection: A connection starting with a SYN that a
      pre-existing TCP server will fully understand.

   Upgraded connection: A connection starting with an upgraded SYN that
      will only be fully understood by a server complying with the
      present specification (even though it might appear valid to a
      pre-existing TCP server).

   Legacy server: A TCP listener complying with pre-existing TCP
      specifications, but not with the present document.

   Upgraded server: A TCP listener complying with the present document
      as well as with pre-existing TCP specifications.

2. Protocol Specification

2.1. Dual 3-Way Handshake

   The upgraded TCP client sends two alternative SYNs: a regular SYN in
   case the server is legacy and a SYN-U (see Section 2.3.1) in case the
   server is upgraded. The two SYNs MUST have the same network
   addresses and the same destination port, but different source ports.
   Once the client establishes which type of server has responded, it
   continues the connection appropriate to that server type and aborts
   the other.

   The SYN intended for upgraded servers (SYN-U) includes additional TCP
   options at the end of the payload (see Section 2.3.1). The options
   are placed at the end of the payload to ensure that the SYN-U is more
   likely to traverse middleboxes that inspect application-layer
   headers, which they expect to be at the start of the payload.

   Table 1 summarises the TCP 3-way handshake exchange for each of the
   two SYNs between an upgraded TCP client (the active opener) and
   either:

   1.  a legacy server, using the two columns to the left, or

   2.  an upgraded server, using the two columns to the right

   Because the two SYNs come from different source ports, the server
   will treat them as separate connections, probably using separate
   threads (assuming a threaded server). A load balancer might forward
   each SYN to separate replicas of the same logical server. Each
   replica will deal with each incoming SYN independently - it does not
   need to co-ordinate with the other replica.

   +---+-----------------+-----------+---+----------------+------------+
   |   | Legacy Server   | Legacy    |   | Upgraded       | Upgraded   |
   |   | Thread X        | Server    |   | Server Thread  | Server     |
   |   |                 | Thread Y  |   | X              | Thread Y   |
   +---+-----------------+-----------+---+----------------+------------+
   | 1 | >SYN            | >SYN-U    |   | >SYN           | >SYN-U     |
   |   |                 |           |   |                |            |
   | 2 | <SYN/ACK        | <SYN/ACK  |   | <SYN/ACK       | <SYN/ACK-U |
   |   |                 |           |   |                |            |
   | 3 | Waits for       |           |   | Waits for      |            |
   |   | response to     |           |   | response to    |            |
   |   | SYN-U           |           |   | SYN-U          |            |
   |   |                 |           |   |                |            |
   | 4 | >ACK            | >RST      |   | >RST           | >ACK       |
   |   |                 |           |   |                |            |
   | 5 | Cont...         |           |   |                | Cont...    |
   +---+-----------------+-----------+---+----------------+------------+

       Table 1: Dual 3-Way Handshake in Two Server Scenarios

   Each column of the table shows the required 3-way handshake exchange
   within each connection, using the following symbols:

   > means client to server

   < means server to client

   Cont... 
   means the TCP connection continues as normal

   The connection that starts with a regular SYN is called the 'legacy
   connection' and the one that starts with a SYN-U is called the
   'upgraded connection'. An upgraded server MUST respond to a SYN-U
   with an upgraded SYN/ACK (termed a SYN/ACK-U and defined in
   Section 2.3.2). Then the client recognises that it is talking to an
   upgraded server. The client's behaviour depends on which response it
   receives first, as follows:

   o  If the client first receives a SYN/ACK response on the legacy
      connection, it MUST wait for the response on the upgraded
      connection. It then proceeds as follows:

      *  If the response on the upgraded connection is a regular
         SYN/ACK, the client MUST reset (RST) the upgraded connection
         and it can continue with the legacy connection.

      *  If the response on the upgraded connection is an upgraded
         SYN/ACK-U, the client MUST reset (RST) the legacy connection
         and it can continue with the upgraded connection.

   o  If the client first receives a legacy SYN/ACK response on the
      upgraded connection, it MUST reset (RST) the upgraded connection
      immediately. It can then wait for the response on the legacy
      connection and, once it arrives, continue as normal.

   o  If the client first receives an upgraded SYN/ACK-U response on the
      upgraded connection, it MUST reset (RST) the legacy connection
      immediately and continue with the upgraded connection.

2.2. Retransmission Behaviour

   If the client receives a response to the SYN, but a short while after
   that {duration TBA} the response to the SYN-U has not arrived, it
   SHOULD retransmit the SYN-U. If latency is more important than the
   extra TCP options, in parallel to any retransmission, or instead of
   any retransmission, the client MAY give up on the upgraded (SYN-U)
   connection by sending a reset (RST) and completing the 3-way
   handshake of the legacy connection.
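
   As an illustration only, the response-handling rules listed at the
   end of Section 2.1 could be sketched as a small state machine. The
   class name, the string labels for connections, responses and actions
   are all hypothetical and are not defined by this specification.
   Retransmission timers (this section) are deliberately not modelled.

```python
class DualHandshake:
    """Tracks which of the two parallel connections the client keeps.

    Hypothetical sketch of the rules in Section 2.1 (implicit variant).
    """

    def __init__(self):
        self.legacy_synack_seen = False  # SYN/ACK arrived on legacy connection
        self.upgraded_reset = False      # client has RST the upgraded connection
        self.chosen = None               # becomes "legacy" or "upgraded"

    def on_response(self, connection, response):
        """Handle one server response and return the client's actions.

        connection: "legacy" or "upgraded"
        response:   "SYN/ACK" or "SYN/ACK-U"
        """
        if connection == "legacy":
            self.legacy_synack_seen = True
            if self.upgraded_reset:
                # Upgraded connection already aborted: continue as normal.
                self.chosen = "legacy"
                return ["ACK legacy"]
            # Legacy answered first: MUST wait for the upgraded response.
            return ["wait for upgraded"]

        if response == "SYN/ACK-U":
            # The server is upgraded: abort legacy, keep upgraded.
            self.chosen = "upgraded"
            return ["RST legacy", "ACK upgraded"]

        # A regular SYN/ACK on the upgraded connection: the server is legacy.
        self.upgraded_reset = True
        if self.legacy_synack_seen:
            self.chosen = "legacy"
            return ["RST upgraded", "ACK legacy"]
        return ["RST upgraded", "wait for legacy"]
```

   For example, feeding the sketch a SYN/ACK on the legacy connection
   followed by a SYN/ACK-U on the upgraded connection yields a wait,
   then a reset of the legacy connection and an ACK on the upgraded one.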

   If the client receives no response at all to either the SYN or the
   SYN-U, it SHOULD solely retransmit one or the other, not both. If
   latency is more important than the extra TCP options, it will
   retransmit the SYN. Otherwise it will retransmit the SYN-U. It MUST
   NOT retransmit both segments, because the lack of response could be
   due to severe congestion.

2.3. Segment Structure

2.3.1. SYN-U Structure (Non-Deterministic)

   {Temporary note: The structure for a SYN-U segment specified in this
   section leads to slightly non-deterministic behaviour, so it will be
   labelled SYN-UN (for Upgraded Non-deterministic). A deterministic
   alternative is given in Appendix A. It is expected that one will be
   chosen during the IETF review process, at which point the other will
   be deleted.}

   A SYN-UN is structured as shown in Figure 1. Up to the payload, it
   is identical to a regular TCP SYN segment, with a base TCP header
   (TCP hdr) and the usual facility to set the Data Offset (DO) to allow
   space for TCP options (TCPopts#2). The significance of '#2' will be
   explained later.

   Unlike a legacy TCP segment, the payload of a SYN-UN does not
   continue to the end of the packet. Instead, it can be seen that
   space is provided for additional TCP options at the end of the packet
   at an offset from the end of the packet defined using the Extra
   Options Offset (EOO) field. The EOO field is read from a new
   'SynOpSis' TCP option defined in this specification.

   Note that the handshake described earlier (Section 2.1) ensures that
   a legacy server will never erroneously pass this mixture of payload
   and options to the application. If a SYN carries a payload, a TCP
   server holds back the payload from the application until the 3-way
   handshake completes. And, once the upgraded client recognises it is
   talking to a legacy server it will abort the 3-way handshake of the
   upgraded connection.
   Therefore it will always prevent the mixed payload from confusing
   the application.

   The SynOpSis TCP option MUST be the final TCP option right-aligned at
   the end of the payload so that the server can find it (using the
   length of the whole packet found in the network layer header, e.g.
   IPv4 or IPv6).

                                   |   EPOO    |
                                   ,---------->|
   |          DO         |         |          EOO          |    2     |
   ,-------------------->|         |<----------------------.<---------.
   +---------+-----------+---------+-----------+-----------+----------+
   | TCP hdr | TCPopts#2 | Payload | TCPopts#1 | TCPopts#3 | SynOpSis |
   +---------+-----------+---------+-----------+-----------+----------+

   All offsets are specified in 4-octet (32-bit) words.

       Figure 1: The Structure of a SYN-UN segment (not to scale)

   The SynOpSis TCP option has Kind SynOpSis, with a value {TBA} (See
   Section 7). The internal structure of the SynOpSis TCP option for a
   SYN-UN is defined in Figure 2. In general, the SynOpSis TCP option
   can have different lengths for different purposes. However, in a
   SYN-UN, the SynOpSis TCP option MUST have Length = 8, so that the
   server can find where it starts (8 octets before the end of the
   segment). The 4 octets after the Kind and Length fields contain a
   magic number {TBA} to reduce the chance that arbitrary data within
   the payload will be mistaken for a SynOpSis TCP option.

   +---------------+---------------+-------------------------------+
   | Kind=SynOpSis | Length=8      | Magic Number                  |
   +---------------+---------------+---------------+---------------+
   | Magic Number (cont)           | EOO           | EPOO          |
   +---------------+---------------+---------------+---------------+

              Figure 2: SynOpSis TCP Option for a SYN-UN

   Two 1-octet offset fields are placed at the end of the SynOpSis TCP
   option for a SYN-UN:

   The Extra Options Offset (EOO): The EOO field defines the total size
      of the extra TCP options in 4-octet words.
      The start of the extra options will be located 4 * (EOO + 2)
      octets from the end of the packet. The IP payload size will be
      4 * (DO + EOO + 2) + TCP_payload_size.

   The Extra Prefix Options Offset (EPOO): The EPOO field defines an
      additional offset from the start of the extra TCP options that
      identifies the extent of those extra TCP options that need to be
      processed before any regular TCP options. The EPOO field defines
      this offset in 4-octet words.

2.3.2. SYN/ACK-U Structure

   The SYN/ACK-U carries a simple SynOpSis flag TCP option as defined in
   Figure 3. It solely identifies that the SYN/ACK is from a server
   that supports SynOpSis TCP options.

   +---------------+---------------+
   | Kind=SynOpSis | Length=2      |
   +---------------+---------------+

                Figure 3: A SynOpSis flag TCP option

2.4. TCP Option Processing

2.4.1. Writing TCP Options

   If an upgraded TCP client includes the TCP Fast Open option
   [I-D.ietf-tcpm-fastopen] in the SYN, it MUST be placed with the extra
   TCP options after the end of the payload. An upgraded TCP client
   MUST NOT place any TCP option in the TCP header of a SYN that might
   cause a TCP server to pass user-data directly to the application
   before the 3-way handshake completes.

   In order to ensure that the first extra TCP option aligns on a
   4-octet word boundary, a TCP client SHOULD {ToDo: MUST?} start the
   extra TCP options with sufficient 1-octet no-op TCP options
   [RFC0793]. The number of no-op octets required will be
   3 - ((S - 1) % 4), where S is the IP payload size in octets and '%'
   is the modulo operation.

2.4.2. 
Reading TCP Options

   Before processing any TCP options, if the TCP payload is greater than
   9 octets, an upgraded server MUST determine whether there is a
   SynOpSis TCP option at the end of the packet by checking all the
   following conditions:

   o  The Kind value is the SynOpSis Kind value;

   o  The length is 8;

   o  The next 4 octets match the magic number;

   o  The sum of the value of the EOO field, and all the length fields
      found by walking along the TCP options at the end of the payload
      exactly reaches the end of the packet.

   If any of these conditions fails, the server MUST proceed by
   processing any TCP options in the TCP header (TCPopts#2 in Figure 1),
   and treat all octets after the Data Offset as user-data.

   If an upgraded server finds a valid SynOpSis TCP option at the end of
   the packet, it MUST process the TCP options in a SYN-UN in the
   following order:

   1.  The Prefix TCP options (TCPopts#1 in Figure 1);

   2.  The regular TCP options following the main header but before the
       payload (TCPopts#2 in Figure 1);

   3.  The Suffix TCP options (TCPopts#3 in Figure 1).

   This arrangement allows the client to reveal certain TCP options for
   processing by middleboxes (TCPopts#2), while concealing others after
   the payload. And the client can still control the order in which the
   server processes all the TCP options.

2.4.3. Forwarding TCP Options

   Middleboxes exist that process some aspects of the TCP header.
   Although the present specification defines a new location for extra
   TCP options at the end of a packet, this is intended for the
   exclusive use of the destination TCP implementation. Legacy
   middleboxes will not expect to find TCP options beyond the Data
   Offset anyway. A middlebox MUST continue to treat any data beyond
   the Data Offset solely as user-data.

   A TCP implementation is not necessarily aware whether it is deployed
   in a middlebox or in a destination, e.g.
   a split TCP connection might use a regular TCP implementation.
   Therefore, a general-purpose TCP that implements the present
   specification will need a configuration switch to disable any search
   for TCP options at the end of the packet.

3. Discussion of Non-Determinism

   All the TCP headers and options before the payload of a SYN-UN (see
   Section 2.3.1) are completely indistinguishable from a regular SYN.
   This makes it very likely that a SYN-UN will be able to traverse any
   legacy middlebox, even one that splits a TCP connection. A SYN-UN
   can only be distinguished from any legacy SYN by the presence of the
   SynOpSis bit-pattern at the end of the packet.

   This is termed the non-deterministic segment structure, because there
   will be a very small probability (roughly 2^(-48-L)) that payload
   data on a regular (non-SynOpSis) SYN could:

   o  happen to contain a pattern in exactly the right place that
      matches the kind, length and magic number of a SynOpSis TCP option
      (48 bits in total) and

   o  happen to contain a valid sequence of numbers in exactly the right
      places to look like a valid sequence of TCP option lengths.

   In the above formula, L is the sum of all the bits in all the TCP
   option length fields that seem to be in the payload. For instance,
   if it appears that there are 2 TCP options before the SynOpSis option
   at the end of the payload, then L=2*8=16, and the probability of
   incorrectly using user-data as TCP options will then be roughly
   2^(-64) = 1 in 18 billion billion (18x10^18). This 'stealth'
   approach has been taken in order to maximise the chances of
   traversing all the various types of middlebox.

   Note that the non-determinism is only in one direction. I.e., there
   is a small chance that arbitrary user data might be mistaken for the
   SynOpSis TCP option, but it is not possible that a valid SynOpSis TCP
   option would ever be mistaken for user data.
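
   To make the discussion concrete, the following is a minimal sketch
   of how a SYN-UN trailer might be written and then detected using the
   conditions of Section 2.4.2, together with the rough false-positive
   estimate given above. It is not a definitive implementation. The
   Kind value and magic number are still {TBA} in this document, so the
   constants below are placeholders; all function names are
   hypothetical; the check that EPOO does not exceed EOO is an added
   assumption; and End-of-List is simplified to a single-octet option.

```python
import struct

# Placeholders: this document leaves both values TBA.
SYNOPSIS_KIND = 253                # assumed: an experimental TCP option Kind
MAGIC = 0x53594E4F                 # assumed: arbitrary 32-bit magic number
TRAILER = struct.Struct("!BBIBB")  # Kind, Length, Magic, EOO, EPOO = 8 octets


def build_extra(prefix_opts: bytes, suffix_opts: bytes) -> bytes:
    """Return TCPopts#1 + TCPopts#3 + SynOpSis, to append after the payload."""
    extra = prefix_opts + suffix_opts
    if len(prefix_opts) % 4 or len(extra) % 4:
        raise ValueError("option blocks must be whole 4-octet words")
    return extra + TRAILER.pack(SYNOPSIS_KIND, 8, MAGIC,
                                len(extra) // 4, len(prefix_opts) // 4)


def options_fill_exactly(buf: bytes) -> bool:
    """Walk Kind/Length pairs; True only if they end exactly at len(buf)."""
    i = 0
    while i < len(buf):
        if buf[i] in (0, 1):       # End-of-List and No-Op are single octets
            i += 1
            continue
        if i + 1 >= len(buf) or buf[i + 1] < 2:
            return False
        i += buf[i + 1]
    return i == len(buf)


def find_extra(after_do: bytes):
    """Apply the checks of Section 2.4.2 to the octets after the Data Offset.

    Returns (prefix_opts, suffix_opts, payload), or None if every octet
    should be treated as user-data.
    """
    if len(after_do) <= 9:         # the check applies to payloads > 9 octets
        return None
    kind, length, magic, eoo, epoo = TRAILER.unpack(after_do[-8:])
    if kind != SYNOPSIS_KIND or length != 8 or magic != MAGIC:
        return None
    start = len(after_do) - 4 * (eoo + 2)
    if start < 0 or epoo > eoo:    # epoo <= eoo is an assumed sanity check
        return None
    extra = after_do[start:-8]
    if not options_fill_exactly(extra):
        return None
    return extra[:4 * epoo], extra[4 * epoo:], after_do[:start]


def false_positive_bound(n_options: int) -> float:
    """Rough chance that user-data passes the checks: 2^-(48 + 8 * n)."""
    return 2.0 ** -(48 + 8 * n_options)
```

   The one-directional nature of the non-determinism shows up in the
   sketch: a trailer produced by build_extra() is always recovered by
   find_extra(), whereas arbitrary user-data is rejected unless every
   check happens to pass.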

   {ToDo: It is recognised that it is potentially unsafe to use
   probability to determine whether TCP options are hidden at the end of
   the payload. If the WG prefers not to use the non-deterministic
   structure in Section 2.3.1, it can be replaced with the alternative
   more conventional deterministic protocol structure in Appendix A.1,
   and this discussion of non-determinism could then be deleted.}

4. Migration to Single Handshake

   The strategy of sending two SYNs in parallel is not essential to the
   Alternative SYN approach. It is merely an initial strategy that
   minimises latency when the client does not know whether the server
   has been upgraded. Evolution to a single SYN with greater option
   space could proceed as follows:

   o  Clients could maintain a white-list of upgraded servers discovered
      by experience and send just the upgraded SYN-U in these cases.

   o  Then, for white-listed servers, the client could send a legacy SYN
      only in the rare cases when an attempt to use an upgraded
      connection had previously failed (perhaps a mobile client
      encountering a new blockage on a new path to a server that it had
      previously accessed over a good path).

   o  In the longer term, once it can be assumed that most servers are
      upgraded and the risk of having to fall back to legacy has dropped
      to near-zero, clients could send just the upgraded SYN first,
      without maintaining a white-list, but still be prepared to send a
      legacy SYN in the rare cases when that might fail.

   There is concern that, although dual handshake approaches might well
   eventually migrate to a single handshake, they do not scale when
   there are numerous choices to be made simultaneously. For instance,
   trying IPv4 and IPv6 in parallel [RFC6555]; and trying SCTP and TCP
   in parallel [I-D.wing-tsvwg-happy-eyeballs-sctp]; and trying ECN and
   non-ECN in parallel; and so on.
   Nonetheless, it is not necessary to try every possible combination
   of N choices, which would otherwise require 2^N handshakes (assuming
   each choice is between two options). Instead, a selection of the
   choices could be attempted together. At the extreme, two handshakes
   could be attempted, one with all the new features, and one without
   all the new features.

5. Interaction with Pre-Existing TCP

   {ToDo: TCP API, TCP States and Transitions, TCP Segment Processing,
   Processing and Segment Size Overhead, Connectionless Resets, ICMP
   Handling. Interaction with EDO, Interaction with TFO (see
   Section 2.4.1), Interactions with Other TCP Variants including SYN
   Cookies, Forward-Compatibility, Interaction with TCP assumptions of
   Middleboxes. }

6. Dual Handshake: The Explicit Variant

   This explicit dual handshake is similar to that in Section 2.1,
   except the SYN that the client intends for a legacy server is
   explicitly distinguishable from the SYN that would be sent by a
   legacy client. Then, in the case of an upgraded server, the server
   can reset the legacy connection itself, rather than creating
   connection state for at least a round trip until the client resets
   the connection.

   {Temporary note: The choice between the explicit handshake in the
   present section or the handshake in Section 2.1 is a tradeoff between
   robustness against middlebox interference and minimal server state.
   During the IETF review process, one might be chosen as the only
   variant to go forward, at which point the other will be deleted.
   Alternatively, the IETF could allow both variants and a client could
   be implemented with either, or both. If both, the application could
   choose which to use at run-time.
   Then we will need a section describing the necessary API.}

   For an explicit dual handshake, the TCP client still sends two
   alternative SYNs: a SYN-L intended for legacy servers and a SYN-U
   intended for upgraded servers. The two SYNs MUST have the same
   network addresses and the same destination port, but different source
   ports. Once the client establishes which type of server has
   responded, it continues the connection appropriate to that server
   type and aborts the other. The SYN intended for upgraded servers
   includes additional options at the end of the payload (the SYN-U
   defined as before in Section 2.3.1).

   Table 2 summarises the TCP 3-way handshake exchange for each of the
   two SYNs between an upgraded TCP client (the active opener) and
   either:

   1.  a legacy server, using the two columns to the left, or

   2.  an upgraded server, using the two columns to the right

   The table uses the same layout and symbols as Table 1, which have
   already been explained in Section 2.1.

   +---+-------------+--------------+---+--------------+---------------+
   |   | Legacy      | Legacy       |   | Upgraded     | Upgraded      |
   |   | Server      | Server       |   | Server       | Server Thread |
   |   | Thread X    | Thread Y     |   | Thread X     | Y             |
   +---+-------------+--------------+---+--------------+---------------+
   | 1 | >SYN-L      | >SYN-U       |   | >SYN-L       | >SYN-U        |
   |   |             |              |   |              |               |
   | 2 | <SYN/ACK    | <SYN/ACK     |   | <RST         | <SYN/ACK-U    |
   |   |             |              |   |              |               |
   | 3 | >ACK        | >RST         |   |              | >ACK          |
   |   |             |              |   |              |               |
   | 4 | Cont...     |              |   |              | Cont...       |
   +---+-------------+--------------+---+--------------+---------------+

     Table 2: Explicit Variant of Dual 3-Way Handshake in Two Server
                               Scenarios

   As before, an upgraded server MUST respond to a SYN-U with a
   SYN/ACK-U. Then, the client recognises that it is talking to an
   upgraded server.

   Unlike before, an upgraded server MUST respond to a SYN-L with a RST.
   However, the client cannot rely on this behaviour, because a
   middlebox might strip the SynOpSis TCP option from the SYN-L before
   it reaches the server.  Then the handshake would effectively revert
   to the implicit variant.  Therefore the client's behaviour still
   depends on which SYN/ACK arrives first, so its response to SYN/ACKs
   has to follow the rules specified for the implicit handshake variant
   in Section 2.1.

   The rules for processing TCP options are unchanged from those in
   Section 2.4.

6.1.  Retransmission Behaviour - Explicit Variant

   If the client receives a RST on one connection, but a short while
   after that {duration TBA} the response to the SYN-U has not arrived,
   it SHOULD retransmit the SYN-U.  If latency is more important than
   the extra TCP options, in parallel to any retransmission, or instead
   of any retransmission, the client MAY send a SYN without any SynOpSis
   option, in case the SynOpSis option is the cause of the black-hole.
   However, the presence of the RST implies that one of the SYNs with a
   SynOpSis TCP option (the SYN-L) probably reached the server,
   therefore it is more likely (but not certain) that the lack of
   response on the other connection is due to transmission loss or
   congestion loss.

   If the client receives no response at all to either the SYN-L or the
   SYN-U, it SHOULD retransmit only one or the other, not both.  If
   latency is more important than the extra TCP options, it SHOULD send
   a SYN without a SynOpSis TCP option.  Otherwise it SHOULD retransmit
   the SYN-U.  It MUST NOT retransmit both segments, because the lack of
   response could be due to severe congestion.

6.2.  SYN-L Structure

   The SYN-L is merely a SYN with an extra SynOpSis flag option as
   shown in Figure 3 (see Section 2.3.2).  It solely identifies that the
   SYN is from a client that supports SynOpSis TCP options.
   In the case
   of a legacy server, it will just ignore this TCP option that it
   doesn't recognise.

6.3.  Corner Cases

   There is a small but finite possibility that one load-sharing replica
   of a server is upgraded, while another is not.  The Implicit
   Handshake is robust to this possibility, but the Explicit Handshake
   is not, unless the following additional rules are followed:

   Both aborted:  The client might receive a RST on its legacy
      connection in response to its SYN-L, then a regular SYN/ACK on its
      upgraded connection in response to its SYN-U.  In this case, the
      client MUST still respond with a RST on its upgraded connection.
      Otherwise, its extra TCP options will be passed as user-data to
      the application by the legacy server.  If confronted with this
      unusual scenario where both connections are aborted, the client's
      only recourse is to retry a new dual handshake on different source
      ports, or ultimately to fall back to sending a regular SYN.

   Both successful:  This could happen in either order but, in both
      cases, the client aborts the last connection to respond:

      *  The client completes the legacy handshake (because it receives
         a SYN/ACK), but then, before it has aborted the upgraded
         connection, it receives a SYN/ACK-U on it.  In this case, the
         client MUST abort the upgraded connection even though it would
         work.  Otherwise the client will have opened both connections,
         one with extra TCP options and one without.  This could confuse
         the application.

      *  The client completes the upgraded connection after receiving
         a SYN/ACK-U, but then it receives a SYN/ACK on the legacy
         connection.  In this case, the client MUST abort the legacy
         connection.

7.  IANA Considerations

   This specification requires IANA to allocate one value from the TCP
   option Kind name-space, against the name "Sister SYN Options
   (SynOpSis)".

   Early implementation before the IANA allocation MUST follow [RFC6994]
   and use experimental option 254 and magic number 0xHHHH (16 bits)
   {ToDo: Value TBA and register this with IANA}, then migrate to the
   new option after the allocation.

8.  Security Considerations

   Certain cryptographic functions have different coverage rules for the
   TCP header and TCP payload.  Placing some TCP options at the end of
   the payload could mean that they are treated differently from regular
   TCP options.  This is a deliberate feature of the protocol, but
   application developers will need to be aware that this is the case.

   {ToDo: More}

9.  Acknowledgements

   The idea of this approach grew out of discussions with Joe Touch
   while developing draft-touch-tcpm-syn-ext-opt, and with Jana Iyengar
   and Olivier Bonaventure.  The idea that it is architecturally
   preferable to place a protocol extension behind a higher layer, and
   code its location into upgraded implementations, was originally
   articulated by Rob Hancock.  The following people provided useful
   review comments: Joe Touch, Yuchung Cheng.

   Bob Briscoe was part-funded by the European Community under its
   Seventh Framework Programme through the Trilogy 2 project
   (ICT-317756).  The views expressed here are solely those of the
   authors.

10.  References

10.1.  Normative References

   [RFC0793]  Postel, J., "Transmission Control Protocol", STD 7, RFC
              793, September 1981.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC6994]  Touch, J., "Shared Use of Experimental TCP Options", RFC
              6994, August 2013.

10.2.  Informative References

   [I-D.ietf-tcpm-fastopen]
              Cheng, Y., Chu, J., Radhakrishnan, S., and A.
              Jain, "TCP Fast Open", draft-ietf-tcpm-fastopen-09 (work in
              progress), July 2014.

   [I-D.wing-tsvwg-happy-eyeballs-sctp]
              Wing, D. and P. Natarajan, "Happy Eyeballs: Trending
              Towards Success with SCTP",
              draft-wing-tsvwg-happy-eyeballs-sctp-02 (work in
              progress), October 2010.

   [RFC5925]  Touch, J., Mankin, A., and R. Bonica, "The TCP
              Authentication Option", RFC 5925, June 2010.

   [RFC6555]  Wing, D. and A. Yourtchenko, "Happy Eyeballs: Success with
              Dual-Stack Hosts", RFC 6555, April 2012.

   [RFC6824]  Ford, A., Raiciu, C., Handley, M., and O. Bonaventure,
              "TCP Extensions for Multipath Operation with Multiple
              Addresses", RFC 6824, January 2013.

Appendix A.  Alternative Protocol Specifications

   This appendix is informative and will be deleted before publication.
   It documents protocol alternatives that the IETF may wish to consider
   in place of those in the body of the specification.

A.1.  SYN-U Structure (Deterministic)

   This appendix describes a structure for an upgraded SYN called SYN-UD
   (for upgraded deterministic) that is an alternative to the
   non-deterministic structure defined in Section 2.3.1.  It is termed
   'deterministic' because it uses the conventional placement for the
   SynOpSis TCP option (instead of the unconventional SYN-UN placement
   at the end of the packet, where arbitrary user-data could be mistaken
   for the SynOpSis option).

   However, given it uses the new SynOpSis TCP option in the TCP header,
   it will not always successfully traverse middleboxes.  Unlike a
   SYN-UN, a SYN-UD will certainly not traverse legacy middleboxes that
   do not forward unrecognised TCP options, and it is unlikely to
   traverse a legacy middlebox that splits TCP connections, unless it
   copies unrecognised TCP options.
   Nonetheless, like the SYN-UN, the options
   are still placed at the end of the payload to ensure that the SYN-UD
   is more likely to traverse middleboxes that inspect application-layer
   headers, which they expect to be at the start of the payload.

   The placement of the SynOpSis TCP option in a SYN-UD segment is shown
   in Figure 4.  It can be seen that extra TCP options are still placed
   at the end of the payload at an offset from the end of the packet
   defined using the Extra Options Offset (EOO) field.

   The EOO field is read from a new 'SynOpSis' TCP option defined in
   this specification.  The SynOpSis TCP option is placed in the
   regular TCP option space of the SYN-UD.

   |                     DO                     |         |    EOO    |
   ,------------------------------------------->|         |<----------.
   +---------+-----------+----------+-----------+---------+-----------+
   | TCP hdr | TCPopts#1 | SynOpSis | TCPopts#3 | Payload | TCPopts#2 |
   +---------+-----------+----------+-----------+---------+-----------+

     Figure 4: The Structure of an alternative (deterministic) SYN-UD
                          segment (not to scale)

   The SynOpSis TCP option for a SYN-UD segment MUST have Kind SynOpSis,
   with a value {TBA} (see Section 7) and Length = 3.  In general, the
   SynOpSis TCP option can have different lengths for different
   purposes.  However, in a SYN-UD, the SynOpSis TCP option has Length =
   3, so that it can carry the 1-octet EOO field, which MUST be present
   in a SYN-UD.  The internal structure of the SynOpSis TCP option for a
   SYN-UD segment is defined in Figure 5.

   +---------------+---------------+---------------+
   | Kind=SynOpSis |   Length=3    |      EOO      |
   +---------------+---------------+---------------+

        Figure 5: SynOpSis TCP Option for a deterministic SYN-UD

   The Extra Options Offset (EOO) field defines the total size of the
   extra TCP options in 4-octet words.  The start of the extra options
   will be located 4 * EOO octets from the end of the packet.
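
   As a non-normative illustration of this EOO arithmetic, the Python
   sketch below splits the octets that follow the TCP header (as
   delimited by the Data Offset) into user-data and extra TCP options.
   The function name split_syn_ud_payload and its arguments are
   hypothetical and are not defined by this specification; the EOO
   value is assumed to have been read already from the SynOpSis TCP
   option.

```python
from typing import Tuple


def split_syn_ud_payload(after_header: bytes, eoo: int) -> Tuple[bytes, bytes]:
    """Split the octets after the TCP header into (user_data, extra_options).

    Hypothetical helper: `after_header` is everything beyond the Data
    Offset, and `eoo` is the 1-octet Extra Options Offset in 4-octet words.
    """
    if not 0 <= eoo <= 255:
        raise ValueError("EOO must fit in one octet")
    extra_len = 4 * eoo  # total size of the extra options in octets
    if extra_len > len(after_header):
        raise ValueError("EOO points before the start of the payload")
    # The extra options start 4 * EOO octets from the end of the packet.
    boundary = len(after_header) - extra_len
    return after_header[:boundary], after_header[boundary:]


# Example: 5 octets of user-data followed by one 4-octet word of options
# (three No-Operation octets and an End of Option List, used as filler).
user_data, extra_options = split_syn_ud_payload(
    b"hello" + b"\x01\x01\x01\x00", 1
)
```

   With EOO = 0 the sketch treats everything after the TCP header as
   user-data, and it rejects an EOO that would reach back into the TCP
   header.
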
   The IP packet payload size will be 4 * (DO + EOO) +
   TCP_payload_size.

   An upgraded server MUST process the TCP options in a SYN-UD in the
   following order:

   1.  The regular TCP options following the main header but before the
       SynOpSis TCP option (TCPopts#1 in Figure 4);

   2.  The TCP options at the end of the payload (TCPopts#2 in
       Figure 4);

   3.  The regular TCP options following the main header but after the
       SynOpSis TCP option (TCPopts#3 in Figure 4).

Appendix B.  Comparison of Alternatives

B.1.  Implicit vs Explicit Dual Handshake

   In the body of this specification, two variants of the dual handshake
   are defined:

   1.  The implicit dual handshake (Section 2.1) with just a regular SYN
       (no SynOpSis flag option) on the legacy connection;

   2.  The explicit dual handshake (Section 6) with a SYN-L (SynOpSis
       flag option) on the legacy connection.

   Both schemes double up connection state (for a round trip) on the
   legacy server.  But only the implicit scheme doubles up connection
   state (for a round trip) on the upgraded server as well.  On the
   other hand, the explicit scheme risks delay accessing a legacy server
   if a middlebox discards the SYN-L (e.g. some firewalls discard
   packets with unrecognised TCP options).  Table 3 summarises these
   points.

   +----------------------------------+---------------+----------------+
   |                                  | SYN           | SYN-L          |
   |                                  | (Implicit)    | (Explicit)     |
   +----------------------------------+---------------+----------------+
   | Minimum state on upgraded server | -             | +              |
   |                                  |               |                |
   | Minimum risk of delay to legacy  | +             | -              |
   | server                           |               |                |
   +----------------------------------+---------------+----------------+

     Table 3: Comparison of Implicit vs. Explicit Dual Handshake on the
                            Legacy Connection

   There is no need for the IETF to choose between these.
   If the spec
   allows either or both, the tradeoff can be left to implementers at
   build-time, or to the application at run-time.

   Initially clients might choose the Implicit Dual Handshake to
   minimise delays due to middlebox interference.  But later, perhaps
   once more middleboxes support the scheme, clients might choose the
   Explicit scheme, to minimise state on upgraded servers.

B.2.  Non-Deterministic vs Deterministic SYN-U

   Two alternative segment structures for the SYN-U are defined, but in
   this case the IETF needs to choose between them, so that only one or
   the other is specified:

   a.  The non-deterministic SYN-UN (Section 2.3.1), with the SynOpSis
       TCP option located at the end of the packet;

   b.  The deterministic SYN-UD (Appendix A.1), with the SynOpSis TCP
       option located conventionally in the sequence of TCP options in
       the TCP header.

   The non-deterministic SYN-UN presents a small risk of user data being
   mistaken for TCP options.  Also, whether or not the client needs
   extra option space, it requires the server to always check for a TCP
   option at the end of any SYN with a payload greater than 9 octets.
   On the other hand, the deterministic SYN-UD risks delay accessing an
   upgraded server because it is visible to middleboxes that discard
   packets with unrecognised TCP options.  Also, the SynOpSis TCP option
   in a SYN-UD is vulnerable to being removed by middleboxes that do not
   forward unrecognised options, whereas the SYN-UN is likely to
   traverse all legacy middleboxes, even those that split TCP
   connections.  Table 4 summarises these points.
   +---------------------------+---------------------+-----------------+
   |                           | SYN-UN (Non-        | SYN-UD          |
   |                           | deterministic)      | (Deterministic) |
   +---------------------------+---------------------+-----------------+
   | User data unmistakable    | -                   | +               |
   |                           |                     |                 |
   | No need for upgraded      | -                   | +               |
   | server to check end of    |                     |                 |
   | every SYN payload         |                     |                 |
   |                           |                     |                 |
   | Minimum risk of delay to  | +                   | -               |
   | upgraded server           |                     |                 |
   |                           |                     |                 |
   | Extra TCP options likely  | +                   | -               |
   | to traverse all           |                     |                 |
   | middleboxes               |                     |                 |
   +---------------------------+---------------------+-----------------+

      Table 4: Comparison of Non-Deterministic vs. Deterministic SYN-U

   The IETF needs to choose between SYN-UN and SYN-UD, because if
   implementation of either or both were allowed, the two deficiencies
   of SYN-UN would still affect server implementations, whether or not
   the client used a SYN-UN to take advantage of the two benefits.

   Currently this document favours SYN-UN, because SYN-UD's lack of
   reliable middlebox traversal introduces a functional deficiency (if
   extra option space is absolutely required, the connection cannot even
   start).  In contrast, SYN-UN's first failing has a vanishingly small
   probability, and its second failing 'only' increases server
   processing; it does not impair the ability of connections to
   function outright.

B.3.  Comparison with Other Proposals

   {ToDo}

Appendix C.  Protocol Design Issues (to be Deleted before Publication)

   This appendix is informative, not normative.  It records outstanding
   issues with the protocol design that will need to be resolved before
   publication.

   Reliance on segmentation boundary:  The definition of the position of
      the SynOpSis TCP options depends on where the sender decided to
      place a segment boundary.  In general, a sender cannot rely on
      segment boundaries being preserved, e.g.
      by segmentation
      offloading hardware.  In the case of a SYN, no more payload data
      is sent in the first round trip, therefore using this segment
      boundary is probably safe.  However, it may constrain future
      attempts to send additional data in the first round.

   Tie to EDO?:  Consider whether a successful SYN/ACK-U implies EDO is
      also supported.

   Size of SynOpSis magic number:  Justify choice.

Appendix D.  Change Log (to be Deleted before Publication)

   A detailed version history can be accessed at

   From briscoe...-01 to briscoe...-02:

      Technical changes:

      *  Defined the client behaviour dependent on which response
         arrives first.

      *  Allowed retransmission of either SYN or SYN-U if no response
         from either.

      *  Redefined EOO as an offset from the end of the packet, not from
         the beginning of the payload.

      *  Added section on Migration to a Single Handshake.  Reworded
         dual handshake so that it is not mandatory for the client to
         send dual SYNs simultaneously; only the relation between the
         SYNs and the response to either is mandatory, while sending
         parallel SYNs is purely for latency reduction.

      *  Added rules for writing TCP options, i.e. i) options like TFO
         MUST NOT be located in the TCP header and ii) add no-ops to
         align on 4-octet boundary.

      *  Added rules for forwarding TCP options, i.e. only the
         destination looks for TCP options after the Data Offset, not
         middleboxes.

      *  Moved the Explicit Handshake variant (SYN-L) into the body from
         the appendix, and recommended the choice could be down to
         implementers or apps.  Included section on corner cases.

      *  Introduced more normative language throughout the Protocol
         Spec.

      Editorial changes:

      *  Added temporary motivation section.

      *  Added confusable terminology to Terminology section.

      *  Divided protocol spec into sub-sections.
      *  Handshake table: Clarified that the two columns under each
         server represent separate threads, that may run on separate
         servers, without co-ordination.  Represented message
         dependencies in the alignment of the rows.

      *  Explained the table.

      *  Explained why a legacy server won't ever pass SYN-U to the app.

      *  More precisely described loss as 'not arrived before a
         timeout', and explained the tradeoff between latency and extra
         TCP options.

      *  Gave reasoning for locating TCP options in three groups.

      *  Acknowledged Rob Hancock for the architectural idea of hiding
         an extension to a protocol in the layer above.

      *  Appendix about protocol alternatives now only presents the
         SYN-UD alternative, given the implicit/explicit handshake
         choice has been moved to the body.

      *  Rewrote appendix about comparing the choices to treat the two
         pairs of choices separately, rather than discussing all four
         combinations of pairs of choices.

   From briscoe...-00 to briscoe...-01:

      Technical changes:

      *  Added the definition of a SYN/ACK-U.

      *  Deterministic Protocol Spec: Replaced SYN/ACK-L with RST (Joe
         Touch).

      *  Added Non-Deterministic Explicit and Deterministic Implicit
         Protocol Specs in Appendices.

      *  Added Comparison of Alternatives as an Appendix.

      *  Security Considerations: Added note about crypto coverage of
         TCP options in the payload being different from that of other
         TCP options.

      *  Added an appendix to record outstanding Protocol Design Issues,
         and included segmentation boundary issue (Yuchung Cheng).

      Editorial changes:

      *  Changed TCP option Kind from SYN-OP-SIS to SynOpSis.

      *  Protocol Spec: Explained why the extra TCP options are placed
         at the end of the payload.

      *  Throughout: avoided the ambiguity in the word payload, now that
         there are TCP options at the end of the payload.
         Some might
         consider these to be within the payload, while others might
         consider them to be placed beyond the payload.

      *  Segment structure figures: Clarified that they are not to
         scale.

      *  Added placeholder section "Interaction with TCP".

      *  Acknowledged reviewers.

Author's Address

   Bob Briscoe
   BT
   B54/77, Adastral Park
   Martlesham Heath
   Ipswich  IP5 3RE
   UK

   Phone: +44 1473 645196
   Email: bob.briscoe@bt.com
   URI:   http://bobbriscoe.net/