TCP Maintenance and Minor Extensions (tcpm)                   B. Briscoe
Internet-Draft                                                        BT
Updates: 793 (if approved)                              October 27, 2014
Intended status: Experimental
Expires: April 30, 2015

                      Inner Space for TCP Options
                   draft-briscoe-tcpm-inner-space-01

Abstract

   This document describes an experimental method to extend the limited
   space for control options in every segment of a TCP connection. It
   can use a dual handshake so that, from the very first SYN segment,
   extra option space can immediately start to be used optimistically.
   At the same time a dual handshake prevents a legacy server from
   getting confused and sending the control options to the application
   as user-data. The dual handshake is only one strategy - a single
   handshake will usually suffice once deployment has got started. The
   protocol is designed to traverse most known middleboxes including
   connection splitters, because it sits wholly within the TCP Data. It
   also provides reliable ordered delivery for control options.
   Therefore, it should allow new TCP options to be introduced i) with
   minimal middlebox traversal problems; ii) with incremental deployment
   from legacy servers; iii) without an extra round of handshaking
   delay; iv) without having to provide its own loss recovery and
   ordering mechanism; and v) without arbitrary limits on available
   space.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 30, 2015.

Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction
     1.1. Motivation for Adoption Now (to be removed before
          publication)
     1.2. Scope
     1.3. Experiment Goals
     1.4. Document Roadmap
     1.5. Terminology
   2. Protocol Specification
     2.1. Protocol Interaction Model
       2.1.1. Dual 3-Way Handshake
       2.1.2. Dual Handshake Retransmission Behaviour
       2.1.3. Continuing the Upgraded Connection
     2.2. Upgraded Segment Structure and Format
       2.2.1. Structure of an Upgraded Segment
       2.2.2. Format of the InSpace Option
     2.3. Inner TCP Option Processing
       2.3.1. Writing Inner TCP Options
         2.3.1.1. Constraints on TCP Fast Open
         2.3.1.2. Option Alignment
         2.3.1.3. Sequence Space Coverage
         2.3.1.4. Presence or Absence of Payload
       2.3.2. Reading Inner TCP Options
         2.3.2.1. Reading Inner TCP Options (SYN=1)
         2.3.2.2. Reading Inner TCP Options (SYN=0)
       2.3.3. Forwarding Inner TCP Options
     2.4. Exceptions
     2.5. SYN Flood Protection
   3. Design Rationale
     3.1. Dual Handshake and Migration to Single Handshake
     3.2. In-Band Inner Option Space
       3.2.1. Non-Deterministic Magic Number Approach
       3.2.2. Non-Goal: Security Middlebox Evasion
       3.2.3. Avoiding the Start of the First Two Segments
       3.2.4. Control Options Within Data Sequence Space
       3.2.5. Rationale for the Sent Payload Size Field
     3.3. Rationale for the InSpace Option Format
     3.4. Protocol Overhead
   4. Interaction with Pre-Existing TCP Implementations
     4.1. Compatibility with Pre-Existing TCP Variants
     4.2. Interaction with Middleboxes
     4.3. Interaction with the Pre-Existing TCP API
   5. IANA Considerations
   6. Security Considerations
   7. Acknowledgements
   8. References
     8.1. Normative References
     8.2. Informative References
   Appendix A. Protocol Extension Specifications
     A.1. Disabling InSpace and Generic Connection Mode Switching
     A.2. Dual Handshake: The Explicit Variant
       A.2.1. SYN-O Structure
       A.2.2. Retransmission Behaviour - Explicit Variant
       A.2.3. Corner Cases
       A.2.4. Workround if Data in SYN is Blocked
     A.3. Jumbo InSpace TCP Option (only if SYN=0)
     A.4. Upgraded Segment Structure to Traverse DPI boxes
   Appendix B. Comparison of Alternatives
     B.1. Implicit vs Explicit Dual Handshake
   Appendix C. Protocol Design Issues (to be Deleted before
               Publication)
   Appendix D. Change Log (to be Deleted before Publication)
   Author's Address

1. Introduction

   TCP has become hard to extend, partly because the option space was
   limited to 40B when TCP was first defined [RFC0793] and partly
   because many middleboxes only forward TCP headers that conform to the
   stereotype they expect.

   This specification ensures new TCP capabilities can traverse most
   middleboxes by tunnelling TCP options within the TCP Data as 'Inner
   Options' (Figure 1). Then the TCP receiver can reconstruct the Inner
   Options sent by the sender, even if the middlebox resegments the data
   stream and even if it strips 'Outer' options from the TCP header that
   it does not recognise. The two words 'Inner Space' are appropriate
   as a name for the scheme; 'Inner' because it encapsulates options
   within the TCP Data and 'Space' because the space within the TCP Data
   is virtually unlimited--constrained only by the maximum segment size.

   ,-----.                TCP Payload                ,-----.
   | App |<----------------------------------------->| App |
   |-----|                                           |-----|
   |     |       Inner Options within TCP Data       |     |
   |     |<----------------------------------------->|     |
   |     |                                           |     |
   | TCP | TCP Header and             TCP header and | TCP |
   |     | Outer Options  ,---------.  Outer Options |     |
   |     |<-------------->|Middlebox|<-------------->|     |
   |-----|                |---------|                |-----|
   | IP  |                |   IP    |                | IP  |
   :     :                :         :                :     :

                  Figure 1: Encapsulation Approach

   TCP options fall into three main categories:

   a. Those that have to remain as Outer Options--typically those
      concerned with transmission of each TCP segment, e.g. Timestamps
      and Selective ACKnowledgements (SACK);

   b. Those that are best as Inner Options--typically those concerned
      with transmission of the data as a stream, e.g. the TCP
      Authentication Option [RFC5925] or tcpcrypt [I-D.bittau-tcpinc];

   c. Those that can be either Inner or Outer Options--typically those
      used at the start of a connection which is also inherently the
      start of the first segment so segmentation is not a concern.

   Pressure of space is most acute in the initial segments of each
   half-connection, i.e. the SYN and SYN/ACK, and particularly the SYN.
   Even though Inner Space is not suitable for category (a) options,
   moving all of categories (b) and (c) into Inner Space frees up plenty
   of outer space in the header for category (a).

   The following list of options that might be required on a SYN
   illustrates how acute the problem is:

   o 4B: Maximum Segment Size (MSS) [RFC0793];

   o 2B: SACK-ok [RFC2018];

   o 3B: Window Scale [RFC7323];

   o 10B: Timestamp [RFC7323];

   o 12B: Multipath TCP [RFC6824];

   o 6-18B: TCP Fast Open on a resumed connection
     [I-D.ietf-tcpm-fastopen];

   o 16B: TCP-AO [RFC5925].

   Taken together, these options would need between 53B and 65B before
   any padding, against the 40B available.

   There is probably potential for compressing together multiple options
   in order to mitigate the option space problem. However, the option
   space problem has to be faced, because complex special placement is
   already being contemplated for options that can be larger than 40B on
   their own (e.g. the key agreement options of tcpcrypt
   [I-D.bittau-tcpinc]).

   Given that the Inner Space protocol places control options within TCP
   Data, it is critical that a legacy TCP receiver is never confused
   into passing this mix to an application as if it were pure data.
   Naively, both ends could handshake to check they understand the
   protocol, but this would introduce a round of delay and it would not
   solve the shortage of space in a SYN. Instead, the client uses dual
   handshakes; one suitable for an upgraded server, and the other for an
   ordinary server. Then, if the client discovers that the server does
   not understand the new protocol, it can abort the upgraded handshake
   before the server passes corrupt data to the application. Otherwise,
   if the server does understand the new protocol, the client can abort
   the ordinary handshake. Either way, it has added zero extra delay.
   Interworking of the dual handshake with TCP Fast Open
   [I-D.ietf-tcpm-fastopen] is carefully defined so that either server
   can pass data to the application as soon as the initial SYN arrives.

   When control options are placed within the TCP Data they inherently
   get delivered reliably and in order. Although this was not
   originally recognised as part of the design brief, it offers the
   significant benefit of simplifying the design of new TCP options.
   Reliable ordered delivery no longer has to be individually crafted
   into the design of each new TCP option.

   Solving the five problems of i) option-space exhaustion; ii)
   middlebox traversal; iii) legacy server confusion; iv) reliable
   ordered control message delivery; and v) handshake latency does not
   come without cost:

   o So that the Inner Space protocol is immune to option stripping, it
     flags its presence using a magic number within the TCP Data of the
     initial segment in each direction, not a conventional TCP option
     in the header. This introduces a risk that payload in an ordinary
     SYN or SYN/ACK might be mistaken for the Inner Space protocol (an
     initial worst-case estimate of the probability is one connection
     globally every 40 years). Nonetheless, the risk is zero in the
     (currently common) case of an ordinary connection without payload
     during the handshake. There is also no risk of a mistake the
     other way round--an upgraded connection cannot be mistaken for an
     ordinary connection.

   o Although the dual handshake introduces no extra latency, it
     introduces extra connection processing & state, extra traffic and
     extra header processing. Initial estimates put the percentage
     overhead in single digits for connection processing and state, and
     traffic overhead at only a few hundredths of a percent.
     Nonetheless, once the most popular TCP servers have upgraded, only
     a single handshake will be necessary most of the time and overhead
     should drop to vanishingly small proportions.

   Finally, it should be noted that the ambition of this work is more
   than just an incrementally deployable, low latency way to extend TCP
   option space. The aim is to move towards a more structured way for
   middleboxes to interact transparently with, rather than arbitrarily
   interfere with, end-system TCP stacks. This has been achieved for
   connection and stream control options, but it will still be hard to
   introduce new per-segment control options, which will still have to
   be located within the traditional Outer TCP Options.

1.1. Motivation for Adoption Now (to be removed before publication)

   It seems inevitable that ultimately more option space will be needed,
   particularly given that many of the TCP options introduced recently
   consume large numbers of bits in order to provide sufficient
   information entropy, which is not amenable to compression.

   Extension of TCP option space requires support from both ends. This
   means it will take many years before the facility is functional for
   most pairs of end-points. Therefore, given the problem is already
   becoming pressing, a solution needs to start being deployed now.

1.2. Scope

   This experimental specification extends the TCP wire protocol. It is
   independent of the dynamic rate control behaviour of TCP and it is
   independent of (and thus compatible with) any protocol that
   encapsulates TCP, including IPv4 and IPv6.

1.3. Experiment Goals

   TCP is critical to the robust functioning of the Internet, therefore
   any proposed modifications to TCP need to be thoroughly tested.

   Success criteria: The experimental protocol will be considered
      successful if it satisfies the following requirements in the
      consensus opinion of the IETF tcpm working group. The protocol
      needs to be sufficiently well specified so that more than one
      implementation can be built in order to test its function,
      robustness, overhead and interoperability (with itself, with
      previous versions of TCP, and with various commonly deployed
      middleboxes). Non-functional issues such as recommendations on
      message timing also need to be tested. Various optional
      extensions to the protocol are proposed in Appendix A so
      experiments are also needed to determine whether these extensions
      ought to remain optional, or perhaps be removed or become
      mandatory.

   Duration: To be credible, the experiment will need to last at least
      12 months from publication of the present specification. If
      successful, it would then be appropriate to progress to a
      standards track specification, complemented by a report on the
      experiments.

1.4. Document Roadmap

   The body of the document starts with a full specification of the
   Inner Space extension to TCP (Section 2). It is rather terse,
   answering 'What?' and 'How?' questions, but deferring 'Why?' to
   Section 3. The careful design choices made are not necessarily
   apparent from a superficial read of the specification, so the Design
   Rationale section is fairly extensive. The body of the document ends
   with Section 4 that checks possible interactions between the new
   scheme and pre-existing variants of TCP, including interaction with
   partial implementations of TCP in known middleboxes.

   Appendix A specifies optional extensions to the protocol that will
   need to be implemented experimentally to determine whether they are
   useful. Appendix B discusses the merits of the chosen design
   against alternative schemes.

1.5. Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119]. In this
   document, these words will appear with that interpretation only when
   in ALL CAPS. Lower case uses of these words are not to be
   interpreted as carrying RFC-2119 significance.

   TCP Header: As defined in [RFC0793]. Even though the present
      specification places TCP options beyond the Data Offset, the term
      'TCP Header' is still used to mean only those fields at the head
      of the segment, delimited by the TCP Data Offset.

   Inner TCP Options (or just Inner Options): TCP options placed in the
      space that the present specification makes available beyond the
      Data Offset.

   Outer TCP Options (or just Outer Options): The TCP options in the
      traditional location directly after the base TCP Header and before
      the TCP Data Offset.

   Prefix TCP Options: Inner Options to be processed before the Outer
      Options.

   Suffix TCP Options: Inner Options to be processed after the Outer
      Options.

   TCP options: Any TCP options, whether inner, outer or both. This
      specification makes this term on its own ambiguous so it should be
      qualified if it is intended to mean TCP options in a certain
      location.

   TCP Payload: Data to be passed to the layer above TCP. The present
      specification redefines the TCP Payload so that it does not
      include the Inner TCP Options, the Inner Space Option and any
      Magic Number, even though they are located beyond the Data Offset.

   TCP Data: The information in a TCP segment after the Data Offset,
      including the TCP Payload, Inner TCP Options, the Inner Space
      Option and the Magic Number defined in the present specification.

   client: The process taking the role of actively opening a TCP
      connection.

   server: The process taking the role of TCP listener.

   Upgraded Segment: A segment that will only be fully understood by a
      host complying with the present specification (even though it
      might appear valid to a pre-existing TCP receiver). Similarly,
      Upgraded SYN, Upgraded SYN/ACK etc.

   Ordinary Segment: A segment complying with pre-existing TCP
      specifications but not the present specification. Similarly,
      Ordinary SYN, Ordinary SYN/ACK etc.

   Upgraded Connection: A connection starting with an Upgraded SYN.

   Ordinary Connection: A connection starting with an Ordinary SYN.

   Upgraded Host: A host complying with the present document as well as
      with pre-existing TCP specifications. Similarly Upgraded TCP
      Client, Upgraded TCP Server, etc.

   Legacy Host: A host complying with pre-existing TCP specifications,
      but not with the present document. Similarly Legacy TCP Client,
      Legacy TCP Server, etc.

   Note that the term 'Ordinary' is used for segments and connections,
   but the term 'Legacy' is used for hosts. This is because, if the
   Inner Space protocol were widely used in future, a host that could
   not open an Upgraded Connection would be considered deficient and
   therefore 'Legacy', whereas an Ordinary Connection would not be
   considered deficient in the future; because it will always be
   legitimate to open an Ordinary Connection if extra option space is
   not needed.

2. Protocol Specification

2.1. Protocol Interaction Model

2.1.1. Dual 3-Way Handshake

   During initial deployment, an Upgraded TCP Client sends two
   alternative SYNs: an Ordinary SYN in case the server is legacy and a
   SYN-U in case the server is upgraded. The two SYNs MUST have the
   same network addresses and the same destination port, but different
   source ports. Once the client establishes which type of server has
   responded, it continues the connection appropriate to that server
   type and aborts the other without completing the 3-way handshake.

   The format of the SYN-U will be described later (Section 2.2.2). At
   this stage it is only necessary to know that the client can put
   either TCP options or payload (or both) in a SYN-U, in the space
   traditionally intended only for payload. So if the server's response
   shows that it does not recognise the Upgraded SYN-U, the client is
   responsible for aborting the Upgraded Connection. This ensures that
   a Legacy TCP Server will never erroneously confuse the application by
   passing it TCP options as if they were user-data.

   Section 3.1 explains various strategies the client can use to send
   the SYN-U first and defer or avoid sending the Ordinary SYN.
   However, such strategies are local optimizations that do not need to
   be standardized. The rules below cover the most aggressive case, in
   which the client sends the SYN-U then the Ordinary SYN back-to-back
   to avoid any extra delay. Nonetheless, the rules are just as
   applicable if the client defers or avoids sending the Ordinary SYN.

   Table 1 summarises the TCP 3-way handshake exchange for each of the
   two SYNs in the two right-hand columns, between an Upgraded TCP
   Client (the active opener) and either:

   1. a Legacy Server, in the top half of the table (steps 2-4), or

   2. an Upgraded Server, in the bottom half of the table (steps 2-4)

   Because the two SYNs come from different source ports, the server
   will treat them as separate connections, probably using separate
   threads (assuming a threaded server). A load balancer might forward
   each SYN to separate replicas of the same logical server. Each
   replica will deal with each incoming SYN independently - it does not
   need to co-ordinate with the other replica.

   +------+------------------+--------------------+--------------------+
   |      |                  | Ordinary           | Upgraded           |
   |      |                  | Connection         | Connection         |
   +------+------------------+--------------------+--------------------+
   | 1    | Upgraded Client  | >SYN               | >SYN-U             |
   |      |                  |                    |                    |
   | /\/\ | /\/\/\/\/\/\/\/\ | /\/\/\/\/\/\/\/\/\ | /\/\/\/\/\/\/\/\/\ |
   | 2    | Legacy Server    | <SYN/ACK           | <SYN/ACK           |
   |      |                  |                    |                    |
   | 3a   | Upgraded Client  | Waits for response |                    |
   |      |                  | to both SYNs       |                    |
   |      |                  |                    |                    |
   | 3b   |        "         | >ACK               | >RST               |
   |      |                  |                    |                    |
   | 4    |                  | Cont...            |                    |
   |      |                  |                    |                    |
   | /\/\ | /\/\/\/\/\/\/\/\ | /\/\/\/\/\/\/\/\/\ | /\/\/\/\/\/\/\/\/\ |
   | 2    | Upgraded Server  | <SYN/ACK           | <SYN/ACK-U         |
   |      |                  |                    |                    |
   | 3a   | Upgraded Client  | Waits for response |                    |
   |      |                  | to both SYNs       |                    |
   |      |                  |                    |                    |
   | 3b   |        "         | >RST               | >ACK               |
   |      |                  |                    |                    |
   | 4    |                  |                    | Cont...            |
   +------+------------------+--------------------+--------------------+

         Table 1: Dual 3-Way Handshake in Two Server Scenarios

   Each column of the table shows the required 3-way handshake exchange
   within each connection, using the following symbols:

      > means client to server;

      < means server to client;

      Cont... means the TCP connection continues.

   The connection that starts with an Ordinary SYN is called the
   'Ordinary Connection' and the one that starts with a SYN-U is called
   the 'Upgraded Connection'. An Upgraded Server MUST respond to a
   SYN-U with an Upgraded SYN/ACK (termed a SYN/ACK-U and defined in
   Section 2.2.2). Then the client recognises that it is talking to an
   Upgraded Server. The client's behaviour depends on which response it
   receives first, as follows:

   o If the client first receives a SYN/ACK response on the Ordinary
     Connection, it MUST wait for the response on the Upgraded
     Connection. It then proceeds as follows:

     * If the response on the Upgraded Connection is an Ordinary
       SYN/ACK, the client MUST reset (RST) the Upgraded Connection and
       it can continue with the Ordinary Connection.

     * If the response on the Upgraded Connection is an Upgraded
       SYN/ACK-U, the client MUST reset (RST) the Ordinary Connection
       and it can continue with the Upgraded Connection.

   o If the client first receives an Ordinary SYN/ACK response on the
     Upgraded Connection, it MUST reset (RST) the Upgraded Connection
     immediately. It can then wait for the response on the Ordinary
     Connection and, once it arrives, continue as normal.

   o If the client first receives an Upgraded SYN/ACK-U response on the
     Upgraded Connection, it MUST reset (RST) the Ordinary Connection
     immediately and continue with the Upgraded Connection.

2.1.2. Dual Handshake Retransmission Behaviour

   If the client receives a response to the SYN, but a short while after
   that {ToDo: duration TBA} the response to the SYN-U has not arrived,
   it SHOULD retransmit the SYN-U. If latency is more important than
   the extra TCP option space, in parallel to any retransmission, or
   instead of any retransmission, the client MAY give up on the Upgraded
   (SYN-U) Connection by sending a reset (RST) and completing the 3-way
   handshake of the Ordinary Connection.

   If the client receives no response at all to either the SYN or the
   SYN-U, it SHOULD solely retransmit one or the other, not both. If
   latency is more important than the extra TCP option space, it will
   retransmit the SYN. Otherwise it will retransmit the SYN-U. It MUST
   NOT retransmit both segments, because the lack of response could be
   due to severe congestion.

2.1.3. Continuing the Upgraded Connection

   Once an Upgraded Connection has been successfully negotiated in the
   SYN, SYN/ACK exchange, either host can allocate any amount of the TCP
   Data space in any subsequent segment for extra TCP options. In fact,
   the sender has to use the upgraded segment structure in every
   subsequent segment of the connection that contains non-zero TCP
   Payload. The sender can use the upgraded structure in a segment
   carrying no user-data (e.g. a pure ACK), but it does not have to.
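
   As an illustration only, the client rules of Section 2.1.1 can be
   sketched as a small decision function. This is a minimal sketch, not
   part of the specification: the names Response, Action and
   client_action are hypothetical, and the function models only which
   connection to reset, not timers or retransmission.

```python
from enum import Enum, auto


class Response(Enum):
    """Responses a client can see (hypothetical names for this sketch)."""
    SYN_ACK = auto()      # Ordinary SYN/ACK
    SYN_ACK_U = auto()    # Upgraded SYN/ACK-U


class Action(Enum):
    """What the client does next (hypothetical names for this sketch)."""
    WAIT_FOR_UPGRADED = auto()  # hold both connections, await SYN-U reply
    RST_UPGRADED = auto()       # reset Upgraded, continue Ordinary
    RST_ORDINARY = auto()       # reset Ordinary, continue Upgraded


def client_action(ordinary_resp, upgraded_resp):
    """Apply the client rules of Section 2.1.1.

    Each argument is the response seen so far on that connection,
    or None if no response has arrived yet.
    """
    if upgraded_resp is Response.SYN_ACK_U:
        # Server is upgraded: abort the Ordinary Connection.
        return Action.RST_ORDINARY
    if upgraded_resp is Response.SYN_ACK:
        # Server is legacy: abort the Upgraded Connection.
        return Action.RST_UPGRADED
    if ordinary_resp is Response.SYN_ACK:
        # Only the Ordinary reply is in: the client must wait.
        return Action.WAIT_FOR_UPGRADED
    return None  # nothing received on either connection yet
```

   Note that the outcome depends only on the response to the SYN-U; the
   response on the Ordinary Connection merely tells the client that it
   has to wait.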

   As well as extra option space, the facility offers other advantages,
   such as reliable ordered delivery of Inner TCP Options on empty
   segments and more robust middlebox traversal. If none of these
   features is needed, at any point the facility can be disabled for the
   rest of the connection, using the ModeSwitch TCP option in
   Appendix A.1. Interestingly, the ModeSwitch option itself can be
   very simple because it uses the reliable ordered delivery property of
   Inner Options, rather than having to cater for the possibility that a
   message to switch to disabled mode might be lost or reordered.

2.2. Upgraded Segment Structure and Format

2.2.1. Structure of an Upgraded Segment

   An Upgraded Segment is structured as shown in Figure 2. Up to the
   TCP Data Offset, the structure is identical to an Ordinary TCP
   Segment, with a base TCP Header (BaseHdr) and the usual facility to
   set the Data Offset (DO) to allow space for TCP options. These
   regular TCP options are renamed by this specification to Outer TCP
   Options or just Outer Options, and labelled as OuterOpts in the
   figure.

   The first segment in each direction (i.e. the SYN or the SYN/ACK) is
   identifiable as upgraded by the presence of the 4-octet Magic Number
   A (MagicA) at the start of the TCP Data. The probability that an
   Upgraded Server will mistake arbitrary data at the beginning of the
   payload of an Ordinary Segment for the Magic Number has to be allowed
   for, but it is vanishingly small (see Section 3.2.1). Once an
   Upgraded Connection has been negotiated during the SYN - SYN/ACK
   exchange, a magic number is not needed to identify Upgraded Segments,
   because both ends know that the protocol requires the sender to use
   the upgraded format on all subsequent segments with non-zero TCP
   Data.
   Aside from the magic number, the structure of the rest of an
   Upgraded Segment is effectively the same whether a) SYN=1 or b)
   SYN=0.

                                        |   SOO    |
   a) SYN=1                             ,--------->|
   |        DO         |   1   |  Len   |        InOO         |  SPS   |
   ,------------------>,------>,------->,-------------------->,------->|
   +--------+----------+-------+--------+----------+----------+--------+
   | BaseHdr| OuterOpts| MagicA| InSpace|PrefixOpts|SuffixOpts| Payload|
   +--------+----------+-------+--------+----------+----------+--------+
                       |                '----------.----------'        |
                       |                     Inner Options             |
                       `-----------------------.-----------------------'
                                           TCP Data

   b) SYN=0
   |        DO         |  Len   |         InOO          |  SPS   |
   ,------------------>,------->,---------------------->,------->|
   +--------+----------+--------+-----------------------+--------+
   | BaseHdr| OuterOpts| InSpace|     Inner Options     | Payload|
   +--------+----------+--------+-----------------------+--------+
                       `----------------.------------------------'
                                        TCP Data

   All offsets are specified in 4-octet (32-bit) words, except SPS,
   which is in octets.

      Figure 2: The Structure of an Upgraded Segment (not to scale)

   Unlike an Ordinary TCP Segment, the Payload of an Upgraded Segment
   does not start straight after the TCP Data Offset. Instead, Figure 2
   shows that space is provided for additional Inner TCP Options before
   the TCP Payload. The size of this space is termed the Inner Options
   Offset (InOO). The TCP receiver reads the InOO field from the Inner
   Option Space (InSpace) option defined in Section 2.2.2.

   The InSpace Option is located in a standardized location so that the
   receiver can find it:

   o On a segment with SYN=1, an Upgraded TCP Sender MUST locate the
     InSpace Option straight after the magic number, specifically 4 *
     (DO + 1) octets from the start of the segment.

   o  On a segment with SYN=0, an Upgraded TCP Sender MUST locate the
      InSpace Option at the beginning of the TCP Data, specifically 4 *
      DO octets from the start of the segment.

   Because the InSpace Option is only ever located in a standardized
   location it does not need to follow the RFC 793 format of a TCP
   option. Therefore, although we call InSpace an 'option', we do not
   describe it as a 'TCP option'.

   The Sent Payload Size (SPS) is also read from within the InSpace
   Option. If the byte-stream has been resegmented, it allows the
   receiver to step from one InSpace Option to the next even if the
   InSpace Options are no longer at the start of each segment (see
   Section 2.3).

   On a segment with SYN=1 (i.e. a SYN or SYN/ACK) the Suffix Options
   Offset (SOO) is also read from within the InSpace Option. It
   delineates the end of the Prefix TCP Options (PrefixOpts in the
   figure) and the start of the Suffix TCP Options (SuffixOpts). When
   SYN=1, the receiver processes PrefixOpts before OuterOpts, then
   SuffixOpts afterwards. When SYN=0, the receiver processes the Outer
   Options before the Inner Options. Full details of option processing
   are given in Section 2.3.

2.2.2. Format of the InSpace Option

   The internal structure of the InSpace Option for an Upgraded SYN or
   SYN/ACK segment (SYN=1) is defined in Figure 3a) and for a segment
   with SYN=0 in Figure 3b).

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   a) SYN = 1
   +-------------------------------+---------------------------+---+
   |    Sent Payload Size (SPS)    |Inner Options Offset (InOO)|Len|
   +-------------------------------+---------------------------+---+
   |        Magic Number B         |Suffix Options Offset (SOO)|CU |
   +-------------------------------+---------------------------+---+

   b) SYN = 0
   +-------------------------------+---------------------------+---+
   |    Sent Payload Size (SPS)    |Inner Options Offset (InOO)|Len|
   +-------------------------------+---------------------------+---+

                     Figure 3: InSpace Option Format

   The fields are defined as follows (see Section 3.3 for the rationale
   behind these format choices):

   Option Length (Len): The 2-bit Len field specifies the length of the
      InSpace Option in 4-octet words. For this experimental
      specification:

      When SYN=1: the sender MUST use Len=2;

      When SYN=0: the sender MUST use Len=1.

   Sent Payload Size (SPS): In this 16-bit field the sender MUST record
      the size in octets of the TCP Payload when it was sent. This
      specification defines the TCP Payload as solely the user-data to
      be passed to the application. This excludes Inner TCP options,
      the InSpace Option and any magic number.

   Inner Options Offset (InOO): This 14-bit field defines the total
      size of the Inner TCP Options in 4-octet words.

   The following fields are only defined on a segment with SYN=1 (i.e. a
   SYN or SYN/ACK):

   Magic Number B: The sender MUST fill this 16-bit field with Magic
      Number B {ToDo: Value TBA} to further reduce the chance that a
      receiver will mistake the end of an arbitrary Ordinary Payload for
      the InSpace Option.

   Suffix Options Offset (SOO): The 14-bit SOO field defines an
      additional offset in 4-octet words from the start of the Inner
      Options that identifies the extent of the Prefix Options (see
      Section 2.3.2).

   Currently Unused (CU): The sender MUST fill the CU field with zeros.
      Other nodes MUST ignore these bits and forward them unchanged,
      even if they are non-zero.

2.3. Inner TCP Option Processing

2.3.1. Writing Inner TCP Options

2.3.1.1. Constraints on TCP Fast Open

   If an Upgraded TCP Client uses a TCP Fast Open (TFO) cookie
   [I-D.ietf-tcpm-fastopen] in an Upgraded SYN-U, it MUST place the TFO
   option within the Inner TCP Options, beyond the Data Offset.

   This rule is specific to TFO, but it can be generalised to any
   capability similar to TFO as follows: An Upgraded TCP Client MUST NOT
   place any TCP option in the Outer TCP Options of a SYN if it might
   cause a TCP server to pass user-data directly to the application
   before its own 3-way handshake completes.

   If a client uses TCP Fast Open cookies on both the parallel
   connection attempts of a dual handshake, an Upgraded Server will
   deliver the TCP Payload to the application twice before the client
   aborts the Ordinary Connection. This is not a problem, because
   [I-D.ietf-tcpm-fastopen] requires that TFO is only used for
   applications that are robust to duplicate requests.

2.3.1.2. Option Alignment

   If the end of the last Inner TCP Option does not align on a 4-octet
   boundary, the sender MUST append sufficient no-op TCP options. On a
   SYN=1 segment, the end of the Prefix TCP Options MUST be similarly
   aligned.

   If a block-mode transformation (e.g. compression or encryption) is
   being used, the sender might have to add some padding options to
   align the end of the Inner Options with the end of a block.
   Any future encryption specification will need to carefully define this
   padding in order not to weaken the cipher.

2.3.1.3. Sequence Space Coverage

   TCP's sequence number and acknowledgement number space MUST include
   all the TCP Data, i.e. the InSpace Option, any Inner Options, and any
   magic number as well as the TCP Payload. Similarly, the sender MUST
   NOT transmit any form of TCP Data unless the advertised receive
   window is sufficient. These rules have significant implications,
   which are discussed in Section 3.2.4.

2.3.1.4. Presence or Absence of Payload

   Whenever the sender includes non-zero user-data payload in a segment,
   it MUST also include an InSpace Option, whether or not there are any
   Inner Options.

   If the sender includes no user-data in a segment (e.g. pure ACKs,
   RSTs) it MAY include an InSpace Option but it does not have to.
   {ToDo: Consider whether there is any reason to preclude Inner Options
   on a RST, FIN or FIN-ACK.}

   Once a sender has included the InSpace Option and possibly other
   Inner Options on a segment with no TCP Payload, while it has no
   further user-data to send it SHOULD NOT repeat the same set of
   control options on subsequent segments. Thus, in a sequence of pure
   ACKs, any particular set of Inner Options will only appear once, and
   other pure ACKs will be empty. The only envisaged exception to this
   rule would be infrequent repetition (i.e. tens of minutes to hours)
   of the same control options, which might be necessary to provide a
   heartbeat or keep-alive capability.

2.3.2. Reading Inner TCP Options

   The rules for reading Inner TCP Options are divided between the
   following two subsections, depending on whether SYN=1 or SYN=0.

2.3.2.1. Reading Inner TCP Options (SYN=1)

   This subsection applies when TCP receives a segment with SYN=1, i.e.
   when the server receives a SYN or the client receives a SYN/ACK.

   Before processing any TCP options, unless the size of the TCP Data is
   less than 8 octets, an Upgraded Receiver MUST determine whether the
   segment is an Upgraded Segment by checking that all the following
   conditions apply:

   o  The first 4 octets of the TCP Data match Magic Number A;

   o  The value of the Length field of the InSpace Option is 2;

   o  The value of Magic Number B in the InSpace Option is correct;

   o  The value of the Sent Payload Size matches the size of the TCP
      Payload.

   If all these conditions pass, the receiver MAY walk the sequence of
   Inner TCP Options, using the length of each to check that the sum of
   their lengths equals InOO. The receiver then concludes that the
   received segment is an Upgraded Segment.

   The receiver then processes the TCP Options in the following order:

   1. Any Prefix TCP options (PrefixOpts in Figure 2);

   2. Any Outer TCP options (OuterOpts in Figure 2);

   3. Any Suffix TCP options (SuffixOpts in Figure 2).

   The receiver removes the magic number, the InSpace Option and each
   TCP Option from the TCP Data as it processes each. This frees up
   receive buffer, so the receiver increases its local value of the
   receive window accordingly. Once only the TCP Payload remains, the
   receiver holds it ready to pass to the application. It then returns
   the appropriate Upgraded Acknowledgement to progress the dual
   handshake (see Section 2.1.1).

   If any of the above tests to find the InSpace Option fails:

   1. the receiver concludes that the received segment is an Ordinary
      Segment. It MUST then proceed by processing any Outer TCP
      options in the TCP Header in the normal order (OuterOpts in
      Figure 2).

   2. If some previous control message causes the TCP receiver to alter
      the TCP Data (e.g. decompression, decryption), it reruns the
      above tests to check if the altered TCP Data now looks like an
      Upgraded Segment.

   3.
      If it finds an InSpace Option, it suspends processing the Outer
      TCP Options and instead processes and removes TCP Options in the
      following order:

      1. Any Prefix Inner Options;

      2. Any remaining Outer TCP Options;

      3. Any Suffix Inner Options.

   4. If it does not find an InSpace Option, it continues processing
      the remaining Outer TCP Options as normal.

   For the avoidance of doubt the above rules imply that, as long as an
   InSpace Option has not been found in the segment, the receiver might
   rerun the tests for it multiple times if multiple Outer TCP Options
   alter the TCP Data. However, once the receiver has found an InSpace
   Option, it MUST NOT rerun the tests for an Upgraded Segment in the
   same segment.

   If the receiver has not found an InSpace Option after processing all
   the Outer Options, it returns the appropriate Ordinary
   Acknowledgement to progress the dual handshake (see Section 2.1.1).
   As normal, it holds any TCP Payload ready to pass to the application.

2.3.2.2. Reading Inner TCP Options (SYN=0)

   This subsection applies once the TCP connection has successfully
   negotiated to use the upgraded InSpace structure.

   As each segment with SYN=0 arrives, the receiver immediately
   processes any Outer TCP options.

   As the receiver buffers TCP Data, it uses TCP's regular mechanisms to
   fill any gaps due to reordering or loss so that it can work its way
   along the ordered byte-stream. As the receiver encounters each set
   of Inner Options, it MUST process them in the order they were sent,
   as illustrated in Figure 4a) in Section 3.2.4. The receiver MUST
   remove the InSpace Option and Inner TCP Options from the TCP Data as
   it processes them, adding to the receive window accordingly. Once
   only the TCP Payload remains the receiver passes it to the
   application.
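
   As an illustration only, the way a receiver might work along the
   ordered byte-stream on SYN=0 data can be sketched in Python as
   follows. The function name walk_inner_space is hypothetical, and the
   sketch assumes that the fields of Figure 3b) are read in network byte
   order. It also assumes no transformation (compression, encryption)
   is in use, it handles only the regular 4-octet InSpace Option, and it
   ignores the special treatment of very large Sent Payload Size values
   in Section 2.4.

```python
import struct


def walk_inner_space(stream: bytes):
    """Yield (inner_options, payload) pairs from an in-order SYN=0 byte-stream.

    Stops early if the stream ends part-way through a sent segment.
    Assumes the Figure 3b) layout in network byte order (an assumption).
    """
    pos = 0
    while pos + 4 <= len(stream):
        (word,) = struct.unpack_from("!I", stream, pos)
        sps = word >> 16              # Sent Payload Size, in octets
        inoo_octets = word & 0xFFFC   # InOO in octets (Len bits masked out)
        len_words = word & 0x3        # InSpace Option length, in 4-octet words
        if len_words != 1:
            # A real receiver would follow Section 2.4; the sketch just stops.
            raise ValueError("unrecognised InSpace Length at offset %d" % pos)
        opts_start = pos + 4 * len_words
        payload_start = opts_start + inoo_octets
        next_inspace = payload_start + sps
        if next_inspace > len(stream):
            break                     # wait for the rest of the data to arrive
        yield stream[opts_start:payload_start], stream[payload_start:next_inspace]
        pos = next_inspace
```

   Because the walk is driven by InOO and Sent Payload Size rather than
   by segment boundaries, it gives the same result however the
   byte-stream was resegmented on the way.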

   It uses each InSpace Option to calculate the extent of the associated
   Inner Options (using InOO), and the amount of payload data before the
   next InSpace Option (using Sent Payload Size). The receiver MUST NOT
   locate InSpace Options by assuming there is one at the start of the
   TCP Data in every segment, because resegmentation might invalidate
   this assumption.

   Therefore, the receiver processes the Inner Options in the order they
   were sent, which is not necessarily the order in which they are
   received. And if an Inner Option applies to the data stream, the
   receiver applies it at the point in the data stream where the sender
   inserted it. As a consequence, the receiver always processes the
   Inner Options after the Outer Options.

   The Inner Options are deliberately placed within the byte-stream so
   that the sender can transform them along with the payload data, e.g.
   to compress or encrypt them. A previous control message might have
   required the TCP receiver to alter the byte-stream before passing it
   to the application, e.g. decompression or decryption. If so, the
   TCP receiver applies transformations progressively, to one sent
   segment at a time, in the following order:

   1. The receiver MUST apply any transformations to the byte-stream up
      to the end of the next set of Inner Options, i.e. over the extent
      of the next Sent Payload Size, InSpace Option and any Inner
      Options.

   2. The receiver MUST then process and remove the InSpace Option and
      any Inner Options (which might change the way it transforms the
      next segment, e.g. a rekey option).

   3. Having established the extent of the next sent segment, the
      receiver returns to step 1.

2.3.3. Forwarding Inner TCP Options

   Middleboxes exist that process some aspects of the TCP Header.
   Although the present specification defines a new location for Inner
   TCP Options beyond the Data Offset, this is intended for the
   exclusive use of the destination TCP implementation. Therefore:

   o  A middlebox MUST treat any octets beyond the Data Offset as
      immutable user-data. Legacy Middleboxes already do not expect to
      find options beyond the Data Offset anyway.

   o  A middlebox MUST NOT defer data in a segment with SYN=1 to a
      subsequent segment.

   A TCP implementation is not necessarily aware whether it is deployed
   in a middlebox or in a destination, e.g. a split TCP connection might
   use a regular off-the-shelf TCP implementation. Therefore, a
   general-purpose TCP that implements the present specification will
   need a configuration switch to disable any search for options beyond
   the Data Offset and to enable immediate forwarding of data in a SYN.

2.4. Exceptions

   {ToDo: Define behaviour of forwarding or receiving nodes if the
   structure or format of an Upgraded Segment is not as specified.}

   If an Upgraded TCP Receiver receives an InSpace Option with a Length
   it does not recognise as valid, it MUST drop the packet and
   acknowledge the octets up to the start of the unrecognised option.

   Values of Sent Payload Size greater than 2^16 - 25 (=65,511) octets
   in a regular (non-jumbo) InSpace Option MUST be treated as the
   distance to the next InSpace option, but they MUST NOT be taken as
   indicative of the size of the TCP Payload when it was sent. This is
   because the TCP Payload in a regular IPv6 packet cannot be greater
   than (2^16 - 1 - 20 - 4) octets (given the minimum TCP header is 20
   octets and the minimum InSpace Option is 4 octets). A Sent Payload
   Size of 0xFFFF octets MAY be used to minimise the occurrence of empty
   InSpace options without permanently disabling the Inner Space
   protocol for the rest of the connection.
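
   To make the field widths of Figure 3 and the 65,511-octet limit
   above concrete, the following Python fragment is a minimal sketch,
   not a reference implementation. The names pack_inspace,
   parse_inspace and MAX_REGULAR_SPS are hypothetical, network byte
   order is assumed, and because the value of Magic Number B is still
   to be assigned, the caller supplies a placeholder for it.

```python
import struct

# Largest TCP Payload in a regular (non-jumbo) packet, per Section 2.4.
MAX_REGULAR_SPS = 2**16 - 1 - 20 - 4   # 65,511 octets


def pack_inspace(sps, inoo_words, syn=False, magic_b=0, soo_words=0):
    """Build a 4-octet (SYN=0) or 8-octet (SYN=1) InSpace Option."""
    if not (0 <= sps <= 0xFFFF and 0 <= magic_b <= 0xFFFF
            and 0 <= inoo_words < 2**14 and 0 <= soo_words < 2**14):
        raise ValueError("field out of range")
    len_words = 2 if syn else 1
    option = struct.pack("!I", (sps << 16) | (inoo_words << 2) | len_words)
    if syn:
        # The 2-bit CU field is sent as zeros.
        option += struct.pack("!I", (magic_b << 16) | (soo_words << 2))
    return option


def parse_inspace(data, syn=False):
    """Split an InSpace Option into its fields (offsets returned in octets)."""
    (word1,) = struct.unpack_from("!I", data, 0)
    fields = {
        "sps": word1 >> 16,
        "inoo_octets": word1 & 0xFFFC,   # masking out Len leaves 4 * InOO
        "len_words": word1 & 0x3,
    }
    # Above the limit, SPS is only the distance to the next InSpace Option.
    fields["sps_is_payload_size"] = fields["sps"] <= MAX_REGULAR_SPS
    if syn:
        (word2,) = struct.unpack_from("!I", data, 4)
        fields["magic_b"] = word2 >> 16
        fields["soo_octets"] = word2 & 0xFFFC   # masking out CU leaves 4 * SOO
    return fields
```

   For example, parse_inspace(pack_inspace(1000, 3)) reports a Sent
   Payload Size of 1000 octets and 12 octets of Inner Options, whereas a
   Sent Payload Size of 0xFFFF is flagged as a distance rather than a
   payload size.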

   If the size of the payload is greater than 65,511 octets, the sender
   MUST use a Jumbo InSpace Option (Appendix A.3).

2.5. SYN Flood Protection

   An implementation of the Inner Space protocol MUST support the
   EchoCookie TCP option [I-D.briscoe-tcpm-echo-cookie]. To indicate
   its support for EchoCookie, an Ordinary Client would send an empty
   EchoCookie TCP option on the SYN. Support for the Inner Space
   protocol makes this redundant. Therefore an Inner Space client MUST
   NOT send an empty EchoCookie TCP option on a SYN-U.

   The EchoCookie TCP option replaces the SYN Cookie mechanism
   [RFC4987], which only has sufficient space to hold the result of one
   TCP option negotiation (the MSS), and then only a subset of the
   possible values (see the discussion under Security Considerations
   Section 6).

3. Design Rationale

   This section is informative, not normative.

3.1. Dual Handshake and Migration to Single Handshake

   In traditional [RFC0793] TCP, the space for options is limited to 40B
   by the maximum possible Data Offset. Before a TCP sender places
   options beyond that, it has to be sure that the receiver will
   understand the upgraded protocol, otherwise it will confuse and
   potentially crash the application by passing it TCP options as if
   they were payload data.

   The Dual Handshake (Section 2.1.1) ensures that a Legacy TCP Server
   will never pass on TCP options as if they were user-data. If a SYN
   carries TCP Data, a TCP server typically holds it back from the
   application until the 3-way handshake completes. This gives the
   client the opportunity to abort the Upgraded Connection if the
   response from the server shows it does not recognise an Upgraded SYN.

   The strategy of sending two SYNs in parallel is not essential to the
   Alternative SYN approach.
   It is merely an initial strategy that
   minimises latency when the client does not know whether the server
   has been upgraded. Evolution to a single SYN with greater option
   space could proceed as follows:

   o  Clients could maintain a white-list of upgraded servers discovered
      by experience and send just the Upgraded SYN-U in these cases.

   o  Then, for white-listed servers, the client could send an Ordinary
      SYN only in the rare cases when an attempt to use an Upgraded
      Connection had previously failed (perhaps a mobile client
      encountering a new blockage on a new path to a server that it had
      previously accessed over a good path).

   o  In the longer term, once it can be assumed that most servers are
      upgraded and the risk of having to fall back to legacy has dropped
      to near-zero, clients could send just the Upgraded SYN first,
      without maintaining a white-list, but still be prepared to send an
      Ordinary SYN in the rare cases when that might fail.

   There is concern that, although dual handshake approaches might well
   eventually migrate to a single handshake, they do not scale when
   there are numerous choices to be made simultaneously. For instance:

   o  trying IPv6 then IPv4 [RFC6555];

   o  and trying SCTP and TCP in parallel
      [I-D.wing-tsvwg-happy-eyeballs-sctp];

   o  and trying ECN and non-ECN in parallel;

   o  and so on.

   Nonetheless, it is not necessary to try every possible combination of
   N choices, which would otherwise require 2^N handshakes (assuming
   each choice is between two options). Instead, a selection of the
   choices could be attempted together. At the extreme, two handshakes
   could be attempted, one with all the new features, and one without
   all the new features.

3.2. In-Band Inner Option Space

3.2.1. Non-Deterministic Magic Number Approach

   This section justifies the magic number approach by contrasting it
   with a more 'conventional' approach.
   A conventional approach would
   use a regular (Outer) TCP option to point to the dividing line within
   the TCP Data between the extra Inner Options and the TCP Payload.

   This 'conventional' approach cannot provide extra option space over a
   path on which a middlebox strips TCP options that it does not
   recognise. [Honda11] quantifies the prevalence of such paths. It
   reports on experiments conducted in 2010-2011 that found unknown
   options were stripped from the SYN-SYN/ACK exchange on 14% of paths
   to port 80 (HTTP), 6% of paths to port 443 (HTTPS) and 4% of paths to
   port 34343 (unassigned). Further analysis found that the
   option-stripping middleboxes fell into two main categories:

   o  about a quarter appeared to actively remove options that they did
      not recognise (perhaps assuming they might be indicative of an
      attack?);

   o  the rest were some type of higher layer proxy that split the TCP
      connection, unwittingly failing to pass unknown options between
      the two connections.

   In contrast, the magic number approach ensures that not only are the
   Inner Options tucked away beyond the Data Offset, but the option that
   gives the extent of the Inner Options is also beyond the Data Offset
   (see Section 2.2.1). This ensures that all the TCP Headers and
   options up to the Data Offset are completely indistinguishable from
   an Ordinary Segment. It is very unusual for a middlebox not to
   forward TCP Data unchanged, so it will be highly likely (but not
   certain--see Appendix A.2.4) to forward the extra Inner Options.

   The downside of the magic number approach is that it is slightly
   non-deterministic, quantified as follows:

   o  The probability that an Upgraded SYN=1 segment will be mistaken
      for an Ordinary Segment is precisely zero.

   o  In the currently common case of a SYN with zero payload, the
      probability that it will be mistaken for an Upgraded Segment is
      also precisely zero.

   o  However, there will be a very small probability (roughly 2^{-66}
      or 1 in 74 billion billion (74 * 10^18)) that payload data in an
      Ordinary SYN=1 segment could be mistaken for an Upgraded SYN or
      SYN/ACK, if it happens to contain a pattern in exactly the right
      place that matches the correct Sent Payload Size, Length and Magic
      Numbers of an InSpace Option. {ToDo: Estimate how often a
      collision will occur globally. Rough estimate: 1 connection
      collision globally every 40 years.}

   The above probability is based on the assumptions that:

   o  the magic numbers will be chosen randomly (in reality they will
      not--for instance, a magic number that looked just like the start
      of an HTTP connection would be rejected)

   o  data at the start of Ordinary SYN=1 segments is random (in reality
      it is not--the first few bytes of most payloads are very
      predictable).

   Therefore even though 2^{-66} is a vanishingly small probability, the
   actual probability of a collision will be much lower.

   If a collision does occur, it will result in TCP removing a number of
   32-bit words of data from the start of a byte-stream before passing
   it to the application.

3.2.2. Non-Goal: Security Middlebox Evasion

   The purpose of locating control options within the TCP Data is not to
   evade security. Security middleboxes can be expected to evolve to
   examine control options in the new inner location. Instead, the
   purpose is to traverse middleboxes that block new TCP options
   unintentionally--as a side effect of their main purpose--merely
   because their designers were too careless to consider that TCP might
   evolve. This category of middleboxes tends to forward the TCP
   Payload unaltered.

   By sitting within the TCP Data, the Inner Space protocol should
   traverse enough existing middleboxes to reach critical mass and prove
   itself useful. In turn, this will open an opportunity to introduce
   integrity protection for the TCP Data (which includes Inner Options).
   Today, by contrast, no operating system would introduce integrity
   protection of Outer TCP options, because in too many cases it would
   fail and abort the connection. Once the integrity of Inner Options
   is protected, it will raise the stakes. Any attempt to meddle with
   control options within the TCP Data will not just close off the
   theoretical potential benefit of a protocol advance that no-one knows
   they want yet; it will fail integrity checks and therefore completely
   break any communication. It is unlikely that a network operator will
   buy a middlebox that does that.

   Then middlebox designers will be on the back foot. To completely
   block communications they will need a sound justification. If they
   block an attack, that will be fine. But if they want to block
   everything abnormal, they will have to block the whole communication,
   or nothing. So the operator will want to choose middlebox vendors
   who take much more care to ensure their policies track the latest
   protocol advances--to avoid costly support calls.

3.2.3. Avoiding the Start of the First Two Segments

   Some middleboxes discard a segment sent to a well-known port
   (particularly port 80) if the TCP Data does not conform to the
   expected app-layer protocol (particularly HTTP). Often such
   middleboxes only parse the start of the app-layer header (e.g. Web
   filters only continue until they find the URL being accessed, or DPI
   boxes only continue until they have identified the application-layer
   protocol).

   The segment structure defined in Section 2.2.1 would not traverse
   such middleboxes.
   An alternative segment structure that avoids the
   start of the first two segments in each direction is defined in
   Appendix A.4. It is not mandatory to implement in the present
   specification. However, it is hoped that it will be included in some
   experimental implementations so that it can be decided whether it is
   worth making mandatory.

3.2.4. Control Options Within Data Sequence Space

   Including Inner Options within TCP's sequence space gives the sender
   a simple way to ensure that control options will be delivered
   reliably and in order to the remote TCP, even if the control options
   are on segments without user-data. By using TCP's existing stream
   delivery mechanisms, it adds no extra protocol processing, no extra
   packets and no extra bits.

   The sender can even choose to place control options on a segment
   without user-data, e.g. to reliably re-key TCP-level encryption on a
   connection currently sending no data in one direction. The sender
   can even add an InSpace Option without further Inner Options. Then
   it can ensure that the segment will automatically be delivered
   reliably and in order to the remote TCP, even though it carries no
   user-data or other TCP control options, e.g. for a test probe, a
   tail-loss probe or a keep-alive.

   Figure 4a) illustrates control options arriving reliably and in order
   at the receiving TCP stack in comparison with the traditional
   approach shown in Figure 4b), in which control options are outside
   the sequence space. In the traditional approach, during a period
   when the remote TCP is sending no user-data, the local TCP may
   receive control options E, B and D without ever knowing that they are
   out of order, and without ever knowing that C is missing.

   a)                __ ____ _______ _                    __
                    |__|____|_______|_|                  |__|  control
                    :E : D  :   C   :B:                  :A :
    ________________:  :    :       : :__________________:  :
   |________________|                 |__________________|     data

   b)              __
                  |__| E
                  |_|__ B               __
                  |____|D              |__|A                   control
                   \  /                \  /
    ________________\/__________________\/
   |________________||__________________|                      data
                     !
                     !drop
                 ____!__
                |_______|C

   Figure 4: Control options a) inside vs. b) outside TCP sequence space

   By including Inner Options within the sequence space, each control
   option is automatically bound to the start of a particular byte in
   the data stream, which makes it easy to switch behaviour at a
   specific point mid-stream (e.g. re-keying or switching to a different
   control mode). With traditional TCP options, a bespoke reliable and
   ordered binding to the data stream would have to be developed for
   each TCP option that needs this capability (e.g. co-ordinating use
   of new keys in TCP-AO [RFC5925] or tcpcrypt [I-D.bittau-tcpinc]).

   Including Inner Options in sequence also allows the receiver to tell
   the sender the exact point at which it encountered an unrecognised
   TCP option using only TCP's pre-existing byte-granularity
   acknowledgement scheme.

   Middleboxes exist that rewrite TCP sequence and acknowledgement
   numbers, and they also rewrite options that refer to sequence numbers
   (at least those known when the middlebox was produced, such as SACK,
   but not any introduced afterwards). If Inner Options were not
   included in sequence, the number of bytes beyond the TCP Data Offset
   in each segment would not match the sequence number increment between
   segments. Then, such middleboxes could unintentionally corrupt the
   user-data and options by 'normalising' sequence or acknowledgement
   numbering. Fortunately, including Inner Options in sequence improves
   robustness against such middleboxes.

3.2.5. Rationale for the Sent Payload Size Field

   A middlebox that splits a TCP connection can coalesce and/or divide
   the original segments. Segmentation offload hardware introduces
   similar resegmentation. Inclusion of the Sent Payload Size field in
   the InSpace Option makes the scheme robust against such
   resegmentation.

   The Sent Payload Size is not strictly necessary on a SYN (SYN=1,
   ACK=0) because a SYN is never resegmented. However, for simplicity,
   the layout for a SYN is made the same as for a SYN/ACK. This
   future-proofs the protocol against the possibility that SYNs might be
   resegmented in future. And it makes it easy to introduce the
   alternative segment structure of Appendix A.4 if it is needed.

3.3. Rationale for the InSpace Option Format

   The format of the InSpace Option (Figure 3) does not necessarily have
   to comply with the RFC 793 format for TCP options, because it is not
   intended to ever appear in a sequence of TCP options. In particular,
   it does not need an Option Kind, because the option is always in a
   known location. In effect the magic number serves as a multi-octet
   Option Kind for the first InSpace Option, and the location of each
   subsequent option is always known as an offset from the previous
   one, using the InOO and Sent Payload Size fields.

   Other aspects of the layout are justified as follows:

   Length: Whatever the size of the InSpace Option, the right-hand edge
      of the Length field is always located 4 octets from the start of
      the option, so that the receiver can find it to determine the
      layout of the rest of the option. The option is always a multiple
      of 4 octets long, so that any subsequent Inner TCP Options comply
      with TCP's option alignment requirements.

   Sent Payload Size: This field is 16 bits wide, which is reasonable
      given segment size cannot exceed the limits set by the Total
      Length field in the IPv4 header and the Payload Length field in
      the IPv6 header, both of which are 16 bits wide.

      If the sender were to use a jumbogram [RFC2675], it could use the
      Jumbo InSpace Option defined in Appendix A.3, which offers a
      32-bit Sent Payload Size field. The Jumbo InSpace Option is not
      mandatory to implement for the present experimental specification.
      Even if it is implemented, it is only defined when SYN=0, given
      use of a jumbogram for a SYN or SYN/ACK would significantly exceed
      other limits that TCP sets for these segments.

   Inner Options Offset: The 14-bit field is in units of 4-octet
      words, so that Inner Options are not restricted to less than the
      size of a maximum sized segment (given 4 * 2^14 = 2^16 octets).

   When SYN=1 the layout of the InSpace Option is extended to include:

   Suffix Options Offset: The SOO field is the same 14-bit width as the
      InOO field, and for the same reason. Both the SOO and InOO fields
      are aligned 2 bits to the left of a word boundary so that they can
      be used directly in units of octets by masking out the 2-bit field
      to the right.

   Magic Number B: The 32-bit size of Magic Number A is not enough to
      sufficiently reduce the probability of mistaking the start of an
      Ordinary SYN Payload for the start of the Inner Space protocol. A
      64-bit magic number could have been provided by using the next
      4-octet word, but this would be unnecessarily large. Therefore,
      when SYN=1, 16 more bits of magic number are provided within the
      InSpace Option. Otherwise, these 16 bits would only have to be
      used for padding to align with the next 4-octet word boundary
      anyway.

3.4.
Protocol Overhead 1281 The overhead of the Inner Space protocol is quantified as follows: 1283 Dual Handshake: 1285 Latency: 1287 Upgraded Server : zero; 1289 Legacy Server: worst latency of the dual handshakes. 1291 Connection Rate: The typical connection rate will inflate by P*D, 1292 where: 1294 P [0-100%] is the proportion of connections that use extra 1295 option space; 1297 D [0-100%] is the proportion of these that use a dual 1298 handshake (the remainder use a single handshake, e.g. by 1299 caching knowledge of upgraded servers). 1301 For example, if P=80% and D=10%, the connection rate will 1302 inflate by 8%. P is difficult to predict. D is likely to be 1303 small, and in the longer term it should reduce to the 1304 proportion of connections to remaining legacy servers, which 1305 are likely to be the less frequently accessed ones. In the 1306 worst case if both P & D are 100%, the maximum that the 1307 connection rate can inflate by is 100% (i.e. to twice present 1308 levels). 1310 Connection State: Connection state on servers and middleboxes 1311 will inflate by P*D/R, where 1313 R is the average hold time of connection state measured in 1314 round trip times 1316 This is because a server or middlebox only holds dual 1317 connection state for one round trip, until the RST on one of 1318 the two connections. For example, keeping P & D as they were 1319 in the above example, if R = 3 round trips {ToDo: TBA}, 1320 connection state would inflate by 2.7%. In the longer term, any 1321 extra connection state would be focused on legacy servers, with 1322 none on upgraded servers. Therefore, if memory for dual 1323 handshake flow state was a problem, upgrading the server to 1324 support the Inner Space protocol would solve the problem. 
1326 Network Traffic: The network traffic overhead is 2*H*P*D/J 1327 counting in bytes or 2*P*D/K counting in packets, where 1329 H is 88B for IPv4 or 108B for IPv6 (assuming the Ordinary SYN 1330 and SYN/ACK have a TCP header packed to the maximum of 60B 1331 with TCP options, they have no TCP payload, their IP headers 1332 have no extensions and the InSpace Option in the SYN-U and 1333 SYN/ACK-U is 8B); 1335 J is the average number of bytes per TCP connection (in both 1336 directions) 1338 K is the average number of packets per TCP connection (in both 1339 directions); 1341 For example, keeping P & D as they were in the above 1342 example, if J = 50KiB for IPv4 and K = 70 packets (ToDo: TBA), 1343 traffic overhead would be 0.03% counting in bytes or 0.2% 1344 counting in packets. 1346 Processing: {ToDo: Implementation tests} 1348 InSpace Option on every non-empty SYN=0 segment: 1350 Network Traffic: The traffic overhead is P*Q*4/F, where 1352 Q is the proportion of Inner Space connections that leave the 1353 protocol enabled after the initial handshake; 1355 F is the average frame size in bytes (assuming one segment per 1356 frame). 1358 This is because the InSpace option adds 4B per segment. For 1359 example, keeping P as it was in the above example and taking 1360 Q=10% and F=750B, the traffic overhead is 0.04%. It is as 1361 difficult to predict Q as it is to predict P. 1363 Processing: {ToDo: Implementation tests} 1365 4. Interaction with Pre-Existing TCP Implementations 1367 4.1. Compatibility with Pre-Existing TCP Variants 1369 A TCP option MUST by default only be used as an Outer Option, unless 1370 it is explicitly specified that it can (or must) be used as an Inner 1371 Option. 
The following list of pre-existing TCP options can be 1372 located as Inner Options: 1374 o Maximum Segment Size (MSS) [RFC0793]; 1376 o SACK-ok [RFC2018]; 1378 o Window Scale [RFC7323]; 1380 o Multipath TCP [RFC6824], except the Data ACK part of the Data 1381 Sequence Signal (DSS) option; 1383 o TCP Fast Open [I-D.ietf-tcpm-fastopen]; 1385 o The tcpcrypt CRYPT option [I-D.bittau-tcpinc]. 1387 The following MUST NOT be located as Inner Options: 1389 o Timestamp [RFC7323]; 1391 o SACK [RFC2018]; 1393 o The Data ACK part of the DSS option of Multipath TCP [RFC6824]; 1395 o TCP-AO [RFC5925]; 1397 o The tcpcrypt MAC option [I-D.bittau-tcpinc] as long as it covers 1398 the TCP header. 1400 {ToDo: The above list is not authoritative. Many of the above 1401 schemes involve multiple different types of TCP option, and all the 1402 types need to be separately assessed.} 1404 The Inner Space protocol supports TCP Fast Open, by constraining the 1405 client to obey the rules in Section 2.3.1.1. 1407 All the sub-types of the MPTCP option [RFC6824] except one could be 1408 located as Inner Options. That is, MP_CAPABLE, MP_JOIN, ADD_ADDR(2), 1409 REMOVE_ADDR, MP_PRIO, MP_FAIL, MP_FASTCLOSE. The Data Sequence 1410 Signal (DSS) of MPTCP consists of four separable parts: i) the Data 1411 ACK; ii) the mapping between the Data Sequence Number and the Subflow 1412 Sequence Number over a Data-Level Length; iii) the Checksum; and iv) 1413 the DATA_FIN flag. If MPTCP were re-factored to take advantage of 1414 the Inner Space protocol, all these parts except the Data ACK could 1415 be located as Inner Options (the Checksum would not be necessary). 1417 The MPTCP Data ACK has to remain as an Outer Option otherwise there 1418 would be a risk of flow control deadlock, as pointed out in 1419 [Raiciu12]. 
For instance, a Web client might pipeline multiple 1420 requests that fill a Web server's receive buffer, while the Web 1421 server might be busy sending a large response to the first request 1422 before it reads the second request. If the Data ACK were an Inner 1423 Option, the Web client would have to stop acknowledging the first 1424 response from the server (due to lack of receive window). Then the 1425 server would not be able to move on to the next request--a classic 1426 deadlock. 1428 The TCP-AO has to be located as an Outer Option to prevent the 1429 possibility of flow-control deadlock (because it would consume 1430 receive window on pure ACKs). 1432 All sub-options of the tcpcrypt CRYPT option could be located as 1433 Inner Options. However, as long as the tcpcrypt MAC option covers 1434 the TCP header and Outer Options, it has to be located as an Outer 1435 Option for the same deadlock reason as TCP-AO. 1437 An Upgraded Server can support SYN Cookies [RFC4987] for Ordinary 1438 Connections. For Upgraded Connections Section 2.5 defines a new 1439 EchoCookie TCP option that is a prerequisite for InSpace 1440 implementations, and provides sufficient space for the more extensive 1441 connection state requirements of an InSpace server. 1443 {ToDo: TCP States and Transitions, Connectionless Resets, ICMP 1444 Handling, Forward-Compatibility.} 1446 4.2. 
Interaction with Middleboxes 1448 The interaction with the assumptions about TCP made by middleboxes is 1449 covered extensively elsewhere: 1451 o Section 2.3.3 specifies forwarding behaviour for Inner Options; 1453 o The following sections explain the Inner Space protocol approach 1454 to middlebox traversal: 1456 * Section 3.2.1 justifies the magic number approach; 1458 * Section 3.2.2 explains why the protocol will remain robust as 1459 middleboxes evolve; 1461 * Section 3.2.4 justifies including Inner Options in sequence; 1463 * Section 3.2.5 explains how the protocol will remain robust to 1464 resegmentation. 1466 4.3. Interaction with the Pre-Existing TCP API 1468 An aim of the Inner Space protocol is for legacy applications to 1469 continue to just work without modification. Therefore it is expected 1470 that the dual handshaking logic and any maintenance of a cached 1471 white-list of servers that support the Inner Space protocol will be 1472 implemented beneath the well-known socket interface. 1474 Inner Space implementations will need to comply with the following 1475 behaviours to ensure that legacy applications continue to receive 1476 predictable behaviour from the socket interface: 1478 Querying local port (TCP client): If an application calls 1479 "getsockname()" while the TCP client behind the socket is engaged 1480 in a dual TCP handshake, the call SHOULD block until the local TCP 1481 has aborted one of the connections so it knows which of the two 1482 ports will continue to be used. 1484 Binding to an explicit port: If an application specifies that it 1485 wants the TCP client to use a specific port, the Inner Space 1486 capability MUST be disabled, because the dual handshake has to try 1487 two ports. Use of a specific port might be necessary, for example 1488 in a port-testing application or if the application wants to 1489 explicitly control all the handshaking logic of the Inner Space 1490 protocol itself. 
1492 Logging: The dual handshake will show up as a specific signature in 1493 logs of network activity. Log formats might not be able to record 1494 two local ports against one socket, so logs might contain 1495 unexpected or erroneous data. Even if logs correctly track both 1496 connection attempts, log analysis software might not expect to see 1497 one socket attempt to use two different ports. {ToDo: All this 1498 needs to be turned into a predictability requirement.} 1500 Note that Inner Space has no impact on queries for the remote port 1501 from a TCP server. If an application calls "getpeername()" while the 1502 TCP server behind the socket is (unwittingly) engaged in a dual 1503 handshake, it will return the port of the remote client, even though 1504 this connection might subsequently be aborted. This is because a TCP 1505 server is not aware of whether it is part of a dual handshake. 1507 It would be appropriate to enable the Inner Space protocol on a per- 1508 host or per-user basis. The necessary configuration switch does not 1509 need to be standardised, but it might allow the following three 1510 states: 1512 Enabled: The stack will enable Inner Space on any TCP connection 1513 that needs Inner Space for its TCP options. The stack might 1514 still disable the Inner Space protocol autonomously after the 1515 initial handshake if it is not needed. 1517 Forwarding: The Forwarding mode is for TCP implementations on 1518 middleboxes that implement split TCP connections, as discussed in 1519 Section 2.3.3. Forwarding mode is similar to Disabled, except it 1520 forwards data in SYN without deferring it until the incoming 1521 connection is established. 1523 Disabled: Inner Space is not enabled by default on any connections, 1524 except those that specifically request it. 1526 The socket API might also need to be extended for future applications 1527 that want to control the Inner Space protocol explicitly. 
Experience 1528 will determine the best API, so these ideas are merely informational 1529 suggestions at this stage: 1531 Enabling/disabling Inner Space: As well as the above per-host or 1532 per-user switches, the extended API might need to allow an 1533 application to disable Inner Options on a per-socket basis (e.g. 1534 for testing). A socket might need to be opened in one of four 1535 possible Inner Space modes: i) Enabled; ii) Enabled initially but 1536 can be disabled autonomously by the stack if redundant; iii) 1537 Enabled initially, then disables itself after the SYN/ACK; and iv) 1538 Disabled. It also ought to be possible for an application to 1539 disable Inner Options on-demand mid-connection. 1541 Querying support for Inner Space: An application might need to be 1542 able to determine whether the host supports Inner Space and in 1543 which mode it is enabled on a particular socket. For instance, an 1544 application might need to choose different socket options 1545 depending on whether Inner Space is enabled to make the necessary 1546 space available. 1548 Latency vs Efficiency: A socket that prefers efficient use of 1549 connection state over latency might use the optional explicit 1550 variant of the dual handshake (Appendix B). It is unlikely that a 1551 new option specific to Inner Space would be needed to express this 1552 preference, as many operating systems already offer a similar 1553 socket option. 1555 Logging: Log formats and log analysis software might need to be 1556 extended to distinguish between the deliberate RST within the dual 1557 handshake and an unexpected connection RST. 1559 5. 
IANA Considerations 1561 This specification requires IANA to allocate values from the TCP 1562 Option Kind name-space against the following names: 1564 o "Inner Option Space Upgraded (InSpaceU)" 1566 o "Inner Option Space Ordinary (InSpaceO)" 1568 o "ModeSwitch" 1570 Early implementation before the IANA allocation MUST follow [RFC6994] 1571 and use experimental option 254 and respective Experiment IDs: 1573 o 0xUUUU (16 bits); 1575 o 0xOOOO (16 bits); 1576 o 0xMMMM (16 bits); 1578 {ToDo: Values TBA and register them with IANA} then migrate to the 1579 assigned option after allocation. 1581 6. Security Considerations 1583 Certain cryptographic functions have different coverage rules for the 1584 TCP Header and TCP Payload. Placing some TCP options beyond the Data 1585 Offset could mean that they are treated differently from regular TCP 1586 options. This is a deliberate feature of the protocol, but 1587 application developers will need to be aware that this is the case. 1589 A malicious host can send bogus SYN segments with a spoofed source IP 1590 address (a SYN flood attack). The Inner Space protocol does not 1591 alter the feasibility of this attack. However, the extra space for 1592 TCP options on a SYN allows the attacker to include more TCP options 1593 on a SYN than before, so it can make a server do more option 1594 processing before replying with a SYN/ACK. To mitigate this problem, 1595 a server under stress could deprioritise SYNs with longer option 1596 fields to focus its resources on SYNs that require less processing. 1598 Each SYN in a SYN flood attack causes a TCP server to consume memory. 1599 The Inner Space protocol allows a potentially large amount of TCP 1600 option state to be negotiated during the SYN exchange, which could 1601 exhaust the TCP server's memory. 
The EchoCookie TCP option (see 1602 Section 2.5) allows the server to place this state in a cookie and 1603 send it on the SYN/ACK to the purported address of the client--rather 1604 than hold it in memory. 1606 Then, as long as the client returns the cookie on the acknowledgement 1607 and the server verifies it, the server can recover its full record of 1608 all the TCP options it negotiated and continue the connection without 1609 delay. On the other hand, the server's responses to SYNs from 1610 spoofed addresses will scatter to those spoofed addresses and the 1611 server will not have consumed any memory while waiting in vain for 1612 them to reply. See the Security Considerations in 1613 [I-D.briscoe-tcpm-echo-cookie] for how the EchoCookie facility 1614 protects against reflection and amplification attacks. 1616 7. Acknowledgements 1618 The idea of this approach grew out of discussions with Joe Touch 1619 while developing draft-touch-tcpm-syn-ext-opt, and with Jana Iyengar 1620 and Olivier Bonaventure. The idea that it is architecturally 1621 preferable to place a protocol extension within a higher layer, and 1622 code its location into upgraded implementations of the lower layer, 1623 was originally articulated by Rob Hancock. {ToDo: Ref?} The following 1624 people provided useful review comments: Joe Touch, Yuchung Cheng, 1625 John Leslie, Mirja Kuehlewind, Andrew Yourtchenko, Costin Raiciu, 1626 Marcelo Bagnulo Braun, Julian Chesterfield and Jaime Garcia. 1628 Bob Briscoe's contribution is part-funded by the European Community 1629 under its Seventh Framework Programme through the Trilogy 2 project 1630 (ICT-317756) and the Reducing Internet Transport Latency (RITE) 1631 project (ICT-317700). The views expressed here are solely those of 1632 the author. 1634 8. References 1636 8.1. Normative References 1638 [I-D.ietf-tcpm-fastopen] 1639 Cheng, Y., Chu, J., Radhakrishnan, S., and A. 
Jain, "TCP 1640 Fast Open", draft-ietf-tcpm-fastopen-10 (work in 1641 progress), September 2014. 1643 [RFC0793] Postel, J., "Transmission Control Protocol", STD 7, RFC 1644 793, September 1981. 1646 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1647 Requirement Levels", BCP 14, RFC 2119, March 1997. 1649 [RFC6994] Touch, J., "Shared Use of Experimental TCP Options", RFC 1650 6994, August 2013. 1652 8.2. Informative References 1654 [Honda11] Honda, M., Nishida, Y., Raiciu, C., Greenhalgh, A., 1655 Handley, M., and H. Tokuda, "Is it Still Possible to 1656 Extend TCP?", Proc. ACM Internet Measurement Conference 1657 (IMC'11) 181--192, November 2011. 1659 [I-D.bittau-tcpinc] 1660 Bittau, A., Boneh, D., Hamburg, M., Handley, M., Mazieres, 1661 D., and Q. Slack, "Cryptographic protection of TCP Streams 1662 (tcpcrypt)", draft-bittau-tcpinc-01 (work in progress), 1663 July 2014. 1665 [I-D.briscoe-tcpm-echo-cookie] 1666 Briscoe, B., "The Echo Cookie TCP Option", draft-briscoe- 1667 tcpm-echo-cookie-00 (work in progress), October 2014. 1669 [I-D.wing-tsvwg-happy-eyeballs-sctp] 1670 Wing, D. and P. Natarajan, "Happy Eyeballs: Trending 1671 Towards Success with SCTP", draft-wing-tsvwg-happy- 1672 eyeballs-sctp-02 (work in progress), October 2010. 1674 [Iyengar10] 1675 Iyengar, J., Ford, B., Ailawadi, D., Amin, S., Nowlan, M., 1676 Tiwari, N., and J. Wise, "Minion--an All-Terrain Packet 1677 Packhorse to Jump-Start Stalled Internet Transports", 1678 Proc. Int'l Wkshp on Protocols for Future, Large-scale & 1679 Diverse Network Transports (PFLDnet'10) , November 2010. 1681 [RFC2018] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP 1682 Selective Acknowledgment Options", RFC 2018, October 1996. 1684 [RFC2675] Borman, D., Deering, S., and R. Hinden, "IPv6 Jumbograms", 1685 RFC 2675, August 1999. 1687 [RFC4987] Eddy, W., "TCP SYN Flooding Attacks and Common 1688 Mitigations", RFC 4987, August 2007. 1690 [RFC5925] Touch, J., Mankin, A., and R. 
Bonica, "The TCP 1691 Authentication Option", RFC 5925, June 2010. 1693 [RFC6555] Wing, D. and A. Yourtchenko, "Happy Eyeballs: Success with 1694 Dual-Stack Hosts", RFC 6555, April 2012. 1696 [RFC6824] Ford, A., Raiciu, C., Handley, M., and O. Bonaventure, 1697 "TCP Extensions for Multipath Operation with Multiple 1698 Addresses", RFC 6824, January 2013. 1700 [RFC7323] Borman, D., Braden, B., Jacobson, V., and R. 1701 Scheffenegger, "TCP Extensions for High Performance", RFC 1702 7323, September 2014. 1704 [Raiciu12] 1705 Raiciu, C., Paasch, C., Barre, S., Ford, A., Honda, M., 1706 Duchene, F., Bonaventure, O., and M. Handley, "How Hard 1707 Can It Be? Designing and Implementing a Deployable 1708 Multipath TCP", Proc. USENIX Symposium on Networked 1709 Systems Design and Implementation , April 2012. 1711 Appendix A. Protocol Extension Specifications 1713 This appendix specifies protocol extensions that are OPTIONAL while 1714 the specification is experimental. If an implementation includes an 1715 extension, this section gives normative specification requirements. 1717 However, if the extension is not implemented, the normative 1718 requirements can be ignored. 1720 {Temporary note: The IETF may wish to consider making some of these 1721 extensions mandatory to implement if early testing shows they are 1722 useful or even necessary. Or it may wish to make at least the 1723 receiving side mandatory to implement to ensure that two-ended 1724 experiments are more feasible.} 1726 A.1. Disabling InSpace and Generic Connection Mode Switching 1728 This appendix is normative. It is separated from the body of the 1729 specification because it is OPTIONAL to implement while the Inner 1730 Space protocol is experimental. It defines the new ModeSwitch TCP 1731 option illustrated in Figure 5. This option provides a facility to 1732 disable the Inner Space protocol for the remainder of a connection. 
1733 It also provides a general-purpose facility for a TCP connection to 1734 co-ordinate between the endpoints before switching into a yet-to-be- 1735 defined mode. 1737 0 1 2 1738 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 1739 +---------------+---------------+-----------+-+-+ 1740 | ModeSwitch | Length=3 |Flags (CU) |I|R| 1741 +---------------+---------------+-----------+-+-+ 1743 Figure 5: The ModeSwitch TCP Option 1745 The Option Kind is ModeSwitch, the value of which is to be allocated 1746 by IANA {ToDo: Value TBA}. ModeSwitch MUST be used only as an Inner 1747 Option, because it uses the reliable ordered delivery property of 1748 Inner Options. Therefore implementation of the Inner Space protocol 1749 is REQUIRED for an implementation of ModeSwitch. Nonetheless, 1750 ModeSwitch is a generic facility for switching a connection between 1751 yet-to-be-defined modes that do not have to relate to extra option 1752 space. 1754 The sender MUST set the option Length to 3 (octets). The Length 1755 field MUST be forwarded unchanged by other nodes, even if its value 1756 is different. 1758 The Flags field is available for defining modes of the connection. 1759 Only two connection modes are currently defined. The first 6 bits of 1760 the Flags field are Currently Unused (CU) and the sender MUST set 1761 them to zero. The CU flags MUST be ignored and forwarded unchanged 1762 by other nodes, even if their value is non-zero. 1764 The two 1-bit connection mode flags that are currently defined have 1765 the following meanings: 1767 o R: Request flag if 1. Request mode is a special mode that allows 1768 the hosts to co-ordinate a change to any other mode(s); 1770 o I: Inner Space mode: Enabled if 1, Disabled if 0. 1772 The default Inner Space mode at the start of a connection is I=1, 1773 meaning Inner Space is in enabled mode. 
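As an informal illustration of the layout in Figure 5, the following Python sketch packs and unpacks the three octets of a ModeSwitch option. It is not part of the specification. The Option Kind value used here is a made-up placeholder (the real value is still to be allocated), and the function names are hypothetical. The sketch assumes the I and R flags occupy the two least significant bits of the third octet, as drawn in the figure, and that a receiver ignores the CU bits.

```python
import struct

# Placeholder only: the real ModeSwitch Option Kind is "TBA" in the draft.
MODESWITCH_KIND = 0xFD
MODESWITCH_LENGTH = 3   # octets, as required of the sender

FLAG_R = 0x01           # Request flag (bit 23 in Figure 5)
FLAG_I = 0x02           # Inner Space mode flag (bit 22 in Figure 5)


def encode_modeswitch(inner_space, request, kind=MODESWITCH_KIND):
    """Build a ModeSwitch option; the six CU bits are sent as zero."""
    flags = (FLAG_I if inner_space else 0) | (FLAG_R if request else 0)
    return struct.pack("!BBB", kind, MODESWITCH_LENGTH, flags)


def decode_modeswitch(option, kind=MODESWITCH_KIND):
    """Read the I and R flags, ignoring the CU bits whatever their value."""
    if len(option) < 3:
        raise ValueError("ModeSwitch option needs at least 3 octets")
    got_kind, length, flags = struct.unpack("!BBB", option[:3])
    if got_kind != kind:
        raise ValueError("not a ModeSwitch option")
    return {
        "length": length,
        "inner_space": bool(flags & FLAG_I),
        "request": bool(flags & FLAG_R),
    }
```

For instance, a request to disable Inner Space would be encoded with I=0 and R=1, giving a third octet of 0x01.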
1775 The procedure for changing a mode or modes is as follows: 1777 o The host that wants to change modes (the requester) sends a 1778 ModeSwitch message as an Inner Option with R=1 and with the other 1779 flag(s) set to the mode(s) it wants to change to. The requester 1780 does not change modes yet. 1782 o The responder echoes the mode flag(s) it is willing to change to, 1783 with the request flag R=0. 1785 o The half-connection from the responder changes to the mode(s) it 1786 confirms directly after the end of the segment that echoes its 1787 confirmation, i.e. after the last octet of the TCP Payload 1788 following the ModeSwitch option that echoes its confirmation. 1789 Therefore it sends the segment carrying the confirmation in the 1790 prior mode(s) of the connection. 1792 o Once the requester receives the responder's confirmation message, 1793 it re-echoes its confirmation of the responder's confirmation, 1794 with the mode(s) set to those that both hosts agree on and R=0. 1796 o The half-connection from the requester changes to the mode(s) it 1797 confirms directly after the end of the segment that re-echoes its 1798 confirmation. Therefore it sends the segment carrying the 1799 confirmation in the prior mode(s) of the connection. 1801 o The responder can refuse a request to change into a mode in any 1802 one of three ways: 1804 * either implicitly by never confirming it; 1806 * or explicitly by sending a message with R=0 and the opposite 1807 mode; 1809 * or explicitly by sending a counter-request to switch to the 1810 opposite mode (that the connection is already in) with R=1. 1812 The regular TCP sequence numbers and acknowledgement numbers of 1813 requests or confirmations can be used to disambiguate overlapping 1814 requests or responses. 1816 Once a host switches to Disabled mode, it MUST NOT send any further 1817 InSpace Options. 
Therefore it can send no further Inner Options and 1818 it cannot switch back to Enabled mode for the rest of the connection. 1820 To temporarily reduce InSpace overhead without permanently disabling 1821 the protocol, the sender can use a value of 0xFFFF in the Sent 1822 Payload Size (see Section 2.4). 1824 A.2. Dual Handshake: The Explicit Variant 1826 This appendix is normative. It is separated from the body of the 1827 specification because it is OPTIONAL to implement while the Inner 1828 Space protocol is experimental. It is not mandatory to implement 1829 because it will be more useful once the Inner Space protocol has 1830 become accepted widely enough that fewer middleboxes will discard SYN 1831 segments carrying this option (see Appendix B for when best to deploy 1832 it). It only works if both ends support it, but it can be deployed 1833 one end at a time, so there is no need for support in early 1834 experimental implementations. 1836 {Temporary note: The choice between the explicit handshake in the 1837 present section or the handshake in Section 2.1.1 is a tradeoff 1838 between robustness against middlebox interference and minimal server 1839 state. During the IETF review process, one might be chosen as the 1840 only variant to go forward, at which point the other will be deleted. 1841 Alternatively, the IETF could require a server to understand both 1842 variants and a client could be implemented with either, or both. If 1843 both, the application could choose which to use at run-time. Then we 1844 will need a section describing the necessary API.} 1846 This explicit dual handshake is similar to that in Section 2.1.1, 1847 except the SYN that the Upgraded Client sends on the Ordinary 1848 Connection is explicitly distinguishable from the SYN that would be 1849 sent by a Legacy Client. 
Then, if the server actually is an Upgraded 1850 Server, it can reset the Ordinary Connection itself, rather than 1851 creating connection state for at least a round trip until the client 1852 resets the connection. 1854 For an explicit dual handshake, the TCP client still sends two 1855 alternative SYNs: a SYN-O intended for Legacy Servers and a SYN-U 1856 intended for Upgraded Servers. The two SYNs MUST have the same 1857 network addresses and the same destination port, but different source 1858 ports. Once the client establishes which type of server has 1859 responded, it continues the connection appropriate to that server 1860 type and aborts the other. The SYN intended for Upgraded Servers 1861 includes additional options within the TCP Data (the SYN-U defined as 1862 before in Section 2.2.1). 1864 Table 2 summarises the TCP 3-way handshake exchange for each of the 1865 two SYNs in the two right-hand columns, between an Upgraded TCP 1866 Client (the active opener) and either: 1868 1. a Legacy Server, in the top half of the table (steps 2-4), or 1870 2. an Upgraded Server, in the bottom half of the table (steps 2-4) 1872 The table uses the same layout and symbols as Table 1, which has 1873 already been explained in Section 2.1.1. 1875 +------+------------------+--------------------+--------------------+ 1876 | | | Ordinary | Upgraded | 1877 | | | Connection | Connection | 1878 +------+------------------+--------------------+--------------------+ 1879 | 1 | Upgraded Client | >SYN-O | >SYN-U | 1880 | | | | | 1881 | /\/\ | /\/\/\/\/\/\/\/\ | /\/\/\/\/\/\/\/\/\ | /\/\/\/\/\/\/\/\/\ | 1882 | 2 | Legacy Server | ACK | >RST | 1888 | | | | | 1889 | 4 | | Cont... | | 1890 | | | | | 1891 | /\/\ | /\/\/\/\/\/\/\/\ | /\/\/\/\/\/\/\/\/\ | /\/\/\/\/\/\/\/\/\ | 1892 | 2 | Upgraded Server | ACK | 1895 | | | | | 1896 | 4 | | | Cont... 
| 1897 +------+------------------+--------------------+--------------------+ 1899 Table 2: Explicit Variant of Dual 3-Way Handshake in Two Server 1900 Scenarios 1902 As before, an Upgraded Server MUST respond to a SYN-U with a SYN/ACK- 1903 U. Then, the client recognises that it is talking to an Upgraded 1904 Server. 1906 Unlike before, an Upgraded Server MUST respond to a SYN-O with a RST. 1907 However, the client cannot rely on this behaviour, because a 1908 middlebox might be stripping Outer TCP Options which would turn the 1909 SYN-O into a regular SYN before it reached the server. Then the 1910 handshake would effectively revert to the implicit variant. 1911 Therefore the client's behaviour still depends on which SYN-ACK 1912 arrives first, so its response to SYN-ACKs has to follow the rules 1913 specified for the implicit handshake variant in Section 2.1.1. 1915 The rules for processing TCP options are also unchanged from those in 1916 Section 2.3. 1918 A.2.1. SYN-O Structure 1920 The SYN-O is merely a SYN with an extra InSpaceO Outer TCP Option as 1921 shown in Figure 6. It merely identifies that the SYN is opening an 1922 Ordinary Connection, but explicitly identifies that the client 1923 supports the Inner Space protocol. 1925 0 1 1926 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 1927 +---------------+---------------+ 1928 | Kind=InSpaceO | Length=2 | 1929 +---------------+---------------+ 1931 Figure 6: An InSpaceO TCP Option Flag 1933 An InSpaceO TCP Option has Option Kind InSpaceO with value {ToDo: 1934 Value TBA} and MUST have Length = 2 octets. 1936 To use this option, the client MUST place it with the Outer TCP 1937 Options. A Legacy Server will just ignore this TCP option, which is 1938 the normal behaviour for an option that TCP does not recognise 1939 [RFC0793]. 1941 A.2.2. 
Retransmission Behaviour - Explicit Variant 1943 If the client receives a RST on one connection, but a short while 1944 after that {ToDo: duration TBA} the response to the SYN-U has not 1945 arrived, it SHOULD retransmit the SYN-U. If latency is more 1946 important than the extra TCP option space, in parallel to any 1947 retransmission, or instead of any retransmission, the client MAY send 1948 a SYN without any InSpace TCP Option, in case this is the cause of 1949 the black-hole. However, the presence of the RST implies that the 1950 SYN with the InSpaceO TCP Option (the SYN-O) probably reached the 1951 server, therefore it is more likely (but not certain) that the lack 1952 of response on the other connection is due to transmission loss or 1953 congestion loss. 1955 If the client receives no response at all to either the SYN-O or the 1956 SYN-U, it SHOULD solely retransmit one or the other, not both. If 1957 latency is more important than the extra TCP option space, it SHOULD 1958 send a SYN without an InSpaceO TCP Option. Otherwise it SHOULD 1959 retransmit the SYN-U. It MUST NOT retransmit both segments, because 1960 the lack of response could be due to severe congestion. 1962 A.2.3. Corner Cases 1964 There is a small but finite possibility that the Explicit Dual 1965 Handshake might encounter the cases below. The Implicit Handshake 1966 (Section 2.1.1) is robust to these possibilities, but the Explicit 1967 Handshake is not, unless the following additional rules are followed: 1969 Both successful: This could occur if one load-sharing replica of a 1970 server is upgraded, while another is not. This could happen in 1971 either order but, in both cases, the client aborts the last 1972 connection to respond: 1974 * The client completes the Ordinary Handshake (because it 1975 receives a SYN/ACK), but then, before it has aborted the 1976 Upgraded Connection, it receives a SYN/ACK-U on it. 
In this 1977 case, the client MUST abort the Upgraded Connection even though 1978 it would work. Otherwise the client will have opened both 1979 connections, one with Inner TCP Options and one without. This 1980 could confuse the application. 1982 * The client completes the Upgraded Connection after receiving a 1983 SYN/ACK-U, but then it receives a SYN/ACK in response to the 1984 SYN-O. In this case, the client MUST abort the connection it 1985 initiated with the SYN-O. 1987 Both aborted: The client might receive a RST in response to its SYN- 1988 O, then an Ordinary SYN/ACK on its Upgraded Connection in response 1989 to its SYN-U. This could occur i) if a split connection middlebox 1990 actively forwards unknown options but holds back or discards data 1991 in a SYN; or ii) if one load-sharing replica of a server is 1992 upgraded, while another is not. 1994 Whatever the likely cause, the client MUST still respond with a 1995 RST on its Upgraded Connection. Otherwise, its Inner TCP Options 1996 will be passed as user-data to the application by a Legacy Server. 1998 If confronted with this scenario where both connections are 1999 aborted, the client will not be able to include extra options on a 2000 SYN, but it might still be able to set up a connection with extra 2001 option space on all the other segments in both directions using 2002 the approach in Appendix A.2.4. If that doesn't work either, the 2003 client's only recourse is to retry a new dual handshake on 2004 different source ports, or ultimately to fall-back to sending an 2005 Ordinary SYN. 2007 A.2.4. Workround if Data in SYN is Blocked 2009 If a path either holds back or discards data in a SYN-U, but there is 2010 evidence that the server is upgraded from a RST response to the SYN- 2011 O, the strategy below might at least allow a connection to use extra 2012 option space on all the segments except the SYN. 
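The client-side rules for the corner cases in Appendix A.2.3 can be sketched informally as a small decision function. This is only an illustration under simplifying assumptions, not a normative state machine: the function name, the event encoding and the result fields are all hypothetical, retransmission and timers are ignored, and each connection is assumed to receive at most one response.

```python
def resolve_explicit_handshake(events):
    """Decide the client's actions from responses in arrival order.

    events: list of (connection, response) tuples, where connection is
    "ordinary" or "upgraded" and response is "SYN/ACK", "SYN/ACK-U" or "RST".
    """
    chosen = None                      # connection the client continues with
    client_rst = []                    # connections the client must reset
    server_rst_ordinary = False        # server answered the SYN-O with a RST
    plain_synack_on_upgraded = False   # Ordinary SYN/ACK arrived for the SYN-U

    for conn, resp in events:
        if resp == "RST":
            if conn == "ordinary":
                server_rst_ordinary = True
        elif conn == "upgraded" and resp == "SYN/ACK":
            # Inner Options would otherwise be delivered as user-data.
            plain_synack_on_upgraded = True
            client_rst.append(conn)
        elif chosen is None:
            chosen = conn              # first successful handshake wins
        else:
            client_rst.append(conn)    # 'both successful': abort the later one

    return {
        "continue": chosen,
        "client_rst": client_rst,
        # 'both aborted': the workaround of Appendix A.2.4 may be attempted
        "try_alternative_syn_u": (chosen is None and server_rst_ordinary
                                  and plain_synack_on_upgraded),
    }
```

For example, a RST on the Ordinary Connection followed by an Ordinary SYN/ACK on the Upgraded Connection leaves no connection to continue and flags that the workaround described in this section might be worth trying.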

   It is assumed that the symptoms described in the 'both aborted' case
   (Appendix A.2.3) have occurred, i.e. the server has responded to the
   SYN-O with a RST, but it has responded to the SYN-U with an Ordinary
   SYN/ACK not a SYN/ACK-U, so the client has had to RST the Upgraded
   Connection as well.  In this case, the client SHOULD attempt the
   following (alternatively it MAY give up and fall back to opening an
   Ordinary TCP connection).

   The client sends an 'Alternative SYN-U' by including an InSpaceU
   Outer TCP Option (Figure 7).  This Alternative SYN-U merely flags
   that the client is attempting to open an Upgraded Connection.  The
   client MUST NOT include any Inner Options or InSpace Option or Magic
   Number.  If the previous aborted SYN/ACK-U acknowledged the data that
   the client sent within the original SYN-U, the client SHOULD resend
   the TCP Payload data in the Alternative SYN-U, otherwise it might as
   well defer it to the first data segment.

                    0                   1
                    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
                   +---------------+---------------+
                   | Kind=InSpaceU |   Length=2    |
                   +---------------+---------------+

                 Figure 7: An InSpaceU Flag TCP option

   An InSpaceU Flag TCP Option has Option Kind InSpaceU with value
   {ToDo: Value TBA} and MUST have Length = 2 octets.

   To use this option, the client MUST place it with the Outer TCP
   Options.  A Legacy Server will just ignore this TCP option, which is
   the normal behaviour for an option that TCP does not recognise
   [RFC0793].  Because the client has received a RST from the server in
   response to the SYN-O it can assume that the server is upgraded.  So
   the client probably only needs to send a single Alternative SYN-U in
   this repeat attempt.  Nonetheless, the RST might have been spurious.
   Therefore the client MAY also send an Ordinary SYN in parallel, i.e.
   using the Implicit Dual Handshake (Section 2.1.1).
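
   The InSpaceU flag option of Figure 7 is small enough to sketch in a
   few lines of Python.  This is a non-normative illustration: the
   Option Kind value is still TBA, so the value 253 below is only a
   placeholder chosen for the sketch, and both function names are
   hypothetical.

```python
import struct

# Placeholder only: the draft leaves the InSpaceU Option Kind value TBA.
INSPACEU_KIND = 253


def build_inspaceu_option(kind=INSPACEU_KIND):
    """Return the 2-octet InSpaceU flag option: Kind, then Length = 2."""
    return struct.pack("!BB", kind, 2)


def has_inspaceu(outer_options, kind=INSPACEU_KIND):
    """Scan a TCP options field for a well-formed InSpaceU flag option."""
    i = 0
    while i < len(outer_options):
        k = outer_options[i]
        if k == 0:                      # End of Option List
            break
        if k == 1:                      # No-Operation is a single octet
            i += 1
            continue
        if i + 1 >= len(outer_options):
            break                       # truncated option, stop scanning
        length = outer_options[i + 1]
        if length < 2 or i + length > len(outer_options):
            break                       # malformed length, stop scanning
        if k == kind:
            return length == 2          # the flag option MUST have Length = 2
        i += length                     # skip over any other option
    return False
```

   For example, has_inspaceu() returns True for an options field made of
   an MSS option, two No-Operation octets and the output of
   build_inspaceu_option(), which is how an Upgraded Server might detect
   an Alternative SYN-U.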

   If an Upgraded Server receives a SYN carrying the InSpaceU option, it
   MUST continue the rest of the connection as if it had received a full
   SYN-U (Section 2.2), i.e. by processing any Outer Options in the
   SYN-U and responding with a SYN/ACK-U.

A.3.  Jumbo InSpace TCP Option (only if SYN=0)

   This appendix is normative.  It is separated from the body of the
   specification because it is OPTIONAL to implement while the Inner
   Space protocol is experimental.  In experimental implementations, it
   will be sufficient to implement the required behaviour for when the
   Length of a received InSpace Option is not recognised (Section 2.4).

   If the IPv6 Jumbo extension header is used, the SentPayloadSize field
   will need to be 4 octets wide, not 2 octets.  This section defines
   the format of the InSpace Option necessary to support jumbograms.

   If sending a jumbogram, a sender MUST use the InSpace Option format
   defined in Figure 8.  All the fields have the same meanings as
   defined in Section 2.2.2, except InOO and SentPayloadSize use more
   bits.

   When reading a segment, the Jumbo InSpace Option could be present in
   a packet that is not a jumbogram (e.g. due to resegmentation).
   Therefore a receiver MUST use the Jumbo InSpace Option to work along
   the stream irrespective of whether arriving packets are jumbo sized
   or not.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-----------------------------------------------------------+---+
   |                Inner Options Offset (InOO)                |Len|
   +-----------------------------------------------------------+---+
   |                    Sent Payload Size (SPS)                    |
   +---------------------------------------------------------------+

             Figure 8: InSpace Option for a Jumbo Data-UNJH

A.4.  Upgraded Segment Structure to Traverse DPI boxes

   This appendix is normative.
   It is separated from the body of the
   specification because it is OPTIONAL to implement while the Inner
   Space protocol is experimental.  If a receiver has implemented the
   Inner Space protocol but not this extension, no mechanism is provided
   for it to ask the sender to fall back to the base Inner Space
   protocol if it is sent a segment formatted according to this
   extension.  However, it will at least fall back naturally to regular
   TCP behaviour because of the dual handshake.

   In experiments conducted between 2010 and 2011, [Honda11] reported
   that 7 of 142 paths (about 5%) blocked access to port 80 if the
   payload was not parsable as valid HTTP.  This variant of the
   specification has been defined in case experiments prove that it
   significantly improves traversal of such deep packet inspection (DPI)
   boxes.

   This variant starts the TCP Data with the expected app-layer headers
   on the first two segments in each direction:

   SYN=1:  The structure in Figure 9a) is used on a SYN or SYN/ACK.  The
      sender locates the 4-octet Magic Number A at the end of the
      segment.  The sender right-aligns the 8-octet InSpace Option just
      before Magic Number A.  Then it right-aligns the Inner Options
      against the InSpace Option, all after the end of the TCP Payload.
      The start of the Inner Options is therefore 4 * (InOO + 3) octets
      before the end of the segment (InOO words of Inner Options, plus
      2 words for the InSpace Option and 1 word for Magic Number A),
      where InOO is read from within the InSpace Option.

      A receiver implementation will check whether Magic Number A is
      present at the end of the segment if it does not first find it at
      the start of the segment.  Although the Inner Options are located
      at the end of the TCP Payload, they are considered to be applied
      before the first octet of the TCP Payload.

   SYN=0:  The structure of the first non-SYN segment that contains any
      TCP Data is shown in Figure 9b).

      The receiver will find the second InSpace Option (InSpace#2)
      located SPS#1 octets from the start of the segment, where SPS#1 is
      the value of Sent Payload Size that was read from the InSpace
      Option in the previous (SYN=1) segment that started the
      half-connection.  Although the Inner Options are shifted, as for
      the first segment, they are still considered to be applied at the
      start of the TCP Data in this second segment.

   From the second InSpace Option onwards, the structure of the stream
   reverts to that already defined in Section 2.2.1.  So the value of
   Sent Payload Size (SPS#2) in the second InSpace Option (InSpace#2)
   defines the length of any remaining TCP Payload before the end of the
   first data segment, as shown.

                                            TCP Data
                       .------------------------'----------------------.
                       |             Inner Options                     |
   a) SYN=1            |        .----------'----------.                |
   +--------+----------+--------+----------+----------+---------+------+
   | BaseHdr| OuterOpts| Payload|PrefixOpts|SuffixOpts|InSpace#1|MagicA|
   +--------+----------+--------+----------+----------+---------+------+
   |        DO         |        |   SOO    |          |         |  1   |
   `------------------>|        `--------->|          |   Len   |<-----'
                       |        |          |   InOO   |<--------'      |
                                |<--------------------'                |

   b) First SYN=0 segment in either direction
   +--------+----------+----------+---------+---------------+----------+
   | BaseHdr| OuterOpts| Payload  |InSpace#2| Inner Options | Payload  |
   +--------+----------+----------+---------+---------------+----------+
   |        DO         |  SPS#1   |   Len   |     InOO      |  SPS#2   |
   `------------------>`--------->`-------->`-------------->`--------->|

   All offsets are specified in 4-octet (32-bit) words, except SPS,
   which is in octets.

    Figure 9: Segment Structures to Traverse DPI boxes (not to scale)

   It is recognised that having to work from the end of the first
   segment makes processing more involved.
   Experimental implementation
   of this approach will determine whether the extra complexity improves
   DPI box traversal sufficiently to make it worthwhile.

Appendix B.  Comparison of Alternatives

B.1.  Implicit vs Explicit Dual Handshake

   In the body of this specification, two variants of the dual handshake
   are defined:

   1.  The implicit dual handshake (Section 2.1.1) starting with just an
       Ordinary SYN (no InSpaceO flag option) on the Ordinary
       Connection;

   2.  The explicit dual handshake (Appendix A.2) starting with a SYN-O
       (InSpaceO flag option) on the Ordinary Connection.

   Both schemes double up connection state (for a round trip) on the
   Legacy Server.  But only the implicit scheme doubles up connection
   state (for a round trip) on the Upgraded Server as well.  On the
   other hand, the explicit scheme risks delay accessing a Legacy Server
   if a middlebox discards the SYN-O (it is possible that some firewalls
   will discard packets with unrecognised TCP options {ToDo: ref?}).
   Table 3 summarises these points.

   +----------------------------------+---------------+----------------+
   |                                  |      SYN      |     SYN-O      |
   |                                  |   (Implicit)  |   (Explicit)   |
   +----------------------------------+---------------+----------------+
   | Minimum state on Upgraded Server |       -       |       +        |
   |                                  |               |                |
   | Minimum risk of delay to Legacy  |       +       |       -        |
   | Server                           |               |                |
   +----------------------------------+---------------+----------------+

     Table 3: Comparison of Implicit vs. Explicit Dual Handshake on the
                            Ordinary Connection

   There is no need for the IETF to choose between these.  If the
   specification allows either or both, the tradeoff can be left to
   implementers at build-time, or to the application at run-time.

   Initially clients might choose the Implicit Dual Handshake to
   minimise delays due to middlebox interference.
   But later, perhaps
   once more middleboxes support the scheme, clients might choose the
   Explicit scheme, to minimise state on Upgraded Servers.

Appendix C.  Protocol Design Issues (to be Deleted before Publication)

   This appendix is informative, not normative.  It records outstanding
   issues with the protocol design that will need to be resolved before
   publication.

   Option alignment following re-segmentation:  If the byte-stream is
      resegmented (e.g. by a connection splitter), the TCP options
      within the stream will not necessarily align on 4-octet word
      boundaries within the new segments.

   Ossifies reliable ordered delivery into TCP design:  At present it is
      theoretically possible to implement a variant of TCP that provides
      partial reliability.  Inner Space as it stands would prevent a
      future partially reliable TCP, but not if out-of-order delivery
      were added, as discussed below.

   Ideally Outer Options in Inner:  Ideally enable Outer Options to be
      located beyond the Data Offset: i) without consuming receive
      window; ii) either without consuming sequence space or, if
      otherwise, must be robust to middlebox correction; iii) delivered
      immediately on reception, not in sent order.  Could use the Minion
      [Iyengar10] variant (or a similar variant) of the consistent
      overhead byte-stuffing (COBS) encoding.

Appendix D.
Change Log (to be Deleted before Publication)

   A detailed version history can be accessed at

   From briscoe-...-inner-space-00 to briscoe-...-inner-space-01:
      Technical changes:

      *  Corrected DO to 4 * DO (twice)

      *  Confirmed that receive window applies to Inner Options

      *  Generalised the cause of decryption/decompression from a
         previous TCP option to any previous control message

      *  Added requirement for a middlebox not to defer data on SYN

      *  Latency of dual handshake is worst of two

      *  Completed "Interaction with Pre-Existing TCP Implementations"
         section, covering other TCP variants, TCP in middleboxes and
         the TCP API.  Shifted some TCP options to Outer only, because
         of RWND deadlock problem

      *  Added two outstanding issues: i) ossifies reliable ordered
         delivery; ii) Ideally Outer in Inner.

      Editorial changes:

      *  Removed section on Echo TCP option to a separate I-D that is
         mandatory to implement for inner-space, and shifted some SYN
         flood discussion in Security Considerations

      *  Clarifications throughout

      *  Acknowledged more review comments

   From draft-briscoe-tcpm-syn-op-sis-02 to
   draft-briscoe-tcpm-inner-space-00:
      The Inner Space protocol is a development of a proposal called the
      SynOpSis (Sister SYN options) protocol.  Most of the elements of
      Inner Space were in SynOpSis, such as the implicit and explicit
      dual handshakes; the use of a magic number to flag the existence
      of the option; the various header offsets; and the option
      processing rules.

      The main technical differences are: Inner Space extends option
      space on any segment, not just the SYN; this advance requires the
      introduction of the Sent Payload Size field and a general
      rearrangement and simplification of the protocol format; the
      option processing rules have been extended to assure compatibility
      with TFO and one degree of recursion has been introduced to cater
      for encryption or compression of Inner Options; the Echo option
      has been added to provide a SYN-cookie-like capability.  Also, the
      default protocol has been pared down to the bare bones and
      optional extensions relegated to appendices.

      The main editorial differences are: The emphasis of the Abstract
      and Introduction has expanded from a focus on just extra space
      using the dual handshake to include much more comprehensive
      middlebox traversal.  A comprehensive Design Rationale section has
      been added.

Author's Address

   Bob Briscoe
   BT
   B54/77, Adastral Park
   Martlesham Heath
   Ipswich  IP5 3RE
   UK

   Phone: +44 1473 645196
   Email: bob.briscoe@bt.com
   URI:   http://bobbriscoe.net/