Network Working Group                                       P. Sarolahti
Internet-Draft                                                    J. Ott
Intended status: Experimental                           Aalto University
Expires: September 6, 2012                                    C. Perkins
                                                   University of Glasgow
                                                           March 5, 2012

                          TCP Segment Caching
                   draft-sarolahti-irtf-catcp-00.txt

Abstract

   This document describes Content- and Cache-Aware TCP (CA-TCP), which
   allows TCP segments to be cached and reused between different
   connections transmitting the same data. When the same data is sent
   to multiple receivers, this can lead to significant load reductions
   and performance improvements.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 6, 2012.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction
   2. Terminology
   3. Protocol Overview
   4. New TCP Options
      4.1. CA-TCP Enabled Option
      4.2. Content Label Option
      4.3. Content Request Option
   5. Operation
      5.1. Sender Behavior
      5.2. Receiver Behavior
      5.3. Controller Behavior
      5.4. Segment Cache Behavior
   6. Open Issues
      6.1. Correctness of Data
      6.2. Consistent Segmentation
      6.3. Interaction with middleboxes
      6.4. Other Issues
   7. Security Considerations
   8. Acknowledgments
   9. References
      9.1. Normative References
      9.2. Informative References
   Authors' Addresses

1. Introduction

   Many current Internet applications are content-oriented, where the
   main application primitive is to locate and fetch a named content
   resource. The most common example is the world-wide web, which uses
   URLs to identify a particular content resource. Other
   content-oriented applications include the various peer-to-peer file
   sharing systems and traditional FTP.
   To enhance the efficiency of content delivery, popular content is
   replicated to multiple servers, and cached by intermediate on-path
   caches. Usually these caches are application-specific, commonly
   focused on web traffic.

   This document describes a TCP extension called Content- and
   Cache-Aware TCP (CA-TCP), which identifies the content in the TCP
   payload in a new TCP option [RFC0793]. This information can be used
   by new types of intermediate network caches, to enable generic TCP
   segment caching independent of the upper layer application protocol.
   These network caches can transmit the cached TCP segments on behalf
   of the original TCP sender to any number of receivers. All
   acknowledgments flow to the original sender, so it can keep track of
   the progress of the transmission. The extension requires
   modifications only at the TCP sender, and works with any normal TCP
   receiver. If the network path contains intermediate network caches
   that support CA-TCP, the performance of transmitting the same data
   to multiple receivers can be significantly improved. If there are no
   such caches on the path, the communication behavior is identical to
   normal TCP.

   Interoperating with the network caches, CA-TCP can reduce the load at
   the TCP senders, and reduce the overall congestion in the network.
   Through short-term caching between multiple simultaneous receivers of
   the same data, CA-TCP can also be used to enable a form of
   "pseudo-multicast": a cache can replicate the data sent by the TCP
   sender to multiple receivers. This can be useful, for example, with
   TCP-based live video streaming.

   There are also open issues to be solved with CA-TCP. A verification
   mechanism is needed for the content that can be cached, to protect
   against false injected TCP segments at the caches.
   The cached content needs to be consistently segmented among different
   receivers to make use of the cached data. These and some other issues
   are discussed in more detail in Section 6.

2. Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

   This document is an Experimental specification, but uses the
   normative language as described above. In other words,
   implementation of this document is optional, but if a host implements
   CA-TCP, the normative instructions of this document MUST be followed.

   In addition we use the following terms:

   o Segment cache: A CA-TCP-aware cache that can store TCP segments
     based on the TCP options in the packets.

3. Protocol Overview

   With CA-TCP, applications can identify the data they are transmitting
   using a content label that is indicated to the TCP implementation by
   an API extension. This content label is transmitted with TCP
   segments using a new Content Label Option, which enables TCP
   segment-level caching at segment caches that support CA-TCP.

   In addition to the TCP sender and receiver, the CA-TCP framework
   consists of a Controller, which is a middlebox that intercepts the
   data segments and acknowledgments and determines the next data
   expected to be transmitted, and segment caches that maintain
   packet-level caches of labeled content. These caches are then used to
   share the cached content between TCP connections: the segment caches
   intercept CA-TCP acknowledgments, which are labeled using a new
   content request TCP option, and transmit data from their cache in
   response. This helps reduce the load at the TCP sender and on the
   upstream path.
   The receiver may be modified to add content request options to
   acknowledgments, or they may be added by a middlebox near the edge of
   the network known as the controlling node.

   TCP acknowledgments are delivered to the original sender as normal,
   allowing it to keep track of the transmission progress and update its
   transmission window accordingly. When transmitting a segment on
   behalf of the sender, the segment cache updates the content request
   TCP option to inform the sender and upstream segment caches that it
   has sent data, preventing them from injecting the same data again.

   A CA-TCP connection starts in the usual way, with an end-to-end
   three-way handshake. Once the connection is established, data
   transfer can proceed as normal, with unlabeled content, or the sender
   can add a content label to identify the payload as being a particular
   content item.

   For example, when fetching an HTTP resource, the client would
   initiate the connection and send the HTTP GET request as unlabeled
   content in the usual way. The server would accept the connection and
   send the headers of the response as unlabeled content, then it may
   assign a content label for the response data, differentiating between
   variants (e.g., encodings) of the content as negotiated with the
   client. The content label can be changed in the middle of a
   connection, if the object under transmission changes. Data that
   should not be cached is not given a content label.

4. New TCP Options

   Because CA-TCP is an experimental mechanism, the new options use the
   experimental TCP options according to the guidelines given in
   [I-D.ietf-tcpm-experimental-options]. Therefore no IANA allocation
   for new TCP option types is needed at this time.

4.1. CA-TCP Enabled Option

   The CA-TCP Enabled TCP Option is shown in Figure 1.
   The option is sent at the beginning of a connection, in the SYN and
   SYN-ACK segments, to indicate that CA-TCP is supported.

                        1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +---------------+---------------+-------------------------------+
   |     Kind      |  Length = 6   |   Magic Number = 0x20120229   |
   +---------------+---------------+-------------------------------+
   |        ...Magic Number        |
   +---------------+---------------+

                    Figure 1: CA-TCP Enabled Option

   The option fields are as follows:

   o Kind: set to 253 in the TCP SYN segment, and 254 in the TCP SYN-ACK
     segment. These are the experimental TCP option codes, available
     without IANA allocation. Note that TCP receivers are not required
     to be aware of CA-TCP, but a CA-TCP segment cache SHOULD add this
     option to the TCP SYN-ACK segment, if it was not there already.

   o Length and Magic Number are set as indicated above. The use of
     the Magic Number is described in
     [I-D.ietf-tcpm-experimental-options], and we use the hexadecimal
     value 0x20120229 for CA-TCP.

4.2. Content Label Option

   The Content Label Option (Figure 2) is attached to TCP segments
   containing data that can be cached by the segment caches. This
   option SHOULD NOT be used if a TCP sender has not received a valid
   CA-TCP Enabled Option in a SYN-ACK, as this likely indicates that
   there are no segment caches on the path, and the Content Label Option
   would be useless.
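   As an illustration of the rule above, the following is a minimal
   sketch, not a definitive implementation, of how a sender might
   encode the CA-TCP Enabled Option of Section 4.1 and decide from the
   options of a received SYN-ACK whether the Content Label Option may
   be used. The function names (build_enabled_option, iter_options,
   may_use_content_label) are hypothetical and not part of this
   specification; the sketch assumes the raw TCP options area is
   available as a byte string.

```python
import struct

CATCP_MAGIC = 0x20120229   # Magic Number from Section 4.1
KIND_SYN = 253             # experimental kind used in the SYN
KIND_SYN_ACK = 254         # experimental kind used in the SYN-ACK
ENABLED_LEN = 6            # Kind (1) + Length (1) + Magic Number (4)


def build_enabled_option(kind: int) -> bytes:
    """Encode a CA-TCP Enabled Option for a SYN (253) or SYN-ACK (254)."""
    if kind not in (KIND_SYN, KIND_SYN_ACK):
        raise ValueError("kind must be 253 or 254")
    # Network byte order, no padding: 1 + 1 + 4 = 6 bytes.
    return struct.pack("!BBI", kind, ENABLED_LEN, CATCP_MAGIC)


def iter_options(raw: bytes):
    """Yield (kind, body) pairs from a raw TCP options area."""
    i = 0
    while i < len(raw):
        kind = raw[i]
        if kind == 0:              # End of Option List
            return
        if kind == 1:              # No-Operation is a single byte
            i += 1
            continue
        if i + 1 >= len(raw):
            return                 # truncated option, stop parsing
        length = raw[i + 1]
        if length < 2 or i + length > len(raw):
            return                 # malformed length, stop parsing
        yield kind, raw[i + 2:i + length]
        i += length


def may_use_content_label(syn_ack_options: bytes) -> bool:
    """True if the SYN-ACK carried a valid CA-TCP Enabled Option."""
    for kind, body in iter_options(syn_ack_options):
        if kind == KIND_SYN_ACK and len(body) == 4:
            if struct.unpack("!I", body)[0] == CATCP_MAGIC:
                return True
    return False
```

   A sender following this sketch would attach Content Label Options
   only on connections for which may_use_content_label() returned True
   for the SYN-ACK.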
                        1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +---------------+---------------+---------------+---------------+
   |  Kind = 253   |  Length = 16  |  Magic Code   |   Reserved    |
   +---------------+---------------+---------------+---------------+
   |                                                               |
   |                    Content Label (8 bytes)                    |
   |                                                               |
   +---------------------------------------------------------------+
   |                            Offset                             |
   +---------------------------------------------------------------+

                     Figure 2: Content Label Option

   The option fields are as follows:

   o Kind and Length: as indicated in Figure 2.

   o Magic Code: This is the last 8 bits of the Magic Number in the
     CA-TCP Enabled Option, i.e., 0x29. Using a full 32-bit Magic
     Number is not desirable for the Content Label Option because of
     the size constraints in TCP Options. An 8-bit "magic code" allows
     simultaneous operation of multiple experimental options sharing
     the same option kind numbers. If the length and the magic code
     are not correct, the Content Label Option MUST NOT be processed by
     the CA-TCP segment caches.

   o Reserved: SHOULD be set to 0 for now, reserved for future use.

   o Content Label: Identifies the application layer content. The
     application chooses a content label that is unique with high
     probability, and communicates it to the TCP implementation
     through the API. This label is only used as a caching identifier
     (together with the content offset), and the TCP implementation is
     indifferent about the contents of this field. The application
     MUST NOT use the same content label if the application payload has
     changed from the earlier use of the content label.

   o Offset: Indicates the relative distance of the TCP payload to the
     start of the content object in bytes. When a TCP sender starts to
     send cachable data under a new content label, the offset is
     initialized to 0 (unless the sender starts transmission from the
     middle of content).
     For subsequent segments the offset value is increased by the same
     amount as the TCP sequence number is increased. This information
     is used to identify the cached segments, in addition to the
     content label. Relative offsets are needed because TCP
     connections start at random sequence numbers that cannot be used
     for caching.

4.3. Content Request Option

   The Content Request Option (Figure 3) is carried together with TCP
   acknowledgments. A CA-TCP segment cache uses it to indicate the
   highest sequence number sent, and to indicate whether it has sent
   data in response to the acknowledgment, to prevent upstream nodes
   from sending excess data in response to a single acknowledgment.

                        1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +---------------+---------------+---------------+-------+-------+
   |  Kind = 254   |  Length = 20  |  Magic Code   |  CS   | Rsvd  |
   +---------------+---------------+---------------+-------+-------+
   |                                                               |
   |                    Content Label (8 bytes)                    |
   |                                                               |
   +---------------------------------------------------------------+
   |                          Next Offset                          |
   +---------------------------------------------------------------+
   |                         TCP Sequence                          |
   +---------------------------------------------------------------+

                    Figure 3: Content Request Option

   The option fields are as follows:

   o Kind and Length: as indicated in Figure 3.

   o Magic Code: This is the last 8 bits of the Magic Number in the
     CA-TCP Enabled Option, i.e., 0x29. If the length and the magic
     code are not correct, a segment cache MUST NOT process this
     option.

   o CS (CanSend): Number of TCP segments that can be sent in response
     to this TCP acknowledgment. With this field a CA-TCP segment
     cache can control the number of data segments transmitted in
     response to the acknowledgment.
     Even though the field length allows larger values, the value of CS
     SHOULD NOT be larger than 2, i.e., at most two outgoing segments
     are allowed for an incoming acknowledgment. The management of the
     CS field is described in more detail in Section 5.3.1.

   o Rsvd (Reserved): SHOULD be set to 0 for now, reserved for future
     use.

   o Content Label: Identifies the content expected to be in transit
     for this connection. When receiving this acknowledgment, a CA-TCP
     segment cache uses this field, together with the 'Next Offset'
     field, to check if it has data in its cache that it can transmit
     in response to the acknowledgment.

   o Next Offset: Indicates, relative to the beginning of the content,
     what data is expected to be transmitted next. A segment cache or
     TCP sender uses this field, together with the 'Content Label'
     field, to determine what data is to be transmitted next. When a
     segment cache transmits data, it increases the Next Offset field
     before forwarding the TCP acknowledgment, to prevent multiple
     transmissions of the same data.

   o TCP Sequence: Gives the next TCP sequence number that should be
     used by a segment cache for sending a data segment in response to
     this acknowledgment. This information is needed by a segment
     cache to build a valid TCP header for the outgoing data segment,
     because segment caches cannot be assumed to maintain the full TCP
     flow state of ongoing connections. The other header fields can be
     constructed based on the acknowledgment, as outlined in
     Section 5.4.

5. Operation

   In the following we specify the operation of the CA-TCP Sender, the
   CA-TCP Controller that intercepts the normal TCP acknowledgments, and
   the Segment Cache that stores the TCP segments that arrive with a
   Content Label option. The TCP receiver follows normal TCP operation.

5.1. Sender Behavior

   A TCP sender starts a connection with the normal TCP SYN handshake.
   If the sender is going to use CA-TCP during the connection, it MUST
   add the CA-TCP Enabled Option (with Kind number 253) to the SYN
   segment. If a TCP implementation with CA-TCP support does not know
   at connection establishment time whether it is going to use CA-TCP,
   a safe action is to add the CA-TCP Enabled option to all new
   connections. However, a TCP sender MAY leave the option out, in
   which case it MUST NOT use CA-TCP during the remainder of the
   connection.

   The TCP sender checks whether the incoming SYN-ACK segment carries a
   CA-TCP Enabled option (with Kind number 254). If this option is not
   included, the TCP sender MUST NOT use CA-TCP during the remainder of
   the connection.

5.1.1. Outgoing Data

   At any time during the connection a TCP sender may or may not attach
   the content label to the outgoing data segments. A TCP sender can
   also change the content label during the connection, if the content
   item under transmission changes. For example, in a persistent HTTP
   connection, the HTTP headers would be sent without a content label,
   and a static, cachable HTTP body would be transmitted with the
   content label option. For multiple HTTP responses in the same
   connection, the different body elements would use different content
   labels.

   The content label is chosen by the sending application, and the TCP
   implementation is indifferent about its content. The receiving TCP
   implementation will ignore the Content Label option, and the
   receiving application will therefore not know about its existence.
   The content label should be chosen such that the likelihood of a
   label collision at segment caches is as small as possible, for
   example using a digest computed from the content itself.

   When the application sets a new content label, the first byte of the
   indicated content MUST start a new TCP segment.
   The TCP sender adds a Content Label Option to this segment and
   subsequent segments until the transmission of the content is
   complete. The value of the Offset field is set to 0 in the first
   segment, and the subsequent segments set this value to the distance
   of the first byte of payload from the start of the content. The
   sender also stores the TCP sequence number of the first byte of
   content for further processing in an internal variable
   (SND.Content-Start). The Offset field can be seen as a
   content-internal sequence numbering that is independent of the
   randomly initialized value of the TCP sequence number.

   If the labeled content changes in any way, for example, if any single
   bit changes, or the length of the content changes, the application
   MUST use a different content label than earlier.

   The TCP sender SHOULD try to maintain consistent segmentation for
   payload with a content label, to improve the efficiency of the
   caching of the content. This may not always be possible, but failing
   to do so is not fatal. Inconsistent segment boundaries just result in
   not being able to use data that may have been cached earlier.

5.1.2. Processing Acknowledgments

   If an incoming acknowledgment arrives without the Content Request
   Option, it is processed normally.

   If an acknowledgment contains the Content Request Option, the sender
   updates its local state in the following way: if the sum of the 'Next
   Offset' field and SND.Content-Start points to a sequence number that
   is later than the value of SND.NXT in the TCP sender's local state,
   the value of SND.NXT is set to the sum of 'Next Offset' and
   SND.Content-Start. This avoids resending data that has already been
   sent by a segment cache in the network.

   It is possible that with a fast segment cache SND.NXT grows faster
   than the sending TCP application can write to the socket send buffer.
   This is considered a feature of CA-TCP.
   The applications are expected to explicitly enable CA-TCP, and to
   use content labels consistently for exactly the same data. If this
   principle is followed, this behavior does not cause problems.

   SND.UNA in TCP's local state is managed normally, based on the
   incoming TCP acknowledgments. TCP's retransmission algorithms
   operate normally, based on incoming duplicate acknowledgments and
   the retransmission timer. Retransmitted segments MAY contain the
   Content Label Option, if the content in question was assigned a
   label.

   TCP congestion control operates according to the normal rules
   [RFC5681]. However, when an incoming acknowledgment arrives, the TCP
   sender consults the 'CanSend (CS)' field before increasing the
   congestion window. The sender's congestion window MUST NOT be
   increased by more than the value of CS multiplied by SND.MSS. When
   data is delivered from a segment cache, it is common that the CS
   value in an incoming acknowledgment is 0. In this case the TCP
   sender does not increase the congestion window at all (not even by
   a fraction, if in congestion avoidance). Note that this makes the
   TCP sender behavior more conservative than in the normal case.

   When data is delivered from the segment caches, it is possible that
   the TCP sender receives acknowledgments for data it has not yet
   sent. Such data SHOULD be discarded from the socket send buffer
   without transmitting it. Sampling the round-trip time and updating
   the RTO estimate is impossible in such cases. Otherwise the RTO
   estimate is maintained in the normal way [RFC6298]. The RTO timer
   SHOULD be reset based on the currently estimated value on each new
   acknowledgment that advances the window, to avoid spurious
   retransmission timeouts during long periods of cached data
   delivery.

5.2. Receiver Behavior

   The TCP receiver operates in the normal way, and does not need to be
   aware of CA-TCP, or be able to parse the options.
   The receiver ignores the options and their content, and delivers the
   data to the application normally. A receiver can also operate as a
   CA-TCP Controller, adding the Content Request options to outgoing
   acknowledgments.

5.3. Controller Behavior

   The CA-TCP Controller is a node that processes normal TCP
   acknowledgments and adds the Content Request option to them, if
   needed. The Controller can be co-located with a segment cache, or a
   TCP receiver can act as a controller. Because segment caches operate
   based on the Content Request options, an ideal location for the
   controller is close to the receiver: segment caches located between
   the controller and the receiver never see a Content Request option
   and therefore cannot be used for caching. In order to process the
   acknowledgments appropriately, the controller needs to maintain some
   per-flow state, as described below.

   The Controller maintains some flow-specific data for each flow with
   potentially cachable data, in a structure we call the "flow table".
   The flow table contains the following data for each active flow.

   o Source and Destination IP address and port -- for identifying a
     flow.

   o Content.Label -- Content label currently in transit in a flow.
     This can be 0, if no labeled content is currently in transit.

   o Content.Offset -- The TCP sequence number at the start of the
     currently transmitted content object.

   o Content.Next -- The next untransmitted content byte in the flow,
     relative to the beginning of the content.

   o Congestion control state -- see Section 5.3.1 for discussion on
     congestion control at the Controller.

   When a TCP SYN segment with the CA-TCP Enabled option arrives at the
   controller, it creates a new entry in the flow table using the source
   and destination IP address and TCP port, and initializes the other
   parameters for the flow.
   Storing the flow table entry is optional: the controller may choose
   to ignore the option, for example because of resource constraints or
   a policy. In this case the TCP connection continues to operate
   normally, without segment caching.

   When a TCP SYN-ACK segment arrives at the controller (or is sent by a
   receiver that also acts as a controller), and the controller has the
   corresponding flow state for the flow, it SHOULD add the CA-TCP
   Enabled option to the segment before passing it forward, with the
   option kind set to 254, as described in Section 4.1.

   When a TCP data segment with the Content Label option arrives at the
   controller, and it has the corresponding flow state initialized, the
   controller stores the currently transmitted content label in the
   appropriate flow table entry if it is different from the earlier
   stored content label. The controller also stores (or verifies) the
   TCP sequence number at the start of the content in the flow table
   ('Content.Offset'). 'Content.Offset' is calculated by subtracting
   the Offset field of the Content Label option ('Option.Offset') from
   the TCP sequence number in the current segment. This information is
   needed to build correct Content Request options for TCP
   acknowledgments for the same TCP flow. If Option.Offset + TCP.Length
   (the length of the TCP segment payload) is higher than
   'Content.Next' in the local flow table, the controller sets
   Content.Next to Option.Offset + TCP.Length. Note that for a
   correctly working connection, the difference between the offset and
   the TCP sequence number should not change during the transmission of
   the content. However, it is possible that the TCP sender starts
   transmission of content from an offset other than 0.

   When a TCP data segment without a Content Label arrives, and the flow
   is found in the flow table, the controller erases the current label
   from the table, if one was stored.
   If labeled content was previously in transit, but a segment without
   a content label option is received, that indicates that the
   transmission of labeled content is finished (for example, the
   transmission of an HTTP body is over, and data for a new HTTP
   request arrives).

   When a TCP acknowledgment arrives that does NOT yet have a Content
   Request option, but corresponds to a flow that is stored in the flow
   table, the controller checks the flow table to see whether labeled
   content is currently in transit. If so, it adds a Content Request
   Option containing the current content label to the segment. The
   controller also copies the 'Content.Next' value from the local flow
   state to the 'Next Offset' field of the TCP Option, and sets the
   'TCP Sequence' field in the option to Content.Offset + Content.Next.
   Now the contents of the option refer to the next content offset and
   TCP sequence number that can be transmitted either by the segment
   caches, or by the TCP sender. The Controller also sets the CS
   (CanSend) field using the congestion control algorithm described
   below. After this the Controller forwards the acknowledgment.

   When a TCP acknowledgment arrives that already carries a Content
   Request option, the Controller does not modify it and simply
   forwards the acknowledgment. In this case there is another
   controller on the downstream path that already manages the TCP flow.

5.3.1. Congestion Control

   The Controller MUST take care of congestion control for the flows
   that are maintained in the flow table. A simplified congestion
   control algorithm is considered sufficient, because the original TCP
   sender is responsible for retransmitting data and ultimately
   maintains the normal sender-side congestion control algorithm.

   For example, the following algorithm is sufficient.

   o The controller maintains a congestion window for each flow that
     indicates how many segments are allowed to be in transit. The
     congestion window is initialized to 3.

   o When an incoming acknowledgment advances the window, the
     congestion window is increased according to the congestion
     avoidance algorithm.

   o When three consecutive duplicate acknowledgments arrive at the
     controller, the congestion window is halved.

   The controller then compares the current congestion window (cwnd) to
   the number of outstanding unacknowledged TCP segments for the flow
   (FlightSize), and sets the CanSend field in the outgoing Content
   Request option to cwnd - FlightSize. If the difference is more than
   2, the CanSend field SHOULD NOT be set to a value larger than 2, to
   limit bursts.

   The controller does not manage a retransmission timer or take RTT
   samples. If a retransmission timeout is required, the TCP sender
   takes care of the needed retransmissions.

5.4. Segment Cache Behavior

   The segment cache intercepts arriving TCP segments with a Content
   Request option, and transmits segment(s) from its cache if the
   requested segments can be found, based on the content label and Next
   Offset fields in the option.

   If a segment cache has a segment in storage that starts at the
   content offset indicated by the Next Offset field AND that has the
   same content label, AND if the CS field is larger than 0, the segment
   cache can transmit the cached segment towards the receiver that sent
   the acknowledgment. The TCP header is built based on the incoming
   acknowledgment, and the TCP sequence number is copied from the
   Content Request option. The ACK flag is not set in the segment, and
   the advertised window is set to a small value (exact value TBD),
   because the segment cache does not know the state of the buffer at
   the TCP end host.
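
   The lookup condition above, together with the per-segment update of
   the CS, Next Offset, and TCP Sequence fields that this section
   specifies, can be sketched roughly as follows. This is an
   illustration under stated assumptions rather than a reference
   implementation: the ContentRequest class, the serve_from_cache
   function, and the dictionary-based cache keyed by (label, offset)
   are hypothetical, construction of the outgoing TCP header is
   omitted, and wrapping the TCP sequence number at 32 bits is an
   assumption based on normal TCP sequence arithmetic.

```python
from dataclasses import dataclass


@dataclass
class ContentRequest:
    """Fields of the Content Request Option (Section 4.3)."""
    label: bytes       # 8-byte Content Label
    next_offset: int   # Next Offset, relative to the start of content
    tcp_seq: int       # TCP Sequence for the next data segment
    can_send: int      # CS (CanSend) field


def serve_from_cache(cache: dict, req: ContentRequest) -> list:
    """Return (tcp_seq, payload) pairs to send; update req in place.

    `cache` maps (label, offset) to the payload of a cached segment
    that starts at that content offset.
    """
    out = []
    while req.can_send > 0:
        payload = cache.get((req.label, req.next_offset))
        if not payload:
            break  # cache miss: upstream caches or the sender take over
        out.append((req.tcp_seq, payload))
        req.can_send -= 1                   # one fewer segment allowed
        req.next_offset += len(payload)     # advance content offset
        # Assumption: sequence numbers wrap at 32 bits as in normal TCP.
        req.tcp_seq = (req.tcp_seq + len(payload)) % 2**32
    return out
```

   After the loop ends, the cache would forward the acknowledgment with
   the updated option fields towards the TCP sender, so that upstream
   nodes do not transmit the same data again.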
After this the CS value in the Content Request option is decremented
by one, and the Next Offset and the TCP sequence are incremented by
the length of the cached data segment payload. If the CS value is
still larger than 0 and the next segment is in the cache, the segment
cache can send another segment using the same procedure, again
updating the option fields accordingly.

After the segment cache is done with processing the Content Request
option, it forwards the segment towards the TCP sender.

6. Open Issues

6.1. Correctness of Data

Using content labels gives a malicious data source a tool to inject
data under a false content label unless some measures are taken to
verify the validity of the content at the receiver. Many applications
have such verification mechanisms built in, but there is currently no
common transport-layer mechanism to check the correctness of labeled
content. A sending application must also use the content labels
consistently: if any part of the content is changed, the label must be
changed.

6.2. Consistent Segmentation

The potential benefit of caching depends on having consistent
segmentation of data. If the segment boundaries vary between
connections, the efficiency and cache hit rate suffer.

6.3. Interaction with Middleboxes

Network address translation is not expected to affect the behavior of
CA-TCP, and CA-TCP has been tested with some NAT implementations.
However, some middleboxes may perform in-window checks that filter out
acknowledgments that appear to arrive out of window. This may easily
happen with CA-TCP, because acknowledgments may arrive for data that
was not sent by the original sender. Some middleboxes may alter
segment boundaries, leading to problems similar to those discussed
above for consistent segmentation.

6.4.
Other Issues

The control loop between a cache and a receiver may be significantly
faster than the loop between the TCP end hosts. The sender can
interrupt the transmission of data at any time by sending a segment
without a content label. However, while that segment is in transit, a
cache may transmit new data if there are cache hits. Currently we
believe this is not a critical problem for labeled content.

CA-TCP requires a good portion of TCP option space for labeled data.
There are also other enhancements that are hungry for TCP option
space, such as Multipath TCP [I-D.ietf-mptcp-multiaddressed], and the
option space limitation of 40 bytes prevents the different
enhancements from being used together. One solution to this problem
would be to come up with enhancements to extend the available option
space, such as those discussed in [I-D.eddy-tcp-loo].

If the communication path changes during a TCP connection, the traffic
may bypass the original controller. This is not a fatal problem; it
just causes the rest of the TCP connection to be transmitted between
the sender and the receiver.

7. Security Considerations

The main security issue is related to the correctness of data, as
discussed in Section 6.1. Good solutions for this problem are
currently being investigated.

8. Acknowledgments

This work has been partially funded by the PURSUIT EU project.

9. References

9.1. Normative References

[I-D.ietf-tcpm-experimental-options]
           Touch, J., "Shared Use of Experimental TCP Options",
           draft-ietf-tcpm-experimental-options-00 (work in
           progress), January 2012.

[RFC0793]  Postel, J., "Transmission Control Protocol", STD 7,
           RFC 793, September 1981.

[RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
           Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC5681]  Allman, M., Paxson, V., and E.
           Blanton, "TCP Congestion Control", RFC 5681,
           September 2009.

[RFC6298]  Paxson, V., Allman, M., Chu, J., and M. Sargent,
           "Computing TCP's Retransmission Timer", RFC 6298,
           June 2011.

9.2. Informative References

[I-D.eddy-tcp-loo]
           Eddy, W. and A. Langley, "Extending the Space Available
           for TCP Options", draft-eddy-tcp-loo-04 (work in
           progress), July 2008.

[I-D.ietf-mptcp-multiaddressed]
           Ford, A., Raiciu, C., Handley, M., and O. Bonaventure,
           "TCP Extensions for Multipath Operation with Multiple
           Addresses", draft-ietf-mptcp-multiaddressed-06 (work in
           progress), January 2012.

Authors' Addresses

Pasi Sarolahti
Aalto University
Department of Communications and Networking
P.O. Box 13000
FI-00076 Aalto
Finland

Email: pasi.sarolahti@iki.fi

Joerg Ott
Aalto University
Department of Communications and Networking
P.O. Box 13000
FI-00076 Aalto
Finland

Email: jo@netlab.hut.fi

Colin Perkins
University of Glasgow
School of Computing Science
Glasgow G12 8QQ
United Kingdom

Email: csp@csperkins.org