idnits 2.17.1 draft-sabatini-tcp-sack-00.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- == The page length should not exceed 58 lines per page, but there was 9 longer pages, the longest (page 2) being 60 lines == It seems as if not all pages are separated by form feeds - found 0 form feeds but 10 pages Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references. ** The document seems to lack a both a reference to RFC 2119 and the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. RFC 2119 keyword, line 177: '... function and is MUST be coded as 0 cu...' RFC 2119 keyword, line 180: '... This option MUST NOT be transmitted...' RFC 2119 keyword, line 264: '...he data receiver MAY elect to generate...' RFC 2119 keyword, line 266: '...circumstance, it MUST generate them un...' RFC 2119 keyword, line 268: '...iven connection, it MUST NOT send SACK...' (13 more instances...) Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document date (February 28, 2012) is 4413 days in the past. Is this intentional? 
Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'Fall95' is mentioned on line 84, but not defined == Missing Reference: 'RFC1323' is mentioned on line 321, but not defined ** Obsolete undefined reference: RFC 1323 (Obsoleted by RFC 7323) == Missing Reference: 'Stevens94' is mentioned on line 380, but not defined == Missing Reference: 'Floyd96' is mentioned on line 417, but not defined ** Obsolete normative reference: RFC 1323 (ref. 'Jacobson92') (Obsoleted by RFC 7323) ** Obsolete normative reference: RFC 793 (ref. 'Postel81') (Obsoleted by RFC 9293) Summary: 6 errors (**), 0 flaws (~~), 7 warnings (==), 1 comment (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group A. Sabatini 3 Internet-Draft Broker Communications Inc. 4 Intended Status: Standards Track . 5 Expires: September 1, 2012 February 28, 2012 7 Highly Efficient Selective Acknowledgement (SACK) for TCP 8 draft-sabatini-tcp-sack-00 10 Status of this Memo 12 This Internet-Draft is submitted in full conformance with the 13 provisions of BCP 78 and BCP 79. Internet-Drafts are working 14 documents of the Internet Engineering Task Force (IETF), its areas, 15 and its working groups. Note that other groups may also distribute 16 working documents as Internet-Drafts. 18 Internet-Drafts are draft documents valid for a maximum of six months 19 and may be updated, replaced, or obsoleted by other documents at any 20 time. It is inappropriate to use Internet-Drafts as reference 21 material or to cite them other than as "work in progress." 
23 The list of current Internet-Drafts can be accessed at 24 http://www.ietf.org/1id-abstracts.html 26 The list of Internet-Draft Shadow Directories can be accessed at 27 http://www.ietf.org/shadow.html 29 This Internet-Draft will expire on September 1, 2012. 31 Comments are solicited and should be addressed to the author at 32 draft-sack@tsabatini.com. 34 Copyright Notice 36 Copyright (c) 2012 IETF Trust and the persons identified as the 37 document authors. All rights reserved. 39 This document is subject to BCP 78 and the IETF Trust's Legal 40 Provisions Relating to IETF Documents 41 (http://trustee.ietf.org/license-info) in effect on the date of 42 publication of this document. Please review these documents 43 carefully, as they describe your rights and restrictions with respect 44 to this document. Code Components extracted from this document must 45 include Simplified BSD License text as described in Section 4.e of 46 the Trust Legal Provisions and are provided without warranty as 47 described in the Simplified BSD License. 49 Abstract 50 This memo expands on the Selective Acknowledgement Protocol described 51 in RFC2018 to improve its performance and efficiency while reducing 52 the delay involved in recovering lost segments. This leads to very 53 reliable and efficient communications regardless of transit delay or 54 high levels of lost segments due to noise or congestion. It 55 introduces a fundamentally new way of looking at Selective 56 Acknowledgement and uses this concept to improve the performance of 57 the RFC2018 protocol. This memo proposes an implementation of the 58 improved SACK and discusses its performance and related issues. 60 Acknowledgements 62 Much of the text in this document is taken directly from RFC2018 "TCP 63 Selective Acknowledgement Options" by M. Mathis, J. Mahdavi, S. Floyd 64 and A. Romanow and RFC1072 "TCP Extensions for Long-Delay Paths" by 65 B. Braden and V. Jacobson. 67 1.
Introduction 69 This revision to the SACK protocol has its roots in a similar, HDLC 70 based protocol I designed and implemented for secure financial 71 transactions. That protocol, being designed for use on a worldwide 72 basis, was born out of the need for a protocol that would handle any 73 communications environment no matter how noisy or how much delay 74 (including multiple satellite hops) was in the path. In later years 75 its properties were found valuable in congestion situations where 76 packets were dropped. 78 Multiple packet losses from a window of data can have a catastrophic 79 effect on TCP throughput. TCP [Postel81] uses a cumulative 80 acknowledgment scheme in which received segments that are not at the 81 left edge of the receive window are not acknowledged. This forces 82 the sender to either wait a roundtrip time to find out about each 83 lost packet, or to unnecessarily retransmit segments which have been 84 correctly received [Fall95]. With the cumulative acknowledgment 85 scheme, multiple dropped segments generally cause TCP to lose its 86 ACK-based clock, reducing overall throughput. 88 Selective Acknowledgment (SACK) is a strategy which corrects this 89 behavior in the face of multiple dropped segments. With selective 90 acknowledgments, the data receiver can inform the sender about all 91 segments that have arrived successfully, so the sender need 92 retransmit only the segments that have actually been lost. 94 I propose modifications to the SACK options as proposed in RFC2018. 95 Specifically, I add a transmit state to each transmitted message and 96 return that transmit state when each acknowledgement is sent. By 97 using the returned transmit state I can tell what messages have been 98 transmitted after the information in the acknowledgement and thus 99 rebuild the state of the receiver at the transmitter. 
I also propose 100 changes to the way SACK blocks are reported to ensure that the 101 oldest, and thus the most critical, are transmitted expeditiously. 102 Additionally, since the space to store acknowledgements in IPv4 is 103 limited and may not be able to accommodate all of the acknowledgement 104 pairs, I propose a method of sending the complete receiver state by 105 sending multiple acknowledgements. 107 The RFC2018 selective acknowledgment extension uses two TCP options. 108 The first is an enabling option, "SACK-permitted", which may be sent 109 in a SYN segment to indicate that the SACK option can be used once 110 the connection is established. This option is extended both to 111 indicate that this newer version of the protocol is being used and to 112 establish an initial value for transmit state. The other is the SACK 113 option itself, which may be sent over an established connection once 114 permission has been given by SACK-permitted. This has also been 115 extended to add both the transmit state implicit in the message and 116 the transmit state that was received at the far end (now called 117 "Receive State"). 119 The SACK option is to be included in a segment sent from a TCP that 120 is receiving data to the TCP that is sending that data; we will refer 121 to these TCPs as the data receiver and the data sender, 122 respectively. We will consider a particular simplex data flow; any 123 data flowing in the reverse direction over the same connection can be 124 treated independently. 126 2. Underlying concepts 128 In order for a sender to know how to optimally transmit messages to a 129 receiver, the sender must recreate the state of the receiver as of the 130 last acknowledgement received (which segments have been received and 131 acknowledged, which segments have not) and then "age" or modify that 132 state by updating it based upon the messages transmitted since the 133 state implicit in the acknowledgement was current.
In order to do 134 this the sender must maintain a transmission order buffer which lists 135 the segment ranges of each message as it is sent. We call the 136 index into the transmission order buffer "Send State" and transmit 137 this state variable with each message. The receiver, after correctly 138 receiving the message, saves this value and returns it (now called 139 "Receive State") and the list of selectively acknowledged segments 140 with each acknowledgement. When the sender receives this information 141 it is then capable of constructing a list of missing segments by 142 taking its unacknowledged segment range list, modifying it on the 143 basis of the received selective acknowledgements, and then removing 144 from that list all segments that have been transmitted since the 145 message which caused the acknowledgement, that is, all segments sent 146 with indexes between the current "Send State" and the "Receive State" 147 in the acknowledgement message. 149 To accommodate the issue of receiving segments out of order at the 150 receiver, or those packets delayed by alternate routing, the receiver 151 does not instantly update the received state value (which could 152 trigger a false retransmission) but rather puts it on a timer queue 153 for a length of time appropriate to the delay randomness in the 154 arrival path (typically 40 to 200 ms based on media, speed and 155 distance); when the timer entry expires, the received state value 156 is updated. If the received state value, when returned 157 to the sender and processed, shows blocks that remain unacknowledged 158 after this timeout, they are assumed to be lost and are queued 159 for retransmission.
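As a rough illustration of the delay queue described above, the following Python sketch holds each arriving Send State for a fixed interval before adopting it as the Receive State. The class name, the method names, the caller-supplied millisecond clock, and the use of a single fixed hold time are assumptions made for this example only; this memo does not prescribe a data structure, and wraparound of the 16-bit state value is not handled.

```python
import heapq


class ReceiveStateDelayQueue:
    """Hypothetical receiver-side helper: delays adoption of Send State values."""

    def __init__(self, hold_ms=100):
        self.hold_ms = hold_ms      # assumed hold time, within the 40-200 ms range
        self.pending = []           # min-heap of (release_ms, arrival_no, send_state)
        self.arrivals = 0           # tie-breaker that preserves arrival order
        self.receive_state = 0      # value echoed in outgoing SACK options

    def on_segment(self, send_state, now_ms):
        """Queue the Send State carried by a correctly received segment."""
        heapq.heappush(self.pending,
                       (now_ms + self.hold_ms, self.arrivals, send_state))
        self.arrivals += 1

    def poll(self, now_ms):
        """Adopt every entry whose timer has expired.

        Returns True if Receive State changed, which is the event that
        should prompt the receiver to send its SACK block list.
        """
        changed = False
        while self.pending and self.pending[0][0] <= now_ms:
            _, _, state = heapq.heappop(self.pending)
            if state != self.receive_state:
                self.receive_state = state
                changed = True
        return changed


queue = ReceiveStateDelayQueue(hold_ms=100)
queue.on_segment(7, now_ms=0)
queue.on_segment(9, now_ms=30)
queue.poll(now_ms=50)    # nothing has cleared the queue yet
queue.poll(now_ms=100)   # Receive State becomes 7
queue.poll(now_ms=130)   # Receive State becomes 9
```

With a fixed hold time, entries leave the queue in arrival order, so the value adopted last is always that of the most recently received segment that has cleared the queue.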
161 Thus by transmitting the complete acknowledgement information (SACK 162 blocks) from the receiver along with an indicator to the sender as to 163 its state current at the time of the acknowledgement, the sender can 164 accurately recreate the current status of the receiver assuming all 165 "in flight" messages were received and thus only send the 166 unacknowledged messages starting with the oldest along with any new 167 messages whose retransmission is requested. 169 3. Sack-Permitted Option 171 This six-byte option may be sent in a SYN by a TCP that has been 172 extended to receive (and presumably process) the Improved SACK 173 option. 175 The presence of the additional four bytes differentiates the Improved 176 SACK from the earlier protocol. Although Receive State currently serves no 177 function in this option and MUST be coded as 0, it is left to further 178 study whether it can be utilized in link reconnection after failure. 180 This option MUST NOT be transmitted on non-SYN segments in the 181 current protocol; it is left to future study as to its use for 182 transmitting long sequences of acknowledgements in one frame. 184 TCP Sack-Permitted Option: 186 Kind: 4 188 +--------+--------+ 189 | Kind=4 |Length=6| 190 +--------+--------+--------+--------+ 191 | Send State | Receive State | 192 +--------+--------+--------+--------+ 194 4. Sack Option Format 196 The SACK option is to be used to convey extended acknowledgment 197 information from the receiver to the sender over an established TCP 198 connection. 200 TCP SACK Option: 202 Kind: 5 204 Length: Variable 206 +--------+--------+ 207 | Kind=5 | Length | 208 +--------+--------+--------+--------+ 209 | Send State | Receive State | 210 +--------+--------+--------+--------+ 211 | Left Edge of 1st Block | 212 +--------+--------+--------+--------+ 213 | Right Edge of 1st Block | 214 +--------+--------+--------+--------+ 215 | | 216 / . . . 
/ 217 | | 218 +--------+--------+--------+--------+ 219 | Left Edge of nth Block | 220 +--------+--------+--------+--------+ 221 | Right Edge of nth Block | 222 +--------+--------+--------+--------+ 224 The SACK option is to be sent by a data receiver to inform the data 225 sender of non-contiguous blocks of data that have been received and 226 queued. The data receiver awaits the receipt of data (perhaps by 227 means of retransmissions) to fill the gaps in sequence space between 228 received blocks. When missing segments are received, the data 229 receiver acknowledges the data normally by advancing the left window 230 edge in the Acknowledgement Number Field of the TCP header. The SACK 231 option does not change the meaning of the Acknowledgement Number 232 field. 234 This option contains a list of some of the blocks of contiguous 235 sequence space occupied by data that has been received and queued 236 within the window. 238 Each contiguous block of data queued at the data receiver is defined 239 in the SACK option by two 32-bit unsigned integers in network byte 240 order: 242 * Left Edge of Block 244 This is the first sequence number of this block. 246 * Right Edge of Block 248 This is the sequence number immediately following the last 249 sequence number of this block. 251 Each block represents received bytes of data that are contiguous and 252 isolated; that is, the bytes just below the block, (Left Edge of 253 Block - 1), and just above the block, (Right Edge of Block), have not 254 been received. 256 A SACK option that specifies n blocks will have a length of 8*n+6 257 bytes, so the 40 bytes available for TCP options can specify a 258 maximum of 4 blocks. It is suggested that the Improved SACK will 259 provide the timestamp information used for RTTM [Jacobson92]. 261 5. 
Generating Sack Options: Data Receiver Behavior 263 If the data receiver has received a SACK-Permitted option on the SYN 264 for this connection, the data receiver MAY elect to generate SACK 265 options as described below. If the data receiver generates SACK 266 options under any circumstance, it MUST generate them under all 267 permitted circumstances. If the data receiver has not received a 268 SACK-Permitted option for a given connection, it MUST NOT send SACK 269 options on that connection. 271 If sent at all, SACK options MUST be included in all ACKs which do 272 not ACK the highest sequence number in the data receiver's queue. In 273 this situation the network has lost or mis-ordered data, such that 274 the receiver holds non-contiguous data in its queue. RFC 1122, 275 Section 4.2.2.21, discusses the reasons for the receiver to send ACKs 276 in response to additional segments received in this state. The 277 receiver MUST send an ACK for every valid segment that arrives 278 containing new data, and each of these "duplicate" ACKs SHOULD bear a 279 SACK option. 281 The purpose of the SACK blocks is to recreate the status of the 282 receiver at the transmitter. To that end the most important 283 information is (1) new or changed blocks, (2) the second transmission 284 of new or changed blocks, (3) a complete enumeration of all received 285 blocks starting from the oldest first. Since the SACK option field 286 may not have enough space for all blocks outstanding the receiver 287 will continue to issue acknowledgements until all blocks are 288 transmitted. In order to implement the SACK option a flag must be 289 kept with each block indicating whether it has been sent a second 290 time. 292 If the data receiver chooses to send a SACK option, the following 293 rules apply: 295 * The data receiver first fills in "Send State" in the option from 296 the current value of its "Send State". 
The data receiver then 297 fills in "Receive State" from the "Send State" of the SACK option 298 of the last TCP packet received that has cleared the delay queue. 300 * The first SACK block (i.e., the one immediately following the 301 Send State and Receive State fields in the option) MUST specify the contiguous 302 block of data containing the segment which triggered this ACK, 303 unless that segment advanced the Acknowledgment Number field in 304 the header. This assures that the ACK with the SACK option 305 reflects the most recent change in the data receiver's buffer 306 queue. 308 * The data receiver SHOULD include as many distinct SACK blocks as 309 possible in the SACK option. Note that the maximum available 310 option space may not be sufficient to report all blocks present in 311 the receiver's queue. 313 * The second SACK block SHOULD be filled out by repeating the most 314 recently reported SACK block (based on first SACK blocks in 315 previous SACK options) that is not a subset of a SACK block 316 already included in the SACK option being constructed and that 317 has not previously been retransmitted. This assures that in 318 normal operation, any segment remaining part of a non-contiguous 319 block of data held by the data receiver is reported in at least 320 two successive SACK options, even for large-window TCP 321 implementations [RFC1323]. 323 * Subsequent SACK blocks SHOULD be filled with other outstanding 324 SACK blocks on the list, cycling from the earliest to the latest 325 and then starting again with the earliest. Whenever the list 326 changes, sufficient acknowledgements must be sent to ensure that 327 all SACK blocks are transmitted. 329 * Upon any change to the received state value, if the receiver is 330 not currently transmitting data or ACK packets, the receiver will 331 initiate sending sufficient data or ACK packets to 332 transmit its complete SACK block list based on the rules above.
334 * A timer is maintained that is one quarter of the expected round 335 trip delay (typically 250 ms). This timer is set when the last 336 acknowledgement is transmitted by the receiver. At the expiration 337 of this timer, if there are still segments that have not been 338 retransmitted, the receiver again sends sufficient acknowledgements 339 to completely transmit all current SACK blocks. 341 6. Interpreting the Sack Option and Retransmission Strategy: Data 342 Sender Behavior 344 When receiving an ACK containing a SACK option, the data sender MUST 345 record the selective acknowledgment for future reference. The data 346 sender is assumed to have a retransmission queue that contains the 347 segments that have been transmitted but not yet acknowledged, in 348 sequence-number order. If the data sender performs re-packetization 349 before retransmission, the block boundaries in a SACK option that it 350 receives may not fall on boundaries of segments in the retransmission 351 queue; however, this does not pose a serious difficulty for the 352 sender. 354 One possible implementation of the sender's behavior is as follows. 355 Upon receiving an acknowledgement the sender first eliminates from the list all 356 saved SACK blocks which have now been acknowledged by 357 the Acknowledgement Number in the TCP header. The sender then adds the SACK blocks from the 358 current acknowledgement to the SACK block list, eliminating any that 359 have been combined. The sender then constructs a list of 360 unacknowledged blocks by creating a block for each gap in sequence. 361 The sender then takes the Receive State from the message and uses 362 the list of blocks that have been transmitted since that state was 363 generated to delete members from the unacknowledged list. The sender 364 finally sets the updated unacknowledged list as the list of blocks to 365 be sent, oldest first.
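The steps of this possible implementation can be sketched in Python as shown below. This is an illustration under stated assumptions, not a normative algorithm: the function names, the dictionary `sent_log` (standing in for the transmission order buffer, keyed by Send State index), and the use of half-open (left, right) ranges are all hypothetical, Send State is taken to be the index of the most recently sent message, and wraparound of sequence numbers and of the 16-bit state values is ignored.

```python
def merge(blocks):
    """Coalesce overlapping or adjacent half-open (left, right) ranges."""
    merged = []
    for left, right in sorted(blocks):
        if merged and left <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], right))
        else:
            merged.append((left, right))
    return merged


def subtract(blocks, holes):
    """Return the parts of blocks not covered by any range in holes."""
    holes = merge(holes)
    remaining = []
    for left, right in blocks:
        cursor = left
        for hole_left, hole_right in holes:
            if hole_right <= cursor or hole_left >= right:
                continue  # hole does not touch what is left of this block
            if hole_left > cursor:
                remaining.append((cursor, hole_left))
            cursor = max(cursor, hole_right)
        if cursor < right:
            remaining.append((cursor, right))
    return remaining


def process_ack(ack_no, snd_nxt, saved_sacks, new_sacks,
                sent_log, receive_state, send_state):
    """Return (updated SACK block list, blocks to retransmit oldest first)."""
    # Steps 1 and 2: discard or trim blocks covered by the cumulative ACK,
    # then add the newly reported blocks, combining any that touch.
    candidates = list(saved_sacks) + list(new_sacks)
    sacked = merge((max(l, ack_no), r) for l, r in candidates if r > ack_no)
    # Step 3: every gap between ack_no and snd_nxt is unacknowledged.
    missing = subtract([(ack_no, snd_nxt)], sacked)
    # Step 4: ranges sent after the echoed Receive State are assumed to be
    # still in flight, so they are removed from the unacknowledged list.
    in_flight = [sent_log[i] for i in range(receive_state + 1, send_state + 1)]
    # Step 5: what remains is sent in ascending sequence order.
    return sacked, subtract(missing, in_flight)


# Five 100-byte messages were sent; the second was lost, and the ACK
# echoes Receive State 4 while the sender's current Send State is 5.
sent_log = {1: (0, 100), 2: (100, 200), 3: (200, 300),
            4: (300, 400), 5: (400, 500)}
sacked, resend = process_ack(100, 500, [], [(200, 400)], sent_log, 4, 5)
```

In this example only the range (100, 200) is scheduled for retransmission: the range (400, 500) is also unacknowledged, but it was sent after the state echoed in the acknowledgement and is therefore treated as in flight.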
367 After a retransmit timeout the data sender SHOULD delete all saved 368 SACK blocks, since under normal circumstances the acknowledgements 369 from the other end should have prevented the timeout. The data 370 sender MUST start the retransmit with the segment at the left edge of 371 the window after a retransmit timeout. A segment will not be 372 dequeued and its buffer freed until the left window edge is advanced 373 over it. 375 6.1 Congestion Control Issues 377 This document does not attempt to specify in detail the congestion 378 control algorithms for implementations of TCP with SACK. However, 379 the congestion control algorithms present in the de facto standard 380 TCP implementations MUST be preserved [Stevens94]. This algorithm 381 eliminates much unnecessary retransmission and so is likely to lessen 382 overall congestion. 384 The use of time-outs as a fall-back mechanism for detecting dropped 385 packets is unchanged by the SACK option. Because in normal operation 386 acknowledgements will prevent retransmit timeout, when a retransmit 387 timeout occurs the data sender MUST ignore prior SACK information in 388 determining which data to retransmit. 390 Future research into congestion control algorithms may take advantage 391 of the additional information provided by SACK. One such area for 392 future research concerns modifications to TCP for a wireless or 393 satellite environment where packet loss is not necessarily an 394 indication of congestion. 396 7. Efficiency and Worst Case Behavior 398 Although this high efficiency improved SACK option sends more and 399 larger SACK blocks and more acknowledgements than the previous 400 version, with an active bi-directional link additional 401 acknowledgements are often associated with data transmission and thus 402 are not a penalty. If the SACK option needs to be used due to segment 403 loss then the improved efficiency afforded by this protocol more 404 than justifies the additional SACK blocks.
406 The deployment of other TCP options may reduce the number of 407 available SACK blocks to 2 or even to 1. This will reduce the 408 redundancy of SACK delivery in the presence of lost ACKs. Even so, 409 the exposure of TCP SACK in regard to the unnecessary retransmission 410 of packets is strictly less than the exposure of current 411 implementations of TCP. The worst-case conditions necessary for the 412 sender to needlessly retransmit data are discussed in more detail in a 413 separate document [Floyd96]. 415 Older TCP implementations which do not have the SACK option will not 416 be unfairly disadvantaged when competing against SACK-capable TCPs. 417 This issue is discussed in more detail in [Floyd96]. 419 8. Timestamping 421 One pleasant benefit of having a token which is returned by the far 422 end on a deterministic basis is the easy calculation of round trip 423 delay. We can save a time stamp along with the segment information 424 in our transmission order array. This allows us to calculate round 425 trip delay when we receive our "Receive State" value and use it to 426 access the timestamp. Since more than one received message might 427 have the same "Receive State" value, we zero the timestamp after use 428 to indicate that the value should not be used again. Note that if an 429 acknowledgement is lost we will calculate a longer delay than is 430 accurate; therefore we must smooth the returned values, typically 431 returning the smallest out of the last N where N is typically four. 433 9. Data Receiver Reneging 435 Since the Sender is recreating the state of the Receiver, the data 436 Receiver MUST NOT discard data in its queue once that data has been 437 reported in a SACK option. The Receiver is responsible for 438 allocating enough buffers so that the missing segments within the 439 window may be properly received and processed. 441 10. Security Considerations 443 This document neither strengthens nor weakens TCP's current security 444 properties.
446 11. References 448 [Jacobson88] Jacobson, V. and R. Braden, "TCP Extensions for Long- 449 Delay Paths", RFC 1072, October 1988. 451 [Jacobson92] Jacobson, V., Braden, R., and D. Borman, "TCP Extensions 452 for High Performance", RFC 1323, May 1992. 454 [Postel81] Postel, J., "Transmission Control Protocol - DARPA 455 Internet Program Protocol Specification", RFC 793, DARPA, September 456 1981. 458 Author's Address 460 Anthony Sabatini 461 Broker Communications Inc. 462 200 West 20th Street 463 Suite 1216 464 New York, NY 10011 465 Email: draft-sack@tsabatini.com 467 The author is currently a master's degree candidate at - 469 Hofstra University 470 Hempstead, N.Y. 472 His adviser is Dr. Xiang Fu