idnits 2.17.1

draft-stewart-sctp-pktdrprep-16.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section. (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == The document seems to lack the recommended RFC 2119 boilerplate, even if
     it appears to use RFC 2119 keywords.

     (The document does seem to have the reference to RFC 2119 which the
     ID-Checklist requires).

  -- The document seems to contain a disclaimer for pre-RFC5378 work, and may
     have content which was first submitted before 10 November 2008. The
     disclaimer is necessary when there are original authors that you have
     been unable to contact, or if some do not wish to grant the BCP78 rights
     to the IETF Trust. If you are able to get all authors (current and
     original) to grant those rights, you can and should remove the
     disclaimer; otherwise, the disclaimer is needed and you can ignore this
     comment. (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (January 15, 2014) is 3755 days in the past. Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Unused Reference: 'RFC2026' is defined on line 529, but no explicit
     reference was found in the text

  ** Obsolete normative reference: RFC 2960 (Obsoleted by RFC 4960)

     Summary: 2 errors (**), 0 flaws (~~), 3 warnings (==), 2 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

2    Network Working Group                                         R. Stewart
3    Internet-Draft                                            Adara Networks
4    Intended status: Standards Track                                  P. Lei
5    Expires: July 19, 2014                               Cisco Systems, Inc.
6                                                                   M. Tuexen
7                                          Univ. of Applied Sciences Muenster
8                                                            January 15, 2014

10       Stream Control Transmission Protocol (SCTP) Packet Drop Reporting
11                      draft-stewart-sctp-pktdrprep-16.txt

13   Abstract

15      This document describes a new chunk type for SCTP. This new chunk
16      type can be used by both endhosts and routers to report the loss of
17      SCTP datagrams due to errors in transmission or other drops not due
18      to congestion.

20   Status of This Memo

22      This Internet-Draft is submitted in full conformance with the
23      provisions of BCP 78 and BCP 79.

25      Internet-Drafts are working documents of the Internet Engineering
26      Task Force (IETF). Note that other groups may also distribute
27      working documents as Internet-Drafts. The list of current Internet-
28      Drafts is at http://datatracker.ietf.org/drafts/current/.

30      Internet-Drafts are draft documents valid for a maximum of six months
31      and may be updated, replaced, or obsoleted by other documents at any
32      time. It is inappropriate to use Internet-Drafts as reference
33      material or to cite them other than as "work in progress."

35      This Internet-Draft will expire on July 19, 2014.

37   Copyright Notice

39      Copyright (c) 2014 IETF Trust and the persons identified as the
40      document authors. All rights reserved.

42      This document is subject to BCP 78 and the IETF Trust's Legal
43      Provisions Relating to IETF Documents
44      (http://trustee.ietf.org/license-info) in effect on the date of
45      publication of this document. Please review these documents
46      carefully, as they describe your rights and restrictions with respect
47      to this document. Code Components extracted from this document must
48      include Simplified BSD License text as described in Section 4.e of
49      the Trust Legal Provisions and are provided without warranty as
50      described in the Simplified BSD License.

52      This document may contain material from IETF Documents or IETF
53      Contributions published or made publicly available before November
54      10, 2008. The person(s) controlling the copyright in some of this
55      material may not have granted the IETF Trust the right to allow
56      modifications of such material outside the IETF Standards Process.
57      Without obtaining an adequate license from the person(s) controlling
58      the copyright in such materials, this document may not be modified
59      outside the IETF Standards Process, and derivative works of it may
60      not be created outside the IETF Standards Process, except to format
61      it for publication as an RFC or to translate it into languages other
62      than English.

64   Table of Contents

66      1. Introduction
67      2. Conventions
68      3. Architectural Considerations
69      4. New Chunk Types
70        4.1. Packet Drop Chunk (PKTDROP)
71      5. Procedures
72        5.1. Sender of the packet drop
73          5.1.1. Middle box
74          5.1.2. End host
75        5.2. Receiver side
76      6. Security Considerations
77      7. Recommended Variables
78      8. Normative References
79      Authors' Addresses

81   1. Introduction

83      The modern Internet has a wide variety of link types. A vast
84      majority of these link types present a very low bit error rate. In
85      recent years, however, a large number of higher bit error links have
86      become more widespread, for example satellite, 802.11, and 3G
87      cellular, to name just a few. Often one of the segments in the
88      path will realize that it is going to drop a packet due to bit
89      errors. When a drop does occur due to an error other than
90      congestion, the drop will be mistakenly interpreted as congestion in
91      the network by any transport protocol.

93      This misinterpretation of feedback may cause an SCTP sender to
94      drastically underutilize a link. Depending on how severe the error
95      rate is, the sender may stay in a continual state of congestion
96      collapse, thus affecting performance in a very negative way over the
97      entire life of the association.

99      This draft proposes a new SCTP chunk type that can be used by a
100     sender to discover dropped packets in such a case. This chunk may
101     also be used by an SCTP receiver to report cases of window overrun or
102     received data that may have had bit errors.

104  2. Conventions

106     The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD,
107     SHOULD NOT, RECOMMENDED, NOT RECOMMENDED, MAY, and OPTIONAL, when
108     they appear in this document, are to be interpreted as described in
109     RFC2119 [RFC2119].

111  3. Architectural Considerations

113     The Packet Drop Reports (PKTDROP) can be generated by an SCTP
114     endpoint or a middle box.

116     The SCTP endpoint can inform its peer that it has received an SCTP
117     packet, but the CRC32c was wrong. The peer can retransmit this
118     packet and does not need to adapt the window for congestion control
119     because this packet loss is not related to congestion. It is also
120     possible for the endpoint to make clear that the receiver window was
121     overrun.

123     There are two scenarios where a middle box may send Packet Drop
124     Reports.

126     For the first scenario consider a middle box in the path between the
127     communicating SCTP endpoints (see Figure 1), which communicates with
128     a middle box peer. Please note that the middle box peer can
129     be located on the same physical device that also runs the SCTP stack
130     or run on a separate box providing a tunneling service. The
131     crucial point here is that there is some protocol running between
132     the middle box and the middle box peer.

134                   +------------+     +-----------------+
135           +-------| middle box +-----+ middle box peer +--------+
136           |       +------------+     +-----------------+        |
137           |                                                     |
138      +----+-----+                                         +-----+----+
139      |   SCTP   |                                         |   SCTP   |
140      | endpoint |                                         | endpoint |
141      +----------+                                         +----------+

143                                  Figure 1

145     If they run a protocol below SCTP which provides an acknowledgment
146     service in a way that the sending middle box knows that a packet was
147     not received by the middle box peer and the packet was not dropped
148     due to congestion, then the sending middle box can also send a Packet
149     Drop Report back to the sending SCTP endpoint. It can also indicate
150     the current status of the send queue and the bandwidth limit between
151     the middle boxes if applicable.

153     In the other scenario there is only one middle box involved, which
154     means that there is no middle box specific communication, as shown in
155     Figure 2.
In this case the middle box may want to send Packet Drop
156     Reports to report to the SCTP sender the amount of queued data and a
157     possible bandwidth limitation between the middle box and the SCTP
158     receiver.

160                                +------------+
161           +--------------------+ middle box +--------------------+
162           |                    +------------+                    |
163           |                                                      |
164      +----+-----+                                          +-----+----+
165      |   SCTP   |                                          |   SCTP   |
166      | endpoint |                                          | endpoint |
167      +----------+                                          +----------+

169                                  Figure 2

171  4. New Chunk Types

173     This section defines the new chunk type that will be used to report
174     dropped packets not due to congestion in the network. Figure 3
175     illustrates the new chunk type.

177     Chunk Type  Chunk Name
178     --------------------------------------------------------------
179     0x81        Packet Drop Chunk (PKTDROP)

181                                  Figure 3

183     It should be noted that the PKTDROP Chunk format requires the
184     receiver to ignore the chunk if it is not understood. This is
185     accomplished as described in RFC2960 [RFC2960] section 3.2 by the
186     use of the upper two bits of the chunk type.

188  4.1. Packet Drop Chunk (PKTDROP)

190     This chunk is used to communicate to the remote endpoint the
191     purposeful dropping of a packet which is NOT due to congestion or the
192     current state of a network bottleneck.
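
Section 4 relies on the chunk type encoding so that receivers which do not implement PKTDROP skip the chunk. A minimal sketch of that rule in C follows, assuming the action encoding of RFC 2960, Section 3.2 (the two highest-order bits of the chunk type select the action). The enum, macro, and function names are hypothetical and are not defined by this document.

```c
#include <stdint.h>

/* Action an endpoint takes for a chunk type it does not recognize,
 * selected by the two highest-order bits of the chunk type
 * (RFC 2960, Section 3.2). All names here are illustrative. */
enum unknown_chunk_action {
    STOP_AND_DISCARD         = 0, /* 00: stop processing, discard packet   */
    STOP_DISCARD_AND_REPORT  = 1, /* 01: as 00, and report in an ERROR     */
    SKIP_AND_CONTINUE        = 2, /* 10: skip this chunk, keep processing  */
    SKIP_CONTINUE_AND_REPORT = 3  /* 11: as 10, and report in an ERROR     */
};

#define PKTDROP_CHUNK_TYPE 0x81u  /* chunk type from Figure 3 */

static enum unknown_chunk_action unknown_chunk_action_for(uint8_t chunk_type)
{
    /* Keep only the two highest-order bits. */
    return (enum unknown_chunk_action)(chunk_type >> 6);
}
```

For 0x81 the two highest-order bits are 10, so an implementation without PKTDROP support skips the chunk silently and continues with the rest of the packet.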

194      0                   1                   2                   3
195      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
196     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
197     |  Type = 0x81  |  Flags=CTBM   |         Chunk Length          |
198     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
199     |                Link Bandwidth or Maximum Rwnd                 |
200     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
201     |                     Size of data on queue                     |
202     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
203     |       Truncated Length        |           Reserved            |
204     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
205     |                      Dropped SCTP Packet                      |
206     \              (No IP header Included - optional)               /
207     /                                                               \
208     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

210     Chunk Type : 8 bits - This value MUST be set to 0x81 for all packet
211     drop chunks.

213     Flags : 8 bits - The lower 4 bits of this field are used to identify
214     various properties about the packet report:

216     +--Bit--+-Set Value-+----------------Meaning---------
217     |   C   | 00001000  | This bit transforms the link
218     |       |           | bandwidth and queue size fields
219     |       |           | from a byte count to a packet
220     |       |           | count. This will be set only
221     |       |           | by middle boxes that cannot
222     |       |           | determine byte counts.
223     +-------+-----------+-------------------------------
224     |   T   | 00000100  | This bit informs the receiver
225     |       |           | the packet was truncated to
226     |       |           | fit. If this bit is set then the
227     |       |           | truncated length holds the
228     |       |           | original packet length (from the
229     |       |           | IP header).
230     +-------+-----------+-------------------------------
231     |   B   | 00000010  | This bit informs the receiver
232     |       |           | that a BAD CRC32c was detected
233     |       |           | by an SCTP endpoint.
234     +-------+-----------+-------------------------------
235     |   M   | 00000001  | This bit informs the receiver
236     |       |           | that the source of the packet is
237     |       |           | a middle box, not the endhost.
238     |       |           | This also tells the receiver to
239     |       |           | look for the peer's Verification Tag
240     |       |           | in the packet. This is equivalent
241     |       |           | to the T bit in an ABORT or
242     |       |           | SHUTDOWN COMPLETE packet.
243     +-------+-----------+-------------------------------

245     Chunk Length : 16 bits unsigned int - This value holds the length of
246     the chunk including the chunk header.

248     Link Bandwidth or Maximum Rwnd : 32 bits unsigned int - If the M bit
249     is set to '1', this value holds the bandwidth capacity in bytes per
250     second of the link the middle box is connected to (i.e., the bottleneck
251     bandwidth being sent towards). If the M bit is set to '0' then this
252     value holds the maximum allowable Rwnd of the peer. This value is
253     normally the same value as that found in the INIT or INIT-ACK's
254     a_rwnd field.

256     Size of data on queue : 32 bits unsigned int - This value represents
257     the current number of bytes of data on queue towards the link or
258     reader. In the case of a middle box (M bit set to '1'), this will
259     inform the receiver how much data is currently in queue towards the
260     bottleneck. If the link layer is reliable (e.g., a Reliable Link
261     Protocol), this number will also include any in-flight data over the
262     link. In the case of an endhost (M bit set to '0') this will tell
263     the receiver how much data is still unread or held for reassembly by
264     the remote SCTP endpoint.

266     Truncated Length : 16 bits unsigned int - This value is set to the
267     original size of the SCTP packet that was dropped. The size does NOT
268     include the IP header or any other IP option field (i.e., it is the
269     size of the SCTP payload within the IP packet). This value is only
270     valid if the T bit is set to '1'.

272     Reserved : 16 bits unsigned int - This value SHOULD be set to '0' by
273     the sender and MUST be ignored by the receiver.
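
The fixed-size fields described so far occupy the first 16 bytes of the chunk. The following C sketch shows one way a receiver might decode them from a buffer, assuming the fields are carried in network byte order as in other SCTP chunks. The struct, macro, and function names are illustrative and are not defined by this document.

```c
#include <stddef.h>
#include <stdint.h>

#define PKTDROP_TYPE      0x81u
#define PKTDROP_FLAG_C    0x08u  /* fields count packets, not bytes  */
#define PKTDROP_FLAG_T    0x04u  /* dropped packet was truncated     */
#define PKTDROP_FLAG_B    0x02u  /* bad CRC32c seen by an endpoint   */
#define PKTDROP_FLAG_M    0x01u  /* report comes from a middle box   */
#define PKTDROP_FIXED_LEN 16u    /* bytes before the dropped packet  */

/* Host-order view of the fixed part of a PKTDROP chunk (hypothetical). */
struct pktdrop_hdr {
    uint8_t  type;
    uint8_t  flags;
    uint16_t length;          /* Chunk Length, header included       */
    uint32_t bw_or_max_rwnd;  /* Link Bandwidth or Maximum Rwnd      */
    uint32_t on_queue;        /* Size of data on queue               */
    uint16_t trunc_len;       /* only meaningful when T is set       */
    uint16_t reserved;        /* ignored by the receiver             */
};

static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(((unsigned)p[0] << 8) | p[1]);
}

static uint32_t rd32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Returns 0 on success, -1 if buf cannot hold a valid fixed header. */
static int pktdrop_parse(const uint8_t *buf, size_t len,
                         struct pktdrop_hdr *out)
{
    if (len < PKTDROP_FIXED_LEN || buf[0] != PKTDROP_TYPE)
        return -1;
    out->type = buf[0];
    out->flags = buf[1];
    out->length = rd16(buf + 2);
    out->bw_or_max_rwnd = rd32(buf + 4);
    out->on_queue = rd32(buf + 8);
    out->trunc_len = rd16(buf + 12);
    out->reserved = rd16(buf + 14);
    /* The advertised length must cover the header and fit the buffer. */
    if (out->length < PKTDROP_FIXED_LEN || out->length > len)
        return -1;
    return 0;
}
```

Any bytes between offset 16 and Chunk Length belong to the optional copy of the dropped packet. A chunk whose length is exactly 16 carries no dropped packet.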

275     Data Field: variable - This field is variable and usually holds the
276     packet that was dropped or a portion of it if the T bit is set. In
277     some instances a middle box may send a packet drop report without
278     this data. In such a case, it is reporting to the SCTP sender the
279     current bandwidth and NOT reporting a dropped packet.

281  5. Procedures

283  5.1. Sender of the packet drop

285     A packet drop chunk MUST NOT be sent in response to a packet
286     containing an ABORT chunk or a packet drop chunk.

288  5.1.1. Middle box

290     Periodically a middle box may realize that it cannot transmit a chunk
291     due to errors in transmission. In such a case the middle box SHOULD
292     compose a packet drop chunk to send back to the SCTP sender of the
293     dropped packet. The middle box MUST set the M bit to one and copy
294     into the SCTP common header the verification tag found in the packet
295     to be dropped. The IP addresses and SCTP ports MUST be swapped so
296     that the receiver of the packet drop will identify the packet drop
297     report with the correct SCTP association.

299     After filling out the IP and SCTP headers, the sender MUST copy in
300     all or part of the SCTP packet being dropped not including the IP and
301     SCTP header (i.e., starting at the first SCTP chunk). The sender of
302     the packet drop report MUST ensure that the packet fits into a single
303     MTU, truncating the packet and setting the T bit if necessary. If
304     the middle box truncates the packet to fit in a single MTU, the
305     middle box MUST copy the original length of the SCTP packet into the
306     Truncated Length field.

308     The middle box sending the packet drop report SHOULD also total up
309     the data that is in flight (towards the destination of the dropped
310     packet) and the data that is in queue awaiting transmission, placing
311     this size in the 'Size of data on queue' field. The sender SHOULD also
312     place the link bandwidth, in bytes per second, in the 'Link Bandwidth'
313     field.
The receiving SCTP endpoint should use this information to
314     adjust its congestion control parameters.

316  5.1.2. End host

318     An SCTP endhost MAY want to send a packet drop for one of two
319     reasons: the SCTP sender has overrun the local receiver's rwnd or the
320     inbound packet failed its CRC-32c check.

322     If the SCTP endhost detects a bad CRC-32c it will still use the SCTP
323     common header to attempt to locate the association. If a valid
324     association is found and the verification tag is correct, chances
325     are good that the common header was not damaged and thus the found
326     TCB can be used to generate a drop report with the rest of the SCTP
327     packet.

329     In either case the receiver that is sending the drop report MUST copy
330     the packet, with possible truncation as described above. The sender
331     of the drop report MUST set the M bit to 0 and place the verification
332     tag of the peer in the outbound packet. The sender of the drop
333     report should also place the maximum rwnd value in the 'Maximum Rwnd'
334     field, and should place the number of bytes unread in the 'data on
335     queue' field. Note that the unread byte count MUST include data in
336     any local buffer not yet read by the user, data pending reassembly,
337     and data awaiting stream re-ordering.

339  5.2. Receiver side

341     When receiving a Packet Drop report the SCTP endpoint will want to
342     examine the drop report and, based on the information, possibly
343     retransmit lost information to the peer. The receiver SHOULD verify
344     that the sender actually had a packet by comparing some of the data
345     that was dropped to the data that was sent. This is done to assure
346     the sender that a malicious receiver is not attempting to induce a
347     retransmission of a congestion related dropped packet. The following
348     list illustrates the handling procedure by chunk type for dropped
349     packets.

351     1.
DATA - For a data chunk drop, the receiver SHOULD locate the
352         identified DATA chunk and mark it for retransmission. The DATA
353         chunk should be treated just as if it had been marked for fast
354         retransmit with the exception that no adjustment should be made
355         to the value of cwnd (providing that the receiver can validate a
356         portion of the packet as being what was sent).

358     2.  SACK - For a lost SACK chunk, a receiver MAY wish to send out a
359         new SACK illustrating the current receiver conditions.

361     3.  INIT - For an INIT chunk, the receiver SHOULD resend the INIT,
362         restarting its local T-1 timer.

364     4.  HEARTBEAT REQUEST - For a heartbeat request, the receiver SHOULD
365         resend a heartbeat to the source address of the packet.

367     5.  SHUTDOWN - For a shutdown request, the SCTP receiver SHOULD
368         resend the shutdown request.

370     6.  SHUTDOWN ACKNOWLEDGEMENT - For a shutdown acknowledgement the
371         receiver SHOULD resend the SHUTDOWN-ACK.

373     7.  COOKIE ECHO - For a Cookie Echo the receiver SHOULD retransmit
374         the lost COOKIE ECHO, restarting any cookie timer.

376     8.  COOKIE ACKNOWLEDGMENT - For a lost cookie-ack a receiver should
377         retransmit a cookie-ack to the peer.

379     9.  ASCONF - For a lost ASCONF, the receiver SHOULD retransmit the
380         ASCONF, restarting any timer associated with the ASCONF.

382     10. FORWARD TSN - For a lost forward TSN the endpoint SHOULD resend
383         a new forward TSN reporting the current value that the TSN
384         should be advanced to. Note this may not be the same
385         information as that contained in the dropped chunk.

387     After queuing for retransmission any lost chunks, the receiver MUST
388     also examine the bandwidth and queue fields, taking into consideration
389     the source. If the M bit is set to '0' then the source of the drop
390     report was the SCTP peer.
In such a case the receiver MUST
391     immediately adjust its peer rwnd by taking the value in the 'Maximum
392     Rwnd' field, subtracting the value of the 'data on queue' field and
393     any data in-flight.

395     If the sender is a middle box (M bit set to '1'), the receiver MAY
396     adjust the cwnd for the source address of the drop packet by applying
397     the following algorithm if the current RTT of the link is larger than
398     the variable RTO.Large. A receiver of a packet drop report MUST NOT
399     adjust its cwnd if the RTT is or has ever been measured to be less
400     than or equal to RTO.Large.

402        Establish the true RTT using the values normally
403        used in calculating the RTO, and set this value in
404        milliseconds into the variable 'rtt'.

406            rtt = ((lastsa >> 2) + lastsv) >> 1;

408        Validate that an adjustment can be made.

410            if ((pd.chunk_flags AND M_BIT) != M_BIT)
411                return

413            if (rtt <= RTO.Large)
414                return

416        Set 'bottle_bw' to the value found in the Link Bandwidth
417        field.

419            bottle_bw = ntohl(pd.bottle_bw);

421        Set 'on_queue' to the value found in the Size of data on queue
422        field.

424            on_queue = ntohl(pd.current_onq);

426        Adjust the on_queue for any in-flight data that may not
427        yet have arrived at the bottleneck.
428            if (on_queue < flight_size) {
429                on_queue = flight_size;
430            }

432        Calculate the bandwidth available by multiplying the bottle_bw
433        variable times the rtt and dividing the result by a thousand.
434        Call this value 'bw_avail'.

436            bw_avail = (bottle_bw*rtt)/1000;

438        If more is 'on_queue' than the current value of 'bw_avail' a
439        negative congestion window adjustment is needed.

441            if (on_queue > bw_avail) {
442               Clear the partial bytes acked field.
443                partial_bytes_acked = 0;

445               Subtract the bw_avail from the current on_queue; call this
446               value the 'overrun'.

448                overrun = on_queue - bw_avail;

450               Undo any congestion adjustment if a SACK has been processed.

452                if (seen_a_sack_this_packet)
453                    cwnd = prev_cwnd

455               Calculate the portion of the on-queue data that is
456               caused by this endpoint's in-flight data.

458                seg_inflight = flight_size / mtu
459                seg_onqueue = on_queue / mtu
460                my_portion = (overrun * seg_inflight)/seg_onqueue;

462               If we have already adjusted the cwnd, indicated by the
463               fact that the cwnd is larger than the flight size, we
464               adjust our portion down by a smaller amount, i.e., the
465               amount we have already adjusted it previously.

467                if (cwnd > flight_size) {
468                    adjust = cwnd - flight_size;
469                    if (adjust > my_portion)
470                        my_portion = 0;
471                    else
472                        my_portion -= adjust
473                }

475               Adjust the cwnd downward by our calculated amount.

477                cwnd -= my_portion

479               If the current flight size is larger than this new congestion
480               window, set the congestion window to the current flight size.
481                if (flight_size > cwnd) {
482                    cwnd = flight_size
483                }

485               If the current congestion window is smaller than a
486               single MTU set the current congestion window to 1 MTU.
487                if (cwnd < mtu) {
488                    cwnd = mtu;
489                }

491               Set ssthresh to the current congestion window minus 1 byte.
492                ssthresh = cwnd - 1;

494            } else {
495               Otherwise an increase is needed. Calculate the increase value
496               'incr' by taking the minimum of one fourth the bw_avail minus
497               the size on queue OR the MTU size times max burst (whichever
498               is smaller).

500                incr = min(((bw_avail - on_queue) >> 2),
501                           ((int)asoc.max_burst * (int)mtu));

503               Add this value to the current congestion window.
504                cwnd += incr;

506               After making an increase to the congestion window verify that
507               the value of cwnd is smaller than or equal to the bw_avail; if
508               not, set the cwnd to the value of bw_avail.

510                if (cwnd > bw_avail) {
511                    cwnd = bw_avail;
512                }

514            }

516  6. Security Considerations

518     TBD

520  7. Recommended Variables

522     The following are the recommended values for variables defined within
523     this document:

525     RTO.Large - 500 Milliseconds.

527  8.
Normative References

529     [RFC2026]  Bradner, S., "The Internet Standards Process -- Revision
530                3", BCP 9, RFC 2026, October 1996.

532     [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
533                Requirement Levels", BCP 14, RFC 2119, March 1997.

535     [RFC2960]  Stewart, R., Xie, Q., Morneault, K., Sharp, C.,
536                Schwarzbauer, H., Taylor, T., Rytina, I., Kalla, M.,
537                Zhang, L., and V. Paxson, "Stream Control Transmission
538                Protocol", RFC 2960, October 2000.

540  Authors' Addresses

542     Randall R. Stewart
543     Adara Networks
544     2150 First Street
545     San Jose, CA 29036
546     USA

548     Email: randall@lakerest.net

550     Peter Lei
551     Cisco Systems, Inc.
552     8735 West Higgins Road
553     Suite 300
554     Chicago, IL 60631
555     USA

557     Email: peterlei@cisco.com

559     Michael Tuexen
560     Univ. of Applied Sciences Muenster
561     Stegerwaldstr. 39
562     48565 Steinfurt
563     Germany

565     Email: tuexen@fh-muenster.de
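
The cwnd adjustment that Section 5.2 gives in pseudocode for middle box reports can be written as a compilable C function. The sketch below is one possible rendering under stated assumptions, not a reference implementation. The struct path_state, its field names, and the function name are hypothetical. The caller is assumed to have already converted the Link Bandwidth and Size of data on queue fields to host byte order and to supply the RTT in milliseconds. The sketch follows the prose of Section 5.2 in skipping the adjustment when the RTT is less than or equal to RTO.Large. It also uses 64-bit intermediates and guards against division by zero, neither of which the draft specifies.

```c
#include <stdint.h>

#define PKTDROP_FLAG_M 0x01u  /* report was sent by a middle box */
#define RTO_LARGE_MS   500u   /* RTO.Large, Section 7            */

/* Hypothetical per-destination state; names mirror the pseudocode. */
struct path_state {
    uint32_t cwnd, prev_cwnd, ssthresh, flight_size, mtu;
    uint32_t partial_bytes_acked, max_burst, rtt_ms;
    int rtt_was_ever_small;       /* RTT was once <= RTO.Large   */
    int seen_a_sack_this_packet;
};

static void pktdrop_adjust_cwnd(struct path_state *p, uint8_t flags,
                                uint32_t bottle_bw, uint32_t on_queue)
{
    uint64_t bw_avail, q;

    if ((flags & PKTDROP_FLAG_M) == 0 || p->mtu == 0)
        return;
    if (p->rtt_was_ever_small || p->rtt_ms <= RTO_LARGE_MS)
        return;

    /* Account for in-flight data not yet at the bottleneck. */
    q = on_queue < p->flight_size ? p->flight_size : on_queue;
    bw_avail = ((uint64_t)bottle_bw * p->rtt_ms) / 1000u;

    if (q > bw_avail) {
        uint64_t overrun = q - bw_avail;
        uint64_t seg_inflight = p->flight_size / p->mtu;
        uint64_t seg_onqueue = q / p->mtu;
        /* Share of the overrun caused by this endpoint's data. */
        uint64_t my_portion =
            seg_onqueue ? (overrun * seg_inflight) / seg_onqueue : 0;

        p->partial_bytes_acked = 0;
        if (p->seen_a_sack_this_packet)
            p->cwnd = p->prev_cwnd;   /* undo the SACK adjustment */
        if (p->cwnd > p->flight_size) {
            uint64_t adjust = p->cwnd - p->flight_size;
            my_portion = adjust > my_portion ? 0 : my_portion - adjust;
        }
        p->cwnd = my_portion >= p->cwnd ? 0
                                        : p->cwnd - (uint32_t)my_portion;
        if (p->flight_size > p->cwnd)
            p->cwnd = p->flight_size;
        if (p->cwnd < p->mtu)
            p->cwnd = p->mtu;
        p->ssthresh = p->cwnd - 1;
    } else {
        uint64_t incr = (bw_avail - q) >> 2;
        uint64_t burst = (uint64_t)p->max_burst * p->mtu;
        uint64_t grown;

        if (incr > burst)
            incr = burst;
        grown = (uint64_t)p->cwnd + incr;
        if (grown > bw_avail)
            grown = bw_avail;
        p->cwnd = grown > UINT32_MAX ? UINT32_MAX : (uint32_t)grown;
    }
}
```

As a worked case, take an MTU of 1500, an RTT of 600 ms, and a reported bandwidth of 50000 bytes per second, which gives a bw_avail of 30000 bytes. With 36000 bytes in flight and a cwnd of 40000, the overrun is 6000 bytes, 4000 of which are already covered by the gap between cwnd and flight size, so cwnd drops by 2000 to 38000 and ssthresh becomes 37999.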