idnits 2.17.1

draft-iyengar-sctp-cacc-03.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate.  You should update this
     to the boilerplate described in the IETF Trust License Policy document
     (see https://trustee.ietf.org/license-info), which is required now.

  -- Found old boilerplate from RFC 3978, Section 5.1 on line 21.

  -- Found old boilerplate from RFC 3978, Section 5.5 on line 426.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 403.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 410.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 416.

  ** This document has an original RFC 3978 Section 5.4 Copyright Line,
     instead of the newer IETF Trust Copyright according to RFC 4748.

  ** This document has an original RFC 3978 Section 5.5 Disclaimer, instead of
     the newer disclaimer which includes the IETF Trust according to RFC 4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** Missing document type: Expected "INTERNET-DRAFT" in the upper left hand
     corner of the first page


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.

  ** There are 72 instances of too long lines in the document, the longest one
     being 6 characters in excess of 72.

  ** There is 1 instance of lines with control characters in the document.

  ** The abstract seems to contain references ([RFC2119], [RFC2960]), which
     it shouldn't.  Please replace those with straight textual mentions of the
     documents in question.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not
     match the current year

  == The document seems to lack the recommended RFC 2119 boilerplate, even if
     it appears to use RFC 2119 keywords.

     (The document does seem to have the reference to RFC 2119 which the
     ID-Checklist requires).

  == Unrecognized Status in '', assuming Proposed Standard (Expected one of
     'Standards Track', 'Full Standard', 'Draft Standard', 'Proposed
     Standard', 'Best Current Practice', 'Informational', 'Experimental',
     'Informational', 'Historic'.)

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (November 30, 2005) is 6720 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Unused Reference: 'IAS05' is defined on line 354, but no explicit
     reference was found in the text

  ** Obsolete normative reference: RFC 2960 (Obsoleted by RFC 4960)

  -- Possible downref: Non-RFC (?) normative reference: ref. 'IAS05'


     Summary: 10 errors (**), 0 flaws (~~), 4 warnings (==), 8 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

  1  Internet Engineering Task Force                           J. R. Iyengar,
  2  Category: Internet Draft                                      P. D. Amer
  3  Expires: May 31, 2006                             University of Delaware

  5                                                                R. Stewart
  6                                                             Cisco Systems

  8                                                        I. Arias-Rodriguez
  9                                                                     Nokia

 11                                                         November 30, 2005

 13       Preventing SCTP Congestion Window Overgrowth During Changeover
 14                       draft-iyengar-sctp-cacc-03.txt

 16  Status of this Memo

 18  By submitting this Internet-Draft, each author represents that any
 19  applicable patent or other IPR claims of which he or she is aware have
 20  been or will be disclosed, and any of which he or she becomes aware
 21  will be disclosed, in accordance with Section 6 of BCP 79.

 23  Internet-Drafts are working documents of the Internet Engineering Task
 24  Force (IETF), its areas, and its working groups. Note that other
 25  groups may also distribute working documents as Internet-Drafts.

 27  Internet-Drafts are draft documents valid for a maximum of six months
 28  and may be updated, replaced, or obsoleted by other documents at any
 29  time. It is inappropriate to use Internet-Drafts as reference
 30  material or to cite them other than as "work in progress."

 32  The list of current Internet-Drafts can be accessed at
 33  http://www.ietf.org/ietf/1id-abstracts.txt.

 35  The list of Internet-Draft Shadow Directories can be accessed at
 36  http://www.ietf.org/shadow.html.

 38  Copyright Notice

 40  Copyright (C) The Internet Society (2005).

 42  Abstract

 44  SCTP [RFC2960] supports IP multihoming at the transport layer. SCTP
 45  allows an association to span multiple local and peer IP addresses,
 46  and allows the application to dynamically change the primary
 47  destination during an active association. We identify a problem in the
 48  current SCTP specification that results in unnecessary retransmissions
 49  and "TCP-unfriendly" growth of the sender's congestion window under
 50  certain changeover conditions. We describe the problem and propose an
 51  algorithm called the Split Fast Retransmit Changeover Aware Congestion
 52  Control algorithm (SFR-CACC) as a solution. We recommend the addition
 53  of SFR-CACC to the SCTP specification [RFC2960].

 55  Conventions

 57  The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD
 58  NOT, RECOMMENDED, NOT RECOMMENDED, MAY, and OPTIONAL, when they appear in
 59  this document, are to be interpreted as described in [RFC2119].

 61  1. Introduction

 63  In an SCTP [RFC2960] association, the sender transmits data to its
 64  peer's primary destination address. SCTP provides for
 65  application-initiated changeovers so that the sending application can
 66  move the outgoing traffic to another path by changing the sender's
 67  primary destination address. We uncovered a problem in the
 68  current SCTP specification that results in unnecessary retransmissions
 69  and "TCP-unfriendly" growth of the sender's congestion window under
 70  certain changeover conditions. We present the problem and propose an
 71  algorithm called the Split Fast Retransmit Changeover Aware Congestion
 72  Control (SFR-CACC) algorithm as a solution. We recommend the addition
 73  of the SFR-CACC algorithm to the SCTP specification [RFC2960].

 75  2. Congestion Window Overgrowth: Problem Description

 77  We present a specific example which illustrates the congestion window
 78  overgrowth problem.

 80  2.1 Example Description:

 82  Consider the architecture shown below:

 84   ______                 _________                ______
 85  |      |               /         \              |      |
 86  |      |A1 <============= Path 1 ===========> B1|      |
 87  |      |<------------->|         |<------------>|      |
 88  | Host |               | Network |              | Host |
 89  |  A   |               |         |              |  B   |
 90  |      |<------------->|         |<------------>|      |
 91  |      |A2 <============= Path 2 ===========> B2|      |
 92  |      |               \_________/              |      |
 93   ------                                          ------

 95                     Fig 1: Example Architecture

 97  SCTP endpoints A and B have an association between them. Both
 98  endpoints are multihomed, A with network interfaces A1 and A2, and B
 99  with interfaces B1 and B2. More precisely, A1, A2, B1 and B2 are IP
100  addresses associated with link layer interfaces. Here we assume only
101  one address per interface, so address and interface are used
102  interchangeably.

104  All four addresses are bound to the SCTP association. For one of
105  several possible reasons (e.g., path diversity, policy based routing,
106  load balancing), we assume in this example that the data traffic from
107  A to B1 is routed through A1, and from A to B2 is routed through A2.

109  Let C1 be the cwnd at A for destination B1, and C2 be the cwnd at A
110  for destination B2. C1 and C2 are denoted in terms of MTUs, not
111  bytes.

113  Consider the following sequence of events:

115  1) The sender (host A) initially sends data to the receiver (host B)
116     using primary destination address B1. This setting causes packets
117     to leave through A1. Assume these packets leave the
118     transport/network layers, and get buffered at A's link layer A1,
119     whereupon they get transmitted according to the channel's
120     availability. We refer to these TSNs (that is, packets) as the
121     first group of TSNs.

123  2) Assume that, while the first group of TSNs is being transmitted
124     through A1, the sender's application changes the primary
125     destination to B2, thereby causing any new data from the sender to
126     be sent to B2. In the example, we assume C2 = 2 at the moment of
127     changeover, and new TSNs (the second group of TSNs) are now
128     transmitted to the new primary, B2. This new primary destination
129     causes new TSNs to leave the sender through A2. Concurrently, the
130     packets buffered earlier at A1 are still being transmitted.
131     Previous packets sent through A1, and the packets sent through A2,
132     can arrive at the receiver B in an interleaved fashion on
133     interfaces B1 and B2, respectively. This reordering is introduced
134     as a result of changeover.

136  3) The receiver starts reporting gaps as soon as it notices
137     reordering. If the receiver communicates four missing reports to
138     the sender before all original transmissions of the first group
139     have been acked, the sender will start retransmitting the unacked
140     TSNs on path 2.

142  4) The SACKs for the original transmission of the first group of TSNs
143     reach A on A1. Since the sender cannot distinguish SACKs
144     generated by original transmissions from SACKs generated by
145     retransmissions, the SACKs now received by A on A1 end up acking
146     the retransmissions of the first group of TSNs, incorrectly
147     crediting C2 instead of C1. This behaviour, whereby SACKs for
148     original transmissions incorrectly ack retransmissions, continues
149     until all original transmissions of the first group are
150     retransmitted to B2. Thus, the SACKs from the original
151     transmissions cause C2 to grow (possibly drastically) through
152     misinterpretation of the feedback.

154  2.2 Discussion

156  Our preliminary investigation shows that the problem occurs for a
157  range of {end-to-end delay, end-to-end available bandwidth, MTU}
158  settings. [IC+03] gives a more detailed description and analysis of
159  the problem. From the general model developed in [IC+03], we have
160  found that whenever a changeover is made to a higher quality path
161  (i.e., lower end-to-end delay, higher end-to-end available bandwidth
162  path), there is a likelihood of TCP-unfriendly cwnd growth and
163  unnecessary retransmissions. We also note that the bigger the quality
164  improvement that the new path provides, the larger the TCP-unfriendly
165  growth and number of false retransmissions will be.

167  The congestion window overgrowth (i.e., TCP-unfriendly congestion
168  window growth) problem exists even if buffering of the first group
169  occurs not at the sender's link layer, but in a router along the path
170  (in the example architecture, path 1). In essence, the transport
171  layers at the endpoints can be thought of as the sending and
172  receiving entities, and the buffering could potentially be
173  distributed anywhere along the end-to-end path.

175  3. Solution to the Problem: The SFR-CACC Algorithm

177  The problem of TCP-unfriendly cwnd growth occurs due to incorrect fast
178  retransmissions. These incorrect retransmissions occur because the
179  congestion control algorithm at the sender is unaware of the
180  occurrence of a changeover, and is hence unable to identify reordering
181  introduced due to changeover. In [IC+03], we propose the Changeover
182  Aware Congestion Control algorithms (CACC) - the Conservative CACC
183  algorithm (C-CACC), and the Split Fast Retransmit CACC algorithm
184  (SFR-CACC), which curb the TCP-unfriendly cwnd growth by avoiding
185  these unnecessary fast retransmissions. Of these algorithms,
186  C-CACC has the disadvantage that in the face of loss, a lot of TSNs
187  could potentially have to wait for an RTO when they could have been
188  fast retransmitted. SFR-CACC alleviates this disadvantage.

190  The key idea in SFR-CACC is to maintain state at the sender on a
191  per-destination basis when a changeover happens. On the receipt of a
192  SACK, the sender uses this state to selectively increase the missing
193  report count for TSNs in the retransmission list. In SFR-CACC, we
194  further make the following observation: the reordering observed during
195  changeover happens because TSNs which are supposed to reach the
196  receiver in-sequence end up reaching the receiver in concurrent
197  groups, in-sequence within each group. With this observation, we
198  reason that the Fast Retransmit algorithm can be applied independently
199  within each group. That is, on the receipt of a SACK, if we can
200  estimate the TSN(s) that caused this SACK to be sent from the
201  receiver, we can use the SACK to increment missing report counts
202  within the causative TSN(s)'s group. Our estimate is conservative: if
203  a SACK could have been caused by TSNs in multiple groups, this SACK
204  will be used to increment missing report counts only for TSNs sent to
205  the current primary destination, if any. In the case where multiple
206  changeovers cycle back to a destination while CHANGEOVER_ACTIVE is
207  still set, CYCLING_CHANGEOVER is set to indicate a double switch to
208  the destination. The CYCLING_CHANGEOVER flag is used to mark TSNs in
209  only the latest group sent to the current primary destination, thus
210  preventing incorrect marking of TSNs in any other changeover
211  range. SFR-CACC also enables Fast Retransmit for TSNs which could have
212  timed out on some destination, but were retransmitted on the current
213  primary destination after the latest changeover to the current primary
214  destination. We now present the SFR-CACC algorithm in its current
215  simplified form, also described in [IS+04,IAS05].

217  3.1 Variables Introduced

219  In SFR-CACC, four variables are introduced:

221  1) CHANGEOVER_ACTIVE - a flag which indicates the occurrence of
222     a changeover.
223  2) next_tsn_at_change - an unsigned integer, which stores the next
224     TSN to be used by the sender, at the moment of changeover.
225  3) highest_tsn_in_sack_for_dest - an unsigned integer per destination,
226     which stores the highest TSN acked by the current SACK for each
227     destination.
228  4) cacc_saw_newack - a temporary flag per destination, which is used
229     during the processing of a SACK to estimate the causative TSN(s)'s
230     group.

232  3.2 The SFR-CACC Algorithm

234  The following algorithm requires that after a timeout retransmission,
235  the retransmitted TSN MUST be rendered ineligible for further fast
236  retransmission.

238  Upon receipt of a request to change the primary destination
239  address, the sender MUST do the following:

241  1) The sender MUST set CHANGEOVER_ACTIVE to indicate that a
242     changeover has occurred.

244  2) The sender MUST store the next TSN to be sent in
245     next_tsn_at_change.

247  On receipt of a SACK the sender SHOULD execute the following statements:

249  1) If the cumulative ack in the SACK passes next_tsn_at_change, the
250     CHANGEOVER_ACTIVE flag SHOULD be cleared.

252  2) If the SACK contains gap acks and the flag CHANGEOVER_ACTIVE
253     is set, then the receiver of the SACK MUST take the following
254     actions:

256     A) Initialize cacc_saw_newack to 0 for all destination
257        addresses.

259     B) For each TSN t being acked that has not been acked in any
260        SACK so far, set cacc_saw_newack to 1 for the destination that
261        the TSN was sent to.

263     C) Of the TSNs being newly acked, set highest_tsn_in_sack_for_dest to
264        the highest TSN being newly acked for the respective destinations.

266  3) If the CHANGEOVER_ACTIVE flag is set, then the sender MUST execute
267     steps C and D to determine if the missing report count for TSN t
268     SHOULD be incremented. Let d be the destination to which t was
269     sent.

271     C) If cacc_saw_newack is 0 for destination d, then the sender MUST
272        NOT increment missing report count for t.

274     D) If cacc_saw_newack is 1 for destination d, and if
275        highest_tsn_in_sack_for_dest for destination d is greater than t,
276        then the sender SHOULD increment missing report count for t
277        (according to [RFC2960] and [SA+05]).

279  NOTE: The HTNA algorithm does not need to be applied separately,
280  since step 3.D above covers the functionality of the HTNA algorithm.

282  3.3 Discussion

284  The SFR-CACC algorithm maintains state information during a
285  changeover, and uses this information to avoid incorrect fast
286  retransmissions. Consequently, this algorithm prevents the
287  TCP-unfriendly cwnd growth. This algorithm has the added advantage
288  that no extra bits are added to any packets, and thus the load on the
289  wire and the network is not increased. SFR-CACC is also capable of
290  handling multiple changeovers. One disadvantage of SFR-CACC is that
291  there is added complexity at the sender to maintain and use the added
292  state variables. Some of the TSNs on the old primary may also not be
293  eligible for Fast Retransmit. To quantify the number of TSNs which
294  will be ineligible for Fast Retransmit in the face of loss, let us
295  assume that only one changeover is performed, and that SACKs are not
296  lost. Under these assumptions, potentially only the last four packets
297  sent to the old primary destination will be forced to be retransmitted
298  with an RTO instead of a Fast Retransmit. In other words, under the
299  stated assumptions, if a TSN that is lost has at least four packets
300  successfully transmitted after it to the same destination, then the
301  TSN will be retransmitted via Fast Retransmit.

303  4. Conclusion

305  The general consensus at the IETF has been to dissuade the usage of
306  SCTP's multihoming feature for simultaneous data transfer to the
307  multiple destination addresses, largely due to insufficient research
308  in the area. Though there is some amount of simultaneous data transfer
309  in the described scenario, this phenomenon is an effect of changing
310  the primary destination, and not necessarily a result of an application
311  intending to simultaneously transfer data over the multiple paths.
312  Among other reasons, this changeover could be initiated by an
313  application searching for a better path to the peer host for a long
314  session, or attempting to perform a smoother failover.

316  We recommend the addition of SFR-CACC to SCTP [RFC2960] to alleviate
317  the problem of TCP-unfriendly cwnd growth and unnecessary fast
318  retransmissions during a changeover. We have implemented the SFR-CACC
319  algorithm in the NetBSD/FreeBSD release for the KAME stack
320  [WEB_KAME]. The implementation uses three additional flags and
321  one TSN marker per-destination, as described in section
322  3.2. Approximately twenty lines of C code were needed to facilitate
323  SFR-CACC, most of which will be executed only when a changeover is
324  performed in an association.

326  5. Security Considerations

328  This document discusses a congestion control issue during changeover
329  in SCTP. This does not raise any new security issues with SCTP.

331  Acknowledgments

333  The authors would like to thank Vern Paxson, Mark Allman, Phillip
334  Conrad, Armando Caro, Sourabh Ladha and Keyur Shah for providing
335  comments and input.

337  References

339  [RFC2119] S. Bradner. "Key words for use in RFCs to Indicate
340     Requirement Levels". BCP 14, RFC 2119, May 1997.

342  [RFC2960] R. Stewart, Q. Xie, K. Morneault, C. Sharp, H. Schwarzbauer,
343     T. Taylor, I. Rytina, M. Kalla, L. Zhang, V. Paxson. "Stream
344     Control Transmission Protocol". RFC 2960, October 2000.

346  [IC+03] J. Iyengar, A. Caro, P. Amer, G. Heinz, R. Stewart. "Making
347     SCTP More Robust to Changeover". SPECTS, Montreal, Canada, July
348     2003.

350  [IS+04] J. Iyengar, K. Shah, P. Amer, R. Stewart. "Concurrent
351     Multipath Transfer Using SCTP Multihoming". SPECTS, San Jose, CA,
352     July 2004.

354  [IAS05] J. Iyengar, P. Amer, R. Stewart. "Concurrent Multipath
355     Transfer Using SCTP Multihoming Over Independent End-to-End Paths".
356     To appear in IEEE/ACM Transactions on Networking.

358  [SA+05] R. Stewart, I. Arias-Rodriguez, K. Poon, A. Caro,
359     M. Tuexen. "Stream Control Transmission Protocol (SCTP)
360     Specification Errata and Issues". Internet Draft:
361     draft-ietf-tsvwg-sctpimpguide-16.txt, October 2005. (work in
362     progress)

364  [WEB_KAME] Webpage of the KAME Project, http://www.kame.org

366  Authors' Addresses

368  Janardhan R. Iyengar
369  Department of Computer & Information Sciences
370  University of Delaware
371  103 Smith Hall
372  Newark, DE 19716, USA
373  email: iyengar@cis.udel.edu

375  Paul D. Amer
376  Department of Computer & Information Sciences
377  University of Delaware
378  103 Smith Hall
379  Newark, DE 19716, USA
380  email: amer@cis.udel.edu

382  Randall R. Stewart
383  24 Burning Bush Trail
384  Crystal Lake, IL 60012, USA
385  email: rrs@cisco.com

387  Ivan Arias-Rodriguez
388  Nokia Research Center
389  PO Box 407
390  FIN-00045 Nokia Group
391  Finland
392  email: ivan.arias-rodriguez@nokia.com

394  Intellectual Property Statement

396  The IETF takes no position regarding the validity or scope of any
397  Intellectual Property Rights or other rights that might be claimed to
398  pertain to the implementation or use of the technology described in
399  this document or the extent to which any license under such rights
400  might or might not be available; nor does it represent that it has
401  made any independent effort to identify any such rights. Information
402  on the procedures with respect to rights in RFC documents can be found
403  in BCP 78 and BCP 79.

405  Copies of IPR disclosures made to the IETF Secretariat and any
406  assurances of licenses to be made available, or the result of an
407  attempt made to obtain a general license or permission for the use of
408  such proprietary rights by implementers or users of this specification
409  can be obtained from the IETF on-line IPR repository at
410  http://www.ietf.org/ipr.

412  The IETF invites any interested party to bring to its attention any
413  copyrights, patents or patent applications, or other proprietary
414  rights that may cover technology that may be required to implement
415  this standard. Please address the information to the IETF at
416  ietf-ipr@ietf.org.

418  Disclaimer of Validity

420  This document and the information contained herein are provided on
421  an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
422  REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE
423  INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR
424  IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
425  THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
426  WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

428  Copyright Statement

430  Copyright (C) The Internet Society (2005). This document is subject
431  to the rights, licenses and restrictions contained in BCP 78, and
432  except as set forth therein, the authors retain all their rights.

434  Acknowledgment

436  Funding for the RFC Editor function is currently provided by the
437  Internet Society.
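
Illustrative sketch (not part of the draft)

The changeover and SACK rules of Section 3.2 can be modelled in a few
lines of Python. This is only a sketch under stated assumptions, not the
KAME implementation mentioned in Section 4. The names Chunk, Sender,
send, change_primary and process_sack are hypothetical. TSN wraparound
is ignored, "passes" in step 1 is read as cum_ack >= next_tsn_at_change,
CYCLING_CHANGEOVER is not modelled, and the ordinary RFC 2960
missing-report handling that applies outside a changeover is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    tsn: int
    dest: str                  # destination address the TSN was sent to
    acked: bool = False
    missing_reports: int = 0


@dataclass
class Sender:
    primary: str
    next_tsn: int = 1
    changeover_active: bool = False     # CHANGEOVER_ACTIVE
    next_tsn_at_change: int = 0
    chunks: dict = field(default_factory=dict)   # tsn -> Chunk, in send order

    def send(self, count):
        """Send `count` new TSNs to the current primary destination."""
        for _ in range(count):
            self.chunks[self.next_tsn] = Chunk(self.next_tsn, self.primary)
            self.next_tsn += 1

    def change_primary(self, dest):
        """Changeover: set the flag and store the next TSN to be sent."""
        self.primary = dest
        self.changeover_active = True
        self.next_tsn_at_change = self.next_tsn

    def process_sack(self, cum_ack, gap_acked=()):
        """Return the TSNs whose missing report count was incremented."""
        # Step 1 (assumed reading of "passes").
        if self.changeover_active and cum_ack >= self.next_tsn_at_change:
            self.changeover_active = False

        newly_acked = [c for c in self.chunks.values()
                       if not c.acked
                       and (c.tsn <= cum_ack or c.tsn in gap_acked)]
        for c in newly_acked:
            c.acked = True

        if not (self.changeover_active and gap_acked):
            return []   # ordinary RFC 2960 handling is omitted in this sketch

        # Step 2: per-destination state, rebuilt for every SACK.
        cacc_saw_newack = {c.dest: 0 for c in self.chunks.values()}
        highest_tsn_in_sack_for_dest = {}
        for c in newly_acked:
            cacc_saw_newack[c.dest] = 1
            prev = highest_tsn_in_sack_for_dest.get(c.dest, c.tsn)
            highest_tsn_in_sack_for_dest[c.dest] = max(prev, c.tsn)

        # Step 3: count a miss only inside the group that caused the SACK.
        incremented = []
        for c in self.chunks.values():
            if c.acked or cacc_saw_newack[c.dest] == 0:
                continue
            if highest_tsn_in_sack_for_dest[c.dest] > c.tsn:
                c.missing_reports += 1
                incremented.append(c.tsn)
        return incremented


a = Sender(primary="B1")
a.send(6)                        # first group, TSNs 1-6, sent to B1
a.change_primary("B2")
a.send(2)                        # second group, TSNs 7-8, sent to B2
a.process_sack(1)                # TSN 1 arrives in order
print(a.process_sack(1, {7}))    # TSN 7 overtakes 2-6: prints []
print(a.process_sack(1, {3, 7})) # TSN 3 arrives, 2 is missing: prints [2]
```

In the usage lines, the SACK triggered by TSN 7 does not count against
TSNs 2-6 because no TSN sent to B1 was newly acked, which is the
reordering case described in Section 2.1. The SACK triggered by TSN 3
does count against TSN 2, since both belong to the group sent to B1.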