Network Working Group                                         Y. Nishida
Internet-Draft                                        GE Global Research
Intended status: Standards Track                            P. Natarajan
Expires: July 28, 2016                                     Cisco Systems
                                                                 A. Caro
                                                        BBN Technologies
                                                                 P. Amer
                                                  University of Delaware
                                                              K. Nielsen
                                                                Ericsson
                                                        January 25, 2016

               SCTP-PF: Quick Failover Algorithm in SCTP
                 draft-ietf-tsvwg-sctp-failover-15.txt

Abstract

   SCTP supports multi-homing.
   However, when the failover operation
   specified in RFC4960 is followed, there can be significant delay and
   performance degradation in the data transfer path failover.  To
   overcome this problem this document specifies a quick failover
   algorithm (SCTP-PF) based on the introduction of a Potentially Failed
   (PF) state in SCTP Path Management.

   The document also specifies a dormant state operation of SCTP.  This
   dormant state operation is required to be followed by an SCTP-PF
   implementation, but it may equally well be applied by a standard
   RFC4960 SCTP implementation.

   Additionally, the document introduces an alternative switchback
   operation mode called Primary Path Switchover that will be beneficial
   in certain situations.  This mode of operation applies both to a
   standard RFC4960 SCTP implementation and to an SCTP-PF
   implementation.

   The procedures defined in the document require only minimal
   modifications to the RFC4960 specification.  The procedures are
   sender-side only and do not impact the SCTP receiver.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on July 28, 2016.

Copyright Notice

   Copyright (c) 2016 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Conventions and Terminology
   3.  SCTP with Potentially Failed Destination State (SCTP-PF)
     3.1.  Overview
     3.2.  Specification of the SCTP-PF Procedures
   4.  Dormant State Operation
     4.1.  SCTP Dormant State Procedure
   5.  Primary Path Switchover
   6.  Suggested SCTP Protocol Parameter Values
   7.  Socket API Considerations
     7.1.  Support for the Potentially Failed Path State
     7.2.  Peer Address Thresholds (SCTP_PEER_ADDR_THLDS) Socket
           Option
     7.3.  Exposing the Potentially Failed Path State
           (SCTP_EXPOSE_POTENTIALLY_FAILED_STATE) Socket Option
   8.  Security Considerations
   9.  IANA Considerations
   10. Acknowledgements
   11. Proposed Change of Status (to be Deleted before Publication)
   12. References
     12.1.  Normative References
     12.2.  Informative References
   Appendix A.  Discussions of Alternative Approaches
     A.1.  Reduce Path.Max.Retrans (PMR)
     A.2.  Adjust RTO related parameters
   Appendix B.  Discussions for Path Bouncing Effect
   Appendix C.  SCTP-PF for SCTP Single-homed Operation
   Authors' Addresses

1.  Introduction

   The Stream Control Transmission Protocol (SCTP) specified in
   [RFC4960] supports multi-homing at the transport layer.  SCTP's
   multi-homing features include failure detection and failover
   procedures to provide network interface redundancy and improved
   end-to-end fault tolerance.  In SCTP's current failure detection
   procedure, the sender must experience Path.Max.Retrans (PMR) number
   of consecutive failed timer-based retransmissions on a destination
   address before detecting a path failure.  Until detecting the path
   failure, the sender continues to transmit data on the failed path.
   The prolonged time in which [RFC4960] SCTP continues to use a failed
   path severely degrades the performance of the protocol.  To address
   this problem, this document specifies a quick failover algorithm
   (SCTP-PF) based on the introduction of a new Potentially Failed (PF)
   path state in SCTP path management.  The performance deficiencies of
   the [RFC4960] failover operation, and the improvements obtainable
   from the introduction of a Potentially Failed state in SCTP, were
   proposed and documented in [NATARAJAN09] for Concurrent Multipath
   Transfer SCTP [IYENGAR06].

   While SCTP-PF can accelerate the failover process and improve
   performance, it can also increase the risk that an SCTP endpoint
   enters the dormant state, in which all destination addresses are
   inactive.
   [RFC4960] leaves the protocol operation during dormant state to
   implementations and encourages them to avoid entering the state as
   much as possible by careful tuning of the Path.Max.Retrans (PMR) and
   Association.Max.Retrans (AMR) parameters.  We specify a dormant state
   operation for SCTP-PF which makes SCTP-PF provide the same disruption
   tolerance as [RFC4960] even though the dormant state may be entered
   more quickly.  The dormant state operation may equally well be
   applied by an [RFC4960] implementation, where it serves to provide
   added fault tolerance for situations in which the tuning of the
   Path.Max.Retrans (PMR) and Association.Max.Retrans (AMR) parameters
   fails to adequately prevent the dormant state from being entered.

   The operation after the recovery of a failed path also impacts the
   performance of the protocol.  With the procedures specified in
   [RFC4960] SCTP will, after a failover from the primary path, switch
   back to use the primary path for data transfer as soon as this path
   becomes available again.  From a performance perspective such a
   forced switchback of the data transmission path can be suboptimal as
   the CWND towards the original primary destination address has to be
   rebuilt once data transfer resumes [CARO02].  As an optional
   alternative to the switchback operation of [RFC4960], this document
   specifies a Primary Path Switchover procedure which avoids such
   forced switchbacks of the data transfer path.  The Primary Path
   Switchover operation was originally proposed in [CARO02].

   While SCTP-PF is primarily motivated by a desire to improve the
   multi-homed operation, the feature also applies to SCTP single-homed
   operation.  Here the algorithm serves to provide increased failure
   detection on idle associations, whereas the failover or switchback
   aspects of the algorithm will not be activated.
   This is discussed in more detail in Appendix C.

   A brief description of the motivation for the introduction of the
   Potentially Failed state, including a discussion of alternative
   approaches to mitigate the deficiencies of the [RFC4960] failover
   operation, is given in the Appendices.  A discussion of path bouncing
   effects that might be caused by frequent switchovers is also
   provided there.

2.  Conventions and Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

3.  SCTP with Potentially Failed Destination State (SCTP-PF)

3.1.  Overview

   To minimize the performance impact during failover, the sender should
   stop transmitting data to a failed destination address as early as
   possible.  In the [RFC4960] SCTP path management scheme, the sender
   stops transmitting data to a destination address only after the
   destination address is marked inactive.  This process takes a
   significant amount of time as it requires the error counter of the
   destination address to exceed the Path.Max.Retrans (PMR) threshold.
   The issue cannot simply be mitigated by lowering the PMR threshold
   because this may result in spurious failure detection and unnecessary
   prevention of the usage of a preferred primary path.  Also, due to
   the coupled tuning of the Path.Max.Retrans (PMR) and the
   Association.Max.Retrans (AMR) parameter values in [RFC4960], lowering
   the PMR threshold may result in lowering the AMR threshold, which
   would decrease the fault tolerance of SCTP.
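
   To give a feel for the delay involved, the following sketch adds up
   the retransmission timer intervals that elapse before an error
   counter exceeds a given threshold.  It is an illustration only, not
   part of the specification.  It assumes that the RTO of the path
   starts at RTO.Min, doubles on every timeout, and is capped at
   RTO.Max; the function name detection_time_s is hypothetical.  Real
   detection times also depend on the measured RTT and on HEARTBEAT
   timing, so the result is best read as a lower bound.

```c
#include <assert.h>

/* Hypothetical helper, not part of any SCTP API.
 * Returns the number of seconds of consecutive timeouts that elapse
 * before the path error counter exceeds `threshold`.
 * Assumption: the RTO starts at rto_min, doubles on each timeout
 * (exponential backoff), and is capped at rto_max. */
unsigned detection_time_s(unsigned threshold, unsigned rto_min,
                          unsigned rto_max)
{
    unsigned elapsed = 0;
    unsigned rto = rto_min;
    unsigned errors = 0;

    assert(rto_min > 0 && rto_min <= rto_max);
    while (errors <= threshold) {   /* stop once errors > threshold */
        elapsed += rto;             /* wait for this timer to expire */
        errors++;                   /* one more consecutive timeout */
        rto = (rto * 2 > rto_max) ? rto_max : rto * 2;
    }
    return elapsed;
}
```

   With the [RFC4960] default values (PMR = 5, RTO.Min = 1 second,
   RTO.Max = 60 seconds), detection_time_s(5, 1, 60) adds up six timer
   intervals of 1, 2, 4, 8, 16, and 32 seconds.  With a threshold of 0,
   a single interval is enough.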

   The solution provided in this document is to extend the SCTP path
   management scheme of [RFC4960] by the addition of the Potentially
   Failed (PF) state as an intermediate state in between the active and
   inactive state of a destination address in the [RFC4960] path
   management scheme, and let the failover of data transfer away from a
   destination address be driven by the entering of the PF state instead
   of by the entering of the inactive state.  Thereby SCTP may perform
   quick failover without negatively impacting the overall fault
   tolerance of [RFC4960] SCTP.  At the same time, RTO-based HEARTBEAT
   probing is initiated towards a destination address once it enters PF
   state.  Thereby SCTP may quickly ascertain whether network
   connectivity towards the destination address is broken or whether the
   failover was spurious.  In the case where the failover was spurious
   data transfer may quickly resume towards the original destination
   address.

   The new failure detection algorithm assumes that loss detected by a
   timeout implies either severe congestion or network connectivity
   failure.  It recommends that by default a destination address is
   classified as PF at the occurrence of the first timeout.

3.2.  Specification of the SCTP-PF Procedures

   The SCTP-PF operation is specified as follows:

   1.   The sender maintains a new tunable SCTP Protocol Parameter
        called PotentiallyFailed.Max.Retrans (PFMR).  The PFMR defines
        the new intermediate PF threshold on the destination address
        error counter.  When this threshold is exceeded the destination
        address is classified as PF.  The RECOMMENDED value of PFMR is
        0, but other values MAY be used.  If PFMR is set to be greater
        than or equal to Path.Max.Retrans (PMR), the resulting PF
        threshold will be so high that the destination address will
        reach the inactive state before it can be classified as PF.

   2.
        The error counter of an active destination address is
        incremented as specified in [RFC4960].  This means that the
        error counter of the destination address will be incremented
        each time the T3-rtx timer expires, or each time a HEARTBEAT
        chunk is sent when idle and not acknowledged within an RTO.
        When the value in the destination address error counter exceeds
        PFMR, the endpoint MUST mark the destination address as in the
        PF state.

   3.   An SCTP-PF sender SHOULD NOT send data to destination addresses
        in PF state when alternative destination addresses in active
        state are available.  Specifically this means that:

        i    When there is outbound data to send and the destination
             address presently used for data transmission is in PF
             state, the sender SHOULD choose a destination address in
             active state, if one exists, and use this destination
             address for data transmission.

        ii   When retransmitting data that has timed out and the sender
             thus by [RFC4960], section 6.4.1, should attempt to pick a
             new destination address for data retransmission, the sender
             SHOULD choose an alternate destination transport address in
             active state if one exists.

        iii  When there is outbound data to send and the SCTP user
             explicitly requests to send data to a destination address
             in PF state, the sender SHOULD send the data to an
             alternate destination address in active state if one
             exists.

        When choosing among multiple destination addresses in active
        state an SCTP sender will follow the guiding principles of
        section 6.4.1 of [RFC4960] of choosing most divergent
        source-destination pairs compared with, for i.: the destination
        address in PF state that it performs a failover from, and for
        ii.: the destination address towards which the data timed out.
        Rules for picking the most divergent source-destination pair
        are an implementation decision and are not specified within
        this document.

        In all cases, the sender MUST NOT change the state of the chosen
        destination address, whether this state be active or PF, and it
        MUST NOT clear the error counter of the destination address as a
        result of choosing the destination address for data
        transmission.

   4.   When the destination addresses are all in PF state or some in PF
        state and some in inactive state, the sender MUST choose one
        destination address in PF state and transmit or retransmit data
        to this destination address using the following rules:

        A.  The sender SHOULD choose the destination in PF state with
            the lowest error count (fewest consecutive timeouts) for
            data transmission and transmit or retransmit data to this
            destination.

        B.  When there are multiple destination addresses in PF state
            with the same error count, the sender should let the choice
            among them be based on the [RFC4960], section 6.4.1,
            principles of choosing most divergent source-destination
            pairs when executing (potentially consecutive)
            retransmission.  Rules for picking the most divergent
            source-destination pair are an implementation decision and
            are not specified within this document.

        The sender MUST NOT change the state and the error counter of
        any destination address regardless of whether it has been chosen
        for transmission or not.

   5.   The HB.interval of the Path Heartbeat function of [RFC4960] MUST
        be ignored for destination addresses in PF state.  Instead
        HEARTBEAT chunks are sent to destination addresses in PF state
        once per RTO.  HEARTBEAT chunks SHOULD be sent to destination
        addresses in PF state, but the sending of HEARTBEATs MUST honor
        whether the Path Heartbeat function (Section 8.3 of [RFC4960])
        is enabled for the destination address or not.
        I.e., if the Path Heartbeat function is disabled for the
        destination address in question, HEARTBEATs MUST NOT be sent.
        Note that when the Path Heartbeat function is disabled, it may
        take longer to transition a destination address in PF state back
        to active state.

   6.   HEARTBEATs are sent when a destination address reaches the PF
        state.  When a HEARTBEAT chunk is not acknowledged within the
        RTO, the sender increments the error counter and exponentially
        backs off the RTO value.  If the error counter is less than PMR,
        the sender transmits another packet containing the HEARTBEAT
        chunk immediately after timeout expiration on the previous
        HEARTBEAT.  When data is being transmitted to a destination
        address in the PF state, the transmission of a HEARTBEAT chunk
        MAY be omitted in cases where the receipt of a SACK of the data
        or a T3-rtx timer expiration on the data can provide equivalent
        information, such as the case where the data chunk has been
        transmitted to a single destination address only.  Likewise, the
        timeout of a HEARTBEAT chunk MAY be ignored if data is
        outstanding towards the destination address.

   7.   When the sender receives a HEARTBEAT ACK from a HEARTBEAT sent
        to a destination address in PF state, the sender SHOULD clear
        the error counter of the destination address and transition the
        destination address back to active state.  When the sender
        resumes data transmission on a destination address after a
        transition of the destination address from PF to active state,
        it MUST do this following the prescriptions of Section 7.2 of
        [RFC4960].  In a situation where a HEARTBEAT ACK arrives while
        there is data outstanding towards the destination address to
        which the HEARTBEAT was sent, an implementation MAY choose
        to not have the HEARTBEAT ACK reset the error counter, but have
        the error counter reset await the fate of the outstanding data
        transmission.
        This situation can happen when data is sent to a
        destination address in PF state.

   8.   Additional (PMR - PFMR) consecutive timeouts on a destination
        address in PF state confirm the path failure, upon which the
        destination address transitions to the inactive state.  As
        described in [RFC4960], the sender (i) SHOULD notify the ULP
        about this state transition, and (ii) transmit HEARTBEAT chunks
        to the inactive destination address at a lower HB.interval
        frequency as described in Section 8.3 of [RFC4960] (when the
        Path Heartbeat function is enabled for the destination address).

   9.   Acknowledgments for chunks that have been transmitted to
        multiple destinations (i.e., a chunk which has been
        retransmitted to a different destination address than the
        destination address to which the chunk was first transmitted)
        SHOULD NOT clear the error count for an inactive destination
        address and SHOULD NOT move a destination address in PF state
        back to active state, since a sender cannot disambiguate whether
        the ACK was for the original transmission or the
        retransmission(s).  An SCTP sender MAY clear the error counter
        and move a destination address back to active state if it has
        information other than the acknowledgment that uniquely
        determines which destination, among multiple destination
        addresses, the chunk reached.  This document makes no reference
        to what such information could consist of, nor how such
        information could be obtained.

   10.  Acknowledgments for data chunks that have been transmitted to
        one destination address only MUST clear the error counter for
        the destination address and MUST transition a destination
        address in PF state back to active state.  This situation can
        happen when new data is sent to a destination address in the PF
        state.
        It can also happen in situations where the destination address
        is in the PF state due to a spurious T3-rtx timeout,
        acknowledgments start to arrive for data sent prior to the
        spurious timeout, and data has not yet been retransmitted
        towards other destinations.  This document does not specify
        special handling for detection of or reaction to spurious
        T3-rtx timeouts, e.g., for special operation vis-a-vis the
        congestion control handling or data retransmission operation
        towards a destination address which undergoes a transition from
        active to PF to active state due to a spurious T3-rtx timeout.
        But it is noted that this is an area which would benefit from
        additional attention, experimentation and specification for
        single-homed SCTP as well as for multi-homed SCTP protocol
        operation.

   11.  When all destination addresses are in inactive state, and SCTP
        protocol operation thus is said to be in dormant state, the
        prescriptions given in Section 4 shall be followed.

   12.  The SCTP stack SHOULD expose the PF state of its destination
        addresses to the ULP as well as provide the means to notify the
        ULP of state transitions of its destination addresses from
        active to PF, and vice versa.  However, it is recommended that
        an SCTP stack implementing SCTP-PF also allow the ULP to be
        kept ignorant of the PF state of its destinations and the
        associated state transitions, so that the ULP can retain the
        simpler state transition model of RFC4960.  For this reason it
        is recommended that an SCTP stack implementing SCTP-PF also
        provide the ULP with the means to suppress exposure of the PF
        state and the associated state transitions.

4.
Dormant State Operation

   In a situation with complete disruption of the communication between
   the SCTP endpoints, the aggressive HEARTBEAT transmissions of
   SCTP-PF on destination addresses in PF state may make the association
   enter dormant state faster than a standard [RFC4960] SCTP
   implementation given the same setting of Path.Max.Retrans (PMR) and
   Association.Max.Retrans (AMR).  For example, an SCTP association with
   two destination addresses typically would reach dormant state in half
   the time of an [RFC4960] SCTP implementation in such situations.
   This is because an SCTP-PF sender will send HEARTBEATs and data
   retransmissions in parallel at RTO intervals when there are
   multiple destination addresses in PF state.  This argument presumes
   that RTO << HB.interval of [RFC4960].  With the design goal that
   SCTP-PF shall provide the same level of disruption tolerance as an
   [RFC4960] SCTP implementation with the same Path.Max.Retrans (PMR)
   and Association.Max.Retrans (AMR) setting, we prescribe that an
   SCTP-PF implementation SHOULD operate as described below in
   Section 4.1 during dormant state.

   An SCTP-PF implementation MAY choose a different dormant state
   operation than the one described below in Section 4.1 provided that
   the solution chosen does not decrease the fault tolerance of the
   SCTP-PF operation.

   The prescription below for SCTP-PF dormant state handling SHOULD NOT
   be coupled to the value of the PFMR, but solely to the activation of
   SCTP-PF logic in an SCTP implementation.

   It is noted that the dormant state operation below is considered to
   provide added disruption tolerance also for an [RFC4960] SCTP
   implementation, and that it can be sensible for an [RFC4960] SCTP
   implementation to follow this mode of operation.
   For an [RFC4960] SCTP implementation the continuation of data
   transmission during dormant state makes the fault tolerance of SCTP
   more robust in situations where some, or all, alternative paths of
   an SCTP association approach, or reach, inactive state before the
   primary path used for data transmission observes trouble.

4.1.  SCTP Dormant State Procedure

   a.  When the destination addresses are all in inactive state and data
       is available for transfer, the sender MUST choose one destination
       and transmit data to this destination address.

   b.  The sender MUST NOT change the state of the chosen destination
       address (it remains in inactive state) and it MUST NOT clear the
       error counter of the destination address as a result of choosing
       the destination address for data transmission.

   c.  The sender SHOULD choose the destination in inactive state with
       the lowest error count (fewest consecutive timeouts) for data
       transmission.  When there are multiple destinations with the same
       error count in inactive state, the sender SHOULD attempt to pick
       the most divergent source-destination pair from the last
       source-destination pair where failure was observed.  Rules for
       picking the most divergent source-destination pair are an
       implementation decision and are not specified within this
       document.  To support differentiation of inactive destination
       addresses based on their error count SCTP will need to allow
       the destination address error counters to be incremented up to
       some reasonable limit above PMR+1, thus changing the
       prescriptions of [RFC4960], section 8.3, in this respect.  The
       exact limit to apply is not specified in this document but it is
       considered reasonable to require the limit to be an order of
       magnitude higher than the PMR value.  A sender MAY choose to
       deploy other strategies than the strategy defined here.
       The strategy to prioritize the last active destination address,
       i.e., the destination address with the lowest error count, is
       optimal when some paths are permanently inactive, but suboptimal
       when path instability is transient.

5.  Primary Path Switchover

   The objective of the Primary Path Switchover operation is to allow
   the SCTP sender to continue data transmission on a new working path
   even when the old primary destination address becomes active again.
   This is achieved by having SCTP perform a switchover of the primary
   path to the new working path if the error counter of the primary path
   exceeds a certain threshold.  This mode of operation can be applied
   not only to SCTP-PF implementations, but also to [RFC4960]
   implementations.

   The Primary Path Switchover operation requires only sender side
   changes.  The details are:

   1.  The sender maintains a new tunable parameter, called
       Primary.Switchover.Max.Retrans (PSMR).  For SCTP-PF
       implementations, the PSMR MUST be set greater than or equal to
       the PFMR value.  For [RFC4960] implementations the PSMR MUST be
       set greater than or equal to the PMR value.  Implementations MUST
       reject any other values of PSMR.

   2.  When the path error counter on a set primary path exceeds PSMR,
       the SCTP implementation MUST autonomously select and set a new
       primary path.

   3.  The primary path selected by the SCTP implementation MUST be the
       path which at the given time would be chosen for data transfer.
       A previously failed primary path can be used as data transfer
       path as per normal path selection when the present data transfer
       path fails.

   4.  For SCTP-PF, the recommended value of PSMR is PFMR when Primary
       Path Switchover operation mode is used.  This means that no
       forced switchback to a previously failed primary path is
       performed.  An SCTP-PF implementation of Primary Path Switchover
       MUST support the setting of PSMR = PFMR.
       An SCTP-PF implementation of Primary Path Switchover MAY support
       setting of PSMR > PFMR.

   5.  For [RFC4960] SCTP, the recommended value of PSMR is PMR when
       Primary Path Switchover is used.  This means that no forced
       switchback to a previously failed primary path is performed.  An
       [RFC4960] SCTP implementation of Primary Path Switchover MUST
       support the setting of PSMR = PMR.  An [RFC4960] SCTP
       implementation of Primary Path Switchover MAY support settings
       of PSMR > PMR.

   6.  It MUST be possible to disable the Primary Path Switchover
       operation and obtain the standard switchback operation of
       [RFC4960].

   The manner of switchover operation that is optimal in a given
   scenario depends on the relative quality of a set primary path versus
   the quality of alternative paths available as well as on the extent
   to which it is desired for the mode of operation to enforce traffic
   distribution over a number of network paths.  For example, load
   distribution of traffic from multiple SCTP associations may be
   enforced by distributing the set primary paths and using the
   [RFC4960] switchback operation.  However, as [RFC4960] switchback
   behavior is suboptimal in certain situations, especially in scenarios
   where a number of equally good paths are available, an SCTP
   implementation MAY also support, as an alternative behavior, the
   Primary Path Switchover mode of operation and MAY enable it based on
   applications' requests.

   For an SCTP implementation that implements the Primary Path
   Switchover operation, this specification RECOMMENDS that the standard
   RFC4960 switchback operation be retained as the default operation.

6.  Suggested SCTP Protocol Parameter Values

   This document does not alter the [RFC4960] value RECOMMENDATIONS for
   the SCTP Protocol Parameters defined in [RFC4960].
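
   As an illustration of how the PFMR and PMR thresholds of Section 3.2
   interact, the following sketch maps a destination address error
   counter to a path state and picks the destination with the lowest
   error count, as rule 4.A of Section 3.2 and rule c of Section 4.1
   suggest.  It is a minimal sketch and not a definitive
   implementation: the names path_state, classify_path, and
   pick_lowest_error are hypothetical, and ties are resolved by lowest
   index only because the selection of the most divergent
   source-destination pair is left to the implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative state names; these are not the RFC 6458 constants. */
enum path_state { PATH_ACTIVE, PATH_PF, PATH_INACTIVE };

/* Map an error counter to a path state, given the thresholds
 * PotentiallyFailed.Max.Retrans (pfmr) and Path.Max.Retrans (pmr).
 * A threshold is exceeded when the counter is strictly greater. */
enum path_state classify_path(unsigned error_count, unsigned pfmr,
                              unsigned pmr)
{
    if (error_count > pmr)
        return PATH_INACTIVE;
    if (error_count > pfmr)
        return PATH_PF;
    return PATH_ACTIVE;
}

/* Return the index of the candidate with the lowest error count.
 * The caller is assumed to pass only the candidates that are eligible
 * (all in PF state, or all in inactive state during dormant state).
 * Assumption: ties go to the lowest index. */
size_t pick_lowest_error(const unsigned *error_counts, size_t n)
{
    size_t best = 0;

    assert(error_counts != NULL && n > 0);
    for (size_t i = 1; i < n; i++) {
        if (error_counts[i] < error_counts[best])
            best = i;
    }
    return best;
}
```

   With PFMR = 0 and PMR = 5, an error counter of 1 through 5 yields
   the PF state and a counter of 6 yields the inactive state.  With
   PFMR = PMR the PF state is never reported, which matches the remark
   in item 1 of Section 3.2.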

   The following protocol parameter is RECOMMENDED:

      PotentiallyFailed.Max.Retrans (PFMR) - 0

7.  Socket API Considerations

   This section describes how the socket API defined in [RFC6458] is
   extended to provide a way for the application to control and observe
   the SCTP-PF behavior as well as the Primary Path Switchover function.

   Please note that this section is informational only.

   A socket API implementation based on [RFC6458] is extended, by means
   of the existing SCTP_PEER_ADDR_CHANGE event, to provide an event
   notification when a peer address enters or leaves the potentially
   failed state.  It is also extended to expose the potentially failed
   state of a peer address in the structure used by the existing
   SCTP_GET_PEER_ADDR_INFO socket option.

   Furthermore, two new read/write socket options for the level
   IPPROTO_SCTP with the names SCTP_PEER_ADDR_THLDS and
   SCTP_EXPOSE_POTENTIALLY_FAILED_STATE are defined as described below.
   The first socket option is used to control the values of the PFMR and
   PSMR parameters described in Section 3 and in Section 5.  The second
   one controls the exposure of the potentially failed path state.

   Support for the SCTP_PEER_ADDR_THLDS and
   SCTP_EXPOSE_POTENTIALLY_FAILED_STATE socket options also needs to be
   added to the function sctp_opt_info().

7.1.  Support for the Potentially Failed Path State

   As defined in [RFC6458], the SCTP_PEER_ADDR_CHANGE event is provided
   if the status of a peer address changes.  In addition to the state
   changes described in [RFC6458], this event is also provided if a
   peer address enters or leaves the potentially failed state.
   The notification as defined in [RFC6458] uses the following
   structure:

     struct sctp_paddr_change {
       uint16_t spc_type;
       uint16_t spc_flags;
       uint32_t spc_length;
       struct sockaddr_storage spc_aaddr;
       uint32_t spc_state;
       uint32_t spc_error;
       sctp_assoc_t spc_assoc_id;
     };

   [RFC6458] defines the constants SCTP_ADDR_AVAILABLE,
   SCTP_ADDR_UNREACHABLE, SCTP_ADDR_REMOVED, SCTP_ADDR_ADDED, and
   SCTP_ADDR_MADE_PRIM to be provided in the spc_state field. In
   addition, this document defines the new constant
   SCTP_ADDR_POTENTIALLY_FAILED, which is reported if the affected
   address becomes potentially failed.

   The SCTP_GET_PEER_ADDR_INFO socket option defined in [RFC6458] can be
   used to query the state of a peer address. It uses the following
   structure:

     struct sctp_paddrinfo {
       sctp_assoc_t spinfo_assoc_id;
       struct sockaddr_storage spinfo_address;
       int32_t spinfo_state;
       uint32_t spinfo_cwnd;
       uint32_t spinfo_srtt;
       uint32_t spinfo_rto;
       uint32_t spinfo_mtu;
     };

   [RFC6458] defines the constants SCTP_UNCONFIRMED, SCTP_ACTIVE, and
   SCTP_INACTIVE to be provided in the spinfo_state field. In addition,
   this document defines the new constant SCTP_POTENTIALLY_FAILED, which
   is reported if the peer address is potentially failed.

7.2.  Peer Address Thresholds (SCTP_PEER_ADDR_THLDS) Socket Option

   Applications can control the SCTP-PF behavior by getting or setting
   the number of consecutive timeouts before a peer address is
   considered potentially failed or unreachable. The same socket option
   is used by applications to set and get the number of timeouts before
   the primary path is changed automatically by the Primary Path
   Switchover function. This socket option uses the level IPPROTO_SCTP
   and the name SCTP_PEER_ADDR_THLDS.
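
   As a rough illustration of how these thresholds relate to the path
   error counter, the following sketch classifies a path and decides
   whether the Primary Path Switchover would trigger. It follows the
   "exceeds" semantics given for the threshold fields of this socket
   option. The names path_state, classify_path, should_switch_primary,
   path_threshold_examples, and SWITCHOVER_DISABLED are hypothetical
   and are not part of [RFC6458] or of any particular SCTP stack.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for this sketch; not defined by [RFC6458]. */
enum path_state {
    PATH_ACTIVE,
    PATH_POTENTIALLY_FAILED,
    PATH_INACTIVE
};

/* Threshold value that disables selection of a new primary path. */
#define SWITCHOVER_DISABLED 0xffffu

/*
 * Classify a path from its error counter (consecutive timeouts).
 * A path is unreachable once the counter exceeds pathmaxrxt (PMR) and
 * potentially failed once it exceeds pathpfthld (PFMR).
 */
static enum path_state classify_path(uint16_t error_count,
                                     uint16_t pathmaxrxt,
                                     uint16_t pathpfthld)
{
    if (error_count > pathmaxrxt)
        return PATH_INACTIVE;
    if (error_count > pathpfthld)
        return PATH_POTENTIALLY_FAILED;
    return PATH_ACTIVE;
}

/* Return 1 if the primary path would be changed, 0 otherwise. */
static int should_switch_primary(uint16_t error_count, uint16_t pathcpthld)
{
    if (pathcpthld == SWITCHOVER_DISABLED)
        return 0;
    return error_count > pathcpthld;
}

/* Usage example with PMR = 5 and PFMR = PSMR = 0. */
void path_threshold_examples(void)
{
    assert(classify_path(0, 5, 0) == PATH_ACTIVE);
    assert(classify_path(1, 5, 0) == PATH_POTENTIALLY_FAILED);
    assert(classify_path(6, 5, 0) == PATH_INACTIVE);
    assert(should_switch_primary(1, 0) == 1);
    assert(should_switch_primary(1, SWITCHOVER_DISABLED) == 0);
}
```

   In this sketch the unreachable check comes first, so a potentially
   failed threshold that is not smaller than the unreachable threshold
   never yields the potentially failed state.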

   The following structure is used to access and modify the thresholds:

     struct sctp_paddrthlds {
       sctp_assoc_t spt_assoc_id;
       struct sockaddr_storage spt_address;
       uint16_t spt_pathmaxrxt;
       uint16_t spt_pathpfthld;
       uint16_t spt_pathcpthld;
     };

   spt_assoc_id: This parameter is ignored for one-to-one style
      sockets. For one-to-many style sockets the application may fill
      in an association identifier or SCTP_FUTURE_ASSOC. It is an error
      to use SCTP_{CURRENT|ALL}_ASSOC in spt_assoc_id.

   spt_address: This specifies which peer address is of interest. If a
      wildcard address is provided, this socket option applies to all
      current and future peer addresses.

   spt_pathmaxrxt: Each peer address of interest is considered
      unreachable if its path error counter exceeds spt_pathmaxrxt.

   spt_pathpfthld: Each peer address of interest is considered
      Potentially Failed if its path error counter exceeds
      spt_pathpfthld.

   spt_pathcpthld: Each peer address of interest is no longer
      considered the primary remote address if its path error counter
      exceeds spt_pathcpthld. Using a value of 0xffff disables the
      selection of a new primary peer address. If an implementation
      does not support the automatic selection of a new primary
      address, it should indicate an error with errno set to EINVAL if
      a value different from 0xffff is used in spt_pathcpthld. For
      SCTP-PF, the setting of spt_pathcpthld < spt_pathpfthld should be
      rejected with errno set to EINVAL. For [RFC4960] SCTP, the
      setting of spt_pathcpthld < spt_pathmaxrxt should be rejected
      with errno set to EINVAL. An SCTP-PF implementation MAY support
      only setting of spt_pathcpthld = spt_pathpfthld and
      spt_pathcpthld = 0xffff, and an [RFC4960] SCTP implementation MAY
      support only setting of spt_pathcpthld = spt_pathmaxrxt and
      spt_pathcpthld = 0xffff.
      In these cases SCTP shall reject setting of other values with
      errno set to EINVAL.

7.3.  Exposing the Potentially Failed Path State
      (SCTP_EXPOSE_POTENTIALLY_FAILED_STATE) Socket Option

   Applications can control the exposure of the potentially failed path
   state in the SCTP_PEER_ADDR_CHANGE event and the
   SCTP_GET_PEER_ADDR_INFO socket option as described in Section 7.1.
   The default value is implementation-specific.

   This socket option uses the level IPPROTO_SCTP and the name
   SCTP_EXPOSE_POTENTIALLY_FAILED_STATE.

   The following structure is used to control the exposure of the
   potentially failed path state:

     struct sctp_assoc_value {
       sctp_assoc_t assoc_id;
       uint32_t assoc_value;
     };

   assoc_id: This parameter is ignored for one-to-one style sockets.
      For one-to-many style sockets the application may fill in an
      association identifier or SCTP_FUTURE_ASSOC. It is an error to
      use SCTP_{CURRENT|ALL}_ASSOC in assoc_id.

   assoc_value: The potentially failed path state is exposed if and
      only if this parameter is non-zero.

8.  Security Considerations

   Security considerations for the use of SCTP and its APIs are
   discussed in [RFC4960] and [RFC6458].

   The logic introduced by this document does not impact existing SCTP
   messages on the wire. Also, this document does not introduce any new
   SCTP messages on the wire that require new security considerations.

   SCTP-PF makes SCTP not only more robust during primary path failure/
   congestion but also more vulnerable to network connectivity/
   congestion attacks on the primary path. SCTP-PF makes it easier for
   an attacker to trick SCTP into changing the data transfer path,
   since the duration of time that an attacker needs to negatively
   influence the network connectivity is much shorter than with
   [RFC4960].
   However, SCTP-PF does not constitute a significant change in the
   duration of time and effort an attacker needs to keep SCTP away from
   the primary path.

   With the standard switchback operation, [RFC4960] SCTP resumes data
   transfer on its primary path as soon as the next HEARTBEAT succeeds.

   On the other hand, usage of the Primary Path Switchover mechanism
   does change the threat analysis. This is because on-path attackers
   can force a permanent change of the data transfer path by blocking
   the primary path until the switchover of the primary path is
   triggered by the Primary Path Switchover algorithm. This will
   especially be the case when the Primary Path Switchover is used
   together with SCTP-PF with the particular setting of PSMR = PFMR = 0,
   as Primary Path Switchover then already happens at the first RTO
   timeout experienced. Users of the Primary Path Switchover mechanism
   should be aware of this fact.

   The event notification of path state transfer from active to
   potentially failed state and vice versa gives attackers an increased
   possibility to generate more local events. However, it is assumed
   that event notifications are rate-limited in the implementation to
   address this threat.

9.  IANA Considerations

   This document does not create any new registries or modify the rules
   for any existing registries managed by IANA.

10.  Acknowledgements

   The authors wish to thank Michael Tuexen for his many invaluable
   comments and for his very substantial support with the making of this
   document.

11.  Proposed Change of Status (to be Deleted before Publication)

   Initially this work looked to entail some changes of the Congestion
   Control (CC) operation of SCTP and for this reason the work was
   proposed as Experimental. These intended changes of the CC operation
   have since been judged to be irrelevant and are no longer part of the
   specification.
   As the specification entails no other potentially harmful features,
   consensus exists in the WG to bring the work forward as PS.

   Initially, concerns were expressed about the possibility for the
   mechanism to introduce path bouncing with potentially harmful network
   impacts. These concerns are believed to be unfounded. This issue is
   addressed in Appendix B.

   It is noted that the feature specified by this document is
   implemented by multiple SCTP software implementations and furthermore
   that several variants of the solution have been deployed in telephony
   signaling environments for several years with good results.

12.  References

12.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC4960]  Stewart, R., "Stream Control Transmission Protocol", RFC
              4960, September 2007.

12.2.  Informative References

   [CARO02]   Caro Jr., A., Iyengar, J., Amer, P., Heinz, G., and R.
              Stewart, "A Two-level Threshold Recovery Mechanism for
              SCTP", Tech report, CIS Dept, University of Delaware,
              July 2002.

   [CARO04]   Caro Jr., A., Amer, P., and R. Stewart, "End-to-End
              Failover Thresholds for Transport Layer Multihoming",
              MILCOM 2004, November 2004.

   [CARO05]   Caro Jr., A., "End-to-End Fault Tolerance using Transport
              Layer Multihoming", Ph.D. Thesis, University of Delaware,
              January 2005.

   [FALLON08]
              Fallon, S., Jacob, P., Qiao, Y., Murphy, L., Fallon, E.,
              and A. Hanley, "SCTP Switchover Performance Issues in WLAN
              Environments", IEEE CCNC 2008, January 2008.

   [GRINNEMO04]
              Grinnemo, K-J. and A. Brunstrom, "Performance of SCTP-
              controlled failovers in M3UA-based SIGTRAN networks",
              Advanced Simulation Technologies Conference, April 2004.

   [IYENGAR06]
              Iyengar, J., Amer, P., and R.
              Stewart, "Concurrent Multipath Transfer using SCTP
              Multihoming over Independent End-to-end Paths", IEEE/ACM
              Trans on Networking 14(5), October 2006.

   [JUNGMAIER02]
              Jungmaier, A., Rathgeb, E., and M. Tuexen, "On the use of
              SCTP in failover scenarios", World Multiconference on
              Systemics, Cybernetics and Informatics, July 2002.

   [NATARAJAN09]
              Natarajan, P., Ekiz, N., Amer, P., and R. Stewart,
              "Concurrent Multipath Transfer during Path Failure",
              Computer Communications, May 2009.

   [RFC6458]  Stewart, R., Tuexen, M., Poon, K., Lei, P., and V.
              Yasevich, "Sockets API Extensions for the Stream Control
              Transmission Protocol (SCTP)", RFC 6458, December 2011.

Appendix A.  Discussions of Alternative Approaches

   This section lists alternative approaches for the issues described in
   this document. Although these approaches do not require updating
   RFC4960, we do not recommend them for the reasons described below.

A.1.  Reduce Path.Max.Retrans (PMR)

   Smaller values for Path.Max.Retrans shorten the failover duration,
   and in fact this is recommended in some research results
   [JUNGMAIER02] [GRINNEMO04] [FALLON08]. However, to significantly
   reduce the failover time it is necessary to go down (as with PFMR) to
   Path.Max.Retrans=0, and with this setting SCTP switches to another
   destination address after a single timeout, which may result in
   spurious failover. Spurious failover is a problem in [RFC4960] SCTP
   because the transmission of HEARTBEATs on the primary path that was
   left is, unlike in SCTP-PF, governed by 'HB.interval' also during the
   failover process. 'HB.interval' is usually set in the order of
   seconds (the recommended value is 30 seconds), and when the primary
   path becomes inactive, the next HEARTBEAT may be transmitted only
   many seconds later. With the recommended value, this can be up to 30
   seconds later.
   Meanwhile, the primary path may long since have recovered, if it
   needed recovery at all (indeed the failover could be truly spurious).
   In such situations, post failover, an endpoint is forced to wait in
   the order of many seconds before it can resume transmission on the
   primary path. Furthermore, once it returns to the primary path the
   CWND needs to be rebuilt, a process from which throughput has already
   suffered on the alternate path. Using a smaller value for
   'HB.interval' might help in this situation, but it would result in a
   general waste of bandwidth, as the more frequent HEARTBEATs would
   also be sent when no trouble is observed. The bandwidth overhead may
   be diminished by having the ULP use a smaller 'HB.interval' only on
   the path which at any given time is set to be the primary path, but
   this adds complexity to the ULP.

   In addition, smaller Path.Max.Retrans values also affect the
   'Association.Max.Retrans' value. When the SCTP association's error
   count exceeds the Association.Max.Retrans threshold, the SCTP sender
   considers the peer endpoint unreachable and terminates the
   association. Section 8.2 in [RFC4960] recommends that the
   Association.Max.Retrans value should not be larger than the sum of
   the Path.Max.Retrans values of the destination addresses. Otherwise,
   the SCTP sender considers its peer reachable even when all
   destinations are INACTIVE. To avoid this dormant state operation, an
   [RFC4960] SCTP implementation SHOULD reduce Association.Max.Retrans
   accordingly whenever it reduces Path.Max.Retrans. However, a smaller
   Association.Max.Retrans value decreases the fault tolerance of SCTP
   as it increases the chances of association termination during minor
   congestion events.

A.2.  Adjust RTO related parameters

   As several research results indicate, we can also shorten the
   duration of the failover process by adjusting RTO related parameters
   [JUNGMAIER02] [FALLON08]. During the failover process, the RTO keeps
   being doubled. However, if we choose a smaller value for RTO.max, we
   can stop the exponential growth of the RTO at some point. Also,
   choosing smaller values for RTO.initial or RTO.min can help keep the
   RTO value small.

   Similar to reducing Path.Max.Retrans, the advantage of this approach
   is that it requires no modification to the current specification,
   although it requires ignoring several recommendations described in
   Section 15 of [RFC4960]. However, this approach requires sufficient
   knowledge about the network characteristics between the endpoints.
   Otherwise, it can introduce adverse side effects such as spurious
   timeouts.

   The significant issue with this approach, however, is that even if
   RTO.max is lowered to an optimal low value, as long as
   Path.Max.Retrans is kept at the [RFC4960] recommended value, the
   reduction of RTO.max does not reduce the failover time enough to
   prevent severe performance degradation during failover.

Appendix B.  Discussions for Path Bouncing Effect

   The methods described in the document can accelerate the failover
   process. Hence, they might introduce the path bouncing effect where
   the sender keeps changing the data transmission path frequently.
   This sounds harmful to the data transfer. However, several research
   results indicate that there is no serious problem with SCTP in terms
   of the path bouncing effect [CARO04] [CARO05].

   There are two main reasons for this. First, SCTP is basically
   designed for multipath communication, which means SCTP maintains all
   path related parameters (CWND, ssthresh, RTT, error count, etc.) per
   destination address.
   These parameters cannot be affected by path bouncing. In addition,
   when SCTP migrates the data transfer to another path, it starts with
   the minimal or the initial CWND. Hence, there is little chance of
   packet reordering or duplication.

   Second, even if all communication paths between the end-nodes share
   the same bottleneck, SCTP-PF results in behavior already allowed by
   [RFC4960].

Appendix C.  SCTP-PF for SCTP Single-homed Operation

   For a single-homed SCTP association, the only tangible effect of
   activating SCTP-PF operation is enhanced failure detection. This
   consists of the potential notification of the PF state of the sole
   destination address and, for idle associations, more rapid entering
   and notification of the inactive state of the destination address
   and more rapid end-point failure detection. It is believed that none
   of these effects is harmful, provided adequate dormant state
   operation is implemented, and furthermore that they may be
   particularly useful for applications that deploy multiple SCTP
   associations for load balancing purposes. The early notification of
   the PF state may be used for preventive measures, as the entering of
   the PF state can be used as a warning of potential congestion.
   Depending on the PMR value, the aggressive HEARTBEAT transmission in
   PF state may speed up the end-point failure detection (the sole path
   error counter exceeding the AMR threshold) on idle associations in
   cases where an HB.interval value that is large relative to the RTO
   (e.g., 30 seconds) is used.

Authors' Addresses

   Yoshifumi Nishida
   GE Global Research
   2623 Camino Ramon
   San Ramon, CA 94583
   USA

   Email: nishida@wide.ad.jp

   Preethi Natarajan
   Cisco Systems
   510 McCarthy Blvd
   Milpitas, CA 95035
   USA

   Email: prenatar@cisco.com

   Armando Caro
   BBN Technologies
   10 Moulton St.
   Cambridge, MA 02138
   USA

   Email: acaro@bbn.com

   Paul D. Amer
   University of Delaware
   Computer Science Department - 434 Smith Hall
   Newark, DE 19716-2586
   USA

   Email: amer@udel.edu

   Karen E. E. Nielsen
   Ericsson
   Kistavaegen 25
   Stockholm 164 80
   Sweden

   Email: karen.nielsen@tieto.com