Network Working Group                                         Y. Nishida
Internet-Draft                                        GE Global Research
Intended status: Standards Track                            P. Natarajan
Expires: August 20, 2016                                   Cisco Systems
                                                                 A. Caro
                                                        BBN Technologies
                                                                 P. Amer
                                                  University of Delaware
                                                              K. Nielsen
                                                                Ericsson
                                                       February 17, 2016

               SCTP-PF: Quick Failover Algorithm in SCTP
                 draft-ietf-tsvwg-sctp-failover-16.txt

Abstract

   SCTP supports multi-homing.
   However, when the failover operation specified in RFC4960 is
   followed, there can be significant delay and performance degradation
   in the data transfer path failover. To overcome this problem this
   document specifies a quick failover algorithm (SCTP-PF) based on the
   introduction of a Potentially Failed (PF) state in SCTP Path
   Management.

   The document also specifies a dormant state operation of SCTP. This
   dormant state operation is required to be followed by an SCTP-PF
   implementation, but it may equally well be applied by a standard
   RFC4960 SCTP implementation.

   Additionally, the document introduces an alternative switchback
   operation mode called Primary Path Switchover that will be beneficial
   in certain situations. This mode of operation applies to both a
   standard RFC4960 SCTP implementation and an SCTP-PF implementation.

   The procedures defined in the document require only minimal
   modifications to the RFC4960 specification. The procedures are
   sender-side only and do not impact the SCTP receiver.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on August 20, 2016.

Copyright Notice

   Copyright (c) 2016 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Conventions and Terminology
   3.  SCTP with Potentially Failed Destination State (SCTP-PF)
     3.1.  Overview
     3.2.  Specification of the SCTP-PF Procedures
   4.  Dormant State Operation
     4.1.  SCTP Dormant State Procedure
   5.  Primary Path Switchover
   6.  Suggested SCTP Protocol Parameter Values
   7.  Socket API Considerations
     7.1.  Support for the Potentially Failed Path State
     7.2.  Peer Address Thresholds (SCTP_PEER_ADDR_THLDS) Socket Option
     7.3.  Exposing the Potentially Failed Path State
           (SCTP_EXPOSE_POTENTIALLY_FAILED_STATE) Socket Option
   8.  Security Considerations
   9.  MIB Considerations
   10. IANA Considerations
   11. Acknowledgements
   12. Proposed Change of Status (to be Deleted before Publication)
   13. References
     13.1.  Normative References
     13.2.  Informative References
   Appendix A.  Discussions of Alternative Approaches
     A.1.  Reduce Path.Max.Retrans (PMR)
     A.2.  Adjust RTO related parameters
   Appendix B.  Discussions for Path Bouncing Effect
   Appendix C.  SCTP-PF for SCTP Single-homed Operation
   Authors' Addresses

1. Introduction

   The Stream Control Transmission Protocol (SCTP) specified in
   [RFC4960] supports multi-homing at the transport layer. SCTP's
   multi-homing features include failure detection and failover
   procedures to provide network interface redundancy and improved
   end-to-end fault tolerance. In SCTP's current failure detection
   procedure, the sender must experience Path.Max.Retrans (PMR) number
   of consecutive failed timer-based retransmissions on a destination
   address before detecting a path failure. Until detecting the path
   failure, the sender continues to transmit data on the failed path.
   The prolonged time in which [RFC4960] SCTP continues to use a failed
   path severely degrades the performance of the protocol. To address
   this problem, this document specifies a quick failover algorithm
   (SCTP-PF) based on the introduction of a new Potentially Failed (PF)
   path state in SCTP path management. The performance deficiencies of
   the [RFC4960] failover operation, and the improvements obtainable
   from the introduction of a Potentially Failed state in SCTP, were
   proposed and documented in [NATARAJAN09] for Concurrent Multipath
   Transfer SCTP [IYENGAR06].

   While SCTP-PF can accelerate the failover process and improve
   performance, it can also increase the risk that an SCTP endpoint
   enters the dormant state, where all destination addresses are
   inactive. [RFC4960] leaves the protocol operation during dormant
   state to implementations and encourages them to avoid entering the
   state as much as possible by careful tuning of the Path.Max.Retrans
   (PMR) and Association.Max.Retrans (AMR) parameters. We specify a
   dormant state operation for SCTP-PF which makes SCTP-PF provide the
   same disruption tolerance as [RFC4960] even though the dormant state
   may be entered more quickly. The dormant state operation may equally
   well be applied by an [RFC4960] implementation and will here serve to
   provide added fault tolerance for situations where the tuning of the
   Path.Max.Retrans (PMR) and Association.Max.Retrans (AMR) parameters
   fails to adequately prevent the dormant state from being entered.

   The operation after the recovery of a failed path also impacts the
   performance of the protocol. With the procedures specified in
   [RFC4960] SCTP will, after a failover from the primary path, switch
   back to use the primary path for data transfer as soon as this path
   becomes available again. From a performance perspective such a
   forced switchback of the data transmission path can be suboptimal as
   the CWND towards the original primary destination address has to be
   rebuilt once data transfer resumes [CARO02]. As an optional
   alternative to the switchback operation of [RFC4960], this document
   specifies an alternative Primary Path Switchover procedure which
   avoids such forced switchbacks of the data transfer path. The
   Primary Path Switchover operation was originally proposed in
   [CARO02].

   While SCTP-PF primarily is motivated by a desire to improve the
   multi-homed operation, the feature applies also to SCTP single-homed
   operation.
   Here the algorithm serves to provide increased failure
   detection on idle associations, whereas the failover or switchback
   aspects of the algorithm will not be activated. This is discussed in
   more detail in Appendix C.

   A brief description of the motivation for the introduction of the
   Potentially Failed state, including a discussion of alternative
   approaches to mitigate the deficiencies of the [RFC4960] failover
   operation, is given in the Appendices. A discussion of path bouncing
   effects that might be caused by frequent switchovers is also
   provided there.

2. Conventions and Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

3. SCTP with Potentially Failed Destination State (SCTP-PF)

3.1. Overview

   To minimize the performance impact during failover, the sender should
   avoid transmitting data to a failed destination address as early as
   possible. In the [RFC4960] SCTP path management scheme, the sender
   stops transmitting data to a destination address only after the
   destination address is marked inactive. This process takes a
   significant amount of time as it requires the error counter of the
   destination address to exceed the Path.Max.Retrans (PMR) threshold.
   The issue cannot simply be mitigated by lowering the PMR threshold
   because this may result in spurious failure detection and
   unnecessarily prevent the use of a preferred primary path. Also, due
   to the coupled tuning of the Path.Max.Retrans (PMR) and the
   Association.Max.Retrans (AMR) parameter values in [RFC4960], lowering
   the PMR threshold may result in lowering the AMR threshold, which
   would decrease the fault tolerance of SCTP.
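
   To make the cost of waiting for the PMR threshold concrete, the
   following sketch adds up the retransmission timeouts that expire
   before an error counter exceeds a given threshold. It is an
   illustration only and not part of the specification: the function
   name is hypothetical, and it assumes that the RTO doubles after every
   timeout and is capped at RTO.Max, with no successful exchange in
   between.

```c
#include <assert.h>

/*
 * Illustrative sketch, not part of the specification.
 * Returns the time (in ms) until the error counter of a destination
 * address exceeds `threshold`, i.e. until threshold + 1 consecutive
 * timeouts have occurred. Assumes the RTO doubles after each timeout
 * and is capped at rto_max_ms. The function name is hypothetical.
 */
unsigned long time_to_exceed_ms(unsigned int threshold,
                                unsigned long rto_ms,
                                unsigned long rto_max_ms)
{
    unsigned long elapsed = 0;
    unsigned int timeouts;

    for (timeouts = 0; timeouts <= threshold; timeouts++) {
        elapsed += rto_ms;          /* this timer runs to expiry */
        rto_ms *= 2;                /* exponential backoff */
        if (rto_ms > rto_max_ms)
            rto_ms = rto_max_ms;    /* cap at RTO.Max */
    }
    return elapsed;
}
```

   Under these assumptions, a threshold of PMR = 5 with an RTO of 1
   second and RTO.Max of 60 seconds (the [RFC4960] default values)
   gives 1 + 2 + 4 + 8 + 16 + 32 = 63 seconds before the destination
   address is marked inactive, whereas a threshold of 0 is exceeded
   after the first timeout, i.e., after 1 second.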

   The solution provided in this document is to extend the SCTP path
   management scheme of [RFC4960] by the addition of the Potentially
   Failed (PF) state as an intermediate state between the active and
   inactive states of a destination address in the [RFC4960] path
   management scheme, and let the failover of data transfer away from a
   destination address be driven by the entering of the PF state instead
   of by the entering of the inactive state. Thereby SCTP may perform
   quick failover without negatively impacting the overall fault
   tolerance of [RFC4960] SCTP. At the same time, RTO-based HEARTBEAT
   probing is initiated towards a destination address once it enters PF
   state. Thereby SCTP may quickly ascertain whether network
   connectivity towards the destination address is broken or whether the
   failover was spurious. In the case where the failover was spurious,
   data transfer may quickly resume towards the original destination
   address.

   The new failure detection algorithm assumes that loss detected by a
   timeout implies either severe congestion or network connectivity
   failure. It recommends that by default a destination address is
   classified as PF at the occurrence of the first timeout.

3.2. Specification of the SCTP-PF Procedures

   The SCTP-PF operation is specified as follows:

   1.  The sender maintains a new tunable SCTP Protocol Parameter
       called PotentiallyFailed.Max.Retrans (PFMR). The PFMR defines
       the new intermediate PF threshold on the destination address
       error counter. When this threshold is exceeded the destination
       address is classified as PF. The RECOMMENDED value of PFMR is
       0. If PFMR is set to be greater than or equal to
       Path.Max.Retrans (PMR), the resulting PF threshold will be so
       high that the destination address will reach the inactive state
       before it can be classified as PF.

   2.  The error counter of an active destination address is
       incremented or cleared as specified in [RFC4960]. This means
       that the error counter of the destination address in active
       state will be incremented each time the T3-rtx timer expires, or
       each time a HEARTBEAT chunk is sent when idle and not
       acknowledged within an RTO. When the value in the destination
       address error counter exceeds PFMR, the endpoint MUST mark the
       destination address as in the PF state.

   3.  An SCTP-PF sender SHOULD NOT send data to destination addresses
       in PF state when alternative destination addresses in active
       state are available. Specifically this means that:

       i    When there is outbound data to send and the destination
            address presently used for data transmission is in PF state,
            the sender SHOULD choose a destination address in active
            state, if one exists, and use this destination address for
            data transmission.

       ii   As specified in [RFC4960] section 6.4.1, when the sender
            retransmits data that has timed out, it should attempt to
            pick a new destination address for data retransmission. In
            this case, the sender SHOULD choose an alternate destination
            transport address in active state if one exists.

       iii  When there is outbound data to send and the SCTP user
            explicitly requests to send data to a destination address in
            PF state, the sender SHOULD send the data to an alternate
            destination address in active state if one exists.

       When choosing among multiple destination addresses in active
       state an SCTP sender will follow the guiding principles of
       section 6.4.1 of [RFC4960] of choosing most divergent
       source-destination pairs compared with, for i.: the destination
       address in PF state that it performs a failover from, and for
       ii.: the destination address towards which the data timed out.
       Rules for picking the most divergent source-destination pair are
       an implementation decision and are not specified within this
       document.

       In all cases, the sender MUST NOT change the state of the chosen
       destination address, whether this state be active or PF, and it
       MUST NOT clear the error counter of the destination address as a
       result of choosing the destination address for data
       transmission.

   4.  When the destination addresses are all in PF state or some in PF
       state and some in inactive state, the sender MUST choose one
       destination address in PF state and SHOULD transmit or
       retransmit data to this destination address using the following
       rules:

       A.  The sender SHOULD choose the destination in PF state with
           the lowest error count (fewest consecutive timeouts) for
           data transmission and transmit or retransmit data to this
           destination.

       B.  When there are multiple destination addresses in PF state
           with the same error count, the sender should let the choice
           among these destination addresses be based on the [RFC4960],
           section 6.4.1, principles of choosing most divergent
           source-destination pairs when executing (potentially
           consecutive) retransmission. Rules for picking the most
           divergent source-destination pair are an implementation
           decision and are not specified within this document.

       The sender MUST NOT change the state and the error counter of
       any destination addresses as the result of the selection.

   5.  The HB.interval of the Path Heartbeat function of [RFC4960] MUST
       be ignored for destination addresses in PF state. Instead
       HEARTBEAT chunks are sent to destination addresses in PF state
       once per RTO.
       HEARTBEAT chunks SHOULD be sent to destination
       addresses in PF state, but the sending of HEARTBEATs MUST honor
       whether the Path Heartbeat function (Section 8.3 of [RFC4960])
       is enabled for the destination address or not. I.e., if the
       Path Heartbeat function is disabled for the destination address
       in question, HEARTBEATs MUST NOT be sent. Note that when the
       Heartbeat function is disabled, it may take longer to transition
       a destination address in PF state back to active state.

   6.  HEARTBEATs are sent when a destination address reaches the PF
       state. When a HEARTBEAT chunk is not acknowledged within the
       RTO, the sender increments the error counter and exponentially
       backs off the RTO value. If the error counter is less than PMR,
       the sender transmits another packet containing the HEARTBEAT
       chunk immediately after timeout expiration on the previous
       HEARTBEAT. When data is being transmitted to a destination
       address in the PF state, the transmission of a HEARTBEAT chunk
       MAY be omitted in cases where the receipt of a SACK of the data
       or a T3-rtx timer expiration on the data can provide equivalent
       information, such as the case where the data chunk has been
       transmitted to a single destination address only. Likewise, the
       timeout of a HEARTBEAT chunk MAY be ignored if data is
       outstanding towards the destination address.

   7.  When the sender receives a HEARTBEAT ACK from a HEARTBEAT sent
       to a destination address in PF state, the sender SHOULD clear
       the error counter of the destination address and transition the
       destination address back to active state. However, there may be
       a situation where HEARTBEAT chunks can go through while DATA
       chunks cannot.
       Hence, in a situation where a HEARTBEAT ACK
       arrives while there is data outstanding towards the destination
       address to which the HEARTBEAT was sent, an implementation
       MAY choose to not have the HEARTBEAT ACK reset the error
       counter, but have the error counter reset await the fate of the
       outstanding data transmission. This situation can happen when
       data is sent to a destination address in PF state. When the
       sender resumes data transmission on a destination address after
       a transition of the destination address from PF to active state,
       it MUST do this following the prescriptions of Section 7.2 of
       [RFC4960].

   8.  Additional (PMR - PFMR) consecutive timeouts on a destination
       address in PF state confirm the path failure, upon which the
       destination address transitions to the inactive state. As
       described in [RFC4960], the sender (i) SHOULD notify the ULP
       about this state transition, and (ii) transmit HEARTBEAT chunks
       to the inactive destination address at a lower HB.interval
       frequency as described in Section 8.3 of [RFC4960] (when the
       Path Heartbeat function is enabled for the destination address).

   9.  Acknowledgments for chunks that have been transmitted to
       multiple destinations (i.e., a chunk which has been
       retransmitted to a different destination address than the
       destination address to which the chunk was first transmitted)
       SHOULD NOT clear the error count for an inactive destination
       address and SHOULD NOT move a destination address in PF state
       back to active state, since a sender cannot disambiguate whether
       the ACK was for the original transmission or the
       retransmission(s). An SCTP sender MAY clear the error counter
       and move a destination address back to active state by
       information other than acknowledgments, when it can uniquely
       determine which destination, among multiple destination
       addresses, the chunk reached.
       This document makes no reference
       to what such information could consist of, nor how such
       information could be obtained.

   10. Acknowledgments for data chunks that have been transmitted to
       one destination address only MUST clear the error counter for
       the destination address and MUST transition a destination
       address in PF state back to active state. This situation can
       happen when new data is sent to a destination address in the PF
       state. It can also happen in situations where the destination
       address is in the PF state due to the occurrence of a spurious
       T3-rtx timer and acknowledgments start to arrive for data sent
       prior to occurrence of the spurious T3-rtx and data has not yet
       been retransmitted towards other destinations. This document
       does not specify special handling for detection of or reaction
       to spurious T3-rtx timeouts, e.g., for special operation
       vis-a-vis the congestion control handling or data retransmission
       operation towards a destination address which undergoes a
       transition from active to PF to active state due to a spurious
       T3-rtx timeout. But it is noted that this is an area which would
       benefit from additional attention, experimentation and
       specification for single-homed SCTP as well as for multi-homed
       SCTP protocol operation.

   11. When all destination addresses are in inactive state, and SCTP
       protocol operation thus is said to be in dormant state, the
       prescriptions given in Section 4 shall be followed.

   12. The SCTP stack SHOULD expose the PF state of its destination
       addresses to the ULP as well as provide the means to notify the
       ULP of state transitions of its destination addresses from
       active to PF, and vice-versa.
       However, it is recommended that an
       SCTP stack implementing SCTP-PF also allows the ULP to be kept
       ignorant of the PF state of its destinations and the associated
       state transitions, so that the simpler state transition model
       of RFC4960 can be retained in the ULP. For this reason it is
       recommended that an SCTP stack implementing SCTP-PF also
       provides the ULP with the means to suppress exposure of the PF
       state and the associated state transitions.

4. Dormant State Operation

   In a situation with complete disruption of the communication
   between the SCTP Endpoints, the aggressive HEARTBEAT transmissions of
   SCTP-PF on destination addresses in PF state may make the association
   enter dormant state faster than a standard [RFC4960] SCTP
   implementation given the same setting of Path.Max.Retrans (PMR) and
   Association.Max.Retrans (AMR). For example, an SCTP association with
   two destination addresses typically would reach dormant state in half
   the time of an [RFC4960] SCTP implementation in such situations.
   This is because an SCTP-PF sender will send HEARTBEATs and data
   retransmissions in parallel with RTO intervals when there are
   multiple destination addresses in PF state. This argument presumes
   that RTO << HB.interval of [RFC4960]. With the design goal that
   SCTP-PF shall provide the same level of disruption tolerance as an
   [RFC4960] SCTP implementation with the same Path.Max.Retrans (PMR)
   and Association.Max.Retrans (AMR) setting, we prescribe that an
   SCTP-PF implementation SHOULD operate as described below in
   Section 4.1 during dormant state.

   An SCTP-PF implementation MAY choose a different dormant state
   operation than the one described below in Section 4.1 provided that
   the solution chosen does not decrease the fault tolerance of the
   SCTP-PF operation.
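
   As an illustration of the terms used in this section, the following
   sketch models the thresholds of Section 3.2 and the dormant state
   condition. It is a simplified model under stated assumptions, not a
   definitive implementation: every identifier is hypothetical, the
   state of a destination address is derived from its error counter
   alone, and the "most divergent source-destination pair" tie-break is
   left out because this document does not specify it.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative sketch only; every identifier here is hypothetical. */
enum path_state { PATH_ACTIVE, PATH_PF, PATH_INACTIVE };

struct path {
    unsigned int error_count;   /* consecutive timeouts on this address */
};

/* Derive the state of a destination address from its error counter. */
enum path_state classify_path(unsigned int error_count,
                              unsigned int pfmr, unsigned int pmr)
{
    if (error_count > pmr)
        return PATH_INACTIVE;   /* checked first: PFMR >= PMR never yields PF */
    if (error_count > pfmr)
        return PATH_PF;
    return PATH_ACTIVE;
}

/* Dormant state: at least one destination exists and all are inactive. */
int is_dormant(const struct path *paths, size_t n,
               unsigned int pfmr, unsigned int pmr)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (classify_path(paths[i].error_count, pfmr, pmr) != PATH_INACTIVE)
            return 0;
    }
    return n > 0;
}

/*
 * Index of the destination with the lowest error count; n must be >= 1.
 * Ties go to the lowest index here; the divergence-based tie-break is
 * implementation specific and is not modelled.
 */
size_t lowest_error_path(const struct path *paths, size_t n)
{
    size_t i, best = 0;

    for (i = 1; i < n; i++) {
        if (paths[i].error_count < paths[best].error_count)
            best = i;
    }
    return best;
}
```

   In this model, with PFMR = 0 and PMR = 5, an error count of 1 to 5
   maps to the PF state and an error count of 6 or more maps to the
   inactive state. Selecting a destination with lowest_error_path()
   leaves its counter and state untouched, in line with the rule that
   the choice of a destination address must not alter either.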

   The below prescription for SCTP-PF dormant state handling MUST NOT be
   coupled to the value of the PFMR, but solely to the activation of
   SCTP-PF logic in an SCTP implementation.

   It is noted that the below dormant state operation is considered to
   provide added disruption tolerance also for an [RFC4960] SCTP
   implementation, and that it can be sensible for an [RFC4960] SCTP
   implementation to follow this mode of operation. For an [RFC4960]
   SCTP implementation the continuation of data transmission during
   dormant state makes the fault tolerance of SCTP more robust
   towards situations where some, or all, alternative paths of an SCTP
   association approach, or reach, inactive state before the primary
   path used for data transmission observes trouble.

4.1. SCTP Dormant State Procedure

   a.  When the destination addresses are all in inactive state and data
       is available for transfer, the sender MUST choose one destination
       and transmit data to this destination address.

   b.  The sender MUST NOT change the state of the chosen destination
       address (it remains in inactive state) and it MUST NOT clear the
       error counter of the destination address as a result of choosing
       the destination address for data transmission.

   c.  The sender SHOULD choose the destination in inactive state with
       the lowest error count (fewest consecutive timeouts) for data
       transmission. When there are multiple destinations with the same
       error count in inactive state, the sender SHOULD attempt to pick
       the most divergent source-destination pair from the last
       source-destination pair where failure was observed. Rules for
       picking the most divergent source-destination pair are an
       implementation decision and are not specified within this
       document.
       To support differentiation of inactive destination addresses
       based on their error count SCTP will need to allow for increment
       of the destination address error counters up to some reasonable
       limit above PMR+1, thus changing the prescriptions of [RFC4960],
       section 8.3, in this respect. The exact limit to apply is not
       specified in this document but it is considered reasonable to
       require the limit to be an order of magnitude higher than the
       PMR value. A sender MAY choose to deploy other strategies than
       the strategy defined here. The strategy to prioritize the last
       active destination address, i.e., the destination address with
       the fewest error counts, is optimal when some paths are
       permanently inactive, but suboptimal when a path instability is
       transient.

5. Primary Path Switchover

   The objective of the Primary Path Switchover operation is to allow
   the SCTP sender to continue data transmission on a new working path
   even when the old primary destination address becomes active again.
   This is achieved by having SCTP perform a switchover of the primary
   path to the new working path if the error counter of the primary path
   exceeds a certain threshold. This mode of operation can be applied
   not only to SCTP-PF implementations, but also to [RFC4960]
   implementations.

   The Primary Path Switchover operation requires only sender side
   changes. The details are:

   1.  The sender maintains a new tunable parameter, called
       Primary.Switchover.Max.Retrans (PSMR). For SCTP-PF
       implementations, the PSMR MUST be set greater than or equal to
       the PFMR value. For [RFC4960] implementations the PSMR MUST be
       set greater than or equal to the PMR value. Implementations
       MUST reject any other values of PSMR.

   2.  When the path error counter on a set primary path exceeds PSMR,
       the SCTP implementation MUST autonomously select and set a new
       primary path.

   3.  The primary path selected by the SCTP implementation MUST be the
       path which at the given time would be chosen for data transfer.
       A previously failed primary path can be used as data transfer
       path as per normal path selection when the present data transfer
       path fails.

   4.  For SCTP-PF, the recommended value of PSMR is PFMR when Primary
       Path Switchover operation mode is used. This means that no
       forced switchback to a previously failed primary path is
       performed. An SCTP-PF implementation of Primary Path Switchover
       MUST support the setting of PSMR = PFMR. An SCTP-PF
       implementation of Primary Path Switchover MAY support setting of
       PSMR > PFMR.

   5.  For [RFC4960] SCTP, the recommended value of PSMR is PMR when
       Primary Path Switchover is used. This means that no forced
       switchback to a previously failed primary path is performed. An
       [RFC4960] SCTP implementation of Primary Path Switchover MUST
       support the setting of PSMR = PMR. An [RFC4960] SCTP
       implementation of Primary Path Switchover MAY support larger
       settings of PSMR > PMR.

   6.  It MUST be possible to disable the Primary Path Switchover
       operation and obtain the standard switchback operation of
       [RFC4960].

   The manner of switchover operation that is optimal in a given
   scenario depends on the relative quality of a set primary path versus
   the quality of alternative paths available as well as on the extent
   to which it is desired for the mode of operation to enforce traffic
   distribution over a number of network paths. I.e., load distribution
   of traffic from multiple SCTP associations may be sought to be
   enforced by distribution of the set primary paths with [RFC4960]
   switchback operation.
   However, as [RFC4960] switchback behavior is
   suboptimal in certain situations, especially in scenarios where a
   number of equally good paths are available, an SCTP implementation
   MAY also support, as alternative behavior, the Primary Path
   Switchover mode of operation and MAY enable it based on applications'
   requests.

   For an SCTP implementation that implements the Primary Path
   Switchover operation, this specification RECOMMENDS that the standard
   RFC4960 switchback operation is retained as the default operation.

6. Suggested SCTP Protocol Parameter Values

   This document does not alter the [RFC4960] value recommendation for
   the SCTP Protocol Parameters defined in [RFC4960].

   The following protocol parameter is RECOMMENDED:

      PotentiallyFailed.Max.Retrans (PFMR) - 0

7. Socket API Considerations

   This section describes how the socket API defined in [RFC6458] is
   extended to provide a way for the application to control and observe
   the SCTP-PF behavior as well as the Primary Path Switchover function.

   Please note that this section is informational only.

   A socket API implementation based on [RFC6458] is extended, by means
   of the existing SCTP_PEER_ADDR_CHANGE event, to provide the event
   notification when a peer address enters or leaves the potentially
   failed state. It is also extended to expose the potentially failed
   state of a peer address in the existing SCTP_GET_PEER_ADDR_INFO
   structure.

   Furthermore, two new read/write socket options for the level
   IPPROTO_SCTP with the names SCTP_PEER_ADDR_THLDS and
   SCTP_EXPOSE_POTENTIALLY_FAILED_STATE are defined as described below.
   The first socket option is used to control the values of the PFMR and
   PSMR parameters described in Section 3 and in Section 5. The second
   one controls the exposition of the potentially failed path state.
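
   Because the first option lets an application set PSMR, an
   implementation has to enforce the constraint of Section 5 that PSMR
   is not below PFMR (for SCTP-PF) or PMR (for [RFC4960] SCTP). A
   minimal sketch of such a check follows. The function and macro names
   are hypothetical and not part of any socket API; the value 0xffff is
   the "disabled" setting described in Section 7.2.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical name for the value that disables Primary Path Switchover. */
#define PSMR_DISABLED 0xffffu

/*
 * Illustrative sketch; the function name is hypothetical.
 * Returns 0 if the requested PSMR value is acceptable, EINVAL if not.
 * sctp_pf_enabled selects whether PSMR is compared against PFMR
 * (SCTP-PF) or against PMR (plain RFC 4960 operation).
 */
int validate_psmr(int sctp_pf_enabled, uint16_t pmr, uint16_t pfmr,
                  uint16_t psmr)
{
    uint16_t lower_bound = sctp_pf_enabled ? pfmr : pmr;

    if (psmr == PSMR_DISABLED)
        return 0;               /* switchover turned off: always allowed */
    if (psmr < lower_bound)
        return EINVAL;          /* below the threshold required by Section 5 */
    return 0;
}
```

   An implementation that supports only PSMR equal to the lower bound
   (plus the disabled value) would tighten the second test to an
   equality check, which Section 7.2 also permits.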

   Support for the SCTP_PEER_ADDR_THLDS and
   SCTP_EXPOSE_POTENTIALLY_FAILED_STATE socket options also needs to be
   added to the function sctp_opt_info().

7.1. Support for the Potentially Failed Path State

   As defined in [RFC6458], the SCTP_PEER_ADDR_CHANGE event is provided
   if the status of a peer address changes. In addition to the state
   changes described in [RFC6458], this event is also provided if a
   peer address enters or leaves the potentially failed state. The
   notification as defined in [RFC6458] uses the following structure:

   struct sctp_paddr_change {
     uint16_t spc_type;
     uint16_t spc_flags;
     uint32_t spc_length;
     struct sockaddr_storage spc_aaddr;
     uint32_t spc_state;
     uint32_t spc_error;
     sctp_assoc_t spc_assoc_id;
   };

   [RFC6458] defines the constants SCTP_ADDR_AVAILABLE,
   SCTP_ADDR_UNREACHABLE, SCTP_ADDR_REMOVED, SCTP_ADDR_ADDED, and
   SCTP_ADDR_MADE_PRIM to be provided in the spc_state field. This
   document defines in addition to that the new constant
   SCTP_ADDR_POTENTIALLY_FAILED, which is reported if the affected
   address becomes potentially failed.

   The SCTP_GET_PEER_ADDR_INFO socket option defined in [RFC6458] can be
   used to query the state of a peer address. It uses the following
   structure:

   struct sctp_paddrinfo {
     sctp_assoc_t spinfo_assoc_id;
     struct sockaddr_storage spinfo_address;
     int32_t spinfo_state;
     uint32_t spinfo_cwnd;
     uint32_t spinfo_srtt;
     uint32_t spinfo_rto;
     uint32_t spinfo_mtu;
   };

   [RFC6458] defines the constants SCTP_UNCONFIRMED, SCTP_ACTIVE, and
   SCTP_INACTIVE to be provided in the spinfo_state field. This
   document defines in addition to that the new constant
   SCTP_POTENTIALLY_FAILED, which is reported if the peer address is
   potentially failed.

7.2. Peer Address Thresholds (SCTP_PEER_ADDR_THLDS) Socket Option

   Applications can control the SCTP-PF behavior by getting or setting
   the number of consecutive timeouts before a peer address is
   considered potentially failed or unreachable. The same socket option
   is used by applications to set and get the number of timeouts before
   the primary path is changed automatically by the Primary Path
   Switchover function. This socket option uses the level IPPROTO_SCTP
   and the name SCTP_PEER_ADDR_THLDS.

   The following structure is used to access and modify the thresholds:

   struct sctp_paddrthlds {
     sctp_assoc_t spt_assoc_id;
     struct sockaddr_storage spt_address;
     uint16_t spt_pathmaxrxt;
     uint16_t spt_pathpfthld;
     uint16_t spt_pathcpthld;
   };

   spt_assoc_id: This parameter is ignored for one-to-one style
      sockets. For one-to-many style sockets the application may fill
      in an association identifier or SCTP_FUTURE_ASSOC. It is an error
      to use SCTP_{CURRENT|ALL}_ASSOC in spt_assoc_id.

   spt_address: This specifies which peer address is of interest. If a
      wild card address is provided, this socket option applies to all
      current and future peer addresses.

   spt_pathmaxrxt: Each peer address of interest is considered
      unreachable if its path error counter exceeds spt_pathmaxrxt.

   spt_pathpfthld: Each peer address of interest is considered
      Potentially Failed if its path error counter exceeds
      spt_pathpfthld.

   spt_pathcpthld: Each peer address of interest is no longer
      considered the primary remote address if its path error counter
      exceeds spt_pathcpthld. Using a value of 0xffff disables the
      selection of a new primary peer address. If an implementation
      does not support the automatic selection of a new primary
      address, it should indicate an error with errno set to EINVAL if
      a value different from 0xffff is used in spt_pathcpthld.
      For SCTP-PF, a setting of spt_pathcpthld < spt_pathpfthld should
      be rejected with errno set to EINVAL.  For [RFC4960] SCTP, a
      setting of spt_pathcpthld < spt_pathmaxrxt should be rejected
      with errno set to EINVAL.  An SCTP-PF implementation may support
      only the settings spt_pathcpthld = spt_pathpfthld and
      spt_pathcpthld = 0xffff, and an [RFC4960] SCTP implementation may
      support only the settings spt_pathcpthld = spt_pathmaxrxt and
      spt_pathcpthld = 0xffff.  In these cases SCTP shall reject the
      setting of other values with errno set to EINVAL.

7.3.  Exposing the Potentially Failed Path State
      (SCTP_EXPOSE_POTENTIALLY_FAILED_STATE) Socket Option

   Applications can control the exposure of the potentially failed path
   state in the SCTP_PEER_ADDR_CHANGE event and the
   SCTP_GET_PEER_ADDR_INFO socket option as described in Section 7.1.
   The default value is implementation specific.

   This socket option uses the level IPPROTO_SCTP and the name
   SCTP_EXPOSE_POTENTIALLY_FAILED_STATE.

   The following structure is used to control the exposure of the
   potentially failed path state:

   struct sctp_assoc_value {
     sctp_assoc_t assoc_id;
     uint32_t assoc_value;
   };

   assoc_id:  This parameter is ignored for one-to-one style sockets.
      For one-to-many style sockets the application may fill in an
      association identifier or SCTP_FUTURE_ASSOC.  It is an error to
      use SCTP_{CURRENT|ALL}_ASSOC in assoc_id.

   assoc_value:  The potentially failed path state is exposed if and
      only if this parameter is non-zero.

8.  Security Considerations

   Security considerations for the use of SCTP and its APIs are
   discussed in [RFC4960] and [RFC6458].

   The logic introduced by this document does not impact existing SCTP
   messages on the wire.  Also, this document does not introduce any new
   SCTP messages on the wire that require new security considerations.

   SCTP-PF makes SCTP not only more robust during primary path failure/
   congestion but also more vulnerable to network connectivity/
   congestion attacks on the primary path.  SCTP-PF makes it easier for
   an attacker to trick SCTP into changing the data transfer path, since
   the duration of time that an attacker needs to negatively influence
   the network connectivity is much shorter than with [RFC4960].
   However, SCTP-PF does not constitute a significant change in the
   duration of time and effort an attacker needs to keep SCTP away from
   the primary path.  With the standard switchback operation, [RFC4960]
   SCTP resumes data transfer on its primary path as soon as the next
   HEARTBEAT succeeds.

   On the other hand, usage of the Primary Path Switchover mechanism
   does change the threat analysis.  This is because on-path attackers
   can force a permanent change of the data transfer path by blocking
   the primary path until the switchover of the primary path is
   triggered by the Primary Path Switchover algorithm.  This will
   especially be the case when the Primary Path Switchover is used
   together with SCTP-PF with the particular setting of PSMR = PFMR = 0,
   as Primary Path Switchover then happens already at the first RTO
   timeout experienced.  Users of the Primary Path Switchover mechanism
   should be aware of this fact.

   The event notification of path state transfer from active to
   potentially failed state and vice versa gives attackers an increased
   possibility to generate more local events.  However, it is assumed
   that event notifications are rate-limited in the implementation to
   address this threat.

9.  MIB Considerations

   SCTP-PF introduces new SCTP algorithms for failover and switchback
   with associated new state parameters.  It is recommended that the
   SCTP-MIB defined in [RFC3873] be updated to support the management of
   the SCTP-PF implementation.  This can be done by extending the
   sctpAssocRemAddrActive field of the SCTPAssocRemAddrTable to include
   information on the PF state of the destination address and by adding
   new fields to the SCTPAssocRemAddrTable supporting the
   PotentiallyFailed.Max.Retrans (PFMR) and
   Primary.Switchover.Max.Retrans (PSMR) parameters.

10.  IANA Considerations

   This document does not create any new registries or modify the rules
   for any existing registries managed by IANA.

11.  Acknowledgements

   The authors wish to thank Michael Tuexen for his many invaluable
   comments and for his very substantial support with the making of this
   document.

12.  Proposed Change of Status (to be Deleted before Publication)

   Initially this work looked to entail some changes of the Congestion
   Control (CC) operation of SCTP and for this reason the work was
   proposed as Experimental.  These intended changes of the CC operation
   have since been judged to be irrelevant and are no longer part of the
   specification.  As the specification entails no other potentially
   harmful features, consensus exists in the WG to bring the work
   forward as PS.

   Initially, concerns were expressed about the possibility for the
   mechanism to introduce path bouncing with potentially harmful network
   impacts.  These concerns are believed to be unfounded.  This issue is
   addressed in Appendix B.

   It is noted that the feature specified by this document is
   implemented by multiple SCTP SW implementations and furthermore that
   various variants of the solution have been deployed in telephony
   signaling environments for several years with good results.

13.  References

13.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC4960]  Stewart, R., "Stream Control Transmission Protocol", RFC
              4960, September 2007.

13.2.  Informative References

   [CARO02]   Caro Jr., A., Iyengar, J., Amer, P., Heinz, G., and R.
              Stewart, "A Two-level Threshold Recovery Mechanism for
              SCTP", Tech report, CIS Dept, University of Delaware,
              July 2002.

   [CARO04]   Caro Jr., A., Amer, P., and R. Stewart, "End-to-End
              Failover Thresholds for Transport Layer Multi homing",
              MILCOM 2004, November 2004.

   [CARO05]   Caro Jr., A., "End-to-End Fault Tolerance using Transport
              Layer Multi homing", Ph.D Thesis, University of Delaware,
              January 2005.

   [FALLON08]
              Fallon, S., Jacob, P., Qiao, Y., Murphy, L., Fallon, E.,
              and A. Hanley, "SCTP Switchover Performance Issues in WLAN
              Environments", IEEE CCNC 2008, January 2008.

   [GRINNEMO04]
              Grinnemo, K-J. and A. Brunstrom, "Performance of SCTP-
              controlled failovers in M3UA-based SIGTRAN networks",
              Advanced Simulation Technologies Conference, April 2004.

   [IYENGAR06]
              Iyengar, J., Amer, P., and R. Stewart, "Concurrent
              Multipath Transfer using SCTP Multihoming over Independent
              End-to-end Paths", IEEE/ACM Trans on Networking 14(5),
              October 2006.

   [JUNGMAIER02]
              Jungmaier, A., Rathgeb, E., and M. Tuexen, "On the use of
              SCTP in failover scenarios", World Multiconference on
              Systemics, Cybernetics and Informatics, July 2002.

   [NATARAJAN09]
              Natarajan, P., Ekiz, N., Amer, P., and R. Stewart,
              "Concurrent Multipath Transfer during Path Failure",
              Computer Communications, May 2009.

   [RFC3873]  Pastor, J. and M. Belinchon, "Stream Control Transmission
              Protocol (SCTP) Management Information Base (MIB)", RFC
              3873, DOI 10.17487/RFC3873, September 2004.

   [RFC6458]  Stewart, R., Tuexen, M., Poon, K., Lei, P., and V.
              Yasevich, "Sockets API Extensions for the Stream Control
              Transmission Protocol (SCTP)", RFC 6458, December 2011.

Appendix A.  Discussions of Alternative Approaches

   This section lists alternative approaches for the issues described in
   this document.
   Although these approaches do not require updating RFC4960, we do not
   recommend them for the reasons described below.

A.1.  Reduce Path.Max.Retrans (PMR)

   Smaller values for Path.Max.Retrans shorten the failover duration,
   and in fact this is recommended in some research results
   [JUNGMAIER02] [GRINNEMO04] [FALLON08].  However, to significantly
   reduce the failover time it is required to go down (as with PFMR) to
   Path.Max.Retrans=0, and with this setting SCTP switches to another
   destination address already on a single timeout, which may result in
   spurious failover.  Spurious failover is a problem in [RFC4960] SCTP
   as the transmission of HEARTBEATs on the primary path that was left,
   unlike in SCTP-PF, is governed by 'HB.interval' also during the
   failover process.  'HB.interval' is usually set in the order of
   seconds (the recommended value is 30 seconds), and when the primary
   path becomes inactive, the next HEARTBEAT may be transmitted only
   many seconds later; with the recommended value, only 30 seconds
   later.  Meanwhile, the primary path may long since have recovered, if
   it needed recovery at all (indeed the failover could be truly
   spurious).  In such situations, post failover, an endpoint is forced
   to wait in the order of many seconds before it can resume
   transmission on the primary path.  Furthermore, once it returns to
   the primary path, the CWND needs to be rebuilt anew, a process from
   which the throughput has already suffered on the alternate path.
   Using a smaller value for 'HB.interval' might help this situation,
   but it would result in a general waste of bandwidth, as such more
   frequent HEARTBEATING would take place also when there are no
   observed troubles.  The bandwidth overhead may be diminished by
   having the ULP use a smaller 'HB.interval' only on the path which at
   any given time is set to be the primary path, but this adds
   complication in the ULP.

   In addition, smaller Path.Max.Retrans values also affect the
   'Association.Max.Retrans' value.  When the SCTP association's error
   count exceeds the Association.Max.Retrans threshold, the SCTP sender
   considers the peer endpoint unreachable and terminates the
   association.  Section 8.2 in [RFC4960] recommends that the
   Association.Max.Retrans value should not be larger than the summation
   of the Path.Max.Retrans of each of the destination addresses.
   Otherwise, the SCTP sender considers its peer reachable even when all
   destinations are INACTIVE.  To avoid this dormant state operation, an
   [RFC4960] SCTP implementation SHOULD reduce Association.Max.Retrans
   accordingly whenever it reduces Path.Max.Retrans.  However, a smaller
   Association.Max.Retrans value decreases the fault tolerance of SCTP
   as it increases the chances of association termination during minor
   congestion events.

A.2.  Adjust RTO related parameters

   As several research results indicate, we can also shorten the
   duration of the failover process by adjusting RTO related parameters
   [JUNGMAIER02] [FALLON08].  During the failover process, the RTO keeps
   being doubled.  However, if we choose a smaller value for RTO.max, we
   can stop the exponential growth of the RTO at some point.  Also,
   choosing smaller values for RTO.initial or RTO.min can contribute to
   keeping the RTO value small.

   Similar to reducing Path.Max.Retrans, the advantage of this approach
   is that it requires no modification to the current specification,
   although it needs to ignore several recommendations described in
   Section 15 of [RFC4960].  However, this approach requires sufficient
   knowledge of the network characteristics between the endpoints.
   Otherwise, it can introduce adverse side effects such as spurious
   timeouts.
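
   The effect of capping the RTO can be estimated with a small
   calculation.  The C sketch below is an illustration only: the
   function name failover_time_ms is hypothetical, and it assumes
   back-to-back timeouts starting from a given RTO, with the RTO
   doubled after each timeout and capped at RTO.max.  It ignores RTT
   measurement effects and HEARTBEAT scheduling.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: lower bound (in ms) on the time until a path's
 * error counter exceeds path_max_retrans, assuming consecutive timeouts,
 * a first RTO of rto_start_ms, and doubling capped at rto_max_ms. */
uint64_t failover_time_ms(uint32_t rto_start_ms, uint32_t rto_max_ms,
                          uint16_t path_max_retrans)
{
    assert(rto_start_ms > 0 && rto_start_ms <= rto_max_ms);

    uint64_t total = 0;
    uint64_t rto = rto_start_ms;

    /* The counter must exceed the threshold, so PMR + 1 timeouts. */
    for (uint32_t i = 0; i <= path_max_retrans; i++) {
        total += rto;
        rto *= 2;               /* exponential backoff */
        if (rto > rto_max_ms)
            rto = rto_max_ms;   /* cap at RTO.max */
    }
    return total;
}
```

   Under these assumptions, a first RTO of 1 second with RTO.max = 60
   seconds and Path.Max.Retrans = 5 gives 1 + 2 + 4 + 8 + 16 + 32 = 63
   seconds.  Lowering RTO.max to 2 seconds while keeping
   Path.Max.Retrans = 5 still gives 1 + 5 * 2 = 11 seconds, whereas a
   threshold of 0 gives 1 second.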

   The significant issue with this approach, however, is that even if
   RTO.max is lowered to an optimally low value, as long as
   Path.Max.Retrans is kept at the [RFC4960] recommended value, the
   reduction of RTO.max does not reduce the failover time enough to
   prevent severe performance degradation during failover.

Appendix B.  Discussions for Path Bouncing Effect

   The methods described in the document can accelerate the failover
   process.  Hence, they might introduce the path bouncing effect, where
   the sender keeps changing the data transmission path frequently.
   This sounds harmful to the data transfer; however, several research
   results indicate that there is no serious problem with SCTP in terms
   of the path bouncing effect [CARO04] [CARO05].

   There are two main reasons for this.  First, SCTP is basically
   designed for multipath communication, which means SCTP maintains all
   path related parameters (CWND, ssthresh, RTT, error count, etc.) per
   destination address.  These parameters cannot be affected by path
   bouncing.  In addition, when SCTP migrates the data transfer to
   another path, it starts with the minimal or the initial CWND.  Hence,
   there is little chance of packet reordering or duplication.

   Second, even if all communication paths between the end-nodes share
   the same bottleneck, SCTP-PF results in a behavior already allowed by
   [RFC4960].

Appendix C.  SCTP-PF for SCTP Single-homed Operation

   For a single-homed SCTP association the only tangible effect of the
   activation of SCTP-PF operation is enhanced failure detection in
   terms of potential notification of the PF state of the sole
   destination address as well as, for idle associations, more rapid
   entering, and notification, of the inactive state of the destination
   address and more rapid end-point failure detection.
   It is believed that neither of these effects is harmful, provided
   adequate dormant state operation is implemented, and furthermore that
   they may be particularly useful for applications that deploy multiple
   SCTP associations for load balancing purposes.  The early
   notification of the PF state may be used for preventive measures, as
   the entering of the PF state can be used as a warning of potential
   congestion.  Depending on the PMR value, the aggressive HEARTBEAT
   transmission in PF state may speed up the end-point failure detection
   (the sole path error counter exceeding the AMR threshold) on idle
   associations in cases where an HB.interval value that is large
   relative to the RTO (e.g., 30 seconds) is used.

Authors' Addresses

   Yoshifumi Nishida
   GE Global Research
   2623 Camino Ramon
   San Ramon, CA 94583
   USA

   Email: nishida@wide.ad.jp

   Preethi Natarajan
   Cisco Systems
   510 McCarthy Blvd
   Milpitas, CA 95035
   USA

   Email: prenatar@cisco.com

   Armando Caro
   BBN Technologies
   10 Moulton St.
   Cambridge, MA 02138
   USA

   Email: acaro@bbn.com

   Paul D. Amer
   University of Delaware
   Computer Science Department - 434 Smith Hall
   Newark, DE 19716-2586
   USA

   Email: amer@udel.edu

   Karen E. E. Nielsen
   Ericsson
   Kistavaegen 25
   Stockholm 164 80
   Sweden

   Email: karen.nielsen@tieto.com