idnits 2.17.1

draft-ietf-tsvwg-sctp-failover-09.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The abstract seems to contain references ([RFC4960]), which it
     shouldn't.  Please replace those with straight textual mentions of the
     documents in question.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (December 24, 2014) is 3411 days in the past.  Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  ** Obsolete normative reference: RFC 4960 (Obsoleted by RFC 9260)

     Summary: 2 errors (**), 0 flaws (~~), 1 warning (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

Network Working Group                                         Y. Nishida
Internet-Draft                                        GE Global Research
Intended status: Standards Track                            P. Natarajan
Expires: June 27, 2015                                     Cisco Systems
                                                                 A. Caro
                                                        BBN Technologies
                                                                 P. Amer
                                                  University of Delaware
                                                                      K.
                                                                 Nielsen
                                                                Ericsson
                                                       December 24, 2014


               SCTP-PF: Quick Failover Algorithm in SCTP
                 draft-ietf-tsvwg-sctp-failover-09.txt

Abstract

   One of the major advantages of SCTP is the support of multi-homed
   communication.  A multi-homed SCTP end-point has the ability to
   withstand network failures by migrating the traffic from an inactive
   network to an active one.  However, if the failover operation as
   specified in [RFC4960] is followed, there can be a significant delay
   in the migration to the active destination addresses, thus severely
   reducing the effectiveness of the SCTP failover operation.

   This memo complements [RFC4960] by introducing the Potentially Failed
   path state and the associated new failover operation, called SCTP-PF,
   to apply during a network failure.  In addition, the memo complements
   [RFC4960] by introducing alternative switchover operation modes for
   the data transfer path management after the recovery of a failed
   primary path.  These modes offer more performance-optimal operation
   in some network environments.  The implementation of the additional
   switchover operation modes is optional.

   The procedures defined in the document require only minimal
   modifications to the current specification.  The procedures are
   sender-side only and do not impact the SCTP receiver.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.
   It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on June 27, 2015.

Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Conventions and Terminology
   3.  Issues with the SCTP Path Management
   4.  SCTP with Potentially-Failed Destination State (SCTP-PF)
     4.1.  SCTP-PF Concept
     4.2.  SCTP-PF Algorithm in Detail
     4.3.  Optional Feature: Permanent Failover
   5.  Socket API Considerations
     5.1.  Support for the Potentially Failed Path State
     5.2.  Peer Address Thresholds (SCTP_PEER_ADDR_THLDS) Socket
           Option
     5.3.  Exposing the Potentially Failed Path State
           (SCTP_EXPOSE_POTENTIALLY_FAILED_STATE) Socket Option
   6.  Security Considerations
   7.  IANA Considerations
   8.  Proposed Change of Status (to be Deleted before Publication)
   9.  References
     9.1.  Normative References
     9.2.  Informative References
   Appendix A.  Discussions of Alternative Approaches
     A.1.  Reduce Path.Max.Retrans (PMR)
     A.2.  Adjust RTO related parameters
   Appendix B.  Discussions for Path Bouncing Effect
   Authors' Addresses

1.  Introduction

   The Stream Control Transmission Protocol (SCTP) as specified in
   [RFC4960] supports multihoming at the transport layer -- an SCTP
   endpoint can bind to multiple IP addresses.  SCTP's multihoming
   features include failure detection and failover procedures to provide
   network interface redundancy and improved end-to-end fault tolerance.

   In SCTP's current failure detection procedure, the sender must
   experience Path.Max.Retrans (PMR) number of consecutive failed
   timer-based retransmissions on a destination address before detecting
   a path failure.  The sender fails over to an alternate active
   destination address only after failure detection.  Until the failure
   is detected, the sender continues to transmit data on the failed
   path, which degrades the SCTP performance.  Concurrent Multipath
   Transfer (CMT) [IYENGAR06] is an extension to SCTP that allows the
   sender to transmit data on multiple paths simultaneously.  Research
   [NATARAJAN09] shows that the current failure detection procedure
   worsens CMT performance during failover and that this performance can
   be significantly improved by employing a better failover algorithm.

   This document specifies an alternative failure detection procedure
   for SCTP that improves the SCTP performance during a failover.

   The operation after the recovery of a failed path also impacts the
   performance of the protocol.
   With the procedures specified in [RFC4960],
   SCTP will, after a failover from the primary path, switch back to the
   primary path for data transfer as soon as this path becomes available
   again.  From a performance perspective, as confirmed in research
   [CARO02], such a switchback of the data transmission path is not
   optimal in general.  As an optional alternative to the switchback
   operation of [RFC4960], this document specifies the Permanent
   Failover procedures proposed by [CARO02].

   Additional discussions of alternative approaches that do not require
   modifications to [RFC4960], and of path bouncing effects that might
   be caused by frequent switchover, are provided in the Appendices.

2.  Conventions and Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

3.  Issues with the SCTP Path Management

   This section describes issues in SCTP as specified in [RFC4960]
   that are to be fixed by the approach described in this document.

   An SCTP endpoint can support multiple IP addresses.  Each SCTP
   endpoint exchanges the list of its usable addresses during the
   initial negotiation with its peer.  Then the endpoints select one
   address from the peer's list and use this as the primary destination
   address.  During normal transmission, an SCTP endpoint sends all user
   data to the primary destination address.  Also, it sends packets
   containing a HEARTBEAT chunk to all idle destination addresses at a
   certain interval to check the reachability of these destination
   addresses.  Idle destination addresses normally include all
   non-primary destination addresses.

   If a sender has multiple active destination addresses, it can
   retransmit data to a non-primary destination address if the
   transmission to the primary times out.

   When a sender receives an acknowledgment for DATA or HEARTBEAT chunks
   sent to one of the destination addresses, it considers that
   destination address to be active and clears the error counter for the
   destination address.  If it fails to receive acknowledgments, the
   error counter for the destination address is increased.  If the error
   counter exceeds the tunable protocol parameter Path.Max.Retrans
   (PMR), the SCTP endpoint considers the destination address to be
   inactive.

   The failover process of SCTP is initiated when the primary path
   becomes inactive (the error counter for the primary path exceeds
   Path.Max.Retrans).  If the primary path is marked inactive, SCTP
   chooses a new destination address from one of the active destinations
   and starts using this address to send data.  If the primary path
   becomes active again, SCTP uses the primary destination address for
   subsequent data transmissions and stops using the non-primary one.

   One issue with this failover process is that it usually takes a
   significant amount of time before SCTP switches to the new
   destination address.  For example, if the primary path on a
   multi-homed host becomes unavailable and the RTO value for the
   primary path at that time is around 1 second, it usually takes over
   60 seconds before SCTP starts to use the non-primary path for initial
   data transmission.  This is because the recommended value for
   Path.Max.Retrans in [RFC4960] is 5, which requires 6 consecutive
   timeouts before the failover takes place.  Before SCTP switches to
   the non-primary address, SCTP keeps trying to send packets to the
   primary address, and only retransmitted packets are sent to the
   non-primary address and thus can be received by the receiver.  This
   slow failover process can cause significant performance degradation
   and is not acceptable in some situations.
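
   The delay described above can be checked with a small calculation.
   The following C sketch is illustrative only and is not part of the
   specification.  The function name failover_delay_ms and its
   parameters are hypothetical, and the sketch assumes the [RFC4960]
   behavior that the RTO doubles after each T3-rtx expiration and is
   capped at RTO.Max (60 seconds by default).

```c
#include <assert.h>

/*
 * Hypothetical helper, not defined by the draft: total time spent in
 * consecutive T3-rtx timeouts before the path error counter exceeds
 * path_max_retrans.  Assumes the RTO doubles after every timeout and
 * is capped at rto_max_ms.
 */
unsigned long failover_delay_ms(unsigned long rto_ms,
                                unsigned long rto_max_ms,
                                unsigned int path_max_retrans)
{
    unsigned long total_ms = 0;
    unsigned int timeouts;

    assert(rto_ms > 0 && rto_ms <= rto_max_ms);

    /* The counter must exceed PMR, so PMR + 1 timeouts are needed. */
    for (timeouts = 0; timeouts <= path_max_retrans; timeouts++) {
        total_ms += rto_ms;      /* wait for the current RTO to expire */
        rto_ms *= 2;             /* exponential backoff */
        if (rto_ms > rto_max_ms)
            rto_ms = rto_max_ms; /* cap at RTO.Max */
    }
    return total_ms;
}
```

   With an initial RTO of 1000 ms, an RTO.Max of 60000 ms, and PMR = 5,
   failover_delay_ms(1000, 60000, 5) returns 63000, that is
   1 + 2 + 4 + 8 + 16 + 32 seconds, which matches the "over 60 seconds"
   figure above.  With a threshold of 0, a single timeout of 1000 ms is
   enough to exceed the threshold.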

   Another issue is that once the primary path becomes active again, the
   traffic is switched back.  This is not optimal in some situations.
   This is further discussed in Section 4.3.

4.  SCTP with Potentially-Failed Destination State (SCTP-PF)

   To address the issues described in Section 3, this section extends
   the SCTP path management scheme by adding the Potentially Failed
   state and the associated failover operation.  We use the term SCTP-PF
   to denote the resulting SCTP path management operation.

4.1.  SCTP-PF Concept

   SCTP-PF as defined stems from the following two observations about
   SCTP's failure detection procedure:

   o  To minimize the performance impact during failover, the sender
      should stop transmitting data to the failed destination address
      as early as possible.  In the current SCTP path management scheme,
      the sender stops transmitting data to a destination address
      only after the destination is marked Failed (inactive).  Thus, a
      smaller PMR value is better because the sender can transition a
      destination address to the Failed (inactive) state quicker.

   o  Smaller PMR values increase the chances of spurious failure
      detection where the sender incorrectly marks a destination address
      as Failed (inactive) during periods of temporary congestion.  As
      [RFC4960] recommends coupling the PMR value and the protocol
      parameter Association.Max.Retrans (AMR) value, such spurious
      failure detection risks carrying over to spurious association
      failure detection and closure.  Larger PMR values are
      preferable to avoid spurious failure detection.

   From the above observations it is clear that tuning the PMR value
   involves the following tradeoff -- a lower value improves performance
   but increases the chances of spurious failure detection, whereas a
   higher value degrades performance and reduces spurious failure
   detection in a wide range of path conditions.
   Thus, tuning the
   association's PMR value is an incomplete solution to address the
   performance impact during failure.

   SCTP-PF defined in this document introduces a new
   "Potentially-Failed" (PF) destination state in SCTP's path management
   procedure.  The PF state was originally proposed to improve CMT
   performance [NATARAJAN09].  The PF state is an intermediate state
   between the Active and Failed states.  SCTP's failure detection
   procedure is modified to include the PF state.  The new failure
   detection algorithm assumes that loss detected by a timeout implies
   either severe congestion or failure en-route.  After a number of
   consecutive timeouts on a path, the sender is unsure, and marks the
   corresponding destination address as PF.  A PF destination address is
   not used for data transmission except in special cases (discussed
   below).  The new failure detection algorithm requires only
   sender-side changes.

4.2.  SCTP-PF Algorithm in Detail

   The SCTP-PF operation is specified as follows:

   1.   The sender maintains a new tunable parameter called
        Potentially-Failed.Max.Retrans (PFMR).  The RECOMMENDED value of
        PFMR is 0 when SCTP-PF is used.  When PFMR is larger than or
        equal to PMR, SCTP-PF is turned off.

   2.   The error counter of an active destination address is
        incremented as specified in [RFC4960].  This means that the
        error counter of the destination address will be incremented
        each time the T3-rtx timer expires, or at times where a
        HEARTBEAT sent to an idle, active address is not acknowledged
        within an RTO.  When the value in the destination address error
        counter exceeds PFMR, the endpoint MUST mark the destination
        transport address as PF.

   3.   The sender SHOULD avoid data transmission to PF destination
        addresses.
        When the destination addresses are all in PF state
        or some in PF state and some in inactive state, the sender MUST
        choose one destination address in PF state and transmit data to
        this destination.  The sender SHOULD choose the destination
        address in PF state with the lowest error count (fewest
        consecutive timeouts) for data transmission and transmit data to
        this destination.  When there are multiple PF destinations with
        the same error count, the sender SHOULD base the choice among
        these PF destination addresses on the principles of [RFC4960],
        Section 6.4.1, for choosing the most divergent
        source-destination pairs when executing (potentially
        consecutive) retransmission.  This means that the sender SHOULD
        attempt to pick the most divergent source-destination pair
        from the last source-destination pair on which data were
        transmitted or retransmitted.  Rules for picking the most
        divergent source-destination pair are an implementation decision
        and are not specified within this document.  A sender may choose
        to deploy other strategies than the above when choosing among
        multiple PF destinations with equal error count.  In all cases,
        the sender MUST NOT change the state of the chosen destination
        address and it MUST NOT clear the destination's error counter as
        a result of choosing the destination address for data
        transmission.

   4.   HEARTBEAT chunks SHOULD be sent to PF destination(s) once per
        RTO, which requires ignoring HB.interval for PF destinations.
        If a HEARTBEAT chunk is not acknowledged, the sender SHOULD
        increment the error counter and exponentially back off the RTO
        value.  If the error counter is less than PMR, the sender SHOULD
        transmit another packet containing a HEARTBEAT chunk immediately
        after T3-timer expiration.
        When data is transmitted to a PF
        destination, the transmission of HEARTBEAT chunks MAY be omitted
        as receipt of SACK chunks or a T3-rtx timer expiration can
        provide equivalent information.  It is RECOMMENDED that
        HEARTBEAT chunks are sent to PF destinations regardless of
        whether the Path Heartbeat function (Section 8.3 of [RFC4960])
        is enabled for the destination address or not.

   5.   When the sender receives a HEARTBEAT ACK from a PF destination,
        the sender MUST clear the destination's error counter and
        transition the PF destination address back to Active state.
        When the sender resumes data transmission on the destination
        address, it MUST do this following the prescriptions of
        Section 7.2 of [RFC4960].

   6.   Additional (PMR - PFMR) consecutive timeouts on a PF destination
        address confirm the path failure, upon which the destination
        address transitions to the Inactive state.  As described in
        [RFC4960], the sender (i) SHOULD notify the ULP about this state
        transition, and (ii) transmit HEARTBEAT chunks to the Inactive
        destination address at a lower frequency as described in
        Section 8.3 of [RFC4960] (when this function is enabled for the
        destination address).

   7.   When all destinations are in inactive state (association dormant
        state) the sender MUST also choose one destination address to
        transmit data to.  The sender SHOULD choose the destination
        address in inactive state with the lowest error count (fewest
        consecutive timeouts) for data transmission and transmit data to
        this destination.  When there are multiple destination addresses
        with the same error count in inactive state, the sender SHOULD
        attempt to pick the most divergent source-destination pair
        from the last source-destination pair on which data were
        transmitted or retransmitted following [RFC4960].
        Rules for
        picking the most divergent source-destination pair are an
        implementation decision and are not specified within this
        document.  For the error counters to guide this selection, a
        sender SHOULD allow for incrementing the destination error
        counters up to some reasonable limit larger than PMR+1, thus
        changing the prescriptions of [RFC4960], Section 8.3, in this
        respect.  The exact limit to apply is not specified in this
        document, but it is considered reasonable for it to be an order
        of magnitude higher than the PMR value.  A sender MAY choose to
        deploy other strategies than the above.  For example, a sender
        could choose to prioritize the last active destination address
        during dormant state.  The strategy to prioritize the last
        active destination address is optimal when some paths are
        permanently inactive, but suboptimal when the paths' instability
        is transient.  While incrementing the error counters above PMR+1
        is a prerequisite for the error counter values to guide the path
        selection in dormant state, it is noted that, by virtue of the
        introduction of the Potentially Failed state, one may deploy
        higher values of PMR without compromising the efficiency of the
        failover operation.  This makes the increase of path error
        counters above PMR+1 less critical, as the dormant state will be
        less likely to happen.  The downside of increasing the PMR value
        relative to the AMR value, however, is that the per-destination
        address failure detection, and the notification of it to the
        ULP, is thereby weakened.  In all cases the sender MUST NOT
        change the state of the chosen destination address and it MUST
        NOT clear the destination's error counter as a result of
        choosing the destination address for data transmission.

   8.
        Acknowledgments for chunks that have been transmitted to
        multiple destinations (i.e., a chunk which has been
        retransmitted to a different destination address than the
        destination address to which the chunk was first transmitted)
        SHOULD NOT clear the error count of an inactive destination
        address and SHOULD NOT transition a PF destination address back
        to Active state, since a sender cannot disambiguate whether the
        ACK was for the original transmission or the retransmission(s).
        The same ambiguity concerns the related congestion window
        growth.  The bytes of a newly acknowledged chunk which has been
        transmitted to multiple destination addresses SHOULD be
        considered for contribution to the congestion window growth
        towards the destination address where the chunk was last sent.
        The contribution of the ACKed bytes to the window growth is
        subject to the prescriptions described in Section 7.2 of
        [RFC4960].  An SCTP sender MAY apply a different approach for
        both the error count handling and the congestion control growth
        handling based on unequivocal information on which destination
        (including multiple destination addresses) the chunk reached.
        This document makes no reference to what such unequivocal
        information could consist of, nor to how such unequivocal
        information could be obtained.  The implementation of such an
        alternative approach is left to implementations.

   9.   Acknowledgments for chunks that have been transmitted to one
        destination address only MUST clear the error counter of the
        destination address and MUST transition a PF destination address
        back to Active state.  This situation can happen when new data
        is sent to a destination address in PF state.
        It can also
        happen in situations where the destination address is in PF
        state due to the occurrence of a spurious T3-rtx timer
        expiration and Acknowledgments start to arrive for data sent
        prior to the spurious T3-rtx expiration, while the data has not
        yet been retransmitted towards other destinations.  This
        document does not specify special handling for detection of or
        reaction to spurious T3-rtx timeouts, e.g., for special
        operation vis-a-vis the congestion control handling or data
        retransmission operation towards a destination address which
        undergoes a transition from active to PF to active state due to
        a spurious T3-rtx timeout.  But it is noted that this is an area
        which would benefit from additional attention, experimentation
        and specification for single-homed SCTP as well as for
        multi-homed SCTP protocol operation.

   10.  An SCTP stack SHOULD provide the ULP with the means to expose
        the PF state of its destinations as well as the means to notify
        the state transitions from Active to PF, and vice-versa.  When
        doing this, such an SCTP stack MUST provide the ULP with the
        means to suppress exposure of PF state and associated state
        transitions as well.

4.3.  Optional Feature: Permanent Failover

   In [RFC4960], an SCTP sender migrates the traffic back to the
   original primary destination address once this address becomes active
   again.  As the CWND towards the original primary destination address
   has to be rebuilt once data transfer resumes, the switch back to use
   the original primary address is not always optimal.  Indeed [CARO02]
   shows that the switch back to the original primary may degrade SCTP
   performance compared to continuing data transmission on the same
   path, especially, but not only, in scenarios where this path's
   characteristics are better.  In order to mitigate this performance
   degradation, the Permanent Failover operation was proposed in
   [CARO02].
   When SCTP changes the destination address due to failover,
   Permanent Failover operation allows the SCTP sender to continue data
   transmission on the new working path even when the old primary
   destination address becomes active again.  This is achieved by having
   SCTP perform a switch over of the primary path to the alternative
   working path rather than having SCTP switch back data transfer to the
   (previous) primary path.

   The manner of switch over operation that is optimal in a given
   scenario depends on the relative quality of a set primary path versus
   the quality of alternative paths available, as well as on
   the extent to which it is desired for the mode of operation to
   enforce traffic distribution over a number of network paths.  For
   example, load distribution of traffic from multiple SCTP associations
   may be enforced by distributing the set primary paths and relying on
   [RFC4960] switchback operation.  However, as [RFC4960] switchback
   behavior is suboptimal in certain situations, especially in scenarios
   where a number of equally good paths are available, it is recommended
   that SCTP also support, as alternative behavior, the Permanent
   Failover switch over modes of operation.

   The Permanent Failover operation requires only sender side changes.
   The details are:

   1.  The sender maintains a new tunable parameter, called
       Primary.Switchover.Max.Retrans (PSMR).  The PSMR MUST be set
       greater than or equal to the PFMR value.  Implementations MUST
       reject any other values of PSMR.

   2.  When the path error counter on a set primary path exceeds PSMR,
       the SCTP implementation MUST autonomously select and set a new
       primary path.

   3.  The primary path selected by the SCTP implementation MUST be the
       path which at the given time would be chosen for data transfer.
       A previously failed primary path MAY come into use as data
       transfer path as per normal path selection when the present data
       transfer path fails.

   4.  The recommended value of PSMR is PFMR when Permanent Failover is
       used.  This means that no forced switchback to a previously
       failed primary path is performed.  An implementation of Permanent
       Failover MUST support the setting of PSMR = PFMR.  An
       implementation of Permanent Failover MAY support setting of PSMR
       > PFMR.

   5.  It MUST be possible to disable the Permanent Failover and obtain
       the standard switchback operation of [RFC4960].

   This specification RECOMMENDS a default configuration that uses
   standard RFC4960 switchback, i.e., switch back to the old primary
   destination once the destination address becomes active again.
   However, to support optimal operation in a wider range of network
   scenarios, an implementation MAY implement Permanent Failover
   operation as detailed above and MAY enable it based on network
   configurations or users' requests.

5.  Socket API Considerations

   This section describes how the socket API defined in [RFC6458] is
   extended to provide a way for the application to control and observe
   the SCTP-PF behavior.

   Please note that this section is informational only.

   A socket API implementation based on [RFC6458] is extended, by means
   of the existing SCTP_PEER_ADDR_CHANGE event, to provide the event
   notification when a peer address enters or leaves the potentially
   failed state, and to expose the potentially failed state of a peer
   address in the existing SCTP_GET_PEER_ADDR_INFO structure.

   Furthermore, two new read/write socket options for the level
   IPPROTO_SCTP and the names SCTP_PEER_ADDR_THLDS and
   SCTP_EXPOSE_POTENTIALLY_FAILED_STATE are defined as described below.
   The first socket option is used to control the values of the PFMR and
   PSMR parameters described in Section 4.  The second one controls the
   exposure of the potentially failed path state.

   Support for the SCTP_PEER_ADDR_THLDS and
   SCTP_EXPOSE_POTENTIALLY_FAILED_STATE socket options also needs to be
   added to the function sctp_opt_info().

5.1.  Support for the Potentially Failed Path State

   As defined in [RFC6458], the SCTP_PEER_ADDR_CHANGE event is provided
   if the status of a peer address changes.  In addition to the state
   changes described in [RFC6458], this event is also provided if a
   peer address enters or leaves the potentially failed state.  The
   notification as defined in [RFC6458] uses the following structure:

   struct sctp_paddr_change {
     uint16_t spc_type;
     uint16_t spc_flags;
     uint32_t spc_length;
     struct sockaddr_storage spc_aaddr;
     uint32_t spc_state;
     uint32_t spc_error;
     sctp_assoc_t spc_assoc_id;
   };

   [RFC6458] defines the constants SCTP_ADDR_AVAILABLE,
   SCTP_ADDR_UNREACHABLE, SCTP_ADDR_REMOVED, SCTP_ADDR_ADDED, and
   SCTP_ADDR_MADE_PRIM to be provided in the spc_state field.  In
   addition, this document defines the new constant
   SCTP_ADDR_POTENTIALLY_FAILED, which is reported if the affected
   address becomes potentially failed.

   The SCTP_GET_PEER_ADDR_INFO socket option defined in [RFC6458] can be
   used to query the state of a peer address.  It uses the following
   structure:

   struct sctp_paddrinfo {
     sctp_assoc_t spinfo_assoc_id;
     struct sockaddr_storage spinfo_address;
     int32_t spinfo_state;
     uint32_t spinfo_cwnd;
     uint32_t spinfo_srtt;
     uint32_t spinfo_rto;
     uint32_t spinfo_mtu;
   };

   [RFC6458] defines the constants SCTP_UNCONFIRMED, SCTP_ACTIVE, and
   SCTP_INACTIVE to be provided in the spinfo_state field.
   In addition, this document defines the new constant
   SCTP_POTENTIALLY_FAILED, which is reported if the peer address is
   potentially failed.

5.2.  Peer Address Thresholds (SCTP_PEER_ADDR_THLDS) Socket Option

   Applications can control the SCTP-PF behavior by getting or setting
   the number of consecutive timeouts before a peer address is
   considered potentially failed or unreachable and before the primary
   path is changed automatically.  This socket option uses the level
   IPPROTO_SCTP and the name SCTP_PEER_ADDR_THLDS.

   The following structure is used to access and modify the thresholds:

   struct sctp_paddrthlds {
     sctp_assoc_t spt_assoc_id;
     struct sockaddr_storage spt_address;
     uint16_t spt_pathmaxrxt;
     uint16_t spt_pathpfthld;
     uint16_t spt_pathcpthld;
   };

   spt_assoc_id:  This parameter is ignored for one-to-one style
      sockets.  For one-to-many style sockets the application may fill
      in an association identifier or SCTP_FUTURE_ASSOC.  It is an error
      to use SCTP_{CURRENT|ALL}_ASSOC in spt_assoc_id.

   spt_address:  This specifies which peer address is of interest.  If a
      wildcard address is provided, this socket option applies to all
      current and future peer addresses.

   spt_pathmaxrxt:  Each peer address of interest is considered
      unreachable if its path error counter exceeds spt_pathmaxrxt.

   spt_pathpfthld:  Each peer address of interest is considered
      potentially failed if its path error counter exceeds
      spt_pathpfthld.

   spt_pathcpthld:  Each peer address of interest is not considered the
      primary remote address anymore if its path error counter exceeds
      spt_pathcpthld.  Using a value of 0xffff disables the selection of
      a new primary peer address.  If an implementation does not support
      the automatic selection of a new primary address, it should
      indicate an error with errno set to EINVAL if a value different
      from 0xffff is used in spt_pathcpthld.
Setting spt_pathcpthld 595 < spt_pathpfthld should be rejected with errno set to EINVAL. An 596 implementation MAY support only setting of spt_pathcpthld = 597 spt_pathpfthld and spt_pathcpthld = 0xffff. In this case it shall 598 reject setting of other values with errno set to EINVAL. 600 5.3. Exposing the Potentially Failed Path State 601 (SCTP_EXPOSE_POTENTIALLY_FAILED_STATE) Socket Option 603 Applications can control the exposure of the potentially failed path 604 state in the SCTP_PEER_ADDR_CHANGE event and the 605 SCTP_GET_PEER_ADDR_INFO socket option as described in Section 5.1. The default 606 value is implementation specific. 608 This socket option uses the level IPPROTO_SCTP and the name 609 SCTP_EXPOSE_POTENTIALLY_FAILED_STATE. 611 The following structure is used to control the exposure of the 612 potentially failed path state: 614 struct sctp_assoc_value { 615 sctp_assoc_t assoc_id; 616 uint32_t assoc_value; 617 }; 619 assoc_id: This parameter is ignored for one-to-one style sockets. 620 For one-to-many style sockets the application may fill in an 621 association identifier or SCTP_FUTURE_ASSOC. It is an error to 622 use SCTP_{CURRENT|ALL}_ASSOC in assoc_id. 624 assoc_value: The potentially failed path state is exposed if and 625 only if this parameter is non-zero. 627 6. Security Considerations 629 Security considerations for the use of SCTP and its APIs are 630 discussed in [RFC4960] and [RFC6458]. There are no new security 631 considerations introduced in this document. 633 7. IANA Considerations 635 This document does not create any new registries or modify the rules 636 for any existing registries managed by IANA. 638 8. Proposed Change of Status (to be Deleted before Publication) 640 Initially, this work appeared to entail some changes to the Congestion 641 Control (CC) operation of SCTP, and for this reason the work was 642 proposed as Experimental.
These intended changes of the CC operation 643 have since been judged to be irrelevant and are no longer part of the 644 specification. As the specification entails no other potentially 645 harmful features, consensus exists in the WG to bring the work 646 forward as PS. 648 Initially, concerns were expressed about the possibility for the 649 mechanism to introduce path bouncing with potentially harmful network 650 impacts. These concerns are believed to be unfounded. This issue is 651 addressed in Appendix B. 653 It is noted that the feature specified by this document is 654 implemented in multiple SCTP software implementations and furthermore that 655 several variants of the solution have been deployed in Telco 656 signaling environments for several years with good results. 658 9. References 660 9.1. Normative References 662 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 663 Requirement Levels", BCP 14, RFC 2119, March 1997. 665 [RFC4960] Stewart, R., "Stream Control Transmission Protocol", RFC 666 4960, September 2007. 668 9.2. Informative References 670 [CARO02] Caro Jr., A., Iyengar, J., Amer, P., Heinz, G., and R. 671 Stewart, "A Two-level Threshold Recovery Mechanism for 672 SCTP", Tech report, CIS Dept, University of Delaware, July 673 2002. 675 [CARO04] Caro Jr., A., Amer, P., and R. Stewart, "End-to-End 676 Failover Thresholds for Transport Layer Multihoming", 677 MILCOM 2004, November 2004. 679 [CARO05] Caro Jr., A., "End-to-End Fault Tolerance using Transport 680 Layer Multihoming", Ph.D. Thesis, University of Delaware, 681 January 2005. 683 [FALLON08] 684 Fallon, S., Jacob, P., Qiao, Y., Murphy, L., Fallon, E., 685 and A. Hanley, "SCTP Switchover Performance Issues in WLAN 686 Environments", IEEE CCNC 2008, January 2008. 688 [GRINNEMO04] 689 Grinnemo, K-J. and A. Brunstrom, "Performance of SCTP- 690 controlled failovers in M3UA-based SIGTRAN networks", 691 Advanced Simulation Technologies Conference, April 2004. 693 [IYENGAR06] 694 Iyengar, J., Amer, P., and R.
Stewart, "Concurrent 695 Multipath Transfer using SCTP Multihoming over Independent 696 End-to-end Paths", IEEE/ACM Trans on Networking 14(5), October 697 2006. 699 [JUNGMAIER02] 700 Jungmaier, A., Rathgeb, E., and M. Tuexen, "On the use of 701 SCTP in failover scenarios", World Multiconference on 702 Systemics, Cybernetics and Informatics, July 2002. 704 [NATARAJAN09] 705 Natarajan, P., Ekiz, N., Amer, P., and R. Stewart, 706 "Concurrent Multipath Transfer during Path Failure", 707 Computer Communications, May 2009. 709 [RFC6458] Stewart, R., Tuexen, M., Poon, K., Lei, P., and V. 710 Yasevich, "Sockets API Extensions for the Stream Control 711 Transmission Protocol (SCTP)", RFC 6458, December 2011. 713 Appendix A. Discussions of Alternative Approaches 715 This section lists alternative approaches for the issues described in 716 this document. Although these approaches do not require updates to 717 [RFC4960], we do not recommend them for the reasons described below. 719 A.1. Reduce Path.Max.Retrans (PMR) 721 Smaller values for Path.Max.Retrans shorten the failover duration. 722 In fact, this is recommended in some research results [JUNGMAIER02] 723 [GRINNEMO04] [FALLON08]. For example, when Path.Max.Retrans=0, 724 SCTP switches to another destination address on a single timeout. 725 This smaller value for Path.Max.Retrans can result in spurious 726 failover, which might be a problem. 728 Unlike with SCTP-PF, the interval for heartbeat packets is governed by 729 'HB.interval' even during the failover process. 'HB.interval' is usually 730 set on the order of seconds (recommended value is 30 seconds). When 731 the primary path becomes inactive, the next HB can be transmitted 732 only seconds later. Meanwhile, the primary path may have recovered. 733 In such situations, post failover, an endpoint is forced to wait on 734 the order of seconds before the endpoint can resume transmission on 735 the primary path.
Using a smaller value for 'HB.interval' 736 might help in this situation, but it would waste bandwidth in 737 most cases. 739 In addition, smaller Path.Max.Retrans values also affect 740 'Association.Max.Retrans' values. When the SCTP association's error 741 count (sum of error counts on all ACTIVE paths) exceeds the 742 Association.Max.Retrans threshold, the SCTP sender considers the peer 743 endpoint unreachable and terminates the association. Therefore, 744 Section 8.2 in [RFC4960] recommends that the Association.Max.Retrans 745 value should not be larger than the summation of the Path.Max.Retrans 746 of each of the destination addresses; otherwise, the SCTP sender considers 747 its peer reachable even when all destinations are INACTIVE. To avoid 748 such inconsistent behavior, an SCTP implementation SHOULD reduce 749 Association.Max.Retrans accordingly whenever it reduces 750 Path.Max.Retrans. However, a smaller Association.Max.Retrans value 751 increases the chance of association termination during minor congestion 752 events. 754 A.2. Adjust RTO-Related Parameters 756 As several research results indicate, we can also shorten the 757 duration of the failover process by adjusting RTO-related parameters 758 [JUNGMAIER02] [FALLON08]. During the failover process, the RTO keeps being 759 doubled. However, if we choose a smaller value for RTO.max, we can 760 stop the exponential growth of the RTO at some point. Also, choosing 761 smaller values for RTO.initial or RTO.min can help keep the RTO 762 value small. 764 Similar to reducing Path.Max.Retrans, the advantage of this approach 765 is that it requires no modification to the current specification, 766 although it needs to ignore several recommendations described in 767 Section 15 of [RFC4960]. However, this approach requires 768 sufficient knowledge about the network characteristics between the end 769 points. Otherwise, it can introduce adverse side effects such as 770 spurious timeouts. 772 Appendix B.
Discussions for Path Bouncing Effect 774 The methods described in the document can accelerate the failover 775 process. Hence, they might introduce the path bouncing effect, where 776 the sender keeps changing the data transmission path frequently. 777 This sounds harmful to the data transfer; however, several research 778 results indicate that there is no serious problem with SCTP in terms 779 of the path bouncing effect [CARO04] [CARO05]. 781 There are two main reasons for this. First, SCTP is basically 782 designed for multipath communication, which means SCTP maintains all 783 path-related parameters (CWND, ssthresh, RTT, error count, etc.) per 784 destination address. These parameters cannot be affected by 785 path bouncing. In addition, when SCTP migrates the data transfer to 786 another path, it starts with the minimal or the initial CWND. Hence, 787 there is little chance of packet reordering or duplication. 789 Second, even if all communication paths between the end-nodes share 790 the same bottleneck, SCTP-PF results in a behavior already 791 allowed by [RFC4960]. 793 Authors' Addresses 795 Yoshifumi Nishida 796 GE Global Research 797 2623 Camino Ramon 798 San Ramon, CA 94583 799 USA 801 Email: nishida@wide.ad.jp 802 Preethi Natarajan 803 Cisco Systems 804 510 McCarthy Blvd 805 Milpitas, CA 95035 806 USA 808 Email: prenatar@cisco.com 810 Armando Caro 811 BBN Technologies 812 10 Moulton St. 813 Cambridge, MA 02138 814 USA 816 Email: acaro@bbn.com 818 Paul D. Amer 819 University of Delaware 820 Computer Science Department - 434 Smith Hall 821 Newark, DE 19716-2586 822 USA 824 Email: amer@udel.edu 826 Karen E. E. Nielsen 827 Ericsson 828 Kistavaegen 25 829 Stockholm 164 80 830 Sweden 832 Email: karen.nielsen@tieto.com