idnits 2.17.1 

draft-ietf-tsvwg-sctp-failover-11.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (July 17, 2015) is 3206 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  ** Obsolete normative reference: RFC 4960 (Obsoleted by RFC 9260)


     Summary: 1 error (**), 0 flaws (~~), 1 warning (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                         Y. Nishida
3	Internet-Draft                                        GE Global Research
4	Intended status: Standards Track                            P. Natarajan
5	Expires: January 18, 2016                                  Cisco Systems
6	                                                                 A. Caro
7	                                                        BBN Technologies
8	                                                                 P. Amer
9	                                                  University of Delaware
10	                                                              K. Nielsen
11	                                                                Ericsson
12	                                                           July 17, 2015

14	               SCTP-PF: Quick Failover Algorithm in SCTP
15	                 draft-ietf-tsvwg-sctp-failover-11.txt

17	Abstract

19	   SCTP supports multi-homing.  However, when the failover operation
20	   specified in RFC4960 is followed, there can be significant delay and
21	   performance degradation in the data transfer path failover.  To
22	   overcome this problem this document specifies a quick failover
23	   algorithm (SCTP-PF) based on the introduction of a Potentially Failed
24	   (PF) state in SCTP Path Management.

26	   The document also specifies a dormant state operation of SCTP.  This
27	   dormant state operation is required to be followed by an SCTP-PF
28	   implementation, but it may equally well be applied by a standard
29	   RFC4960 SCTP implementation.

31	   Additionally, the document introduces an alternative switchback mode
32	   called Permanent Failover that will be beneficial in some situations.
33	   This mode of operation applies both to a standard RFC4960 SCTP
34	   implementation and to an SCTP-PF implementation.

36	   The procedures defined in the document require only minimal
37	   modifications to the RFC4960 specification.  The procedures are
38	   sender-side only and do not impact the SCTP receiver.

40	Status of This Memo

42	   This Internet-Draft is submitted in full conformance with the
43	   provisions of BCP 78 and BCP 79.

45	   Internet-Drafts are working documents of the Internet Engineering
46	   Task Force (IETF).  Note that other groups may also distribute
47	   working documents as Internet-Drafts.  The list of current Internet-
48	   Drafts is at http://datatracker.ietf.org/drafts/current/.

50	   Internet-Drafts are draft documents valid for a maximum of six months
51	   and may be updated, replaced, or obsoleted by other documents at any
52	   time.  It is inappropriate to use Internet-Drafts as reference
53	   material or to cite them other than as "work in progress."

55	   This Internet-Draft will expire on January 18, 2016.

57	Copyright Notice

59	   Copyright (c) 2015 IETF Trust and the persons identified as the
60	   document authors.  All rights reserved.

62	   This document is subject to BCP 78 and the IETF Trust's Legal
63	   Provisions Relating to IETF Documents
64	   (http://trustee.ietf.org/license-info) in effect on the date of
65	   publication of this document.  Please review these documents
66	   carefully, as they describe your rights and restrictions with respect
67	   to this document.  Code Components extracted from this document must
68	   include Simplified BSD License text as described in Section 4.e of
69	   the Trust Legal Provisions and are provided without warranty as
70	   described in the Simplified BSD License.

72	Table of Contents

74	   1.  Introduction
75	   2.  Conventions and Terminology
76	   3.  SCTP with Potentially-Failed Destination State (SCTP-PF)
77	     3.1.  Overview
78	     3.2.  Specification of the SCTP-PF Procedures
79	   4.  Dormant State Operation
80	     4.1.  SCTP Dormant State Procedure
81	   5.  Permanent Failover
82	   6.  Suggested SCTP Protocol Parameter Values
83	   7.  Socket API Considerations
84	     7.1.  Support for the Potentially Failed Path State
85	     7.2.  Peer Address Thresholds (SCTP_PEER_ADDR_THLDS) Socket
86	           Option
87	     7.3.  Exposing the Potentially Failed Path State
88	           (SCTP_EXPOSE_POTENTIALLY_FAILED_STATE) Socket Option
89	   8.  Security Considerations
90	   9.  IANA Considerations
91	   10. Acknowledgements
92	   11. Proposed Change of Status (to be Deleted before Publication)
93	   12. References
94	     12.1.  Normative References
95	     12.2.  Informative References
96	   Appendix A.  Discussions of Alternative Approaches
97	     A.1.  Reduce Path.Max.Retrans (PMR)
98	     A.2.  Adjust RTO related parameters
99	   Appendix B.  Discussions for Path Bouncing Effect
100	   Appendix C.  SCTP-PF for SCTP Single-homed Operation
101	   Authors' Addresses

103	1.  Introduction

105	   The Stream Control Transmission Protocol (SCTP) specified in
106	   [RFC4960] supports multi-homing at the transport layer.  SCTP's
107	   multi-homing features include failure detection and failover
108	   procedures to provide network interface redundancy and improved
109	   end-to-end fault tolerance.  In SCTP's current failure detection
110	   procedure, the sender must experience Path.Max.Retrans (PMR)
111	   consecutive failed timer-based retransmissions on a destination
112	   address before detecting a path failure.  Until it detects the path
113	   failure, the sender continues to transmit data on the failed path.
114	   The prolonged time in which [RFC4960] SCTP continues to use a failed
115	   path severely degrades the performance of the protocol.  To address
116	   this problem, this document specifies a quick failover algorithm
117	   (SCTP-PF) based on a new Potentially Failed path state in SCTP path
118	   management.  The performance deficiencies of the [RFC4960] failover
119	   operation, and the improvements obtainable from the introduction of
120	   a Potentially Failed state in SCTP, were proposed and documented in
121	   [NATARAJAN09] for Concurrent Multipath Transfer SCTP [IYENGAR06].

123	   While SCTP-PF can accelerate the failover process and improve
124	   performance, it can also increase the risk that an SCTP endpoint
125	   enters the dormant state, in which all destination addresses are
126	   inactive.  [RFC4960] leaves the protocol operation during dormant
127	   state to implementations and encourages implementers to avoid
128	   entering the state as far as possible by careful tuning of the
129	   Path.Max.Retrans (PMR) and Association.Max.Retrans (AMR) parameters.
130	   This document specifies a dormant state operation for SCTP-PF that
131	   gives SCTP-PF the same disruption tolerance as [RFC4960] even though
132	   the dormant state may be entered more quickly.  The dormant state
133	   operation may equally well be applied by an [RFC4960] implementation,
134	   where it provides added fault tolerance for situations in which the
135	   tuning of the Path.Max.Retrans (PMR) and Association.Max.Retrans
136	   (AMR) parameters fails to prevent the association from entering the
137	   dormant state.

139	   The operation after the recovery of a failed path also affects the
140	   performance of the protocol.  With the procedures specified in
141	   [RFC4960], SCTP will, after a failover from the primary path, switch
142	   back to the primary path for data transfer as soon as this path
143	   becomes available again.  From a performance perspective, such a
144	   forced switchback of the data transmission path can be suboptimal
145	   because the CWND towards the original primary destination address
146	   has to be rebuilt once data transfer resumes [CARO02].  As an
147	   optional alternative to the switchback operation of [RFC4960], this
148	   document specifies a Permanent Failover procedure that avoids such
149	   forced switchbacks of the data transfer path.  The Permanent
150	   Failover operation was originally proposed in [CARO02].

152	   While SCTP-PF primarily is motivated by a desire to improve the
153	   multi-homed operation, the feature applies also to SCTP single-homed
154	   operation.  Here the algorithm serves to provide increased failure
155	   detection on idle associations, whereas the failover or switchback
156	   aspects of the algorithm will not be activated.  This is discussed in
157	   more detail in Appendix C.

159	   A brief description of the motivation for the introduction of the
160	   Potentially Failed state, including a discussion of alternative
161	   approaches to mitigate the deficiencies of the [RFC4960] failover
162	   operation, is given in the Appendices.  A discussion of path bouncing
163	   effects that might be caused by frequent switchover is also
164	   provided there.

166	2.  Conventions and Terminology

168	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
169	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
170	   document are to be interpreted as described in [RFC2119].

172	3.  SCTP with Potentially-Failed Destination State (SCTP-PF)

174	3.1.  Overview

176	   To minimize the performance impact during failover, the sender should
177	   stop transmitting data to a failed destination address as early as
178	   possible.  In the [RFC4960] SCTP path management scheme, the sender
179	   stops transmitting data to a destination address only after the
180	   destination address is marked inactive.  This process takes a
181	   significant amount of time as it requires the error counter of the
182	   destination address to exceed the Path.Max.Retrans (PMR) threshold.
183	   The issue cannot simply be mitigated by lowering the PMR threshold,
184	   because this may result in spurious failure detection and unnecessary
185	   prevention of the usage of a preferred primary path.  Moreover, due
186	   to the coupled tuning of the Path.Max.Retrans (PMR) and the
187	   Association.Max.Retrans (AMR) parameter values in [RFC4960], it may
188	   compromise the fault tolerance of SCTP.
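
   As a rough illustration of the delay described above, the following
   Python sketch sums the exponentially backed-off timeouts that must
   elapse before the error counter exceeds a given threshold.  It is a
   simplified model, not part of the specification: it assumes that the
   RTO starts at 1 second and doubles on every timeout up to 60 seconds
   (the RTO.Min and RTO.Max defaults of [RFC4960]), and the function
   name is illustrative only.

```python
def detection_time(threshold, rto_start=1.0, rto_max=60.0):
    """Seconds until the error counter exceeds `threshold`.

    Assumption: every consecutive timeout adds one to the error counter
    and doubles the RTO (capped at rto_max), so threshold + 1 timeouts
    are needed before the threshold is exceeded.
    """
    total = 0.0
    rto = rto_start
    for _ in range(threshold + 1):
        total += rto                 # wait for this timer to expire
        rto = min(rto * 2, rto_max)  # exponential backoff
    return total


pmr_time = detection_time(5)    # RFC 4960 default Path.Max.Retrans = 5
first_timeout = detection_time(0)  # react on the very first timeout
```

   Under these assumptions, a PMR of 5 yields 63 seconds before the
   destination address is marked inactive, whereas reacting to the
   first timeout takes 1 second.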

190	   The solution provided in this document is to extend the SCTP path
191	   management scheme of [RFC4960] by the addition of the Potentially
192	   Failed (PF) state as an intermediate state in between the active and
193	   inactive state of a destination address in [RFC4960] path management
194	   scheme, and let the failover of data transfer away from a destination
195	   address be driven by the entering of the PF state instead of by the
196	   entering of the inactive state.  Thereby SCTP may perform quick
197	   failover without compromising the overall fault tolerance of
198	   [RFC4960] SCTP.  At the same time, RTO-based HEARTBEAT probing is
199	   initiated towards a destination address once it enters PF state.
200	   Thereby SCTP may quickly ascertain whether network connectivity
201	   towards the destination address is broken or whether the failover was
202	   spurious.  If the failover was spurious, data transfer may quickly
203	   resume towards the original destination address.

205	   The new failure detection algorithm assumes that loss detected by a
206	   timeout implies either severe congestion or network connectivity
207	   failure.  By default, a destination address is classified as PF as
208	   soon as the first timeout occurs.
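
   The relationship between the error counter and the three path states
   can be sketched in Python as follows.  This is a minimal model under
   stated assumptions, not a definitive implementation: the class and
   method names are hypothetical, pfmr stands for the PF threshold
   (0 by default, so that the first timeout enters PF state), and pmr
   stands for Path.Max.Retrans.

```python
from enum import Enum


class PathState(Enum):
    ACTIVE = "active"
    PF = "potentially-failed"
    INACTIVE = "inactive"


class Destination:
    """Hypothetical per-destination record; names are illustrative."""

    def __init__(self, name, pmr=5, pfmr=0):
        self.name = name
        self.pmr = pmr      # Path.Max.Retrans
        self.pfmr = pfmr    # PF threshold on the error counter
        self.error_count = 0

    @property
    def state(self):
        # Inactive is checked first, so a PF threshold at or above PMR
        # never yields the PF state.
        if self.error_count > self.pmr:
            return PathState.INACTIVE
        if self.error_count > self.pfmr:
            return PathState.PF
        return PathState.ACTIVE

    def on_timeout(self):
        """T3-rtx expiry or unacknowledged HEARTBEAT."""
        self.error_count += 1
        return self.state

    def on_ack(self):
        """HEARTBEAT ACK, or SACK for data sent to this address only."""
        self.error_count = 0
        return self.state
```

   With the defaults, one timeout moves a destination from active to
   PF, five further timeouts move it to inactive, and an
   acknowledgment returns it to active.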

210	3.2.  Specification of the SCTP-PF Procedures

212	   The SCTP-PF operation is specified as follows:

214	   1.   The sender maintains a new tunable SCTP Protocol Parameter
215	        called PotentiallyFailed.Max.Retrans (PFMR).  The PFMR defines
216	        the new intermediate PF threshold on the destination address
217	        error counter; when the counter exceeds it, the destination
218	        address is classified as PF.  The RECOMMENDED value of PFMR is
219	        0, but other values MAY be used.  Setting PFMR greater than or
220	        equal to Path.Max.Retrans (PMR) does not result in definition
221	        of a PF threshold for the destination address, i.e., the
222	        destination address will not be classified as PF prior to
223	        reaching inactive state.

225	   2.   The error counter of an active destination address is
226	        incremented as specified in [RFC4960].  This means that the
227	        error counter of the destination address will be incremented
228	        each time the T3-rtx timer expires, or each time a HEARTBEAT
229	        chunk is sent when idle and not acknowledged within an RTO.
230	        When the value in the destination address error counter exceeds
231	        PFMR, the endpoint MUST mark the destination address as in the
232	        PF state.

234	   3.   The PFMR threshold defines the point at which the destination
235	        address is no longer considered a good candidate for data
236	        transmission.  An SCTP-PF sender SHOULD NOT send data to
237	        destination addresses in PF state when alternative destination
238	        addresses in active state are available.  Specifically:

240	        i  When there is outbound data to send and the destination
241	           address presently used for data transmission is in PF state,
242	           the sender SHOULD choose a destination address in active
243	           state, if one exists, and fail over to use this destination
244	           address for data transmission.

246	        ii When retransmitting data that has timed out and the sender
247	           thus by [RFC4960], section 6.4.1, should attempt to pick a
248	           new destination address for data retransmission, the sender
249	           SHOULD choose an alternate destination transport address in
250	           active state if one exists.

252	        iii  When there is outbound data to send and the SCTP user
253	           explicitly requests to send data to a destination address in
254	           PF state, the sender SHOULD send the data to an alternate
255	           destination address in active state if one exists.

257	        When choosing among multiple destination addresses in active
258	        state, the following considerations apply:

260	        A.  An SCTP sender should comply with [RFC4960], section 6.4.1,
261	            principles of choosing most divergent source-destination
262	            pairs compared with, for i.: the destination address in PF
263	            state that it performs a failover from, and for ii.: the
264	            destination address towards which the data timed out.  Rules
265	            for picking the most divergent source-destination pair are
266	            an implementation decision and are not specified within this
267	            document.

269	        B.  An SCTP-PF sender MAY choose to send data to a destination
270	            address in PF state, even if destination addresses in active
271	            state exist, if the SCTP-PF sender has other information
272	            available that disqualifies the destination addresses in
273	            active state from being preferred.  However, the
274	            discussion of such mechanisms is outside of the scope of the
275	            SCTP-PF operation specified in this document.

277	        In all cases, the sender MUST NOT change the state of the chosen
278	        destination address, whether this state be active or PF, and it
279	        MUST NOT clear the error counter of the destination address as a
280	        result of choosing the destination address for data
281	        transmission.

283	   4.   When the destination addresses are all in PF state or some in PF
284	        state and some in inactive state, the sender MUST choose one
285	        destination address in PF state and transmit or retransmit data
286	        to this destination address using the following rules:

288	        A.  The sender SHOULD choose the destination in PF state with
289	            the lowest error count (fewest consecutive timeouts) for
290	            data transmission and transmit or retransmit data to this
291	            destination.

293	        B.  When there are multiple PF destinations with the same error
294	            count, the sender should base the choice among them on the
295	            [RFC4960], section 6.4.1, principles of choosing most
296	            divergent source-destination pairs when executing
297	            (potentially consecutive) retransmission.  Rules for picking
298	            (potentially consecutive) retransmission.  Rules for picking
299	            the most divergent source-destination pair are an
300	            implementation decision and are not specified within this
301	            document.

303	        C.  A sender MAY choose to deploy strategies other than the
304	            above when choosing among multiple PF destinations if the
305	            SCTP-PF sender has other information available that
306	            qualifies a particular destination address for being used.
307	            The SCTP-PF protocol operation specified in this document
308	            makes no assumption of the existence of such other
309	            information and specifies the above as the default
310	            operation of an SCTP-PF sender.

312	        The sender MUST NOT change the state and the error counter of
313	        any destination address regardless of whether it has been chosen
314	        for transmission or not.

316	   5.   The HB.interval of the Path Heartbeat function of [RFC4960]
317	        MUST be ignored for destination addresses in PF state.  Instead
318	        HEARTBEAT chunks are sent to destination addresses in PF state
319	        once per RTO.  HEARTBEAT chunks SHOULD be sent to destination
320	        addresses in PF state, but the sending of HEARTBEATS MUST honor
321	        whether the Path Heartbeat function (Section 8.3 of [RFC4960])
322	        is enabled for the destination address or not.  I.e., if the
323	        Path Heartbeat function is disabled for the destination address
324	        in question, HEARTBEATS MUST NOT be sent.  Note that when the
325	        Heartbeat function is disabled, it may take longer for a PF
326	        destination to transition back to active state.

328	   6.   HEARTBEATs are sent when a destination address reaches the PF
329	        state.  When a HEARTBEAT chunk is not acknowledged within the
330	        RTO, the sender increments the error counter and exponentially
331	        backs off the RTO value.  If the error counter is less than PMR,
332	        the sender transmits another packet containing the HEARTBEAT
333	        chunk immediately after timeout expiration on the previous
334	        HEARTBEAT.  When data is being transmitted to a destination
335	        address in the PF state, the transmission of a HEARTBEAT chunk
336	        MAY be omitted if receipt of a SACK for, or a T3-rtx timer
337	        expiration on, the outstanding data can provide equivalent
338	        information, for example when the data chunk has been
339	        transmitted to a single destination only.  Likewise, the timeout
340	        of a HEARTBEAT chunk MAY be ignored if data is outstanding
341	        towards the destination address.

343	   7.   When the sender receives a HEARTBEAT ACK from a HEARTBEAT sent
344	        to a destination address in PF state, the sender MUST clear the
345	        error counter of the destination address and transition the
346	        destination address back to active state.  When the sender
347	        resumes data transmission on the destination address, it MUST do
348	        this following the prescriptions of Section 7.2 of [RFC4960].

350	   8.   Additional (PMR - PFMR) consecutive timeouts on a destination
351	        address in PF state confirm the path failure, upon which the
352	        destination address transitions to the inactive state.  As
353	        described in [RFC4960], the sender (i) SHOULD notify the ULP
354	        about this state transition, and (ii) transmit HEARTBEAT chunks
355	        to the inactive destination address at a lower HB.interval
356	        frequency as described in Section 8.3 of [RFC4960] (when the
357	        Path Heartbeat function is enabled for the destination address).

359	   9.   Acknowledgments for chunks that have been transmitted to
360	        multiple destinations (i.e., a chunk which has been
361	        retransmitted to a different destination address than the
362	        destination address to which the chunk was first transmitted)
363	        MUST NOT clear the error count for an inactive destination
364	        address and MUST NOT transition a destination address in PF
365	        state back to active state, since a sender cannot disambiguate
366	        whether the ACK was for the original transmission or the
367	        retransmission(s).  An SCTP sender MAY apply a different
368	        approach for the error count handling based on unequivocal
369	        information on which destination (including multiple
370	        destination addresses) the chunk reached.  This document makes
371	        no reference to what such unequivocal information could consist
372	        of, nor to how it could be obtained.  The design of such an
373	        alternative approach is left to implementations.

375	   10.  Acknowledgments for chunks that have been transmitted to one
376	        destination address only MUST clear the error counter for the
377	        destination address and MUST transition a destination address in
378	        PF state back to Active state.  This situation can happen when
379	        new data is sent to a destination address in the PF state.  It
380	        can also happen in situations where the destination address is
381	        in the PF state due to the occurrence of a spurious T3-rtx timer
382	        and Acknowledgments start to arrive for data sent prior to
383	        occurrence of the spurious T3-rtx and data has not yet been
384	        retransmitted towards other destinations.  This document does
385	        not specify special handling for detection of or reaction to
386	        spurious T3-rtx timeouts, e.g., for special operation vis-a-vis
387	        the congestion control handling or data retransmission operation
388	        towards a destination address which undergoes a transition from
389	        active to PF to active state due to a spurious T3-rtx timeout.
390	        But it is noted that this is an area which would benefit from
391	        additional attention, experimentation and specification for
392	        Single Homed SCTP as well as for Multi Homed SCTP protocol
393	        operation.

395	   11.  When all destination addresses are in inactive state, and SCTP
396	        protocol operation thus is said to be in dormant state, the
397	        prescriptions given in Section 4 shall be followed.

399	   12.  The SCTP stack should provide the ULP with the means to expose
400	        the PF state of its destinations as well as the means to notify
401	        of state transitions from Active to PF, and vice-versa.
402	        However, it is recommended that an SCTP stack implementing
403	        SCTP-PF also allow the ULP to be kept ignorant of the PF state
404	        of its destinations and the associated state transitions.  For
405	        this reason, it is recommended that an SCTP stack implementing
406	        SCTP-PF also provide the ULP with the means to suppress
407	        exposure of PF state and the associated state transitions.
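
   The destination selection rules of steps 3 and 4 can be sketched in
   Python as follows.  This is a minimal sketch under stated
   assumptions: the Dest record and the choose_destination function
   are hypothetical names, and the divergence-based choice of
   [RFC4960], section 6.4.1, is not modeled (the first candidate is
   taken instead).

```python
from dataclasses import dataclass


@dataclass
class Dest:
    """Hypothetical destination record; state is a plain string."""
    addr: str
    state: str            # "active", "pf" or "inactive"
    error_count: int = 0


def choose_destination(dests, current):
    """Return the destination for the next data transmission.

    The function only reads state and error counters; it never
    changes them as a result of the choice.
    """
    # Keep using the current destination while it is active.
    if current.state == "active":
        return current
    # Step 3: prefer any destination in active state.
    active = [d for d in dests if d.state == "active"]
    if active:
        return active[0]  # assumption: divergence rules left out
    # Step 4: otherwise use the PF destination with the lowest
    # error count.
    pf = [d for d in dests if d.state == "pf"]
    if pf:
        return min(pf, key=lambda d: d.error_count)
    # All destinations inactive: dormant state, handled separately.
    return None
```

   Returning None for the all-inactive case is a choice made for this
   sketch only; the dormant state operation is specified in Section 4.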

409	4.  Dormant State Operation

411	   In a situation with complete disruption of the communication in
412	   between the SCTP Endpoints, the aggressive HEARTBEAT transmissions of
413	   SCTP-PF on destination addresses in PF state may make the association
414	   enter dormant state faster than a standard [RFC4960] SCTP
415	   implementation given the same setting of Path.Max.Retrans (PMR) and
416	   Association.Max.Retrans (AMR).  For example, an SCTP association with
417	   two destination addresses typically would reach dormant state in half
418	   the time of an [RFC4960] SCTP implementation in such situations.
419	   This is because an SCTP-PF sender will send HEARTBEATs and data
420	   retransmissions in parallel at RTO intervals when there are
421	   multiple destination addresses in PF state.  This argument presumes
422	   that RTO << HB.interval of [RFC4960].  With the design goal that
423	   SCTP-PF shall provide the same level of disruption tolerance as an
424	   [RFC4960] SCTP implementation with the same Path.Max.Retrans (PMR)
425	   and Association.Max.Retrans (AMR) setting, this document prescribes
426	   that an SCTP-PF implementation SHOULD operate as described below in
427	   Section 4.1 during dormant state.

429	   An SCTP-PF implementation MAY choose a different dormant state
430	   operation than the one described below in Section 4.1 provided that
431	   the solution chosen does not compromise the fault tolerance of the
432	   SCTP-PF operation.

434	   The below prescription for SCTP-PF dormant state handling SHOULD NOT
435	   be coupled to the value of the PFMR, but solely to the activation of
436	   SCTP-PF logic in an SCTP implementation.

438	   It is noted that the below dormant state operation is considered to
439	   provide added disruption tolerance also for an [RFC4960] SCTP
440	   implementation, and that it can be sensible for an [RFC4960] SCTP
441	   implementation to follow this mode of operation.  For an [RFC4960]
442	   SCTP implementation, the continuation of data transmission during
443	   dormant state makes the fault tolerance of SCTP more robust in
444	   situations where some, or all, alternative paths of an SCTP
445	   association approach, or reach, inactive state before the primary
446	   path used for data transmission experiences trouble.

448	4.1.  SCTP Dormant State Procedure

450	   a.  When the destination addresses are all in inactive state and data
451	       is available for transfer, the sender MUST choose one destination
452	       and transmit data to this destination address.

454	   b.  The sender MUST NOT change the state of the chosen destination
455	       address (it remains in inactive state) and it MUST NOT clear the
456	       error counter of the destination address as a result of choosing
457	       the destination address for data transmission.

459	   c.  The sender SHOULD choose the destination in inactive state with
460	       the lowest error count (fewest consecutive timeouts) for data
461	       transmission.  When there are multiple destinations with the same
462	       error count in inactive state, the sender SHOULD attempt to pick
463	       the most divergent source-destination pair from the last
464	       source-destination pair where failure was observed.  Rules for
465	       picking the most divergent source-destination pair are an
466	       implementation decision and are not specified within this
467	       document.  To support differentiation of inactive destination
468	       addresses based on their error count, SCTP will need to allow
469	       the destination address error counters to be incremented up to
470	       some reasonable limit above PMR+1, thus changing the
471	       prescriptions of [RFC4960], section 8.3, in this respect.  The
472	       exact limit to apply is not specified in this document, but it
473	       is considered reasonable to require it to be an order of
474	       magnitude higher than the PMR value.  A sender MAY choose to
475	       deploy strategies other than the strategy defined here.  The
476	       strategy to prioritize the last active destination address,
477	       i.e., the destination address with the lowest error count, is
478	       optimal when some paths are permanently inactive, but suboptimal
479	       when a path instability is transient.
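
   The dormant state choice of item c can be sketched in Python as
   follows.  The function names are hypothetical, the tie-break by
   path divergence is not modeled, and the cap of ten times PMR is
   only one possible reading of "an order of magnitude higher than
   the PMR value"; this document does not specify the exact limit.

```python
def choose_dormant_destination(error_counts):
    """Pick the inactive destination with the lowest error count.

    error_counts maps a destination address to its error counter.
    Ties are resolved by insertion order here, which is an assumption
    of this sketch and not the divergence rule of the text.
    """
    return min(error_counts, key=error_counts.get)


def bump_error_count(count, pmr, factor=10):
    """Increment an error counter, saturating at an assumed limit.

    The limit is factor * pmr, but never lower than pmr + 2, so that
    it stays above PMR+1 even for very small PMR values.
    """
    limit = max(pmr + 2, factor * pmr)
    return min(count + 1, limit)
```

   For example, with error counters {"a": 7, "b": 6, "c": 9} the
   sketch selects "b", and with PMR = 5 a counter stops growing at 50.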

481	5.  Permanent Failover

483	   The objective of the Permanent Failover operation is to allow the
484	   SCTP sender to continue data transmission on a new working path even
485	   when the old primary destination address becomes active again.  This
486	   is achieved by having SCTP perform a switch over of the primary path
487	   to the new working path if the error counter of the primary path
488	   exceeds a certain threshold.  This mode of operation can be applied
489	   not only to SCTP-PF implementations, but also to [RFC4960]
490	   implementations.

492	   The Permanent Failover operation requires only sender side changes.
493	   The details are:

495	   1.  The sender maintains a new tunable parameter, called
496	       Primary.Switchover.Max.Retrans (PSMR).  For SCTP-PF
497	       implementations, the PSMR MUST be set greater than or equal to
498	       the PFMR value.  For [RFC4960] implementations, the PSMR MUST be
499	       set greater than or equal to the PMR value.  Implementations
500	       MUST reject any other values of PSMR.

502	   2.  When the path error counter on a set primary path exceeds PSMR,
503	       the SCTP implementation MUST autonomously select and set a new
504	       primary path.

506	   3.  The primary path selected by the SCTP implementation MUST be the
507	       path which at the given time would be chosen for data transfer.
508	       A previously failed primary path can be used as data transfer
509	       path as per normal path selection when the present data transfer
510	       path fails.

512	   4.  For SCTP-PF, the recommended value of PSMR is PFMR when Permanent
513	       Failover is used.  This means that no forced switchback to a
514	       previously failed primary path is performed.  An SCTP-PF
515	       implementation of Permanent Failover MUST support the setting of
516	       PSMR = PFMR.  An SCTP-PF implementation of Permanent Failover MAY
517	       support setting of PSMR > PFMR.

519	   5.  For [RFC4960] SCTP, the recommended value of PSMR is PMR when
520	       Permanent Failover is used.  This means that no forced switchback
521	       to a previously failed primary path is performed.  An [RFC4960]
522	       SCTP implementation of Permanent Failover MUST support the
523	       setting of PSMR = PMR.  An [RFC4960] SCTP implementation of
524	       Permanent Failover MAY support setting of PSMR > PMR.

526	   6.  It MUST be possible to disable the Permanent Failover and obtain
527	       the standard switchback operation of [RFC4960].
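
   As an illustration only, the switchover rule in items 2 and 3 above
   can be sketched in C as follows.  The type and function names, and
   the data-path selection rule used here (prefer an active path,
   otherwise the potentially failed path with the lowest error count),
   are assumptions made for this sketch and are not defined by this
   document.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-path bookkeeping; names are illustrative only. */
enum path_state { PATH_ACTIVE, PATH_PF, PATH_INACTIVE };

struct path {
    enum path_state state;
    uint16_t error_count;    /* current path error counter */
};

/*
 * Assumed data-path selection: the first ACTIVE path, else the PF path
 * with the lowest error count.  Returns -1 if no such path exists.
 */
int select_data_path(const struct path *paths, size_t n)
{
    int best_pf = -1;

    for (size_t i = 0; i < n; i++) {
        if (paths[i].state == PATH_ACTIVE)
            return (int)i;
        if (paths[i].state == PATH_PF &&
            (best_pf < 0 ||
             paths[i].error_count < paths[best_pf].error_count))
            best_pf = (int)i;
    }
    return best_pf;
}

/*
 * Returns the index of the primary path after applying the rule: the
 * primary changes only if Permanent Failover is enabled and the
 * primary's error counter exceeds PSMR.
 */
int permanent_failover(const struct path *paths, size_t n,
                       int primary, uint16_t psmr, int enabled)
{
    int candidate;

    if (!enabled || paths[primary].error_count <= psmr)
        return primary;          /* primary path is left unchanged */
    candidate = select_data_path(paths, n);
    return candidate >= 0 ? candidate : primary;
}
```

   In this sketch, keeping the old primary when no candidate path is
   found is a simplification; a real implementation would follow its
   normal path selection for that case.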

529	   Which switchover behavior is optimal in a given scenario depends
530	   on the quality of the set primary path relative to the quality of
531	   the available alternative paths, and on the extent to which the
532	   mode of operation is expected to enforce traffic distribution over
533	   a number of network paths.  For example, load distribution of
534	   traffic from multiple SCTP associations may be enforced by
535	   distributing the set primary paths and relying on [RFC4960]
536	   switchback operation.  However, as [RFC4960] switchback behavior
537	   is suboptimal in certain situations, especially in scenarios
538	   where a number of equally good paths are available, an SCTP
539	   implementation MAY also support, as an alternative behavior, the
540	   Permanent Failover mode of operation and MAY enable it based on
541	   users' requests.

543	   For an SCTP implementation that implements Permanent Failover, this
544	   specification RECOMMENDS that the standard [RFC4960] switchback
545	   operation be retained as the default operation.

547	6.  Suggested SCTP Protocol Parameter Values

549	   This document does not alter the values recommended by [RFC4960] for
550	   the SCTP Protocol Parameters defined in [RFC4960].

552	   The following protocol parameter is RECOMMENDED:

554	      PotentiallyFailed.Max.Retrans (PFMR) - 0

556	7.  Socket API Considerations

558	   This section describes how the socket API defined in [RFC6458] is
559	   extended to provide a way for the application to control and observe
560	   the SCTP-PF behavior as well as the Permanent Failover function.

562	   Please note that this section is informational only.

564	   A socket API implementation based on [RFC6458] is extended in two
565	   ways.  First, the existing SCTP_PEER_ADDR_CHANGE event is used to
566	   provide a notification when a peer address enters or leaves the
567	   potentially failed state.  Second, the potentially failed state of
568	   a peer address is exposed in the existing SCTP_GET_PEER_ADDR_INFO
569	   structure.

571	   Furthermore, two new read/write socket options for the level
572	   IPPROTO_SCTP, with the names SCTP_PEER_ADDR_THLDS and
573	   SCTP_EXPOSE_POTENTIALLY_FAILED_STATE, are defined as described below.
574	   The first socket option is used to control the values of the PFMR and
575	   PSMR parameters described in Section 3 and in Section 5.  The second
576	   one controls the exposure of the potentially failed path state.

578	   Support for the SCTP_PEER_ADDR_THLDS and
579	   SCTP_EXPOSE_POTENTIALLY_FAILED_STATE socket options also needs to be
580	   added to the function sctp_opt_info().

582	7.1.  Support for the Potentially Failed Path State

584	   As defined in [RFC6458], the SCTP_PEER_ADDR_CHANGE event is provided
585	   if the status of a peer address changes.  In addition to the state
586	   changes described in [RFC6458], this event is also provided if a
587	   peer address enters or leaves the potentially failed state.  The
588	   notification as defined in [RFC6458] uses the following structure:

590	   struct sctp_paddr_change {
591	     uint16_t spc_type;
592	     uint16_t spc_flags;
593	     uint32_t spc_length;
594	     struct sockaddr_storage spc_aaddr;
595	     uint32_t spc_state;
596	     uint32_t spc_error;
597	     sctp_assoc_t spc_assoc_id;
598	   };

600	   [RFC6458] defines the constants SCTP_ADDR_AVAILABLE,
601	   SCTP_ADDR_UNREACHABLE, SCTP_ADDR_REMOVED, SCTP_ADDR_ADDED, and
602	   SCTP_ADDR_MADE_PRIM to be provided in the spc_state field.  In
603	   addition, this document defines the new constant
604	   SCTP_ADDR_POTENTIALLY_FAILED, which is reported if the affected
605	   address becomes potentially failed.

607	   The SCTP_GET_PEER_ADDR_INFO socket option defined in [RFC6458] can be
608	   used to query the state of a peer address.  It uses the following
609	   structure:

611	   struct sctp_paddrinfo {
612	     sctp_assoc_t spinfo_assoc_id;
613	     struct sockaddr_storage spinfo_address;
614	     int32_t spinfo_state;
615	     uint32_t spinfo_cwnd;
616	     uint32_t spinfo_srtt;
617	     uint32_t spinfo_rto;
618	     uint32_t spinfo_mtu;
619	   };

621	   [RFC6458] defines the constants SCTP_UNCONFIRMED, SCTP_ACTIVE, and
622	   SCTP_INACTIVE to be provided in the spinfo_state field.  In
623	   addition, this document defines the new constant
624	   SCTP_POTENTIALLY_FAILED, which is reported if the peer address is
625	   potentially failed.

627	7.2.  Peer Address Thresholds (SCTP_PEER_ADDR_THLDS) Socket Option

629	   Applications can control the SCTP-PF behavior by getting or setting
630	   the number of consecutive timeouts before a peer address is
631	   considered potentially failed or unreachable.  The same socket option
632	   is used by applications to set and get the number of timeouts before
633	   the primary path is changed automatically by the Permanent Failover
634	   function.  This socket option uses the level IPPROTO_SCTP and the
635	   name SCTP_PEER_ADDR_THLDS.

637	   The following structure is used to access and modify the thresholds:

639	   struct sctp_paddrthlds {
640	     sctp_assoc_t spt_assoc_id;
641	     struct sockaddr_storage spt_address;
642	     uint16_t spt_pathmaxrxt;
643	     uint16_t spt_pathpfthld;
644	     uint16_t spt_pathcpthld;
645	   };

647	   spt_assoc_id:  This parameter is ignored for one-to-one style
648	      sockets.  For one-to-many style sockets the application may fill
649	      in an association identifier or SCTP_FUTURE_ASSOC.  It is an error
650	      to use SCTP_{CURRENT|ALL}_ASSOC in spt_assoc_id.

652	   spt_address:  This specifies which peer address is of interest.  If a
653	      wild card address is provided, this socket option applies to all
654	      current and future peer addresses.

656	   spt_pathmaxrxt:  Each peer address of interest is considered
657	      unreachable if its path error counter exceeds spt_pathmaxrxt.

659	   spt_pathpfthld:  Each peer address of interest is considered
660	      Potentially Failed if its path error counter exceeds
661	      spt_pathpfthld.

663	   spt_pathcpthld:  Each peer address of interest is not considered the
664	      primary remote address anymore if its path error counter exceeds
665	      spt_pathcpthld.  Using a value of 0xffff disables the selection of
666	      a new primary peer address.  If an implementation does not support
667	      the automatic selection of a new primary address, it should
668	      indicate an error with errno set to EINVAL if a value different
669	      from 0xffff is used in spt_pathcpthld.  For SCTP-PF, the setting
670	      of spt_pathcpthld < spt_pathpfthld should be rejected with errno
671	      set to EINVAL.  For [RFC4960] SCTP, the setting of spt_pathcpthld
672	      < spt_pathmaxrxt should be rejected with errno set to EINVAL.  An
673	      SCTP-PF implementation MAY support only setting of spt_pathcpthld
674	      = spt_pathpfthld and spt_pathcpthld = 0xffff and an [RFC4960] SCTP
675	      implementation MAY support only setting of spt_pathcpthld =
676	      spt_pathmaxrxt and spt_pathcpthld = 0xffff.  In these cases SCTP
677	      shall reject setting of other values with errno set to EINVAL.
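
   The acceptance rules for spt_pathcpthld described above might be
   checked along the following lines.  This is a minimal sketch: the
   structure is a local stand-in for the three threshold fields, and
   the function name and its two flag arguments are hypothetical.  It
   covers only the general case, not the restricted implementations
   that accept a single value besides 0xffff.

```c
#include <errno.h>
#include <stdint.h>

/* Value that disables selection of a new primary (from the text). */
#define PATHCPTHLD_DISABLED 0xffff

/* Local stand-in for the threshold fields of struct sctp_paddrthlds. */
struct thlds {
    uint16_t pathmaxrxt;
    uint16_t pathpfthld;
    uint16_t pathcpthld;
};

/*
 * Hypothetical check: returns 0 if the requested pathcpthld is
 * acceptable, otherwise EINVAL.
 *   is_sctp_pf            non-zero for SCTP-PF, zero for RFC 4960 SCTP
 *   supports_auto_primary non-zero if automatic selection of a new
 *                         primary address is implemented
 */
int check_pathcpthld(const struct thlds *t, int is_sctp_pf,
                     int supports_auto_primary)
{
    uint16_t min_allowed;

    if (t->pathcpthld == PATHCPTHLD_DISABLED)
        return 0;                /* disabling is always acceptable */
    if (!supports_auto_primary)
        return EINVAL;           /* only 0xffff can be accepted */
    /* Lower bound is PFMR for SCTP-PF and PMR for RFC 4960 SCTP. */
    min_allowed = is_sctp_pf ? t->pathpfthld : t->pathmaxrxt;
    return t->pathcpthld < min_allowed ? EINVAL : 0;
}
```

   For example, with spt_pathmaxrxt = 5 and spt_pathpfthld = 0, a
   request for spt_pathcpthld = 3 passes this check for SCTP-PF but is
   rejected for [RFC4960] SCTP.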

679	7.3.  Exposing the Potentially Failed Path State
680	      (SCTP_EXPOSE_POTENTIALLY_FAILED_STATE) Socket Option

682	   Applications can control the exposure of the potentially failed path
683	   state in the SCTP_PEER_ADDR_CHANGE event and the
684	   SCTP_GET_PEER_ADDR_INFO socket option as described in Section 7.1.
685	   The default value is implementation specific.

687	   This socket option uses the level IPPROTO_SCTP and the name
688	   SCTP_EXPOSE_POTENTIALLY_FAILED_STATE.

690	   The following structure is used to control the exposure of the
691	   potentially failed path state:

693	   struct sctp_assoc_value {
694	     sctp_assoc_t assoc_id;
695	     uint32_t assoc_value;
696	   };

698	   assoc_id:  This parameter is ignored for one-to-one style sockets.
699	      For one-to-many style sockets the application may fill in an
700	      association identifier or SCTP_FUTURE_ASSOC.  It is an error to
701	      use SCTP_{CURRENT|ALL}_ASSOC in assoc_id.

703	   assoc_value:  The potentially failed path state is exposed if and
704	      only if this parameter is non-zero.

706	8.  Security Considerations

708	   Security considerations for the use of SCTP and its APIs are
709	   discussed in [RFC4960] and [RFC6458].

711	   The logic introduced by this document does not impact existing on-
712	   the-wire SCTP messages.  Also, this document does not introduce any
713	   new on-the-wire SCTP messages that require new security
714	   considerations.

716	   SCTP-PF makes SCTP not only more robust during primary path failure/
717	   congestion but also more vulnerable to network connectivity/
718	   congestion attacks on the primary path.  SCTP-PF makes it easier for
719	   an attacker to trick SCTP into changing the data transfer path, since
720	   the duration of time that an attacker needs to compromise the network
721	   connectivity is much shorter than with [RFC4960].  However, SCTP-PF
722	   does not constitute a significant change in the duration of time and
723	   effort an attacker needs to keep SCTP away from the primary path.
724	   With the standard switchback operation, [RFC4960] SCTP resumes data
725	   transfer on its primary path as soon as the next HEARTBEAT succeeds.

727	   On the other hand, usage of the Permanent Failover mechanism does
728	   change the threat analysis.  This is because attackers can force a
729	   permanent change of the data transfer path by blocking the primary
730	   path until the switchover of the primary path is triggered by the
731	   Permanent Failover algorithm.  This will especially be the case when
732	   Permanent Failover is used together with SCTP-PF with the particular
733	   setting of PSMR = PFMR = 0, as Permanent Failover then happens
734	   already at the first RTO timeout experienced.  Users of the Permanent
735	   Failover mechanism should be aware of this fact.

737	   The event notification of path state transitions from active to
738	   potentially failed state and vice versa gives attackers an increased
739	   possibility to generate more local events.  However, it is assumed
740	   that event notifications are rate-limited in the implementation to
741	   address this threat.

743	9.  IANA Considerations

745	   This document does not create any new registries or modify the rules
746	   for any existing registries managed by IANA.

748	10.  Acknowledgements

750	   The authors wish to thank Michael Tuexen for his many invaluable
751	   comments and for his very substantial support with the making of this
752	   document.

754	11.  Proposed Change of Status (to be Deleted before Publication)

756	   Initially this work looked to entail some changes of the Congestion
757	   Control (CC) operation of SCTP and for this reason the work was
758	   proposed as Experimental.  These intended changes of the CC operation
759	   have since been judged to be irrelevant and are no longer part of the
760	   specification.  As the specification entails no other potentially
761	   harmful features, consensus exists in the WG to bring the work
762	   forward as PS.

764	   Initially, concerns were expressed about the possibility for the
765	   mechanism to introduce path bouncing with potentially harmful network
766	   impacts.  These concerns are believed to be unfounded.  This issue is
767	   addressed in Appendix B.

769	   It is noted that the feature specified by this document is
770	   implemented by multiple SCTP SW implementations and furthermore that
771	   various variants of the solution have been deployed in Telco
772	   signaling environments for several years with good results.

774	12.  References

776	12.1.  Normative References

778	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
779	              Requirement Levels", BCP 14, RFC 2119, March 1997.

781	   [RFC4960]  Stewart, R., "Stream Control Transmission Protocol", RFC
782	              4960, September 2007.

784	12.2.  Informative References

786	   [CARO02]   Caro Jr., A., Iyengar, J., Amer, P., Heinz, G., and R.
787	              Stewart, "A Two-level Threshold Recovery Mechanism for
788	              SCTP", Tech report, CIS Dept, University of Delaware,
789	              July 2002.

791	   [CARO04]   Caro Jr., A., Amer, P., and R. Stewart, "End-to-End
792	              Failover Thresholds for Transport Layer Multihoming",
793	              MILCOM 2004, November 2004.

795	   [CARO05]   Caro Jr., A., "End-to-End Fault Tolerance using Transport
796	              Layer Multihoming", Ph.D Thesis, University of Delaware,
797	              January 2005.

799	   [FALLON08]
800	              Fallon, S., Jacob, P., Qiao, Y., Murphy, L., Fallon, E.,
801	              and A. Hanley, "SCTP Switchover Performance Issues in WLAN
802	              Environments", IEEE CCNC 2008, January 2008.

804	   [GRINNEMO04]
805	              Grinnemo, K-J. and A. Brunstrom, "Performance of SCTP-
806	              controlled failovers in M3UA-based SIGTRAN networks",
807	              Advanced Simulation Technologies Conference, April 2004.

809	   [IYENGAR06]
810	              Iyengar, J., Amer, P., and R. Stewart, "Concurrent
811	              Multipath Transfer using SCTP Multihoming over Independent
812	              End-to-end Paths.", IEEE/ACM Trans on Networking 14(5),
813	              October 2006.

815	   [JUNGMAIER02]
816	              Jungmaier, A., Rathgeb, E., and M. Tuexen, "On the use of
817	              SCTP in failover scenarios", World Multiconference on
818	              Systemics, Cybernetics and Informatics, July 2002.

820	   [NATARAJAN09]
821	              Natarajan, P., Ekiz, N., Amer, P., and R. Stewart,
822	              "Concurrent Multipath Transfer during Path Failure",
823	              Computer Communications, May 2009.

825	   [RFC6458]  Stewart, R., Tuexen, M., Poon, K., Lei, P., and V.
826	              Yasevich, "Sockets API Extensions for the Stream Control
827	              Transmission Protocol (SCTP)", RFC 6458, December 2011.

829	Appendix A.  Discussions of Alternative Approaches

831	   This section lists alternative approaches for the issues described in
832	   this document.  Although these approaches do not require updating
833	   [RFC4960], we do not recommend them for the reasons described below.

835	A.1.  Reduce Path.Max.Retrans (PMR)

837	   Smaller values for Path.Max.Retrans shorten the failover duration and
838	   in fact this is recommended in some research results [JUNGMAIER02]
839	   [GRINNEMO04] [FALLON08].  However, to significantly reduce the
840	   failover time it is necessary to go down (as with PFMR) to
841	   Path.Max.Retrans=0, and with this setting SCTP switches to another
842	   destination address already on a single timeout, which may result in
843	   spurious failover.  Spurious failover is a problem in [RFC4960] SCTP
844	   because the transmission of HEARTBEATs on the abandoned primary path,
845	   unlike in SCTP-PF, is governed by 'HB.interval' also during the
846	   failover process.  'HB.interval' is usually set in the order of
847	   seconds (the recommended value is 30 seconds), so when the primary
848	   path becomes inactive, the next HEARTBEAT may not be transmitted
849	   until many seconds later; with the recommended value, only 30 seconds
850	   later.  Meanwhile, the primary path may long since have recovered, if
851	   it needed recovery at all (indeed, the failover could be truly
852	   spurious).  In such situations, post failover, an endpoint is forced
853	   to wait many seconds before it can resume transmission on the primary
854	   path.  Furthermore, once it returns to the primary path, the CWND
855	   needs to be rebuilt anew, a process from which the throughput has
856	   already suffered on the alternate path.  Using a smaller value for
857	   'HB.interval' might help in this situation, but it would result in a
858	   general waste of bandwidth, as such more frequent HEARTBEATing would
859	   also take place when there are no observed troubles.  The bandwidth
860	   overhead may be diminished by having the ULP use a smaller
861	   'HB.interval' only on the path which at any given time is set to be
862	   the primary path, but this adds complication in the ULP.

864	   In addition, smaller Path.Max.Retrans values also affect the
865	   'Association.Max.Retrans' value.  When the SCTP association's error
866	   count exceeds the Association.Max.Retrans threshold, the SCTP sender
867	   considers the peer endpoint unreachable and terminates the
868	   association.  Section 8.2 in [RFC4960] recommends that
869	   Association.Max.Retrans value should not be larger than the summation
870	   of the Path.Max.Retrans of each of the destination addresses.
871	   Otherwise, the SCTP sender considers its peer reachable even when
872	   all destinations are INACTIVE.  To avoid this dormant state
873	   operation, an [RFC4960] SCTP implementation SHOULD reduce
874	   Association.Max.Retrans accordingly whenever it reduces
875	   Path.Max.Retrans.  However, a smaller Association.Max.Retrans value
876	   compromises the fault tolerance of SCTP as it increases the chances
877	   of association termination during minor congestion events.
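
   The relation between Association.Max.Retrans and the per-path
   Path.Max.Retrans values discussed above can be expressed as a small
   check.  The function name below is hypothetical and the code is only
   a sketch of the comparison, not part of any SCTP API.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns 1 if Association.Max.Retrans (amr) is
 * not larger than the sum of Path.Max.Retrans over all destination
 * addresses, and 0 otherwise.
 */
int amr_within_pmr_sum(uint32_t amr, const uint16_t *pmr, size_t n)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < n; i++)
        sum += pmr[i];
    return amr <= sum;
}
```

   With two destination addresses and the [RFC4960] suggested values
   (Association.Max.Retrans = 10, Path.Max.Retrans = 5 per address),
   the check holds.  If Path.Max.Retrans is reduced to 1 on both
   addresses, Association.Max.Retrans would have to be lowered to 2 or
   less for the check to hold, which illustrates the loss of fault
   tolerance described above.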

879	A.2.  Adjust RTO related parameters

881	   As several research results indicate, we can also shorten the
882	   duration of the failover process by adjusting RTO-related parameters
883	   [JUNGMAIER02] [FALLON08].  During the failover process, the RTO is
884	   doubled on each timeout.  However, if we choose a smaller value for
885	   RTO.max, we can stop the exponential growth of the RTO at some point.
886	   Also, choosing smaller values for RTO.initial or RTO.min can help
887	   keep the RTO value small.

889	   Similar to reducing Path.Max.Retrans, the advantage of this approach
890	   is that it requires no modification to the current specification,
891	   although it requires ignoring several recommendations described in
892	   Section 15 of [RFC4960].  However, this approach requires
893	   sufficient knowledge about the network characteristics between the
894	   endpoints.  Otherwise, it can introduce adverse side effects such as
895	   spurious timeouts.

897	   The significant issue with this approach, however, is that even if
898	   RTO.max is lowered to an optimally low value, as long as
899	   Path.Max.Retrans is kept at the [RFC4960] recommended value, the
900	   reduction of RTO.max does not reduce the failover time
901	   sufficiently to prevent severe performance degradation during
902	   failover.
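
   To make the effect of Path.Max.Retrans and RTO.max on the failover
   time concrete, the following sketch adds up the RTO values of the
   consecutive timeouts needed before a path is left.  It assumes that
   the path is abandoned once its error counter exceeds
   Path.Max.Retrans (that is, after Path.Max.Retrans + 1 timeouts),
   that the RTO is doubled after each timeout up to RTO.max, and that
   the first timeout occurs with the given starting RTO.  The function
   name is hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical estimate of the failover time in milliseconds: the sum
 * of the RTO over (pmr + 1) consecutive timeouts, with the RTO doubled
 * after each timeout and capped at rto_max_ms.
 */
uint64_t failover_time_ms(uint32_t rto_start_ms, uint32_t rto_max_ms,
                          unsigned pmr)
{
    uint64_t total = 0;
    uint64_t rto = rto_start_ms;

    for (unsigned i = 0; i <= pmr; i++) {
        if (rto > rto_max_ms)
            rto = rto_max_ms;    /* exponential growth stops here */
        total += rto;
        rto *= 2;                /* backoff after each timeout */
    }
    return total;
}
```

   Under these assumptions, a starting RTO of 1 second with
   RTO.max = 60 seconds and Path.Max.Retrans = 5 gives
   1 + 2 + 4 + 8 + 16 + 32 = 63 seconds.  Lowering RTO.max to 2 seconds
   while keeping Path.Max.Retrans = 5 still gives 1 + 5 * 2 = 11
   seconds, whereas Path.Max.Retrans = 0 gives 1 second.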

904	Appendix B.  Discussions for Path Bouncing Effect

906	   The methods described in this document can accelerate the failover
907	   process.  Hence, they might introduce the path bouncing effect where
908	   the sender keeps changing the data transmission path frequently.
909	   This sounds harmful to the data transfer; however, several research
910	   results indicate that there is no serious problem with SCTP in terms
911	   of the path bouncing effect [CARO04] [CARO05].

913	   There are two main reasons for this.  First, SCTP is basically
914	   designed for multipath communication, which means SCTP maintains all
915	   path-related parameters (CWND, ssthresh, RTT, error count, etc.) per
916	   destination address.  These parameters cannot be affected by path
917	   bouncing.  In addition, when SCTP migrates the data transfer to
918	   another path, it starts with the minimal or the initial CWND.  Hence,
919	   there is little chance for packet reordering or duplication.

921	   Second, even if all communication paths between the end-nodes share
922	   the same bottleneck, SCTP-PF results in behavior already allowed
923	   by [RFC4960].

925	Appendix C.  SCTP-PF for SCTP Single-homed Operation

927	   For a single-homed SCTP association, the only tangible effect of
928	   activating SCTP-PF operation is enhanced failure detection: potential
929	   notification of the PF state of the sole destination address and, for
930	   idle associations, more rapid entering, and notification, of the
931	   inactive state of the destination address and more rapid endpoint
932	   failure detection.  It is believed that none of these effects is
933	   harmful, provided adequate dormant state operation is implemented,
934	   and furthermore that they may be particularly useful for applications
935	   that deploy multiple SCTP associations for load balancing purposes.
936	   The early notification of the PF state may be used for preventive
937	   measures, as entering the PF state can be used as a warning of
938	   potential congestion.  Depending on the PMR value, the aggressive
939	   HEARTBEAT transmission in PF state may speed up endpoint
940	   failure detection (the sole path error counter exceeding the AMR
941	   threshold) on idle associations in cases where a relatively
942	   large HB.interval value compared to the RTO (e.g., 30 seconds)
943	   is used.

945	Authors' Addresses

947	   Yoshifumi Nishida
948	   GE Global Research
949	   2623 Camino Ramon
950	   San Ramon, CA  94583
951	   USA

953	   Email: nishida@wide.ad.jp

955	   Preethi Natarajan
956	   Cisco Systems
957	   510 McCarthy Blvd
958	   Milpitas, CA  95035
959	   USA

961	   Email: prenatar@cisco.com
962	   Armando Caro
963	   BBN Technologies
964	   10 Moulton St.
965	   Cambridge, MA  02138
966	   USA

968	   Email: acaro@bbn.com

970	   Paul D. Amer
971	   University of Delaware
972	   Computer Science Department - 434 Smith Hall
973	   Newark, DE  19716-2586
974	   USA

976	   Email: amer@udel.edu

978	   Karen E. E. Nielsen
979	   Ericsson
980	   Kistavaegen 25
981	   Stockholm  164 80
982	   Sweden

984	   Email: karen.nielsen@tieto.com