idnits 2.17.1

draft-ravisingh-teas-rsvp-setup-retry-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (July 2, 2015) is 3214 days in the past.  Is this
     intentional?

  Checking references for intended status: Best Current Practice
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'Ingress' is mentioned on line 155, but not defined

  == Missing Reference: 'Transit' is mentioned on line 155, but not defined

  -- Possible downref: Non-RFC (?) normative reference: ref. 'RShakir'

     Summary: 0 errors (**), 0 flaws (~~), 3 warnings (==), 2 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.
--------------------------------------------------------------------------------

2   TEAS Working Group                                            Ravi Singh
3   Internet Draft                                          Juniper Networks
4   Intended status: Best Current Practice                        Rob Shakir
5                                                            British Telecom
6                                                        Vishnu Pavan Beeram
7                                                           Juniper Networks
8                                                                 Tarek Saad
9                                                              Cisco Systems

11  Expires: January 2, 2016                                    July 2, 2015

13                           RSVP Setup Retry - BCP
14                  draft-ravisingh-teas-rsvp-setup-retry-01

16  Status of this Memo

18     This Internet-Draft is submitted in full conformance with the
19     provisions of BCP 78 and BCP 79.

21     Internet-Drafts are working documents of the Internet Engineering
22     Task Force (IETF), its areas, and its working groups.  Note that
23     other groups may also distribute working documents as Internet-
24     Drafts.

26     Internet-Drafts are draft documents valid for a maximum of six
27     months and may be updated, replaced, or obsoleted by other documents
28     at any time.  It is inappropriate to use Internet-Drafts as
29     reference material or to cite them other than as "work in progress."

31     The list of current Internet-Drafts can be accessed at
32     http://www.ietf.org/ietf/1id-abstracts.txt

34     The list of Internet-Draft Shadow Directories can be accessed at
35     http://www.ietf.org/shadow.html

37     This Internet-Draft will expire on January 2, 2016.

39  Copyright Notice

41     Copyright (c) 2015 IETF Trust and the persons identified as the
42     document authors.  All rights reserved.

44     This document is subject to BCP 78 and the IETF Trust's Legal
45     Provisions Relating to IETF Documents
46     (http://trustee.ietf.org/license-info) in effect on the date of
47     publication of this document.  Please review these documents
48     carefully, as they describe your rights and restrictions with
49     respect to this document.  Code Components extracted from this
50     document must include Simplified BSD License text as described in
51     Section 4.e of the Trust Legal Provisions and are provided without
52     warranty as described in the Simplified BSD License.
54  Abstract

56     This document discusses the best current practices associated with
57     the implementation of the RSVP setup-retry timer.

59  Conventions used in this document

61     The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
62     "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
63     document are to be interpreted as described in RFC-2119 [RFC2119].

65  Table of Contents

67     1. Introduction
68     2. Setup-Retry Timer
69     3. Possible ill-effects due to implementation choices
70     4. Causes of the above ill-effects
71     5. Solution to the implementation issues
72     6. Security Considerations
73     7. IANA Considerations
74     8. Normative References
75     9. Acknowledgments
76     10. Authors' Addresses
77     Contributors

79  1. Introduction

81     In an RSVP-TE network with a very large number of LSPs, link/node
82     failure(s) may produce a noticeable increase in RSVP-TE control
83     traffic. As a result, RSVP-TE messages might get delayed by virtue
84     of being stuck in a queue that is overwhelmed with messages to be
85     sent, or they might be lost altogether. For example, a Path message
86     intended to be sent by a transit router might be stuck in the output
87     queue to be sent to the next-hop. Alternatively, it might have been
88     dropped on the receive side due to queue overflows. The same could
89     happen for a Resv message in the reverse direction. Also, in the
90     absence of reliable delivery of PathErr messages [RFC2961], an
91     error that gets generated at transit/egress for an LSP that is in
92     the process of being set up may never make it to the ingress.

94     Lost/delayed RSVP-TE messages cause the following problems for an
95     ingress router:
96     - In the absence of an error indication, how is an ingress to know
97       whether an LSP for which signaling was (re-)initiated, and for
98       which a Resv has not yet been received, is ever going to come up?
99     - In the absence of any indication, what action should the ingress
100      take to support low-latency LSP setup?

102    The above problems essentially boil down to: how long should the
103    ingress continue to wait before giving up on its attempt to bring up
104    the LSP and taking some alternative course of action (e.g., trying
105    to bring up the LSP on an alternate path)? To mitigate this problem,
106    some implementations use a setup-retry timer mechanism. This
107    document discusses the issues associated with a particular
108    implementation of this timer and makes some specific recommendations
109    to get around these issues.

111 2. Setup-Retry Timer

113    The setup-retry timer is usually a configurable timer which (in the
114    absence of an error indication) goes off when an LSP with a given
115    LSPID has not received the corresponding Resv in response to its
116    Path within a pre-configured duration after its first Path was
117    sent.

119    Use of the setup-retry timer is based on the presumption that if
120    signaling for a given LSP has not been completed within an
121    "expected" duration, it is not going to be completed at all. The
122    intent in the use of this timer is to expeditiously take some
123    alternative course of action when an LSP has not yet completed its
124    signaling within an "expected" duration of time.

126 3. Possible ill-effects due to implementation choices

128    As mentioned in the previous section, the intent in the use of this
129    timer is to take some alternative course of action when an LSP has
130    not yet completed its signaling within an "expected" duration of
131    time. One such course of action is for the ingress router to
132    initiate tear-down for the previously in-the-process-of-being-
133    signaled path via a PathTear; run CSPF; and use the outcome of this
134    CSPF to signal the brand-new path for this tunnel with a different
135    LSP-ID, typically bumped up by 1. This section describes the
136    problems caused by such a course of action.

138    As mentioned in Section 1, in a network with a very large number of
139    RSVP-TE LSPs, link/node failure(s) may produce a noticeable increase
140    in the volume of RSVP-TE control traffic, which in turn might cause
141    a router either to drop RSVP-TE messages or to send them
142    excessively late.

144    As a result, the following problems can occur:
145    - LSP setup latency might be excessively high.
146    - Error messages that indicate failure in LSP setup might not make
147      it to the ingress router.

149    A mix of the above problems can cause the setup-retry timer for a
150    given LSP (at the ingress router) to fire repeatedly over a period
151    of time. In this situation, the ingress gets stuck in a cycle,
152    as illustrated below, for some/many LSPs:

154 --------------------------------------------------------------------
155 Ingress Timeline        | [Ingress]---[]---[]...[Transit]...[]---[]-
156 ------------------------|
157 1. Trigger LSP setup    | Path
158    :                    | TNL-ID=X
159    :                    | LSP-ID=Y
160    :                    | -------->
161                         |          ------------> Path (X, Y)
162    :                    |               ------->         --------->
163    :                    |                           :
164    :                    |                           :
165 2. Setup-Retry Timer    |                           :
166    fires; Recompute     |                           :
167    path;                |                           :
168 3. Trigger Teardown     | PathTear
169                         | TNL-ID=X
170                         | LSP-ID=Y
171                         | -------->
172                         |          ------------> PathTear (X, Y)
173                         |               ------->         --------->
174 4. Trigger setup for new| Path
175    instance of the LSP  | TNL-ID=X
176    (same ERO)           | LSP-ID=Y+1
177    :                    | -------->
178    :                    |          ------------> Path (X, Y+1)
179    :                    |               ------->         --------->
180    :                    |                                Resv
181                         |                                TNL-ID=X
182    :                    |                                LSP-ID=Y
183    :                    |                                <---------
184    :                    |                                ResvError
185    :                    |                                No Path
186    :                    |                                --------->
187 5. Repeat loop through  |                           :
188    2-4                  |                           :
189 --------------------------------------------------------------------

191    In the above illustration, notice how the transit router never gets
192    to completely process the "current" LSP-ID (see [RShakir] for more).
193    The implementation recommendations made in this document will help
194    avoid this snowball effect.

196 4. Causes of the above ill-effects

198    The implementation issues listed in Section 3 end up causing an
199    increase in the control plane load on a network whose control plane
200    is already under stress. The foregoing is caused by unnecessarily
201    doing the following even when there is no change in the computed
202    path:

204    - Sending PathTears causes excessive and unjustifiable work on those
205      downstream routers on the "previous ERO path" that had managed to
206      bring the LSP UP. In other words, the slowness of a given transit
207      router should not be a reason to penalize all other transit
208      routers downstream of it, as doing so just increases the overall
209      network stress.

211    - Sending Path for LSPID=Y+1 causes unnecessary work for all routers
212      on the ERO path, including those that were already running slow and
213      were the real cause of the Resv for LSPID=Y not having been
214      received in time by the ingress.

216 5. Solution to the implementation issues

218    To eliminate the causes listed in the previous section, and thus
219    the ill-effects themselves, this document makes
220    the following recommendations.
222    When the setup-retry timer fires:

224    If there is no change in the computed path (no error indication for
225    that LSP has been received via a PathErr or a TE update indicating a
226    failure),
227    - Do not send a PathTear for LSPID=Y.
228    - Just let the Path State get refreshed for LSPID=Y.

230    The recommended default behavior is to keep retrying until the path
231    changes or the user intervenes. Implementations MAY choose to
232    provide the user with an option to override this default behavior
233    and specify a policy to determine when to stop retrying.

235    Implementations SHOULD use the recommendations listed in this
236    section to avoid getting stuck in an LSP signaling hysteresis.

238 6. Security Considerations

240    This document does not introduce any new security concerns.

242 7. IANA Considerations

244    None.

246 8. Normative References

248    [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
249              Requirement Levels", BCP 14, RFC 2119, March 1997.

251    [RShakir] Shakir, R., "The next spring forward",
252              http://rob.sh/files/the-next-spring-forward_rjs120314.pdf,
253              March 2014.

255    [RFC2961] Berger, L., "RSVP Refresh Overhead Reduction Extensions",
256              RFC 2961, April 2001.

258 9. Acknowledgments

260    The authors would like to thank Yakov Rekhter and Raveendra Torvi
261    for their inputs.

263 10. Authors' Addresses

265    Ravi Singh
266    Juniper Networks
267    Email: ravis@juniper.net

269    Rob Shakir
270    British Telecom
271    Email: rob.shakir@bt.com

273    Tarek Saad
274    Cisco Systems
275    Email: tsaad@cisco.com

277    Vishnu Pavan Beeram
278    Juniper Networks
279    Email: vbeeram@juniper.net

281 Contributors

283    Markus Jork
284    Juniper Networks
285    Email: mjork@juniper.net

287    Aman Kapoor
288    Juniper Networks
289    Email: amanka@juniper.net
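
Illustrative sketch (non-normative)

   The recommendation in Section 5 can be read as a small decision
   procedure that the ingress runs each time the setup-retry timer
   fires. The Python below is a minimal sketch under stated
   assumptions: LspState, on_setup_retry_timer, and the message
   strings are hypothetical names introduced here for illustration.
   The tear-down-and-resignal branch reuses the behavior described in
   Section 3; Section 5 does not itself spell out what to do when the
   path has changed, so that branch is an assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LspState:
    """Hypothetical ingress-side state for one LSP that is being signaled."""
    tunnel_id: int
    lsp_id: int
    ero: Tuple[str, ...]       # hops of the path currently being signaled
    error_seen: bool = False   # PathErr or TE update indicating a failure


def on_setup_retry_timer(lsp: LspState,
                         recomputed_ero: Tuple[str, ...]) -> List[str]:
    """Return the messages the ingress would send when the timer fires.

    Assumption: the caller has already re-run path computation and
    passes the result in as recomputed_ero.
    """
    path_changed = recomputed_ero != lsp.ero

    if not path_changed and not lsp.error_seen:
        # Section 5: do not send a PathTear and do not bump the LSP-ID.
        # Nothing new is sent; the existing Path state for this LSP-ID
        # simply keeps being refreshed by the normal refresh mechanism.
        return []

    # Assumed fallback (Section 3 behavior): tear down the old instance
    # and signal a new one with the LSP-ID bumped up by 1.
    messages = [f"PathTear({lsp.tunnel_id},{lsp.lsp_id})"]
    lsp.lsp_id += 1
    lsp.ero = recomputed_ero
    lsp.error_seen = False
    messages.append(f"Path({lsp.tunnel_id},{lsp.lsp_id})")
    return messages
```

   With an unchanged path and no error indication the function returns
   an empty list, so a slow transit router is not handed a PathTear and
   a fresh Path on top of its existing backlog. A policy for when to
   stop retrying (the MAY in Section 5) is deliberately left out of the
   sketch.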