Network Working Group                                Vishnu Pavan Beeram
Internet Draft                                          Juniper Networks
Intended status: Informational                                 Ina Minei
                                                             Google, Inc
                                                           Yakov Rekhter
                                                        Juniper Networks
                                                             Ebben Aries
                                                                Facebook
                                                           Dante Pacella
                                                                 Verizon

Expires: September 08, 2015                               March 08, 2015

                 RSVP-TE Scalability - Recommendations
                  draft-beeram-mpls-rsvp-te-scaling-01

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This Internet-Draft will expire on September 08, 2015.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document. Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Abstract

   RSVP-TE (RFC 3209) describes the use of standard RSVP (RFC 2205) to
   establish Label Switched Paths (LSPs). As such, RSVP-TE inherited
   some properties of RSVP that adversely affect its control plane
   scalability. Specifically, these properties are (a) reliance on
   periodic refreshes for state synchronization between RSVP neighbors
   and for recovery from lost RSVP messages, (b) reliance on refresh
   timeout for stale state cleanup, and (c) lack of any mechanism by
   which a receiver of RSVP messages can apply back pressure to the
   sender(s) of these messages.

   Subsequent to RFC 2205 and RFC 3209, further enhancements to RSVP
   and RSVP-TE have been developed. In this document we describe how an
   implementation of RSVP-TE can use these enhancements to address the
   above-mentioned properties and improve RSVP-TE control plane
   scalability.

Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

Table of Contents

   1. Introduction
      1.1. Reliance on refreshes and refresh timeouts
      1.2. Lack of back pressure
   2. Recommendations
      2.1. Eliminating reliance on refreshes and refresh timeouts
      2.2. Providing the ability to apply back pressure
      2.3. Making Acknowledgements mandatory
      2.4. Clarifications on reaching Rapid Retry Limit (Rl)
      2.5. Avoiding use of Router Alert IP Option
      2.6. Checking Data Plane readiness
   3. Security Considerations
   4. IANA Considerations
   5. Normative References
   6. Acknowledgments

1. Introduction

   RSVP-TE [RFC3209] describes the use of standard RSVP [RFC2205] to
   establish Label Switched Paths (LSPs). As such, RSVP-TE inherited
   some properties of RSVP that adversely affect its control plane
   scalability. Specifically, these properties are (a) reliance on
   periodic refreshes for state synchronization between RSVP neighbors
   and for recovery from lost RSVP messages, (b) reliance on refresh
   timeout for stale state cleanup, and (c) lack of any mechanism by
   which a receiver of RSVP messages can apply back pressure to the
   sender(s) of these messages. The following elaborates on this.

1.1. Reliance on refreshes and refresh timeouts

   Standard RSVP [RFC2205] maintains state via the generation of RSVP
   Path/Resv refresh messages. Refresh messages are used both to
   synchronize state between RSVP neighbors and to recover from lost
   RSVP messages. The use of Refresh messages to cover many possible
   failures has resulted in two operational problems. The first
   relates to scaling, the second relates to the reliability and
   latency of RSVP signaling.

   The scaling problem is linked to the control plane resource
   requirements of running RSVP-TE. The resource requirements increase
   proportionally with the number of LSPs established by RSVP-TE. Each
   such LSP requires the generation, transmission, reception and
   processing of RSVP Path and Resv messages per refresh period.
   Supporting a large number of LSPs and the corresponding volume of
   refresh messages presents a scaling problem for the RSVP-TE control
   plane.

   The reliability and latency problem occurs when a triggered
   (non-refresh) RSVP message such as Path, Resv, or PathTear is lost
   in transmission.
   Standard RSVP [RFC2205] recovers from a lost message
   via RSVP refresh messages. In the face of transmission loss of RSVP
   messages, the end-to-end latency of RSVP signaling, and thus the
   end-to-end latency of RSVP-TE signaled LSP establishment, is tied to
   the refresh interval of the Label Switch Router(s) experiencing the
   loss. When end-to-end signaling is limited by the refresh interval,
   the delay incurred in the establishment or the change of an RSVP-TE
   signaled LSP may be beyond the range of what is acceptable in
   practice. This is because RSVP-TE ultimately controls establishment
   of the forwarding state required to realize RSVP-TE signaled LSPs.
   Thus delay incurred in the establishment or the change of such LSPs
   results in delaying the data plane convergence, which in turn
   adversely impacts the services that rely on the data plane.

   One way to address the scaling problem caused by the refresh volume
   is to increase the refresh period, "R" as defined in Section 3.7 of
   [RFC2205]. Increasing the value of R provides linear improvement on
   RSVP-TE signaling overhead, but at the cost of increasing the time
   it takes to synchronize state. For the reasons mentioned in the
   previous paragraph, in the context of RSVP-TE signaled LSPs,
   increasing the time to synchronize state is not an acceptable
   option.

   One way to address the reliability and latency of RSVP signaling is
   to decrease the refresh period R. Decreasing the value of R
   increases the probability that state will be installed in the face
   of message loss, but at the cost of increasing refresh message rate
   and associated processing requirements, which in turn adversely
   affects RSVP-TE control plane scalability.

   An additional problem is the time to clean up the stale state after
   a tear message is lost. RSVP does not retransmit ResvTear or
   PathTear messages.
   If the sole tear message transmitted is lost, the
   stale state will only be cleaned up once the refresh timeout has
   expired. This may result in resources associated with the stale
   state remaining allocated for an unnecessary period of time. Note
   that even when the refresh period is adjusted, the refresh timeout
   must still expire since tear messages are not retransmitted.
   Decreasing the refresh timeout by decreasing the refresh interval
   will speed up stale state cleanup, but at the cost of increasing
   refresh message rate, which in turn adversely affects RSVP-TE
   control plane scalability.

1.2. Lack of back pressure

   In standard RSVP, an RSVP speaker sends RSVP messages to a peer with
   no regard for whether the peer's RSVP control plane is busy. There
   is no control plane mechanism by which an RSVP speaker may apply
   back pressure to the peer by asking the peer to reduce the rate of
   RSVP messages that the peer sends to the speaker. RSVP-TE inherited
   this from standard RSVP. Lack of such a mechanism could result in
   RSVP-TE control plane congestion.

   The RSVP-TE control plane is especially susceptible to congestion
   during link/node failures, as such failures produce bursts of
   RSVP-TE messages: Path/Resv for re-routing LSPs affected by the
   failures, Path/Resv for setup of new backup LSPs (as required by
   RSVP-TE Fast Reroute [RFC4090]), and Tear/Error messages for the
   affected LSPs. Note that the load on the RSVP-TE control plane
   caused by these bursts is in addition to the load due to the
   periodic refreshes of Path/Resv messages for the LSPs not affected
   by the failures.

   RSVP-TE control plane congestion may result in loss of RSVP
   messages, which in turn has detrimental effects on the overall
   system behavior.
   Path/Resv refreshes lost by a peer's busy control
   plane will cause refresh timeout for some or all of the existing
   RSVP-TE state on the peer, thus inadvertently deleting existing LSPs
   and disrupting traffic carried over these LSPs. Triggered Path/Resv
   messages lost by a peer's busy control plane may result in failure
   to establish new backup LSPs used by RSVP-TE Fast Reroute [RFC4090]
   before the state for the corresponding protected primary LSPs times
   out, thus defeating the whole purpose of RSVP-TE Fast Reroute.

2. Recommendations

   Subsequent to the publication of [RFC2205] and [RFC3209], further
   enhancements to RSVP and RSVP-TE have been developed. In this
   section we describe how these enhancements could be used to address
   the problems listed in Section 1.

2.1. Eliminating reliance on refreshes and refresh timeouts

   To eliminate reliance on refreshes for both state synchronization
   between RSVP neighbors and for recovery from lost RSVP messages, as
   well as to address both the refresh volume and the reliability
   issues with RSVP mechanisms other than adjusting refresh rate, this
   document RECOMMENDS the following:

   - Implement reliable delivery of Path/Resv messages using the
     procedures specified in [RFC2961].

   - Indicate support for RSVP Refresh Overhead Reduction Extensions
     (as specified in Section 2 of [RFC2961]) by default, with the
     ability to override the default via configuration.

   - Make the value of the refresh interval configurable with the
     default value of 20 minutes.

   To eliminate reliance on refresh timeouts, in addition to the above,
   this document RECOMMENDS the following:

   - Implement reliable delivery of Tear/Err messages using the
     procedures specified in [RFC2961].

   - Implement coupling the state of individual LSPs with the state of
     the corresponding RSVP-TE signaling adjacency.
     When an RSVP-TE
     speaker detects RSVP-TE signaling adjacency failure, the speaker
     MUST clean up the LSP state for all LSPs affected by the failed
     adjacency. The LSP state is the combination of "path state"
     maintained as Path State Block and "reservation state" maintained
     as Reservation State Block (see Section 2.1 of [RFC2205]).

   - Use Node-ID based Hello sessions ([RFC3209], [RFC4558]) for
     detection of RSVP-TE signaling adjacency failures. Make the value
     of the node hello_interval [RFC3209] configurable; increase the
     default value from 5 ms (as specified in Section 5.3 of
     [RFC3209]) to 9 seconds.

   - Implement the procedures specified in
     [draft-chandra-mpls-enhanced-frr-bypass], which describes methods
     to facilitate FRR that works independently of the refresh
     interval.

2.2. Providing the ability to apply back pressure

   To provide an RSVP speaker with the ability to apply back pressure
   to its peer(s) to reduce/eliminate RSVP-TE control plane congestion,
   in addition to the above, this document RECOMMENDS the following:

   - Use lack of ACKs from a peer as an indication of the peer's
     RSVP-TE control plane congestion, in which case the local system
     SHOULD throttle RSVP-TE messages to the affected peer. This has to
     be done on a per-peer basis.

   - Retransmit all RSVP-TE messages using exponential backoff, as
     specified in Section 6 of [RFC2961].

   - Increase the Retry Limit (Rl), as defined in Section 6.2 of
     [RFC2961], from 3 to 7.

   - Prioritize Tear/Error messages over triggered Path/Resv messages
     sent to a peer when the local system detects RSVP-TE control plane
     congestion in the peer.

2.3. Making Acknowledgements mandatory

   The reliable message delivery mechanism specified in [RFC2961]
   states that "Nodes receiving a non-out of order message containing a
   MESSAGE_ID object with the ACK_Desired flag set, SHOULD respond with
   a MESSAGE_ID_ACK object."
   To improve predictability of the system in
   terms of reliable message delivery, this document RECOMMENDS that
   nodes receiving a non-out of order message containing a MESSAGE_ID
   object with the ACK_Desired flag set MUST respond with a
   MESSAGE_ID_ACK object.

2.4. Clarifications on reaching Rapid Retry Limit (Rl)

   According to Section 6 of [RFC2961], "The staged retransmission will
   continue until either an appropriate MESSAGE_ID_ACK object is
   received, or the rapid retry limit, Rl, has been reached." The
   following clarifies what actions, if any, a router should take once
   Rl has been reached.

   If Rl has been reached while retransmitting Tear/Err messages, the
   router need not take any further action.

   If Rl has been reached while retransmitting Path/Resv messages, the
   router starts periodic retransmission of these messages every 30
   seconds. The retransmitted messages MUST carry a MESSAGE_ID object
   with the ACK_Desired flag set. This periodic retransmission SHOULD
   continue until an appropriate MESSAGE_ID_ACK object is received
   indicating acknowledgement of the (retransmitted) Path/Resv message.

2.5. Avoiding use of Router Alert IP Option

   In RSVP-TE the Path message is carried in an IP packet that is
   addressed to the tail end of the LSP that is signaled using this
   message. To make all the intermediate/transit LSRs process this
   message, the IP packet carrying the message includes the Router
   Alert IP option. The same applies to the PathTear message.

   An alternative to relying on the Router Alert IP option is to carry
   the Path or PathTear message as a sub-message of a Bundle message
   [RFC2961], as Bundle messages are "addressed directly to RSVP
   neighbors" and "SHOULD NOT be sent with the Router Alert IP option
   in their IP headers" [RFC2961].
   Notice that since a Bundle message
   could contain only a single sub-message, this approach could be used
   to send just a single Path or PathTear message. This document
   RECOMMENDS implementing support for Bundle messages [RFC2961], and
   carrying Path and PathTear message(s) as sub-message(s) of a Bundle
   message.

2.6. Checking Data Plane readiness

   In certain scenarios, like Make-Before-Break (MBB), a router needs
   to move traffic from an existing LSP to a new LSP in the least
   disruptive fashion. To accomplish this the data plane of the new LSP
   must be operational before the router moves the traffic.

   A possible mechanism by which the router can determine whether the
   data plane of the new LSP is operational is specified in
   [draft-bonica-mpls-self-ping]. This document RECOMMENDS implementing
   this mechanism and using it whenever the ingress of an LSP needs to
   check whether the data plane of the LSP is operational.

3. Security Considerations

   This document does not introduce new security issues. The security
   considerations pertaining to the original RSVP protocol [RFC2205]
   and RSVP-TE [RFC3209] remain relevant.

4. IANA Considerations

   This document makes no request of IANA.

   Note to RFC Editor: this section may be removed on publication as an
   RFC.

5. Normative References

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2205] Braden, R., "Resource Reservation Protocol (RSVP)",
             RFC 2205, September 1997.

   [RFC2961] Berger, L., "RSVP Refresh Overhead Reduction
             Extensions", RFC 2961, April 2001.

   [RFC3209] Awduche, D., "RSVP-TE: Extensions to RSVP for LSP
             Tunnels", RFC 3209, December 2001.

   [RFC4090] Pan, P., "Fast Reroute Extensions to RSVP-TE for LSP
             Tunnels", RFC 4090, May 2005.

   [RFC4558] Ali, Z., "Node-ID Based Resource Reservation (RSVP)
             Hello: A Clarification Statement", RFC 4558, June 2006.

   [draft-bonica-mpls-self-ping] Ron Bonica, et al., "LSP Self-Ping",
             draft-bonica-mpls-self-ping, (work in progress)

   [draft-chandra-mpls-enhanced-frr-bypass] Chandra Ramachandran, et
             al., "Refresh Interval Independent FRR Facility
             Protection", draft-chandra-mpls-enhanced-frr-bypass,
             (work in progress)

6. Acknowledgments

   Most of the text in Section 1.1 has been taken almost verbatim from
   [RFC2961].

Authors' Addresses

   Vishnu Pavan Beeram
   Juniper Networks
   Email: vbeeram@juniper.net

   Ina Minei
   Google, Inc
   Email: inaminei@google.com

   Yakov Rekhter
   Juniper Networks
   Email: yakov@juniper.net

   Ebben Aries
   Facebook
   Email: exa@fb.com

   Dante Pacella
   Verizon
   Email: dante.j.pacella@verizon.com

   Markus Jork
   Juniper Networks
   Email: mjork@juniper.net
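
Illustrative sketch for Sections 2.2 and 2.4 (not part of the
recommendations)

   The retransmission timing recommended in Sections 2.2 and 2.4 can
   be sketched as below. This is only a rough illustration under
   stated assumptions: the initial interval of 500 ms and the
   increment Delta = 1 are taken to be the [RFC2961] defaults and
   should be checked against that document, and the function names
   are hypothetical. Only Rl = 7 and the 30 second period come from
   this document.

```python
from itertools import islice


def rapid_retry_waits(rf=0.5, delta=1.0, rl=7):
    """Waits (in seconds) after each of the rl rapid transmissions.

    Assumed defaults: rf = 0.5 s and delta = 1.0; rl = 7 is the
    retry limit recommended in Section 2.2.
    """
    # Each successive wait grows by a factor of (1 + delta).
    return [rf * (1 + delta) ** k for k in range(rl)]


def waits_until_ack(is_path_or_resv, rf=0.5, delta=1.0, rl=7,
                    periodic=30.0):
    """Yield the wait after every transmission of one message.

    The caller stops iterating as soon as a MESSAGE_ID_ACK arrives.
    Tear/Err messages stop once Rl is reached; Path/Resv messages
    continue with periodic retransmission (Section 2.4).
    """
    yield from rapid_retry_waits(rf, delta, rl)
    while is_path_or_resv:
        yield periodic


if __name__ == "__main__":
    print(rapid_retry_waits())
    # -> [0.5, 1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
    print(list(islice(waits_until_ack(True), 9)))
```

   Under these assumed defaults the rapid retry phase spans 63.5
   seconds in total before a Path/Resv message falls back to the
   30 second periodic retransmission.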