idnits 2.17.1

draft-osborne-mpls-psc-updates-03.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (October 04, 2013) is 3854 days in the past. Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'BP' is mentioned on line 269, but not defined

  == Missing Reference: 'UP' is mentioned on line 269, but not defined

  == Missing Reference: 'BS' is mentioned on line 269, but not defined

  == Unused Reference: 'RFC6428' is defined on line 337, but no explicit
     reference was found in the text

     Summary: 0 errors (**), 0 flaws (~~), 5 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

2 Network Working Group                                                  E.
Osborne
3 Internet-Draft                                             Cisco Systems
4 Intended status: Standards Track                        October 04, 2013
5 Expires: April 07, 2014

7 Updates to PSC
8 draft-osborne-mpls-psc-updates-03

10 Abstract

12    This document contains four updates to the Protection State
13    Coordination (PSC) logic defined in RFC6378, "MPLS Transport Profile
14    (MPLS-TP) Linear Protection". Two of the updates correct existing
15    behavior. The third clears up a behavior which was not explained in
16    the RFC, and the fourth adds rules around handling capabilities
17    mismatches.

19 Requirements Language

21    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
22    "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
23    document are to be interpreted as described in RFC 2119 [RFC2119].

25 Status of This Memo

27    This Internet-Draft is submitted in full conformance with the
28    provisions of BCP 78 and BCP 79.

30    Internet-Drafts are working documents of the Internet Engineering
31    Task Force (IETF). Note that other groups may also distribute
32    working documents as Internet-Drafts. The list of current Internet-
33    Drafts is at http://datatracker.ietf.org/drafts/current/.

35    Internet-Drafts are draft documents valid for a maximum of six months
36    and may be updated, replaced, or obsoleted by other documents at any
37    time. It is inappropriate to use Internet-Drafts as reference
38    material or to cite them other than as "work in progress."

40    This Internet-Draft will expire on April 07, 2014.

42 Copyright Notice

44    Copyright (c) 2013 IETF Trust and the persons identified as the
45    document authors. All rights reserved.

47    This document is subject to BCP 78 and the IETF Trust's Legal
48    Provisions Relating to IETF Documents
49    (http://trustee.ietf.org/license-info) in effect on the date of
50    publication of this document. Please review these documents
51    carefully, as they describe your rights and restrictions with respect
52    to this document.
      Code Components extracted from this document must
53    include Simplified BSD License text as described in Section 4.e of
54    the Trust Legal Provisions and are provided without warranty as
55    described in the Simplified BSD License.

57 Table of Contents

59    1. Introduction
60    2. Incorrect local status after failure
61    3. Reversion deadlock due to a race condition
62    4. Clarifying PSC's behavior in the face of multiple inputs
63    5. Handling a capabilities mismatch
64      5.1. PT mismatch
65      5.2. R mismatch
66      5.3. Unsupported modes
67    6. Security Considerations
68    7. IANA Considerations
69    8. Acknowledgements
70    9. References
71      9.1. Normative References
72      9.2. Informative References
73    Author's Address

75 1. Introduction

77    This document contains four updates to PSC [RFC6378]. Three of them
78    fix issues #2, #7 and #8 as identified in the ITU's liaison statement
79    "Recommendation ITU-T G.8131/Y.1382 revision - Linear protection
80    switching for MPLS-TP networks" [LIAISON]. The fourth clears up a
81    behavior which was not well explained in RFC6378. These updates are
82    not changes to the protocol's packet format or to PSC's design, but
83    are corrections and clarifications to specific aspects of the
84    protocol's procedures.

86 2. Incorrect local status after failure

88    Issue #2 in the liaison identifies a case where a strict reading of
89    RFC6378 leaves a node reporting an inaccurate status:

      A node can end up sending incorrect status - NR(0,1) - despite the
92    failure of the protection LSP (P-LSP). This is clearly not correct,
93    as a node should not be sending NR if it has a local failure. To
94    address this issue, the fourth bullet in section 4.3.3.3 is replaced
95    with the following three bullets:

97    o  If the current state is due to a local or remote Manual Switch, a
98       local Signal Fail indication on the protection path SHALL cause
99       the LER to enter local Unavailable state and begin transmission of
100       an SF(0,0) message.

102    o  If the LER is in local Protecting Administrative state due to a
103       local Forced Switch, a local Signal Fail indication on the
104       protection path SHALL be ignored.

106    o  If the LER is in remote Protecting Administrative state due to a
107       remote Forced Switch, a local Signal Fail indication on the
108       protection path SHALL cause the LER to remain in remote Protecting
109       administrative state and transmit an SF(0,1) message.

111 3. Reversion deadlock due to a race condition

113    Issue #8 in the liaison identifies a deadlock case where each node
114    can end up sending NR(0,1) when it should instead be in the process
115    of recovering from the failure (i.e. entering into WTR or DNR, as
116    appropriate for the protection domain). The root of the issue is
117    that a pair of nodes can simultaneously enter WTR state, receive an
118    out of date SF-W indication and transition into a remotely triggered
119    WTR, and remain in remotely triggered WTR waiting for the other end
120    to trigger a change in status.

122    In the case identified in issue #8, each node can end up sending
123    NR(0,1), which is an indication that the transmitting node has no
124    local failure, but is instead reacting to the remote SF-W.
       If a node
125    which receives NR(0,1) is in fact not indicating a local error, the
126    receiving node can take the received NR(0,1) as an indication that
127    there is no error in the protection domain, and recovery procedures
128    (WTR or DNR) should begin.

130    This is addressed by adding the following text as the penultimate
131    bullet in section 4.3.3.4:

133    o  If a node is in Protecting Failure state due to a remote SF-W and
134       receives NR(0,1), this SHALL cause the node to begin recovery
135       procedures. If the LER is configured for revertive behavior, it
136       enters into Wait-to-Restore state, starts the WTR timer, and
137       begins transmitting WTR(0,1). If the LER is configured for non-
138       revertive behavior, it enters into Do-Not-Revert state and begins
139       transmitting a DNR(0,1) message.

141    Additionally, the final bullet in section 4.3.3.3 is changed from

143    o  A remote NR(0,0) message SHALL be ignored if in local Protecting
144       administrative state.

146    to

148    o  A remote No Request message SHALL be ignored if in local
149       Protecting administrative state.

151    This indicates that a remote NR triggers the same behavior regardless
152    of the value of FPath and Path. This change does not directly
153    address issue #8, but fixes a similar issue - if a node receives NR
154    while in Remote administrative state, the values of FPath and Path
155    have no bearing on the node's reaction to this NR.

157 4. Clarifying PSC's behavior in the face of multiple inputs

159    RFC6378 describes the PSC state machine. Figure 1 in section 3 shows
160    two inputs into the PSC Control logic - Local Request logic and
161    Remote PSC Request. When there is only one input into the PSC
162    Control logic - a local request or a remote request but not both -
163    the PSC Control logic decides what that input signifies and then
164    takes one or more actions, as necessary. This is what the PSC State
165    Machine in section 4.3 describes.
167    RFC6378 does not sufficiently describe the behavior in the face of
168    multiple inputs into the PSC Control Logic (one Local Request and one
169    Remote Request). This section clarifies the expected behavior.

171    There are two cases to think about when considering dual inputs into
172    the PSC Control logic. The first is when the same request is
173    presented from both local and remote sources. One example of this
174    case is a Forced Switch (FS) configured on both ends of an LSP. This
175    will result in the PSC Control logic receiving both a local FS and
176    remote FS. For convenience, this scenario is written as [L(FS),
177    R(FS)].

179    The second case, which is handled in exactly the same way as the
180    first, is when the two inputs into the PSC Control logic describe
181    different events. There are a number of variations on this case.
182    One example is when there is a Lockout of Protection from the Local
183    request logic and a Forced Switch from the Remote PSC Request. This
184    is shortened to [L(LO), R(FS)].

186    In both cases the question is not how the PSC Control logic decides
187    which of these is the one it acts upon. Section 4.3.2 of RFC6378
188    lists the priority order, and prioritizes the local input over the
189    remote input in case both inputs are of the same priority. So in the
190    first example it is the local FS that drives the PSC Control logic,
191    and in the second example it is the local Lockout which drives the
192    PSC Control logic.

194    The point that this section clears up is around what happens when the
195    highest priority input goes away. Consider the first case.
196    Initially, the PSC Control logic has [L(FS), R(FS)] and L(FS) is
197    driving PSC's behavior. When L(FS) is removed but R(FS) remains,
198    what does PSC do? A strict reading of the FSM would suggest that PSC
199    transition from PA:F:L into N, and at some future time (perhaps after
200    the remote request refreshes) PSC would transition from N to PA:F:R.
201    This is an unreasonable behavior, as there is no sensible
202    justification for a node behaving as if things were normal (i.e. N
203    state) when it is clear that they are not.

205    The second case is similar. If a node starts with [L(LO), R(FS)] and
206    the local lockout is removed, a strict reading of the state machine
207    would suggest that the node transition from UA:LO:L to N, and then at
208    some future time presumably notice the R(FS) and transition from N to
209    PA:F:R. As with the first case, this is clearly not a useful
210    behavior.

212    In both cases the request which was driving PSC's behavior was
213    removed. What should happen is that the PSC Control logic should,
214    upon removal of an input, immediately reevaluate all other inputs to
215    decide on the next course of action. This requires an implementation
216    to store the most recent local and remote inputs regardless of their
217    eventual use as triggers for the PSC Control Logic.

219    There is a third case. Consider a node with [L(FS), R(LO)]. At some
220    point in time the remote node replaces its Lockout request with a
221    Signal Fail on Working, so that the inputs into the PSC Control logic
222    on the receiving node go to [L(FS), R(SF-W)]. Similar to the first
223    two cases, the node should immediately reevaluate both its local and
224    remote inputs to determine the highest priority among them, and act
225    on that input accordingly. That is in fact what happens, as defined
226    in Section 4.3.3:

228    "When a LER is in a remote state, i.e., state transition in reaction
229    to a PSC message received from the far-end LER, and receives a new
230    PSC message from the far-end LER that indicates a contradictory
231    state, e.g., in remote Unavailable state receiving a remote FS(1,1)
232    message, then the PSC Control logic SHALL reevaluate all inputs (both
233    the local input and the remote message) as if the LER is in the
234    Normal state."
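
The immediate reevaluation argued for in the three cases above can be sketched in Python. This is a minimal illustration under stated assumptions, not text from RFC6378: the `PscInputs` class, its method names, and the numeric priority values are hypothetical. The ordering of LO over FS and the local-wins tie-break follow the examples in this section; placing SF-W below FS is an assumption made for the example and should be checked against Section 4.3.2 of RFC6378.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical priority values for illustration only; higher wins.
# LO > FS follows the [L(LO), R(FS)] example; SF-W < FS is an assumption.
PRIORITY = {"LO": 3, "FS": 2, "SF-W": 1}


@dataclass
class PscInputs:
    """Stores the most recent local and remote requests (None = no request)."""
    local: Optional[str] = None
    remote: Optional[str] = None

    def driving_input(self) -> Optional[Tuple[str, str]]:
        """Reevaluate all stored inputs and return (source, request)."""
        candidates = []
        if self.local is not None:
            # The second tuple element breaks ties in favor of the local input.
            candidates.append((PRIORITY[self.local], 1, "local", self.local))
        if self.remote is not None:
            candidates.append((PRIORITY[self.remote], 0, "remote", self.remote))
        if not candidates:
            return None  # no request pending on either side
        _, _, source, request = max(candidates)
        return source, request

    def update(self, source: str, request: Optional[str]) -> Optional[Tuple[str, str]]:
        """Record a new, replaced, or removed input, then reevaluate at once."""
        if source == "local":
            self.local = request
        elif source == "remote":
            self.remote = request
        else:
            raise ValueError(f"unknown source: {source}")
        return self.driving_input()
```

With this sketch, starting from `PscInputs(local="FS", remote="FS")` and calling `update("local", None)` yields `("remote", "FS")` directly, with no intermediate pass through a "no request" result. Because both inputs are always stored, the remaining request is available the moment the driving one is removed.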
236    This section extends that paragraph to handle the first two cases.
237    The essence of the quoted paragraph is that when faced with multiple
238    inputs, PSC must reevaluate any changes as if it were in Normal state.
239    So the quoted paragraph is replaced with the following text:

241    "The PSC Control logic may simultaneously have Local and Remote
242    requests, and the highest priority of these requests ultimately
243    drives the behavior of the PSC Control logic. When this highest
244    priority request is removed or is replaced with another input, then
245    the PSC Control logic SHALL immediately reevaluate all inputs (both
246    the local input and the remote message), transitioning into a new
247    state only upon reevaluation of all inputs."

249 5. Handling a capabilities mismatch

251    PSC has no explicit facility to negotiate any properties of the
252    protection domain. It does, however, have the ability to signal two
253    properties of that domain, via the Protection Type (PT) and Revertive
254    (R) bits. RFC6378 specifies that if these bits do not match, an
255    operator "SHALL [be notified]" (PT, section 4.2.3) or "SHOULD be
256    notified" (R, section 4.2.4). However, there is no text which
257    specifies the behavior of the end nodes of a protection domain in
258    case of a mismatch. This section provides that text, as requested by
259    issue #7 in the liaison.

261 5.1. PT mismatch

263    The behavior of the protection domain depends on the exact PT
264    mismatch. Section 4.2.3 of RFC6378 specifies three protection types
265    - bidirectional switching using a permanent bridge, bidirectional
266    switching using a selector bridge, and unidirectional switching using
267    a permanent bridge. They are abbreviated here as BP, BS and UP.

269    There are three possible mismatches: [BP, UP], [BP, BS], and [UP,
270    BS]. The priority is:

272    UP > BS > BP

274    In other words:

276    o  If the PT mismatch is {BP, UP}, the node transmitting BP MUST
277       switch to UP mode if it is supported.

279    o  If the PT mismatch is {BP, BS}, the node transmitting BP MUST
280       switch to BS mode if it is supported.

282    o  If the PT mismatch is {UP, BS}, the node transmitting BS MUST
283       switch to UP mode if it is supported.

285 5.2. R mismatch
286    The R bit indicates whether the protection domain is in Revertive or
287    Non-Revertive behavior. If the R bits do not match, the node
288    indicating Non-Revertive MUST switch to Revertive if it is supported.

290 5.3. Unsupported modes

292    An implementation may not support all three PT modes and/or both R
293    modes, and thus a pair of nodes may be unable to converge on a common
294    mode. This creates a permanent mismatch, resolvable only by operator
295    intervention. An implementation SHOULD alert the operator to an
296    irreconcilable mismatch.

298    It is desirable to allow the protection domain to function in a non-
299    failure mode even if there is a mismatch, as the mismatches of PT or
300    R have to do with how nodes recover from a failure. An
301    implementation SHOULD allow traffic to be sent on the Working LSP as
302    long as there is no failure (e.g. NR state) regardless of any PT or R
303    mismatch.

305    If there is a trigger which would cause the protection LSP to be
306    used, such as SF or MS, a node MUST NOT use the protection LSP to
307    carry traffic.

309 6. Security Considerations

311    These changes and clarifications raise no new security concerns.

313 7. IANA Considerations

315    There are no requests for IANA actions in this document.

317    Note to RFC Editor: this section may be removed on publication as an
318    RFC.

320 8. Acknowledgements

322    The author of this document thanks Taesik Cheung, Alessandro
323    D'Alessandro, Annamaria Fulignoli, Sagar Soni, George Swallow and
324    Yaacov Weingarten for their contributions and review.

326 9. References

328 9.1. Normative References

330    [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
331              Requirement Levels", BCP 14, RFC 2119, March 1997.
333    [RFC6378] Weingarten, Y., Bryant, S., Osborne, E., Sprecher, N., and
334              A. Fulignoli, "MPLS Transport Profile (MPLS-TP) Linear
335              Protection", RFC 6378, October 2011.

337    [RFC6428] Allan, D., Swallow, G., Ed., and J. Drake, Ed., "Proactive
338              Connectivity Verification, Continuity Check, and Remote
339              Defect Indication for the MPLS Transport Profile", RFC
340              6428, November 2011.

342 9.2. Informative References

344    [LIAISON] ITU-T SG15, "Liaison Statement: Recommendation ITU-T
345              G.8131/Y.1382 revision - Linear protection switching for
346              MPLS-TP networks".

349 Author's Address

351    Eric Osborne
352    Cisco Systems

354    Email: eosborne@cisco.com