Network Working Group                                         E. Osborne
Internet-Draft
Updates: 6378 (if approved)                               April 22, 2014
Intended status: Standards Track
Expires: October 24, 2014


          Updates to MPLS Transport Profile Linear Protection
                     draft-ietf-mpls-psc-updates-05

Abstract

   This document contains a number of updates to the Protection State
   Coordination (PSC) logic defined in RFC6378, "MPLS Transport Profile
   (MPLS-TP) Linear Protection".  These updates provide some rules and
   recommendations around the use of TLVs in PSC, address some issues
   raised in an ITU-T liaison statement, and clarify PSC's behavior in a
   case not well explained in RFC6378.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on October 24, 2014.

Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Message Formatting and Error Handling
     2.1.  Format on the wire
     2.2.  PSC TLV Format
     2.3.  Error handling
       2.3.1.  Malformed messages
       2.3.2.  Well-formed but unexpected TLV
   3.  Incorrect local status after failure
   4.  Handling a capabilities mismatch
     4.1.  Protection Type mismatch
     4.2.  R mismatch
     4.3.  Unsupported modes
   5.  Reversion deadlock due to a race condition
   6.  Clarifying PSC's behavior in the face of multiple inputs
   7.  Security Considerations
   8.  IANA Considerations
   9.  Acknowledgements
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Author's Address

1.  Introduction

   This document contains a number of updates to PSC [RFC6378].  One
   provides some rules and recommendations around the use of TLVs in
   PSC.
   Three of them address issues #2, #7 and #8 as identified in the
   ITU's liaison statement "Recommendation ITU-T G.8131/Y.1382 revision
   - Linear protection switching for MPLS-TP networks" [LIAISON].
   Another clears up a behavior which was not well explained in RFC6378.
   These updates are not changes to the protocol's packet format or to
   PSC's design, but are corrections and clarifications to specific
   aspects of the protocol's procedures.

   This document assumes familiarity with RFC6378 and its terms,
   conventions and acronyms.  Any term used in this document but not
   defined herein can be found in RFC6378.  In particular, this document
   shares the acronyms defined in Section 2.1 of RFC6378.

2.  Message Formatting and Error Handling

   This section covers message formatting, as well as some recommended
   error checking.

2.1.  Format on the wire

   All integer fields in the PSC TLV are encoded as unsigned integers in
   network byte order.

2.2.  PSC TLV Format

   [RFC6378] provides the capability to carry TLVs in the PSC messages.
   This section defines the format to be used by all such TLVs.  All
   fields are encoded in network byte order.

   Type field (T):

      A two-octet field that encodes a type value.  The type values are
      recorded in the IANA registry "MPLS PSC TLV Registry".

   Length field (L):

      A two-octet field that encodes the length in octets of the Value
      field.  The value of this field MUST be a multiple of 4.

   Value field (V):

      The contents of the TLV.  This field MUST be a multiple of 4 octets
      and so may contain explicit padding.

   The TLV Length in the PSC message is the sum of the lengths of all
   TLVs in the message.  The length of a TLV is the sum of the lengths
   of its three fields, i.e., the length of the Value field + 4.

2.3.  Error handling

   It is recommended to implement error and bounds checking to ensure
   that received messages, if improperly formatted, are handled in such
   a way as to minimize the impact of this formatting on the behavior of
   the network and its devices.  This section covers two such areas:
   malformed messages and well-formed but unexpected TLVs.

   Neither of these sections is intended to limit the error or bounds
   checking a device performs.  The recommendations here should be taken
   as a starting point.

2.3.1.  Malformed messages

   An implementation SHOULD:

   o  Ensure any fields prior to TLV Length are consistent with RFC
      6378, particularly Section 4.2.

   o  Ensure the overall length of the message matches the value in the
      TLV Length + 12.

   o  Check that the sum of the lengths of all TLVs matches the value in
      the TLV Length.

   If an implementation receives a message which fails any malformed
   message checks, it MUST drop the message and SHOULD alert the
   operator to the malformed message.  The method(s) used to alert the
   operator are outside the scope of this document, but may include
   things like syslog or console messages.

2.3.2.  Well-formed but unexpected TLV

   If a message is deemed to be properly formed, an implementation
   SHOULD check all TLVs to ensure that it knows what to do with them.
   A well-formed but unknown TLV MUST be ignored, and the rest of the
   message processed as if the ignored TLV did not exist.  An
   implementation detecting a malformed TLV SHOULD alert the operator as
   described in Section 2.3.1.

3.  Incorrect local status after failure

   Issue #2 in the liaison identifies a case where a strict reading of
   RFC6378 leaves a node reporting an inaccurate status:

   A node can end up sending incorrect status - NR(0,1) - despite the
   failure of the protection LSP (P-LSP).
   This is clearly not correct, as a node should not be sending NR if it
   has a local failure.  To address this issue, the fourth bullet in
   Section 4.3.3.3 of RFC6378 is replaced with the following three
   bullets:

   o  If the current state is due to a local or remote Manual Switch, a
      local Signal Fail indication on the protection path SHALL cause
      the LER to enter local Unavailable state and begin transmission of
      an SF(0,0) message.

   o  If the LER is in local Protecting administrative state due to a
      local Forced Switch, a local Signal Fail indication on the
      protection path SHALL be ignored.

   o  If the LER is in remote Protecting administrative state due to a
      remote Forced Switch, a local Signal Fail indication on the
      protection path SHALL cause the LER to remain in remote Protecting
      administrative state and transmit an SF(0,1) message.

4.  Handling a capabilities mismatch

   PSC has no explicit facility to negotiate any properties of the
   protection domain.  It does, however, have the ability to signal two
   properties of that domain, via the Protection Type (PT) and Revertive
   (R) bits.  RFC6378 specifies that if these bits do not match, an
   operator "SHALL [be notified]" (PT, Section 4.2.3) or "SHOULD be
   notified" (R, Section 4.2.4).  However, there is no text which
   specifies the behavior of the end nodes of a protection domain in
   case of a mismatch.  This section provides that text, as requested by
   issue #7 in the liaison.

4.1.  Protection Type mismatch

   The behavior of the protection domain depends on the exact Protection
   Type (PT) mismatch.  Section 4.2.3 of RFC6378 specifies three
   protection types: bidirectional switching using a permanent bridge,
   bidirectional switching using a selector bridge, and unidirectional
   switching using a permanent bridge.  They are abbreviated here as BP,
   BS and UP.
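   As an illustration only, the three protection types above and the
   mismatch rules given in the rest of this section might be modeled as
   in the following minimal Python sketch.  The names ProtectionType and
   resolve_pt are hypothetical.  The integer values are merely ranks
   encoding the priority UP > BS > BP; they are not the PT values carried
   on the wire, which are defined in Section 4.2.3 of RFC6378.

```python
from enum import IntEnum
from typing import Optional, Set


class ProtectionType(IntEnum):
    """PSC protection types from RFC6378 Section 4.2.3.

    The integer values are hypothetical ranks encoding the mismatch
    priority UP > BS > BP; they are NOT the PT field values on the wire.
    """
    BP = 1  # bidirectional switching using a permanent bridge
    BS = 2  # bidirectional switching using a selector bridge
    UP = 3  # unidirectional switching using a permanent bridge


def resolve_pt(local: ProtectionType,
               remote: ProtectionType,
               supported: Set[ProtectionType]) -> Optional[ProtectionType]:
    """Return the PT this node should use after comparing PT values.

    Returns None when the node is required to switch to a mode it does
    not support, i.e. it falls back to the unsupported-mode handling
    of Section 4.3.
    """
    if local == remote:
        # No mismatch: keep the current mode.
        return local
    if remote > local:
        # This node transmits the lower-priority mode, so it MUST switch
        # to the remote node's mode, but only if it supports that mode.
        return remote if remote in supported else None
    # This node already transmits the higher-priority mode; it keeps it,
    # and the remote node is the one required to switch.
    return local


ALL = set(ProtectionType)
for local, remote, supported in [
    (ProtectionType.BP, ProtectionType.UP, ALL),
    (ProtectionType.BP, ProtectionType.BS, ALL),
    (ProtectionType.BS, ProtectionType.UP, {ProtectionType.BS}),
]:
    result = resolve_pt(local, remote, supported)
    print(local.name, remote.name, "->",
          result.name if result is not None else "unsupported (Section 4.3)")
```

   Returning None, rather than silently keeping the current mode, leaves
   alerting the operator and restricting traffic to the Working LSP to
   the caller, in line with Section 4.3.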
   There are three possible mismatches: {BP, UP}, {BP, BS}, and
   {UP, BS}.  The priority is:

      UP > BS > BP

   In other words:

   o  If the PT mismatch is {BP, UP}, the node transmitting BP MUST
      switch to UP mode if it is supported.

   o  If the PT mismatch is {BP, BS}, the node transmitting BP MUST
      switch to BS mode if it is supported.

   o  If the PT mismatch is {UP, BS}, the node transmitting BS MUST
      switch to UP mode if it is supported.

   If a node does not support a mode to which it is required to switch,
   then that node MUST behave as in Section 4.3.

4.2.  R mismatch

   The R bit indicates whether the protection domain is in Revertive or
   Non-Revertive behavior.  If the R bits do not match, the node
   indicating Non-Revertive MUST switch to Revertive if it is supported.
   If it is not supported, a node must behave as in Section 4.3.

4.3.  Unsupported modes

   An implementation may not support all three PT modes and/or both R
   modes, and thus a pair of nodes may be unable to converge on a common
   mode.  This creates a permanent mismatch, resolvable only by operator
   intervention.  An implementation SHOULD alert the operator to an
   irreconcilable mismatch.

   It is desirable to allow the protection domain to function in a
   non-failure mode even if there is a mismatch, as the mismatches of PT
   or R have to do with how nodes recover from a failure.  An
   implementation SHOULD allow traffic to be sent on the Working LSP as
   long as there is no failure (e.g., NR state), regardless of any PT or
   R mismatch.

   If, while such an irreconcilable mismatch persists, there is a
   trigger which would cause the protection LSP to be used, such as SF
   or MS, a node MUST NOT use the protection LSP to carry traffic.

5.  Reversion deadlock due to a race condition

   Issue #8 in the liaison identifies a deadlock case where each node
   can end up sending NR(0,1) when it should instead be in the process
   of recovering from the failure (i.e.,
   entering into WTR or DNR, as appropriate for the protection domain).
   The root of the issue is that a pair of nodes can simultaneously
   enter WTR state, receive an out-of-date SF-W indication and
   transition into a remotely triggered WTR, and remain in remotely
   triggered WTR waiting for the other end to trigger a change in
   status.

   In the case identified in issue #8, each node can end up sending
   NR(0,1), which is an indication that the transmitting node has no
   local failure, but is instead reacting to the remote SF-W.  If a node
   which receives NR(0,1) is in fact not indicating a local error, the
   correct behavior for the receiving node is to take the received
   NR(0,1) as an indication that there is no error in the protection
   domain, and recovery procedures (WTR or DNR) should begin.

   This is addressed by adding the following text as the penultimate
   bullet in Section 4.3.3.4 of RFC6378:

   o  If a node is in Protecting Failure state due to a remote SF-W and
      receives NR(0,1), this SHALL cause the node to begin recovery
      procedures.  If the LER is configured for revertive behavior, it
      enters into Wait-to-Restore state, starts the WTR timer, and
      begins transmitting WTR(0,1).  If the LER is configured for
      non-revertive behavior, it enters into Do-Not-Revert state and
      begins transmitting a DNR(0,1) message.

   Additionally, the final bullet in Section 4.3.3.3 of RFC6378 is
   changed from

   o  A remote NR(0,0) message SHALL be ignored if in local Protecting
      administrative state.

   to

   o  A remote No Request message SHALL be ignored if in local
      Protecting administrative state.

   This indicates that a remote NR triggers the same behavior regardless
   of the values of FPath and Path.
   This change does not directly address issue #8, but fixes a similar
   issue: if a node receives NR while in Remote administrative state,
   the values of FPath and Path have no bearing on the node's reaction
   to this NR.

6.  Clarifying PSC's behavior in the face of multiple inputs

   RFC6378 describes the PSC state machine.  Figure 1 in Section 3 of
   RFC6378 shows two inputs into the PSC Control logic: Local Request
   logic and Remote PSC Request.  When there is only one input into the
   PSC Control logic (a local request or a remote request, but not
   both), the PSC Control logic decides what that input signifies and
   then takes one or more actions, as necessary.  This is what the PSC
   State Machine in Section 4.3 of RFC6378 describes.

   RFC6378 does not sufficiently describe the behavior in the face of
   multiple inputs into the PSC Control logic (one Local Request and one
   Remote Request).  This section clarifies the expected behavior.

   There are two cases to think about when considering dual inputs into
   the PSC Control logic.  The first is when the same request is
   presented from both local and remote sources.  One example of this
   case is a Forced Switch (FS) configured on both ends of an LSP.  This
   will result in the PSC Control logic receiving both a local FS and a
   remote FS.  For convenience, this scenario is written as
   [L(FS), R(FS)], that is, Local(Forced Switch) and Remote(Forced
   Switch).

   The second case, which is handled in exactly the same way as the
   first, is when the two inputs into the PSC Control logic describe
   different events.  There are a number of variations on this case.
   One example is when there is a Lockout of Protection from the Local
   request logic and a Signal Fail on the Working path from the Remote
   PSC Request.  This is shortened to [L(LO), R(SF-W)].

   In both cases, the question is not how the PSC Control logic decides
   which of these inputs it acts upon.
   Section 4.3.2 of RFC6378 lists the priority order and prioritizes the
   local input over the remote input in case both inputs are of the same
   priority.  So in the first example it is the local FS that drives the
   PSC Control logic, and in the second example it is the local Lockout
   which drives the PSC Control logic.

   The point that this section clears up is what happens when the
   highest priority input goes away.  Consider the first case.
   Initially, the PSC Control logic has [L(FS), R(FS)] and L(FS) is
   driving PSC's behavior.  When L(FS) is removed but R(FS) remains,
   what does PSC do?  A strict reading of the FSM would suggest that PSC
   transition from PA:F:L into N, and at some future time (perhaps after
   the remote request refreshes) PSC would transition from N to PA:F:R.
   This is unreasonable behavior, as there is no sensible justification
   for a node behaving as if things were normal (i.e., N state) when it
   is clear that they are not.

   The second case is similar.  If a node starts with [L(LO), R(SF-W)]
   and the local lockout is removed, a strict reading of the state
   machine would suggest that the node transition from UA:LO:L to N, and
   then at some future time presumably notice the R(SF-W) and transition
   from N to PF:W:R.  As with the first case, this is clearly not a
   useful behavior.

   In both cases the request that was driving PSC's behavior was
   removed.  What should happen is that the PSC Control logic should,
   upon removal of an input, immediately reevaluate all other inputs to
   decide on the next course of action.  This requires an implementation
   to store the most recent local and remote inputs regardless of their
   eventual use as triggers for the PSC Control logic.

   There is a third case.  Consider a node with [L(FS), R(LO)].
   At some point in time the remote node replaces its Lockout request
   with a Signal Fail on Working, so that the inputs into the PSC Control
   logic on the receiving node go to [L(FS), R(SF-W)].  Similar to the
   first two cases, the node should immediately reevaluate both its
   local and remote inputs to determine the highest priority among them,
   and act on that input accordingly.  That is in fact what happens, as
   defined in Section 4.3.3 of RFC6378:

   "When a LER is in a remote state, i.e., state transition in reaction
   to a PSC message received from the far-end LER, and receives a new
   PSC message from the far-end LER that indicates a contradictory
   state, e.g., in remote Unavailable state receiving a remote FS(1,1)
   message, then the PSC Control logic SHALL reevaluate all inputs (both
   the local input and the remote message) as if the LER is in the
   Normal state."

   This section extends that paragraph to handle the first two cases.
   The essence of the quoted paragraph is that when faced with multiple
   inputs, PSC must reevaluate any changes as if it were in the Normal
   state.  So the quoted paragraph is replaced with the following text:

   "The PSC Control logic may simultaneously have Local and Remote
   requests, and the highest priority of these requests ultimately
   drives the behavior of the PSC Control logic.  When this highest
   priority request is removed or is replaced with another input, then
   the PSC Control logic SHALL immediately reevaluate all inputs (both
   the local input and the remote message), transitioning into a new
   state only upon reevaluation of all inputs".

7.  Security Considerations

   These changes and clarifications raise no new security concerns.

8.  IANA Considerations

   IANA is requested to mark the value 0 in the "MPLS PSC TLV Registry"
   as "Reserved, not to be allocated" and to update the references to
   show [RFC6378] and [RFC-ietf-mpls-psc-updates-04].
   Note that this action provides documentation of an action already
   taken by IANA but not recorded in RFC 6378.

9.  Acknowledgements

   The author of this document thanks Taesik Cheung, Alessandro
   D'Alessandro, Annamaria Fulignoli, Sagar Soni, George Swallow and
   Yaacov Weingarten for their contributions and review, and Adrian
   Farrel for the text of Section 2.

10.  References

10.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC6378]  Weingarten, Y., Bryant, S., Osborne, E., Sprecher, N., and
              A. Fulignoli, "MPLS Transport Profile (MPLS-TP) Linear
              Protection", RFC 6378, October 2011.

10.2.  Informative References

   [LIAISON]  ITU-T SG15, "Liaison Statement: Recommendation ITU-T
              G.8131/Y.1382 revision - Linear protection switching for
              MPLS-TP networks", .

Author's Address

   Eric Osborne

   Email: eric.osborne@notcom.com