Network Working Group                                         E. Osborne
Internet-Draft
Updates: 6378 (if approved)                                 May 29, 2014
Intended status: Standards Track
Expires: November 30, 2014


          Updates to MPLS Transport Profile Linear Protection
                     draft-ietf-mpls-psc-updates-06

Abstract

   This document contains a number of updates to the Protection State
   Coordination (PSC) logic defined in RFC6378, "MPLS Transport Profile
   (MPLS-TP) Linear Protection".  These updates provide some rules and
   recommendations around the use of TLVs in PSC, address some issues
   raised in an ITU-T liaison statement, and clarify PSC's behavior in a
   case not well explained in RFC6378.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on November 30, 2014.

Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components
   extracted from this document must include Simplified BSD License text
   as described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Message Formatting and Error Handling
     2.1.  PSC TLV Format
     2.2.  Error handling
       2.2.1.  Malformed messages
       2.2.2.  Well-formed but unknown or unexpected TLV
   3.  Incorrect local status after failure
   4.  Handling a capabilities mismatch
     4.1.  Protection Type mismatch
     4.2.  R mismatch
     4.3.  Unsupported modes
   5.  Reversion deadlock due to a race condition
   6.  Clarifying PSC's behavior in the face of multiple inputs
   7.  Security Considerations
   8.  IANA Considerations
   9.  Acknowledgements
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Author's Address

1.  Introduction

   This document contains a number of updates to PSC [RFC6378].  One
   provides some rules and recommendations around the use of TLVs in
   PSC.
   Three of them address issues #2, #7, and #8 as identified in the
   ITU's liaison statement "Recommendation ITU-T G.8131/Y.1382 revision
   - Linear protection switching for MPLS-TP networks" [LIAISON].
   Another clears up a behavior that was not well explained in RFC6378.
   These updates are not changes to the protocol's packet format or to
   PSC's design, but are corrections and clarifications to specific
   aspects of the protocol's procedures.  This document does not
   introduce backward compatibility issues with implementations of
   RFC 6378.

   It should be noted that [I-D.ietf-mpls-tp-psc-itu] contains protocol
   mechanisms for an alternate mode of operating MPLS-TP PSC.  Those
   mechanisms are built on the message structures and procedures of
   [RFC6378], so while this document does not update
   [I-D.ietf-mpls-tp-psc-itu], it affects that work through its update
   to [RFC6378].

   This document assumes familiarity with RFC6378 and its terms,
   conventions, and acronyms.  Any term used in this document but not
   defined herein can be found in RFC6378.  In particular, this document
   shares the acronyms defined in Section 2.1 of RFC6378.

2.  Message Formatting and Error Handling

   This section covers message formatting, as well as some recommended
   error checking.

2.1.  PSC TLV Format

   [RFC6378] provides the capability to carry TLVs in the PSC messages.
   All fields are encoded in network byte order.  Each TLV contains
   three fields, as follows:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             Type              |            Length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             Value                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Type field (T):

      A two-octet field that encodes a type value.  The type values are
      recorded in the IANA registry "MPLS PSC TLV Registry".

   Length field (L):

      A two-octet field that encodes the length in octets of the Value
      field.

      The value of this field MUST be a multiple of 4.

   Value field (V):

      The payload of the TLV.  The length of this field (which is the
      value of the Length field) MUST be a multiple of 4 octets, and so
      this field may contain explicit padding.  The length of each
      single TLV is the sum of the lengths of its three fields: the
      length of the Value field + 4.  The overall TLV Length field in
      the PSC message contains the total length of all TLVs in octets.

2.2.  Error handling

   It is recommended to implement error and bounds checking to ensure
   that received messages, if improperly formatted, are handled in such
   a way as to minimize the impact of this formatting on the behavior of
   the network and its devices.  This section covers two such areas:
   malformed messages and well-formed but unexpected TLVs.

   Neither of these sections is intended to limit the error or bounds
   checking a device performs.  The recommendations herein should be
   taken as a starting point.

2.2.1.  Malformed messages

   An implementation SHOULD:

   o  Ensure any fields prior to TLV Length are consistent with
      RFC 6378, particularly Section 4.2.

   o  Ensure the overall length of the message matches the value in the
      TLV Length + 12.

   o  Check that the sum of the lengths of all TLVs matches the value in
      the TLV Length.

   If an implementation receives a message that fails any malformed
   message checks, it MUST drop the message and SHOULD alert the
   operator to the malformed message.  The method(s) used to alert the
   operator are outside the scope of this document, but may include
   things like syslog or console messages.

2.2.2.  Well-formed but unknown or unexpected TLV

   If a message is deemed to be properly formed, an implementation
   SHOULD check all TLVs to ensure that it knows what to do with them.
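
   The length checks of Section 2.2.1 and the walk over the TLVs can be
   sketched in Python as follows.  This is a non-normative sketch, not
   a reference implementation: it assumes the buffer starts at the
   first of the 12 octets counted by the "TLV Length + 12" rule, that
   the 16-bit TLV Length field sits at octet offset 8 of that buffer
   (verify against the message layout in Section 4.2 of RFC6378), and
   that padding octets are zero when encoding.  The names parse_tlvs,
   encode_tlv, and MalformedMessage are hypothetical, and the check on
   the fixed fields preceding TLV Length is left out.

```python
# Non-normative sketch; all names here are hypothetical.
import struct
from typing import List, Tuple

FIXED_PART_LEN = 12     # octets preceding the TLVs ("TLV Length + 12")
TLV_LENGTH_OFFSET = 8   # ASSUMED offset of the 16-bit TLV Length field
TLV_HEADER_LEN = 4      # Type (2 octets) + Length (2 octets)


class MalformedMessage(ValueError):
    """Hypothetical error: the message MUST be dropped."""


def encode_tlv(tlv_type: int, value: bytes) -> bytes:
    """Encode one TLV, zero-padding Value to a multiple of 4 octets."""
    padded = value + b"\x00" * (-len(value) % 4)
    return struct.pack("!HH", tlv_type, len(padded)) + padded


def parse_tlvs(message: bytes) -> List[Tuple[int, bytes]]:
    """Apply the length checks, then return (type, value) pairs."""
    if len(message) < FIXED_PART_LEN:
        raise MalformedMessage("shorter than the fixed part")
    (tlv_length,) = struct.unpack_from("!H", message, TLV_LENGTH_OFFSET)
    # The overall message length must equal TLV Length + 12.
    if len(message) != tlv_length + FIXED_PART_LEN:
        raise MalformedMessage("length does not match TLV Length + 12")
    tlvs = []
    offset = FIXED_PART_LEN
    while offset < len(message):
        if len(message) - offset < TLV_HEADER_LEN:
            raise MalformedMessage("truncated TLV header")
        tlv_type, value_len = struct.unpack_from("!HH", message, offset)
        if value_len % 4 != 0:
            raise MalformedMessage("Length is not a multiple of 4")
        start = offset + TLV_HEADER_LEN
        end = start + value_len
        # Each TLV occupies value_len + 4 octets; the TLVs must sum
        # exactly to TLV Length, so none may run past the message end.
        if end > len(message):
            raise MalformedMessage("TLVs overrun TLV Length")
        tlvs.append((tlv_type, message[start:end]))
        offset = end
    return tlvs
```

   For example, a 12-octet fixed part carrying a TLV Length of 8,
   followed by encode_tlv(1, b"abc"), parses to [(1, b"abc\x00")];
   truncating that message by one octet raises MalformedMessage.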

   A well-formed but unknown or unexpected TLV value MUST be ignored,
   and the rest of the message processed as if the ignored TLV did not
   exist.  An implementation detecting a malformed TLV SHOULD alert the
   operator as described in Section 2.2.1.

3.  Incorrect local status after failure

   Issue #2 in the liaison identifies a case where a strict reading of
   RFC6378 leaves a node reporting an inaccurate status:

   A node can end up sending incorrect status, NR(0,1), despite the
   failure of the protection LSP (P-LSP).  This is clearly not correct,
   as a node should not be sending NR if it has a local failure.  To
   address this issue, the fourth bullet in Section 4.3.3.3 of RFC6378
   is replaced with the following three bullets:

   o  If the current state is due to a local or remote Manual Switch, a
      local Signal Fail indication on the protection path SHALL cause
      the LER to enter local Unavailable state and begin transmission of
      an SF(0,0) message.

   o  If the LER is in local Protecting Administrative state due to a
      local Forced Switch, a local Signal Fail indication on the
      protection path SHALL be ignored.

   o  If the LER is in remote Protecting Administrative state due to a
      remote Forced Switch, a local Signal Fail indication on the
      protection path SHALL cause the LER to remain in remote Protecting
      Administrative state and transmit an SF(0,1) message.

4.  Handling a capabilities mismatch

   PSC has no explicit facility to negotiate any properties of the
   protection domain.  It does, however, have the ability to signal two
   properties of that domain, via the Protection Type (PT) and Revertive
   (R) bits.  RFC6378 specifies that if these bits do not match, an
   operator "SHALL [be notified]" (PT, Section 4.2.3) or "SHOULD be
   notified" (R, Section 4.2.4).  However, there is no text that
   specifies the behavior of the end nodes of a protection domain in
   case of a mismatch.
   This section provides that text, as requested by issue #7 in the
   liaison.

4.1.  Protection Type mismatch

   The behavior of the protection domain depends on the exact Protection
   Type (PT) mismatch.  Section 4.2.3 of RFC6378 specifies three
   protection types: bidirectional switching using a permanent bridge,
   bidirectional switching using a selector bridge, and unidirectional
   switching using a permanent bridge.  They are abbreviated here as BP,
   BS, and UP, respectively.

   There are three possible mismatches: {BP, UP}, {BP, BS}, and
   {UP, BS}.  The priority is:

      UP > BS > BP

   In other words:

   o  If the PT mismatch is {BP, UP}, the node transmitting BP MUST
      switch to UP mode if it is supported.

   o  If the PT mismatch is {BP, BS}, the node transmitting BP MUST
      switch to BS mode if it is supported.

   o  If the PT mismatch is {UP, BS}, the node transmitting BS MUST
      switch to UP mode if it is supported.

   If a node does not support a mode to which it is required to switch,
   then that node MUST behave as in Section 4.3.

4.2.  R mismatch

   The R bit indicates whether the protection domain uses Revertive or
   Non-Revertive behavior.  If the R bits do not match, the node
   indicating Non-Revertive MUST switch to Revertive if it is supported.
   If Revertive is not supported, that node must behave as in
   Section 4.3.

4.3.  Unsupported modes

   An implementation may not support all three PT modes and/or both R
   modes, and thus a pair of nodes may be unable to converge on a common
   mode.  This creates a permanent mismatch, resolvable only by operator
   intervention.  An implementation SHOULD alert the operator to an
   irreconcilable mismatch.

   It is desirable to allow the protection domain to function in a
   non-failure mode even if there is a mismatch, as the mismatches of PT
   or R have to do with how nodes recover from a failure.
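
   The resolution rules of Sections 4.1 and 4.2 can be sketched in
   Python as below.  This is an illustrative, non-normative sketch; the
   names PT, resolve_pt, and resolve_revertive are hypothetical.  A
   return value of None stands for "required to switch to a mode this
   node does not support", in which case the node falls back to the
   behavior described in this section.

```python
# Non-normative sketch; all names here are hypothetical.
from enum import Enum
from typing import Optional, Set


class PT(Enum):
    """Protection Type abbreviations used in Section 4.1."""
    BP = "bidirectional switching, permanent bridge"
    BS = "bidirectional switching, selector bridge"
    UP = "unidirectional switching, permanent bridge"


# Section 4.1 priority: UP > BS > BP (a higher rank wins).
PT_RANK = {PT.BP: 0, PT.BS: 1, PT.UP: 2}


def resolve_pt(local: PT, remote: PT, supported: Set[PT]) -> Optional[PT]:
    """Return the PT this node should use, or None if it is required to
    switch to a mode it does not support (handle as in Section 4.3)."""
    if local == remote:
        return local
    winner = local if PT_RANK[local] > PT_RANK[remote] else remote
    if winner == local:
        return local  # the peer is the node required to switch
    return winner if winner in supported else None


def resolve_revertive(local_r: bool, remote_r: bool,
                      supports_revertive: bool) -> Optional[bool]:
    """Section 4.2: the Non-Revertive side switches to Revertive if able."""
    if local_r == remote_r or local_r:
        return local_r  # match, or the peer is the one required to switch
    return True if supports_revertive else None
```

   Each node applies the same functions to its own view of the
   mismatch, so only the lower-priority side changes mode; the
   higher-priority side keeps its setting and waits for the peer.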

   An implementation SHOULD allow traffic to be sent on the Working LSP
   as long as there is no failure (e.g., NR state), regardless of any PT
   or R mismatch.

   If there is a trigger that would cause the protection LSP to be
   used, such as SF or MS, a node MUST NOT use the protection LSP to
   carry traffic.

5.  Reversion deadlock due to a race condition

   Issue #8 in the liaison identifies a deadlock case where each node
   can end up sending NR(0,1) when it should instead be in the process
   of recovering from the failure (i.e., entering into WTR or DNR, as
   appropriate for the protection domain).  The root of the issue is
   that a pair of nodes can simultaneously enter WTR state, receive an
   out-of-date SF-W indication and transition into a remotely triggered
   WTR, and remain in remotely triggered WTR waiting for the other end
   to trigger a change in status.

   In the case identified in issue #8, each node can end up sending
   NR(0,1), which is an indication that the transmitting node has no
   local failure, but is instead reacting to the remote SF-W.  If a node
   that receives NR(0,1) is in fact not indicating a local error, the
   correct behavior for the receiving node is to take the received
   NR(0,1) as an indication that there is no error in the protection
   domain, and recovery procedures (WTR or DNR) should begin.

   This is addressed by adding the following text as the penultimate
   bullet in Section 4.3.3.4 of RFC6378:

   o  If a node is in Protecting Failure state due to a remote SF-W and
      receives NR(0,1), this SHALL cause the node to begin recovery
      procedures.  If the LER is configured for revertive behavior, it
      enters into Wait-to-Restore state, starts the WTR timer, and
      begins transmitting WTR(0,1).  If the LER is configured for
      non-revertive behavior, it enters into Do-Not-Revert state and
      begins transmitting a DNR(0,1) message.
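
   The new bullet amounts to a single guarded transition.  The Python
   sketch below models only that transition, using hypothetical names
   (PscMsg, on_remote_msg) and the PF:W:R state abbreviation used in
   Section 6; it is not a model of the full RFC6378 state machine.  In
   the deadlock scenario above, each node sits in PF:W:R sending
   NR(0,1), so under this rule each node's receipt of the other's
   NR(0,1) moves it into WTR (or DNR).

```python
# Non-normative sketch; all names here are hypothetical.
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class PscMsg:
    """Hypothetical view of a PSC message: Request(FPath,Path)."""
    request: str  # e.g. "NR", "SF", "WTR", "DNR"
    fpath: int
    path: int


def on_remote_msg(state: str, msg: PscMsg,
                  revertive: bool) -> Tuple[str, Optional[PscMsg], bool]:
    """Return (next_state, message_to_send, start_wtr_timer).

    Only the bullet added to Section 4.3.3.4 of RFC6378 is modeled;
    "PF:W:R" denotes Protecting Failure due to a remote SF-W."""
    if state == "PF:W:R" and msg == PscMsg("NR", 0, 1):
        if revertive:
            return "WTR", PscMsg("WTR", 0, 1), True
        return "DNR", PscMsg("DNR", 0, 1), False
    # Every other transition is outside the scope of this sketch.
    return state, None, False
```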

   Additionally, the final bullet in Section 4.3.3.3 of RFC6378 is
   changed from

   o  A remote NR(0,0) message SHALL be ignored if in local Protecting
      administrative state.

   to

   o  A remote No Request message SHALL be ignored if in local
      Protecting administrative state.

   This indicates that a remote NR triggers the same behavior regardless
   of the values of FPath and Path.  This change does not directly
   address issue #8, but it fixes a similar issue: if a node receives NR
   while in local Protecting administrative state, the values of FPath
   and Path have no bearing on the node's reaction to this NR.

6.  Clarifying PSC's behavior in the face of multiple inputs

   RFC6378 describes the PSC state machine.  Figure 1 in Section 3 of
   RFC6378 shows two inputs into the PSC Control logic: Local Request
   logic and Remote PSC Request.  When there is only one input into the
   PSC Control logic (a local request or a remote request, but not
   both), the PSC Control logic decides what that input signifies and
   then takes one or more actions, as necessary.  This is what the PSC
   State Machine in Section 4.3 of RFC6378 describes.

   RFC6378 does not sufficiently describe the behavior in the face of
   multiple inputs into the PSC Control logic (one Local Request and one
   Remote Request).  This section clarifies the expected behavior.

   There are two cases to think about when considering dual inputs into
   the PSC Control logic.  The first is when the same request is
   presented from both local and remote sources.  One example of this
   case is a Forced Switch (FS) configured on both ends of an LSP.  This
   will result in the PSC Control logic receiving both a local FS and a
   remote FS.  For convenience, this scenario is written as
   [L(FS), R(FS)], that is, Local(Forced Switch) and Remote(Forced
   Switch).

   The second case, which is handled in exactly the same way as the
   first, is when the two inputs into the PSC Control logic describe
   different events.
   There are a number of variations on this case.  One example is when
   there is a Lockout of Protection from the Local Request logic and a
   Signal Fail on the Working path from the Remote PSC Request.  This is
   shortened to [L(LO), R(SF-W)].

   In both cases, the question is not how the PSC Control logic decides
   which of these inputs it acts upon.  Section 4.3.2 of RFC6378 lists
   the priority order, and prioritizes the local input over the remote
   input in case both inputs are of the same priority.  So in the first
   example it is the local FS that drives the PSC Control logic, and in
   the second example it is the local Lockout that drives the PSC
   Control logic.

   The point that this section clears up is what happens when the
   highest priority input goes away.  Consider the first case.
   Initially, the PSC Control logic has [L(FS), R(FS)] and L(FS) is
   driving PSC's behavior.  When L(FS) is removed but R(FS) remains,
   what does PSC do?  A strict reading of the FSM would suggest that PSC
   transition from PA:F:L into N, and at some future time (perhaps after
   the remote request refreshes) PSC would transition from N to PA:F:R.
   This is unreasonable behavior, as there is no sensible justification
   for a node behaving as if things were normal (i.e., N state) when it
   is clear that they are not.

   The second case is similar.  If a node starts with [L(LO), R(SF-W)]
   and the local Lockout is removed, a strict reading of the state
   machine would suggest that the node transition from UA:LO:L to N, and
   then at some future time presumably notice the R(SF-W) and transition
   from N to PF:W:R.  As with the first case, this is clearly not a
   useful behavior.

   In both cases, the request that was driving PSC's behavior was
   removed.  Instead of passing through N, the PSC Control logic should,
   upon removal of an input, immediately reevaluate all other inputs to
   decide on the next course of action.
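
   Reevaluation on removal can be sketched in Python as follows.  This
   is a non-normative sketch: the class PscInputs and its method are
   hypothetical, and the priority list is an assumed, illustrative
   subset of the order in Section 4.3.2 of RFC6378 (highest first),
   not a substitute for it.  Ties go to the local input, as RFC6378
   specifies.

```python
# Non-normative sketch; all names here are hypothetical.
from typing import Optional, Tuple

# ASSUMED illustrative subset of the RFC6378 Section 4.3.2 priority
# order, highest first; use the full list from RFC6378 in practice.
PRIORITY = ["LO", "FS", "SF-W", "MS", "NR"]


class PscInputs:
    """Hypothetical store of the most recent local and remote inputs."""

    def __init__(self) -> None:
        self.local: Optional[str] = None
        self.remote: Optional[str] = None

    def driving_request(self) -> Tuple[str, str]:
        """Reevaluate all stored inputs and return (source, request).

        Higher priority wins; on a tie the local input wins.  With no
        stored input, the result is ("none", "NR")."""
        candidates = []
        if self.local is not None:
            candidates.append((PRIORITY.index(self.local), 0, "local", self.local))
        if self.remote is not None:
            candidates.append((PRIORITY.index(self.remote), 1, "remote", self.remote))
        if not candidates:
            return ("none", "NR")
        _, _, source, request = min(candidates)
        return (source, request)
```

   For the first case, setting both inputs to "FS" yields ("local",
   "FS"); clearing the local input then yields ("remote", "FS") at
   once, with no intermediate N state.  The second case behaves the
   same way: with local "LO" and remote "SF-W", clearing the local
   input moves the driving request directly to ("remote", "SF-W").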

   Reevaluating all inputs in this way requires an implementation to
   store the most recent local and remote inputs, regardless of their
   eventual use as triggers for the PSC Control logic.

   There is also a third case.  Consider a node with [L(FS), R(LO)].  At
   some point in time, the remote node replaces its Lockout request with
   a Signal Fail on Working, so that the inputs into the PSC Control
   logic on the receiving node go to [L(FS), R(SF-W)].  Similar to the
   first two cases, the node should immediately reevaluate both its
   local and remote inputs to determine the highest priority among them,
   and act on that input accordingly.  That is in fact what happens, as
   defined in Section 4.3.3 of RFC6378:

      "When a LER is in a remote state, i.e., state transition in
      reaction to a PSC message received from the far-end LER, and
      receives a new PSC message from the far-end LER that indicates a
      contradictory state, e.g., in remote Unavailable state receiving a
      remote FS(1,1) message, then the PSC Control logic SHALL
      reevaluate all inputs (both the local input and the remote
      message) as if the LER is in the Normal state."

   This section extends that paragraph to handle the first two cases.
   The essence of the quoted paragraph is that when faced with multiple
   inputs, PSC must reevaluate any changes as if it were in Normal
   state.  So the quoted paragraph is replaced with the following text:

      "The PSC Control logic may simultaneously have Local and Remote
      requests, and the highest priority of these requests ultimately
      drives the behavior of the PSC Control logic.  When this highest
      priority request is removed or is replaced with another input,
      then the PSC Control logic SHALL immediately reevaluate all inputs
      (both the local input and the remote message), transitioning into
      a new state only upon reevaluation of all inputs".

7.  Security Considerations

   These changes and clarifications raise no new security concerns.
   RFC 6941 [RFC6941] provides the baseline security discussion for
   MPLS-TP, and PSC (both RFC 6378 and this document) falls under that
   umbrella.  Additionally, Section 2.2 clarifies how to react to
   malformed or unexpected messages.

8.  IANA Considerations

   IANA is requested to mark the value 0 in the "MPLS PSC TLV Registry"
   as "Reserved, not to be allocated" and to update the references to
   show [RFC6378] and [RFC-ietf-mpls-psc-updates-04].  Note that this
   action provides documentation of an action already taken by IANA but
   not recorded in RFC 6378.

9.  Acknowledgements

   The author of this document thanks Taesik Cheung, Alessandro
   D'Alessandro, Annamaria Fulignoli, Sagar Soni, George Swallow, and
   Yaacov Weingarten for their contributions and review, and Adrian
   Farrel for the text of Section 2.

10.  References

10.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC6378]  Weingarten, Y., Bryant, S., Osborne, E., Sprecher, N., and
              A. Fulignoli, "MPLS Transport Profile (MPLS-TP) Linear
              Protection", RFC 6378, October 2011.

10.2.  Informative References

   [I-D.ietf-mpls-tp-psc-itu]
              Ryoo, J., Gray, E., Helvoort, H., D'Alessandro, A.,
              Cheung, T., and E. Osborne, "MPLS Transport Profile
              (MPLS-TP) Linear Protection to Match the Operational
              Expectations of SDH, OTN and Ethernet Transport Network
              Operators", draft-ietf-mpls-tp-psc-itu-04 (work in
              progress), March 2014.

   [LIAISON]  ITU-T SG15, "Liaison Statement: Recommendation ITU-T
              G.8131/Y.1382 revision - Linear protection switching for
              MPLS-TP networks".

   [RFC6941]  Fang, L., Niven-Jenkins, B., Mansfield, S., and R.
              Graveman, "MPLS Transport Profile (MPLS-TP) Security
              Framework", RFC 6941, April 2013.

Author's Address

   Eric Osborne

   Email: eric.osborne@notcom.com