Network Working Group                                         E. Osborne
Internet-Draft
Updates: 6378 (if approved)                               April 21, 2014
Intended status: Standards Track
Expires: October 23, 2014


          Updates to MPLS Transport Profile Linear Protection
                     draft-ietf-mpls-psc-updates-04

Abstract

   This document contains a number of updates to the Protection State
   Coordination (PSC) logic defined in RFC6378, "MPLS Transport Profile
   (MPLS-TP) Linear Protection". These updates provide some rules and
   recommendations around the use of TLVs in PSC, address some issues
   raised in an ITU-T liaison statement, and clarify a behavior which
   was not well explained in RFC6378.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on October 23, 2014.

Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.
   Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Message Formatting and Error Handling
     2.1.  Format on the wire
     2.2.  PSC TLV Format
     2.3.  Error handling
       2.3.1.  Malformed messages
       2.3.2.  Well-formed but unexpected TLV
   3.  Incorrect local status after failure
   4.  Handling a capabilities mismatch
     4.1.  Protection Type mismatch
     4.2.  R mismatch
     4.3.  Unsupported modes
   5.  Reversion deadlock due to a race condition
   6.  Clarifying PSC's behavior in the face of multiple inputs
   7.  Security Considerations
   8.  IANA Considerations
   9.  Acknowledgements
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Author's Address

1.  Introduction

   This document contains a number of updates to PSC [RFC6378]. One
   provides some rules and recommendations around the use of TLVs in
   PSC.
   Three of them address issues #2, #7 and #8 as identified in the
   ITU's liaison statement "Recommendation ITU-T G.8131/Y.1382 revision
   - Linear protection switching for MPLS-TP networks" [LIAISON].
   Another clears up a behavior which was not well explained in RFC6378.
   These updates are not changes to the protocol's packet format or to
   PSC's design, but are corrections and clarifications to specific
   aspects of the protocol's procedures.

   This document assumes familiarity with RFC6378 and its terms,
   conventions and acronyms. Any term used in this document but not
   defined herein can be found in RFC6378. In particular, this document
   shares the acronyms defined in RFC6378 section 2.1.

2.  Message Formatting and Error Handling

   This section covers message formatting, as well as some recommended
   error checking.

2.1.  Format on the wire

   All integer fields in the PSC TLV are encoded as unsigned integers in
   network byte order.

2.2.  PSC TLV Format

   [RFC6378] provides the capability to carry TLVs in the PSC messages.
   This section defines the format to be used by all such TLVs. All
   fields are encoded in network byte order.

   Type field (T):

      A two octet field that encodes a type value. The type values are
      recorded in the IANA registry "MPLS PSC TLV Registry".

   Length field (L):

      A two octet field that encodes the length in octets of the Value
      field.

      The TLV Length is the sum of the lengths of all TLVs in the
      message. The length of a TLV is the sum of the lengths of the
      three TLV fields, i.e., the length of the value field + 4.

      The value of this field MUST be a multiple of 4.

   Value field (V):

      The contents of the TLV. This field MUST be a multiple of 4 octets
      and so may contain explicit padding.

2.3.  Error handling

   It is recommended to implement error and bounds checking to ensure
   that received messages, if improperly formatted, are handled in such
   a way to minimize the impact of this formatting on the behavior of
   the network and its devices. This section covers two such areas -
   malformed messages and well-formed but unexpected TLVs.

   Neither of these sections is intended to limit the error or bounds
   checking a device performs. The recommendations here should be taken
   as a starting point.

2.3.1.  Malformed messages

   An implementation SHOULD:

   o  Ensure any fields prior to TLV Length are consistent with RFC
      6378, particularly Section 4.2.

   o  Ensure the overall length of the message matches the value in the
      TLV Length + 12.

   o  Check that the sum of the lengths of all TLVs matches the value in
      the TLV Length.

   If an implementation receives a message which fails any malformed
   message checks, it MUST drop the message and SHOULD alert the
   operator to the malformed message. The method(s) used to alert the
   operator are outside the scope of this document, but may include
   things like syslog or console messages.

2.3.2.  Well-formed but unexpected TLV

   If a message is deemed to be properly formed, an implementation
   SHOULD check all TLVs to ensure that it knows what to do with them.
   A well-formed but unknown TLV value MUST be ignored, and the rest of
   the message processed as if the ignored TLV did not exist. An
   implementation detecting a malformed TLV SHOULD alert the operator as
   described in Section 2.3.1.

3.  Incorrect local status after failure

   Issue #2 in the liaison identifies a case where a strict reading of
   RFC6378 leaves a node reporting an inaccurate status:

   A node can end up sending incorrect status - NR(0,1) - despite the
   failure of the protection LSP (P-LSP).
   This is clearly not correct,
   as a node should not be sending NR if it has a local failure. To
   address this issue, the fourth bullet in section 4.3.3.3 of RFC6378
   is replaced with the following three bullets:

   o  If the current state is due to a local or remote Manual Switch, a
      local Signal Fail indication on the protection path SHALL cause
      the LER to enter local Unavailable state and begin transmission of
      an SF(0,0) message.

   o  If the LER is in local Protecting Administrative state due to a
      local Forced Switch, a local Signal Fail indication on the
      protection path SHALL be ignored.

   o  If the LER is in remote Protecting Administrative state due to a
      remote Forced Switch, a local Signal Fail indication on the
      protection path SHALL cause the LER to remain in remote Protecting
      administrative state and transmit an SF(0,1) message.

4.  Handling a capabilities mismatch

   PSC has no explicit facility to negotiate any properties of the
   protection domain. It does, however, have the ability to signal two
   properties of that domain, via the Protection Type (PT) and Revertive
   (R) bits. RFC6378 specifies that if these bits do not match an
   operator "SHALL [be notified]" (PT, section 4.2.3) or "SHOULD be
   notified" (R, section 4.2.4). However, there is no text which
   specifies the behavior of the end nodes of a protection domain in
   case of a mismatch. This section provides that text, as requested by
   issue #7 in the liaison.

4.1.  Protection Type mismatch

   The behavior of the protection domain depends on the exact Protection
   Type (PT) mismatch. Section 4.2.3 of RFC6378 specifies three
   protection types - bidirectional switching using a permanent bridge,
   bidirectional switching using a selector bridge, and unidirectional
   switching using a permanent bridge. They are abbreviated here as BP,
   BS and UP.
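
   As an illustration only, the Protection Type mismatch rules of this
   section (priority UP > BS > BP, with a fall back to Section 4.3 when
   the required mode is not supported) can be sketched in Python. The
   names ProtectionType and resolve_pt_mismatch are hypothetical and do
   not appear in RFC6378. The numeric enum values encode only the
   priority order; they are an assumption of this sketch and are not
   the on-the-wire PT field values.

```python
from enum import IntEnum
from typing import Optional, Set


class ProtectionType(IntEnum):
    """Hypothetical priority ranks (NOT the PT bit encoding on the wire)."""
    BP = 1  # bidirectional switching, permanent bridge
    BS = 2  # bidirectional switching, selector bridge
    UP = 3  # unidirectional switching, permanent bridge


def resolve_pt_mismatch(
    local: ProtectionType,
    remote: ProtectionType,
    supported: Set[ProtectionType],
) -> Optional[ProtectionType]:
    """Return the mode the local node should run after comparing PT values.

    Returns None when the local node is required to switch to a mode it
    does not support (the irreconcilable case handled by Section 4.3).
    """
    # The higher-priority type wins: UP > BS > BP.
    target = max(local, remote)
    if target == local:
        # Either there is no mismatch, or the remote node is the one
        # that is required to switch.
        return local
    if target in supported:
        return target
    return None
```

   In this sketch a node transmitting BP that receives UP and supports
   all three modes is told to switch to UP, while a node that cannot
   support the required mode gets None and falls under Section 4.3.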

   There are three possible mismatches: {BP, UP}, {BP, BS}, and {UP,
   BS}. The priority is:

   UP > BS > BP

   In other words:

   o  If the PT mismatch is {BP, UP}, the node transmitting BP MUST
      switch to UP mode if it is supported.

   o  If the PT mismatch is {BP, BS}, the node transmitting BP MUST
      switch to BS mode if it is supported.

   o  If the PT mismatch is {UP, BS}, the node transmitting BS MUST
      switch to UP mode if it is supported.

   If a node does not support a mode to which it is required to switch
   then that node MUST behave as in Section 4.3.

4.2.  R mismatch

   The R bit indicates whether the protection domain is in Revertive or
   Non-Revertive behavior. If the R bits do not match, the node
   indicating Non-Revertive MUST switch to Revertive if it is supported.
   If it is not supported, a node must behave as in Section 4.3.

4.3.  Unsupported modes

   An implementation may not support all three PT modes and/or both R
   modes, and thus a pair of nodes may be unable to converge on a common
   mode. This creates a permanent mismatch, resolvable only by operator
   intervention. An implementation SHOULD alert the operator to an
   irreconcilable mismatch.

   It is desirable to allow the protection domain to function in a non-
   failure mode even if there is a mismatch, as the mismatches of PT or
   R have to do with how nodes recover from a failure. An
   implementation SHOULD allow traffic to be sent on the Working LSP as
   long as there is no failure (e.g., NR state) regardless of any PT or
   R mismatch.

   If there is a trigger which would cause the protection LSP to be
   used, such as SF or MS, a node MUST NOT use the protection LSP to
   carry traffic.

5.  Reversion deadlock due to a race condition

   Issue #8 in the liaison identifies a deadlock case where each node
   can end up sending NR(0,1) when it should instead be in the process
   of recovering from the failure (i.e.
   entering into WTR or DNR, as
   appropriate for the protection domain). The root of the issue is
   that a pair of nodes can simultaneously enter WTR state, receive an
   out-of-date SF-W indication and transition into a remotely triggered
   WTR, and remain in remotely triggered WTR waiting for the other end
   to trigger a change in status.

   In the case identified in issue #8, each node can end up sending
   NR(0,1), which is an indication that the transmitting node has no
   local failure, but is instead reacting to the remote SF-W. If a node
   which receives NR(0,1) is in fact not indicating a local error, the
   correct behavior for the receiving node is to take the received
   NR(0,1) as an indication that there is no error in the protection
   domain, and recovery procedures (WTR or DNR) should begin.

   This is addressed by adding the following text as the penultimate
   bullet in section 4.3.3.4 of RFC6378:

   o  If a node is in Protecting Failure state due to a remote SF-W and
      receives NR(0,1), this SHALL cause the node to begin recovery
      procedures. If the LER is configured for revertive behavior, it
      enters into Wait-to-Restore state, starts the WTR timer, and
      begins transmitting WTR(0,1). If the LER is configured for non-
      revertive behavior, it enters into Do-Not-Revert state and begins
      transmitting a DNR(0,1) message.

   Additionally, the final bullet in section 4.3.3.3 is changed from

   o  A remote NR(0,0) message SHALL be ignored if in local Protecting
      administrative state.

   to

   o  A remote No Request message SHALL be ignored if in local
      Protecting administrative state.

   This indicates that a remote NR triggers the same behavior regardless
   of the value of FPath and Path.
   This change does not directly
   address issue #8, but fixes a similar issue - if a node receives NR
   while in Remote administrative state, the values of FPath and Path
   have no bearing on the node's reaction to this NR.

6.  Clarifying PSC's behavior in the face of multiple inputs

   RFC6378 describes the PSC state machine. Figure 1 in section 3 shows
   two inputs into the PSC Control logic - Local Request logic and
   Remote PSC Request. When there is only one input into the PSC
   Control logic - a local request or a remote request but not both -
   the PSC Control logic decides what that input signifies and then
   takes one or more actions, as necessary. This is what the PSC State
   Machine in section 4.3 describes.

   RFC6378 does not sufficiently describe the behavior in the face of
   multiple inputs into the PSC Control Logic (one Local Request and one
   Remote Request). This section clarifies the expected behavior.

   There are two cases to think about when considering dual inputs into
   the PSC Control logic. The first is when the same request is
   presented from both local and remote sources. One example of this
   case is a Forced Switch (FS) configured on both ends of an LSP. This
   will result in the PSC Control logic receiving both a local FS and
   remote FS. For convenience, this scenario is written as [L(FS),
   R(FS)] - that is, Local(Forced Switch) and Remote(Forced Switch).

   The second case, which is handled in exactly the same way as the
   first, is when the two inputs into the PSC Control logic describe
   different events. There are a number of variations on this case.
   One example is when there is a Lockout of Protection from the Local
   request logic and a Signal Fail on the Working path from the Remote
   PSC Request. This is shortened to [L(LO), R(SF-W)].

   In both cases the question is not how the PSC Control logic decides
   which of these is the one it acts upon.
   Section 4.3.2 of RFC6378
   lists the priority order, and prioritizes the local input over the
   remote input in case both inputs are of the same priority. So in the
   first example it is the local FS that drives the PSC Control logic,
   and in the second example it is the local Lockout which drives the
   PSC Control logic.

   The point that this section clears up is around what happens when the
   highest priority input goes away. Consider the first case.
   Initially, the PSC Control logic has [L(FS), R(FS)] and L(FS) is
   driving PSC's behavior. When L(FS) is removed but R(FS) remains,
   what does PSC do? A strict reading of the FSM would suggest that PSC
   transition from PA:F:L into N, and at some future time (perhaps after
   the remote request refreshes) PSC would transition from N to PA:F:R.
   This is an unreasonable behavior, as there is no sensible
   justification for a node behaving as if things were normal (i.e., N
   state) when it is clear that they are not.

   The second case is similar. If a node starts with [L(LO), R(SF-W)]
   and the local lockout is removed, a strict reading of the state
   machine would suggest that the node transition from UA:LO:L to N, and
   then at some future time presumably notice the R(SF-W) and transition
   from N to PF:W:R. As with the first case, this is clearly not a
   useful behavior.

   In both cases the request that was driving PSC's behavior was
   removed. What should happen is that the PSC Control logic should,
   upon removal of an input, immediately reevaluate all other inputs to
   decide on the next course of action. This requires an implementation
   to store the most recent local and remote inputs regardless of their
   eventual use as triggers for the PSC Control Logic.

   There is a third case. Consider a node with [L(FS), R(LO)].
   At some
   point in time the remote node replaces its Lockout request with a
   Signal Fail on Working, so that the inputs into the PSC Control logic
   on the receiving node go to [L(FS), R(SF-W)]. Similar to the first
   two cases, the node should immediately reevaluate both its local and
   remote inputs to determine the highest priority among them, and act
   on that input accordingly. That is in fact what happens, as defined
   in Section 4.3.3:

   "When a LER is in a remote state, i.e., state transition in reaction
   to a PSC message received from the far-end LER, and receives a new
   PSC message from the far-end LER that indicates a contradictory
   state, e.g., in remote Unavailable state receiving a remote FS(1,1)
   message, then the PSC Control logic SHALL reevaluate all inputs (both
   the local input and the remote message) as if the LER is in the
   Normal state."

   This section extends that paragraph to handle the first two cases.
   The essence of the quoted paragraph is that when faced with multiple
   inputs, PSC must reevaluate any changes as if it was in Normal state.
   So the quoted paragraph is replaced with the following text:

   "The PSC Control logic may simultaneously have Local and Remote
   requests, and the highest priority of these requests ultimately
   drives the behavior of the PSC Control logic. When this highest
   priority request is removed or is replaced with another input, then
   the PSC Control logic SHALL immediately reevaluate all inputs (both
   the local input and the remote message), transitioning into a new
   state only upon reevaluation of all inputs".

7.  Security Considerations

   These changes and clarifications raise no new security concerns.

8.  IANA Considerations

   IANA is requested to mark the value 0 in the "MPLS PSC TLV Registry"
   as "Reserved, not to be allocated" and to update the references to
   show [RFC6378] and [RFC-ietf-mpls-psc-updates-04].
   Note that this
   action provides documentation of an action already taken by IANA but
   not recorded in RFC 6378.

9.  Acknowledgements

   The author of this document thanks Taesik Cheung, Alessandro
   D'Alessandro, Annamaria Fulignoli, Sagar Soni, George Swallow and
   Yaacov Weingarten for their contributions and review, and Adrian
   Farrel for the text of Section 2.

10.  References

10.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC6378]  Weingarten, Y., Bryant, S., Osborne, E., Sprecher, N., and
              A. Fulignoli, "MPLS Transport Profile (MPLS-TP) Linear
              Protection", RFC 6378, October 2011.

10.2.  Informative References

   [LIAISON]  ITU-T SG15, "Liaison Statement: Recommendation ITU-T
              G.8131/Y.1382 revision - Linear protection switching for
              MPLS-TP networks".

Author's Address

   Eric Osborne

   Email: eric.osborne@notcom.com