idnits 2.17.1

draft-ietf-mpls-oam-frmwk-05.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate. You should update this
     to the boilerplate described in the IETF Trust License Policy document
     (see https://trustee.ietf.org/license-info), which is required now.
  -- Found old boilerplate from RFC 3978, Section 5.1 on line 17.
  -- Found old boilerplate from RFC 3978, Section 5.5 on line 470.
  -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 439.
  -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 446.
  -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 454.
  ** This document has an original RFC 3978 Section 5.4 Copyright Line,
     instead of the newer IETF Trust Copyright according to RFC 4748.
  ** This document has an original RFC 3978 Section 5.5 Disclaimer, instead of
     the newer disclaimer which includes the IETF Trust according to RFC 4748.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not
     match the current year
  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008. If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment. If not, you may need to add the pre-RFC5378 disclaimer.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)
  -- The document date (November 2005) is 6735 days in the past. Is this
     intentional?

  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Outdated reference: A later version (-07) exists of
     draft-ietf-mpls-oam-requirements-05

     Summary: 3 errors (**), 0 flaws (~~), 2 warnings (==), 7 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1   Internet Draft                                     David Allan, Editor
2   Document: draft-ietf-mpls-oam-frmwk-05.txt             Nortel Networks
3                                                 Thomas D. Nadeau, Editor
4                                                      Cisco Systems, Inc.
5   Category: Informational
6   Expires: May 2006                                        November 2005

8                    A Framework for MPLS Operations
9                         and Management (OAM)

11  Status of this Memo

13     By submitting this Internet-Draft, each author represents that
14     any applicable patent or other IPR claims of which he or she is
15     aware have been or will be disclosed, and any of which he or she
16     becomes aware will be disclosed, in accordance with Section 6 of
17     BCP 79.

19     Internet-Drafts are working documents of the Internet Engineering
20     Task Force (IETF), its areas, and its working groups. Note that
21     other groups may also distribute working documents as
22     Internet-Drafts.

24     Internet-Drafts are draft documents valid for a maximum of six
25     months and may be updated, replaced, or obsoleted by other
26     documents at any time. It is inappropriate to use
27     Internet-Drafts as reference material or to cite them other than
28     as "work in progress."

30     The list of current Internet-Drafts can be accessed at
31     http://www.ietf.org/ietf/1id-abstracts.txt.

33     The list of Internet-Draft Shadow Directories can be accessed at
34     http://www.ietf.org/shadow.html.
36  Abstract
37     This document is a framework for how data plane protocols can
38     be applied to operations and maintenance procedures for
39     Multi-Protocol Label Switching. The document is structured to
40     outline how Operations and Management functionality can be used to
41     assist in fault management, configuration, accounting, performance
42     management and security, commonly known by the acronym FCAPS.

44  Table of Contents
45     1. Introduction
46     2. Terminology
47     3. Fault Management
48     3.1 Fault detection
49     3.1.1 Enumeration and detection of types of data plane faults
50  draft-ietf-mpls-oam-frmwk-05                          December 6, 2005

52     3.1.2 Timeliness
53     3.2 Diagnosis
54     3.2.1 Characterization
55     3.2.2 Isolation
56     3.3 Availability
57     4. Configuration Management
58     5. Accounting Management
59     6. Performance Management
60     7. Security Management
61     8. IANA Considerations
62     9. Security Considerations
63     10. Intellectual Property Statement
64     11. Copyright Statement
65     12. Acknowledgments
66     13.
References
67     13.1 Normative References
68     13.2 Informative References
69     14. Authors' Addresses

71  1. Introduction

73     This memo outlines in broader terms how data plane protocols
74     can assist in meeting the operations and management (OAM)
75     requirements outlined in [MPLSREQS] and [Y1710] and can apply to
76     the management functions of fault, configuration, accounting,
77     performance and security (commonly known as FCAPS) for MPLS networks
78     as defined in [RFC3031]. The approach of the document is to outline
79     functionality, the potential mechanisms to provide the function and
80     the required applicability of data plane OAM functions. Included
81     in the discussion are security issues specific to use of tools
82     within a provider domain and use for inter-provider LSPs.

84  2. Terminology

86     The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
87     "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
88     document are to be interpreted as described in [RFC2119].

90     OAM     Operations and Management
91     FCAPS   Fault management, Configuration management,
92             Accounting management, Performance
93             management, and Security management
94     FEC     Forwarding Equivalence Class
95     ILM     Incoming Label Map
96     NHLFE   Next Hop Label Forwarding Entry
97     MIB     Management Information Base
98     LSR     Label Switching Router
101    RTT     Round Trip Time

103 3. Fault Management

105 3.1 Fault detection

107    Fault detection encompasses the identification of all data
108    plane failures between the ingress and egress of an LSP.
109    This section will enumerate common failure scenarios and
110    explain how one might (or might not) detect the situation.
112 3.1.1 Enumeration and detection of types of data plane faults

114    Lower layer faults:

116       Lower layer faults are those in the physical or virtual link
117       that impact the transport of MPLS labeled packets between
118       adjacent LSRs at the specific level of interest. Some physical
119       links (such as SONET/SDH) may have link layer OAM functionality
120       and detect and notify the LSR of link layer faults directly.
121       Some physical links (such as Ethernet) may not have this
122       capability and require MPLS or IP layer heartbeats to detect
123       failures. However, once detected, reaction to these fault
124       notifications is often the same as that described in the first
125       case.

127    Node failures:

129       Node failures are those that impact the forwarding capability
130       of a node component, including its entire set of links. This
131       can be due to component failure, power outage, or reset of
132       a control processor in an LSR employing a distributed
133       architecture, etc.

135    MPLS LSP mis-forwarding:

137       Mis-forwarding occurs when there is a loss of synchronization
138       between the data and the control planes in one or more nodes.
139       This can occur due to hardware failure, software failure or
140       configuration problems.

142       It will manifest itself in one of two forms:

144       - packets belonging to a particular LSP are cross-connected
145         into an NHLFE for which there is no corresponding ILM at
146         the next downstream LSR. This can occur in cases where the
147         NHLFE entry is corrupted. Therefore the packet arrives at
148         the next LSR with a top label value for which the LSR has no
151         corresponding forwarding information, and is typically
152         dropped. This is a No Incoming Label Map (No ILM) condition
153         and can be detected directly by the downstream LSR which
154         receives the incorrectly labeled packet.
156       - packets belonging to a particular LSP are cross-connected
157         into an incorrect NHLFE entry for which there is a
158         corresponding ILM at the next downstream LSR, but which is
159         associated with a different LSP. This may be detected by
160         a number of means:
161           o some or all of the misdirected traffic is not routable
162             at the egress node, or
163           o OAM probing is able to detect the fault by detecting
164             the inconsistency between the data path and the control
165             plane state.

167    Discontinuities in the MPLS Encapsulation
168       The forwarding path of the FEC carried by an LSP may transit
169       nodes or links for which MPLS is not configured. This may
170       result in a number of behaviors which are undesirable and not
171       easily detected:
172       - the exposed payload is not routable at the LSR, resulting in
173         silent discard, or
174       - the exposed MPLS label was not offered by the LSR, which may
175         result in either silent discard or mis-forwarding.

177       Alternately the payload may be routable and packets
178       successfully delivered but bypass associated MPLS
179       instrumentation and tools.

181    MTU problems
182       MTU problems occur when client traffic cannot be fragmented by
183       intermediate LSRs, and is dropped somewhere along the path of
184       the LSP. MTU problems should appear as a discrepancy in the
185       traffic count between the set of ingress LSRs and the egress
186       LSRs for a FEC and will appear in the corresponding MPLS MIB
187       performance tables in the transit LSRs as discarded packets.

189    TTL Mishandling
190       The implementation of TTL handling is inconsistent at
191       penultimate hop LSRs. Tools that rely on consistent TTL
192       processing may produce inconsistent results in any given
193       network.

195    Congestion
196       Congestion occurs when the offered load on any interface
197       exceeds the link capacity for sufficient time that the
198       interface buffering is exhausted.
          Congestion problems will
201       appear as a discrepancy in the traffic count between the set of
202       ingress LSRs and the egress LSRs for a FEC and will appear in
203       the MPLS MIB performance tables in the transit LSRs as
204       discarded packets.

206    Mis-ordering
207       Mis-ordering of LSP traffic occurs when incorrect or
208       inappropriate load sharing is implemented within an MPLS
209       network. Load sharing typically takes place when equal cost
210       paths exist between the ingress and egress of an LSP. In these
211       cases, traffic is split among these equal cost paths using a
212       variety of algorithms. One such algorithm relies on splitting
213       traffic between each path on a per-packet basis. When this is
214       done, it is possible for some packets along the path to be
215       delayed due to congestion or slower links, which may result in
216       packets being received out of order at the egress. Detection
217       and remedy of this situation may be left up to client
218       applications that use the LSPs. For instance, TCP is capable of
219       re-ordering packets belonging to a specific flow (although this
220       may result in re-transmission of some of the mis-ordered
221       packets).

223       Mis-ordering can also be detected by sending
224       probe traffic along the path and verifying that all probe
225       traffic is indeed received in the order it was transmitted.
226       This will only detect truly pathological problems as
227       mis-ordering typically is an insufficiently predictable and
228       repeatable problem.

230       LSRs do not normally implement mechanisms to detect
231       mis-ordering of flows.

233    Payload Corruption
234       Payload corruption may occur and be undetectable by LSRs. Such
235       errors are typically detected by client payload integrity
236       mechanisms.
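
The probe-based checks described above (verifying that probes arrive in
the order sent, and comparing what was sent with what arrived) can be
illustrated with a minimal sketch. It assumes that each probe carries a
sequence number assigned at the ingress, which this document does not
specify; the names ProbeReport and analyse_probe_arrivals are hypothetical
and do not correspond to any MPLS OAM tool or MIB.

```python
from dataclasses import dataclass


@dataclass
class ProbeReport:
    sent: int       # probes transmitted by the ingress
    received: int   # distinct probes seen at the egress
    lost: int       # probes that never arrived
    reordered: int  # probes that arrived after a higher-numbered probe


def analyse_probe_arrivals(sent_count, arrivals):
    """Summarise loss and mis-ordering for probes numbered 0..sent_count-1.

    `arrivals` lists the sequence numbers in the order the egress saw them.
    Sequence numbering is an assumption of this sketch.
    """
    seen = set()
    highest = -1
    reordered = 0
    for seq in arrivals:
        if seq in seen or not 0 <= seq < sent_count:
            continue  # ignore duplicates and unknown sequence numbers
        seen.add(seq)
        if seq < highest:
            reordered += 1  # arrived behind a later-transmitted probe
        else:
            highest = seq
    return ProbeReport(sent_count, len(seen), sent_count - len(seen), reordered)


report = analyse_probe_arrivals(5, [0, 2, 1, 3])
print(report)
```

In this run probe 4 never arrives and probe 1 arrives after probe 2, so the
report counts one lost and one reordered probe. As noted above, a single
clean run does not prove the absence of mis-ordering, since the condition
is not reliably repeatable.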
238 3.1.2 Timeliness

240    The design of SLAs and management support systems requires that
241    ample headroom be allotted in terms of their processing capabilities
242    in order to process and handle all necessary fault conditions
243    within the bounds stipulated in the SLA. This includes planning for
244    event handling using a time budget which takes into account the
245    over-all SLA and time to address any defects which arise. However,
246    it is possible that some fault conditions may surpass this budget
247    due to their catastrophic nature (e.g., fibre cut) or due to
248    incorrect planning of the time processing budget.

252          ^    --------------
253          |    |     ^
254          |    |     |---- Time to notify NOC + process/correct
255    SLA   |    |     v     defect
256    Max   -    |    -------------
257    Time  |    |     ^
258          |    |     |----- Time to diagnose/isolate/correct
259          |    |     v
260          v    -------------

262              Figure 1: Fault Correction Budget

264    In figure 1, we represent the overall fault correction time budget
265    by the maximum time as specified in an SLA for the service in
266    question. This time is then divided into two subsections, the first
267    encompassing the total time required to detect a fault and notify an
268    operator (or optionally automatically correct the defect). This
269    section may have an explicit maximum time to detect defects arising
270    from either the application or a need to do alarm management (i.e.,
271    suppression) and this will be reflected in the frequency of OAM
272    execution. The second section indicates the time required to notify
273    the operational systems used to diagnose, isolate and correct the
274    defect (if it cannot be corrected automatically).

276 3.2 Diagnosis

278 3.2.1 Characterization

280    Characterization is defined as determining the forwarding path of a
281    packet (which may not necessarily be known). Characterization may be
282    performed on a working path through the network.
       This is done, for
283    example, to determine ECMP paths, the MTU of a path, or simply to
284    know the path occupied by a specific FEC. Characterization will be
285    able to leverage mechanisms used for isolation.

287 3.2.2 Isolation

289    Isolation of a fault can occur in two forms. In the first case, the
290    local failure is detected, and the node where the failure occurred
291    is capable of issuing an alarm for such an event. The node should
292    attempt to withdraw the defective resources and/or rectify the
293    situation prior to raising an alarm. Active data plane OAM
294    mechanisms may also detect the failure conditions remotely and issue
295    their own alarms if the situation is not rectified quickly enough.

297    In the second case, the fault has not been detected locally. In this
298    case, the local node cannot raise an alarm, nor can it be expected
301    to rectify the situation. In this case, the failure may be detected
302    remotely via data plane OAM. This mechanism should also be able to
303    determine the location of the fault, perhaps on the basis of limited
304    information such as a customer complaint. This mechanism may also be
305    able to automatically remove the defective resources from the
306    network and restore service, but should at least provide a network
307    operator with enough information by which they can perform this
308    operation. Given that detection of faults is desired to happen as
309    quickly as possible, tools which possess the ability to incrementally
310    test LSP health should be used to uncover faults.

312 3.3 Availability

314    Availability is the measure of the percentage of time that a service
315    is operating within specification, often specified by an SLA.

317    MPLS has several forwarding modes (depending on the control plane
318    used). As such, more than one model may be defined and require more
319    than one measurement technique.

321 4.
Configuration Management

323    Data plane OAM can assist in configuration management by providing
324    the ability to verify the configuration of an LSP or of applications
325    utilizing that LSP. This would be an ad-hoc data plane probe
326    that should verify both path integrity (a complete path exists) and
327    that the path function is synchronized with the
328    control plane. The probe would carry as part of the payload relevant
329    control plane information that the receiver would be able to compare
330    with the local control plane configuration.

332 5. Accounting

334    The requirements for accounting in MPLS networks as specified in
335    [MPLSREQS] do not place any requirements on data plane OAM.

337 6. Performance Management

339    Performance management permits the information transfer
340    characteristics of LSPs to be measured, perhaps in order to
341    compare against an SLA. This falls into two categories, latency
342    (where jitter is considered a variation in latency) and information
343    loss.

345    Latency can be measured in two ways: one is to have precisely
346    synchronized clocks at the ingress and egress such that time-stamps
347    in PDUs flowing from the ingress to the egress can be compared. The
350    other is to use an exchange of PING type PDUs that gives a round
351    trip time (RTT) measurement, and an estimate of the one way latency
352    can be inferred with some loss of precision. Use of load spreading
353    techniques such as ECMP means that any individual RTT measurement is
354    only representative of the typical RTT for a FEC.

356    To measure information loss, a common practice is to periodically
357    read ingress and egress counters (i.e., MIB module counters). This
358    information may also be used for offline correlation. Another common
359    practice is to send explicit probe traffic which traverses the data
360    plane path in question.
       This probe traffic can also be used to
361    measure jitter and delay.

363 7. Security Management

365    Providing a secure OAM environment is required if MPLS specific
366    network mechanisms are to be used successfully. To this end,
367    operators have a number of options when deploying network mechanisms
368    including simply filtering OAM messages at the edge of the MPLS
369    network. Malicious users should not be able to use non-MPLS
370    interfaces to insert MPLS specific OAM transactions. It should be
371    possible to block provider initiated OAM transactions from leaking
372    outside the MPLS cloud.

374    Finally, if a provider does wish to allow OAM messages to flow into
375    (or through) their networks, for example, in a multi-provider
376    deployment, authentication and authorization are required to prevent
377    malicious and/or unauthorized access. Also, given that MPLS networks
378    often run IP simultaneously, similar requirements apply to any
379    native IP OAM network mechanisms in use. Therefore, authentication
380    and authorization for OAM technologies is something that MUST be
381    considered when designing network mechanisms which satisfy the
382    framework presented in this document.

384    OAM messaging can address some existing security concerns with the
385    MPLS architecture. That is, through rigorous defect handling, operators
386    can offer their customers a greater degree of integrity protection
387    that their traffic will not be incorrectly delivered (for example by
388    being able to detect leaking LSP traffic from a VPN).

390    Support for inter-provider data plane OAM messaging introduces a
391    number of security concerns as, by definition, portions of LSPs will
392    not be within a single provider's network, and the provider has no
393    control over who may inject traffic into the LSP, which can be
394    exploited for denial of service attacks. OAM PDUs are not
395    explicitly identified in the MPLS header and therefore are not
396    typically inspected by transit LSRs.
       This creates an opportunity for
397    malicious or poorly behaved users to disrupt network operations.

401    Attempts to introduce filtering on target LSP OAM flows may be
402    problematic if flows are not visible to intermediate LSRs. However,
403    it may be possible to interdict flows on the return path between
404    providers (as faithfulness to the forwarding path is not a return
405    path requirement) to mitigate aspects of this vulnerability.

407    OAM tools may permit unauthorized or malicious users to extract
408    significant amounts of information about network configuration. This
409    would be especially true of IP based tools as in many network
410    configurations, MPLS does not typically extend to untrusted hosts,
411    but IP does. For example, TTL hiding at ingress and egress LSRs will
412    prevent external users from using TTL-based mechanisms to probe an
413    operator's network. This suggests that tools used for problem
414    diagnosis or which by design are capable of extracting significant
415    amounts of information will require authentication and authorization
416    of the originator. This may impact the scalability of such tools
417    when employed for monitoring instead of diagnosis.

419 8. IANA Considerations

421    This document does not contain any IANA considerations.

423 9. Security Considerations

425    This document describes a framework for MPLS Operations and
426    Management. Although this document discusses and addresses some
427    security concerns in section 7 above, it does not introduce any
428    new security concerns.

430 10.
Intellectual Property Statement

432    The IETF takes no position regarding the validity or scope of any
433    Intellectual Property Rights or other rights that might be claimed to
434    pertain to the implementation or use of the technology described in
435    this document or the extent to which any license under such rights
436    might or might not be available; nor does it represent that it has
437    made any independent effort to identify any such rights. Information
438    on the procedures with respect to rights in RFC documents can be
439    found in BCP 78 and BCP 79.

441    Copies of IPR disclosures made to the IETF Secretariat and any
442    assurances of licenses to be made available, or the result of an
443    attempt made to obtain a general license or permission for the use of
444    such proprietary rights by implementers or users of this
445    specification can be obtained from the IETF on-line IPR repository at
446    http://www.ietf.org/ipr.

450    The IETF invites any interested party to bring to its attention any
451    copyrights, patents or patent applications, or other proprietary
452    rights that may cover technology that may be required to implement
453    this standard. Please address the information to the IETF at
454    ietf-ipr@ietf.org.

456 11. Copyright Statement

458    Copyright (C) The Internet Society (2005).

460    This document is subject to the rights, licenses and restrictions
461    contained in BCP 78, and except as set forth therein, the authors
462    retain all their rights.
464    This document and the information contained herein are provided on an
465    "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
466    OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
467    ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
468    INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
469    INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
470    WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

472 12. Acknowledgments

474    The editors would like to thank Monique Morrow from Cisco Systems,
475    and Harmen van Der Linde from AT&T for their valuable review comments
476    on this document.

478 13. References

480 13.1 Normative References

482    [RFC2119]  Bradner, S., "Key Words for use in RFCs to Indicate
483               Requirement Levels", BCP 14, RFC 2119, March 1997.

485    [RFC3031]  Rosen, E., Viswanathan, A., and R. Callon,
486               "Multiprotocol Label Switching Architecture", RFC
487               3031, January 2001.

489    [MPLSREQS] Nadeau, T., et al., "OAM Requirements for MPLS Networks",
490               draft-ietf-mpls-oam-requirements-05.txt, November 2004.

492    [Y1710]    ITU-T Recommendation Y.1710 (2002), "Requirements for OAM
493               Functionality for MPLS Networks".

495 13.2 Informative References

498 14. Authors' Addresses

500    David Allan
501    Nortel Networks              Phone: +1-613-763-6362
502    3500 Carling Ave.            Email: dallan@nortelnetworks.com
503    Ottawa, Ontario, CANADA

505    Thomas D. Nadeau
506    Cisco Systems                Phone: +1-978-936-1470
507    300 Beaver Brook Drive       Email: tnadeau@cisco.com
508    Boxborough, MA 01824