Internet Draft                                     David Allan, Editor
                                               Thomas D. Nadeau, Editor
Document: draft-allan-nadeau-mpls-oam-frmwk-00.txt
Category: Informational
Expires: March 2005                                     September 2004


                   A Framework for MPLS Operations
                        and Management (OAM)

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than a "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

Copyright Notice

   Copyright (C) The Internet Society (2001). All Rights Reserved.

Abstract

   This document is a framework for how data plane OAM functions can be
   applied to operations and maintenance procedures. The document is
   structured to outline how OAM functionality can be used to assist in
   fault management, configuration, accounting, performance management
   and security, commonly known by the acronym FCAPS.

A Framework for MPLS OAM                                September 2004

Table of Contents

   1. Introduction and Scope
   2. Terminology
   3. Fault Management
   3.1 Fault detection
   3.1.1 Enumeration and detection of types of data plane faults
   3.1.2 Timeliness
   3.2 Diagnosis
   3.2.1 Characterization
   3.2.2 Isolation
   3.3 Availability
   4. Configuration Management
   5. Accounting
   6. Performance measurement
   7. Security
   8. Full Copyright Statement
   9. Intellectual Property Rights Notices
   10. References
   11. Editors' Addresses

1. Introduction and Scope

   This memo outlines in broad terms how data plane OAM functionality
   can assist in meeting the operations and management (OAM)
   requirements outlined in [MPLSREQS] and can apply to the operational
   functions of fault, configuration, accounting, performance and
   security (commonly known as FCAPS). The approach of the document is
   to outline the requisite functionality, the potential mechanisms to
   provide the function and the applicability of data plane OAM
   functions.

2. Terminology

   OAM     Operations and Management
   FCAPS   Fault, Configuration, Accounting,
           Performance, and Security
   ILM     Incoming Label Map
   NHLFE   Next Hop Label Forwarding Entry
   MIB     Management Information Base
   LSR     Label Switching Router
   RTT     Round Trip Time

3. Fault Management

3.1 Fault detection

   Fault detection encompasses identifying all causes of failure to
   transfer information between the ingress and egress of an LSP.
   This section enumerates common failure scenarios and explains how
   one might (or might not) detect each situation.

3.1.1 Enumeration and detection of types of data plane faults

   Physical layer faults:

      Lower layer faults are those that impact the physical layer or
      link layer that transports MPLS between adjacent LSRs. Some
      physical links (such as SONET/SDH) may have link layer OAM
      functionality and detect and notify the LSR of link layer
      faults directly. Some physical links (such as Ethernet) may not
      have this capability and require MPLS or IP layer heartbeats to
      detect failures. However, once a fault is detected, the
      reaction to it is often the same as in the first case.

   Node failures:

      Node failures are those that impact the forwarding capability
      of an entire node, including its entire set of links.
      This can be due to component failure, power outage, or reset of
      the control processor in an LSR employing a distributed
      architecture, etc.

   MPLS LSP misbranching:

      Misbranching occurs when there is a loss of synchronization
      between the data and the control planes. This can occur due to
      hardware failure, software failure or configuration problems.
      It will manifest itself in one of two forms:

      - packets belonging to a particular LSP are cross connected
        into an NHLFE for which there is no corresponding ILM at
        the next downstream LSR. This can occur in cases where the
        NHLFE entry is corrupted. The packet therefore arrives at
        the next LSR with a top label value for which the LSR has no
        corresponding forwarding information, and is typically
        dropped. This is a No Incoming Label Map (ILM) condition and
        can be detected directly by the downstream LSR that
        receives the incorrectly labeled packet.

      - packets belonging to a particular LSP are cross connected
        into an incorrect NHLFE entry for which there is a
        corresponding ILM at the next downstream LSR, but which
        is associated with a different LSP. This may be detected by
        a number of means:
        o  some or all of the misdirected traffic is not routable
           at the egress node, or
        o  OAM probing is able to detect the fault by detecting
           the inconsistency between the data path and the control
           plane.

   Discontinuities in the MPLS Encapsulation

      The forwarding path of the FEC carried by an LSP may transit
      nodes for which MPLS is not configured. This may result in a
      number of behaviors (most undesirable). When there is only one
      label in the stack and the payload is IP, IP forwarding will
      direct the packet to the correct interface. This would be the
      same if PHP is employed. Packets with a label stack will be
      discarded (Ed Note: to be confirmed).

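
   As an illustration only, the two misbranching forms described in
   this section can be sketched as a lookup against the ILM of the
   receiving LSR. The sketch below is in Python and is not taken from
   any LSR implementation: the names IlmEntry and classify_probe, the
   use of a FEC string as the control plane information carried by an
   OAM probe, and the label and prefix values are all assumptions made
   for the example.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class IlmEntry:
    """Hypothetical ILM entry: the FEC the control plane bound to a label."""
    fec: str


def classify_probe(ilm: Dict[int, IlmEntry], top_label: int,
                   probe_fec: str) -> str:
    """Classify a received OAM probe against the local ILM.

    probe_fec is the FEC the sender believes the LSP carries. It is an
    assumed probe field, not a defined packet format.
    """
    entry: Optional[IlmEntry] = ilm.get(top_label)
    if entry is None:
        # First form: no forwarding state exists for the top label.
        return "no-ilm"
    if entry.fec != probe_fec:
        # Second form: the label is valid but belongs to another LSP.
        return "misbranched"
    return "ok"


# Illustrative ILM contents (hypothetical labels and prefixes).
ilm = {100: IlmEntry(fec="10.0.1.0/24"), 200: IlmEntry(fec="10.0.2.0/24")}

print(classify_probe(ilm, 300, "10.0.1.0/24"))
print(classify_probe(ilm, 200, "10.0.1.0/24"))
print(classify_probe(ilm, 100, "10.0.1.0/24"))
```

   The first form is visible to the LSR itself, since the label has no
   ILM entry. The second form is only visible in this sketch because
   the probe carries information that can be compared with local
   control plane state; ordinary client traffic with the same label
   would be forwarded without complaint.
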
   MTU problems

      MTU problems occur when client traffic cannot be fragmented by
      intermediate LSRs, and is dropped somewhere along the path of
      the LSP. MTU problems should appear as a discrepancy in the
      traffic count between the set of ingresses and the egresses for
      a FEC and will appear in the corresponding MIB performance
      tables in the transit LSRs as discarded packets.

   TTL Mishandling

      Some penultimate hop LSRs may process TTL expiry and
      propagation inconsistently. In these cases, it is possible for
      tools that rely on consistent processing to fail.

   Congestion

      Congestion occurs when the offered load on any interface
      exceeds the link capacity for sufficient time that the
      interface buffering is exhausted. Congestion problems will
      appear as a discrepancy in the traffic count between the set of
      ingresses and the egresses for a FEC and will appear in the MIB
      performance tables in the transit LSRs as discarded packets.

   Misordering

      Misordering of LSP traffic occurs when incorrect or
      inappropriate load sharing is implemented within an MPLS
      network. Load sharing typically takes place when equal cost
      paths exist between the ingress and egress of an LSP. In these
      cases, traffic is split among these equal cost paths using a
      variety of algorithms. One such algorithm relies on splitting
      traffic between each path on a per-packet basis. When this is
      done, it is possible for some packets along the path to be
      delayed due to congestion or slower links, which may result in
      packets being received out of order at the egress. Detection
      and remedy of this situation may be left up to client
      applications that use the LSPs. For instance, TCP is capable of
      re-ordering packets belonging to a specific flow.
      Misordering can also be detected by sending probe traffic
      along the path and verifying that all probe traffic is indeed
      received in the order it was transmitted.

      LSRs do not normally implement mechanisms to detect misordering
      of flows.

   Payload Corruption

      Payload corruption may occur and be undetectable by LSRs. Such
      errors are typically detected by client payload integrity
      mechanisms.

3.1.2 Timeliness

   (for a future version)

3.2 Diagnosis

3.2.1 Characterization

   Characterization is defined as determining the forwarding path of a
   packet (which may not necessarily be known). Characterization may be
   performed on a working path through the network. This is done, for
   example, to determine ECMP paths, the MTU of a path, or simply to
   know the path occupied by a specific FEC. Characterization will be
   able to leverage mechanisms used for isolation.

3.2.2 Isolation

   Isolation of a fault can occur in two forms. In the first case, the
   local failure is detected, and the node where the failure occurred
   is capable of issuing an alarm for such an event. The node should
   attempt to withdraw the defective resources and/or rectify the
   situation prior to raising an alarm. Active data plane OAM
   mechanisms may also detect the failure conditions remotely and issue
   their own alarms if the situation is not rectified quickly enough.

   In the second case, the fault has not been detected locally. In this
   case, the local node cannot raise an alarm, nor can it be expected
   to rectify the situation. Instead, the failure may be detected
   remotely via data plane OAM. This mechanism should also be able to
   determine the location of the fault, perhaps on the basis of limited
   information such as a customer complaint.
   This mechanism may also be able to automatically remove the
   defective resources from the network and restore service, but
   should at least provide a network operator with enough information
   to perform this operation. Given that faults should be detected as
   quickly as possible, tools that possess the ability to
   incrementally test LSP health should be used to uncover faults.

3.3 Availability

   Availability is the measure of the percentage of time that a service
   is operating within specification.

   MPLS has several forwarding modes (depending on the control plane
   used). As such, more than one availability model may be defined.

4. Configuration Management

   Data plane OAM can assist in configuration management by providing
   the ability to verify configuration of an LSP or of applications
   that may utilize that LSP. This would be an ad-hoc data plane probe
   that should verify both path integrity (a complete path exists) and
   that the path function is synchronized with the control plane. The
   probe would carry as part of the payload relevant control plane
   information that the receiver would be able to compare with the
   local control plane configuration.

5. Accounting

   Ed Note: (for a future version)

6. Performance measurement

   Performance measurement permits the information transfer
   characteristics of LSPs to be measured. This falls into two
   categories, latency and information loss.

   Latency can be measured in two ways: one is to have precisely
   synchronized clocks at the ingress and egress such that timestamps
   in PDUs flowing from the ingress to the egress can be compared. The
   other is to use an exchange of PING type PDUs that gives a round
   trip time (RTT) measurement, from which an estimate of the one way
   latency can be inferred with some loss of precision.
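
   A minimal sketch of the second (round trip) approach follows, in
   Python. It assumes that the probe sender records a send and a
   receive timestamp, in microseconds, for each PING type exchange,
   and that the forward and return paths are symmetric so that half
   the RTT approximates the one way latency. The function names and
   sample values are hypothetical, and taking the median of several
   samples is a choice made for the example rather than something the
   text above prescribes.

```python
from statistics import median
from typing import List, Tuple


def rtt_samples(exchanges: List[Tuple[int, int]]) -> List[int]:
    """Return the RTT of each (sent_us, received_us) pair, in microseconds."""
    return [received - sent for sent, received in exchanges]


def one_way_estimate(rtts: List[int]) -> float:
    """Estimate one way latency as half the median RTT.

    Assumes symmetric forward and return paths; any asymmetry shows up
    directly as error, which is the loss of precision noted in the text.
    """
    return median(rtts) / 2


# Hypothetical timestamps for three probe exchanges, in microseconds.
exchanges = [(0, 20_000), (1_000_000, 1_024_000), (2_000_000, 2_022_000)]
rtts = rtt_samples(exchanges)

print(rtts)
print(one_way_estimate(rtts))
```

   With synchronized clocks at the ingress and egress, the halving
   step and its symmetry assumption would not be needed, at the cost
   of maintaining clock synchronization.
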
   Use of load spreading techniques such as ECMP means that any
   individual RTT measurement is only representative of the typical
   RTT for a FEC.

   To measure information loss, a common practice is to periodically
   read ingress and egress counters (i.e., MIB module counters). This
   information may also be used for offline correlation. Another common
   practice is to send explicit probe traffic. This probe traffic can
   also be used to measure jitter and delay.

7. Security

   Support for intra-provider data plane OAM messaging does not
   introduce any new security concerns to the MPLS architecture. It
   does, however, address some that already exist: through rigorous
   defect handling, operators can offer their customers a greater
   degree of integrity protection against misdelivery of their
   traffic (for example, by being able to detect leaking LSP traffic
   from a VPN).

   Support for inter-provider data plane OAM messaging introduces a
   number of security concerns. By definition, portions of LSPs will
   not be in trusted space, and the provider has no control over who
   may inject traffic into the LSP. This creates opportunity for
   malicious or poorly behaved users to disrupt network operations.
   Attempts to introduce filtering on target LSP OAM flows may be
   problematic if flows are not visible to intermediate LSRs. However,
   it may be possible to interdict flows on the return path between
   providers (as faithfulness to the forwarding path is not a return
   path requirement) to mitigate aspects of this vulnerability.

   OAM tools may permit unauthorized or malicious users to extract
   significant amounts of information about network configuration. This
   would be especially true of IP based tools as in many network
   configurations, MPLS does not typically extend to untrusted hosts,
   but IP does.
   For example, TTL hiding at ingress and egress LSRs will
   prevent external users from using TTL-based mechanisms to probe an
   operator's network. This suggests that tools used for problem
   diagnosis or which by design are capable of extracting significant
   amounts of information will require authentication and authorization
   of the originator. This may impact the scalability of such tools
   when employed for monitoring instead of diagnosis.

8. Full Copyright Statement

   Copyright (C) The Internet Society (year). This document is subject
   to the rights, licenses and restrictions contained in BCP 78, and
   except as set forth therein, the authors retain all their rights.

   This document and the information contained herein are provided on
   an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
   REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE
   INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR
   IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

9. Intellectual Property Rights Notices

   By submitting this Internet-Draft, the authors certify that any
   applicable patent or other IPR claims of which they are aware have
   been disclosed, or will be disclosed, and any of which they become
   aware will be disclosed, in accordance with RFC 3668.

10. References

   [RFC3031]  Rosen, E., Viswanathan, A., and R. Callon,
              "Multiprotocol Label Switching Architecture", RFC 3031,
              January 2001.

   [ALLAN]    Allan, D., "Guidelines for MPLS Load Balancing",
              draft-allan-mpls-loadbal-05.txt, IETF work in progress,
              October 2003.

   [MPLSREQS] Nadeau et al., "OAM Requirements for MPLS Networks",
              draft-ietf-mpls-oam-requirements-01.txt, June 2003.

   [Y1710]    ITU-T Recommendation Y.1710 (2002), "Requirements for OAM
              Functionality for MPLS Networks".

11. Editors' Addresses

   David Allan
   Nortel Networks              Phone: +1-613-763-6362
   3500 Carling Ave.            Email: dallan@nortelnetworks.com
   Ottawa, Ontario, CANADA

   Thomas D. Nadeau
   Cisco Systems                Phone: +1-978-936-1470
   300 Beaver Brook Drive       Email: tnadeau@cisco.com
   Boxborough, MA 01824