idnits 2.17.1 

draft-ietf-mpls-oam-frmwk-05.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate.  You should update this
     to the boilerplate described in the IETF Trust License Policy document
     (see https://trustee.ietf.org/license-info), which is required now.

  -- Found old boilerplate from RFC 3978, Section 5.1 on line 17.

  -- Found old boilerplate from RFC 3978, Section 5.5 on line 470.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 439.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 446.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 454.

  ** This document has an original RFC 3978 Section 5.4 Copyright Line,
     instead of the newer IETF Trust Copyright according to RFC 4748.

  ** This document has an original RFC 3978 Section 5.5 Disclaimer, instead
     of the newer disclaimer which includes the IETF Trust according to RFC
     4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not
     match the current year

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (November 2005) is 6735 days in the past.  Is this
     intentional?


  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Outdated reference: A later version (-07) exists of
     draft-ietf-mpls-oam-requirements-05


     Summary: 3 errors (**), 0 flaws (~~), 2 warnings (==), 7 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	 Internet Draft                                   David Allan, Editor
2	 Document: draft-ietf-mpls-oam-frmwk-05.txt           Nortel Networks
3	                                             Thomas D. Nadeau, Editor
4	                                                   Cisco Systems, Inc.
5	 Category: Informational
6	 Expires: May 2006                                      November 2005

8	                 A Framework for MPLS Operations
9	                       and Management (OAM)

11	 Status of this Memo

13	   By submitting this Internet-Draft, each author represents that
14	   any applicable patent or other IPR claims of which he or she is
15	   aware have been or will be disclosed, and any of which he or she
16	   becomes aware will be disclosed, in accordance with Section 6 of
17	   BCP 79.

19	   Internet-Drafts are working documents of the Internet Engineering
20	   Task Force (IETF), its areas, and its working groups.  Note that
21	   other groups may also distribute working documents as
22	   Internet-Drafts.

24	   Internet-Drafts are draft documents valid for a maximum of six
25	   months and may be updated, replaced, or obsoleted by other
26	   documents at any time.  It is inappropriate to use
27	   Internet-Drafts as reference material or to cite them other than
28	   as "work in progress."

30	   The list of current Internet-Drafts can be accessed at
31	   http://www.ietf.org/ietf/1id-abstracts.txt.

33	   The list of Internet-Draft Shadow Directories can be accessed at
34	   http://www.ietf.org/shadow.html.

36	 Abstract
37	    This document is a framework for how data plane protocols can
38	    be applied to operations and maintenance procedures for
39	    Multi-Protocol Label Switching. The document is structured to
40	    outline how Operations and Management functionality can be used to
41	    assist in fault management, configuration, accounting, performance
42	    management and security, commonly known by the acronym FCAPS.

44	 Table of Contents
45	 1.   Introduction ...................................................2
46	 2.   Terminology.....................................................2
47	 3.   Fault Management................................................3
48	    3.1 Fault detection...............................................3
49	    3.1.1 Enumeration and detection of types of data plane faults.....3
50	             draft-ietf-mpls-oam-frmwk-05            December 6, 2005

52	    3.1.2 Timeliness..................................................5
53	    3.2 Diagnosis.....................................................6
54	    3.2.1 Characterization............................................6
55	    3.2.2 Isolation...................................................6
56	    3.3 Availability..................................................7
57	 4.    Configuration Management.......................................7
58	 5.    Accounting Management..........................................7
59	 6.    Performance Management.........................................7
60	 7.    Security Management............................................8
61	 8.    IANA Considerations ...........................................8
62	 9.    Security Considerations .......................................8
63	 10.   Intellectual Property Statement................................8
64	 11.   Copyright Statement............................................9
65	 12.   Acknowledgments ...............................................9
66	 13.   References.....................................................9
67	 13.1  Normative References ..........................................9
68	 13.2  Informative References ........................................9
69	 14.   Authors' Addresses............................................10

71	 1. Introduction

73	    This memo outlines in broader terms how data plane protocols
74	    can assist in meeting the operations and management (OAM)
75	    requirements outlined in [MPLSREQS] and [Y1710] and can apply to
76	    the management functions of fault, configuration, accounting,
77	    performance and security (commonly known as FCAPS) for MPLS networks
78	    as defined in [RFC3031]. The approach of the document is to outline
79	    functionality, the potential mechanisms to provide the function and
80	    the required applicability of data plane OAM functions. Included
81	    in the discussion are security issues specific to use of tools
82	    within a provider domain and use for inter-provider LSPs.

84	 2. Terminology

86	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
87	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
88	   document are to be interpreted as described in [RFC2119].

90	    OAM          Operations and Management
91	    FCAPS        Fault management, Configuration management,
92	                 Accounting management, Performance
93	                 management, and Security management
94	    FEC          Forwarding Equivalence Class
95	    ILM          Incoming Label Map
96	    NHLFE        Next Hop Label Forwarding Entry
97	    MIB          Management Information Base
98	    LSR          Label Switching Router
101	    RTT          Round Trip Time

103	 3. Fault Management

105	 3.1 Fault detection

107	    Fault detection encompasses the identification of all data
108	    plane failures between the ingress and egress of an LSP.
109	    This section will enumerate common failure scenarios and
110	    explain how one might (or might not) detect the situation.

112	 3.1.1 Enumeration and detection of types of data plane faults

114	    Lower layer faults:

116	         Lower layer faults are those in the physical or virtual link
117	         that impact the transport of MPLS labeled packets between
118	         adjacent LSRs at the specific level of interest. Some physical
119	         links (such as SONET/SDH) may have link layer OAM functionality
120	         and detect and notify the LSR of link layer faults directly.
121	         Some physical links (such as Ethernet) may not have this
122	         capability and require MPLS or IP layer heartbeats to detect
123	         failures. However, once detected, reaction to these fault
124	         notifications is often the same as that described in the first
125	         case.

127	    Node failures:

129	         Node failures are those that impact the forwarding capability
130	         of a node component, including its entire set of links. This
131	         can be due to component failure, power outage, or reset of
132	         control processor in an LSR employing a distributed
133	         architecture, etc.

135	    MPLS LSP mis-forwarding:

137	         Mis-forwarding occurs when there is a loss of synchronization
138	         between the data and the control planes in one or more nodes.
139	         This can occur due to hardware failure, software failure or
140	         configuration problems.

142	         It will manifest itself in one of two forms:

144	         - packets belonging to a particular LSP are cross-connected
145	           into an NHLFE for which there is no corresponding ILM at
146	           the next downstream LSR. This can occur in cases where the
147	           NHLFE entry is corrupted. Therefore the packet arrives at
148	           the next LSR with a top label value for which the LSR has no
151	           corresponding forwarding information, and is typically
152	           dropped. This is a No Incoming Label Map (No ILM) condition
153	           and can be detected directly by the downstream LSR which
154	           receives the incorrectly labeled packet.

156	         - packets belonging to a particular LSP are cross-connected
157	           into an incorrect NHLFE entry for which there is a
158	           corresponding ILM at the next downstream LSR, but which is
159	           associated with a different LSP. This may be detected by
160	           a number of means:
161	              o some or all of the misdirected traffic is not routable
162	                at the egress node, or
163	              o OAM probing is able to detect the fault by detecting
164	                the inconsistency between the data path and the control
165	                plane state.
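
    The two forms above can be told apart with a simple lookup at the
    downstream LSR. The following is a minimal sketch, not an
    implementation of any particular OAM tool: the ILM is modelled as a
    plain dictionary, and the names classify_label, expected_lsp and
    the label values are assumptions made for illustration. The
    expected LSP is assumed to come from control plane information
    carried by an OAM probe.

```python
from typing import Dict, Optional


def classify_label(ilm: Dict[int, str], label: int, expected_lsp: str) -> str:
    """Classify a top label arriving at the downstream LSR.

    ilm maps an incoming label to an LSP name (a hypothetical stand-in
    for the Incoming Label Map of a real LSR).
    """
    bound_lsp: Optional[str] = ilm.get(label)
    if bound_lsp is None:
        # First form: no forwarding information exists for this label.
        return "no-ilm"
    if bound_lsp != expected_lsp:
        # Second form: the label is valid but belongs to another LSP.
        return "wrong-lsp"
    return "ok"


# Hypothetical ILM contents at the downstream LSR.
downstream_ilm = {100: "lsp-a", 200: "lsp-b"}
print(classify_label(downstream_ilm, 100, "lsp-a"))  # ok
print(classify_label(downstream_ilm, 999, "lsp-a"))  # no-ilm
print(classify_label(downstream_ilm, 200, "lsp-a"))  # wrong-lsp
```

    Only the first outcome is visible to the downstream LSR without
    help; the second needs the expected LSP identity to be supplied
    from outside the data plane.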

167	    Discontinuities in the MPLS Encapsulation
168	         The forwarding path of the FEC carried by an LSP may transit
169	         nodes or links for which MPLS is not configured. This may
170	         result in a number of behaviors which are undesirable and not
171	         easily detected:
172	         - the exposed payload is not routable at the LSR, resulting in
173	           silent discard, or
174	         - the exposed MPLS label was not offered by the LSR, which may
175	           result in either silent discard or mis-forwarding.

177	         Alternatively, the payload may be routable and the packets
178	         successfully delivered, but they bypass the associated MPLS
179	         instrumentation and tools.

181	    MTU problems
182	         MTU problems occur when client traffic cannot be fragmented by
183	         intermediate LSRs, and is dropped somewhere along the path of
184	         the LSP. MTU problems should appear as a discrepancy in the
185	         traffic count between the set of ingress LSRs and the egress
186	         LSRs for a FEC and will appear in the corresponding MPLS MIB
187	         performance tables in the transit LSRs as discarded packets.

189	    TTL Mishandling
190	         The implementation of TTL handling is inconsistent at
191	         penultimate hop LSRs. Tools that rely on consistent TTL
192	         processing may produce inconsistent results in any given
193	         network.

195	    Congestion
196	         Congestion occurs when the offered load on any interface
197	         exceeds the link capacity for sufficient time that the
198	         interface buffering is exhausted. Congestion problems will
201	         appear as a discrepancy in the traffic count between the set of
202	         ingress LSRs and the egress LSRs for a FEC and will appear in
203	         the MPLS MIB performance tables in the transit LSRs as
204	         discarded packets.
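
    Both MTU problems and congestion show up as the same counter
    discrepancy. As a rough sketch, assuming the ingress, egress and
    transit counters have already been read over the same interval
    (the function names and the counter values below are illustrative
    assumptions, not MIB object names):

```python
from typing import Dict, List


def fec_loss(ingress_counts: List[int], egress_counts: List[int]) -> int:
    """Packets counted at the ingress LSRs but not at the egress LSRs."""
    return sum(ingress_counts) - sum(egress_counts)


def suspect_lsrs(transit_discards: Dict[str, int]) -> List[str]:
    """Transit LSRs reporting discards, largest discard count first."""
    hits = [(count, lsr) for lsr, count in transit_discards.items() if count > 0]
    return [lsr for count, lsr in sorted(hits, reverse=True)]


# Hypothetical samples for one FEC over one polling interval.
print(fec_loss([1000, 500], [1480]))                       # 20
print(suspect_lsrs({"lsr1": 0, "lsr2": 15, "lsr3": 5}))    # ['lsr2', 'lsr3']
```

    The counters alone do not say whether the discards were caused by
    MTU or by congestion, and packets in flight when the counters are
    read will produce small discrepancies on a healthy path.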

206	    Mis-ordering
207	         Mis-ordering of LSP traffic occurs when incorrect or
208	         inappropriate load sharing is implemented within an MPLS
209	         network. Load sharing typically takes place when equal cost
210	         paths exist between the ingress and egress of an LSP. In these
211	         cases, traffic is split among these equal cost paths using a
212	         variety of algorithms. One such algorithm relies on splitting
213	         traffic between each path on a per-packet basis. When this is
214	         done, it is possible for some packets along the path to be
215	         delayed due to congestion or slower links, which may result in
216	         packets being received out of order at the egress. Detection
217	         and remedy of this situation may be left up to client
218	         applications that use the LSPs. For instance, TCP is capable of
219	         re-ordering packets belonging to a specific flow (although this
220	         may result in re-transmission of some of the mis-ordered
221	         packets).

223	         Detection of mis-ordering can also be determined by sending
224	         probe traffic along the path and verifying that all probe
225	         traffic is indeed received in the order it was transmitted.
226	         This will only detect truly pathological problems, as
227	         mis-ordering is typically neither predictable nor repeatable
228	         enough for probes to catch it reliably.
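
    A probe-based ordering check could be sketched as follows. It
    assumes each probe carries a sequence number assigned at the
    ingress; the function name and the sample sequences are
    illustrative assumptions.

```python
from typing import List


def count_reordered(received_seq: List[int]) -> int:
    """Count probes that arrived after a probe with a higher sequence number."""
    highest = -1
    late = 0
    for seq in received_seq:
        if seq < highest:
            late += 1        # arrived behind a later-sent probe
        else:
            highest = seq
    return late


print(count_reordered([0, 1, 2, 3]))     # 0
print(count_reordered([0, 1, 3, 2, 4]))  # 1
```

    A result of zero only shows that this particular probe run was
    delivered in order; it does not show that client traffic is.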

230	         LSRs do not normally implement mechanisms to detect
231	         mis-ordering of flows.

233	    Payload Corruption
234	         Payload corruption may occur and be undetectable by LSRs. Such
235	         errors are typically detected by client payload integrity
236	         mechanisms.

238	 3.1.2 Timeliness

240	    The design of SLAs and management support systems requires that
241	    ample headroom be allotted in terms of their processing capabilities
242	    in order to process and handle all necessary fault conditions
243	    within the bounds stipulated in the SLA. This includes planning for
244	    event handling using a time budget which takes into account the
245	    overall SLA and time to address any defects which arise. However,
246	    it is possible that some fault conditions may surpass this budget
247	    due to their catastrophic nature (e.g., a fibre cut) or due to
248	    incorrect planning of the time processing budget.

252	        ^    --------------
253	        |    |           ^
254	        |    |           |----  Time to notify NOC + process/correct
255	  SLA   |    |           v      defect
256	  Max - |    -------------
257	  Time  |    |           ^
258	        |    |           |-----  Time to diagnose/isolate/correct
259	        |    |           v
260	        v    -------------

262	        Figure 1: Fault Correction Budget

264	    In Figure 1, we represent the overall fault correction time budget
265	    by the maximum time as specified in an SLA for the service in
266	    question. This time is then divided into two subsections, the first
267	    encompassing the total time required to detect a fault and notify an
268	    operator (or optionally automatically correct the defect). This
269	    section may have an explicit maximum time to detect defects arising
270	    from either the application or a need to do alarm management (i.e.,
271	    suppression) and this will be reflected in the frequency of OAM
272	    execution. The second section indicates the time required to notify
273	    the operational systems used to diagnose, isolate and correct the
274	    defect (if it cannot be corrected automatically).
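
    The link between the budget and the frequency of OAM execution can
    be illustrated with some simple arithmetic. This is a sketch under
    stated assumptions: a defect is declared after a fixed number of
    consecutive missed probes, so worst-case detection time is taken
    as that number times the probe interval. The function name,
    parameters and figures are hypothetical.

```python
def max_probe_interval(sla_max_s: float, repair_s: float,
                       notify_s: float, misses: int) -> float:
    """Largest probe interval (seconds) that still fits the SLA budget.

    sla_max_s: maximum correction time allowed by the SLA
    repair_s:  time reserved to diagnose, isolate and correct
    notify_s:  time reserved to notify the operator
    misses:    consecutive missed probes needed to declare a defect
    """
    detect_budget = sla_max_s - repair_s - notify_s
    if detect_budget <= 0 or misses < 1:
        raise ValueError("no time left in the budget for detection")
    return detect_budget / misses


# 300 s SLA, 240 s to repair, 30 s to notify, defect after 3 misses.
print(max_probe_interval(300, 240, 30, 3))  # 10.0
```

    Requiring more consecutive misses suppresses spurious alarms but
    forces a proportionally shorter probe interval for the same budget.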

276	 3.2 Diagnosis

278	 3.2.1 Characterization

280	    Characterization is defined as determining the forwarding path of a
281	    packet (which may not necessarily be known). Characterization may be
282	    performed on a working path through the network. This is done, for
283	    example, to determine ECMP paths, the MTU of a path, or simply to
284	    know the path occupied by a specific FEC. Characterization will be
285	    able to leverage mechanisms used for isolation.

287	 3.2.2 Isolation

289	    Isolation of a fault can occur in two forms. In the first case, the
290	    local failure is detected, and the node where the failure occurred
291	    is capable of issuing an alarm for such an event. The node should
292	    attempt to withdraw the defective resources and/or rectify the
293	    situation prior to raising an alarm. Active data plane OAM
294	    mechanisms may also detect the failure conditions remotely and issue
295	    their own alarms if the situation is not rectified quickly enough.

297	    In the second case, the fault has not been detected locally. In this
298	    case, the local node cannot raise an alarm, nor can it be expected
301	    to rectify the situation. In this case, the failure may be detected
302	    remotely via data plane OAM.  This mechanism should also be able to
303	    determine the location of the fault, perhaps on the basis of limited
304	    information such as a customer complaint. This mechanism may also be
305	    able to automatically remove the defective resources from the
306	    network and restore service, but should at least provide a network
307	    operator with enough information by which they can perform this
308	    operation. Given that faults should be detected as quickly as
309	    possible, tools which possess the ability to incrementally test
310	    LSP health should be used to uncover faults.
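
    Incremental testing of LSP health could look like the sketch below,
    which probes each LSR along the path in order and stops at the
    first one that does not answer. The path, the probe_ok callback
    and the node names are assumptions made for illustration; a real
    tool would send data plane probes rather than consult a set.

```python
from typing import Callable, List, Optional


def isolate_fault(path: List[str],
                  probe_ok: Callable[[str], bool]) -> Optional[str]:
    """Return the first LSR along the path that fails its probe, else None."""
    for lsr in path:
        if not probe_ok(lsr):
            return lsr
    return None


# Hypothetical path; only the first two LSRs answer probes.
path = ["ingress", "lsr1", "lsr2", "egress"]
reachable = {"ingress", "lsr1"}
print(isolate_fault(path, lambda lsr: lsr in reachable))  # lsr2
```

    The LSR returned bounds the fault rather than pinpointing it: the
    defect lies at that LSR or on the hop leading to it.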

312	 3.3 Availability

314	    Availability is the measure of the percentage of time that a service
315	    is operating within specification, often specified by an SLA.

317	    MPLS has several forwarding modes (depending on the control plane
318	    used). As such, more than one availability model may be defined,
319	    each of which may require its own measurement technique.
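
    Whatever model is chosen, the definition above reduces to a simple
    ratio. A minimal sketch, assuming the out-of-specification time has
    already been measured by some means (the function name and figures
    are illustrative):

```python
def availability_percent(total_s: float, out_of_spec_s: float) -> float:
    """Percentage of the measurement period spent within specification."""
    if total_s <= 0 or not 0 <= out_of_spec_s <= total_s:
        raise ValueError("invalid measurement period")
    return 100.0 * (total_s - out_of_spec_s) / total_s


# A 30-day period with 2592 seconds out of specification.
print(availability_percent(30 * 24 * 3600, 2592))
```

    How out-of-specification time is determined is exactly what the
    different models and measurement techniques would disagree on.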

321	 4.  Configuration Management

323	    Data plane OAM can assist in configuration management by providing
324	    the ability to verify the configuration of an LSP or of applications
325	    utilizing that LSP. This would be an ad-hoc data plane probe
326	    that should verify both path integrity (a complete path exists)
327	    and that the path function is synchronized with the control
328	    plane. The probe would carry as part of the payload relevant
329	    control plane information that the receiver would be able to compare
330	    with the local control plane configuration.
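
    The comparison performed by the receiver might be sketched as
    follows. The payload fields (lsp_id, fec) and the shape of the
    local configuration are assumptions for illustration only; this
    document does not define a probe format.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ProbePayload:
    lsp_id: str   # sender's identifier for the LSP (hypothetical field)
    fec: str      # FEC the sender believes the LSP carries


def verify_probe(payload: ProbePayload, local_config: Dict[str, str]) -> List[str]:
    """Return mismatches between the probe and local control plane state."""
    mismatches: List[str] = []
    expected_fec = local_config.get(payload.lsp_id)
    if expected_fec is None:
        mismatches.append("unknown LSP " + payload.lsp_id)
    elif expected_fec != payload.fec:
        mismatches.append(
            "FEC mismatch: probe=" + payload.fec + " local=" + expected_fec)
    return mismatches


# Hypothetical local state at the egress: LSP identifier -> FEC.
local = {"lsp-a": "10.1.0.0/16"}
print(verify_probe(ProbePayload("lsp-a", "10.1.0.0/16"), local))  # []
print(verify_probe(ProbePayload("lsp-a", "10.2.0.0/16"), local))
```

    Arrival of the probe at all demonstrates path integrity; an empty
    mismatch list additionally indicates that the data path agrees
    with the receiver's control plane for the fields compared.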

332	 5.  Accounting Management

334	    The requirements for accounting in MPLS networks as specified in
335	    [MPLSREQS] do not place any requirements on data plane OAM.

337	 6.  Performance Management

339	    Performance management permits the information transfer
340	    characteristics of LSPs to be measured, perhaps in order to
341	    compare against an SLA. These characteristics fall into two
342	    categories: latency (where jitter is considered a variation in
343	    latency) and information loss.

345	    Latency can be measured in two ways: one is to have precisely
346	    synchronized clocks at the ingress and egress such that time-stamps
347	    in PDUs flowing from the ingress to the egress can be compared. The
350	    other is to use an exchange of PING type PDUs that gives a round
351	    trip time (RTT) measurement, from which an estimate of the one way
352	    latency can be inferred with some loss of precision. Use of load
353	    spreading techniques such as ECMP means that any individual RTT
354	    measurement is only representative of the typical RTT for a FEC.
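
    The RTT-based estimate can be sketched as below. Two assumptions
    are made that the text does not state: the forward and return
    paths are treated as symmetric, so one way latency is taken as
    half the RTT, and the median of several samples is used so that a
    single probe hashed onto an unusual ECMP path does not dominate.

```python
from statistics import median
from typing import List


def one_way_estimate_ms(rtt_samples_ms: List[float]) -> float:
    """Estimate one way latency as half the median RTT (symmetry assumed)."""
    if not rtt_samples_ms:
        raise ValueError("at least one RTT sample is required")
    return median(rtt_samples_ms) / 2.0


# Hypothetical RTT samples in milliseconds for one FEC.
print(one_way_estimate_ms([10.0, 12.0, 30.0]))  # 6.0
```

    Where the return path differs from the forward path, the halved
    figure can be wrong in either direction, which is the loss of
    precision referred to above.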

356	    To measure information loss, a common practice is to periodically
357	    read ingress and egress counters (i.e., MIB module counters). This
358	    information may also be used for offline correlation. Another common
359	    practice is to send explicit probe traffic which traverses the data
360	    plane path in question. This probe traffic can also be used to
361	    measure jitter and delay.
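
    Given a run of probe results, loss and jitter could be derived as
    in the sketch below. Jitter is computed here as the mean absolute
    difference between consecutive delay samples, which is one common
    choice and an assumption of this example rather than a definition
    taken from this document.

```python
from statistics import mean
from typing import List


def loss_ratio(sent: int, received: int) -> float:
    """Fraction of probes (or counted packets) that did not arrive."""
    if sent <= 0 or not 0 <= received <= sent:
        raise ValueError("invalid counts")
    return (sent - received) / sent


def mean_jitter_ms(delays_ms: List[float]) -> float:
    """Mean absolute difference between consecutive delay samples."""
    if len(delays_ms) < 2:
        return 0.0
    return mean(abs(b - a) for a, b in zip(delays_ms, delays_ms[1:]))


# Hypothetical probe run: 1000 sent, 990 received, four delay samples.
print(loss_ratio(1000, 990))
print(mean_jitter_ms([10.0, 12.0, 11.0, 15.0]))
```

    The same loss_ratio calculation applies to ingress and egress
    counter readings taken over a common interval.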

363	 7. Security Management

365	    Providing a secure OAM environment is required if MPLS specific
366	    network mechanisms are to be used successfully. To this end,
367	    operators have a number of options when deploying network mechanisms
368	    including simply filtering OAM messages at the edge of the MPLS
369	    network. Malicious users should not be able to use non-MPLS
370	    interfaces to insert MPLS specific OAM transactions. Provider
371	    initiated OAM transactions should be able to be blocked from leaking
372	    outside the MPLS cloud.

374	    Finally, if a provider does wish to allow OAM messages to flow into
375	    (or through) their networks, for example, in a multi-provider
376	    deployment, authentication and authorization are required to prevent
377	    malicious and/or unauthorized access. Also, given that MPLS networks
378	    often run IP simultaneously, similar requirements apply to any
379	    native IP OAM network mechanisms in use. Therefore, authentication
380	    and authorization for OAM technologies is something that MUST be
381	    considered when designing network mechanisms which satisfy the
382	    framework presented in this document.

384	    OAM messaging can address some existing security concerns with the
385	    MPLS architecture; i.e., through rigorous defect handling, operators
386	    can offer their customers a greater degree of integrity protection,
387	    namely that their traffic will not be incorrectly delivered (for
388	    example, by being able to detect leaking LSP traffic from a VPN).

390	    Support for inter-provider data plane OAM messaging introduces a
391	    number of security concerns. By definition, portions of LSPs will
392	    not be within a single provider's network, so the provider has no
393	    control over who may inject traffic into the LSP, which can be
394	    exploited for denial of service attacks. OAM PDUs are not
395	    explicitly identified in the MPLS header and therefore are not
396	    typically inspected by transit LSRs. This creates opportunity for
397	    malicious or poorly behaved users to disrupt network operations.

401	    Attempts to introduce filtering on target LSP OAM flows may be
402	    problematic if flows are not visible to intermediate LSRs. However,
403	    it may be possible to interdict flows on the return path between
404	    providers (as faithfulness to the forwarding path is not a return
405	    path requirement) to mitigate aspects of this vulnerability.

407	    OAM tools may permit unauthorized or malicious users to extract
408	    significant amounts of information about network configuration. This
409	    would be especially true of IP based tools as in many network
410	    configurations, MPLS does not typically extend to untrusted hosts,
411	    but IP does. For example, TTL hiding at ingress and egress LSRs will
412	    prevent external users from using TTL-based mechanisms to probe an
413	    operator's network. This suggests that tools used for problem
414	    diagnosis or which by design are capable of extracting significant
415	    amounts of information will require authentication and authorization
416	    of the originator. This may impact the scalability of such tools
417	    when employed for monitoring instead of diagnosis.

419	8. IANA Considerations

421	   This document does not contain any IANA considerations.

423	9. Security Considerations

425	   This document describes a framework for MPLS Operations and
426	   Management. Although this document discusses and addresses some
427	   security concerns in section 7 above, it does not introduce any
428	   new security concerns.

430	10. Intellectual Property Statement

432	   The IETF takes no position regarding the validity or scope of any
433	   Intellectual Property Rights or other rights that might be claimed to
434	   pertain to the implementation or use of the technology described in
435	   this document or the extent to which any license under such rights
436	   might or might not be available; nor does it represent that it has
437	   made any independent effort to identify any such rights.  Information
438	   on the procedures with respect to rights in RFC documents can be
439	   found in BCP 78 and BCP 79.

441	   Copies of IPR disclosures made to the IETF Secretariat and any
442	   assurances of licenses to be made available, or the result of an
443	   attempt made to obtain a general license or permission for the use of
444	   such proprietary rights by implementers or users of this
445	   specification can be obtained from the IETF on-line IPR repository at
446	   http://www.ietf.org/ipr.

450	   The IETF invites any interested party to bring to its attention any
451	   copyrights, patents or patent applications, or other proprietary
452	   rights that may cover technology that may be required to implement
453	   this standard.  Please address the information to the IETF at
454	   ietf-ipr@ietf.org.

456	11. Copyright Statement

458	   Copyright (C) The Internet Society (2005).

460	   This document is subject to the rights, licenses and restrictions
461	   contained in BCP 78, and except as set forth therein, the authors
462	   retain all their rights.

464	   This document and the information contained herein are provided on an
465	   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
466	   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
467	   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
468	   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
469	   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
470	   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

472	12. Acknowledgments

474	   The editors would like to thank Monique Morrow from Cisco Systems,
475	   and Harmen van Der Linde from AT&T for their valuable review comments
476	   on this document.

478	13. References

480	13.1 Normative References

482	    [RFC2119]  Bradner, S., "Key Words for use in RFCs to Indicate
483	               Requirement Levels", BCP 14, RFC 2119, March 1997.

485	    [RFC3031] Rosen, E., Viswanathan, A., and R. Callon,
486	              "Multiprotocol Label Switching Architecture", RFC
487	              3031, January 2001.

489	    [MPLSREQS] Nadeau, T., et al., "OAM Requirements for MPLS Networks",
490	               draft-ietf-mpls-oam-requirements-05.txt, November 2004.

492	    [Y1710] ITU-T Recommendation Y.1710 (2002), "Requirements for OAM
493	            Functionality for MPLS Networks".

495	13.2 Informative References

498	14. Authors' Addresses

500	    David Allan
501	    Nortel Networks              Phone: +1-613-763-6362
502	    3500 Carling Ave.            Email: dallan@nortelnetworks.com
503	    Ottawa, Ontario, CANADA

505	    Thomas D. Nadeau
506	    Cisco Systems                Phone: +1-978-936-1470
507	    300 Beaver Brook Drive       Email: tnadeau@cisco.com
508	    Boxborough, MA 01824