Internet Draft                                     David Allan (editor)
Document: draft-allan-mpls-oam-frmwk-05.txt             Nortel Networks
Category: Informational                                    October 2003
Expires: April 2004

                  A Framework for MPLS Data Plane OAM

Status of this Memo

This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC 2026.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that other
groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

Copyright Notice

Copyright (C) The Internet Society (2003). All Rights Reserved.

Abstract

This Internet draft discusses many of the issues associated with data
plane OAM for MPLS. The goal is to provide a comprehensive framework
for developing tools capable of performing "in service" maintenance
of LSPs. This discussion includes some of the implications of the
MPLS architecture for the ability to support fault, diagnostic and
performance management OAM applications, a summary of currently
specified OAM mechanisms, and a framework whereby this MPLS-OAM
toolset can collectively address all aspects of the MPLS
architecture.

Sub-IP ID Summary

(This section to be removed before publication.)

WHERE DOES IT FIT IN THE PICTURE OF THE SUB-IP WORK

Fits in the MPLS box.
WHY IS IT TARGETED AT THIS WG

The MPLS WG has added requirements, framework and mechanisms for OAM
to its charter. This draft is a candidate framework document.

JUSTIFICATION

The WG should consider this document, as it discusses the design
aspects of error detection and measurement for packet based MPLS
LSPs.

Table of Contents

1. Conventions used in this document
2. Changes since the last version (to be removed on publication)
3. Contributors
4. Requirements
5. Domain Concepts
6. OAM Applications
7. Deployment Scenarios
8. MPLS architecture implications for OAM
8.1 Topology variations within an MPLS level
8.1.1 Implications for Fault Management
8.1.2 Implications for Performance Management
8.2 LSP Creation Method
8.3 Lack of Fixed Hierarchy
8.4 Use of time to live (TTL)
8.5 State Association
8.6 Alarm Management
8.7 Other Design Issues
9. Ease of Implementation
10. OAM Messaging
11. Distinguishing OAM data plane flows
11.1 RFC 3429 "OAM Alert Label"
11.2 VCCV
11.3 PW PID
12. The OAM Return Path
13. Use of Hierarchy to Simplify OAM
14. Current Tools and Applicability
14.1 LSP-PING (MPLS WG)
14.2 Y.1711 (ITU-T SG13/Q3)
14.2.1 Connectivity Verification (CV) PDU
14.2.2 Fast-Failure-Detection (FFD) PDU
14.2.3 Forward and Backward Defect Indication (FDI & BDI)
14.3 Y.17fec-cv (ITU-T SG13/Q3)
15. Security Considerations
16. A summary of what can be achieved
17. References
18. Editor's Address

1. Conventions used in this document

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [1].

The term MPLS "level" nominally refers to the MPLS stack level
inclusive of reserved labels. In this document the term "level" is
used exclusive of reserved labels; the term "level" is therefore more
precisely analogous to a specific MPLS subnetwork layer instance.

2. Changes since the last version (to be removed on publication)

Section 11 recast from being a discussion of potential mechanisms to
being a survey of the defined mechanisms.
Section 14 added, which provides a survey of MPLS OAM mechanisms
defined in both the IETF and the ITU-T.

Reference to the [CHANG] draft and the discussion of the reverse
notification tree removed.

Reference to [HEINANEN] on directory based LDP VPNs removed.

References to [HARRISON-REQ] and [HARRISON-MECH] replaced with
Y.1710 and Y.1711 respectively.

[MARTINI] reference updated.

3. Contributors

Mina Azad
Azad-Mohtaj Consulting            Phone: 1-613-722-0878
Ottawa, Ontario, CANADA           Email: mohtaj@rogers.com

Jerry Ash
AT&T
Room D5-2A01
200 Laurel Avenue                 Phone: +1 732-420-4578
Middletown, NJ 07748, USA         Email: gash@att.com

Neil Harrison
BT Global Services                Email: neil.2.Harrison@bt.com

Sanford Goldfless
192 Fuller St                     Phone: 617-738-1754
Brookline MA 02446                Email: sandy9@rcn.com

Eric Davalo
Maple Optical Systems
3200 North First Street           Phone: 408 545 3110
San Jose CA 95134                 Email: edavalo@mapleoptical.com

Arun Punj
Marconi Communications
1000 Marconi Drive
Warrendale, PA 15086              Email: Arun.Punj@marconi.com

Marcus Brunner
Network Laboratories - NEC Europe Ltd.
Adenauerplatz 6                   Phone: +49 (0)6221/9051129
D-69115 Heidelberg, Germany       Email: brunner@ccrle.nec.de

Chou Lan Pok
SBC Technology Resources, Inc.
4698 Willow Road                  Phone: +1 925-598-1229
Pleasanton, CA 94583              Email: pok@tri.sbc.com

Wesam Alanqar
Sprint
9300 Metcalf Ave                  Phone: +1-913-534-5623
Overland Park, KS 66212           Email: wesam.alanqar@mail.sprint.com

M. Akber Qureshi
Lucent Technologies
101 Crawfords Corner Road         Phone: +1 732 949 4791
Holmdel, NJ 07733                 Email: mqureshi@lucent.com

Don Fedyk
Nortel Networks
600 Technology Park               Phone: +1 978 288 3041
Billerica MA 01821                Email: dwfedyk@nortelnetworks.com

4. Requirements

MPLS data-plane OAM specific requirements, and a summary of
requirements that have appeared in numerous PPVPN, PWE3, and MPLS
documents, appear in [Y1710] and [MPLSREQS]. This Internet draft
discusses the implications of extending OAM across the MPLS
architecture, and adds data-plane OAM requirements and capabilities
for managing multi-provider networks. This document also broadens the
scope of the requirements discussion by identifying where certain OAM
applications simply cannot be implemented without modifications to
current practice/architecture.

Finally, this draft offers a survey of tools that are currently
standardized or about to be standardized.

5. Domain Concepts

MPLS introduces a richness in layering which renders traditional
definitions of 'domain' inadequate. In particular, it is noted that
MPLS has no fixed layered hierarchy (a unique property that no other
technology has offered before).

A provider may have MPLS peer providers, use MPLS transit from
serving providers (and require MPLS or non-MPLS client transport),
and offer MPLS transit to MPLS or non-MPLS clients. Further, the same
provider may use a hierarchy of LSPs within their own network. This
Internet Draft defines the concept of an "Operations Domain" (to
cover OAM capabilities operated by a single provider) that may only
be a portion of the end-to-end LSP. Operations Domain functions are
an interdependent mix of control-plane, data-plane (a.k.a.
user-plane), and management-plane functions.

An LSP "of level m" may span numerous Operational Domains. The data
plane of the LSP is a contiguous entity consisting of the data plane
portions of the operational domains traversed. The control and
management planes of these operational domains may be disjoint.
The goal is to provide OAM functionality for each LSP independent of
the LSP creation mechanism or payload.

It is possible to have a hierarchy of operators (e.g. carriers of
carriers), where overlay Operational Domains are opaque to the
serving Domain. Therefore it is required that each LSP Operational
Domain implement its own OAM functionality, and that the OAM
applications be confined to the Operational Domains traversed at
level "m".

Note that this concept has subtle differences from the concepts of
horizontal and vertical hierarchy as defined in [HIERARCHY]. Vertical
hierarchy usually refers to networking layer boundaries distinguished
by technology. An operational domain may refer to an operator
specific hierarchical subset of the LSP levels within the MPLS
network and/or a horizontal partitioning within a specific LSP.
Similarly, there is a further way to consider the concepts of
operational domain and horizontal hierarchy: an operational domain
may be hierarchically partitioned (e.g. OSPF "areas") but
operationally integrated and contiguous.

6. OAM Applications

The purpose of having data plane LSP specific OAM transactions is to
support useful OAM applications. Examples of such applications
include:

Fault management

- On demand verification: the ability to perform connectivity tests
that exercise the specific LSP and the provisioning at the ingress
and egress. "On demand" suggests that verification may be performed
on an ad-hoc basis.

- Fault detection: Operators cannot expect customers to act as fault
detectors, and so the ability to perform automated detection of the
failure of a specific LSP is a "must have" feature (although when one
reviews the section on LSP creation below, one realizes it will not
be ubiquitously used).
Some MPLS deployment scenarios may not have a control plane, or may
have LSP processing components not in common with the control plane,
so fault detection procedures may need to be augmented with LSP
specific methods.

- Fault sectionalization: The ability to efficiently determine where
a failure has occurred in an LSP. It must be possible to perform
sectionalization from an arbitrary LSR along the path of the LSP.

- Fault propagation: Specific MPLS deployment scenarios may not have
a control plane to propagate LSP failure information. Fault
propagation has numerous forms, and there are variations depending on
whether the failure is in the serving layer/level and on where the
notification propagates:
i) Northbound from the failed level to the management plane.
ii) Within the failed level.
iii) From the failed level to its clients.
iv) Within the client level to the LSP ingress and egress, either
via the user or control planes.
In all cases it is the termination of a layer that performs the
function.

Performance management

- The ability to determine whether an LSP meets certain goals with
respect to latency, packet loss, etc.
- The ability to collect information to facilitate network
engineering decisions.

Of the above applications, verification, detection and
sectionalization explicitly need to exercise all components of the
forwarding path of the target LSP, otherwise there will be failure
scenarios that cannot be detected or properly sectionalized. These
applications cannot be supported properly if there are differences
in handling between user traffic and OAM probes at intermediate
LSRs.

A separate and useful classification of the applications outlined
above is the distinction between monitoring applications and
diagnosis.
Monitoring applications are typically unattended in operation; they
collect operational statistics and, upon detection of problems, must
provide sufficient information to permit precise diagnosis of the
problem and frequently some form of automated network response.
Diagnosis applications are typically attended in operation and must
be able to authoritatively locate and isolate faults. The security
implications of this distinction are discussed in the Security
Considerations section.

7. Deployment Scenarios

At the present time there are a number of MPLS deployment scenarios,
each with a number of subtleties from a data plane OAM perspective.
Each can be viewed as a characteristic of an operational domain:

The sparse model: This can be in conjunction with control plane
signaling (e.g. MPLS based traffic engineering applied to an IP
network) or with simply provisioned LSPs (no control plane
signaling). The key feature is that the MPLS operational domain will
not have any-to-any connectivity at the MPLS layer, due to the sparse
use of LSPs to augment the served layer connectivity. This has
operational and scalability implications, as OAM connectivity must be
explicitly added to the model, or the operator may be obliged to
depend on "layer violations" embedded in OAM mechanisms which are
strictly only relevant to a different, higher layer network (e.g.
[ICMP]) to generate a return path.

The ubiquitous model: This model generally combines MPLS, integrated
routing and control to produce universal any-to-any connectivity
within an operational domain. This may be combined with a hierarchy
of LSPs to modify the topology presented to the client layer.
This offers providers the option of utilizing the resources inherent
to all planes of the Operational Domain in designing OAM
functionality.

These two models of MPLS connectivity can be stacked or concatenated
to support numerous configurations of peering and overlay networking
arrangements between providers and users. A direct inference is that
an operational domain will not necessarily have knowledge of the
domains above and/or below it, and in the general case far less
knowledge of (and certainly less control over) its peer domains. OAM
applications for LSPs of a specific level are confined to an
operational domain and its data plane peers.

More recently there is a tendency to overlay an L2 or L3 VPN service
level on the data plane of an operational domain, with its own
identifiers and addressing, while tunneling control information
across the control plane of the operational domain using BGP-4
[2547][KOMPELLA] or extended LDP discovery [MARTINI]. From a data
plane OAM perspective, we would consider this to be a separate
operational domain, and anticipate that it is only a matter of time
before such service levels evolve to span multiple operational
domains (for example, an L2 or L3 VPN that spans multiple providers,
or the introduction of tandem points at the data plane of the service
level).

8. MPLS architecture implications for OAM

8.1 Topology variations within an MPLS level

There are a number of topology variations in the MPLS architecture
that have OAM implications. These are:

- Uni-directional and bi-directional LSPs. A uni-directional LSP only
provides connectivity in one direction; if return path connectivity
exists, it is an attribute of the operational domain (e.g. signaling,
management or client layers), and not a unique attribute of the LSP.
Bi-directional LSPs, or LSPs with a specific return path (e.g.
[DUBE]), have inherent symmetrical connectivity as an attribute of
the LSP.

- Multipoint-to-point (mp2p) LSPs are where a single LSP uses "merge"
LSR transfer functions to provide connectivity between multiple
ingress LSRs and one egress LSR (sufficient information being present
in the payload to permit higher layer demultiplexing at the egress).
There are a number of problems inherent to mp2p topological
constructs that cannot be addressed by traditional p2p mechanisms.
One issue is that for some OAM applications (e.g. data plane fault
propagation), OAM flows may require visibility at merge points to
limit the impact of partial failures or congestion.

"Best effort" mp2p LSPs may have fairness issues with some packet
schedulers. This may complicate obtaining consistent measurements
under congestion conditions. Explicitly routed mp2p LSPs with
associated resource reservations are significantly more complex to
engineer. The resource reservations required will be cumulative at
merge points (as will jitter), and the ability to provide
differentiated handling for specific ingresses is lost once any merge
point is crossed. One opinion is that the complexity and difficulty
of configuring and maintaining ER-mp2p LSPs significantly outweighs
the scalability benefits, and that they would not likely be deployed.

- Penultimate Hop Popping (PHP) is an optimization in the
architecture in which the last LSR prior to the egress removes the
now-redundant current MPLS label from the label stack. The ability to
infer LSP specific context (OAM and other) is therefore lost prior to
reaching the final destination.

MPLS does not provide for protocol multiplexing via payload
identification (with the exception of the explicit IPv4 and IPv6
labels).
PHP requires that the final hop have a common protocol payload
(typically IP) or be able to map to a lower layer protocol
multiplexing capability (e.g. the PPP Protocol Field or Ethernet
ethertype), as the ability to infer the payload type from the LSP
label is lost.

Another scenario where PHP is employed is when the egress LSR is not
actually MPLS data plane capable. This has data plane OAM
implications in that any MPLS specific flows need to terminate at the
PHP LSR. This requires that the PHP LSR proxy OAM functions on behalf
of the egress LSR, which will introduce complexity when any type of
consequent action, such as layer interworking of fault notification,
is required.

- E-LSPs [MPLSDIFF], in which a single LSP supports multiple queuing
disciplines to support multiple QoS behavior aggregates. The ability
to perform OAM performance functions on a "per behavior aggregate"
basis is critical to managing E-LSPs.

- Management plane provisioned LSPs vs. control plane signaled LSPs.
In many scenarios associated with a control plane, the topology of
the LSP varies over time. This can be for many reasons: implicit
routing, dynamic set up of local repair tunnels, etc.

- The potential existence of multiple LSPs between an ingress and an
egress LSR. This can be for many reasons: L-LSPs, equal cost
multipath routing, etc.

- The potential existence of multiple next hop label forwarding
entries (NHLFEs) for a single incoming label. This is the scenario
whereby the incoming label map (ILM) for an incoming label switch hop
(LSH) maps to an inverse multiplex of NHLFEs, which may be re-merged
into a common egress or have multiple egress points. The mechanism
for selecting the NHLFE to use may be proprietary and is performed on
a packet by packet basis.
Some implementations hash both the label stack and any IP payload
source and destination addresses in order to preserve flow ordering
while achieving good fan out. However, this means that the
predictability of any nested LSPs degrades in the presence of
problems.

OAM tools not specifically aware of this construct may produce random
results (insufficient frequency of failure to trigger threshold
detection) or, pathologically, may only test a subset of the NHLFEs,
impacting both the detection and diagnosis of defects. Similarly,
performance monitoring is impacted, as packets in flight cannot
accurately be accounted for. The ramifications are comprehensively
discussed in [ALLAN].

- Use of per-platform label space. A per-platform label has
significance at a nodal level and not just an interface level. Some
of the more interesting applications include the ability to create
unsignalled facility backup LSPs in "bypass tunnels" [SWALLOW].
Traffic arriving on multiple interfaces and/or LSP tunnels may use a
common per-platform label and will have a common ILM and NHLFEs. This
can have implications similar to mp2p and PHP depending on how it is
used; LSP origin information is not conserved when multiple sources
use a common label.

- p2mp and mp2mp LSPs (a.k.a. MPLS Multicast) are for further study.
At the present time, what placeholders exist in the architecture for
multicast treat it as a separate protocol from "unicast" MPLS (with
the exception of ATM variations of MPLS).

These topological variations introduce complexity when attempting to
instrument OAM applications within a specific MPLS level, such as
performance management, fault detection, fault isolation/diagnosis,
fault handling (e.g. consequent actions taken to avoid raising
unnecessary alarms in client layers) and fault notification.
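The flow-hash NHLFE selection described above can be sketched as
follows. This is a hypothetical illustration only; real
implementations use proprietary, often hardware-specific, hash
functions and inputs, and the function and field names here are
assumptions:

```python
import hashlib

def select_nhlfe(label_stack, src_ip, dst_ip, nhlfes):
    """Illustrative flow-hash NHLFE selection: hash the label stack
    together with the IP source/destination addresses, so that all
    packets of one flow take the same branch (preserving ordering)
    while distinct flows fan out across the available NHLFEs."""
    # MPLS labels are 20-bit values; pack each into 3 bytes.
    key = b"".join(label.to_bytes(3, "big") for label in label_stack)
    key += src_ip.encode() + dst_ip.encode()
    digest = hashlib.sha256(key).digest()
    # Reduce the digest to an index into the NHLFE set.
    return nhlfes[int.from_bytes(digest[:4], "big") % len(nhlfes)]
```

Because the selection is deterministic per flow but opaque to an
outside observer, an OAM probe that does not reproduce the label
stack and addresses of the monitored traffic may consistently
exercise only one of the NHLFEs.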
8.1.1 Implications for Fault Management

mp2p, E-LSPs and PHP have implications for fault management,
specifically if an LSR is required to have knowledge of both the
ingress LSR and the specific LSP that an OAM message arrived on, or
is expected to have knowledge of, and maintain state about, the set
of ingress LSRs for an LSP. OAM messaging needs mechanisms to
distinguish both the ingress LSR and the specific LSP. (This ability
is expressed in these terms because LSPs are typically not given
globally unique identifiers; more frequently some locally
administered LSR-ID is used.)

Connectivity verification requires testing of connectivity between
all possible ingress/egress combinations. Frequently it will not be
possible to infer the ingress LSR and specific LSP directly at the
egress, as such information may be lost at merge points in mp2p LSPs
or due to PHP. This is true for both OAM messaging and normal data
plane payloads. There may be numerous reasons why an ingress-egress
pair has a plurality of LSPs between them, so the ability to
distinguish the source and purpose of specific probes, beyond mere
knowledge of the originating LSR, is required.

The ability to distinguish the ingress can be achieved by modifying
the OAM protocol to carry such information, or by modifications to
operational procedures such as overlaying p2p connectivity.

8.1.2 Implications for Performance Management

Many performance management functions can be performed by obtaining
and comparing measurements taken at different points in the network.
Comparing ingress and egress statistics is the simplest example (but
is usually restricted to a single domain). The key issue is ensuring
that an "apples-to-apples" comparison of measurements is possible.
This means that all measurement points need to be able to similarly
classify the traffic and performance they are measuring, and that the
measurements are synchronized in time and compensate for traffic in
flight between the measurement points.

For example, a relatively simple technique for establishing key
performance metrics is to compare what was sent with what was
received. In the PPP line quality monitoring (LQM) function, the
ingress periodically sends statistics to the egress for comparison,
subject to the same queuing discipline as the data plane traffic,
such that traffic in flight is properly accounted for. (Note that
re-ordering will introduce errors but is not expected to be
frequent.)

It is important to distinguish, and be able to measure, what
constitutes the up and down states of an LSP. This needs to be
standardized so that there is unified treatment. A key observation
here is that QoS metrics (like loss, errored packets, delay, etc.)
are only relevant when the LSP is in the up-state, and so any
collection of QoS measurements is suspended when the LSP enters the
down-state. This requires specification of the state transitions to
achieve measurement consistency, and is a pre-requisite to QoS
assessment. This is particularly important to operators, since
customers will expect operators to be able to offer both QoS and
availability SLAs, and so these must be differentiated and uniquely
measurable.

A simple ingress/egress comparison is not always possible; there may
be no ability to similarly classify what is being measured at the
ingress and egress of an LSP. mp2p LSPs and PHP do not have a 1:1
relationship between the ingress and the egress.
LSPs containing ILMs that map to multiple NHLFEs introduce
measurement inaccuracy, as not all packets share a common queuing
discipline; where this results in multiple egress points from the
network, there is an inability to synchronize measurements. Partial
failure of an mp2p LSP (incl. ECMP) will result in the inability to
successfully collect statistics.

So, in addition to having to define up/down-state transitions, for
successful PM the 1:1 relationship needs to be restored by either:

- Modeling the mp2p/PHP LSP as one LSP for measurement. This means
that measurements performed at ingress points need to be synchronized
and adjusted for common LSP segments such that the results are all
presented to the egress simultaneously (again correcting for traffic
in flight). A technique dependent on such a high degree of
synchronization would be impossible to perfect, and prone to a degree
of error.

- Modeling the mp2p/PHP LSP as a collection of "ingress" LSPs for
measurement. This means that the egress needs to be able to maintain
statistics by ingress and appropriately classify traffic
measurements.

Neither of the above is achievable at the present time without
modifying existing operational procedures. The first approach
involves treating the mp2p/PHP LSP as an aggregate, and as such it
can partially fail and degrade. This complicates the establishment of
performance metrics and the specification of recovery procedures on
errors.

The second approach requires decomposing the mp2p/PHP LSP such that
both payload and OAM traffic can be demultiplexed at the egress and
correctly associated with "per-ingress" state. The ability to
demultiplex both OAM and payload implies a common wrapper, and the
net effect would be to overlay p2p connectivity on top of the
merge/PHP based transport level.
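The second approach can be sketched as follows, modeled loosely on
the PPP LQM comparison described earlier. This is an illustrative
sketch only: the names are assumptions, and it ignores time
synchronization and traffic still in flight at the measurement
instant:

```python
class PerIngressEgressCounters:
    """Illustrative per-ingress accounting at the egress LSR. Traffic
    must be demultiplexed by ingress so that the count an ingress
    reports having sent can be compared against the count observed
    locally for that ingress alone."""

    def __init__(self):
        self.received = {}  # ingress LSR-ID -> packets observed

    def count_packet(self, ingress_id):
        """Called for each arriving packet once its ingress is known."""
        self.received[ingress_id] = self.received.get(ingress_id, 0) + 1

    def loss(self, ingress_id, reported_sent):
        """Packets lost between one ingress and this egress, given the
        transmit count reported by that ingress for the interval."""
        return reported_sent - self.received.get(ingress_id, 0)
```

Without the common wrapper discussed above, the `count_packet` step
has nothing to demultiplex on, which is exactly why mp2p and PHP
defeat this form of measurement today.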
The existence of E-LSPs adds a wrinkle to the problem of measurement synchronization. An E-LSP may implement multiple diffserv PHBs and incorporate multiple queuing disciplines. An aggregate measurement for the entire LSP sent from ingress to egress would frequently have a small margin of error when compared with an aggregate measurement taken at the egress. Separate measurement comparisons for each supported EXP code point would be required to eliminate the error.

The situation is slightly different for p2p LSPs containing ILMs that map to multiple NHLFEs. If all the NHLFEs are merged back into a single entity prior to the egress, there will inherently be a degree of measurement error that modifications to operational procedure cannot correct. However, there is no guarantee that this will be the case, and any individual ingress measurement may be compared with only one of several egress measurement points (either random or pathological).

8.2 LSP Creation Method

The ability to usefully audit the constituent components of an LSP is dependent on the technique used to create the LSP. Presently defined are provisioning, LDP, CR-LDP, RSVP-TE, and BGP.

LSP creation techniques that are currently defined fall along a spectrum:

At one extreme are explicitly routed point-to-point connections between fixed ingress and egress points in the network. Explicitly routed (ER) LSPs (today created via provisioning, CR-LDP, RSVP-TE or BGP) have a significant degree of testability, as the path across the network and the egress point are fixed and knowable to a testing entity. Similarly, explicit pairwise and stateful testing/measurement relationships can be set up (e.g. connectivity verification) and strict criteria for failure established.
In the middle are static mp2p constructs typically signaled via BGP (e.g. RFC 2547).

At the other extreme is LSP construction that is topology driven (such as dynamic "shortest path first" routing combined with LDP), whereby the details of path construction between the ingress and egress points in the network will vary over time and may involve several stages of multiplexing with traffic from other sources. The details of path construction at any given instant are not necessarily knowable to an auditing entity, so any attempt to interpret the results of an audit may generate spurious results. Further, the MPLS network may only be a portion of the operational domain, and the egress point from the network for an FEC may vary over time.

The testable unit in an LDP network is the FEC, not the LSP, and the potential existence of a many-to-many relationship of ingress and egress points limits the testability of the FEC, or at least may limit the frequency of using such tests.

The connectivity instantiated in a specific LSP created by a topology driven control plane signaling mechanism will recover from many defects in the network. The quality of recovery is typically a function of how the network is engineered.

Problems are typically detected by having MPLS connectivity fate share with the constituent physical links and routing adjacencies, and topology driven path re-arrangement will restore the connectivity (with some interruption and other side effects occurring between the initial failure and re-convergence of the network). However, exclusive dependence on fate sharing for failure detection means that LSP components may have unique failure modes from which the network will not recover and which can only be diagnosed reactively.
As can be inferred from the above, what is required for topology driven LSPs is a test mechanism that audits forwarding policy, as this is the metric by which some aspects of network performance can be measured.

8.3 Lack of Fixed Hierarchy

MPLS supports an arbitrary hierarchy in the form of label stacking. This is a facility that can be leveraged for OAM purposes. As an example, the section on implications for performance management has already outlined how p2p topology for PM can be overlaid on an arbitrary merged topology to add manageability of services. Similarly, functions requiring sectionalization of an LSP, or the ability to isolate partial failure of a complex construct, can be achieved by constructing the LSP as an overlay upon a concatenation of operationally significant shorter LSPs. By operationally significant we refer to LSPs that span useful portions of the whole construct (e.g. a branch of an mp2p LSP, or bypassed LSRs that do not have OAM capability).

This could simplify the instrumentation of level specific OAM by ensuring only e2e functions were required (as opposed to functions originating or terminating at arbitrary points in the network), while driving up the complexity of LSP establishment due to the resultant inter-level configuration issues when creating multi-level constructs with the desired manageability.

8.4 Use of Time To Live (TTL)

Experience within the IP world has suggested that TTL was a serendipitous feature that can be similarly leveraged by MPLS.

However, in the MPLS world TTL suffers from inconsistent implementation depending on the link layer technology spanned by the target LSP. The existence of non-TTL capable links (e.g. MPLS/ATM) has an impact on the utility of using TTL to augment the MPLS OAM toolkit.
For example, use of TTL as an aid in fault sectionalization can only isolate a fault to the granularity of a non-TTL capable span of LSH or LSP segments.

There are other variations in TTL handling that suggest interpreting the results of TTL based tests may be problematic. As outlined in [TTL] there are two models of TTL handling with different implications:

- the uniform model, in which decrement of TTL is independent of the MPLS level. At the ingress point to an MPLS level, the current TTL is copied into the new top label, and at egress it is copied back to the revealed top level.

- the pipe and short pipe models, whereby MPLS tunnels (aka LSPs) are used to hide the intermediate MPLS nodes between LSP ingress and egress from a TTL perspective.

The uniform model originates with preserving IP TTL semantics when IP traffic transits an MPLS subnetwork. The uniform model will reduce the resource consumption of routing loops, but in a correctly operating network may lead to premature discard of packets outside the operational domain they originated from (due to the existence of an arbitrary number of serving MPLS levels). Similarly, when a routing loop occurs, determining the MPLS level that is the source of the problem will be difficult, as there is no method to correlate it with the level where the exhaustion event occurred.

The pipe model is more consistent with the operational domain model in that TTL exhaustion will only occur at a specified level, and the initial values used at LSP ingress are more likely to reflect what would genuinely constitute a routing loop.

A reasonable expectation is that the uniform model would not be used outside of an operational domain.

A separate issue is that an LSR may decrement TTL by an amount other than one as a matter of policy.
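The two models of [TTL] can be contrasted in a short sketch (the initial TTL value, the stack representation, and the function names are assumptions for illustration only):

```python
INITIAL_TTL = 255  # assumed value set by a tunnel ingress in this sketch

def push_label(stack, label, model):
    """Push a tunnel label onto a label stack (a list of [label, ttl])."""
    if model == "uniform":
        # Uniform model: copy the current top TTL into the new label,
        # so decrements remain visible across MPLS levels.
        ttl = stack[-1][1] if stack else INITIAL_TTL
    else:
        # Pipe/short-pipe models: the tunnel hides its hops; start fresh.
        ttl = INITIAL_TTL
    stack.append([label, ttl])

def pop_label(stack, model):
    """Pop the top label at the tunnel egress."""
    label, ttl = stack.pop()
    if model == "uniform" and stack:
        # Uniform model: copy the (decremented) TTL back down.
        stack[-1][1] = ttl
    # Pipe models: the revealed label keeps its own TTL untouched.
    return label

# A 4-hop tunnel carried over a client label whose TTL starts at 64:
for model in ("uniform", "pipe"):
    stack = [[100, 64]]          # client level
    push_label(stack, 200, model)
    for _ in range(4):           # each LSR decrements the top TTL
        stack[-1][1] -= 1
    pop_label(stack, model)
    print(model, stack[-1][1])   # uniform -> 60, pipe -> 64
```

Under the pipe models the tunnel's hop count is invisible to the client level, while the uniform model exposes every hop; in either case, tools based on TTL exhaustion must know which model (and what decrement policy) is in effect.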
This means that the results obtained via any tools that use TTL exhaustion will require some interpretation.

8.5 State Association

The design of OAM flows in MPLS levels that multiplex traffic from multiple sources together may introduce implementation complexity where the flows are processed. The receiver of the OAM message will need to extract information from the packet to identify the LSP and associate it with ingress and LSP specific state. If the ingress/LSP identifier in the packet is not administered by the processing node, it will be unable to optimize the implementation of the state association mechanism and will be required to perform some sort of table search.

If the identifier is administered by the processing node and that node is not the originator of the probe, some mechanism will be required to distribute this information uniquely to each probe originator.

8.6 Alarm Management

MPLS permits layers of different operational behaviors to recurse. When the alarm management paradigms differ they may not be reconcilable. For example, an LDP network has no ability to perform alarm suppression directly within the data plane for e2e tools, whether used within the LDP layer or overlaid on an LDP layer, that are impacted by a failure. The LDP network will recover, but the node that could report the failure may not directly participate in the recovery; therefore data plane alarm suppression mechanisms cannot be synchronized with service restoration.

8.7 Other Design Issues

It is desirable to make data plane OAM implementations independent of LSP specifics: common mechanisms across p2p and mp2p LSPs, PHP or no-PHP, and independent of payload and the method of LSP creation, in order to minimize overall complexity.
The OAM application originator should not need (as far as is practical) any knowledge of the details of LSP construction.

PM requirements may mean that instrumentation of many OAM applications is only possible for p2p LSPs, and therefore would only be possible for a select group of MPLS levels (e.g. overlaid service labels as per [KOMPELLA] or [MARTINI]).

Fault management must be applicable across the spectrum of all label levels and LSR transfer functions.

Finally, the possibility of re-ordering of OAM messaging must be considered. The design of OAM applications and messaging must be tolerant of out of order delivery and some degree of packet loss. For some applications the originator/termination will require a means to uniquely correlate requests with probe responses (including responses to mis-directed probes) or to verify in-sequence receipt.

9. Ease of Implementation

Complex functions typically require software implementation and are not capable of handling line rate messaging. Implementations defend themselves via rate-limiting or similar load management techniques to avoid vulnerabilities to DoS attacks or simple mis-use by incompetent craftspersons. In many cases, the complexity of adding strong authentication as a defense against DoS attacks may be less onerous than promiscuous processing of complex probes.

Probes supporting monitoring applications gain the most benefit when they can run at line rate, such that there are no concerns about processing capacity at the processing network elements. Such tests will generate predictable results (or at least not have results degraded when network elements are under stress) and automated procedures can be designed around such mechanisms.
MP2P LSPs are an exemplary case, where egress processing of probes may be required to support probes from an arbitrary number of unsynchronized sources.

Messaging mechanisms to perform diagnostic tests (once a fault has been authoritatively established) tend to be more complex and software intensive. Diagnostic tests are frequently used by craftspersons, and can be more tolerant of things like discard due to rate limiting.

10. OAM Messaging

OAM should be decoupled from user behavior to ensure consistent OAM functional behavior (under any traffic conditions) and avoid the use of customers as guinea pigs.

At the specific LSP level, support of OAM applications requires messages that flow between three entities: the LSP ingress, the intervening network, and the LSP egress. As an LSP is unidirectional, it should be self evident that OAM applications that require feedback in the reverse direction will have such communication occur either at the specific LSP level, or at some data plane LSP level in the operational domain, or in one of the other planes (control or management) of the operational domain.

The set of possible individual transactions (plus examples of their utility) is as follows:

LSP specific data-plane transactions:
- ingress to egress
  Applicability: verification, fault detection, performance management.
- ingress to network
  The message will terminate at an intermediate LSR traversed by the LSP.
  Applicability: sectionalization from the source.
- network to egress
  The message is inserted into the LSP at an intermediate node and terminates at the LSP egress LSR.
  Applicability: sectionalization from an arbitrary point in an LSP.
- network to network
  Applicability: sectionalization from an arbitrary point in an LSP.

Feedback transactions:
- egress to ingress
  Applicability: verification, fault detection.
- egress to network
  The flow originates at the LSP egress and terminates at an intermediate node traversed by the LSP.
  Applicability: sectionalization from an arbitrary point in an LSP.
- network to ingress
  The flow will originate at an intermediate LSR traversed by the LSP and terminate at the LSP source.
  Applicability: sectionalization from the ingress.
- network to network
  Applicability: sectionalization from an arbitrary point in an LSP.

11. Distinguishing OAM data plane flows

MPLS provides several mechanisms for distinguishing OAM data plane flows.

11.1 RFC 3429 "OAM Alert Label"

RFC 3429 [3429] defines the OAM alert label, which identifies that the payload is a Y.1711 PDU. The OAM alert label may be used for p2p LSPs that do not encounter lower layer ECMP, and for Y.17fec-cv PDUs.

11.2 VCCV

[VCCV] provides procedures for PEs to negotiate an OAM protocol to be multiplexed with payload over a PW, and defines a bit in the PW header which indicates when the PW PDU contains OAM flows or payload flows. The purpose is to carry IP based OAM protocols (LSP-PING, ICMP, etc.) opaque to any ECMP mechanisms.

11.3 PW PID

[ARCH] defines a PW PID which permits OAM protocols to be multiplexed with a PW in a form whereby they self identify to the far end PE. This can be used to transport Y.1711 or Y.17fec-cv PDUs opaquely over an ECMP infrastructure such that they properly fate share with the PW.

12. The OAM Return Path

The ability to use OAM applications such as single-ended monitoring of both directions from one end, or to support applications such as protection switching in a 1/N:M case, requires a return path to the LSP ingress. This enhances the scalability and reliability of some OAM applications, as data plane OAM can function as a closed system.
A specific example is the use of a loopback, where the only place state and timing need be maintained is at the loopback originator.

This requires a return path to complete the loop between the "target LSP" and the OAM application originator. This will permit reliable transaction flows to be implemented that impose minimal state on the network.

For the few OAM applications that require a return path, the return path can be tolerant of being topologically disjoint with the target LSP (providing the differential delays are small, i.e. <<1s); reachability of the application originator is the only hard requirement. Similarly, different OAM applications will have different return path requirements, and a hybrid of using all the planes of the operational domain (according to the application) may be significantly simpler and more operationally tractable than significant modifications to current usage to fill in connectivity gaps at the specific label level.

This is a key point: LSPs are currently by definition uni-directional (bi-directional to date being a construct of multiple uni-directional LSPs), so for any non-ubiquitous deployment of MPLS connectivity, some modification of operational procedure to provide for OAM messaging will be required for the few applications that need it. Strict symmetry of connectivity at a specific label level is not guaranteed.

In any type of sparse usage scenario (e.g. provisioned LSPs or use exclusively for TE) there will not be inherent any-to-any connectivity in the data plane, and there may not be a control plane signaling system.

In an implicit MPLS topology (e.g. LDP DU), any-to-any connectivity will typically exist, or will be easily available with minor alterations to operational procedure (LSRs advertise themselves as FECs).
This would continue to be true for an integrated model in which TE and an implicit topology were combined.

In any type of multi-provider MPLS topology the scenario is more complex, as for numerous reasons a provider may not wish to provision/advertise external connectivity to their LSRs. Similarly, for security reasons, providers may wish to apply some degree of policy or filtering of OAM traffic at operational domain boundaries.

Data plane OAM messaging should be designed to leverage as much "free connectivity" as can be obtained in the network, while ensuring the constructs have sufficient extensibility to ensure the corner cases are covered.

Within the operational domain of a single provider, it is relatively easy to envision that a combination of data plane and control plane functionality will ensure that a data plane return path is frequently available (although it may be topologically disjoint from the target LSP). This is less so for inter-provider scenarios. Here there are a number of potential obstacles, such as:
- disjoint control plane
- disjoint addressing plan
- requirements for policy enforcement and security
- impacts to scalability of ubiquitous visibility of individual LSRs across multiple operational domains.

There are a number of approaches to providing inter-domain OAM connectivity; the following is a brief commentary on each:

1) Reverse Notification Tree (a.k.a. using a bi-directional LSP)
In this method, each LSP has a dedicated reverse path, i.e. the reverse path is established and associated with the LSP at LSP setup time. This requires binding the reverse path to each LSR that is traversed by the LSP. This method is not scalable, as it requires doubling the number of LSPs in the network. Moreover, each reverse path requires its own OAM.
2) Global OAM capability
Similar to the IPv4 to IPv6 migration methodology, this method proposes use of a global operations domain whose control plane, data plane, and management plane interact with the control plane, data plane, and management plane of the individual operations domains. This method requires commitment and buy-in from all network operators.

3) Inter-domain OAM gateway
This method proposes the use of gateway-like functions at LSRs that sit at operations domain boundaries. OAM gateway functions include capabilities to correlate OAM information from one operations domain to another and permit inter-carrier sectionalization problems to be resolved.

Specification of an inter-domain OAM gateway capability would appear to be the most realistic solution.

13. Use of Hierarchy to Simplify OAM

MPLS hierarchy provides a mechanism to address a number of OAM issues.

Section 5 outlined domain concepts that nominally would require intermediate nodes to inspect and possibly process OAM PDUs. MPLS does not currently have this capability. However, frequently an operational domain is self contained and may easily be instantiated as a distinct MPLS layer which transports the domain spanning MPLS client. This permits the domain specific components of the LSP to be uniquely instrumented using end to end tools, and provides security benefits in that the provider specific components of the domain are logically isolated from the clients.

Section 7.1 outlined some of the impacts of MPLS topological constructs that multiplex traffic from multiple sources together. Section 7.5 identified the additional complexity that modifying protocols to address state mapping for OAM purposes could entail.
The key issue identified is that for fault management, OAM protocol design would permit mp2p and PHP to be addressed (but at a specific implementation cost), but this is not possible for performance management, in particular if ingress specific traffic counts are required.

Rather than attempting OAM protocol design to address what by definition will be an incomplete solution, it would be useful to define a common mechanism to demultiplex both MPLS level payload and OAM flows. The common mechanism would ideally be in the form of a wrapper that included an egress administered ingress identifier.

One instantiation of such a wrapper would be a p2p MPLS label. The mechanisms exist for label distribution (in the form of extended LDP discovery), and LSPs are already passively instrumented (e.g. packet and byte counts). Similar benefits are obtained when the implementation is extended into the use of probe messages. State association at the egress becomes simple in that the state is associated directly with the incoming label (and can be obtained by augmenting the ILM lookup).

The use of p2p overlays is one method of instrumenting mp2p and PHP LSPs that addresses all the issues outlined in section 7. It also significantly simplifies OAM protocol design and implementation.

14. Current Tools and Applicability

A number of OAM tools have been specified by both the IETF and the ITU-T.

14.1 LSP-PING (MPLS WG)

LSP-PING is designed to be retrofitted to existing deployed networks and to exercise all functionality currently deployed. In order to do so, the design trade-off is that detection or diagnosis of a problem may take an arbitrary number of transactions.

Protocol complexity is tolerated, as initial implementations will be in software.
Protocol complexity manifests itself in the form of TLV encoding of key information (FEC stack elements and the downstream LSR label map). Future functionality may be added to the protocol via the definition of additional Type-Length-Value (TLV) information elements.

Aspects of the protocol design would permit a sparse subset to be handled in hardware (exact pattern match on the PDU). For example, in a VPN application, pinging a PE is facilitated by limiting the number of FECs at any level in the stack to one. Presumably an implementation of probe handling that matched on a ping of the PE loopback address could be optimized for that specific case.

LSP-PING permits a uni-directional path to be tested from a single point, but depends on a reliable return path in order to propagate the test results back to the originating LSR. Therefore the protocol is designed to tolerate degrees of ambiguity in individual test results. Failure of an individual ping response may be due to any of several causes:
- forwarding path failure (including partial failure of ECMP or other load balancing constructs)
- return path failure
- port rate limiting at the egress
- port rate limiting at the ping origin
- congestive loss in the network

To deal with this, LSP-PING supports several features to allow ambiguity to be eliminated by having the ingress perform variations of the original transaction:
- Probe sequencing to permit both ingress and egress to detect gaps in probe sequences.
- The return path may be specified, permitting data plane and control plane problems to be distinguished.
- The destination address may be manipulated to exercise payload sensitive ECMP implementations.

LSP-PING generally assumes PHP at the egress and that any specific LSP binding at the egress point of probe processing may not exist. From the perspective of reliable fault detection this is a minor issue, as the use of a non-routable destination address limits any untested modes of failure. However, this does alter the granularity of useful verification, as probe contents must be checked against the set of FECs associated with the LSR, rather than simply the set specifically associated with the LSP of interest. When testing a label stack for a VPN PE, the number of individual transactions required may be quite large, as the number of FEC elements supported by the PE can be considerable.

LSP-PING permits a label stack. For PW and VPN applications, PHP may be employed by the PE such that PW and VPN labels may not be directly tested (hence the FEC stack, to permit transport or PSN probes to proxy verification for the transported application).

LSP-PING has a traceroute mode that can extract a significant amount of information w.r.t. network configuration, specifically all details of path construction for a given FEC (note that LSP-PING will most likely need to be augmented with authentication and authorization capability in the long term).

Modes of use for LSP-PING are being defined [LSR-TEST] that leverage TTL decrement to bound the scope of any individual test.

14.2 Y.1711 (ITU-T SG13/Q3)

Y.1711 is focused on fault/alarm management and availability measurement for p2p LSPs. The major design objective of Y.1711 as it currently stands is automatic defect detection and handling. A secondary goal is to be able to measure availability.
It trades precision in fault isolation in return for a simplified defect detection/handling capability (frequently referred to as "bounded detection time"). Y.1711 PDUs have a small number of fixed fields in order to minimize parsing and processing overhead.

Message processing is primarily performed at the egress, such that for uni-directional LSPs there is minimal ambiguity in detecting failure. This is also required to take the appropriate consequent actions, e.g. to inform higher layer clients of lower layer failures and thus avoid generating alarm storms in inappropriate places, or to suppress traffic if a security compromise is indicated (i.e. traffic arriving from the wrong source).

Probe processing provides a simple "pass/fail" indication and sufficient information to permit a craftsperson to initiate diagnosis. It is dependent on other tools to perform specific diagnosis and isolation of problems.

Y.1711 is not designed to extract information from the network as to the configuration and layout of network components. It does not currently define any path tracing functionality and only operates on LSP endpoints.

A corollary of the above is that only LSP end points have any role in OAM processing, and the Y.1711 PDUs pass transparently through intermediate nodes.

Y.1711 depends on some degree of ubiquitous deployment at the edge to maximize coverage of fault detection.

Y.1711 is primarily focused on tunnel end points. However, core LSRs may add significant value by implementing a specific subset of Y.1711: FDI generation for p2p LSPs to provide alarm suppression and fault notification to the edge devices when failures in the core occur.
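The edge behavior that FDI generation enables can be sketched with a simplified decision function. The inputs and strings below are assumptions of this sketch, not Y.1711 procedure: the egress is assumed to expect periodic heartbeats, and a core LSR that detects a server layer failure is assumed to inject FDI into each affected LSP.

```python
def egress_defect_state(cv_seen, fdi_seen):
    """Decide what an LSP egress reports for one monitoring interval.

    cv_seen:  a heartbeat (CV) arrived in this interval
    fdi_seen: a forward defect indication arrived in this interval
    """
    if cv_seen:
        return "no defect"
    if fdi_seen:
        # A node closer to the fault has already signalled it: suppress
        # the local loss-of-connectivity alarm and propagate the
        # forwarded indication to higher layer clients instead.
        return "server layer defect (alarm suppressed)"
    # No heartbeat and no FDI: raise the local defect.
    return "loss of connectivity (raise alarm)"

print(egress_defect_state(cv_seen=True,  fdi_seen=False))
print(egress_defect_state(cv_seen=False, fdi_seen=True))
print(egress_defect_state(cv_seen=False, fdi_seen=False))
```

The middle case is the alarm suppression value added by core LSRs: without FDI, every edge device downstream of a core failure would raise its own alarm, producing the alarm storms described above.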
14.2.1 Connectivity Verification (CV) PDU

The CV PDU is used as a heartbeat mechanism to verify connectivity between the LSP ingress and egress. Frequent injection of CV probes is a prerequisite for consistent/deterministic defect detection/handling and availability measurement. Injection of CV probes into LSPs from multiple sources (MP2P, possibly with ECMP) is assumed to result in arrival rates at the LSP egress bursting at line rate.

14.2.2 Fast-Failure-Detection (FFD) PDU

The FFD PDU also provides a heartbeat mechanism similar to the CV PDU, but at a much faster rate. Y.1711 suggests that an LSP can be provisioned with either the CV PDU or the FFD PDU. The CV PDU provides failure detection on the order of 3 seconds, whereas the FFD PDU, when provisioned, can improve the failure detection time to the 100 ms range. The FFD PDU can be selectively provisioned on LSPs requiring fast failure detection.

14.2.3 Forward and Backward Defect Indication (FDI & BDI)

The CV probe is augmented with defect notification PDUs: FDI for the forward direction, and BDI for the reverse direction. These are used for alarm suppression and control of performance measurement functions. BDI has limited applicability given that most LSPs are uni-directional; however, it is very useful for interworking OAM with bi-directional PW clients (e.g. ATM).

14.3 Y.17fec-cv (ITU-T SG13/Q3)

A slightly more sophisticated probe type based upon Y.1711 protocol mechanisms is the Forwarding Equivalence Class Connectivity Verification (FEC-CV) PDU. FEC-CV can carry aggregated LSP information (in the form of a Bloom filter) such that a significant amount of configuration information can be verified in a single transaction. This is generally in the form of FEC information that functions as a functional description of the LSP.
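A minimal sketch of this aggregation (the hash scheme, filter width, and function names are assumptions for illustration, not taken from Y.17fec-cv): the ingress folds every FEC it carries into one filter, and the egress tests whether its own configured FEC is present in the received filter.

```python
import hashlib

FILTER_BITS = 128  # assumed filter width, illustrative only

def bloom_bits(fec, k=3):
    """Map a FEC description to k bit positions (assumed hash scheme)."""
    positions = set()
    for i in range(k):
        digest = hashlib.sha256(f"{i}:{fec}".encode()).digest()
        positions.add(int.from_bytes(digest[:4], "big") % FILTER_BITS)
    return positions

def build_filter(fecs):
    """Ingress side: fold every carried FEC into one Bloom filter."""
    bits = 0
    for fec in fecs:
        for p in bloom_bits(fec):
            bits |= 1 << p
    return bits

def egress_check(received_filter, expected_fec):
    """Egress side: boolean test that a FEC is present in the filter.

    All of the FEC's bits set -> probably the intended LSP; any bit
    clear -> definitely misbranched, since Bloom filters have no
    false negatives."""
    mask = 0
    for p in bloom_bits(expected_fec):
        mask |= 1 << p
    return received_filter & mask == mask

f = build_filter(["10.1.0.0/16", "10.2.0.0/16"])
print(egress_check(f, "10.1.0.0/16"))   # True: a carried FEC always passes
print(egress_check(f, "192.0.2.0/24"))  # False with high probability
```

Because a Bloom filter has no false negatives, a clear bit is a definitive indication of misbranching; the residual false positive probability is governed by the filter width and the number of FECs folded in.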
Simple boolean operations on the Bloom filter at the LSP egress can be used to detect misbranching while being tolerant of inbound filtering and other artifacts of network operations. The PDU can adapt to new applications via the definition of new coding rules for the FEC information, but does not require any changes to the actual PDU processing.

Y.17fec-cv is designed to complement existing link and node failure detection mechanisms by filling a fault detection gap in the MPLS OAM toolset as part of an overall operational framework. Unlike the Y.1711 CV or LSP-PING, it is not a self contained mechanism for detection of all faults or for performing availability assessment.

15. Security Considerations

Support for intra-provider data plane OAM messaging does not introduce any new security concerns to the MPLS architecture. Indeed, it actually addresses some that already exist; i.e., through rigorous defect handling, operators can offer their customers a greater degree of assurance that their traffic will not be misdelivered (for example by being able to detect LSP traffic leaking from a VPN).

Support for inter-provider data plane OAM messaging introduces a number of security concerns, as by definition portions of LSPs will not be in trusted space and the provider has no control over who may inject traffic into the LSP. This creates an opportunity for malicious or poorly behaved users to disrupt network operations. Attempts to introduce filtering on target LSP OAM flows may be problematic if flows are not visible to intermediate LSRs. However, it may be possible to interdict flows on the return path between providers (as faithfulness to the forwarding path is not a return path requirement) to mitigate aspects of this vulnerability.
   OAM tools may permit unauthorized or malicious users to extract
   significant amounts of information about network configuration.
   This is especially true of IP-based tools since, in many network
   configurations, MPLS does not typically extend to untrusted hosts,
   but IP does. This suggests that tools used for problem diagnosis,
   or which by design are capable of extracting significant amounts
   of information, will require authentication and authorization of
   the originator. This may impact the scalability of such tools when
   employed for monitoring instead of diagnosis.

16. A Summary of What Can Be Achieved

   This draft identifies useful MPLS OAM capability that could
   potentially be provided via data plane OAM functions, in
   particular with respect to automatic fault detection and failure
   handling.

   This draft suggests that it may be possible to provide this
   capability for any level in the label stack, either by
   instrumenting that level or by instrumenting an overlay, and it
   provides an overview of the tools available to do so.

   This draft also identifies that many aspects of performance
   management are intractable for some MPLS topological constructs.
   Any type of comparative measurement between the ingress and egress
   of an LSP requires a 1:1 cardinality, or the ability of the egress
   to uniquely determine the ingress for each measured unit of
   communication; LSP merge, PHP, and the possible use of
   per-platform label space at the measured LSP level undermine this.
   Again, a potential solution is to instrument a p2p overlay where
   such detailed measurements are required and otherwise unavailable.

17. References

   [ALLAN] Allan, D., "Guidelines for MPLS Load Balancing",
      draft-allan-mpls-loadbal-05.txt, IETF work in progress,
      October 2003

   [ARCH] Bryant et al.,
      "PWE3 Architecture", draft-ietf-pwe3-arch-06.txt, IETF work in
      progress, October 2003

   [DUBE] Dube, R. and Costa, M., "Bi-directional LSPs for classical
      MPLS", draft-dube-bidirectional-lsp-00.txt, IETF work in
      progress, July 2002

   [HIERARCHY] Lai et al., "Network Hierarchy and Multilayer
      Survivability", draft-ietf-tewg-restore-hierarchy-00.txt, IETF
      work in progress, September 2001

   [ICMP] Bonica et al., "ICMP Extensions for MultiProtocol Label
      Switching", draft-ietf-mpls-icmp-02.txt, IETF work in progress,
      August 2000

   [KOMPELLA] Kompella et al., "MPLS-based Layer 2 VPNs",
      draft-kompella-mpls-l2vpn-02.txt, IETF work in progress,
      December 2000

   [LSP-PING] Pan et al., "Detecting Data Plane Liveliness in MPLS",
      draft-ietf-mpls-lsp-ping-03, IETF work in progress, June 2003

   [LSR-TEST] Swallow et al., "Label Switching Router Self-Test",
      draft-ietf-mpls-lsr-self-test-00.txt, IETF work in progress,
      October 2003

   [MARTINI] Martini et al., "Pseudowire Setup and Maintenance using
      LDP", draft-ietf-pwe3-control-protocol-04.txt, IETF work in
      progress, October 2003

   [MPLSDIFF] Le Faucheur et al., "MPLS Support of Differentiated
      Services", IETF RFC 3270, May 2002

   [MPLSREQS] Nadeau et al., "OAM Requirements for MPLS Networks",
      draft-ietf-mpls-oam-requirements-01.txt, IETF work in progress,
      June 2003

   [2547] Rosen, E. and Rekhter, Y., "BGP/MPLS VPNs", IETF RFC 2547,
      March 1999

   [SWALLOW] Swallow, G.
      and Goguen, R., "RSVP Label Allocation for Backup Tunnels",
      draft-swallow-rsvp-bypass-label-01.txt, IETF work in progress,
      November 2000

   [TTL] Agarwal, P. and Akyol, B., "TTL Processing in MPLS
      Networks", IETF RFC 3443, January 2003

   [VCCV] Nadeau et al., "Pseudo Wire (PW) Virtual Circuit Connection
      Verification (VCCV)", draft-ietf-pwe3-vccv-00.txt, IETF work in
      progress, July 2003

   [Y1710] ITU-T Recommendation Y.1710 (2002), "Requirements for OAM
      Functionality for MPLS Networks"

   [Y1711] ITU-T Recommendation Y.1711 (2002), "OAM Mechanism for
      MPLS Networks"

   [Y17FECCV] ITU-T Draft Recommendation Y.17fec-cv, "Misbranching
      Detection in MPLS Networks", Temporary Document TD25rev1
      (WP3/13), July 2003

18. Editor's Address

   David Allan
   Nortel Networks                 Phone: 1-613-763-6362
   3500 Carling Ave.               Email: dallan@nortelnetworks.com
   Ottawa, Ontario, CANADA