Internet Draft                                     David Allan (editor)
Document: draft-allan-mpls-oam-frmwk-05.txt             Nortel Networks
Category: Informational                                    October 2003
Expires: April 2004

                  A Framework for MPLS Data Plane OAM

Status of this Memo

This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC 2026.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that other
groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

Copyright Notice

Copyright (C) The Internet Society (2003). All Rights Reserved.

Abstract

This Internet draft discusses many of the issues associated with data
plane OAM for MPLS. The goal is to provide a comprehensive framework
for developing tools capable of performing "in service" maintenance
of LSPs. This discussion includes some of the implications of the
MPLS architecture for the ability to support fault, diagnostic and
performance management OAM applications, a summary of currently
specified OAM mechanisms, and a framework whereby this MPLS-OAM
toolset can collectively address all aspects of the MPLS
architecture.

Sub-IP ID Summary

(This section to be removed before publication.)

WHERE DOES IT FIT IN THE PICTURE OF THE SUB-IP WORK

Fits in the MPLS box.
WHY IS IT TARGETED AT THIS WG

The MPLS WG has added requirements, framework and mechanisms for OAM
to its charter. This draft is a candidate framework document.

JUSTIFICATION

The WG should consider this document, as it discusses the design
aspects of error detection and measurement for packet based MPLS
LSPs.

Table of Contents

1. Conventions used in this document
2. Changes since the last version (to be removed on publication)
3. Contributors
4. Requirements
5. Domain Concepts
6. OAM Applications
7. Deployment Scenarios
8. MPLS architecture implications for OAM
8.1 Topology variations within an MPLS level
8.1.1 Implications for Fault Management
8.1.2 Implications for Performance Management
8.2 LSP Creation Method
8.3 Lack of Fixed Hierarchy
8.4 Use of time to live (TTL)
8.5 State Association
8.6 Alarm Management
8.7 Other Design Issues
9. Ease of Implementation
10. OAM Messaging
11. Distinguishing OAM data plane flows
11.1 RFC 3429 "OAM Alert Label"
11.2 VCCV
11.3 PW PID
12. The OAM Return Path
13. Use of Hierarchy to Simplify OAM
14. Current Tools and Applicability
14.1 LSP-PING (MPLS WG)
14.2 Y.1711 (ITU-T SG13/Q3)
14.2.1 Connectivity Verification (CV) PDU
14.2.2 Fast-Failure-Detection (FFD) PDU
14.2.3 Forward and Backward Defect Indication (FDI & BDI)
14.3 Y.17fec-cv (ITU-T SG13/Q3)
15. Security Considerations
16. A summary of what can be achieved
17. References
18. Editor's Address

1. Conventions used in this document

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [1].

The term MPLS "level" nominally refers to the MPLS stack level
inclusive of reserved labels. In this document the term "level" is
used exclusive of reserved labels; the term "level" is therefore more
precisely analogous to a specific MPLS subnetwork layer instance.

2. Changes since the last version (to be removed on publication)

Section 11 recast from being a discussion of potential mechanisms to
being a survey of the defined mechanisms.
Section 14 added, which provides a survey of MPLS OAM mechanisms
defined in both the IETF and the ITU-T.

Reference to the [CHANG] draft and the discussion of the reverse
notification tree removed.

Reference to [HEINANEN] on directory based LDP VPNs removed.

References to [HARRISON-REQ] and [HARRISON-MECH] replaced with
Y.1710 and Y.1711 respectively.

[MARTINI] reference updated.

3. Contributors

Mina Azad
Azad-Mohtaj Consulting            Phone: 1-613-722-0878
Ottawa, Ontario, CANADA           Email: mohtaj@rogers.com

Jerry Ash
AT&T
Room D5-2A01
200 Laurel Avenue                 Phone: +1 732-420-4578
Middletown, NJ 07748, USA         Email: gash@att.com

Neil Harrison
BT Global Services                Email: neil.2.Harrison@bt.com

Sanford Goldfless
192 Fuller St                     Phone: 617-738-1754
Brookline MA 02446                Email: sandy9@rcn.com

Eric Davalo
Maple Optical Systems
3200 North First Street           Phone: 408 545 3110
San Jose CA 95134                 Email: edavalo@mapleoptical.com

Arun Punj
Marconi Communications
1000 Marconi Drive
Warrendale, PA 15086              Email: Arun.Punj@marconi.com

Marcus Brunner
Network Laboratories - NEC Europe Ltd.
Adenauerplatz 6                   Phone: +49 (0)6221/9051129
D-69115 Heidelberg, Germany       Email: brunner@ccrle.nec.de

Chou Lan Pok
SBC Technology Resources, Inc.
4698 Willow Road                  Phone: +1 925-598-1229
Pleasanton, CA 94583              Email: pok@tri.sbc.com

Wesam Alanqar
Sprint
9300 Metcalf Ave                  Phone: +1-913-534-5623
Overland Park, KS 66212           Email: wesam.alanqar@mail.sprint.com

M. Akber Qureshi
Lucent Technologies
101 Crawfords Corner Road         Phone: +1 732 949 4791
Holmdel, NJ 07733                 Email: mqureshi@lucent.com

Don Fedyk
Nortel Networks
600 Technology Park               Phone: +1 978 288 3041
Billerica MA 01821                Email: dwfedyk@nortelnetworks.com

4. Requirements

MPLS data-plane OAM specific requirements, and a summary of
requirements that have appeared in numerous PPVPN, PWE3, and MPLS
documents, appear in [Y1710] and [MPLSREQS]. This Internet draft
discusses the implications of extending OAM across the MPLS
architecture, and adds data-plane OAM requirements and capabilities
for managing multi-provider networks. This document also broadens the
scope of the requirements discussion by identifying where certain OAM
applications simply cannot be implemented without modifications to
current practice/architecture.

Finally, this draft offers a survey of tools that are currently
standardized or about to be standardized.

5. Domain Concepts

MPLS introduces a richness in layering which renders traditional
definitions of 'domain' inadequate. In particular, it is noted that
MPLS has no fixed layered hierarchy (a unique property that no other
technology has offered before).

A provider may have MPLS peer providers, use MPLS transit from
serving providers (and require MPLS or non-MPLS client transport),
and offer MPLS transit to MPLS or non-MPLS clients. Further, the same
provider may use a hierarchy of LSPs within their own network. This
Internet Draft defines the concept of an "Operations Domain" (to
cover OAM capabilities operated by a single provider) that may only
be a portion of the end-to-end LSP. Operations Domain functions are
an interdependent mix of control-plane, data-plane (a.k.a.
user-plane), and management-plane functions.

An LSP "of level m" may span numerous Operational Domains. The data
plane of the LSP is a contiguous entity consisting of the data plane
portions of the operational domains traversed. The control and
management planes of these operational domains may be disjoint.
The goal is to provide OAM functionality for each LSP independent of
the LSP creation mechanism or payload.

It is possible to have a hierarchy of operators (e.g. carriers of
carriers), where overlay Operational Domains are opaque to the
serving Domain. Therefore it is required that each LSP Operational
Domain implement its own OAM functionality, and that the OAM
applications be confined to the Operational Domains traversed at
level "m".

Note that this concept has subtle differences from the concepts of
horizontal and vertical hierarchy as defined in [HIERARCHY]. Vertical
hierarchy usually refers to networking layer boundaries distinguished
by technology. An operational domain may refer to an operator
specific hierarchical subset of the LSP levels within the MPLS
network and/or a horizontal partitioning within a specific LSP.
Similarly, there is a further way to consider the concepts of
operational domain and horizontal hierarchy: an operational domain
may be hierarchically partitioned (e.g. OSPF "areas") but
operationally integrated and contiguous.

6. OAM Applications

The purpose of having data plane LSP specific OAM transactions is to
support useful OAM applications. Examples of such applications
include:

Fault management

- On demand verification: the ability to perform connectivity tests
that exercise the specific LSP and the provisioning at the ingress
and egress. "On demand" suggests that verification may be performed
on an ad-hoc basis.

- Fault detection: Operators cannot expect customers to act as fault
detectors, and so the ability to perform automated detection of the
failure of a specific LSP is a "must have" feature (although when one
reviews the section on LSP creation below, one realizes it will not
be ubiquitously used).
Some MPLS deployment scenarios may not have a control plane, or may
have LSP processing components not in common with the control plane,
so fault detection procedures may need to be augmented with LSP
specific methods.

- Fault sectionalization: The ability to efficiently determine where
a failure has occurred in an LSP. It must be possible to perform
sectionalization from an arbitrary LSR along the path of the LSP.

- Fault propagation: Specific MPLS deployment scenarios may not have
a control plane to propagate LSP failure information. Fault
propagation has numerous forms, and there are variations depending on
whether the failure is in the serving layer/level and on where the
notification propagates:
i) Northbound from the failed level to the management plane.
ii) Within the failed level.
iii) From the failed level to its clients.
iv) Within the client level to the LSP ingress and egress, either
via the user or control planes.
In all cases it is the termination of a layer that performs the
function.

Performance management

- The ability to determine whether an LSP meets certain goals with
respect to latency, packet loss, etc.
- The ability to collect information to facilitate network
engineering decisions.

Of the above applications, verification, detection and
sectionalization explicitly need to exercise all components of the
forwarding path of the target LSP, otherwise there will be failure
scenarios that cannot be detected or properly sectionalized. These
applications cannot be supported properly if there are differences
in handling between user traffic and OAM probes at intermediate
LSRs.

A separate and useful classification of the applications outlined
above is the distinction between monitoring applications and
diagnosis.
Monitoring applications are typically unattended in operation; they
collect operational statistics and, upon detection of problems, must
provide sufficient information to permit precise diagnosis of the
problem and frequently some form of automated network response.
Diagnosis applications are typically attended in operation and must
be able to authoritatively locate and isolate faults. The security
implications of this distinction are discussed in the Security
Considerations section.

7. Deployment Scenarios

At the present time there are a number of MPLS deployment scenarios,
each with a number of subtleties from a data plane OAM perspective.
Each can be viewed as a characteristic of an operational domain:

The sparse model: This can be in conjunction with control plane
signaling (e.g. MPLS based traffic engineering applied to an IP
network) or with simply provisioned LSPs (no control plane
signaling). The key feature is that the MPLS operational domain will
not have any-to-any connectivity at the MPLS layer, due to the sparse
use of LSPs to augment the served layer connectivity. This has
operational and scalability implications, as OAM connectivity must be
explicitly added to the model, or the operator may be obliged to
depend on "layer violations" embedded in OAM mechanisms which are
strictly only relevant to a different, higher layer network (e.g.
[ICMP]) to generate a return path.

The ubiquitous model: This model generally combines MPLS, integrated
routing and control to produce universal any-to-any connectivity
within an operational domain. This may be combined with a hierarchy
of LSPs to modify the topology presented to the client layer.
This offers providers the option of utilizing the resources inherent
to all planes of the Operational Domain in designing OAM
functionality.

These two models of MPLS connectivity can be stacked or concatenated
to support numerous configurations of peering and overlay networking
arrangements between providers and users. A direct inference is that
an operational domain will not necessarily have knowledge of the
domains above and/or below it, and in the general case far less
knowledge of (and certainly less control over) its peer domains. OAM
applications for LSPs of a specific level are confined to an
operational domain and its data plane peers.

More recently there is a tendency to overlay an L2 or L3 VPN service
level on the data plane of an operational domain, with its own
identifiers and addressing, while tunneling control information
across the control plane of the operational domain using BGP-4
[2547][KOMPELLA] or extended LDP discovery [MARTINI]. From a data
plane OAM perspective, we would consider this to be a separate
operational domain, and anticipate that it is only a matter of time
before such service levels evolve to span multiple operational
domains (for example, an L2 or L3 VPN that spans multiple providers,
or the introduction of tandem points at the data plane of the service
level).

8. MPLS architecture implications for OAM

8.1 Topology variations within an MPLS level

There are a number of topology variations in the MPLS architecture
that have OAM implications. These are:

- Uni-directional and bi-directional LSPs. A uni-directional LSP only
provides connectivity in one direction; if return path connectivity
exists, it is an attribute of the operational domain (e.g. signaling,
management or client layers), and not a unique attribute of the LSP.
Bi-directional LSPs, or LSPs with a specific return path (e.g.
[DUBE]), have inherent symmetrical connectivity as an attribute of
the LSP.

- Multipoint-to-point (mp2p) LSPs are where a single LSP uses "merge"
LSR transfer functions to provide connectivity between multiple
ingress LSRs and one egress LSR (sufficient information being present
in the payload to permit higher layer demultiplexing at the egress).
There are a number of problems inherent to mp2p topological
constructs that cannot be addressed by traditional p2p mechanisms.
One issue is that for some OAM applications (e.g. data plane fault
propagation), OAM flows may require visibility at merge points to
limit the impact of partial failures or congestion.

"Best effort" mp2p LSPs may have fairness issues with some packet
schedulers. This may complicate obtaining consistent measurements
under congestion conditions. Explicitly routed mp2p LSPs with
associated resource reservations are significantly more complex to
engineer. The resource reservations required will be cumulative at
merge points (as will jitter), and the ability to provide
differentiated handling for specific ingresses is lost once any merge
point is crossed. One opinion is that the complexity and difficulty
of configuring and maintaining ER-mp2p LSPs significantly outweighs
the scalability benefits, and that they would not likely be deployed.

- Penultimate Hop Popping (PHP) is an optimization in the
architecture in which the last LSR prior to the egress removes the
now-redundant current MPLS label from the label stack. The ability to
infer LSP specific context (OAM and other) is therefore lost prior to
reaching the final destination.

MPLS does not provide for protocol multiplexing via payload
identification (with the exception of the explicit IPv4 and IPv6
labels).
PHP requires that the final hop have a common protocol payload
(typically IP) or be able to map to a lower layer protocol
multiplexing capability (e.g. the PPP Protocol Field or Ethernet
ethertype), as the ability to infer the payload type from the LSP
label is lost.

Another scenario where PHP is employed is when the egress LSR is not
actually MPLS data plane capable. This has data plane OAM
implications in that any MPLS specific flows need to terminate at the
PHP LSR. This requires that the PHP LSR proxy OAM functions on behalf
of the egress LSR, which will introduce complexity when any type of
consequent action, such as layer interworking of fault notification,
is required.

- E-LSPs [MPLSDIFF], in which a single LSP supports multiple queuing
disciplines to support multiple QoS behavior aggregates. The ability
to perform OAM performance functions on a "per behavior aggregate"
basis is critical to managing E-LSPs.

- Management plane provisioned LSPs vs. control plane signaled LSPs.
In many scenarios associated with a control plane, the topology of
the LSP varies over time. This can be for many reasons: implicit
routing, dynamic set up of local repair tunnels, etc.

- The potential existence of multiple LSPs between an ingress and an
egress LSR. This can be for many reasons: L-LSPs, equal cost
multipath routing, etc.

- The potential existence of multiple next hop label forwarding
entries (NHLFEs) for a single incoming label. This is the scenario
whereby the incoming label map (ILM) for an incoming label switch hop
(LSH) maps to an inverse multiplex of NHLFEs, which may be re-merged
into a common egress or have multiple egress points. The mechanism
for selecting the NHLFE to use may be proprietary and is performed on
a packet by packet basis.
Some implementations hash both the label stack and any IP payload
source and destination addresses in order to preserve flow ordering
while achieving good fan out. However, this means that the
predictability of any nested LSPs degrades in the presence of
problems.

OAM tools not specifically aware of this construct may produce random
results (insufficient frequency of failure to trigger threshold
detection) or, pathologically, may only test a subset of the NHLFEs,
impacting both the detection and diagnosis of defects. Similarly,
performance monitoring is impacted, as packets in flight cannot
accurately be accounted for. The ramifications are comprehensively
discussed in [ALLAN].

- Use of per-platform label space. A per-platform label has
significance at a nodal level and not just an interface level. Some
of the more interesting applications include the ability to create
unsignalled facility backup LSPs in "bypass tunnels" [SWALLOW].
Traffic arriving on multiple interfaces and/or LSP tunnels may use a
common per-platform label and will have a common ILM and NHLFEs. This
can have implications similar to mp2p and PHP depending on how it is
used; LSP origin information is not conserved when multiple sources
use a common label.

- p2mp and mp2mp LSPs (a.k.a. MPLS Multicast) are for further study.
At the present time, what placeholders exist in the architecture for
multicast treat it as a separate protocol from "unicast" MPLS (with
the exception of ATM variations of MPLS).

These topological variations introduce complexity when attempting to
instrument OAM applications within a specific MPLS level, such as
performance management, fault detection, fault isolation/diagnosis,
fault handling (e.g. consequent actions taken to avoid raising
unnecessary alarms in client layers) and fault notification.
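The flow-hash NHLFE selection described above can be sketched as
follows. This is a hypothetical illustration only; real
implementations use proprietary, often hardware-specific, hash
functions and inputs, and the function and field names here are
assumptions:

```python
import hashlib

def select_nhlfe(label_stack, src_ip, dst_ip, nhlfes):
    """Illustrative flow-hash NHLFE selection: hash the label stack
    together with the IP source/destination addresses, so that all
    packets of one flow take the same branch (preserving ordering)
    while distinct flows fan out across the available NHLFEs."""
    # MPLS labels are 20-bit values; pack each into 3 bytes.
    key = b"".join(label.to_bytes(3, "big") for label in label_stack)
    key += src_ip.encode() + dst_ip.encode()
    digest = hashlib.sha256(key).digest()
    # Reduce the digest to an index into the NHLFE set.
    return nhlfes[int.from_bytes(digest[:4], "big") % len(nhlfes)]
```

Because the selection is deterministic per flow but opaque to an
outside observer, an OAM probe that does not reproduce the label
stack and addresses of the monitored traffic may consistently
exercise only one of the NHLFEs.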
8.1.1 Implications for Fault Management

mp2p, E-LSPs and PHP have implications for fault management,
specifically if an LSR is required to have knowledge of both the
ingress LSR and the specific LSP that an OAM message arrived on, or
is expected to have knowledge of, and maintain state about, the set
of ingress LSRs for an LSP. OAM messaging needs mechanisms to
distinguish both the ingress LSR and the specific LSP. (This ability
is expressed in these terms because LSPs are typically not given
globally unique identifiers; more frequently some locally
administered LSR-ID is used.)

Connectivity verification requires testing of connectivity between
all possible ingress/egress combinations. Frequently it will not be
possible to infer the ingress LSR and specific LSP directly at the
egress, as such information may be lost at merge points in mp2p LSPs
or due to PHP. This is true for both OAM messaging and normal data
plane payloads. There may be numerous reasons why an ingress-egress
pair has a plurality of LSPs between them, so the ability to
distinguish the source and purpose of specific probes, beyond mere
knowledge of the originating LSR, is required.

The ability to distinguish the ingress can be achieved by modifying
the OAM protocol to carry such information, or by modifications to
operational procedures such as overlaying p2p connectivity.

8.1.2 Implications for Performance Management

Many performance management functions can be performed by obtaining
and comparing measurements taken at different points in the network.
Comparing ingress and egress statistics is the simplest example (but
is usually restricted to a single domain). The key issue is ensuring
that an "apples-to-apples" comparison of measurements is possible.
This means that all measurement points need to be able to similarly
classify the traffic and performance they are measuring, and that the
measurements are synchronized in time and compensate for traffic in
flight between the measurement points.

For example, a relatively simple technique for establishing key
performance metrics is to compare what was sent with what was
received. In the PPP line quality monitoring (LQM) function, the
ingress periodically sends statistics to the egress for comparison,
subject to the same queuing discipline as the data plane traffic,
such that traffic in flight is properly accounted for. (Note that
re-ordering will introduce errors but is not expected to be
frequent.)

It is important to distinguish, and be able to measure, what
constitutes the up and down states of an LSP. This needs to be
standardized so that there is unified treatment. A key observation
here is that QoS metrics (like loss, errored packets, delay, etc.)
are only relevant when the LSP is in the up-state, and so any
collection of QoS measurements is suspended when the LSP enters the
down-state. This requires specification of the state transitions to
achieve measurement consistency, and is a pre-requisite to QoS
assessment. This is particularly important to operators, since
customers will expect operators to be able to offer both QoS and
availability SLAs, and so these must be differentiated and uniquely
measurable.

A simple ingress/egress comparison is not always possible; there may
be no ability to similarly classify what is being measured at the
ingress and egress of an LSP. mp2p LSPs and PHP do not have a 1:1
relationship between the ingress and the egress.
LSPs containing ILMs that map to multiple NHLFEs introduce
measurement inaccuracy, as not all packets share a common queuing
discipline; where this results in multiple egress points from the
network, there is an inability to synchronize measurements. Partial
failure of an mp2p LSP (incl. ECMP) will result in the inability to
successfully collect statistics.

So, in addition to having to define up/down-state transitions, for
successful PM the 1:1 relationship needs to be restored by either:

- Modeling the mp2p/PHP LSP as one LSP for measurement. This means
that measurements performed at ingress points need to be synchronized
and adjusted for common LSP segments such that the results are all
presented to the egress simultaneously (again correcting for traffic
in flight). A technique dependent on such a high degree of
synchronization would be impossible to perfect, and prone to a degree
of error.

- Modeling the mp2p/PHP LSP as a collection of "ingress" LSPs for
measurement. This means that the egress needs to be able to maintain
statistics by ingress and appropriately classify traffic
measurements.

Neither of the above is achievable at the present time without
modifying existing operational procedures. The first approach
involves treating the mp2p/PHP LSP as an aggregate, and as such it
can partially fail and degrade. This complicates the establishment of
performance metrics and the specification of recovery procedures on
errors.

The second approach requires decomposing the mp2p/PHP LSP such that
both payload and OAM traffic can be demultiplexed at the egress and
correctly associated with "per-ingress" state. The ability to
demultiplex both OAM and payload implies a common wrapper, and the
net effect would be to overlay p2p connectivity on top of the
merge/PHP based transport level.
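The second approach can be sketched as follows, modeled loosely on
the PPP LQM comparison described earlier. This is an illustrative
sketch only: the names are assumptions, and it ignores time
synchronization and traffic still in flight at the measurement
instant:

```python
class PerIngressEgressCounters:
    """Illustrative per-ingress accounting at the egress LSR. Traffic
    must be demultiplexed by ingress so that the count an ingress
    reports having sent can be compared against the count observed
    locally for that ingress alone."""

    def __init__(self):
        self.received = {}  # ingress LSR-ID -> packets observed

    def count_packet(self, ingress_id):
        """Called for each arriving packet once its ingress is known."""
        self.received[ingress_id] = self.received.get(ingress_id, 0) + 1

    def loss(self, ingress_id, reported_sent):
        """Packets lost between one ingress and this egress, given the
        transmit count reported by that ingress for the interval."""
        return reported_sent - self.received.get(ingress_id, 0)
```

Without the common wrapper discussed above, the `count_packet` step
has nothing to demultiplex on, which is exactly why mp2p and PHP
defeat this form of measurement today.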
The existence of E-LSPs adds a wrinkle to the problem of measurement synchronization. An E-LSP may implement multiple diffserv PHBs and incorporate multiple queuing disciplines. An aggregate measurement for the entire LSP sent from ingress to egress would frequently have a small margin of error when compared with an aggregate measurement taken at the egress. Separate measurement comparisons for each supported EXP code point would be required to eliminate the error.

The situation is slightly different for p2p LSPs containing ILMs that map to multiple NHLFEs. If all the NHLFEs are merged back into a single entity prior to the egress, there will inherently be a degree of measurement error that modifications to operational procedure cannot correct. However, there is no guarantee that this will be the case, and any individual ingress measurement may be compared with only one of several egress measurement points (either random or pathological).

8.2 LSP Creation Method

The ability to usefully audit the constituent components of an LSP is dependent on the technique used to create the LSP. Presently defined are provisioning, LDP, CR-LDP, RSVP-TE, and BGP.

LSP creation techniques that are currently defined fall along a spectrum:

At one extreme are explicitly routed point-to-point connections between fixed ingress and egress points in the network. Explicitly routed (ER) LSPs (today created via provisioning, CR-LDP, RSVP-TE or BGP) have a significant degree of testability, as the path across the network and the egress point are fixed and knowable to a testing entity. Similarly, explicit pairwise and stateful testing/measurement relationships can be set up (e.g. connectivity verification) and strict criteria for failure established.
In the middle are static mp2p constructs typically signaled via BGP (e.g. RFC 2547).

At the other extreme is LSP construction that is topology driven (such as dynamic "shortest path first" routing combined with LDP), whereby the details of path construction between the ingress and egress points in the network will vary over time and may involve several stages of multiplexing with traffic from other sources. The details of path construction at any given instant are not necessarily knowable to an auditing entity, so any attempt to interpret the results of an audit may generate spurious results. Further, the MPLS network may only be a portion of the operational domain, and the egress point from the network for an FEC may vary over time.

The testable unit in an LDP network is the FEC, not the LSP, and the potential existence of a many-to-many relationship of ingress and egress points limits the testability of the FEC, or at least may limit the frequency of using such tests.

The connectivity instantiated in a specific LSP created by a topology driven control plane signaling mechanism will recover from many defects in the network. The quality of recovery is typically a function of how the network is engineered.

Problems are typically detected by having MPLS connectivity fate share with the constituent physical links and routing adjacencies, and topology driven path re-arrangement will restore the connectivity (with some interruption and other side effects occurring between the initial failure and re-convergence of the network). However, exclusive dependence on fate sharing for failure detection means that LSP components may have unique failure modes from which the network will not recover and which can only be diagnosed reactively.
As can be inferred from the above, what is required for topology driven LSPs is a test mechanism that audits forwarding policy, as this is the metric by which some aspects of network performance can be measured.

8.3 Lack of Fixed Hierarchy

MPLS supports an arbitrary hierarchy in the form of label stacking. This is a facility that can be leveraged for OAM purposes. As an example, the section on implications for performance management has already outlined how p2p topology for PM can be overlaid on an arbitrary merged topology to add manageability of services. Similarly, functions requiring sectionalization of an LSP, or the ability to isolate partial failure of a complex construct, can be achieved by constructing the LSP as an overlay upon a concatenation of operationally significant shorter LSPs. By operationally significant we refer to LSPs that span useful portions of the whole construct (e.g. a branch of an mp2p LSP, or bypassed LSRs that do not have OAM capability).

This could simplify the instrumentation of level specific OAM by ensuring only e2e functions were required (as opposed to functions originating or terminating at arbitrary points in the network), while driving up the complexity of LSP establishment due to the resultant inter-level configuration issues when creating multi-level constructs with the desired manageability.

8.4 Use of Time To Live (TTL)

Experience within the IP world has suggested that TTL was a serendipitous feature that can be similarly leveraged by MPLS.

However, in the MPLS world TTL suffers from inconsistent implementation depending on the link layer technology spanned by the target LSP. The existence of non-TTL capable links (e.g. MPLS/ATM) has an impact on the utility of using TTL to augment the MPLS OAM toolkit.
For example, use of TTL as an aid in fault sectionalization can only isolate a fault to the granularity of a non-TTL capable span of LSH or LSP segments.

There are other variations in TTL handling that suggest interpreting the results of TTL based tests may be problematic. As outlined in [TTL] there are two models of TTL handling with different implications:

- the uniform model, in which decrement of TTL is independent of the MPLS level. At the ingress point to an MPLS level, the current TTL is copied into the new top label, and at egress it is copied back to the revealed top level.

- the pipe and short pipe models, whereby MPLS tunnels (aka LSPs) are used to hide the intermediate MPLS nodes between LSP ingress and egress from a TTL perspective.

The uniform model originates with preserving IP TTL semantics when IP traffic transits an MPLS subnetwork. The uniform model will reduce the resource consumption of routing loops, but in a correctly operating network may lead to premature discard of packets outside the operational domain they originated from (due to the existence of an arbitrary number of serving MPLS levels). Similarly, when a routing loop occurs, determining the MPLS level that is the source of the problem will be difficult, as there is no method to correlate it with the level where the exhaustion event occurred.

The pipe model is more consistent with the operational domain model in that TTL exhaustion will only occur at a specified level, and the initial values used at LSP ingress are more likely to reflect what would genuinely constitute a routing loop.

A reasonable expectation is that the uniform model would not be used outside of an operational domain.

A separate issue is that an LSR may decrement TTL by an amount other than one as a matter of policy.
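The two models of [TTL] can be contrasted in a short sketch (the initial TTL value, the stack representation, and the function names are assumptions for illustration only):

```python
INITIAL_TTL = 255  # assumed value set by a tunnel ingress in this sketch

def push_label(stack, label, model):
    """Push a tunnel label onto a label stack (a list of [label, ttl])."""
    if model == "uniform":
        # Uniform model: copy the current top TTL into the new label,
        # so decrements remain visible across MPLS levels.
        ttl = stack[-1][1] if stack else INITIAL_TTL
    else:
        # Pipe/short-pipe models: the tunnel hides its hops; start fresh.
        ttl = INITIAL_TTL
    stack.append([label, ttl])

def pop_label(stack, model):
    """Pop the top label at the tunnel egress."""
    label, ttl = stack.pop()
    if model == "uniform" and stack:
        # Uniform model: copy the (decremented) TTL back down.
        stack[-1][1] = ttl
    # Pipe models: the revealed label keeps its own TTL untouched.
    return label

# A 4-hop tunnel carried over a client label whose TTL starts at 64:
for model in ("uniform", "pipe"):
    stack = [[100, 64]]          # client level
    push_label(stack, 200, model)
    for _ in range(4):           # each LSR decrements the top TTL
        stack[-1][1] -= 1
    pop_label(stack, model)
    print(model, stack[-1][1])   # uniform -> 60, pipe -> 64
```

Under the pipe models the tunnel's hop count is invisible to the client level, while the uniform model exposes every hop; in either case, tools based on TTL exhaustion must know which model (and what decrement policy) is in effect.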
This means that the results obtained via any tools that use TTL exhaustion will require some interpretation.

8.5 State Association

The design of OAM flows in MPLS levels that multiplex traffic from multiple sources together may introduce implementation complexity where the flows are processed. The receiver of the OAM message will need to extract information from the packet to identify the LSP and associate it with ingress and LSP specific state. If the ingress/LSP identifier in the packet is not administered by the processing node, it will be unable to optimize the implementation of the state association mechanism and will be required to perform some sort of table search.

If the identifier is administered by the processing node and that node is not the originator of the probe, some mechanism will be required to distribute this information uniquely to each probe originator.

8.6 Alarm Management

MPLS permits layers of different operational behaviors to recurse. When the alarm management paradigms differ they may not be reconcilable. For example, an LDP network has no ability to perform alarm suppression directly within the data plane for e2e tools, whether used within the LDP layer or overlaid on an LDP layer, that are impacted by a failure. The LDP network will recover, but the node that could report the failure may not directly participate in the recovery; therefore data plane alarm suppression mechanisms cannot be synchronized with service restoration.

8.7 Other Design Issues

It is desirable to make data plane OAM implementations independent of LSP specifics: common mechanisms across p2p and mp2p LSPs, PHP or no-PHP, and independent of payload and the method of LSP creation, in order to minimize overall complexity.
The OAM application originator should not need (as far as is practical) any knowledge of the details of LSP construction.

PM requirements may mean that instrumentation of many OAM applications is only possible for p2p LSPs, and therefore would only be possible for a select group of MPLS levels (e.g. overlaid service labels as per [KOMPELLA] or [MARTINI]).

Fault management must be applicable across the spectrum of all label levels and LSR transfer functions.

Finally, the possibility of re-ordering of OAM messaging must be considered. The design of OAM applications and messaging must be tolerant of out of order delivery and some degree of packet loss. For some applications the originator/termination will require a means to uniquely correlate requests with probe responses (including responses to mis-directed probes) or to verify in-sequence receipt.

9. Ease of Implementation

Complex functions typically require software implementation and are not capable of handling line rate messaging. Implementations defend themselves via rate-limiting or similar load management techniques to avoid vulnerabilities to DoS attacks or simple mis-use by incompetent craftspersons. In many cases, the complexity of adding strong authentication as a defense against DoS attacks may be less onerous than promiscuous processing of complex probes.

Probes supporting monitoring applications gain the most benefit when they can run at line rate, such that there are no concerns about processing capacity at the processing network elements. Such tests will generate predictable results (or at least not have results degraded when network elements are under stress) and automated procedures can be designed around such mechanisms.
MP2P LSPs are an exemplary case, where egress processing of probes may be required to support probes from an arbitrary number of unsynchronized sources.

Messaging mechanisms to perform diagnostic tests (once a fault has been authoritatively established) tend to be more complex and software intensive. Diagnostic tests are frequently used by craftspersons, and can be more tolerant of things like discard due to rate limiting.

10. OAM Messaging

OAM should be decoupled from user behavior to ensure consistent OAM functional behavior (under any traffic conditions) and avoid the use of customers as guinea pigs.

At the specific LSP level, support of OAM applications requires messages that flow between three entities: the LSP ingress, the intervening network, and the LSP egress. As an LSP is unidirectional, it should be self evident that OAM applications that require feedback in the reverse direction will have such communication occur either at the specific LSP level, or at some data plane LSP level in the operational domain, or in one of the other planes (control or management) of the operational domain.

The set of possible individual transactions (plus examples of their utility) is as follows:

LSP specific data-plane transactions:
- ingress to egress
  Applicability: verification, fault detection, performance management.
- ingress to network
  The message will terminate at an intermediate LSR traversed by the LSP.
  Applicability: sectionalization from the source.
- network to egress
  The message is inserted into the LSP at an intermediate node and terminates at the LSP egress LSR.
  Applicability: sectionalization from an arbitrary point in an LSP.
- network to network
  Applicability: sectionalization from an arbitrary point in an LSP.

Feedback transactions:
- egress to ingress
  Applicability: verification, fault detection.
- egress to network
  The flow originates at the LSP egress and terminates at an intermediate node traversed by the LSP.
  Applicability: sectionalization from an arbitrary point in an LSP.
- network to ingress
  The flow will originate at an intermediate LSR traversed by the LSP and terminate at the LSP source.
  Applicability: sectionalization from the ingress.
- network to network
  Applicability: sectionalization from an arbitrary point in an LSP.

11. Distinguishing OAM data plane flows

MPLS provides several mechanisms for distinguishing OAM data plane flows.

11.1 RFC 3429 "OAM Alert Label"

RFC 3429 [3429] defines the OAM alert label, which identifies that the payload is a Y.1711 PDU. The OAM alert label may be used for p2p LSPs that do not encounter lower layer ECMP, and for Y.17fec-cv PDUs.

11.2 VCCV

[VCCV] provides procedures for PEs to negotiate an OAM protocol to be multiplexed with payload over a PW, and defines a bit in the PW header which indicates when the PW PDU contains OAM flows or payload flows. The purpose is to carry IP based OAM protocols (LSP-PING, ICMP, etc.) opaque to any ECMP mechanisms.

11.3 PW PID

[ARCH] defines a PW PID which permits OAM protocols to be multiplexed with a PW in a form whereby they self identify to the far end PE. This can be used to transport Y.1711 or Y.17fec-cv PDUs opaquely over an ECMP infrastructure such that they properly fate share with the PW.

12. The OAM Return Path

The ability to use OAM applications such as single-ended monitoring of both directions from one end, or to support applications such as protection switching in a 1/N:M case, requires a return path to the LSP ingress. This enhances the scalability and reliability of some OAM applications, as data plane OAM can function as a closed system.
A specific example is the use of a loopback, where the only place state and timing need be maintained is at the loopback originator.

This requires a return path to complete the loop between the "target LSP" and the OAM application originator. This will permit reliable transaction flows to be implemented that impose minimal state on the network.

For the few OAM applications that require a return path, the return path can be tolerant of being topologically disjoint with the target LSP (providing the differential delays are small, i.e. <<1s); reachability of the application originator is the only hard requirement. Similarly, different OAM applications will have different return path requirements, and a hybrid of using all the planes of the operational domain (according to the application) may be significantly simpler and more operationally tractable than significant modifications to current usage to fill in connectivity gaps at the specific label level.

This is a key point: LSPs are currently by definition uni-directional (bi-directional to date being a construct of multiple uni-directional LSPs), so for any non-ubiquitous deployment of MPLS connectivity, some modification of operational procedure to provide for OAM messaging will be required for the few applications that need it. Strict symmetry of connectivity at a specific label level is not guaranteed.

In any type of sparse usage scenario (e.g. provisioned LSPs or use exclusively for TE) there will not be inherent any-to-any connectivity in the data plane, and there may not be a control plane signaling system.

In an implicit MPLS topology (e.g. LDP DU), any-to-any connectivity will typically exist, or will be easily available with minor alterations to operational procedure (LSRs advertise themselves as FECs).
This would continue to be true for an integrated model in which TE and an implicit topology were combined.

In any type of multi-provider MPLS topology the scenario is more complex, as for numerous reasons a provider may not wish to provision/advertise external connectivity to their LSRs. Similarly, for security reasons, providers may wish to apply some degree of policy or filtering of OAM traffic at operational domain boundaries.

Data plane OAM messaging should be designed to leverage as much "free connectivity" as can be obtained in the network, while ensuring the constructs have sufficient extensibility to ensure the corner cases are covered.

Within the operational domain of a single provider, it is relatively easy to envision that a combination of data plane and control plane functionality will ensure that a data plane return path is frequently available (although it may be topologically disjoint from the target LSP). This is less so for inter-provider scenarios. Here there are a number of potential obstacles, such as:
- disjoint control plane
- disjoint addressing plan
- requirements for policy enforcement and security
- impacts to scalability of ubiquitous visibility of individual LSRs across multiple operational domains.

There are a number of approaches to providing inter-domain OAM connectivity; the following is a brief commentary on each:

1) Reverse Notification Tree (a.k.a. using a bi-directional LSP)
In this method, each LSP has a dedicated reverse path, i.e. the reverse path is established and associated with the LSP at LSP setup time. This requires binding the reverse path to each LSR that is traversed by the LSP. This method is not scalable, as it requires doubling the number of LSPs in the network. Moreover, each reverse path requires its own OAM.
2) Global OAM capability
Similar to the IPv4 to IPv6 migration methodology, this method proposes use of a global operations domain whose control plane, data plane, and management plane interact with the control plane, data plane, and management plane of the individual operations domains. This method requires commitment and buy-in from all network operators.

3) Inter-domain OAM gateway
This method proposes the use of gateway-like functions at LSRs that sit at operations domain boundaries. OAM gateway functions include capabilities to correlate OAM information from one operations domain to another and permit inter-carrier sectionalization problems to be resolved.

Specification of an inter-domain OAM gateway capability would appear to be the most realistic solution.

13. Use of Hierarchy to Simplify OAM

MPLS hierarchy provides a mechanism to address a number of OAM issues.

Section 5 outlined domain concepts that nominally would require intermediate nodes to inspect and possibly process OAM PDUs. MPLS does not currently have this capability. However, frequently an operational domain is self contained and may easily be instantiated as a distinct MPLS layer which transports the domain spanning MPLS client. This permits the domain specific components of the LSP to be uniquely instrumented using end to end tools, and provides security benefits in that the provider specific components of the domain are logically isolated from the clients.

Section 7.1 outlined some of the impacts of MPLS topological constructs that multiplex traffic from multiple sources together. Section 7.5 identified the additional complexity that modifying protocols to address state mapping for OAM purposes could entail.
The key issue identified is that for fault management, OAM protocol design would permit mp2p and PHP to be addressed (but at a specific implementation cost), but this is not possible for performance management, in particular if ingress specific traffic counts are required.

Rather than attempting OAM protocol design to address what by definition will be an incomplete solution, it would be useful to define a common mechanism to demultiplex both MPLS level payload and OAM flows. The common mechanism would ideally be in the form of a wrapper that included an egress administered ingress identifier.

One instantiation of such a wrapper would be a p2p MPLS label. The mechanisms exist for label distribution (in the form of extended LDP discovery), and LSPs are already passively instrumented (e.g. packet and byte counts). Similar benefits are obtained when the implementation is extended into the use of probe messages. State association at the egress becomes simple in that the state is associated directly with the incoming label (and can be obtained by augmenting the ILM lookup).

The use of p2p overlays is one method of instrumenting mp2p and PHP LSPs that addresses all the issues outlined in section 7. It also significantly simplifies OAM protocol design and implementation.

14. Current Tools and Applicability

A number of OAM tools have been specified by both the IETF and the ITU-T.

14.1 LSP-PING (MPLS WG)

LSP-PING is designed to be retrofitted to existing deployed networks and to exercise all functionality currently deployed. In order to do so, the design trade-off is that detection or diagnosis of a problem may take an arbitrary number of transactions.

Protocol complexity is tolerated, as initial implementations will be in software.
Protocol complexity manifests itself in the form of TLV encoding of key information (FEC stack elements and the downstream LSR label map). Future functionality may be added to the protocol via the definition of additional Type-Length-Value (TLV) information elements.

Aspects of the protocol design would permit a sparse subset to be handled in hardware (exact pattern match on the PDU). For example, in a VPN application, pinging a PE is facilitated by limiting the number of FECs at any level in the stack to one. Presumably an implementation of probe handling that matched on a ping of the PE loopback address could be optimized for that specific case.

LSP-PING permits a uni-directional path to be tested from a single point, but depends on a reliable return path in order to propagate the test results back to the originating LSR. Therefore the protocol is designed to tolerate degrees of ambiguity in individual test results. Failure of an individual ping response may be due to any of several causes:
- forwarding path failure (including partial failure of ECMP or other load balancing constructs)
- return path failure
- port rate limiting at the egress
- port rate limiting at the ping origin
- congestive loss in the network

To deal with this, LSP-PING supports several features to allow ambiguity to be eliminated by having the ingress perform variations of the original transaction:
- Probe sequencing to permit both ingress and egress to detect gaps in probe sequences.
- The return path may be specified, permitting data plane and control plane problems to be distinguished.
- The destination address may be manipulated to exercise payload sensitive ECMP implementations.

LSP-PING generally assumes PHP at the egress and that any specific LSP binding at the egress point of probe processing may not exist. From the perspective of reliable fault detection this is a minor issue, as the use of a non-routable destination address limits any untested modes of failure. However, this does alter the granularity of useful verification, as probe contents must be checked against the set of FECs associated with the LSR, rather than simply the set specifically associated with the LSP of interest. When testing a label stack for a VPN PE, the number of individual transactions required may be quite large, as the number of FEC elements supported by the PE can be considerable.

LSP-PING permits a label stack. For PW and VPN applications, PHP may be employed by the PE such that PW and VPN labels may not be directly tested (hence the FEC stack, to permit transport or PSN probes to proxy verification for the transported application).

LSP-PING has a traceroute mode that can extract a significant amount of information w.r.t. network configuration, specifically all details of path construction for a given FEC (note that LSP-PING will most likely need to be augmented with authentication and authorization capability in the long term).

Modes of use for LSP-PING are being defined [LSR-TEST] that leverage TTL decrement to bound the scope of any individual test.

14.2 Y.1711 (ITU-T SG13/Q3)

Y.1711 is focused on fault/alarm management and availability measurement for p2p LSPs. The major design objective of Y.1711 as it currently stands is automatic defect detection and handling. A secondary goal is to be able to measure availability.
It trades precision in fault isolation in return for a simplified defect detection/handling capability (frequently referred to as "bounded detection time"). Y.1711 PDUs have a small number of fixed fields in order to minimize parsing and processing overhead.

Message processing is primarily performed at the egress, such that for uni-directional LSPs there is minimal ambiguity in detecting failure. This is also required to take the appropriate consequent actions, e.g. to inform higher layer clients of lower layer failures and thus avoid generating alarm storms in inappropriate places, or to suppress traffic if a security compromise is indicated (i.e. traffic arriving from the wrong source).

Probe processing provides a simple "pass/fail" indication and sufficient information to permit a craftsperson to initiate diagnosis. It is dependent on other tools to perform specific diagnosis and isolation of problems.

Y.1711 is not designed to extract information from the network as to the configuration and layout of network components. It does not currently define any path tracing functionality and only operates on LSP endpoints.

A corollary of the above is that only LSP end points have any role in OAM processing, and the Y.1711 PDUs pass transparently through intermediate nodes.

Y.1711 depends on some degree of ubiquitous deployment at the edge to maximize coverage of fault detection.

Y.1711 is primarily focused on tunnel end points. However, core LSRs may add significant value by implementing a specific subset of Y.1711: FDI generation for p2p LSPs to provide alarm suppression and fault notification to the edge devices when failures in the core occur.
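The edge behavior that FDI generation enables can be sketched with a simplified decision function. The inputs and strings below are assumptions of this sketch, not Y.1711 procedure: the egress is assumed to expect periodic heartbeats, and a core LSR that detects a server layer failure is assumed to inject FDI into each affected LSP.

```python
def egress_defect_state(cv_seen, fdi_seen):
    """Decide what an LSP egress reports for one monitoring interval.

    cv_seen:  a heartbeat (CV) arrived in this interval
    fdi_seen: a forward defect indication arrived in this interval
    """
    if cv_seen:
        return "no defect"
    if fdi_seen:
        # A node closer to the fault has already signalled it: suppress
        # the local loss-of-connectivity alarm and propagate the
        # forwarded indication to higher layer clients instead.
        return "server layer defect (alarm suppressed)"
    # No heartbeat and no FDI: raise the local defect.
    return "loss of connectivity (raise alarm)"

print(egress_defect_state(cv_seen=True,  fdi_seen=False))
print(egress_defect_state(cv_seen=False, fdi_seen=True))
print(egress_defect_state(cv_seen=False, fdi_seen=False))
```

The middle case is the alarm suppression value added by core LSRs: without FDI, every edge device downstream of a core failure would raise its own alarm, producing the alarm storms described above.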
14.2.1 Connectivity Verification (CV) PDU

The CV PDU is used as a heartbeat mechanism to verify connectivity between the LSP ingress and egress. Frequent injection of CV probes is a prerequisite for consistent/deterministic defect detection/handling and availability measurement. Injection of CV probes into LSPs from multiple sources (MP2P, possibly with ECMP) is assumed to result in arrival rates at the LSP egress bursting at line rate.

14.2.2 Fast-Failure-Detection (FFD) PDU

The FFD PDU also provides a heartbeat mechanism similar to the CV PDU, but at a much faster rate. Y.1711 suggests that an LSP can be provisioned with either the CV PDU or the FFD PDU. The CV PDU provides failure detection on the order of 3 seconds, whereas the FFD PDU, when provisioned, can improve the failure detection time to the 100 ms range. The FFD PDU can be selectively provisioned on LSPs requiring fast failure detection.

14.2.3 Forward and Backward Defect Indication (FDI & BDI)

The CV probe is augmented with defect notification PDUs: FDI for the forward direction, and BDI for the reverse direction. These are used for alarm suppression and control of performance measurement functions. BDI has limited applicability given that most LSPs are uni-directional; however, it is very useful for interworking OAM with bi-directional PW clients (e.g. ATM).

14.3 Y.17fec-cv (ITU-T SG13/Q3)

A slightly more sophisticated probe type based upon Y.1711 protocol mechanisms is the Forwarding Equivalence Class Connectivity Verification (FEC-CV) PDU. FEC-CV can carry aggregated LSP information (in the form of a Bloom filter) such that a significant amount of configuration information can be verified in a single transaction. This is generally in the form of FEC information that functions as a functional description of the LSP.
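A minimal sketch of this aggregation (the hash scheme, filter width, and function names are assumptions for illustration, not taken from Y.17fec-cv): the ingress folds every FEC it carries into one filter, and the egress tests whether its own configured FEC is present in the received filter.

```python
import hashlib

FILTER_BITS = 128  # assumed filter width, illustrative only

def bloom_bits(fec, k=3):
    """Map a FEC description to k bit positions (assumed hash scheme)."""
    positions = set()
    for i in range(k):
        digest = hashlib.sha256(f"{i}:{fec}".encode()).digest()
        positions.add(int.from_bytes(digest[:4], "big") % FILTER_BITS)
    return positions

def build_filter(fecs):
    """Ingress side: fold every carried FEC into one Bloom filter."""
    bits = 0
    for fec in fecs:
        for p in bloom_bits(fec):
            bits |= 1 << p
    return bits

def egress_check(received_filter, expected_fec):
    """Egress side: boolean test that a FEC is present in the filter.

    All of the FEC's bits set -> probably the intended LSP; any bit
    clear -> definitely misbranched, since Bloom filters have no
    false negatives."""
    mask = 0
    for p in bloom_bits(expected_fec):
        mask |= 1 << p
    return received_filter & mask == mask

f = build_filter(["10.1.0.0/16", "10.2.0.0/16"])
print(egress_check(f, "10.1.0.0/16"))   # True: a carried FEC always passes
print(egress_check(f, "192.0.2.0/24"))  # False with high probability
```

Because a Bloom filter has no false negatives, a clear bit is a definitive indication of misbranching; the residual false positive probability is governed by the filter width and the number of FECs folded in.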
Simple boolean operations on the Bloom filter at the LSP egress can be used to detect misbranching while being tolerant of inbound filtering and other artifacts of network operations. The PDU can adapt to new applications via the definition of new coding rules for the FEC information, but does not require any changes to the actual PDU processing.

Y.17fec-cv is designed to complement existing link and node failure detection mechanisms by filling a fault detection gap in the MPLS OAM toolset as part of an overall operational framework. Unlike the Y.1711 CV or LSP-PING, it is not a self contained mechanism for detection of all faults or for performing availability assessment.

15. Security Considerations

Support for intra-provider data plane OAM messaging does not introduce any new security concerns to the MPLS architecture. Indeed, it actually addresses some that already exist; i.e., through rigorous defect handling, operators can offer their customers a greater degree of assurance that their traffic will not be misdelivered (for example by being able to detect LSP traffic leaking from a VPN).

Support for inter-provider data plane OAM messaging introduces a number of security concerns, as by definition portions of LSPs will not be in trusted space and the provider has no control over who may inject traffic into the LSP. This creates an opportunity for malicious or poorly behaved users to disrupt network operations. Attempts to introduce filtering on target LSP OAM flows may be problematic if flows are not visible to intermediate LSRs. However, it may be possible to interdict flows on the return path between providers (as faithfulness to the forwarding path is not a return path requirement) to mitigate aspects of this vulnerability.
   OAM tools may permit unauthorized or malicious users to extract
   significant amounts of information about network configuration.
   This is especially true of IP-based tools since, in many network
   configurations, MPLS does not typically extend to untrusted hosts,
   but IP does. This suggests that tools used for problem diagnosis,
   or which by design are capable of extracting significant amounts
   of information, will require authentication and authorization of
   the originator. This may impact the scalability of such tools when
   employed for monitoring instead of diagnosis.

16. A Summary of What Can Be Achieved

   This draft identifies useful MPLS OAM capability that could
   potentially be provided via data plane OAM functions, in
   particular with respect to automatic fault detection and failure
   handling.

   This draft suggests that it may be possible to provide this
   capability for any level in the label stack, either by
   instrumenting that level or by instrumenting an overlay, and it
   provides an overview of the tools available to do so.

   This draft also identifies that many aspects of performance
   management are intractable for some MPLS topological constructs.
   Any type of comparative measurement between the ingress and egress
   of an LSP requires a 1:1 cardinality, or the ability of the egress
   to uniquely determine the ingress for each measured unit of
   communication; LSP merge, PHP, and the possible use of
   per-platform label space at the measured LSP level undermine this.
   Again, a potential solution is to instrument a p2p overlay where
   such detailed measurements are required and otherwise unavailable.

17. References

   [ALLAN] Allan, D., "Guidelines for MPLS Load Balancing",
      draft-allan-mpls-loadbal-05.txt, IETF work in progress,
      October 2003

   [ARCH] Bryant et al.,
      "PWE3 Architecture", draft-ietf-pwe3-arch-06.txt, IETF work in
      progress, October 2003

   [DUBE] Dube, R. and Costa, M., "Bi-directional LSPs for classical
      MPLS", draft-dube-bidirectional-lsp-00.txt, IETF work in
      progress, July 2002

   [HIERARCHY] Lai et al., "Network Hierarchy and Multilayer
      Survivability", draft-ietf-tewg-restore-hierarchy-00.txt, IETF
      work in progress, September 2001

   [ICMP] Bonica et al., "ICMP Extensions for MultiProtocol Label
      Switching", draft-ietf-mpls-icmp-02.txt, IETF work in progress,
      August 2000

   [KOMPELLA] Kompella et al., "MPLS-based Layer 2 VPNs",
      draft-kompella-mpls-l2vpn-02.txt, IETF work in progress,
      December 2000

   [LSP-PING] Pan et al., "Detecting Data Plane Liveliness in MPLS",
      draft-ietf-mpls-lsp-ping-03, IETF work in progress, June 2003

   [LSR-TEST] Swallow et al., "Label Switching Router Self-Test",
      draft-ietf-mpls-lsr-self-test-00.txt, IETF work in progress,
      October 2003

   [MARTINI] Martini et al., "Pseudowire Setup and Maintenance using
      LDP", draft-ietf-pwe3-control-protocol-04.txt, IETF work in
      progress, October 2003

   [MPLSDIFF] Le Faucheur et al., "MPLS Support of Differentiated
      Services", IETF RFC 3270, May 2002

   [MPLSREQS] Nadeau et al., "OAM Requirements for MPLS Networks",
      draft-ietf-mpls-oam-requirements-01.txt, IETF work in progress,
      June 2003

   [2547] Rosen, E. and Rekhter, Y., "BGP/MPLS VPNs", IETF RFC 2547,
      March 1999

   [SWALLOW] Swallow, G.
      and Goguen, R., "RSVP Label Allocation for Backup Tunnels",
      draft-swallow-rsvp-bypass-label-01.txt, IETF work in progress,
      November 2000

   [TTL] Agarwal, P. and Akyol, B., "TTL Processing in MPLS
      Networks", IETF RFC 3443, January 2003

   [VCCV] Nadeau et al., "Pseudo Wire (PW) Virtual Circuit Connection
      Verification (VCCV)", draft-ietf-pwe3-vccv-00.txt, IETF work in
      progress, July 2003

   [Y1710] ITU-T Recommendation Y.1710 (2002), "Requirements for OAM
      Functionality for MPLS Networks"

   [Y1711] ITU-T Recommendation Y.1711 (2002), "OAM Mechanism for
      MPLS Networks"

   [Y17FECCV] ITU-T Draft Recommendation Y.17fec-cv, "Misbranching
      Detection in MPLS Networks", Temporary Document TD25rev1
      (WP3/13), July 2003

18. Editor's Address

   David Allan
   Nortel Networks                 Phone: 1-613-763-6362
   3500 Carling Ave.               Email: dallan@nortelnetworks.com
   Ottawa, Ontario, CANADA