Network Working Group                                          R. Callon
INTERNET DRAFT                                     Ascend Communications
                                                               P. Doolan
                                                       Ennovate Networks
                                                              N. Feldman
                                                               IBM Corp.
                                                             A. Fredette
                                                            Bay Networks
                                                              G. Swallow
                                                           Cisco Systems
                                                          A. Viswanathan
                                                               IBM Corp.
                                                       November 21, 1997
                                                  Expires May 21, 1998

             A Framework for Multiprotocol Label Switching

Status of this Memo

This document is an Internet-Draft. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as ``work in progress.''

To learn the current status of any Internet-Draft, please check the ``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow Directories on ds.internic.net (US East Coast), nic.nordu.net (Europe), ftp.isi.edu (US West Coast), or munnari.oz.au (Pacific Rim). Distribution of this memo is unlimited.
Abstract

This document discusses technical issues and requirements for the Multiprotocol Label Switching working group. This is an initial draft document, which will evolve and expand over time. It is the intent of this document to produce a coherent description of all significant approaches which were and are being considered by the working group. Selection of specific approaches, making choices regarding engineering tradeoffs, and detailed protocol specification, are outside of the scope of this framework document.

Note that this document is at an early stage, and that most of the detailed technical discussion is only in a rough form. Additional text will be provided over time from a number of sources. A small amount of the text in this document may be redundant with the proposed protocol architecture for MPLS. This redundancy will be reduced over time, with the overall discussion of issues moved to this document, and the selection of specific approaches and specification of the protocol contained in the protocol architecture and other related documents.

Acknowledgments

The ideas and text in this document have been collected from a number of sources and comments received. We would like to thank Jim Luciani, Andy Malis, Rayadurgam Ravikanth, Yakov Rekhter, Eric Rosen, Vijay Srinivasan, and Pasi Vananen for their inputs and ideas.

1. Introduction and Requirements

1.1 Overview of MPLS

The primary goal of the MPLS working group is to standardize a base technology that integrates the label swapping forwarding paradigm with network layer routing. This base technology (label swapping) is expected to improve the price/performance of network layer routing, improve the scalability of the network layer, and provide greater flexibility in the delivery of (new) routing services (by allowing new routing services to be added without a change to the forwarding paradigm).

The initial MPLS effort will be focused on IPv4 and IPv6. However, the core technology will be extendible to multiple network layer protocols (e.g., IPX, Appletalk, DECnet, CLNP). MPLS is not confined to any specific link layer technology; it can work with any media over which network layer packets can be passed between network layer entities.

MPLS makes use of a routing approach whereby the normal mode of operation is that L3 routing (e.g., existing IP routing protocols and/or new IP routing protocols) is used by all nodes to determine the routed path.

MPLS provides a simple "core" set of mechanisms which can be applied in several ways to provide a rich functionality. The core effort includes the following (a minimal sketch of the resulting forwarding model follows the list):

a) Semantics assigned to a stream label:

   - Labels are associated with specific streams of data;

b) Forwarding Methods:

   - Forwarding is simplified by the use of short fixed length labels to identify streams

   - Forwarding may require simple functions such as looking up a label in a table, swapping labels, and possibly decrementing and checking a TTL.

   - In some cases MPLS may make direct use of underlying layer 2 forwarding, such as is provided by ATM or Frame Relay equipment.

c) Label Distribution Methods:

   - Allow nodes to determine which labels to use for specific streams

   - This may use some sort of control exchange, and/or be piggybacked on a routing protocol
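As an informal illustration of the forwarding model sketched in the list above, the following Python fragment shows a label swap as a single exact-match table lookup. The table contents and the names (lib, forward) are invented for illustration and are not part of any MPLS specification:

    # Hypothetical label information base: incoming label -> (outgoing label, port)
    lib = {
        17: (42, "port1"),
        23: (42, "port1"),   # two incoming streams mapped to one outgoing label
        99: (7,  "port2"),
    }

    def forward(incoming_label):
        # A label swap is one exact-match lookup; no L3 header parsing
        # or longest-prefix matching is involved.
        outgoing_label, port = lib[incoming_label]
        return outgoing_label, port

    print(forward(17))   # (42, 'port1')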
The MPLS working group will define the procedures and protocols used to assign significance to the forwarding labels and to distribute that information between cooperating MPLS forwarders.

1.2 Requirements

- MPLS forwarding MUST simplify packet forwarding in order to do the following:

   - lower cost of high speed forwarding

   - improve forwarding performance

- MPLS core technologies MUST be general with respect to data link technologies (i.e., work over a very wide range of underlying data links). Specific optimizations for particular media MAY be considered.

- MPLS core technologies MUST be compatible with a wide range of routing protocols, and MUST be capable of operating independently of the underlying routing protocols. It has been observed that considerable optimizations can be achieved in some cases by small enhancements of existing protocols. Such enhancements MAY be considered in the case of IETF standard routing protocols, and if appropriate, coordinated with the relevant working group(s).

- Routing protocols which are used in conjunction with MPLS might be based on distributed computation. As such, during routing transients, these protocols may compute forwarding paths which potentially contain loops. MPLS MUST provide protocol mechanisms to either prevent the formation of loops and/or contain the amount of (networking) resources that can be consumed due to the presence of loops.

- MPLS forwarding MUST allow "aggregate forwarding" of user data; i.e., allow streams to be forwarded as a unit and ensure that an identified stream takes a single path, where a stream may consist of the aggregate of multiple flows of user data. MPLS SHOULD provide multiple levels of aggregation support (e.g., from individual end to end application flows at one extreme, to aggregates of all flows passing through a specified switch or router at the other extreme).

- MPLS MUST support operations, administration, and maintenance facilities at least as extensive as those supported in current IP networks. Current network management and diagnostic tools SHOULD continue to work in order to provide some backward compatibility. Where such tools are broken by MPLS, hooks MUST be supplied to allow equivalent functionality to be created.

- MPLS core technologies MUST work with both unicast and multicast streams.

- The MPLS core specifications MUST clearly state how MPLS operates in a hierarchical network.

- Scalability issues MUST be considered and analyzed during the definition of MPLS. Very scalable solutions MUST be sought.

- MPLS core technologies MUST be capable of working with O(n) streams to switch all best-effort traffic, where n is the number of nodes in an MPLS domain. MPLS protocol standards MUST be capable of taking advantage of hardware that supports stream merging where appropriate. Note that O(n-squared) streams or VCs might also be appropriate for use in some cases.
- The core set of MPLS standards, along with existing Internet standards, MUST be a self-contained solution. For example, the proposed solution MUST NOT require specific hardware features that do not commonly exist on network equipment at the time that the standard is complete. However, the solution MAY make use of additional optional hardware features (e.g., to optimize performance).

- The MPLS protocol standards MUST support multipath routing and forwarding.

- MPLS MUST be compatible with the IETF Integrated Services Model, including RSVP.

- It MUST be possible for MPLS switches to coexist with non-MPLS switches in the same switched network. MPLS switches SHOULD NOT impose additional configuration on non-MPLS switches.

- MPLS MUST allow "ships in the night" operation with existing layer 2 switching protocols (e.g., ATM Forum Signaling); i.e., MPLS must be capable of being used in a network which is also simultaneously operating standard layer 2 protocols.

- The MPLS protocol MUST support both topology-driven and traffic/request-driven label assignments.

1.3 Terminology

aggregate stream

    synonym of "stream"

DLCI

    a label used in Frame Relay networks to identify frame relay circuits

flow

    a single instance of an application to application flow of data (as in the RSVP and IFMP use of the term "flow")

forwarding equivalence class

    a group of L3 packets which are forwarded in the same manner (e.g., over the same path, with the same forwarding treatment). A forwarding equivalence class is therefore the set of L3 packets which could safely be mapped to the same label. Note that there may be reasons that packets from a single forwarding equivalence class may be mapped to multiple labels (e.g., when stream merge is not used).

frame merge

    stream merge, when it is applied to operation over frame based media, so that the potential problem of cell interleave is not an issue.

label

    a short fixed length physically contiguous locally significant identifier which is used to identify a stream

label information base

    the database of information containing label bindings

label swap

    the basic forwarding operation consisting of looking up an incoming label to determine the outgoing label, encapsulation, port, and other data handling information.

label swapping

    a forwarding paradigm allowing streamlined forwarding of data by using labels to identify streams of data to be forwarded.

label switched hop

    the hop between two MPLS nodes, on which forwarding is done using labels.

label switched path

    the path created by the concatenation of one or more label switched hops, allowing a packet to be forwarded by swapping labels from one MPLS node to another MPLS node.

layer 2

    the protocol layer under layer 3 (which therefore offers the services used by layer 3). Forwarding, when done by the swapping of short fixed length labels, occurs at layer 2 regardless of whether the label being examined is an ATM VPI/VCI, a frame relay DLCI, or an MPLS label.
layer 3

    the protocol layer at which IP and its associated routing protocols operate

link layer

    synonymous with layer 2

loop detection

    a method of dealing with loops in which loops are allowed to be set up, and data may be transmitted over the loop, but the loop is later detected and closed

loop prevention

    a method of dealing with loops in which data is never transmitted over a loop

label stack

    an ordered set of labels

loop survival

    a method of dealing with loops in which data may be transmitted over a loop, but means are employed to limit the amount of network resources which may be consumed by the looping data

label switching router

    an MPLS node which is capable of forwarding native L3 packets

merge point

    the node at which multiple streams and switched paths are combined into a single stream sent over a single path. In the case that the multiple paths are not combined prior to the egress node, the egress node becomes the merge point.

Mlabel

    abbreviation for MPLS label

MPLS core standards

    the standards which describe the core MPLS technology

MPLS domain

    a contiguous set of nodes which operate MPLS routing and forwarding and which are also in one Routing or Administrative Domain

MPLS edge node

    an MPLS node that connects an MPLS domain with a node which is outside of the domain, either because it does not run MPLS, and/or because it is in a different domain. Note that if an LSR has a neighboring host which is not running MPLS, then that LSR is an MPLS edge node.

MPLS egress node

    an MPLS edge node in its role in handling traffic as it leaves an MPLS domain

MPLS ingress node

    an MPLS edge node in its role in handling traffic as it enters an MPLS domain

MPLS label

    a label placed in a short MPLS shim header used to identify streams

MPLS node

    a node which is running MPLS. An MPLS node will be aware of MPLS control protocols, will operate one or more L3 routing protocols, and will be capable of forwarding packets based on labels. An MPLS node may optionally also be capable of forwarding native L3 packets.

MultiProtocol Label Switching

    an IETF working group and the effort associated with the working group

network layer

    synonymous with layer 3

shortcut VC

    a VC set up as a result of an NHRP query and response

stack

    synonymous with label stack

stream

    an aggregate of one or more flows, treated as one aggregate for the purpose of forwarding in L2 and/or L3 nodes (e.g., may be described using a single label). In many cases a stream may be the aggregate of a very large number of flows. Synonymous with "aggregate stream".

stream merge

    the merging of several smaller streams into a larger stream, such that for some or all of the path the larger stream can be referred to using a single label.

switched path

    synonymous with label switched path

virtual circuit

    a circuit used by a connection-oriented layer 2 technology such as ATM or Frame Relay, requiring the maintenance of state information in layer 2 switches.

VC merge

    stream merge when it is specifically applied to VCs, specifically so as to allow multiple VCs to merge into one single VC
VP merge

    stream merge when it is applied to VPs, specifically so as to allow multiple VPs to merge into one single VP. In this case the VCIs need to be unique. This allows cells from different sources to be distinguished via the VCI.

VPI/VCI

    a label used in ATM networks to identify circuits

1.4 Acronyms and Abbreviations

DLCI   Data Link Circuit Identifier
FEC    Forwarding Equivalence Class
ISP    Internet Service Provider
LIB    Label Information Base
LDP    Label Distribution Protocol
L2     Layer 2
L3     Layer 3
LSP    Label Switched Path
LSR    Label Switching Router
MPLS   MultiProtocol Label Switching
MPT    Multipoint to Point Tree
NHC    Next Hop (NHRP) Client
NHS    Next Hop (NHRP) Server
VC     Virtual Circuit
VCI    Virtual Circuit Identifier
VPI    Virtual Path Identifier

1.5 Motivation for MPLS

This section describes the expected and potential benefits of MPLS over existing schemes. Specifically, this section discusses the advantages of MPLS over previous methods for building core networks (i.e., networks for internet service providers or for major corporate backbones). The potential advantages of MPLS in campus and local area networks are not discussed in this section.

There are currently two commonly used methods for building core IP networks: (i) networks of datagram routers in which the core of the network is based on the datagram routers; (ii) networks of datagram routers operating over an ATM core. In order to describe the advantages of MPLS, it is necessary to know which alternative to MPLS we are using for the comparison. This section is therefore split into two parts: Section 1.5.1 describes the advantages of MPLS when compared to a pure datagram routed network. Section 1.5.2 describes the advantages of MPLS when compared to an IP over ATM network.

This section does not provide a complete list of requirements for MPLS. For example, Multipoint to Point Trees are important for MPLS to scale. However, datagram forwarding naturally acts in this way (since multiple sources are merged automatically), and the ATM Forum is currently adding support for multipoint to point to the ATM standards. The ability to do MPTs is therefore important to MPLS, but does not represent an advantage over either datagram routing or IP over ATM, and therefore is not mentioned in this section.

1.5.1 Benefits Relative to Use of a Router Core

1.5.1.1 Simplified Forwarding

Label swapping allows packet forwarding to be based on an exact match for a short label, rather than a longest match algorithm applied to a longer address as is required for normal datagram forwarding. In addition, the label headers used with MPLS are simpler than the headers typically used with datagram protocols such as IP. This in turn implies that MPLS allows a much simpler forwarding paradigm relative to datagrams, and implies that it is easier to build a high speed router using MPLS.

Whether this simpler forwarding operation will result in availability of LSRs which can operate at higher speeds than datagram routers is controversial, and probably depends upon implementation details. There are some parts of the network, such as at hierarchical boundaries, where datagram IP forwarding at high speed will be required. This implies that implementation of a high speed router is highly desirable. In addition, there are currently multiple companies building high speed routers which will allow IP packets to be forwarded at very high speed.
At speeds at least up to OC48, it appears that once the one-time engineering is completed, the per-unit cost associated with IP forwarding will be a small fraction of the overall equipment cost.

However, there are also many existing routers which can benefit from the simpler forwarding allowed by MPLS. In addition, there are some routers being built with implementations that will benefit from the simpler forwarding available with MPLS.

1.5.1.2 Efficient Explicit Routing

Explicit routing (aka source routing) is a very powerful technique which potentially can be useful for a variety of purposes. However, with pure datagram routing the overhead of carrying a complete explicit route with each packet is prohibitive. MPLS, in contrast, allows the explicit route to be carried only at the time that the label switched path is set up, and not with each packet. This implies that MPLS makes explicit routing practical. This in turn implies that MPLS can make possible a number of advanced routing features which depend upon explicit routing.
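The following sketch illustrates why explicit routing becomes practical: the explicit route is consulted only once, at path setup time, to install a swap entry at each hop; subsequent packets carry only labels. The topology, the per-node tables, and the names (setup_lsp, tables) are hypothetical:

    # Hypothetical LSP setup along an explicit route A -> B -> C -> D.
    explicit_route = ["A", "B", "C", "D"]
    tables = {n: {} for n in explicit_route}      # per-node label tables
    next_free = {n: 100 for n in explicit_route}  # per-node label allocators

    def setup_lsp(route, fec):
        # The explicit route is carried only here, at setup time.
        in_label = fec         # at the ingress, packets are classified by FEC
        for node, nxt in zip(route, route[1:]):
            out_label = next_free[nxt]    # label chosen by the downstream node
            next_free[nxt] += 1
            tables[node][in_label] = (out_label, nxt)
            in_label = out_label

    setup_lsp(explicit_route, fec="10.1.0.0/16")

    # Data packets now follow the path by label swaps alone:
    label, node = tables["A"]["10.1.0.0/16"]
    while node != "D":
        label, node = tables[node][label]
    print("delivered at", node)   # delivered at D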
1.5.1.3 Traffic Engineering

Traffic engineering refers to the process of selecting the paths chosen by data traffic in order to balance the traffic load on the various links, routers, and switches in the network. Traffic engineering is most important in networks where multiple parallel or alternate paths are available. The rapid growth in the Internet, and particularly the associated rapid growth in the demand for bandwidth, has tended to cause some core networks to become increasingly "branchy" in recent years, resulting in an increase in the importance of traffic engineering.

It is common today, in networks that are running IP over an ATM core using PVCs, to manually configure the path of each PVC in order to equalize the traffic levels on different links in the network. Thus traffic engineering is typically done today in IP over ATM networks using manual configuration.

Traffic engineering is difficult to accomplish with datagram routing. Some degree of load balancing can be obtained by adjusting the metrics associated with network links. However, there is a limit to how much can be accomplished in this way, and in networks with a large number of alternative paths between any two points, balancing the traffic levels on all links is difficult to achieve solely by adjustment of the metrics used with hop by hop datagram routing.

MPLS allows streams from any particular ingress node to any particular egress node to be individually identified. MPLS therefore provides a straightforward mechanism to measure the traffic associated with each ingress node to egress node pair. In addition, since MPLS allows efficient explicit routing of label switched paths, it is straightforward to ensure that any particular stream of data takes the preferred path.

The hard part of traffic engineering is selection of the method used to route each label switched path. There are a variety of possible ways to do this, ranging from manual configuration of routes, to use of a routing protocol which announces traffic loads in the network combined with background recomputation of paths.

1.5.1.4 QoS Routing

QoS routing refers to a method of routing in which the route chosen for a particular stream is selected in response to the QoS required for that stream. In many cases QoS routing needs to make use of explicit routing, for several reasons:

In some cases specific bandwidth is likely to be reserved for each of many specific streams of data. This implies that the total bandwidth of multiple streams may exceed the bandwidth available on any particular link, and thus not all streams, even between the same ingress and egress nodes, can take the same path. Instead, individual streams will need to be individually routed. This is somewhat analogous to traffic engineering, but might require separation of streams on a finer granularity. Thus explicit routing may be needed in order to allow each stream to be individually routed, and to eliminate the need for each switch along the path of a stream to compute the route for each stream.

Consider the case of routing a stream with a specific bandwidth requirement: In this case the route chosen will depend upon the amount of bandwidth which is requested. For any one given bandwidth it is straightforward to select a path. However, there are many different levels of bandwidth which could in principle be requested. This makes it impractical to precompute all possible paths for all possible bandwidths. If the path for a particular stream must be computed on demand, then it is undesirable to require every LSR on the path to compute the path. Instead, it is preferable to have the first node compute the path and specify the route to be followed through use of an explicit route.

For a variety of reasons the information available for QoS routing may in some cases be slightly out of date. This implies that the attempt to select a specific path for a QoS-sensitive stream may in some cases fail, due to a particular node or link not having the required resources available. In these cases it is not in general feasible to tell all other nodes in the network of the limited resource in one particular network element. If explicit routing is available, then this permits the initial node of the stream (the ingress node in MPLS) to be informed that the indicated network element is not able to carry the stream, allowing an alternate path to be selected. However, in this case the node that selects the alternate path has to use explicit routing in order to force the stream to follow the alternate path.

These and similar examples imply that explicit routing is necessary in order to do an adequate job of QoS routing. Given that MPLS allows efficient explicit routing, it follows that MPLS also facilitates QoS routing.
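The on-demand computation described above can be sketched as follows: the ingress node prunes links that cannot carry the requested bandwidth and searches for a path over what remains; the resulting path would then be installed as an explicit route. The topology, bandwidth figures, and the name qos_route are invented, and a real implementation would likely use link metrics rather than hop count:

    from collections import deque

    links = {                          # (node, node) -> available bandwidth
        ("A", "B"): 100, ("B", "D"): 20,
        ("A", "C"): 100, ("C", "D"): 100,
    }

    def qos_route(src, dst, bandwidth):
        # Keep only links with sufficient available bandwidth.
        adj = {}
        for (u, v), bw in links.items():
            if bw >= bandwidth:
                adj.setdefault(u, []).append(v)
                adj.setdefault(v, []).append(u)
        # Breadth-first search yields a minimum-hop path over the pruned graph.
        paths, queue = {src: [src]}, deque([src])
        while queue:
            u = queue.popleft()
            if u == dst:
                return paths[u]
            for v in adj.get(u, []):
                if v not in paths:
                    paths[v] = paths[u] + [v]
                    queue.append(v)
        return None                    # no path has the required resources

    print(qos_route("A", "D", 50))     # ['A', 'C', 'D']: the B-D link is pruned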
1.5.1.5 Complex Mappings from IP Packet to Forwarding Equivalence Class

MPLS allows the mapping from IP packet to forwarding equivalence class to be performed only once, at the ingress to an MPLS area. This makes practical complex mappings from IP packet to FEC that would otherwise be infeasible.

For example, consider the case of provisioned QoS: Some ISPs offer a service wherein specific customers subscribe to receive differentiated services (e.g., their packets may receive preferential forwarding treatment). Mapping of IP packets to the service level may require knowing the customer who is transmitting the packet, which may in turn require packet filtering based on source and destination address, incoming interface, and other characteristics. The sheer number of filters that are needed in a moderate sized ISP precludes repetition of the filters at every router throughout the network. Also, some information, such as the incoming interface, is not available except at the ingress node to the network. This implies that the preferred way to offer provisioned QoS is to map the packet at the ingress point to the preferred QoS level, and then label the packet in some way. MPLS offers an efficient method to label the QoS class associated with any particular packet.

Other examples of complex mappings from IP packet to FEC are also likely to be determined as MPLS is deployed.

1.5.1.6 Partitioning of Functionality

Because different label granularities are supported, it will be possible to partition processing functionality hierarchically among the network elements, so that the heavier processing takes place at the edges of the network, near the customers, while processing in the core network is as simple as possible, e.g., pure label based forwarding.

AS level aggregation will enable the building of fully switched backbone networks and traffic exchange points. Also, it will be possible for operators to fully switch the transit traffic traveling through the operator's network. Deaggregation will be needed for streams that are destined for the networks connected to the MPLS domain, but this deaggregation only needs to perform the lookup operations associated with finding the label for the egress router or interface; e.g., TOS information bound to the label at the source is still valid, and can be honored on the basis of the label on which the packet was received. Note that the receiving domain cannot even redo the classification, as the original packet classification policy is not known by the receiving domain.

As one example of the improved functional partitioning, consider the case of the use of packet filters to map IP packets into a substantial number of queues, such that each queue receives differentiated services. For example, suppose that a network supports individual queuing for on the order of 100 different customers, with packets mapped to queues based on the source and destination IP address. In this case, with MPLS the packet filtering can be done solely on the edge of the network, with the packets mapped to labels such that each individual user receives separate labels. Thus with MPLS the filtering can be performed only at the edge of the network. This allows complex mappings of IP packets to forwarding equivalence class.
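A rough sketch of the edge-only filtering described above follows; the filter rules, addresses, and label values are invented. Interior nodes never apply these filters; they see only the label chosen at the ingress:

    # Hypothetical ingress filters: (source prefix, destination prefix) -> label.
    filters = [
        (("10.1.", "192.0.2."), 201),   # customer 1: premium queue
        (("10.2.", "192.0.2."), 202),   # customer 2: standard queue
    ]

    def classify_at_ingress(src, dst):
        # Applied once, at the network edge only.
        for (src_pfx, dst_pfx), label in filters:
            if src.startswith(src_pfx) and dst.startswith(dst_pfx):
                return label
        return None                     # unlabeled: ordinary L3 forwarding

    print(classify_at_ingress("10.1.3.4", "192.0.2.9"))   # 201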
1.5.1.7 Single Forwarding Paradigm with Service Level Differentiation

MPLS can allow a single forwarding paradigm to be used to support multiple types of service on the same network.

Because of the common forwarding paradigm, it will be possible to carry the different services through the same network elements, regardless of the control plane protocols used to populate the LSR's LIB. It is possible, for example, in the case of an ATM based switching system, to support all the native ATM services, frame relay services, and labeled IP services. The simultaneous support of multiple services may require partitioning of the label space between the services, and this partitioning shall be supported by the label distribution/management protocol.

A non-exhaustive list of examples of services suitable for carriage over LSRs includes IP traffic, Frame Relay traffic, ATM traffic (in the case of cell switching), IP tunneling, VPNs, and other datagram protocols.

Note that MPLS does not necessarily use the same header format over all types of media. However, over any particular type of media a single header format (at least for the lowest level of the label stack) should be possible.

1.5.2 Benefits Relative to Use of an ATM or Frame Relay Core

Note: This section compares MPLS with other methods for interconnecting routers over a switched core network. We are not considering methods for interconnecting hosts located on virtual networks. For example, the ATM Forum LANE and MPOA standards support virtual networks. MPLS does not directly support virtual networks, and should not be compared directly with MPOA or LANE.

Previously available methods for interconnecting routers in an IP over ATM environment make use of either: (i) a full mesh 'n-squared' overlay of virtual circuits between n ATM-attached routers; (ii) a partial mesh of VCs between routers; or (iii) a partial mesh of VCs, plus the use of NHRP to facilitate on demand cut-through SVCs.

1.5.2.1 Scaling of the Routing Protocol

Relative to the interconnection of IP over an ATM core, MPLS improves the scaling of routing due to the reduced number of peers and the elimination of the 'n-squared' logical links between routers used to operate the routing protocols.

Because all LSRs will run standard routing protocols, the number of peers a router needs to communicate with is reduced to the number of LSRs and routers a given LSR is directly connected to, instead of having to peer with a large number of routers at the far ends of switched L2 paths. This benefit is achieved because the edge LSRs do not need to peer with every other edge LSR in the domain, as is the case in a hybrid switch/router network.

1.5.2.2 Common Operation over Packet and Cell Media

MPLS makes use of common methods for routing and forwarding over packet and cell media, and potentially allows a common approach to traffic engineering, QoS routing, and other aspects of operation. For example, this means that the same method for label distribution can be used over Frame Relay and ATM media, as well as between LSRs using the MPLS shim header for forwarding over other media (such as PPP links and broadcast LANs).

Note: There may be some differences with respect to the operation over different media. For example, if VP merge is used with ATM media (rather than VC merge) then the merge operation may be somewhat different than it would be with packet media or with ATM using VC merge.

1.5.2.3 Easier Management

The use of a common method for label distribution and common routing protocols over multiple types of media is expected to simplify network management of MPLS networks.

1.5.2.4 Elimination of the 'Routing over Large Clouds' Issue

MPLS eliminates the need to use NHRP and on-demand cut-through SVCs for operation over ATM. This eliminates the latency problem associated with cut-through SVCs.

2. Discussion of Core MPLS Components

2.1 The Basic Routing Approach

Routing is accomplished through the use of standard L3 routing protocols, such as OSPF and BGP.
The information maintained by the L3 routing protocols is then used to distribute labels to neighboring nodes; these labels are used in the forwarding of packets as described below.

In the case of ATM networks, the labels that are distributed are VPI/VCIs, and a separate protocol (i.e., PNNI) is not necessary for the establishment of VCs for IP forwarding.

The topological scope of a routing protocol (i.e., routing domain) and the scope of MPLS-capable label switching nodes may be different. For example, MPLS-knowledgeable and MPLS-ignorant nodes, all of which are OSPF routers, may be co-resident in an area. Where neighboring routers know MPLS, labels can be exchanged and used.

Neighboring MPLS routers may use configured PVCs or PVPs to tunnel through non-participating ATM or FR switches.

2.2 Labels

In addition to the single routing protocol approach discussed above, the other key concept in the basic MPLS approach is the use of short fixed length labels to simplify user data forwarding.

2.2.1 Label Semantics

It is important that the MPLS solutions are clear about what semantics (i.e., what knowledge of the state of the network) are implicit in the use of labels for forwarding user data packets or cells.

At the simplest level, a label may be thought of as nothing more than a shorthand for the packet header, in order to index the forwarding decision that a router would make for the packet. In this context, the label is nothing more than a shorthand for an aggregate stream of user data.

This observation leads to one possible very simple interpretation: that the "meaning" of the label is a strictly local issue between two neighboring nodes. With this interpretation: (i) MPLS could be employed between any two neighboring nodes for forwarding of data between those nodes, even if no other nodes in the network participate in MPLS; (ii) when MPLS is used between more than two nodes, the operation between any two neighboring nodes could be interpreted as independent of the operation between any other pair of nodes. This approach has the advantage of semantic simplicity, and of being the closest to pure datagram forwarding. However, this approach (like pure datagram forwarding) has the disadvantage that when a packet is forwarded it is not known whether the packet is being forwarded into a loop, into a black hole, or towards links which have inadequate resources to handle the traffic flow. These disadvantages are necessary with pure datagram forwarding, but are optional design choices to be made when label switching is being used.

There are cases where it would be desirable to have additional knowledge implicit in the existence of the label. For example, one approach to avoiding loops (see section 4.3) involves signaling the label distribution along a path before packets are forwarded on that path. With this approach, the fact that a node has a label to use for a particular IP packet would imply the knowledge that following the label (including label swapping at subsequent nodes) leads to a non-looping path which makes progress towards the destination (something which is usually, but not necessarily always, true when using pure datagram routing).
This would of course require some sort of label distribution/setup protocol which signals along the path being set up before the labels are available for packet forwarding. However, there are also other consequences to having additional semantics associated with the label: specifically, procedures are needed to ensure that the semantics are correct. For example, if the fact that you have a label for a particular destination implies that there is a loop-free path, then when the path changes some procedures are required to ensure that it is still loop-free. Another example of semantics which could be implicit in a label is the identity of the higher level protocol type which is encoded using that label value.

In either case, the specific value of a label to use for a stream is strictly a local issue; however, the decision about whether to use the label may be based on some global (or at least wider scope) knowledge that, for example, the label switched path is loop-free and/or has the appropriate resources.

A similar example occurs in ATM networks: with standard ATM, a signaling protocol is used which both reserves resources in switches along the path, and ensures that the path is loop-free and terminates at the correct node. Thus, implicit in the fact that an ATM node has a VPI/VCI for forwarding a particular piece of data is the knowledge that the path has been set up successfully.

Another similar example occurs with multipoint to point trees over ATM (see section 4.2 below), where the multipoint to point tree uses a VP, and cell interleave at merge points in the tree is handled by giving each source on the tree a distinct VCI within the VP. In this case, the fact that each source has a known VPI/VCI to use needs to (implicitly or explicitly) imply the knowledge that the VCI assigned to that source is unique within the context of the VP.

In general, labels are used to optimize how the system works, not to control how the system works. For example, the routing protocol determines the path that a packet follows. The presence or absence of a label assignment should not affect the path of an L3 packet. Note however that the use of labels may make capabilities such as explicit routes, loadsharing, and multipath more efficient.

2.2.2 Label Granularity

Labels are used to create a simple forwarding paradigm. The essential element in assigning a label is that the device which will be using the label to forward packets will be forwarding all packets with the same label in the same way. If the packet is to be forwarded solely by looking at the label, then at a minimum, all packets with the same incoming label must be forwarded out the same port(s) with the same encapsulation(s), and with the same next hop label (if any).

The term "forwarding equivalence class" is used to refer to a set of L3 packets which are all forwarded in the same manner by a particular LSR (for example, the IP packets in a forwarding equivalence class may be destined for the same egress from an MPLS network, and may be associated with the same QoS class). A forwarding equivalence class is therefore the set of L3 packets which could safely be mapped to the same label. Note that there may be reasons that packets from a single forwarding equivalence class may be mapped to multiple labels (e.g., when stream merge is not used).
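The notion of a forwarding equivalence class can be made concrete with a small sketch. Here the FEC of a packet is computed by a key function (egress router plus class of service, one possible choice among the granularities listed in section 2.2.2.1 below), and each distinct FEC receives one label; all names and values are illustrative:

    # One possible FEC key: packets leaving by the same egress router with
    # the same class of service are forwarded in the same manner.
    def fec_key(packet):
        return (packet["egress_router"], packet["cos"])

    packets = [
        {"dst": "10.1.2.3", "egress_router": "R9", "cos": 0},
        {"dst": "10.1.7.7", "egress_router": "R9", "cos": 0},   # same FEC as above
        {"dst": "10.2.0.1", "egress_router": "R4", "cos": 1},
    ]

    labels, next_label = {}, 100
    for p in packets:
        if fec_key(p) not in labels:
            labels[fec_key(p)] = next_label
            next_label += 1
    print(labels)   # two FECs, hence two labels, for three destinations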
Note that the label could also mean "ignore this label and forward based on what is contained within," where within one might find a label (if a stack of labels is used) or a layer 3 packet.

For IP unicast traffic, the granularity of a label allows various levels of aggregation in a Label Information Base (LIB). At one end of the spectrum, a label could represent a host route (i.e., the full 32 bits of IP address). If a router forwards an entire CIDR prefix in the same way, it may choose to use one label to represent that prefix. Similarly, if the router is forwarding several (otherwise unrelated) CIDR prefixes in the same way, it may choose to use the same label for this set of prefixes. For instance, all CIDR prefixes which share the same BGP Next Hop could be assigned the same label. Taking this to the limit, an egress router may choose to advertise all of its prefixes with the same label.

By introducing the concept of an egress identifier, the distribution of labels associated with groups of CIDR prefixes can be simplified. For instance, an egress identifier might specify the BGP Next Hop, with all prefixes routed to that next hop receiving the label associated with that egress identifier. Another natural place to aggregate would be the MPLS egress router. This would work particularly well in conjunction with a link-state routing protocol, where the association between egress router and CIDR prefix is already distributed throughout an area.

For IP multicast, the natural binding of a label would be to a multicast tree, or rather to the branch of a tree which extends from a particular port. Thus for a shared tree, the label corresponds to the multicast group, (*,G). For (S,G) state, the label would correspond to the source address and the multicast group.

A label can also have a granularity finer than a host route. That is, it could be associated with some combination of source and destination address or other information within the packet. This might for example be done on an administrative basis to aid in effecting policy. A label could also correspond to all packets which match a particular Integrated Services filter specification.

Labels can also represent explicit routes. This use is semantically equivalent to using an IP tunnel with a complete explicit route. This is discussed in more detail in section 4.10.

2.2.2.1 Examples of unicast traffic granularities:

- PQ (Port Quadruples) same IP source address prefix, destination address prefix, TTL, IP protocol and TCP/UDP source/destination ports

- PQT (Port Quadruples with TOS) same IP source address prefix, destination address prefix, TTL, IP protocol and TCP/UDP source/destination ports and same IP header TOS field (including Precedence and TOS bits).

- HP (Host Pairs) same specific IP source and destination address (32 bit)

- NP (Network Pairs) same IP source and destination address prefixes (variable length)

- DN (Destination Network) same IP destination network address prefix (variable length)

- ER (Egress Router) same egress router ID (e.g. OSPF)

- NAS (Next-hop AS) same next-hop AS number (BGP)

- DAS (Destination AS) same destination AS number (BGP)

2.2.2.2 Multicast traffic granularities:

- SST (Source Specific Tree) same source address and multicast group

- SMT (Shared Multicast Tree) same multicast group address

2.2.3 Label Assignment

Essential to label switching is the notion of binding between a label and Network Layer routing (routes). A control component is responsible for creating label bindings, and then distributing the label binding information among label switches. Label assignment involves allocating a label, and then binding a label to a route.

Label assignment can be driven by control traffic or by data traffic. This is discussed in more detail in section 3.4.

Control traffic driven label assignment has several advantages, as compared to data traffic driven label assignment. For one thing, it minimizes the amount of additional control traffic needed to distribute label binding information, as label binding information is distributed only in response to control traffic, independent of data traffic. It also makes the overall scheme independent of and insensitive to the data traffic profile/pattern. Control traffic driven creation of label bindings improves forwarding latency, as labels are assigned before data traffic arrives, rather than being assigned as data traffic arrives. It also simplifies the overall system behavior, as the control plane is controlled solely by control traffic, rather than by a mix of control and data traffic.

There are however situations where data traffic driven label assignment is necessary. A particular case may occur with ATM without VP or VC merge. In this case, setting up a full mesh of VCs would require n-squared VCs, which may be infeasible in very large networks. Instead, VCs may be set up where required for forwarding data traffic. In this case it is generally not possible to know a priori how many such streams may occur.

Label withdrawal is required with both control-driven and data-driven label assignment. Label withdrawal is primarily a matter of garbage collection, that is, collecting unused labels so that they may be reassigned. Generally speaking, a label should be withdrawn when the conditions that allowed it to be assigned are no longer true. For example, if a label is imbued with extra semantics such as loop-freeness, then the label must be withdrawn when those extra semantics cease to hold.

In certain cases, notably multicast, it may be necessary to share a label space between multiple entities. If these sharing arrangements are altered by the coming and going of neighbors, then labels which are no longer controlled by an entity must be withdrawn and a new label assigned.
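The control-driven case above can be summarized in a few lines: bindings are created when a route is learned and withdrawn (garbage collected) when the route goes away, independent of any data arrival. The structure and the names (route_added, route_removed) are invented for illustration:

    # Hypothetical control-driven label assignment and withdrawal.
    bindings, free_labels = {}, list(range(100, 200))

    def route_added(prefix):
        # Control traffic (a newly learned route) drives label allocation.
        label = free_labels.pop(0)
        bindings[prefix] = label
        return label

    def route_removed(prefix):
        # Withdrawal is the matching garbage collection: the label is
        # returned to the pool so that it may be reassigned.
        free_labels.append(bindings.pop(prefix))

    route_added("10.0.0.0/8")
    route_removed("10.0.0.0/8")
    print(bindings)   # {}: the binding was withdrawn with the route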
2.2.4 Label Stack and Forwarding Operations

The basic forwarding operation consists of looking up the incoming label to determine the outgoing label, encapsulation, port, and any additional information which may pertain to the stream, such as a particular queue or other QoS related treatment. We refer to this operation as a label swap.

When a packet first enters an MPLS domain, the packet is forwarded by normal layer 3 forwarding operations with the exception that the outgoing encapsulation will now include a label. We refer to this operation as a label push. When a packet leaves an MPLS domain, the label is removed. We refer to this as a label pop.

In some situations, carrying a stack of labels is useful. For instance, both an IGP and a BGP label could be used to allow routers in the interior of an AS to be free of BGP information. In this scenario, the "IGP" label is used to steer the packet through the AS and the "BGP" label is used to switch between ASes.

With a label stack, the set of label operations remains the same, except that at some points one might push or pop multiple labels, or pop & swap, or swap & push.
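The swap, push, and pop operations, and their interaction with a label stack, can be sketched as follows (top of stack is the last list element; all values are invented):

    def push(stack, label):        # entering a domain (or a nested tunnel)
        return stack + [label]

    def swap(stack, new_label):    # the basic forwarding operation
        return stack[:-1] + [new_label]

    def pop(stack):                # leaving a domain
        return stack[:-1]

    stack = []
    stack = push(stack, 17)        # ingress: L3 lookup once, then push ("BGP" label)
    stack = push(stack, 99)        # an "IGP" label pushed over it
    stack = swap(stack, 42)        # interior LSRs swap only the top label
    stack = pop(stack)             # the "BGP" label reappears at the interior egress
    print(stack)                   # [17]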
2.3 Encapsulation

Label-based forwarding makes use of various pieces of information, including a label or stack of labels, and possibly additional information such as a TTL field. In some cases this information may be encoded using an MPLS header; in other cases this information may be encoded in L2 headers. Note that there may be multiple types of MPLS headers. For example, the header used over one media type may be different than the one used over a different media type. Similarly, in some cases the information that MPLS makes use of may be encoded in an ATM header. We will use the term "MPLS encapsulation" to refer to whatever form is used to encapsulate the label information and other information used for label based forwarding. The term "MPLS header" will be used where this information is carried in some sort of MPLS-specific header (i.e., when the MPLS information cannot all be carried in an L2 header). Whether there is one form or multiple forms of possible MPLS headers is also outside of the scope of this document.

The exact contents of the MPLS encapsulation are outside of the scope of this document. Some fields, such as the label, are obviously needed. Some others might or might not be standardized, based on further study. An encapsulation scheme may make use of the following fields:

- label
- TTL
- class of service
- stack indicator
- next header type indicator
- checksum

It is desirable to have a very short encapsulation header. For example, a four byte encapsulation header adds to the convenience of building a hardware implementation that forwards based on the encapsulation header. But at the same time it is difficult to assign such a limited number of bits to carry the above listed information in an MPLS header. Hence careful consideration must be given to the information chosen for an MPLS header.
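To make the size tradeoff concrete, the following sketch packs the first four of the fields listed above into four bytes, using an assumed split of a 20-bit label, 3-bit class of service, 1-bit stack indicator, and 8-bit TTL. This layout is purely illustrative; as stated above, the actual contents and encoding of the MPLS header are outside the scope of this document:

    def pack(label, cos, bottom_of_stack, ttl):
        # 20 bits label | 3 bits COS | 1 bit stack indicator | 8 bits TTL
        word = (label << 12) | (cos << 9) | (bottom_of_stack << 8) | ttl
        return word.to_bytes(4, "big")

    def unpack(header):
        word = int.from_bytes(header, "big")
        return (word >> 12, (word >> 9) & 0x7, (word >> 8) & 0x1, word & 0xFF)

    hdr = pack(label=42, cos=5, bottom_of_stack=1, ttl=64)
    print(unpack(hdr))   # (42, 5, 1, 64)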
2.3 Encapsulation

Label-based forwarding makes use of various pieces of information, including a label or stack of labels, and possibly additional information such as a TTL field. In some cases this information may be encoded using an MPLS header; in other cases it may be encoded in L2 headers. Note that there may be multiple types of MPLS headers. For example, the header used over one media type may be different from that used over another media type. Similarly, in some cases the information that MPLS makes use of may be encoded in an ATM header. We will use the term "MPLS encapsulation" to refer to whatever form is used to encapsulate the label information and other information used for label based forwarding. The term "MPLS header" will be used where this information is carried in some sort of MPLS-specific header (i.e., when the MPLS information cannot all be carried in an L2 header). Whether there is one form of MPLS header or several is also outside the scope of this document.

The exact contents of the MPLS encapsulation are outside the scope of this document. Some fields, such as the label, are obviously needed. Others might or might not be standardized, based on further study. An encapsulation scheme may make use of the following fields:

 - label
 - TTL
 - class of service
 - stack indicator
 - next header type indicator
 - checksum

It is desirable to have a very short encapsulation header. For example, a four byte encapsulation header adds to the convenience of building a hardware implementation that forwards based on the encapsulation header. At the same time, it is difficult to assign such a limited number of bits to carry all of the information listed above. Hence careful consideration must be given to the information chosen for an MPLS header.

A TTL value in the MPLS header may be useful in the same manner as it is in IP. Specifically, TTL may be used to terminate packets caught in a routing loop, and for other related uses such as traceroute. The TTL mechanism is a simple and proven method of handling such events. Another use of TTL is to expire packets in a network by limiting their "time to live" and eliminating stale packets that may cause problems for some of the higher layer protocols. When used over link layers which do not provide a TTL field, alternate mechanisms will be needed to replace the uses of the TTL field.

A provision for a class of service (COS) field in the MPLS header allows multiple service classes within the same label. However, when more sophisticated QoS is associated with a label, the COS may not have any significance. Alternatively, the COS (like QoS) can be left out of the header and instead propagated with the label assignment, but this requires that a separate label be assigned to each required class of service. Nevertheless, the COS mechanism provides a simple method of segregating flows within a label.

As previously mentioned, the encapsulation header can be used to derive the benefits of tunneling (or stacking).

The MPLS header must provide a way to indicate that multiple MPLS headers are stacked (i.e., the "stack indicator"). For this purpose a single bit in the MPLS header will suffice. In addition, there are also some benefits to indicating the type of the protocol header following the MPLS header (i.e., the "next header type indicator"). One option would be to combine the stack indicator and next header type indicator into a single value (i.e., the next header type indicator could be allowed to take the value "MPLS header"). Another option is to have the next header type indicator be implicit in the label value (such that this information would be propagated along with the label).

There is no compelling reason to support a checksum field in the MPLS header. A CRC mechanism at the L2 layer should be sufficient to ensure the integrity of the MPLS header.
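
To make the bit-budget tradeoff concrete, the following sketch packs a label, COS, stack indicator, and TTL into a four byte header. The field widths chosen here are assumptions for illustration only; this document does not specify an encapsulation:

   import struct

   # One illustrative packing of a 32-bit header: 20-bit label,
   # 3-bit COS, 1-bit stack indicator, 8-bit TTL. These widths are
   # assumptions for this example, not a specification.

   def pack_header(label, cos, stack_bottom, ttl):
       assert 0 <= label < 2**20 and 0 <= cos < 8 and 0 <= ttl < 256
       word = (label << 12) | (cos << 9) | (int(stack_bottom) << 8) | ttl
       return struct.pack("!I", word)     # network byte order

   def unpack_header(data):
       (word,) = struct.unpack("!I", data)
       return {
           "label": word >> 12,
           "cos": (word >> 9) & 0x7,
           "stack_bottom": bool((word >> 8) & 0x1),
           "ttl": word & 0xFF,
       }

   hdr = pack_header(label=42, cos=0, stack_bottom=True, ttl=64)
   assert unpack_header(hdr)["label"] == 42

With only 32 bits there is no room left for a next header type indicator or checksum in this particular packing, which illustrates why some of the listed fields may have to be implicit in the label or omitted.
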
3. Observations, Issues and Assumptions

3.1 Layer 2 versus Layer 3 Forwarding

MPLS uses L2 forwarding as a way to provide simple and fast packet forwarding capability. One primary reason for the simplicity of L2 forwarding comes from its short, fixed length labels. A node forwarding at L3 must parse a (relatively) large header and perform a longest-prefix match to determine a forwarding path. However, when a node performs L2 label swapping, and labels are assigned properly, it can do a direct index lookup into its forwarding (or in this case, label-swapping) table with the short header. It is arguably simpler to build label swapping hardware than it is to build L3 forwarding hardware because the label swapping function is less complex.

The relative performance of L2 and L3 forwarding may differ considerably between nodes. Some nodes may exhibit an order of magnitude difference. Other nodes (for example, nodes with more extensive L3 forwarding hardware) may have identical performance at L2 and L3. However, some nodes may not be capable of L3 forwarding at all (e.g., ATM), or may have such limited L3 capacity as to be unusable at L3. In this situation, traffic must be blackholed if no switched path exists.

On nodes in which L3 forwarding is slower than L2 forwarding, pushing traffic to L3 when no L2 path is available may cause congestion. In some cases this could cause data loss (since L3 may be unable to keep up with the increased traffic). However, if data is discarded, then in general this will cause TCP to back off, which would allow control traffic, traceroute, and other network management tools to continue to work.

The MPLS protocol MUST NOT make assumptions about the forwarding capabilities of an MPLS node. Thus, MPLS must propose solutions that can leverage the benefits of a node that is capable of L3 forwarding, but must not mandate that a node be capable of such.

Why We Will Still Need L3 Forwarding:

MPLS will not, and is not intended to, replace L3 forwarding. There is absolutely a need for some systems to continue to forward IP packets using normal Layer 3 IP forwarding. L3 forwarding will be needed for a variety of reasons, including:

 - For scaling; to forward on a finer granularity than the labels can provide
 - For security; to allow packet filtering at firewalls
 - For forwarding at the initial router (when hosts don't do MPLS)

Consider a campus network which is serving a small company. Suppose that this company makes use of the Internet, for example as a method of communicating with customers. A customer on the other side of the world has an IP packet to be forwarded to a particular system within the company. It is not reasonable to expect that the customer will have a label to use to forward the packet to that specific system. Rather, the label used for the "first hop" forwarding might be sufficient to get the packet considerably closer to the destination. However, the granularity of the labels cannot extend to every host worldwide. Similarly, routing used within one routing domain cannot know about every host worldwide. This implies that in many cases the labels assigned to a particular packet will be sufficient to get the packet close to the destination, but that at some points along the path of the packet the IP header will need to be examined to determine a finer granularity for forwarding that packet. This is particularly likely to occur at domain boundaries.

A similar point occurs at the last router prior to the destination host. In general, the number of hosts attached to a network is likely to be great enough that it is not feasible to assign a separate label to every host. Rather, at least for routing within the destination routing domain (or the destination area if there is a hierarchical routing protocol in use), a label may be assigned which is sufficient to get the packet to the last hop router. However, the last hop router will need to examine the IP header (and particularly the destination IP address) in order to forward the packet to the correct destination host.

Packet filtering at firewalls is an important part of the operation of the Internet. While the current state of Internet security may be considerably less advanced than may be desired, nonetheless some security (as is provided by firewalls) is much better than no security. We expect that packet filtering will continue to be important for the foreseeable future. Packet filtering requires examination of the contents of the packet, including the IP header. This implies that at firewalls the packet cannot be forwarded simply by considering the label associated with the packet. Note that this is also likely to occur at domain boundaries.

Finally, it is very likely that many hosts will not implement MPLS. Rather, a host will simply forward an IP packet to its first hop router. This first hop router will need to examine the IP header prior to forwarding the packet (with or without a label).

3.2 Scaling Issues

MPLS scalability is provided by two of the principles of routing. The first is that forwarding follows an inverted tree rooted at a destination. The second is that the number of destinations is reduced by routing aggregation.

The very nature of IP forwarding is a merged multipoint-to-point tree.
Thus, since MPLS mirrors the IP network layer, an MPLS node that is capable of merging can create O(n) switched paths which provide network reachability to all "n" destinations. The meaning of "n" depends on the granularity of the switched paths. One obvious choice of "n" is the number of CIDR prefixes existing in the forwarding table (this scales the same as today's routing). However, the value of "n" may be reduced considerably by choosing switched paths of further aggregation. For example, by creating switched paths to each possible egress node, "n" may represent the number of egress nodes in a network. This choice creates "n" switched paths, such that each path is shared by all CIDR prefixes that are routed through the same egress node. This selection greatly improves scalability, since it minimizes "n", while maintaining the same switching performance as CIDR aggregation. (See section 2.2.2 for a description of all of the levels of granularity provided by MPLS.)

The MPLS technology must scale at least as well as existing technology. For example, if the MPLS technology were to support ONLY host-to-host switched path connectivity, then the number of switched paths would be much higher than the number of routing table entries.

There are several ways in which merging can be done in order to allow O(n) switched paths to connect n nodes. The merging approach used has an impact on the amount of state information, buffering, delay characteristics, and the means of control required to coordinate the trees. These issues are discussed in more detail in section 4.2.

There are some cases in which O(n-squared) switched paths may be used (for example, by setting up a full mesh of point-to-point streams). As label space and the amount of state information that can be supported may be limited, it will not be possible to support O(n-squared) switched paths in very large networks. However, in some cases the use of n-squared paths may even be an advantage (for example, to allow load-splitting of individual streams).

MPLS must be designed to scale O(n). O(n) scaling allows MPLS domains to grow very large. In addition, if best effort service can be supported with O(n) scaling, this conserves resources (such as label space and state information) which can be used for supporting advanced services such as QoS. However, since some switches may not support merging, and some small networks may not require the scaling benefits of O(n), provisions must also be made for a non-merging, O(n-squared) solution.
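
As a rough illustration of the difference between the two scaling behaviors (the values of n below are arbitrary):

   # Rough illustration of the scaling argument above. "n" is the
   # number of edge (ingress/egress) nodes; the values are arbitrary.
   for n in (10, 100, 1000):
       full_mesh = n * (n - 1)   # one point-to-point LSP per ordered pair
       merged = n                # one multipoint-to-point tree per egress
       print(f"n={n:5d}  full mesh: {full_mesh:8d} LSPs  merged: {merged:5d} LSPs")
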
Note: A precise and complete description of scaling would consider that there are multiple dimensions of scaling, and multiple resources whose usage may be considered. Possible dimensions of scaling include: (i) the total number of streams which exist in an MPLS domain (with associated labels assigned to them); (ii) the total number of "label swapping pairs" which may be stored in the nodes of the network (i.e., entries of the form "for incoming label 'x', use outgoing label 'y'"); (iii) the number of labels which need to be assigned for use over a particular link; (iv) the amount of state information which needs to be maintained by any one node. We do not intend to perform a complete analysis of all possible scaling issues, and understand that our use of the terms "O(n)" and "O(n-squared)" is approximate only.

3.3 Types of Streams

Switched paths in the MPLS network can be of different types:

 - point-to-point
 - multipoint-to-point
 - point-to-multipoint
 - multipoint-to-multipoint

Two of the factors that determine which type of switched path is used are (i) the capability of the switches employed in a network and (ii) the purpose of the creation of a switched path, that is, the types of flows to be carried in the switched path. These two factors also determine the scalability of a network in terms of the number of switched paths in use for transporting data through the network.

Point-to-point switched paths can be used to connect all ingress nodes to all egress nodes to carry unicast traffic. In this case, since each ingress node has point-to-point connections to all the egress nodes, the number of connections in use for transporting traffic is O(n-squared), where n is the number of edge MPLS devices. For small networks the full mesh connection approach may suffice and not pose any scalability problems. However, in large enterprise backbone or ISP networks, this will not scale well.

Point-to-point switched paths may also be used on a host-to-host or application-to-application basis (e.g., a switched path per RSVP flow). The dedicated point-to-point switched path transports the unicast data from the ingress to the egress node of the MPLS network. This approach may be used for providing QoS services or for best-effort traffic.

A multipoint-to-point switched path connects all ingress nodes to a single egress node. At a given intermediate node in the multipoint-to-point switched path, L2 data units from several upstream links are "merged" into a single label on a downstream link. Since each egress node is reachable via a single multipoint-to-point switched path, the number of switched paths required to transport best-effort traffic through an MPLS network is O(n), where n is the number of egress nodes.

The point-to-multipoint switched path is used for distributing multicast traffic. This switched path tree mirrors the multicast distribution tree as determined by the multicast routing protocols. Typically a switch capable of point-to-multipoint connection replicates an L2 data unit from the incoming (parent) interface to all the outgoing (child) interfaces. Standard ATM switches support such functionality in the form of point-to-multipoint VCs or VPs.

A multipoint-to-multipoint switched path may be used to combine multicast traffic from multiple sources into a single multicast distribution tree. The advantage of this is that the multipoint-to-multipoint switched path is shared by multiple sources. Conceptually, a form of multipoint-to-multipoint can be thought of as follows: suppose that you have a point-to-multipoint VC from each node to all other nodes, and that at any point where two or more VCs happen to merge, you merge them into a single VC or VP. This would require either coordination of VCI spaces (so that each source has a unique VCI within a VP) or VC merge capabilities. The applicability of similar concepts to MPLS is FFS.

3.4 Data Driven versus Control Traffic Driven Label Assignment

A fundamental concept in MPLS is the association of labels with network layer routing. Each LSR must assign labels, and distribute them to its forwarding peers, for traffic which it intends to forward by label swapping. In the various contributions that have been made so far to the MPLS WG we identify three broad strategies for label assignment: (i) those driven by topology based control traffic [TAG][ARIS][IP navigator]; (ii) those driven by request based control traffic [RSVP]; and (iii) those driven by data traffic [CSR][Ipsilon].

We also note that in actual practice combinations of these methods may be employed, for example topology based methods for best effort traffic plus request based methods for support of RSVP.

3.4.1 Topology Driven Label Assignment

In this scheme labels are assigned in response to normal processing of routing protocol control traffic. Examples of such control protocols are OSPF and BGP. As an LSR processes OSPF or BGP updates it can, as it makes or changes entries in its forwarding tables, assign labels to those entries (a sketch of this behavior appears after the list below). Among the properties of this scheme are:

 - The computational load of assignment and distribution, and the bandwidth consumed by label distribution, are bounded by the size of the network.

 - Labels are in the general case preassigned. If a route exists then a label has been assigned to it (and distributed). Traffic may be label swapped as soon as it arrives; there is no label setup latency at forwarding time.

 - LSRs need only be able to process the control traffic load.

 - Labels assigned in response to the operation of routing protocols can have a granularity equivalent to that of the routes advertised by the protocol. Labels can, by this means, cover (highly) aggregated routes.
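
A minimal sketch of topology driven assignment follows. The routing-update hook, the reserved label range, and all names are invented for this example and do not come from any actual protocol:

   # Sketch: assign a label whenever a routing update installs a new
   # forwarding entry. All names and values are illustrative.

   next_free_label = 16          # assume a small reserved label range
   label_bindings = {}           # prefix -> local label

   def on_route_update(prefix, next_hop):
       """Called as the routing protocol (e.g., OSPF/BGP) changes a route."""
       global next_free_label
       if prefix not in label_bindings:
           label_bindings[prefix] = next_free_label
           next_free_label += 1
           # Distribute the binding to peers in response to control
           # traffic only, independent of data arrival; hence there is
           # no setup latency when data packets arrive.
           advertise_binding(prefix, label_bindings[prefix])

   def advertise_binding(prefix, label):
       print(f"binding: use label {label} for {prefix}")

   on_route_update("192.0.2.0/24", next_hop="lsr-b")
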
3.4.2 Request Driven Label Assignment

In this scheme labels are assigned in response to normal processing of request based control traffic, for example RSVP. As an LSR processes RSVP messages it can, as it makes or changes entries in its forwarding tables, assign labels to those entries.

Among the properties of this scheme are:

 - The computational load of assignment and distribution, and the bandwidth consumed by label distribution, are bounded by the amount of control traffic in the system.

 - Labels are in the general case preassigned. If a route exists then a label has been assigned to it (and distributed). Traffic may be label swapped as soon as it arrives; there is no label setup latency at forwarding time.

 - LSRs need only be able to process the control traffic load.

 - Depending upon the number of flows supported, this approach may require a larger number of labels to be assigned compared with topology driven assignment.

 - This approach requires applications to make use of the request paradigm in order to get a label assigned to their flow.

3.4.3 Traffic Driven Label Assignment

In this scheme the arrival of data at an LSR "triggers" label assignment and distribution. The traffic driven approach has the following characteristics:

 - Label assignment and distribution costs are a function of traffic patterns. In an LSR with limited label space that uses a traffic driven approach to amortize its labels over a larger number of flows, the overhead due to label assignment and distribution grows as a function of the number of flows and of their "persistence". Short lived but recurring flows may impose a heavy control burden.

 - There is a latency associated with the appearance of a "flow" and the assignment of a label to it. The documented approaches to this problem suggest L3 forwarding during this setup phase; this has the potential for packet reordering (note that packet reordering may occur with any scheme when the network topology changes, but traffic driven label assignment introduces another cause of reordering).

 - Flow driven label assignment requires high performance packet classification capabilities.

 - Traffic driven label assignment may be useful to reduce label consumption (assuming that flows are not close to a full mesh).

 - If you want flows to hosts then, due to limits on label space and the large number of hosts which may occur in a network, traffic driven label assignment is probably necessary.

 - If you want to assign specific network resources to specific labels, to be used for support of application flows, then again the fine granularity associated with such labels may require traffic driven label assignment.

3.5 The Need for Dealing with Looping

Routing protocols which are used in conjunction with MPLS will in many cases be based on distributed computation. As such, during routing transients, these protocols may compute forwarding paths which contain loops. For this reason MPLS will be designed with mechanisms to prevent the formation of loops and/or contain the amount of resources that can be consumed due to the presence of loops.

Note that a number of different alternative mechanisms have been proposed (see section 4.3). Some of these prevent the formation of layer 2 forwarding loops; others allow loops to form but minimize their impact in one way or another (e.g., by discarding packets which loop, or by detecting and closing the loop after a period of time). Generally speaking, there are tradeoffs to be made between the amount of looping which might occur and other considerations, such as the time to convergence after a change in the paths computed by the routing algorithm.

We are not proposing any changes to normal layer 3 operation, and specifically are not trying to eliminate the possibility of looping at layer 3. Transient loops will continue to be possible in IP networks. Note that IP has a means to limit the damage done by looping packets, based on decrementing the IP TTL field as the packet is forwarded and discarding packets whose TTL has expired. Dynamic routing protocols used with IP are also designed to minimize the amount of time during which loops exist.

The question that MPLS has to deal with is what to do at L2. In some cases L2 may make use of the same method that is used at L3. However, other options are available at L2, and in some cases (specifically when operating over ATM or Frame Relay hardware) the method of decrementing a TTL field (or any similar field) is not available.
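
Where a TTL field is available in the encapsulation (see section 2.3), the L2 mechanism can mirror IP exactly. A minimal sketch, with illustrative names and the packet modeled as a simple dictionary:

   # Sketch: TTL handling at an L2 label swap, mirroring IP behavior.
   # Over ATM or Frame Relay there is no such field, so this option is
   # unavailable and the other methods of section 4.3 are needed.

   def forward_labeled_packet(packet, swap_table):
       """packet has 'label' and 'ttl' keys; returns a port or None."""
       if packet["ttl"] <= 1:
           return None               # discard: looping or expired packet
       packet["ttl"] -= 1
       out_label, port = swap_table[packet["label"]]
       packet["label"] = out_label
       return port

   assert forward_labeled_packet({"label": 17, "ttl": 1},
                                 {17: (42, "p1")}) is None
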
There are basically two problems caused by packet looping. The most obvious is that packets are not delivered to the correct destination. The other is congestion: even with TTL decrementing and packet discard, there may still be a significant amount of time during which packets travel around a loop. This can adversely affect other packets which are not looping, since congestion due to the looping packets can cause non-looping packets to be delayed and/or discarded.

Looping is particularly serious in (at least) three cases. One is when forwarding over ATM. Since ATM does not have a TTL field to decrement, there is no way to discard ATM cells which are looping over ATM subnetworks. Standard ATM PNNI routing and signaling solve this problem by making use of call setup procedures which ensure that ATM VCs will never be set up in a loop [PNNI]. However, when MPLS is used over ATM subnets, the native ATM routing and signaling procedures may not be used for the full L2 path. This leads to the possibility that MPLS over ATM might in principle allow packets to loop indefinitely, or until L3 routing stabilizes. Methods are needed to prevent this problem.

Another case in which looping can be particularly unpleasant is multicast traffic. With multicast, it is possible that a packet may be delivered successfully to some destinations even though copies intended for other destinations are looping. This leads to the possibility that huge numbers of identical packets could be delivered to some destinations. Also, since multicast implies that packets are duplicated at some points in their path, the congestion resulting from looping packets may be particularly severe.

Another unpleasant complication of looping occurs if the congestion caused by the loop interferes with the routing protocol. It is possible for the congestion caused by looping to cause routing protocol control packets to be discarded, with the result that the routing protocol becomes unstable; for example, this could lengthen the duration of the loop.

In normal operation of IP networks the impact of congestion is limited by the fact that TCP backs off (i.e., transmits substantially less traffic) in response to lost packets. Where the congestion is caused by looping, the combination of TTL and the resulting discard of looping packets, plus the reduction in offered traffic, can limit the resulting impact on the network. TCP backoff, however, does not solve the problem if the looping packets are not discarded (for example, if the loop is over an ATM subnetwork where TTL is not used).

The severity of the problem caused by looping may depend upon implementation details. Suppose, for instance, that ATM switching hardware is being used to provide MPLS switching functions. If the ATM hardware has per-VC queuing, and if it is capable of providing fair access to the buffer pool for incoming cells based on the incoming VC (so that no one incoming VC is allowed to grab a disproportionate number of buffers), looping might not have a significant effect on other traffic. If the ATM hardware cannot provide fair buffer access of this sort, however, then even transient loops may cause severe degradation of the node's total performance.

Given that MPLS is a relatively new approach, it is possible that looping may have consequences which are not fully understood (such as looping of LDP control information in cases where stream merge is not used).

Even if fair buffer access can be provided, it is still worthwhile to have some means of detecting loops that last "longer than possible". In addition, even where TTL and/or per-VC fair queuing provides a means for surviving loops, it still may be desirable where practical to avoid setting up LSPs which loop.

Methods for dealing with loops are discussed in section 4.3.

3.6 Operations and Management

Operations and management of networks is critically important. This implies that MPLS must support operations, administration, and maintenance facilities at least as extensive as those supported in current IP networks.

In most ways this is a relatively simple requirement to meet. Given that all MPLS nodes run normal IP routing protocols, it is straightforward to expect them to participate in normal IP network management protocols.

One issue has been identified which needs to be addressed by the MPLS effort: the operation of traceroute over MPLS networks. Note that other O&M issues may be identified in the future.

Traceroute is a very commonly used network management tool. Traceroute is based on use of the TTL field: a station trying to determine the route from itself to a specified address transmits multiple IP packets, with the TTL field set to 1 in the first packet, 2 in the second packet, and so on. This causes each router along the path to send back an ICMP error report for TTL exceeded, which in turn allows the station to determine the set of routers along the route. For example, this can be used to determine where a problem exists (if no router responds past some point, the last router which responds can become the starting point for a search to determine the cause of the problem).

When MPLS is operating over ATM or Frame Relay networks there is no TTL field to decrement (and ATM and Frame Relay forwarding hardware does not decrement TTL). This implies that it is not straightforward to have traceroute operate in this environment.

There is the question of whether we *want* all routers along a path to be visible via traceroute. For example, an ISP probably doesn't want to expose the interior of its network to a customer. However, the issue of whether a network's policy allows the interior of the network to be visible should be independent of whether it is possible for some users to see the interior of the network. Thus, while there clearly should be the possibility of using policy mechanisms to block traceroute from being used to see the interior of the network, this does not imply that it is acceptable to develop protocol mechanisms which prevent traceroute from working.

There is also the question of whether the interior of an MPLS network is analogous to a normal IP network, or whether it is closer to the interior of a layer 2 network (for example, an ATM subnet). Clearly IP traceroute cannot be used to expose the interior of an ATM subnet.
When a packet is crossing an ATM subnetwork (for example, between an ingress and an egress router which are attached to the ATM subnet), traceroute can be used to determine the router-to-router path, but not the path through the ATM switches which comprise the ATM subnet. Note that MPLS forms a sort of "in between" special case: routing is based on normal IP routing protocols, the equivalent of call setup (label binding/exchange) is based on MPLS-specific protocols, but forwarding is based on normal L2 ATM forwarding. MPLS therefore supersedes the normal ATM-based methods that would be used to eliminate loops and/or trace paths through the ATM subnet.

It is generally agreed that traceroute is a relatively "ugly" tool, and that a better tool for tracing the route of a packet would be preferable. However, no better tool has yet been designed or even proposed. Also, however ugly traceroute may be, it is nonetheless very useful, widely deployed, and widely used. In general, it is highly preferable to define, implement, and deploy a new tool, and to determine through experience that the new tool is sufficient, before breaking a tool which is as widely used as traceroute.

Methods that may be used to either allow traceroute to be used in an MPLS network, or to replace traceroute, are discussed in section 4.14.
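
For reference, the TTL-based probing which must either be preserved or replaced can be modeled as follows. This is a toy model only (no real sockets or ICMP); the path contents are invented:

   # Toy model of traceroute's use of TTL. A probe with TTL t elicits
   # a "time exceeded" report from the t-th router, unless it reaches
   # the destination first. A subnet that does not decrement TTL
   # (e.g., ATM) would simply be invisible to this procedure.

   path = ["r1", "r2", "r3", "dest"]       # illustrative route

   def probe(ttl):
       """Node that answers a probe sent with the given TTL."""
       if ttl < len(path):
           return path[ttl - 1]    # router at that hop reports back
       return path[-1]             # probe reached the destination

   ttl = 1
   while True:
       responder = probe(ttl)
       print(f"hop {ttl}: {responder}")
       if responder == path[-1]:
           break
       ttl += 1
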
4. Technical Approaches

4.1 Label Distribution

A fundamental requirement in MPLS is that an LSR forwarding label switched traffic to another LSR apply a label to that traffic which is meaningful to the other (receiving) LSR. LSRs could learn about each other's labels in a variety of ways. We call the general topic "label distribution".

4.1.1 Explicit Label Distribution

Explicit label distribution anticipates the specification by MPLS of a standard protocol for label distribution. Two of the possible approaches [TDP] [ARIS] are oriented toward topology driven label distribution. One other approach [FANP], in contrast, makes use of traffic driven label distribution.

We expect that the label distribution protocol (LDP) which emerges from the MPLS WG is likely to inherit elements from one or more of the possible approaches.

Consider LSR A forwarding traffic to LSR B. We call A the upstream (with respect to dataflow) LSR and B the downstream LSR. A must apply a label to the traffic that B "understands". Label distribution must ensure that the "meaning" of the label will be communicated between A and B. An important question is whether A or B (or some other entity) allocates the label.

In this discussion we are talking about the allocation and distribution of labels between two peer LSRs that are on a single segment of what may be a longer path. A related, but in fact entirely separate, issue is the question of where control of the whole path resides. In essence there are two models; by analogy to upstream and downstream for a single segment, we can talk about ingress and egress for an LSP (or to and from a label swapping "domain"). In one model a path is set up from ingress to egress; in the other, from egress to ingress.

4.1.1.1 Downstream Label Allocation

"Downstream Label Allocation" refers to a method where the label allocation is done by the downstream LSR, i.e. the LSR that uses the label as an index into its switching tables.

This is, arguably, the most natural label allocation/distribution mode for unicast traffic. As an LSR builds its routing tables (we consider here control driven allocation of labels) it is free, within some limits we will discuss, to allocate labels in any manner that may be convenient to the particular implementation. Since the labels that it allocates will be those upon which it subsequently makes forwarding decisions, we assume implementations will perform the allocation in an optimal manner. Having allocated labels, the default behavior is to distribute the labels (and bindings) to all peers.

In some cases (particularly with ATM) there may be a limited number of labels which may be used across an interface, and/or a limited number of label assignments which may be supported by a single device. Operation in this case may make use of "on demand" label assignment. With this approach, an LSR may for example request a label for a route from a particular peer only when its routing calculations indicate that peer to be the new next hop for the route.
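
A minimal sketch of downstream allocation, including the "on demand" variant, follows. The class layout, label range, and all names are invented for this example:

   # Sketch: downstream label allocation. The downstream LSR picks a
   # label from its own free pool (it will later index its switching
   # table with that label) and advertises the binding upstream.

   class DownstreamLSR:
       def __init__(self):
           self.free_labels = iter(range(16, 1 << 20))   # assumed range
           self.switching_table = {}   # incoming label -> route

       def bind(self, route):
           """Allocate a local label for a route; return the binding."""
           label = next(self.free_labels)
           self.switching_table[label] = route
           return route, label

       def on_label_request(self, route):
           """'On demand' mode: upstream asks only when its routing
           calculation makes this LSR the next hop for the route."""
           return self.bind(route)

   lsr_b = DownstreamLSR()
   print(lsr_b.on_label_request("192.0.2.0/24"))

In the default (unsolicited) mode the same bind() result would simply be distributed to all peers as the routing table is built, rather than waiting for a request.
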
4.1.1.2 Upstream Label Allocation

"Upstream Label Allocation" refers to a method where the label allocation is done by the upstream LSR. In this case the LSR choosing the label (the upstream LSR) and the LSR which needs to interpret packets using the label (the downstream LSR) are not the same node. We note here that in the upstream LSR the label at issue is not used as an index into the switching tables, but rather is found as the result of a lookup on those tables.

The motivation for upstream label allocation comes from the recognition that it might be possible to optimize multicast machinery in an LSR if it were possible to use the same label on all output ports for which a particular multicast packet/cell were destined. Upstream assignment makes this possible.

4.1.1.3 Other Label Allocation Methods

Another option would be to make use of label values which are unique within the MPLS domain (implying that a domain-wide allocation would be needed). In this case, any stream to a particular MPLS egress node could make use of the label of that node (implying that label values do not need to be swapped at intermediate nodes).

With this method of label allocation, there is a choice to be made regarding the scope over which a label is unique. One approach is to configure each node in an MPLS domain with a label which is unique in that domain. Another approach is to use a truly global identifier (for example the IEEE 48 bit identifier), where each MPLS-capable node would be stamped at birth with a globally unique identifier. The point of this global approach is to simplify configuration in each MPLS domain by eliminating the need to configure label IDs.

4.1.2 Piggybacking on Other Control Messages

While we have discussed use of an explicit MPLS LDP, we note that there are several existing protocols that can be easily modified to distribute both routing/control and label information. This could be done with any of OSPF, BGP, RSVP and/or PIM. A particular architectural elegance of these schemes is that label distribution uses the same mechanisms as are used in distribution of the underlying routing or control information.

When explicit label distribution is used, the routing computation and label distribution are decoupled. This implies the possibility that at some point you may have a route to a specific destination without an associated label, and/or a label for a specific destination which makes use of a path which you are no longer using. Piggybacking label distribution on the operation of the routing protocol is one way to eliminate this decoupling.

Piggybacking label distribution on the routing protocol introduces an issue regarding how to negotiate acceptable label values, and what to do if an invalid label is received. This is discussed in section 4.1.3.

4.1.3 Acceptable Label Values

There are some constraints on which label values may be used in either allocation mode. Clearly the label values must lie within the allowable range described in the encapsulation standards that the MPLS WG will produce. The label value used must also, however, lie within a range that the peer LSR is capable of supporting. We imagine that certain machines, for example ATM switches operating as LSRs, may, due to operational or implementation restrictions, support a label space more limited than that bounded by the valid range found in the encapsulation standard. This implies that an advertisement or negotiation mechanism for usable label range may be a part of the MPLS LDP. When operating over ATM using ATM forwarding hardware, due to the need for compatibility with the existing use of the ATM VPI/VCI space, it is quite likely that an explicit mechanism will be needed for label range negotiation.

In addition we note that LDP may be one of a number of mechanisms used to distribute labels between any given pair of LSRs. Clearly where such multiple mechanisms exist, care must be taken to coordinate the allocation of label values: a single label value must have a unique meaning to the LSR that distributes it.

There is an issue regarding how to allow negotiation of acceptable label values if label distribution is piggybacked on the routing protocol. In this case it may be necessary either to require equipment to accept any possible label value, or to configure devices to know which range of label values may be selected. It is not clear in this case what to do if an invalid label value is received, as there may be no means of sending a NAK.

A similar issue occurs with multicast traffic over broadcast media, where there may be multiple nodes which receive the same transmission (using a single label value). Here again it may be "non-trivial" to allow n-party negotiation of acceptable label values.

4.1.4 LDP Reliability

The need for reliable label distribution depends upon the relative performance of L2 and L3 forwarding, as well as the relationship between label distribution and the routing protocol operation.

If label distribution is tied to the operation of the routing protocol, then a reasonable protocol design would ensure that labels are distributed successfully as long as the associated route and/or reachability advertisement is distributed successfully. This implies that the reliability of label distribution will be the same as the reliability of route distribution.

If there is a very large difference between L2 and L3 forwarding performance, then the cost of failing to deliver a label is significant. In this case it is important to ensure that labels are distributed reliably. Given that LDP needs to operate in a wide variety of environments with a wide variety of equipment, this implies that it is important for any LDP developed by the MPLS WG to ensure reliable delivery of label information.

Reliable delivery of LDP packets may potentially be accomplished either by using an existing reliable transport protocol such as TCP, or by specifying reliability mechanisms as part of LDP (for example, the reliability mechanisms which are defined in IDRP could potentially be "borrowed" for use with LDP).

TCP supports flow control (in addition to supporting reliable delivery of data). Flow control is a desirable feature which will be useful for MPLS (as well as other applications making use of a reliable transport) and therefore needs to be built into whatever reliability mechanism is used for MPLS.

4.1.5 Label Purge Mechanisms

Another issue to be considered is the "lifetime" of label data once it arrives at an LSR, and the method of purging label data. There are several methods that could be used either separately or (more likely) in combination.

One approach is for label information to be timed out. With this approach a lifetime is distributed along with the label value. The label value may be refreshed prior to timing out; if it is not refreshed, it is discarded. Each lifetime and timer may apply to a single label, or to a group of labels (e.g., all labels selected by the same node).

Similarly, two peer nodes may make use of an MPLS peer keep-alive mechanism. This implies exchange of MPLS control packets between neighbors on a periodic basis, and in general is likely to use a smaller timeout value than the label value timers (analogous to the fact that the OSPF HELLO interval is much shorter than the OSPF LSA lifetime). If the peer session between two MPLS nodes fails (due to expiration of the associated timer prior to reception of the refresh), then the associated label information is discarded.

If label information is piggybacked on the routing protocol, then the timeout mechanisms would also be taken from the associated routing protocol (note that routing protocols in general have mechanisms to invalidate stale routing information).

An alternative method for invalidating labels is to make use of an explicit label removal message.
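
The timeout approach can be sketched as follows, assuming a per-binding lifetime refreshed by periodic advertisements (all names are illustrative):

   import time

   # Sketch: label lifetimes. A binding is discarded if not refreshed
   # within its advertised lifetime (compare OSPF's LSA aging).

   bindings = {}    # label -> expiry time (seconds since the epoch)

   def install(label, lifetime):
       bindings[label] = time.time() + lifetime

   def refresh(label, lifetime):
       if label in bindings:
           install(label, lifetime)

   def purge_expired():
       now = time.time()
       for label in [l for l, expiry in bindings.items() if expiry <= now]:
           del bindings[label]        # garbage-collect the stale binding

   install(42, lifetime=60)
   refresh(42, lifetime=60)
   purge_expired()

A keep-alive mechanism would be the same pattern applied to the peer session as a whole, with all labels learned from the peer discarded when the session timer expires.
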
4.2 Stream Merging

In order to scale as O(n) (rather than O(n-squared)), MPLS makes use of the concept of stream merge. This makes use of multipoint-to-point streams in order to allow multiple streams to be merged into one stream.

4.2.1 Types of Stream Merge:

There are several types of stream merge that can be used, depending upon the underlying media.

When MPLS is used over frame based media, merging is straightforward. All that is required for stream merge to take place is for a node to allow multiple upstream labels to be forwarded the same way and mapped into a single downstream label. This is referred to as frame merge.

Operation over ATM media is less straightforward. In ATM, data packets are encapsulated into an ATM Adaptation Layer, say AAL5; the AAL5 PDU is segmented into ATM cells with a VPI/VCI value, and the cells are transmitted in sequence. It is incumbent on ATM switches to keep the cells of a PDU (i.e., cells with the same VPI/VCI value) contiguous and in sequence. This is because the device that reassembles the cells to re-form the transmitted PDU expects the cells to be contiguous and in sequence; there isn't sufficient information in the ATM cell header (unlike IP fragmentation) to reassemble the PDU from an arbitrary cell order. Hence, if cells from several upstream links are transmitted onto the same downstream VPI/VCI, then cells from one PDU can get interleaved with cells from another PDU on the outgoing VPI/VCI, corrupting the original PDUs by mis-sequencing the cells of each PDU.

The most straightforward (but erroneous) method of merging in an ATM environment would be to take the cells from two incoming VCs and merge them into a single outgoing VC. If this were done without any buffering of cells, then cells from two or more packets could end up being interleaved into a single AAL5 frame. The problem when operating over ATM is therefore how to avoid interleaving of cells from multiple sources.

There are two ways to solve this interleaving problem, which are referred to as VC merge and VP merge.

VC merge allows multiple VCs to be merged into a single outgoing VC. In order for this to work, the node performing the merge needs to keep the cells from one AAL5 frame (e.g., corresponding to an IP packet) separate from the cells of other AAL5 frames. This may be done by performing the SAR function in order to reassemble each IP packet before forwarding that packet; in this case VC merge is essentially equivalent to frame merge. An alternative is to buffer the cells of one AAL5 frame together, without actually reassembling them, and to forward the frame when the end of frame indicator is reached. Note however that both forms of VC merge require that the entire AAL5 frame be received before any cells corresponding to that frame are forwarded. VC merge therefore requires capabilities which are generally not available in most existing ATM forwarding hardware.

The alternative for use over ATM media is VP merge. Here multiple VPs can be merged into a single VP. Separate VCIs within the merged VP are used to distinguish frames (e.g., IP packets) from different sources. In some cases, one VP may be used for the tree from each ingress node to a single egress node.
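
The store-and-forward behavior which VC merge requires (buffering without full reassembly) can be sketched as follows. Cells are modeled as (payload, end-of-frame) arrivals per incoming VC, which is a simplification of real AAL5:

   # Sketch: VC merge without full reassembly. Cells are buffered per
   # incoming VC and released onto the merged outgoing VC only when
   # the end-of-frame cell arrives, so frames never interleave.

   from collections import defaultdict

   buffers = defaultdict(list)   # incoming VC -> cells of current frame

   def on_cell(in_vc, payload, end_of_frame, out_vc, transmit):
       buffers[in_vc].append(payload)
       if end_of_frame:
           for cell in buffers.pop(in_vc):
               transmit(out_vc, cell)     # whole frame, contiguous

   sent = []
   tx = lambda vc, cell: sent.append((vc, cell))
   on_cell("vc-A", "a1", False, "vc-out", tx)
   on_cell("vc-B", "b1", False, "vc-out", tx)   # interleaved arrivals...
   on_cell("vc-A", "a2", True,  "vc-out", tx)   # ...but A's frame sent whole
   assert sent == [("vc-out", "a1"), ("vc-out", "a2")]

Under VP merge no such buffering is needed, since the distinct VCIs within the VP keep the frames distinguishable at the egress.
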
4.2.2 Interoperation of Merge Options:

If some nodes support stream merge and some nodes do not, then it is necessary to ensure that the two types of nodes can interoperate within a single network. This affects the number of labels that a node needs to send to a neighbor. An upstream LSR which supports stream merge needs to be sent only one label per forwarding equivalence class (FEC). An upstream neighbor which does not support stream merge needs to be sent multiple labels per FEC. However, there is no way of knowing a priori how many labels it needs; this will depend on how many LSRs are upstream of it with respect to the FEC in question.

Since it is not known a priori how many labels a non-merging upstream neighbor will need, that neighbor may need to explicitly ask for labels for each FEC, and may make multiple such requests (for one or more labels per request). When a downstream neighbor receives such a request from upstream, and the downstream neighbor does not itself support stream merge, then it must in turn ask its own downstream neighbor for more labels for the FEC in question.

It is possible that there may be some nodes which support merge but have a limited number of upstream streams which may be merged into a single downstream stream. Suppose for example that, due to some hardware limitation, a node is capable of merging four upstream LSPs into a single downstream LSP, but has six upstream LSPs arriving at it for a particular stream. In this case the node may merge these into two downstream LSPs, and will therefore need to obtain two labels from the downstream neighbor.

The interoperation of the various forms of merging over ATM is most easily described by first describing the interoperation of VC merge with non-merge.

In the case where VC merge and non-merge nodes are interconnected, the forwarding of cells is based in all cases on a VC (i.e., the concatenation of the VPI and VCI). For each node, if an upstream neighbor is doing VC merge then that upstream neighbor requires only a single outgoing VPI/VCI for a particular FEC (this is analogous to the requirement for a single label in the case of operation over frame media). If the upstream neighbor is not doing merge, then it will require a single outgoing VPI/VCI per FEC for itself (assuming that it can be an ingress node), plus enough outgoing VPI/VCIs to map to incoming VPI/VCIs to pass to its upstream neighbors. The number required is determined by allowing the upstream nodes to request additional VPI/VCIs from their downstream neighbors.

A similar method is possible to support nodes which perform VP merge. In this case the VP merge node, rather than requesting a single VPI/VCI or a number of VPI/VCIs from its downstream neighbor, instead may request a single VP (identified by a VPI). Furthermore, suppose that a non-merge node is downstream from two different VP merge nodes. This node may need to request one VPI/VCI (for traffic originating from itself) plus two VPs (one for each upstream node).

Note that there are multiple options for coordinating VCIs within a VP. Description of the range of options is FFS.

In order to support all of VP merge, VC merge, and non-merge, it is therefore necessary to allow upstream nodes to request a combination of zero or more VC identifiers (consisting of a VPI/VCI), plus zero or more VPs (identified by VPIs). VP merge nodes would request one VP. VC merge nodes would request only a single VPI/VCI (since they can merge all upstream traffic into a single VC). Non-merge nodes would pass on any requests that they get from above, plus request a VPI/VCI for traffic that they originate (if they can be ingress nodes).
However, non-merge nodes which can only do VC forwarding (and not VP forwarding) will need to know which VCIs are used within each VP in order to install the correct VCs in their forwarding tables. A detailed description of how this could work is FFS.

4.2.3 Coordination of the VCI Space with VP Merge:

VP merge requires that the VCIs be coordinated to ensure uniqueness. There are a number of ways in which this may be accomplished:

 1. Each node may be pre-configured with a unique VCI value (or values).

 2. Some one node (most likely the root of the multipoint-to-point tree) may coordinate the VCI values used within the VP. A protocol mechanism will be needed to allow this to occur. How hard this is to do depends somewhat upon whether the root is otherwise involved in coordinating the multipoint-to-point tree. For example, allowing one node (such as the root) to coordinate the tree may be useful for purposes of coordinating load sharing (see section 4.10). Thus whether the issue of coordinating the VCI space is significant or trivial may depend upon other design choices which at first glance may have appeared to be independent protocol design choices.

 3. Other unique information, such as portions of a class B or class C address, may be used to provide a unique VCI value.

 4. Another alternative is to implement a simple hardware extension in the ATM switches to keep the VCI values unique by dynamically altering them to avoid collision.

VP merge makes less efficient use of the VPI/VCI space (relative to VC merge). Also, when VP merge is used, the LSPs may not be able to transit public ATM networks that don't support SVPs.

4.2.4 Buffering Issues Related to Stream Merge:

There is an issue regarding the amount of buffering required for frame merge, VC merge, and VP merge. Frame merge and VC merge require that intermediate points buffer incoming packets until the entire packet arrives. This is essentially the same as is required in traditional IP routers.

VP merge allows cells to be transmitted by intermediate nodes as soon as they arrive, reducing the buffering and latency at intermediate nodes. However, the use of VP merge implies that cells from multiple packets will arrive at the egress node interleaved on separate VCIs. This in turn implies that the egress node may have somewhat increased buffering requirements. To a large extent, egress nodes for some destinations will be intermediate nodes for other destinations, implying that the increase in buffers required for one purpose (egress traffic) will be offset by a reduction in buffers required for other purposes (transit traffic). Also, routers today typically deal with high-fanout channelized interfaces and with multi-VC ATM interfaces, implying that buffering simultaneously arriving cells from multiple packets and sources is something that routers typically do today. This is not meant to imply that the required buffer size and performance is inexpensive, but rather to observe that this is a solvable issue.

ATM equipment provides traffic shaping, in which the ATM cells associated with any one particular VC are intentionally not transmitted back to back, but rather are spread out over time in order to place less short term buffering load on switches.
Since VC merge requires that all cells associated with a particular packet (or a particular AAL5 frame) be buffered before any cell from the packet can be transmitted, VC merge defeats much of the intent of traffic shaping. An advantage of VP merge is that it preserves traffic shaping through ATM switches acting as LSRs. While traffic shaping may generally be expected to reduce the buffering requirements in ATM switches (whether acting as MPLS switches or as native ATM switches), the precise effect of traffic shaping has not been studied in the context of MPLS.

4.3 Loop Handling

Generally, methods for dealing with loops can be split into three categories: Loop Survival makes use of methods which minimize the impact of loops, for example by limiting the amount of network resources which can be consumed by a loop; Loop Detection allows loops to be set up, but later detects these loops and eliminates them; Loop Prevention provides methods for avoiding setting up L2 forwarding in a way which results in an L2 loop.

Note that we are concerned here only with loops that occur in L2 forwarding. Transient loops at L3 will continue to be part of normal IP operation, and will be handled the way that IP has been handling loops for years (see section 3.5).

Loop Survival:

Loop Survival refers to methods that are used to allow the network to operate well even though short term transient loops may be formed by the routing protocol. The basic approach to loop survival is to limit the amount of network resources which are consumed by looping packets, and to minimize the effect on other (non-looping) traffic. Note that loop survival is the method used by conventional IP forwarding, and is therefore based on long and relatively successful experience in the Internet.

The most basic method for loop survival is the use of a TTL (Time To Live) field. The TTL field is decremented at each hop; if it reaches zero, the packet is discarded. This method works well over those media which have a TTL field. This explicitly includes L3 IP forwarding. Also, assuming that the core MPLS specifications will include the definition of a "shim" MPLS header for carrying labels over media which do not have their own, it is likely that the shim header will also include a TTL field.

However, there is considerable interest in using MPLS over L2 protocols which provide their own labels, with the L2 label used for MPLS forwarding. Specific L2 protocols which offer a label for this purpose include ATM and Frame Relay. However, neither ATM nor Frame Relay has a TTL field. This implies that this method cannot be used when basic ATM or Frame Relay forwarding is being used.

Another basic method for loop survival is the use of dynamic routing protocols which converge rapidly to non-looping paths. In some instances it is possible that congestion caused by looping data could affect the convergence of the routing protocol (see section 3.5). MPLS should be designed to prevent this problem from occurring. Given that MPLS uses the same routing protocols as are used for IP, this method does not need to be discussed further in this framework document.

Another possible tool for loop survival is the use of fair queuing. This allows unrelated flows of user data to be placed in different queues, which helps to ensure that a node overloaded with looping user data can nonetheless forward unrelated non-looping data, thereby minimizing the effect that looping data has on other data. We cannot assume that fair queuing will always be available. In practice, many fair queuing implementations merge multiple streams into one queue (implying that the number of queues used is less than the number of user data flows which are present in the network). This implies that any data which happens to be in the same queue with looping data may be adversely affected.

Loop Detection:

Loop Detection refers to methods whereby a loop may be set up at L2, but the loop is subsequently detected. When the loop is detected, it may be broken at L2 by dropping the label relationship, implying that packets for a set of destinations must be forwarded at L3.

A possible method for loop detection is based on transmitting a "loop detection" control packet (LDCP) along the path towards a specified destination whenever the route to that destination changes. This LDCP is forwarded in the direction that the label specifies, with the labels swapped to the correct next hop value. However, normal L2 forwarding cannot be used, because each hop needs to examine the packet to check for loops. The LDCP is forwarded towards the destination until one of the following happens: (i) the LDCP reaches the last MPLS node along the path (i.e., the next hop is either a router which is not participating in MPLS, or the final destination host); (ii) the TTL of the LDCP expires (assuming that the control packet uses a TTL, which is optional but not absolutely necessary); or (iii) the LDCP returns to the node which originally transmitted it. If the last of these occurs, then the packet has looped, and the node which originally transmitted the LDCP stops using the associated label and instead uses L3 forwarding for the associated destination addresses. One problem with this method is that once a loop is detected, it is not known when the loop clears. One option would be to set a timer, and to transmit a new LDCP when the timer expires.

An alternate method counts the hops to each egress node, based on the routes currently available. Each node advertises its distance (in hop count) to each destination. An egress node advertises the destinations that it can reach directly with an associated hop count of zero. For each destination, a node computes its hop count to that destination by adding one to the hop count advertised by its actual next hop for that destination. When the hop count for a particular destination changes, it needs to be readvertised.
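
The hop count scheme can be sketched as follows: each node recomputes its advertised distance from its next hop's advertisement, and an unbounded climb in the count reveals a loop. The bound and all names are assumptions for this example:

   # Sketch: hop-count based loop detection. Each node's advertised
   # distance to a destination is one more than its next hop's. In a
   # loop, re-advertisements drive the count upward without bound, so
   # exceeding an assumed maximum path length flags the loop.

   MAX_HOPS = 32   # assumed bound on any real path length

   def recompute(my_table, next_hop_advert, dest):
       """Update own hop count for dest from the next hop's advert."""
       new_count = next_hop_advert[dest] + 1
       if new_count > MAX_HOPS:
           return None          # counted past any plausible path: loop
       if my_table.get(dest) != new_count:
           my_table[dest] = new_count   # changed: must re-advertise
       return new_count

   table = {}
   assert recompute(table, {"egress-1": 2}, "egress-1") == 3
   assert recompute(table, {"egress-1": MAX_HOPS}, "egress-1") is None
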
Loop Prevention:

Loop prevention makes use of methods to ensure that loops are never set up at L2. This implies that labels are not used until some method has been used to ensure that following the label towards the destination, with the associated label swaps at each switch, will not result in a loop. Until the L2 path (making use of the assigned labels) is available, packets are forwarded at L3.

Loop prevention requires explicit signaling of some sort to be used when setting up an L2 stream.

One method of loop prevention requires that labels be propagated starting at the egress switch. The egress switch signals to neighboring switches the label to use for a particular destination; each such switch then signals an associated label to its neighbors, and so on. The control packets which propagate the labels also include the path to the egress (as a list of router IDs). Any looping control packet can therefore be detected, and the path not set up to or past the looping point.

Another option is to use explicit routing to set up label bindings from the egress switch to each ingress switch. This precludes the possibility of looping, since the entire path is computed by one node. This also allows non-looping paths to be set up provided that the egress switch has a view of the topology which is reasonably close to reality (if there are operational links which the egress switch doesn't know about, it will simply pick a path which doesn't use those links; if there are links which have failed but which the egress switch thinks are operational, then there is some chance that the setup attempt will fail, but in this case the attempt can be retried on a separate path). Note therefore that non-looping paths can be set up with this method in many cases where distributed routing plus hop by hop forwarding would not actually result in non-looping paths. This method is similar to the method used by standard ATM routing to ensure that SVCs are non-looping [PNNI].

Explicit routing is only applicable if the routing protocol gives the egress switch sufficient information to set up the explicit route, implying that the protocol must be either a link state protocol (such as OSPF) or a path vector protocol (such as BGP). This form of source routing is therefore not appropriate as a general approach for use in any network regardless of the routing protocol. This method also requires some overhead for the call setup before label-based forwarding can be used. If the network topology changes in a manner which breaks the existing path, then a new path will need to be explicitly routed from the egress switch. Due to this overhead, this method is probably only appropriate if other significant advantages are also going to be obtained from having a single node (the egress switch) coordinate the paths to be used. Examples of other reasons to have one node coordinate the paths to a single egress switch include: (i) coordinating the VCI space where VP merge is used (see section 4.2); and (ii) coordinating the routing of streams from multiple ingress switches to one egress switch so as to balance the load on multiple alternate paths through the network.
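The first method can be sketched as follows, with the (hypothetical) setup message carrying the path so far as a list of router IDs:

   # Sketch of loop prevention during egress-initiated label setup.
   # The setup message layout (destination + path of router IDs) is
   # invented for illustration.

   def extend_setup(my_router_id, setup):
       """Accept a label setup and extend it toward upstream neighbors,
       refusing to set up the path to or past a looping point."""
       if my_router_id in setup["path"]:
           return None    # our ID is already on the path: this would loop
       return {"dest": setup["dest"],
               "path": setup["path"] + [my_router_id]}

   egress_msg = {"dest": "192.0.2.0/24", "path": ["egress"]}
   msg = extend_setup("LSR-1", egress_msg)       # accepted and extended
   assert extend_setup("LSR-1", msg) is None     # a looped copy is rejected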
In principle the explicit routing could also be done in the other direction (from ingress to egress). However, this would make it more difficult to merge streams if stream merge is to be used. It would also make it more difficult to coordinate (i) changes to the paths used, (ii) the VCI space assignments, and (iii) load sharing. This therefore makes explicit routing more difficult, and also reduces the other advantages that could be obtained from the approach.

If label distribution is piggybacked on the routing protocol (see section 4.1.2), then loop prevention is only possible if the routing protocol itself does loop prevention.

What To Do If A Loop Is Detected:

With all of these schemes, if a loop is known to exist, then the L2 label-swapped path is not set up. This leads to the obvious question of what an MPLS node does when it doesn't have a label for a particular destination, and a packet for that destination arrives to be forwarded. If possible, the packet is forwarded using normal L3 (IP) forwarding. This raises two issues: (i) what about nodes which are not capable of L3 forwarding; and (ii) given the relative speeds of L2 and L3 forwarding, does this work?

Nodes which are not capable of L3 forwarding obviously can't forward a packet unless it arrives with a label and the associated next hop label has been assigned. Such nodes, when they receive a packet for which the next hop label has not been assigned, must discard the packet. It is probably safe to assume that a node which cannot forward an L3 packet is also incapable of forwarding an ICMP error report that it originates, implying that the packet will simply need to be discarded in this case.

In many cases L2 forwarding will be significantly faster than L3 forwarding (allowing faster forwarding is a significant motivation behind the work on MPLS). This implies that if a node is forwarding a large volume of traffic at L2, and a change in the routing protocol causes the associated labels to be lost (necessitating L3 forwarding), in some cases the node will not be capable of forwarding the same volume of traffic at L3, which will of course require that packets be discarded. However, in some cases only a relatively small volume of traffic will need to be forwarded at L3; thus forwarding at L3 when L2 is not available is not necessarily always a problem. There may be some nodes which are capable of forwarding equally fast at L2 and L3 (for example, such nodes may contain IP forwarding hardware which is not available in all nodes). Finally, when packets are lost this will cause TCP to back off, which will in turn reduce the load on the network and allow the network to stabilize, even at reduced forwarding rates, until such time as the label bindings can be reestablished.

Note that in most cases loops will be caused either by configuration errors or by short term transient problems caused by the failure of a link. If only one link goes down, and if routing creates a normal "tree-shaped" set of paths to any one destination, then the failure of one link somewhere in the network will affect only one link's worth of data passing through any one node in the network. This implies that if a node is capable of forwarding one link's worth of data at L3, then in many or most cases it will have sufficient L3 bandwidth to handle looping data.
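The behavior described above might be summarized by the following sketch (the node model is invented for illustration):

   # Sketch of handling a packet whose next hop label is not assigned.
   from dataclasses import dataclass

   @dataclass
   class Node:
       can_forward_at_l3: bool

   def handle_unlabeled(node, packet):
       if node.can_forward_at_l3:
           return ("l3-forward", packet)   # slower path; may shed load
       # A pure L2 node can neither forward the packet nor originate
       # an ICMP error report, so the packet must be discarded.
       return ("discard", None)

   assert handle_unlabeled(Node(False), "pkt") == ("discard", None)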
4.4 Interoperation with NHRP

When label switching is used over ATM, and there exists an LSR which is also operating as a Next Hop Client (NHC), the possibility of direct interaction arises: could one switch cells between the two technologies without reassembly? To enable this, several important issues must be addressed.

The encapsulation must be acceptable to both MPLS and NHRP. If only a single label is used, then the null encapsulation could be used. Other solutions could be developed to handle label stacks.

NHRP must understand and respect the granularity of a stream.

Currently NHRP resolves an IP address to an ATM address. The response may include a mask indicating a range of addresses. However, any VC to the ATM address is considered to be a viable means of packet delivery. Suppose that an NHC issues an NHRP request for IP address A, gets back ATM address 1, and sets up a VC to address 1. Later the same NHC issues a request for a totally unrelated IP address B and gets back the same ATM address 1. In this case normal NHRP behavior allows the NHC to use the VC (that was set up for destination A) for traffic to B.

Note: in this section we will refer to a VC set up as a result of an NHRP query/response as a shortcut VC.

If one expects to be able to label switch the packets being received from a shortcut VC, then the label switch needs to be informed as to exactly what traffic will arrive on that VC, and that mapping cannot change without notice. Currently no such mechanism exists in the defined signaling of a shortcut VC. Several means are possible. A binding, equivalent to the binding in LDP, could be sent in the setup message. Alternatively, the binding of prefix to label could remain in an LDP session (or whatever means of label distribution is appropriate) and the setup could carry a binding of the label to the VC. This would leave the binding mechanism for shortcut VCs independent of the label distribution mechanism.
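The two placements of the binding suggested above might be contrasted as in the following sketch (both message layouts are purely hypothetical; as noted, no such mechanism is currently defined for shortcut VCs):

   # Option (a): carry a prefix-to-label binding in the VC setup itself.
   setup_a = {"vc": 42,
              "binding": {"prefix": "192.0.2.0/24", "label": 17}}

   # Option (b): leave the prefix-to-label binding in an LDP session (or
   # other label distribution mechanism), and have the setup carry only
   # a label-to-VC binding. This keeps the shortcut VC mechanism
   # independent of how labels are distributed.
   ldp_binding = {"prefix": "192.0.2.0/24", "label": 17}
   setup_b = {"vc": 42, "label": 17}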
A further architectural challenge exists in that label switching is inherently unidirectional whereas ATM is bi-directional. The above binding semantics are fairly straightforward; however, effectively using the reverse direction of a VC presents further challenges.

Label switching must also respect the granularity of the shortcut VC. Without VC merge, this means a single label switched flow must map to a VC. In the case of VC merge, multiple label switched streams could be merged onto a single shortcut VC, but given the asymmetry involved, there is perhaps little practical use for this.

Another issue is one of practicality and usefulness. What is sent over the VC must be at a fine enough granularity to be label switched through the receiving domain. One potential place where the two technologies might come into play is in moving data from one campus via the wide-area to another campus. In such a scenario, the two technologies would border precisely at the point where summarization is likely to occur. Each campus would have a detailed understanding of itself, but not of the other campus. The wide-area is likely to have summarized knowledge only. But at such a point, level 3 processing becomes the likely solution.

4.5 Operation in a Hierarchy

MPLS allows hierarchical operation, through use of a label stack. This allows MPLS to simultaneously be used for routing at a fine grain level (for example, between individual routers within an ISP) and at a higher "area by area" or "domain by domain" level.

4.5.1 Example of Hierarchical Operation

Figure 1 illustrates an example of how MPLS may operate in a hierarchy. This example illustrates three transit routing domains (Domain #1, #2, and #3). For example, these three domains may represent Internet service providers. Domain boundary routers are illustrated in each domain (routers R1 and R2 in domain #1, routers R3 and R8 in domain #2, and routers R9 and R10 in domain #3). Suppose that these domain boundary routers are operating BGP.

Internal routers are not illustrated in domains 1 and 3. However, internal routers are illustrated within domain #2. In particular, the path between routers R3 and R8 follows the internal routers R4, R5, R6, and R7 within domain #2.

    ................. .............................. ................
    .               . .                            . .              .
    .               . .                            . .              .
    .R1          R2--------R3                   R8-------R9     R10.
    .               . .       \                 /  . .              .
    .               . .        R4---R5---R6---R7   . .              .
    .               . .                            . .              .
    . Domain#1      . . Domain#2                   . . Domain#3     .
    ................. .............................. ................

            Figure 1: Example of the Use of MPLS in a Hierarchy

In this example there are two levels of routing taking place. For example, OSPF may be used for routing within Domain #2. In this case the routers R3, R4, R5, R6, R7, and R8 may be running OSPF amongst themselves in order to compute routes within Domain #2. The domain boundary routers (R1, R2, R3, R8, R9, and R10) operate BGP in order to determine paths between routing domains.

MPLS allows label forwarding to be done independently at multiple levels. In this example, MPLS may be used at the BGP level (between routers R1, R2, R3, R8, R9, and R10) and at the OSPF level (between routers R4, R5, R6, and R7). Thus when an IP packet traverses Domain #2, it will contain two labels, encoded as a "label stack". The higher level label would be used between routers R3 and R8, and would be encapsulated inside a header specifying a lower level label used within Domain #2.

Consider the forwarding operation that takes place at router R3. R3 will receive a packet from R2 containing a single label (the BGP level label). R3 will need to swap BGP level labels in order to supply the label that R8 expects. R3 will also need to add an OSPF level label, as is expected by R4. R3 therefore "pushes down" the BGP level label in the label stack, by adding a lower level label. Note that the actual label swapping operation performed by R3 can be optimized to allow very simple forwarding: R3 receives a single incoming label from R2, and can map this label into the new label header to be prepended to the packet; it just happens that the new label header added by R3 contains two labels rather than one.
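R3's operation can be sketched as a swap of the top (BGP level) label followed by a push of the OSPF level label; the label values below are of course invented:

   # Sketch of the label stack operation at R3. The top of the stack is
   # the last element of the list; all label values are invented.

   def r3_forward(stack, bgp_swap_table, ospf_label_for_r4):
       incoming = stack.pop()                    # BGP label from R2
       stack.append(bgp_swap_table[incoming])    # the label R8 expects
       stack.append(ospf_label_for_r4)           # push the OSPF level label
       return stack

   # R2 sends BGP label 5; R8 expects 9; R4 expects OSPF label 21.
   assert r3_forward([5], {5: 9}, 21) == [9, 21]

As the text notes, the two steps can be collapsed into a single mapping from the incoming label to a precomputed two-label header.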
4.5.2 Components Required for Hierarchical Operation

In order for MPLS to operate in a hierarchy, there are three things which must be accomplished:

- Hierarchical Label Exchange in LDP

  The Label Distribution Protocol needs to exchange labels at each level of the hierarchy. In our example, R3 needs to exchange label bindings with R8 for operation at the BGP level. At the same time, R3 needs to exchange label bindings with R4 (and R4 needs to exchange label bindings with R5) for operation at the OSPF level. The control component for hierarchical labeling is essentially the same as that for single level labeling, except that labels are exchanged not just among physically adjacent LSRs but also between those switching on the same level in the label stack.

- Label Stack

  Multiple labels need to be carried in data packets. For example, when a data packet is being carried across Domain #2, the data packet needs to be encapsulated in a header which carries the BGP level label, and the resulting packet needs to be carried in a header which carries an OSPF level label.

- Configuration

  It is necessary for routers to know when hierarchical label switching is being used.

4.5.3 Some Restrictions on the Use of Hierarchical MPLS

Consider the example in Figure 1. In this case, the BGP level label is encoded by router R1. Label swapping is employed for packet forwarding at R2, R3, R8, and R9. This is only possible if R1 knows the right label to use, implying that the granularity used in mapping packets to forwarding equivalence classes is the same at routers R2, R3, R8, and R9.

We can consider some specific examples to illustrate the issue.

Suppose that the destination host is within domain 3. In this case, it is very likely that router R9 will forward the packet based on a finer grain than was used previously. For example, a relatively short address prefix may be used for advertising the addresses reachable in domain 3, while longer (more specific) address prefixes may be used for specific areas or subnets within domain 3. In this case router R1 may assign a BGP level label to the packet, and label based forwarding at the BGP level may be used by routers R1, R2, R3, and R8. However, router R9 will need to make use of layer 3 forwarding.

Alternatively, suppose that domain 3 is an Internet Service Provider which offers service to multiple routing domains. Suppose that in this case domain 3 makes use of a single CIDR address block (based on a single address prefix), with smaller address blocks (corresponding to longer address prefixes) assigned to each of multiple domains which get their Internet service from domain 3. Suppose that the destination for a particular IP packet is contained in one of these smaller domains whose addresses are contained in the larger address block assigned to and administered by domain 3. Again in this case router R9 will need to make use of layer 3 forwarding.

Let's consider another possible complication. Suppose that router R1 is an MPLS node, but that some of the internal routers within domain 1 do not know about MPLS, and suppose that R1 encapsulates an IP packet in an MPLS header in order to carry the BGP level label. In this case the non-MPLS-capable routers within domain 1 will not know what to do with the MPLS header. This implies that MPLS can be used at a higher level (such as between the border routers R1 and R2 in our example) only if either the lower level routers (such as the routers within domain 1) are also using MPLS, or the MPLS header is itself encapsulated within an IP header for transmission across the domain.
These examples imply that there are some cases where IP forwarding will be required in a hierarchy. While hierarchical MPLS may be useful in many cases, it does not replace layer 3 forwarding.

4.5.4 The Relationship Between MPLS Hierarchy and Routing Hierarchy

4.5.4.1 Stacked Labels in a Flat Routing Environment

The label stacking mechanism can be useful in some scenarios independent of routing hierarchy.

The basic concept of stacking is to provide a mechanism to segregate streams within a switched path. Under normal operation, when packets are encapsulated with a single L2 header, forwarding multiple streams onto one switched path means that L3 processing is required to segregate a particular stream at the end of the switched path. The stacking mechanism provides an easy way to maintain the identity of the various streams which are merged into a single switched path.

One useful application of this technique is in Virtual Private Networks. The packets can be switched both at the ingress and egress nodes of the provider network. A packet coming in at one end of a customer network contains an encapsulated header with the VPN label. At the VPN ingress node, the header is "popped" to provide the label for switching through the VPN, and is then "pushed" with an encapsulation of the far end customer label. At the VPN egress node, the packet header is "popped" again, and the new header provides the label for switching through the customer site. This enables one to provide customers with the benefits of a VPN, with end-to-end switching for optimal performance.

Another interesting use can be in conjunction with RSVP flows. In RSVP, sender flows can be logically merged under a single resource reservation using the Shared and Wildcard filters. The stacking mechanism can be used to merge flows onto a single label, with the shared QoS applied to the single label on top of the stack. Since sender flows within the merged switched path maintain their identity, it is easy to demerge them at a downstream node without requiring L3 processing of the packets. Another similar application can be the merging of several premium service flows with similar QoS into a single switched path. This helps in conserving labels in the backbone of a large network.

Yet another useful application can be DVMRP tunnels, similar in concept to the DVMRP tunnels used in the existing Mbone. The ingress node of a switched DVMRP tunnel encapsulates, for a particular (S,G) pair, the label learned from the egress node of the tunnel before forwarding packets into the tunnel. The egress node of the tunnel just pops the top label and switches the packet based on the interior label.

Note that the use of tunnels can also be quite beneficial in a non-hierarchical environment. Take for example the case where a domain contains a subset of MPLS nodes. The MPLS egress can advertise labels for the routes which are within the domain but external to the MPLS core. The ingress node can encapsulate packets for these destinations within the header for the aggregated switched path that crosses the MPLS domain.

It is not evident whether this technique has any useful application in a flat routing domain, but it can be used in conjunction with explicit routing when providing specialized services. The multiple levels of encapsulation can also be used like loose source routing.

4.5.4.2 Flat Labels in a Hierarchical Routing Environment

It is also possible in some environments to use a single level of label in a network using hierarchical routing. This is for example possible in the case of a two level OSPF network in which the primary purpose of the network is to support external routes. Specifically (depending upon the type of area hierarchy used), OSPF allows external routes to be advertised throughout an OSPF routing domain, with each external route associated with the router ID of the router with reachability to the specific route. This implies that it is possible to set up an LSP to every router in the routing domain, and then use the LSP for packets destined to the associated external routes.
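A sketch of the resulting forwarding state (all names and values invented): each external route maps to the router ID that advertised it, and one LSP, represented here simply by its outgoing label, is maintained per router in the domain:

   # External routes tagged with the advertising router's ID, plus one
   # LSP (shown only as its outgoing label) per router in the domain.
   route_origin = {"198.51.100.0/24": "router-A",
                   "203.0.113.0/24": "router-B"}
   lsp_label = {"router-A": 30, "router-B": 31}

   def label_for_external(prefix):
       return lsp_label[route_origin[prefix]]

   assert label_for_external("198.51.100.0/24") == 30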
4.5.4.3 Configuration of the Hierarchy

The possibility of having a variety of different relationships between the routing hierarchy and the MPLS hierarchy leads to an obvious question: how is the relationship between the two hierarchies to be determined? At first glance it would seem that this generality leads to a relatively complex configuration issue, and it could be difficult to ensure consistent configuration of the network.

One possible solution is to have the MPLS hierarchy default to using the same hierarchy structure as is used for routing, with each area and domain boundary (as used by routing) also implying an MPLS domain boundary. This would allow the normal default operation to conform to the type of operation that we might expect to be used in most situations, and would allow a common means of interoperation which we would expect all vendors of MPLS compliant equipment to support.

4.5.5 Some Advantages of Hierarchical MPLS

The use of hierarchical MPLS allows the routers internal to a transit routing domain to be isolated from the BGP level routing information. In our example network, routers R4, R5, R6, and R7 can forward packets based solely on the lower level label; these internal routers do not need to know anything at all about higher level IP routing. Note that this advantage is not available in conventional IP forwarding: if the internal routers within a routing domain forward IP packets based on the destination IP address, then the internal routers need to know which route to use for any particular destination IP address. By combining hierarchical routing with label stacks, MPLS is able to decouple the exterior and interior protocols. MPLS switches within a domain (interior switches) need only carry the reachability information for nodes in the domain. The MPLS border switches for the domain still, of course, carry the external routes.

Use of hierarchical MPLS also extends the simpler forwarding offered by MPLS to domain boundary routers.

MPLS places no bound on the number of labels that may be present in a label stack. In principle this means that MPLS can support multiple levels of routing hierarchy.

4.6 Interoperation of MPLS Systems with "Conventional" ATM

If we consider the implementation of MPLS on ATM switches we can imagine several possibilities.

We might remove the ATM Forum control plane completely.
This is the approach taken by Ipsilon in their IP Switching approach, and allows ATM switches to operate as MPLS LSRs.

Alternately, we could build a system that supports a "ships in the night" (SIN) mode of operation, where the ATM Forum and MPLS control planes both run on the same hardware but are isolated from each other, i.e., they do not interact. This allows a single device to simultaneously operate as both an MPLS LSR and an ATM switch.

We feel that the MPLS architecture should allow both of these models. We note, however, that neither of them addresses the issue of operation of MPLS over a public ATM network, i.e., over a network that supports tariffed access to PVCs and ATM Forum SVCs. Because public ATM service exists and will, presumably, become more pervasive in the future, we feel that another model needs to be included in the architecture and be supported by the LDP. We call this model the "integrated" model. In essence it is the same as the SIN model, but without the restriction that the two control planes are isolated: in the integrated model the MPLS control plane is able to use the ATM control plane to set up SVCs as needed. An example of this integrated model, allowing the coexistence and interoperation of ATM and MPLS, is the CSR proposal from Toshiba.

Note that there is a distinction relevant to the protocol specification process between the SIN and the integrated approach. SIN does not require specification, other than to require that it be transparent to both the MPLS and ATM control planes (i.e., neither should know of the other's existence); realisation of SIN on a particular machine is purely an engineering challenge for the implementors. The integrated model, on the other hand, requires specification of procedures for the use of SVCs and the association of labels with them.

4.7 Multicast

This section is FFS.

4.8 Multipath

Many IP routing protocols support the notion of equal-cost multipath routes, in which a router maintains multiple next hops for one destination prefix when two or more equal-cost paths to the prefix exist. There are a few possible approaches for handling multipath with MPLS.

In this discussion we will use the term "multipath node" to mean a node which is keeping track of multiple switched paths from itself for a single destination.

The first approach maintains a separate switched path from each ingress node via one or more multipath nodes to a merge point. This requires MPLS to distinguish the separate switched paths, so that learning of a new switched path is not misinterpreted as a replacement of the same switched path. This also requires that an ingress MPLS node be capable of distributing the traffic among the multiple switched paths. This approach preserves switching performance, but at the cost of proliferating the number of switched paths; for example, each switched path consumes a distinct label.

The second approach establishes only one switched path from any one ingress node to a destination. However, when the paths from two different ingress nodes happen to arrive at the same node, that node may use different paths for each (implying that the node becomes a multipath node). Thus the switched path chosen by the multipath node may assign a different downstream path to each incoming stream.
This conserves switched paths and maintains switching performance, but cannot balance loads across downstream links as well as the other approaches can, even if switched paths are selectively assigned. A drawback of this approach is that the L2 path may be different from the normal L3 path, as traffic that otherwise would have taken multiple distinct paths is forced onto a single path.

The third approach allows a single stream arriving at a multipath node to be split into multiple streams, by using L3 forwarding at the multipath node. For example, the multipath node might choose to use a hash function on the source and destination IP addresses, in order to avoid misordering packets between any one IP source and destination. This approach conserves switched paths at the cost of switching performance.
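The hash based split used by the third approach might look like the following sketch (the particular hash is illustrative; any deterministic function of the source and destination addresses keeps a given host pair on one path and so avoids misordering):

   import zlib

   def pick_path(src_ip, dst_ip, paths):
       """Split traffic across 'paths' while keeping any one source/
       destination pair on a single path, avoiding misordering."""
       h = zlib.crc32(f"{src_ip}>{dst_ip}".encode())
       return paths[h % len(paths)]

   # The same host pair always maps to the same downstream path:
   p = pick_path("10.0.0.1", "10.9.9.9", ["path-A", "path-B"])
   assert p == pick_path("10.0.0.1", "10.9.9.9", ["path-A", "path-B"])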
4.9 Host Interactions

There is a range of options for host interaction with MPLS.

The most straightforward approach is no host involvement. Host operation may then be completely independent of MPLS, with hosts operating according to other IP standards. If there is no host involvement, then the first hop requires an L3 lookup.

If the host is ATM attached and doing NHRP, then this would allow the host to set up a virtual circuit to a router. However, this brings up a range of issues, as discussed in section 4.4 ("Interoperation with NHRP").

On the ingress side, it is reasonable to consider having the first hop LSR provide labels to the hosts, and thus have hosts attach labels to the packets that they transmit. This could allow the first hop LSR to avoid an L3 lookup. It is reasonable here to have the host request labels only when needed, rather than requiring the host to remember all labels assigned for use in the network.

On the egress side, it is questionable whether hosts should be involved. For scaling reasons, it would be undesirable to use a different label for reaching each host.

4.10 Explicit Routing

There are two options for route selection: (1) hop by hop routing, and (2) explicit routing.

An explicitly routed LSP is an LSP where, at a given LSR, the LSP next hop is not chosen by each local node, but rather is chosen by a single node (usually the ingress or egress node of the LSP). The sequence of LSRs followed by an explicitly routed LSP may be chosen by configuration, or by an algorithm performed by a single node (for example, the egress node may make use of the topological information learned from a link state database in order to compute the entire path for the tree ending at that egress node).

With MPLS the explicit route needs to be specified at the time that labels are assigned, but the explicit route does not have to be specified with each L3 packet. This implies that explicit routing with MPLS is relatively efficient (when compared with the efficiency of explicit routing for pure datagrams).

Explicit routing may be useful for a number of purposes, such as allowing policy routing and/or facilitating traffic engineering.

4.10.1 Establishment of Point to Point Explicitly Routed LSPs

In order to establish a point to point explicitly routed LSP, the LDP packets used to set up the LSP must contain the explicit route. This implies that the LSP is set up in order, either from the ingress to the egress or from the egress to the ingress.

One node needs to pick the explicit route. This may be done in at least two possible ways: (i) by configuration (e.g., the explicit route may be chosen by an operator, or by a centralized server of some kind); or (ii) by use of a routing protocol which allows the ingress and/or egress node to know the entire route to be followed. The latter implies the use of a link state routing protocol (in which all nodes know the full topology) or of a path vector routing protocol (in which the ingress node is told the path as part of the normal operation of the routing protocol).

Note: the normal operation of path vector routing protocols (such as BGP) does not provide the full set of routers along the path. This implies that either a partial source route only would be provided (implying that LSP setup would use a combination of hop by hop and explicit routing), or it would be necessary to augment the protocol in order to provide the complete explicit route. Detailed operation in this case is FFS.

In the point to point case, it is relatively straightforward to specify the route to use: this is indicated by providing the addresses of each LSR on the LSP.
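Setup might then proceed as in the following sketch, with each LSR consuming its own address from a (hypothetical) explicit route object and forwarding the setup to the next listed LSR rather than to the hop by hop next hop:

   # Sketch of explicitly routed LSP setup. The message layout is
   # invented; only the "consume your own address and forward" rule is
   # the point being illustrated.

   def process_setup(my_addr, setup):
       route = setup["explicit_route"]
       if route[0] != my_addr:
           raise ValueError("setup arrived at an LSR not on the route")
       remaining = route[1:]
       if not remaining:
           return None          # last hop: the LSP is fully established
       # Forward to the next LSR named in the route, which may differ
       # from the hop by hop next hop (see section 4.10.2).
       return {"explicit_route": remaining}

   msg = {"explicit_route": ["R3", "R4", "R5"]}
   assert process_setup("R3", msg) == {"explicit_route": ["R4", "R5"]}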
4.10.2 Explicit and Hop by Hop Routing: Avoiding Loops

In general, an LSP will be explicitly routed specifically because there is a good reason to use an alternative to the hop by hop routed path. This implies that the explicit route is likely to follow a path which is inconsistent with the path followed by hop by hop routing. If some of the nodes along the path follow an explicit route but some of the nodes make use of hop by hop routing (and ignore the explicit route), then inconsistent routing may result, and in some cases loops (or severely inefficient paths) may form. This implies that for any one LSP there are two possible options: (i) the entire LSP may be hop by hop routed; or (ii) the entire LSP may be explicitly routed.

For this reason, it is important that if an explicit route is specified for setting up an LSP, then that route must be followed in setting up the LSP.

There is a related issue when a link or node in the middle of an explicitly routed LSP breaks. In this case, the last operating node on the upstream part of the LSP will continue receiving packets, but will not be able to forward them along the explicitly routed LSP (since its next hop is no longer functioning). In this case it is not, in general, safe for this node to forward the packets using L3 forwarding with hop by hop routing. Instead, the packets must be discarded, and the upstream portion of the explicitly routed LSP must be torn down.

Where part of an explicitly routed LSP breaks, the node which originated the LSP needs to be told about this. For robustness reasons the MPLS protocol design should not assume that the routing protocol will tell the node which originated the LSP; for example, it is possible that a link may go down and come back up quickly enough that the routing protocol never declares the link down. Rather, an explicit MPLS mechanism is needed.

4.10.3 Merge and Explicit Routing

Explicit routing is slightly more complex with a multipoint to point LSP (i.e., in the case that stream merge is used).

In this case, it is not possible to specify the route for the LSP as a simple list of LSRs (since the LSP does not consist of a simple sequence of LSRs). Rather, the explicit route must specify a tree. There are several ways that this may be accomplished. Details are FFS.

4.10.4 Using Explicit Routing for Traffic Engineering

In the Internet today it is relatively common for ISPs to make use of a Frame Relay or ATM core, which interconnects a number of IP routers. The primary reason for use of a switching (L2) core is to make use of low cost equipment which provides very high speed forwarding. However, there is another very important reason for the use of an L2 core: in order to allow for traffic engineering.

Traffic engineering (also known as bandwidth management) refers to the process of managing the routes followed by user data traffic in a network in order to provide relatively equal and efficient loading of the resources in the network (i.e., to ensure that the load on links and nodes is within the capabilities of those links and nodes).

Some rudimentary level of traffic engineering can be accomplished with pure datagram routing and forwarding by adjusting the metrics assigned to links. For example, suppose that there is a given link in a network which tends to be overloaded on a long term basis. One option would be to manually configure an increased metric value for this link, in the hope of moving some traffic onto alternate routes. This provides a rather crude method of traffic engineering and provides only limited results.

Another method of traffic engineering is to manually configure multiple PVCs across an L2 core, and to adjust the route followed by each PVC in an attempt to equalize the load on different parts of the network. Where necessary, multiple PVCs may be configured between the same two nodes, in order to allow traffic to be split between different paths. In some topologies it is much easier to achieve efficient non-overlapping or minimally-overlapping paths via this method (with manually configured paths) than it would be with pure datagram forwarding. A similar ability can be achieved with MPLS via manual configuration of the paths taken by LSPs.

A related issue is the decision on where merge is to occur. Note that once two streams merge into one stream (forwarded by a single label), they cannot diverge again at that level of the MPLS hierarchy (i.e., they cannot be bifurcated without looking at a higher level label or the IP header). Thus there may be times when it is desirable to explicitly NOT merge two streams even though they are to the same egress node and FEC. Non-merge may be appropriate either because the streams will want to diverge later in the path (for example, to avoid overloading a particular downstream link), or because the streams may want to use different physical links in the case where multiple slower physical links are being aggregated into a single logical link for the purpose of IP routing.
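The reason merged streams cannot later diverge is visible directly in the forwarding structure: an incoming label maps to exactly one outgoing label and next hop, as in this sketch (names and values invented):

   # Once two streams share incoming label 7, every packet carrying
   # label 7 gets the same next hop and outgoing label; the streams
   # cannot be separated again without examining a higher level label
   # or the IP header.
   lfib = {7: ("R6", 12)}      # in-label -> (next hop, out-label)

   def forward(in_label):
       return lfib[in_label]

   assert forward(7) == ("R6", 12)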
As a network grows to a very large size (on the order of hundreds of LSRs), it becomes increasingly difficult to handle the assignment of all routes via manual configuration. However, explicit routing allows several alternatives:

1. Partial Configuration: One option is to use automatic/dynamic routing for most of the paths through the network, but then manually configure some routes. For example, suppose that full dynamic routing would result in a particular link being overloaded. One of the LSPs which uses that link could be selected and manually routed to use a different path.

2. Central Computation: Another option would be to provide long term network usage information to a single central management facility, which could then run a global optimization to compute a set of paths to use. Network management commands can be used to configure LSRs with the correct routes to use.

3. Egress Computation: An egress node can run a computation which optimizes the path followed by traffic to itself. This cannot, of course, optimize the entire traffic load through the network, but can include optimization of traffic from multiple ingresses to one egress. The reason for optimizing traffic to a single egress, rather than from a single ingress, relates to the issue of when to merge: an ingress can never merge the traffic from itself to different egresses, but an egress can, if desired, choose to merge the traffic from multiple ingresses to itself.

4.10.5 Using Explicit Routing for Policy Routing

This section is FFS.

4.11 Traceroute

This section is FFS.

4.12 LSP Control: Egress versus Local

There is a choice to be made regarding whether the initial setup of LSPs will be initiated by the egress node, or locally by each individual node.

When LSP control is done locally, each node may at any time pass label bindings to its neighbors for each FEC recognized by that node. In the normal case that the neighboring nodes recognize the same FECs, nodes may map incoming labels to outgoing labels as part of the normal label swapping forwarding method.

When LSP control is done by the egress, then initially (on startup) only the egress node passes label bindings to its neighbors, corresponding to any FECs which leave the MPLS network at that egress node. When initializing, other nodes wait until they get a label from downstream for a particular FEC before passing a corresponding label for the same FEC to upstream nodes.
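The difference between the two modes can be reduced to a single predicate governing when an LSR may advertise a binding upstream (a sketch; the parameter names are invented):

   # Sketch: when may an LSR pass a label binding for a FEC upstream?

   def may_advertise(is_egress_for_fec, has_downstream_label,
                     egress_control):
       if is_egress_for_fec:
           return True                  # the egress always starts the process
       if egress_control:
           return has_downstream_label  # wait for the binding to bubble up
       return True                      # local control: advertise at any time

   # Under egress control, an interior LSR stays quiet until a label
   # arrives from downstream:
   assert not may_advertise(False, False, egress_control=True)
   assert may_advertise(False, False, egress_control=False)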
With local control, since each LSR is (at least initially) independently assigning labels to FECs, it is possible that different LSRs will make inconsistent decisions. For example, an upstream LSR may make a coarse decision (map multiple IP address prefixes to a single label) while its downstream neighbor makes a finer grain decision (map each individual IP address prefix to a separate label). With downstream label assignment this can be corrected by having LSRs withdraw labels that they have assigned which are inconsistent with downstream labels, and replace them with new, consistent label assignments.

This may appear to be an advantage of egress LSP control (since with egress control the initial label assignments "bubble up" from the egress to upstream nodes, and consistency is therefore easy to ensure). However, even with egress control it is possible that the choice of egress node may change, or the egress may (based on a change in configuration) change its mind in terms of the granularity which is to be used. This implies that the same mechanism will be necessary to allow changes in granularity to bubble up to upstream nodes. The choice of egress or local control may therefore affect the frequency with which this mechanism is used, but does not affect the need for a mechanism to achieve consistency of label granularity.

Egress control and local control can interwork in a very straightforward manner. With either approach (assuming downstream label assignment), the egress node will initially assign labels for particular FECs and will pass these labels to its neighbors; with either approach these label assignments will bubble upstream, with the upstream nodes choosing labels that are consistent with the labels that they receive from downstream.

The difference between the two techniques therefore becomes a tradeoff between avoiding a short period of initial thrashing on startup (in the sense of avoiding the need to withdraw inconsistent labels which may have been assigned using local control) versus the imposition of a short delay on initial startup (while waiting for the initial label assignments to bubble up from downstream). The protocol mechanisms which need to be defined are the same in either case, and the steady state operation is the same in either case.

4.13 Security

Security in a network using MPLS should be relatively similar to security in a normal IP network.

Routing in an MPLS network uses precisely the same IP routing protocols as are currently used with IP. This implies that route filtering is unchanged from current operation. Similarly, the security of the routing protocols is not affected by the use of MPLS.

Packet filtering also may be done as in normal IP. This will require either (i) that label swapping be terminated prior to any firewalls performing packet filtering (in which case a separate instance of label swapping may optionally be started after the firewall); or (ii) that firewalls "look past the labels" in order to inspect the entire IP packet contents. In the latter case, note that the label may imply semantics greater than that contained in the packet header: in particular, a particular label value may imply that the packet is to take a particular path after the firewall. In environments in which this is considered to be a security issue, it may be desirable to terminate the label prior to the firewall.

Note that in principle labels could be used to speed up the operation of firewalls: in particular, the label could be used as an index into a table which indicates the characteristics that the packet needs to have in order to pass through the firewall. Depending upon implementation considerations, matching the contents of the packet against the contents of the table may be quicker than parsing the packet in the absence of the label.
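In outline, such a label-indexed check might look like the following sketch (the policy record is invented; whether the table lookup actually beats parsing the packet is, as noted, implementation dependent):

   # Sketch of a firewall using the label as an index into a table of
   # characteristics that the packet must have in order to pass.
   policy_by_label = {17: {"dst_prefix": "192.0.2."}}

   def label_indexed_check(label, packet):
       policy = policy_by_label.get(label)
       if policy is None:
           return False   # unknown label: fall back to full inspection
       return packet["dst"].startswith(policy["dst_prefix"])

   assert label_indexed_check(17, {"dst": "192.0.2.9"})
   assert not label_indexed_check(99, {"dst": "192.0.2.9"})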
References

[1] "A Proposed Architecture for MPLS", E. Rosen, A. Viswanathan, R. Callon, work in progress, draft-ietf-mpls-arch-00.txt, August 1997.

[2] "ARIS: Aggregate Route-Based IP Switching", A. Viswanathan, N. Feldman, R. Boivie, R. Woundy, work in progress, Internet Draft, March 1997.

[3] "ARIS Specification", N. Feldman, A. Viswanathan, work in progress, Internet Draft, March 1997.

[4] "ARIS Support for LAN Media Switching", S. Blake, A. Ghanwani, W. Pace, V. Srinivasan, work in progress, Internet Draft, March 1997.

[5] "Tag Switching Architecture - Overview", Rekhter, Davie, Katz, Rosen, Swallow, Farinacci, work in progress, Internet Draft, July 1997.

[6] "Tag Distribution Protocol", Doolan, Davie, Katz, Rekhter, Rosen, work in progress, Internet Draft.

[7] "Use of Tag Switching with ATM", Davie, Doolan, Lawrence, McCloghrie, Rekhter, Rosen, Swallow, work in progress, Internet Draft.

[8] "MPLS Label Stack Encoding", Rosen, Rekhter, Tappan, Farinacci, Fedorkow, Li, Conta, work in progress, draft-ietf-mpls-label-encaps-00.txt, November 1997.

[9] "Partitioning Tag Space among Multicast Routers on a Common Subnet", Farinacci, work in progress, Internet Draft.

[10] "Multicast Tag Binding and Distribution using PIM", Farinacci, Rekhter, work in progress, Internet Draft.

[11] "Toshiba's Router Architecture Extensions for ATM: Overview", Katsube, Nagami, Esaki, RFC 2098.

[12] "Soft State Switching: A Proposal to Extend RSVP for Switching RSVP Flows", A. Viswanathan, V. Srinivasan, work in progress, Internet Draft, March 1997.

[13] "Integrated Services in the Internet Architecture: an Overview", R. Braden et al., RFC 1633, June 1994.

[14] "Resource ReSerVation Protocol (RSVP), Version 1 Functional Specification", work in progress, draft-ietf-rsvp-spec-16.txt, June 1997.

[15] "OSPF Version 2", J. Moy, RFC 1583, March 1994.

[16] "A Border Gateway Protocol 4 (BGP-4)", Y. Rekhter and T. Li, RFC 1771, March 1995.

[17] "Ipsilon Flow Management Protocol Specification for IPv4 Version 1.0", P. Newman et al., RFC 1953, May 1996.

[18] "ATM Forum Private Network-Network Interface Specification, Version 1.0", ATM Forum af-pnni-0055.000, March 1996.

[19] "NBMA Next Hop Resolution Protocol (NHRP)", Luciani, Katz, Piscitello, Cole, work in progress, draft-ietf-rolc-nhrp-12.txt, March 1998.

Authors' Addresses

Ross Callon
Ascend Communications, Inc.
1 Robbins Road
Westford, MA 01886
508-952-7412
rcallon@casc.com

Paul Doolan
Ennovate Networks
330 Codman Hill Road
Boxborough, MA
978-263-2002 x103
pdoolan@ennovatenetworks.com

Nancy Feldman
IBM Corp.
17 Skyline Drive
Hawthorne, NY 10532
914-784-3254
nkf@vnet.ibm.com

Andre Fredette
Bay Networks, Inc.
3 Federal Street
Billerica, MA 01821
508-916-8524
fredette@baynetworks.com

George Swallow
Cisco Systems, Inc.
250 Apollo Drive
Chelmsford, MA 01824
508-244-8143
swallow@cisco.com

Arun Viswanathan
IBM Corp.
17 Skyline Drive
Hawthorne, NY 10532
914-784-3273
arunv@vnet.ibm.com