Network Working Group                                          R. Callon
Internet Draft                                       Ironbridge Networks
Expires: January 2000                                          P. Doolan
                                                        Ennovate Networks
                                                               N. Feldman
                                                                      IBM
                                                              A. Fredette
                                                          Nortel Networks
                                                               G. Swallow
                                                            Cisco Systems
                                                           A. Viswanathan
                                                      Lucent Technologies

                                                                July 1999

             A Framework for Multiprotocol Label Switching

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Abstract

   This document discusses technical issues and requirements for the
   Multiprotocol Label Switching working group.  It is the intent of
   this document to produce a coherent description of all significant
   approaches which were and are being considered by the working group.
   Selection of specific approaches, making choices regarding
   engineering tradeoffs, and detailed protocol specification are
   outside of the scope of this framework document.

Acknowledgments

   The ideas and text in this document have been collected from a
   number of sources and comments received.
   We would like to thank Eric Gray, Jim Luciani, Andy Malis,
   Rayadurgam Ravikanth, Yakov Rekhter, Eric Rosen, Vijay Srinivasan,
   and Pasi Vananen for their inputs and ideas.

1. Introduction and Requirements

1.1 Overview of MPLS

The primary goal of the MPLS working group is to standardize a base technology that integrates the label swapping forwarding paradigm with network layer routing. This base technology (label swapping) is expected to improve the price/performance of network layer routing, improve the scalability of the network layer, and provide greater flexibility in the delivery of (new) routing services (by allowing new routing services to be added without a change to the forwarding paradigm).

The initial MPLS effort will be focused on IPv4. However, the core technology will be extendible to multiple network layer protocols (e.g., IPv6, IPX, AppleTalk, DECnet, CLNP). MPLS is not confined to any specific link layer technology; it can work with any media over which network layer packets can be passed between network layer entities.

MPLS makes use of a routing approach whereby the normal mode of operation is that L3 routing (e.g., existing IP routing protocols and/or new IP routing protocols) is used by all nodes to determine the routed path.

MPLS provides a simple "core" set of mechanisms which can be applied in several ways to provide rich functionality. The core effort includes:

a) Semantics assigned to a stream label:

   - Labels are associated with specific streams of data.

b) Forwarding Methods:

   - Forwarding is simplified by the use of short fixed length labels
     to identify streams.

   - Forwarding may require simple functions such as looking up a
     label in a table, swapping labels, and possibly decrementing and
     checking a TTL.

   - In some cases, MPLS may make direct use of underlying layer 2
     forwarding, such as is provided by ATM [ATM] or Frame Relay [FR]
     equipment.

c) Label Distribution Methods:

   - These allow nodes to determine which labels to use for specific
     streams.

   - Label distribution may use some sort of control exchange, and/or
     be piggybacked on a routing protocol.

The MPLS working group will define the procedures and protocols used to assign significance to the forwarding labels and to distribute that information between cooperating MPLS forwarders.
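As a rough illustration of the forwarding method described in (b) above, the following sketch (in Python, for exposition only) shows the basic label swap as a single exact-match table lookup. The table contents, field layout, and function names are hypothetical assumptions, not part of any MPLS specification.

   # Illustrative sketch only: a minimal label-swapping forwarder.
   # Label Information Base (LIB): incoming label -> (outgoing label,
   # outgoing port).  In a real LSR this table is populated by a label
   # distribution mechanism, not hard-coded.
   lib = {
       17: (42, "port1"),
       23: (99, "port2"),
   }

   def label_swap(incoming_label, ttl):
       """Forward one packet: exact-match lookup, swap, decrement TTL."""
       if ttl <= 1:
           return None                      # drop: TTL expired
       outgoing_label, port = lib[incoming_label]
       return outgoing_label, port, ttl - 1

   # Example: a packet arriving with label 17 and TTL 64 leaves on
   # "port1" carrying label 42 and TTL 63.
   print(label_swap(17, 64))                # -> (42, 'port1', 63)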
1.2 Requirements

- MPLS forwarding MUST simplify packet forwarding in order to do the
  following:

   o lower the cost of high speed forwarding
   o improve forwarding performance

- MPLS core technologies MUST be general with respect to data link
  technologies (i.e., work over a very wide range of underlying data
  links). Specific optimizations for particular media MAY be
  considered.

- MPLS core technologies MUST be compatible with a wide range of
  routing protocols, and MUST be capable of operating independently of
  the underlying routing protocols. It has been observed that
  considerable optimizations can be achieved in some cases by small
  enhancements of existing protocols. Such enhancements MAY be
  considered in the case of IETF standard routing protocols and, if
  appropriate, coordinated with the relevant working group(s).

- Routing protocols which are used in conjunction with MPLS might be
  based on distributed computation. As such, during routing
  transients, these protocols may compute forwarding paths which
  potentially contain loops. MPLS MUST provide protocol mechanisms to
  either prevent the formation of loops and/or contain the amount of
  (networking) resources that can be consumed due to the presence of
  loops.

- MPLS forwarding MUST allow "aggregate forwarding" of user data;
  i.e., allow streams to be forwarded as a unit and ensure that an
  identified stream takes a single path, where a stream may consist of
  the aggregate of multiple flows of user data. MPLS SHOULD provide
  multiple levels of aggregation support (e.g., from individual
  end-to-end application flows at one extreme, to aggregates of all
  flows passing through a specified switch or router at the other
  extreme).

- MPLS MUST support operations, administration, and maintenance
  facilities at least as extensive as those supported in current IP
  networks. Current network management and diagnostic tools SHOULD
  continue to work in order to provide some backward compatibility.
  Where such tools are broken by MPLS, hooks MUST be supplied to allow
  equivalent functionality to be created.

- MPLS core technologies MUST work with both unicast and multicast
  streams.

- The MPLS core specifications MUST clearly state how MPLS operates in
  a hierarchical network.

- Scalability issues MUST be considered and analyzed during the
  definition of MPLS. Very scalable solutions MUST be sought.

- MPLS core technologies MUST be capable of working with O(n) streams
  to switch all best-effort traffic, where n is the number of nodes in
  an MPLS domain. MPLS protocol standards MUST be capable of taking
  advantage of hardware that supports stream merging where
  appropriate. Note that O(n-squared) streams or VCs might also be
  appropriate for use in some cases.

- The core set of MPLS standards, along with existing Internet
  standards, MUST be a self-contained solution. For example, the
  proposed solution MUST NOT require specific hardware features that
  do not commonly exist on network equipment at the time that the
  standard is complete. However, the solution MAY make use of
  additional optional hardware features (e.g., to optimize
  performance).

- The MPLS protocol standards MUST support multipath routing and
  forwarding.

- MPLS MUST be compatible with the IETF Integrated Services Model,
  including RSVP [RFC1633][RSVP].

- It MUST be possible for MPLS switches to coexist with non-MPLS
  switches in the same switched network. MPLS switches SHOULD NOT
  impose additional configuration on non-MPLS switches.

- MPLS MUST allow "ships in the night" operation with existing layer 2
  switching protocols (e.g., ATM Forum signaling); i.e., MPLS must be
  capable of being used in a network which is also simultaneously
  operating standard layer 2 protocols.

- The MPLS protocol MUST support both topology-driven and
  traffic/request-driven label assignments.

1.3 Terminology

aggregate stream

   synonym of "stream"

DLCI

   a label used in Frame Relay networks to identify Frame Relay
   circuits

flow

   a single instance of an application-to-application flow of data (as
   in the RSVP and IFMP use of the term "flow")
forwarding equivalence class

   a group of L3 packets which are forwarded in the same manner (e.g.,
   over the same path, with the same forwarding treatment). A
   forwarding equivalence class is therefore the set of L3 packets
   which could safely be mapped to the same label. Note that there may
   be reasons that packets from a single forwarding equivalence class
   may be mapped to multiple labels (e.g., when stream merge is not
   used)

frame merge

   stream merge, when it is applied to operation over frame based
   media, so that the potential problem of cell interleave is not an
   issue

label

   a short, fixed length, physically contiguous, locally significant
   identifier which is used to identify a stream

label information base

   the database of information containing label bindings

label swap

   the basic forwarding operation, consisting of looking up an
   incoming label to determine the outgoing label, encapsulation,
   port, and other data handling information

label swapping

   a forwarding paradigm allowing streamlined forwarding of data by
   using labels to identify streams of data to be forwarded

label switched hop

   the hop between two MPLS nodes, on which forwarding is done using
   labels

label switched path

   the path created by the concatenation of one or more label switched
   hops, allowing a packet to be forwarded by swapping labels from one
   MPLS node to another MPLS node

layer 2

   the protocol layer under layer 3 (which therefore offers the
   services used by layer 3); forwarding, when done by the swapping of
   short fixed length labels, occurs at layer 2 regardless of whether
   the label being examined is an ATM VPI/VCI, a Frame Relay DLCI, or
   an MPLS label

layer 3

   the protocol layer at which IP and its associated routing protocols
   operate

link layer

   synonymous with layer 2

loop detection

   a method in which loops may be set up and data may be injected into
   a loop, but in which a mechanism is provided to detect and break
   such loops

loop prevention

   a method of dealing with loops in which data is never transmitted
   over a loop

label stack

   an ordered set of labels

loop survival

   a method of dealing with loops in which data may be transmitted
   over a loop, but means are employed to limit the amount of network
   resources which may be consumed by the looping data

label switching router

   an MPLS node which is capable of forwarding native L3 packets

merge point

   the node at which multiple streams and switched paths are combined
   into a single stream sent over a single path; in the case that the
   multiple paths are not combined prior to the egress node, the
   egress node becomes the merge point

Mlabel

   abbreviation for MPLS label

MPLS core standards

   the standards which describe the core MPLS technology

MPLS domain

   a contiguous set of nodes which operate MPLS routing and forwarding
   and which are also in one routing or administrative domain

MPLS edge node

   an MPLS node that connects an MPLS domain with a node which is
   outside of the domain, either because it does not run MPLS, and/or
   because it is in a different domain; note that if an LSR has a
   neighboring host which is not running MPLS, that LSR is an MPLS
   edge node

MPLS egress node

   an MPLS edge node in its role in handling traffic as it leaves an
   MPLS domain

MPLS ingress node

   an MPLS edge node in its role in handling traffic as it enters an
   MPLS domain

MPLS label

   a label placed in a short MPLS shim header, used to identify
   streams
MPLS node

   a node which is running MPLS. An MPLS node will be aware of MPLS
   control protocols, will operate one or more L3 routing protocols,
   and will be capable of forwarding packets based on labels; an MPLS
   node may optionally also be capable of forwarding native L3 packets

MultiProtocol Label Switching

   an IETF working group and the effort associated with the working
   group

network layer

   synonymous with layer 3

shortcut VC

   a VC set up as a result of an NHRP query and response

stack

   synonymous with label stack

stream

   an aggregate of one or more flows, treated as one flow for the
   purpose of forwarding in L2 and/or L3 nodes (e.g., may be described
   using a single label); in many cases a stream may be the aggregate
   of a very large number of flows. Synonymous with "aggregate stream"

stream merge

   the merging of several smaller streams into a larger stream, such
   that for some or all of the path the larger stream can be referred
   to using a single label

switched path

   synonymous with label switched path

virtual circuit

   a circuit used by a connection-oriented layer 2 technology such as
   ATM or Frame Relay, requiring the maintenance of state information
   in layer 2 switches

VC merge

   stream merge when it is applied to VCs, specifically so as to allow
   multiple VCs to merge into one single VC

VP merge

   stream merge when it is applied to VPs, specifically so as to allow
   multiple VPs to merge into one single VP. In this case the VCIs
   need to be unique; this allows cells from different sources to be
   distinguished via the VCI

VPI/VCI

   a label used in ATM networks to identify circuits

1.4 Acronyms and Abbreviations

   DLCI   Data Link Connection Identifier
   FEC    Forwarding Equivalence Class
   ISP    Internet Service Provider
   LIB    Label Information Base
   LDP    Label Distribution Protocol
   L2     Layer 2
   L3     Layer 3
   LSP    Label Switched Path
   LSR    Label Switching Router
   MPLS   MultiProtocol Label Switching
   MPT    Multipoint to Point Tree
   NHC    Next Hop (NHRP) Client
   NHS    Next Hop (NHRP) Server
   VC     Virtual Circuit
   VCI    Virtual Circuit Identifier
   VPI    Virtual Path Identifier

1.5 Motivation for MPLS

This section describes the expected and potential benefits of MPLS over existing schemes. Specifically, this section discusses the advantages of MPLS over previous methods for building core networks (i.e., networks for Internet service providers or for major corporate backbones). The potential advantages of MPLS in campus and local area networks are not discussed in this section.

There are currently two commonly used methods for building core IP networks: (i) networks of datagram routers, in which the core of the network is based on the datagram routers; (ii) networks of datagram routers operating over an ATM core. In order to describe the advantages of MPLS, it is necessary to know which alternative to MPLS is being used for the comparison. This section is therefore split into two parts: Section 1.5.1 describes the advantages of MPLS when compared to a pure datagram routed network, and Section 1.5.2 describes the advantages of MPLS when compared to an IP over ATM network.

This section does not provide a complete list of requirements for MPLS. For example, Multipoint to Point Trees (MPTs) are important for MPLS to scale. However, datagram forwarding naturally acts in this way (since multiple sources are merged automatically), and the ATM Forum is currently adding multipoint-to-point support to the ATM standards. The ability to do MPTs is therefore important to MPLS, but does not represent an advantage over either datagram routing or IP over ATM, and is therefore not mentioned in this section.
1.5.1 Benefits Relative to Use of a Router Core

1.5.1.1 Simplified Forwarding

Label swapping allows packet forwarding to be based on an exact match for a short label, rather than a longest match algorithm applied to a longer address as is required for normal datagram forwarding. In addition, the label headers used with MPLS are simpler than the headers typically used with datagram protocols such as IP. MPLS therefore allows a much simpler forwarding paradigm than datagram forwarding, which in turn makes it easier to build a high speed router using MPLS.

Whether this simpler forwarding operation will result in the availability of LSRs which can operate at higher speeds than datagram routers is controversial, and probably depends upon implementation details. There are some parts of the network, such as at hierarchical boundaries, where datagram IP forwarding at high speed will be required, so implementation of high speed routers remains highly desirable. In addition, there are currently multiple companies building high speed routers which will allow IP packets to be forwarded at very high speed. At speeds at least up to OC48, it appears that once the one-time engineering is completed, the per-unit cost associated with IP forwarding will be a small fraction of the overall equipment cost.

However, there are also many existing routers which can benefit from the simpler forwarding allowed by MPLS. In addition, there are some routers being built with implementations that will benefit from the simpler forwarding available with MPLS.

1.5.1.2 Efficient Explicit Routing

Explicit routing (also known as source routing) is a very powerful technique which can potentially be useful for a variety of purposes. However, with pure datagram routing the overhead of carrying a complete explicit route with each packet is prohibitive. MPLS, in contrast, allows the explicit route to be carried only at the time that the label switched path is set up, and not with each packet. MPLS therefore makes explicit routing practical, and in turn makes possible a number of advanced routing features which depend upon explicit routing.
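To make the overhead argument concrete, the following sketch compares the approximate cost of carrying a complete explicit route in every packet against carrying it only once at path setup. The hop list, byte counts, and function names are illustrative assumptions only, not protocol definitions.

   # Sketch: explicit routing overhead, per-packet versus setup-time.
   explicit_route = ["lsr_a", "lsr_b", "lsr_c", "lsr_d"]  # hop list

   # Pure datagram approach: every packet carries the full route.
   def datagram_overhead(num_packets, bytes_per_hop=4):
       return num_packets * len(explicit_route) * bytes_per_hop

   # MPLS approach: the route is signaled once when the label switched
   # path is set up; each packet then carries only a short label.
   def mpls_overhead(num_packets, label_bytes=4, bytes_per_hop=4):
       setup = len(explicit_route) * bytes_per_hop    # one-time cost
       return setup + num_packets * label_bytes

   print(datagram_overhead(10_000))   # 160000 bytes of route overhead
   print(mpls_overhead(10_000))       # 40016 bytes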
1.5.1.3 Traffic Engineering

Traffic engineering refers to the process of selecting the paths taken by data traffic in order to balance the traffic load on the various links, routers, and switches in the network. Traffic engineering is most important in networks where multiple parallel or alternate paths are available. The rapid growth of the Internet, and particularly the associated rapid growth in the demand for bandwidth, has tended to cause some core networks to become increasingly "branchy" in recent years, resulting in an increase in the importance of traffic engineering [TRAFENG].

It is common today, in networks that are running IP over an ATM core using PVCs, to manually configure the path of each PVC in order to equalize the traffic levels on different links in the network. Thus traffic engineering in IP over ATM networks is typically done today using manual configuration.

Traffic engineering is difficult to accomplish with datagram routing. Some degree of load balancing can be obtained by adjusting the metrics associated with network links. However, there is a limit to how much can be accomplished in this way; in networks with a large number of alternative paths between any two points, balancing the traffic levels on all links is difficult to achieve solely by adjustment of the metrics used with hop by hop datagram routing.

MPLS allows streams from any particular ingress node to any particular egress node to be individually identified. MPLS therefore provides a straightforward mechanism to measure the traffic associated with each ingress node to egress node pair. In addition, since MPLS allows efficient explicit routing of label switched paths, it is straightforward to ensure that any particular stream of data takes the preferred path.

The hard part of traffic engineering is selection of the method used to route each label switched path. There are a variety of possible ways to do this, ranging from manual configuration of routes, to use of a routing protocol which announces traffic loads in the network combined with background recomputation of paths.

1.5.1.4 QoS Routing

QoS routing refers to a method of routing in which the route for a particular stream is chosen in response to the QoS required for that stream. In many cases QoS routing needs to make use of explicit routing, for several reasons:

In some cases specific bandwidth is likely to be reserved for each of many specific streams of data. This implies that the total bandwidth of multiple streams may exceed the bandwidth available on any particular link, and thus not all streams, even between the same ingress and egress nodes, can take the same path. Instead, individual streams will need to be individually routed. This is somewhat analogous to traffic engineering, but might require separation of streams at a finer granularity. Thus explicit routing may be needed in order to allow each stream to be individually routed, and to eliminate the need for each switch along the path of a stream to compute the route for each stream.

Consider the case of routing a stream with a specific bandwidth requirement: the route chosen will depend upon the amount of bandwidth which is requested. For any one given bandwidth, it is straightforward to select a path. However, there are many different levels of bandwidth which could in principle be requested, which makes it impractical to precompute all possible paths for all possible bandwidths. If the path for a particular stream must be computed on demand, then it is undesirable to require every LSR on the path to compute the path. Instead, it is preferable to have the first node compute the path and specify the route to be followed through use of an explicit route.

For a variety of reasons the information available for QoS routing may in some cases be slightly out of date. This implies that an attempt to select a specific path for a QoS-sensitive stream may in some cases fail, due to a particular node or link not having the required resources available. In these cases it is not in general feasible to tell all other nodes in the network of the limited resource in one particular network element. If explicit routing is available, then the initial node of the stream (the ingress node in MPLS) can be informed that the indicated network element is not able to carry the stream, allowing an alternate path to be selected. However, in this case the node that selects the alternate path has to use explicit routing in order to force the stream to follow the alternate path.

These and similar examples imply that explicit routing is necessary in order to do an adequate job of QoS routing. Given that MPLS allows efficient explicit routing, it follows that MPLS also facilitates QoS routing.
1.5.1.5 Mappings from IP Packet to Forwarding Equivalence Class

MPLS allows the mapping from IP packet to forwarding equivalence class to be performed only once, at the ingress to an MPLS area. This facilitates complex mappings from IP packet to FEC that would otherwise be impractical.

For example, consider the case of provisioned QoS: some ISPs offer a service wherein specific customers subscribe to receive differentiated services (e.g., their packets may receive preferential forwarding treatment). Mapping of IP packets to the service level may require knowing the customer who is transmitting the packet, which may in turn require packet filtering based on source and destination address, incoming interface, and other characteristics. The sheer number of filters that are needed in a moderate sized ISP precludes repetition of the filters at every router throughout the network. Also, some information, such as the incoming interface, is not available except at the ingress node to the network. This implies that the preferred way to offer provisioned QoS is to map the packet at the ingress point to the preferred QoS level, and then label the packet in some way. MPLS offers an efficient method to label the QoS class associated with any particular packet.

Other examples of complex mappings from IP packet to FEC are also likely to be identified as MPLS is deployed.
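The following sketch illustrates, under assumed filter rules and label values, how such a mapping might be applied once at the ingress node; downstream nodes then forward on the label alone. It illustrates the idea only and does not specify any particular filter mechanism.

   # Sketch: mapping an IP packet to a forwarding equivalence class
   # (and hence a label) once, at the ingress node.  The filter rules,
   # addresses, and labels are hypothetical.
   import ipaddress

   filters = [
       # (source prefix, incoming interface) -> label for a premium FEC
       (("10.1.0.0/16", "if0"), 101),
       (("10.2.0.0/16", "if1"), 102),
   ]

   def classify(src_addr, interface, default_label=999):
       """Apply ingress filters; later nodes forward on the label only."""
       for (prefix, ifname), label in filters:
           if (ifname == interface and
               ipaddress.ip_address(src_addr) in
               ipaddress.ip_network(prefix)):
               return label
       return default_label                 # best-effort FEC

   print(classify("10.1.5.9", "if0"))       # -> 101 (premium customer)
   print(classify("10.9.9.9", "if0"))       # -> 999 (best effort)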
1.5.1.6 Partitioning of Functionality

Because MPLS supports different label granularities, it is possible to partition the processing functionality hierarchically among the different network elements, so that the heavier processing takes place at the edges of the network, near the customers, while the processing in the core network is as simple as possible, e.g., pure label based forwarding.

AS level aggregation will enable the building of fully switched backbone networks and traffic exchange points. It will also be possible for operators to fully switch the transit traffic traveling through the operator's network. Deaggregation will be needed for the streams that are destined for the networks connected to the MPLS domain, but this deaggregation only needs to perform the lookup operations associated with finding the label for the egress router or interface. For example, TOS information bound to a label at the source remains valid, and can be honored on the basis of the label on which the packet was received. Note that the receiving domain cannot in general reclassify the packet, since the original packet classification policy is not known to the receiving domain.

As one example of the improved functional partitioning, consider the case of the use of packet filters to map IP packets into a substantial number of queues, such that each queue receives differentiated services. For example, suppose that a network supports individual queuing for on the order of 100 different customers, with packets mapped to queues based on the source and destination IP address. In this case, with MPLS the packet filtering can be done solely at the edge of the network, with the packets mapped to labels such that each individual user receives separate labels. The filtering is thus performed only at the edge of the network, while still allowing complex mappings of IP packets to forwarding equivalence classes.

1.5.1.7 Single Forwarding Paradigm with Service Level Differentiation

MPLS can allow a single forwarding paradigm to be used to support multiple types of service on the same network.

Because of this common forwarding paradigm, it is possible to carry the different services through the same network elements, regardless of the control plane protocols used to populate the LSR's LIB. It is for example possible, in the case of an ATM based switching system, to support all the native ATM services, Frame Relay services, and labeled IP services simultaneously. The simultaneous support of multiple services may require partitioning of the label space between the services, and this partitioning shall be supported by the label distribution management protocol.

A non-exhaustive list of examples of services suitable for carrying over LSRs includes IP traffic, Frame Relay traffic, ATM traffic (in the case of cell switching), IP tunneling, VPNs, and other datagram protocols.

Note that MPLS does not necessarily use the same header format over all types of media. However, over any particular type of media a single header format (at least for the lowest level of the label stack) should be possible.

1.5.2 Benefits Relative to Use of an ATM or Frame Relay Core

Note: this section compares MPLS with other methods for interconnecting routers over a switched core network. We are not considering methods for interconnecting hosts located on virtual networks. For example, the ATM Forum LANE and MPOA standards support virtual networks. MPLS does not directly support virtual networks, and should not be compared directly with MPOA or LANE.

Previously available methods for interconnecting routers in an IP over ATM environment make use of either: (i) a full mesh 'n-squared' overlay of virtual circuits between n ATM-attached routers; (ii) a partial mesh of VCs between routers; or (iii) a partial mesh of VCs, plus the use of NHRP to facilitate on-demand cut-through SVCs.

1.5.2.1 Scaling of the Routing Protocol

Relative to the interconnection of IP over an ATM core, MPLS improves the scaling of routing due to the reduced number of peers and the elimination of the 'n-squared' logical links between routers used to operate the routing protocols.
Because all LSRs run standard routing protocols, the number of peers a router needs to communicate with is reduced to the number of LSRs and routers a given LSR is directly connected to, instead of having to peer with a large number of routers at the ends of switched L2 paths. This benefit is achieved because the edge LSRs do not need to peer with every other edge LSR in the domain, as is the case in a hybrid switch/router network.

1.5.2.2 Common Operation over Packet and Cell Media

MPLS makes use of common methods for routing and forwarding over packet and cell media, and potentially allows a common approach to traffic engineering, QoS routing, and other aspects of operation. For example, this means that the same method for label distribution can be used over Frame Relay and ATM media, as well as between LSRs using the MPLS shim header for forwarding over other media (such as PPP links and broadcast LANs).

Note: there may be some differences with respect to operation over different media. For example, if VP merge is used with ATM media (rather than VC merge), then the merge operation may be somewhat different than it would be with packet media or with ATM using VC merge.

1.5.2.3 Easier Management

The use of a common method for label distribution and common routing protocols over multiple types of media is expected to simplify network management of MPLS networks.

1.5.2.4 Elimination of the 'Routing over Large Clouds' Issue

MPLS eliminates the need to use NHRP and on-demand cut-through SVCs for operation over ATM. This eliminates the latency problem associated with cut-through SVCs.

2. Discussion of Core MPLS Components

2.1 The Basic Routing Approach

Routing is accomplished through the use of standard L3 routing protocols, such as OSPF and BGP [RFC1583][RFC1771]. The information maintained by the L3 routing protocols is then used to distribute labels to neighboring nodes; these labels are used in the forwarding of packets as described below. In the case of ATM networks, the labels that are distributed are VPI/VCIs, and a separate protocol (i.e., PNNI) is not necessary for the establishment of VCs for IP forwarding.

The topological scope of a routing protocol (i.e., the routing domain) and the scope of MPLS-capable label switching nodes may be different. For example, MPLS-knowledgeable and MPLS-ignorant nodes, all of which are OSPF routers, may be co-resident in an area. Where neighboring routers support MPLS, labels can be exchanged and used.

Neighboring MPLS routers may use configured PVCs or PVPs to tunnel through non-participating ATM or FR switches.

2.2 Labels

In addition to the single routing protocol approach discussed above, the other key concept in the basic MPLS approach is the use of short fixed length labels to simplify user data forwarding.

2.2.1 Label Semantics

It is important that the MPLS solutions are clear about what semantics (i.e., what knowledge of the state of the network) is implicit in the use of labels for forwarding user data packets or cells.

At the simplest level, a label may be thought of as nothing more than a shorthand for the packet header, used to index the forwarding decision that a router would make for the packet. In this context, the label is nothing more than a shorthand for an aggregate stream of user data.
This observation leads to one possible very simple interpretation: that the "meaning" of the label is a strictly local issue between two neighboring nodes. With this interpretation: (i) MPLS could be employed between any two neighboring nodes for forwarding of data between those nodes, even if no other nodes in the network participate in MPLS; (ii) when MPLS is used between more than two nodes, the operation between any two neighboring nodes could be interpreted as independent of the operation between any other pair of nodes. This approach has the advantage of semantic simplicity, and of being the closest to pure datagram forwarding. However, this approach (like pure datagram forwarding) has the disadvantage that when a packet is forwarded it is not known whether the packet is being forwarded into a loop, into a black hole, or towards links which have inadequate resources to handle the traffic flow. These disadvantages are unavoidable with pure datagram forwarding, but become design choices when label switching is being used.

There are cases where it would be desirable to have additional knowledge implicit in the existence of the label. For example, one approach to avoiding loops (see section 4.3) involves signaling the label distribution along a path before packets are forwarded on that path. With this approach, the fact that a node has a label to use for a particular IP packet would imply the knowledge that following the label (including label swapping at subsequent nodes) leads to a non-looping path which makes progress towards the destination (something which is usually, but not necessarily always, true when using pure datagram routing). This would of course require some sort of label distribution/setup protocol which signals along the path being set up before the labels are available for packet forwarding. However, there are also other consequences to having additional semantics associated with the label: specifically, procedures are needed to ensure that the semantics are correct. For example, if the fact that you have a label for a particular destination implies that there is a loop-free path, then when the path changes some procedures are required to ensure that it is still loop free. Another example of semantics which could be implicit in a label is the identity of the higher level protocol type which is encoded using that label value.

In either case, the specific value of a label to use for a stream is strictly a local issue; however, the decision about whether to use the label may be based on some global (or at least wider scope) knowledge that, for example, the label switched path is loop-free and/or has the appropriate resources.

A similar example occurs in ATM networks: with standard ATM, a signaling protocol is used which both reserves resources in switches along the path, and ensures that the path is loop-free and terminates at the correct node. Thus, implicit in the fact that an ATM node has a VPI/VCI for forwarding a particular piece of data is the knowledge that the path has been set up successfully.
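The following sketch illustrates, in toy form, the kind of setup-time signaling discussed above: a node binds and advertises a label only after the downstream portion of the path has been confirmed loop-free. The topology, the path-vector style check, and all names are assumptions made purely for exposition.

   # Sketch: setup-time signaling with a loop check.  A label would be
   # installed only after the downstream path is confirmed loop-free.
   next_hop = {"a": "b", "b": "c", "c": "egress"}   # toy routed path

   def setup_path(node, path_so_far):
       """Signal hop by hop; fail if the path would revisit a node."""
       if node in path_so_far:
           return None                          # loop detected: refuse
       if node == "egress":
           return ["egress"]                    # path terminates here
       downstream = setup_path(next_hop[node], path_so_far + [node])
       if downstream is None:
           return None                          # propagate the failure
       # Only now would this node bind and advertise a label upstream.
       return [node] + downstream

   print(setup_path("a", []))   # -> ['a', 'b', 'c', 'egress']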
Another similar example occurs with multipoint-to-point trees over ATM (see section 4.2 below), where the multipoint-to-point tree uses a VP, and cell interleave at merge points in the tree is handled by giving each source on the tree a distinct VCI within the VP. In this case, the fact that each source has a known VPI/VCI to use needs to (implicitly or explicitly) imply the knowledge that the VCI assigned to that source is unique within the context of the VP.

In general, labels are used to optimize how the system works, not to control how the system works. For example, the routing protocol determines the path that a packet follows. The presence or absence of a label assignment should not affect the path of an L3 packet. Note however that the use of labels may make capabilities such as explicit routes, load sharing, and multipath more efficient.

2.2.2 Label Granularity

Labels are used to create a simple forwarding paradigm. The essential element in assigning a label is that the device which will be using the label to forward packets will be forwarding all packets with the same label in the same way. If the packet is to be forwarded solely by looking at the label, then at a minimum, all packets with the same incoming label should be forwarded out the same port(s) with the same encapsulation(s), and with the same next hop label if any (although the special cases of multipath and load sharing may be an exception to this rule).

The term "forwarding equivalence class" is used to refer to a set of L3 packets which are all forwarded in the same manner by a particular LSR (for example, the IP packets in a forwarding equivalence class may be destined for the same egress from an MPLS network, and may be associated with the same QoS class). A forwarding equivalence class is therefore the set of L3 packets which could safely be mapped to the same label. Note that there may be reasons that packets from a single forwarding equivalence class may be mapped to multiple labels (e.g., when stream merge is not used).

Note that the label could also mean "ignore this label and forward based on what is contained within," where within one might find a label (if a stack of labels is used) or a layer 3 packet.

For IP unicast traffic, the granularity of a label allows various levels of aggregation in a Label Information Base (LIB). At one end of the spectrum, a label could represent a host route (i.e., the full 32 bits of IP address). If a router forwards an entire CIDR prefix in the same way, it may choose to use one label to represent that prefix. Similarly, if the router is forwarding several (otherwise unrelated) CIDR prefixes in the same way, it may choose to use the same label for this set of prefixes. For instance, all CIDR prefixes which share the same BGP Next Hop could be assigned the same label. Taking this to the limit, an egress router may choose to advertise all of its prefixes with the same label.

By introducing the concept of an egress identifier, the distribution of labels associated with groups of CIDR prefixes can be simplified. For instance, an egress identifier might specify the BGP Next Hop, with all prefixes routed to that next hop receiving the label associated with that egress identifier.
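The following sketch shows the egress identifier idea under assumed prefixes and label values: unrelated CIDR prefixes that share a BGP Next Hop are bound to a single label. All table contents are hypothetical.

   # Sketch: aggregating many CIDR prefixes onto one label via an
   # egress identifier (here, the BGP Next Hop).
   bgp_next_hop = {
       "192.0.2.0/24":    "egress_1",
       "198.51.100.0/24": "egress_1",
       "203.0.113.0/24":  "egress_2",
   }
   label_for_egress = {"egress_1": 300, "egress_2": 301}

   def label_for_prefix(prefix):
       """All prefixes sharing an egress share one label."""
       return label_for_egress[bgp_next_hop[prefix]]

   # Two unrelated prefixes map to the same label:
   print(label_for_prefix("192.0.2.0/24"))      # -> 300
   print(label_for_prefix("198.51.100.0/24"))   # -> 300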
Another natural place to aggregate would be the MPLS egress router. This would work particularly well in conjunction with a link-state routing protocol, where the association between egress router and CIDR prefix is already distributed throughout an area.

For IP multicast, the natural binding of a label would be to a multicast tree, or rather to the branch of a tree which extends from a particular port. Thus for a shared tree, the label corresponds to the multicast group, (*,G). For (S,G) state, the label would correspond to the source address and the multicast group.

A label can also have a granularity finer than a host route. That is, it could be associated with some combination of source and destination address or other information within the packet. This might for example be done on an administrative basis to aid in effecting policy. A label could also correspond to all packets which match a particular Integrated Services filter specification.

Labels can also represent explicit routes. This use is semantically equivalent to using an IP tunnel with a complete explicit route. This is discussed in more detail in section 4.10.

2.2.2.1 Examples of Unicast Traffic Granularities

- PQ (Port Quadruples): same IP source address prefix, destination
  address prefix, TTL, IP protocol, and TCP/UDP source/destination
  ports

- PQT (Port Quadruples with TOS): same IP source address prefix,
  destination address prefix, TTL, IP protocol, and TCP/UDP
  source/destination ports, and same IP header TOS field (including
  Precedence and TOS bits)

- HP (Host Pairs): same specific IP source and destination address
  (32 bit)

- NP (Network Pairs): same IP source and destination address prefixes
  (variable length)

- DN (Destination Network): same IP destination network address
  prefix (variable length)

- ER (Egress Router): same egress router ID (e.g., OSPF)

- NAS (Next-hop AS): same next-hop AS number (BGP)

- DAS (Destination AS): same destination AS number (BGP)

2.2.2.2 Examples of Multicast Traffic Granularities

- SST (Source Specific Tree): same source address and multicast group

- SMT (Shared Multicast Tree): same multicast group address

2.2.3 Label Assignment

Essential to label switching is the notion of binding between a label and network layer routing (routes). A control component is responsible for creating label bindings, and then distributing the label binding information among label switches. Label assignment involves allocating a label, and then binding that label to a route.

Label assignment can be driven by control traffic or by data traffic. This is discussed in more detail in section 3.4.

Control traffic driven label assignment has several advantages compared to data traffic driven label assignment. First, it minimizes the amount of additional control traffic needed to distribute label binding information, as label binding information is distributed only in response to control traffic, independent of data traffic. It also makes the overall scheme independent of, and insensitive to, the data traffic profile or pattern. Control traffic driven creation of label bindings improves forwarding latency, as labels are assigned before data traffic arrives, rather than being assigned as data traffic arrives. It also simplifies the overall system behavior, as the control plane is driven solely by control traffic, rather than by a mix of control and data traffic.
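The following sketch shows the shape of control traffic driven assignment under assumed event and function names: a binding is created when a route is learned, before any data packet arrives, and withdrawn when the route goes away.

   # Sketch: control-traffic-driven label assignment.  Event names and
   # structures are illustrative only.
   from itertools import count

   _labels = count(100)          # simple allocator for unused labels
   lib = {}                      # prefix -> locally assigned label

   def advertise_binding(prefix, label):
       print("bind", prefix, "->", label)   # stand-in for distribution

   def on_route_learned(prefix):
       """Bind a label when the routing protocol installs a route."""
       label = next(_labels)
       lib[prefix] = label
       advertise_binding(prefix, label)     # tell neighbors

   def on_route_withdrawn(prefix):
       """Withdraw the binding when its route goes away."""
       lib.pop(prefix, None)                # label can be reused later

   on_route_learned("192.0.2.0/24")         # bind 192.0.2.0/24 -> 100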
There are, however, situations where data traffic driven label assignment is necessary. A particular case may occur with ATM without VP or VC merge. In this case, setting up a full mesh of VCs would require O(n-squared) VCs, which may be infeasible in very large networks. Instead, VCs may be set up where required for forwarding data traffic. In this case it is generally not possible to know a priori how many such streams may occur.

Label withdrawal is required with both control-driven and data-driven label assignment. Label withdrawal is primarily a matter of garbage collection, that is, collecting up unused labels so that they may be reassigned. Generally speaking, a label should be withdrawn when the conditions that allowed it to be assigned are no longer true. For example, if a label is imbued with extra semantics such as loop-free-ness, then the label must be withdrawn when those extra semantics cease to hold.

In certain cases, notably multicast, it may be necessary to share a label space between multiple entities. If these sharing arrangements are altered by the coming and going of neighbors, then labels which are no longer controlled by an entity must be withdrawn and a new label assigned.

2.2.4 Label Stack and Forwarding Operations

The basic forwarding operation consists of looking up the incoming label to determine the outgoing label, encapsulation, port, and any additional information which may pertain to the stream, such as a particular queue or other QoS related treatment. We refer to this operation as a label swap.

When a packet first enters an MPLS domain, the packet is forwarded by normal layer 3 forwarding operations, with the exception that the outgoing encapsulation will now include a label. We refer to this operation as a label push. When a packet leaves an MPLS domain, the label is removed. We refer to this as a label pop.

In some situations, carrying a stack of labels is useful. For instance, both an IGP and a BGP label could be used to allow routers in the interior of an AS to be free of BGP information. In this scenario, the "IGP" label is used to steer the packet through the AS and the "BGP" label is used to switch between ASes.

With a label stack, the set of label operations remains the same, except that at some points one might push or pop multiple labels, or pop and swap, or swap and push.
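The following sketch shows the stack operations for the IGP/BGP example above, using plain lists as stacks. The label values are hypothetical.

   # Sketch: label stack operations (push, swap, pop) on a list.
   def push(stack, label):
       return [label] + stack

   def pop(stack):
       return stack[0], stack[1:]

   def swap(stack, new_top):
       return [new_top] + stack[1:]

   # Ingress pushes a BGP label, then an IGP label on top:
   stack = push(push([], 500), 42)          # [42, 500]
   # Interior routers of the AS swap only the IGP (top) label:
   stack = swap(stack, 43)                  # [43, 500]
   # At the AS exit the IGP label is popped, exposing the BGP label:
   _, stack = pop(stack)                    # [500]
   print(stack)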
2.3 Encapsulation

Label-based forwarding makes use of various pieces of information, including a label or stack of labels, and possibly additional information such as a TTL field [ENCAP]. In some cases this information may be encoded using an MPLS header; in other cases this information may be encoded in L2 headers. Note that there may be multiple types of MPLS headers. For example, the header used over one media type may be different than the header used over a different media type. Similarly, in some cases the information that MPLS makes use of may be encoded in an ATM header. We will use the term "MPLS encapsulation" to refer to whatever form is used to encapsulate the label information and other information used for label based forwarding. The term "MPLS header" will be used where this information is carried in some sort of MPLS-specific header (i.e., when the MPLS information cannot all be carried in an L2 header). Whether there is one form of MPLS header, or multiple forms, is also outside the scope of this document.

The exact contents of the MPLS encapsulation are outside the scope of this document. Some fields, such as the label, are obviously needed. Some others might or might not be standardized, based on further study. An encapsulation scheme may make use of the following fields:

- label
- TTL
- class of service
- stack indicator
- next header type indicator
- checksum

It is desirable to have a very short encapsulation header. For example, a four byte encapsulation header adds to the convenience of building a hardware implementation that forwards based on the encapsulation header. At the same time, it is tricky to assign such a limited number of bits to carry the above listed information in an MPLS header. Hence careful consideration must be given to the information chosen for an MPLS header.

A TTL value in the MPLS header may be useful in the same manner as it is in IP. Specifically, TTL may be used to terminate packets caught in a routing loop, and for other related uses such as traceroute. The TTL mechanism is a simple and proven method of handling such events. Another use of TTL is to expire packets in a network by limiting their "time to live" and eliminating stale packets that may cause problems for some of the higher layer protocols. When used over link layers which do not provide a TTL field, alternate mechanisms will be needed to replace the uses of the TTL field.

A provision for a class of service (COS) field in the MPLS header allows multiple service classes within the same label. However, when more sophisticated QoS is associated with a label, the COS may not have any significance. Alternatively, the COS (like QoS) can be left out of the header and instead propagated with the label assignment, but this requires that a separate label be assigned to each required class of service. Nevertheless, the COS mechanism provides a simple method of segregating flows within a label.

As previously mentioned, the encapsulation header can be used to derive the benefits of tunneling (or stacking).

The MPLS header must provide a way to indicate that multiple MPLS headers are stacked (i.e., the "stack indicator"). For this purpose a single bit in the MPLS header will suffice. In addition, there are also some benefits to indicating the type of the protocol header following the MPLS header (i.e., the "next header type indicator"). One option would be to combine the stack indicator and next header type indicator into a single value (i.e., the next header type indicator could be allowed to take the value "MPLS header"). Another option is to have the next header type indicator be implicit in the label value (such that this information would be propagated along with the label).

There is no compelling reason to support a checksum field in the MPLS header. A CRC mechanism at the L2 layer should be sufficient to ensure the integrity of the MPLS header.
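Although the exact contents of the MPLS header are outside the scope of this document, the following sketch shows one possible way the fields listed above might be packed into a four byte header (a 20-bit label, 3-bit COS, 1-bit stack indicator, and 8-bit TTL). This particular layout is a hypothetical example chosen for illustration, not a specification.

   # Sketch: one possible packing of a four byte MPLS header.
   def pack_header(label, cos, bottom_of_stack, ttl):
       """20-bit label | 3-bit COS | 1-bit stack indicator | 8-bit TTL."""
       assert 0 <= label < 2**20 and 0 <= cos < 8 and 0 <= ttl < 256
       word = (label << 12) | (cos << 9) | (int(bottom_of_stack) << 8) | ttl
       return word.to_bytes(4, "big")

   def unpack_header(data):
       word = int.from_bytes(data, "big")
       return (word >> 12, (word >> 9) & 0x7, (word >> 8) & 0x1,
               word & 0xFF)

   hdr = pack_header(label=42, cos=3, bottom_of_stack=True, ttl=64)
   print(unpack_header(hdr))    # -> (42, 3, 1, 64)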
3. Observations, Issues and Assumptions

3.1 Layer 2 versus Layer 3 Forwarding

MPLS uses L2 forwarding as a way to provide simple and fast packet forwarding capability. One primary reason for the simplicity of L2 forwarding comes from its short, fixed length labels. A node forwarding at L3 must parse a (relatively) large header, and perform a longest-prefix match to determine a forwarding path. However, when a node performs L2 label swapping, and labels are assigned properly, it can do a direct index lookup into its forwarding (or in this case, label swapping) table with the short header. It is arguably simpler to build label swapping hardware than it is to build L3 forwarding hardware because the label swapping function is less complex.

The relative performance of L2 and L3 forwarding may differ considerably between nodes. Some nodes may show an order of magnitude difference. Other nodes (for example, nodes with more extensive L3 forwarding hardware) may have identical performance at L2 and L3. However, some nodes may not be capable of doing L3 forwarding at all (e.g., ATM), or may have such limited capacity as to be unusable at L3. In this situation, traffic must be blackholed if no switched path exists.

On nodes in which L3 forwarding is slower than L2 forwarding, pushing traffic to L3 when no L2 path is available may cause congestion. In some cases this could cause data loss (since L3 may be unable to keep up with the increased traffic). However, if data is discarded, then in general this will cause TCP to back off, which would allow control traffic, traceroute, and other network management tools to continue to work.

The MPLS protocol MUST NOT make assumptions about the forwarding capabilities of an MPLS node. Thus, MPLS must propose solutions that can leverage the benefits of a node that is capable of L3 forwarding, but must not mandate that the node be capable of such.

Why We Will Still Need L3 Forwarding:

MPLS will not, and is not intended to, replace L3 forwarding. There is absolutely a need for some systems to continue to forward IP packets using normal layer 3 IP forwarding. L3 forwarding will be needed for a variety of reasons, including:

- for scaling, to forward at a finer granularity than the labels can
  provide
- for security, to allow packet filtering at firewalls
- for forwarding at the initial router (when hosts don't do MPLS)

Consider a campus network which is serving a small company. Suppose that this company makes use of the Internet, for example as a method of communicating with customers. A customer on the other side of the world has an IP packet to be forwarded to a particular system within the company. It is not reasonable to expect that the customer will have a label to use to forward the packet to that specific system. Rather, the label used for the "first hop" forwarding might be sufficient to get the packet considerably closer to the destination. However, the granularity of the labels cannot extend to every host worldwide. Similarly, routing used within one routing domain cannot know about every host worldwide. This implies that in many cases the labels assigned to a particular packet will be sufficient to get the packet close to the destination, but that at some points along the path of the packet the IP header will need to be examined to determine a finer granularity for forwarding that packet. This is particularly likely to occur at domain boundaries.

A similar point occurs at the last router prior to the destination host. In general, the number of hosts attached to a network is likely to be great enough that it is not feasible to assign a separate label to every host. Rather, at least for routing within the destination routing domain (or the destination area if there is a hierarchical routing protocol in use), a label may be assigned which is sufficient to get the packet to the last hop router. However, the last hop router will need to examine the IP header (and particularly the destination IP address) in order to forward the packet to the correct destination host.

Packet filtering at firewalls is an important part of the operation of the Internet. While the current state of Internet security may be considerably less advanced than may be desired, nonetheless some security (as is provided by firewalls) is much better than no security. We expect that packet filtering will continue to be important for the foreseeable future. Packet filtering requires examination of the contents of the packet, including the IP header. This implies that at firewalls the packet cannot be forwarded simply by considering the label associated with the packet. Note that this is also likely to occur at domain boundaries.

Finally, it is very likely that many hosts will not implement MPLS. Rather, the host will simply forward an IP packet to its first hop router. This first hop router will need to examine the IP header prior to forwarding the packet (with or without a label).
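The following sketch contrasts the two lookups described at the start of this section: a longest-prefix match over an L3 table versus a direct indexed lookup on a short label. The tables and addresses are illustrative only; real routers use specialized data structures and hardware for the L3 case.

   # Sketch: L3 longest-prefix match versus L2 direct label index.
   import ipaddress

   routes = {                       # prefix -> next hop (L3 table)
       "10.0.0.0/8":  "r1",
       "10.1.0.0/16": "r2",
   }

   def l3_lookup(dst):
       """Longest-prefix match: scan all prefixes, keep the longest.
       The cost grows with the table, unlike the label lookup below."""
       best, best_len = None, -1
       for prefix, nh in routes.items():
           net = ipaddress.ip_network(prefix)
           if (ipaddress.ip_address(dst) in net
                   and net.prefixlen > best_len):
               best, best_len = nh, net.prefixlen
       return best

   label_table = [None] * 1024      # direct-index label table (L2)
   label_table[42] = ("out_label", "port3")

   def l2_lookup(label):
       return label_table[label]    # one indexed access, no search

   print(l3_lookup("10.1.2.3"))     # -> 'r2'
   print(l2_lookup(42))             # -> ('out_label', 'port3')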
In general, the number of hosts attached to a network is likely to be great enough that it is not feasible to assign a separate label to every host. Rather, at least for routing within the destination routing domain (or the destination area if there is a hierarchical routing protocol in use), a label may be assigned which is sufficient to get the packet to the last hop router. However, the last hop router will need to examine the IP header (and particularly the destination IP address) in order to forward the packet to the correct destination host.

Packet filtering at firewalls is an important part of the operation of the Internet. While the current state of Internet security may be considerably less advanced than may be desired, some security (as is provided by firewalls) is much better than no security. We expect that packet filtering will continue to be important for the foreseeable future. Packet filtering requires examination of the contents of the packet, including the IP header. This implies that at firewalls the packet cannot be forwarded simply by considering the label associated with the packet. Note that this is also likely to occur at domain boundaries.

Finally, it is very likely that many hosts will not implement MPLS. Rather, the host will simply forward an IP packet to its first hop router. This first hop router will need to examine the IP header prior to forwarding the packet (with or without a label).

3.2 Scaling Issues

MPLS scalability is provided by two of the principles of routing. The first is that forwarding follows an inverted tree rooted at a destination. The second is that the number of destinations is reduced by routing aggregation.

The very nature of IP forwarding is a merged multipoint-to-point tree. Thus, since MPLS mirrors the IP network layer, an MPLS node that is capable of merging is capable of creating O(n) switched paths which provide network reachability to all "n" destinations. The meaning of "n" depends on the granularity of the switched paths. One obvious choice of "n" is the number of CIDR prefixes existing in the forwarding table (this scales the same as today's routing). However, the value of "n" may be reduced considerably by choosing switched paths of further aggregation. For example, by creating switched paths to each possible egress node, "n" may represent the number of egress nodes in a network. This choice creates "n" switched paths, such that each path is shared by all CIDR prefixes that are routed through the same egress node. This selection greatly improves scalability, since it minimizes "n", while maintaining the switching performance of CIDR aggregation. (See section 2.2.2 for a description of all of the levels of granularity provided by MPLS.)

The MPLS technology must scale at least as well as existing technology. For example, if the MPLS technology were to support ONLY host-to-host switched path connectivity, then the number of switched paths would be much higher than the number of routing table entries.

There are several ways in which merging can be done in order to allow O(n) switched paths to connect n nodes.
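To make the difference in scale concrete, the toy computation below (illustrative numbers only) compares one merged multipoint-to-point tree per egress against a full mesh of point-to-point paths:

<CODE BEGINS>
   # Switched-path counts for a network with n edge nodes.
   def paths_with_merging(n):
       return n              # one egress-rooted tree per egress node

   def paths_full_mesh(n):
       return n * (n - 1)    # one path per ordered ingress/egress pair

   for n in (10, 100, 1000):
       print(n, paths_with_merging(n), paths_full_mesh(n))
   # With 1000 edge nodes: 1000 trees versus 999000 paths.
<CODE ENDS>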
The merging approach used has an impact on the amount of state information, buffering, delay characteristics, and the means of control required to coordinate the trees. These issues are discussed in more detail in section 4.2.

There are some cases in which O(n-squared) switched paths may be used (for example, by setting up a full mesh of point-to-point streams). As label space and the amount of state information that can be supported may be limited, it will not be possible to support O(n-squared) switched paths in very large networks. However, in some cases the use of n-squared paths may even be an advantage (for example, to allow load-splitting of individual streams).

MPLS must be designed to scale as O(n), which allows MPLS domains to grow very large. In addition, if best effort service can be supported with O(n) scaling, this conserves resources (such as label space and state information) which can be used for supporting advanced services such as QoS. However, since some switches may not support merging, and some small networks may not require the scaling benefits of O(n), provisions must also be made for a non-merging, O(n-squared) solution.

Note: A precise and complete description of scaling would consider that there are multiple dimensions of scaling, and multiple resources whose usage may be considered. Possible dimensions of scaling include: (i) the total number of streams which exist in an MPLS domain (with associated labels assigned to them); (ii) the total number of "label swapping pairs" which may be stored in the nodes of the network (i.e., entries of the form "for incoming label 'x', use outgoing label 'y'"); (iii) the number of labels which need to be assigned for use over a particular link; (iv) the amount of state information which needs to be maintained by any one node. We do not intend to perform a complete analysis of all possible scaling issues, and understand that our use of the terms "O(n)" and "O(n-squared)" is approximate only.

3.3 Types of Streams

Switched paths in the MPLS network can be of different types:

   - point-to-point
   - multipoint-to-point
   - point-to-multipoint
   - multipoint-to-multipoint

Two of the factors that determine which type of switched path is used are (i) the capability of the switches employed in a network, and (ii) the purpose of the creation of a switched path, that is, the types of flows to be carried in the switched path. These two factors also determine the scalability of a network in terms of the number of switched paths in use for transporting data through a network.

The point-to-point switched path can be used to connect all ingress nodes to all the egress nodes to carry unicast traffic. In this case, since an ingress node has point-to-point connections to all the egress nodes, the number of connections in use for transporting traffic is O(n-squared), where n is the number of edge MPLS devices. For small networks the full mesh connection approach may suffice and not pose any scalability problems. However, in large enterprise backbone or ISP networks, this will not scale well.

Point-to-point switched paths may be used on a host-to-host or application-to-application basis (e.g., a switched path per RSVP flow).
The dedicated point-to-point switched path transports the unicast data from the ingress to the egress node of the MPLS network. This approach may be used for providing QoS services or for best-effort traffic.

A multipoint-to-point switched path connects all ingress nodes to a single egress node. At a given intermediate node in the multipoint-to-point switched path, L2 data units from several upstream links are "merged" into a single label on a downstream link. Since each egress node is reachable via a single multipoint-to-point switched path, the number of switched paths required to transport best-effort traffic through an MPLS network is O(n), where n is the number of egress nodes.

The point-to-multipoint switched path is used for distributing multicast traffic. This switched path tree mirrors the multicast distribution tree as determined by the multicast routing protocols. Typically a switch capable of point-to-multipoint connection replicates an L2 data unit from the incoming (parent) interface to all the outgoing (child) interfaces. Standard ATM switches support such functionality in the form of point-to-multipoint VCs or VPs.

A multipoint-to-multipoint switched path may be used to combine multicast traffic from multiple sources into a single multicast distribution tree. The advantage of this is that the multipoint-to-multipoint switched path is shared by multiple sources. Conceptually, a form of multipoint-to-multipoint can be thought of as follows: Suppose that you have a point-to-multipoint VC from each node to all other nodes. Suppose that at any point where two or more VCs happen to merge, you merge them into a single VC or VP. This would require either coordination of VCI spaces (so that each source has a unique VCI within a VP) or VC merge capabilities. The applicability of similar concepts to MPLS is for further study.

3.4 Data Driven versus Control Traffic Driven Label Assignment

A fundamental concept in MPLS is the association of labels and network layer routing. Each LSR must assign labels, and distribute them to its forwarding peers, for traffic which it intends to forward by label swapping. In the various contributions that have been made so far to the MPLS WG we identify three broad strategies for label assignment: (i) those driven by topology based control traffic [RFC2105][ARIS][IPNAV]; (ii) those driven by request based control traffic [CR-LDP][RSVP-LSP]; and (iii) those driven by data traffic [RFC2098][RFC1953].

We also note that in actual practice combinations of these methods may be employed. One example is the use of topology based methods for best effort traffic plus request based methods for support of RSVP.

3.4.1 Topology Driven Label Assignment

In this scheme labels are assigned in response to normal processing of routing protocol control traffic. Examples of such control protocols are OSPF and BGP. As an LSR processes OSPF or BGP updates it can, as it makes or changes entries in its forwarding tables, assign labels to those entries.

Among the properties of this scheme are:

   - The computational load of assignment and distribution and
     the bandwidth consumed by label distribution are bounded by
     the size of the network.

   - Labels are in the general case preassigned.
If a route exists then a label has been assigned to it (and distributed). Traffic may be label swapped as soon as it arrives; there is no label setup latency at forwarding time.

   - Requires LSRs to process only the control traffic load.

   - Labels assigned in response to the operation of routing
     protocols can have a granularity equivalent to that of the
     routes advertised by the protocol. Labels can, by this
     means, cover (highly) aggregated routes.

3.4.2 Request Driven Label Assignment

In this scheme labels are assigned in response to normal processing of request based control traffic. An example of such a control protocol is RSVP. As an LSR processes RSVP messages it can, as it makes or changes entries in its forwarding tables, assign labels to those entries.

Among the properties of this scheme are:

   - The computational load of assignment and distribution and
     the bandwidth consumed by label distribution are bounded by
     the amount of control traffic in the system.

   - Labels are in the general case preassigned: if the request
     has been processed, then a label has been assigned (and
     distributed). Traffic may be label swapped as soon as it
     arrives; there is no label setup latency at forwarding time.

   - Requires LSRs to process only the control traffic load.

   - Depending upon the number of flows supported, this approach
     may require a larger number of labels to be assigned
     compared with topology driven assignment.

   - This approach requires applications to make use of a request
     paradigm in order to get a label assigned to their flow.

3.4.3 Traffic Driven Label Assignment

In this scheme the arrival of data at an LSR "triggers" label assignment and distribution. The traffic driven approach has the following characteristics (a sketch of the trigger logic follows this list):

   - Label assignment and distribution costs are a function of
     traffic patterns. In an LSR with limited label space that is
     using a traffic driven approach to amortize its labels over
     a larger number of flows, the overhead due to label
     assignment and distribution grows as a function of the
     number of flows and of their "persistence". Short lived but
     recurring flows may impose a heavy control burden.

   - There is a latency associated with the appearance of a
     "flow" and the assignment of a label to it. The documented
     approaches to this problem suggest L3 forwarding during this
     setup phase; this has the potential to cause packet
     reordering (note that packet reordering may occur with any
     scheme when the network topology changes, but traffic driven
     label assignment introduces another cause for reordering).

   - Traffic driven label assignment requires high performance
     packet classification capabilities.

   - Traffic driven label assignment may be useful to reduce
     label consumption (assuming that flows are not close to full
     mesh).

   - If flows to individual hosts are wanted, then due to limits
     on label space, traffic driven label assignment is probably
     necessary, given the large number of hosts which may occur
     in a network.

   - If specific network resources are to be assigned to specific
     labels, to be used for support of application flows, then
     again the fine granularity associated with such labels may
     require traffic driven label assignment.
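The following minimal sketch shows the trigger logic described above; the flow key, the helper names, and the use of a pending marker are illustrative assumptions of the sketch, not part of any proposed protocol:

<CODE BEGINS>
   PENDING = object()
   flow_table = {}                     # flow key -> PENDING or label

   def request_label_binding(key):     # stand-in for an LDP-style request
       print("requesting label for", key)

   def forward_at_l3(packet):          # normal longest-prefix-match path
       print("L3 forward:", packet)

   def label_swap_and_forward(packet, label):
       print("L2 forward:", packet, "with label", label)

   def forward(packet, src, dst):
       key = (src, dst)                # packet classification step
       label = flow_table.get(key)
       if label is None:               # first packet: trigger assignment
           flow_table[key] = PENDING
           request_label_binding(key)  # asynchronous; reply arrives later
           forward_at_l3(packet)
       elif label is PENDING:          # binding not yet distributed:
           forward_at_l3(packet)       # L3 during setup (may reorder)
       else:
           label_swap_and_forward(packet, label)

   def on_label_binding(key, label):   # called when the binding arrives
       flow_table[key] = label
<CODE ENDS>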
3.5 The Need for Dealing with Looping

Routing protocols which are used in conjunction with MPLS will in many cases be based on distributed computation. As such, during routing transients, these protocols may compute forwarding paths which contain loops. For this reason MPLS will be designed with mechanisms either to prevent the formation of loops or to contain the amount of resources that can be consumed due to the presence of loops (or both).

Note that there are a number of different alternative mechanisms which have been proposed (see section 4.3). Some of these prevent the formation of layer 2 forwarding loops; others allow loops to form but minimize their impact in one way or another (e.g., by discarding packets which loop, or by detecting and closing the loop after a period of time). Generally speaking, there are tradeoffs to be made between the amount of looping which might occur and other considerations such as the time to convergence after a change in the paths computed by the routing algorithm.

We are not proposing any changes to normal layer 3 operation, and specifically are not trying to eliminate the possibility of looping at layer 3. Transient loops will continue to be possible in IP networks. Note that IP has a means to limit the damage done by looping packets, based on decrementing the IP TTL field as the packet is forwarded, and discarding packets whose TTL has expired. Dynamic routing protocols used with IP are also designed to minimize the amount of time during which loops exist.

The question that MPLS has to deal with is what to do at L2. In some cases L2 may make use of the same method that is used at L3. However, other options are available at L2, and in some cases (specifically when operating over ATM or Frame Relay hardware) the method of decrementing a TTL field (or any similar field) is not available.

There are basically two problems caused by packet looping: The most obvious problem is that packets are not delivered to the correct destination. The other result of looping is congestion. Even with TTL decrementing and packet discard, there may still be a significant amount of time that packets travel through a loop. This can adversely affect other packets which are not looping: Congestion due to the looping packets can cause non-looping packets to be delayed and/or discarded.

Looping is particularly serious in (at least) three cases: One is when forwarding over ATM. Since ATM does not have a TTL field to decrement, there is no way to discard ATM cells which are looping over ATM subnetworks. Standard ATM PNNI routing and signaling solves this problem by making use of call setup procedures which ensure that ATM VCs will never be set up in a loop [PNNI]. However, when MPLS is used over ATM subnets, the native ATM routing and signaling procedures may not be used for the full L2 path. This leads to the possibility that MPLS over ATM might in principle allow packets to loop indefinitely, or until L3 routing stabilizes. Methods are needed to prevent this problem.

Another case in which looping can be particularly unpleasant is for multicast traffic. With multicast, it is possible that the packet may be delivered successfully to some destinations even though copies intended for other destinations are looping.
This leads to the possibility that huge numbers of identical packets could be delivered to some destinations. Also, since multicast implies that packets are duplicated at some points in their path, the congestion resulting from looping packets may be particularly severe.

Another unpleasant complication of looping occurs if the congestion caused by the loop interferes with the routing protocol. It is possible for the congestion caused by looping to cause routing protocol control packets to be discarded, with the result that the routing protocol becomes unstable. For example, this could lengthen the duration of the loop.

In normal operation of IP networks the impact of congestion is limited by the fact that TCP backs off (i.e., transmits substantially less traffic) in response to lost packets. Where the congestion is caused by looping, the combination of TTL and the resulting discard of looping packets, plus the reduction in offered traffic, can limit the resulting impact on the network. TCP backoff, however, does not solve the problem if the looping packets are not discarded (for example, if the loop is over an ATM subnetwork where TTL is not used).

The severity of the problem caused by looping may depend upon implementation details. Suppose, for instance, that ATM switching hardware is being used to provide MPLS switching functions. If the ATM hardware has per-VC queuing, and if it is capable of providing fair access to the buffer pool for incoming cells based on the incoming VC (so that no one incoming VC is allowed to grab a disproportionate number of buffers), this looping might not have a significant effect on other traffic. If the ATM hardware cannot provide fair buffer access of this sort, however, then even transient loops may cause severe degradation of the node's total performance.

Given that MPLS is a relatively new approach, it is possible that looping may have consequences which are not fully understood (such as looping of LDP control information in cases where stream merge is not used).

Even if fair buffer access can be provided, it is still worthwhile to have some means of detecting loops that last "longer than possible". In addition, even where TTL and/or per-VC fair queuing provides a means for surviving loops, it still may be desirable where practical to avoid setting up LSPs which loop.

Methods for dealing with loops are discussed in section 4.3.

3.6 Operations and Management

Operations and management of networks is critically important. This implies that MPLS must support operations, administration, and maintenance facilities at least as extensive as those supported in current IP networks.

In most ways this is a relatively simple requirement to meet. Given that all MPLS nodes run normal IP routing protocols, it is straightforward to expect them to participate in normal IP network management protocols.

One issue has been identified which needs to be addressed by the MPLS effort: the operation of Traceroute over MPLS networks. Note that other O&M issues may be identified in the future.

Traceroute is a very commonly used network management tool.
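Its operation is described in the next paragraph; as a concrete illustration, a minimal UDP-probe variant might look like the sketch below (the details are assumptions of this sketch; receiving the ICMP errors requires a raw socket, and hence appropriate privileges):

<CODE BEGINS>
   import socket

   def traceroute(dest_ip, max_hops=30, port=33434):
       # Send UDP probes with increasing TTL and listen for the
       # ICMP "time exceeded" errors sent back by each hop.
       for ttl in range(1, max_hops + 1):
           rx = socket.socket(socket.AF_INET, socket.SOCK_RAW,
                              socket.getprotobyname("icmp"))
           tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
           tx.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, ttl)
           rx.settimeout(2.0)
           rx.bind(("", port))
           tx.sendto(b"", (dest_ip, port))
           try:
               _, addr = rx.recvfrom(512)   # responder for this TTL
               print(ttl, addr[0])
               if addr[0] == dest_ip:       # reached the destination
                   break
           except socket.timeout:
               print(ttl, "*")              # hop did not answer
           finally:
               tx.close()
               rx.close()
<CODE ENDS>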
Traceroute is based on use of the TTL field: A station trying to determine the route from itself to a specified address transmits multiple IP packets, with the TTL field set to 1 in the first packet, 2 in the second packet, etc. This causes each router along the path to send back an ICMP error report for TTL exceeded. This in turn allows the station to determine the set of routers along the route. For example, this can be used to determine where a problem exists (if no router responds past some point, the last router which responds can become the starting point for a search to determine the cause of the problem).

When MPLS is operating over ATM or Frame Relay networks there is no TTL field to decrement (and ATM and Frame Relay forwarding hardware does not decrement TTL). This implies that it is not straightforward to have Traceroute operate in this environment.

There is the question of whether we *want* all routers along a path to be visible via traceroute. For example, an ISP probably doesn't want to expose the interior of its network to a customer. However, the issue of whether a network's policy will allow the interior of the network to be visible should be independent of whether it is possible for some users to see the interior of the network. Thus while there clearly should be the possibility of using policy mechanisms to block traceroute from being used to see the interior of the network, this does not imply that it is okay to develop protocol mechanisms which prevent traceroute from working.

There is also the question of whether the interior of an MPLS network is analogous to a normal IP network, or whether it is closer to the interior of a layer 2 network (for example, an ATM subnet). Clearly IP traceroute cannot be used to expose the interior of an ATM subnet. When a packet is crossing an ATM subnetwork (for example, between an ingress and an egress router which are attached to the ATM subnet), traceroute can be used to determine the router to router path, but not the path through the ATM switches which comprise the ATM subnet. Note here that MPLS forms a sort of "in between" special case: Routing is based on normal IP routing protocols, the equivalent of call setup (label binding/exchange) is based on MPLS-specific protocols, but forwarding is based on normal L2 ATM forwarding. MPLS therefore supersedes the normal ATM-based methods that would be used to eliminate loops and/or trace paths through the ATM subnet.

It is generally agreed that Traceroute is a relatively "ugly" tool, and that a better tool for tracing the route of a packet would be preferable. However, no better tool has yet been designed or even proposed. Also, however ugly Traceroute may be, it is nonetheless very useful, widely deployed, and widely used. In general, it is highly preferable to define, implement, and deploy a new tool, and to determine through experience that the new tool is sufficient, before breaking a tool which is as widely used as traceroute.

Methods that may be used either to allow traceroute to be used in an MPLS network, or to replace traceroute, are discussed in section 4.11.
4. Technical Approaches

4.1 Label Distribution

A fundamental requirement in MPLS is that an LSR forwarding label switched traffic to another LSR apply a label to that traffic which is meaningful to the other (receiving) LSR. LSRs could learn about each other's labels in a variety of ways. We call the general topic "label distribution".

4.1.1 Explicit Label Distribution

Explicit label distribution anticipates the specification by MPLS of a standard protocol for label distribution. Two of the possible approaches (TDP, ARIS [ARIS-PROT]) are oriented toward topology driven label distribution. One other approach [FANP], in contrast, makes use of traffic driven label distribution. We expect that the label distribution protocol [LDP] which emerges from the MPLS WG is likely to inherit elements from one or more of the possible approaches.

Consider LSR A forwarding traffic to LSR B. We call A the upstream (with respect to dataflow) LSR and B the downstream LSR. A must apply a label to the traffic that B "understands". Label distribution must ensure that the "meaning" of the label will be communicated between A and B. An important question is whether A or B (or some other entity) allocates the label.

In this discussion we are talking about the allocation and distribution of labels between two peer LSRs that are on a single segment of what may be a longer path. A related but in fact entirely separate issue is the question of where control of the whole path resides. In essence there are two models: by analogy to upstream and downstream for a single segment, we can talk about ingress and egress for an LSP (or to and from a label swapping "domain"). In one model a path is set up from ingress to egress, and in the other from egress to ingress.

4.1.1.1 Downstream Label Allocation

"Downstream Label Allocation" refers to a method where the label allocation is done by the downstream LSR, i.e., the LSR that uses the label as an index into its switching tables.

This is, arguably, the most natural label allocation/distribution mode for unicast traffic. As an LSR builds its routing tables (we consider here control driven allocation of labels), it is free, within some limits we will discuss, to allocate labels in any manner that may be convenient to the particular implementation. Since the labels that it allocates will be those upon which it subsequently makes forwarding decisions, we assume implementations will perform the allocation in an optimal manner. Having allocated labels, the default behavior is to distribute the labels (and bindings) to all peers.

In some cases (particularly with ATM) there may be a limited number of labels which may be used across an interface, and/or a limited number of label assignments which may be supported by a single device. Operation in this case may make use of "on demand" label assignment. With this approach, an LSR may for example request a label for a route from a particular peer only when its routing calculations indicate that peer to be the new next hop for the route.

4.1.1.2 Upstream Label Allocation

"Upstream Label Allocation" refers to a method where the label allocation is done by the upstream LSR.
In this case the LSR choosing the label (the upstream LSR) and the LSR which needs to interpret packets using the label (the downstream LSR) are not the same node. We note here that in the upstream LSR the label at issue is not used as an index into the switching tables, but rather is found as the result of a lookup on those tables.

The motivation for upstream label allocation comes from the recognition that it might be possible to optimize multicast machinery in an LSR if it were possible to use the same label on all output ports for which a particular multicast packet/cell were destined. Upstream assignment makes this possible.

4.1.1.3 Other Label Allocation Methods

Another option would be to make use of label values which are unique within the MPLS domain (implying that a domain-wide allocation would be needed). In this case, any stream to a particular MPLS egress node could make use of the label of that node (implying that label values do not need to be swapped at intermediate nodes).

With this method of label allocation, there is a choice to be made regarding the scope over which a label is unique. One approach is to configure each node in an MPLS domain with a label which is unique in that domain. Another approach is to use a truly global identifier (for example the IEEE 48 bit identifier), where each MPLS-capable node would be stamped at birth with a truly globally unique identifier. The point of this global approach is to simplify configuration in each MPLS domain by eliminating the need to configure label IDs.

4.1.2 Piggybacking on Other Control Messages

While we have discussed use of an explicit MPLS LDP, we note that there are several existing protocols that can be easily modified to distribute both routing/control and label information. This could be done with any of OSPF, BGP, RSVP and/or PIM. A particular architectural elegance of these schemes is that label distribution uses the same mechanisms as are used in distribution of the underlying routing or control information.

When explicit label distribution is used, the routing computation and label distribution are decoupled. This implies that at some point you may have a route to a specific destination without an associated label, and/or a label for a specific destination which makes use of a path which you are no longer using. Piggybacking label distribution on the operation of the routing protocol is one way to eliminate this decoupling.

Piggybacking label distribution on the routing protocol introduces an issue regarding how to negotiate acceptable label values and what to do if an invalid label is received. This is discussed in section 4.1.3.

4.1.3 Acceptable Label Values

There are some constraints on which label values may be used in either allocation mode. Clearly the label values must lie within the allowable range described in the encapsulation standards that the MPLS WG will produce. The label value used must also, however, lie within a range that the peer LSR is capable of supporting. We imagine that certain machines, for example ATM switches operating as LSRs, may, due to operational or implementation restrictions, support a label space more limited than that bounded by the valid range found in the encapsulation standard.
This implies that an advertisement or negotiation mechanism for usable label range may be a part of the MPLS LDP. When operating over ATM using ATM forwarding hardware, due to the need for compatibility with the existing use of the ATM VPI/VCI space, it is quite likely that an explicit mechanism will be needed for label range negotiation.

In addition we note that LDP may be one of a number of mechanisms used to distribute labels between any given pair of LSRs. Clearly, where such multiple mechanisms exist, care must be taken to coordinate the allocation of label values. A single label value must have a unique meaning to the LSR that distributes it.

There is an issue regarding how to allow negotiation of acceptable label values if label distribution is piggybacked with the routing protocol. In this case it may be necessary either to require equipment to accept any possible label value, or to configure devices to know which range of label values may be selected. It is not clear in this case what to do if an invalid label value is received, as there may be no means of sending a NAK.

A similar issue occurs with multicast traffic over broadcast media, where there may be multiple nodes which receive the same transmission (using a single label value). Here again it may be non-trivial to allow n-party negotiation of acceptable label values.

4.1.4 LDP Reliability

The need for reliable label distribution depends upon the relative performance of L2 and L3 forwarding, as well as the relationship between label distribution and the routing protocol operation.

If label distribution is tied to the operation of the routing protocol, then a reasonable protocol design would ensure that labels are distributed successfully as long as the associated route and/or reachability advertisement is distributed successfully. This implies that the reliability of label distribution will be the same as the reliability of route distribution.

If there is a very large difference between L2 and L3 forwarding performance, then the cost of failing to deliver a label is significant. In this case it is important to ensure that labels are distributed reliably. Given that LDP needs to operate in a wide variety of environments with a wide variety of equipment, this implies that it is important for any LDP developed by the MPLS WG to ensure reliable delivery of label information.

Reliable delivery of LDP packets may potentially be accomplished either by using an existing reliable transport protocol such as TCP, or by specifying reliability mechanisms as part of LDP (for example, the reliability mechanisms which are defined in IDRP could potentially be "borrowed" for use with LDP).

TCP supports flow control (in addition to supporting reliable delivery of data). Flow control is a desirable feature which will be useful for MPLS (as well as other applications making use of a reliable transport) and therefore needs to be built into whatever reliability mechanism is used for MPLS.

4.1.5 Label Purge Mechanisms

Another issue to be considered is the "lifetime" of label data once it arrives at an LSR, and the method of purging label data. There are several methods that could be used either separately or (more likely) in combination.

One approach is for label information to be timed out. With this approach a lifetime is distributed along with the label value. The label value may be refreshed prior to timing out. If the label is not refreshed prior to timing out, it is discarded. In this case each lifetime and timer may apply to a single label, or to a group of labels (e.g., all labels selected by the same node).
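A minimal sketch of this timeout method follows; the data structures and function names are illustrative assumptions, and a real LSR would combine this with the keep-alive and explicit removal mechanisms described next:

<CODE BEGINS>
   import time

   bindings = {}                   # label -> expiry time (seconds)

   def install(label, lifetime):
       bindings[label] = time.time() + lifetime

   def refresh(label, lifetime):   # a refresh simply re-arms the timer
       if label in bindings:
           bindings[label] = time.time() + lifetime

   def purge_expired():
       now = time.time()
       for label in [l for l, exp in bindings.items() if exp <= now]:
           del bindings[label]     # stale binding: stop using it
<CODE ENDS>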
Similarly, two peer nodes may make use of an MPLS peer keep-alive mechanism. This implies exchange of MPLS control packets between neighbors on a periodic basis. This in general is likely to use a smaller timeout value than label value timers (analogous to the fact that the OSPF HELLO interval is much shorter than the OSPF LSA lifetime). If the peer session between two MPLS nodes fails (due to expiration of the associated timer prior to reception of the refresh), then the associated label information is discarded.

If label information is piggybacked on the routing protocol, then the timeout mechanisms would also be taken from the associated routing protocol (note that routing protocols in general have mechanisms to invalidate stale routing information).

An alternative method for invalidating labels is to make use of an explicit label removal message.

4.2 Stream Merging

In order to scale as O(n) (rather than O(n-squared)), MPLS makes use of the concept of stream merge. This makes use of multipoint-to-point streams in order to allow multiple streams to be merged into one stream.

4.2.1 Types of Stream Merge:

There are several types of stream merge that can be used, depending upon the underlying media.

When MPLS is used over frame based media, merging is straightforward. All that is required for stream merge to take place is for a node to allow multiple upstream labels to be forwarded the same way and mapped into a single downstream label. This is referred to as frame merge.

Operation over ATM media is less straightforward. In ATM, the data packets are encapsulated into an ATM Adaptation Layer, say AAL5; the AAL5 PDU is segmented into ATM cells with a VPI/VCI value; and the cells are transmitted in sequence. It is incumbent on ATM switches to keep the cells of a PDU (i.e., cells with the same VPI/VCI value) contiguous and in sequence. This is because the device that reassembles the cells to re-form the transmitted PDU expects the cells to be contiguous and in sequence: there isn't sufficient information in the ATM cell header (unlike IP fragmentation) to reassemble the PDU from out-of-order cells. Hence, if cells from several upstream links are transmitted onto the same downstream VPI/VCI, then cells from one PDU can get interleaved with cells from another PDU on the outgoing VPI/VCI, corrupting the original PDUs by mis-sequencing the cells of each PDU.

The most straightforward (but erroneous) method of merging in an ATM environment would be to take the cells from two incoming VCs and merge them into a single outgoing VC. If this were done without any buffering of cells, then cells from two or more packets could end up being interleaved into a single AAL5 frame. The problem when operating over ATM is therefore how to avoid interleaving of cells from multiple sources.
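The toy fragment below illustrates the failure mode: cells from two upstream sources are forwarded onto one outgoing VC without buffering, and AAL5 reassembly (which can rely only on cell order and the end-of-frame indicator) yields corrupted PDUs:

<CODE BEGINS>
   # Cells of two AAL5 frames ("*" marks the end-of-frame cell)
   # arrive on different upstream VCs.
   frame_a = ["A1", "A2", "A3*"]
   frame_b = ["B1", "B2*"]

   merged = []                 # naive, unbuffered merge: cells are
   while frame_a or frame_b:   # forwarded in arrival (FIFO) order
       if frame_a:
           merged.append(frame_a.pop(0))
       if frame_b:
           merged.append(frame_b.pop(0))

   frames, current = [], []
   for cell in merged:         # AAL5 reassembly: split at "*" cells
       current.append(cell)
       if cell.endswith("*"):
           frames.append(current)
           current = []

   print(frames)
   # [['A1', 'B1', 'A2', 'B2*'], ['A3*']]: both PDUs corrupted
<CODE ENDS>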
There are two ways to solve this interleaving problem, which are referred to as VC merge and VP merge.

VC merge allows multiple VCs to be merged into a single outgoing VC. In order for this to work, the node performing the merge needs to keep the cells from one AAL5 frame (e.g., corresponding to an IP packet) separate from the cells of other AAL5 frames. This may be done by performing the SAR function in order to reassemble each IP packet before forwarding that packet. In this case VC merge is essentially equivalent to frame merge. An alternative is to buffer the cells of one AAL5 frame together, without actually reassembling them. When the end of frame indicator is reached, that frame can be forwarded. Note however that both forms of VC merge require that the entire AAL5 frame be received before any cells corresponding to that frame are forwarded. VC merge therefore requires capabilities which are generally not available in most existing ATM forwarding hardware.

The alternative for use over ATM media is VP merge. Here multiple VPs can be merged into a single VP. Separate VCIs within the merged VP are used to distinguish frames (e.g., IP packets) from different sources. In some cases, one VP may be used for the whole multipoint-to-point tree from the ingress nodes to a single egress node.

4.2.2 Interoperation of Merge Options:

If some nodes support stream merge and some nodes do not, then it is necessary to ensure that the two types of nodes can interoperate within a single network. This affects the number of labels that a node needs to send to a neighbor. An upstream LSR which supports stream merge needs to be sent only one label per forwarding equivalence class (FEC). An upstream neighbor which does not support stream merge needs to be sent multiple labels per FEC. However, there is no way of knowing a priori how many labels it needs; this will depend on how many LSRs are upstream of it with respect to the FEC in question.

An upstream neighbor which does not support stream merge may therefore need to explicitly ask for labels for each FEC, and may make multiple such requests (for one or more labels per request). When a downstream neighbor receives such a request from upstream, and the downstream neighbor does not itself support stream merge, then it must in turn ask its downstream neighbor for more labels for the FEC in question.

It is possible that there may be some nodes which support merge, but have a limited number of upstream streams which may be merged into a single downstream stream. Suppose for example that, due to some hardware limitation, a node is capable of merging four upstream LSPs into a single downstream LSP. Suppose however that this particular node has six upstream LSPs arriving at it for a particular stream. In this case, this node may merge these into two downstream LSPs, and will therefore need to obtain two labels from its downstream neighbor. The general rule is sketched below.
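In general, a node able to merge at most a given number of upstream LSPs into one downstream LSP needs one downstream label per group of upstream LSPs, i.e.:

<CODE BEGINS>
   import math

   # A node that can merge at most "capacity" upstream LSPs into one
   # downstream LSP needs one downstream label per group.
   def downstream_labels_needed(upstream_lsps, capacity):
       return math.ceil(upstream_lsps / capacity)

   print(downstream_labels_needed(6, 4))   # the example above: 2 labels
<CODE ENDS>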
The interoperation of the various forms of merging over ATM is most easily described by first describing the interoperation of VC merge with non-merge.

In the case where VC merge and non-merge nodes are interconnected, the forwarding of cells is based in all cases on a VC (i.e., the concatenation of the VPI and VCI). For each node, if an upstream neighbor is doing VC merge, then that upstream neighbor requires only a single outgoing VPI/VCI for a particular FEC (this is analogous to the requirement for a single label in the case of operation over frame media). If the upstream neighbor is not doing merge, then it will require a single outgoing VPI/VCI per FEC for itself (assuming that it can be an ingress node), plus enough outgoing VPI/VCIs to map to incoming VPI/VCIs to pass to its upstream neighbors. The number required will be determined by allowing the upstream nodes to request additional VPI/VCIs from their downstream neighbors.

A similar method is possible to support nodes which perform VP merge. In this case the VP merge node, rather than requesting a single VPI/VCI or a number of VPI/VCIs from its downstream neighbor, instead may request a single VP (identified by a VPI). Furthermore, suppose that a non-merge node is downstream from two different VP merge nodes. This node may need to request one VPI/VCI (for traffic originating from itself) plus two VPs (one for each upstream node).

In order to support all of VP merge, VC merge, and non-merge, it is therefore necessary to allow upstream nodes to request a combination of zero or more VC identifiers (consisting of a VPI/VCI), plus zero or more VPs (identified by VPIs). VP merge nodes would therefore request one VP. VC merge nodes would request only a single VPI/VCI (since they can merge all upstream traffic into a single VC). Non-merge nodes would pass on any requests that they get from above, plus request a VPI/VCI for traffic that they originate (if they can be ingress nodes). However, non-merge nodes which can only do VC forwarding (and not VP forwarding) will need to know which VCIs are used within each VP in order to install the correct VCs in their forwarding tables. A detailed description of how this could work can be found in [ATMVP].

4.2.3 Coordination of the VCI space with VP Merge:

VP merge requires that the VCIs be coordinated to ensure uniqueness. There are a number of ways in which this may be accomplished:

   1. Each node may be pre-configured with a unique VCI value
      (or values).

   2. A single node (most likely the root of the multipoint-to-point
      tree) may coordinate the VCI values used within the VP. A
      protocol mechanism will be needed to allow this to occur.
      How hard this is to do depends somewhat upon whether the
      root is otherwise involved in coordinating the
      multipoint-to-point tree. For example, allowing one node
      (such as the root) to coordinate the tree may be useful
      for purposes of coordinating load sharing (see section
      4.10). Thus whether the issue of coordinating the VCI
      space is significant or trivial may depend upon other
      design choices which at first glance may have appeared to
      be independent protocol design choices.

   3. Other unique information, such as portions of a class B or
      class C address, may be used to provide a unique VCI value.
   4. Another alternative is to implement a simple hardware
      extension in the ATM switches to keep the VCI values
      unique by dynamically altering them to avoid collisions.

VP merge makes less efficient use of the VPI/VCI space (relative to VC merge). When VP merge is used, the LSPs may not be able to transit public ATM networks that don't support switched VPs.

4.2.4 Buffering Issues Related To Stream Merge:

There is an issue regarding the amount of buffering required for frame merge, VC merge, and VP merge. Frame merge and VC merge require that intermediate points buffer incoming packets until the entire packet arrives. This is essentially the same as is required in traditional IP routers.

VP merge allows cells to be transmitted by intermediate nodes as soon as they arrive, reducing the buffering and latency at intermediate nodes. However, the use of VP merge implies that cells from multiple packets will arrive at the egress node interleaved on separate VCIs. This in turn implies that the egress node may have somewhat increased buffering requirements. To a large extent egress nodes for some destinations will be intermediate nodes for other destinations, implying that the increase in buffers required for some purpose (egress traffic) will be offset by a reduction in buffers required for other purposes (transit traffic). Also, routers today typically deal with high-fanout channelized interfaces and with multi-VC ATM interfaces, implying that buffering simultaneously arriving cells from multiple packets and sources is something that routers typically do today. This is not meant to imply that the required buffer size and performance are inexpensive, but rather to observe that this is a solvable issue.

ATM equipment provides traffic shaping, in which the ATM cells associated with any one particular VC are intentionally not transmitted back to back, but rather are spread out over time in order to place less short term buffering load on switches. Since VC merge requires that all cells associated with a particular packet (or a particular AAL5 frame) be buffered before any cell from the packet can be transmitted, VC merge defeats much of the intent of traffic shaping. An advantage of VP merge is that it preserves traffic shaping through ATM switches acting as LSRs. While traffic shaping may generally be expected to reduce the buffering requirements in ATM switches (whether acting as MPLS switches or as native ATM switches), the precise effect of traffic shaping has not been studied in the context of MPLS.

4.3 Loop Handling

Generally, methods for dealing with loops can be split into three categories: Loop Survival makes use of methods which minimize the impact of loops, for example by limiting the amount of network resources which can be consumed by a loop; Loop Detection allows loops to be set up, but later detects these loops and eliminates them; Loop Prevention provides methods for avoiding setting up L2 forwarding in a way which results in an L2 loop.

Note that we are concerned here only with loops that occur in L2 forwarding. Transient loops at L3 will continue to be part of normal IP operation, and will be handled the way that IP has been handling loops for years (see section 3.5).
Loop Survival:

Loop Survival refers to methods that are used to allow the network to operate well even though short term transient loops may be formed by the routing protocol. The basic approach to loop survival is to limit the amount of network resources which are consumed by looping packets, and to minimize the effect on other (non-looping) traffic.

The most basic method for loop survival is based on the use of a TTL (Time To Live) field. The TTL field is decremented at each hop. If the TTL field reaches zero, then the packet is discarded. This method works well over those media which have a TTL field. This explicitly includes L3 IP forwarding. Also, assuming that the core MPLS specifications will include definition of a "shim" MPLS header, for use in carrying labels over those media which do not have their own labels, the shim header will also include a TTL field.

However, there is considerable interest in using MPLS over L2 protocols which provide their own labels, with the L2 label used for MPLS forwarding. Specific L2 protocols which offer a label for this purpose include ATM and Frame Relay. However, neither ATM nor Frame Relay has a TTL field. This implies that this method cannot be used when basic ATM or Frame Relay forwarding is being used.

Another basic method for loop survival is the use of dynamic routing protocols which converge rapidly to non-looping paths. In some instances it is possible that congestion caused by looping data could affect the convergence of the routing protocol (see section 3.5). MPLS should be designed to prevent this problem from occurring. Given that MPLS uses the same routing protocols as are used for IP, this method does not need to be discussed further in this framework document.

Another possible tool for loop survival is the use of fair queuing. This allows unrelated flows of user data to be placed in different queues. This helps to ensure that a node which is overloaded with looping user data can nonetheless forward unrelated non-looping data, thereby minimizing the effect that looping data has on other data. We cannot assume that fair queuing will always be available. In practice, many fair queuing implementations merge multiple streams into one queue (implying that the number of queues used is less than the number of user data flows which are present in the network). This implies that any data which happens to be in the same queue with looping data may be adversely affected.

Loop Detection:

Loop Detection refers to methods whereby a loop may be set up at L2, but the loop is subsequently detected. When the loop is detected, it may be broken at L2 by dropping the label relationship, implying that packets for a set of destinations must be forwarded at L3.

A possible method for loop detection is based on transmitting a "loop detection" control packet (LDCP) along the path towards a specified destination whenever the route to the destination changes. This LDCP is forwarded in the direction that the label specifies, with the labels swapped to the correct next hop value.
However, normal L2 forwarding cannot be used because each hop needs to examine the packet to check for loops. The LDCP is forwarded towards that destination until one of the following happens: (i) the LDCP reaches the last MPLS node along the path (i.e., the next hop is either a router which is not participating in MPLS, or is the final destination host); (ii) the TTL of the LDCP expires (assuming that the control packet uses a TTL, which is optional); or (iii) the LDCP returns to the node which originally transmitted it. If the latter occurs, then the packet has looped, and the node which originally transmitted the LDCP stops using the associated label and instead uses L3 forwarding for the associated destination addresses. One problem with this method is that once a loop is detected, it is not known when the loop clears. One option would be to set a timer, and to transmit a new LDCP when the timer expires.

Loop detection may also be achieved via a Path Vector control message. A Path Vector contains a list of the LSRs that the label distribution control message has traversed. Each LSR which propagates a control packet to either create or modify an LSP adds its own unique identifier to the Path Vector list. An LSR that receives a message with a Path Vector that contains its own identifier detects that the message has traversed a loop.

An alternate method counts the hops to each egress node, based on the routes currently available. Each node advertises its distance (in hop counts) to each destination. An egress node advertises the destinations that it can reach directly with an associated hop count of zero. For each destination, a node computes the hop count to that destination by adding one to the hop count advertised by its actual next hop used for that destination. When the hop count for a particular destination changes, the hop count needs to be readvertised.

In addition, the first of the loop prevention schemes discussed below may be modified to provide loop detection.

Loop Prevention:

Loop prevention makes use of methods to ensure that loops are never set up at L2. This implies that the labels are not used until some method is used to ensure that following the label towards the destination, with associated label swaps at each switch, will not result in a loop. Until the L2 path (making use of assigned labels) is available, packets are forwarded at L3.

Loop prevention requires explicit signaling of some sort to be used when setting up an L2 stream.

One method of loop prevention requires that labels be propagated starting at the egress switch. The egress switch signals to neighboring switches the label to use for a particular destination. That switch then signals an associated label to its neighbors, etc. The control packets which propagate the labels also include the path to the egress (as a list of router IDs). Any looping control packet can therefore be detected and the path not set up to or past the looping point.
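Both this prevention scheme and the Path Vector detection scheme above rest on the same per-hop check, sketched here (the identifiers and return convention are illustrative assumptions):

<CODE BEGINS>
   # Each LSR appends its identifier before propagating the message;
   # a message already carrying the receiver's identifier has looped.
   def process_path_vector(my_id, path_vector):
       if my_id in path_vector:
           return None                # loop: do not set up the path here
       return path_vector + [my_id]   # safe: propagate the message

   assert process_path_vector("C", ["A", "B"]) == ["A", "B", "C"]
   assert process_path_vector("B", ["A", "B"]) is None
<CODE ENDS>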
During routing changes, a diffusion mechanism may be used to prevent the formation of L2 loops. The purpose of the diffusion computation is to prune the tree of an LSR that has detected a route change for a given FEC, such that all upstream LSRs from the tree that would be on a looping path are removed. It is only after those LSRs are removed from the tree that it is safe to replace the old LSP with the new LSP (and the old LSP can be released).

The diffusion mechanism is an extension of the Path Vector mechanism. An LSR, D, that detects that the next hop for an FEC has changed transmits a query message, with a Path Vector containing its unique identifier, to its upstream neighbors. An LSR, U, that receives such a query will determine if D is the next hop for the given FEC. If not, then U may return "OK", meaning that as far as node U is concerned it is safe for node D to switch over to the new LSP. If node D is the next hop, then node U checks the Path Vector to see if its unique identifier is already present. If so, then a route loop is detected; in this case, node U responds with a "LOOP" message, and node D will prune node U off of its tree. If no loop is detected, then node U adds its unique identifier to the Path Vector, and propagates the query message to each of its upstream neighbors. The diffusion computation continues to propagate upstream along each of the paths in the tree until an ingress or looping LSR is found. Once an LSR has received a response from each of its upstream neighbors, it may then return an "OK" message to its downstream neighbor. When the original node, node D, receives a response from each of its neighbors, it is safe to replace the old LSP with the new one, because all the paths that would have looped have been pruned from the tree. [ARCH]

An alternative method of loop prevention is the "colored thread" mechanism. The heart of the Colored Thread (CT) algorithm is a procedure that gives a color to each link along the LSP in the downstream direction. The color is composed of two fixed-length objects: the address of the node that created the color, and a local identifier that is unique within the creating node. A loop-free LSP is established when the node that triggered the coloring procedure receives an acknowledgment for the procedure from its downstream node. During the coloring procedure, a set of attributes (color, hop count, TTL), referred to as a thread, is propagated downstream. A node that finds a change in the next hop creates a color and passes it on the outgoing link to the new next hop. If a node receives a color on an incoming link, it either (a) passes the received color or (b) creates a new color and passes it, on the outgoing link to the next hop. The coloring procedure is propagated downstream until either the LSP turns out to be loop-free or a loop is found. In the former case, a positive acknowledgment (ACK) is returned hop-by-hop to upstream nodes. In the latter case, the coloring procedure is stalled and no ACK is returned. [LOOP-COLOR]

Another option is to use explicit routing to set up label bindings from the egress switch to each ingress switch. This precludes the possibility of looping, since the entire path is computed by one node.
An alternative method of loop prevention is the "colored thread"
mechanism. At the heart of the Colored Thread (CT) algorithm is a
procedure that gives a color to each link along the LSP in the
downstream direction. The color is composed of two fixed-length
objects: the address of the node that created the color, and a
local identifier that is unique within the creating node. A
loop-free LSP is established when the node that triggered the
coloring procedure receives an acknowledgment for the procedure
from its downstream node. During the coloring procedure, a set of
attributes (color, hop count, TTL), referred to as a thread, is
propagated downstream. A node that finds a change in the next hop
creates a color and passes it on the outgoing link to the new next
hop. If a node receives a color on an incoming link, it either (a)
passes the received color on, or (b) creates a new color and
passes it, on the outgoing link to the next hop. The coloring
procedure is propagated downstream until the LSP turns out to be
loop-free or a loop is found. In the former case, a positive
acknowledgment (ACK) is returned hop-by-hop to upstream nodes. In
the latter case, the coloring procedure is stalled and no ACK is
returned. [LOOP-COLOR]

Another option is to use explicit routing to set up label bindings
from the egress switch to each ingress switch. This precludes the
possibility of looping, since the entire path is computed by one
node. This also allows non-looping paths to be set up provided
that the egress switch has a view of the topology which is
reasonably close to reality (if there are operational links which
the egress switch doesn't know about, it will simply pick a path
which doesn't use those links; if there are links which have
failed but which the egress switch thinks are operational, then
there is some chance that the setup attempt will fail, but in this
case the attempt can be retried on a separate path). Note
therefore that non-looping paths can be set up with this method in
many cases where distributed routing plus hop by hop forwarding
would not actually result in non-looping paths. This method is
similar to the method used by standard ATM routing to ensure that
SVCs are non-looping [PNNI].

Explicit routing is only applicable if the routing protocol gives
the egress switch sufficient information to set up the explicit
route, implying that the protocol must be either a link state
protocol (such as OSPF) or a path vector protocol (such as BGP).
Explicit routing is therefore not appropriate as a general
approach for use in every network regardless of the routing
protocol. This method also requires some overhead for the call
setup before label-based forwarding can be used. If the network
topology changes in a manner which breaks the existing path, then
a new path will need to be explicitly routed from the egress
switch. Due to this overhead this method is probably only
appropriate if other significant advantages are also going to be
obtained from having a single node (the egress switch) coordinate
the paths to be used. Examples of other reasons to have one node
coordinate the paths to a single egress switch include: (i)
Coordinating the VCI space where VP merge is used (see section
4.2); and (ii) Coordinating the routing of streams from multiple
ingress switches to one egress switch so as to balance the load on
multiple alternate paths through the network.

In principle the explicit routing could also be done in the
alternate direction (from ingress to egress). However, this would
make it more difficult to merge streams if stream merge is to be
used. This would also make it more difficult to coordinate (i)
changes to the paths used, (ii) the VCI space assignments, and
(iii) load sharing. This therefore makes explicit routing more
difficult, and also reduces the other advantages that could be
obtained from the approach.

If label distribution is piggybacked on the routing protocol (see
section 4.1.2), then loop prevention is only possible if the
routing protocol itself does loop prevention.

What To Do If A Loop Is Detected:

With all of these schemes, if a loop is known to exist then the L2
label-swapped path is not set up. This leads to the obvious
question of what an MPLS node does when it doesn't have a label
for a particular destination, and a packet for that destination
arrives to be forwarded. If possible, the packet is forwarded
using normal L3 (IP) forwarding. There are two issues that this
raises: (i) What about nodes which are not capable of L3
forwarding? (ii) Given the relative speeds of L2 and L3
forwarding, does this work?
Nodes which are not capable of L3 forwarding obviously can't
forward a packet unless it arrives with a label, and the
associated next hop label has been assigned. Such nodes, when they
receive a packet for which the next hop label has not been
assigned, must discard the packet. It is probably safe to assume
that a node which cannot forward an L3 packet is also incapable of
forwarding an ICMP error report that it originates. This implies
that the packet will need to be silently discarded in this case.

In many cases L2 forwarding will be significantly faster than L3
forwarding (allowing faster forwarding is a significant motivation
behind the work on MPLS). This implies that if a node is
forwarding a large volume of traffic at L2, and a change in the
routing protocol causes the associated labels to be lost
(necessitating L3 forwarding), in some cases the node will not be
capable of forwarding the same volume of traffic at L3. This will
of course require that packets be discarded. However, in some
cases only a relatively small volume of traffic will need to be
forwarded at L3. Thus forwarding at L3 when L2 is not available is
not necessarily always a problem. There may be some nodes which
are capable of forwarding equally fast at L2 and L3 (for example,
such nodes may contain IP forwarding hardware which is not
available in all nodes). Finally, when packets are lost this will
cause TCP to back off, which will in turn reduce the load on the
network and allow the network to stabilize even at reduced
forwarding rates until such time as the label bindings can be
reestablished.

In many cases MPLS may be used for traffic engineering. In these
cases failure of an LSP may cause packets which would have taken
that LSP to be forwarded (using L3 forwarding) along paths which
are not consistent with the traffic engineering solution. This
could in turn cause congestion. In these cases packets may need to
be discarded even if the LSRs are capable of full line rate L3
forwarding. This may cause problems very similar to those
discussed in the previous paragraph.

Note that in most cases loops will be caused either by
configuration errors, or by short term transient problems caused
by the failure of a link. If only one link goes down, and if
routing creates a normal "tree-shaped" set of paths to any one
destination, then the failure of one link somewhere in the network
will affect only one link's worth of data passing through any one
node in the network. This implies that if a node is capable of
forwarding one link's worth of data at L3, then in many or most
cases it will have sufficient L3 bandwidth to handle looping data.

4.4 Interoperation with NHRP

When label switching is used over ATM, and there exists an LSR
which is also operating as a Next Hop Client (NHC), the
possibility of direct interaction arises. That is, could one
switch cells between the two technologies without reassembly? To
enable this, several important issues must be addressed.

The encapsulation must be acceptable to both MPLS and NHRP. If
only a single label is used, then the null encapsulation could be
used. Other solutions could be developed to handle label stacks.

NHRP must understand and respect the granularity of a stream.
Currently NHRP resolves an IP address to an ATM address. The
response may include a mask indicating a range of addresses.
However, any VC to the ATM address is considered to be a viable
means of packet delivery. Suppose that an NHC NHRPs for IP address
A, gets back ATM address 1, and sets up a VC to address 1. Later
the same NHC NHRPs for a totally unrelated IP address B and gets
back the same ATM address 1. In this case normal NHRP behavior
allows the NHC to use the VC (that was set up for destination A)
for traffic to B [NHRP].

Note: In this section we will refer to a VC set up as a result of
an NHRP query/response as a shortcut VC.

If one expects to be able to label switch the packets being
received from a shortcut VC, then the label switch needs to be
informed as to exactly what traffic will arrive on that VC, and
that mapping cannot change without notice. Currently no such
mechanism exists in the defined signaling of a shortcut VC.
Several means are possible. A binding, equivalent to the binding
in LDP, could be sent in the setup message. Alternatively, the
binding of prefix to label could remain in an LDP session (or
whatever means of label distribution is appropriate) and the setup
could carry a binding of the label to the VC. This would leave the
binding mechanism for shortcut VCs independent of the label
distribution mechanism.

A further architectural challenge exists in that label switching
is inherently unidirectional whereas ATM is bi-directional. The
above binding semantics are fairly straightforward. However,
effectively using the reverse direction of a VC presents further
challenges.

Label switching must also respect the granularity of the shortcut
VC. Without VC merge, this means a single label switched flow must
map to a VC. In the case of VC merge, multiple label switched
streams could be merged onto a single shortcut VC. But given the
asymmetry involved, there is perhaps little practical use.

Another issue is one of practicality and usefulness. What is sent
over the VC must be at a fine enough granularity to be label
switched through the receiving domain. One potential place where
the two technologies might come into play is in moving data from
one campus via the wide-area to another campus. In such a
scenario, the two technologies would border precisely at the point
where summarization is likely to occur. Each campus would have a
detailed understanding of itself, but not of the other campus. The
wide-area is likely to have summarized knowledge only. But at such
a point level 3 processing becomes the likely solution.

4.5 Operation in a Hierarchy

MPLS allows hierarchical operation, through use of a label stack.
This allows MPLS to simultaneously be used for routing at a fine
grain level (for example, between individual routers within an
ISP) and at a higher "area by area" or "domain by domain" level.

4.5.1 Example of Hierarchical Operation

Figure 1 illustrates an example of how MPLS may operate in a
hierarchy. This example illustrates three transit routing domains
(Domain #1, #2, and #3). For example, these three domains may
represent Internet service providers.
Domain Boundary Routers are illustrated in each domain (routers R1
and R2 in domain #1, routers R3 and R8 in domain #2, and routers
R9 and R10 in domain #3). Suppose that these domain boundary
routers are operating BGP.

Internal routers are not illustrated in domains 1 and 3. However,
internal routers are illustrated within domain #2. In particular,
the path between routers R3 and R8 follows the internal routers
R4, R5, R6, and R7 within domain #2.

   .................  ..........................  .................
   .               .  .                        .  .               .
   .               .  .                        .  .               .
   .R1          R2------R3                  R8------R9         R10.
   .               .  .   \                /   .  .               .
   .               .  .    R4---R5---R6---R7   .  .               .
   .               .  .                        .  .               .
   .   Domain#1    .  .        Domain#2        .  .   Domain#3    .
   .................  ..........................  .................

        Figure 1: Example of the Use of MPLS in a Hierarchy

In this example there are two levels of routing taking place. For
example, OSPF may be used for routing within Domain #2. In this
case the routers R3, R4, R5, R6, R7, and R8 may be running OSPF
amongst themselves in order to compute routes within Domain #2.
The domain boundary routers (R1, R2, R3, R8, R9, and R10) operate
BGP in order to determine paths between routing domains.

MPLS allows label forwarding to be done independently at multiple
levels. In this example, MPLS may be used at the BGP level
(between routers R1, R2, R3, R8, R9, and R10) and at the OSPF
level (between routers R4, R5, R6, and R7). Thus when an IP packet
traverses Domain #2, it will contain two labels, encoded as a
"label stack". The higher level label would be used between
routers R3 and R8. This would be encapsulated inside a header
specifying a lower level label used within domain #2.

Consider the forwarding operation that takes place at router R3.
In this case, R3 will receive a packet from R2 containing a single
label (the BGP level label). R3 will need to swap BGP level labels
in order to place the label that R8 expects on the packet. R3 will
also need to add an OSPF level label, as is expected by R4. R3
therefore "pushes down" the BGP level label in the label stack, by
adding a lower level label. Also note that the actual label
swapping operation performed by R3 can be optimized to allow very
simple forwarding: R3 receives a single incoming label from R2,
and can map this label into the new label header to be prepended
to the packet; it just happens that the new label header to be
added by R3 contains two labels rather than one.
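As a concrete illustration, the following minimal sketch (Python)
shows the single table lookup R3 might perform, mapping the
incoming BGP level label directly to a two-label header. The table
contents, label values, and helper names are hypothetical, and the
on-the-wire encoding of [ENCAP] is not modeled.

    # Hypothetical illustration of R3's optimized forwarding step.
    # A label stack is modeled as a list, top of stack first.

    # Incoming BGP-level label -> two-label header and next hop.
    # Label values 17, 42, and 23 are made up for the example.
    r3_label_table = {
        17: {"push": [42, 23], "next_hop": "R4"},
        #             ^OSPF-level label (top of stack), then the
        #             BGP-level label that R8 expects underneath.
    }

    def forward_at_r3(packet_labels, payload):
        incoming = packet_labels[0]          # single label from R2
        entry = r3_label_table[incoming]
        # Replace the incoming header with a header containing two
        # labels: a new OSPF-level label for R4 on top of the
        # swapped BGP-level label for R8.
        return entry["next_hop"], entry["push"] + packet_labels[1:], payload

Routers R4 through R7 would then swap only the top (OSPF level)
label; the BGP level label underneath is not examined until the
packet leaves the OSPF level LSP.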
4.5.2 Components Required for Hierarchical Operation

In order for MPLS to operate in a hierarchy, there are three
things which must be accomplished:

 - Hierarchical Label Exchange in LDP
   The Label Distribution Protocol needs to exchange labels at
   each level of the hierarchy. In our example, R3 needs to
   exchange label bindings with R8 for operation at the BGP
   level. At the same time, R3 needs to exchange label bindings
   with R4 (and R4 needs to exchange label bindings with R5) for
   operation at the OSPF level. The control component for
   hierarchical labeling is essentially the same as that for
   single level labeling, except that labels are exchanged not
   just among physically adjacent LSRs but between those
   switching on the same level in the label stack.

 - Label Stack
   Multiple labels need to be carried in data packets. For
   example, when a data packet is being carried across domain
   #2, the data packet needs to be encapsulated in a header
   which carries the BGP level label, and the resulting packet
   needs to be carried in a header which carries an OSPF level
   label.

 - Configuration
   It is necessary for routers to know when hierarchical label
   switching is being used.

4.5.3 Some Restrictions on Use of Hierarchical MPLS

Consider the example in figure 1. In this case, the BGP-level
label is encoded by router R1. Label swapping is employed for
packet forwarding at R2, R3, R8, and R9. This is only possible if
R1 knows the right label to use, implying that the granularity
used in mapping packets to forwarding equivalence classes is the
same at routers R2, R3, R8, and R9.

We can consider some specific examples to illustrate the issue:

Suppose that the destination host is within domain 3. In this
case, it is very likely that router R9 will forward the packet
based on a finer grain than was used previously. For example, a
relatively short address prefix may be used for advertising the
addresses reachable in domain 3, while longer (more specific)
address prefixes may be used for specific areas or subnets within
domain 3. In this case router R1 may assign a BGP level label to
the packet, and label based forwarding at the BGP level may be
used by routers R1, R2, R3, and R8. However, router R9 will need
to make use of layer 3 forwarding.

Alternatively, suppose that domain 3 is an Internet Service
Provider, which offers service to multiple routing domains.
Suppose that in this case domain 3 makes use of a single CIDR
address block (based on a single address prefix), with smaller
address blocks (corresponding to longer address prefixes) assigned
to each of multiple domains who get their Internet service from
domain 3. Suppose that the destination for a particular IP packet
is contained in one of these smaller domains whose addresses are
contained in the larger address block assigned to and administered
by domain 3. Again in this case router R9 will need to make use of
layer 3 forwarding.

Let's consider another possible complication: Suppose that router
R1 is an MPLS node, but that some of the internal routers within
domain 1 do not know about MPLS. In this case, suppose that R1
encapsulates an IP packet in an MPLS header in order to carry the
BGP level label. In this case the non-MPLS-capable routers within
domain 1 will not know what to do with the MPLS header. This
implies that MPLS can be used at a higher level (such as between
the border routers R1 and R2 in our example) only if either the
lower level routers (such as the routers within domain 1) are also
using MPLS, or the MPLS header is itself encapsulated within an IP
header for transmission across the domain.

These examples imply that there are some cases where IP forwarding
will be required in a hierarchy. While hierarchical MPLS may be
useful in many cases, it does not replace layer 3 forwarding.

4.5.4 The Relationship between MPLS Hierarchy and Routing Hierarchy

4.5.4.1 Stacked Labels in a Flat Routing Environment

The label stacking mechanism can be useful in some scenarios
independent of routing hierarchy.
The basic concept of stacking is to provide a mechanism to
segregate streams within a switched path. Under normal operation,
when packets are encapsulated with a single L2 header, if multiple
streams are forwarded into a switched path, L3 processing is
required to segregate a particular stream at the end of the
switched path. The stacking mechanism provides an easy way to
maintain the identity of the various streams which are merged into
a single switched path.

One useful application of this technique is in Virtual Private
Networks. The packets can be switched both at the ingress and
egress nodes of the provider network. A packet coming in at one
end of a customer network contains an encapsulated header with the
VPN label. At the VPN ingress node, the header is "popped" to
provide the label for switching through the VPN. Further, this
header is then "pushed" with an encapsulation of the far end
customer label. At the VPN egress node, the packet header is
"popped" again, and the new header provides the label for
switching through the customer site. This enables one to provide
customers with the benefits of a VPN with end-to-end switching for
optimal performance.

Another interesting use can be in conjunction with RSVP flows. In
RSVP, sender flows can be logically merged under a single resource
reservation using the Shared and Wildcard filters. The stacking
mechanism can be used to merge flows into a single label, and the
shared QoS can be applied to the single label on top of the stack.
Since sender flows within the merged switched path maintain their
identity, it is easy to demerge them at a downstream node without
requiring L3 processing of the packets. Another similar
application can be the merging of several premium service flows
with similar QoS into a single switched path. This helps in
conserving labels in the backbone of a large network.

Yet another useful application can be DVMRP tunnels similar in
concept to the DVMRP tunnels used in the existing Mbone. The
ingress node to the DVMRP switched tunnel encapsulates the label
learned from the egress node of the DVMRP tunnel for a particular
(S,G) pair before forwarding packets into the DVMRP tunnel. The
egress node of the tunnel just pops the top label and switches the
packet based on the interior label.

Note that the use of tunnels can also be quite beneficial in a
non-hierarchical environment. Take for example the case where a
domain contains a subset of MPLS nodes. The MPLS egress can
advertise labels for the routes which are within the domain, but
are external to the MPLS core. The ingress node can encapsulate
packets for these destinations within the header for the
aggregated switched path that crosses the MPLS domain.

It is not evident whether this technique has any useful
application in a flat routing domain, but it can be used in
conjunction with explicit routing when providing specialized
services. The multiple levels of encapsulation can also be used
like loose source routing.

4.5.4.2 Flat Labels in a Hierarchical Routing Environment

It is also possible in some environments to use a single level of
label in a network using hierarchical routing.
This is for example possible in the case of a two level OSPF
network in which the primary purpose of the network is to support
external routes. Specifically (depending upon the type of area
hierarchy used), OSPF allows external routes to be advertised
throughout an OSPF routing domain, with each external route
associated with the routerID of the router with reachability to
the specific route. This implies that it is possible to set up an
LSP to every router in the routing domain, and then use the LSP
for packets destined to the associated external routes.

4.5.4.3 Configuration of the Hierarchy

The possibility of having a variety of different relationships
between the routing hierarchy and the MPLS hierarchy leads to an
obvious question: How is the relationship between the two
hierarchies to be determined? At first glance it would seem that
this generality leads to a relatively complex configuration issue,
and it could be difficult to ensure consistent configuration of
the network.

One possible solution is to have the MPLS hierarchy default to
using the same hierarchy structure as is used for routing, with
each area and domain boundary (as used by routing) also implying
an MPLS domain boundary. This would allow the normal default
operation to conform to the type of operation that we might expect
to be used in most situations, and would allow a common means of
interoperation which we would expect all vendors of MPLS compliant
equipment to support.

4.5.5 Some Advantages of Hierarchical MPLS

The use of hierarchical MPLS allows the routers internal to a
transit routing domain to be isolated from the BGP-level routing
information. In our example network, routers R4, R5, R6, and R7
can forward packets based solely on the lower level label. These
internal routers do not need to know anything at all about higher
level IP routing. Note that this advantage is not available in
conventional IP forwarding: If the internal routers within a
routing domain forward IP packets based on the destination IP
address, then the internal routers need to know which route to use
for any particular destination IP address. By combining
hierarchical routing with label stacks, MPLS is able to decouple
the exterior and interior protocols. MPLS switches within a domain
(interior switches) need only carry the reachability information
for nodes in the domain. The MPLS border switches for the domain
still, of course, carry the external routes.

Use of hierarchical MPLS also extends the simpler forwarding
offered by MPLS to domain boundary routers.

MPLS places no bound on the number of labels that may be present
in a label stack. In principle this means that MPLS can support
multiple levels of routing hierarchy.

4.6 Interoperation of MPLS Systems with "Conventional" ATM

If we consider the implementation of MPLS on ATM switches we can
imagine several possibilities.

We might remove the ATM Forum control plane completely. This is
the approach taken by Ipsilon in their IP Switching approach, and
allows ATM switches to operate as MPLS LSRs.
Alternately, we could build a system that supports a "Ships in the
night" (SIN) mode of operation, where the ATM Forum and MPLS
control planes both run on the same hardware but are isolated from
each other, ie, they do not interact. This allows a single device
to simultaneously operate as both an MPLS LSR and an ATM switch.

We feel that the MPLS architecture should allow both of these
models. We note, however, that neither of them addresses the issue
of operation of MPLS over a public ATM network, ie, over a network
that supports tariffed access to PVCs and ATM Forum SVCs. Because
public ATM service exists and will, presumably, become more
pervasive in the future, we feel that another model needs to be
included in the architecture and be supported by MPLS. We call
this model the "integrated" model. In essence it is the same as
the SIN model but without the restriction that the two control
planes are isolated. In the integrated model the MPLS control
plane is able to use the ATM control plane to set up SVCs as
needed. An example of this integrated model that allows the
coexistence and interoperation of ATM and MPLS is the CSR proposal
from Toshiba.

Note that there is a distinction relevant to the protocol
specification process between the SIN and the integrated approach.
SIN does not require specification other than to require that it
be transparent to both the MPLS and ATM control planes (ie,
neither should know of the other's existence). Realization of SIN
on a particular machine is purely an engineering challenge for the
implementors. The integrated model, on the other hand, requires
specification of procedures for the use of SVCs and the
association of labels with them.

4.7 Multicast

This section is FFS.

4.8 Multipath

Many IP routing protocols support the notion of equal-cost
multipath routes, in which a router maintains multiple next hops
for one destination prefix when two or more equal-cost paths to
the prefix exist. There are a few possible approaches for handling
multipath with MPLS.

In this discussion we will use the term "multipath node" to mean a
node which is keeping track of multiple switched paths from itself
for a single destination.

The first approach maintains a separate switched path from each
ingress node via one or more multipath nodes to a merge point.
This requires MPLS to distinguish the separate switched paths, so
that learning of a new switched path is not misinterpreted as a
replacement of the same switched path. This also requires that an
ingress MPLS node be capable of distributing the traffic among the
multiple switched paths. This approach preserves switching
performance, but at the cost of proliferating the number of
switched paths. For example, each switched path consumes a
distinct label.

The second approach establishes only one switched path from any
one ingress node to a destination. However, when the paths from
two different ingress nodes happen to arrive at the same node,
that node may use different paths for each (implying that the node
becomes a multipath node). Thus the multipath node may assign a
different downstream path to each incoming stream.
This conserves switched paths and maintains switching performance,
but cannot balance loads across downstream links as well as the
other approaches, even if switched paths are selectively assigned.
An issue with this approach is that the L2 path may be different
from the normal L3 path, as traffic that otherwise would have
taken multiple distinct paths is forced onto a single path.

The third approach allows a single stream arriving at a multipath
node to be split into multiple streams, by using L3 forwarding at
the multipath node. For example, the multipath node might choose
to use a hash function on the source and destination IP addresses,
in order to avoid misordering packets between any one IP source
and destination. This approach conserves switched paths at the
cost of switching performance.
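A minimal sketch of such a hash-based split follows (Python). The
choice of CRC32 as the hash and the three named paths are
illustrative assumptions, not a recommendation of any particular
function or configuration.

    import zlib

    # Hypothetical multipath node with three equal-cost
    # downstream paths.
    downstream_paths = ["path-A", "path-B", "path-C"]

    def select_path(src_ip: str, dst_ip: str) -> str:
        """Pick a downstream path for a packet.  Hashing on the
        (source, destination) pair keeps all packets of any one
        IP source/destination pair on one path, avoiding
        misordering within that pair while still spreading the
        aggregate load."""
        key = (src_ip + "|" + dst_ip).encode()
        return downstream_paths[zlib.crc32(key) % len(downstream_paths)]

    # Example: packets of the same pair always take the same path.
    assert select_path("10.0.0.1", "192.0.2.9") == \
           select_path("10.0.0.1", "192.0.2.9")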
4.9 Host Interactions

There are a range of options for host interaction with MPLS:

The most straightforward approach is no host involvement. Host
operation may then be completely independent of MPLS, with hosts
operating according to other IP standards. If there is no host
involvement, then this implies that the first hop requires an L3
lookup.

If the host is ATM attached and doing NHRP, then this would allow
the host to set up a Virtual Circuit to a router. However this
brings up a range of issues, as was discussed in section 4.4
("Interoperation with NHRP").

On the ingress side, it is reasonable to consider having the first
hop LSR provide labels to the hosts, and thus have hosts attach
labels to packets that they transmit. This could allow the first
hop LSR to avoid an L3 lookup. It is reasonable here to have the
host request labels only when needed, rather than require the host
to remember all labels assigned for use in the network.

On the egress side, it is questionable whether hosts should be
involved. For scaling reasons, it would be undesirable to use a
different label for reaching each host.

4.10 Explicit Routing

There are two options for route selection: (1) hop by hop routing,
and (2) explicit routing.

An explicitly routed LSP is an LSP where, at a given LSR, the LSP
next hop is not chosen by each local node, but rather is chosen by
a single node (usually the ingress or egress node of the LSP). The
sequence of LSRs followed by an explicitly routed LSP may be
chosen by configuration, or by an algorithm performed by a single
node (for example, the egress node may make use of the topological
information learned from a link state database in order to compute
the entire path for the tree ending at that egress node).

With MPLS the explicit route needs to be specified at the time
that labels are assigned, but the explicit route does not have to
be specified with each L3 packet. This implies that explicit
routing with MPLS is relatively efficient (when compared with the
efficiency of explicit routing for pure datagrams).

Explicit routing may be useful for a number of purposes such as
allowing policy routing and/or facilitating traffic engineering.

4.10.1 Establishment of Point to Point Explicitly Routed LSPs

In order to establish a point to point explicitly routed LSP, the
signalling messages used to set up the LSP must contain the
explicit route. This implies that the LSP is set up in order,
either from the ingress to the egress or from the egress to the
ingress.

One node needs to pick the explicit route. This may be done in at
least two possible ways: (i) by configuration (eg, the explicit
route may be chosen by an operator, or by a centralized server of
some kind); (ii) by use of a routing protocol which allows the
ingress and/or egress node to know the entire route to be
followed. This would imply the use of a link state routing
protocol (in which all nodes know the full topology) or of a path
vector routing protocol (in which the ingress node is told the
path as part of the normal operation of the routing protocol).

Note: The normal operation of path vector routing protocols (such
as BGP) does not provide the full set of routers along the path.
This implies that either a partial source route only would be
provided (implying that LSP setup would use a combination of hop
by hop and explicit routing), or it would be necessary to augment
the protocol in order to provide the complete explicit route.

In the point to point case, it is relatively straightforward to
specify the route to use: This is indicated by providing the
addresses of each LSR on the LSP.

4.10.2 Explicit and Hop by Hop Routing: Avoiding Loops

In general, an LSP will be explicitly routed specifically because
there is a good reason to use an alternative to the hop by hop
routed path. This implies that the explicit route is likely to
follow a path which is inconsistent with the path followed by hop
by hop routing. If some of the nodes along the path follow an
explicit route but some of the nodes make use of hop by hop
routing (and ignore the explicit route), then inconsistent routing
may result and in some cases loops (or severely inefficient paths)
may form. For any one LSP, there are three possible options: (i)
the entire LSP may be hop by hop routed; (ii) the entire LSP may
be explicitly routed; or (iii) the LSP may consist of both hop by
hop and explicitly routed segments, provided that the LSP is
established using ordered control.

For this reason, it is important that if a strict explicit route
is specified for setting up an LSP, then that route must be
followed in setting up the LSP.

There is a related issue when a link or node in the middle of an
explicitly routed LSP breaks: In this case, the last operating
node on the upstream part of the LSP will continue receiving
packets, but will not be able to forward them along the explicitly
routed LSP (since its next hop is no longer functioning). In this
case, it is not in general safe for this node to forward the
packets using L3 forwarding with hop by hop routing. Instead, the
packets must be discarded, and the upstream portion of the
explicitly routed LSP must be torn down.

Where part of an explicitly routed LSP breaks, the node which
originated the LSP needs to be told about this. For robustness
reasons the MPLS protocol design should not assume that the
routing protocol will tell the node which originated the LSP. For
example, it is possible that a link may go down and come back up
quickly enough that the routing protocol never declares the link
down. Rather, an explicit MPLS mechanism is needed.
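To illustrate how a strict explicit route might be enforced during
setup, the following sketch (Python) has each LSR consume the
front of the hop list and refuse the setup rather than deviate
from the specified route. The message fields and the send_to and
refuse helpers are hypothetical; no actual signalling object
layout is implied.

    # Hypothetical strict explicit-route processing at one LSR.
    # An explicit route is carried as the ordered list of
    # remaining LSR addresses, ending at the far endpoint.

    def process_setup(my_address, explicit_route, send_to, refuse):
        """Consume this LSR's entry in the explicit route and pass
        the setup message on to the next listed hop."""
        if not explicit_route or explicit_route[0] != my_address:
            # A strict route that does not name us must not be
            # "corrected" by hop by hop routing; refuse the setup.
            refuse("not on the specified strict route")
            return
        remaining = explicit_route[1:]
        if remaining:
            # Forward the setup to the next hop named in the route,
            # even if hop by hop routing would prefer another path.
            send_to(remaining[0], remaining)
        # else: we are the final hop; the setup completes here.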
4.10.3 Merge and Explicit Routing

Explicit routing is slightly more complex with a multipoint to
point LSP (ie, in the case that stream merge is used).

In this case, it is not possible to specify the route for the LSP
as a simple list of LSRs (since the LSP does not consist of a
simple sequence of LSRs). There are several ways that this may be
accomplished. Details are outside the scope of this document.

4.10.4 Using Explicit Routing for Traffic Engineering

In the Internet today it is relatively common for ISPs to make use
of a Frame Relay or ATM core, which interconnects a number of IP
routers. The primary reason for use of a switching (L2) core is to
make use of low cost equipment which provides very high speed
forwarding. However, there is another very important reason for
the use of an L2 core: to allow for traffic engineering.

Traffic engineering (also known as bandwidth management) refers to
the process of managing the routes followed by user data traffic
in a network in order to provide relatively equal and efficient
loading of the resources in the network (ie, to ensure that the
load on links and nodes is within the capabilities of those links
and nodes).

Some rudimentary level of traffic engineering can be accomplished
with pure datagram routing and forwarding by adjusting the metrics
assigned to links. For example, suppose that there is a given link
in a network which tends to be overloaded on a long term basis.
One option would be to manually configure an increased metric
value for this link, in the hope of moving some traffic onto
alternate routes. This is a rather crude method of traffic
engineering and provides only limited results.

Another method of traffic engineering is to manually configure
multiple PVCs across an L2 core, and to adjust the route followed
by each PVC in an attempt to equalize the load on different parts
of the network. Where necessary, multiple PVCs may be configured
between the same two nodes, in order to allow traffic to be split
between different paths. In some topologies it is much easier to
achieve efficient non-overlapping or minimally-overlapping paths
via this method (with manually configured paths) than it would be
with pure datagram forwarding. A similar ability can be achieved
with MPLS via manual configuration of the paths taken by LSPs.

A related issue is the decision on where merge is to occur. Note
that once two streams merge into one stream (forwarded by a single
label) they cannot diverge again at that level of the MPLS
hierarchy (ie, they cannot be bifurcated without looking at a
higher level label or the IP header). Thus there may be times when
it is desirable to explicitly NOT merge two streams even though
they are to the same egress node and FEC. Non-merge may be
appropriate either because the streams will want to diverge later
in the path (for example, to avoid overloading a particular
downstream link), or because the streams may want to use different
physical links in the case where multiple slower physical links
are being aggregated into a single logical link for the purpose of
IP routing.
As a network grows to a very large size (on the order of hundreds
of LSRs), it becomes increasingly difficult to handle the
assignment of all routes via manual configuration. However,
explicit routing allows several alternatives:

 1. Partial Configuration: One option is to use
    automatic/dynamic routing for most of the paths through the
    network, but then manually configure some routes. For
    example, suppose that full dynamic routing would result in a
    particular link being overloaded. One of the LSPs which uses
    that link could be selected and manually routed to use a
    different path.

 2. Central Computation: One option would be to provide long
    term network usage information to a single central
    management facility. That facility could then run a global
    optimization to compute a set of paths to use. Network
    management commands can be used to configure LSRs with the
    correct routes to use.

 3. Egress Computation: An egress node can run a computation
    which optimizes the path followed for traffic to itself.
    This cannot of course optimize the entire traffic load
    through the network, but can include optimization of traffic
    from multiple ingresses to one egress. The reason for
    optimizing traffic to a single egress, rather than from a
    single ingress, relates to the issue of when to merge: An
    ingress can never merge the traffic from itself to different
    egresses, but an egress can, if desired, choose to merge the
    traffic from multiple ingresses to itself.

4.11 TTL and Traceroute

Traceroute is a useful method which is widely used for management
of IP networks. It is therefore highly desirable for traceroute
and TTL to be preserved in networks where MPLS is used. TTL can
also be useful to minimize the impact of loops (ie, as an aid to
loop survival).

In cases where the MPLS shim header is used, and where the IP
packets are normal Internet packets (ie, not part of a VPN), TTL
can optionally be handled in a way which is semantically identical
to operation in native IP networks. The ingress node, when
encapsulating an IP packet in the MPLS shim header, copies the TTL
from the IP header to the MPLS shim header. LSRs decrement the
TTL, and behave as normal IP routers in the case that the TTL
reaches zero (ie, discard the IP packet and return an ICMP error
report). Egress routers copy the TTL from the MPLS shim header
back to the IP header.

Where multiple MPLS shim headers are used in a label stack, TTL
can be handled in essentially the same manner. When an LSR pushes
a new header onto the stack, the TTL is copied from the previous
shim header to the new header. When an LSR pops a header off of
the stack, the TTL is copied in the other direction.
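A minimal sketch of this TTL handling follows (Python). The packet
and stack representations are invented for illustration and do not
model the actual shim encoding of [ENCAP]; the label values are
made up.

    # Illustrative TTL handling with a label stack modeled as a
    # list of {"label": ..., "ttl": ...} entries, top of stack
    # first, and an IP packet modeled as a dict with a "ttl" key.

    def ingress_push(ip_packet):
        """Ingress: copy the IP TTL into the new shim header."""
        return [{"label": 17, "ttl": ip_packet["ttl"]}], ip_packet

    def lsr_forward(stack, ip_packet):
        """Transit LSR: decrement the shim TTL, not the IP TTL."""
        stack[0]["ttl"] -= 1
        if stack[0]["ttl"] == 0:
            raise ValueError("TTL expired: discard, send ICMP error")
        return stack, ip_packet

    def push_level(stack):
        """Push: the new entry inherits the current shim TTL."""
        return [{"label": 42, "ttl": stack[0]["ttl"]}] + stack

    def egress_pop(stack, ip_packet):
        """Egress: copy the shim TTL back into the IP header."""
        ip_packet["ttl"] = stack[0]["ttl"]
        return stack[1:], ip_packet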
Some carriers may choose to avoid exposing the topology (or even
the diameter) of their networks to customers. One way to do this
is to treat an entire LSP crossing the carrier network as a single
hop from the point of view of IP forwarding. In this case the
ingress router places a value in the TTL field of the shim header
which is independent of the TTL value found in the IP header.
Similarly the decapsulating router strips off the MPLS header and
forwards based on the IP header, but does not copy TTL values.
Routers which are in the middle of the LSP (neither ingress nor
egress) decrement the TTL contained in the MPLS shim header, but
do not return an error report when the TTL expires.

There is a problem with the handling of ICMP error reports when
VPNs are supported using MPLS. In this case, the IP address space
used in the IP packet (carried over the LSP) might be local to the
VPN, and therefore might not be understood by the LSR which
detects that the TTL has reached zero. In addition, core LSRs
might not necessarily know which LSPs are supporting VPN traffic
and which are supporting Internet traffic. For this reason, in
networks where VPNs are supported over MPLS, special precautions
are needed. If the ingress node knows the path of the LSP, then it
may discard the packet and return an ICMP error report (into the
VPN's address space) if the TTL is less than the length of the
LSP. Alternatively, the TTL value used in the MPLS header may be
independent of the TTL value in the IP header, and the entire LSP
may be treated as a single hop from the perspective of datagram IP
forwarding. Alternatively, ICMP error reports could be turned off
in such networks.

One other potential solution to the ICMP error reporting problem
is to use "bidirectional" LSPs. In this case, two LSPs may be
created with the same endpoints, but which carry packets in
opposite directions. These two LSPs are logically coupled
together; that is, one LSP carries traffic from an originating
node to a destination node, while the other carries traffic from
the destination node to the originating node [TRAFENG]. When a
packet has to be discarded that had been flowing on the LSP in one
direction, the error report can be returned on the matching LSP in
the other direction. This is true even when the IP address space
encapsulated inside the LSP is one which the LSR does not
otherwise understand.

MPLS may also be used over L2 technologies which do not have TTL
values (specifically ATM and Frame Relay). In this case, TTL and
traceroute may still be supported in some specific situations.

In our discussion we will assume that the MPLS encapsulation for
operation of MPLS over ATM and Frame Relay media always uses a
shim header. Thus the packet would consist of an IP packet
encapsulated inside an MPLS shim header, which would in turn be
encapsulated for transmission over ATM or Frame Relay (eg, the IP
packet and MPLS shim header may be encapsulated in an AAL5 frame,
which would in turn be encapsulated inside ATM cells). If the shim
header is not used, then the manipulations of the TTL in the shim
header as described below would be replaced by manipulations of
the TTL inside the IP header.

The most straightforward case is one where ATM or Frame Relay is
used for the entire path of the LSP, and where the ingress LSR
knows the entire path of the LSP (for example, this may occur when
the LSP is set up based on complete source routing). In this case
the ingress router decrements the TTL by the length of the LSP. If
the TTL reaches zero or a negative number, then the IP packet is
discarded and an ICMP error report is returned by the ingress
router, but with a source address which indicates the node at
which the TTL would have expired.
In this case, in principle, the TTL which is decremented could be
either the one in the IP header or the one in the MPLS header.
However, it allows more uniform operation (compared to other
situations) if the TTL in the shim header is decremented by the
ingress router by the length of the path, and the egress router
then copies the TTL from the MPLS header into the IP packet.

In some cases the length of the LSP might be known, but not the
exact identity of the LSRs along the path (eg, the LSP is set up
via ordered control). In this case the TTL can be decremented as
above, but if the TTL would expire the packet could be forwarded
by some "out of band" (control processor to control processor)
path in order to get the packet to the LSR at which the TTL will
reach zero.

There may be cases where part of the LSP traverses ATM or Frame
Relay links (using an ATM or Frame Relay header), and part
traverses other media (using the shim header).

Some of the issues which come up in this situation are best
illustrated through use of an example. Suppose that in the
following figure an LSP goes from R1 to R8. Thus R1 is the ingress
LSR, and R8 is the egress LSR for this particular LSP. LSRs R3 and
R6 have both ATM interfaces and non-ATM interfaces. Thus the MPLS
shim header is used on the links from R1 to R2 and from R2 to R3.
ATM is used on the links from R3 to R4, R4 to R5, and R5 to R6.
Finally, the shim header is again used on the links from R6 to R7
and R7 to R8.

   ...............................................
   .               .             .               .
   .               .             .               .
   .R1------R2------R3         R6------R7------R8.
   .               .  \       /  .               .
   .               .   R4---R5   .               .
   .               .             .               .
   .  Shim Header  .     ATM     .  Shim Header  .
   ...............................................

          LSP Spanning ATM and Shim Header Media

If egress-initiated ordered control is used, then it is possible
that when the LSP is first set up the signalling protocol could
keep track of the number of hops to the next LSR that will use a
shim header (and which therefore understands TTL). In our example
R3 could therefore know that it is three hops to R6 (which is the
next router which will use a shim header containing a TTL value).
R3 can therefore decrement the TTL by the appropriate value (3),
and return an error report if the TTL will expire.

If ingress-initiated ordered control or independent control is
used, then it is not clear how R3 will know the identity of the
next LSR which understands TTL (ie, which will use a shim header
instead of an ATM or Frame Relay header). For example, suppose
that complete explicit routing with ingress control is used. In
this case R3 will know the complete path to the egress (R8), but
will not know which downstream links use ATM media and which use
the shim header. Thus R3 will know that R6 is a downstream LSR for
this LSP, but will not know that R6 is the specific LSR which
removes the packet from the ATM media.

R6 will forward the packet based on the incoming label implicit in
the VPI/VCI from the ATM media, plus the existing shim header.
Thus the TTL used at this point will be based on that received in
the shim header.
This implies that the TTL value in the shim header needs to be
valid, which in turn implies that R3 needs to adjust the TTL value
in the shim header to account for the length of the path from R3
to R6.

4.12 LSP Control: Ordered versus Independent

There is a choice to be made regarding whether the initial setup
of LSPs will be done in an ordered mode, where the assignment of
LSP labels is initiated by the egress node, or independently by
each individual node.

When LSP control is done independently, each node may at any time
pass label bindings to its neighbors for each FEC recognized by
that node. In the normal case that the neighboring nodes recognize
the same FECs, nodes may map incoming labels to outgoing labels as
part of the normal label swapping forwarding method.

When LSP control is done in an ordered manner, the egress node
passes label bindings to its neighbors corresponding to any FECs
which leave the MPLS network at that egress node. Other nodes must
wait until they get a label from downstream for a particular FEC
before passing a corresponding label for the same FEC to upstream
nodes.

With independent control, since each LSR is independently
assigning labels to FECs, it is possible that different LSRs may
make inconsistent decisions. For example, an upstream LSR may make
a coarse decision (map multiple IP address prefixes to a single
label) while its downstream neighbor makes a finer grain decision
(map each individual IP address prefix to a separate label). With
downstream label assignment this can be corrected by having LSRs
withdraw the labels that they have assigned which are inconsistent
with downstream labels, and replace them with new consistent label
assignments.

This may appear to be an advantage of ordered LSP control (since
with egress control the initial label assignments "bubble up" from
the egress to upstream nodes, and consistency is therefore easy to
ensure). However, even with ordered control it is possible that
the choice of egress node may change, or the egress may (based on
a change in configuration) change its mind in terms of the
granularity which is to be used. This implies that the same
mechanism will be necessary to allow changes in granularity to
bubble up to upstream nodes. The choice of ordered or independent
control may therefore affect the frequency with which this
mechanism is used, but will not affect the need for a mechanism to
achieve consistency of label granularity.

Ordered control and independent control can interwork in a very
straightforward manner: With either approach (assuming downstream
label assignment), the egress node will initially assign labels
for particular FECs and will pass these labels to its neighbors.
With either approach these label assignments will bubble upstream,
with the upstream nodes choosing labels that are consistent with
the labels that they receive from downstream.
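The ordered, egress-initiated case can be sketched as follows
(Python). The node structure and helper names are invented for
illustration, and real label distribution is of course
asynchronous and message-based rather than a direct procedure
call.

    # Illustrative sketch of ordered (egress-initiated) control.

    def distribute_labels(node, fec):
        """Advertise a label for `fec` upstream, but only once a
        downstream label is known (or this node is the egress)."""
        if node.is_egress_for(fec):
            node.label_out[fec] = None    # FEC leaves MPLS here
        elif fec not in node.label_out:
            return                        # wait for downstream first
        # Safe to hand a binding to each upstream neighbor; in this
        # way labels "bubble up" from the egress to the ingresses.
        for upstream in node.upstream_neighbors(fec):
            label = node.allocate_label(fec)
            node.label_in[(upstream, fec)] = label
            upstream.receive_binding(fec, label, downstream=node)

    # Under independent control, by contrast, a node may advertise
    # a binding for a FEC at any time, and later withdraw it if it
    # proves inconsistent with the label received from downstream.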
The difference between the two techniques therefore becomes a
tradeoff between avoiding a short period of initial thrashing on
startup (in the sense of avoiding the need to withdraw
inconsistent labels which may have been assigned using local
control) versus the imposition of a short delay on initial startup
(while waiting for the initial label assignments to bubble up from
downstream). The protocol mechanisms which need to be defined are
the same in either case, and the steady state operation is the
same in either case.

5. Security

Security in a network using MPLS should be relatively similar to
security in a normal IP network.

Routing in an MPLS network uses precisely the same IP routing
protocols as are currently used with IP. This implies that route
filtering is unchanged from current operation. Similarly, the
security of the routing protocols is not affected by the use of
MPLS.

Packet filtering also may be done as in normal IP. This will
require either (i) that label swapping be terminated prior to any
firewalls performing packet filtering (in which case a separate
instance of label swapping may optionally be started after the
firewall); or (ii) that firewalls "look past the labels", in order
to inspect the entire IP packet contents. In this latter case note
that the label may imply semantics greater than that contained in
the packet header: In particular, a particular label value may
imply that the packet is to take a particular path after the
firewall. In environments in which this is considered to be a
security issue it may be desirable to terminate the label prior to
the firewall.

Note that in principle labels could be used to speed up the
operation of firewalls: In particular, the label could be used as
an index into a table which indicates the characteristics that the
packet needs to have in order to pass through the firewall.
Depending upon implementation considerations, matching the
contents of the packet to the contents of the table may be quicker
than parsing the packet in the absence of the label.

References

[ARCH]       "Multiprotocol Label Switching Architecture", E.
             Rosen, A. Viswanathan, R. Callon, work in progress,
             April 1999.

[ARIS]       "ARIS: Aggregate Route-Based IP Switching", A.
             Viswanathan, N. Feldman, R. Boivie, R. Woundy, IBM
             Technical Report TR 29.2353, February 1998.

[ARIS-PROT]  "ARIS Protocol Specification", N. Feldman, A.
             Viswanathan, IBM Technical Report TR 29.2368, March
             1998.

[ATM]        "MPLS using LDP and ATM VC Switching", Davie, Doolan,
             Lawrence, McGloghrie, Rekhter, Rosen, Swallow, work
             in progress, April 1999.

[ATMVP]      "MPLS using ATM VP Switching", N. Feldman, B.
             Jamoussi, S. Komandur, A. Viswanathan, T. Worster,
             work in progress, February 1999.

[CR-LDP]     "Constraint-Based LSP Setup using LDP", Jamoussi, et
             al., work in progress, February 1999.

[ENCAP]      "MPLS Label Stack Encoding", Rosen, Rekhter, Tappan,
             Farinacci, Fedorkow, Li, Conta, work in progress,
             April 1999.

[FANP]       "Internetworking Based on Cell Switch Router -
             Architecture and Protocol Overview", Y. Katsube, K.
             Nagami, S. Matsuzawa, H. Esaki, Proceedings of the
             IEEE, Vol. 85, No. 12, December 1997.
[FR]         "Use of Label Switching on Frame Relay Networks", A.
             Conta, P. Doolan, A. Malis, work in progress,
             November 1998.

[IPNAV]      "IP Switching for Scalable IP Services", H. Ahmed, R.
             Callon, A. Malis, J. Moy, Proceedings of the IEEE,
             Vol. 85, No. 12, December 1997.

[LDP]        "LDP Specification", L. Anderson, P. Doolan, N.
             Feldman, A. Fredette, B. Thomas, work in progress,
             May 1999.

[LOOP-COLOR] "MPLS Loop Prevention Mechanism", Y. Ohba, Y.
             Katsube, E. Rosen, P. Doolan, work in progress, May
             1999.

[NHRP]       "NBMA Next Hop Resolution Protocol (NHRP)", Luciani,
             Katz, Piscitello, Cole, work in progress, draft-ietf-
             rolc-nhrp-12.txt, March 1998.

[PNNI]       "ATM Forum Private Network-Network Interface
             Specification, Version 1.0", ATM Forum af-pnni-
             0055.000, March 1996.

[RFC1583]    "OSPF Version 2", J. Moy, RFC 1583, March 1994.

[RFC1633]    "Integrated Services in the Internet Architecture: an
             Overview", R. Braden et al., RFC 1633, June 1994.

[RFC1771]    "A Border Gateway Protocol 4 (BGP-4)", Y. Rekhter and
             T. Li, RFC 1771, March 1995.

[RFC1953]    "Ipsilon Flow Management Protocol Specification for
             IPv4 Version 1.0", P. Newman et al., RFC 1953, May
             1996.

[RFC2098]    "Toshiba's Router Architecture Extensions for ATM:
             Overview", Katsube, Nagami, Esaki, RFC 2098, February
             1997.

[RFC2105]    "Cisco Systems' Tag Switching Architecture Overview",
             Rekhter, Davie, Katz, Rosen, Swallow, RFC 2105,
             February 1997.

[RSVP]       "Resource ReSerVation Protocol (RSVP), Version 1
             Functional Specification", work in progress, draft-
             ietf-rsvp-spec-16.txt, June 1997.

[RSVP-LSP]   "Extensions to RSVP for LSP Tunnels", D. Awduche, L.
             Berger, D. Gan, T. Li, G. Swallow, V. Srinivasan,
             work in progress, March 1999.

[TRAFENG]    "Requirements for Traffic Engineering Over MPLS", D.
             Awduche, J. Malcolm, J. Agogbua, M. O'Dell, J.
             McManus, work in progress, October 1998.

Authors' Addresses

Ross Callon
IronBridge Networks
55 Hayden Avenue
Lexington, MA 02173
781-402-8017
rcallon@ironbridgenetworks.com

Paul Doolan
Ennovate Networks
330 Codman Hill Road
Boxborough, MA
978-263-2002 x103
pdoolan@ennovatenetworks.com

Nancy Feldman
IBM
30 Saw Mill River Rd.
Hawthorne, NY 10532
914-784-3254
nkf@us.ibm.com

Andre Fredette
Nortel Networks
3 Federal Street
Billerica, MA 01821
978-288-8524
fredette@nortelnetworks.com

George Swallow
Cisco Systems, Inc.
250 Apollo Drive
Chelmsford, MA 01824
508-244-8143
swallow@cisco.com

Arun Viswanathan
Lucent Technologies
101 Crawford Corner Rd., #4D-537
Holmdel, NJ 07733
732-332-5163
arunv@dnrc.bell-labs.com