Network Working Group                                          R. Callon
INTERNET DRAFT                                     Ascend Communications
                                                               P. Doolan
                                                           Cisco Systems
                                                              N. Feldman
                                                               IBM Corp.
                                                             A. Fredette
                                                            Bay Networks
                                                              G. Swallow
                                                           Cisco Systems
                                                          A. Viswanathan
                                                               IBM Corp.
                                                           July 30, 1997
                                                   Expires Jan. 30, 1998

            A Framework for Multiprotocol Label Switching

Status of this Memo

This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas,
and its working groups. Note that other groups may also distribute
working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as ``work in progress.''

To learn the current status of any Internet-Draft, please check the
``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow
Directories on ds.internic.net (US East Coast), nic.nordu.net
(Europe), ftp.isi.edu (US West Coast), or munnari.oz.au (Pacific
Rim). Distribution of this memo is unlimited.

Abstract

This document discusses technical issues and requirements for the
Multiprotocol Label Switching working group. This is an initial draft
document, which will evolve and expand over time. It is the intent of
this document to produce a coherent description of all significant
approaches which were and are being considered by the working group.
Selection of specific approaches, making choices regarding
engineering tradeoffs, and detailed protocol specification are
outside the scope of this framework document.

Note that this document is at an early stage, and that most of the
detailed technical discussion is only in a rough form. Additional
text will be provided over time from a number of sources. A small
amount of the text in this document may be redundant with the
proposed protocol architecture for MPLS. This redundancy will be
reduced over time, with the overall discussion of issues moved to
this document, and the selection of specific approaches and
specification of the protocol contained in the protocol architecture
and other related documents.

Acknowledgments

The ideas and text in this document have been collected from a number
of sources and comments received. We would like to thank Jim Luciani,
Andy Malis, Yakov Rekhter, Eric Rosen, and Vijay Srinivasan for their
inputs and ideas.

1. Introduction and Requirements

1.1 Overview of MPLS

The primary goal of the MPLS working group is to standardize a base
technology that integrates the label swapping forwarding paradigm
with network layer routing. This base technology (label swapping) is
expected to improve the price/performance of network layer routing,
improve the scalability of the network layer, and provide greater
flexibility in the delivery of (new) routing services (by allowing
new routing services to be added without a change to the forwarding
paradigm).

The initial MPLS effort will be focused on IPv4 and IPv6. However,
the core technology will be extendible to multiple network layer
protocols (e.g., IPX, AppleTalk, DECnet, CLNP). MPLS is not confined
to any specific link layer technology; it can work with any media
over which network layer packets can be passed between network layer
entities.

MPLS makes use of a routing approach whereby the normal mode of
operation is that L3 routing (e.g., existing IP routing protocols
and/or new IP routing protocols) is used by all nodes to determine
the routed path.

MPLS provides a simple "core" set of mechanisms which can be applied
in several ways to provide rich functionality. The core effort
includes:

a) Semantics assigned to a stream label:

   - Labels are associated with specific streams of data

b) Forwarding methods:

   - Forwarding is simplified by the use of short fixed length
     labels to identify streams

   - Forwarding may require simple functions such as looking up a
     label in a table, swapping labels, and possibly decrementing
     and checking a TTL

   - In some cases MPLS may make direct use of underlying layer 2
     forwarding, such as is provided by ATM or Frame Relay
     equipment

c) Label distribution methods:

   - Allow nodes to determine which labels to use for specific
     streams

   - This may use some sort of control exchange, and/or be
     piggybacked on a routing protocol

The MPLS working group will define the procedures and protocols used
to assign significance to the forwarding labels and to distribute
that information between cooperating MPLS forwarders.
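To make the label swapping operation under (b) concrete, the
following is a minimal sketch of a single forwarding step. It is
purely illustrative; the table contents, field names, and TTL
handling are assumptions made for this example, not anything
mandated by this framework.

   # Illustrative only: a minimal label-swap forwarding step.
   # The table maps an incoming (port, label) pair to the outgoing
   # label and port; names and values are invented for this sketch.

   label_table = {
       # (in_port, in_label): (out_label, out_port)
       (1, 17): (42, 3),
       (2, 99): (17, 1),
   }

   def forward(in_port, in_label, ttl):
       """Look up the incoming label, decrement and check the TTL,
       and return the swapped label, outgoing port, and new TTL."""
       if ttl <= 1:
           return None   # TTL expired; discard the packet
       entry = label_table.get((in_port, in_label))
       if entry is None:
           return None   # no binding; fall back to L3 or discard
       out_label, out_port = entry
       return out_label, out_port, ttl - 1

Note that the entire forwarding decision is a single exact-match
lookup plus a TTL check, which is the source of the simplification
discussed in this section.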
1.2 Requirements

- MPLS forwarding MUST simplify packet forwarding in order to do the
  following:

  - lower cost of high speed forwarding

  - improve forwarding performance

- MPLS core technologies MUST be general with respect to data link
  technologies (i.e., work over a very wide range of underlying data
  links). Specific optimizations for particular media MAY be
  considered.

- MPLS core technologies MUST be compatible with a wide range of
  routing protocols, and MUST be capable of operating independently
  of the underlying routing protocols. It has been observed that
  considerable optimizations can be achieved in some cases by small
  enhancements of existing protocols. Such enhancements MAY be
  considered in the case of IETF standard routing protocols, and if
  appropriate, coordinated with the relevant working group(s).

- Routing protocols which are used in conjunction with MPLS might
  be based on distributed computation. As such, during routing
  transients, these protocols may compute forwarding paths which
  potentially contain loops. MPLS MUST provide protocol mechanisms
  to either prevent the formation of loops and/or contain the
  amount of (networking) resources that can be consumed due to the
  presence of loops.

- MPLS forwarding MUST allow "aggregate forwarding" of user data;
  i.e., allow streams to be forwarded as a unit and ensure that an
  identified stream takes a single path, where a stream may consist
  of the aggregate of multiple flows of user data. MPLS SHOULD
  provide multiple levels of aggregation support (e.g., from
  individual end-to-end application flows at one extreme, to
  aggregates of all flows passing through a specified switch or
  router at the other extreme).

- MPLS MUST support operations, administration, and maintenance
  facilities at least as extensive as those supported in current IP
  networks. Current network management and diagnostic tools SHOULD
  continue to work in order to provide some backward compatibility.
  Where such tools are broken by MPLS, hooks MUST be supplied to
  allow equivalent functionality to be created.

- MPLS core technologies MUST work with both unicast and multicast
  streams.

- The MPLS core specifications MUST clearly state how MPLS operates
  in a hierarchical network.

- Scalability issues MUST be considered and analyzed during the
  definition of MPLS. Very scalable solutions MUST be sought.

- MPLS core technologies MUST be capable of working with O(n) streams
  to switch all best-effort traffic, where n is the number of nodes
  in an MPLS domain. MPLS protocol standards MUST be capable of taking
  advantage of hardware that supports stream merging where
  appropriate. Note that O(n-squared) streams or VCs might also be
  appropriate for use in some cases.

- The core set of MPLS standards, along with existing Internet
  standards, MUST be a self-contained solution. For example, the
  proposed solution MUST NOT require specific hardware features that
  do not commonly exist on network equipment at the time that the
  standard is complete. However, the solution MAY make use of
  additional optional hardware features (e.g., to optimize
  performance).

- The MPLS protocol standards MUST support multipath routing and
  forwarding.

- MPLS MUST be compatible with the IETF Integrated Services Model,
  including RSVP.
- It MUST be possible for MPLS switches to coexist with non-MPLS
  switches in the same switched network. MPLS switches SHOULD NOT
  impose additional configuration on non-MPLS switches.

- MPLS MUST allow "ships in the night" operation with existing layer
  2 switching protocols (e.g., ATM Forum Signaling); i.e., MPLS must
  be capable of being used in the same network which is also
  simultaneously operating standard layer 2 protocols.

- The MPLS protocol MUST support both topology-driven and
  traffic/request-driven label assignments.

1.3 Terminology

aggregate stream

   synonym of "stream"

DLCI

   a label used in Frame Relay networks to identify frame relay
   circuits

flow

   a single instance of an application to application flow of data
   (as in the RSVP and IFMP use of the term "flow")

forwarding equivalence class

   a group of L3 packets which are forwarded in the same manner
   (e.g., over the same path, with the same forwarding treatment). A
   forwarding equivalence class is therefore the set of L3 packets
   which could safely be mapped to the same label. Note that there
   may be reasons that packets from a single forwarding equivalence
   class may be mapped to multiple labels (e.g., when stream merge
   is not used).

frame merge

   stream merge, when it is applied to operation over frame based
   media, so that the potential problem of cell interleave is not an
   issue.

label

   a short fixed length physically contiguous locally significant
   identifier which is used to identify a stream

label information base

   the database of information containing label bindings

label swap

   the basic forwarding operation consisting of looking up an
   incoming label to determine the outgoing label, encapsulation,
   port, and other data handling information.

label swapping

   a forwarding paradigm allowing streamlined forwarding of data by
   using labels to identify streams of data to be forwarded.

label switched hop

   the hop between two MPLS nodes, on which forwarding is done using
   labels.

label switched path

   the path created by the concatenation of one or more label
   switched hops, allowing a packet to be forwarded by swapping
   labels from an MPLS node to another MPLS node.

layer 2

   the protocol layer under layer 3 (which therefore offers the
   services used by layer 3). Forwarding, when done by the swapping
   of short fixed length labels, occurs at layer 2 regardless of
   whether the label being examined is an ATM VPI/VCI, a frame relay
   DLCI, or an MPLS label.
layer 3

   the protocol layer at which IP and its associated routing
   protocols operate

link layer

   synonymous with layer 2

loop detection

   a method of dealing with loops in which loops are allowed to be
   set up, and data may be transmitted over the loop, but the loop
   is later detected and closed

loop prevention

   a method of dealing with loops in which data is never transmitted
   over a loop

label stack

   an ordered set of labels

loop survival

   a method of dealing with loops in which data may be transmitted
   over a loop, but means are employed to limit the amount of
   network resources which may be consumed by the looping data

label switching router

   an MPLS node which is capable of forwarding native L3 packets

merge point

   the node at which multiple streams and switched paths are
   combined into a single stream sent over a single path. In the
   case that the multiple paths are not combined prior to the egress
   node, then the egress node becomes the merge point.

Mlabel

   abbreviation for MPLS label

MPLS core standards

   the standards which describe the core MPLS technology

MPLS domain

   a contiguous set of nodes which operate MPLS routing and
   forwarding and which are also in one Routing or Administrative
   Domain

MPLS edge node

   an MPLS node that connects an MPLS domain with a node which is
   outside of the domain, either because it does not run MPLS,
   and/or because it is in a different domain. Note that if an LSR
   has a neighboring host which is not running MPLS, then that LSR
   is an MPLS edge node.

MPLS egress node

   an MPLS edge node in its role in handling traffic as it leaves an
   MPLS domain

MPLS ingress node

   an MPLS edge node in its role in handling traffic as it enters an
   MPLS domain

MPLS label

   a label placed in a short MPLS shim header used to identify
   streams

MPLS node

   a node which is running MPLS. An MPLS node will be aware of MPLS
   control protocols, will operate one or more L3 routing protocols,
   and will be capable of forwarding packets based on labels. An
   MPLS node may optionally also be capable of forwarding native L3
   packets.

MultiProtocol Label Switching

   an IETF working group and the effort associated with the working
   group

network layer

   synonymous with layer 3

shortcut VC

   a VC set up as a result of an NHRP query and response

stack

   synonymous with label stack

stream

   an aggregate of one or more flows, treated as one aggregate for
   the purpose of forwarding in L2 and/or L3 nodes (e.g., may be
   described using a single label). In many cases a stream may be
   the aggregate of a very large number of flows. Synonymous with
   "aggregate stream".

stream merge

   the merging of several smaller streams into a larger stream, such
   that for some or all of the path the larger stream can be
   referred to using a single label.

switched path

   synonymous with label switched path

virtual circuit

   a circuit used by a connection-oriented layer 2 technology such
   as ATM or Frame Relay, requiring the maintenance of state
   information in layer 2 switches.

VC merge

   stream merge when it is specifically applied to VCs, specifically
   so as to allow multiple VCs to merge into one single VC

VP merge

   stream merge when it is applied to VPs, specifically so as to
   allow multiple VPs to merge into one single VP.
In this 420 case the VCIs need to be unique. This allows cells from 421 different sources to be distinguished via the VCI. 423 VPI/VCI 425 a label used in ATM networks to identify circuits 427 1.4 Acronyms and Abbreviations 429 DLCI Data Link Circuit Identifier 431 FEC Forwarding Equivalence Class 433 ISP Internet Service Provider 434 LIB Label Information Base 436 LDP Label Distribution Protocol 438 L2 Layer 2 440 L3 Layer 3 442 LSP Label Switched Path 444 LSR Label Switching Router 446 MPLS MultiProtocol Label Switching 448 MPT Multipoint to Point Tree 450 NHC Next Hop (NHRP) Client 452 NHS Next Hop (NHRP) Server 454 VC Virtual Circuit 456 VCI Virtual Circuit Identifier 458 VPI Virtual Path Identifier 460 2. Discussion of Core MPLS Components 462 2.1 The Basic Routing Approach 464 Routing is accomplished through the use of standard L3 routing 465 protocols, such as OSPF and BGP. The information maintained by the 466 L3 routing protocols is then used to distribute labels to neighboring 467 nodes that are used in the forwarding of packets as described below. 468 In the case of ATM networks, the labels that are distributed are 469 VPI/VCIs and a separate protocol (i.e., PNNI) is not necessary for 470 the establishment of VCs for IP forwarding. 472 The topological scope of a routing protocol (i.e. routing domain) and 473 the scope of label switching MPLS-capable nodes may be different. 474 For example, MPLS-knowledgeable and MPLS-ignorant nodes, all of which 475 are OSPF routers, may be co-resident in an area. In the case that 476 neighboring routers know MPLS, labels can be exchanged and used. 478 Neighboring MPLS routers may use configured PVCs or PVPs to tunnel 479 through non-participating ATM or FR switches. 481 2.2 Labels 483 In addition to the single routing protocol approach discussed above, 484 the other key concept in the basic MPLS approach is the use of short 485 fixed length labels to simply user data forwarding. 487 2.2.1 Label Semantics 489 It is important that the MPLS solutions are clear about what 490 semantics (i.e., what knowledge of the state of the network) is 491 implicit in the use of labels for forwarding user data packets or 492 cells. 494 At the simplest level, a label may be thought of as nothing more than 495 a shorthand for the packet header, in order to index the forwarding 496 decision that a router would make for the packet. In this context, 497 the label is nothing more than a shorthand for an aggregate stream of 498 user data. 500 This observation leads to one possible very simple interpretation 501 that the "meaning" of the label is a strictly local issue between two 502 neighboring nodes. With this interpretation: (i) MPLS could be 503 employed between any two neighboring nodes for forwarding of data 504 between those nodes, even if no other nodes in the network 505 participate in MPLS; (ii) When MPLS is used between more than two 506 nodes, then the operation between any two neighboring nodes could be 507 interpreted as independent of the operation between any other pair of 508 nodes. This approach has the advantage of semantic simplicity, and of 509 being the closest to pure datagram forwarding. However this approach 510 (like pure datagram forwarding) has the disadvantage that when a 511 packet is forwarded it is not known whether the packet is being 512 forwarded into a loop, into a black hole, or towards links which have 513 inadequate resources to handle the traffic flow. 
These disadvantages
are unavoidable with pure datagram forwarding, but are optional
design choices to be made when label switching is being used.

There are cases where it would be desirable to have additional
knowledge implicit in the existence of the label. For example, one
approach to avoiding loops (see section x.x below) involves signaling
the label distribution along a path before packets are forwarded on
that path. With this approach the fact that a node has a label to use
for a particular IP packet would imply the knowledge that following
the label (including label swapping at subsequent nodes) leads to a
non-looping path which makes progress towards the destination
(something which is usually, but not necessarily always, true when
using pure datagram routing). This would of course require some sort
of label distribution/setup protocol which signals along the path
being set up before the labels are available for packet forwarding.

However, there are also other consequences to having additional
semantics associated with the label: specifically, procedures are
needed to ensure that the semantics are correct. For example, if the
fact that you have a label for a particular destination implies that
there is a loop-free path, then when the path changes some procedures
are required to ensure that it is still loop free. Another example of
semantics which could be implicit in a label is the identity of the
higher level protocol type which is encoded using that label value.

In either case, the specific value of a label to use for a stream is
strictly a local issue; however, the decision about whether to use
the label may be based on some global (or at least wider scope)
knowledge that, for example, the label-switched path is loop-free
and/or has the appropriate resources.

A similar example occurs in ATM networks: with standard ATM a
signaling protocol is used which both reserves resources in switches
along the path, and which ensures that the path is loop-free and
terminates at the correct node. Thus implicit in the fact that an ATM
node has a VPI/VCI for forwarding a particular piece of data is the
knowledge that the path has been set up successfully.

Another similar example occurs with multipoint to point trees over
ATM (see section xx below), where the multipoint to point tree uses a
VP, and cell interleave at merge points in the tree is handled by
giving each source on the tree a distinct VCI within the VP. In this
case, the fact that each source has a known VPI/VCI to use needs to
(implicitly or explicitly) imply the knowledge that the VCI assigned
to that source is unique within the context of the VP.

In general, labels are used to optimize how the system works, not to
control how the system works. For example, the routing protocol
determines the path that a packet follows. The presence or absence of
a label assignment should not affect the path of an L3 packet. Note
however that the use of labels may make capabilities such as explicit
routes, loadsharing, and multipath more efficient.

2.2.2 Label Granularity

Labels are used to create a simple forwarding paradigm. The
essential element in assigning a label is that the device which will
be using the label to forward packets will be forwarding all packets
with the same label in the same way.
If the packet is to be
forwarded solely by looking at the label, then at a minimum, all
packets with the same incoming label must be forwarded out the same
port(s) with the same encapsulation(s), and with the same next hop
label (if any).

The term "forwarding equivalence class" is used to refer to a set of
L3 packets which are all forwarded in the same manner by a particular
LSR (for example, the IP packets in a forwarding equivalence class
may be destined for the same egress from an MPLS network, and may be
associated with the same QoS class). A forwarding equivalence class
is therefore the set of L3 packets which could safely be mapped to
the same label. Note that there may be reasons that packets from a
single forwarding equivalence class may be mapped to multiple labels
(e.g., when stream merge is not used).

Note that the label could also mean "ignore this label and forward
based on what is contained within," where within one might find a
label (if a stack of labels is used) or a layer 3 packet.

For IP unicast traffic, the granularity of a label allows various
levels of aggregation in a Label Information Base (LIB). At one end
of the spectrum, a label could represent a host route (i.e., the full
32 bits of the IP address). If a router forwards an entire CIDR
prefix in the same way, it may choose to use one label to represent
that prefix. Similarly, if the router is forwarding several
(otherwise unrelated) CIDR prefixes in the same way, it may choose to
use the same label for this set of prefixes. For instance, all CIDR
prefixes which share the same BGP Next Hop could be assigned the same
label. Taking this to the limit, an egress router may choose to
advertise all of its prefixes with the same label.

By introducing the concept of an egress identifier, the distribution
of labels associated with groups of CIDR prefixes can be simplified.
For instance, an egress identifier might specify the BGP Next Hop,
with all prefixes routed to that next hop receiving the label
associated with that egress identifier. Another natural place to
aggregate would be the MPLS egress router. This would work
particularly well in conjunction with a link-state routing protocol,
where the association between egress router and CIDR prefix is
already distributed throughout an area.

For IP multicast, the natural binding of a label would be to a
multicast tree, or rather to the branch of a tree which extends from
a particular port. Thus for a shared tree, the label corresponds to
the multicast group (*,G). For (S,G) state, the label would
correspond to the source address and the multicast group.

A label can also have a granularity finer than a host route. That
is, it could be associated with some combination of source and
destination address or other information within the packet. This
might for example be done on an administrative basis to aid in
effecting policy. A label could also correspond to all packets which
match a particular Integrated Services filter specification.

Labels can also represent explicit routes. This use is semantically
equivalent to using an IP tunnel with a complete explicit route. This
is discussed in more detail in section 4.10.
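As an illustration of two of the granularities described above, the
sketch below builds label bindings per CIDR prefix and per BGP Next
Hop (an egress identifier). The routes, next hops, and helper names
are invented for the example.

   # Illustrative only: two granularities of label binding.
   # The routes and next hops below are invented for the example.

   routes = {
       "10.1.0.0/16": "192.0.2.1",   # prefix -> BGP Next Hop
       "10.2.0.0/16": "192.0.2.1",
       "10.3.0.0/16": "192.0.2.7",
   }

   next_label = 16

   def alloc():
       global next_label
       next_label += 1
       return next_label - 1

   # Finer of the two: one label per CIDR prefix.
   per_prefix = {prefix: alloc() for prefix in routes}

   # Coarser: one label per BGP Next Hop (egress identifier),
   # shared by every prefix routed through that next hop.
   per_next_hop = {}
   for prefix, nh in routes.items():
       if nh not in per_next_hop:
           per_next_hop[nh] = alloc()

   # 3 labels at prefix granularity, 2 at next-hop granularity;
   # coarser granularity conserves label space and state.

The coarser the granularity, the fewer labels and the less state a
network must carry, at the cost of less differentiation per label.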
2.2.3 Label Assignment

Essential to label switching is the notion of a binding between a
label and network layer routing (routes). A control component is
responsible for creating label bindings, and then distributing the
label binding information among label switches. Label assignment
involves allocating a label, and then binding a label to a route.

Label assignment can be driven by control traffic or by data traffic.
This is discussed in more detail in section 3.4.

Control traffic driven label assignment has several advantages, as
compared to data traffic driven label assignment. For one thing, it
minimizes the amount of additional control traffic needed to
distribute label binding information, as label binding information is
distributed only in response to control traffic, independent of data
traffic. It also makes the overall scheme independent of and
insensitive to the data traffic profile/pattern. Control traffic
driven creation of label bindings improves forwarding latency, as
labels are assigned before data traffic arrives, rather than being
assigned as data traffic arrives. It also simplifies the overall
system behavior, as the control plane is controlled solely by control
traffic, rather than by a mix of control and data traffic.

There are however situations where data traffic driven label
assignment is necessary. A particular case may occur with ATM without
VP or VC merge. In this case, setting up a full mesh of VCs would
require n-squared VCs. In very large networks this may be
infeasible. Instead, VCs may be set up where required for forwarding
data traffic. In this case it is generally not possible to know a
priori how many such streams may occur.

Label withdrawal is required with both control-driven and data-driven
label assignment. Label withdrawal is primarily a matter of garbage
collection, that is, collecting up unused labels so that they may be
reassigned. Generally speaking, a label should be withdrawn when the
conditions that allowed it to be assigned are no longer true. For
example, if a label is imbued with extra semantics such as
loop-freeness, then the label must be withdrawn when those extra
semantics cease to hold.

In certain cases, notably multicast, it may be necessary to share a
label space between multiple entities. If these sharing arrangements
are altered by the coming and going of neighbors, then labels which
are no longer controlled by an entity must be withdrawn and a new
label assigned.

2.2.4 Label Stack and Forwarding Operations

The basic forwarding operation consists of looking up the incoming
label to determine the outgoing label, encapsulation, port, and any
additional information which may pertain to the stream such as a
particular queue or other QoS related treatment. We refer to this
operation as a label swap.

When a packet first enters an MPLS domain, the packet is forwarded by
normal layer 3 forwarding operations with the exception that the
outgoing encapsulation will now include a label. We refer to this
operation as a label push. When a packet leaves an MPLS domain, the
label is removed. We refer to this as a label pop.

In some situations, carrying a stack of labels is useful. For
instance, both an IGP and a BGP label could be used to allow routers
in the interior of an AS to be free of BGP information. In this
scenario, the "IGP" label is used to steer the packet through the AS
and the "BGP" label is used to switch between ASes.

With a label stack, the set of label operations remains the same,
except that at some points one might push or pop multiple labels, or
pop & swap, or swap & push.
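The following sketch shows the basic operations (swap, push, pop)
acting on a label stack, as in the IGP/BGP scenario above. The
representation (a list whose last element is the top of the stack)
and the label values are assumptions made for illustration only.

   # Illustrative only: label stack operations on a packet.
   # The stack is a Python list whose last element is the top label.

   def push(stack, label):
       stack.append(label)      # e.g., ingress adds a label

   def swap(stack, new_label):
       stack[-1] = new_label    # normal label swap at each hop

   def pop(stack):
       return stack.pop()       # e.g., on leaving a tunnel/domain

   stack = []
   push(stack, 100)   # "BGP" label: selects the path between ASes
   push(stack, 17)    # "IGP" label: steers the packet across the AS
   swap(stack, 23)    # interior LSRs swap only the top (IGP) label
   pop(stack)         # popping exposes the BGP label again
   assert stack == [100]

Interior routers in this scenario touch only the top label, which is
why they can remain free of BGP information.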
2.3 Encapsulation

Label-based forwarding makes use of various pieces of information,
including a label or stack of labels, and possibly additional
information such as a TTL field. In some cases this information may
be encoded using an MPLS header; in other cases this information may
be encoded in L2 headers. Note that there may be multiple types of
MPLS headers. For example, the header used over one media type may be
different from the one used over a different media type. Similarly,
in some cases the information that MPLS makes use of may be encoded
in an ATM header. We will use the term "MPLS encapsulation" to refer
to whatever form is used to encapsulate the label information and
other information used for label based forwarding. The term "MPLS
header" will be used where this information is carried in some sort
of MPLS-specific header (i.e., when the MPLS information cannot all
be carried in an L2 header). Whether there are one or multiple forms
of possible MPLS headers is also outside of the scope of this
document.

The exact contents of the MPLS encapsulation are outside of the scope
of this document. Some fields, such as the label, are obviously
needed. Some others might or might not be standardized, based on
further study. An encapsulation scheme may make use of the following
fields:

- label
- TTL
- class of service
- stack indicator
- next header type indicator
- checksum

It is desirable to have a very short encapsulation header. For
example, a four byte encapsulation header adds to the convenience of
building a hardware implementation that forwards based on the
encapsulation header. But at the same time it is tricky to assign
such a limited number of bits to carry the above listed information
in an MPLS header. Hence careful consideration must be given to the
information chosen for an MPLS header.

A TTL value in the MPLS header may be useful in the same manner as it
is in IP. Specifically, TTL may be used to terminate packets caught
in a routing loop, and for other related uses such as traceroute. The
TTL mechanism is a simple and proven method of handling such events.
Another use of TTL is to expire packets in a network by limiting
their "time to live" and eliminating stale packets that may cause
problems for some of the higher layer protocols. When used over link
layers which do not provide a TTL field, alternate mechanisms will be
needed to replace the uses of the TTL field.

A provision for a class of service (COS) field in the MPLS header
allows multiple service classes within the same label. However, when
more sophisticated QoS is associated with a label, the COS may not
have any significance. Alternatively, the COS (like QoS) can be left
out of the header, and instead propagated with the label assignment,
but this requires that a separate label be assigned to each required
class of service. Nevertheless, the COS mechanism provides a simple
method of segregating flows within a label.

As previously mentioned, the encapsulation header can be used to
derive the benefits of tunneling (or stacking).

The MPLS header must provide a way to indicate that multiple MPLS
headers are stacked (i.e., the "stack indicator"). For this purpose
a single bit in the MPLS header will suffice. In addition, there are
also some benefits to indicating the type of the protocol header
following the MPLS header (i.e., the "next header type indicator").
One option would be to combine the stack indicator and next header
type indicator into a single value (i.e., the next header type
indicator could be allowed to take the value "MPLS header"). Another
option is to have the next header type indicator be implicit in the
label value (such that this information would be propagated along
with the label).

There is no compelling reason to support a checksum field in the MPLS
header. A CRC mechanism at the L2 layer should be sufficient to
ensure the integrity of the MPLS header.
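As a purely hypothetical illustration of how tightly such fields
must be packed into a four byte header, the sketch below encodes a
label, COS value, stack indicator, and TTL into 32 bits. The field
widths chosen here are assumptions invented for the example; this
framework does not define an encoding.

   # Illustrative only: one hypothetical packing of a 4-byte header.
   # Field widths (20-bit label, 3-bit COS, 1-bit stack indicator,
   # 8-bit TTL) are assumptions for the example, not a specification.

   import struct

   def pack_hdr(label, cos, bottom_of_stack, ttl):
       assert 0 <= label < (1 << 20) and 0 <= cos < 8 and 0 <= ttl < 256
       word = (label << 12) | (cos << 9) | (int(bottom_of_stack) << 8) | ttl
       return struct.pack("!I", word)   # network byte order

   def unpack_hdr(data):
       (word,) = struct.unpack("!I", data)
       return {
           "label": word >> 12,
           "cos": (word >> 9) & 0x7,
           "bottom_of_stack": bool((word >> 8) & 0x1),
           "ttl": word & 0xFF,
       }

   hdr = pack_hdr(label=17, cos=2, bottom_of_stack=True, ttl=64)
   assert unpack_hdr(hdr)["label"] == 17

Any real encoding would have to trade bits among these fields; a
checksum, for instance, simply does not fit in four bytes, which is
consistent with the observation above that L2 CRC coverage suffices.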
3. Observations, Issues and Assumptions

3.1 Layer 2 versus Layer 3 Forwarding

MPLS uses L2 forwarding as a way to provide simple and fast packet
forwarding capability. One primary reason for the simplicity of L2
forwarding comes from its short, fixed length labels. A node
forwarding at L3 must parse a (relatively) large header, and perform
a longest-prefix match to determine a forwarding path. However, when
a node performs L2 label swapping, and labels are assigned properly,
it can do a direct index lookup into its forwarding (or in this case,
label-swapping) table with the short header. It is arguably simpler
to build label swapping hardware than it is to build L3 forwarding
hardware because the label swapping function is less complex.

The relative performance of L2 and L3 forwarding may differ
considerably between nodes. Some nodes may show an order of magnitude
difference. Other nodes (for example, nodes with more extensive L3
forwarding hardware) may have identical performance at L2 and L3.
However, some nodes may not be capable of doing L3 forwarding at all
(e.g., ATM switches), or may have such limited capacity as to be
unusable at L3. In this situation, traffic must be blackholed if no
switched path exists.

On nodes in which L3 forwarding is slower than L2 forwarding, pushing
traffic to L3 when no L2 path is available may cause congestion. In
some cases this could cause data loss (since L3 may be unable to keep
up with the increased traffic). However, if data is discarded, then
in general this will cause TCP to back off, which would allow control
traffic, traceroute, and other network management tools to continue
to work.

The MPLS protocol MUST NOT make assumptions about the forwarding
capabilities of an MPLS node. Thus, MPLS must propose solutions that
can leverage the benefits of a node that is capable of L3 forwarding,
but must not mandate that the node be capable of such.

Why We Will Still Need L3 Forwarding:

MPLS will not, and is not intended to, replace L3 forwarding. There
is absolutely a need for some systems to continue to forward IP
packets using normal Layer 3 IP forwarding. L3 forwarding will be
needed for a variety of reasons, including:

- For scaling; to forward on a finer granularity than the labels
  can provide
- For security; to allow packet filtering at firewalls
- For forwarding at the initial router (when hosts don't do MPLS)
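Where a node has both capabilities, the relationship between the two
forwarding paths can be sketched as below: try the direct-index
label lookup first, and fall back to a (slower) longest-prefix match
when no switched path applies. The tables, addresses, and labels are
invented for the example.

   # Illustrative only: an LSR capable of both L2 and L3 forwarding.
   # Tables, addresses, and labels are invented for the example.

   import ipaddress

   label_table = {17: (23, "if2")}        # in_label -> (out_label, port)

   l3_table = {                            # prefix -> port
       ipaddress.ip_network("10.0.0.0/8"): "if0",
       ipaddress.ip_network("10.1.2.0/24"): "if2",
   }

   def longest_prefix_match(dst):
       """Parse the destination and scan for the longest match."""
       addr = ipaddress.ip_address(dst)
       matches = [n for n in l3_table if addr in n]
       if not matches:
           return None
       return l3_table[max(matches, key=lambda n: n.prefixlen)]

   def forward(label, dst):
       if label is not None and label in label_table:
           return label_table[label]       # one exact-match step
       return longest_prefix_match(dst)    # slower L3 fallback

   assert forward(17, "10.1.2.3") == (23, "if2")
   assert forward(None, "10.9.9.9") == "if0"

The remainder of this section describes why the L3 fallback cannot
be dispensed with entirely.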
Consider a campus network which is serving a small company. Suppose
that this company makes use of the Internet, for example as a method
of communicating with customers. A customer on the other side of the
world has an IP packet to be forwarded to a particular system within
the company. It is not reasonable to expect that the customer will
have a label to use to forward the packet to that specific system.
Rather, the label used for the "first hop" forwarding might be
sufficient to get the packet considerably closer to the destination.
However, the granularity of the labels cannot extend to every host
worldwide. Similarly, routing used within one routing domain cannot
know about every host worldwide. This implies that in many cases the
labels assigned to a particular packet will be sufficient to get the
packet close to the destination, but that at some points along the
path of the packet the IP header will need to be examined to
determine a finer granularity for forwarding that packet. This is
particularly likely to occur at domain boundaries.

A similar point occurs at the last router prior to the destination
host. In general, the number of hosts attached to a network is likely
to be great enough that it is not feasible to assign a separate label
to every host. Rather, at least for routing within the destination
routing domain (or the destination area if there is a hierarchical
routing protocol in use), a label may be assigned which is sufficient
to get the packet to the last hop router. However, the last hop
router will need to examine the IP header (and particularly the
destination IP address) in order to forward the packet to the correct
destination host.

Packet filtering at firewalls is an important part of the operation
of the Internet. While the current state of Internet security may be
considerably less advanced than may be desired, nonetheless some
security (as is provided by firewalls) is much better than no
security. We expect that packet filtering will continue to be
important for the foreseeable future. Packet filtering requires
examination of the contents of the packet, including the IP header.
This implies that at firewalls the packet cannot be forwarded simply
by considering the label associated with the packet. Note that this
is also likely to occur at domain boundaries.

Finally, it is very likely that many hosts will not implement MPLS.
Rather, the host will simply forward an IP packet to its first hop
router. This first hop router will need to examine the IP header
prior to forwarding the packet (with or without a label).

3.2 Scaling Issues

MPLS scalability is provided by two of the principles of routing.
The first is that forwarding follows an inverted tree rooted at a
destination. The second is that the number of destinations is
reduced by routing aggregation.

The very nature of IP forwarding is a merged multipoint-to-point
tree. Thus, since MPLS mirrors the IP network layer, an MPLS node
that is capable of merging is capable of creating O(n) switched paths
which provide network reachability to all "n" destinations. The
meaning of "n" depends on the granularity of the switched paths. One
obvious choice of "n" is the number of CIDR prefixes existing in the
forwarding table (this scales the same as today's routing). However,
the value of "n" may be reduced considerably by choosing switched
paths of further aggregation.
For example, by creating switched paths
to each possible egress node, "n" may represent the number of egress
nodes in a network. This choice creates "n" switched paths, such that
each path is shared by all CIDR prefixes that are routed through the
same egress node. This selection greatly improves scalability, since
it minimizes "n", but at the same time maintains the same switching
performance as CIDR aggregation. (See section 2.2.2 for a description
of all of the levels of granularity provided by MPLS.)

The MPLS technology must scale at least as well as existing
technology. For example, if the MPLS technology were to support ONLY
host-to-host switched path connectivity, then the number of switched
paths would be much higher than the number of routing table entries.

There are several ways in which merging can be done in order to allow
O(n) switched paths to connect n nodes. The merging approach used has
an impact on the amount of state information, buffering, delay
characteristics, and the means of control required to coordinate the
trees. These issues are discussed in more detail in section 4.2.

There are some cases in which O(n-squared) switched paths may be used
(for example, by setting up a full mesh of point to point streams).
As label space and the amount of state information that can be
supported may be limited, it will not be possible to support
O(n-squared) switched paths in very large networks. However, in some
cases the use of n-squared paths may even be an advantage (for
example, to allow load-splitting of individual streams).

MPLS must be designed to scale for O(n). O(n) scaling allows MPLS
domains to grow very large. In addition, if best effort service can
be supported with O(n) scaling, this conserves resources (such as
label space and state information) which can be used for supporting
advanced services such as QoS. However, since some switches may not
support merging, and some small networks may not require the scaling
benefits of O(n), provisions must also be provided for a non-merging,
O(n-squared) solution.

Note: A precise and complete description of scaling would consider
that there are multiple dimensions of scaling, and multiple resources
whose usage may be considered. Possible dimensions of scaling
include: (i) the total number of streams which exist in an MPLS
domain (with associated labels assigned to them); (ii) the total
number of "label swapping pairs" which may be stored in the nodes of
the network (i.e., entries of the form "for incoming label 'x', use
outgoing label 'y'"); (iii) the number of labels which need to be
assigned for use over a particular link; (iv) the amount of state
information which needs to be maintained by any one node. We do not
intend to perform a complete analysis of all possible scaling issues,
and understand that our use of the terms "O(n)" and "O(n-squared)" is
approximate only.
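To put rough numbers on the O(n) versus O(n-squared) comparison, the
arithmetic below uses an invented node count purely for illustration.

   # Illustrative only: switched-path counts for n edge nodes.
   n = 200                   # invented number of edge MPLS devices
   merged = n                # one multipoint-to-point tree per egress
   full_mesh = n * (n - 1)   # a path per ordered ingress/egress pair
   print(merged, full_mesh)  # 200 vs. 39800 switched paths

A gap of roughly two orders of magnitude at this size is what makes
merging attractive for large domains.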
3.3 Types of Streams

Switched paths in the MPLS network can be of different types:

- point-to-point
- multipoint-to-point
- point-to-multipoint
- multipoint-to-multipoint

Two of the factors that determine which type of switched path is used
are (i) the capability of the switches employed in a network; (ii)
the purpose of the creation of a switched path, that is, the types of
flows to be carried in the switched path. These two factors also
determine the scalability of a network in terms of the number of
switched paths in use for transporting data through a network.

The point-to-point switched path can be used to connect all ingress
nodes to all the egress nodes to carry unicast traffic. In this case,
since an ingress node has point-to-point connections to all the
egress nodes, the number of connections in use for transporting
traffic is O(n-squared), where n is the number of edge MPLS devices.
For small networks the full mesh connection approach may suffice and
not pose any scalability problems. However, in large enterprise
backbone or ISP networks, this will not scale well.

Point-to-point switched paths may be used on a host-to-host or
application-to-application basis (e.g., a switched path per RSVP
flow). The dedicated point-to-point switched path transports the
unicast data from the ingress to the egress node of the MPLS network.
This approach may be used for providing QoS services or for
best-effort traffic.

A multipoint-to-point switched path connects all ingress nodes to a
single egress node. At a given intermediate node in the
multipoint-to-point switched path, L2 data units from several
upstream links are "merged" into a single label on a downstream link.
Since each egress node is reachable via a single multipoint-to-point
switched path, the number of switched paths required to transport
best-effort traffic through an MPLS network is O(n), where n is the
number of egress nodes.

The point-to-multipoint switched path is used for distributing
multicast traffic. This switched path tree mirrors the multicast
distribution tree as determined by the multicast routing protocols.
Typically a switch capable of point-to-multipoint connection
replicates an L2 data unit from the incoming (parent) interface to
all the outgoing (child) interfaces. Standard ATM switches support
such functionality in the form of point-to-multipoint VCs or VPs.

A multipoint-to-multipoint switched path may be used to combine
multicast traffic from multiple sources into a single multicast
distribution tree. The advantage of this is that the
multipoint-to-multipoint switched path is shared by multiple sources.
Conceptually, a form of multipoint-to-multipoint can be thought of as
follows: suppose that you have a point-to-multipoint VC from each
node to all other nodes, and suppose that at any point where two or
more VCs happen to merge, you merge them into a single VC or VP. This
would require either coordination of VCI spaces (so that each source
has a unique VCI within a VP) or VC merge capabilities. The
applicability of similar concepts to MPLS is FFS.

3.4 Data Driven versus Control Traffic Driven Label Assignment

A fundamental concept in MPLS is the association of labels and
network layer routing. Each LSR must assign labels, and distribute
them to its forwarding peers, for traffic which it intends to forward
by label swapping. In the various contributions that have been made
so far to the MPLS WG we identify three broad strategies for label
assignment: (i) those driven by topology-based control traffic
[TAG][ARIS][IP Navigator]; (ii) those driven by request-based control
traffic [RSVP]; and (iii) those driven by data traffic
[CSR][Ipsilon].
We also note that in actual practice combinations of these methods
may be employed. One example is the use of topology-based methods for
best-effort traffic plus request-based methods for support of RSVP.

3.4.1 Topology Driven Label Assignment

In this scheme labels are assigned in response to normal processing
of routing protocol control traffic. Examples of such control
protocols are OSPF and BGP. As an LSR processes OSPF or BGP updates
it can, as it makes or changes entries in its forwarding tables,
assign labels to those entries.

Among the properties of this scheme are:

- The computational load of assignment and distribution and the
  bandwidth consumed by label distribution are bounded by the size
  of the network.

- Labels are in the general case preassigned. If a route exists then
  a label has been assigned to it (and distributed). Traffic may be
  label swapped immediately upon arrival; there is no label setup
  latency at forwarding time.

- Requires LSRs to be able to process only the control traffic load.

- Labels assigned in response to the operation of routing protocols
  can have a granularity equivalent to that of the routes advertised
  by the protocol. Labels can, by this means, cover (highly)
  aggregated routes.

3.4.2 Request Driven Label Assignment

In this scheme labels are assigned in response to normal processing
of request-based control traffic. An example of such a control
protocol is RSVP. As an LSR processes RSVP messages it can, as it
makes or changes entries in its forwarding tables, assign labels to
those entries.

Among the properties of this scheme are:

- The computational load of assignment and distribution and the
  bandwidth consumed by label distribution are bounded by the
  amount of control traffic in the system.

- Labels are in the general case preassigned. If a request has been
  processed then a label has been assigned to the corresponding flow
  (and distributed). Traffic may be label swapped immediately upon
  arrival; there is no label setup latency at forwarding time.

- Requires LSRs to be able to process only the control traffic load.

- Depending upon the number of flows supported, this approach may
  require a larger number of labels to be assigned compared with
  topology driven assignment.

- This approach requires applications to make use of a request
  paradigm in order to get a label assigned to their flow.

3.4.3 Traffic Driven Label Assignment

In this scheme the arrival of data at an LSR "triggers" label
assignment and distribution. The traffic driven approach has the
following characteristics; a schematic contrast of the three
assignment triggers appears after this list.

- Label assignment and distribution costs are a function of traffic
  patterns. In an LSR with limited label space that is using a
  traffic driven approach to amortize its labels over a larger
  number of flows, the overhead due to label assignment and
  distribution grows as a function of the number of flows and as a
  function of their "persistence". Short lived but recurring flows
  may impose a heavy control burden.

- There is a latency associated with the appearance of a "flow" and
  the assignment of a label to it. The documented approaches to this
  problem suggest L3 forwarding during this setup phase; this has
  the potential for packet reordering (note that packet reordering
  may occur with any scheme when the network topology changes, but
  traffic driven label assignment introduces another cause for
  reordering).

- Flow driven label assignment requires high performance packet
  classification capabilities.

- Traffic driven label assignment may be useful to reduce label
  consumption (assuming that flows are not close to full mesh).

- If labels are wanted for flows to individual hosts then, due to
  limits on label space and the large number of hosts which may
  occur in a network, traffic driven label assignment is probably
  necessary.

- If you want to assign specific network resources to specific
  labels, to be used for support of application flows, then again
  the fine granularity associated with such labels may require
  traffic driven label assignment.
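As a schematic illustration of how the three strategies differ, the
sketch below ties label binding to three different trigger events.
The event names and data structures are invented for this example;
distribution of the bindings to peers is not modeled here.

   # Illustrative only: which event triggers label assignment in
   # each scheme. Names and structures are invented for the example.

   bindings = {}       # FEC (prefix or flow) -> label
   next_label = 16

   def alloc():
       global next_label
       next_label += 1
       return next_label - 1

   def bind(fec):
       if fec not in bindings:
           bindings[fec] = alloc()
           # ... distribute the new binding to peers here ...

   # Topology driven: triggered by routing updates (e.g., OSPF or
   # BGP); labels exist before any data arrives.
   def on_route_update(prefix):
       bind(prefix)

   # Request driven: triggered by request control traffic (e.g., an
   # RSVP reservation for a flow); also in place before data.
   def on_reservation(flow_id):
       bind(flow_id)

   # Traffic driven: triggered by the data itself; packets seen
   # before the binding completes must be L3 forwarded (latency).
   def on_packet(flow_id):
       bind(flow_id)
       return bindings[flow_id]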
The documented approaches to this problem suggest L3 forwarding during this setup phase; this has the potential for packet reordering (note that packet reordering may occur with any scheme when the network topology changes, but traffic driven label assignment introduces another cause for reordering).

- Flow driven label assignment requires high performance packet classification capabilities.

- Traffic driven label assignment may be useful to reduce label consumption (assuming that flows are not close to full mesh).

- If you want flows to hosts, then due to limits on label space, traffic driven label assignment is probably necessary given the large number of hosts which may occur in a network.

- If you want to assign specific network resources to specific labels, to be used for support of application flows, then again the fine granularity associated with labels may require traffic driven label assignment.

3.5 The Need for Dealing with Looping

Routing protocols which are used in conjunction with MPLS will in many cases be based on distributed computation. As such, during routing transients, these protocols may compute forwarding paths which contain loops. For this reason MPLS will be designed with mechanisms to either prevent the formation of loops and/or contain the amount of resources that can be consumed due to the presence of loops.

Note that there are a number of different alternative mechanisms which have been proposed (see section 4.3). Some of these prevent the formation of layer 2 forwarding loops, others allow loops to form but minimize their impact in one way or another (e.g., by discarding packets which loop, or by detecting and closing the loop after a period of time). Generally speaking, there are tradeoffs to be made between the amount of looping which might occur, and other considerations such as the time to convergence after a change in the paths computed by the routing algorithm.

We are not proposing any changes to normal layer 3 operation, and specifically are not trying to eliminate the possibility of looping at layer 3. Transient loops will continue to be possible in IP networks. Note that IP has a means to limit the damage done by looping packets, based on decrementing the IP TTL field as the packet is forwarded, and discarding packets whose TTL has expired. Dynamic routing protocols used with IP are also designed to minimize the amount of time during which loops exist.

The question that MPLS has to deal with is what to do at L2. In some cases L2 may make use of the same method that is used at L3. However, other options are available at L2, and in some cases (specifically when operating over ATM or Frame Relay hardware) the method of decrementing a TTL field (or any similar field) is not available.

There are basically two problems caused by packet looping: The most obvious problem is that packets are not delivered to the correct destination. The other result of looping is congestion. Even with TTL decrementing and packet discard, there may still be a significant amount of time that packets travel through a loop. This can adversely affect other packets which are not looping: Congestion due to the looping packets can cause non-looping packets to be delayed and/or discarded.

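As a point of reference, the TTL-based damage limitation used at L3 can be stated in a few lines. The following is a minimal sketch (Python, with illustrative names), not a description of any particular forwarding implementation:

   # A packet caught in a three-node loop is discarded once its TTL
   # expires, bounding the resources that the loop can consume.
   def traverse_loop(ttl, loop_nodes):
       hops = 0
       while True:
           node = loop_nodes[hops % len(loop_nodes)]
           ttl -= 1
           hops += 1
           if ttl == 0:
               return "discarded at %s after %d hops" % (node, hops)

   print(traverse_loop(64, ["A", "B", "C"]))

Note that no equivalent bound exists when the looping data units carry no TTL, as is the case for native ATM or Frame Relay forwarding.
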
Looping is particularly serious in (at least) three cases: One is when forwarding over ATM. Since ATM does not have a TTL field to decrement, there is no way to discard ATM cells which are looping over ATM subnetworks. Standard ATM PNNI routing and signaling solve this problem by making use of call setup procedures which ensure that ATM VCs will never be set up in a loop [PNNI]. However, when MPLS is used over ATM subnets, the native ATM routing and signaling procedures may not be used for the full L2 path. This leads to the possibility that MPLS over ATM might in principle allow packets to loop indefinitely, or until L3 routing stabilizes. Methods are needed to prevent this problem.

Another case in which looping can be particularly unpleasant is for multicast traffic. With multicast, it is possible that the packet may be delivered successfully to some destinations even though copies intended for other destinations are looping. This leads to the possibility that huge numbers of identical packets could be delivered to some destinations. Also, since multicast implies that packets are duplicated at some points in their path, the congestion resulting from looping packets may be particularly severe.

Another unpleasant complication of looping occurs if the congestion caused by the loop interferes with the routing protocol. It is possible for the congestion caused by looping to cause routing protocol control packets to be discarded, with the result that the routing protocol becomes unstable. For example, this could lengthen the duration of the loop.

In normal operation of IP networks the impact of congestion is limited by the fact that TCP backs off (i.e., transmits substantially less traffic) in response to lost packets. Where the congestion is caused by looping, the combination of TTL and the resulting discard of looping packets, plus the reduction in offered traffic, can limit the resulting impact on the network. TCP backoff however does not solve the problem if the looping packets are not discarded (for example, if the loop is over an ATM subnetwork where TTL is not used).

The severity of the problem caused by looping may depend upon implementation details. Suppose, for instance, that ATM switching hardware is being used to provide MPLS switching functions. If the ATM hardware has per-VC queuing, and if it is capable of providing fair access to the buffer pool for incoming cells based on the incoming VC (so that no one incoming VC is allowed to grab a disproportionate number of buffers), this looping might not have a significant effect on other traffic. If the ATM hardware cannot provide fair buffer access of this sort, however, then even transient loops may cause severe degradation of the node's total performance.

Given that MPLS is a relatively new approach, it is possible that looping may have consequences which are not fully understood (such as looping of LDP control information in cases where stream merge is not used).

Even if fair buffer access can be provided, it is still worthwhile to have some means of detecting loops that last "longer than possible". In addition, even where TTL and/or per-VC fair queuing provides a means for surviving loops, it still may be desirable where practical to avoid setting up LSPs which loop.

Methods for dealing with loops are discussed in section 4.3.

3.6 Operations and Management

Operations and management of networks is critically important. This implies that MPLS must support operations, administration, and maintenance facilities at least as extensive as those supported in current IP networks.

In most ways this is a relatively simple requirement to meet. Given that all MPLS nodes run normal IP routing protocols, it is straightforward to expect them to participate in normal IP network management protocols.

There is one issue which has been identified and which needs to be addressed by the MPLS effort: the operation of Traceroute over MPLS networks. Note that other O&M issues may be identified in the future.

Traceroute is a very commonly used network management tool. Traceroute is based on use of the TTL field: A station trying to determine the route from itself to a specified address transmits multiple IP packets, with the TTL field set to 1 in the first packet, 2 in the second packet, etc. This causes each router along the path to send back an ICMP error report for TTL exceeded. This in turn allows the station to determine the set of routers along the route. For example, this can be used to determine where a problem exists (if no router responds past some point, the last router which responds can become the starting point for a search to determine the cause of the problem).

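The TTL-probing behavior that MPLS needs to preserve (or consciously replace) is simple to state precisely. The sketch below (Python, illustrative only; "probe" is a hypothetical helper standing in for sending a packet and collecting any ICMP response) shows the core loop of a traceroute-style tool:

   def traceroute(dest, probe, max_hops=30):
       # Send probes with TTL = 1, 2, 3, ... Each router that
       # decrements the TTL to zero returns an ICMP Time Exceeded,
       # revealing one hop of the path.
       path = []
       for ttl in range(1, max_hops + 1):
           responder = probe(dest, ttl)   # hypothetical helper
           if responder is None:
               path.append("*")           # opaque hop: no response
           else:
               path.append(responder)
               if responder == dest:      # reached the destination
                   break
       return path

   # Simulated three-hop path, for illustration:
   hops = {1: "r1", 2: "r2", 3: "203.0.113.9"}
   print(traceroute("203.0.113.9", lambda d, ttl: hops.get(ttl)))

The difficulty described below is that a hop which forwards on the L2 label without decrementing any TTL never generates the Time Exceeded report, and so never appears in the path.
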
When MPLS is operating over ATM or Frame Relay networks there is no TTL field to decrement (and ATM and Frame Relay forwarding hardware does not decrement TTL). This implies that it is not straightforward to have Traceroute operate in this environment.

There is the question of whether we *want* all routers along a path to be visible via traceroute. For example, an ISP probably doesn't want to expose the interior of its network to a customer. However, the issue of whether a network's policy will allow the interior of the network to be visible should be independent of whether it is possible for some users to see the interior of the network. Thus while there clearly should be the possibility of using policy mechanisms to block traceroute from being used to see the interior of the network, this does not imply that it is okay to develop protocol mechanisms which break traceroute.

There is also the question of whether the interior of an MPLS network is analogous to a normal IP network, or whether it is closer to the interior of a layer 2 network (for example, an ATM subnet). Clearly IP traceroute cannot be used to expose the interior of an ATM subnet. When a packet is crossing an ATM subnetwork (for example, between an ingress and an egress router which are attached to the ATM subnet) traceroute can be used to determine the router to router path, but not the path through the ATM switches which comprise the ATM subnet. Note here that MPLS forms a sort of "in between" special case: Routing is based on normal IP routing protocols, the equivalent of call setup (label binding/exchange) is based on MPLS-specific protocols, but forwarding is based on normal L2 ATM forwarding. MPLS therefore supersedes the normal ATM-based methods that would be used to eliminate loops and/or trace paths through the ATM subnet.

It is generally agreed that Traceroute is a relatively "ugly" tool, and that a better tool for tracing the route of a packet would be preferable. However, no better tool has yet been designed or even proposed. Also, however ugly Traceroute may be, it is nonetheless very useful, widely deployed, and widely used. In general, it is highly preferable to define, implement, and deploy a new tool, and to determine through experience that the new tool is sufficient, before breaking a tool which is as widely used as traceroute.

Methods that may be used to either allow traceroute to be used in an MPLS network, or to replace traceroute, are discussed in section 4.14.

4. Technical Approaches

We believe that section 4 is probably less complete than other sections. Additional subsections are likely to be needed as a result of additional discussions in the MPLS working group.

4.1 Label Distribution

A fundamental requirement in MPLS is that an LSR forwarding label switched traffic to another LSR apply a label to that traffic which is meaningful to the other (receiving) LSR. LSRs could learn about each other's labels in a variety of ways. We call the general topic "label distribution".

4.1.1 Explicit Label Distribution

Explicit label distribution anticipates the specification by MPLS of a standard protocol for label distribution. Two of the possible approaches [TDP] [ARIS] are oriented toward topology driven label distribution. One other approach [FANP], in contrast, makes use of traffic driven label distribution.

We expect that the label distribution protocol (LDP) which emerges from the MPLS WG is likely to inherit elements from one or more of the possible approaches.

Consider LSR A forwarding traffic to LSR B. We call A the upstream (with respect to data flow) LSR and B the downstream LSR. A must apply a label to the traffic that B "understands". Label distribution must ensure that the "meaning" of the label will be communicated between A and B. An important question is whether A or B (or some other entity) allocates the label.

In this discussion we are talking about the allocation and distribution of labels between two peer LSRs that are on a single segment of what may be a longer path. A related but in fact entirely separate issue is the question of where control of the whole path resides. In essence there are two models; by analogy to upstream and downstream for a single segment we can talk about ingress and egress for an LSP (or to and from a label swapping "domain"). In one model a path is set up from ingress to egress, in the other from egress to ingress.

4.1.1.1 Downstream Label Allocation

"Downstream Label Allocation" refers to a method where the label allocation is done by the downstream LSR, i.e. the LSR that uses the label as an index into its switching tables.

This is, arguably, the most natural label allocation/distribution mode for unicast traffic. As an LSR builds its routing tables (we consider here control driven allocation of labels) it is free, within some limits we will discuss, to allocate labels in any manner that may be convenient to the particular implementation.
Since the labels that it allocates will be those upon which it subsequently makes forwarding decisions, we assume implementations will perform the allocation in an optimal manner. Having allocated labels, the default behavior is to distribute the labels (and bindings) to all peers.

In some cases (particularly with ATM) there may be a limited number of labels which may be used across an interface, and/or a limited number of label assignments which may be supported by a single device. Operation in this case may make use of "on demand" label assignment. With this approach, an LSR may for example request a label for a route from a particular peer only when its routing calculations indicate that peer to be the new next hop for the route.

4.1.1.2 Upstream Label Allocation

"Upstream Label Allocation" refers to a method where the label allocation is done by the upstream LSR. In this case the LSR choosing the label (the upstream LSR) and the LSR which needs to interpret packets using the label (the downstream LSR) are not the same node. We note here that in the upstream LSR the label at issue is not used as an index into the switching tables but rather is found as the result of a lookup on those tables.

The motivation for upstream label allocation comes from the recognition that it might be possible to optimize multicast machinery in an LSR if it were possible to use the same label on all output ports for which a particular multicast packet/cell were destined. Upstream assignment makes this possible.

4.1.1.3 Other Label Allocation Methods

Another option would be to make use of label values which are unique within the MPLS domain (implying that a domain-wide allocation would be needed). In this case, any stream to a particular MPLS egress node could make use of the label of that node (implying that label values do not need to be swapped at intermediate nodes).

With this method of label allocation, there is a choice to be made regarding the scope over which a label is unique. One approach is to configure each node in an MPLS domain with a label which is unique in that domain. Another approach is to use a truly global identifier (for example the IEEE 48 bit identifier), where each MPLS-capable node would be stamped at birth with a truly globally unique identifier. The point of this global approach is to simplify configuration in each MPLS domain by eliminating the need to configure label IDs.

4.1.2 Piggybacking on Other Control Messages

While we have discussed use of an explicit MPLS LDP, we note that there are several existing protocols that can be easily modified to distribute both routing/control and label information. This could be done with any of OSPF, BGP, RSVP and/or PIM. A particular architectural elegance of these schemes is that label distribution uses the same mechanisms as are used in distribution of the underlying routing or control information.

When explicit label distribution is used, the routing computation and label distribution are decoupled. This implies a possibility that at some point you may either have a route to a specific destination without an associated label, and/or a label for a specific destination which makes use of a path which you are no longer using.
Piggybacking label distribution on the operation of the routing protocol is one way to eliminate this decoupling.

Piggybacking label distribution on the routing protocol introduces an issue regarding how to negotiate acceptable label values and what to do if an invalid label is received. This is discussed in section 4.1.3.

4.1.3 Acceptable Label Values

There are some constraints on which label values may be used in either allocation mode. Clearly the label values must lie within the allowable range described in the encapsulation standards that the MPLS WG will produce. The label value used must also, however, lie within a range that the peer LSR is capable of supporting. We imagine that certain machines, for example ATM switches operating as LSRs, may, due to operational or implementation restrictions, support a label space more limited than that bounded by the valid range found in the encapsulation standard. This implies that an advertisement or negotiation mechanism for the useable label range may be a part of the MPLS LDP. When operating over ATM using ATM forwarding hardware, due to the need for compatibility with the existing use of the ATM VPI/VCI space, it is quite likely that an explicit mechanism will be needed for label range negotiation.

In addition we note that LDP may be one of a number of mechanisms used to distribute labels between any given pair of LSRs. Clearly, where such multiple mechanisms exist, care must be taken to coordinate the allocation of label values. A single label value must have a unique meaning to the LSR that distributes it.

There is an issue regarding how to allow negotiation of acceptable label values if label distribution is piggybacked with the routing protocol. In this case it may be necessary either to require equipment to accept any possible label value, or to configure devices to know which range of label values may be selected. It is not clear in this case what to do if an invalid label value is received, as there may be no means of sending a NAK.

A similar issue occurs with multicast traffic over broadcast media, where there may be multiple nodes which receive the same transmission (using a single label value). Here again it may be "non-trivial" how to allow n-party negotiation of acceptable label values.

4.1.4 LDP Reliability

The need for reliable label distribution depends upon the relative performance of L2 and L3 forwarding, as well as the relationship between label distribution and the routing protocol operation.

If label distribution is tied to the operation of the routing protocol, then a reasonable protocol design would ensure that labels are distributed successfully as long as the associated route and/or reachability advertisement is distributed successfully. This implies that the reliability of label distribution will be the same as the reliability of route distribution.

If there is a very large difference between L2 and L3 forwarding performance, then the cost of failing to deliver a label is significant. In this case it is important to ensure that labels are distributed reliably.
Given that LDP needs to operate in a wide variety of environments with a wide variety of equipment, this implies that it is important for any LDP developed by the MPLS WG to ensure reliable delivery of label information.

Reliable delivery of LDP packets may potentially be accomplished either by using an existing reliable transport protocol such as TCP, or by specifying reliability mechanisms as part of LDP (for example, the reliability mechanisms which are defined in IDRP could potentially be "borrowed" for use with LDP).

4.1.5 Label Purge Mechanisms

Another issue to be considered is the "lifetime" of label data once it arrives at an LSR, and the method of purging label data. There are several methods that could be used either separately, or (more likely) in combination.

One approach is for label information to be timed out. With this approach a lifetime is distributed along with the label value. The label value may be refreshed prior to timing out. If the label is not refreshed prior to timing out it is discarded. In this case each lifetime and timer may apply to a single label, or to a group of labels (e.g., all labels selected by the same node).

Similarly, two peer nodes may make use of an MPLS peer keep-alive mechanism. This implies exchange of MPLS control packets between neighbors on a periodic basis. This in general is likely to use a smaller timeout value than label value timers (analogous to the fact that the OSPF HELLO interval is much shorter than the OSPF LSA lifetime). If the peer session between two MPLS nodes fails (due to expiration of the associated timer prior to reception of the refresh) then associated label information is discarded.

If label information is piggybacked on the routing protocol then the timeout mechanisms would also be taken from the associated routing protocol (note that routing protocols in general have mechanisms to invalidate stale routing information).

An alternative method for invalidating labels is to make use of an explicit label removal message.

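The timeout-based purge described above amounts to simple soft-state bookkeeping. A minimal sketch (Python; the structure and names are illustrative assumptions, not part of any proposed LDP):

   import time

   class LabelTable:
       def __init__(self):
           self.bindings = {}   # label -> (FEC, expiry time)

       def install(self, label, fec, lifetime):
           # A lifetime is distributed along with the label value.
           self.bindings[label] = (fec, time.time() + lifetime)

       def refresh(self, label, lifetime):
           # Refreshing a label prior to timeout keeps it alive.
           fec, _ = self.bindings[label]
           self.bindings[label] = (fec, time.time() + lifetime)

       def purge_expired(self):
           # Bindings not refreshed prior to timing out are discarded.
           now = time.time()
           self.bindings = {l: (fec, exp)
                            for l, (fec, exp) in self.bindings.items()
                            if exp > now}

A per-group timer or a peer keep-alive could be layered on the same structure by keying the expiry on the distributing node rather than on the individual label.
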
4.2 Stream Merging

In order to scale as O(n) (rather than O(n-squared)), MPLS makes use of the concept of stream merge. This makes use of multipoint-to-point streams in order to allow multiple streams to be merged into one stream.

Types of Stream Merge:

There are several types of stream merge that can be used, depending upon the underlying media.

When MPLS is used over frame based media, merging is straightforward. All that is required for stream merge to take place is for a node to allow multiple upstream labels to be forwarded the same way and mapped into a single downstream label. This is referred to as frame merge.

Operation over ATM media is less straightforward. In ATM, the data packets are encapsulated into an ATM Adaptation Layer, say AAL5, and the AAL5 PDU is segmented into ATM cells with a VPI/VCI value and the cells are transmitted in sequence. It is incumbent on ATM switches to keep the cells of a PDU (or with the same VPI/VCI value) contiguous and in sequence. This is because the device that reassembles the cells to re-form the transmitted PDU expects the cells to be contiguous and in sequence, as there isn't sufficient information in the ATM cell header (unlike IP fragmentation) to reassemble the PDU if the cells arrive in any other order. Hence, if cells from several upstream links are transmitted onto the same downstream VPI/VCI, then cells from one PDU can get interleaved with cells from another PDU on the outgoing VPI/VCI, and result in corruption of the original PDUs by mis-sequencing the cells of each PDU.

The most straightforward (but erroneous) method of merging in an ATM environment would be to take the cells from two incoming VCs and merge them into a single outgoing VC. If this were done without any buffering of cells then cells from two or more packets could end up being interleaved into a single AAL5 frame. Therefore the problem when operating over ATM is how to avoid interleaving of cells from multiple sources.

There are two ways to solve this interleaving problem, which are referred to as VC merge and VP merge.

VC merge allows multiple VCs to be merged into a single outgoing VC. In order for this to work the node performing the merge needs to keep the cells from one AAL5 frame (e.g., corresponding to an IP packet) separate from the cells of other AAL5 frames. This may be done by performing the SAR function in order to reassemble each IP packet before forwarding that packet. In this case VC merge is essentially equivalent to frame merge. An alternative is to buffer the cells of one AAL5 frame together, without actually reassembling them. When the end of frame indicator is reached that frame can be forwarded. Note however that both forms of VC merge require that the entire AAL5 frame be received before any cells corresponding to that frame are forwarded. VC merge therefore requires capabilities which are generally not available in most existing ATM forwarding hardware.

The alternative for use over ATM media is VP merge. Here multiple VPs can be merged into a single VP. Separate VCIs within the merged VP are used to distinguish frames (e.g., IP packets) from different sources. In some cases, one VP may be used for the tree from each ingress node to a single egress node.

Interoperation of Merge Options:

If some nodes support stream merge, and some nodes do not, then it is necessary to ensure that the two types of nodes can interoperate within a single network. This affects the number of labels that a node needs to send to a neighbor. An upstream LSR which supports stream merge needs to be sent only one label per forwarding equivalence class (FEC). An upstream neighbor which does not support stream merge needs to be sent multiple labels per FEC. However, there is no way of knowing a priori how many labels it needs; this will depend on how many LSRs are upstream of it with respect to the FEC in question.

Since the number of labels a non-merging upstream neighbor will need is not known a priori, the upstream neighbor may need to explicitly ask for labels for each FEC. The upstream neighbor may make multiple such requests (for one or more labels per request). When a downstream neighbor receives such a request from upstream, and the downstream neighbor does not itself support stream merge, then it must in turn ask its downstream neighbor for more labels for the FEC in question.

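The recursive label-request behavior for non-merging nodes can be made concrete with a short sketch (Python; the node structure and method names are hypothetical):

   class LSR:
       def __init__(self, name, supports_merge, downstream=None):
           self.name = name
           self.supports_merge = supports_merge
           self.downstream = downstream
           self.next_free = 100          # toy label allocator

       def request_labels(self, fec, count):
           # A merging node satisfies any request from local label
           # space: all upstream traffic for the FEC merges onto one
           # downstream label. A non-merging node must first obtain
           # one downstream label for each label it hands upstream.
           if not self.supports_merge and self.downstream:
               self.downstream.request_labels(fec, count)
           labels = list(range(self.next_free, self.next_free + count))
           self.next_free += count
           return labels

   egress = LSR("egress", supports_merge=True)
   transit = LSR("transit", supports_merge=False, downstream=egress)
   print(transit.request_labels("10.0.0.0/8", 3))
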
It is possible that there may be some nodes which support merge, but have a limited number of upstream streams which may be merged into a single downstream stream. Suppose for example that due to some hardware limitation a node is capable of merging four upstream LSPs into a single downstream LSP. Suppose however, that this particular node has six upstream LSPs arriving at it for a particular stream. In this case, this node may merge these into two downstream LSPs (corresponding to two labels that need to be obtained from the downstream neighbor).

The interoperation of the various forms of merging over ATM is most easily described by first describing the interoperation of VC merge with non-merge.

In the case where VC merge and non-merge nodes are interconnected, the forwarding of cells is based in all cases on a VC (i.e., the concatenation of the VPI and VCI). For each node, if an upstream neighbor is doing VC merge then that upstream neighbor requires only a single outgoing VPI/VCI for a particular FEC (this is analogous to the requirement for a single label in the case of operation over frame media). If the upstream neighbor is not doing merge, then it will require a single outgoing VPI/VCI per FEC for itself (assuming that it can be an ingress node), plus enough outgoing VPI/VCIs to map to incoming VPI/VCIs to pass to its upstream neighbors. The number required will be determined by allowing the upstream nodes to request additional VPI/VCIs from their downstream neighbors.

A similar method is possible to support nodes which perform VP merge. In this case the VP merge node, rather than requesting a single VPI/VCI or a number of VPI/VCIs from its downstream neighbor, instead may request a single VP (identified by a VPI). Furthermore, suppose that a non-merge node is downstream from two different VP merge nodes. This node may need to request one VPI/VCI (for traffic originating from itself) plus two VPs (one for each upstream node).

Note that there are multiple options for coordinating VCIs within a VP. Description of the range of options is FFS.

In order to support all of VP merge, VC merge, and non-merge, it is therefore necessary to allow upstream nodes to request a combination of zero or more VC identifiers (consisting of a VPI/VCI), plus zero or more VPs (identified by VPIs). VP merge nodes would therefore request one VP. VC merge nodes would request only a single VPI/VCI (since they can merge all upstream traffic into a single VC). Non-merge nodes would pass on any requests that they get from above, plus request a VPI/VCI for traffic that they originate (if they can be ingress nodes). However, non-merge nodes which can only do VC forwarding (and not VP forwarding) will need to know which VCIs are used within each VP in order to install the correct VCs in their forwarding tables. A detailed description of how this could work is FFS.

Coordination of the VCI space with VP Merge:

VP merge requires that the VCIs be coordinated to ensure uniqueness. There are a number of ways in which this may be accomplished:

1. Each node may be pre-configured with a unique VCI value (or values).

2. Some node (most likely the root of the multipoint-to-point tree) may coordinate the VCI values used within the VP. A protocol mechanism will be needed to allow this to occur. How hard this is to do depends somewhat upon whether the root is otherwise involved in coordinating the multipoint-to-point tree. For example, allowing one node (such as the root) to coordinate the tree may be useful for purposes of coordinating load sharing (see section 4.10). Thus whether the issue of coordinating the VCI space is significant or trivial may depend upon other design choices which at first glance appeared to be independent.

3. Other unique information, such as portions of a class B or class C address, may be used to provide a unique VCI value.

4. Another alternative is to implement a simple hardware extension in the ATM switches to keep the VCI values unique by dynamically altering them to avoid collision.

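As an illustration of the first two approaches, uniqueness within a merged VP reduces to partitioning the VCI space among the sources feeding the VP. A toy allocation sketch (Python; purely illustrative, assuming a coordinating node hands out disjoint VCI ranges):

   def assign_vci_ranges(sources, vci_min=32, vci_max=65535):
       # Partition the VCI space of the merged VP into disjoint
       # ranges, one per source, so that cells belonging to PDUs
       # from different sources never share a VCI.
       span = (vci_max - vci_min + 1) // len(sources)
       ranges = {}
       for i, src in enumerate(sources):
           lo = vci_min + i * span
           ranges[src] = (lo, lo + span - 1)
       return ranges

   print(assign_vci_ranges(["ingress-A", "ingress-B", "ingress-C"]))
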
VP merge makes less efficient use of the VPI/VCI space (relative to VC merge). When VP merge is used, the LSPs may not be able to transit public ATM networks that don't support SVPs.

Buffering Issues Related To Stream Merge:

There is an issue regarding the amount of buffering required for frame merge, VC merge, and VP merge. Frame merge and VC merge require that intermediate points buffer incoming packets until the entire packet arrives. This is essentially the same as is required in traditional IP routers.

VP merge allows cells to be transmitted by intermediate nodes as soon as they arrive, reducing the buffering and latency at intermediate nodes. However, the use of VP merge implies that cells from multiple packets will arrive at the egress node interleaved on separate VCIs. This in turn implies that the egress node may have somewhat increased buffering requirements. To a large extent egress nodes for some destinations will be intermediate nodes for other destinations, implying that the increase in buffers required for some purposes (egress traffic) will be offset by a reduction in buffers required for other purposes (transit traffic). Also, routers today typically deal with high-fanout channelized interfaces and with multi-VC ATM interfaces, implying that the requirement of buffering simultaneously arriving cells from multiple packets and sources is something that routers typically do today. This is not meant to imply that the required buffer size and performance is inexpensive, but rather is meant to observe that it is a solvable issue.

4.3 Loop Handling

Generally, methods for dealing with loops can be split into three categories: Loop Survival makes use of methods which minimize the impact of loops, for example by limiting the amount of network resources which can be consumed by a loop; Loop Detection allows loops to be set up, but later detects these loops and eliminates them; Loop Prevention provides methods for avoiding setting up L2 forwarding in a way which results in an L2 loop.

Note that we are concerned here only with loops that occur in L2 forwarding. Transient loops at L3 will continue to be part of normal IP operation, and will be handled the way that IP has been handling loops for years (see section 3.5).

Loop Survival:

Loop Survival refers to methods that are used to allow the network to operate well even though short term transient loops may be formed by the routing protocol. The basic approach to loop survival is to limit the amount of network resources which are consumed by looping packets, and to minimize the effect on other (non-looping) traffic. Note that loop survival is the method used by conventional IP forwarding, and is therefore based on long and relatively successful experience in the Internet.

The most basic method for loop survival is based on the use of a TTL (Time To Live) field. The TTL field is decremented at each hop. If the TTL field reaches zero, then the packet is discarded. This method works well over those media which have a TTL field. This explicitly includes L3 IP forwarding. Also, assuming that the core MPLS specifications will include the definition of a "shim" MPLS header, used to carry labels over those media which do not have their own, it is likely that the shim header will also include a TTL field.

However, there is considerable interest in using MPLS over L2 protocols which provide their own labels, with the L2 label used for MPLS forwarding. Specific L2 protocols which offer a label for this purpose include ATM and Frame Relay. However, neither ATM nor Frame Relay has a TTL field. This implies that this method cannot be used when basic ATM or Frame Relay forwarding is being used.

Another basic method for loop survival is the use of dynamic routing protocols which converge rapidly to non-looping paths. In some instances it is possible that congestion caused by looping data could affect the convergence of the routing protocol (see section 3.5). MPLS should be designed to prevent this problem from occurring. Given that MPLS uses the same routing protocols as are used for IP, this method does not need to be discussed further in this framework document.

Another possible tool for loop survival is the use of fair queuing. This allows unrelated flows of user data to be placed in different queues. This helps to ensure that a node which is overloaded with looping user data can nonetheless forward unrelated non-looping data, thereby minimizing the effect that looping data has on other data. We cannot assume that fair queuing will always be available. In practice, many fair queuing implementations merge multiple streams into one queue (implying that the number of queues used is less than the number of user data flows which are present in the network). This implies that any data which happens to be in the same queue with looping data may be adversely affected.

Loop Detection:

Loop Detection refers to methods whereby a loop may be set up at L2, but the loop is subsequently detected. When the loop is detected, it may be broken at L2 by dropping the label relationship, implying that packets for a set of destinations must be forwarded at L3.

A possible method for loop detection is based on transmitting a "loop detection" control packet (LDCP) along the path towards a specified destination whenever the route to the destination changes. This LDCP is forwarded in the direction that the label specifies, with the labels swapped to the correct next hop value.
However, normal L2 forwarding cannot be used because each hop needs to examine the packet to check for loops. The LDCP is forwarded towards that destination until one of the following happens: (i) the LDCP reaches the last MPLS node along the path (i.e., the next hop is either a router which is not participating in MPLS, or is the final destination host); (ii) the TTL of the LDCP expires (assuming that the control packet uses a TTL, which is optional); or (iii) the LDCP returns to the node which originally transmitted it. If the latter occurs, then the packet has looped, and the node which originally transmitted the LDCP stops using the associated label, and instead uses L3 forwarding for the associated destination addresses. One problem with this method is that once a loop is detected it is not known when the loop clears. One option would be to set a timer, and to transmit a new LDCP when the timer expires.

An alternate method counts the hops to each egress node, based on the routes currently available. Each node advertises its distance (in hop counts) to each destination. An egress node advertises the destinations that it can reach directly with an associated hop count of zero. For each destination, a node computes the hop count to that destination by adding one to the hop count advertised by its actual next hop used for that destination. When the hop count for a particular destination changes, the hop count needs to be readvertised.

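The loop-indicating property of this hop count scheme is essentially the count-to-infinity behavior familiar from distance vector routing: around a loop, each node's count feeds its neighbor's count plus one, so the counts grow without bound. A toy illustration (Python; the threshold and names are illustrative assumptions):

   # Three nodes whose next hops form a loop for some destination.
   # Each round, every node recomputes its hop count as (next hop's
   # count + 1); counts climbing past any plausible network diameter
   # indicate a loop.
   MAX_DIAMETER = 32                           # assumed threshold
   counts = {"A": 1, "B": 2, "C": 3}
   next_hop = {"A": "B", "B": "C", "C": "A"}   # the loop

   for _ in range(100):
       counts = {n: counts[next_hop[n]] + 1 for n in counts}
       if any(c > MAX_DIAMETER for c in counts.values()):
           print("loop suspected, counts:", counts)
           break
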
In addition, the first of the loop prevention schemes discussed below may be modified to provide loop detection (the details are straightforward, but have not been written down in time to include in this rough draft).

Loop Prevention:

Loop prevention makes use of methods to ensure that loops are never set up at L2. This implies that the labels are not used until some method is used to ensure that following the label towards the destination, with associated label swaps at each switch, will not result in a loop. Until the L2 path (making use of assigned labels) is available, packets are forwarded at L3.

Loop prevention requires explicit signaling of some sort to be used when setting up an L2 stream.

One method of loop prevention requires that labels be propagated starting at the egress switch. The egress switch signals to neighboring switches the label to use for a particular destination. That switch then signals an associated label to its neighbors, etc. The control packets which propagate the labels also include the path to the egress (as a list of router IDs). Any looping control packet can therefore be detected and the path not set up to or past the looping point.

Another option is to use explicit routing to set up label bindings from the egress switch to each ingress switch. This precludes the possibility of looping, since the entire path is computed by one node. This also allows non-looping paths to be set up provided that the egress switch has a view of the topology which is reasonably close to reality (if there are operational links which the egress switch doesn't know about, it will simply pick a path which doesn't use those links; if there are links which have failed but which the egress switch thinks are operational, then there is some chance that the setup attempt will fail, but in this case the attempt can be retried on a separate path). Note therefore that non-looping paths can be set up with this method in many cases where distributed routing plus hop by hop forwarding would not actually result in non-looping paths. This method is similar to the method used by standard ATM routing to ensure that SVCs are non-looping [PNNI].

Explicit routing is only applicable if the routing protocol gives the egress switch sufficient information to set up the explicit route, implying that the protocol must be either a link state protocol (such as OSPF) or a path vector protocol (such as BGP). Source routing therefore is not appropriate as a general approach for use in any network regardless of the routing protocol. This method also requires some overhead for the call setup before label-based forwarding can be used. If the network topology changes in a manner which breaks the existing path, then a new path will need to be explicitly routed from the egress switch. Due to this overhead this method is probably only appropriate if other significant advantages are also going to be obtained from having a single node (the egress switch) coordinate the paths to be used. Examples of other reasons to have one node coordinate the paths to a single egress switch include: (i) coordinating the VCI space where VP merge is used (see section 4.2); and (ii) coordinating the routing of streams from multiple ingress switches to one egress switch so as to balance the load on multiple alternate paths through the network.

In principle the explicit routing could also be done in the alternate direction (from ingress to egress). However, this would make it more difficult to merge streams if stream merge is to be used. This would also make it more difficult to coordinate (i) changes to the paths used, (ii) the VCI space assignments, and (iii) load sharing. This therefore makes explicit routing more difficult, and also reduces the other advantages that could be obtained from the approach.

If label distribution is piggybacked on the routing protocol (see section 4.1.2), then loop prevention is only possible if the routing protocol itself does loop prevention.

What To Do If A Loop Is Detected:

With all of these schemes, if a loop is known to exist then the L2 label-swapped path is not set up. This leads to the obvious question of what an MPLS node does when it doesn't have a label for a particular destination, and a packet for that destination arrives to be forwarded. If possible, the packet is forwarded using normal L3 (IP) forwarding. There are two issues that this raises: (i) What about nodes which are not capable of L3 forwarding? (ii) Given the relative speeds of L2 and L3 forwarding, does this work?

Nodes which are not capable of L3 forwarding obviously can't forward a packet unless it arrives with a label, and the associated next hop label has been assigned. Such nodes, when they receive a packet for which the next hop label has not been assigned, must discard the packet. It is probably safe to assume that if a node cannot forward an L3 packet, then it is also incapable of forwarding an ICMP error report that it originates. This implies that the packet will need to be discarded in this case.

In many cases L2 forwarding will be significantly faster than L3 forwarding (allowing faster forwarding is a significant motivation behind the work on MPLS). This implies that if a node is forwarding a large volume of traffic at L2, and a change in the routing protocol causes the associated labels to be lost (necessitating L3 forwarding), in some cases the node will not be capable of forwarding the same volume of traffic at L3. This will of course require that packets be discarded. However, in some cases only a relatively small volume of traffic will need to be forwarded at L3. Thus forwarding at L3 when L2 is not available is not necessarily always a problem. There may be some nodes which are capable of forwarding equally fast at L2 and L3 (for example, such nodes may contain IP forwarding hardware which is not available in all nodes). Finally, when packets are lost this will cause TCP to back off, which will in turn reduce the load on the network and allow the network to stabilize even at reduced forwarding rates until such time as the label bindings can be reestablished.

Note that in most cases loops will be caused either by configuration errors, or due to short term transient problems caused by the failure of a link. If only one link goes down, and if routing creates a normal "tree-shaped" set of paths to any one destination, then the failure of one link somewhere in the network will affect only one link's worth of data passing through any one node in the network. This implies that if a node is capable of forwarding one link's worth of data at L3, then in many or most cases it will have sufficient L3 bandwidth to handle looping data.

4.4 Interoperation with NHRP

When label switching is used over ATM, and there exists an LSR which is also operating as a Next Hop Client (NHC), the possibility of direct interaction arises. That is, could one switch cells between the two technologies without reassembly? To enable this, several important issues must be addressed.

The encapsulation must be acceptable to both MPLS and NHRP. If only a single label is used, then the null encapsulation could be used. Other solutions could be developed to handle label stacks.

NHRP must understand and respect the granularity of a stream.

Currently NHRP resolves an IP address to an ATM address. The response may include a mask indicating a range of addresses. However, any VC to the ATM address is considered to be a viable means of packet delivery. Suppose that an NHC issues an NHRP request for IP address A, gets back ATM address 1, and sets up a VC to address 1. Later the same NHC issues an NHRP request for a totally unrelated IP address B and gets back the same ATM address 1.
In this case normal NHRP behavior allows the NHC to use the VC (that was set up for destination A) for traffic to B.

Note: In this section we will refer to a VC set up as a result of an NHRP query/response as a shortcut VC.

If one expects to be able to label switch the packets being received from a shortcut VC, then the label switch needs to be informed as to exactly what traffic will arrive on that VC, and that mapping cannot change without notice. Currently no such mechanism exists in the defined signaling for a shortcut VC. Several means are possible. A binding, equivalent to the binding in LDP, could be sent in the setup message. Alternatively, the binding of prefix to label could remain in an LDP session (or whatever means of label distribution is appropriate) and the setup could carry a binding of the label to the VC. This would leave the binding mechanism for shortcut VCs independent of the label distribution mechanism.

A further architectural challenge exists in that label switching is inherently unidirectional whereas ATM is bi-directional. The above binding semantics are fairly straightforward. However, effectively using the reverse direction of a VC presents further challenges.

Label switching must also respect the granularity of the shortcut VC. Without VC merge, this means a single label switched flow must map to a VC. In the case of VC merge, multiple label switched streams could be merged onto a single shortcut VC. But given the asymmetry involved, there is perhaps little practical use for this.

Another issue is one of practicality and usefulness. What is sent over the VC must be at a fine enough granularity to be label switched through the receiving domain. One potential place where the two technologies might come into play is in moving data from one campus via the wide-area to another campus. In such a scenario, the two technologies would border precisely at the point where summarization is likely to occur. Each campus would have a detailed understanding of itself, but not of the other campus. The wide-area is likely to have summarized knowledge only. But at such a point level 3 processing becomes the likely solution.

4.5 Operation in a Hierarchy

This section is FFS.

4.6 Stacked Labels in a Flat Routing Environment

This section is FFS.

4.7 Multicast

This section is FFS.

4.8 Multipath

Many IP routing protocols support the notion of equal-cost multipath routes, in which a router maintains multiple next hops for one destination prefix when two or more equal-cost paths to the prefix exist. There are a few possible approaches for handling multipath with MPLS.

In this discussion we will use the term "multipath node" to mean a node which is keeping track of multiple switched paths from itself for a single destination.

The first approach maintains a separate switched path from each ingress node via one or more multipath nodes to a merge point. This requires MPLS to distinguish the separate switched paths, so that learning of a new switched path is not misinterpreted as a replacement of the same switched path. This also requires that an ingress MPLS node be capable of distributing the traffic among the multiple switched paths. This approach preserves switching performance, but at a cost of proliferating the number of switched paths. For example, each switched path consumes a distinct label.

The second approach establishes only one switched path from any one ingress node to a destination. However, when the paths from two different ingress nodes happen to arrive at the same node, that node may use different paths for each (implying that the node becomes a multipath node). Thus the multipath node may assign a different downstream path to each incoming stream. This conserves switched paths and maintains switching performance, but cannot balance loads across downstream links as well as the other approaches, even if switched paths are selectively assigned. A drawback of this approach is that the L2 path may be different from the normal L3 path, as traffic that otherwise would have taken multiple distinct paths is forced onto a single path.

The third approach allows a single stream arriving at a multipath node to be split into multiple streams, by using L3 forwarding at the multipath node. For example, the multipath node might choose to use a hash function on the source and destination IP addresses, in order to avoid misordering packets between any one IP source and destination. This approach conserves switched paths at the cost of switching performance.

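The hash-based splitting in the third approach can be sketched in a few lines. This is illustrative only (Python); the particular hash and the modulo split are assumptions, not a recommended algorithm:

   import zlib

   def pick_path(src_ip, dst_ip, paths):
       # Hash on the (source, destination) pair so that all packets
       # between one IP source and destination take the same path,
       # avoiding misordering within any single conversation.
       digest = zlib.crc32((src_ip + ">" + dst_ip).encode())
       return paths[digest % len(paths)]

   paths = ["LSP-1", "LSP-2", "LSP-3"]
   print(pick_path("192.0.2.1", "198.51.100.7", paths))
   print(pick_path("192.0.2.2", "198.51.100.7", paths))
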
4.9 Host Interactions

There are a range of options for host interaction with MPLS:

The most straightforward approach is no host involvement. Thus host operation may be completely independent of MPLS; rather, hosts operate according to other IP standards. If there is no host involvement then this implies that the first hop requires an L3 lookup.

If the host is ATM attached and doing NHRP, then this would allow the host to set up a virtual circuit to a router. However this brings up a range of issues as discussed in section 4.4 ("Interoperation with NHRP").

On the ingress side, it is reasonable to consider having the first hop LSR provide labels to the hosts, and thus have hosts attach labels for packets that they transmit. This could allow the first hop LSR to avoid an L3 lookup. It is reasonable here to have the host request labels only when needed, rather than require the host to remember all labels assigned for use in the network.

On the egress side, it is questionable whether hosts should be involved. For scaling reasons, it would be undesirable to use a different label for reaching each host.

4.10 Explicit Routing

There are two options for route selection: (1) hop by hop routing, and (2) explicit routing.

An explicitly routed LSP is an LSP where, at a given LSR, the LSP next hop is not chosen by each local node, but rather is chosen by a single node (usually the ingress or egress node of the LSP). The sequence of LSRs followed by an explicitly routed LSP may be chosen by configuration, or by an algorithm performed by a single node (for example, the egress node may make use of the topological information learned from a link state database in order to compute the entire path for the tree ending at that egress node).

With MPLS the explicit route needs to be specified at the time that labels are assigned, but the explicit route does not have to be specified with each L3 packet.
This implies that explicit routing with MPLS is relatively efficient (when compared with the efficiency of explicit routing for pure datagrams).

Explicit routing may be useful for a number of purposes such as allowing policy routing and/or facilitating traffic engineering.

4.10.1 Establishment of Point to Point Explicitly Routed LSPs

In order to establish a point to point explicitly routed LSP, the LDP packets used to set up the LSP must contain the explicit route. This implies that the LSP is set up in order either from the ingress to the egress, or from the egress to the ingress.

One node needs to pick the explicit route. This may be done in at least two possible ways: (i) by configuration (e.g., the explicit route may be chosen by an operator, or by a centralized server of some kind); (ii) by use of a routing protocol which allows the ingress and/or egress node to know the entire route to be followed. This would imply the use of a link state routing protocol (in which all nodes know the full topology) or of a path vector routing protocol (in which the ingress node is told the path as part of the normal operation of the routing protocol).

Note: The normal operation of path vector routing protocols (such as BGP) does not provide the full set of routers along the path. This implies that either a partial source route only would be provided (implying that LSP setup would use a combination of hop by hop and explicit routing), or it would be necessary to augment the protocol in order to provide the complete explicit route. Detailed operation in this case is FFS.

In the point to point case, it is relatively straightforward to specify the route to use: This is indicated by providing the addresses of each LSR on the LSP.

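Setting up the LSP then amounts to walking the listed LSRs in order, installing a label swap at each hop. A minimal sketch (Python; the per-hop state, the direction of setup, and the allocator are illustrative assumptions):

   def setup_explicit_lsp(route, alloc_label):
       # "route" is the ordered ingress-to-egress list of LSR
       # addresses carried in the setup message. Walk it from the
       # egress back toward the ingress, recording the incoming
       # label, outgoing label, and next hop each LSR installs.
       swaps = {}
       out_label = None
       for i in range(len(route) - 1, -1, -1):
           in_label = alloc_label(route[i])
           next_hop = route[i + 1] if i + 1 < len(route) else None
           swaps[route[i]] = (in_label, out_label, next_hop)
           out_label = in_label
       return swaps

   labels = iter(range(100, 200))
   print(setup_explicit_lsp(["lsr-a", "lsr-b", "lsr-c"],
                            lambda lsr: next(labels)))
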
4.10.2 Explicit and Hop by Hop Routing: Avoiding Loops

In general, an LSP will be explicitly routed precisely because there is a good reason to use an alternative to the hop by hop routed path. This implies that the explicit route is likely to follow a path which is inconsistent with the path followed by hop by hop routing. If some of the nodes along the path follow an explicit route but some of the nodes make use of hop by hop routing (and ignore the explicit route), then inconsistent routing may result, and in some cases loops (or severely inefficient paths) may form. This implies that for any one LSP there are two possible options: (i) the entire LSP may be hop by hop routed; or (ii) the entire LSP may be explicitly routed.

For this reason, it is important that if an explicit route is specified for setting up an LSP, then that route must be followed in setting up the LSP.

There is a related issue when a link or node in the middle of an explicitly routed LSP breaks: in this case, the last operating node on the upstream part of the LSP will continue receiving packets, but will not be able to forward them along the explicitly routed LSP (since its next hop is no longer functioning). In this case it is not, in general, safe for this node to forward the packets using L3 forwarding with hop by hop routing. Instead, the packets must be discarded, and the upstream portion of the explicitly routed LSP must be torn down.

Where part of an explicitly routed LSP breaks, the node which originated the LSP needs to be told about this. For robustness reasons the MPLS protocol design should not assume that the routing protocol will tell the node which originated the LSP. For example, it is possible that a link may go down and come back up quickly enough that the routing protocol never declares the link down. Rather, an explicit MPLS mechanism is needed.

4.10.3 Merge and Explicit Routing

Explicit routing is slightly more complex with a multipoint to point LSP (i.e., in the case that stream merge is used).

In this case, it is not possible to specify the route for the LSP as a simple list of LSRs (since the LSP does not consist of a simple sequence of LSRs). Rather, the explicit route must specify a tree. There are several ways that this may be accomplished. Details are FFS.

4.10.4 Using Explicit Routing for Traffic Engineering

In the Internet today it is relatively common for ISPs to make use of a Frame Relay or ATM core which interconnects a number of IP routers. The primary reason for use of a switching (L2) core is to make use of low cost equipment which provides very high speed forwarding. However, there is another very important reason for the use of an L2 core: to allow for traffic engineering.

Traffic engineering (also known as bandwidth management) refers to the process of managing the routes followed by user data traffic in a network in order to provide relatively equal and efficient loading of the resources in the network (i.e., to ensure that the load on links and nodes is within the capabilities of those links and nodes).

Some rudimentary level of traffic engineering can be accomplished with pure datagram routing and forwarding by adjusting the metrics assigned to links. For example, suppose that there is a given link in a network which tends to be overloaded on a long term basis. One option would be to manually configure an increased metric value for this link, in the hope of moving some traffic onto alternate routes. This provides a rather crude method of traffic engineering and gives only limited results.

Another method of traffic engineering is to manually configure multiple PVCs across an L2 core, and to adjust the route followed by each PVC in an attempt to equalize the load on different parts of the network. Where necessary, multiple PVCs may be configured between the same two nodes, in order to allow traffic to be split between different paths. In some topologies it is much easier to achieve efficient non-overlapping or minimally overlapping paths via this method (with manually configured paths) than it would be with pure datagram forwarding. A similar ability can be achieved with MPLS via manual configuration of the paths taken by LSPs.
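As an illustration of this kind of manual configuration, the following sketch (in a hypothetical format; MPLS does not define a configuration syntax) shows two explicitly routed LSPs between the same ingress and egress, placed on minimally overlapping paths so that traffic may be split between them:

   # Two parallel explicitly routed LSPs between ingress A and egress F.
   # Node names and the configuration layout are illustrative only.
   lsp_config = [
       {"name": "A-F-north", "explicit_route": ["A", "B", "C", "F"]},
       {"name": "A-F-south", "explicit_route": ["A", "D", "E", "F"]},
   ]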
A related issue is the decision on where merge is to occur. Note that once two streams merge into one stream (forwarded using a single label), they cannot diverge again at that level of the MPLS hierarchy (i.e., they cannot be bifurcated without looking at a higher level label or the IP header). Thus there may be times when it is desirable to explicitly NOT merge two streams even though they are destined to the same egress node and FEC. Non-merge may be appropriate either because the streams will want to diverge later in the path (for example, to avoid overloading a particular downstream link), or because the streams may want to use different physical links in the case where multiple slower physical links are being aggregated into a single logical link for the purpose of IP routing.

As a network grows to a very large size (on the order of hundreds of LSRs), it becomes increasingly difficult to handle the assignment of all routes via manual configuration. However, explicit routing allows several alternatives:

1. Partial Configuration: One option is to use automatic/dynamic routing for most of the paths through the network, but then manually configure some routes. For example, suppose that full dynamic routing would result in a particular link being overloaded. One of the LSPs which uses that link could be selected and manually routed to use a different path.

2. Central Computation: Another option would be to provide long term network usage information to a single central management facility. That facility could then run a global optimization to compute a set of paths to use. Network management commands can then be used to configure LSRs with the correct routes.

3. Egress Computation: An egress node can run a computation which optimizes the paths followed by traffic to itself. This cannot, of course, optimize the entire traffic load through the network, but can include optimization of traffic from multiple ingresses to one egress. The reason for optimizing traffic to a single egress, rather than from a single ingress, relates to the issue of when to merge: an ingress can never merge the traffic from itself to different egresses, but an egress can, if desired, choose to merge the traffic from multiple ingresses to itself.

4.10.5 Using Explicit Routing for Policy Routing

This section is FFS.

4.11 Traceroute

This section is FFS.

4.12 LSP Control: Egress versus Local

There is a choice to be made regarding whether the initial setup of LSPs will be initiated by the egress node, or locally by each individual node.

When LSP control is done locally, then each node may at any time pass label bindings to its neighbors for each FEC recognized by that node. In the normal case that the neighboring nodes recognize the same FECs, nodes may map incoming labels to outgoing labels as part of the normal label swapping forwarding method.

When LSP control is done by the egress, then initially (on startup) only the egress node passes label bindings to its neighbors, corresponding to any FECs which leave the MPLS network at that egress node. When initializing, other nodes wait until they get a label from downstream for a particular FEC before passing a corresponding label for the same FEC to upstream nodes.
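The following is a minimal sketch, under assumed node and topology abstractions (the Node attributes and methods used here are not defined by this framework), of how egress control with downstream label assignment might behave; the local-control case differs only in that every node may also originate bindings at startup:

   # Sketch of egress LSP control with downstream label assignment.
   # Node attributes and methods are assumptions made for illustration.

   def on_startup(node):
       # Only the egress originates bindings, for FECs that exit the
       # MPLS network at this node.
       for fec in node.egress_fecs:
           advertise_upstream(node, fec, node.assign_label(fec))

   def on_binding_from_downstream(node, fec, downstream_label):
       # Interior nodes wait for a downstream label before advertising
       # one of their own, so bindings "bubble up" toward the ingress.
       node.install_mapping(fec, out_label=downstream_label)
       advertise_upstream(node, fec, node.assign_label(fec))

   def advertise_upstream(node, fec, label):
       for neighbor in node.upstream_neighbors(fec):
           neighbor.receive_binding(fec, label)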
With local control, since each LSR is (at least initially) independently assigning labels to FECs, it is possible that different LSRs may make inconsistent decisions. For example, an upstream LSR may make a coarse decision (map multiple IP address prefixes to a single label) while its downstream neighbor makes a finer grain decision (map each individual IP address prefix to a separate label). With downstream label assignment this can be corrected by having LSRs withdraw labels that they have assigned which are inconsistent with downstream labels, and replace them with new, consistent label assignments.

This may appear to be an advantage of egress LSP control (since with egress control the initial label assignments "bubble up" from the egress to upstream nodes, and consistency is therefore easy to ensure). However, even with egress control it is possible that the choice of egress node may change, or that the egress may (based on a change in configuration) change its mind in terms of the granularity which is to be used. This implies that the same mechanism will be necessary to allow changes in granularity to bubble up to upstream nodes. The choice of egress or local control may therefore affect the frequency with which this mechanism is used, but does not affect the need for a mechanism to achieve consistency of label granularity.

Egress control and local control can interwork in a very straightforward manner: with either approach (assuming downstream label assignment), the egress node will initially assign labels for particular FECs and will pass these labels to its neighbors. With either approach these label assignments will bubble upstream, with the upstream nodes choosing labels that are consistent with the labels that they receive from downstream.

The difference between the two techniques therefore becomes a tradeoff between avoiding a short period of initial thrashing on startup (in the sense of avoiding the need to withdraw inconsistent labels which may have been assigned using local control) versus the imposition of a short delay on initial startup (while waiting for the initial label assignments to bubble up from downstream). The protocol mechanisms which need to be defined are the same in either case, and the steady state operation is the same in either case.

4.13 Security

Security in a network using MPLS should be relatively similar to security in a normal IP network.

Routing in an MPLS network uses precisely the same IP routing protocols as are currently used with IP. This implies that route filtering is unchanged from current operation. Similarly, the security of the routing protocols is not affected by the use of MPLS.

Packet filtering also may be done as in normal IP. This will require either (i) that label swapping be terminated prior to any firewalls performing packet filtering (in which case a separate instance of label swapping may optionally be started after the firewall); or (ii) that firewalls "look past the labels" in order to inspect the entire IP packet contents. In the latter case, note that the label may imply semantics greater than those contained in the packet header: in particular, a particular label value may imply that the packet is to take a particular path after the firewall. In environments in which this is considered to be a security issue, it may be desirable to terminate the label prior to the firewall.

Note that in principle labels could be used to speed up the operation of firewalls: in particular, the label could be used as an index into a table which indicates the characteristics that the packet needs to have in order to pass through the firewall. Depending upon implementation considerations, matching the contents of the packet against the contents of the table may be quicker than parsing the packet in the absence of the label.
Authors' Addresses

Ross Callon
Ascend Communications, Inc.
1 Robbins Road
Westford, MA 01886
508-952-7412
rcallon@casc.com

Paul Doolan
Cisco Systems, Inc.
250 Apollo Drive
Chelmsford, MA 01824
508-634-1204
pdoolan@cisco.com

Nancy Feldman
IBM Corp.
17 Skyline Drive
Hawthorne, NY 10532
914-784-3254
nkf@vnet.ibm.com

Andre Fredette
Bay Networks, Inc.
3 Federal Street
Billerica, MA 01821
508-916-8524
fredette@baynetworks.com

George Swallow
Cisco Systems, Inc.
250 Apollo Drive
Chelmsford, MA 01824
508-244-8143
swallow@cisco.com

Arun Viswanathan
IBM Corp.
17 Skyline Drive
Hawthorne, NY 10532
914-784-3273
arunv@vnet.ibm.com