Network Working Group                                          R. Callon
INTERNET DRAFT                                     Ascend Communications
                                                               P. Doolan
                                                           Cisco Systems
                                                              N. Feldman
                                                               IBM Corp.
                                                             A. Fredette
                                                            Bay Networks
                                                              G. Swallow
                                                           Cisco Systems
                                                          A. Viswanathan
                                                               IBM Corp.
                                                           July 30, 1997
                                                   Expires Jan. 30, 1998

            A Framework for Multiprotocol Label Switching

Status of this Memo

This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas,
and its working groups. Note that other groups may also distribute
working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as ``work in progress.''

To learn the current status of any Internet-Draft, please check the
``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow
Directories on ds.internic.net (US East Coast), nic.nordu.net
(Europe), ftp.isi.edu (US West Coast), or munnari.oz.au (Pacific
Rim). Distribution of this memo is unlimited.

Abstract

This document discusses technical issues and requirements for the
Multiprotocol Label Switching working group. This is an initial draft
document, which will evolve and expand over time. It is the intent of
this document to produce a coherent description of all significant
approaches which were and are being considered by the working group.
Selection of specific approaches, making choices regarding
engineering tradeoffs, and detailed protocol specification are
outside the scope of this framework document.

Note that this document is at an early stage, and that most of the
detailed technical discussion is only in a rough form. Additional
text will be provided over time from a number of sources. A small
amount of the text in this document may be redundant with the
proposed protocol architecture for MPLS. This redundancy will be
reduced over time, with the overall discussion of issues moved to
this document, and the selection of specific approaches and
specification of the protocol contained in the protocol architecture
and other related documents.

Acknowledgments

The ideas and text in this document have been collected from a number
of sources and comments received. We would like to thank Jim Luciani,
Andy Malis, Yakov Rekhter, Eric Rosen, and Vijay Srinivasan for their
inputs and ideas.

1. Introduction and Requirements

1.1 Overview of MPLS

The primary goal of the MPLS working group is to standardize a base
technology that integrates the label swapping forwarding paradigm
with network layer routing. This base technology (label swapping) is
expected to improve the price/performance of network layer routing,
improve the scalability of the network layer, and provide greater
flexibility in the delivery of (new) routing services (by allowing
new routing services to be added without a change to the forwarding
paradigm).

The initial MPLS effort will be focused on IPv4 and IPv6. However,
the core technology will be extendible to multiple network layer
protocols (e.g., IPX, AppleTalk, DECnet, CLNP). MPLS is not confined
to any specific link layer technology; it can work with any media
over which network layer packets can be passed between network layer
entities.

MPLS makes use of a routing approach whereby the normal mode of
operation is that L3 routing (e.g., existing IP routing protocols
and/or new IP routing protocols) is used by all nodes to determine
the routed path.

MPLS provides a simple "core" set of mechanisms which can be applied
in several ways to provide rich functionality. The core effort
includes:

a) Semantics assigned to a stream label:

   - Labels are associated with specific streams of data

b) Forwarding methods:

   - Forwarding is simplified by the use of short fixed length
     labels to identify streams

   - Forwarding may require simple functions such as looking up a
     label in a table, swapping labels, and possibly decrementing
     and checking a TTL

   - In some cases MPLS may make direct use of underlying layer 2
     forwarding, such as is provided by ATM or Frame Relay
     equipment

c) Label distribution methods:

   - Allow nodes to determine which labels to use for specific
     streams

   - This may use some sort of control exchange, and/or be
     piggybacked on a routing protocol

The MPLS working group will define the procedures and protocols used
to assign significance to the forwarding labels and to distribute
that information between cooperating MPLS forwarders.
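To make the label swapping operation under (b) concrete, the
following is a minimal sketch of a single forwarding step. It is
purely illustrative; the table contents, field names, and TTL
handling are assumptions made for this example, not anything
mandated by this framework.

   # Illustrative only: a minimal label-swap forwarding step.
   # The table maps an incoming (port, label) pair to the outgoing
   # label and port; names and values are invented for this sketch.

   label_table = {
       # (in_port, in_label): (out_label, out_port)
       (1, 17): (42, 3),
       (2, 99): (17, 1),
   }

   def forward(in_port, in_label, ttl):
       """Look up the incoming label, decrement and check the TTL,
       and return the swapped label, outgoing port, and new TTL."""
       if ttl <= 1:
           return None   # TTL expired; discard the packet
       entry = label_table.get((in_port, in_label))
       if entry is None:
           return None   # no binding; fall back to L3 or discard
       out_label, out_port = entry
       return out_label, out_port, ttl - 1

Note that the entire forwarding decision is a single exact-match
lookup plus a TTL check, which is the source of the simplification
discussed in this section.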
1.2 Requirements

- MPLS forwarding MUST simplify packet forwarding in order to do the
  following:

  - lower cost of high speed forwarding

  - improve forwarding performance

- MPLS core technologies MUST be general with respect to data link
  technologies (i.e., work over a very wide range of underlying data
  links). Specific optimizations for particular media MAY be
  considered.

- MPLS core technologies MUST be compatible with a wide range of
  routing protocols, and MUST be capable of operating independently
  of the underlying routing protocols. It has been observed that
  considerable optimizations can be achieved in some cases by small
  enhancements of existing protocols. Such enhancements MAY be
  considered in the case of IETF standard routing protocols, and if
  appropriate, coordinated with the relevant working group(s).

- Routing protocols which are used in conjunction with MPLS might
  be based on distributed computation. As such, during routing
  transients, these protocols may compute forwarding paths which
  potentially contain loops. MPLS MUST provide protocol mechanisms
  to either prevent the formation of loops and/or contain the
  amount of (networking) resources that can be consumed due to the
  presence of loops.

- MPLS forwarding MUST allow "aggregate forwarding" of user data;
  i.e., allow streams to be forwarded as a unit and ensure that an
  identified stream takes a single path, where a stream may consist
  of the aggregate of multiple flows of user data. MPLS SHOULD
  provide multiple levels of aggregation support (e.g., from
  individual end-to-end application flows at one extreme, to
  aggregates of all flows passing through a specified switch or
  router at the other extreme).

- MPLS MUST support operations, administration, and maintenance
  facilities at least as extensive as those supported in current IP
  networks. Current network management and diagnostic tools SHOULD
  continue to work in order to provide some backward compatibility.
  Where such tools are broken by MPLS, hooks MUST be supplied to
  allow equivalent functionality to be created.

- MPLS core technologies MUST work with both unicast and multicast
  streams.

- The MPLS core specifications MUST clearly state how MPLS operates
  in a hierarchical network.

- Scalability issues MUST be considered and analyzed during the
  definition of MPLS. Very scalable solutions MUST be sought.

- MPLS core technologies MUST be capable of working with O(n) streams
  to switch all best-effort traffic, where n is the number of nodes
  in an MPLS domain. MPLS protocol standards MUST be capable of taking
  advantage of hardware that supports stream merging where
  appropriate. Note that O(n-squared) streams or VCs might also be
  appropriate for use in some cases.

- The core set of MPLS standards, along with existing Internet
  standards, MUST be a self-contained solution. For example, the
  proposed solution MUST NOT require specific hardware features that
  do not commonly exist on network equipment at the time that the
  standard is complete. However, the solution MAY make use of
  additional optional hardware features (e.g., to optimize
  performance).

- The MPLS protocol standards MUST support multipath routing and
  forwarding.

- MPLS MUST be compatible with the IETF Integrated Services Model,
  including RSVP.
- It MUST be possible for MPLS switches to coexist with non-MPLS
  switches in the same switched network. MPLS switches SHOULD NOT
  impose additional configuration on non-MPLS switches.

- MPLS MUST allow "ships in the night" operation with existing layer
  2 switching protocols (e.g., ATM Forum Signaling); i.e., MPLS must
  be capable of being used in the same network which is also
  simultaneously operating standard layer 2 protocols.

- The MPLS protocol MUST support both topology-driven and
  traffic/request-driven label assignments.

1.3 Terminology

aggregate stream

   synonym of "stream"

DLCI

   a label used in Frame Relay networks to identify frame relay
   circuits

flow

   a single instance of an application to application flow of data
   (as in the RSVP and IFMP use of the term "flow")

forwarding equivalence class

   a group of L3 packets which are forwarded in the same manner
   (e.g., over the same path, with the same forwarding treatment). A
   forwarding equivalence class is therefore the set of L3 packets
   which could safely be mapped to the same label. Note that there
   may be reasons that packets from a single forwarding equivalence
   class may be mapped to multiple labels (e.g., when stream merge
   is not used).

frame merge

   stream merge, when it is applied to operation over frame based
   media, so that the potential problem of cell interleave is not an
   issue.

label

   a short fixed length physically contiguous locally significant
   identifier which is used to identify a stream

label information base

   the database of information containing label bindings

label swap

   the basic forwarding operation consisting of looking up an
   incoming label to determine the outgoing label, encapsulation,
   port, and other data handling information.

label swapping

   a forwarding paradigm allowing streamlined forwarding of data by
   using labels to identify streams of data to be forwarded.

label switched hop

   the hop between two MPLS nodes, on which forwarding is done using
   labels.

label switched path

   the path created by the concatenation of one or more label
   switched hops, allowing a packet to be forwarded by swapping
   labels from an MPLS node to another MPLS node.

layer 2

   the protocol layer under layer 3 (which therefore offers the
   services used by layer 3). Forwarding, when done by the swapping
   of short fixed length labels, occurs at layer 2 regardless of
   whether the label being examined is an ATM VPI/VCI, a frame relay
   DLCI, or an MPLS label.
layer 3

   the protocol layer at which IP and its associated routing
   protocols operate

link layer

   synonymous with layer 2

loop detection

   a method of dealing with loops in which loops are allowed to be
   set up, and data may be transmitted over the loop, but the loop
   is later detected and closed

loop prevention

   a method of dealing with loops in which data is never transmitted
   over a loop

label stack

   an ordered set of labels

loop survival

   a method of dealing with loops in which data may be transmitted
   over a loop, but means are employed to limit the amount of
   network resources which may be consumed by the looping data

label switching router

   an MPLS node which is capable of forwarding native L3 packets

merge point

   the node at which multiple streams and switched paths are
   combined into a single stream sent over a single path. In the
   case that the multiple paths are not combined prior to the egress
   node, then the egress node becomes the merge point.

Mlabel

   abbreviation for MPLS label

MPLS core standards

   the standards which describe the core MPLS technology

MPLS domain

   a contiguous set of nodes which operate MPLS routing and
   forwarding and which are also in one Routing or Administrative
   Domain

MPLS edge node

   an MPLS node that connects an MPLS domain with a node which is
   outside of the domain, either because it does not run MPLS,
   and/or because it is in a different domain. Note that if an LSR
   has a neighboring host which is not running MPLS, then that LSR
   is an MPLS edge node.

MPLS egress node

   an MPLS edge node in its role in handling traffic as it leaves an
   MPLS domain

MPLS ingress node

   an MPLS edge node in its role in handling traffic as it enters an
   MPLS domain

MPLS label

   a label placed in a short MPLS shim header used to identify
   streams

MPLS node

   a node which is running MPLS. An MPLS node will be aware of MPLS
   control protocols, will operate one or more L3 routing protocols,
   and will be capable of forwarding packets based on labels. An
   MPLS node may optionally also be capable of forwarding native L3
   packets.

MultiProtocol Label Switching

   an IETF working group and the effort associated with the working
   group

network layer

   synonymous with layer 3

shortcut VC

   a VC set up as a result of an NHRP query and response

stack

   synonymous with label stack

stream

   an aggregate of one or more flows, treated as one aggregate for
   the purpose of forwarding in L2 and/or L3 nodes (e.g., may be
   described using a single label). In many cases a stream may be
   the aggregate of a very large number of flows. Synonymous with
   "aggregate stream".

stream merge

   the merging of several smaller streams into a larger stream, such
   that for some or all of the path the larger stream can be
   referred to using a single label.

switched path

   synonymous with label switched path

virtual circuit

   a circuit used by a connection-oriented layer 2 technology such
   as ATM or Frame Relay, requiring the maintenance of state
   information in layer 2 switches.

VC merge

   stream merge when it is specifically applied to VCs, specifically
   so as to allow multiple VCs to merge into one single VC

VP merge

   stream merge when it is applied to VPs, specifically so as to
   allow multiple VPs to merge into one single VP.
In this 420 case the VCIs need to be unique. This allows cells from 421 different sources to be distinguished via the VCI. 423 VPI/VCI 425 a label used in ATM networks to identify circuits 427 1.4 Acronyms and Abbreviations 429 DLCI Data Link Circuit Identifier 431 FEC Forwarding Equivalence Class 433 ISP Internet Service Provider 434 LIB Label Information Base 436 LDP Label Distribution Protocol 438 L2 Layer 2 440 L3 Layer 3 442 LSP Label Switched Path 444 LSR Label Switching Router 446 MPLS MultiProtocol Label Switching 448 MPT Multipoint to Point Tree 450 NHC Next Hop (NHRP) Client 452 NHS Next Hop (NHRP) Server 454 VC Virtual Circuit 456 VCI Virtual Circuit Identifier 458 VPI Virtual Path Identifier 460 2. Discussion of Core MPLS Components 462 2.1 The Basic Routing Approach 464 Routing is accomplished through the use of standard L3 routing 465 protocols, such as OSPF and BGP. The information maintained by the 466 L3 routing protocols is then used to distribute labels to neighboring 467 nodes that are used in the forwarding of packets as described below. 468 In the case of ATM networks, the labels that are distributed are 469 VPI/VCIs and a separate protocol (i.e., PNNI) is not necessary for 470 the establishment of VCs for IP forwarding. 472 The topological scope of a routing protocol (i.e. routing domain) and 473 the scope of label switching MPLS-capable nodes may be different. 474 For example, MPLS-knowledgeable and MPLS-ignorant nodes, all of which 475 are OSPF routers, may be co-resident in an area. In the case that 476 neighboring routers know MPLS, labels can be exchanged and used. 478 Neighboring MPLS routers may use configured PVCs or PVPs to tunnel 479 through non-participating ATM or FR switches. 481 2.2 Labels 483 In addition to the single routing protocol approach discussed above, 484 the other key concept in the basic MPLS approach is the use of short 485 fixed length labels to simply user data forwarding. 487 2.2.1 Label Semantics 489 It is important that the MPLS solutions are clear about what 490 semantics (i.e., what knowledge of the state of the network) is 491 implicit in the use of labels for forwarding user data packets or 492 cells. 494 At the simplest level, a label may be thought of as nothing more than 495 a shorthand for the packet header, in order to index the forwarding 496 decision that a router would make for the packet. In this context, 497 the label is nothing more than a shorthand for an aggregate stream of 498 user data. 500 This observation leads to one possible very simple interpretation 501 that the "meaning" of the label is a strictly local issue between two 502 neighboring nodes. With this interpretation: (i) MPLS could be 503 employed between any two neighboring nodes for forwarding of data 504 between those nodes, even if no other nodes in the network 505 participate in MPLS; (ii) When MPLS is used between more than two 506 nodes, then the operation between any two neighboring nodes could be 507 interpreted as independent of the operation between any other pair of 508 nodes. This approach has the advantage of semantic simplicity, and of 509 being the closest to pure datagram forwarding. However this approach 510 (like pure datagram forwarding) has the disadvantage that when a 511 packet is forwarded it is not known whether the packet is being 512 forwarded into a loop, into a black hole, or towards links which have 513 inadequate resources to handle the traffic flow. 
These disadvantages
are unavoidable with pure datagram forwarding, but are optional
design choices to be made when label switching is being used.

There are cases where it would be desirable to have additional
knowledge implicit in the existence of the label. For example, one
approach to avoiding loops (see section x.x below) involves signaling
the label distribution along a path before packets are forwarded on
that path. With this approach the fact that a node has a label to use
for a particular IP packet would imply the knowledge that following
the label (including label swapping at subsequent nodes) leads to a
non-looping path which makes progress towards the destination
(something which is usually, but not necessarily always, true when
using pure datagram routing). This would of course require some sort
of label distribution/setup protocol which signals along the path
being set up before the labels are available for packet forwarding.

However, there are also other consequences to having additional
semantics associated with the label: specifically, procedures are
needed to ensure that the semantics are correct. For example, if the
fact that you have a label for a particular destination implies that
there is a loop-free path, then when the path changes some procedures
are required to ensure that it is still loop free. Another example of
semantics which could be implicit in a label is the identity of the
higher level protocol type which is encoded using that label value.

In either case, the specific value of a label to use for a stream is
strictly a local issue; however, the decision about whether to use
the label may be based on some global (or at least wider scope)
knowledge that, for example, the label-switched path is loop-free
and/or has the appropriate resources.

A similar example occurs in ATM networks: with standard ATM a
signaling protocol is used which both reserves resources in switches
along the path, and which ensures that the path is loop-free and
terminates at the correct node. Thus implicit in the fact that an ATM
node has a VPI/VCI for forwarding a particular piece of data is the
knowledge that the path has been set up successfully.

Another similar example occurs with multipoint to point trees over
ATM (see section xx below), where the multipoint to point tree uses a
VP, and cell interleave at merge points in the tree is handled by
giving each source on the tree a distinct VCI within the VP. In this
case, the fact that each source has a known VPI/VCI to use needs to
(implicitly or explicitly) imply the knowledge that the VCI assigned
to that source is unique within the context of the VP.

In general, labels are used to optimize how the system works, not to
control how the system works. For example, the routing protocol
determines the path that a packet follows. The presence or absence of
a label assignment should not affect the path of an L3 packet. Note
however that the use of labels may make capabilities such as explicit
routes, loadsharing, and multipath more efficient.

2.2.2 Label Granularity

Labels are used to create a simple forwarding paradigm. The
essential element in assigning a label is that the device which will
be using the label to forward packets will be forwarding all packets
with the same label in the same way.
If the packet is to be
forwarded solely by looking at the label, then at a minimum, all
packets with the same incoming label must be forwarded out the same
port(s) with the same encapsulation(s), and with the same next hop
label (if any).

The term "forwarding equivalence class" is used to refer to a set of
L3 packets which are all forwarded in the same manner by a particular
LSR (for example, the IP packets in a forwarding equivalence class
may be destined for the same egress from an MPLS network, and may be
associated with the same QoS class). A forwarding equivalence class
is therefore the set of L3 packets which could safely be mapped to
the same label. Note that there may be reasons that packets from a
single forwarding equivalence class may be mapped to multiple labels
(e.g., when stream merge is not used).

Note that the label could also mean "ignore this label and forward
based on what is contained within," where within one might find a
label (if a stack of labels is used) or a layer 3 packet.

For IP unicast traffic, the granularity of a label allows various
levels of aggregation in a Label Information Base (LIB). At one end
of the spectrum, a label could represent a host route (i.e., the full
32 bits of the IP address). If a router forwards an entire CIDR
prefix in the same way, it may choose to use one label to represent
that prefix. Similarly, if the router is forwarding several
(otherwise unrelated) CIDR prefixes in the same way, it may choose to
use the same label for this set of prefixes. For instance, all CIDR
prefixes which share the same BGP Next Hop could be assigned the same
label. Taking this to the limit, an egress router may choose to
advertise all of its prefixes with the same label.

By introducing the concept of an egress identifier, the distribution
of labels associated with groups of CIDR prefixes can be simplified.
For instance, an egress identifier might specify the BGP Next Hop,
with all prefixes routed to that next hop receiving the label
associated with that egress identifier. Another natural place to
aggregate would be the MPLS egress router. This would work
particularly well in conjunction with a link-state routing protocol,
where the association between egress router and CIDR prefix is
already distributed throughout an area.

For IP multicast, the natural binding of a label would be to a
multicast tree, or rather to the branch of a tree which extends from
a particular port. Thus for a shared tree, the label corresponds to
the multicast group (*,G). For (S,G) state, the label would
correspond to the source address and the multicast group.

A label can also have a granularity finer than a host route. That
is, it could be associated with some combination of source and
destination address or other information within the packet. This
might for example be done on an administrative basis to aid in
effecting policy. A label could also correspond to all packets which
match a particular Integrated Services filter specification.

Labels can also represent explicit routes. This use is semantically
equivalent to using an IP tunnel with a complete explicit route. This
is discussed in more detail in section 4.10.
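As an illustration of two of the granularities described above, the
sketch below builds label bindings per CIDR prefix and per BGP Next
Hop (an egress identifier). The routes, next hops, and helper names
are invented for the example.

   # Illustrative only: two granularities of label binding.
   # The routes and next hops below are invented for the example.

   routes = {
       "10.1.0.0/16": "192.0.2.1",   # prefix -> BGP Next Hop
       "10.2.0.0/16": "192.0.2.1",
       "10.3.0.0/16": "192.0.2.7",
   }

   next_label = 16

   def alloc():
       global next_label
       next_label += 1
       return next_label - 1

   # Finer of the two: one label per CIDR prefix.
   per_prefix = {prefix: alloc() for prefix in routes}

   # Coarser: one label per BGP Next Hop (egress identifier),
   # shared by every prefix routed through that next hop.
   per_next_hop = {}
   for prefix, nh in routes.items():
       if nh not in per_next_hop:
           per_next_hop[nh] = alloc()

   # 3 labels at prefix granularity, 2 at next-hop granularity;
   # coarser granularity conserves label space and state.

The coarser the granularity, the fewer labels and the less state a
network must carry, at the cost of less differentiation per label.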
2.2.3 Label Assignment

Essential to label switching is the notion of a binding between a
label and network layer routing (routes). A control component is
responsible for creating label bindings, and then distributing the
label binding information among label switches. Label assignment
involves allocating a label, and then binding a label to a route.

Label assignment can be driven by control traffic or by data traffic.
This is discussed in more detail in section 3.4.

Control traffic driven label assignment has several advantages, as
compared to data traffic driven label assignment. For one thing, it
minimizes the amount of additional control traffic needed to
distribute label binding information, as label binding information is
distributed only in response to control traffic, independent of data
traffic. It also makes the overall scheme independent of and
insensitive to the data traffic profile/pattern. Control traffic
driven creation of label bindings improves forwarding latency, as
labels are assigned before data traffic arrives, rather than being
assigned as data traffic arrives. It also simplifies the overall
system behavior, as the control plane is controlled solely by control
traffic, rather than by a mix of control and data traffic.

There are however situations where data traffic driven label
assignment is necessary. A particular case may occur with ATM without
VP or VC merge. In this case, setting up a full mesh of VCs would
require n-squared VCs. In very large networks this may be
infeasible. Instead, VCs may be set up where required for forwarding
data traffic. In this case it is generally not possible to know a
priori how many such streams may occur.

Label withdrawal is required with both control-driven and data-driven
label assignment. Label withdrawal is primarily a matter of garbage
collection, that is, collecting up unused labels so that they may be
reassigned. Generally speaking, a label should be withdrawn when the
conditions that allowed it to be assigned are no longer true. For
example, if a label is imbued with extra semantics such as
loop-freeness, then the label must be withdrawn when those extra
semantics cease to hold.

In certain cases, notably multicast, it may be necessary to share a
label space between multiple entities. If these sharing arrangements
are altered by the coming and going of neighbors, then labels which
are no longer controlled by an entity must be withdrawn and a new
label assigned.

2.2.4 Label Stack and Forwarding Operations

The basic forwarding operation consists of looking up the incoming
label to determine the outgoing label, encapsulation, port, and any
additional information which may pertain to the stream such as a
particular queue or other QoS related treatment. We refer to this
operation as a label swap.

When a packet first enters an MPLS domain, the packet is forwarded by
normal layer 3 forwarding operations with the exception that the
outgoing encapsulation will now include a label. We refer to this
operation as a label push. When a packet leaves an MPLS domain, the
label is removed. We refer to this as a label pop.

In some situations, carrying a stack of labels is useful. For
instance, both an IGP and a BGP label could be used to allow routers
in the interior of an AS to be free of BGP information. In this
scenario, the "IGP" label is used to steer the packet through the AS
and the "BGP" label is used to switch between ASes.

With a label stack, the set of label operations remains the same,
except that at some points one might push or pop multiple labels, or
pop & swap, or swap & push.
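The following sketch shows the basic operations (swap, push, pop)
acting on a label stack, as in the IGP/BGP scenario above. The
representation (a list whose last element is the top of the stack)
and the label values are assumptions made for illustration only.

   # Illustrative only: label stack operations on a packet.
   # The stack is a Python list whose last element is the top label.

   def push(stack, label):
       stack.append(label)      # e.g., ingress adds a label

   def swap(stack, new_label):
       stack[-1] = new_label    # normal label swap at each hop

   def pop(stack):
       return stack.pop()       # e.g., on leaving a tunnel/domain

   stack = []
   push(stack, 100)   # "BGP" label: selects the path between ASes
   push(stack, 17)    # "IGP" label: steers the packet across the AS
   swap(stack, 23)    # interior LSRs swap only the top (IGP) label
   pop(stack)         # popping exposes the BGP label again
   assert stack == [100]

Interior routers in this scenario touch only the top label, which is
why they can remain free of BGP information.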
2.3 Encapsulation

Label-based forwarding makes use of various pieces of information,
including a label or stack of labels, and possibly additional
information such as a TTL field. In some cases this information may
be encoded using an MPLS header; in other cases this information may
be encoded in L2 headers. Note that there may be multiple types of
MPLS headers. For example, the header used over one media type may be
different from the one used over a different media type. Similarly,
in some cases the information that MPLS makes use of may be encoded
in an ATM header. We will use the term "MPLS encapsulation" to refer
to whatever form is used to encapsulate the label information and
other information used for label based forwarding. The term "MPLS
header" will be used where this information is carried in some sort
of MPLS-specific header (i.e., when the MPLS information cannot all
be carried in an L2 header). Whether there are one or multiple forms
of possible MPLS headers is also outside of the scope of this
document.

The exact contents of the MPLS encapsulation are outside of the scope
of this document. Some fields, such as the label, are obviously
needed. Some others might or might not be standardized, based on
further study. An encapsulation scheme may make use of the following
fields:

- label
- TTL
- class of service
- stack indicator
- next header type indicator
- checksum

It is desirable to have a very short encapsulation header. For
example, a four byte encapsulation header adds to the convenience of
building a hardware implementation that forwards based on the
encapsulation header. But at the same time it is tricky to assign
such a limited number of bits to carry the above listed information
in an MPLS header. Hence careful consideration must be given to the
information chosen for an MPLS header.

A TTL value in the MPLS header may be useful in the same manner as it
is in IP. Specifically, TTL may be used to terminate packets caught
in a routing loop, and for other related uses such as traceroute. The
TTL mechanism is a simple and proven method of handling such events.
Another use of TTL is to expire packets in a network by limiting
their "time to live" and eliminating stale packets that may cause
problems for some of the higher layer protocols. When used over link
layers which do not provide a TTL field, alternate mechanisms will be
needed to replace the uses of the TTL field.

A provision for a class of service (COS) field in the MPLS header
allows multiple service classes within the same label. However, when
more sophisticated QoS is associated with a label, the COS may not
have any significance. Alternatively, the COS (like QoS) can be left
out of the header, and instead propagated with the label assignment,
but this requires that a separate label be assigned to each required
class of service. Nevertheless, the COS mechanism provides a simple
method of segregating flows within a label.

As previously mentioned, the encapsulation header can be used to
derive the benefits of tunneling (or stacking).

The MPLS header must provide a way to indicate that multiple MPLS
headers are stacked (i.e., the "stack indicator"). For this purpose
a single bit in the MPLS header will suffice. In addition, there are
also some benefits to indicating the type of the protocol header
following the MPLS header (i.e., the "next header type indicator").
One option would be to combine the stack indicator and next header
type indicator into a single value (i.e., the next header type
indicator could be allowed to take the value "MPLS header"). Another
option is to have the next header type indicator be implicit in the
label value (such that this information would be propagated along
with the label).

There is no compelling reason to support a checksum field in the MPLS
header. A CRC mechanism at the L2 layer should be sufficient to
ensure the integrity of the MPLS header.
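As a purely hypothetical illustration of how tightly such fields
must be packed into a four byte header, the sketch below encodes a
label, COS value, stack indicator, and TTL into 32 bits. The field
widths chosen here are assumptions invented for the example; this
framework does not define an encoding.

   # Illustrative only: one hypothetical packing of a 4-byte header.
   # Field widths (20-bit label, 3-bit COS, 1-bit stack indicator,
   # 8-bit TTL) are assumptions for the example, not a specification.

   import struct

   def pack_hdr(label, cos, bottom_of_stack, ttl):
       assert 0 <= label < (1 << 20) and 0 <= cos < 8 and 0 <= ttl < 256
       word = (label << 12) | (cos << 9) | (int(bottom_of_stack) << 8) | ttl
       return struct.pack("!I", word)   # network byte order

   def unpack_hdr(data):
       (word,) = struct.unpack("!I", data)
       return {
           "label": word >> 12,
           "cos": (word >> 9) & 0x7,
           "bottom_of_stack": bool((word >> 8) & 0x1),
           "ttl": word & 0xFF,
       }

   hdr = pack_hdr(label=17, cos=2, bottom_of_stack=True, ttl=64)
   assert unpack_hdr(hdr)["label"] == 17

Any real encoding would have to trade bits among these fields; a
checksum, for instance, simply does not fit in four bytes, which is
consistent with the observation above that L2 CRC coverage suffices.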
3. Observations, Issues and Assumptions

3.1 Layer 2 versus Layer 3 Forwarding

MPLS uses L2 forwarding as a way to provide simple and fast packet
forwarding capability. One primary reason for the simplicity of L2
forwarding comes from its short, fixed length labels. A node
forwarding at L3 must parse a (relatively) large header, and perform
a longest-prefix match to determine a forwarding path. However, when
a node performs L2 label swapping, and labels are assigned properly,
it can do a direct index lookup into its forwarding (or in this case,
label-swapping) table with the short header. It is arguably simpler
to build label swapping hardware than it is to build L3 forwarding
hardware because the label swapping function is less complex.

The relative performance of L2 and L3 forwarding may differ
considerably between nodes. Some nodes may show an order of magnitude
difference. Other nodes (for example, nodes with more extensive L3
forwarding hardware) may have identical performance at L2 and L3.
However, some nodes may not be capable of doing L3 forwarding at all
(e.g., ATM switches), or may have such limited capacity as to be
unusable at L3. In this situation, traffic must be blackholed if no
switched path exists.

On nodes in which L3 forwarding is slower than L2 forwarding, pushing
traffic to L3 when no L2 path is available may cause congestion. In
some cases this could cause data loss (since L3 may be unable to keep
up with the increased traffic). However, if data is discarded, then
in general this will cause TCP to back off, which would allow control
traffic, traceroute, and other network management tools to continue
to work.

The MPLS protocol MUST NOT make assumptions about the forwarding
capabilities of an MPLS node. Thus, MPLS must propose solutions that
can leverage the benefits of a node that is capable of L3 forwarding,
but must not mandate that the node be capable of such.

Why We Will Still Need L3 Forwarding:

MPLS will not, and is not intended to, replace L3 forwarding. There
is absolutely a need for some systems to continue to forward IP
packets using normal Layer 3 IP forwarding. L3 forwarding will be
needed for a variety of reasons, including:

- For scaling; to forward on a finer granularity than the labels
  can provide
- For security; to allow packet filtering at firewalls
- For forwarding at the initial router (when hosts don't do MPLS)
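Where a node has both capabilities, the relationship between the two
forwarding paths can be sketched as below: try the direct-index
label lookup first, and fall back to a (slower) longest-prefix match
when no switched path applies. The tables, addresses, and labels are
invented for the example.

   # Illustrative only: an LSR capable of both L2 and L3 forwarding.
   # Tables, addresses, and labels are invented for the example.

   import ipaddress

   label_table = {17: (23, "if2")}        # in_label -> (out_label, port)

   l3_table = {                            # prefix -> port
       ipaddress.ip_network("10.0.0.0/8"): "if0",
       ipaddress.ip_network("10.1.2.0/24"): "if2",
   }

   def longest_prefix_match(dst):
       """Parse the destination and scan for the longest match."""
       addr = ipaddress.ip_address(dst)
       matches = [n for n in l3_table if addr in n]
       if not matches:
           return None
       return l3_table[max(matches, key=lambda n: n.prefixlen)]

   def forward(label, dst):
       if label is not None and label in label_table:
           return label_table[label]       # one exact-match step
       return longest_prefix_match(dst)    # slower L3 fallback

   assert forward(17, "10.1.2.3") == (23, "if2")
   assert forward(None, "10.9.9.9") == "if0"

The remainder of this section describes why the L3 fallback cannot
be dispensed with entirely.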
Consider a campus network which is serving a small company. Suppose
that this company makes use of the Internet, for example as a method
of communicating with customers. A customer on the other side of the
world has an IP packet to be forwarded to a particular system within
the company. It is not reasonable to expect that the customer will
have a label to use to forward the packet to that specific system.
Rather, the label used for the "first hop" forwarding might be
sufficient to get the packet considerably closer to the destination.
However, the granularity of the labels cannot extend to every host
worldwide. Similarly, routing used within one routing domain cannot
know about every host worldwide. This implies that in many cases the
labels assigned to a particular packet will be sufficient to get the
packet close to the destination, but that at some points along the
path of the packet the IP header will need to be examined to
determine a finer granularity for forwarding that packet. This is
particularly likely to occur at domain boundaries.

A similar point occurs at the last router prior to the destination
host. In general, the number of hosts attached to a network is likely
to be great enough that it is not feasible to assign a separate label
to every host. Rather, at least for routing within the destination
routing domain (or the destination area if there is a hierarchical
routing protocol in use), a label may be assigned which is sufficient
to get the packet to the last hop router. However, the last hop
router will need to examine the IP header (and particularly the
destination IP address) in order to forward the packet to the correct
destination host.

Packet filtering at firewalls is an important part of the operation
of the Internet. While the current state of Internet security may be
considerably less advanced than may be desired, nonetheless some
security (as is provided by firewalls) is much better than no
security. We expect that packet filtering will continue to be
important for the foreseeable future. Packet filtering requires
examination of the contents of the packet, including the IP header.
This implies that at firewalls the packet cannot be forwarded simply
by considering the label associated with the packet. Note that this
is also likely to occur at domain boundaries.

Finally, it is very likely that many hosts will not implement MPLS.
Rather, the host will simply forward an IP packet to its first hop
router. This first hop router will need to examine the IP header
prior to forwarding the packet (with or without a label).

3.2 Scaling Issues

MPLS scalability is provided by two of the principles of routing.
The first is that forwarding follows an inverted tree rooted at a
destination. The second is that the number of destinations is
reduced by routing aggregation.

The very nature of IP forwarding is a merged multipoint-to-point
tree. Thus, since MPLS mirrors the IP network layer, an MPLS node
that is capable of merging is capable of creating O(n) switched paths
which provide network reachability to all "n" destinations. The
meaning of "n" depends on the granularity of the switched paths. One
obvious choice of "n" is the number of CIDR prefixes existing in the
forwarding table (this scales the same as today's routing). However,
the value of "n" may be reduced considerably by choosing switched
paths of further aggregation.
For example, by creating switched paths
to each possible egress node, "n" may represent the number of egress
nodes in a network. This choice creates "n" switched paths, such that
each path is shared by all CIDR prefixes that are routed through the
same egress node. This selection greatly improves scalability, since
it minimizes "n", but at the same time maintains the same switching
performance as CIDR aggregation. (See section 2.2.2 for a description
of all of the levels of granularity provided by MPLS.)

The MPLS technology must scale at least as well as existing
technology. For example, if the MPLS technology were to support ONLY
host-to-host switched path connectivity, then the number of switched
paths would be much higher than the number of routing table entries.

There are several ways in which merging can be done in order to allow
O(n) switched paths to connect n nodes. The merging approach used has
an impact on the amount of state information, buffering, delay
characteristics, and the means of control required to coordinate the
trees. These issues are discussed in more detail in section 4.2.

There are some cases in which O(n-squared) switched paths may be used
(for example, by setting up a full mesh of point to point streams).
As label space and the amount of state information that can be
supported may be limited, it will not be possible to support
O(n-squared) switched paths in very large networks. However, in some
cases the use of n-squared paths may even be an advantage (for
example, to allow load-splitting of individual streams).

MPLS must be designed to scale for O(n). O(n) scaling allows MPLS
domains to grow very large. In addition, if best effort service can
be supported with O(n) scaling, this conserves resources (such as
label space and state information) which can be used for supporting
advanced services such as QoS. However, since some switches may not
support merging, and some small networks may not require the scaling
benefits of O(n), provisions must also be provided for a non-merging,
O(n-squared) solution.

Note: A precise and complete description of scaling would consider
that there are multiple dimensions of scaling, and multiple resources
whose usage may be considered. Possible dimensions of scaling
include: (i) the total number of streams which exist in an MPLS
domain (with associated labels assigned to them); (ii) the total
number of "label swapping pairs" which may be stored in the nodes of
the network (i.e., entries of the form "for incoming label 'x', use
outgoing label 'y'"); (iii) the number of labels which need to be
assigned for use over a particular link; (iv) the amount of state
information which needs to be maintained by any one node. We do not
intend to perform a complete analysis of all possible scaling issues,
and understand that our use of the terms "O(n)" and "O(n-squared)" is
approximate only.
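To put rough numbers on the O(n) versus O(n-squared) comparison, the
arithmetic below uses an invented node count purely for illustration.

   # Illustrative only: switched-path counts for n edge nodes.
   n = 200                   # invented number of edge MPLS devices
   merged = n                # one multipoint-to-point tree per egress
   full_mesh = n * (n - 1)   # a path per ordered ingress/egress pair
   print(merged, full_mesh)  # 200 vs. 39800 switched paths

A gap of roughly two orders of magnitude at this size is what makes
merging attractive for large domains.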
3.3 Types of Streams

Switched paths in the MPLS network can be of different types:

- point-to-point
- multipoint-to-point
- point-to-multipoint
- multipoint-to-multipoint

Two of the factors that determine which type of switched path is used
are (i) the capability of the switches employed in a network; (ii)
the purpose of the creation of a switched path, that is, the types of
flows to be carried in the switched path. These two factors also
determine the scalability of a network in terms of the number of
switched paths in use for transporting data through a network.

The point-to-point switched path can be used to connect all ingress
nodes to all the egress nodes to carry unicast traffic. In this case,
since an ingress node has point-to-point connections to all the
egress nodes, the number of connections in use for transporting
traffic is O(n-squared), where n is the number of edge MPLS devices.
For small networks the full mesh connection approach may suffice and
not pose any scalability problems. However, in large enterprise
backbone or ISP networks, this will not scale well.

Point-to-point switched paths may be used on a host-to-host or
application-to-application basis (e.g., a switched path per RSVP
flow). The dedicated point-to-point switched path transports the
unicast data from the ingress to the egress node of the MPLS network.
This approach may be used for providing QoS services or for
best-effort traffic.

A multipoint-to-point switched path connects all ingress nodes to a
single egress node. At a given intermediate node in the
multipoint-to-point switched path, L2 data units from several
upstream links are "merged" into a single label on a downstream link.
Since each egress node is reachable via a single multipoint-to-point
switched path, the number of switched paths required to transport
best-effort traffic through an MPLS network is O(n), where n is the
number of egress nodes.

The point-to-multipoint switched path is used for distributing
multicast traffic. This switched path tree mirrors the multicast
distribution tree as determined by the multicast routing protocols.
Typically a switch capable of point-to-multipoint connection
replicates an L2 data unit from the incoming (parent) interface to
all the outgoing (child) interfaces. Standard ATM switches support
such functionality in the form of point-to-multipoint VCs or VPs.

A multipoint-to-multipoint switched path may be used to combine
multicast traffic from multiple sources into a single multicast
distribution tree. The advantage of this is that the
multipoint-to-multipoint switched path is shared by multiple sources.
Conceptually, a form of multipoint-to-multipoint can be thought of as
follows: suppose that you have a point-to-multipoint VC from each
node to all other nodes, and suppose that at any point where two or
more VCs happen to merge, you merge them into a single VC or VP. This
would require either coordination of VCI spaces (so that each source
has a unique VCI within a VP) or VC merge capabilities. The
applicability of similar concepts to MPLS is FFS.

3.4 Data Driven versus Control Traffic Driven Label Assignment

A fundamental concept in MPLS is the association of labels and
network layer routing. Each LSR must assign labels, and distribute
them to its forwarding peers, for traffic which it intends to forward
by label swapping. In the various contributions that have been made
so far to the MPLS WG we identify three broad strategies for label
assignment: (i) those driven by topology-based control traffic
[TAG][ARIS][IP Navigator]; (ii) those driven by request-based control
traffic [RSVP]; and (iii) those driven by data traffic
[CSR][Ipsilon].
We also note that in actual practice combinations of these methods
may be employed. One example is the use of topology-based methods for
best-effort traffic plus request-based methods for support of RSVP.

3.4.1 Topology Driven Label Assignment

In this scheme labels are assigned in response to normal processing
of routing protocol control traffic. Examples of such control
protocols are OSPF and BGP. As an LSR processes OSPF or BGP updates
it can, as it makes or changes entries in its forwarding tables,
assign labels to those entries.

Among the properties of this scheme are:

- The computational load of assignment and distribution and the
  bandwidth consumed by label distribution are bounded by the size
  of the network.

- Labels are in the general case preassigned. If a route exists then
  a label has been assigned to it (and distributed). Traffic may be
  label swapped immediately upon arrival; there is no label setup
  latency at forwarding time.

- Requires LSRs to be able to process only the control traffic load.

- Labels assigned in response to the operation of routing protocols
  can have a granularity equivalent to that of the routes advertised
  by the protocol. Labels can, by this means, cover (highly)
  aggregated routes.

3.4.2 Request Driven Label Assignment

In this scheme labels are assigned in response to normal processing
of request-based control traffic. An example of such a control
protocol is RSVP. As an LSR processes RSVP messages it can, as it
makes or changes entries in its forwarding tables, assign labels to
those entries.

Among the properties of this scheme are:

- The computational load of assignment and distribution and the
  bandwidth consumed by label distribution are bounded by the
  amount of control traffic in the system.

- Labels are in the general case preassigned. If a request has been
  processed then a label has been assigned to the corresponding flow
  (and distributed). Traffic may be label swapped immediately upon
  arrival; there is no label setup latency at forwarding time.

- Requires LSRs to be able to process only the control traffic load.

- Depending upon the number of flows supported, this approach may
  require a larger number of labels to be assigned compared with
  topology driven assignment.

- This approach requires applications to make use of a request
  paradigm in order to get a label assigned to their flow.

3.4.3 Traffic Driven Label Assignment

In this scheme the arrival of data at an LSR "triggers" label
assignment and distribution. The traffic driven approach has the
following characteristics; a schematic contrast of the three
assignment triggers appears after this list.

- Label assignment and distribution costs are a function of traffic
  patterns. In an LSR with limited label space that is using a
  traffic driven approach to amortize its labels over a larger
  number of flows, the overhead due to label assignment and
  distribution grows as a function of the number of flows and as a
  function of their "persistence". Short lived but recurring flows
  may impose a heavy control burden.

- There is a latency associated with the appearance of a "flow" and
  the assignment of a label to it. The documented approaches to this
  problem suggest L3 forwarding during this setup phase; this has
  the potential for packet reordering (note that packet reordering
  may occur with any scheme when the network topology changes, but
  traffic driven label assignment introduces another cause for
  reordering).

- Flow driven label assignment requires high performance packet
  classification capabilities.

- Traffic driven label assignment may be useful to reduce label
  consumption (assuming that flows are not close to full mesh).

- If labels are wanted for flows to individual hosts then, due to
  limits on label space and the large number of hosts which may
  occur in a network, traffic driven label assignment is probably
  necessary.

- If you want to assign specific network resources to specific
  labels, to be used for support of application flows, then again
  the fine granularity associated with such labels may require
  traffic driven label assignment.
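As a schematic illustration of how the three strategies differ, the
sketch below ties label binding to three different trigger events.
The event names and data structures are invented for this example;
distribution of the bindings to peers is not modeled here.

   # Illustrative only: which event triggers label assignment in
   # each scheme. Names and structures are invented for the example.

   bindings = {}       # FEC (prefix or flow) -> label
   next_label = 16

   def alloc():
       global next_label
       next_label += 1
       return next_label - 1

   def bind(fec):
       if fec not in bindings:
           bindings[fec] = alloc()
           # ... distribute the new binding to peers here ...

   # Topology driven: triggered by routing updates (e.g., OSPF or
   # BGP); labels exist before any data arrives.
   def on_route_update(prefix):
       bind(prefix)

   # Request driven: triggered by request control traffic (e.g., an
   # RSVP reservation for a flow); also in place before data.
   def on_reservation(flow_id):
       bind(flow_id)

   # Traffic driven: triggered by the data itself; packets seen
   # before the binding completes must be L3 forwarded (latency).
   def on_packet(flow_id):
       bind(flow_id)
       return bindings[flow_id]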
The documented approaches to this problem suggest L3 forwarding during this setup phase; this has the potential for packet reordering (note that packet reordering may occur with any scheme when the network topology changes, but traffic driven label assignment introduces another cause for reordering).

- Flow driven label assignment requires high performance packet classification capabilities.

- Traffic driven label assignment may be useful to reduce label consumption (assuming that flows are not close to full mesh).

- If you want flows to hosts, then due to limits on label space, traffic driven label assignment is probably necessary given the large number of hosts which may occur in a network.

- If you want to assign specific network resources to specific labels, to be used for support of application flows, then again the fine granularity associated with labels may require traffic driven label assignment.

3.5 The Need for Dealing with Looping

Routing protocols which are used in conjunction with MPLS will in many cases be based on distributed computation. As such, during routing transients, these protocols may compute forwarding paths which contain loops. For this reason MPLS will be designed with mechanisms to either prevent the formation of loops and/or contain the amount of resources that can be consumed due to the presence of loops.

Note that there are a number of different alternative mechanisms which have been proposed (see section 4.3). Some of these prevent the formation of layer 2 forwarding loops, others allow loops to form but minimize their impact in one way or another (e.g., by discarding packets which loop, or by detecting and closing the loop after a period of time). Generally speaking, there are tradeoffs to be made between the amount of looping which might occur, and other considerations such as the time to convergence after a change in the paths computed by the routing algorithm.

We are not proposing any changes to normal layer 3 operation, and specifically are not trying to eliminate the possibility of looping at layer 3. Transient loops will continue to be possible in IP networks. Note that IP has a means to limit the damage done by looping packets, based on decrementing the IP TTL field as the packet is forwarded, and discarding packets whose TTL has expired. Dynamic routing protocols used with IP are also designed to minimize the amount of time during which loops exist.

The question that MPLS has to deal with is what to do at L2. In some cases L2 may make use of the same method that is used at L3. However, other options are available at L2, and in some cases (specifically when operating over ATM or Frame Relay hardware) the method of decrementing a TTL field (or any similar field) is not available.

There are basically two problems caused by packet looping: The most obvious problem is that packets are not delivered to the correct destination. The other result of looping is congestion. Even with TTL decrementing and packet discard, there may still be a significant amount of time that packets travel through a loop. This can adversely affect other packets which are not looping: Congestion due to the looping packets can cause non-looping packets to be delayed and/or discarded.

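As a point of reference, the TTL-based damage limitation used at L3 can be stated in a few lines. The following is a minimal sketch (Python, with illustrative names), not a description of any particular forwarding implementation:

   # A packet caught in a three-node loop is discarded once its TTL
   # expires, bounding the resources that the loop can consume.
   def traverse_loop(ttl, loop_nodes):
       hops = 0
       while True:
           node = loop_nodes[hops % len(loop_nodes)]
           ttl -= 1
           hops += 1
           if ttl == 0:
               return "discarded at %s after %d hops" % (node, hops)

   print(traverse_loop(64, ["A", "B", "C"]))

Note that no equivalent bound exists when the looping data units carry no TTL, as is the case for native ATM or Frame Relay forwarding.
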
Looping is particularly serious in (at least) three cases: One is when forwarding over ATM. Since ATM does not have a TTL field to decrement, there is no way to discard ATM cells which are looping over ATM subnetworks. Standard ATM PNNI routing and signaling solve this problem by making use of call setup procedures which ensure that ATM VCs will never be set up in a loop [PNNI]. However, when MPLS is used over ATM subnets, the native ATM routing and signaling procedures may not be used for the full L2 path. This leads to the possibility that MPLS over ATM might in principle allow packets to loop indefinitely, or until L3 routing stabilizes. Methods are needed to prevent this problem.

Another case in which looping can be particularly unpleasant is for multicast traffic. With multicast, it is possible that the packet may be delivered successfully to some destinations even though copies intended for other destinations are looping. This leads to the possibility that huge numbers of identical packets could be delivered to some destinations. Also, since multicast implies that packets are duplicated at some points in their path, the congestion resulting from looping packets may be particularly severe.

Another unpleasant complication of looping occurs if the congestion caused by the loop interferes with the routing protocol. It is possible for the congestion caused by looping to cause routing protocol control packets to be discarded, with the result that the routing protocol becomes unstable. For example, this could lengthen the duration of the loop.

In normal operation of IP networks the impact of congestion is limited by the fact that TCP backs off (i.e., transmits substantially less traffic) in response to lost packets. Where the congestion is caused by looping, the combination of TTL and the resulting discard of looping packets, plus the reduction in offered traffic, can limit the resulting impact on the network. TCP backoff however does not solve the problem if the looping packets are not discarded (for example, if the loop is over an ATM subnetwork where TTL is not used).

The severity of the problem caused by looping may depend upon implementation details. Suppose, for instance, that ATM switching hardware is being used to provide MPLS switching functions. If the ATM hardware has per-VC queuing, and if it is capable of providing fair access to the buffer pool for incoming cells based on the incoming VC (so that no one incoming VC is allowed to grab a disproportionate number of buffers), this looping might not have a significant effect on other traffic. If the ATM hardware cannot provide fair buffer access of this sort, however, then even transient loops may cause severe degradation of the node's total performance.

Given that MPLS is a relatively new approach, it is possible that looping may have consequences which are not fully understood (such as looping of LDP control information in cases where stream merge is not used).

Even if fair buffer access can be provided, it is still worthwhile to have some means of detecting loops that last "longer than possible". In addition, even where TTL and/or per-VC fair queuing provides a means for surviving loops, it still may be desirable where practical to avoid setting up LSPs which loop.

Methods for dealing with loops are discussed in section 4.3.

3.6 Operations and Management

Operations and management of networks is critically important. This implies that MPLS must support operations, administration, and maintenance facilities at least as extensive as those supported in current IP networks.

In most ways this is a relatively simple requirement to meet. Given that all MPLS nodes run normal IP routing protocols, it is straightforward to expect them to participate in normal IP network management protocols.

There is one issue which has been identified and which needs to be addressed by the MPLS effort: the operation of Traceroute over MPLS networks. Note that other O&M issues may be identified in the future.

Traceroute is a very commonly used network management tool. Traceroute is based on use of the TTL field: A station trying to determine the route from itself to a specified address transmits multiple IP packets, with the TTL field set to 1 in the first packet, 2 in the second packet, etc. This causes each router along the path to send back an ICMP error report for TTL exceeded. This in turn allows the station to determine the set of routers along the route. For example, this can be used to determine where a problem exists (if no router responds past some point, the last router which responds can become the starting point for a search to determine the cause of the problem).

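The TTL-probing behavior that MPLS needs to preserve (or consciously replace) is simple to state precisely. The sketch below (Python, illustrative only; "probe" is a hypothetical helper standing in for sending a packet and collecting any ICMP response) shows the core loop of a traceroute-style tool:

   def traceroute(dest, probe, max_hops=30):
       # Send probes with TTL = 1, 2, 3, ... Each router that
       # decrements the TTL to zero returns an ICMP Time Exceeded,
       # revealing one hop of the path.
       path = []
       for ttl in range(1, max_hops + 1):
           responder = probe(dest, ttl)   # hypothetical helper
           if responder is None:
               path.append("*")           # opaque hop: no response
           else:
               path.append(responder)
               if responder == dest:      # reached the destination
                   break
       return path

   # Simulated three-hop path, for illustration:
   hops = {1: "r1", 2: "r2", 3: "203.0.113.9"}
   print(traceroute("203.0.113.9", lambda d, ttl: hops.get(ttl)))

The difficulty described below is that a hop which forwards on the L2 label without decrementing any TTL never generates the Time Exceeded report, and so never appears in the path.
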
When MPLS is operating over ATM or Frame Relay networks there is no TTL field to decrement (and ATM and Frame Relay forwarding hardware does not decrement TTL). This implies that it is not straightforward to have Traceroute operate in this environment.

There is the question of whether we *want* all routers along a path to be visible via traceroute. For example, an ISP probably doesn't want to expose the interior of its network to a customer. However, the issue of whether a network's policy will allow the interior of the network to be visible should be independent of whether it is possible for some users to see the interior of the network. Thus while there clearly should be the possibility of using policy mechanisms to block traceroute from being used to see the interior of the network, this does not imply that it is okay to develop protocol mechanisms which break traceroute.

There is also the question of whether the interior of an MPLS network is analogous to a normal IP network, or whether it is closer to the interior of a layer 2 network (for example, an ATM subnet). Clearly IP traceroute cannot be used to expose the interior of an ATM subnet. When a packet is crossing an ATM subnetwork (for example, between an ingress and an egress router which are attached to the ATM subnet) traceroute can be used to determine the router to router path, but not the path through the ATM switches which comprise the ATM subnet. Note here that MPLS forms a sort of "in between" special case: Routing is based on normal IP routing protocols, the equivalent of call setup (label binding/exchange) is based on MPLS-specific protocols, but forwarding is based on normal L2 ATM forwarding. MPLS therefore supersedes the normal ATM-based methods that would be used to eliminate loops and/or trace paths through the ATM subnet.

It is generally agreed that Traceroute is a relatively "ugly" tool, and that a better tool for tracing the route of a packet would be preferable. However, no better tool has yet been designed or even proposed. Also, however ugly Traceroute may be, it is nonetheless very useful, widely deployed, and widely used. In general, it is highly preferable to define, implement, and deploy a new tool, and to determine through experience that the new tool is sufficient, before breaking a tool which is as widely used as traceroute.

Methods that may be used to either allow traceroute to be used in an MPLS network, or to replace traceroute, are discussed in section 4.14.

4. Technical Approaches

We believe that section 4 is probably less complete than other sections. Additional subsections are likely to be needed as a result of additional discussions in the MPLS working group.

4.1 Label Distribution

A fundamental requirement in MPLS is that an LSR forwarding label switched traffic to another LSR apply a label to that traffic which is meaningful to the other (receiving) LSR. LSRs could learn about each other's labels in a variety of ways. We call the general topic "label distribution".

4.1.1 Explicit Label Distribution

Explicit label distribution anticipates the specification by MPLS of a standard protocol for label distribution. Two of the possible approaches [TDP] [ARIS] are oriented toward topology driven label distribution. One other approach [FANP], in contrast, makes use of traffic driven label distribution.

We expect that the label distribution protocol (LDP) which emerges from the MPLS WG is likely to inherit elements from one or more of the possible approaches.

Consider LSR A forwarding traffic to LSR B. We call A the upstream (with respect to data flow) LSR and B the downstream LSR. A must apply a label to the traffic that B "understands". Label distribution must ensure that the "meaning" of the label will be communicated between A and B. An important question is whether A or B (or some other entity) allocates the label.

In this discussion we are talking about the allocation and distribution of labels between two peer LSRs that are on a single segment of what may be a longer path. A related but in fact entirely separate issue is the question of where control of the whole path resides. In essence there are two models; by analogy to upstream and downstream for a single segment we can talk about ingress and egress for an LSP (or to and from a label swapping "domain"). In one model a path is set up from ingress to egress, in the other from egress to ingress.

4.1.1.1 Downstream Label Allocation

"Downstream Label Allocation" refers to a method where the label allocation is done by the downstream LSR, i.e. the LSR that uses the label as an index into its switching tables.

This is, arguably, the most natural label allocation/distribution mode for unicast traffic. As an LSR builds its routing tables (we consider here control driven allocation of labels) it is free, within some limits we will discuss, to allocate labels in any manner that may be convenient to the particular implementation.
Since the labels that it allocates will be those upon which it subsequently makes forwarding decisions, we assume implementations will perform the allocation in an optimal manner. Having allocated labels, the default behavior is to distribute the labels (and bindings) to all peers.

In some cases (particularly with ATM) there may be a limited number of labels which may be used across an interface, and/or a limited number of label assignments which may be supported by a single device. Operation in this case may make use of "on demand" label assignment. With this approach, an LSR may for example request a label for a route from a particular peer only when its routing calculations indicate that peer to be the new next hop for the route.

4.1.1.2 Upstream Label Allocation

"Upstream Label Allocation" refers to a method where the label allocation is done by the upstream LSR. In this case the LSR choosing the label (the upstream LSR) and the LSR which needs to interpret packets using the label (the downstream LSR) are not the same node. We note here that in the upstream LSR the label at issue is not used as an index into the switching tables but rather is found as the result of a lookup on those tables.

The motivation for upstream label allocation comes from the recognition that it might be possible to optimize multicast machinery in an LSR if it were possible to use the same label on all output ports for which a particular multicast packet/cell were destined. Upstream assignment makes this possible.

4.1.1.3 Other Label Allocation Methods

Another option would be to make use of label values which are unique within the MPLS domain (implying that a domain-wide allocation would be needed). In this case, any stream to a particular MPLS egress node could make use of the label of that node (implying that label values do not need to be swapped at intermediate nodes).

With this method of label allocation, there is a choice to be made regarding the scope over which a label is unique. One approach is to configure each node in an MPLS domain with a label which is unique in that domain. Another approach is to use a truly global identifier (for example the IEEE 48 bit identifier), where each MPLS-capable node would be stamped at birth with a truly globally unique identifier. The point of this global approach is to simplify configuration in each MPLS domain by eliminating the need to configure label IDs.

4.1.2 Piggybacking on Other Control Messages

While we have discussed use of an explicit MPLS LDP, we note that there are several existing protocols that can be easily modified to distribute both routing/control and label information. This could be done with any of OSPF, BGP, RSVP and/or PIM. A particular architectural elegance of these schemes is that label distribution uses the same mechanisms as are used in distribution of the underlying routing or control information.

When explicit label distribution is used, the routing computation and label distribution are decoupled. This implies a possibility that at some point you may either have a route to a specific destination without an associated label, and/or a label for a specific destination which makes use of a path which you are no longer using.
Piggybacking label distribution on the operation of the routing protocol is one way to eliminate this decoupling.

Piggybacking label distribution on the routing protocol introduces an issue regarding how to negotiate acceptable label values and what to do if an invalid label is received. This is discussed in section 4.1.3.

4.1.3 Acceptable Label Values

There are some constraints on which label values may be used in either allocation mode. Clearly the label values must lie within the allowable range described in the encapsulation standards that the MPLS WG will produce. The label value used must also, however, lie within a range that the peer LSR is capable of supporting. We imagine that certain machines, for example ATM switches operating as LSRs, may, due to operational or implementation restrictions, support a label space more limited than that bounded by the valid range found in the encapsulation standard. This implies that an advertisement or negotiation mechanism for the useable label range may be a part of the MPLS LDP. When operating over ATM using ATM forwarding hardware, due to the need for compatibility with the existing use of the ATM VPI/VCI space, it is quite likely that an explicit mechanism will be needed for label range negotiation.

In addition we note that LDP may be one of a number of mechanisms used to distribute labels between any given pair of LSRs. Clearly, where such multiple mechanisms exist, care must be taken to coordinate the allocation of label values. A single label value must have a unique meaning to the LSR that distributes it.

There is an issue regarding how to allow negotiation of acceptable label values if label distribution is piggybacked with the routing protocol. In this case it may be necessary either to require equipment to accept any possible label value, or to configure devices to know which range of label values may be selected. It is not clear in this case what to do if an invalid label value is received, as there may be no means of sending a NAK.

A similar issue occurs with multicast traffic over broadcast media, where there may be multiple nodes which receive the same transmission (using a single label value). Here again it may be "non-trivial" how to allow n-party negotiation of acceptable label values.

4.1.4 LDP Reliability

The need for reliable label distribution depends upon the relative performance of L2 and L3 forwarding, as well as the relationship between label distribution and the routing protocol operation.

If label distribution is tied to the operation of the routing protocol, then a reasonable protocol design would ensure that labels are distributed successfully as long as the associated route and/or reachability advertisement is distributed successfully. This implies that the reliability of label distribution will be the same as the reliability of route distribution.

If there is a very large difference between L2 and L3 forwarding performance, then the cost of failing to deliver a label is significant. In this case it is important to ensure that labels are distributed reliably.
Given that LDP needs to operate in a wide variety of environments with a wide variety of equipment, this implies that it is important for any LDP developed by the MPLS WG to ensure reliable delivery of label information.

Reliable delivery of LDP packets may potentially be accomplished either by using an existing reliable transport protocol such as TCP, or by specifying reliability mechanisms as part of LDP (for example, the reliability mechanisms which are defined in IDRP could potentially be "borrowed" for use with LDP).

4.1.5 Label Purge Mechanisms

Another issue to be considered is the "lifetime" of label data once it arrives at an LSR, and the method of purging label data. There are several methods that could be used either separately, or (more likely) in combination.

One approach is for label information to be timed out. With this approach a lifetime is distributed along with the label value. The label value may be refreshed prior to timing out. If the label is not refreshed prior to timing out it is discarded. In this case each lifetime and timer may apply to a single label, or to a group of labels (e.g., all labels selected by the same node).

Similarly, two peer nodes may make use of an MPLS peer keep-alive mechanism. This implies exchange of MPLS control packets between neighbors on a periodic basis. This in general is likely to use a smaller timeout value than label value timers (analogous to the fact that the OSPF HELLO interval is much shorter than the OSPF LSA lifetime). If the peer session between two MPLS nodes fails (due to expiration of the associated timer prior to reception of the refresh) then associated label information is discarded.

If label information is piggybacked on the routing protocol then the timeout mechanisms would also be taken from the associated routing protocol (note that routing protocols in general have mechanisms to invalidate stale routing information).

An alternative method for invalidating labels is to make use of an explicit label removal message.

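The timeout-based purge described above amounts to simple soft-state bookkeeping. A minimal sketch (Python; the structure and names are illustrative assumptions, not part of any proposed LDP):

   import time

   class LabelTable:
       def __init__(self):
           self.bindings = {}   # label -> (FEC, expiry time)

       def install(self, label, fec, lifetime):
           # A lifetime is distributed along with the label value.
           self.bindings[label] = (fec, time.time() + lifetime)

       def refresh(self, label, lifetime):
           # Refreshing a label prior to timeout keeps it alive.
           fec, _ = self.bindings[label]
           self.bindings[label] = (fec, time.time() + lifetime)

       def purge_expired(self):
           # Bindings not refreshed prior to timing out are discarded.
           now = time.time()
           self.bindings = {l: (fec, exp)
                            for l, (fec, exp) in self.bindings.items()
                            if exp > now}

A per-group timer or a peer keep-alive could be layered on the same structure by keying the expiry on the distributing node rather than on the individual label.
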
4.2 Stream Merging

In order to scale as O(n) (rather than O(n-squared)), MPLS makes use of the concept of stream merge. This makes use of multipoint-to-point streams in order to allow multiple streams to be merged into one stream.

Types of Stream Merge:

There are several types of stream merge that can be used, depending upon the underlying media.

When MPLS is used over frame based media, merging is straightforward. All that is required for stream merge to take place is for a node to allow multiple upstream labels to be forwarded the same way and mapped into a single downstream label. This is referred to as frame merge.

Operation over ATM media is less straightforward. In ATM, the data packets are encapsulated into an ATM Adaptation Layer, say AAL5, and the AAL5 PDU is segmented into ATM cells with a VPI/VCI value and the cells are transmitted in sequence. It is incumbent on ATM switches to keep the cells of a PDU (or with the same VPI/VCI value) contiguous and in sequence. This is because the device that reassembles the cells to re-form the transmitted PDU expects the cells to be contiguous and in sequence, as there isn't sufficient information in the ATM cell header (unlike IP fragmentation) to reassemble the PDU if the cells arrive in any other order. Hence, if cells from several upstream links are transmitted onto the same downstream VPI/VCI, then cells from one PDU can get interleaved with cells from another PDU on the outgoing VPI/VCI, and result in corruption of the original PDUs by mis-sequencing the cells of each PDU.

The most straightforward (but erroneous) method of merging in an ATM environment would be to take the cells from two incoming VCs and merge them into a single outgoing VC. If this were done without any buffering of cells then cells from two or more packets could end up being interleaved into a single AAL5 frame. Therefore the problem when operating over ATM is how to avoid interleaving of cells from multiple sources.

There are two ways to solve this interleaving problem, which are referred to as VC merge and VP merge.

VC merge allows multiple VCs to be merged into a single outgoing VC. In order for this to work the node performing the merge needs to keep the cells from one AAL5 frame (e.g., corresponding to an IP packet) separate from the cells of other AAL5 frames. This may be done by performing the SAR function in order to reassemble each IP packet before forwarding that packet. In this case VC merge is essentially equivalent to frame merge. An alternative is to buffer the cells of one AAL5 frame together, without actually reassembling them. When the end of frame indicator is reached that frame can be forwarded. Note however that both forms of VC merge require that the entire AAL5 frame be received before any cells corresponding to that frame are forwarded. VC merge therefore requires capabilities which are generally not available in most existing ATM forwarding hardware.

The alternative for use over ATM media is VP merge. Here multiple VPs can be merged into a single VP. Separate VCIs within the merged VP are used to distinguish frames (e.g., IP packets) from different sources. In some cases, one VP may be used for the tree from each ingress node to a single egress node.

Interoperation of Merge Options:

If some nodes support stream merge, and some nodes do not, then it is necessary to ensure that the two types of nodes can interoperate within a single network. This affects the number of labels that a node needs to send to a neighbor. An upstream LSR which supports stream merge needs to be sent only one label per forwarding equivalence class (FEC). An upstream neighbor which does not support stream merge needs to be sent multiple labels per FEC. However, there is no way of knowing a priori how many labels it needs; this will depend on how many LSRs are upstream of it with respect to the FEC in question.

Since the number of labels a non-merging upstream neighbor will need is not known a priori, the upstream neighbor may need to explicitly ask for labels for each FEC. The upstream neighbor may make multiple such requests (for one or more labels per request). When a downstream neighbor receives such a request from upstream, and the downstream neighbor does not itself support stream merge, then it must in turn ask its downstream neighbor for more labels for the FEC in question.

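The recursive label-request behavior for non-merging nodes can be made concrete with a short sketch (Python; the node structure and method names are hypothetical):

   class LSR:
       def __init__(self, name, supports_merge, downstream=None):
           self.name = name
           self.supports_merge = supports_merge
           self.downstream = downstream
           self.next_free = 100          # toy label allocator

       def request_labels(self, fec, count):
           # A merging node satisfies any request from local label
           # space: all upstream traffic for the FEC merges onto one
           # downstream label. A non-merging node must first obtain
           # one downstream label for each label it hands upstream.
           if not self.supports_merge and self.downstream:
               self.downstream.request_labels(fec, count)
           labels = list(range(self.next_free, self.next_free + count))
           self.next_free += count
           return labels

   egress = LSR("egress", supports_merge=True)
   transit = LSR("transit", supports_merge=False, downstream=egress)
   print(transit.request_labels("10.0.0.0/8", 3))
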
It is possible that there may be some nodes which support merge, but have a limited number of upstream streams which may be merged into a single downstream stream. Suppose for example that due to some hardware limitation a node is capable of merging four upstream LSPs into a single downstream LSP. Suppose however, that this particular node has six upstream LSPs arriving at it for a particular stream. In this case, this node may merge these into two downstream LSPs (corresponding to two labels that need to be obtained from the downstream neighbor).

The interoperation of the various forms of merging over ATM is most easily described by first describing the interoperation of VC merge with non-merge.

In the case where VC merge and non-merge nodes are interconnected, the forwarding of cells is based in all cases on a VC (i.e., the concatenation of the VPI and VCI). For each node, if an upstream neighbor is doing VC merge then that upstream neighbor requires only a single outgoing VPI/VCI for a particular FEC (this is analogous to the requirement for a single label in the case of operation over frame media). If the upstream neighbor is not doing merge, then it will require a single outgoing VPI/VCI per FEC for itself (assuming that it can be an ingress node), plus enough outgoing VPI/VCIs to map to incoming VPI/VCIs to pass to its upstream neighbors. The number required will be determined by allowing the upstream nodes to request additional VPI/VCIs from their downstream neighbors.

A similar method is possible to support nodes which perform VP merge. In this case the VP merge node, rather than requesting a single VPI/VCI or a number of VPI/VCIs from its downstream neighbor, instead may request a single VP (identified by a VPI). Furthermore, suppose that a non-merge node is downstream from two different VP merge nodes. This node may need to request one VPI/VCI (for traffic originating from itself) plus two VPs (one for each upstream node).

Note that there are multiple options for coordinating VCIs within a VP. Description of the range of options is FFS.

In order to support all of VP merge, VC merge, and non-merge, it is therefore necessary to allow upstream nodes to request a combination of zero or more VC identifiers (consisting of a VPI/VCI), plus zero or more VPs (identified by VPIs). VP merge nodes would therefore request one VP. VC merge nodes would request only a single VPI/VCI (since they can merge all upstream traffic into a single VC). Non-merge nodes would pass on any requests that they get from above, plus request a VPI/VCI for traffic that they originate (if they can be ingress nodes). However, non-merge nodes which can only do VC forwarding (and not VP forwarding) will need to know which VCIs are used within each VP in order to install the correct VCs in their forwarding tables. A detailed description of how this could work is FFS.

Coordination of the VCI space with VP Merge:

VP merge requires that the VCIs be coordinated to ensure uniqueness. There are a number of ways in which this may be accomplished:

1. Each node may be pre-configured with a unique VCI value (or values).

2. Some node (most likely the root of the multipoint-to-point tree) may coordinate the VCI values used within the VP. A protocol mechanism will be needed to allow this to occur. How hard this is to do depends somewhat upon whether the root is otherwise involved in coordinating the multipoint-to-point tree. For example, allowing one node (such as the root) to coordinate the tree may be useful for purposes of coordinating load sharing (see section 4.10). Thus whether the issue of coordinating the VCI space is significant or trivial may depend upon other design choices which at first glance appeared to be independent.

3. Other unique information, such as portions of a class B or class C address, may be used to provide a unique VCI value.

4. Another alternative is to implement a simple hardware extension in the ATM switches to keep the VCI values unique by dynamically altering them to avoid collision.

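As an illustration of the first two approaches, uniqueness within a merged VP reduces to partitioning the VCI space among the sources feeding the VP. A toy allocation sketch (Python; purely illustrative, assuming a coordinating node hands out disjoint VCI ranges):

   def assign_vci_ranges(sources, vci_min=32, vci_max=65535):
       # Partition the VCI space of the merged VP into disjoint
       # ranges, one per source, so that cells belonging to PDUs
       # from different sources never share a VCI.
       span = (vci_max - vci_min + 1) // len(sources)
       ranges = {}
       for i, src in enumerate(sources):
           lo = vci_min + i * span
           ranges[src] = (lo, lo + span - 1)
       return ranges

   print(assign_vci_ranges(["ingress-A", "ingress-B", "ingress-C"]))
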
VP merge makes less efficient use of the VPI/VCI space (relative to VC merge). When VP merge is used, the LSPs may not be able to transit public ATM networks that don't support SVPs.

Buffering Issues Related To Stream Merge:

There is an issue regarding the amount of buffering required for frame merge, VC merge, and VP merge. Frame merge and VC merge require that intermediate points buffer incoming packets until the entire packet arrives. This is essentially the same as is required in traditional IP routers.

VP merge allows cells to be transmitted by intermediate nodes as soon as they arrive, reducing the buffering and latency at intermediate nodes. However, the use of VP merge implies that cells from multiple packets will arrive at the egress node interleaved on separate VCIs. This in turn implies that the egress node may have somewhat increased buffering requirements. To a large extent egress nodes for some destinations will be intermediate nodes for other destinations, implying that the increase in buffers required for some purposes (egress traffic) will be offset by a reduction in buffers required for other purposes (transit traffic). Also, routers today typically deal with high-fanout channelized interfaces and with multi-VC ATM interfaces, implying that the requirement of buffering simultaneously arriving cells from multiple packets and sources is something that routers typically do today. This is not meant to imply that the required buffer size and performance is inexpensive, but rather is meant to observe that it is a solvable issue.

4.3 Loop Handling

Generally, methods for dealing with loops can be split into three categories: Loop Survival makes use of methods which minimize the impact of loops, for example by limiting the amount of network resources which can be consumed by a loop; Loop Detection allows loops to be set up, but later detects these loops and eliminates them; Loop Prevention provides methods for avoiding setting up L2 forwarding in a way which results in an L2 loop.

Note that we are concerned here only with loops that occur in L2 forwarding. Transient loops at L3 will continue to be part of normal IP operation, and will be handled the way that IP has been handling loops for years (see section 3.5).

Loop Survival:

Loop Survival refers to methods that are used to allow the network to operate well even though short term transient loops may be formed by the routing protocol. The basic approach to loop survival is to limit the amount of network resources which are consumed by looping packets, and to minimize the effect on other (non-looping) traffic. Note that loop survival is the method used by conventional IP forwarding, and is therefore based on long and relatively successful experience in the Internet.

The most basic method for loop survival is based on the use of a TTL (Time To Live) field. The TTL field is decremented at each hop. If the TTL field reaches zero, then the packet is discarded. This method works well over those media which have a TTL field. This explicitly includes L3 IP forwarding. Also, assuming that the core MPLS specifications will include the definition of a "shim" MPLS header, used to carry labels over those media which do not have their own, it is likely that the shim header will also include a TTL field.

However, there is considerable interest in using MPLS over L2 protocols which provide their own labels, with the L2 label used for MPLS forwarding. Specific L2 protocols which offer a label for this purpose include ATM and Frame Relay. However, neither ATM nor Frame Relay has a TTL field. This implies that this method cannot be used when basic ATM or Frame Relay forwarding is being used.

Another basic method for loop survival is the use of dynamic routing protocols which converge rapidly to non-looping paths. In some instances it is possible that congestion caused by looping data could affect the convergence of the routing protocol (see section 3.5). MPLS should be designed to prevent this problem from occurring. Given that MPLS uses the same routing protocols as are used for IP, this method does not need to be discussed further in this framework document.

Another possible tool for loop survival is the use of fair queuing. This allows unrelated flows of user data to be placed in different queues. This helps to ensure that a node which is overloaded with looping user data can nonetheless forward unrelated non-looping data, thereby minimizing the effect that looping data has on other data. We cannot assume that fair queuing will always be available. In practice, many fair queuing implementations merge multiple streams into one queue (implying that the number of queues used is less than the number of user data flows which are present in the network). This implies that any data which happens to be in the same queue with looping data may be adversely affected.

Loop Detection:

Loop Detection refers to methods whereby a loop may be set up at L2, but the loop is subsequently detected. When the loop is detected, it may be broken at L2 by dropping the label relationship, implying that packets for a set of destinations must be forwarded at L3.

A possible method for loop detection is based on transmitting a "loop detection" control packet (LDCP) along the path towards a specified destination whenever the route to the destination changes. This LDCP is forwarded in the direction that the label specifies, with the labels swapped to the correct next hop value.
However, normal L2 forwarding cannot be used because each hop needs to examine the packet to check for loops. The LDCP is forwarded towards that destination until one of the following happens: (i) the LDCP reaches the last MPLS node along the path (i.e., the next hop is either a router which is not participating in MPLS, or is the final destination host); (ii) the TTL of the LDCP expires (assuming that the control packet uses a TTL, which is optional); or (iii) the LDCP returns to the node which originally transmitted it. If the latter occurs, then the packet has looped, and the node which originally transmitted the LDCP stops using the associated label, and instead uses L3 forwarding for the associated destination addresses. One problem with this method is that once a loop is detected it is not known when the loop clears. One option would be to set a timer, and to transmit a new LDCP when the timer expires.

An alternate method counts the hops to each egress node, based on the routes currently available. Each node advertises its distance (in hop counts) to each destination. An egress node advertises the destinations that it can reach directly with an associated hop count of zero. For each destination, a node computes the hop count to that destination by adding one to the hop count advertised by its actual next hop used for that destination. When the hop count for a particular destination changes, the hop count needs to be readvertised.

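The loop-indicating property of this hop count scheme is essentially the count-to-infinity behavior familiar from distance vector routing: around a loop, each node's count feeds its neighbor's count plus one, so the counts grow without bound. A toy illustration (Python; the threshold and names are illustrative assumptions):

   # Three nodes whose next hops form a loop for some destination.
   # Each round, every node recomputes its hop count as (next hop's
   # count + 1); counts climbing past any plausible network diameter
   # indicate a loop.
   MAX_DIAMETER = 32                           # assumed threshold
   counts = {"A": 1, "B": 2, "C": 3}
   next_hop = {"A": "B", "B": "C", "C": "A"}   # the loop

   for _ in range(100):
       counts = {n: counts[next_hop[n]] + 1 for n in counts}
       if any(c > MAX_DIAMETER for c in counts.values()):
           print("loop suspected, counts:", counts)
           break
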
In addition, the first of the loop prevention schemes discussed below may be modified to provide loop detection (the details are straightforward, but have not been written down in time to include in this rough draft).

Loop Prevention:

Loop prevention makes use of methods to ensure that loops are never set up at L2. This implies that the labels are not used until some method is used to ensure that following the label towards the destination, with associated label swaps at each switch, will not result in a loop. Until the L2 path (making use of assigned labels) is available, packets are forwarded at L3.

Loop prevention requires explicit signaling of some sort to be used when setting up an L2 stream.

One method of loop prevention requires that labels be propagated starting at the egress switch. The egress switch signals to neighboring switches the label to use for a particular destination. That switch then signals an associated label to its neighbors, etc. The control packets which propagate the labels also include the path to the egress (as a list of router IDs). Any looping control packet can therefore be detected and the path not set up to or past the looping point.

Another option is to use explicit routing to set up label bindings from the egress switch to each ingress switch. This precludes the possibility of looping, since the entire path is computed by one node. This also allows non-looping paths to be set up provided that the egress switch has a view of the topology which is reasonably close to reality (if there are operational links which the egress switch doesn't know about, it will simply pick a path which doesn't use those links; if there are links which have failed but which the egress switch thinks are operational, then there is some chance that the setup attempt will fail, but in this case the attempt can be retried on a separate path). Note therefore that non-looping paths can be set up with this method in many cases where distributed routing plus hop by hop forwarding would not actually result in non-looping paths. This method is similar to the method used by standard ATM routing to ensure that SVCs are non-looping [PNNI].

Explicit routing is only applicable if the routing protocol gives the egress switch sufficient information to set up the explicit route, implying that the protocol must be either a link state protocol (such as OSPF) or a path vector protocol (such as BGP). Source routing therefore is not appropriate as a general approach for use in any network regardless of the routing protocol. This method also requires some overhead for the call setup before label-based forwarding can be used. If the network topology changes in a manner which breaks the existing path, then a new path will need to be explicitly routed from the egress switch. Due to this overhead this method is probably only appropriate if other significant advantages are also going to be obtained from having a single node (the egress switch) coordinate the paths to be used. Examples of other reasons to have one node coordinate the paths to a single egress switch include: (i) coordinating the VCI space where VP merge is used (see section 4.2); and (ii) coordinating the routing of streams from multiple ingress switches to one egress switch so as to balance the load on multiple alternate paths through the network.

In principle the explicit routing could also be done in the alternate direction (from ingress to egress). However, this would make it more difficult to merge streams if stream merge is to be used. This would also make it more difficult to coordinate (i) changes to the paths used, (ii) the VCI space assignments, and (iii) load sharing. This therefore makes explicit routing more difficult, and also reduces the other advantages that could be obtained from the approach.

If label distribution is piggybacked on the routing protocol (see section 4.1.2), then loop prevention is only possible if the routing protocol itself does loop prevention.

What To Do If A Loop Is Detected:

With all of these schemes, if a loop is known to exist then the L2 label-swapped path is not set up. This leads to the obvious question of what an MPLS node does when it doesn't have a label for a particular destination, and a packet for that destination arrives to be forwarded. If possible, the packet is forwarded using normal L3 (IP) forwarding. There are two issues that this raises: (i) What about nodes which are not capable of L3 forwarding? (ii) Given the relative speeds of L2 and L3 forwarding, does this work?

Nodes which are not capable of L3 forwarding obviously can't forward a packet unless it arrives with a label, and the associated next hop label has been assigned. Such nodes, when they receive a packet for which the next hop label has not been assigned, must discard the packet. It is probably safe to assume that if a node cannot forward an L3 packet, then it is also incapable of forwarding an ICMP error report that it originates. This implies that the packet will need to be discarded in this case.

In many cases L2 forwarding will be significantly faster than L3 forwarding (allowing faster forwarding is a significant motivation behind the work on MPLS). This implies that if a node is forwarding a large volume of traffic at L2, and a change in the routing protocol causes the associated labels to be lost (necessitating L3 forwarding), in some cases the node will not be capable of forwarding the same volume of traffic at L3. This will of course require that packets be discarded. However, in some cases only a relatively small volume of traffic will need to be forwarded at L3. Thus forwarding at L3 when L2 is not available is not necessarily always a problem. There may be some nodes which are capable of forwarding equally fast at L2 and L3 (for example, such nodes may contain IP forwarding hardware which is not available in all nodes). Finally, when packets are lost this will cause TCP to back off, which will in turn reduce the load on the network and allow the network to stabilize even at reduced forwarding rates until such time as the label bindings can be reestablished.

Note that in most cases loops will be caused either by configuration errors, or due to short term transient problems caused by the failure of a link. If only one link goes down, and if routing creates a normal "tree-shaped" set of paths to any one destination, then the failure of one link somewhere in the network will affect only one link's worth of data passing through any one node in the network. This implies that if a node is capable of forwarding one link's worth of data at L3, then in many or most cases it will have sufficient L3 bandwidth to handle looping data.

4.4 Interoperation with NHRP

When label switching is used over ATM, and there exists an LSR which is also operating as a Next Hop Client (NHC), the possibility of direct interaction arises. That is, could one switch cells between the two technologies without reassembly? To enable this, several important issues must be addressed.

The encapsulation must be acceptable to both MPLS and NHRP. If only a single label is used, then the null encapsulation could be used. Other solutions could be developed to handle label stacks.

NHRP must understand and respect the granularity of a stream.

Currently NHRP resolves an IP address to an ATM address. The response may include a mask indicating a range of addresses. However, any VC to the ATM address is considered to be a viable means of packet delivery. Suppose that an NHC issues an NHRP request for IP address A, gets back ATM address 1, and sets up a VC to address 1. Later the same NHC issues an NHRP request for a totally unrelated IP address B and gets back the same ATM address 1.
In this case normal NHRP behavior allows the NHC to use the VC (that was set up for destination A) for traffic to B.

Note: In this section we will refer to a VC set up as a result of an NHRP query/response as a shortcut VC.

If one expects to be able to label switch the packets being received from a shortcut VC, then the label switch needs to be informed as to exactly what traffic will arrive on that VC, and that mapping cannot change without notice. Currently no such mechanism exists in the defined signaling for a shortcut VC. Several means are possible. A binding, equivalent to the binding in LDP, could be sent in the setup message. Alternatively, the binding of prefix to label could remain in an LDP session (or whatever means of label distribution is appropriate) and the setup could carry a binding of the label to the VC. This would leave the binding mechanism for shortcut VCs independent of the label distribution mechanism.

A further architectural challenge exists in that label switching is inherently unidirectional whereas ATM is bi-directional. The above binding semantics are fairly straightforward. However, effectively using the reverse direction of a VC presents further challenges.

Label switching must also respect the granularity of the shortcut VC. Without VC merge, this means a single label switched flow must map to a VC. In the case of VC merge, multiple label switched streams could be merged onto a single shortcut VC. But given the asymmetry involved, there is perhaps little practical use for this.

Another issue is one of practicality and usefulness. What is sent over the VC must be at a fine enough granularity to be label switched through the receiving domain. One potential place where the two technologies might come into play is in moving data from one campus via the wide-area to another campus. In such a scenario, the two technologies would border precisely at the point where summarization is likely to occur. Each campus would have a detailed understanding of itself, but not of the other campus. The wide-area is likely to have summarized knowledge only. But at such a point level 3 processing becomes the likely solution.

4.5 Operation in a Hierarchy

This section is FFS.

4.6 Stacked Labels in a Flat Routing Environment

This section is FFS.

4.7 Multicast

This section is FFS.

4.8 Multipath

Many IP routing protocols support the notion of equal-cost multipath routes, in which a router maintains multiple next hops for one destination prefix when two or more equal-cost paths to the prefix exist. There are a few possible approaches for handling multipath with MPLS.

In this discussion we will use the term "multipath node" to mean a node which is keeping track of multiple switched paths from itself for a single destination.

The first approach maintains a separate switched path from each ingress node via one or more multipath nodes to a merge point. This requires MPLS to distinguish the separate switched paths, so that learning of a new switched path is not misinterpreted as a replacement of the same switched path. This also requires that an ingress MPLS node be capable of distributing the traffic among the multiple switched paths. This approach preserves switching performance, but at a cost of proliferating the number of switched paths. For example, each switched path consumes a distinct label.

The second approach establishes only one switched path from any one ingress node to a destination. However, when the paths from two different ingress nodes happen to arrive at the same node, that node may use different paths for each (implying that the node becomes a multipath node). Thus the multipath node may assign a different downstream path to each incoming stream. This conserves switched paths and maintains switching performance, but cannot balance loads across downstream links as well as the other approaches, even if switched paths are selectively assigned. A drawback of this approach is that the L2 path may be different from the normal L3 path, as traffic that otherwise would have taken multiple distinct paths is forced onto a single path.

The third approach allows a single stream arriving at a multipath node to be split into multiple streams, by using L3 forwarding at the multipath node. For example, the multipath node might choose to use a hash function on the source and destination IP addresses, in order to avoid misordering packets between any one IP source and destination. This approach conserves switched paths at the cost of switching performance.

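The hash-based splitting in the third approach can be sketched in a few lines. This is illustrative only (Python); the particular hash and the modulo split are assumptions, not a recommended algorithm:

   import zlib

   def pick_path(src_ip, dst_ip, paths):
       # Hash on the (source, destination) pair so that all packets
       # between one IP source and destination take the same path,
       # avoiding misordering within any single conversation.
       digest = zlib.crc32((src_ip + ">" + dst_ip).encode())
       return paths[digest % len(paths)]

   paths = ["LSP-1", "LSP-2", "LSP-3"]
   print(pick_path("192.0.2.1", "198.51.100.7", paths))
   print(pick_path("192.0.2.2", "198.51.100.7", paths))
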
4.9 Host Interactions

There are a range of options for host interaction with MPLS:

The most straightforward approach is no host involvement. Thus host operation may be completely independent of MPLS; rather, hosts operate according to other IP standards. If there is no host involvement then this implies that the first hop requires an L3 lookup.

If the host is ATM attached and doing NHRP, then this would allow the host to set up a virtual circuit to a router. However this brings up a range of issues as discussed in section 4.4 ("Interoperation with NHRP").

On the ingress side, it is reasonable to consider having the first hop LSR provide labels to the hosts, and thus have hosts attach labels for packets that they transmit. This could allow the first hop LSR to avoid an L3 lookup. It is reasonable here to have the host request labels only when needed, rather than require the host to remember all labels assigned for use in the network.

On the egress side, it is questionable whether hosts should be involved. For scaling reasons, it would be undesirable to use a different label for reaching each host.

4.10 Explicit Routing

There are two options for route selection: (1) hop by hop routing, and (2) explicit routing.

An explicitly routed LSP is an LSP where, at a given LSR, the LSP next hop is not chosen by each local node, but rather is chosen by a single node (usually the ingress or egress node of the LSP). The sequence of LSRs followed by an explicitly routed LSP may be chosen by configuration, or by an algorithm performed by a single node (for example, the egress node may make use of the topological information learned from a link state database in order to compute the entire path for the tree ending at that egress node).

With MPLS the explicit route needs to be specified at the time that labels are assigned, but the explicit route does not have to be specified with each L3 packet.
This implies that explicit routing with MPLS is relatively efficient (when compared with the efficiency of explicit routing for pure datagrams).

Explicit routing may be useful for a number of purposes such as allowing policy routing and/or facilitating traffic engineering.

4.10.1 Establishment of Point to Point Explicitly Routed LSPs

In order to establish a point to point explicitly routed LSP, the LDP packets used to set up the LSP must contain the explicit route. This implies that the LSP is set up in order either from the ingress to the egress, or from the egress to the ingress.

One node needs to pick the explicit route. This may be done in at least two possible ways: (i) by configuration (e.g., the explicit route may be chosen by an operator, or by a centralized server of some kind); (ii) by use of a routing protocol which allows the ingress and/or egress node to know the entire route to be followed. This would imply the use of a link state routing protocol (in which all nodes know the full topology) or of a path vector routing protocol (in which the ingress node is told the path as part of the normal operation of the routing protocol).

Note: The normal operation of path vector routing protocols (such as BGP) does not provide the full set of routers along the path. This implies that either a partial source route only would be provided (implying that LSP setup would use a combination of hop by hop and explicit routing), or it would be necessary to augment the protocol in order to provide the complete explicit route. Detailed operation in this case is FFS.

In the point to point case, it is relatively straightforward to specify the route to use: This is indicated by providing the addresses of each LSR on the LSP.

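Setting up the LSP then amounts to walking the listed LSRs in order, installing a label swap at each hop. A minimal sketch (Python; the per-hop state, the direction of setup, and the allocator are illustrative assumptions):

   def setup_explicit_lsp(route, alloc_label):
       # "route" is the ordered ingress-to-egress list of LSR
       # addresses carried in the setup message. Walk it from the
       # egress back toward the ingress, recording the incoming
       # label, outgoing label, and next hop each LSR installs.
       swaps = {}
       out_label = None
       for i in range(len(route) - 1, -1, -1):
           in_label = alloc_label(route[i])
           next_hop = route[i + 1] if i + 1 < len(route) else None
           swaps[route[i]] = (in_label, out_label, next_hop)
           out_label = in_label
       return swaps

   labels = iter(range(100, 200))
   print(setup_explicit_lsp(["lsr-a", "lsr-b", "lsr-c"],
                            lambda lsr: next(labels)))
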
4.10.2 Explicit and Hop by Hop Routing: Avoiding Loops

In general, an LSP will be explicitly routed precisely because there is a good reason to use an alternative to the hop by hop routed path. This implies that the explicit route is likely to follow a path which is inconsistent with the path followed by hop by hop routing. If some of the nodes along the path follow an explicit route but some of the nodes make use of hop by hop routing (and ignore the explicit route), then inconsistent routing may result, and in some cases loops (or severely inefficient paths) may form. This implies that for any one LSP there are two possible options: (i) the entire LSP may be hop by hop routed; or (ii) the entire LSP may be explicitly routed.

For this reason, it is important that if an explicit route is specified for setting up an LSP, then that route must be followed in setting up the LSP.

There is a related issue when a link or node in the middle of an explicitly routed LSP breaks: in this case, the last operating node on the upstream part of the LSP will continue receiving packets, but will not be able to forward them along the explicitly routed LSP (since its next hop is no longer functioning). In this case it is not, in general, safe for this node to forward the packets using L3 forwarding with hop by hop routing. Instead, the packets must be discarded, and the upstream portion of the explicitly routed LSP must be torn down.

Where part of an explicitly routed LSP breaks, the node which originated the LSP needs to be told about this. For robustness reasons the MPLS protocol design should not assume that the routing protocol will tell the node which originated the LSP. For example, it is possible that a link may go down and come back up quickly enough that the routing protocol never declares the link down. Rather, an explicit MPLS mechanism is needed.

4.10.3 Merge and Explicit Routing

Explicit routing is slightly more complex with a multipoint to point LSP (i.e., in the case that stream merge is used).

In this case, it is not possible to specify the route for the LSP as a simple list of LSRs (since the LSP does not consist of a simple sequence of LSRs). Rather, the explicit route must specify a tree. There are several ways that this may be accomplished. Details are FFS.

4.10.4 Using Explicit Routing for Traffic Engineering

In the Internet today it is relatively common for ISPs to make use of a Frame Relay or ATM core which interconnects a number of IP routers. The primary reason for use of a switching (L2) core is to make use of low cost equipment which provides very high speed forwarding. However, there is another very important reason for the use of an L2 core: to allow for traffic engineering.

Traffic engineering (also known as bandwidth management) refers to the process of managing the routes followed by user data traffic in a network in order to provide relatively equal and efficient loading of the resources in the network (i.e., to ensure that the load on links and nodes is within the capabilities of those links and nodes).

Some rudimentary level of traffic engineering can be accomplished with pure datagram routing and forwarding by adjusting the metrics assigned to links. For example, suppose that there is a given link in a network which tends to be overloaded on a long term basis. One option would be to manually configure an increased metric value for this link, in the hope of moving some traffic onto alternate routes. This provides a rather crude method of traffic engineering and gives only limited results.

Another method of traffic engineering is to manually configure multiple PVCs across an L2 core, and to adjust the route followed by each PVC in an attempt to equalize the load on different parts of the network. Where necessary, multiple PVCs may be configured between the same two nodes, in order to allow traffic to be split between different paths. In some topologies it is much easier to achieve efficient non-overlapping or minimally overlapping paths via this method (with manually configured paths) than it would be with pure datagram forwarding. A similar ability can be achieved with MPLS via manual configuration of the paths taken by LSPs.
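As an illustration of this kind of manual configuration, the following sketch (in a hypothetical format; MPLS does not define a configuration syntax) shows two explicitly routed LSPs between the same ingress and egress, placed on minimally overlapping paths so that traffic may be split between them:

   # Two parallel explicitly routed LSPs between ingress A and egress F.
   # Node names and the configuration layout are illustrative only.
   lsp_config = [
       {"name": "A-F-north", "explicit_route": ["A", "B", "C", "F"]},
       {"name": "A-F-south", "explicit_route": ["A", "D", "E", "F"]},
   ]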
A related issue is the decision on where merge is to occur. Note that once two streams merge into one stream (forwarded using a single label), they cannot diverge again at that level of the MPLS hierarchy (i.e., they cannot be bifurcated without looking at a higher level label or the IP header). Thus there may be times when it is desirable to explicitly NOT merge two streams even though they are destined to the same egress node and FEC. Non-merge may be appropriate either because the streams will want to diverge later in the path (for example, to avoid overloading a particular downstream link), or because the streams may want to use different physical links in the case where multiple slower physical links are being aggregated into a single logical link for the purpose of IP routing.

As a network grows to a very large size (on the order of hundreds of LSRs), it becomes increasingly difficult to handle the assignment of all routes via manual configuration. However, explicit routing allows several alternatives:

1. Partial Configuration: One option is to use automatic/dynamic routing for most of the paths through the network, but then manually configure some routes. For example, suppose that full dynamic routing would result in a particular link being overloaded. One of the LSPs which uses that link could be selected and manually routed to use a different path.

2. Central Computation: Another option would be to provide long term network usage information to a single central management facility. That facility could then run a global optimization to compute a set of paths to use. Network management commands can then be used to configure LSRs with the correct routes.

3. Egress Computation: An egress node can run a computation which optimizes the paths followed by traffic to itself. This cannot, of course, optimize the entire traffic load through the network, but can include optimization of traffic from multiple ingresses to one egress. The reason for optimizing traffic to a single egress, rather than from a single ingress, relates to the issue of when to merge: an ingress can never merge the traffic from itself to different egresses, but an egress can, if desired, choose to merge the traffic from multiple ingresses to itself.

4.10.5 Using Explicit Routing for Policy Routing

This section is FFS.

4.11 Traceroute

This section is FFS.

4.12 LSP Control: Egress versus Local

There is a choice to be made regarding whether the initial setup of LSPs will be initiated by the egress node, or locally by each individual node.

When LSP control is done locally, then each node may at any time pass label bindings to its neighbors for each FEC recognized by that node. In the normal case that the neighboring nodes recognize the same FECs, nodes may map incoming labels to outgoing labels as part of the normal label swapping forwarding method.

When LSP control is done by the egress, then initially (on startup) only the egress node passes label bindings to its neighbors, corresponding to any FECs which leave the MPLS network at that egress node. When initializing, other nodes wait until they get a label from downstream for a particular FEC before passing a corresponding label for the same FEC to upstream nodes.
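The following is a minimal sketch, under assumed node and topology abstractions (the Node attributes and methods used here are not defined by this framework), of how egress control with downstream label assignment might behave; the local-control case differs only in that every node may also originate bindings at startup:

   # Sketch of egress LSP control with downstream label assignment.
   # Node attributes and methods are assumptions made for illustration.

   def on_startup(node):
       # Only the egress originates bindings, for FECs that exit the
       # MPLS network at this node.
       for fec in node.egress_fecs:
           advertise_upstream(node, fec, node.assign_label(fec))

   def on_binding_from_downstream(node, fec, downstream_label):
       # Interior nodes wait for a downstream label before advertising
       # one of their own, so bindings "bubble up" toward the ingress.
       node.install_mapping(fec, out_label=downstream_label)
       advertise_upstream(node, fec, node.assign_label(fec))

   def advertise_upstream(node, fec, label):
       for neighbor in node.upstream_neighbors(fec):
           neighbor.receive_binding(fec, label)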
With local control, since each LSR is (at least initially) independently assigning labels to FECs, it is possible that different LSRs may make inconsistent decisions. For example, an upstream LSR may make a coarse decision (map multiple IP address prefixes to a single label) while its downstream neighbor makes a finer grain decision (map each individual IP address prefix to a separate label). With downstream label assignment this can be corrected by having LSRs withdraw labels that they have assigned which are inconsistent with downstream labels, and replace them with new, consistent label assignments.

This may appear to be an advantage of egress LSP control (since with egress control the initial label assignments "bubble up" from the egress to upstream nodes, and consistency is therefore easy to ensure). However, even with egress control it is possible that the choice of egress node may change, or that the egress may (based on a change in configuration) change its mind in terms of the granularity which is to be used. This implies that the same mechanism will be necessary to allow changes in granularity to bubble up to upstream nodes. The choice of egress or local control may therefore affect the frequency with which this mechanism is used, but does not affect the need for a mechanism to achieve consistency of label granularity.

Egress control and local control can interwork in a very straightforward manner: with either approach (assuming downstream label assignment), the egress node will initially assign labels for particular FECs and will pass these labels to its neighbors. With either approach these label assignments will bubble upstream, with the upstream nodes choosing labels that are consistent with the labels that they receive from downstream.

The difference between the two techniques therefore becomes a tradeoff between avoiding a short period of initial thrashing on startup (in the sense of avoiding the need to withdraw inconsistent labels which may have been assigned using local control) versus the imposition of a short delay on initial startup (while waiting for the initial label assignments to bubble up from downstream). The protocol mechanisms which need to be defined are the same in either case, and the steady state operation is the same in either case.

4.13 Security

Security in a network using MPLS should be relatively similar to security in a normal IP network.

Routing in an MPLS network uses precisely the same IP routing protocols as are currently used with IP. This implies that route filtering is unchanged from current operation. Similarly, the security of the routing protocols is not affected by the use of MPLS.

Packet filtering also may be done as in normal IP. This will require either (i) that label swapping be terminated prior to any firewalls performing packet filtering (in which case a separate instance of label swapping may optionally be started after the firewall); or (ii) that firewalls "look past the labels" in order to inspect the entire IP packet contents. In the latter case, note that the label may imply semantics greater than those contained in the packet header: in particular, a particular label value may imply that the packet is to take a particular path after the firewall. In environments in which this is considered to be a security issue, it may be desirable to terminate the label prior to the firewall.

Note that in principle labels could be used to speed up the operation of firewalls: in particular, the label could be used as an index into a table which indicates the characteristics that the packet needs to have in order to pass through the firewall. Depending upon implementation considerations, matching the contents of the packet against the contents of the table may be quicker than parsing the packet in the absence of the label.
Authors' Addresses

Ross Callon
Ascend Communications, Inc.
1 Robbins Road
Westford, MA 01886
508-952-7412
rcallon@casc.com

Paul Doolan
Cisco Systems, Inc.
250 Apollo Drive
Chelmsford, MA 01824
508-634-1204
pdoolan@cisco.com

Nancy Feldman
IBM Corp.
17 Skyline Drive
Hawthorne, NY 10532
914-784-3254
nkf@vnet.ibm.com

Andre Fredette
Bay Networks, Inc.
3 Federal Street
Billerica, MA 01821
508-916-8524
fredette@baynetworks.com

George Swallow
Cisco Systems, Inc.
250 Apollo Drive
Chelmsford, MA 01824
508-244-8143
swallow@cisco.com

Arun Viswanathan
IBM Corp.
17 Skyline Drive
Hawthorne, NY 10532
914-784-3273
arunv@vnet.ibm.com