Network Working Group                                          R. Callon
INTERNET DRAFT                                     Ascend Communications
                                                               P. Doolan
                                                       Ennovate Networks
                                                              N. Feldman
                                                               IBM Corp.
                                                             A. Fredette
                                                            Bay Networks
                                                              G. Swallow
                                                           Cisco Systems
                                                          A. Viswanathan
                                                               IBM Corp.
                                                       November 21, 1997
                                                  Expires May 21, 1998

             A Framework for Multiprotocol Label Switching

Status of this Memo

This document is an Internet-Draft. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as ``work in progress.''

To learn the current status of any Internet-Draft, please check the ``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow Directories on ds.internic.net (US East Coast), nic.nordu.net (Europe), ftp.isi.edu (US West Coast), or munnari.oz.au (Pacific Rim). Distribution of this memo is unlimited.
Abstract

This document discusses technical issues and requirements for the Multiprotocol Label Switching working group. This is an initial draft document, which will evolve and expand over time. It is the intent of this document to produce a coherent description of all significant approaches which were and are being considered by the working group. Selection of specific approaches, making choices regarding engineering tradeoffs, and detailed protocol specification, are outside of the scope of this framework document.

Note that this document is at an early stage, and that most of the detailed technical discussion is only in a rough form. Additional text will be provided over time from a number of sources. A small amount of the text in this document may be redundant with the proposed protocol architecture for MPLS. This redundancy will be reduced over time, with the overall discussion of issues moved to this document, and the selection of specific approaches and specification of the protocol contained in the protocol architecture and other related documents.

Acknowledgments

The ideas and text in this document have been collected from a number of sources and comments received. We would like to thank Jim Luciani, Andy Malis, Rayadurgam Ravikanth, Yakov Rekhter, Eric Rosen, Vijay Srinivasan, and Pasi Vananen for their inputs and ideas.

1. Introduction and Requirements

1.1 Overview of MPLS

The primary goal of the MPLS working group is to standardize a base technology that integrates the label swapping forwarding paradigm with network layer routing. This base technology (label swapping) is expected to improve the price/performance of network layer routing, improve the scalability of the network layer, and provide greater flexibility in the delivery of (new) routing services (by allowing new routing services to be added without a change to the forwarding paradigm).

The initial MPLS effort will be focused on IPv4 and IPv6. However, the core technology will be extendible to multiple network layer protocols (e.g., IPX, Appletalk, DECnet, CLNP). MPLS is not confined to any specific link layer technology; it can work with any media over which network layer packets can be passed between network layer entities.

MPLS makes use of a routing approach whereby the normal mode of operation is that L3 routing (e.g., existing IP routing protocols and/or new IP routing protocols) is used by all nodes to determine the routed path.

MPLS provides a simple "core" set of mechanisms which can be applied in several ways to provide a rich functionality. The core effort includes the following (a minimal sketch of the resulting forwarding model follows the list):

a) Semantics assigned to a stream label:

   - Labels are associated with specific streams of data;

b) Forwarding Methods:

   - Forwarding is simplified by the use of short fixed length labels to identify streams

   - Forwarding may require simple functions such as looking up a label in a table, swapping labels, and possibly decrementing and checking a TTL.

   - In some cases MPLS may make direct use of underlying layer 2 forwarding, such as is provided by ATM or Frame Relay equipment.

c) Label Distribution Methods:

   - Allow nodes to determine which labels to use for specific streams

   - This may use some sort of control exchange, and/or be piggybacked on a routing protocol
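As an informal illustration of the forwarding model sketched in the list above, the following Python fragment shows a label swap as a single exact-match table lookup. The table contents and the names (lib, forward) are invented for illustration and are not part of any MPLS specification:

    # Hypothetical label information base: incoming label -> (outgoing label, port)
    lib = {
        17: (42, "port1"),
        23: (42, "port1"),   # two incoming streams mapped to one outgoing label
        99: (7,  "port2"),
    }

    def forward(incoming_label):
        # A label swap is one exact-match lookup; no L3 header parsing
        # or longest-prefix matching is involved.
        outgoing_label, port = lib[incoming_label]
        return outgoing_label, port

    print(forward(17))   # (42, 'port1')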
The MPLS working group will define the procedures and protocols used to assign significance to the forwarding labels and to distribute that information between cooperating MPLS forwarders.

1.2 Requirements

- MPLS forwarding MUST simplify packet forwarding in order to do the following:

   - lower cost of high speed forwarding

   - improve forwarding performance

- MPLS core technologies MUST be general with respect to data link technologies (i.e., work over a very wide range of underlying data links). Specific optimizations for particular media MAY be considered.

- MPLS core technologies MUST be compatible with a wide range of routing protocols, and MUST be capable of operating independently of the underlying routing protocols. It has been observed that considerable optimizations can be achieved in some cases by small enhancements of existing protocols. Such enhancements MAY be considered in the case of IETF standard routing protocols, and if appropriate, coordinated with the relevant working group(s).

- Routing protocols which are used in conjunction with MPLS might be based on distributed computation. As such, during routing transients, these protocols may compute forwarding paths which potentially contain loops. MPLS MUST provide protocol mechanisms to either prevent the formation of loops and/or contain the amount of (networking) resources that can be consumed due to the presence of loops.

- MPLS forwarding MUST allow "aggregate forwarding" of user data; i.e., allow streams to be forwarded as a unit and ensure that an identified stream takes a single path, where a stream may consist of the aggregate of multiple flows of user data. MPLS SHOULD provide multiple levels of aggregation support (e.g., from individual end to end application flows at one extreme, to aggregates of all flows passing through a specified switch or router at the other extreme).

- MPLS MUST support operations, administration, and maintenance facilities at least as extensive as those supported in current IP networks. Current network management and diagnostic tools SHOULD continue to work in order to provide some backward compatibility. Where such tools are broken by MPLS, hooks MUST be supplied to allow equivalent functionality to be created.

- MPLS core technologies MUST work with both unicast and multicast streams.

- The MPLS core specifications MUST clearly state how MPLS operates in a hierarchical network.

- Scalability issues MUST be considered and analyzed during the definition of MPLS. Very scalable solutions MUST be sought.

- MPLS core technologies MUST be capable of working with O(n) streams to switch all best-effort traffic, where n is the number of nodes in an MPLS domain. MPLS protocol standards MUST be capable of taking advantage of hardware that supports stream merging where appropriate. Note that O(n-squared) streams or VCs might also be appropriate for use in some cases.
- The core set of MPLS standards, along with existing Internet standards, MUST be a self-contained solution. For example, the proposed solution MUST NOT require specific hardware features that do not commonly exist on network equipment at the time that the standard is complete. However, the solution MAY make use of additional optional hardware features (e.g., to optimize performance).

- The MPLS protocol standards MUST support multipath routing and forwarding.

- MPLS MUST be compatible with the IETF Integrated Services Model, including RSVP.

- It MUST be possible for MPLS switches to coexist with non-MPLS switches in the same switched network. MPLS switches SHOULD NOT impose additional configuration on non-MPLS switches.

- MPLS MUST allow "ships in the night" operation with existing layer 2 switching protocols (e.g., ATM Forum Signaling); i.e., MPLS must be capable of being used in a network which is also simultaneously operating standard layer 2 protocols.

- The MPLS protocol MUST support both topology-driven and traffic/request-driven label assignments.

1.3 Terminology

aggregate stream

    synonym of "stream"

DLCI

    a label used in Frame Relay networks to identify frame relay circuits

flow

    a single instance of an application to application flow of data (as in the RSVP and IFMP use of the term "flow")

forwarding equivalence class

    a group of L3 packets which are forwarded in the same manner (e.g., over the same path, with the same forwarding treatment). A forwarding equivalence class is therefore the set of L3 packets which could safely be mapped to the same label. Note that there may be reasons that packets from a single forwarding equivalence class may be mapped to multiple labels (e.g., when stream merge is not used).

frame merge

    stream merge, when it is applied to operation over frame based media, so that the potential problem of cell interleave is not an issue.

label

    a short fixed length physically contiguous locally significant identifier which is used to identify a stream

label information base

    the database of information containing label bindings

label swap

    the basic forwarding operation consisting of looking up an incoming label to determine the outgoing label, encapsulation, port, and other data handling information.

label swapping

    a forwarding paradigm allowing streamlined forwarding of data by using labels to identify streams of data to be forwarded.

label switched hop

    the hop between two MPLS nodes, on which forwarding is done using labels.

label switched path

    the path created by the concatenation of one or more label switched hops, allowing a packet to be forwarded by swapping labels from one MPLS node to another MPLS node.

layer 2

    the protocol layer under layer 3 (which therefore offers the services used by layer 3). Forwarding, when done by the swapping of short fixed length labels, occurs at layer 2 regardless of whether the label being examined is an ATM VPI/VCI, a frame relay DLCI, or an MPLS label.
layer 3

    the protocol layer at which IP and its associated routing protocols operate

link layer

    synonymous with layer 2

loop detection

    a method of dealing with loops in which loops are allowed to be set up, and data may be transmitted over the loop, but the loop is later detected and closed

loop prevention

    a method of dealing with loops in which data is never transmitted over a loop

label stack

    an ordered set of labels

loop survival

    a method of dealing with loops in which data may be transmitted over a loop, but means are employed to limit the amount of network resources which may be consumed by the looping data

label switching router

    an MPLS node which is capable of forwarding native L3 packets

merge point

    the node at which multiple streams and switched paths are combined into a single stream sent over a single path. In the case that the multiple paths are not combined prior to the egress node, the egress node becomes the merge point.

Mlabel

    abbreviation for MPLS label

MPLS core standards

    the standards which describe the core MPLS technology

MPLS domain

    a contiguous set of nodes which operate MPLS routing and forwarding and which are also in one Routing or Administrative Domain

MPLS edge node

    an MPLS node that connects an MPLS domain with a node which is outside of the domain, either because it does not run MPLS, and/or because it is in a different domain. Note that if an LSR has a neighboring host which is not running MPLS, then that LSR is an MPLS edge node.

MPLS egress node

    an MPLS edge node in its role in handling traffic as it leaves an MPLS domain

MPLS ingress node

    an MPLS edge node in its role in handling traffic as it enters an MPLS domain

MPLS label

    a label placed in a short MPLS shim header used to identify streams

MPLS node

    a node which is running MPLS. An MPLS node will be aware of MPLS control protocols, will operate one or more L3 routing protocols, and will be capable of forwarding packets based on labels. An MPLS node may optionally also be capable of forwarding native L3 packets.

MultiProtocol Label Switching

    an IETF working group and the effort associated with the working group

network layer

    synonymous with layer 3

shortcut VC

    a VC set up as a result of an NHRP query and response

stack

    synonymous with label stack

stream

    an aggregate of one or more flows, treated as one aggregate for the purpose of forwarding in L2 and/or L3 nodes (e.g., may be described using a single label). In many cases a stream may be the aggregate of a very large number of flows. Synonymous with "aggregate stream".

stream merge

    the merging of several smaller streams into a larger stream, such that for some or all of the path the larger stream can be referred to using a single label.

switched path

    synonymous with label switched path

virtual circuit

    a circuit used by a connection-oriented layer 2 technology such as ATM or Frame Relay, requiring the maintenance of state information in layer 2 switches.

VC merge

    stream merge when it is specifically applied to VCs, specifically so as to allow multiple VCs to merge into one single VC
VP merge

    stream merge when it is applied to VPs, specifically so as to allow multiple VPs to merge into one single VP. In this case the VCIs need to be unique. This allows cells from different sources to be distinguished via the VCI.

VPI/VCI

    a label used in ATM networks to identify circuits

1.4 Acronyms and Abbreviations

DLCI   Data Link Circuit Identifier
FEC    Forwarding Equivalence Class
ISP    Internet Service Provider
LIB    Label Information Base
LDP    Label Distribution Protocol
L2     Layer 2
L3     Layer 3
LSP    Label Switched Path
LSR    Label Switching Router
MPLS   MultiProtocol Label Switching
MPT    Multipoint to Point Tree
NHC    Next Hop (NHRP) Client
NHS    Next Hop (NHRP) Server
VC     Virtual Circuit
VCI    Virtual Circuit Identifier
VPI    Virtual Path Identifier

1.5 Motivation for MPLS

This section describes the expected and potential benefits of MPLS over existing schemes. Specifically, this section discusses the advantages of MPLS over previous methods for building core networks (i.e., networks for internet service providers or for major corporate backbones). The potential advantages of MPLS in campus and local area networks are not discussed in this section.

There are currently two commonly used methods for building core IP networks: (i) networks of datagram routers in which the core of the network is based on the datagram routers; (ii) networks of datagram routers operating over an ATM core. In order to describe the advantages of MPLS, it is necessary to know which alternative to MPLS we are using for the comparison. This section is therefore split into two parts: Section 1.5.1 describes the advantages of MPLS when compared to a pure datagram routed network. Section 1.5.2 describes the advantages of MPLS when compared to an IP over ATM network.

This section does not provide a complete list of requirements for MPLS. For example, Multipoint to Point Trees are important for MPLS to scale. However, datagram forwarding naturally acts in this way (since multiple sources are merged automatically), and the ATM Forum is currently adding support for multipoint to point to the ATM standards. The ability to do MPTs is therefore important to MPLS, but does not represent an advantage over either datagram routing or IP over ATM, and therefore is not mentioned in this section.

1.5.1 Benefits Relative to Use of a Router Core

1.5.1.1 Simplified Forwarding

Label swapping allows packet forwarding to be based on an exact match for a short label, rather than a longest match algorithm applied to a longer address as is required for normal datagram forwarding. In addition, the label headers used with MPLS are simpler than the headers typically used with datagram protocols such as IP. This in turn implies that MPLS allows a much simpler forwarding paradigm relative to datagrams, and implies that it is easier to build a high speed router using MPLS.

Whether this simpler forwarding operation will result in availability of LSRs which can operate at higher speeds than datagram routers is controversial, and probably depends upon implementation details. There are some parts of the network, such as at hierarchical boundaries, where datagram IP forwarding at high speed will be required. This implies that implementation of a high speed router is highly desirable. In addition, there are currently multiple companies building high speed routers which will allow IP packets to be forwarded at very high speed.
At speeds at least up to OC48, it appears that once the one-time engineering is completed, the per-unit cost associated with IP forwarding will be a small fraction of the overall equipment cost.

However, there are also many existing routers which can benefit from the simpler forwarding allowed by MPLS. In addition, there are some routers being built with implementations that will benefit from the simpler forwarding available with MPLS.

1.5.1.2 Efficient Explicit Routing

Explicit routing (aka source routing) is a very powerful technique which potentially can be useful for a variety of purposes. However, with pure datagram routing the overhead of carrying a complete explicit route with each packet is prohibitive. MPLS, in contrast, allows the explicit route to be carried only at the time that the label switched path is set up, and not with each packet. This implies that MPLS makes explicit routing practical. This in turn implies that MPLS can make possible a number of advanced routing features which depend upon explicit routing.
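The following sketch illustrates why explicit routing becomes practical: the explicit route is consulted only once, at path setup time, to install a swap entry at each hop; subsequent packets carry only labels. The topology, the per-node tables, and the names (setup_lsp, tables) are hypothetical:

    # Hypothetical LSP setup along an explicit route A -> B -> C -> D.
    explicit_route = ["A", "B", "C", "D"]
    tables = {n: {} for n in explicit_route}      # per-node label tables
    next_free = {n: 100 for n in explicit_route}  # per-node label allocators

    def setup_lsp(route, fec):
        # The explicit route is carried only here, at setup time.
        in_label = fec         # at the ingress, packets are classified by FEC
        for node, nxt in zip(route, route[1:]):
            out_label = next_free[nxt]    # label chosen by the downstream node
            next_free[nxt] += 1
            tables[node][in_label] = (out_label, nxt)
            in_label = out_label

    setup_lsp(explicit_route, fec="10.1.0.0/16")

    # Data packets now follow the path by label swaps alone:
    label, node = tables["A"]["10.1.0.0/16"]
    while node != "D":
        label, node = tables[node][label]
    print("delivered at", node)   # delivered at D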
1.5.1.3 Traffic Engineering

Traffic engineering refers to the process of selecting the paths chosen by data traffic in order to balance the traffic load on the various links, routers, and switches in the network. Traffic engineering is most important in networks where multiple parallel or alternate paths are available. The rapid growth in the Internet, and particularly the associated rapid growth in the demand for bandwidth, has tended to cause some core networks to become increasingly "branchy" in recent years, resulting in an increase in the importance of traffic engineering.

It is common today, in networks that are running IP over an ATM core using PVCs, to manually configure the path of each PVC in order to equalize the traffic levels on different links in the network. Thus traffic engineering is typically done today in IP over ATM networks using manual configuration.

Traffic engineering is difficult to accomplish with datagram routing. Some degree of load balancing can be obtained by adjusting the metrics associated with network links. However, there is a limit to how much can be accomplished in this way, and in networks with a large number of alternative paths between any two points, balancing the traffic levels on all links is difficult to achieve solely by adjustment of the metrics used with hop by hop datagram routing.

MPLS allows streams from any particular ingress node to any particular egress node to be individually identified. MPLS therefore provides a straightforward mechanism to measure the traffic associated with each ingress node to egress node pair. In addition, since MPLS allows efficient explicit routing of label switched paths, it is straightforward to ensure that any particular stream of data takes the preferred path.

The hard part of traffic engineering is selection of the method used to route each label switched path. There are a variety of possible ways to do this, ranging from manual configuration of routes, to use of a routing protocol which announces traffic loads in the network combined with background recomputation of paths.

1.5.1.4 QoS Routing

QoS routing refers to a method of routing in which the route chosen for a particular stream is selected in response to the QoS required for that stream. In many cases QoS routing needs to make use of explicit routing, for several reasons:

In some cases specific bandwidth is likely to be reserved for each of many specific streams of data. This implies that the total bandwidth of multiple streams may exceed the bandwidth available on any particular link, and thus not all streams, even between the same ingress and egress nodes, can take the same path. Instead, individual streams will need to be individually routed. This is somewhat analogous to traffic engineering, but might require separation of streams on a finer granularity. Thus explicit routing may be needed in order to allow each stream to be individually routed, and to eliminate the need for each switch along the path of a stream to compute the route for each stream.

Consider the case of routing a stream with a specific bandwidth requirement: In this case the route chosen will depend upon the amount of bandwidth which is requested. For any one given bandwidth it is straightforward to select a path. However, there are many different levels of bandwidth which could in principle be requested. This makes it impractical to precompute all possible paths for all possible bandwidths. If the path for a particular stream must be computed on demand, then it is undesirable to require every LSR on the path to compute the path. Instead, it is preferable to have the first node compute the path and specify the route to be followed through use of an explicit route.

For a variety of reasons the information available for QoS routing may in some cases be slightly out of date. This implies that the attempt to select a specific path for a QoS-sensitive stream may in some cases fail, due to a particular node or link not having the required resources available. In these cases it is not in general feasible to tell all other nodes in the network of the limited resource in one particular network element. If explicit routing is available, then this permits the initial node of the stream (the ingress node in MPLS) to be informed that the indicated network element is not able to carry the stream, allowing an alternate path to be selected. However, in this case the node that selects the alternate path has to use explicit routing in order to force the stream to follow the alternate path.

These and similar examples imply that explicit routing is necessary in order to do an adequate job of QoS routing. Given that MPLS allows efficient explicit routing, it follows that MPLS also facilitates QoS routing.
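The on-demand computation described above can be sketched as follows: the ingress node prunes links that cannot carry the requested bandwidth and searches for a path over what remains; the resulting path would then be installed as an explicit route. The topology, bandwidth figures, and the name qos_route are invented, and a real implementation would likely use link metrics rather than hop count:

    from collections import deque

    links = {                          # (node, node) -> available bandwidth
        ("A", "B"): 100, ("B", "D"): 20,
        ("A", "C"): 100, ("C", "D"): 100,
    }

    def qos_route(src, dst, bandwidth):
        # Keep only links with sufficient available bandwidth.
        adj = {}
        for (u, v), bw in links.items():
            if bw >= bandwidth:
                adj.setdefault(u, []).append(v)
                adj.setdefault(v, []).append(u)
        # Breadth-first search yields a minimum-hop path over the pruned graph.
        paths, queue = {src: [src]}, deque([src])
        while queue:
            u = queue.popleft()
            if u == dst:
                return paths[u]
            for v in adj.get(u, []):
                if v not in paths:
                    paths[v] = paths[u] + [v]
                    queue.append(v)
        return None                    # no path has the required resources

    print(qos_route("A", "D", 50))     # ['A', 'C', 'D']: the B-D link is pruned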
1.5.1.5 Complex Mappings from IP Packet to Forwarding Equivalence Class

MPLS allows the mapping from IP packet to forwarding equivalence class to be performed only once, at the ingress to an MPLS area. This makes practical complex mappings from IP packet to FEC that would otherwise be infeasible.

For example, consider the case of provisioned QoS: Some ISPs offer a service wherein specific customers subscribe to receive differentiated services (e.g., their packets may receive preferential forwarding treatment). Mapping of IP packets to the service level may require knowing the customer who is transmitting the packet, which may in turn require packet filtering based on source and destination address, incoming interface, and other characteristics. The sheer number of filters that are needed in a moderate sized ISP precludes repetition of the filters at every router throughout the network. Also, some information, such as the incoming interface, is not available except at the ingress node to the network. This implies that the preferred way to offer provisioned QoS is to map the packet at the ingress point to the preferred QoS level, and then label the packet in some way. MPLS offers an efficient method to label the QoS class associated with any particular packet.

Other examples of complex mappings from IP packet to FEC are also likely to be determined as MPLS is deployed.

1.5.1.6 Partitioning of Functionality

Because different label granularities are supported, it will be possible to partition processing functionality hierarchically among the network elements, so that the heavier processing takes place at the edges of the network, near the customers, while processing in the core network is as simple as possible, e.g., pure label based forwarding.

AS level aggregation will enable the building of fully switched backbone networks and traffic exchange points. Also, it will be possible for operators to fully switch the transit traffic traveling through the operator's network. Deaggregation will be needed for streams that are destined for the networks connected to the MPLS domain, but this deaggregation only needs to perform the lookup operations associated with finding the label for the egress router or interface; e.g., TOS information bound to the label at the source is still valid, and can be honored on the basis of the label on which the packet was received. Note that the receiving domain cannot even redo the classification, as the original packet classification policy is not known by the receiving domain.

As one example of the improved functional partitioning, consider the case of the use of packet filters to map IP packets into a substantial number of queues, such that each queue receives differentiated services. For example, suppose that a network supports individual queuing for on the order of 100 different customers, with packets mapped to queues based on the source and destination IP address. In this case, with MPLS the packet filtering can be done solely on the edge of the network, with the packets mapped to labels such that each individual user receives separate labels. Thus with MPLS the filtering can be performed only at the edge of the network. This allows complex mappings of IP packets to forwarding equivalence class.
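A rough sketch of the edge-only filtering described above follows; the filter rules, addresses, and label values are invented. Interior nodes never apply these filters; they see only the label chosen at the ingress:

    # Hypothetical ingress filters: (source prefix, destination prefix) -> label.
    filters = [
        (("10.1.", "192.0.2."), 201),   # customer 1: premium queue
        (("10.2.", "192.0.2."), 202),   # customer 2: standard queue
    ]

    def classify_at_ingress(src, dst):
        # Applied once, at the network edge only.
        for (src_pfx, dst_pfx), label in filters:
            if src.startswith(src_pfx) and dst.startswith(dst_pfx):
                return label
        return None                     # unlabeled: ordinary L3 forwarding

    print(classify_at_ingress("10.1.3.4", "192.0.2.9"))   # 201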
1.5.1.7 Single Forwarding Paradigm with Service Level Differentiation

MPLS can allow a single forwarding paradigm to be used to support multiple types of service on the same network.

Because of the common forwarding paradigm, it will be possible to carry the different services through the same network elements, regardless of the control plane protocols used to populate the LSR's LIB. It is possible, for example, in the case of an ATM based switching system, to support all the native ATM services, frame relay services, and labeled IP services. The simultaneous support of multiple services may require partitioning of the label space between the services, and this partitioning shall be supported by the label distribution/management protocol.

A non-exhaustive list of examples of services suitable for carriage over LSRs includes IP traffic, Frame Relay traffic, ATM traffic (in the case of cell switching), IP tunneling, VPNs, and other datagram protocols.

Note that MPLS does not necessarily use the same header format over all types of media. However, over any particular type of media a single header format (at least for the lowest level of the label stack) should be possible.

1.5.2 Benefits Relative to Use of an ATM or Frame Relay Core

Note: This section compares MPLS with other methods for interconnecting routers over a switched core network. We are not considering methods for interconnecting hosts located on virtual networks. For example, the ATM Forum LANE and MPOA standards support virtual networks. MPLS does not directly support virtual networks, and should not be compared directly with MPOA or LANE.

Previously available methods for interconnecting routers in an IP over ATM environment make use of either: (i) a full mesh 'n-squared' overlay of virtual circuits between n ATM-attached routers; (ii) a partial mesh of VCs between routers; or (iii) a partial mesh of VCs, plus the use of NHRP to facilitate on demand cut-through SVCs.

1.5.2.1 Scaling of the Routing Protocol

Relative to the interconnection of IP over an ATM core, MPLS improves the scaling of routing due to the reduced number of peers and the elimination of the 'n-squared' logical links between routers used to operate the routing protocols.

Because all LSRs will run standard routing protocols, the number of peers a router needs to communicate with is reduced to the number of LSRs and routers a given LSR is directly connected to, instead of having to peer with a large number of routers at the far ends of switched L2 paths. This benefit is achieved because the edge LSRs do not need to peer with every other edge LSR in the domain, as is the case in a hybrid switch/router network.

1.5.2.2 Common Operation over Packet and Cell Media

MPLS makes use of common methods for routing and forwarding over packet and cell media, and potentially allows a common approach to traffic engineering, QoS routing, and other aspects of operation. For example, this means that the same method for label distribution can be used over Frame Relay and ATM media, as well as between LSRs using the MPLS shim header for forwarding over other media (such as PPP links and broadcast LANs).

Note: There may be some differences with respect to the operation over different media. For example, if VP merge is used with ATM media (rather than VC merge) then the merge operation may be somewhat different than it would be with packet media or with ATM using VC merge.

1.5.2.3 Easier Management

The use of a common method for label distribution and common routing protocols over multiple types of media is expected to simplify network management of MPLS networks.

1.5.2.4 Elimination of the 'Routing over Large Clouds' Issue

MPLS eliminates the need to use NHRP and on-demand cut-through SVCs for operation over ATM. This eliminates the latency problem associated with cut-through SVCs.

2. Discussion of Core MPLS Components

2.1 The Basic Routing Approach

Routing is accomplished through the use of standard L3 routing protocols, such as OSPF and BGP.
The information maintained by the L3 routing protocols is then used to distribute labels to neighboring nodes; these labels are used in the forwarding of packets as described below.

In the case of ATM networks, the labels that are distributed are VPI/VCIs, and a separate protocol (i.e., PNNI) is not necessary for the establishment of VCs for IP forwarding.

The topological scope of a routing protocol (i.e., routing domain) and the scope of MPLS-capable label switching nodes may be different. For example, MPLS-knowledgeable and MPLS-ignorant nodes, all of which are OSPF routers, may be co-resident in an area. Where neighboring routers know MPLS, labels can be exchanged and used.

Neighboring MPLS routers may use configured PVCs or PVPs to tunnel through non-participating ATM or FR switches.

2.2 Labels

In addition to the single routing protocol approach discussed above, the other key concept in the basic MPLS approach is the use of short fixed length labels to simplify user data forwarding.

2.2.1 Label Semantics

It is important that the MPLS solutions are clear about what semantics (i.e., what knowledge of the state of the network) are implicit in the use of labels for forwarding user data packets or cells.

At the simplest level, a label may be thought of as nothing more than a shorthand for the packet header, in order to index the forwarding decision that a router would make for the packet. In this context, the label is nothing more than a shorthand for an aggregate stream of user data.

This observation leads to one possible very simple interpretation: that the "meaning" of the label is a strictly local issue between two neighboring nodes. With this interpretation: (i) MPLS could be employed between any two neighboring nodes for forwarding of data between those nodes, even if no other nodes in the network participate in MPLS; (ii) when MPLS is used between more than two nodes, the operation between any two neighboring nodes could be interpreted as independent of the operation between any other pair of nodes. This approach has the advantage of semantic simplicity, and of being the closest to pure datagram forwarding. However, this approach (like pure datagram forwarding) has the disadvantage that when a packet is forwarded it is not known whether the packet is being forwarded into a loop, into a black hole, or towards links which have inadequate resources to handle the traffic flow. These disadvantages are necessary with pure datagram forwarding, but are optional design choices to be made when label switching is being used.

There are cases where it would be desirable to have additional knowledge implicit in the existence of the label. For example, one approach to avoiding loops (see section 4.3) involves signaling the label distribution along a path before packets are forwarded on that path. With this approach, the fact that a node has a label to use for a particular IP packet would imply the knowledge that following the label (including label swapping at subsequent nodes) leads to a non-looping path which makes progress towards the destination (something which is usually, but not necessarily always, true when using pure datagram routing).
This would of course require some sort of label distribution/setup protocol which signals along the path being set up before the labels are available for packet forwarding. However, there are also other consequences to having additional semantics associated with the label: specifically, procedures are needed to ensure that the semantics are correct. For example, if the fact that you have a label for a particular destination implies that there is a loop-free path, then when the path changes some procedures are required to ensure that it is still loop-free. Another example of semantics which could be implicit in a label is the identity of the higher level protocol type which is encoded using that label value.

In either case, the specific value of a label to use for a stream is strictly a local issue; however, the decision about whether to use the label may be based on some global (or at least wider scope) knowledge that, for example, the label switched path is loop-free and/or has the appropriate resources.

A similar example occurs in ATM networks: with standard ATM, a signaling protocol is used which both reserves resources in switches along the path, and ensures that the path is loop-free and terminates at the correct node. Thus, implicit in the fact that an ATM node has a VPI/VCI for forwarding a particular piece of data is the knowledge that the path has been set up successfully.

Another similar example occurs with multipoint to point trees over ATM (see section 4.2 below), where the multipoint to point tree uses a VP, and cell interleave at merge points in the tree is handled by giving each source on the tree a distinct VCI within the VP. In this case, the fact that each source has a known VPI/VCI to use needs to (implicitly or explicitly) imply the knowledge that the VCI assigned to that source is unique within the context of the VP.

In general, labels are used to optimize how the system works, not to control how the system works. For example, the routing protocol determines the path that a packet follows. The presence or absence of a label assignment should not affect the path of an L3 packet. Note however that the use of labels may make capabilities such as explicit routes, loadsharing, and multipath more efficient.

2.2.2 Label Granularity

Labels are used to create a simple forwarding paradigm. The essential element in assigning a label is that the device which will be using the label to forward packets will be forwarding all packets with the same label in the same way. If the packet is to be forwarded solely by looking at the label, then at a minimum, all packets with the same incoming label must be forwarded out the same port(s) with the same encapsulation(s), and with the same next hop label (if any).

The term "forwarding equivalence class" is used to refer to a set of L3 packets which are all forwarded in the same manner by a particular LSR (for example, the IP packets in a forwarding equivalence class may be destined for the same egress from an MPLS network, and may be associated with the same QoS class). A forwarding equivalence class is therefore the set of L3 packets which could safely be mapped to the same label. Note that there may be reasons that packets from a single forwarding equivalence class may be mapped to multiple labels (e.g., when stream merge is not used).
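The notion of a forwarding equivalence class can be made concrete with a small sketch. Here the FEC of a packet is computed by a key function (egress router plus class of service, one possible choice among the granularities listed in section 2.2.2.1 below), and each distinct FEC receives one label; all names and values are illustrative:

    # One possible FEC key: packets leaving by the same egress router with
    # the same class of service are forwarded in the same manner.
    def fec_key(packet):
        return (packet["egress_router"], packet["cos"])

    packets = [
        {"dst": "10.1.2.3", "egress_router": "R9", "cos": 0},
        {"dst": "10.1.7.7", "egress_router": "R9", "cos": 0},   # same FEC as above
        {"dst": "10.2.0.1", "egress_router": "R4", "cos": 1},
    ]

    labels, next_label = {}, 100
    for p in packets:
        if fec_key(p) not in labels:
            labels[fec_key(p)] = next_label
            next_label += 1
    print(labels)   # two FECs, hence two labels, for three destinations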
Note that the label could also mean "ignore this label and forward based on what is contained within," where within one might find a label (if a stack of labels is used) or a layer 3 packet.

For IP unicast traffic, the granularity of a label allows various levels of aggregation in a Label Information Base (LIB). At one end of the spectrum, a label could represent a host route (i.e., the full 32 bits of IP address). If a router forwards an entire CIDR prefix in the same way, it may choose to use one label to represent that prefix. Similarly, if the router is forwarding several (otherwise unrelated) CIDR prefixes in the same way, it may choose to use the same label for this set of prefixes. For instance, all CIDR prefixes which share the same BGP Next Hop could be assigned the same label. Taking this to the limit, an egress router may choose to advertise all of its prefixes with the same label.

By introducing the concept of an egress identifier, the distribution of labels associated with groups of CIDR prefixes can be simplified. For instance, an egress identifier might specify the BGP Next Hop, with all prefixes routed to that next hop receiving the label associated with that egress identifier. Another natural place to aggregate would be the MPLS egress router. This would work particularly well in conjunction with a link-state routing protocol, where the association between egress router and CIDR prefix is already distributed throughout an area.

For IP multicast, the natural binding of a label would be to a multicast tree, or rather to the branch of a tree which extends from a particular port. Thus for a shared tree, the label corresponds to the multicast group, (*,G). For (S,G) state, the label would correspond to the source address and the multicast group.

A label can also have a granularity finer than a host route. That is, it could be associated with some combination of source and destination address or other information within the packet. This might for example be done on an administrative basis to aid in effecting policy. A label could also correspond to all packets which match a particular Integrated Services filter specification.

Labels can also represent explicit routes. This use is semantically equivalent to using an IP tunnel with a complete explicit route. This is discussed in more detail in section 4.10.

2.2.2.1 Examples of unicast traffic granularities:

- PQ (Port Quadruples) same IP source address prefix, destination address prefix, TTL, IP protocol and TCP/UDP source/destination ports

- PQT (Port Quadruples with TOS) same IP source address prefix, destination address prefix, TTL, IP protocol and TCP/UDP source/destination ports and same IP header TOS field (including Precedence and TOS bits).

- HP (Host Pairs) same specific IP source and destination address (32 bit)

- NP (Network Pairs) same IP source and destination address prefixes (variable length)

- DN (Destination Network) same IP destination network address prefix (variable length)

- ER (Egress Router) same egress router ID (e.g. OSPF)

- NAS (Next-hop AS) same next-hop AS number (BGP)

- DAS (Destination AS) same destination AS number (BGP)

2.2.2.2 Multicast traffic granularities:

- SST (Source Specific Tree) same source address and multicast group

- SMT (Shared Multicast Tree) same multicast group address

2.2.3 Label Assignment

Essential to label switching is the notion of binding between a label and Network Layer routing (routes). A control component is responsible for creating label bindings, and then distributing the label binding information among label switches. Label assignment involves allocating a label, and then binding a label to a route.

Label assignment can be driven by control traffic or by data traffic. This is discussed in more detail in section 3.4.

Control traffic driven label assignment has several advantages, as compared to data traffic driven label assignment. For one thing, it minimizes the amount of additional control traffic needed to distribute label binding information, as label binding information is distributed only in response to control traffic, independent of data traffic. It also makes the overall scheme independent of and insensitive to the data traffic profile/pattern. Control traffic driven creation of label bindings improves forwarding latency, as labels are assigned before data traffic arrives, rather than being assigned as data traffic arrives. It also simplifies the overall system behavior, as the control plane is controlled solely by control traffic, rather than by a mix of control and data traffic.

There are however situations where data traffic driven label assignment is necessary. A particular case may occur with ATM without VP or VC merge. In this case, setting up a full mesh of VCs would require n-squared VCs, which may be infeasible in very large networks. Instead, VCs may be set up where required for forwarding data traffic. In this case it is generally not possible to know a priori how many such streams may occur.

Label withdrawal is required with both control-driven and data-driven label assignment. Label withdrawal is primarily a matter of garbage collection, that is, collecting unused labels so that they may be reassigned. Generally speaking, a label should be withdrawn when the conditions that allowed it to be assigned are no longer true. For example, if a label is imbued with extra semantics such as loop-freeness, then the label must be withdrawn when those extra semantics cease to hold.

In certain cases, notably multicast, it may be necessary to share a label space between multiple entities. If these sharing arrangements are altered by the coming and going of neighbors, then labels which are no longer controlled by an entity must be withdrawn and a new label assigned.
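The control-driven case above can be summarized in a few lines: bindings are created when a route is learned and withdrawn (garbage collected) when the route goes away, independent of any data arrival. The structure and the names (route_added, route_removed) are invented for illustration:

    # Hypothetical control-driven label assignment and withdrawal.
    bindings, free_labels = {}, list(range(100, 200))

    def route_added(prefix):
        # Control traffic (a newly learned route) drives label allocation.
        label = free_labels.pop(0)
        bindings[prefix] = label
        return label

    def route_removed(prefix):
        # Withdrawal is the matching garbage collection: the label is
        # returned to the pool so that it may be reassigned.
        free_labels.append(bindings.pop(prefix))

    route_added("10.0.0.0/8")
    route_removed("10.0.0.0/8")
    print(bindings)   # {}: the binding was withdrawn with the route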
2.2.4 Label Stack and Forwarding Operations

The basic forwarding operation consists of looking up the incoming label to determine the outgoing label, encapsulation, port, and any additional information which may pertain to the stream, such as a particular queue or other QoS related treatment. We refer to this operation as a label swap.

When a packet first enters an MPLS domain, the packet is forwarded by normal layer 3 forwarding operations with the exception that the outgoing encapsulation will now include a label. We refer to this operation as a label push. When a packet leaves an MPLS domain, the label is removed. We refer to this as a label pop.

In some situations, carrying a stack of labels is useful. For instance, both an IGP and a BGP label could be used to allow routers in the interior of an AS to be free of BGP information. In this scenario, the "IGP" label is used to steer the packet through the AS and the "BGP" label is used to switch between ASes.

With a label stack, the set of label operations remains the same, except that at some points one might push or pop multiple labels, or pop & swap, or swap & push.
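The swap, push, and pop operations, and their interaction with a label stack, can be sketched as follows (top of stack is the last list element; all values are invented):

    def push(stack, label):        # entering a domain (or a nested tunnel)
        return stack + [label]

    def swap(stack, new_label):    # the basic forwarding operation
        return stack[:-1] + [new_label]

    def pop(stack):                # leaving a domain
        return stack[:-1]

    stack = []
    stack = push(stack, 17)        # ingress: L3 lookup once, then push ("BGP" label)
    stack = push(stack, 99)        # an "IGP" label pushed over it
    stack = swap(stack, 42)        # interior LSRs swap only the top label
    stack = pop(stack)             # the "BGP" label reappears at the interior egress
    print(stack)                   # [17]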
2.3 Encapsulation

Label-based forwarding makes use of various pieces of information, including a label or stack of labels, and possibly additional information such as a TTL field. In some cases this information may be encoded using an MPLS header; in other cases this information may be encoded in L2 headers. Note that there may be multiple types of MPLS headers. For example, the header used over one media type may be different than the one used over a different media type. Similarly, in some cases the information that MPLS makes use of may be encoded in an ATM header. We will use the term "MPLS encapsulation" to refer to whatever form is used to encapsulate the label information and other information used for label based forwarding. The term "MPLS header" will be used where this information is carried in some sort of MPLS-specific header (i.e., when the MPLS information cannot all be carried in an L2 header). Whether there is one form or multiple forms of possible MPLS headers is also outside of the scope of this document.

The exact contents of the MPLS encapsulation are outside of the scope of this document. Some fields, such as the label, are obviously needed. Some others might or might not be standardized, based on further study. An encapsulation scheme may make use of the following fields:

- label
- TTL
- class of service
- stack indicator
- next header type indicator
- checksum

It is desirable to have a very short encapsulation header. For example, a four byte encapsulation header adds to the convenience of building a hardware implementation that forwards based on the encapsulation header. But at the same time it is difficult to assign such a limited number of bits to carry the above listed information in an MPLS header. Hence careful consideration must be given to the information chosen for an MPLS header.
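To make the size tradeoff concrete, the following sketch packs the first four of the fields listed above into four bytes, using an assumed split of a 20-bit label, 3-bit class of service, 1-bit stack indicator, and 8-bit TTL. This layout is purely illustrative; as stated above, the actual contents and encoding of the MPLS header are outside the scope of this document:

    def pack(label, cos, bottom_of_stack, ttl):
        # 20 bits label | 3 bits COS | 1 bit stack indicator | 8 bits TTL
        word = (label << 12) | (cos << 9) | (bottom_of_stack << 8) | ttl
        return word.to_bytes(4, "big")

    def unpack(header):
        word = int.from_bytes(header, "big")
        return (word >> 12, (word >> 9) & 0x7, (word >> 8) & 0x1, word & 0xFF)

    hdr = pack(label=42, cos=5, bottom_of_stack=1, ttl=64)
    print(unpack(hdr))   # (42, 5, 1, 64)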
2.3 Encapsulation

Label-based forwarding makes use of various pieces of information, including a label or stack of labels, and possibly additional information such as a TTL field. In some cases this information may be encoded using an MPLS header; in other cases it may be encoded in L2 headers. Note that there may be multiple types of MPLS headers. For example, the header used over one media type may be different from that used over another media type. Similarly, in some cases the information that MPLS makes use of may be encoded in an ATM header. We will use the term "MPLS encapsulation" to refer to whatever form is used to encapsulate the label information and other information used for label based forwarding. The term "MPLS header" will be used where this information is carried in some sort of MPLS-specific header (i.e., when the MPLS information cannot all be carried in an L2 header). Whether there is one form of MPLS header or several is also outside the scope of this document.

The exact contents of the MPLS encapsulation are outside the scope of this document. Some fields, such as the label, are obviously needed. Others might or might not be standardized, based on further study. An encapsulation scheme may make use of the following fields:

 - label
 - TTL
 - class of service
 - stack indicator
 - next header type indicator
 - checksum

It is desirable to have a very short encapsulation header. For example, a four byte encapsulation header adds to the convenience of building a hardware implementation that forwards based on the encapsulation header. At the same time, it is difficult to assign such a limited number of bits to carry all of the information listed above. Hence careful consideration must be given to the information chosen for an MPLS header.

A TTL value in the MPLS header may be useful in the same manner as it is in IP. Specifically, TTL may be used to terminate packets caught in a routing loop, and for other related uses such as traceroute. The TTL mechanism is a simple and proven method of handling such events. Another use of TTL is to expire packets in a network by limiting their "time to live" and eliminating stale packets that may cause problems for some of the higher layer protocols. When used over link layers which do not provide a TTL field, alternate mechanisms will be needed to replace the uses of the TTL field.

A provision for a class of service (COS) field in the MPLS header allows multiple service classes within the same label. However, when more sophisticated QoS is associated with a label, the COS may not have any significance. Alternatively, the COS (like QoS) can be left out of the header and instead propagated with the label assignment, but this requires that a separate label be assigned to each required class of service. Nevertheless, the COS mechanism provides a simple method of segregating flows within a label.

As previously mentioned, the encapsulation header can be used to derive the benefits of tunneling (or stacking).

The MPLS header must provide a way to indicate that multiple MPLS headers are stacked (i.e., the "stack indicator"). For this purpose a single bit in the MPLS header will suffice. In addition, there are also some benefits to indicating the type of the protocol header following the MPLS header (i.e., the "next header type indicator"). One option would be to combine the stack indicator and next header type indicator into a single value (i.e., the next header type indicator could be allowed to take the value "MPLS header"). Another option is to have the next header type indicator be implicit in the label value (such that this information would be propagated along with the label).

There is no compelling reason to support a checksum field in the MPLS header. A CRC mechanism at the L2 layer should be sufficient to ensure the integrity of the MPLS header.
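
To make the bit-budget tradeoff concrete, the following sketch packs a label, COS, stack indicator, and TTL into a four byte header. The field widths chosen here are assumptions for illustration only; this document does not specify an encapsulation:

   import struct

   # One illustrative packing of a 32-bit header: 20-bit label,
   # 3-bit COS, 1-bit stack indicator, 8-bit TTL. These widths are
   # assumptions for this example, not a specification.

   def pack_header(label, cos, stack_bottom, ttl):
       assert 0 <= label < 2**20 and 0 <= cos < 8 and 0 <= ttl < 256
       word = (label << 12) | (cos << 9) | (int(stack_bottom) << 8) | ttl
       return struct.pack("!I", word)     # network byte order

   def unpack_header(data):
       (word,) = struct.unpack("!I", data)
       return {
           "label": word >> 12,
           "cos": (word >> 9) & 0x7,
           "stack_bottom": bool((word >> 8) & 0x1),
           "ttl": word & 0xFF,
       }

   hdr = pack_header(label=42, cos=0, stack_bottom=True, ttl=64)
   assert unpack_header(hdr)["label"] == 42

With only 32 bits there is no room left for a next header type indicator or checksum in this particular packing, which illustrates why some of the listed fields may have to be implicit in the label or omitted.
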
3. Observations, Issues and Assumptions

3.1 Layer 2 versus Layer 3 Forwarding

MPLS uses L2 forwarding as a way to provide simple and fast packet forwarding capability. One primary reason for the simplicity of L2 forwarding comes from its short, fixed length labels. A node forwarding at L3 must parse a (relatively) large header and perform a longest-prefix match to determine a forwarding path. However, when a node performs L2 label swapping, and labels are assigned properly, it can do a direct index lookup into its forwarding (or in this case, label-swapping) table with the short header. It is arguably simpler to build label swapping hardware than it is to build L3 forwarding hardware because the label swapping function is less complex.

The relative performance of L2 and L3 forwarding may differ considerably between nodes. Some nodes may exhibit an order of magnitude difference. Other nodes (for example, nodes with more extensive L3 forwarding hardware) may have identical performance at L2 and L3. However, some nodes may not be capable of L3 forwarding at all (e.g., ATM), or may have such limited L3 capacity as to be unusable at L3. In this situation, traffic must be blackholed if no switched path exists.

On nodes in which L3 forwarding is slower than L2 forwarding, pushing traffic to L3 when no L2 path is available may cause congestion. In some cases this could cause data loss (since L3 may be unable to keep up with the increased traffic). However, if data is discarded, then in general this will cause TCP to back off, which would allow control traffic, traceroute, and other network management tools to continue to work.

The MPLS protocol MUST NOT make assumptions about the forwarding capabilities of an MPLS node. Thus, MPLS must propose solutions that can leverage the benefits of a node that is capable of L3 forwarding, but must not mandate that a node be capable of such.

Why We Will Still Need L3 Forwarding:

MPLS will not, and is not intended to, replace L3 forwarding. There is absolutely a need for some systems to continue to forward IP packets using normal Layer 3 IP forwarding. L3 forwarding will be needed for a variety of reasons, including:

 - For scaling; to forward on a finer granularity than the labels can provide
 - For security; to allow packet filtering at firewalls
 - For forwarding at the initial router (when hosts don't do MPLS)

Consider a campus network which is serving a small company. Suppose that this company makes use of the Internet, for example as a method of communicating with customers. A customer on the other side of the world has an IP packet to be forwarded to a particular system within the company. It is not reasonable to expect that the customer will have a label to use to forward the packet to that specific system. Rather, the label used for the "first hop" forwarding might be sufficient to get the packet considerably closer to the destination. However, the granularity of the labels cannot extend to every host worldwide. Similarly, routing used within one routing domain cannot know about every host worldwide. This implies that in many cases the labels assigned to a particular packet will be sufficient to get the packet close to the destination, but that at some points along the path of the packet the IP header will need to be examined to determine a finer granularity for forwarding that packet. This is particularly likely to occur at domain boundaries.

A similar point occurs at the last router prior to the destination host. In general, the number of hosts attached to a network is likely to be great enough that it is not feasible to assign a separate label to every host. Rather, at least for routing within the destination routing domain (or the destination area if there is a hierarchical routing protocol in use), a label may be assigned which is sufficient to get the packet to the last hop router. However, the last hop router will need to examine the IP header (and particularly the destination IP address) in order to forward the packet to the correct destination host.

Packet filtering at firewalls is an important part of the operation of the Internet. While the current state of Internet security may be considerably less advanced than may be desired, nonetheless some security (as is provided by firewalls) is much better than no security. We expect that packet filtering will continue to be important for the foreseeable future. Packet filtering requires examination of the contents of the packet, including the IP header. This implies that at firewalls the packet cannot be forwarded simply by considering the label associated with the packet. Note that this is also likely to occur at domain boundaries.

Finally, it is very likely that many hosts will not implement MPLS. Rather, a host will simply forward an IP packet to its first hop router. This first hop router will need to examine the IP header prior to forwarding the packet (with or without a label).

3.2 Scaling Issues

MPLS scalability is provided by two of the principles of routing. The first is that forwarding follows an inverted tree rooted at a destination. The second is that the number of destinations is reduced by routing aggregation.

The very nature of IP forwarding is a merged multipoint-to-point tree.
Thus, since MPLS mirrors the IP network layer, an MPLS node that is capable of merging can create O(n) switched paths which provide network reachability to all "n" destinations. The meaning of "n" depends on the granularity of the switched paths. One obvious choice of "n" is the number of CIDR prefixes existing in the forwarding table (this scales the same as today's routing). However, the value of "n" may be reduced considerably by choosing switched paths of further aggregation. For example, by creating switched paths to each possible egress node, "n" may represent the number of egress nodes in a network. This choice creates "n" switched paths, such that each path is shared by all CIDR prefixes that are routed through the same egress node. This selection greatly improves scalability, since it minimizes "n", while maintaining the same switching performance as CIDR aggregation. (See section 2.2.2 for a description of all of the levels of granularity provided by MPLS.)

The MPLS technology must scale at least as well as existing technology. For example, if the MPLS technology were to support ONLY host-to-host switched path connectivity, then the number of switched paths would be much higher than the number of routing table entries.

There are several ways in which merging can be done in order to allow O(n) switched paths to connect n nodes. The merging approach used has an impact on the amount of state information, buffering, delay characteristics, and the means of control required to coordinate the trees. These issues are discussed in more detail in section 4.2.

There are some cases in which O(n-squared) switched paths may be used (for example, by setting up a full mesh of point-to-point streams). As label space and the amount of state information that can be supported may be limited, it will not be possible to support O(n-squared) switched paths in very large networks. However, in some cases the use of n-squared paths may even be an advantage (for example, to allow load-splitting of individual streams).

MPLS must be designed to scale O(n). O(n) scaling allows MPLS domains to grow very large. In addition, if best effort service can be supported with O(n) scaling, this conserves resources (such as label space and state information) which can be used for supporting advanced services such as QoS. However, since some switches may not support merging, and some small networks may not require the scaling benefits of O(n), provisions must also be made for a non-merging, O(n-squared) solution.
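
As a rough illustration of the difference between the two scaling behaviors (the values of n below are arbitrary):

   # Rough illustration of the scaling argument above. "n" is the
   # number of edge (ingress/egress) nodes; the values are arbitrary.
   for n in (10, 100, 1000):
       full_mesh = n * (n - 1)   # one point-to-point LSP per ordered pair
       merged = n                # one multipoint-to-point tree per egress
       print(f"n={n:5d}  full mesh: {full_mesh:8d} LSPs  merged: {merged:5d} LSPs")
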
Note: A precise and complete description of scaling would consider that there are multiple dimensions of scaling, and multiple resources whose usage may be considered. Possible dimensions of scaling include: (i) the total number of streams which exist in an MPLS domain (with associated labels assigned to them); (ii) the total number of "label swapping pairs" which may be stored in the nodes of the network (i.e., entries of the form "for incoming label 'x', use outgoing label 'y'"); (iii) the number of labels which need to be assigned for use over a particular link; (iv) the amount of state information which needs to be maintained by any one node. We do not intend to perform a complete analysis of all possible scaling issues, and understand that our use of the terms "O(n)" and "O(n-squared)" is approximate only.

3.3 Types of Streams

Switched paths in the MPLS network can be of different types:

 - point-to-point
 - multipoint-to-point
 - point-to-multipoint
 - multipoint-to-multipoint

Two of the factors that determine which type of switched path is used are (i) the capability of the switches employed in a network and (ii) the purpose of the creation of a switched path, that is, the types of flows to be carried in the switched path. These two factors also determine the scalability of a network in terms of the number of switched paths in use for transporting data through the network.

Point-to-point switched paths can be used to connect all ingress nodes to all egress nodes to carry unicast traffic. In this case, since each ingress node has point-to-point connections to all the egress nodes, the number of connections in use for transporting traffic is O(n-squared), where n is the number of edge MPLS devices. For small networks the full mesh connection approach may suffice and not pose any scalability problems. However, in large enterprise backbone or ISP networks, this will not scale well.

Point-to-point switched paths may also be used on a host-to-host or application-to-application basis (e.g., a switched path per RSVP flow). The dedicated point-to-point switched path transports the unicast data from the ingress to the egress node of the MPLS network. This approach may be used for providing QoS services or for best-effort traffic.

A multipoint-to-point switched path connects all ingress nodes to a single egress node. At a given intermediate node in the multipoint-to-point switched path, L2 data units from several upstream links are "merged" into a single label on a downstream link. Since each egress node is reachable via a single multipoint-to-point switched path, the number of switched paths required to transport best-effort traffic through an MPLS network is O(n), where n is the number of egress nodes.

The point-to-multipoint switched path is used for distributing multicast traffic. This switched path tree mirrors the multicast distribution tree as determined by the multicast routing protocols. Typically a switch capable of point-to-multipoint connection replicates an L2 data unit from the incoming (parent) interface to all the outgoing (child) interfaces. Standard ATM switches support such functionality in the form of point-to-multipoint VCs or VPs.

A multipoint-to-multipoint switched path may be used to combine multicast traffic from multiple sources into a single multicast distribution tree. The advantage of this is that the multipoint-to-multipoint switched path is shared by multiple sources. Conceptually, a form of multipoint-to-multipoint can be thought of as follows: suppose that you have a point-to-multipoint VC from each node to all other nodes, and that at any point where two or more VCs happen to merge, you merge them into a single VC or VP. This would require either coordination of VCI spaces (so that each source has a unique VCI within a VP) or VC merge capabilities. The applicability of similar concepts to MPLS is FFS.

3.4 Data Driven versus Control Traffic Driven Label Assignment

A fundamental concept in MPLS is the association of labels with network layer routing. Each LSR must assign labels, and distribute them to its forwarding peers, for traffic which it intends to forward by label swapping. In the various contributions that have been made so far to the MPLS WG we identify three broad strategies for label assignment: (i) those driven by topology based control traffic [TAG][ARIS][IP navigator]; (ii) those driven by request based control traffic [RSVP]; and (iii) those driven by data traffic [CSR][Ipsilon].

We also note that in actual practice combinations of these methods may be employed, for example topology based methods for best effort traffic plus request based methods for support of RSVP.

3.4.1 Topology Driven Label Assignment

In this scheme labels are assigned in response to normal processing of routing protocol control traffic. Examples of such control protocols are OSPF and BGP. As an LSR processes OSPF or BGP updates it can, as it makes or changes entries in its forwarding tables, assign labels to those entries (a sketch of this behavior appears after the list below). Among the properties of this scheme are:

 - The computational load of assignment and distribution, and the bandwidth consumed by label distribution, are bounded by the size of the network.

 - Labels are in the general case preassigned. If a route exists then a label has been assigned to it (and distributed). Traffic may be label swapped as soon as it arrives; there is no label setup latency at forwarding time.

 - LSRs need only be able to process the control traffic load.

 - Labels assigned in response to the operation of routing protocols can have a granularity equivalent to that of the routes advertised by the protocol. Labels can, by this means, cover (highly) aggregated routes.
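
A minimal sketch of topology driven assignment follows. The routing-update hook, the reserved label range, and all names are invented for this example and do not come from any actual protocol:

   # Sketch: assign a label whenever a routing update installs a new
   # forwarding entry. All names and values are illustrative.

   next_free_label = 16          # assume a small reserved label range
   label_bindings = {}           # prefix -> local label

   def on_route_update(prefix, next_hop):
       """Called as the routing protocol (e.g., OSPF/BGP) changes a route."""
       global next_free_label
       if prefix not in label_bindings:
           label_bindings[prefix] = next_free_label
           next_free_label += 1
           # Distribute the binding to peers in response to control
           # traffic only, independent of data arrival; hence there is
           # no setup latency when data packets arrive.
           advertise_binding(prefix, label_bindings[prefix])

   def advertise_binding(prefix, label):
       print(f"binding: use label {label} for {prefix}")

   on_route_update("192.0.2.0/24", next_hop="lsr-b")
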
3.4.2 Request Driven Label Assignment

In this scheme labels are assigned in response to normal processing of request based control traffic, for example RSVP. As an LSR processes RSVP messages it can, as it makes or changes entries in its forwarding tables, assign labels to those entries.

Among the properties of this scheme are:

 - The computational load of assignment and distribution, and the bandwidth consumed by label distribution, are bounded by the amount of control traffic in the system.

 - Labels are in the general case preassigned. If a route exists then a label has been assigned to it (and distributed). Traffic may be label swapped as soon as it arrives; there is no label setup latency at forwarding time.

 - LSRs need only be able to process the control traffic load.

 - Depending upon the number of flows supported, this approach may require a larger number of labels to be assigned compared with topology driven assignment.

 - This approach requires applications to make use of the request paradigm in order to get a label assigned to their flow.

3.4.3 Traffic Driven Label Assignment

In this scheme the arrival of data at an LSR "triggers" label assignment and distribution. The traffic driven approach has the following characteristics:

 - Label assignment and distribution costs are a function of traffic patterns. In an LSR with limited label space that uses a traffic driven approach to amortize its labels over a larger number of flows, the overhead due to label assignment and distribution grows as a function of the number of flows and of their "persistence". Short lived but recurring flows may impose a heavy control burden.

 - There is a latency associated with the appearance of a "flow" and the assignment of a label to it. The documented approaches to this problem suggest L3 forwarding during this setup phase; this has the potential for packet reordering (note that packet reordering may occur with any scheme when the network topology changes, but traffic driven label assignment introduces another cause of reordering).

 - Flow driven label assignment requires high performance packet classification capabilities.

 - Traffic driven label assignment may be useful to reduce label consumption (assuming that flows are not close to a full mesh).

 - If you want flows to hosts then, due to limits on label space and the large number of hosts which may occur in a network, traffic driven label assignment is probably necessary.

 - If you want to assign specific network resources to specific labels, to be used for support of application flows, then again the fine granularity associated with such labels may require traffic driven label assignment.

3.5 The Need for Dealing with Looping

Routing protocols which are used in conjunction with MPLS will in many cases be based on distributed computation. As such, during routing transients, these protocols may compute forwarding paths which contain loops. For this reason MPLS will be designed with mechanisms to prevent the formation of loops and/or contain the amount of resources that can be consumed due to the presence of loops.

Note that a number of different alternative mechanisms have been proposed (see section 4.3). Some of these prevent the formation of layer 2 forwarding loops; others allow loops to form but minimize their impact in one way or another (e.g., by discarding packets which loop, or by detecting and closing the loop after a period of time). Generally speaking, there are tradeoffs to be made between the amount of looping which might occur and other considerations, such as the time to convergence after a change in the paths computed by the routing algorithm.

We are not proposing any changes to normal layer 3 operation, and specifically are not trying to eliminate the possibility of looping at layer 3. Transient loops will continue to be possible in IP networks. Note that IP has a means to limit the damage done by looping packets, based on decrementing the IP TTL field as the packet is forwarded and discarding packets whose TTL has expired. Dynamic routing protocols used with IP are also designed to minimize the amount of time during which loops exist.

The question that MPLS has to deal with is what to do at L2. In some cases L2 may make use of the same method that is used at L3. However, other options are available at L2, and in some cases (specifically when operating over ATM or Frame Relay hardware) the method of decrementing a TTL field (or any similar field) is not available.
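
Where a TTL field is available in the encapsulation (see section 2.3), the L2 mechanism can mirror IP exactly. A minimal sketch, with illustrative names and the packet modeled as a simple dictionary:

   # Sketch: TTL handling at an L2 label swap, mirroring IP behavior.
   # Over ATM or Frame Relay there is no such field, so this option is
   # unavailable and the other methods of section 4.3 are needed.

   def forward_labeled_packet(packet, swap_table):
       """packet has 'label' and 'ttl' keys; returns a port or None."""
       if packet["ttl"] <= 1:
           return None               # discard: looping or expired packet
       packet["ttl"] -= 1
       out_label, port = swap_table[packet["label"]]
       packet["label"] = out_label
       return port

   assert forward_labeled_packet({"label": 17, "ttl": 1},
                                 {17: (42, "p1")}) is None
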
There are basically two problems caused by packet looping. The most obvious is that packets are not delivered to the correct destination. The other is congestion: even with TTL decrementing and packet discard, there may still be a significant amount of time during which packets travel around a loop. This can adversely affect other packets which are not looping, since congestion due to the looping packets can cause non-looping packets to be delayed and/or discarded.

Looping is particularly serious in (at least) three cases. One is when forwarding over ATM. Since ATM does not have a TTL field to decrement, there is no way to discard ATM cells which are looping over ATM subnetworks. Standard ATM PNNI routing and signaling solve this problem by making use of call setup procedures which ensure that ATM VCs will never be set up in a loop [PNNI]. However, when MPLS is used over ATM subnets, the native ATM routing and signaling procedures may not be used for the full L2 path. This leads to the possibility that MPLS over ATM might in principle allow packets to loop indefinitely, or until L3 routing stabilizes. Methods are needed to prevent this problem.

Another case in which looping can be particularly unpleasant is multicast traffic. With multicast, it is possible that a packet may be delivered successfully to some destinations even though copies intended for other destinations are looping. This leads to the possibility that huge numbers of identical packets could be delivered to some destinations. Also, since multicast implies that packets are duplicated at some points in their path, the congestion resulting from looping packets may be particularly severe.

Another unpleasant complication of looping occurs if the congestion caused by the loop interferes with the routing protocol. It is possible for the congestion caused by looping to cause routing protocol control packets to be discarded, with the result that the routing protocol becomes unstable; for example, this could lengthen the duration of the loop.

In normal operation of IP networks the impact of congestion is limited by the fact that TCP backs off (i.e., transmits substantially less traffic) in response to lost packets. Where the congestion is caused by looping, the combination of TTL and the resulting discard of looping packets, plus the reduction in offered traffic, can limit the resulting impact on the network. TCP backoff, however, does not solve the problem if the looping packets are not discarded (for example, if the loop is over an ATM subnetwork where TTL is not used).

The severity of the problem caused by looping may depend upon implementation details. Suppose, for instance, that ATM switching hardware is being used to provide MPLS switching functions. If the ATM hardware has per-VC queuing, and if it is capable of providing fair access to the buffer pool for incoming cells based on the incoming VC (so that no one incoming VC is allowed to grab a disproportionate number of buffers), looping might not have a significant effect on other traffic. If the ATM hardware cannot provide fair buffer access of this sort, however, then even transient loops may cause severe degradation of the node's total performance.

Given that MPLS is a relatively new approach, it is possible that looping may have consequences which are not fully understood (such as looping of LDP control information in cases where stream merge is not used).

Even if fair buffer access can be provided, it is still worthwhile to have some means of detecting loops that last "longer than possible". In addition, even where TTL and/or per-VC fair queuing provides a means for surviving loops, it still may be desirable where practical to avoid setting up LSPs which loop.

Methods for dealing with loops are discussed in section 4.3.

3.6 Operations and Management

Operations and management of networks is critically important. This implies that MPLS must support operations, administration, and maintenance facilities at least as extensive as those supported in current IP networks.

In most ways this is a relatively simple requirement to meet. Given that all MPLS nodes run normal IP routing protocols, it is straightforward to expect them to participate in normal IP network management protocols.

One issue has been identified which needs to be addressed by the MPLS effort: the operation of traceroute over MPLS networks. Note that other O&M issues may be identified in the future.

Traceroute is a very commonly used network management tool. Traceroute is based on use of the TTL field: a station trying to determine the route from itself to a specified address transmits multiple IP packets, with the TTL field set to 1 in the first packet, 2 in the second packet, and so on. This causes each router along the path to send back an ICMP error report for TTL exceeded, which in turn allows the station to determine the set of routers along the route. For example, this can be used to determine where a problem exists (if no router responds past some point, the last router which responds can become the starting point for a search to determine the cause of the problem).

When MPLS is operating over ATM or Frame Relay networks there is no TTL field to decrement (and ATM and Frame Relay forwarding hardware does not decrement TTL). This implies that it is not straightforward to have traceroute operate in this environment.

There is the question of whether we *want* all routers along a path to be visible via traceroute. For example, an ISP probably doesn't want to expose the interior of its network to a customer. However, the issue of whether a network's policy allows the interior of the network to be visible should be independent of whether it is possible for some users to see the interior of the network. Thus, while there clearly should be the possibility of using policy mechanisms to block traceroute from being used to see the interior of the network, this does not imply that it is acceptable to develop protocol mechanisms which prevent traceroute from working.

There is also the question of whether the interior of an MPLS network is analogous to a normal IP network, or whether it is closer to the interior of a layer 2 network (for example, an ATM subnet). Clearly IP traceroute cannot be used to expose the interior of an ATM subnet.
When a packet is crossing an ATM subnetwork (for example, between an ingress and an egress router which are attached to the ATM subnet), traceroute can be used to determine the router-to-router path, but not the path through the ATM switches which comprise the ATM subnet. Note that MPLS forms a sort of "in between" special case: routing is based on normal IP routing protocols, the equivalent of call setup (label binding/exchange) is based on MPLS-specific protocols, but forwarding is based on normal L2 ATM forwarding. MPLS therefore supersedes the normal ATM-based methods that would be used to eliminate loops and/or trace paths through the ATM subnet.

It is generally agreed that traceroute is a relatively "ugly" tool, and that a better tool for tracing the route of a packet would be preferable. However, no better tool has yet been designed or even proposed. Also, however ugly traceroute may be, it is nonetheless very useful, widely deployed, and widely used. In general, it is highly preferable to define, implement, and deploy a new tool, and to determine through experience that the new tool is sufficient, before breaking a tool which is as widely used as traceroute.

Methods that may be used to either allow traceroute to be used in an MPLS network, or to replace traceroute, are discussed in section 4.14.
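
For reference, the TTL-based probing which must either be preserved or replaced can be modeled as follows. This is a toy model only (no real sockets or ICMP); the path contents are invented:

   # Toy model of traceroute's use of TTL. A probe with TTL t elicits
   # a "time exceeded" report from the t-th router, unless it reaches
   # the destination first. A subnet that does not decrement TTL
   # (e.g., ATM) would simply be invisible to this procedure.

   path = ["r1", "r2", "r3", "dest"]       # illustrative route

   def probe(ttl):
       """Node that answers a probe sent with the given TTL."""
       if ttl < len(path):
           return path[ttl - 1]    # router at that hop reports back
       return path[-1]             # probe reached the destination

   ttl = 1
   while True:
       responder = probe(ttl)
       print(f"hop {ttl}: {responder}")
       if responder == path[-1]:
           break
       ttl += 1
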
4. Technical Approaches

4.1 Label Distribution

A fundamental requirement in MPLS is that an LSR forwarding label switched traffic to another LSR apply a label to that traffic which is meaningful to the other (receiving) LSR. LSRs could learn about each other's labels in a variety of ways. We call the general topic "label distribution".

4.1.1 Explicit Label Distribution

Explicit label distribution anticipates the specification by MPLS of a standard protocol for label distribution. Two of the possible approaches [TDP] [ARIS] are oriented toward topology driven label distribution. One other approach [FANP], in contrast, makes use of traffic driven label distribution.

We expect that the label distribution protocol (LDP) which emerges from the MPLS WG is likely to inherit elements from one or more of the possible approaches.

Consider LSR A forwarding traffic to LSR B. We call A the upstream (with respect to dataflow) LSR and B the downstream LSR. A must apply a label to the traffic that B "understands". Label distribution must ensure that the "meaning" of the label will be communicated between A and B. An important question is whether A or B (or some other entity) allocates the label.

In this discussion we are talking about the allocation and distribution of labels between two peer LSRs that are on a single segment of what may be a longer path. A related, but in fact entirely separate, issue is the question of where control of the whole path resides. In essence there are two models; by analogy to upstream and downstream for a single segment, we can talk about ingress and egress for an LSP (or to and from a label swapping "domain"). In one model a path is set up from ingress to egress; in the other, from egress to ingress.

4.1.1.1 Downstream Label Allocation

"Downstream Label Allocation" refers to a method where the label allocation is done by the downstream LSR, i.e. the LSR that uses the label as an index into its switching tables.

This is, arguably, the most natural label allocation/distribution mode for unicast traffic. As an LSR builds its routing tables (we consider here control driven allocation of labels) it is free, within some limits we will discuss, to allocate labels in any manner that may be convenient to the particular implementation. Since the labels that it allocates will be those upon which it subsequently makes forwarding decisions, we assume implementations will perform the allocation in an optimal manner. Having allocated labels, the default behavior is to distribute the labels (and bindings) to all peers.

In some cases (particularly with ATM) there may be a limited number of labels which may be used across an interface, and/or a limited number of label assignments which may be supported by a single device. Operation in this case may make use of "on demand" label assignment. With this approach, an LSR may for example request a label for a route from a particular peer only when its routing calculations indicate that peer to be the new next hop for the route.
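
A minimal sketch of downstream allocation, including the "on demand" variant, follows. The class layout, label range, and all names are invented for this example:

   # Sketch: downstream label allocation. The downstream LSR picks a
   # label from its own free pool (it will later index its switching
   # table with that label) and advertises the binding upstream.

   class DownstreamLSR:
       def __init__(self):
           self.free_labels = iter(range(16, 1 << 20))   # assumed range
           self.switching_table = {}   # incoming label -> route

       def bind(self, route):
           """Allocate a local label for a route; return the binding."""
           label = next(self.free_labels)
           self.switching_table[label] = route
           return route, label

       def on_label_request(self, route):
           """'On demand' mode: upstream asks only when its routing
           calculation makes this LSR the next hop for the route."""
           return self.bind(route)

   lsr_b = DownstreamLSR()
   print(lsr_b.on_label_request("192.0.2.0/24"))

In the default (unsolicited) mode the same bind() result would simply be distributed to all peers as the routing table is built, rather than waiting for a request.
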
4.1.1.2 Upstream Label Allocation

"Upstream Label Allocation" refers to a method where the label allocation is done by the upstream LSR. In this case the LSR choosing the label (the upstream LSR) and the LSR which needs to interpret packets using the label (the downstream LSR) are not the same node. We note here that in the upstream LSR the label at issue is not used as an index into the switching tables, but rather is found as the result of a lookup on those tables.

The motivation for upstream label allocation comes from the recognition that it might be possible to optimize multicast machinery in an LSR if it were possible to use the same label on all output ports for which a particular multicast packet/cell were destined. Upstream assignment makes this possible.

4.1.1.3 Other Label Allocation Methods

Another option would be to make use of label values which are unique within the MPLS domain (implying that a domain-wide allocation would be needed). In this case, any stream to a particular MPLS egress node could make use of the label of that node (implying that label values do not need to be swapped at intermediate nodes).

With this method of label allocation, there is a choice to be made regarding the scope over which a label is unique. One approach is to configure each node in an MPLS domain with a label which is unique in that domain. Another approach is to use a truly global identifier (for example the IEEE 48 bit identifier), where each MPLS-capable node would be stamped at birth with a globally unique identifier. The point of this global approach is to simplify configuration in each MPLS domain by eliminating the need to configure label IDs.

4.1.2 Piggybacking on Other Control Messages

While we have discussed use of an explicit MPLS LDP, we note that there are several existing protocols that can be easily modified to distribute both routing/control and label information. This could be done with any of OSPF, BGP, RSVP and/or PIM. A particular architectural elegance of these schemes is that label distribution uses the same mechanisms as are used in distribution of the underlying routing or control information.

When explicit label distribution is used, the routing computation and label distribution are decoupled. This implies the possibility that at some point you may have a route to a specific destination without an associated label, and/or a label for a specific destination which makes use of a path which you are no longer using. Piggybacking label distribution on the operation of the routing protocol is one way to eliminate this decoupling.

Piggybacking label distribution on the routing protocol introduces an issue regarding how to negotiate acceptable label values, and what to do if an invalid label is received. This is discussed in section 4.1.3.

4.1.3 Acceptable Label Values

There are some constraints on which label values may be used in either allocation mode. Clearly the label values must lie within the allowable range described in the encapsulation standards that the MPLS WG will produce. The label value used must also, however, lie within a range that the peer LSR is capable of supporting. We imagine that certain machines, for example ATM switches operating as LSRs, may, due to operational or implementation restrictions, support a label space more limited than that bounded by the valid range found in the encapsulation standard. This implies that an advertisement or negotiation mechanism for usable label range may be a part of the MPLS LDP. When operating over ATM using ATM forwarding hardware, due to the need for compatibility with the existing use of the ATM VPI/VCI space, it is quite likely that an explicit mechanism will be needed for label range negotiation.

In addition we note that LDP may be one of a number of mechanisms used to distribute labels between any given pair of LSRs. Clearly where such multiple mechanisms exist, care must be taken to coordinate the allocation of label values: a single label value must have a unique meaning to the LSR that distributes it.

There is an issue regarding how to allow negotiation of acceptable label values if label distribution is piggybacked on the routing protocol. In this case it may be necessary either to require equipment to accept any possible label value, or to configure devices to know which range of label values may be selected. It is not clear in this case what to do if an invalid label value is received, as there may be no means of sending a NAK.

A similar issue occurs with multicast traffic over broadcast media, where there may be multiple nodes which receive the same transmission (using a single label value). Here again it may be "non-trivial" to allow n-party negotiation of acceptable label values.

4.1.4 LDP Reliability

The need for reliable label distribution depends upon the relative performance of L2 and L3 forwarding, as well as the relationship between label distribution and the routing protocol operation.

If label distribution is tied to the operation of the routing protocol, then a reasonable protocol design would ensure that labels are distributed successfully as long as the associated route and/or reachability advertisement is distributed successfully. This implies that the reliability of label distribution will be the same as the reliability of route distribution.

If there is a very large difference between L2 and L3 forwarding performance, then the cost of failing to deliver a label is significant. In this case it is important to ensure that labels are distributed reliably. Given that LDP needs to operate in a wide variety of environments with a wide variety of equipment, this implies that it is important for any LDP developed by the MPLS WG to ensure reliable delivery of label information.

Reliable delivery of LDP packets may potentially be accomplished either by using an existing reliable transport protocol such as TCP, or by specifying reliability mechanisms as part of LDP (for example, the reliability mechanisms which are defined in IDRP could potentially be "borrowed" for use with LDP).

TCP supports flow control (in addition to supporting reliable delivery of data). Flow control is a desirable feature which will be useful for MPLS (as well as other applications making use of a reliable transport) and therefore needs to be built into whatever reliability mechanism is used for MPLS.

4.1.5 Label Purge Mechanisms

Another issue to be considered is the "lifetime" of label data once it arrives at an LSR, and the method of purging label data. There are several methods that could be used either separately or (more likely) in combination.

One approach is for label information to be timed out. With this approach a lifetime is distributed along with the label value. The label value may be refreshed prior to timing out; if it is not refreshed, it is discarded. Each lifetime and timer may apply to a single label, or to a group of labels (e.g., all labels selected by the same node).

Similarly, two peer nodes may make use of an MPLS peer keep-alive mechanism. This implies exchange of MPLS control packets between neighbors on a periodic basis, and in general is likely to use a smaller timeout value than the label value timers (analogous to the fact that the OSPF HELLO interval is much shorter than the OSPF LSA lifetime). If the peer session between two MPLS nodes fails (due to expiration of the associated timer prior to reception of the refresh), then the associated label information is discarded.

If label information is piggybacked on the routing protocol, then the timeout mechanisms would also be taken from the associated routing protocol (note that routing protocols in general have mechanisms to invalidate stale routing information).

An alternative method for invalidating labels is to make use of an explicit label removal message.
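
The timeout approach can be sketched as follows, assuming a per-binding lifetime refreshed by periodic advertisements (all names are illustrative):

   import time

   # Sketch: label lifetimes. A binding is discarded if not refreshed
   # within its advertised lifetime (compare OSPF's LSA aging).

   bindings = {}    # label -> expiry time (seconds since the epoch)

   def install(label, lifetime):
       bindings[label] = time.time() + lifetime

   def refresh(label, lifetime):
       if label in bindings:
           install(label, lifetime)

   def purge_expired():
       now = time.time()
       for label in [l for l, expiry in bindings.items() if expiry <= now]:
           del bindings[label]        # garbage-collect the stale binding

   install(42, lifetime=60)
   refresh(42, lifetime=60)
   purge_expired()

A keep-alive mechanism would be the same pattern applied to the peer session as a whole, with all labels learned from the peer discarded when the session timer expires.
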
4.2 Stream Merging

In order to scale as O(n) (rather than O(n-squared)), MPLS makes use of the concept of stream merge. This makes use of multipoint-to-point streams in order to allow multiple streams to be merged into one stream.

4.2.1 Types of Stream Merge:

There are several types of stream merge that can be used, depending upon the underlying media.

When MPLS is used over frame based media, merging is straightforward. All that is required for stream merge to take place is for a node to allow multiple upstream labels to be forwarded the same way and mapped into a single downstream label. This is referred to as frame merge.

Operation over ATM media is less straightforward. In ATM, data packets are encapsulated into an ATM Adaptation Layer, say AAL5; the AAL5 PDU is segmented into ATM cells with a VPI/VCI value, and the cells are transmitted in sequence. It is incumbent on ATM switches to keep the cells of a PDU (i.e., cells with the same VPI/VCI value) contiguous and in sequence. This is because the device that reassembles the cells to re-form the transmitted PDU expects the cells to be contiguous and in sequence; there isn't sufficient information in the ATM cell header (unlike IP fragmentation) to reassemble the PDU from an arbitrary cell order. Hence, if cells from several upstream links are transmitted onto the same downstream VPI/VCI, then cells from one PDU can get interleaved with cells from another PDU on the outgoing VPI/VCI, corrupting the original PDUs by mis-sequencing the cells of each PDU.

The most straightforward (but erroneous) method of merging in an ATM environment would be to take the cells from two incoming VCs and merge them into a single outgoing VC. If this were done without any buffering of cells, then cells from two or more packets could end up being interleaved into a single AAL5 frame. The problem when operating over ATM is therefore how to avoid interleaving of cells from multiple sources.

There are two ways to solve this interleaving problem, which are referred to as VC merge and VP merge.

VC merge allows multiple VCs to be merged into a single outgoing VC. In order for this to work, the node performing the merge needs to keep the cells from one AAL5 frame (e.g., corresponding to an IP packet) separate from the cells of other AAL5 frames. This may be done by performing the SAR function in order to reassemble each IP packet before forwarding that packet; in this case VC merge is essentially equivalent to frame merge. An alternative is to buffer the cells of one AAL5 frame together, without actually reassembling them, and to forward the frame when the end of frame indicator is reached. Note however that both forms of VC merge require that the entire AAL5 frame be received before any cells corresponding to that frame are forwarded. VC merge therefore requires capabilities which are generally not available in most existing ATM forwarding hardware.

The alternative for use over ATM media is VP merge. Here multiple VPs can be merged into a single VP. Separate VCIs within the merged VP are used to distinguish frames (e.g., IP packets) from different sources. In some cases, one VP may be used for the tree from each ingress node to a single egress node.
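
The store-and-forward behavior which VC merge requires (buffering without full reassembly) can be sketched as follows. Cells are modeled as (payload, end-of-frame) arrivals per incoming VC, which is a simplification of real AAL5:

   # Sketch: VC merge without full reassembly. Cells are buffered per
   # incoming VC and released onto the merged outgoing VC only when
   # the end-of-frame cell arrives, so frames never interleave.

   from collections import defaultdict

   buffers = defaultdict(list)   # incoming VC -> cells of current frame

   def on_cell(in_vc, payload, end_of_frame, out_vc, transmit):
       buffers[in_vc].append(payload)
       if end_of_frame:
           for cell in buffers.pop(in_vc):
               transmit(out_vc, cell)     # whole frame, contiguous

   sent = []
   tx = lambda vc, cell: sent.append((vc, cell))
   on_cell("vc-A", "a1", False, "vc-out", tx)
   on_cell("vc-B", "b1", False, "vc-out", tx)   # interleaved arrivals...
   on_cell("vc-A", "a2", True,  "vc-out", tx)   # ...but A's frame sent whole
   assert sent == [("vc-out", "a1"), ("vc-out", "a2")]

Under VP merge no such buffering is needed, since the distinct VCIs within the VP keep the frames distinguishable at the egress.
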
4.2.2 Interoperation of Merge Options:

If some nodes support stream merge and some nodes do not, then it is necessary to ensure that the two types of nodes can interoperate within a single network. This affects the number of labels that a node needs to send to a neighbor. An upstream LSR which supports stream merge needs to be sent only one label per forwarding equivalence class (FEC). An upstream neighbor which does not support stream merge needs to be sent multiple labels per FEC. However, there is no way of knowing a priori how many labels it needs; this will depend on how many LSRs are upstream of it with respect to the FEC in question.

Since it is not known a priori how many labels a non-merging upstream neighbor will need, that neighbor may need to explicitly ask for labels for each FEC, and may make multiple such requests (for one or more labels per request). When a downstream neighbor receives such a request from upstream, and the downstream neighbor does not itself support stream merge, then it must in turn ask its own downstream neighbor for more labels for the FEC in question.

It is possible that there may be some nodes which support merge but have a limited number of upstream streams which may be merged into a single downstream stream. Suppose for example that, due to some hardware limitation, a node is capable of merging four upstream LSPs into a single downstream LSP, but has six upstream LSPs arriving at it for a particular stream. In this case the node may merge these into two downstream LSPs, and will therefore need to obtain two labels from the downstream neighbor.

The interoperation of the various forms of merging over ATM is most easily described by first describing the interoperation of VC merge with non-merge.

In the case where VC merge and non-merge nodes are interconnected, the forwarding of cells is based in all cases on a VC (i.e., the concatenation of the VPI and VCI). For each node, if an upstream neighbor is doing VC merge then that upstream neighbor requires only a single outgoing VPI/VCI for a particular FEC (this is analogous to the requirement for a single label in the case of operation over frame media). If the upstream neighbor is not doing merge, then it will require a single outgoing VPI/VCI per FEC for itself (assuming that it can be an ingress node), plus enough outgoing VPI/VCIs to map to incoming VPI/VCIs to pass to its upstream neighbors. The number required is determined by allowing the upstream nodes to request additional VPI/VCIs from their downstream neighbors.

A similar method is possible to support nodes which perform VP merge. In this case the VP merge node, rather than requesting a single VPI/VCI or a number of VPI/VCIs from its downstream neighbor, instead may request a single VP (identified by a VPI). Furthermore, suppose that a non-merge node is downstream from two different VP merge nodes. This node may need to request one VPI/VCI (for traffic originating from itself) plus two VPs (one for each upstream node).

Note that there are multiple options for coordinating VCIs within a VP. Description of the range of options is FFS.

In order to support all of VP merge, VC merge, and non-merge, it is therefore necessary to allow upstream nodes to request a combination of zero or more VC identifiers (consisting of a VPI/VCI), plus zero or more VPs (identified by VPIs). VP merge nodes would request one VP. VC merge nodes would request only a single VPI/VCI (since they can merge all upstream traffic into a single VC). Non-merge nodes would pass on any requests that they get from above, plus request a VPI/VCI for traffic that they originate (if they can be ingress nodes).
However, non-merge nodes which can only do VC forwarding (and not VP forwarding) will need to know which VCIs are used within each VP in order to install the correct VCs in their forwarding tables. A detailed description of how this could work is FFS.

4.2.3 Coordination of the VCI Space with VP Merge:

VP merge requires that the VCIs be coordinated to ensure uniqueness. There are a number of ways in which this may be accomplished:

 1. Each node may be pre-configured with a unique VCI value (or values).

 2. Some one node (most likely the root of the multipoint-to-point tree) may coordinate the VCI values used within the VP. A protocol mechanism will be needed to allow this to occur. How hard this is to do depends somewhat upon whether the root is otherwise involved in coordinating the multipoint-to-point tree. For example, allowing one node (such as the root) to coordinate the tree may be useful for purposes of coordinating load sharing (see section 4.10). Thus whether the issue of coordinating the VCI space is significant or trivial may depend upon other design choices which at first glance may have appeared to be independent protocol design choices.

 3. Other unique information, such as portions of a class B or class C address, may be used to provide a unique VCI value.

 4. Another alternative is to implement a simple hardware extension in the ATM switches to keep the VCI values unique by dynamically altering them to avoid collision.

VP merge makes less efficient use of the VPI/VCI space (relative to VC merge). Also, when VP merge is used, the LSPs may not be able to transit public ATM networks that don't support SVPs.

4.2.4 Buffering Issues Related to Stream Merge:

There is an issue regarding the amount of buffering required for frame merge, VC merge, and VP merge. Frame merge and VC merge require that intermediate points buffer incoming packets until the entire packet arrives. This is essentially the same as is required in traditional IP routers.

VP merge allows cells to be transmitted by intermediate nodes as soon as they arrive, reducing the buffering and latency at intermediate nodes. However, the use of VP merge implies that cells from multiple packets will arrive at the egress node interleaved on separate VCIs. This in turn implies that the egress node may have somewhat increased buffering requirements. To a large extent, egress nodes for some destinations will be intermediate nodes for other destinations, implying that the increase in buffers required for one purpose (egress traffic) will be offset by a reduction in buffers required for other purposes (transit traffic). Also, routers today typically deal with high-fanout channelized interfaces and with multi-VC ATM interfaces, implying that buffering simultaneously arriving cells from multiple packets and sources is something that routers typically do today. This is not meant to imply that the required buffer size and performance is inexpensive, but rather to observe that this is a solvable issue.

ATM equipment provides traffic shaping, in which the ATM cells associated with any one particular VC are intentionally not transmitted back to back, but rather are spread out over time in order to place less short term buffering load on switches.
Since VC merge requires that all cells associated with a particular packet (or a particular AAL5 frame) be buffered before any cell from the packet can be transmitted, VC merge defeats much of the intent of traffic shaping. An advantage of VP merge is that it preserves traffic shaping through ATM switches acting as LSRs. While traffic shaping may generally be expected to reduce the buffering requirements in ATM switches (whether acting as MPLS switches or as native ATM switches), the precise effect of traffic shaping has not been studied in the context of MPLS.

4.3 Loop Handling

Generally, methods for dealing with loops can be split into three categories: Loop Survival makes use of methods which minimize the impact of loops, for example by limiting the amount of network resources which can be consumed by a loop; Loop Detection allows loops to be set up, but later detects these loops and eliminates them; Loop Prevention provides methods for avoiding setting up L2 forwarding in a way which results in an L2 loop.

Note that we are concerned here only with loops that occur in L2 forwarding. Transient loops at L3 will continue to be part of normal IP operation, and will be handled the way that IP has been handling loops for years (see section 3.5).

Loop Survival:

Loop Survival refers to methods that are used to allow the network to operate well even though short term transient loops may be formed by the routing protocol. The basic approach to loop survival is to limit the amount of network resources which are consumed by looping packets, and to minimize the effect on other (non-looping) traffic. Note that loop survival is the method used by conventional IP forwarding, and is therefore based on long and relatively successful experience in the Internet.

The most basic method for loop survival is the use of a TTL (Time To Live) field. The TTL field is decremented at each hop; if it reaches zero, the packet is discarded. This method works well over those media which have a TTL field. This explicitly includes L3 IP forwarding. Also, assuming that the core MPLS specifications will include the definition of a "shim" MPLS header for carrying labels over media which do not have their own, it is likely that the shim header will also include a TTL field.

However, there is considerable interest in using MPLS over L2 protocols which provide their own labels, with the L2 label used for MPLS forwarding. Specific L2 protocols which offer a label for this purpose include ATM and Frame Relay. However, neither ATM nor Frame Relay has a TTL field. This implies that this method cannot be used when basic ATM or Frame Relay forwarding is being used.

Another basic method for loop survival is the use of dynamic routing protocols which converge rapidly to non-looping paths. In some instances it is possible that congestion caused by looping data could affect the convergence of the routing protocol (see section 3.5). MPLS should be designed to prevent this problem from occurring. Given that MPLS uses the same routing protocols as are used for IP, this method does not need to be discussed further in this framework document.

Another possible tool for loop survival is the use of fair queuing. This allows unrelated flows of user data to be placed in different queues, which helps to ensure that a node overloaded with looping user data can nonetheless forward unrelated non-looping data, thereby minimizing the effect that looping data has on other data. We cannot assume that fair queuing will always be available. In practice, many fair queuing implementations merge multiple streams into one queue (implying that the number of queues used is less than the number of user data flows which are present in the network). This implies that any data which happens to be in the same queue with looping data may be adversely affected.

Loop Detection:

Loop Detection refers to methods whereby a loop may be set up at L2, but the loop is subsequently detected. When the loop is detected, it may be broken at L2 by dropping the label relationship, implying that packets for a set of destinations must be forwarded at L3.

A possible method for loop detection is based on transmitting a "loop detection" control packet (LDCP) along the path towards a specified destination whenever the route to that destination changes. This LDCP is forwarded in the direction that the label specifies, with the labels swapped to the correct next hop value. However, normal L2 forwarding cannot be used, because each hop needs to examine the packet to check for loops. The LDCP is forwarded towards the destination until one of the following happens: (i) the LDCP reaches the last MPLS node along the path (i.e., the next hop is either a router which is not participating in MPLS, or the final destination host); (ii) the TTL of the LDCP expires (assuming that the control packet uses a TTL, which is optional but not absolutely necessary); or (iii) the LDCP returns to the node which originally transmitted it. If the last of these occurs, then the packet has looped, and the node which originally transmitted the LDCP stops using the associated label and instead uses L3 forwarding for the associated destination addresses. One problem with this method is that once a loop is detected, it is not known when the loop clears. One option would be to set a timer, and to transmit a new LDCP when the timer expires.

An alternate method counts the hops to each egress node, based on the routes currently available. Each node advertises its distance (in hop count) to each destination. An egress node advertises the destinations that it can reach directly with an associated hop count of zero. For each destination, a node computes its hop count to that destination by adding one to the hop count advertised by its actual next hop for that destination. When the hop count for a particular destination changes, it needs to be readvertised.
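
The hop count scheme can be sketched as follows: each node recomputes its advertised distance from its next hop's advertisement, and an unbounded climb in the count reveals a loop. The bound and all names are assumptions for this example:

   # Sketch: hop-count based loop detection. Each node's advertised
   # distance to a destination is one more than its next hop's. In a
   # loop, re-advertisements drive the count upward without bound, so
   # exceeding an assumed maximum path length flags the loop.

   MAX_HOPS = 32   # assumed bound on any real path length

   def recompute(my_table, next_hop_advert, dest):
       """Update own hop count for dest from the next hop's advert."""
       new_count = next_hop_advert[dest] + 1
       if new_count > MAX_HOPS:
           return None          # counted past any plausible path: loop
       if my_table.get(dest) != new_count:
           my_table[dest] = new_count   # changed: must re-advertise
       return new_count

   table = {}
   assert recompute(table, {"egress-1": 2}, "egress-1") == 3
   assert recompute(table, {"egress-1": MAX_HOPS}, "egress-1") is None
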
Loop Prevention:

Loop prevention makes use of methods to ensure that loops are never set up at L2. This implies that labels are not used until some method has been used to ensure that following the label towards the destination, with the associated label swaps at each switch, will not result in a loop. Until the L2 path (making use of the assigned labels) is available, packets are forwarded at L3.

Loop prevention requires explicit signaling of some sort to be used when setting up an L2 stream.

One method of loop prevention requires that labels be propagated starting at the egress switch. The egress switch signals to neighboring switches the label to use for a particular destination; each such switch then signals an associated label to its neighbors, and so on. The control packets which propagate the labels also include the path to the egress (as a list of router IDs). Any looping control packet can therefore be detected, and the path not set up to or past the looping point.

Another option is to use explicit routing to set up label bindings from the egress switch to each ingress switch. This precludes the possibility of looping, since the entire path is computed by one node. This also allows non-looping paths to be set up provided that the egress switch has a view of the topology which is reasonably close to reality (if there are operational links which the egress switch doesn't know about, it will simply pick a path which doesn't use those links; if there are links which have failed but which the egress switch thinks are operational, then there is some chance that the setup attempt will fail, but in this case the attempt can be retried on a separate path). Note therefore that non-looping paths can be set up with this method in many cases where distributed routing plus hop by hop forwarding would not actually result in non-looping paths. This method is similar to the method used by standard ATM routing to ensure that SVCs are non-looping [PNNI].

Explicit routing is only applicable if the routing protocol gives the egress switch sufficient information to set up the explicit route, implying that the protocol must be either a link state protocol (such as OSPF) or a path vector protocol (such as BGP). This form of source routing is therefore not appropriate as a general approach for use in any network regardless of the routing protocol. This method also requires some overhead for the call setup before label-based forwarding can be used. If the network topology changes in a manner which breaks the existing path, then a new path will need to be explicitly routed from the egress switch. Due to this overhead, this method is probably only appropriate if other significant advantages are also going to be obtained from having a single node (the egress switch) coordinate the paths to be used. Examples of other reasons to have one node coordinate the paths to a single egress switch include: (i) coordinating the VCI space where VP merge is used (see section 4.2); and (ii) coordinating the routing of streams from multiple ingress switches to one egress switch so as to balance the load on multiple alternate paths through the network.
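The first method can be sketched as follows, with the (hypothetical) setup message carrying the path so far as a list of router IDs:

   # Sketch of loop prevention during egress-initiated label setup.
   # The setup message layout (destination + path of router IDs) is
   # invented for illustration.

   def extend_setup(my_router_id, setup):
       """Accept a label setup and extend it toward upstream neighbors,
       refusing to set up the path to or past a looping point."""
       if my_router_id in setup["path"]:
           return None    # our ID is already on the path: this would loop
       return {"dest": setup["dest"],
               "path": setup["path"] + [my_router_id]}

   egress_msg = {"dest": "192.0.2.0/24", "path": ["egress"]}
   msg = extend_setup("LSR-1", egress_msg)       # accepted and extended
   assert extend_setup("LSR-1", msg) is None     # a looped copy is rejected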
In principle the explicit routing could also be done in the other direction (from ingress to egress). However, this would make it more difficult to merge streams if stream merge is to be used. It would also make it more difficult to coordinate (i) changes to the paths used, (ii) the VCI space assignments, and (iii) load sharing. This therefore makes explicit routing more difficult, and also reduces the other advantages that could be obtained from the approach.

If label distribution is piggybacked on the routing protocol (see section 4.1.2), then loop prevention is only possible if the routing protocol itself does loop prevention.

What To Do If A Loop Is Detected:

With all of these schemes, if a loop is known to exist, then the L2 label-swapped path is not set up. This leads to the obvious question of what an MPLS node does when it doesn't have a label for a particular destination, and a packet for that destination arrives to be forwarded. If possible, the packet is forwarded using normal L3 (IP) forwarding. This raises two issues: (i) what about nodes which are not capable of L3 forwarding; and (ii) given the relative speeds of L2 and L3 forwarding, does this work?

Nodes which are not capable of L3 forwarding obviously can't forward a packet unless it arrives with a label and the associated next hop label has been assigned. Such nodes, when they receive a packet for which the next hop label has not been assigned, must discard the packet. It is probably safe to assume that a node which cannot forward an L3 packet is also incapable of forwarding an ICMP error report that it originates, implying that the packet will simply need to be discarded in this case.

In many cases L2 forwarding will be significantly faster than L3 forwarding (allowing faster forwarding is a significant motivation behind the work on MPLS). This implies that if a node is forwarding a large volume of traffic at L2, and a change in the routing protocol causes the associated labels to be lost (necessitating L3 forwarding), in some cases the node will not be capable of forwarding the same volume of traffic at L3, which will of course require that packets be discarded. However, in some cases only a relatively small volume of traffic will need to be forwarded at L3; thus forwarding at L3 when L2 is not available is not necessarily always a problem. There may be some nodes which are capable of forwarding equally fast at L2 and L3 (for example, such nodes may contain IP forwarding hardware which is not available in all nodes). Finally, when packets are lost this will cause TCP to back off, which will in turn reduce the load on the network and allow the network to stabilize, even at reduced forwarding rates, until such time as the label bindings can be reestablished.

Note that in most cases loops will be caused either by configuration errors or by short term transient problems caused by the failure of a link. If only one link goes down, and if routing creates a normal "tree-shaped" set of paths to any one destination, then the failure of one link somewhere in the network will affect only one link's worth of data passing through any one node in the network. This implies that if a node is capable of forwarding one link's worth of data at L3, then in many or most cases it will have sufficient L3 bandwidth to handle looping data.
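The behavior described above might be summarized by the following sketch (the node model is invented for illustration):

   # Sketch of handling a packet whose next hop label is not assigned.
   from dataclasses import dataclass

   @dataclass
   class Node:
       can_forward_at_l3: bool

   def handle_unlabeled(node, packet):
       if node.can_forward_at_l3:
           return ("l3-forward", packet)   # slower path; may shed load
       # A pure L2 node can neither forward the packet nor originate
       # an ICMP error report, so the packet must be discarded.
       return ("discard", None)

   assert handle_unlabeled(Node(False), "pkt") == ("discard", None)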
4.4 Interoperation with NHRP

When label switching is used over ATM, and there exists an LSR which is also operating as a Next Hop Client (NHC), the possibility of direct interaction arises: could one switch cells between the two technologies without reassembly? To enable this, several important issues must be addressed.

The encapsulation must be acceptable to both MPLS and NHRP. If only a single label is used, then the null encapsulation could be used. Other solutions could be developed to handle label stacks.

NHRP must understand and respect the granularity of a stream.

Currently NHRP resolves an IP address to an ATM address. The response may include a mask indicating a range of addresses. However, any VC to the ATM address is considered to be a viable means of packet delivery. Suppose that an NHC issues an NHRP request for IP address A, gets back ATM address 1, and sets up a VC to address 1. Later the same NHC issues a request for a totally unrelated IP address B and gets back the same ATM address 1. In this case normal NHRP behavior allows the NHC to use the VC (that was set up for destination A) for traffic to B.

Note: in this section we will refer to a VC set up as a result of an NHRP query/response as a shortcut VC.

If one expects to be able to label switch the packets being received from a shortcut VC, then the label switch needs to be informed as to exactly what traffic will arrive on that VC, and that mapping cannot change without notice. Currently no such mechanism exists in the defined signaling of a shortcut VC. Several means are possible. A binding, equivalent to the binding in LDP, could be sent in the setup message. Alternatively, the binding of prefix to label could remain in an LDP session (or whatever means of label distribution is appropriate) and the setup could carry a binding of the label to the VC. This would leave the binding mechanism for shortcut VCs independent of the label distribution mechanism.
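The two placements of the binding suggested above might be contrasted as in the following sketch (both message layouts are purely hypothetical; as noted, no such mechanism is currently defined for shortcut VCs):

   # Option (a): carry a prefix-to-label binding in the VC setup itself.
   setup_a = {"vc": 42,
              "binding": {"prefix": "192.0.2.0/24", "label": 17}}

   # Option (b): leave the prefix-to-label binding in an LDP session (or
   # other label distribution mechanism), and have the setup carry only
   # a label-to-VC binding. This keeps the shortcut VC mechanism
   # independent of how labels are distributed.
   ldp_binding = {"prefix": "192.0.2.0/24", "label": 17}
   setup_b = {"vc": 42, "label": 17}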
A further architectural challenge exists in that label switching is inherently unidirectional whereas ATM is bi-directional. The above binding semantics are fairly straightforward; however, effectively using the reverse direction of a VC presents further challenges.

Label switching must also respect the granularity of the shortcut VC. Without VC merge, this means a single label switched flow must map to a VC. In the case of VC merge, multiple label switched streams could be merged onto a single shortcut VC, but given the asymmetry involved, there is perhaps little practical use for this.

Another issue is one of practicality and usefulness. What is sent over the VC must be at a fine enough granularity to be label switched through the receiving domain. One potential place where the two technologies might come into play is in moving data from one campus via the wide-area to another campus. In such a scenario, the two technologies would border precisely at the point where summarization is likely to occur. Each campus would have a detailed understanding of itself, but not of the other campus. The wide-area is likely to have summarized knowledge only. But at such a point, level 3 processing becomes the likely solution.

4.5 Operation in a Hierarchy

MPLS allows hierarchical operation, through use of a label stack. This allows MPLS to simultaneously be used for routing at a fine grain level (for example, between individual routers within an ISP) and at a higher "area by area" or "domain by domain" level.

4.5.1 Example of Hierarchical Operation

Figure 1 illustrates an example of how MPLS may operate in a hierarchy. This example illustrates three transit routing domains (Domain #1, #2, and #3). For example, these three domains may represent Internet service providers. Domain boundary routers are illustrated in each domain (routers R1 and R2 in domain #1, routers R3 and R8 in domain #2, and routers R9 and R10 in domain #3). Suppose that these domain boundary routers are operating BGP.

Internal routers are not illustrated in domains 1 and 3. However, internal routers are illustrated within domain #2. In particular, the path between routers R3 and R8 follows the internal routers R4, R5, R6, and R7 within domain #2.

    ................. .............................. ................
    .               . .                            . .              .
    .               . .                            . .              .
    .R1          R2--------R3                   R8-------R9     R10.
    .               . .       \                 /  . .              .
    .               . .        R4---R5---R6---R7   . .              .
    .               . .                            . .              .
    . Domain#1      . . Domain#2                   . . Domain#3     .
    ................. .............................. ................

            Figure 1: Example of the Use of MPLS in a Hierarchy

In this example there are two levels of routing taking place. For example, OSPF may be used for routing within Domain #2. In this case the routers R3, R4, R5, R6, R7, and R8 may be running OSPF amongst themselves in order to compute routes within Domain #2. The domain boundary routers (R1, R2, R3, R8, R9, and R10) operate BGP in order to determine paths between routing domains.

MPLS allows label forwarding to be done independently at multiple levels. In this example, MPLS may be used at the BGP level (between routers R1, R2, R3, R8, R9, and R10) and at the OSPF level (between routers R4, R5, R6, and R7). Thus when an IP packet traverses Domain #2, it will contain two labels, encoded as a "label stack". The higher level label would be used between routers R3 and R8, and would be encapsulated inside a header specifying a lower level label used within Domain #2.

Consider the forwarding operation that takes place at router R3. R3 will receive a packet from R2 containing a single label (the BGP level label). R3 will need to swap BGP level labels in order to supply the label that R8 expects. R3 will also need to add an OSPF level label, as is expected by R4. R3 therefore "pushes down" the BGP level label in the label stack, by adding a lower level label. Note that the actual label swapping operation performed by R3 can be optimized to allow very simple forwarding: R3 receives a single incoming label from R2, and can map this label into the new label header to be prepended to the packet; it just happens that the new label header added by R3 contains two labels rather than one.
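R3's operation can be sketched as a swap of the top (BGP level) label followed by a push of the OSPF level label; the label values below are of course invented:

   # Sketch of the label stack operation at R3. The top of the stack is
   # the last element of the list; all label values are invented.

   def r3_forward(stack, bgp_swap_table, ospf_label_for_r4):
       incoming = stack.pop()                    # BGP label from R2
       stack.append(bgp_swap_table[incoming])    # the label R8 expects
       stack.append(ospf_label_for_r4)           # push the OSPF level label
       return stack

   # R2 sends BGP label 5; R8 expects 9; R4 expects OSPF label 21.
   assert r3_forward([5], {5: 9}, 21) == [9, 21]

As the text notes, the two steps can be collapsed into a single mapping from the incoming label to a precomputed two-label header.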
4.5.2 Components Required for Hierarchical Operation

In order for MPLS to operate in a hierarchy, there are three things which must be accomplished:

- Hierarchical Label Exchange in LDP

  The Label Distribution Protocol needs to exchange labels at each level of the hierarchy. In our example, R3 needs to exchange label bindings with R8 for operation at the BGP level. At the same time, R3 needs to exchange label bindings with R4 (and R4 needs to exchange label bindings with R5) for operation at the OSPF level. The control component for hierarchical labeling is essentially the same as that for single level labeling, except that labels are exchanged not just among physically adjacent LSRs but also between those switching on the same level in the label stack.

- Label Stack

  Multiple labels need to be carried in data packets. For example, when a data packet is being carried across Domain #2, the data packet needs to be encapsulated in a header which carries the BGP level label, and the resulting packet needs to be carried in a header which carries an OSPF level label.

- Configuration

  It is necessary for routers to know when hierarchical label switching is being used.

4.5.3 Some Restrictions on the Use of Hierarchical MPLS

Consider the example in Figure 1. In this case, the BGP level label is encoded by router R1. Label swapping is employed for packet forwarding at R2, R3, R8, and R9. This is only possible if R1 knows the right label to use, implying that the granularity used in mapping packets to forwarding equivalence classes is the same at routers R2, R3, R8, and R9.

We can consider some specific examples to illustrate the issue.

Suppose that the destination host is within domain 3. In this case, it is very likely that router R9 will forward the packet based on a finer grain than was used previously. For example, a relatively short address prefix may be used for advertising the addresses reachable in domain 3, while longer (more specific) address prefixes may be used for specific areas or subnets within domain 3. In this case router R1 may assign a BGP level label to the packet, and label based forwarding at the BGP level may be used by routers R1, R2, R3, and R8. However, router R9 will need to make use of layer 3 forwarding.

Alternatively, suppose that domain 3 is an Internet Service Provider which offers service to multiple routing domains. Suppose that in this case domain 3 makes use of a single CIDR address block (based on a single address prefix), with smaller address blocks (corresponding to longer address prefixes) assigned to each of multiple domains which get their Internet service from domain 3. Suppose that the destination for a particular IP packet is contained in one of these smaller domains whose addresses are contained in the larger address block assigned to and administered by domain 3. Again in this case router R9 will need to make use of layer 3 forwarding.

Let's consider another possible complication. Suppose that router R1 is an MPLS node, but that some of the internal routers within domain 1 do not know about MPLS, and suppose that R1 encapsulates an IP packet in an MPLS header in order to carry the BGP level label. In this case the non-MPLS-capable routers within domain 1 will not know what to do with the MPLS header. This implies that MPLS can be used at a higher level (such as between the border routers R1 and R2 in our example) only if either the lower level routers (such as the routers within domain 1) are also using MPLS, or the MPLS header is itself encapsulated within an IP header for transmission across the domain.
These examples imply that there are some cases where IP forwarding will be required in a hierarchy. While hierarchical MPLS may be useful in many cases, it does not replace layer 3 forwarding.

4.5.4 The Relationship Between MPLS Hierarchy and Routing Hierarchy

4.5.4.1 Stacked Labels in a Flat Routing Environment

The label stacking mechanism can be useful in some scenarios independent of routing hierarchy.

The basic concept of stacking is to provide a mechanism to segregate streams within a switched path. Under normal operation, when packets are encapsulated with a single L2 header, forwarding multiple streams onto one switched path means that L3 processing is required to segregate a particular stream at the end of the switched path. The stacking mechanism provides an easy way to maintain the identity of the various streams which are merged into a single switched path.

One useful application of this technique is in Virtual Private Networks. The packets can be switched both at the ingress and egress nodes of the provider network. A packet coming in at one end of a customer network contains an encapsulated header with the VPN label. At the VPN ingress node, the header is "popped" to provide the label for switching through the VPN, and is then "pushed" with an encapsulation of the far end customer label. At the VPN egress node, the packet header is "popped" again, and the new header provides the label for switching through the customer site. This enables one to provide customers with the benefits of a VPN, with end-to-end switching for optimal performance.

Another interesting use can be in conjunction with RSVP flows. In RSVP, sender flows can be logically merged under a single resource reservation using the Shared and Wildcard filters. The stacking mechanism can be used to merge flows onto a single label, with the shared QoS applied to the single label on top of the stack. Since sender flows within the merged switched path maintain their identity, it is easy to demerge them at a downstream node without requiring L3 processing of the packets. Another similar application can be the merging of several premium service flows with similar QoS into a single switched path. This helps in conserving labels in the backbone of a large network.

Yet another useful application can be DVMRP tunnels, similar in concept to the DVMRP tunnels used in the existing Mbone. The ingress node of a switched DVMRP tunnel encapsulates, for a particular (S,G) pair, the label learned from the egress node of the tunnel before forwarding packets into the tunnel. The egress node of the tunnel just pops the top label and switches the packet based on the interior label.

Note that the use of tunnels can also be quite beneficial in a non-hierarchical environment. Take for example the case where a domain contains a subset of MPLS nodes. The MPLS egress can advertise labels for the routes which are within the domain but external to the MPLS core. The ingress node can encapsulate packets for these destinations within the header for the aggregated switched path that crosses the MPLS domain.

It is not evident whether this technique has any useful application in a flat routing domain, but it can be used in conjunction with explicit routing when providing specialized services. The multiple levels of encapsulation can also be used like loose source routing.

4.5.4.2 Flat Labels in a Hierarchical Routing Environment

It is also possible in some environments to use a single level of label in a network using hierarchical routing. This is for example possible in the case of a two level OSPF network in which the primary purpose of the network is to support external routes. Specifically (depending upon the type of area hierarchy used), OSPF allows external routes to be advertised throughout an OSPF routing domain, with each external route associated with the router ID of the router with reachability to the specific route. This implies that it is possible to set up an LSP to every router in the routing domain, and then use the LSP for packets destined to the associated external routes.
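A sketch of the resulting forwarding state (all names and values invented): each external route maps to the router ID that advertised it, and one LSP, represented here simply by its outgoing label, is maintained per router in the domain:

   # External routes tagged with the advertising router's ID, plus one
   # LSP (shown only as its outgoing label) per router in the domain.
   route_origin = {"198.51.100.0/24": "router-A",
                   "203.0.113.0/24": "router-B"}
   lsp_label = {"router-A": 30, "router-B": 31}

   def label_for_external(prefix):
       return lsp_label[route_origin[prefix]]

   assert label_for_external("198.51.100.0/24") == 30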
4.5.4.3 Configuration of the Hierarchy

The possibility of having a variety of different relationships between the routing hierarchy and the MPLS hierarchy leads to an obvious question: how is the relationship between the two hierarchies to be determined? At first glance it would seem that this generality leads to a relatively complex configuration issue, and it could be difficult to ensure consistent configuration of the network.

One possible solution is to have the MPLS hierarchy default to using the same hierarchy structure as is used for routing, with each area and domain boundary (as used by routing) also implying an MPLS domain boundary. This would allow the normal default operation to conform to the type of operation that we might expect to be used in most situations, and would allow a common means of interoperation which we would expect all vendors of MPLS compliant equipment to support.

4.5.5 Some Advantages of Hierarchical MPLS

The use of hierarchical MPLS allows the routers internal to a transit routing domain to be isolated from the BGP level routing information. In our example network, routers R4, R5, R6, and R7 can forward packets based solely on the lower level label; these internal routers do not need to know anything at all about higher level IP routing. Note that this advantage is not available in conventional IP forwarding: if the internal routers within a routing domain forward IP packets based on the destination IP address, then the internal routers need to know which route to use for any particular destination IP address. By combining hierarchical routing with label stacks, MPLS is able to decouple the exterior and interior protocols. MPLS switches within a domain (interior switches) need only carry the reachability information for nodes in the domain. The MPLS border switches for the domain still, of course, carry the external routes.

Use of hierarchical MPLS also extends the simpler forwarding offered by MPLS to domain boundary routers.

MPLS places no bound on the number of labels that may be present in a label stack. In principle this means that MPLS can support multiple levels of routing hierarchy.

4.6 Interoperation of MPLS Systems with "Conventional" ATM

If we consider the implementation of MPLS on ATM switches we can imagine several possibilities.

We might remove the ATM Forum control plane completely.
This is the approach taken by Ipsilon in their IP Switching approach, and allows ATM switches to operate as MPLS LSRs.

Alternately, we could build a system that supports a "ships in the night" (SIN) mode of operation, where the ATM Forum and MPLS control planes both run on the same hardware but are isolated from each other, i.e., they do not interact. This allows a single device to simultaneously operate as both an MPLS LSR and an ATM switch.

We feel that the MPLS architecture should allow both of these models. We note, however, that neither of them addresses the issue of operation of MPLS over a public ATM network, i.e., over a network that supports tariffed access to PVCs and ATM Forum SVCs. Because public ATM service exists and will, presumably, become more pervasive in the future, we feel that another model needs to be included in the architecture and be supported by the LDP. We call this model the "integrated" model. In essence it is the same as the SIN model, but without the restriction that the two control planes are isolated: in the integrated model the MPLS control plane is able to use the ATM control plane to set up SVCs as needed. An example of this integrated model, allowing the coexistence and interoperation of ATM and MPLS, is the CSR proposal from Toshiba.

Note that there is a distinction relevant to the protocol specification process between the SIN and the integrated approach. SIN does not require specification, other than to require that it be transparent to both the MPLS and ATM control planes (i.e., neither should know of the other's existence); realisation of SIN on a particular machine is purely an engineering challenge for the implementors. The integrated model, on the other hand, requires specification of procedures for the use of SVCs and the association of labels with them.

4.7 Multicast

This section is FFS.

4.8 Multipath

Many IP routing protocols support the notion of equal-cost multipath routes, in which a router maintains multiple next hops for one destination prefix when two or more equal-cost paths to the prefix exist. There are a few possible approaches for handling multipath with MPLS.

In this discussion we will use the term "multipath node" to mean a node which is keeping track of multiple switched paths from itself for a single destination.

The first approach maintains a separate switched path from each ingress node via one or more multipath nodes to a merge point. This requires MPLS to distinguish the separate switched paths, so that learning of a new switched path is not misinterpreted as a replacement of the same switched path. This also requires that an ingress MPLS node be capable of distributing the traffic among the multiple switched paths. This approach preserves switching performance, but at the cost of proliferating the number of switched paths; for example, each switched path consumes a distinct label.

The second approach establishes only one switched path from any one ingress node to a destination. However, when the paths from two different ingress nodes happen to arrive at the same node, that node may use different paths for each (implying that the node becomes a multipath node). Thus the switched path chosen by the multipath node may assign a different downstream path to each incoming stream.
This conserves switched paths and maintains switching performance, but cannot balance loads across downstream links as well as the other approaches can, even if switched paths are selectively assigned. A drawback of this approach is that the L2 path may be different from the normal L3 path, as traffic that otherwise would have taken multiple distinct paths is forced onto a single path.

The third approach allows a single stream arriving at a multipath node to be split into multiple streams, by using L3 forwarding at the multipath node. For example, the multipath node might choose to use a hash function on the source and destination IP addresses, in order to avoid misordering packets between any one IP source and destination. This approach conserves switched paths at the cost of switching performance.
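The hash based split used by the third approach might look like the following sketch (the particular hash is illustrative; any deterministic function of the source and destination addresses keeps a given host pair on one path and so avoids misordering):

   import zlib

   def pick_path(src_ip, dst_ip, paths):
       """Split traffic across 'paths' while keeping any one source/
       destination pair on a single path, avoiding misordering."""
       h = zlib.crc32(f"{src_ip}>{dst_ip}".encode())
       return paths[h % len(paths)]

   # The same host pair always maps to the same downstream path:
   p = pick_path("10.0.0.1", "10.9.9.9", ["path-A", "path-B"])
   assert p == pick_path("10.0.0.1", "10.9.9.9", ["path-A", "path-B"])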
4.9 Host Interactions

There is a range of options for host interaction with MPLS.

The most straightforward approach is no host involvement. Host operation may then be completely independent of MPLS, with hosts operating according to other IP standards. If there is no host involvement, then the first hop requires an L3 lookup.

If the host is ATM attached and doing NHRP, then this would allow the host to set up a virtual circuit to a router. However, this brings up a range of issues, as discussed in section 4.4 ("Interoperation with NHRP").

On the ingress side, it is reasonable to consider having the first hop LSR provide labels to the hosts, and thus have hosts attach labels to the packets that they transmit. This could allow the first hop LSR to avoid an L3 lookup. It is reasonable here to have the host request labels only when needed, rather than requiring the host to remember all labels assigned for use in the network.

On the egress side, it is questionable whether hosts should be involved. For scaling reasons, it would be undesirable to use a different label for reaching each host.

4.10 Explicit Routing

There are two options for route selection: (1) hop by hop routing, and (2) explicit routing.

An explicitly routed LSP is an LSP where, at a given LSR, the LSP next hop is not chosen by each local node, but rather is chosen by a single node (usually the ingress or egress node of the LSP). The sequence of LSRs followed by an explicitly routed LSP may be chosen by configuration, or by an algorithm performed by a single node (for example, the egress node may make use of the topological information learned from a link state database in order to compute the entire path for the tree ending at that egress node).

With MPLS the explicit route needs to be specified at the time that labels are assigned, but the explicit route does not have to be specified with each L3 packet. This implies that explicit routing with MPLS is relatively efficient (when compared with the efficiency of explicit routing for pure datagrams).

Explicit routing may be useful for a number of purposes, such as allowing policy routing and/or facilitating traffic engineering.

4.10.1 Establishment of Point to Point Explicitly Routed LSPs

In order to establish a point to point explicitly routed LSP, the LDP packets used to set up the LSP must contain the explicit route. This implies that the LSP is set up in order, either from the ingress to the egress or from the egress to the ingress.

One node needs to pick the explicit route. This may be done in at least two possible ways: (i) by configuration (e.g., the explicit route may be chosen by an operator, or by a centralized server of some kind); or (ii) by use of a routing protocol which allows the ingress and/or egress node to know the entire route to be followed. The latter implies the use of a link state routing protocol (in which all nodes know the full topology) or of a path vector routing protocol (in which the ingress node is told the path as part of the normal operation of the routing protocol).

Note: the normal operation of path vector routing protocols (such as BGP) does not provide the full set of routers along the path. This implies that either a partial source route only would be provided (implying that LSP setup would use a combination of hop by hop and explicit routing), or it would be necessary to augment the protocol in order to provide the complete explicit route. Detailed operation in this case is FFS.

In the point to point case, it is relatively straightforward to specify the route to use: this is indicated by providing the addresses of each LSR on the LSP.
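Setup might then proceed as in the following sketch, with each LSR consuming its own address from a (hypothetical) explicit route object and forwarding the setup to the next listed LSR rather than to the hop by hop next hop:

   # Sketch of explicitly routed LSP setup. The message layout is
   # invented; only the "consume your own address and forward" rule is
   # the point being illustrated.

   def process_setup(my_addr, setup):
       route = setup["explicit_route"]
       if route[0] != my_addr:
           raise ValueError("setup arrived at an LSR not on the route")
       remaining = route[1:]
       if not remaining:
           return None          # last hop: the LSP is fully established
       # Forward to the next LSR named in the route, which may differ
       # from the hop by hop next hop (see section 4.10.2).
       return {"explicit_route": remaining}

   msg = {"explicit_route": ["R3", "R4", "R5"]}
   assert process_setup("R3", msg) == {"explicit_route": ["R4", "R5"]}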
4.10.2 Explicit and Hop by Hop Routing: Avoiding Loops

In general, an LSP will be explicitly routed specifically because there is a good reason to use an alternative to the hop by hop routed path. This implies that the explicit route is likely to follow a path which is inconsistent with the path followed by hop by hop routing. If some of the nodes along the path follow an explicit route but some of the nodes make use of hop by hop routing (and ignore the explicit route), then inconsistent routing may result, and in some cases loops (or severely inefficient paths) may form. This implies that for any one LSP there are two possible options: (i) the entire LSP may be hop by hop routed; or (ii) the entire LSP may be explicitly routed.

For this reason, it is important that if an explicit route is specified for setting up an LSP, then that route must be followed in setting up the LSP.

There is a related issue when a link or node in the middle of an explicitly routed LSP breaks. In this case, the last operating node on the upstream part of the LSP will continue receiving packets, but will not be able to forward them along the explicitly routed LSP (since its next hop is no longer functioning). In this case it is not, in general, safe for this node to forward the packets using L3 forwarding with hop by hop routing. Instead, the packets must be discarded, and the upstream portion of the explicitly routed LSP must be torn down.

Where part of an explicitly routed LSP breaks, the node which originated the LSP needs to be told about this. For robustness reasons the MPLS protocol design should not assume that the routing protocol will tell the node which originated the LSP; for example, it is possible that a link may go down and come back up quickly enough that the routing protocol never declares the link down. Rather, an explicit MPLS mechanism is needed.

4.10.3 Merge and Explicit Routing

Explicit routing is slightly more complex with a multipoint to point LSP (i.e., in the case that stream merge is used).

In this case, it is not possible to specify the route for the LSP as a simple list of LSRs (since the LSP does not consist of a simple sequence of LSRs). Rather, the explicit route must specify a tree. There are several ways that this may be accomplished. Details are FFS.

4.10.4 Using Explicit Routing for Traffic Engineering

In the Internet today it is relatively common for ISPs to make use of a Frame Relay or ATM core, which interconnects a number of IP routers. The primary reason for use of a switching (L2) core is to make use of low cost equipment which provides very high speed forwarding. However, there is another very important reason for the use of an L2 core: in order to allow for traffic engineering.

Traffic engineering (also known as bandwidth management) refers to the process of managing the routes followed by user data traffic in a network in order to provide relatively equal and efficient loading of the resources in the network (i.e., to ensure that the load on links and nodes is within the capabilities of those links and nodes).

Some rudimentary level of traffic engineering can be accomplished with pure datagram routing and forwarding by adjusting the metrics assigned to links. For example, suppose that there is a given link in a network which tends to be overloaded on a long term basis. One option would be to manually configure an increased metric value for this link, in the hope of moving some traffic onto alternate routes. This provides a rather crude method of traffic engineering and provides only limited results.

Another method of traffic engineering is to manually configure multiple PVCs across an L2 core, and to adjust the route followed by each PVC in an attempt to equalize the load on different parts of the network. Where necessary, multiple PVCs may be configured between the same two nodes, in order to allow traffic to be split between different paths. In some topologies it is much easier to achieve efficient non-overlapping or minimally-overlapping paths via this method (with manually configured paths) than it would be with pure datagram forwarding. A similar ability can be achieved with MPLS via manual configuration of the paths taken by LSPs.

A related issue is the decision on where merge is to occur. Note that once two streams merge into one stream (forwarded by a single label), they cannot diverge again at that level of the MPLS hierarchy (i.e., they cannot be bifurcated without looking at a higher level label or the IP header). Thus there may be times when it is desirable to explicitly NOT merge two streams even though they are to the same egress node and FEC. Non-merge may be appropriate either because the streams will want to diverge later in the path (for example, to avoid overloading a particular downstream link), or because the streams may want to use different physical links in the case where multiple slower physical links are being aggregated into a single logical link for the purpose of IP routing.
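The reason merged streams cannot later diverge is visible directly in the forwarding structure: an incoming label maps to exactly one outgoing label and next hop, as in this sketch (names and values invented):

   # Once two streams share incoming label 7, every packet carrying
   # label 7 gets the same next hop and outgoing label; the streams
   # cannot be separated again without examining a higher level label
   # or the IP header.
   lfib = {7: ("R6", 12)}      # in-label -> (next hop, out-label)

   def forward(in_label):
       return lfib[in_label]

   assert forward(7) == ("R6", 12)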
As a network grows to a very large size (on the order of hundreds of LSRs), it becomes increasingly difficult to handle the assignment of all routes via manual configuration. However, explicit routing allows several alternatives:

1. Partial Configuration: One option is to use automatic/dynamic routing for most of the paths through the network, but then manually configure some routes. For example, suppose that full dynamic routing would result in a particular link being overloaded. One of the LSPs which uses that link could be selected and manually routed to use a different path.

2. Central Computation: Another option would be to provide long term network usage information to a single central management facility, which could then run a global optimization to compute a set of paths to use. Network management commands can be used to configure LSRs with the correct routes to use.

3. Egress Computation: An egress node can run a computation which optimizes the path followed by traffic to itself. This cannot, of course, optimize the entire traffic load through the network, but can include optimization of traffic from multiple ingresses to one egress. The reason for optimizing traffic to a single egress, rather than from a single ingress, relates to the issue of when to merge: an ingress can never merge the traffic from itself to different egresses, but an egress can, if desired, choose to merge the traffic from multiple ingresses to itself.

4.10.5 Using Explicit Routing for Policy Routing

This section is FFS.

4.11 Traceroute

This section is FFS.

4.12 LSP Control: Egress versus Local

There is a choice to be made regarding whether the initial setup of LSPs will be initiated by the egress node, or locally by each individual node.

When LSP control is done locally, each node may at any time pass label bindings to its neighbors for each FEC recognized by that node. In the normal case that the neighboring nodes recognize the same FECs, nodes may map incoming labels to outgoing labels as part of the normal label swapping forwarding method.

When LSP control is done by the egress, then initially (on startup) only the egress node passes label bindings to its neighbors, corresponding to any FECs which leave the MPLS network at that egress node. When initializing, other nodes wait until they get a label from downstream for a particular FEC before passing a corresponding label for the same FEC to upstream nodes.
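The difference between the two modes can be reduced to a single predicate governing when an LSR may advertise a binding upstream (a sketch; the parameter names are invented):

   # Sketch: when may an LSR pass a label binding for a FEC upstream?

   def may_advertise(is_egress_for_fec, has_downstream_label,
                     egress_control):
       if is_egress_for_fec:
           return True                  # the egress always starts the process
       if egress_control:
           return has_downstream_label  # wait for the binding to bubble up
       return True                      # local control: advertise at any time

   # Under egress control, an interior LSR stays quiet until a label
   # arrives from downstream:
   assert not may_advertise(False, False, egress_control=True)
   assert may_advertise(False, False, egress_control=False)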
With local control, since each LSR is (at least initially) independently assigning labels to FECs, it is possible that different LSRs will make inconsistent decisions. For example, an upstream LSR may make a coarse decision (map multiple IP address prefixes to a single label) while its downstream neighbor makes a finer grain decision (map each individual IP address prefix to a separate label). With downstream label assignment this can be corrected by having LSRs withdraw labels that they have assigned which are inconsistent with downstream labels, and replace them with new, consistent label assignments.

This may appear to be an advantage of egress LSP control (since with egress control the initial label assignments "bubble up" from the egress to upstream nodes, and consistency is therefore easy to ensure). However, even with egress control it is possible that the choice of egress node may change, or the egress may (based on a change in configuration) change its mind in terms of the granularity which is to be used. This implies that the same mechanism will be necessary to allow changes in granularity to bubble up to upstream nodes. The choice of egress or local control may therefore affect the frequency with which this mechanism is used, but does not affect the need for a mechanism to achieve consistency of label granularity.

Egress control and local control can interwork in a very straightforward manner. With either approach (assuming downstream label assignment), the egress node will initially assign labels for particular FECs and will pass these labels to its neighbors; with either approach these label assignments will bubble upstream, with the upstream nodes choosing labels that are consistent with the labels that they receive from downstream.

The difference between the two techniques therefore becomes a tradeoff between avoiding a short period of initial thrashing on startup (in the sense of avoiding the need to withdraw inconsistent labels which may have been assigned using local control) versus the imposition of a short delay on initial startup (while waiting for the initial label assignments to bubble up from downstream). The protocol mechanisms which need to be defined are the same in either case, and the steady state operation is the same in either case.

4.13 Security

Security in a network using MPLS should be relatively similar to security in a normal IP network.

Routing in an MPLS network uses precisely the same IP routing protocols as are currently used with IP. This implies that route filtering is unchanged from current operation. Similarly, the security of the routing protocols is not affected by the use of MPLS.

Packet filtering also may be done as in normal IP. This will require either (i) that label swapping be terminated prior to any firewalls performing packet filtering (in which case a separate instance of label swapping may optionally be started after the firewall); or (ii) that firewalls "look past the labels" in order to inspect the entire IP packet contents. In the latter case, note that the label may imply semantics greater than that contained in the packet header: in particular, a particular label value may imply that the packet is to take a particular path after the firewall. In environments in which this is considered to be a security issue, it may be desirable to terminate the label prior to the firewall.

Note that in principle labels could be used to speed up the operation of firewalls: in particular, the label could be used as an index into a table which indicates the characteristics that the packet needs to have in order to pass through the firewall. Depending upon implementation considerations, matching the contents of the packet against the contents of the table may be quicker than parsing the packet in the absence of the label.
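In outline, such a label-indexed check might look like the following sketch (the policy record is invented; whether the table lookup actually beats parsing the packet is, as noted, implementation dependent):

   # Sketch of a firewall using the label as an index into a table of
   # characteristics that the packet must have in order to pass.
   policy_by_label = {17: {"dst_prefix": "192.0.2."}}

   def label_indexed_check(label, packet):
       policy = policy_by_label.get(label)
       if policy is None:
           return False   # unknown label: fall back to full inspection
       return packet["dst"].startswith(policy["dst_prefix"])

   assert label_indexed_check(17, {"dst": "192.0.2.9"})
   assert not label_indexed_check(99, {"dst": "192.0.2.9"})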
References

[1] "A Proposed Architecture for MPLS", E. Rosen, A. Viswanathan, R. Callon, work in progress, draft-ietf-mpls-arch-00.txt, August 1997.

[2] "ARIS: Aggregate Route-Based IP Switching", A. Viswanathan, N. Feldman, R. Boivie, R. Woundy, work in progress, Internet Draft, March 1997.

[3] "ARIS Specification", N. Feldman, A. Viswanathan, work in progress, Internet Draft, March 1997.

[4] "ARIS Support for LAN Media Switching", S. Blake, A. Ghanwani, W. Pace, V. Srinivasan, work in progress, Internet Draft, March 1997.

[5] "Tag Switching Architecture - Overview", Rekhter, Davie, Katz, Rosen, Swallow, Farinacci, work in progress, Internet Draft, July 1997.

[6] "Tag Distribution Protocol", Doolan, Davie, Katz, Rekhter, Rosen, work in progress, Internet Draft.

[7] "Use of Tag Switching with ATM", Davie, Doolan, Lawrence, McCloghrie, Rekhter, Rosen, Swallow, work in progress, Internet Draft.

[8] "MPLS Label Stack Encoding", Rosen, Rekhter, Tappan, Farinacci, Fedorkow, Li, Conta, work in progress, draft-ietf-mpls-label-encaps-00.txt, November 1997.

[9] "Partitioning Tag Space among Multicast Routers on a Common Subnet", Farinacci, work in progress, Internet Draft.

[10] "Multicast Tag Binding and Distribution using PIM", Farinacci, Rekhter, work in progress, Internet Draft.

[11] "Toshiba's Router Architecture Extensions for ATM: Overview", Katsube, Nagami, Esaki, RFC 2098.

[12] "Soft State Switching: A Proposal to Extend RSVP for Switching RSVP Flows", A. Viswanathan, V. Srinivasan, work in progress, Internet Draft, March 1997.

[13] "Integrated Services in the Internet Architecture: an Overview", R. Braden et al., RFC 1633, June 1994.

[14] "Resource ReSerVation Protocol (RSVP), Version 1 Functional Specification", work in progress, draft-ietf-rsvp-spec-16.txt, June 1997.

[15] "OSPF Version 2", J. Moy, RFC 1583, March 1994.

[16] "A Border Gateway Protocol 4 (BGP-4)", Y. Rekhter and T. Li, RFC 1771, March 1995.

[17] "Ipsilon Flow Management Protocol Specification for IPv4 Version 1.0", P. Newman et al., RFC 1953, May 1996.

[18] "ATM Forum Private Network-Network Interface Specification, Version 1.0", ATM Forum af-pnni-0055.000, March 1996.

[19] "NBMA Next Hop Resolution Protocol (NHRP)", Luciani, Katz, Piscitello, Cole, work in progress, draft-ietf-rolc-nhrp-12.txt, March 1998.

Authors' Addresses

Ross Callon
Ascend Communications, Inc.
1 Robbins Road
Westford, MA 01886
508-952-7412
rcallon@casc.com

Paul Doolan
Ennovate Networks
330 Codman Hill Road
Boxborough, MA
978-263-2002 x103
pdoolan@ennovatenetworks.com

Nancy Feldman
IBM Corp.
17 Skyline Drive
Hawthorne, NY 10532
914-784-3254
nkf@vnet.ibm.com

Andre Fredette
Bay Networks, Inc.
3 Federal Street
Billerica, MA 01821
508-916-8524
fredette@baynetworks.com

George Swallow
Cisco Systems, Inc.
250 Apollo Drive
Chelmsford, MA 01824
508-244-8143
swallow@cisco.com

Arun Viswanathan
IBM Corp.
17 Skyline Drive
Hawthorne, NY 10532
914-784-3273
arunv@vnet.ibm.com