Network Working Group                                          R. Callon
Internet Draft                                       Ironbridge Networks
Expires: January 2000                                          P. Doolan
                                                        Ennovate Networks
                                                               N. Feldman
                                                                      IBM
                                                              A. Fredette
                                                          Nortel Networks
                                                               G. Swallow
                                                            Cisco Systems
                                                           A. Viswanathan
                                                      Lucent Technologies

                                                                July 1999

             A Framework for Multiprotocol Label Switching

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Abstract

   This document discusses technical issues and requirements for the
   Multiprotocol Label Switching working group.  It is the intent of
   this document to produce a coherent description of all significant
   approaches which were and are being considered by the working group.
   Selection of specific approaches, making choices regarding
   engineering tradeoffs, and detailed protocol specification are
   outside of the scope of this framework document.

Acknowledgments

   The ideas and text in this document have been collected from a
   number of sources and comments received.
   We would like to thank Eric Gray, Jim Luciani, Andy Malis,
   Rayadurgam Ravikanth, Yakov Rekhter, Eric Rosen, Vijay Srinivasan,
   and Pasi Vananen for their inputs and ideas.

1. Introduction and Requirements

1.1 Overview of MPLS

The primary goal of the MPLS working group is to standardize a base technology that integrates the label swapping forwarding paradigm with network layer routing. This base technology (label swapping) is expected to improve the price/performance of network layer routing, improve the scalability of the network layer, and provide greater flexibility in the delivery of (new) routing services (by allowing new routing services to be added without a change to the forwarding paradigm).

The initial MPLS effort will be focused on IPv4. However, the core technology will be extendible to multiple network layer protocols (e.g., IPv6, IPX, AppleTalk, DECnet, CLNP). MPLS is not confined to any specific link layer technology; it can work with any media over which network layer packets can be passed between network layer entities.

MPLS makes use of a routing approach whereby the normal mode of operation is that L3 routing (e.g., existing IP routing protocols and/or new IP routing protocols) is used by all nodes to determine the routed path.

MPLS provides a simple "core" set of mechanisms which can be applied in several ways to provide rich functionality. The core effort includes:

a) Semantics assigned to a stream label:

   - Labels are associated with specific streams of data.

b) Forwarding Methods:

   - Forwarding is simplified by the use of short fixed length labels
     to identify streams.

   - Forwarding may require simple functions such as looking up a
     label in a table, swapping labels, and possibly decrementing and
     checking a TTL.

   - In some cases, MPLS may make direct use of underlying layer 2
     forwarding, such as is provided by ATM [ATM] or Frame Relay [FR]
     equipment.

c) Label Distribution Methods:

   - These allow nodes to determine which labels to use for specific
     streams.

   - Label distribution may use some sort of control exchange, and/or
     be piggybacked on a routing protocol.

The MPLS working group will define the procedures and protocols used to assign significance to the forwarding labels and to distribute that information between cooperating MPLS forwarders.
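As a rough illustration of the forwarding method described in (b) above, the following sketch (in Python, for exposition only) shows the basic label swap as a single exact-match table lookup. The table contents, field layout, and function names are hypothetical assumptions, not part of any MPLS specification.

   # Illustrative sketch only: a minimal label-swapping forwarder.
   # Label Information Base (LIB): incoming label -> (outgoing label,
   # outgoing port).  In a real LSR this table is populated by a label
   # distribution mechanism, not hard-coded.
   lib = {
       17: (42, "port1"),
       23: (99, "port2"),
   }

   def label_swap(incoming_label, ttl):
       """Forward one packet: exact-match lookup, swap, decrement TTL."""
       if ttl <= 1:
           return None                      # drop: TTL expired
       outgoing_label, port = lib[incoming_label]
       return outgoing_label, port, ttl - 1

   # Example: a packet arriving with label 17 and TTL 64 leaves on
   # "port1" carrying label 42 and TTL 63.
   print(label_swap(17, 64))                # -> (42, 'port1', 63)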
1.2 Requirements

- MPLS forwarding MUST simplify packet forwarding in order to do the
  following:

   o lower the cost of high speed forwarding
   o improve forwarding performance

- MPLS core technologies MUST be general with respect to data link
  technologies (i.e., work over a very wide range of underlying data
  links). Specific optimizations for particular media MAY be
  considered.

- MPLS core technologies MUST be compatible with a wide range of
  routing protocols, and MUST be capable of operating independently of
  the underlying routing protocols. It has been observed that
  considerable optimizations can be achieved in some cases by small
  enhancements of existing protocols. Such enhancements MAY be
  considered in the case of IETF standard routing protocols and, if
  appropriate, coordinated with the relevant working group(s).

- Routing protocols which are used in conjunction with MPLS might be
  based on distributed computation. As such, during routing
  transients, these protocols may compute forwarding paths which
  potentially contain loops. MPLS MUST provide protocol mechanisms to
  either prevent the formation of loops and/or contain the amount of
  (networking) resources that can be consumed due to the presence of
  loops.

- MPLS forwarding MUST allow "aggregate forwarding" of user data;
  i.e., allow streams to be forwarded as a unit and ensure that an
  identified stream takes a single path, where a stream may consist of
  the aggregate of multiple flows of user data. MPLS SHOULD provide
  multiple levels of aggregation support (e.g., from individual
  end-to-end application flows at one extreme, to aggregates of all
  flows passing through a specified switch or router at the other
  extreme).

- MPLS MUST support operations, administration, and maintenance
  facilities at least as extensive as those supported in current IP
  networks. Current network management and diagnostic tools SHOULD
  continue to work in order to provide some backward compatibility.
  Where such tools are broken by MPLS, hooks MUST be supplied to allow
  equivalent functionality to be created.

- MPLS core technologies MUST work with both unicast and multicast
  streams.

- The MPLS core specifications MUST clearly state how MPLS operates in
  a hierarchical network.

- Scalability issues MUST be considered and analyzed during the
  definition of MPLS. Very scalable solutions MUST be sought.

- MPLS core technologies MUST be capable of working with O(n) streams
  to switch all best-effort traffic, where n is the number of nodes in
  an MPLS domain. MPLS protocol standards MUST be capable of taking
  advantage of hardware that supports stream merging where
  appropriate. Note that O(n-squared) streams or VCs might also be
  appropriate for use in some cases.

- The core set of MPLS standards, along with existing Internet
  standards, MUST be a self-contained solution. For example, the
  proposed solution MUST NOT require specific hardware features that
  do not commonly exist on network equipment at the time that the
  standard is complete. However, the solution MAY make use of
  additional optional hardware features (e.g., to optimize
  performance).

- The MPLS protocol standards MUST support multipath routing and
  forwarding.

- MPLS MUST be compatible with the IETF Integrated Services Model,
  including RSVP [RFC1633][RSVP].

- It MUST be possible for MPLS switches to coexist with non-MPLS
  switches in the same switched network. MPLS switches SHOULD NOT
  impose additional configuration on non-MPLS switches.

- MPLS MUST allow "ships in the night" operation with existing layer 2
  switching protocols (e.g., ATM Forum signaling); i.e., MPLS must be
  capable of being used in a network which is also simultaneously
  operating standard layer 2 protocols.

- The MPLS protocol MUST support both topology-driven and
  traffic/request-driven label assignments.

1.3 Terminology

aggregate stream

   synonym of "stream"

DLCI

   a label used in Frame Relay networks to identify Frame Relay
   circuits

flow

   a single instance of an application-to-application flow of data (as
   in the RSVP and IFMP use of the term "flow")
forwarding equivalence class

   a group of L3 packets which are forwarded in the same manner (e.g.,
   over the same path, with the same forwarding treatment). A
   forwarding equivalence class is therefore the set of L3 packets
   which could safely be mapped to the same label. Note that there may
   be reasons that packets from a single forwarding equivalence class
   may be mapped to multiple labels (e.g., when stream merge is not
   used)

frame merge

   stream merge, when it is applied to operation over frame based
   media, so that the potential problem of cell interleave is not an
   issue

label

   a short, fixed length, physically contiguous, locally significant
   identifier which is used to identify a stream

label information base

   the database of information containing label bindings

label swap

   the basic forwarding operation, consisting of looking up an
   incoming label to determine the outgoing label, encapsulation,
   port, and other data handling information

label swapping

   a forwarding paradigm allowing streamlined forwarding of data by
   using labels to identify streams of data to be forwarded

label switched hop

   the hop between two MPLS nodes, on which forwarding is done using
   labels

label switched path

   the path created by the concatenation of one or more label switched
   hops, allowing a packet to be forwarded by swapping labels from one
   MPLS node to another MPLS node

layer 2

   the protocol layer under layer 3 (which therefore offers the
   services used by layer 3); forwarding, when done by the swapping of
   short fixed length labels, occurs at layer 2 regardless of whether
   the label being examined is an ATM VPI/VCI, a Frame Relay DLCI, or
   an MPLS label

layer 3

   the protocol layer at which IP and its associated routing protocols
   operate

link layer

   synonymous with layer 2

loop detection

   a method in which loops may be set up and data may be injected into
   a loop, but in which a mechanism is provided to detect and break
   such loops

loop prevention

   a method of dealing with loops in which data is never transmitted
   over a loop

label stack

   an ordered set of labels

loop survival

   a method of dealing with loops in which data may be transmitted
   over a loop, but means are employed to limit the amount of network
   resources which may be consumed by the looping data

label switching router

   an MPLS node which is capable of forwarding native L3 packets

merge point

   the node at which multiple streams and switched paths are combined
   into a single stream sent over a single path; in the case that the
   multiple paths are not combined prior to the egress node, the
   egress node becomes the merge point

Mlabel

   abbreviation for MPLS label

MPLS core standards

   the standards which describe the core MPLS technology

MPLS domain

   a contiguous set of nodes which operate MPLS routing and forwarding
   and which are also in one routing or administrative domain

MPLS edge node

   an MPLS node that connects an MPLS domain with a node which is
   outside of the domain, either because it does not run MPLS, and/or
   because it is in a different domain; note that if an LSR has a
   neighboring host which is not running MPLS, that LSR is an MPLS
   edge node

MPLS egress node

   an MPLS edge node in its role in handling traffic as it leaves an
   MPLS domain

MPLS ingress node

   an MPLS edge node in its role in handling traffic as it enters an
   MPLS domain

MPLS label

   a label placed in a short MPLS shim header, used to identify
   streams
MPLS node

   a node which is running MPLS. An MPLS node will be aware of MPLS
   control protocols, will operate one or more L3 routing protocols,
   and will be capable of forwarding packets based on labels; an MPLS
   node may optionally also be capable of forwarding native L3 packets

MultiProtocol Label Switching

   an IETF working group and the effort associated with the working
   group

network layer

   synonymous with layer 3

shortcut VC

   a VC set up as a result of an NHRP query and response

stack

   synonymous with label stack

stream

   an aggregate of one or more flows, treated as one flow for the
   purpose of forwarding in L2 and/or L3 nodes (e.g., may be described
   using a single label); in many cases a stream may be the aggregate
   of a very large number of flows. Synonymous with "aggregate stream"

stream merge

   the merging of several smaller streams into a larger stream, such
   that for some or all of the path the larger stream can be referred
   to using a single label

switched path

   synonymous with label switched path

virtual circuit

   a circuit used by a connection-oriented layer 2 technology such as
   ATM or Frame Relay, requiring the maintenance of state information
   in layer 2 switches

VC merge

   stream merge when it is applied to VCs, specifically so as to allow
   multiple VCs to merge into one single VC

VP merge

   stream merge when it is applied to VPs, specifically so as to allow
   multiple VPs to merge into one single VP. In this case the VCIs
   need to be unique; this allows cells from different sources to be
   distinguished via the VCI

VPI/VCI

   a label used in ATM networks to identify circuits

1.4 Acronyms and Abbreviations

   DLCI   Data Link Connection Identifier
   FEC    Forwarding Equivalence Class
   ISP    Internet Service Provider
   LIB    Label Information Base
   LDP    Label Distribution Protocol
   L2     Layer 2
   L3     Layer 3
   LSP    Label Switched Path
   LSR    Label Switching Router
   MPLS   MultiProtocol Label Switching
   MPT    Multipoint to Point Tree
   NHC    Next Hop (NHRP) Client
   NHS    Next Hop (NHRP) Server
   VC     Virtual Circuit
   VCI    Virtual Circuit Identifier
   VPI    Virtual Path Identifier

1.5 Motivation for MPLS

This section describes the expected and potential benefits of MPLS over existing schemes. Specifically, this section discusses the advantages of MPLS over previous methods for building core networks (i.e., networks for Internet service providers or for major corporate backbones). The potential advantages of MPLS in campus and local area networks are not discussed in this section.

There are currently two commonly used methods for building core IP networks: (i) networks of datagram routers, in which the core of the network is based on the datagram routers; (ii) networks of datagram routers operating over an ATM core. In order to describe the advantages of MPLS, it is necessary to know which alternative to MPLS is being used for the comparison. This section is therefore split into two parts: Section 1.5.1 describes the advantages of MPLS when compared to a pure datagram routed network, and Section 1.5.2 describes the advantages of MPLS when compared to an IP over ATM network.

This section does not provide a complete list of requirements for MPLS. For example, Multipoint to Point Trees (MPTs) are important for MPLS to scale. However, datagram forwarding naturally acts in this way (since multiple sources are merged automatically), and the ATM Forum is currently adding multipoint-to-point support to the ATM standards. The ability to do MPTs is therefore important to MPLS, but does not represent an advantage over either datagram routing or IP over ATM, and is therefore not mentioned in this section.
1.5.1 Benefits Relative to Use of a Router Core

1.5.1.1 Simplified Forwarding

Label swapping allows packet forwarding to be based on an exact match for a short label, rather than a longest match algorithm applied to a longer address as is required for normal datagram forwarding. In addition, the label headers used with MPLS are simpler than the headers typically used with datagram protocols such as IP. MPLS therefore allows a much simpler forwarding paradigm than datagram forwarding, which in turn makes it easier to build a high speed router using MPLS.

Whether this simpler forwarding operation will result in the availability of LSRs which can operate at higher speeds than datagram routers is controversial, and probably depends upon implementation details. There are some parts of the network, such as at hierarchical boundaries, where datagram IP forwarding at high speed will be required, so implementation of high speed routers remains highly desirable. In addition, there are currently multiple companies building high speed routers which will allow IP packets to be forwarded at very high speed. At speeds at least up to OC48, it appears that once the one-time engineering is completed, the per-unit cost associated with IP forwarding will be a small fraction of the overall equipment cost.

However, there are also many existing routers which can benefit from the simpler forwarding allowed by MPLS. In addition, there are some routers being built with implementations that will benefit from the simpler forwarding available with MPLS.

1.5.1.2 Efficient Explicit Routing

Explicit routing (also known as source routing) is a very powerful technique which can potentially be useful for a variety of purposes. However, with pure datagram routing the overhead of carrying a complete explicit route with each packet is prohibitive. MPLS, in contrast, allows the explicit route to be carried only at the time that the label switched path is set up, and not with each packet. MPLS therefore makes explicit routing practical, and in turn makes possible a number of advanced routing features which depend upon explicit routing.
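To make the overhead argument concrete, the following sketch compares the approximate cost of carrying a complete explicit route in every packet against carrying it only once at path setup. The hop list, byte counts, and function names are illustrative assumptions only, not protocol definitions.

   # Sketch: explicit routing overhead, per-packet versus setup-time.
   explicit_route = ["lsr_a", "lsr_b", "lsr_c", "lsr_d"]  # hop list

   # Pure datagram approach: every packet carries the full route.
   def datagram_overhead(num_packets, bytes_per_hop=4):
       return num_packets * len(explicit_route) * bytes_per_hop

   # MPLS approach: the route is signaled once when the label switched
   # path is set up; each packet then carries only a short label.
   def mpls_overhead(num_packets, label_bytes=4, bytes_per_hop=4):
       setup = len(explicit_route) * bytes_per_hop    # one-time cost
       return setup + num_packets * label_bytes

   print(datagram_overhead(10_000))   # 160000 bytes of route overhead
   print(mpls_overhead(10_000))       # 40016 bytes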
1.5.1.3 Traffic Engineering

Traffic engineering refers to the process of selecting the paths taken by data traffic in order to balance the traffic load on the various links, routers, and switches in the network. Traffic engineering is most important in networks where multiple parallel or alternate paths are available. The rapid growth of the Internet, and particularly the associated rapid growth in the demand for bandwidth, has tended to cause some core networks to become increasingly "branchy" in recent years, resulting in an increase in the importance of traffic engineering [TRAFENG].

It is common today, in networks that are running IP over an ATM core using PVCs, to manually configure the path of each PVC in order to equalize the traffic levels on different links in the network. Thus traffic engineering in IP over ATM networks is typically done today using manual configuration.

Traffic engineering is difficult to accomplish with datagram routing. Some degree of load balancing can be obtained by adjusting the metrics associated with network links. However, there is a limit to how much can be accomplished in this way; in networks with a large number of alternative paths between any two points, balancing the traffic levels on all links is difficult to achieve solely by adjustment of the metrics used with hop by hop datagram routing.

MPLS allows streams from any particular ingress node to any particular egress node to be individually identified. MPLS therefore provides a straightforward mechanism to measure the traffic associated with each ingress node to egress node pair. In addition, since MPLS allows efficient explicit routing of label switched paths, it is straightforward to ensure that any particular stream of data takes the preferred path.

The hard part of traffic engineering is selection of the method used to route each label switched path. There are a variety of possible ways to do this, ranging from manual configuration of routes, to use of a routing protocol which announces traffic loads in the network combined with background recomputation of paths.

1.5.1.4 QoS Routing

QoS routing refers to a method of routing in which the route for a particular stream is chosen in response to the QoS required for that stream. In many cases QoS routing needs to make use of explicit routing, for several reasons:

In some cases specific bandwidth is likely to be reserved for each of many specific streams of data. This implies that the total bandwidth of multiple streams may exceed the bandwidth available on any particular link, and thus not all streams, even between the same ingress and egress nodes, can take the same path. Instead, individual streams will need to be individually routed. This is somewhat analogous to traffic engineering, but might require separation of streams at a finer granularity. Thus explicit routing may be needed in order to allow each stream to be individually routed, and to eliminate the need for each switch along the path of a stream to compute the route for each stream.

Consider the case of routing a stream with a specific bandwidth requirement: the route chosen will depend upon the amount of bandwidth which is requested. For any one given bandwidth, it is straightforward to select a path. However, there are many different levels of bandwidth which could in principle be requested, which makes it impractical to precompute all possible paths for all possible bandwidths. If the path for a particular stream must be computed on demand, then it is undesirable to require every LSR on the path to compute the path. Instead, it is preferable to have the first node compute the path and specify the route to be followed through use of an explicit route.

For a variety of reasons the information available for QoS routing may in some cases be slightly out of date. This implies that an attempt to select a specific path for a QoS-sensitive stream may in some cases fail, due to a particular node or link not having the required resources available. In these cases it is not in general feasible to tell all other nodes in the network of the limited resource in one particular network element. If explicit routing is available, then the initial node of the stream (the ingress node in MPLS) can be informed that the indicated network element is not able to carry the stream, allowing an alternate path to be selected. However, in this case the node that selects the alternate path has to use explicit routing in order to force the stream to follow the alternate path.

These and similar examples imply that explicit routing is necessary in order to do an adequate job of QoS routing. Given that MPLS allows efficient explicit routing, it follows that MPLS also facilitates QoS routing.
1.5.1.5 Mappings from IP Packet to Forwarding Equivalence Class

MPLS allows the mapping from IP packet to forwarding equivalence class to be performed only once, at the ingress to an MPLS area. This facilitates complex mappings from IP packet to FEC that would otherwise be impractical.

For example, consider the case of provisioned QoS: some ISPs offer a service wherein specific customers subscribe to receive differentiated services (e.g., their packets may receive preferential forwarding treatment). Mapping of IP packets to the service level may require knowing the customer who is transmitting the packet, which may in turn require packet filtering based on source and destination address, incoming interface, and other characteristics. The sheer number of filters that are needed in a moderate sized ISP precludes repetition of the filters at every router throughout the network. Also, some information, such as the incoming interface, is not available except at the ingress node to the network. This implies that the preferred way to offer provisioned QoS is to map the packet at the ingress point to the preferred QoS level, and then label the packet in some way. MPLS offers an efficient method to label the QoS class associated with any particular packet.

Other examples of complex mappings from IP packet to FEC are also likely to be identified as MPLS is deployed.
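The following sketch illustrates, under assumed filter rules and label values, how such a mapping might be applied once at the ingress node; downstream nodes then forward on the label alone. It illustrates the idea only and does not specify any particular filter mechanism.

   # Sketch: mapping an IP packet to a forwarding equivalence class
   # (and hence a label) once, at the ingress node.  The filter rules,
   # addresses, and labels are hypothetical.
   import ipaddress

   filters = [
       # (source prefix, incoming interface) -> label for a premium FEC
       (("10.1.0.0/16", "if0"), 101),
       (("10.2.0.0/16", "if1"), 102),
   ]

   def classify(src_addr, interface, default_label=999):
       """Apply ingress filters; later nodes forward on the label only."""
       for (prefix, ifname), label in filters:
           if (ifname == interface and
               ipaddress.ip_address(src_addr) in
               ipaddress.ip_network(prefix)):
               return label
       return default_label                 # best-effort FEC

   print(classify("10.1.5.9", "if0"))       # -> 101 (premium customer)
   print(classify("10.9.9.9", "if0"))       # -> 999 (best effort)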
1.5.1.6 Partitioning of Functionality

Because MPLS supports different label granularities, it is possible to partition the processing functionality hierarchically among the different network elements, so that the heavier processing takes place at the edges of the network, near the customers, while the processing in the core network is as simple as possible, e.g., pure label based forwarding.

AS level aggregation will enable the building of fully switched backbone networks and traffic exchange points. It will also be possible for operators to fully switch the transit traffic traveling through the operator's network. Deaggregation will be needed for the streams that are destined for the networks connected to the MPLS domain, but this deaggregation only needs to perform the lookup operations associated with finding the label for the egress router or interface. For example, TOS information bound to a label at the source remains valid, and can be honored on the basis of the label on which the packet was received. Note that the receiving domain cannot in general reclassify the packet, since the original packet classification policy is not known to the receiving domain.

As one example of the improved functional partitioning, consider the case of the use of packet filters to map IP packets into a substantial number of queues, such that each queue receives differentiated services. For example, suppose that a network supports individual queuing for on the order of 100 different customers, with packets mapped to queues based on the source and destination IP address. In this case, with MPLS the packet filtering can be done solely at the edge of the network, with the packets mapped to labels such that each individual user receives separate labels. The filtering is thus performed only at the edge of the network, while still allowing complex mappings of IP packets to forwarding equivalence classes.

1.5.1.7 Single Forwarding Paradigm with Service Level Differentiation

MPLS can allow a single forwarding paradigm to be used to support multiple types of service on the same network.

Because of this common forwarding paradigm, it is possible to carry the different services through the same network elements, regardless of the control plane protocols used to populate the LSR's LIB. It is for example possible, in the case of an ATM based switching system, to support all the native ATM services, Frame Relay services, and labeled IP services simultaneously. The simultaneous support of multiple services may require partitioning of the label space between the services, and this partitioning shall be supported by the label distribution management protocol.

A non-exhaustive list of examples of services suitable for carrying over LSRs includes IP traffic, Frame Relay traffic, ATM traffic (in the case of cell switching), IP tunneling, VPNs, and other datagram protocols.

Note that MPLS does not necessarily use the same header format over all types of media. However, over any particular type of media a single header format (at least for the lowest level of the label stack) should be possible.

1.5.2 Benefits Relative to Use of an ATM or Frame Relay Core

Note: this section compares MPLS with other methods for interconnecting routers over a switched core network. We are not considering methods for interconnecting hosts located on virtual networks. For example, the ATM Forum LANE and MPOA standards support virtual networks. MPLS does not directly support virtual networks, and should not be compared directly with MPOA or LANE.

Previously available methods for interconnecting routers in an IP over ATM environment make use of either: (i) a full mesh 'n-squared' overlay of virtual circuits between n ATM-attached routers; (ii) a partial mesh of VCs between routers; or (iii) a partial mesh of VCs, plus the use of NHRP to facilitate on-demand cut-through SVCs.

1.5.2.1 Scaling of the Routing Protocol

Relative to the interconnection of IP over an ATM core, MPLS improves the scaling of routing due to the reduced number of peers and the elimination of the 'n-squared' logical links between routers used to operate the routing protocols.
Because all LSRs run standard routing protocols, the number of peers a router needs to communicate with is reduced to the number of LSRs and routers a given LSR is directly connected to, instead of having to peer with a large number of routers at the ends of switched L2 paths. This benefit is achieved because the edge LSRs do not need to peer with every other edge LSR in the domain, as is the case in a hybrid switch/router network.

1.5.2.2 Common Operation over Packet and Cell Media

MPLS makes use of common methods for routing and forwarding over packet and cell media, and potentially allows a common approach to traffic engineering, QoS routing, and other aspects of operation. For example, this means that the same method for label distribution can be used over Frame Relay and ATM media, as well as between LSRs using the MPLS shim header for forwarding over other media (such as PPP links and broadcast LANs).

Note: there may be some differences with respect to operation over different media. For example, if VP merge is used with ATM media (rather than VC merge), then the merge operation may be somewhat different than it would be with packet media or with ATM using VC merge.

1.5.2.3 Easier Management

The use of a common method for label distribution and common routing protocols over multiple types of media is expected to simplify network management of MPLS networks.

1.5.2.4 Elimination of the 'Routing over Large Clouds' Issue

MPLS eliminates the need to use NHRP and on-demand cut-through SVCs for operation over ATM. This eliminates the latency problem associated with cut-through SVCs.

2. Discussion of Core MPLS Components

2.1 The Basic Routing Approach

Routing is accomplished through the use of standard L3 routing protocols, such as OSPF and BGP [RFC1583][RFC1771]. The information maintained by the L3 routing protocols is then used to distribute labels to neighboring nodes; these labels are used in the forwarding of packets as described below. In the case of ATM networks, the labels that are distributed are VPI/VCIs, and a separate protocol (i.e., PNNI) is not necessary for the establishment of VCs for IP forwarding.

The topological scope of a routing protocol (i.e., the routing domain) and the scope of MPLS-capable label switching nodes may be different. For example, MPLS-knowledgeable and MPLS-ignorant nodes, all of which are OSPF routers, may be co-resident in an area. Where neighboring routers support MPLS, labels can be exchanged and used.

Neighboring MPLS routers may use configured PVCs or PVPs to tunnel through non-participating ATM or FR switches.

2.2 Labels

In addition to the single routing protocol approach discussed above, the other key concept in the basic MPLS approach is the use of short fixed length labels to simplify user data forwarding.

2.2.1 Label Semantics

It is important that the MPLS solutions are clear about what semantics (i.e., what knowledge of the state of the network) is implicit in the use of labels for forwarding user data packets or cells.

At the simplest level, a label may be thought of as nothing more than a shorthand for the packet header, used to index the forwarding decision that a router would make for the packet. In this context, the label is nothing more than a shorthand for an aggregate stream of user data.
This observation leads to one possible very simple interpretation: that the "meaning" of the label is a strictly local issue between two neighboring nodes. With this interpretation: (i) MPLS could be employed between any two neighboring nodes for forwarding of data between those nodes, even if no other nodes in the network participate in MPLS; (ii) when MPLS is used between more than two nodes, the operation between any two neighboring nodes could be interpreted as independent of the operation between any other pair of nodes. This approach has the advantage of semantic simplicity, and of being the closest to pure datagram forwarding. However, this approach (like pure datagram forwarding) has the disadvantage that when a packet is forwarded it is not known whether the packet is being forwarded into a loop, into a black hole, or towards links which have inadequate resources to handle the traffic flow. These disadvantages are unavoidable with pure datagram forwarding, but become design choices when label switching is being used.

There are cases where it would be desirable to have additional knowledge implicit in the existence of the label. For example, one approach to avoiding loops (see section 4.3) involves signaling the label distribution along a path before packets are forwarded on that path. With this approach, the fact that a node has a label to use for a particular IP packet would imply the knowledge that following the label (including label swapping at subsequent nodes) leads to a non-looping path which makes progress towards the destination (something which is usually, but not necessarily always, true when using pure datagram routing). This would of course require some sort of label distribution/setup protocol which signals along the path being set up before the labels are available for packet forwarding. However, there are also other consequences to having additional semantics associated with the label: specifically, procedures are needed to ensure that the semantics are correct. For example, if the fact that you have a label for a particular destination implies that there is a loop-free path, then when the path changes some procedures are required to ensure that it is still loop free. Another example of semantics which could be implicit in a label is the identity of the higher level protocol type which is encoded using that label value.

In either case, the specific value of a label to use for a stream is strictly a local issue; however, the decision about whether to use the label may be based on some global (or at least wider scope) knowledge that, for example, the label switched path is loop-free and/or has the appropriate resources.

A similar example occurs in ATM networks: with standard ATM, a signaling protocol is used which both reserves resources in switches along the path, and ensures that the path is loop-free and terminates at the correct node. Thus, implicit in the fact that an ATM node has a VPI/VCI for forwarding a particular piece of data is the knowledge that the path has been set up successfully.
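The following sketch illustrates, in toy form, the kind of setup-time signaling discussed above: a node binds and advertises a label only after the downstream portion of the path has been confirmed loop-free. The topology, the path-vector style check, and all names are assumptions made purely for exposition.

   # Sketch: setup-time signaling with a loop check.  A label would be
   # installed only after the downstream path is confirmed loop-free.
   next_hop = {"a": "b", "b": "c", "c": "egress"}   # toy routed path

   def setup_path(node, path_so_far):
       """Signal hop by hop; fail if the path would revisit a node."""
       if node in path_so_far:
           return None                          # loop detected: refuse
       if node == "egress":
           return ["egress"]                    # path terminates here
       downstream = setup_path(next_hop[node], path_so_far + [node])
       if downstream is None:
           return None                          # propagate the failure
       # Only now would this node bind and advertise a label upstream.
       return [node] + downstream

   print(setup_path("a", []))   # -> ['a', 'b', 'c', 'egress']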
Another similar example occurs with multipoint-to-point trees over ATM (see section 4.2 below), where the multipoint-to-point tree uses a VP, and cell interleave at merge points in the tree is handled by giving each source on the tree a distinct VCI within the VP. In this case, the fact that each source has a known VPI/VCI to use needs to (implicitly or explicitly) imply the knowledge that the VCI assigned to that source is unique within the context of the VP.

In general, labels are used to optimize how the system works, not to control how the system works. For example, the routing protocol determines the path that a packet follows. The presence or absence of a label assignment should not affect the path of an L3 packet. Note however that the use of labels may make capabilities such as explicit routes, load sharing, and multipath more efficient.

2.2.2 Label Granularity

Labels are used to create a simple forwarding paradigm. The essential element in assigning a label is that the device which will be using the label to forward packets will be forwarding all packets with the same label in the same way. If the packet is to be forwarded solely by looking at the label, then at a minimum, all packets with the same incoming label should be forwarded out the same port(s) with the same encapsulation(s), and with the same next hop label if any (although the special cases of multipath and load sharing may be an exception to this rule).

The term "forwarding equivalence class" is used to refer to a set of L3 packets which are all forwarded in the same manner by a particular LSR (for example, the IP packets in a forwarding equivalence class may be destined for the same egress from an MPLS network, and may be associated with the same QoS class). A forwarding equivalence class is therefore the set of L3 packets which could safely be mapped to the same label. Note that there may be reasons that packets from a single forwarding equivalence class may be mapped to multiple labels (e.g., when stream merge is not used).

Note that the label could also mean "ignore this label and forward based on what is contained within," where within one might find a label (if a stack of labels is used) or a layer 3 packet.

For IP unicast traffic, the granularity of a label allows various levels of aggregation in a Label Information Base (LIB). At one end of the spectrum, a label could represent a host route (i.e., the full 32 bits of IP address). If a router forwards an entire CIDR prefix in the same way, it may choose to use one label to represent that prefix. Similarly, if the router is forwarding several (otherwise unrelated) CIDR prefixes in the same way, it may choose to use the same label for this set of prefixes. For instance, all CIDR prefixes which share the same BGP Next Hop could be assigned the same label. Taking this to the limit, an egress router may choose to advertise all of its prefixes with the same label.

By introducing the concept of an egress identifier, the distribution of labels associated with groups of CIDR prefixes can be simplified. For instance, an egress identifier might specify the BGP Next Hop, with all prefixes routed to that next hop receiving the label associated with that egress identifier.
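The following sketch shows the egress identifier idea under assumed prefixes and label values: unrelated CIDR prefixes that share a BGP Next Hop are bound to a single label. All table contents are hypothetical.

   # Sketch: aggregating many CIDR prefixes onto one label via an
   # egress identifier (here, the BGP Next Hop).
   bgp_next_hop = {
       "192.0.2.0/24":    "egress_1",
       "198.51.100.0/24": "egress_1",
       "203.0.113.0/24":  "egress_2",
   }
   label_for_egress = {"egress_1": 300, "egress_2": 301}

   def label_for_prefix(prefix):
       """All prefixes sharing an egress share one label."""
       return label_for_egress[bgp_next_hop[prefix]]

   # Two unrelated prefixes map to the same label:
   print(label_for_prefix("192.0.2.0/24"))      # -> 300
   print(label_for_prefix("198.51.100.0/24"))   # -> 300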
Another natural place to aggregate would be the MPLS egress router. This would work particularly well in conjunction with a link-state routing protocol, where the association between egress router and CIDR prefix is already distributed throughout an area.

For IP multicast, the natural binding of a label would be to a multicast tree, or rather to the branch of a tree which extends from a particular port. Thus for a shared tree, the label corresponds to the multicast group, (*,G). For (S,G) state, the label would correspond to the source address and the multicast group.

A label can also have a granularity finer than a host route. That is, it could be associated with some combination of source and destination address or other information within the packet. This might for example be done on an administrative basis to aid in effecting policy. A label could also correspond to all packets which match a particular Integrated Services filter specification.

Labels can also represent explicit routes. This use is semantically equivalent to using an IP tunnel with a complete explicit route. This is discussed in more detail in section 4.10.

2.2.2.1 Examples of Unicast Traffic Granularities

- PQ (Port Quadruples): same IP source address prefix, destination
  address prefix, TTL, IP protocol, and TCP/UDP source/destination
  ports

- PQT (Port Quadruples with TOS): same IP source address prefix,
  destination address prefix, TTL, IP protocol, and TCP/UDP
  source/destination ports, and same IP header TOS field (including
  Precedence and TOS bits)

- HP (Host Pairs): same specific IP source and destination address
  (32 bit)

- NP (Network Pairs): same IP source and destination address prefixes
  (variable length)

- DN (Destination Network): same IP destination network address
  prefix (variable length)

- ER (Egress Router): same egress router ID (e.g., OSPF)

- NAS (Next-hop AS): same next-hop AS number (BGP)

- DAS (Destination AS): same destination AS number (BGP)

2.2.2.2 Examples of Multicast Traffic Granularities

- SST (Source Specific Tree): same source address and multicast group

- SMT (Shared Multicast Tree): same multicast group address

2.2.3 Label Assignment

Essential to label switching is the notion of binding between a label and network layer routing (routes). A control component is responsible for creating label bindings, and then distributing the label binding information among label switches. Label assignment involves allocating a label, and then binding that label to a route.

Label assignment can be driven by control traffic or by data traffic. This is discussed in more detail in section 3.4.

Control traffic driven label assignment has several advantages compared to data traffic driven label assignment. First, it minimizes the amount of additional control traffic needed to distribute label binding information, as label binding information is distributed only in response to control traffic, independent of data traffic. It also makes the overall scheme independent of, and insensitive to, the data traffic profile or pattern. Control traffic driven creation of label bindings improves forwarding latency, as labels are assigned before data traffic arrives, rather than being assigned as data traffic arrives. It also simplifies the overall system behavior, as the control plane is driven solely by control traffic, rather than by a mix of control and data traffic.
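The following sketch shows the shape of control traffic driven assignment under assumed event and function names: a binding is created when a route is learned, before any data packet arrives, and withdrawn when the route goes away.

   # Sketch: control-traffic-driven label assignment.  Event names and
   # structures are illustrative only.
   from itertools import count

   _labels = count(100)          # simple allocator for unused labels
   lib = {}                      # prefix -> locally assigned label

   def advertise_binding(prefix, label):
       print("bind", prefix, "->", label)   # stand-in for distribution

   def on_route_learned(prefix):
       """Bind a label when the routing protocol installs a route."""
       label = next(_labels)
       lib[prefix] = label
       advertise_binding(prefix, label)     # tell neighbors

   def on_route_withdrawn(prefix):
       """Withdraw the binding when its route goes away."""
       lib.pop(prefix, None)                # label can be reused later

   on_route_learned("192.0.2.0/24")         # bind 192.0.2.0/24 -> 100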
There are, however, situations where data traffic driven label assignment is necessary. A particular case may occur with ATM without VP or VC merge. In this case, setting up a full mesh of VCs would require O(n-squared) VCs, which may be infeasible in very large networks. Instead, VCs may be set up where required for forwarding data traffic. In this case it is generally not possible to know a priori how many such streams may occur.

Label withdrawal is required with both control-driven and data-driven label assignment. Label withdrawal is primarily a matter of garbage collection, that is, collecting up unused labels so that they may be reassigned. Generally speaking, a label should be withdrawn when the conditions that allowed it to be assigned are no longer true. For example, if a label is imbued with extra semantics such as loop-free-ness, then the label must be withdrawn when those extra semantics cease to hold.

In certain cases, notably multicast, it may be necessary to share a label space between multiple entities. If these sharing arrangements are altered by the coming and going of neighbors, then labels which are no longer controlled by an entity must be withdrawn and a new label assigned.

2.2.4 Label Stack and Forwarding Operations

The basic forwarding operation consists of looking up the incoming label to determine the outgoing label, encapsulation, port, and any additional information which may pertain to the stream, such as a particular queue or other QoS related treatment. We refer to this operation as a label swap.

When a packet first enters an MPLS domain, the packet is forwarded by normal layer 3 forwarding operations, with the exception that the outgoing encapsulation will now include a label. We refer to this operation as a label push. When a packet leaves an MPLS domain, the label is removed. We refer to this as a label pop.

In some situations, carrying a stack of labels is useful. For instance, both an IGP and a BGP label could be used to allow routers in the interior of an AS to be free of BGP information. In this scenario, the "IGP" label is used to steer the packet through the AS and the "BGP" label is used to switch between ASes.

With a label stack, the set of label operations remains the same, except that at some points one might push or pop multiple labels, or pop and swap, or swap and push.
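The following sketch shows the stack operations for the IGP/BGP example above, using plain lists as stacks. The label values are hypothetical.

   # Sketch: label stack operations (push, swap, pop) on a list.
   def push(stack, label):
       return [label] + stack

   def pop(stack):
       return stack[0], stack[1:]

   def swap(stack, new_top):
       return [new_top] + stack[1:]

   # Ingress pushes a BGP label, then an IGP label on top:
   stack = push(push([], 500), 42)          # [42, 500]
   # Interior routers of the AS swap only the IGP (top) label:
   stack = swap(stack, 43)                  # [43, 500]
   # At the AS exit the IGP label is popped, exposing the BGP label:
   _, stack = pop(stack)                    # [500]
   print(stack)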
2.3 Encapsulation

Label-based forwarding makes use of various pieces of information, including a label or stack of labels, and possibly additional information such as a TTL field [ENCAP]. In some cases this information may be encoded using an MPLS header; in other cases this information may be encoded in L2 headers. Note that there may be multiple types of MPLS headers. For example, the header used over one media type may be different than the header used over a different media type. Similarly, in some cases the information that MPLS makes use of may be encoded in an ATM header. We will use the term "MPLS encapsulation" to refer to whatever form is used to encapsulate the label information and other information used for label based forwarding. The term "MPLS header" will be used where this information is carried in some sort of MPLS-specific header (i.e., when the MPLS information cannot all be carried in an L2 header). Whether there is one form of MPLS header, or multiple forms, is also outside the scope of this document.

The exact contents of the MPLS encapsulation are outside the scope of this document. Some fields, such as the label, are obviously needed. Some others might or might not be standardized, based on further study. An encapsulation scheme may make use of the following fields:

- label
- TTL
- class of service
- stack indicator
- next header type indicator
- checksum

It is desirable to have a very short encapsulation header. For example, a four byte encapsulation header adds to the convenience of building a hardware implementation that forwards based on the encapsulation header. At the same time, it is tricky to assign such a limited number of bits to carry the above listed information in an MPLS header. Hence careful consideration must be given to the information chosen for an MPLS header.

A TTL value in the MPLS header may be useful in the same manner as it is in IP. Specifically, TTL may be used to terminate packets caught in a routing loop, and for other related uses such as traceroute. The TTL mechanism is a simple and proven method of handling such events. Another use of TTL is to expire packets in a network by limiting their "time to live" and eliminating stale packets that may cause problems for some of the higher layer protocols. When used over link layers which do not provide a TTL field, alternate mechanisms will be needed to replace the uses of the TTL field.

A provision for a class of service (COS) field in the MPLS header allows multiple service classes within the same label. However, when more sophisticated QoS is associated with a label, the COS may not have any significance. Alternatively, the COS (like QoS) can be left out of the header and instead propagated with the label assignment, but this requires that a separate label be assigned to each required class of service. Nevertheless, the COS mechanism provides a simple method of segregating flows within a label.

As previously mentioned, the encapsulation header can be used to derive the benefits of tunneling (or stacking).

The MPLS header must provide a way to indicate that multiple MPLS headers are stacked (i.e., the "stack indicator"). For this purpose a single bit in the MPLS header will suffice. In addition, there are also some benefits to indicating the type of the protocol header following the MPLS header (i.e., the "next header type indicator"). One option would be to combine the stack indicator and next header type indicator into a single value (i.e., the next header type indicator could be allowed to take the value "MPLS header"). Another option is to have the next header type indicator be implicit in the label value (such that this information would be propagated along with the label).

There is no compelling reason to support a checksum field in the MPLS header. A CRC mechanism at the L2 layer should be sufficient to ensure the integrity of the MPLS header.
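Although the exact contents of the MPLS header are outside the scope of this document, the following sketch shows one possible way the fields listed above might be packed into a four byte header (a 20-bit label, 3-bit COS, 1-bit stack indicator, and 8-bit TTL). This particular layout is a hypothetical example chosen for illustration, not a specification.

   # Sketch: one possible packing of a four byte MPLS header.
   def pack_header(label, cos, bottom_of_stack, ttl):
       """20-bit label | 3-bit COS | 1-bit stack indicator | 8-bit TTL."""
       assert 0 <= label < 2**20 and 0 <= cos < 8 and 0 <= ttl < 256
       word = (label << 12) | (cos << 9) | (int(bottom_of_stack) << 8) | ttl
       return word.to_bytes(4, "big")

   def unpack_header(data):
       word = int.from_bytes(data, "big")
       return (word >> 12, (word >> 9) & 0x7, (word >> 8) & 0x1,
               word & 0xFF)

   hdr = pack_header(label=42, cos=3, bottom_of_stack=True, ttl=64)
   print(unpack_header(hdr))    # -> (42, 3, 1, 64)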
3. Observations, Issues and Assumptions

3.1 Layer 2 versus Layer 3 Forwarding

MPLS uses L2 forwarding as a way to provide simple and fast packet forwarding capability. One primary reason for the simplicity of L2 forwarding comes from its short, fixed length labels. A node forwarding at L3 must parse a (relatively) large header, and perform a longest-prefix match to determine a forwarding path. However, when a node performs L2 label swapping, and labels are assigned properly, it can do a direct index lookup into its forwarding (or in this case, label swapping) table with the short header. It is arguably simpler to build label swapping hardware than it is to build L3 forwarding hardware because the label swapping function is less complex.

The relative performance of L2 and L3 forwarding may differ considerably between nodes. Some nodes may show an order of magnitude difference. Other nodes (for example, nodes with more extensive L3 forwarding hardware) may have identical performance at L2 and L3. However, some nodes may not be capable of doing L3 forwarding at all (e.g., ATM), or may have such limited capacity as to be unusable at L3. In this situation, traffic must be blackholed if no switched path exists.

On nodes in which L3 forwarding is slower than L2 forwarding, pushing traffic to L3 when no L2 path is available may cause congestion. In some cases this could cause data loss (since L3 may be unable to keep up with the increased traffic). However, if data is discarded, then in general this will cause TCP to back off, which would allow control traffic, traceroute, and other network management tools to continue to work.

The MPLS protocol MUST NOT make assumptions about the forwarding capabilities of an MPLS node. Thus, MPLS must propose solutions that can leverage the benefits of a node that is capable of L3 forwarding, but must not mandate that the node be capable of such.

Why We Will Still Need L3 Forwarding:

MPLS will not, and is not intended to, replace L3 forwarding. There is absolutely a need for some systems to continue to forward IP packets using normal layer 3 IP forwarding. L3 forwarding will be needed for a variety of reasons, including:

- for scaling, to forward at a finer granularity than the labels can
  provide
- for security, to allow packet filtering at firewalls
- for forwarding at the initial router (when hosts don't do MPLS)

Consider a campus network which is serving a small company. Suppose that this company makes use of the Internet, for example as a method of communicating with customers. A customer on the other side of the world has an IP packet to be forwarded to a particular system within the company. It is not reasonable to expect that the customer will have a label to use to forward the packet to that specific system. Rather, the label used for the "first hop" forwarding might be sufficient to get the packet considerably closer to the destination. However, the granularity of the labels cannot extend to every host worldwide. Similarly, routing used within one routing domain cannot know about every host worldwide. This implies that in many cases the labels assigned to a particular packet will be sufficient to get the packet close to the destination, but that at some points along the path of the packet the IP header will need to be examined to determine a finer granularity for forwarding that packet. This is particularly likely to occur at domain boundaries.

A similar point occurs at the last router prior to the destination host. In general, the number of hosts attached to a network is likely to be great enough that it is not feasible to assign a separate label to every host. Rather, at least for routing within the destination routing domain (or the destination area if there is a hierarchical routing protocol in use), a label may be assigned which is sufficient to get the packet to the last hop router. However, the last hop router will need to examine the IP header (and particularly the destination IP address) in order to forward the packet to the correct destination host.

Packet filtering at firewalls is an important part of the operation of the Internet. While the current state of Internet security may be considerably less advanced than may be desired, nonetheless some security (as is provided by firewalls) is much better than no security. We expect that packet filtering will continue to be important for the foreseeable future. Packet filtering requires examination of the contents of the packet, including the IP header. This implies that at firewalls the packet cannot be forwarded simply by considering the label associated with the packet. Note that this is also likely to occur at domain boundaries.

Finally, it is very likely that many hosts will not implement MPLS. Rather, the host will simply forward an IP packet to its first hop router. This first hop router will need to examine the IP header prior to forwarding the packet (with or without a label).
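The following sketch contrasts the two lookups described at the start of this section: a longest-prefix match over an L3 table versus a direct indexed lookup on a short label. The tables and addresses are illustrative only; real routers use specialized data structures and hardware for the L3 case.

   # Sketch: L3 longest-prefix match versus L2 direct label index.
   import ipaddress

   routes = {                       # prefix -> next hop (L3 table)
       "10.0.0.0/8":  "r1",
       "10.1.0.0/16": "r2",
   }

   def l3_lookup(dst):
       """Longest-prefix match: scan all prefixes, keep the longest.
       The cost grows with the table, unlike the label lookup below."""
       best, best_len = None, -1
       for prefix, nh in routes.items():
           net = ipaddress.ip_network(prefix)
           if (ipaddress.ip_address(dst) in net
                   and net.prefixlen > best_len):
               best, best_len = nh, net.prefixlen
       return best

   label_table = [None] * 1024      # direct-index label table (L2)
   label_table[42] = ("out_label", "port3")

   def l2_lookup(label):
       return label_table[label]    # one indexed access, no search

   print(l3_lookup("10.1.2.3"))     # -> 'r2'
   print(l2_lookup(42))             # -> ('out_label', 'port3')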
In general, the number of hosts attached to a network is likely to be great enough that it is not feasible to assign a separate label to every host. Rather, at least for routing within the destination routing domain (or the destination area if there is a hierarchical routing protocol in use), a label may be assigned which is sufficient to get the packet to the last hop router. However, the last hop router will need to examine the IP header (and particularly the destination IP address) in order to forward the packet to the correct destination host.

Packet filtering at firewalls is an important part of the operation of the Internet. While the current state of Internet security may be considerably less advanced than may be desired, some security (as is provided by firewalls) is much better than no security. We expect that packet filtering will continue to be important for the foreseeable future. Packet filtering requires examination of the contents of the packet, including the IP header. This implies that at firewalls the packet cannot be forwarded simply by considering the label associated with the packet. Note that this is also likely to occur at domain boundaries.

Finally, it is very likely that many hosts will not implement MPLS. Rather, the host will simply forward an IP packet to its first hop router. This first hop router will need to examine the IP header prior to forwarding the packet (with or without a label).

3.2 Scaling Issues

MPLS scalability is provided by two of the principles of routing. The first is that forwarding follows an inverted tree rooted at a destination. The second is that the number of destinations is reduced by routing aggregation.

The very nature of IP forwarding is a merged multipoint-to-point tree. Thus, since MPLS mirrors the IP network layer, an MPLS node that is capable of merging is capable of creating O(n) switched paths which provide network reachability to all "n" destinations. The meaning of "n" depends on the granularity of the switched paths. One obvious choice of "n" is the number of CIDR prefixes existing in the forwarding table (this scales the same as today's routing). However, the value of "n" may be reduced considerably by choosing switched paths of further aggregation. For example, by creating switched paths to each possible egress node, "n" may represent the number of egress nodes in a network. This choice creates "n" switched paths, such that each path is shared by all CIDR prefixes that are routed through the same egress node. This selection greatly improves scalability, since it minimizes "n", while maintaining the switching performance of CIDR aggregation. (See section 2.2.2 for a description of all of the levels of granularity provided by MPLS.)

The MPLS technology must scale at least as well as existing technology. For example, if the MPLS technology were to support ONLY host-to-host switched path connectivity, then the number of switched paths would be much higher than the number of routing table entries.

There are several ways in which merging can be done in order to allow O(n) switched paths to connect n nodes.
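To make the difference in scale concrete, the toy computation below (illustrative numbers only) compares one merged multipoint-to-point tree per egress against a full mesh of point-to-point paths:

<CODE BEGINS>
   # Switched-path counts for a network with n edge nodes.
   def paths_with_merging(n):
       return n              # one egress-rooted tree per egress node

   def paths_full_mesh(n):
       return n * (n - 1)    # one path per ordered ingress/egress pair

   for n in (10, 100, 1000):
       print(n, paths_with_merging(n), paths_full_mesh(n))
   # With 1000 edge nodes: 1000 trees versus 999000 paths.
<CODE ENDS>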
The merging approach used has an impact on the amount of state information, buffering, delay characteristics, and the means of control required to coordinate the trees. These issues are discussed in more detail in section 4.2.

There are some cases in which O(n-squared) switched paths may be used (for example, by setting up a full mesh of point-to-point streams). As label space and the amount of state information that can be supported may be limited, it will not be possible to support O(n-squared) switched paths in very large networks. However, in some cases the use of n-squared paths may even be an advantage (for example, to allow load-splitting of individual streams).

MPLS must be designed to scale as O(n), which allows MPLS domains to grow very large. In addition, if best effort service can be supported with O(n) scaling, this conserves resources (such as label space and state information) which can be used for supporting advanced services such as QoS. However, since some switches may not support merging, and some small networks may not require the scaling benefits of O(n), provisions must also be made for a non-merging, O(n-squared) solution.

Note: A precise and complete description of scaling would consider that there are multiple dimensions of scaling, and multiple resources whose usage may be considered. Possible dimensions of scaling include: (i) the total number of streams which exist in an MPLS domain (with associated labels assigned to them); (ii) the total number of "label swapping pairs" which may be stored in the nodes of the network (i.e., entries of the form "for incoming label 'x', use outgoing label 'y'"); (iii) the number of labels which need to be assigned for use over a particular link; (iv) the amount of state information which needs to be maintained by any one node. We do not intend to perform a complete analysis of all possible scaling issues, and understand that our use of the terms "O(n)" and "O(n-squared)" is approximate only.

3.3 Types of Streams

Switched paths in the MPLS network can be of different types:

   - point-to-point
   - multipoint-to-point
   - point-to-multipoint
   - multipoint-to-multipoint

Two of the factors that determine which type of switched path is used are (i) the capability of the switches employed in a network, and (ii) the purpose of the creation of a switched path, that is, the types of flows to be carried in the switched path. These two factors also determine the scalability of a network in terms of the number of switched paths in use for transporting data through a network.

The point-to-point switched path can be used to connect all ingress nodes to all the egress nodes to carry unicast traffic. In this case, since an ingress node has point-to-point connections to all the egress nodes, the number of connections in use for transporting traffic is O(n-squared), where n is the number of edge MPLS devices. For small networks the full mesh connection approach may suffice and not pose any scalability problems. However, in large enterprise backbone or ISP networks, this will not scale well.

Point-to-point switched paths may be used on a host-to-host or application-to-application basis (e.g., a switched path per RSVP flow).
The dedicated point-to-point switched path transports the unicast data from the ingress to the egress node of the MPLS network. This approach may be used for providing QoS services or for best-effort traffic.

A multipoint-to-point switched path connects all ingress nodes to a single egress node. At a given intermediate node in the multipoint-to-point switched path, L2 data units from several upstream links are "merged" into a single label on a downstream link. Since each egress node is reachable via a single multipoint-to-point switched path, the number of switched paths required to transport best-effort traffic through an MPLS network is O(n), where n is the number of egress nodes.

The point-to-multipoint switched path is used for distributing multicast traffic. This switched path tree mirrors the multicast distribution tree as determined by the multicast routing protocols. Typically a switch capable of point-to-multipoint connection replicates an L2 data unit from the incoming (parent) interface to all the outgoing (child) interfaces. Standard ATM switches support such functionality in the form of point-to-multipoint VCs or VPs.

A multipoint-to-multipoint switched path may be used to combine multicast traffic from multiple sources into a single multicast distribution tree. The advantage of this is that the multipoint-to-multipoint switched path is shared by multiple sources. Conceptually, a form of multipoint-to-multipoint can be thought of as follows: Suppose that you have a point-to-multipoint VC from each node to all other nodes. Suppose that at any point where two or more VCs happen to merge, you merge them into a single VC or VP. This would require either coordination of VCI spaces (so that each source has a unique VCI within a VP) or VC merge capabilities. The applicability of similar concepts to MPLS is for further study.

3.4 Data Driven versus Control Traffic Driven Label Assignment

A fundamental concept in MPLS is the association of labels and network layer routing. Each LSR must assign labels, and distribute them to its forwarding peers, for traffic which it intends to forward by label swapping. In the various contributions that have been made so far to the MPLS WG we identify three broad strategies for label assignment: (i) those driven by topology based control traffic [RFC2105][ARIS][IPNAV]; (ii) those driven by request based control traffic [CR-LDP][RSVP-LSP]; and (iii) those driven by data traffic [RFC2098][RFC1953].

We also note that in actual practice combinations of these methods may be employed. One example is the use of topology based methods for best effort traffic plus request based methods for support of RSVP.

3.4.1 Topology Driven Label Assignment

In this scheme labels are assigned in response to normal processing of routing protocol control traffic. Examples of such control protocols are OSPF and BGP. As an LSR processes OSPF or BGP updates it can, as it makes or changes entries in its forwarding tables, assign labels to those entries.

Among the properties of this scheme are:

   - The computational load of assignment and distribution and
     the bandwidth consumed by label distribution are bounded by
     the size of the network.

   - Labels are in the general case preassigned.
If a route exists then a label has been assigned to it (and distributed). Traffic may be label swapped as soon as it arrives; there is no label setup latency at forwarding time.

   - Requires LSRs to process only the control traffic load.

   - Labels assigned in response to the operation of routing
     protocols can have a granularity equivalent to that of the
     routes advertised by the protocol. Labels can, by this
     means, cover (highly) aggregated routes.

3.4.2 Request Driven Label Assignment

In this scheme labels are assigned in response to normal processing of request based control traffic. An example of such a control protocol is RSVP. As an LSR processes RSVP messages it can, as it makes or changes entries in its forwarding tables, assign labels to those entries.

Among the properties of this scheme are:

   - The computational load of assignment and distribution and
     the bandwidth consumed by label distribution are bounded by
     the amount of control traffic in the system.

   - Labels are in the general case preassigned: if the request
     has been processed, then a label has been assigned (and
     distributed). Traffic may be label swapped as soon as it
     arrives; there is no label setup latency at forwarding time.

   - Requires LSRs to process only the control traffic load.

   - Depending upon the number of flows supported, this approach
     may require a larger number of labels to be assigned
     compared with topology driven assignment.

   - This approach requires applications to make use of a request
     paradigm in order to get a label assigned to their flow.

3.4.3 Traffic Driven Label Assignment

In this scheme the arrival of data at an LSR "triggers" label assignment and distribution. The traffic driven approach has the following characteristics (a sketch of the trigger logic follows this list):

   - Label assignment and distribution costs are a function of
     traffic patterns. In an LSR with limited label space that is
     using a traffic driven approach to amortize its labels over
     a larger number of flows, the overhead due to label
     assignment and distribution grows as a function of the
     number of flows and of their "persistence". Short lived but
     recurring flows may impose a heavy control burden.

   - There is a latency associated with the appearance of a
     "flow" and the assignment of a label to it. The documented
     approaches to this problem suggest L3 forwarding during this
     setup phase; this has the potential to cause packet
     reordering (note that packet reordering may occur with any
     scheme when the network topology changes, but traffic driven
     label assignment introduces another cause for reordering).

   - Traffic driven label assignment requires high performance
     packet classification capabilities.

   - Traffic driven label assignment may be useful to reduce
     label consumption (assuming that flows are not close to full
     mesh).

   - If flows to individual hosts are wanted, then due to limits
     on label space, traffic driven label assignment is probably
     necessary, given the large number of hosts which may occur
     in a network.

   - If specific network resources are to be assigned to specific
     labels, to be used for support of application flows, then
     again the fine granularity associated with such labels may
     require traffic driven label assignment.
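The following minimal sketch shows the trigger logic described above; the flow key, the helper names, and the use of a pending marker are illustrative assumptions of the sketch, not part of any proposed protocol:

<CODE BEGINS>
   PENDING = object()
   flow_table = {}                     # flow key -> PENDING or label

   def request_label_binding(key):     # stand-in for an LDP-style request
       print("requesting label for", key)

   def forward_at_l3(packet):          # normal longest-prefix-match path
       print("L3 forward:", packet)

   def label_swap_and_forward(packet, label):
       print("L2 forward:", packet, "with label", label)

   def forward(packet, src, dst):
       key = (src, dst)                # packet classification step
       label = flow_table.get(key)
       if label is None:               # first packet: trigger assignment
           flow_table[key] = PENDING
           request_label_binding(key)  # asynchronous; reply arrives later
           forward_at_l3(packet)
       elif label is PENDING:          # binding not yet distributed:
           forward_at_l3(packet)       # L3 during setup (may reorder)
       else:
           label_swap_and_forward(packet, label)

   def on_label_binding(key, label):   # called when the binding arrives
       flow_table[key] = label
<CODE ENDS>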
3.5 The Need for Dealing with Looping

Routing protocols which are used in conjunction with MPLS will in many cases be based on distributed computation. As such, during routing transients, these protocols may compute forwarding paths which contain loops. For this reason MPLS will be designed with mechanisms either to prevent the formation of loops or to contain the amount of resources that can be consumed due to the presence of loops (or both).

Note that there are a number of different alternative mechanisms which have been proposed (see section 4.3). Some of these prevent the formation of layer 2 forwarding loops; others allow loops to form but minimize their impact in one way or another (e.g., by discarding packets which loop, or by detecting and closing the loop after a period of time). Generally speaking, there are tradeoffs to be made between the amount of looping which might occur and other considerations such as the time to convergence after a change in the paths computed by the routing algorithm.

We are not proposing any changes to normal layer 3 operation, and specifically are not trying to eliminate the possibility of looping at layer 3. Transient loops will continue to be possible in IP networks. Note that IP has a means to limit the damage done by looping packets, based on decrementing the IP TTL field as the packet is forwarded, and discarding packets whose TTL has expired. Dynamic routing protocols used with IP are also designed to minimize the amount of time during which loops exist.

The question that MPLS has to deal with is what to do at L2. In some cases L2 may make use of the same method that is used at L3. However, other options are available at L2, and in some cases (specifically when operating over ATM or Frame Relay hardware) the method of decrementing a TTL field (or any similar field) is not available.

There are basically two problems caused by packet looping: The most obvious problem is that packets are not delivered to the correct destination. The other result of looping is congestion. Even with TTL decrementing and packet discard, there may still be a significant amount of time that packets travel through a loop. This can adversely affect other packets which are not looping: Congestion due to the looping packets can cause non-looping packets to be delayed and/or discarded.

Looping is particularly serious in (at least) three cases: One is when forwarding over ATM. Since ATM does not have a TTL field to decrement, there is no way to discard ATM cells which are looping over ATM subnetworks. Standard ATM PNNI routing and signaling solves this problem by making use of call setup procedures which ensure that ATM VCs will never be set up in a loop [PNNI]. However, when MPLS is used over ATM subnets, the native ATM routing and signaling procedures may not be used for the full L2 path. This leads to the possibility that MPLS over ATM might in principle allow packets to loop indefinitely, or until L3 routing stabilizes. Methods are needed to prevent this problem.

Another case in which looping can be particularly unpleasant is for multicast traffic. With multicast, it is possible that the packet may be delivered successfully to some destinations even though copies intended for other destinations are looping.
This leads to the possibility that huge numbers of identical packets could be delivered to some destinations. Also, since multicast implies that packets are duplicated at some points in their path, the congestion resulting from looping packets may be particularly severe.

Another unpleasant complication of looping occurs if the congestion caused by the loop interferes with the routing protocol. It is possible for the congestion caused by looping to cause routing protocol control packets to be discarded, with the result that the routing protocol becomes unstable. For example, this could lengthen the duration of the loop.

In normal operation of IP networks the impact of congestion is limited by the fact that TCP backs off (i.e., transmits substantially less traffic) in response to lost packets. Where the congestion is caused by looping, the combination of TTL and the resulting discard of looping packets, plus the reduction in offered traffic, can limit the resulting impact on the network. TCP backoff, however, does not solve the problem if the looping packets are not discarded (for example, if the loop is over an ATM subnetwork where TTL is not used).

The severity of the problem caused by looping may depend upon implementation details. Suppose, for instance, that ATM switching hardware is being used to provide MPLS switching functions. If the ATM hardware has per-VC queuing, and if it is capable of providing fair access to the buffer pool for incoming cells based on the incoming VC (so that no one incoming VC is allowed to grab a disproportionate number of buffers), this looping might not have a significant effect on other traffic. If the ATM hardware cannot provide fair buffer access of this sort, however, then even transient loops may cause severe degradation of the node's total performance.

Given that MPLS is a relatively new approach, it is possible that looping may have consequences which are not fully understood (such as looping of LDP control information in cases where stream merge is not used).

Even if fair buffer access can be provided, it is still worthwhile to have some means of detecting loops that last "longer than possible". In addition, even where TTL and/or per-VC fair queuing provides a means for surviving loops, it still may be desirable where practical to avoid setting up LSPs which loop.

Methods for dealing with loops are discussed in section 4.3.

3.6 Operations and Management

Operations and management of networks is critically important. This implies that MPLS must support operations, administration, and maintenance facilities at least as extensive as those supported in current IP networks.

In most ways this is a relatively simple requirement to meet. Given that all MPLS nodes run normal IP routing protocols, it is straightforward to expect them to participate in normal IP network management protocols.

One issue has been identified which needs to be addressed by the MPLS effort: the operation of Traceroute over MPLS networks. Note that other O&M issues may be identified in the future.

Traceroute is a very commonly used network management tool.
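Its operation is described in the next paragraph; as a concrete illustration, a minimal UDP-probe variant might look like the sketch below (the details are assumptions of this sketch; receiving the ICMP errors requires a raw socket, and hence appropriate privileges):

<CODE BEGINS>
   import socket

   def traceroute(dest_ip, max_hops=30, port=33434):
       # Send UDP probes with increasing TTL and listen for the
       # ICMP "time exceeded" errors sent back by each hop.
       for ttl in range(1, max_hops + 1):
           rx = socket.socket(socket.AF_INET, socket.SOCK_RAW,
                              socket.getprotobyname("icmp"))
           tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
           tx.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, ttl)
           rx.settimeout(2.0)
           rx.bind(("", port))
           tx.sendto(b"", (dest_ip, port))
           try:
               _, addr = rx.recvfrom(512)   # responder for this TTL
               print(ttl, addr[0])
               if addr[0] == dest_ip:       # reached the destination
                   break
           except socket.timeout:
               print(ttl, "*")              # hop did not answer
           finally:
               tx.close()
               rx.close()
<CODE ENDS>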
Traceroute is based on use of the TTL field: A station trying to determine the route from itself to a specified address transmits multiple IP packets, with the TTL field set to 1 in the first packet, 2 in the second packet, etc. This causes each router along the path to send back an ICMP error report for TTL exceeded. This in turn allows the station to determine the set of routers along the route. For example, this can be used to determine where a problem exists (if no router responds past some point, the last router which responds can become the starting point for a search to determine the cause of the problem).

When MPLS is operating over ATM or Frame Relay networks there is no TTL field to decrement (and ATM and Frame Relay forwarding hardware does not decrement TTL). This implies that it is not straightforward to have Traceroute operate in this environment.

There is the question of whether we *want* all routers along a path to be visible via traceroute. For example, an ISP probably doesn't want to expose the interior of its network to a customer. However, the issue of whether a network's policy will allow the interior of the network to be visible should be independent of whether it is possible for some users to see the interior of the network. Thus while there clearly should be the possibility of using policy mechanisms to block traceroute from being used to see the interior of the network, this does not imply that it is okay to develop protocol mechanisms which prevent traceroute from working.

There is also the question of whether the interior of an MPLS network is analogous to a normal IP network, or whether it is closer to the interior of a layer 2 network (for example, an ATM subnet). Clearly IP traceroute cannot be used to expose the interior of an ATM subnet. When a packet is crossing an ATM subnetwork (for example, between an ingress and an egress router which are attached to the ATM subnet), traceroute can be used to determine the router to router path, but not the path through the ATM switches which comprise the ATM subnet. Note here that MPLS forms a sort of "in between" special case: Routing is based on normal IP routing protocols, the equivalent of call setup (label binding/exchange) is based on MPLS-specific protocols, but forwarding is based on normal L2 ATM forwarding. MPLS therefore supersedes the normal ATM-based methods that would be used to eliminate loops and/or trace paths through the ATM subnet.

It is generally agreed that Traceroute is a relatively "ugly" tool, and that a better tool for tracing the route of a packet would be preferable. However, no better tool has yet been designed or even proposed. Also, however ugly Traceroute may be, it is nonetheless very useful, widely deployed, and widely used. In general, it is highly preferable to define, implement, and deploy a new tool, and to determine through experience that the new tool is sufficient, before breaking a tool which is as widely used as traceroute.

Methods that may be used either to allow traceroute to be used in an MPLS network, or to replace traceroute, are discussed in section 4.11.
4. Technical Approaches

4.1 Label Distribution

A fundamental requirement in MPLS is that an LSR forwarding label switched traffic to another LSR apply a label to that traffic which is meaningful to the other (receiving) LSR. LSRs could learn about each other's labels in a variety of ways. We call the general topic "label distribution".

4.1.1 Explicit Label Distribution

Explicit label distribution anticipates the specification by MPLS of a standard protocol for label distribution. Two of the possible approaches (TDP, ARIS [ARIS-PROT]) are oriented toward topology driven label distribution. One other approach [FANP], in contrast, makes use of traffic driven label distribution. We expect that the label distribution protocol [LDP] which emerges from the MPLS WG is likely to inherit elements from one or more of the possible approaches.

Consider LSR A forwarding traffic to LSR B. We call A the upstream (with respect to dataflow) LSR and B the downstream LSR. A must apply a label to the traffic that B "understands". Label distribution must ensure that the "meaning" of the label will be communicated between A and B. An important question is whether A or B (or some other entity) allocates the label.

In this discussion we are talking about the allocation and distribution of labels between two peer LSRs that are on a single segment of what may be a longer path. A related but in fact entirely separate issue is the question of where control of the whole path resides. In essence there are two models: by analogy to upstream and downstream for a single segment, we can talk about ingress and egress for an LSP (or to and from a label swapping "domain"). In one model a path is set up from ingress to egress, and in the other from egress to ingress.

4.1.1.1 Downstream Label Allocation

"Downstream Label Allocation" refers to a method where the label allocation is done by the downstream LSR, i.e., the LSR that uses the label as an index into its switching tables.

This is, arguably, the most natural label allocation/distribution mode for unicast traffic. As an LSR builds its routing tables (we consider here control driven allocation of labels), it is free, within some limits we will discuss, to allocate labels in any manner that may be convenient to the particular implementation. Since the labels that it allocates will be those upon which it subsequently makes forwarding decisions, we assume implementations will perform the allocation in an optimal manner. Having allocated labels, the default behavior is to distribute the labels (and bindings) to all peers.

In some cases (particularly with ATM) there may be a limited number of labels which may be used across an interface, and/or a limited number of label assignments which may be supported by a single device. Operation in this case may make use of "on demand" label assignment. With this approach, an LSR may for example request a label for a route from a particular peer only when its routing calculations indicate that peer to be the new next hop for the route.

4.1.1.2 Upstream Label Allocation

"Upstream Label Allocation" refers to a method where the label allocation is done by the upstream LSR.
In this case the LSR choosing the label (the upstream LSR) and the LSR which needs to interpret packets using the label (the downstream LSR) are not the same node. We note here that in the upstream LSR the label at issue is not used as an index into the switching tables, but rather is found as the result of a lookup on those tables.

The motivation for upstream label allocation comes from the recognition that it might be possible to optimize multicast machinery in an LSR if it were possible to use the same label on all output ports for which a particular multicast packet/cell were destined. Upstream assignment makes this possible.

4.1.1.3 Other Label Allocation Methods

Another option would be to make use of label values which are unique within the MPLS domain (implying that a domain-wide allocation would be needed). In this case, any stream to a particular MPLS egress node could make use of the label of that node (implying that label values do not need to be swapped at intermediate nodes).

With this method of label allocation, there is a choice to be made regarding the scope over which a label is unique. One approach is to configure each node in an MPLS domain with a label which is unique in that domain. Another approach is to use a truly global identifier (for example the IEEE 48 bit identifier), where each MPLS-capable node would be stamped at birth with a truly globally unique identifier. The point of this global approach is to simplify configuration in each MPLS domain by eliminating the need to configure label IDs.

4.1.2 Piggybacking on Other Control Messages

While we have discussed use of an explicit MPLS LDP, we note that there are several existing protocols that can be easily modified to distribute both routing/control and label information. This could be done with any of OSPF, BGP, RSVP and/or PIM. A particular architectural elegance of these schemes is that label distribution uses the same mechanisms as are used in distribution of the underlying routing or control information.

When explicit label distribution is used, the routing computation and label distribution are decoupled. This implies that at some point you may have a route to a specific destination without an associated label, and/or a label for a specific destination which makes use of a path which you are no longer using. Piggybacking label distribution on the operation of the routing protocol is one way to eliminate this decoupling.

Piggybacking label distribution on the routing protocol introduces an issue regarding how to negotiate acceptable label values and what to do if an invalid label is received. This is discussed in section 4.1.3.

4.1.3 Acceptable Label Values

There are some constraints on which label values may be used in either allocation mode. Clearly the label values must lie within the allowable range described in the encapsulation standards that the MPLS WG will produce. The label value used must also, however, lie within a range that the peer LSR is capable of supporting. We imagine that certain machines, for example ATM switches operating as LSRs, may, due to operational or implementation restrictions, support a label space more limited than that bounded by the valid range found in the encapsulation standard.
This implies that an advertisement or negotiation mechanism for usable label range may be a part of the MPLS LDP. When operating over ATM using ATM forwarding hardware, due to the need for compatibility with the existing use of the ATM VPI/VCI space, it is quite likely that an explicit mechanism will be needed for label range negotiation.

In addition we note that LDP may be one of a number of mechanisms used to distribute labels between any given pair of LSRs. Clearly, where such multiple mechanisms exist, care must be taken to coordinate the allocation of label values. A single label value must have a unique meaning to the LSR that distributes it.

There is an issue regarding how to allow negotiation of acceptable label values if label distribution is piggybacked with the routing protocol. In this case it may be necessary either to require equipment to accept any possible label value, or to configure devices to know which range of label values may be selected. It is not clear in this case what to do if an invalid label value is received, as there may be no means of sending a NAK.

A similar issue occurs with multicast traffic over broadcast media, where there may be multiple nodes which receive the same transmission (using a single label value). Here again it may be non-trivial to allow n-party negotiation of acceptable label values.

4.1.4 LDP Reliability

The need for reliable label distribution depends upon the relative performance of L2 and L3 forwarding, as well as the relationship between label distribution and the routing protocol operation.

If label distribution is tied to the operation of the routing protocol, then a reasonable protocol design would ensure that labels are distributed successfully as long as the associated route and/or reachability advertisement is distributed successfully. This implies that the reliability of label distribution will be the same as the reliability of route distribution.

If there is a very large difference between L2 and L3 forwarding performance, then the cost of failing to deliver a label is significant. In this case it is important to ensure that labels are distributed reliably. Given that LDP needs to operate in a wide variety of environments with a wide variety of equipment, this implies that it is important for any LDP developed by the MPLS WG to ensure reliable delivery of label information.

Reliable delivery of LDP packets may potentially be accomplished either by using an existing reliable transport protocol such as TCP, or by specifying reliability mechanisms as part of LDP (for example, the reliability mechanisms which are defined in IDRP could potentially be "borrowed" for use with LDP).

TCP supports flow control (in addition to supporting reliable delivery of data). Flow control is a desirable feature which will be useful for MPLS (as well as other applications making use of a reliable transport) and therefore needs to be built into whatever reliability mechanism is used for MPLS.

4.1.5 Label Purge Mechanisms

Another issue to be considered is the "lifetime" of label data once it arrives at an LSR, and the method of purging label data. There are several methods that could be used either separately or (more likely) in combination.

One approach is for label information to be timed out. With this approach a lifetime is distributed along with the label value. The label value may be refreshed prior to timing out. If the label is not refreshed prior to timing out, it is discarded. In this case each lifetime and timer may apply to a single label, or to a group of labels (e.g., all labels selected by the same node).
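A minimal sketch of this timeout method follows; the data structures and function names are illustrative assumptions, and a real LSR would combine this with the keep-alive and explicit removal mechanisms described next:

<CODE BEGINS>
   import time

   bindings = {}                   # label -> expiry time (seconds)

   def install(label, lifetime):
       bindings[label] = time.time() + lifetime

   def refresh(label, lifetime):   # a refresh simply re-arms the timer
       if label in bindings:
           bindings[label] = time.time() + lifetime

   def purge_expired():
       now = time.time()
       for label in [l for l, exp in bindings.items() if exp <= now]:
           del bindings[label]     # stale binding: stop using it
<CODE ENDS>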
Similarly, two peer nodes may make use of an MPLS peer keep-alive mechanism. This implies exchange of MPLS control packets between neighbors on a periodic basis. This in general is likely to use a smaller timeout value than label value timers (analogous to the fact that the OSPF HELLO interval is much shorter than the OSPF LSA lifetime). If the peer session between two MPLS nodes fails (due to expiration of the associated timer prior to reception of the refresh), then the associated label information is discarded.

If label information is piggybacked on the routing protocol, then the timeout mechanisms would also be taken from the associated routing protocol (note that routing protocols in general have mechanisms to invalidate stale routing information).

An alternative method for invalidating labels is to make use of an explicit label removal message.

4.2 Stream Merging

In order to scale as O(n) (rather than O(n-squared)), MPLS makes use of the concept of stream merge. This makes use of multipoint-to-point streams in order to allow multiple streams to be merged into one stream.

4.2.1 Types of Stream Merge:

There are several types of stream merge that can be used, depending upon the underlying media.

When MPLS is used over frame based media, merging is straightforward. All that is required for stream merge to take place is for a node to allow multiple upstream labels to be forwarded the same way and mapped into a single downstream label. This is referred to as frame merge.

Operation over ATM media is less straightforward. In ATM, the data packets are encapsulated into an ATM Adaptation Layer, say AAL5; the AAL5 PDU is segmented into ATM cells with a VPI/VCI value; and the cells are transmitted in sequence. It is incumbent on ATM switches to keep the cells of a PDU (i.e., cells with the same VPI/VCI value) contiguous and in sequence. This is because the device that reassembles the cells to re-form the transmitted PDU expects the cells to be contiguous and in sequence: there isn't sufficient information in the ATM cell header (unlike IP fragmentation) to reassemble the PDU from out-of-order cells. Hence, if cells from several upstream links are transmitted onto the same downstream VPI/VCI, then cells from one PDU can get interleaved with cells from another PDU on the outgoing VPI/VCI, corrupting the original PDUs by mis-sequencing the cells of each PDU.

The most straightforward (but erroneous) method of merging in an ATM environment would be to take the cells from two incoming VCs and merge them into a single outgoing VC. If this were done without any buffering of cells, then cells from two or more packets could end up being interleaved into a single AAL5 frame. The problem when operating over ATM is therefore how to avoid interleaving of cells from multiple sources.
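The toy fragment below illustrates the failure mode: cells from two upstream sources are forwarded onto one outgoing VC without buffering, and AAL5 reassembly (which can rely only on cell order and the end-of-frame indicator) yields corrupted PDUs:

<CODE BEGINS>
   # Cells of two AAL5 frames ("*" marks the end-of-frame cell)
   # arrive on different upstream VCs.
   frame_a = ["A1", "A2", "A3*"]
   frame_b = ["B1", "B2*"]

   merged = []                 # naive, unbuffered merge: cells are
   while frame_a or frame_b:   # forwarded in arrival (FIFO) order
       if frame_a:
           merged.append(frame_a.pop(0))
       if frame_b:
           merged.append(frame_b.pop(0))

   frames, current = [], []
   for cell in merged:         # AAL5 reassembly: split at "*" cells
       current.append(cell)
       if cell.endswith("*"):
           frames.append(current)
           current = []

   print(frames)
   # [['A1', 'B1', 'A2', 'B2*'], ['A3*']]: both PDUs corrupted
<CODE ENDS>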
There are two ways to solve this interleaving problem, which are referred to as VC merge and VP merge.

VC merge allows multiple VCs to be merged into a single outgoing VC. In order for this to work, the node performing the merge needs to keep the cells from one AAL5 frame (e.g., corresponding to an IP packet) separate from the cells of other AAL5 frames. This may be done by performing the SAR function in order to reassemble each IP packet before forwarding that packet. In this case VC merge is essentially equivalent to frame merge. An alternative is to buffer the cells of one AAL5 frame together, without actually reassembling them. When the end of frame indicator is reached, that frame can be forwarded. Note however that both forms of VC merge require that the entire AAL5 frame be received before any cells corresponding to that frame are forwarded. VC merge therefore requires capabilities which are generally not available in most existing ATM forwarding hardware.

The alternative for use over ATM media is VP merge. Here multiple VPs can be merged into a single VP. Separate VCIs within the merged VP are used to distinguish frames (e.g., IP packets) from different sources. In some cases, one VP may be used for the whole multipoint-to-point tree from the ingress nodes to a single egress node.

4.2.2 Interoperation of Merge Options:

If some nodes support stream merge and some nodes do not, then it is necessary to ensure that the two types of nodes can interoperate within a single network. This affects the number of labels that a node needs to send to a neighbor. An upstream LSR which supports stream merge needs to be sent only one label per forwarding equivalence class (FEC). An upstream neighbor which does not support stream merge needs to be sent multiple labels per FEC. However, there is no way of knowing a priori how many labels it needs; this will depend on how many LSRs are upstream of it with respect to the FEC in question.

An upstream neighbor which does not support stream merge may therefore need to explicitly ask for labels for each FEC, and may make multiple such requests (for one or more labels per request). When a downstream neighbor receives such a request from upstream, and the downstream neighbor does not itself support stream merge, then it must in turn ask its downstream neighbor for more labels for the FEC in question.

It is possible that there may be some nodes which support merge, but have a limited number of upstream streams which may be merged into a single downstream stream. Suppose for example that, due to some hardware limitation, a node is capable of merging four upstream LSPs into a single downstream LSP. Suppose however that this particular node has six upstream LSPs arriving at it for a particular stream. In this case, this node may merge these into two downstream LSPs, and will therefore need to obtain two labels from its downstream neighbor. The general rule is sketched below.
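In general, a node able to merge at most a given number of upstream LSPs into one downstream LSP needs one downstream label per group of upstream LSPs, i.e.:

<CODE BEGINS>
   import math

   # A node that can merge at most "capacity" upstream LSPs into one
   # downstream LSP needs one downstream label per group.
   def downstream_labels_needed(upstream_lsps, capacity):
       return math.ceil(upstream_lsps / capacity)

   print(downstream_labels_needed(6, 4))   # the example above: 2 labels
<CODE ENDS>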
The interoperation of the various forms of merging over ATM is most easily described by first describing the interoperation of VC merge with non-merge.

In the case where VC merge and non-merge nodes are interconnected, the forwarding of cells is based in all cases on a VC (i.e., the concatenation of the VPI and VCI). For each node, if an upstream neighbor is doing VC merge, then that upstream neighbor requires only a single outgoing VPI/VCI for a particular FEC (this is analogous to the requirement for a single label in the case of operation over frame media). If the upstream neighbor is not doing merge, then it will require a single outgoing VPI/VCI per FEC for itself (assuming that it can be an ingress node), plus enough outgoing VPI/VCIs to map to incoming VPI/VCIs to pass to its upstream neighbors. The number required will be determined by allowing the upstream nodes to request additional VPI/VCIs from their downstream neighbors.

A similar method is possible to support nodes which perform VP merge. In this case the VP merge node, rather than requesting a single VPI/VCI or a number of VPI/VCIs from its downstream neighbor, instead may request a single VP (identified by a VPI). Furthermore, suppose that a non-merge node is downstream from two different VP merge nodes. This node may need to request one VPI/VCI (for traffic originating from itself) plus two VPs (one for each upstream node).

In order to support all of VP merge, VC merge, and non-merge, it is therefore necessary to allow upstream nodes to request a combination of zero or more VC identifiers (consisting of a VPI/VCI), plus zero or more VPs (identified by VPIs). VP merge nodes would therefore request one VP. VC merge nodes would request only a single VPI/VCI (since they can merge all upstream traffic into a single VC). Non-merge nodes would pass on any requests that they get from above, plus request a VPI/VCI for traffic that they originate (if they can be ingress nodes). However, non-merge nodes which can only do VC forwarding (and not VP forwarding) will need to know which VCIs are used within each VP in order to install the correct VCs in their forwarding tables. A detailed description of how this could work can be found in [ATMVP].

4.2.3 Coordination of the VCI space with VP Merge:

VP merge requires that the VCIs be coordinated to ensure uniqueness. There are a number of ways in which this may be accomplished:

   1. Each node may be pre-configured with a unique VCI value
      (or values).

   2. A single node (most likely the root of the multipoint-to-point
      tree) may coordinate the VCI values used within the VP. A
      protocol mechanism will be needed to allow this to occur.
      How hard this is to do depends somewhat upon whether the
      root is otherwise involved in coordinating the
      multipoint-to-point tree. For example, allowing one node
      (such as the root) to coordinate the tree may be useful
      for purposes of coordinating load sharing (see section
      4.10). Thus whether the issue of coordinating the VCI
      space is significant or trivial may depend upon other
      design choices which at first glance may have appeared to
      be independent protocol design choices.

   3. Other unique information, such as portions of a class B or
      class C address, may be used to provide a unique VCI value.
   4. Another alternative is to implement a simple hardware
      extension in the ATM switches to keep the VCI values
      unique by dynamically altering them to avoid collisions.

VP merge makes less efficient use of the VPI/VCI space (relative to VC merge). When VP merge is used, the LSPs may not be able to transit public ATM networks that don't support switched VPs.

4.2.4 Buffering Issues Related To Stream Merge:

There is an issue regarding the amount of buffering required for frame merge, VC merge, and VP merge. Frame merge and VC merge require that intermediate points buffer incoming packets until the entire packet arrives. This is essentially the same as is required in traditional IP routers.

VP merge allows cells to be transmitted by intermediate nodes as soon as they arrive, reducing the buffering and latency at intermediate nodes. However, the use of VP merge implies that cells from multiple packets will arrive at the egress node interleaved on separate VCIs. This in turn implies that the egress node may have somewhat increased buffering requirements. To a large extent egress nodes for some destinations will be intermediate nodes for other destinations, implying that the increase in buffers required for some purpose (egress traffic) will be offset by a reduction in buffers required for other purposes (transit traffic). Also, routers today typically deal with high-fanout channelized interfaces and with multi-VC ATM interfaces, implying that buffering simultaneously arriving cells from multiple packets and sources is something that routers typically do today. This is not meant to imply that the required buffer size and performance are inexpensive, but rather to observe that this is a solvable issue.

ATM equipment provides traffic shaping, in which the ATM cells associated with any one particular VC are intentionally not transmitted back to back, but rather are spread out over time in order to place less short term buffering load on switches. Since VC merge requires that all cells associated with a particular packet (or a particular AAL5 frame) be buffered before any cell from the packet can be transmitted, VC merge defeats much of the intent of traffic shaping. An advantage of VP merge is that it preserves traffic shaping through ATM switches acting as LSRs. While traffic shaping may generally be expected to reduce the buffering requirements in ATM switches (whether acting as MPLS switches or as native ATM switches), the precise effect of traffic shaping has not been studied in the context of MPLS.

4.3 Loop Handling

Generally, methods for dealing with loops can be split into three categories: Loop Survival makes use of methods which minimize the impact of loops, for example by limiting the amount of network resources which can be consumed by a loop; Loop Detection allows loops to be set up, but later detects these loops and eliminates them; Loop Prevention provides methods for avoiding setting up L2 forwarding in a way which results in an L2 loop.

Note that we are concerned here only with loops that occur in L2 forwarding. Transient loops at L3 will continue to be part of normal IP operation, and will be handled the way that IP has been handling loops for years (see section 3.5).
Loop Survival:

Loop Survival refers to methods that are used to allow the network to operate well even though short term transient loops may be formed by the routing protocol. The basic approach to loop survival is to limit the amount of network resources which are consumed by looping packets, and to minimize the effect on other (non-looping) traffic.

The most basic method for loop survival is based on the use of a TTL (Time To Live) field. The TTL field is decremented at each hop. If the TTL field reaches zero, then the packet is discarded. This method works well over those media which have a TTL field. This explicitly includes L3 IP forwarding. Also, assuming that the core MPLS specifications will include definition of a "shim" MPLS header, for use in carrying labels over those media which do not have their own labels, the shim header will also include a TTL field.

However, there is considerable interest in using MPLS over L2 protocols which provide their own labels, with the L2 label used for MPLS forwarding. Specific L2 protocols which offer a label for this purpose include ATM and Frame Relay. However, neither ATM nor Frame Relay has a TTL field. This implies that this method cannot be used when basic ATM or Frame Relay forwarding is being used.

Another basic method for loop survival is the use of dynamic routing protocols which converge rapidly to non-looping paths. In some instances it is possible that congestion caused by looping data could affect the convergence of the routing protocol (see section 3.5). MPLS should be designed to prevent this problem from occurring. Given that MPLS uses the same routing protocols as are used for IP, this method does not need to be discussed further in this framework document.

Another possible tool for loop survival is the use of fair queuing. This allows unrelated flows of user data to be placed in different queues. This helps to ensure that a node which is overloaded with looping user data can nonetheless forward unrelated non-looping data, thereby minimizing the effect that looping data has on other data. We cannot assume that fair queuing will always be available. In practice, many fair queuing implementations merge multiple streams into one queue (implying that the number of queues used is less than the number of user data flows which are present in the network). This implies that any data which happens to be in the same queue with looping data may be adversely affected.

Loop Detection:

Loop Detection refers to methods whereby a loop may be set up at L2, but the loop is subsequently detected. When the loop is detected, it may be broken at L2 by dropping the label relationship, implying that packets for a set of destinations must be forwarded at L3.

A possible method for loop detection is based on transmitting a "loop detection" control packet (LDCP) along the path towards a specified destination whenever the route to the destination changes. This LDCP is forwarded in the direction that the label specifies, with the labels swapped to the correct next hop value.
However, normal L2 forwarding cannot be used because each hop needs to examine the packet to check for loops. The LDCP is forwarded towards that destination until one of the following happens: (i) the LDCP reaches the last MPLS node along the path (i.e., the next hop is either a router which is not participating in MPLS, or is the final destination host); (ii) the TTL of the LDCP expires (assuming that the control packet uses a TTL, which is optional); or (iii) the LDCP returns to the node which originally transmitted it. If the latter occurs, then the packet has looped, and the node which originally transmitted the LDCP stops using the associated label and instead uses L3 forwarding for the associated destination addresses. One problem with this method is that once a loop is detected, it is not known when the loop clears. One option would be to set a timer, and to transmit a new LDCP when the timer expires.

Loop detection may also be achieved via a Path Vector control message. A Path Vector contains a list of the LSRs that the label distribution control message has traversed. Each LSR which propagates a control packet to either create or modify an LSP adds its own unique identifier to the Path Vector list. An LSR that receives a message with a Path Vector that contains its own identifier detects that the message has traversed a loop.

An alternate method counts the hops to each egress node, based on the routes currently available. Each node advertises its distance (in hop counts) to each destination. An egress node advertises the destinations that it can reach directly with an associated hop count of zero. For each destination, a node computes the hop count to that destination by adding one to the hop count advertised by its actual next hop used for that destination. When the hop count for a particular destination changes, the hop count needs to be readvertised.

In addition, the first of the loop prevention schemes discussed below may be modified to provide loop detection.

Loop Prevention:

Loop prevention makes use of methods to ensure that loops are never set up at L2. This implies that the labels are not used until some method is used to ensure that following the label towards the destination, with associated label swaps at each switch, will not result in a loop. Until the L2 path (making use of assigned labels) is available, packets are forwarded at L3.

Loop prevention requires explicit signaling of some sort to be used when setting up an L2 stream.

One method of loop prevention requires that labels be propagated starting at the egress switch. The egress switch signals to neighboring switches the label to use for a particular destination. That switch then signals an associated label to its neighbors, etc. The control packets which propagate the labels also include the path to the egress (as a list of router IDs). Any looping control packet can therefore be detected and the path not set up to or past the looping point.
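Both this prevention scheme and the Path Vector detection scheme above rest on the same per-hop check, sketched here (the identifiers and return convention are illustrative assumptions):

<CODE BEGINS>
   # Each LSR appends its identifier before propagating the message;
   # a message already carrying the receiver's identifier has looped.
   def process_path_vector(my_id, path_vector):
       if my_id in path_vector:
           return None                # loop: do not set up the path here
       return path_vector + [my_id]   # safe: propagate the message

   assert process_path_vector("C", ["A", "B"]) == ["A", "B", "C"]
   assert process_path_vector("B", ["A", "B"]) is None
<CODE ENDS>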
During routing changes, a diffusion mechanism may be used to prevent the formation of L2 loops. The purpose of the diffusion computation is to prune the tree of an LSR that has detected a route change for a given FEC, such that all upstream LSRs from the tree that would be on a looping path are removed. It is only after those LSRs are removed from the tree that it is safe to replace the old LSP with the new LSP (and the old LSP can be released).

The diffusion mechanism is an extension of the Path Vector mechanism. An LSR, D, that detects that the next hop for an FEC has changed transmits a query message, with a Path Vector containing its unique identifier, to its upstream neighbors. An LSR, U, that receives such a query will determine if D is the next hop for the given FEC. If not, then U may return "OK", meaning that as far as node U is concerned it is safe for node D to switch over to the new LSP. If node D is the next hop, then node U checks the Path Vector to see if its unique identifier is already present. If so, then a route loop is detected; in this case, node U responds with a "LOOP" message, and node D will prune node U off of its tree. If no loop is detected, then node U adds its unique identifier to the Path Vector, and propagates the query message to each of its upstream neighbors. The diffusion computation continues to propagate upstream along each of the paths in the tree until an ingress or looping LSR is found. Once an LSR has received a response from each of its upstream neighbors, it may then return an "OK" message to its downstream neighbor. When the original node, node D, receives a response from each of its neighbors, it is safe to replace the old LSP with the new one, because all the paths that would have looped have been pruned from the tree. [ARCH]

An alternative method of loop prevention is the "colored thread" mechanism. The heart of the Colored Thread (CT) algorithm is a procedure that gives a color to each link along the LSP in the downstream direction. The color is composed of two fixed-length objects: the address of the node that created the color, and a local identifier that is unique within the creating node. A loop-free LSP is established when the node that triggered the coloring procedure receives an acknowledgment for the procedure from its downstream node. During the coloring procedure, a set of attributes (color, hop count, TTL), referred to as a thread, is propagated downstream. A node that finds a change in the next hop creates a color and passes it on the outgoing link to the new next hop. If a node receives a color on an incoming link, it either (a) passes the received color or (b) creates a new color and passes it, on the outgoing link to the next hop. The coloring procedure is propagated downstream until either the LSP turns out to be loop-free or a loop is found. In the former case, a positive acknowledgment (ACK) is returned hop-by-hop to upstream nodes. In the latter case, the coloring procedure is stalled and no ACK is returned. [LOOP-COLOR]

Another option is to use explicit routing to set up label bindings from the egress switch to each ingress switch. This precludes the possibility of looping, since the entire path is computed by one node.
An alternative method of loop prevention is the "colored thread"
mechanism. At the heart of the Colored Thread (CT) algorithm is a
procedure that gives a color to each link along the LSP in the
downstream direction. The color is composed of two fixed-length
objects: the address of the node that created the color, and a
local identifier that is unique within the creating node. A
loop-free LSP is established when the node that triggered the
coloring procedure receives an acknowledgment for the procedure
from its downstream node. During the coloring procedure, a set of
attributes (color, hop count, TTL), referred to as a thread, is
propagated downstream. A node that finds a change in the next hop
creates a color and passes it on the outgoing link to the new next
hop. If a node receives a color on an incoming link, it either (a)
passes the received color on, or (b) creates a new color and
passes it, on the outgoing link to the next hop. The coloring
procedure is propagated downstream until the LSP turns out to be
loop-free or a loop is found. In the former case, a positive
acknowledgment (ACK) is returned hop-by-hop to upstream nodes. In
the latter case, the coloring procedure is stalled and no ACK is
returned. [LOOP-COLOR]

Another option is to use explicit routing to set up label bindings
from the egress switch to each ingress switch. This precludes the
possibility of looping, since the entire path is computed by one
node. This also allows non-looping paths to be set up provided
that the egress switch has a view of the topology which is
reasonably close to reality (if there are operational links which
the egress switch doesn't know about, it will simply pick a path
which doesn't use those links; if there are links which have
failed but which the egress switch thinks are operational, then
there is some chance that the setup attempt will fail, but in this
case the attempt can be retried on a separate path). Note
therefore that non-looping paths can be set up with this method in
many cases where distributed routing plus hop by hop forwarding
would not actually result in non-looping paths. This method is
similar to the method used by standard ATM routing to ensure that
SVCs are non-looping [PNNI].

Explicit routing is only applicable if the routing protocol gives
the egress switch sufficient information to set up the explicit
route, implying that the protocol must be either a link state
protocol (such as OSPF) or a path vector protocol (such as BGP).
Explicit routing is therefore not appropriate as a general
approach for use in every network regardless of the routing
protocol. This method also requires some overhead for the call
setup before label-based forwarding can be used. If the network
topology changes in a manner which breaks the existing path, then
a new path will need to be explicitly routed from the egress
switch. Due to this overhead this method is probably only
appropriate if other significant advantages are also going to be
obtained from having a single node (the egress switch) coordinate
the paths to be used. Examples of other reasons to have one node
coordinate the paths to a single egress switch include: (i)
Coordinating the VCI space where VP merge is used (see section
4.2); and (ii) Coordinating the routing of streams from multiple
ingress switches to one egress switch so as to balance the load on
multiple alternate paths through the network.

In principle the explicit routing could also be done in the
alternate direction (from ingress to egress). However, this would
make it more difficult to merge streams if stream merge is to be
used. This would also make it more difficult to coordinate (i)
changes to the paths used, (ii) the VCI space assignments, and
(iii) load sharing. This therefore makes explicit routing more
difficult, and also reduces the other advantages that could be
obtained from the approach.

If label distribution is piggybacked on the routing protocol (see
section 4.1.2), then loop prevention is only possible if the
routing protocol itself does loop prevention.

What To Do If A Loop Is Detected:

With all of these schemes, if a loop is known to exist then the L2
label-swapped path is not set up. This leads to the obvious
question of what an MPLS node does when it doesn't have a label
for a particular destination, and a packet for that destination
arrives to be forwarded. If possible, the packet is forwarded
using normal L3 (IP) forwarding. There are two issues that this
raises: (i) What about nodes which are not capable of L3
forwarding? (ii) Given the relative speeds of L2 and L3
forwarding, does this work?
Nodes which are not capable of L3 forwarding obviously can't
forward a packet unless it arrives with a label, and the
associated next hop label has been assigned. Such nodes, when they
receive a packet for which the next hop label has not been
assigned, must discard the packet. It is probably safe to assume
that a node which cannot forward an L3 packet is also incapable of
forwarding an ICMP error report that it originates. This implies
that the packet will need to be silently discarded in this case.

In many cases L2 forwarding will be significantly faster than L3
forwarding (allowing faster forwarding is a significant motivation
behind the work on MPLS). This implies that if a node is
forwarding a large volume of traffic at L2, and a change in the
routing protocol causes the associated labels to be lost
(necessitating L3 forwarding), in some cases the node will not be
capable of forwarding the same volume of traffic at L3. This will
of course require that packets be discarded. However, in some
cases only a relatively small volume of traffic will need to be
forwarded at L3. Thus forwarding at L3 when L2 is not available is
not necessarily always a problem. There may be some nodes which
are capable of forwarding equally fast at L2 and L3 (for example,
such nodes may contain IP forwarding hardware which is not
available in all nodes). Finally, when packets are lost this will
cause TCP to back off, which will in turn reduce the load on the
network and allow the network to stabilize even at reduced
forwarding rates until such time as the label bindings can be
reestablished.

In many cases MPLS may be used for traffic engineering. In these
cases failure of an LSP may cause packets which would have taken
that LSP to be forwarded (using L3 forwarding) along paths which
are not consistent with the traffic engineering solution. This
could in turn cause congestion. In these cases packets may need to
be discarded even if the LSRs are capable of full line rate L3
forwarding. This may cause problems very similar to those
discussed in the previous paragraph.

Note that in most cases loops will be caused either by
configuration errors, or by short term transient problems caused
by the failure of a link. If only one link goes down, and if
routing creates a normal "tree-shaped" set of paths to any one
destination, then the failure of one link somewhere in the network
will affect only one link's worth of data passing through any one
node in the network. This implies that if a node is capable of
forwarding one link's worth of data at L3, then in many or most
cases it will have sufficient L3 bandwidth to handle looping data.

4.4 Interoperation with NHRP

When label switching is used over ATM, and there exists an LSR
which is also operating as a Next Hop Client (NHC), the
possibility of direct interaction arises. That is, could one
switch cells between the two technologies without reassembly? To
enable this, several important issues must be addressed.

The encapsulation must be acceptable to both MPLS and NHRP. If
only a single label is used, then the null encapsulation could be
used. Other solutions could be developed to handle label stacks.

NHRP must understand and respect the granularity of a stream.
Currently NHRP resolves an IP address to an ATM address. The
response may include a mask indicating a range of addresses.
However, any VC to the ATM address is considered to be a viable
means of packet delivery. Suppose that an NHC NHRPs for IP address
A, gets back ATM address 1, and sets up a VC to address 1. Later
the same NHC NHRPs for a totally unrelated IP address B and gets
back the same ATM address 1. In this case normal NHRP behavior
allows the NHC to use the VC (that was set up for destination A)
for traffic to B [NHRP].

Note: In this section we will refer to a VC set up as a result of
an NHRP query/response as a shortcut VC.

If one expects to be able to label switch the packets being
received from a shortcut VC, then the label switch needs to be
informed as to exactly what traffic will arrive on that VC, and
that mapping cannot change without notice. Currently no such
mechanism exists in the defined signaling of a shortcut VC.
Several means are possible. A binding, equivalent to the binding
in LDP, could be sent in the setup message. Alternatively, the
binding of prefix to label could remain in an LDP session (or
whatever means of label distribution is appropriate) and the setup
could carry a binding of the label to the VC. This would leave the
binding mechanism for shortcut VCs independent of the label
distribution mechanism.

A further architectural challenge exists in that label switching
is inherently unidirectional whereas ATM is bi-directional. The
above binding semantics are fairly straightforward. However,
effectively using the reverse direction of a VC presents further
challenges.

Label switching must also respect the granularity of the shortcut
VC. Without VC merge, this means a single label switched flow must
map to a VC. In the case of VC merge, multiple label switched
streams could be merged onto a single shortcut VC. But given the
asymmetry involved, there is perhaps little practical use.

Another issue is one of practicality and usefulness. What is sent
over the VC must be at a fine enough granularity to be label
switched through the receiving domain. One potential place where
the two technologies might come into play is in moving data from
one campus via the wide-area to another campus. In such a
scenario, the two technologies would border precisely at the point
where summarization is likely to occur. Each campus would have a
detailed understanding of itself, but not of the other campus. The
wide-area is likely to have summarized knowledge only. But at such
a point level 3 processing becomes the likely solution.

4.5 Operation in a Hierarchy

MPLS allows hierarchical operation, through use of a label stack.
This allows MPLS to simultaneously be used for routing at a fine
grain level (for example, between individual routers within an
ISP) and at a higher "area by area" or "domain by domain" level.

4.5.1 Example of Hierarchical Operation

Figure 1 illustrates an example of how MPLS may operate in a
hierarchy. This example illustrates three transit routing domains
(Domain #1, #2, and #3). For example, these three domains may
represent Internet service providers.
Domain Boundary Routers are illustrated in each domain (routers R1
and R2 in domain #1, routers R3 and R8 in domain #2, and routers
R9 and R10 in domain #3). Suppose that these domain boundary
routers are operating BGP.

Internal routers are not illustrated in domains 1 and 3. However,
internal routers are illustrated within domain #2. In particular,
the path between routers R3 and R8 follows the internal routers
R4, R5, R6, and R7 within domain #2.

   .................  ..........................  .................
   .               .  .                        .  .               .
   .               .  .                        .  .               .
   .R1          R2------R3                  R8------R9         R10.
   .               .  .   \                /   .  .               .
   .               .  .    R4---R5---R6---R7   .  .               .
   .               .  .                        .  .               .
   .   Domain#1    .  .        Domain#2        .  .   Domain#3    .
   .................  ..........................  .................

        Figure 1: Example of the Use of MPLS in a Hierarchy

In this example there are two levels of routing taking place. For
example, OSPF may be used for routing within Domain #2. In this
case the routers R3, R4, R5, R6, R7, and R8 may be running OSPF
amongst themselves in order to compute routes within Domain #2.
The domain boundary routers (R1, R2, R3, R8, R9, and R10) operate
BGP in order to determine paths between routing domains.

MPLS allows label forwarding to be done independently at multiple
levels. In this example, MPLS may be used at the BGP level
(between routers R1, R2, R3, R8, R9, and R10) and at the OSPF
level (between routers R4, R5, R6, and R7). Thus when an IP packet
traverses Domain #2, it will contain two labels, encoded as a
"label stack". The higher level label would be used between
routers R3 and R8. This would be encapsulated inside a header
specifying a lower level label used within domain #2.

Consider the forwarding operation that takes place at router R3.
In this case, R3 will receive a packet from R2 containing a single
label (the BGP level label). R3 will need to swap BGP level labels
in order to place the label that R8 expects on the packet. R3 will
also need to add an OSPF level label, as is expected by R4. R3
therefore "pushes down" the BGP level label in the label stack, by
adding a lower level label. Also note that the actual label
swapping operation performed by R3 can be optimized to allow very
simple forwarding: R3 receives a single incoming label from R2,
and can map this label into the new label header to be prepended
to the packet; it just happens that the new label header to be
added by R3 contains two labels rather than one.
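As a concrete illustration, the following minimal sketch (Python)
shows the single table lookup R3 might perform, mapping the
incoming BGP level label directly to a two-label header. The table
contents, label values, and helper names are hypothetical, and the
on-the-wire encoding of [ENCAP] is not modeled.

    # Hypothetical illustration of R3's optimized forwarding step.
    # A label stack is modeled as a list, top of stack first.

    # Incoming BGP-level label -> two-label header and next hop.
    # Label values 17, 42, and 23 are made up for the example.
    r3_label_table = {
        17: {"push": [42, 23], "next_hop": "R4"},
        #             ^OSPF-level label (top of stack), then the
        #             BGP-level label that R8 expects underneath.
    }

    def forward_at_r3(packet_labels, payload):
        incoming = packet_labels[0]          # single label from R2
        entry = r3_label_table[incoming]
        # Replace the incoming header with a header containing two
        # labels: a new OSPF-level label for R4 on top of the
        # swapped BGP-level label for R8.
        return entry["next_hop"], entry["push"] + packet_labels[1:], payload

Routers R4 through R7 would then swap only the top (OSPF level)
label; the BGP level label underneath is not examined until the
packet leaves the OSPF level LSP.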
4.5.2 Components Required for Hierarchical Operation

In order for MPLS to operate in a hierarchy, there are three
things which must be accomplished:

 - Hierarchical Label Exchange in LDP
   The Label Distribution Protocol needs to exchange labels at
   each level of the hierarchy. In our example, R3 needs to
   exchange label bindings with R8 for operation at the BGP
   level. At the same time, R3 needs to exchange label bindings
   with R4 (and R4 needs to exchange label bindings with R5) for
   operation at the OSPF level. The control component for
   hierarchical labeling is essentially the same as that for
   single level labeling, except that labels are exchanged not
   just among physically adjacent LSRs but between those
   switching on the same level in the label stack.

 - Label Stack
   Multiple labels need to be carried in data packets. For
   example, when a data packet is being carried across domain
   #2, the data packet needs to be encapsulated in a header
   which carries the BGP level label, and the resulting packet
   needs to be carried in a header which carries an OSPF level
   label.

 - Configuration
   It is necessary for routers to know when hierarchical label
   switching is being used.

4.5.3 Some Restrictions on Use of Hierarchical MPLS

Consider the example in figure 1. In this case, the BGP-level
label is encoded by router R1. Label swapping is employed for
packet forwarding at R2, R3, R8, and R9. This is only possible if
R1 knows the right label to use, implying that the granularity
used in mapping packets to forwarding equivalence classes is the
same at routers R2, R3, R8, and R9.

We can consider some specific examples to illustrate the issue:

Suppose that the destination host is within domain 3. In this
case, it is very likely that router R9 will forward the packet
based on a finer grain than was used previously. For example, a
relatively short address prefix may be used for advertising the
addresses reachable in domain 3, while longer (more specific)
address prefixes may be used for specific areas or subnets within
domain 3. In this case router R1 may assign a BGP level label to
the packet, and label based forwarding at the BGP level may be
used by routers R1, R2, R3, and R8. However, router R9 will need
to make use of layer 3 forwarding.

Alternatively, suppose that domain 3 is an Internet Service
Provider, which offers service to multiple routing domains.
Suppose that in this case domain 3 makes use of a single CIDR
address block (based on a single address prefix), with smaller
address blocks (corresponding to longer address prefixes) assigned
to each of multiple domains who get their Internet service from
domain 3. Suppose that the destination for a particular IP packet
is contained in one of these smaller domains whose addresses are
contained in the larger address block assigned to and administered
by domain 3. Again in this case router R9 will need to make use of
layer 3 forwarding.

Let's consider another possible complication: Suppose that router
R1 is an MPLS node, but that some of the internal routers within
domain 1 do not know about MPLS. In this case, suppose that R1
encapsulates an IP packet in an MPLS header in order to carry the
BGP level label. In this case the non-MPLS-capable routers within
domain 1 will not know what to do with the MPLS header. This
implies that MPLS can be used at a higher level (such as between
the border routers R1 and R2 in our example) only if either the
lower level routers (such as the routers within domain 1) are also
using MPLS, or the MPLS header is itself encapsulated within an IP
header for transmission across the domain.

These examples imply that there are some cases where IP forwarding
will be required in a hierarchy. While hierarchical MPLS may be
useful in many cases, it does not replace layer 3 forwarding.

4.5.4 The Relationship between MPLS Hierarchy and Routing Hierarchy

4.5.4.1 Stacked Labels in a Flat Routing Environment

The label stacking mechanism can be useful in some scenarios
independent of routing hierarchy.
The basic concept of stacking is to provide a mechanism to
segregate streams within a switched path. Under normal operation,
when packets are encapsulated with a single L2 header, if multiple
streams are forwarded into a switched path, L3 processing is
required to segregate a particular stream at the end of the
switched path. The stacking mechanism provides an easy way to
maintain the identity of the various streams which are merged into
a single switched path.

One useful application of this technique is in Virtual Private
Networks. The packets can be switched both at the ingress and
egress nodes of the provider network. A packet coming in at one
end of a customer network contains an encapsulated header with the
VPN label. At the VPN ingress node, the header is "popped" to
provide the label for switching through the VPN. Further, this
header is then "pushed" with an encapsulation of the far end
customer label. At the VPN egress node, the packet header is
"popped" again, and the new header provides the label for
switching through the customer site. This enables one to provide
customers with the benefits of a VPN with end-to-end switching for
optimal performance.

Another interesting use can be in conjunction with RSVP flows. In
RSVP, sender flows can be logically merged under a single resource
reservation using the Shared and Wildcard filters. The stacking
mechanism can be used to merge flows into a single label, and the
shared QoS can be applied to the single label on top of the stack.
Since sender flows within the merged switched path maintain their
identity, it is easy to demerge them at a downstream node without
requiring L3 processing of the packets. Another similar
application can be the merging of several premium service flows
with similar QoS into a single switched path. This helps in
conserving labels in the backbone of a large network.

Yet another useful application can be DVMRP tunnels similar in
concept to the DVMRP tunnels used in the existing Mbone. The
ingress node to the DVMRP switched tunnel encapsulates the label
learned from the egress node of the DVMRP tunnel for a particular
(S,G) pair before forwarding packets into the DVMRP tunnel. The
egress node of the tunnel just pops the top label and switches the
packet based on the interior label.

Note that the use of tunnels can also be quite beneficial in a
non-hierarchical environment. Take for example the case where a
domain contains a subset of MPLS nodes. The MPLS egress can
advertise labels for the routes which are within the domain, but
are external to the MPLS core. The ingress node can encapsulate
packets for these destinations within the header for the
aggregated switched path that crosses the MPLS domain.

It is not evident whether this technique has any useful
application in a flat routing domain, but it can be used in
conjunction with explicit routing when providing specialized
services. The multiple levels of encapsulation can also be used
like loose source routing.

4.5.4.2 Flat Labels in a Hierarchical Routing Environment

It is also possible in some environments to use a single level of
label in a network using hierarchical routing.
This is for example possible in the case of a two level OSPF
network in which the primary purpose of the network is to support
external routes. Specifically (depending upon the type of area
hierarchy used), OSPF allows external routes to be advertised
throughout an OSPF routing domain, with each external route
associated with the routerID of the router with reachability to
the specific route. This implies that it is possible to set up an
LSP to every router in the routing domain, and then use the LSP
for packets destined to the associated external routes.

4.5.4.3 Configuration of the Hierarchy

The possibility of having a variety of different relationships
between the routing hierarchy and the MPLS hierarchy leads to an
obvious question: How is the relationship between the two
hierarchies to be determined? At first glance it would seem that
this generality leads to a relatively complex configuration issue,
and it could be difficult to ensure consistent configuration of
the network.

One possible solution is to have the MPLS hierarchy default to
using the same hierarchy structure as is used for routing, with
each area and domain boundary (as used by routing) also implying
an MPLS domain boundary. This would allow the normal default
operation to conform to the type of operation that we might expect
to be used in most situations, and would allow a common means of
interoperation which we would expect all vendors of MPLS compliant
equipment to support.

4.5.5 Some Advantages of Hierarchical MPLS

The use of hierarchical MPLS allows the routers internal to a
transit routing domain to be isolated from the BGP-level routing
information. In our example network, routers R4, R5, R6, and R7
can forward packets based solely on the lower level label. These
internal routers do not need to know anything at all about higher
level IP routing. Note that this advantage is not available in
conventional IP forwarding: If the internal routers within a
routing domain forward IP packets based on the destination IP
address, then the internal routers need to know which route to use
for any particular destination IP address. By combining
hierarchical routing with label stacks, MPLS is able to decouple
the exterior and interior protocols. MPLS switches within a domain
(interior switches) need only carry the reachability information
for nodes in the domain. The MPLS border switches for the domain
still, of course, carry the external routes.

Use of hierarchical MPLS also extends the simpler forwarding
offered by MPLS to domain boundary routers.

MPLS places no bound on the number of labels that may be present
in a label stack. In principle this means that MPLS can support
multiple levels of routing hierarchy.

4.6 Interoperation of MPLS Systems with "Conventional" ATM

If we consider the implementation of MPLS on ATM switches we can
imagine several possibilities.

We might remove the ATM Forum control plane completely. This is
the approach taken by Ipsilon in their IP Switching approach, and
allows ATM switches to operate as MPLS LSRs.
Alternately, we could build a system that supports a "Ships in the
night" (SIN) mode of operation, where the ATM Forum and MPLS
control planes both run on the same hardware but are isolated from
each other, ie, they do not interact. This allows a single device
to simultaneously operate as both an MPLS LSR and an ATM switch.

We feel that the MPLS architecture should allow both of these
models. We note, however, that neither of them addresses the issue
of operation of MPLS over a public ATM network, ie, over a network
that supports tariffed access to PVCs and ATM Forum SVCs. Because
public ATM service exists and will, presumably, become more
pervasive in the future, we feel that another model needs to be
included in the architecture and be supported by MPLS. We call
this model the "integrated" model. In essence it is the same as
the SIN model but without the restriction that the two control
planes are isolated. In the integrated model the MPLS control
plane is able to use the ATM control plane to set up SVCs as
needed. An example of this integrated model that allows the
coexistence and interoperation of ATM and MPLS is the CSR proposal
from Toshiba.

Note that there is a distinction relevant to the protocol
specification process between the SIN and the integrated approach.
SIN does not require specification other than to require that it
be transparent to both the MPLS and ATM control planes (ie,
neither should know of the other's existence). Realization of SIN
on a particular machine is purely an engineering challenge for the
implementors. The integrated model, on the other hand, requires
specification of procedures for the use of SVCs and the
association of labels with them.

4.7 Multicast

This section is FFS.

4.8 Multipath

Many IP routing protocols support the notion of equal-cost
multipath routes, in which a router maintains multiple next hops
for one destination prefix when two or more equal-cost paths to
the prefix exist. There are a few possible approaches for handling
multipath with MPLS.

In this discussion we will use the term "multipath node" to mean a
node which is keeping track of multiple switched paths from itself
for a single destination.

The first approach maintains a separate switched path from each
ingress node via one or more multipath nodes to a merge point.
This requires MPLS to distinguish the separate switched paths, so
that learning of a new switched path is not misinterpreted as a
replacement of the same switched path. This also requires that an
ingress MPLS node be capable of distributing the traffic among the
multiple switched paths. This approach preserves switching
performance, but at the cost of proliferating the number of
switched paths. For example, each switched path consumes a
distinct label.

The second approach establishes only one switched path from any
one ingress node to a destination. However, when the paths from
two different ingress nodes happen to arrive at the same node,
that node may use different paths for each (implying that the node
becomes a multipath node). Thus the multipath node may assign a
different downstream path to each incoming stream.
This conserves switched paths and maintains switching performance,
but cannot balance loads across downstream links as well as the
other approaches, even if switched paths are selectively assigned.
An issue with this approach is that the L2 path may be different
from the normal L3 path, as traffic that otherwise would have
taken multiple distinct paths is forced onto a single path.

The third approach allows a single stream arriving at a multipath
node to be split into multiple streams, by using L3 forwarding at
the multipath node. For example, the multipath node might choose
to use a hash function on the source and destination IP addresses,
in order to avoid misordering packets between any one IP source
and destination. This approach conserves switched paths at the
cost of switching performance.
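A minimal sketch of such a hash-based split follows (Python). The
choice of CRC32 as the hash and the three named paths are
illustrative assumptions, not a recommendation of any particular
function or configuration.

    import zlib

    # Hypothetical multipath node with three equal-cost
    # downstream paths.
    downstream_paths = ["path-A", "path-B", "path-C"]

    def select_path(src_ip: str, dst_ip: str) -> str:
        """Pick a downstream path for a packet.  Hashing on the
        (source, destination) pair keeps all packets of any one
        IP source/destination pair on one path, avoiding
        misordering within that pair while still spreading the
        aggregate load."""
        key = (src_ip + "|" + dst_ip).encode()
        return downstream_paths[zlib.crc32(key) % len(downstream_paths)]

    # Example: packets of the same pair always take the same path.
    assert select_path("10.0.0.1", "192.0.2.9") == \
           select_path("10.0.0.1", "192.0.2.9")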
4.9 Host Interactions

There are a range of options for host interaction with MPLS:

The most straightforward approach is no host involvement. Host
operation may then be completely independent of MPLS, with hosts
operating according to other IP standards. If there is no host
involvement, then this implies that the first hop requires an L3
lookup.

If the host is ATM attached and doing NHRP, then this would allow
the host to set up a Virtual Circuit to a router. However this
brings up a range of issues, as was discussed in section 4.4
("Interoperation with NHRP").

On the ingress side, it is reasonable to consider having the first
hop LSR provide labels to the hosts, and thus have hosts attach
labels to packets that they transmit. This could allow the first
hop LSR to avoid an L3 lookup. It is reasonable here to have the
host request labels only when needed, rather than require the host
to remember all labels assigned for use in the network.

On the egress side, it is questionable whether hosts should be
involved. For scaling reasons, it would be undesirable to use a
different label for reaching each host.

4.10 Explicit Routing

There are two options for route selection: (1) hop by hop routing,
and (2) explicit routing.

An explicitly routed LSP is an LSP where, at a given LSR, the LSP
next hop is not chosen by each local node, but rather is chosen by
a single node (usually the ingress or egress node of the LSP). The
sequence of LSRs followed by an explicitly routed LSP may be
chosen by configuration, or by an algorithm performed by a single
node (for example, the egress node may make use of the topological
information learned from a link state database in order to compute
the entire path for the tree ending at that egress node).

With MPLS the explicit route needs to be specified at the time
that labels are assigned, but the explicit route does not have to
be specified with each L3 packet. This implies that explicit
routing with MPLS is relatively efficient (when compared with the
efficiency of explicit routing for pure datagrams).

Explicit routing may be useful for a number of purposes such as
allowing policy routing and/or facilitating traffic engineering.

4.10.1 Establishment of Point to Point Explicitly Routed LSPs

In order to establish a point to point explicitly routed LSP, the
signalling messages used to set up the LSP must contain the
explicit route. This implies that the LSP is set up in order,
either from the ingress to the egress or from the egress to the
ingress.

One node needs to pick the explicit route. This may be done in at
least two possible ways: (i) by configuration (eg, the explicit
route may be chosen by an operator, or by a centralized server of
some kind); (ii) by use of a routing protocol which allows the
ingress and/or egress node to know the entire route to be
followed. This would imply the use of a link state routing
protocol (in which all nodes know the full topology) or of a path
vector routing protocol (in which the ingress node is told the
path as part of the normal operation of the routing protocol).

Note: The normal operation of path vector routing protocols (such
as BGP) does not provide the full set of routers along the path.
This implies that either a partial source route only would be
provided (implying that LSP setup would use a combination of hop
by hop and explicit routing), or it would be necessary to augment
the protocol in order to provide the complete explicit route.

In the point to point case, it is relatively straightforward to
specify the route to use: This is indicated by providing the
addresses of each LSR on the LSP.

4.10.2 Explicit and Hop by Hop Routing: Avoiding Loops

In general, an LSP will be explicitly routed specifically because
there is a good reason to use an alternative to the hop by hop
routed path. This implies that the explicit route is likely to
follow a path which is inconsistent with the path followed by hop
by hop routing. If some of the nodes along the path follow an
explicit route but some of the nodes make use of hop by hop
routing (and ignore the explicit route), then inconsistent routing
may result and in some cases loops (or severely inefficient paths)
may form. For any one LSP, there are three possible options: (i)
the entire LSP may be hop by hop routed; (ii) the entire LSP may
be explicitly routed; or (iii) the LSP may consist of both hop by
hop and explicitly routed segments, provided that the LSP is
established using ordered control.

For this reason, it is important that if a strict explicit route
is specified for setting up an LSP, then that route must be
followed in setting up the LSP.

There is a related issue when a link or node in the middle of an
explicitly routed LSP breaks: In this case, the last operating
node on the upstream part of the LSP will continue receiving
packets, but will not be able to forward them along the explicitly
routed LSP (since its next hop is no longer functioning). In this
case, it is not in general safe for this node to forward the
packets using L3 forwarding with hop by hop routing. Instead, the
packets must be discarded, and the upstream portion of the
explicitly routed LSP must be torn down.

Where part of an explicitly routed LSP breaks, the node which
originated the LSP needs to be told about this. For robustness
reasons the MPLS protocol design should not assume that the
routing protocol will tell the node which originated the LSP. For
example, it is possible that a link may go down and come back up
quickly enough that the routing protocol never declares the link
down. Rather, an explicit MPLS mechanism is needed.
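To illustrate how a strict explicit route might be enforced during
setup, the following sketch (Python) has each LSR consume the
front of the hop list and refuse the setup rather than deviate
from the specified route. The message fields and the send_to and
refuse helpers are hypothetical; no actual signalling object
layout is implied.

    # Hypothetical strict explicit-route processing at one LSR.
    # An explicit route is carried as the ordered list of
    # remaining LSR addresses, ending at the far endpoint.

    def process_setup(my_address, explicit_route, send_to, refuse):
        """Consume this LSR's entry in the explicit route and pass
        the setup message on to the next listed hop."""
        if not explicit_route or explicit_route[0] != my_address:
            # A strict route that does not name us must not be
            # "corrected" by hop by hop routing; refuse the setup.
            refuse("not on the specified strict route")
            return
        remaining = explicit_route[1:]
        if remaining:
            # Forward the setup to the next hop named in the route,
            # even if hop by hop routing would prefer another path.
            send_to(remaining[0], remaining)
        # else: we are the final hop; the setup completes here.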
4.10.3 Merge and Explicit Routing

Explicit routing is slightly more complex with a multipoint to
point LSP (ie, in the case that stream merge is used).

In this case, it is not possible to specify the route for the LSP
as a simple list of LSRs (since the LSP does not consist of a
simple sequence of LSRs). There are several ways that this may be
accomplished. Details are outside the scope of this document.

4.10.4 Using Explicit Routing for Traffic Engineering

In the Internet today it is relatively common for ISPs to make use
of a Frame Relay or ATM core, which interconnects a number of IP
routers. The primary reason for use of a switching (L2) core is to
make use of low cost equipment which provides very high speed
forwarding. However, there is another very important reason for
the use of an L2 core: to allow for traffic engineering.

Traffic engineering (also known as bandwidth management) refers to
the process of managing the routes followed by user data traffic
in a network in order to provide relatively equal and efficient
loading of the resources in the network (ie, to ensure that the
load on links and nodes is within the capabilities of those links
and nodes).

Some rudimentary level of traffic engineering can be accomplished
with pure datagram routing and forwarding by adjusting the metrics
assigned to links. For example, suppose that there is a given link
in a network which tends to be overloaded on a long term basis.
One option would be to manually configure an increased metric
value for this link, in the hope of moving some traffic onto
alternate routes. This is a rather crude method of traffic
engineering and provides only limited results.

Another method of traffic engineering is to manually configure
multiple PVCs across an L2 core, and to adjust the route followed
by each PVC in an attempt to equalize the load on different parts
of the network. Where necessary, multiple PVCs may be configured
between the same two nodes, in order to allow traffic to be split
between different paths. In some topologies it is much easier to
achieve efficient non-overlapping or minimally-overlapping paths
via this method (with manually configured paths) than it would be
with pure datagram forwarding. A similar ability can be achieved
with MPLS via manual configuration of the paths taken by LSPs.

A related issue is the decision on where merge is to occur. Note
that once two streams merge into one stream (forwarded by a single
label) they cannot diverge again at that level of the MPLS
hierarchy (ie, they cannot be bifurcated without looking at a
higher level label or the IP header). Thus there may be times when
it is desirable to explicitly NOT merge two streams even though
they are to the same egress node and FEC. Non-merge may be
appropriate either because the streams will want to diverge later
in the path (for example, to avoid overloading a particular
downstream link), or because the streams may want to use different
physical links in the case where multiple slower physical links
are being aggregated into a single logical link for the purpose of
IP routing.
As a network grows to a very large size (on the order of hundreds
of LSRs), it becomes increasingly difficult to handle the
assignment of all routes via manual configuration. However,
explicit routing allows several alternatives:

 1. Partial Configuration: One option is to use
    automatic/dynamic routing for most of the paths through the
    network, but then manually configure some routes. For
    example, suppose that full dynamic routing would result in a
    particular link being overloaded. One of the LSPs which uses
    that link could be selected and manually routed to use a
    different path.

 2. Central Computation: One option would be to provide long
    term network usage information to a single central
    management facility. That facility could then run a global
    optimization to compute a set of paths to use. Network
    management commands can be used to configure LSRs with the
    correct routes to use.

 3. Egress Computation: An egress node can run a computation
    which optimizes the path followed for traffic to itself.
    This cannot of course optimize the entire traffic load
    through the network, but can include optimization of traffic
    from multiple ingresses to one egress. The reason for
    optimizing traffic to a single egress, rather than from a
    single ingress, relates to the issue of when to merge: An
    ingress can never merge the traffic from itself to different
    egresses, but an egress can, if desired, choose to merge the
    traffic from multiple ingresses to itself.

4.11 TTL and Traceroute

Traceroute is a useful method which is widely used for management
of IP networks. It is therefore highly desirable for traceroute
and TTL to be preserved in networks where MPLS is used. TTL can
also be useful to minimize the impact of loops (ie, as an aid to
loop survival).

In cases where the MPLS shim header is used, and where the IP
packets are normal Internet packets (ie, not part of a VPN), TTL
can optionally be handled in a way which is semantically identical
to operation in native IP networks. The ingress node, when
encapsulating an IP packet in the MPLS shim header, copies the TTL
from the IP header to the MPLS shim header. LSRs decrement the
TTL, and behave as normal IP routers in the case that the TTL
reaches zero (ie, discard the IP packet and return an ICMP error
report). Egress routers copy the TTL from the MPLS shim header
back to the IP header.

Where multiple MPLS shim headers are used in a label stack, TTL
can be handled in essentially the same manner. When an LSR pushes
a new header onto the stack, the TTL is copied from the previous
shim header to the new header. When an LSR pops a header off of
the stack, the TTL is copied in the other direction.
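A minimal sketch of this TTL handling follows (Python). The packet
and stack representations are invented for illustration and do not
model the actual shim encoding of [ENCAP]; the label values are
made up.

    # Illustrative TTL handling with a label stack modeled as a
    # list of {"label": ..., "ttl": ...} entries, top of stack
    # first, and an IP packet modeled as a dict with a "ttl" key.

    def ingress_push(ip_packet):
        """Ingress: copy the IP TTL into the new shim header."""
        return [{"label": 17, "ttl": ip_packet["ttl"]}], ip_packet

    def lsr_forward(stack, ip_packet):
        """Transit LSR: decrement the shim TTL, not the IP TTL."""
        stack[0]["ttl"] -= 1
        if stack[0]["ttl"] == 0:
            raise ValueError("TTL expired: discard, send ICMP error")
        return stack, ip_packet

    def push_level(stack):
        """Push: the new entry inherits the current shim TTL."""
        return [{"label": 42, "ttl": stack[0]["ttl"]}] + stack

    def egress_pop(stack, ip_packet):
        """Egress: copy the shim TTL back into the IP header."""
        ip_packet["ttl"] = stack[0]["ttl"]
        return stack[1:], ip_packet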
Some carriers may choose to avoid exposing the topology (or even
the diameter) of their networks to customers. One way to do this
is to treat an entire LSP crossing the carrier network as a single
hop from the point of view of IP forwarding. In this case the
ingress router places a value in the TTL field of the shim header
which is independent of the TTL value found in the IP header.
Similarly the decapsulating router strips off the MPLS header and
forwards based on the IP header, but does not copy TTL values.
Routers which are in the middle of the LSP (neither ingress nor
egress) decrement the TTL contained in the MPLS shim header, but
do not return an error report when the TTL expires.

There is a problem with the handling of ICMP error reports when
VPNs are supported using MPLS. In this case, the IP address space
used in the IP packet (carried over the LSP) might be local to the
VPN, and therefore might not be understood by the LSR which
detects that the TTL has reached zero. In addition, core LSRs
might not necessarily know which LSPs are supporting VPN traffic
and which are supporting Internet traffic. For this reason, in
networks where VPNs are supported over MPLS, special precautions
are needed. If the ingress node knows the path of the LSP, then it
may discard the packet and return an ICMP error report (into the
VPN's address space) if the TTL is less than the length of the
LSP. Alternatively, the TTL value used in the MPLS header may be
independent of the TTL value in the IP header, and the entire LSP
may be treated as a single hop from the perspective of datagram IP
forwarding. Alternatively, ICMP error reports could be turned off
in such networks.

One other potential solution to the ICMP error reporting problem
is to use "bidirectional" LSPs. In this case, two LSPs may be
created with the same endpoints, but which carry packets in
opposite directions. These two LSPs are logically coupled
together; that is, one LSP carries traffic from an originating
node to a destination node, while the other carries traffic from
the destination node to the originating node [TRAFENG]. When a
packet has to be discarded that had been flowing on the LSP in one
direction, the error report can be returned on the matching LSP in
the other direction. This is true even when the IP address space
encapsulated inside the LSP is one which the LSR does not
otherwise understand.

MPLS may also be used over L2 technologies which do not have TTL
values (specifically ATM and Frame Relay). In this case, TTL and
traceroute may still be supported in some specific situations.

In our discussion we will assume that the MPLS encapsulation for
operation of MPLS over ATM and Frame Relay media always uses a
shim header. Thus the packet would consist of an IP packet
encapsulated inside an MPLS shim header, which would in turn be
encapsulated for transmission over ATM or Frame Relay (eg, the IP
packet and MPLS shim header may be encapsulated in an AAL5 frame,
which would in turn be encapsulated inside ATM cells). If the shim
header is not used, then the manipulations of the TTL in the shim
header as described below would be replaced by manipulations of
the TTL inside the IP header.

The most straightforward case is one where ATM or Frame Relay is
used for the entire path of the LSP, and where the ingress LSR
knows the entire path of the LSP (for example, this may occur when
the LSP is set up based on complete source routing). In this case
the ingress router decrements the TTL by the length of the LSP. If
the TTL reaches zero or a negative number, then the IP packet is
discarded and an ICMP error report is returned by the ingress
router, but with a source address which indicates the node at
which the TTL would have expired.
In this case, in principle, the TTL which is decremented could be
either the one in the IP header or the one in the MPLS header.
However, it allows more uniform operation (compared to other
situations) if the TTL in the shim header is decremented by the
ingress router by the length of the path, and the egress router
then copies the TTL from the MPLS header into the IP packet.

In some cases the length of the LSP might be known, but not the
exact identity of the LSRs along the path (eg, the LSP is set up
via ordered control). In this case the TTL can be decremented as
above, but if the TTL would expire the packet could be forwarded
by some "out of band" (control processor to control processor)
path in order to get the packet to the LSR at which the TTL will
reach zero.

There may be cases where part of the LSP traverses ATM or Frame
Relay links (using an ATM or Frame Relay header), and part
traverses other media (using the shim header).

Some of the issues which come up in this situation are best
illustrated through use of an example. Suppose that in the
following figure an LSP goes from R1 to R8. Thus R1 is the ingress
LSR, and R8 is the egress LSR for this particular LSP. LSRs R3 and
R6 have both ATM interfaces and non-ATM interfaces. Thus the MPLS
shim header is used on the links from R1 to R2 and from R2 to R3.
ATM is used on the links from R3 to R4, R4 to R5, and R5 to R6.
Finally, the shim header is again used on the links from R6 to R7
and R7 to R8.

   ...............................................
   .               .             .               .
   .               .             .               .
   .R1------R2------R3         R6------R7------R8.
   .               .  \       /  .               .
   .               .   R4---R5   .               .
   .               .             .               .
   .  Shim Header  .     ATM     .  Shim Header  .
   ...............................................

          LSP Spanning ATM and Shim Header Media

If egress-initiated ordered control is used, then it is possible
that when the LSP is first set up the signalling protocol could
keep track of the number of hops to the next LSR that will use a
shim header (and which therefore understands TTL). In our example
R3 could therefore know that it is three hops to R6 (which is the
next router which will use a shim header containing a TTL value).
R3 can therefore decrement the TTL by the appropriate value (3),
and return an error report if the TTL will expire.

If ingress-initiated ordered control or independent control is
used, then it is not clear how R3 will know the identity of the
next LSR which understands TTL (ie, which will use a shim header
instead of an ATM or Frame Relay header). For example, suppose
that complete explicit routing with ingress control is used. In
this case R3 will know the complete path to the egress (R8), but
will not know which downstream links use ATM media and which use
the shim header. Thus R3 will know that R6 is a downstream LSR for
this LSP, but will not know that R6 is the specific LSR which
removes the packet from the ATM media.

R6 will forward the packet based on the incoming label implicit in
the VPI/VCI from the ATM media, plus the existing shim header.
Thus the TTL used at this point will be based on that received in
the shim header.
This implies that the TTL value in the shim header needs to be
valid, which in turn implies that R3 needs to adjust the TTL value
in the shim header to account for the length of the path from R3
to R6.

4.12 LSP Control: Ordered versus Independent

There is a choice to be made regarding whether the initial setup
of LSPs will be done in an ordered mode, where the assignment of
LSP labels is initiated by the egress node, or independently by
each individual node.

When LSP control is done independently, each node may at any time
pass label bindings to its neighbors for each FEC recognized by
that node. In the normal case that the neighboring nodes recognize
the same FECs, nodes may map incoming labels to outgoing labels as
part of the normal label swapping forwarding method.

When LSP control is done in an ordered manner, the egress node
passes label bindings to its neighbors corresponding to any FECs
which leave the MPLS network at that egress node. Other nodes must
wait until they get a label from downstream for a particular FEC
before passing a corresponding label for the same FEC to upstream
nodes.

With independent control, since each LSR is independently
assigning labels to FECs, it is possible that different LSRs may
make inconsistent decisions. For example, an upstream LSR may make
a coarse decision (map multiple IP address prefixes to a single
label) while its downstream neighbor makes a finer grain decision
(map each individual IP address prefix to a separate label). With
downstream label assignment this can be corrected by having LSRs
withdraw the labels that they have assigned which are inconsistent
with downstream labels, and replace them with new consistent label
assignments.

This may appear to be an advantage of ordered LSP control (since
with egress control the initial label assignments "bubble up" from
the egress to upstream nodes, and consistency is therefore easy to
ensure). However, even with ordered control it is possible that
the choice of egress node may change, or the egress may (based on
a change in configuration) change its mind in terms of the
granularity which is to be used. This implies that the same
mechanism will be necessary to allow changes in granularity to
bubble up to upstream nodes. The choice of ordered or independent
control may therefore affect the frequency with which this
mechanism is used, but will not affect the need for a mechanism to
achieve consistency of label granularity.

Ordered control and independent control can interwork in a very
straightforward manner: With either approach (assuming downstream
label assignment), the egress node will initially assign labels
for particular FECs and will pass these labels to its neighbors.
With either approach these label assignments will bubble upstream,
with the upstream nodes choosing labels that are consistent with
the labels that they receive from downstream.
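The ordered, egress-initiated case can be sketched as follows
(Python). The node structure and helper names are invented for
illustration, and real label distribution is of course
asynchronous and message-based rather than a direct procedure
call.

    # Illustrative sketch of ordered (egress-initiated) control.

    def distribute_labels(node, fec):
        """Advertise a label for `fec` upstream, but only once a
        downstream label is known (or this node is the egress)."""
        if node.is_egress_for(fec):
            node.label_out[fec] = None    # FEC leaves MPLS here
        elif fec not in node.label_out:
            return                        # wait for downstream first
        # Safe to hand a binding to each upstream neighbor; in this
        # way labels "bubble up" from the egress to the ingresses.
        for upstream in node.upstream_neighbors(fec):
            label = node.allocate_label(fec)
            node.label_in[(upstream, fec)] = label
            upstream.receive_binding(fec, label, downstream=node)

    # Under independent control, by contrast, a node may advertise
    # a binding for a FEC at any time, and later withdraw it if it
    # proves inconsistent with the label received from downstream.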
The difference between the two techniques therefore becomes a
tradeoff between avoiding a short period of initial thrashing on
startup (in the sense of avoiding the need to withdraw
inconsistent labels which may have been assigned using local
control) versus the imposition of a short delay on initial startup
(while waiting for the initial label assignments to bubble up from
downstream). The protocol mechanisms which need to be defined are
the same in either case, and the steady state operation is the
same in either case.

5. Security

Security in a network using MPLS should be relatively similar to
security in a normal IP network.

Routing in an MPLS network uses precisely the same IP routing
protocols as are currently used with IP. This implies that route
filtering is unchanged from current operation. Similarly, the
security of the routing protocols is not affected by the use of
MPLS.

Packet filtering also may be done as in normal IP. This will
require either (i) that label swapping be terminated prior to any
firewalls performing packet filtering (in which case a separate
instance of label swapping may optionally be started after the
firewall); or (ii) that firewalls "look past the labels", in order
to inspect the entire IP packet contents. In this latter case note
that the label may imply semantics greater than that contained in
the packet header: In particular, a particular label value may
imply that the packet is to take a particular path after the
firewall. In environments in which this is considered to be a
security issue it may be desirable to terminate the label prior to
the firewall.

Note that in principle labels could be used to speed up the
operation of firewalls: In particular, the label could be used as
an index into a table which indicates the characteristics that the
packet needs to have in order to pass through the firewall.
Depending upon implementation considerations, matching the
contents of the packet to the contents of the table may be quicker
than parsing the packet in the absence of the label.

References

[ARCH]       "Multiprotocol Label Switching Architecture", E.
             Rosen, A. Viswanathan, R. Callon, work in progress,
             April 1999.

[ARIS]       "ARIS: Aggregate Route-Based IP Switching", A.
             Viswanathan, N. Feldman, R. Boivie, R. Woundy, IBM
             Technical Report TR 29.2353, February 1998.

[ARIS-PROT]  "ARIS Protocol Specification", N. Feldman, A.
             Viswanathan, IBM Technical Report TR 29.2368, March
             1998.

[ATM]        "MPLS using LDP and ATM VC Switching", Davie, Doolan,
             Lawrence, McGloghrie, Rekhter, Rosen, Swallow, work
             in progress, April 1999.

[ATMVP]      "MPLS using ATM VP Switching", N. Feldman, B.
             Jamoussi, S. Komandur, A. Viswanathan, T. Worster,
             work in progress, February 1999.

[CR-LDP]     "Constraint-Based LSP Setup using LDP", Jamoussi, et
             al., work in progress, February 1999.

[ENCAP]      "MPLS Label Stack Encoding", Rosen, Rekhter, Tappan,
             Farinacci, Fedorkow, Li, Conta, work in progress,
             April 1999.

[FANP]       "Internetworking Based on Cell Switch Router -
             Architecture and Protocol Overview", Y. Katsube, K.
             Nagami, S. Matsuzawa, H. Esaki, Proceedings of the
             IEEE, Vol. 85, No. 12, December 1997.
[FR]         "Use of Label Switching on Frame Relay Networks", A.
             Conta, P. Doolan, A. Malis, work in progress,
             November 1998.

[IPNAV]      "IP Switching for Scalable IP Services", H. Ahmed, R.
             Callon, A. Malis, J. Moy, Proceedings of the IEEE,
             Vol. 85, No. 12, December 1997.

[LDP]        "LDP Specification", L. Anderson, P. Doolan, N.
             Feldman, A. Fredette, B. Thomas, work in progress,
             May 1999.

[LOOP-COLOR] "MPLS Loop Prevention Mechanism", Y. Ohba, Y.
             Katsube, E. Rosen, P. Doolan, work in progress, May
             1999.

[NHRP]       "NBMA Next Hop Resolution Protocol (NHRP)", Luciani,
             Katz, Piscitello, Cole, work in progress, draft-ietf-
             rolc-nhrp-12.txt, March 1998.

[PNNI]       "ATM Forum Private Network-Network Interface
             Specification, Version 1.0", ATM Forum af-pnni-
             0055.000, March 1996.

[RFC1583]    "OSPF Version 2", J. Moy, RFC 1583, March 1994.

[RFC1633]    "Integrated Services in the Internet Architecture: an
             Overview", R. Braden et al., RFC 1633, June 1994.

[RFC1771]    "A Border Gateway Protocol 4 (BGP-4)", Y. Rekhter and
             T. Li, RFC 1771, March 1995.

[RFC1953]    "Ipsilon Flow Management Protocol Specification for
             IPv4 Version 1.0", P. Newman et al., RFC 1953, May
             1996.

[RFC2098]    "Toshiba's Router Architecture Extensions for ATM:
             Overview", Katsube, Nagami, Esaki, RFC 2098, February
             1997.

[RFC2105]    "Cisco Systems' Tag Switching Architecture Overview",
             Rekhter, Davie, Katz, Rosen, Swallow, RFC 2105,
             February 1997.

[RSVP]       "Resource ReSerVation Protocol (RSVP), Version 1
             Functional Specification", work in progress, draft-
             ietf-rsvp-spec-16.txt, June 1997.

[RSVP-LSP]   "Extensions to RSVP for LSP Tunnels", D. Awduche, L.
             Berger, D. Gan, T. Li, G. Swallow, V. Srinivasan,
             work in progress, March 1999.

[TRAFENG]    "Requirements for Traffic Engineering Over MPLS", D.
             Awduche, J. Malcolm, J. Agogbua, M. O'Dell, J.
             McManus, work in progress, October 1998.

Authors' Addresses

Ross Callon
IronBridge Networks
55 Hayden Avenue
Lexington, MA 02173
781-402-8017
rcallon@ironbridgenetworks.com

Paul Doolan
Ennovate Networks
330 Codman Hill Road
Boxborough, MA
978-263-2002 x103
pdoolan@ennovatenetworks.com

Nancy Feldman
IBM
30 Saw Mill River Rd.
Hawthorne, NY 10532
914-784-3254
nkf@us.ibm.com

Andre Fredette
Nortel Networks
3 Federal Street
Billerica, MA 01821
978-288-8524
fredette@nortelnetworks.com

George Swallow
Cisco Systems, Inc.
250 Apollo Drive
Chelmsford, MA 01824
508-244-8143
swallow@cisco.com

Arun Viswanathan
Lucent Technologies
101 Crawford Corner Rd., #4D-537
Holmdel, NJ 07733
732-332-5163
arunv@dnrc.bell-labs.com