Network Working Group                                          R. Callon
INTERNET DRAFT                                    Cascade Communications
                                                               P. Doolan
                                                           Cisco Systems
                                                              N. Feldman
                                                               IBM Corp.
                                                             A. Fredette
                                                            Bay Networks
                                                              G. Swallow
                                                           Cisco Systems
                                                          A. Viswanathan
                                                               IBM Corp.
                                                            May 12, 1997
                                                    Expires Nov 12, 1997

             A Framework for Multiprotocol Label Switching

Status of this Memo

   This document is an Internet-Draft. Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups. Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as ``work in progress.''

   To learn the current status of any Internet-Draft, please check the
   ``1id-abstracts.txt'' listing contained in the Internet-Drafts
   Shadow Directories on ds.internic.net (US East Coast), nic.nordu.net
   (Europe), ftp.isi.edu (US West Coast), or munnari.oz.au (Pacific
   Rim). Distribution of this memo is unlimited.
Abstract

   This document discusses technical issues and requirements for the
   Multiprotocol Label Switching working group. This is an initial
   draft document, which will evolve and expand over time. It is the
   intent of this document to produce a coherent description of all
   significant approaches which were and are being considered by the
   working group. Selection of specific approaches, making choices
   regarding engineering tradeoffs, and detailed protocol
   specification, are outside the scope of this framework document.

   Note that this document is at an early stage, and that most of the
   detailed technical discussion is only in rough form. Additional
   text will be provided over time from a number of sources.

Acknowledgments

   The ideas and text in this document have been collected from a
   number of sources and comments received. We would like to thank Jim
   Luciani, Andy Malis, Yakov Rekhter, Eric Rosen, and Vijay
   Srinivasan for their inputs and ideas.

1. Introduction and Requirements

1.1 Overview of MPLS

   The primary goal of the MPLS working group is to standardize a base
   technology that integrates the label swapping forwarding paradigm
   with network layer routing. This base technology (label swapping)
   is expected to improve the price/performance of network layer
   routing, improve the scalability of the network layer, and provide
   greater flexibility in the delivery of (new) routing services (by
   allowing new routing services to be added without a change to the
   forwarding paradigm).

   The initial MPLS effort will be focused on IPv4 and IPv6. However,
   the core technology will be extendible to multiple network layer
   protocols (e.g., IPX, AppleTalk, DECnet, CLNP). MPLS is not
   confined to any specific link layer technology; it can work with
   any media over which network layer packets can be passed between
   network layer entities.
   MPLS makes use of a routing approach whereby the normal mode of
   operation is that L3 routing (e.g., existing IP routing protocols
   and/or new IP routing protocols) is used by all nodes to determine
   the routed path.

   MPLS provides a simple "core" set of mechanisms which can be
   applied in several ways to provide a rich functionality. The core
   effort includes:

   a) Semantics assigned to a stream label:

      - Labels are associated with specific streams of data.

   b) Forwarding methods:

      - Forwarding is simplified by the use of short fixed-length
        labels to identify streams.

      - Forwarding may require simple functions such as looking up a
        label in a table, swapping labels, and possibly decrementing
        and checking a TTL.

      - In some cases MPLS may make direct use of underlying layer 2
        forwarding, such as is provided by ATM or Frame Relay
        equipment.

   c) Label distribution methods:

      - Allow nodes to determine which labels to use for specific
        streams.

      - This may use some sort of control exchange, and/or be
        piggybacked on a routing protocol.

   The MPLS working group will define the procedures and protocols
   used to assign significance to the forwarding labels and to
   distribute that information between cooperating MPLS forwarders.

1.2 Requirements

   - MPLS forwarding MUST simplify packet forwarding in order to do
     the following:

     - lower the cost of high speed forwarding

     - improve forwarding performance

   - MPLS core technologies MUST be general with respect to data link
     technologies (i.e., work over a very wide range of underlying
     data links). Specific optimizations for particular media MAY be
     considered.

   - MPLS core technologies MUST be compatible with a wide range of
     routing protocols, and MUST be capable of operating independently
     of the underlying routing protocols.
     It has been observed that considerable optimizations can be
     achieved in some cases by small enhancements of existing
     protocols. Such enhancements MAY be considered in the case of
     IETF standard routing protocols and, if appropriate, coordinated
     with the relevant working group(s).

   - Routing protocols which are used in conjunction with MPLS might
     be based on distributed computation. As such, during routing
     transients, these protocols may compute forwarding paths which
     potentially contain loops. MPLS MUST provide protocol mechanisms
     to either prevent the formation of loops and/or contain the
     amount of (networking) resources that can be consumed due to the
     presence of loops.

   - MPLS forwarding MUST allow "aggregate forwarding" of user data;
     i.e., allow streams to be forwarded as a unit and ensure that an
     identified stream takes a single path, where a stream may consist
     of the aggregate of multiple flows of user data. MPLS SHOULD
     provide multiple levels of aggregation support (e.g., from
     individual end-to-end application flows at one extreme, to
     aggregates of all flows passing through a specified switch or
     router at the other extreme).

   - MPLS MUST support operations, administration, and maintenance
     facilities at least as extensive as those supported in current IP
     networks. Current network management and diagnostic tools SHOULD
     continue to work in order to provide some backward compatibility.
     Where such tools are broken by MPLS, hooks MUST be supplied to
     allow equivalent functionality to be created.

   - MPLS core technologies MUST work with both unicast and multicast
     streams.

   - The MPLS core specifications MUST clearly state how MPLS operates
     in a hierarchical network.

   - Scalability issues MUST be considered and analyzed during the
     definition of MPLS. Very scalable solutions MUST be sought.
   - MPLS core technologies MUST be capable of working with O(n)
     streams to switch all best-effort traffic, where n is the number
     of nodes in an MPLS domain. MPLS protocol standards MUST be
     capable of taking advantage of hardware that supports stream
     merging where appropriate. Note that O(n-squared) streams or VCs
     might also be appropriate for use in some cases.

   - The core set of MPLS standards, along with existing Internet
     standards, MUST be a self-contained solution. For example, the
     proposed solution MUST NOT require specific hardware features
     that do not commonly exist on network equipment at the time that
     the standard is complete. However, the solution MAY make use of
     additional optional hardware features (e.g., to optimize
     performance).

   - The MPLS protocol standards MUST support multipath routing and
     forwarding.

   - MPLS MUST be compatible with the IETF Integrated Services model,
     including RSVP.

   - It MUST be possible for MPLS switches to coexist with non-MPLS
     switches in the same switched network. MPLS switches SHOULD NOT
     impose additional configuration on non-MPLS switches.

   - MPLS MUST allow "ships in the night" operation with existing
     layer 2 switching protocols (e.g., ATM Forum signaling); i.e.,
     MPLS must be capable of being used in a network which is also
     simultaneously operating standard layer 2 protocols.

   - The MPLS protocol MUST support both topology-driven and
     traffic/request-driven label assignment.

1.3 Terminology

   aggregate stream

      synonym of "stream"

   DLCI

      a label used in Frame Relay networks to identify Frame Relay
      circuits

   flow

      a single instance of an application-to-application flow of data
      (as in the RSVP and IFMP use of the term "flow")

   frame merge

      stream merge, when it is applied to operation over frame-based
      media, so that the potential problem of cell interleave is not
      an issue
   label

      a short, fixed-length, physically contiguous, locally
      significant identifier which is used to identify a stream

   label information base

      the database of information containing label bindings

   label swap

      the basic forwarding operation, consisting of looking up an
      incoming label to determine the outgoing label, encapsulation,
      port, and other data handling information

   label swapping

      a forwarding paradigm allowing streamlined forwarding of data
      by using labels to identify streams of data to be forwarded

   label switched hop

      the hop between two MPLS nodes, on which forwarding is done
      using labels

   label switched path

      the path created by the concatenation of one or more label
      switched hops, allowing a packet to be forwarded by swapping
      labels from one MPLS node to another MPLS node

   layer 2

      the protocol layer under layer 3 (which therefore offers the
      services used by layer 3). Forwarding, when done by the swapping
      of short fixed-length labels, occurs at layer 2 regardless of
      whether the label being examined is an ATM VPI/VCI, a Frame
      Relay DLCI, or an MPLS label.
   layer 3

      the protocol layer at which IP and its associated routing
      protocols operate

   link layer

      synonymous with layer 2

   loop detection

      a method of dealing with loops in which loops are allowed to be
      set up, and data may be transmitted over the loop, but the loop
      is later detected and closed

   loop prevention

      a method of dealing with loops in which data is never
      transmitted over a loop

   label stack

      an ordered set of labels

   loop survival

      a method of dealing with loops in which data may be transmitted
      over a loop, but means are employed to limit the amount of
      network resources which may be consumed by the looping data

   label switching router

      an MPLS node which is capable of forwarding native L3 packets

   merge point

      the node at which multiple streams and switched paths are
      combined into a single stream sent over a single path. In the
      case that the multiple paths are not combined prior to the
      egress node, the egress node becomes the merge point.

   Mlabel

      abbreviation for MPLS label

   MPLS core standards

      the standards which describe the core MPLS technology

   MPLS domain

      a contiguous set of nodes which operate MPLS routing and
      forwarding and which are also in one routing or administrative
      domain

   MPLS edge node

      an MPLS node that connects an MPLS domain with a node which is
      outside of the domain, either because it does not run MPLS
      and/or because it is in a different domain. Note that if an LSR
      has a neighboring host which is not running MPLS, that LSR is an
      MPLS edge node.
   MPLS egress node

      an MPLS edge node in its role in handling traffic as it leaves
      an MPLS domain

   MPLS ingress node

      an MPLS edge node in its role in handling traffic as it enters
      an MPLS domain

   MPLS label

      a label placed in a short MPLS shim header, used to identify
      streams

   MPLS node

      a node which is running MPLS. An MPLS node will be aware of MPLS
      control protocols, will operate one or more L3 routing
      protocols, and will be capable of forwarding packets based on
      labels. An MPLS node may optionally also be capable of
      forwarding native L3 packets.

   MultiProtocol Label Switching

      an IETF working group and the effort associated with the working
      group

   network layer

      synonymous with layer 3

   shortcut VC

      a VC set up as a result of an NHRP query and response

   stack

      synonymous with label stack

   stream

      an aggregate of one or more flows, treated as one aggregate for
      the purpose of forwarding in L2 and/or L3 nodes (e.g., it may be
      described using a single label). In many cases a stream may be
      the aggregate of a very large number of flows. Synonymous with
      "aggregate stream".

   stream merge

      the merging of several smaller streams into a larger stream,
      such that for some or all of the path the larger stream can be
      referred to using a single label

   switched path

      synonymous with label switched path

   virtual circuit

      a circuit used by a connection-oriented layer 2 technology such
      as ATM or Frame Relay, requiring the maintenance of state
      information in layer 2 switches

   VC merge

      stream merge when it is specifically applied to VCs, so as to
      allow multiple VCs to merge into one single VC

   VP merge

      stream merge when it is applied to VPs, so as to allow multiple
      VPs to merge into one single VP. In this case the VCIs need to
      be unique.
      This allows cells from different sources to be distinguished via
      the VCI.

   VPI/VCI

      a label used in ATM networks to identify circuits

1.4 Acronyms and Abbreviations

   DLCI    Data Link Circuit Identifier
   LIB     Label Information Base
   LDP     Label Distribution Protocol
   L2      Layer 2
   L3      Layer 3
   LSR     Label Switching Router
   MPLS    MultiProtocol Label Switching
   NHC     Next Hop (NHRP) Client
   NHS     Next Hop (NHRP) Server
   VC      Virtual Circuit
   VPI     Virtual Path Identifier
   VCI     Virtual Circuit Identifier

2. Discussion of Core MPLS Components

2.1 The Basic Routing Approach

   Routing is accomplished through the use of standard L3 routing
   protocols, such as OSPF and BGP. The information maintained by the
   L3 routing protocols is then used to distribute labels to
   neighboring nodes; these labels are used in the forwarding of
   packets as described below. In the case of ATM networks, the labels
   that are distributed are VPI/VCIs, and a separate protocol (i.e.,
   PNNI) is not necessary for the establishment of VCs for IP
   forwarding.

   The topological scope of a routing protocol (i.e., the routing
   domain) and the scope of label switching MPLS-capable nodes may be
   different. For example, MPLS-knowledgeable and MPLS-ignorant nodes,
   all of which are OSPF routers, may be co-resident in an area. In
   the case that neighboring routers know MPLS, labels can be
   exchanged and used.

   Neighboring MPLS routers may use configured PVCs or PVPs to tunnel
   through non-participating ATM or FR switches.

2.2 Labels

   In addition to the single routing protocol approach discussed
   above, the other key concept in the basic MPLS approach is the use
   of short fixed-length labels to simplify user data forwarding.
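   To make the simplification concrete, the following sketch
   (illustrative only; the table contents, label values, and interface
   names are invented, not taken from any MPLS specification)
   contrasts a conventional longest-prefix-match lookup with the
   exact-match lookup that a short fixed-length label permits:

```python
# Illustrative sketch: conventional longest-prefix-match forwarding
# versus exact-match label lookup. All table contents are hypothetical.
import ipaddress

# Conventional L3 forwarding: longest-prefix match over a routing table.
routing_table = {
    ipaddress.ip_network("10.0.0.0/8"): "if0",
    ipaddress.ip_network("10.1.0.0/16"): "if1",
}

def l3_forward(dst):
    """Scan all prefixes and pick the longest one containing dst."""
    addr = ipaddress.ip_address(dst)
    matches = [n for n in routing_table if addr in n]
    best = max(matches, key=lambda n: n.prefixlen)
    return routing_table[best]

# Label forwarding: a single exact-match lookup on a short
# fixed-length label.
label_table = {17: ("if1", 42)}  # incoming label -> (port, outgoing label)

def label_forward(label):
    return label_table[label]
```

   The exact-match lookup is a single table index, which is the
   property that makes hardware forwarding cheap.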
2.2.1 Label Semantics

   It is important that the MPLS solutions are clear about what
   semantics (i.e., what knowledge of the state of the network) are
   implicit in the use of labels for forwarding user data packets or
   cells.

   At the simplest level, a label may be thought of as nothing more
   than a shorthand for the packet header, used to index the
   forwarding decision that a router would make for the packet. In
   this context, the label is nothing more than a shorthand for an
   aggregate stream of user data.

   This observation leads to one possible very simple interpretation:
   that the "meaning" of the label is a strictly local issue between
   two neighboring nodes. With this interpretation: (i) MPLS could be
   employed between any two neighboring nodes for forwarding of data
   between those nodes, even if no other nodes in the network
   participate in MPLS; (ii) when MPLS is used between more than two
   nodes, the operation between any two neighboring nodes could be
   interpreted as independent of the operation between any other pair
   of nodes. This approach has the advantage of semantic simplicity,
   and of being the closest to pure datagram forwarding. However, this
   approach (like pure datagram forwarding) has the disadvantage that
   when a packet is forwarded it is not known whether the packet is
   being forwarded into a loop, into a black hole, or towards links
   which have inadequate resources to handle the traffic flow. These
   disadvantages are unavoidable with pure datagram forwarding, but
   are optional design choices to be made when label switching is
   being used.

   There are cases where it would be desirable to have additional
   knowledge implicit in the existence of the label. For example, one
   approach to avoiding loops (see section x.x below) involves
   signaling the label distribution along a path before packets are
   forwarded on that path.
   With this approach, the fact that a node has a label to use for a
   particular IP packet would imply the knowledge that following the
   label (including label swapping at subsequent nodes) leads to a
   non-looping path which makes progress towards the destination
   (something which is usually, but not necessarily always, true when
   using pure datagram routing). This would of course require some
   sort of label distribution/setup protocol which signals along the
   path being set up before the labels are available for packet
   forwarding. However, there are also other consequences to having
   additional semantics associated with the label: specifically,
   procedures are needed to ensure that the semantics are correct. For
   example, if the fact that you have a label for a particular
   destination implies that there is a loop-free path, then when the
   path changes some procedures are required to ensure that it is
   still loop-free. Another example of semantics which could be
   implicit in a label is the identity of the higher level protocol
   type which is encoded using that label value.

   In either case, the specific value of a label to use for a stream
   is strictly a local issue; however, the decision about whether to
   use the label may be based on some global (or at least wider scope)
   knowledge that, for example, the label switched path is loop-free
   and/or has the appropriate resources.

   A similar example occurs in ATM networks: with standard ATM, a
   signaling protocol is used which both reserves resources in
   switches along the path, and ensures that the path is loop-free and
   terminates at the correct node. Thus, implicit in the fact that an
   ATM node has a VPI/VCI for forwarding a particular piece of data is
   the knowledge that the path has been set up successfully.
   Another similar example occurs with multipoint-to-point trees over
   ATM (see section xx below), where the multipoint-to-point tree uses
   a VP, and cell interleave at merge points in the tree is handled by
   giving each source on the tree a distinct VCI within the VP. In
   this case, the fact that each source has a known VPI/VCI to use
   needs to (implicitly or explicitly) imply the knowledge that the
   VCI assigned to that source is unique within the context of the VP.

   In general, labels are used to optimize how the system works, not
   to control how the system works. For example, the routing protocol
   determines the path that a packet follows. The presence or absence
   of a label assignment should not affect the path of an L3 packet.
   Note however that the use of labels may make capabilities such as
   explicit routes, loadsharing, and multipath more efficient.

2.2.2 Label Granularity

   Labels are used to create a simple forwarding paradigm. The
   essential element in assigning a label is that the device which
   will be using the label to forward packets will be forwarding all
   packets with the same label in the same way. If the packet is to be
   forwarded solely by looking at the label, then at a minimum, all
   packets with the same incoming label must be forwarded out the same
   port(s) with the same encapsulation(s), and with the same next hop
   label (if any).

   Note that a label could also mean "ignore this label and forward
   based on what is contained within," where within one might find
   another label (if a stack of labels is used) or a layer 3 packet.

   For IP unicast traffic, the granularity of a label allows various
   levels of aggregation in a Label Information Base (LIB). At one end
   of the spectrum, a label could represent a host route (i.e., the
   full 32 bits of IP address).
   If a router forwards an entire CIDR prefix in the same way, it may
   choose to use one label to represent that prefix. Similarly, if the
   router is forwarding several (otherwise unrelated) CIDR prefixes in
   the same way, it may choose to use the same label for this set of
   prefixes. For instance, all CIDR prefixes which share the same BGP
   Next Hop could be assigned the same label. Taking this to the
   limit, an egress router may choose to advertise all of its prefixes
   with the same label.

   By introducing the concept of an egress identifier, the
   distribution of labels associated with groups of CIDR prefixes can
   be simplified. For instance, an egress identifier might specify the
   BGP Next Hop, with all prefixes routed to that next hop receiving
   the label associated with that egress identifier. Another natural
   place to aggregate would be the MPLS egress router. This would work
   particularly well in conjunction with a link-state routing
   protocol, where the association between egress router and CIDR
   prefix is already distributed throughout an area.

   For IP multicast, the natural binding of a label would be to a
   multicast tree, or rather to the branch of a tree which extends
   from a particular port. Thus, for a shared tree, the label
   corresponds to the multicast group, (*,G). For (S,G) state, the
   label would correspond to the source address and the multicast
   group.

   A label can also have a granularity finer than a host route. That
   is, it could be associated with some combination of source and
   destination address or other information within the packet. This
   might, for example, be done on an administrative basis to aid in
   effecting policy. A label could also correspond to all packets
   which match a particular Integrated Services filter specification.

   Labels can also represent explicit routes. This use is semantically
   equivalent to using an IP tunnel with a complete source route.
   This is discussed in more detail in section 4.12.

2.2.3 Label Assignment

   Essential to label switching is the notion of a binding between a
   label and Network Layer routing (routes). A control component is
   responsible for creating label bindings, and then distributing the
   label binding information among label switches. Label assignment
   involves allocating a label, and then binding that label to a
   route.

   Label assignment can be driven by control traffic or by data
   traffic. This is discussed in more detail in section 3.4.

   Control traffic driven label assignment has several advantages, as
   compared to data traffic driven label assignment. For one thing, it
   minimizes the amount of additional control traffic needed to
   distribute label binding information, as label binding information
   is distributed only in response to control traffic, independent of
   data traffic. It also makes the overall scheme independent of, and
   insensitive to, the data traffic profile/pattern. Control traffic
   driven creation of label bindings improves forwarding latency, as
   labels are assigned before data traffic arrives, rather than being
   assigned as data traffic arrives. It also simplifies the overall
   system behavior, as the control plane is driven solely by control
   traffic, rather than by a mix of control and data traffic.

   There are, however, situations where data traffic driven label
   assignment is necessary. A particular case may occur with ATM
   without VP or VC merge. In this case, setting up a full mesh of VCs
   would require n-squared VCs, which may be infeasible in very large
   networks. Instead, VCs may be set up where required for forwarding
   data traffic. In this case it is generally not possible to know a
   priori how many such streams may occur.

   Label withdrawal is required with both control-driven and
   data-driven label assignment.
Label withdrawal is primarily a matter of garbage collection, that is, collecting unused labels so that they may be reassigned. Generally speaking, a label should be withdrawn when the conditions that allowed it to be assigned are no longer true. For example, if a label is imbued with extra semantics such as loop-freeness, then the label must be withdrawn when those extra semantics cease to hold.

In certain cases, notably multicast, it may be necessary to share a label space between multiple entities. If these sharing arrangements are altered by the coming and going of neighbors, then labels which are no longer controlled by an entity must be withdrawn and a new label assigned.

2.2.4 Label Stack and Forwarding Operations

The basic forwarding operation consists of looking up the incoming label to determine the outgoing label, encapsulation, port, and any additional information which may pertain to the stream, such as a particular queue or other QoS related treatment. We refer to this operation as a label swap.

When a packet first enters an MPLS domain, the packet is forwarded by normal layer 3 forwarding operations, with the exception that the outgoing encapsulation will now include a label. We refer to this operation as a label push. When a packet leaves an MPLS domain, the label is removed. We refer to this as a label pop.

In some situations, carrying a stack of labels is useful. For instance, both an IGP and a BGP label could be used to allow routers in the interior of an AS to be free of BGP information. In this scenario, the "IGP" label is used to steer the packet through the AS and the "BGP" label is used to switch between ASes.

With a label stack, the set of label operations remains the same, except that at some points one might push or pop multiple labels, or pop & swap, or swap & push.
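As a non-normative illustration, the swap, push, and pop operations described above can be sketched as follows. The table contents, label values, and function names are invented for this example and are not part of any proposed encapsulation:

```python
# Toy model of MPLS label-stack operations (illustrative only; the
# table layout and names here are assumptions, not from this document).

def swap(stack, table):
    """Replace the top label using an incoming-label -> outgoing-label table."""
    outgoing = table[stack[-1]]
    return stack[:-1] + [outgoing]

def push(stack, label):
    """Add a label on entry to an MPLS domain (or an inner tunnel)."""
    return stack + [label]

def pop(stack):
    """Remove the top label on exit from an MPLS domain."""
    return stack[:-1]

# Example: an interior LSR of an AS swaps only the "IGP" label,
# leaving the inner "BGP" label untouched.
igp_table = {17: 42}
stack = push([], 99)        # BGP label pushed at the AS ingress
stack = push(stack, 17)     # IGP label pushed on top
stack = swap(stack, igp_table)
assert stack == [99, 42]
```

The interior LSR never inspects the bottom ("BGP") label, which is what frees it from carrying BGP information.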
2.3 Encapsulation

Label-based forwarding makes use of various pieces of information, including a label or stack of labels, and possibly additional information such as a TTL field. In some cases this information may be encoded using an MPLS header; in other cases it may be encoded in L2 headers. Note that there may be multiple types of MPLS headers. For example, the header used over one media type may differ from that used over another media type. Similarly, in some cases the information that MPLS makes use of may be encoded in an ATM header. We will use the term "MPLS encapsulation" to refer to whatever form is used to encapsulate the label information and other information used for label based forwarding. The term "MPLS header" will be used where this information is carried in some sort of MPLS-specific header (i.e., when the MPLS information cannot all be carried in an L2 header). Whether there is one form of MPLS header or multiple forms is also outside the scope of this document.

The exact contents of the MPLS encapsulation are outside the scope of this document. Some fields, such as the label, are obviously needed. Some others might or might not be standardized, based on further study. An encapsulation scheme might make use of the following fields:

- label
- TTL
- class of service
- stack indicator
- next header type indicator
- checksum

It is desirable to have a very short encapsulation header. For example, a four byte encapsulation header adds to the convenience of building a hardware implementation that forwards based on the encapsulation header. But at the same time it is tricky to assign such a limited number of bits to carry the above listed information in an MPLS header. Hence careful consideration must be given to the information chosen for an MPLS header.
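As a purely hypothetical illustration of this bit-budget problem, the following sketch packs a label, class of service, stack indicator, and TTL into four bytes. The field widths chosen here (20-bit label, 3-bit COS, 1-bit stack indicator, 8-bit TTL) are assumptions for the example only; this document does not define an encoding:

```python
# Hypothetical packing of a four-byte MPLS header.  The field widths
# used here are an illustrative allocation, not one defined by this
# framework document.

def pack_header(label, cos, stack_bottom, ttl):
    """Pack the fields into a single 32-bit word (assumed layout)."""
    assert 0 <= label < 2**20 and 0 <= cos < 8
    assert stack_bottom in (0, 1) and 0 <= ttl < 256
    return (label << 12) | (cos << 9) | (stack_bottom << 8) | ttl

def unpack_header(word):
    """Recover (label, cos, stack_bottom, ttl) from the 32-bit word."""
    return (word >> 12, (word >> 9) & 0x7, (word >> 8) & 0x1, word & 0xFF)

hdr = pack_header(label=0xABCDE, cos=5, stack_bottom=1, ttl=64)
assert hdr < 2**32                      # fits in four bytes
assert unpack_header(hdr) == (0xABCDE, 5, 1, 64)
```

Note that with these (assumed) widths all 32 bits are consumed, leaving no room for a next header type indicator or checksum, which illustrates why the choice of fields requires careful consideration.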
A TTL value in the MPLS header may be useful in the same manner as it is in IP. Specifically, TTL may be used to terminate packets caught in a routing loop, and for other related uses such as traceroute. The TTL mechanism is a simple and proven method of handling such events. Another use of TTL is to expire packets in a network by limiting their "time to live" and eliminating stale packets that may cause problems for some of the higher layer protocols. When used over link layers which do not provide a TTL field, alternate mechanisms will be needed to replace the uses of the TTL field.

A provision for a class of service (COS) field in the MPLS header allows multiple service classes within the same label. However, when more sophisticated QoS is associated with a label, the COS may not have any significance. Alternatively, the COS (like QoS) can be left out of the header and instead propagated with the label assignment, but this entails that a separate label be assigned to each required class of service. Nevertheless, the COS mechanism provides a simple method of segregating flows within a label.

As previously mentioned, the encapsulation header can be used to derive the benefits of tunneling (or stacking).

The MPLS header must provide a way to indicate that multiple MPLS headers are stacked (i.e., the "stack indicator"). For this purpose a single bit in the MPLS header will suffice. In addition, there are also some benefits to indicating the type of the protocol header following the MPLS header (i.e., the "next header type indicator"). One option would be to combine the stack indicator and next header type indicator into a single value (i.e., the next header type indicator could be allowed to take the value "MPLS header").
Another option is to have the next header type indicator be implicit in the label value (such that this information would be propagated along with the label).

There is no compelling reason to support a checksum field in the MPLS header. A CRC mechanism at the L2 layer should be sufficient to ensure the integrity of the MPLS header.

3. Observations, Issues and Assumptions

3.1 Layer 2 versus Layer 3 Forwarding

MPLS uses L2 forwarding as a way to provide simple and fast packet forwarding capability. One primary reason for the simplicity of L2 forwarding comes from its short, fixed length labels. A node forwarding at L3 must parse a (relatively) large header, and perform a longest-prefix match to determine a forwarding path. However, when a node performs L2 label swapping, and labels are assigned properly, it can do a direct index lookup into its forwarding (or in this case, label-swapping) table with the short header. It is arguably simpler to build label swapping hardware than it is to build L3 forwarding hardware because the label swapping function is less complex.

The relative performance of L2 and L3 forwarding may differ considerably between nodes. Some nodes may exhibit an order of magnitude difference. Other nodes (for example, nodes with more extensive L3 forwarding hardware) may have identical performance at L2 and L3. However, some nodes may not be capable of doing L3 forwarding at all (e.g. ATM), or may have such limited capacity as to be unusable at L3. In this situation, traffic must be blackholed if no switched path exists.

On nodes in which L3 forwarding is slower than L2 forwarding, pushing traffic to L3 when no L2 path is available may cause congestion. In some cases this could cause data loss (since L3 may be unable to keep up with the increased traffic).
However, if data is discarded, then in general this will cause TCP to back off, which would allow control traffic, traceroute and other network management tools to continue to work.

The MPLS protocol MUST NOT make assumptions about the forwarding capabilities of an MPLS node. Thus, MPLS must propose solutions that can leverage the benefits of a node that is capable of L3 forwarding, but must not mandate that the node be capable of such.

Why We Will Still Need L3 Forwarding

MPLS will not, and is not intended to, replace L3 forwarding. There is absolutely a need for some systems to continue to forward IP packets using normal Layer 3 IP forwarding. L3 forwarding will be needed for a variety of reasons, including:

- For scaling; to forward on a finer granularity than the labels can provide
- For security; to allow packet filtering at firewalls
- For forwarding at the initial router (when hosts don't do MPLS)

Consider a campus network which is serving a small company. Suppose that this company makes use of the Internet, for example as a method of communicating with customers. A customer on the other side of the world has an IP packet to be forwarded to a particular system within the company. It is not reasonable to expect that the customer will have a label to use to forward the packet to that specific system. Rather, the label used for the "first hop" forwarding might be sufficient to get the packet considerably closer to the destination. However, the granularity of the labels cannot extend to every host worldwide. Similarly, routing used within one routing domain cannot know about every host worldwide.
This implies that in many cases the labels assigned to a particular packet will be sufficient to get the packet close to the destination, but that at some points along the path of the packet the IP header will need to be examined to determine a finer granularity for forwarding that packet. This is particularly likely to occur at domain boundaries.

A similar point occurs at the last router prior to the destination host. In general, the number of hosts attached to a network is likely to be great enough that it is not feasible to assign a separate label to every host. Rather, at least for routing within the destination routing domain (or the destination area if there is a hierarchical routing protocol in use), a label may be assigned which is sufficient to get the packet to the last hop router. However, the last hop router will need to examine the IP header (and particularly the destination IP address) in order to forward the packet to the correct destination host.

Packet filtering at firewalls is an important part of the operation of the Internet. While the current state of Internet security may be considerably less advanced than may be desired, nonetheless some security (as is provided by firewalls) is much better than no security. We expect that packet filtering will continue to be important for the foreseeable future. Packet filtering requires examination of the contents of the packet, including the IP header. This implies that at firewalls the packet cannot be forwarded simply by considering the label associated with the packet. Note that this is also likely to occur at domain boundaries.

Finally, it is very likely that many hosts will not implement MPLS. Rather, the host will simply forward an IP packet to its first hop router. This first hop router will need to examine the IP header prior to forwarding the packet (with or without a label).
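The contrast drawn in this section between L3 longest-prefix matching and L2 direct-index label lookup can be sketched as follows. This is a toy model (real routers use radix tries or TCAMs rather than a linear scan), and the table contents are invented:

```python
# Simplified contrast of L3 and L2 lookups (illustrative only).
import ipaddress

# L3: longest-prefix match over a forwarding table of CIDR prefixes.
routes = {
    ipaddress.ip_network("10.0.0.0/8"): "if1",
    ipaddress.ip_network("10.1.0.0/16"): "if2",
}

def l3_lookup(dst):
    """Scan all prefixes and pick the longest one containing dst."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in routes if addr in net]
    return routes[max(matches, key=lambda n: n.prefixlen)]

# L2: the short, fixed-length label is a direct index into the
# label-swapping table -- a single indexed read, no parsing or search.
swap_table = [None] * 1024
swap_table[17] = ("if2", 42)            # (out-port, out-label)

assert l3_lookup("10.1.2.3") == "if2"   # longest match wins
assert swap_table[17] == ("if2", 42)    # one array access
```

The asymmetry between the search in `l3_lookup` and the single indexed read is the simplicity argument made at the start of section 3.1.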
3.2 Scaling Issues

MPLS scalability is provided by two of the principles of routing. The first is that forwarding follows an inverted tree rooted at a destination. The second is that the number of destinations is reduced by routing aggregation.

The very nature of IP forwarding is a merged multipoint-to-point tree. Thus, since MPLS mirrors the IP network layer, an MPLS node that is capable of merging is capable of creating O(n) switched paths which provide network reachability to all "n" destinations. The meaning of "n" depends on the granularity of the switched paths. One obvious choice of "n" is the number of CIDR prefixes existing in the forwarding table (this scales the same as today's routing). However, the value of "n" may be reduced considerably by choosing switched paths of further aggregation. For example, by creating switched paths to each possible egress node, "n" may represent the number of egress nodes in a network. This choice creates "n" switched paths, such that each path is shared by all CIDR prefixes that are routed through the same egress node. This selection greatly improves scalability, since it minimizes "n", while at the same time maintaining the switching performance of CIDR aggregation. (See section 2.2.2 for a description of all of the levels of granularity provided by MPLS.)

The MPLS technology must scale at least as well as existing technology. For example, if the MPLS technology were to support ONLY host-to-host switched path connectivity, then the number of switched paths would be much higher than the number of routing table entries.

There are several ways in which merging can be done in order to allow O(n) switched paths to connect n nodes. The merging approach used has an impact on the amount of state information, buffering, delay characteristics, and the means of control required to coordinate the trees.
These issues are discussed in more detail in section 4.2.

There are some cases in which O(n-squared) switched paths may be used (for example, by setting up a full mesh of point to point streams). As label space and the amount of state information that can be supported may be limited, it will not be possible to support O(n-squared) switched paths in very large networks. However, in some cases the use of n-squared paths may even be an advantage (for example, to allow load-splitting of individual streams).

MPLS must be designed to scale for O(n). O(n) scaling allows MPLS domains to grow very large. In addition, if best effort service can be supported with O(n) scaling, this conserves resources (such as label space and state information) which can be used for supporting advanced services such as QoS. However, since some switches may not support merging, and some small networks may not require the scaling benefits of O(n), provisions must also be made for a non-merging, O(n-squared) solution.

Note: A precise and complete description of scaling would consider that there are multiple dimensions of scaling, and multiple resources whose usage may be considered. Possible dimensions of scaling include: (i) the total number of streams which exist in an MPLS domain (with associated labels assigned to them); (ii) the total number of "label swapping pairs" which may be stored in the nodes of the network (i.e., entries of the form "for incoming label 'x', use outgoing label 'y'"); (iii) the number of labels which need to be assigned for use over a particular link; (iv) the amount of state information which needs to be maintained by any one node. We do not intend to perform a complete analysis of all possible scaling issues, and understand that our use of the terms "O(n)" and "O(n-squared)" is approximate only.
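A back-of-the-envelope comparison of the two scaling regimes discussed above, for a hypothetical network of 200 edge MPLS devices (the number is invented for illustration):

```python
# Comparison of merged O(n) versus full-mesh O(n-squared) switched
# paths, for an assumed network of 200 edge MPLS devices.
n = 200

merged_paths = n                 # one multipoint-to-point tree per egress
full_mesh_paths = n * (n - 1)    # a point-to-point path per ordered pair

assert merged_paths == 200
assert full_mesh_paths == 39800
```

The roughly 200-fold difference in state is why merging matters for large networks, while a small network (say n = 10, giving 90 full-mesh paths) may not need it.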
3.3 Types of Streams

Switched paths in the MPLS network can be of different types:

- point-to-point
- multipoint-to-point
- point-to-multipoint
- multipoint-to-multipoint

Two of the factors that determine which type of switched path is used are (i) the capability of the switches employed in a network; (ii) the purpose of the creation of a switched path, that is, the types of flows to be carried in the switched path. These two factors also determine the scalability of a network in terms of the number of switched paths in use for transporting data through a network.

A point-to-point switched path can be used to connect all ingress nodes to all the egress nodes to carry unicast traffic. In this case, since an ingress node has point-to-point connections to all the egress nodes, the number of connections in use for transporting traffic is O(n-squared), where n is the number of edge MPLS devices. For small networks the full mesh connection approach may suffice and not pose any scalability problems. However, in large enterprise backbone or ISP networks, this will not scale well.

Point-to-point switched paths may be used on a host-to-host or application-to-application basis (e.g., a switched path per RSVP flow). The dedicated point-to-point switched path transports the unicast data from the ingress to the egress node of the MPLS network. This approach may be used for providing QoS services or for best-effort traffic.

A multipoint-to-point switched path connects all ingress nodes to a single egress node. At a given intermediate node in the multipoint-to-point switched path, L2 data units from several upstream links are "merged" into a single label on a downstream link.
Since each egress node is reachable via a single multipoint-to-point switched path, the number of switched paths required to transport best-effort traffic through an MPLS network is O(n), where n is the number of egress nodes.

The point-to-multipoint switched path is used for distributing multicast traffic. This switched path tree mirrors the multicast distribution tree as determined by the multicast routing protocols. Typically a switch capable of point-to-multipoint connection replicates an L2 data unit from the incoming (parent) interface to all the outgoing (child) interfaces. Standard ATM switches support such functionality in the form of point-to-multipoint VCs or VPs.

A multipoint-to-multipoint switched path may be used to combine multicast traffic from multiple sources into a single multicast distribution tree. The advantage of this is that the multipoint-to-multipoint switched path is shared by multiple sources. Conceptually, a form of multipoint-to-multipoint can be thought of as follows: Suppose that you have a point-to-multipoint VC from each node to all other nodes. Suppose that at any point where two or more VCs happen to merge, you merge them into a single VC or VP. This would require either coordination of VCI spaces (so that each source has a unique VCI within a VP) or VC merge capabilities. The applicability of similar concepts to MPLS is FFS.

3.4 Data Driven versus Control Traffic Driven Label Assignment

A fundamental concept in MPLS is the association of labels and network layer routing. Each LSR must assign labels, and distribute them to its forwarding peers, for traffic which it intends to forward by label swapping.
In the various contributions that have been made so far to the MPLS WG we identify three broad strategies for label assignment: (i) those driven by topology based control traffic [TAG][ARIS][IP navigator]; (ii) those driven by request based control traffic [RSVP]; and (iii) those driven by data traffic [CSR][Ipsilon].

We also note that in actual practice combinations of these methods may be employed. One example is the use of topology based methods for best effort traffic plus request based methods for support of RSVP.

3.4.1 Topology Driven Label Assignment

In this scheme labels are assigned in response to normal processing of routing protocol control traffic. Examples of such control protocols are OSPF and BGP. As an LSR processes OSPF or BGP updates it can, as it makes or changes entries in its forwarding tables, assign labels to those entries.

Among the properties of this scheme are:

- The computational load of assignment and distribution and the bandwidth consumed by label distribution are bounded by the size of the network.

- Labels are in the general case preassigned. If a route exists then a label has been assigned to it (and distributed). Traffic may be label swapped as soon as it arrives; there is no label setup latency at forwarding time.

- It requires LSRs to be able to process only the control traffic load.

- Labels assigned in response to the operation of routing protocols can have a granularity equivalent to that of the routes advertised by the protocol. Labels can, by this means, cover (highly) aggregated routes.

3.4.2 Request Driven Label Assignment

In this scheme labels are assigned in response to normal processing of request based control traffic. An example of such a control protocol is RSVP.
As an LSR processes RSVP messages it can, as it makes or changes entries in its forwarding tables, assign labels to those entries.

Among the properties of this scheme are:

- The computational load of assignment and distribution and the bandwidth consumed by label distribution are bounded by the amount of control traffic in the system.

- Labels are in the general case preassigned. If a route exists then a label has been assigned to it (and distributed). Traffic may be label swapped as soon as it arrives; there is no label setup latency at forwarding time.

- It requires LSRs to be able to process only the control traffic load.

- Depending upon the number of flows supported, this approach may require a larger number of labels to be assigned compared with topology driven assignment.

- This approach requires applications to make use of the request paradigm in order to get a label assigned to their flow.

3.4.3 Traffic Driven Label Assignment

In this scheme the arrival of data at an LSR "triggers" label assignment and distribution. The traffic driven approach has the following characteristics.

- Label assignment and distribution costs are a function of traffic patterns. In an LSR with limited label space that is using a traffic driven approach to amortize its labels over a larger number of flows, the overhead due to label assignment and distribution grows as a function of the number of flows and as a function of their "persistence". Short lived but recurring flows may impose a heavy control burden.

- There is a latency associated with the appearance of a "flow" and the assignment of a label to it.
The documented approaches to this problem suggest L3 forwarding during this setup phase; this has the potential for packet reordering (note that packet reordering may occur with any scheme when the network topology changes, but traffic driven label assignment introduces another cause for reordering).

- Flow driven label assignment requires high performance packet classification capabilities.

- Traffic driven label assignment may be useful to reduce label consumption (assuming that flows are not close to full mesh).

- If you want flows to hosts, due to limits on label space, then traffic driven label assignment is probably necessary due to the large number of hosts which may occur in a network.

- If you want to assign specific network resources to specific labels, to be used for support of application flows, then again the fine grain associated with labels may require data driven label assignment.

3.5 The Need for Dealing with Looping

Routing protocols which are used in conjunction with MPLS will in many cases be based on distributed computation. As such, during routing transients, these protocols may compute forwarding paths which contain loops. For this reason MPLS will be designed with mechanisms to either prevent the formation of loops and/or contain the amount of resources that can be consumed due to the presence of loops.

Note that there are a number of different alternative mechanisms which have been proposed (see section 4.3). Some of these prevent the formation of layer 2 forwarding loops; others allow loops to form but minimize their impact in one way or another (e.g., by discarding packets which loop, or by detecting and closing the loop after a period of time).
Generally speaking, there are tradeoffs to be made between the amount of looping which might occur and other considerations such as the time to convergence after a change in the paths computed by the routing algorithm.

We are not proposing any changes to normal layer 3 operation, and specifically are not trying to eliminate the possibility of looping at layer 3. Transient loops will continue to be possible in IP networks. Note that IP has a means to limit the damage done by looping packets, based on decrementing the IP TTL field as the packet is forwarded, and discarding packets whose TTL has expired. Dynamic routing protocols used with IP are also designed to minimize the amount of time during which loops exist.

The question that MPLS has to deal with is what to do at L2. In some cases L2 may make use of the same method that is used at L3. However, other options are available at L2, and in some cases (specifically when operating over ATM or Frame Relay hardware) the method of decrementing a TTL field (or any similar field) is not available.

There are basically two problems caused by packet looping: The most obvious problem is that packets are not delivered to the correct destination. The other result of looping is congestion. Even with TTL decrementing and packet discard, there may still be a significant amount of time during which packets travel through a loop. This can adversely affect other packets which are not looping: Congestion due to the looping packets can cause non-looping packets to be delayed and/or discarded.

Looping is particularly serious in (at least) three cases: One is when forwarding over ATM. Since ATM does not have a TTL field to decrement, there is no way to discard ATM cells which are looping over ATM subnetworks.
Standard ATM PNNI routing and signaling solves this problem by making use of call setup procedures which ensure that ATM VCs will never be set up in a loop [PNNI]. However, when MPLS is used over ATM subnets, the native ATM routing and signaling procedures may not be used for the full L2 path. This leads to the possibility that MPLS over ATM might in principle allow packets to loop indefinitely, or until L3 routing stabilizes. Methods are needed to prevent this problem.

Another case in which looping can be particularly unpleasant is for multicast traffic. With multicast, it is possible that the packet may be delivered successfully to some destinations even though copies intended for other destinations are looping. This leads to the possibility that huge numbers of identical packets could be delivered to some destinations. Also, since multicast implies that packets are duplicated at some points in their path, the congestion resulting from looping packets may be particularly severe.

Another unpleasant complication of looping occurs if the congestion caused by the loop interferes with the routing protocol. It is possible for the congestion caused by looping to cause routing protocol control packets to be discarded, with the result that the routing protocol becomes unstable. For example, this could lengthen the duration of the loop.

In normal operation of IP networks the impact of congestion is limited by the fact that TCP backs off (i.e., transmits substantially less traffic) in response to lost packets. Where the congestion is caused by looping, the combination of TTL and the resulting discard of looping packets, plus the reduction in offered traffic, can limit the resulting impact on the network.
TCP backoff however does not solve the problem if the looping packets are not discarded (for example, if the loop is over an ATM subnetwork where TTL is not used).

Methods for dealing with loops are discussed in section 4.3.

3.6 Operations and Management

Operations and management of networks is critically important. This implies that MPLS must support operations, administration, and maintenance facilities at least as extensive as those supported in current IP networks.

In most ways this is a relatively simple requirement to meet. Given that all MPLS nodes run normal IP routing protocols, it is straightforward to expect them to participate in normal IP network management protocols.

One issue has been identified which needs to be addressed by the MPLS effort: the operation of Traceroute over MPLS networks. Note that other O&M issues may be identified in the future.

Traceroute is a very commonly used network management tool. Traceroute is based on use of the TTL field: A station trying to determine the route from itself to a specified address transmits multiple IP packets, with the TTL field set to 1 in the first packet, 2 in the second packet, etc. This causes each router along the path to send back an ICMP error report for TTL exceeded. This in turn allows the station to determine the set of routers along the route. For example, this can be used to determine where a problem exists (if no router responds past some point, the last router which responds can become the starting point for a search to determine the cause of the problem).

When MPLS is operating over ATM or Frame Relay networks there is no TTL field to decrement (and ATM and Frame Relay forwarding hardware does not decrement TTL).
This implies that it is not straightforward to have Traceroute operate in this environment.

There is the question of whether we *want* all routers along a path to be visible via traceroute. For example, an ISP probably doesn't want to expose the interior of its network to a customer. However, the issue of whether a network's policy will allow the interior of the network to be visible should be independent of whether it is possible for some users to see the interior of the network. Thus while there clearly should be the possibility of using policy mechanisms to block traceroute from being used to see the interior of the network, this does not imply that it is okay to develop protocol mechanisms which prevent traceroute from working.

There is also the question of whether the interior of an MPLS network is analogous to a normal IP network, or whether it is closer to the interior of a layer 2 network (for example, an ATM subnet). Clearly IP traceroute cannot be used to expose the interior of an ATM subnet. When a packet is crossing an ATM subnetwork (for example, between an ingress and an egress router which are attached to the ATM subnet) traceroute can be used to determine the router to router path, but not the path through the ATM switches which comprise the ATM subnet. Note here that MPLS forms a sort of "in between" special case: Routing is based on normal IP routing protocols, the equivalent of call setup (label binding/exchange) is based on MPLS-specific protocols, but forwarding is based on normal L2 ATM forwarding. MPLS therefore supersedes the normal ATM-based methods that would be used to eliminate loops and/or trace paths through the ATM subnet.

It is generally agreed that Traceroute is a relatively "ugly" tool, and that a better tool for tracing the route of a packet would be preferable.
However, no better tool has yet been designed or even proposed. Also, however ugly Traceroute may be, it is nonetheless very useful, widely deployed, and widely used. In general, it is highly preferable to define, implement, and deploy a new tool, and to determine through experience that the new tool is sufficient, before breaking a tool which is as widely used as traceroute.

Methods that may be used either to allow traceroute to be used in an MPLS network, or to replace traceroute, are discussed in section 4.14.

4. Technical Approaches

We believe that section 4 is probably less complete than other sections. Additional subsections are likely to be needed as a result of additional discussions in the MPLS working group.

4.1 Label Distribution

A fundamental requirement in MPLS is that an LSR forwarding label switched traffic to another LSR apply a label to that traffic which is meaningful to the other (receiving) LSR. LSRs could learn about each other's labels in a variety of ways. We call the general topic "label distribution".

4.1.1 Explicit Label Distribution

Explicit label distribution anticipates the specification by MPLS of a standard protocol for label distribution. Two of the possible approaches, [TDP] and [ARIS], are oriented toward topology driven label distribution. Another approach, [FANP], in contrast, makes use of traffic driven label distribution.

We expect that the label distribution protocol (LDP) which emerges from the MPLS WG is likely to inherit elements from one or more of these possible approaches.

Consider LSR A forwarding traffic to LSR B. We call A the upstream (with respect to dataflow) LSR and B the downstream LSR. A must apply a label to the traffic that B "understands". Label distribution must ensure that the "meaning" of the label will be communicated between A and B.
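One way the "meaning" of a label can be shared between A and B is sketched below (illustrative Python; the class and method names are not taken from any MPLS specification, and here the downstream LSR happens to be the allocator, one of the options discussed in the following subsections):

```python
# Downstream LSR B allocates a label for a prefix and advertises the
# binding to upstream LSR A, which installs it as its outgoing label.

class LSR:
    def __init__(self, name):
        self.name = name
        self.next_label = 16    # assume labels below 16 are reserved
        self.in_map = {}        # incoming label -> prefix (indexed on receipt)
        self.out_map = {}       # prefix -> label to apply when sending

    def allocate_binding(self, prefix):
        """Downstream allocation: pick a label we will later index on."""
        label = self.next_label
        self.next_label += 1
        self.in_map[label] = prefix
        return prefix, label

    def install_binding(self, prefix, label):
        """Upstream side: record the label the downstream peer expects."""
        self.out_map[prefix] = label

a, b = LSR("A"), LSR("B")
prefix, label = b.allocate_binding("192.0.2.0/24")
a.install_binding(prefix, label)   # the label's "meaning" is now shared
assert b.in_map[a.out_map["192.0.2.0/24"]] == "192.0.2.0/24"
```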
An important question is whether A or B (or some other entity) allocates the label.

In this discussion we are talking about the allocation and distribution of labels between two peer LSRs that are on a single segment of what may be a longer path. A related but in fact entirely separate issue is the question of where control of the whole path resides. In essence there are two models; by analogy to upstream and downstream for a single segment, we can talk about ingress and egress for an LSP (or to and from a label swapping "domain"). In one model a path is set up from ingress to egress; in the other, from egress to ingress.

4.1.1.1 Downstream Label Allocation

"Downstream Label Allocation" refers to a method where the label allocation is done by the downstream LSR, i.e. the LSR that uses the label as an index into its switching tables.

This is, arguably, the most natural label allocation/distribution mode for unicast traffic. As an LSR builds its routing tables (we consider here control driven allocation of labels) it is free, within some limits we will discuss, to allocate labels in any manner that may be convenient to the particular implementation. Since the labels that it allocates will be those upon which it subsequently makes forwarding decisions, we assume implementations will perform the allocation in an optimal manner. Having allocated labels, the default behavior is to distribute the labels (and bindings) to all peers.

In some cases (particularly with ATM) there may be a limited number of labels which may be used across an interface, and/or a limited number of label assignments which may be supported by a single device. Operation in this case may make use of "on demand" label assignment.
With this approach, an LSR may, for example, request a label for a route from a particular peer only when its routing calculations indicate that peer to be the new next hop for the route.

4.1.1.2 Upstream Label Allocation

"Upstream Label Allocation" refers to a method where the label allocation is done by the upstream LSR. In this case the LSR choosing the label (the upstream LSR) and the LSR which needs to interpret packets using the label (the downstream LSR) are not the same node. We note here that in the upstream LSR the label at issue is not used as an index into the switching tables, but rather is found as the result of a lookup on those tables.

The motivation for upstream label allocation comes from the recognition that it might be possible to optimize multicast machinery in an LSR if it were possible to use the same label on all output ports for which a particular multicast packet/cell were destined. Upstream assignment makes this possible.

4.1.1.3 Other Label Allocation Methods

Another option would be to make use of label values which are unique within the MPLS domain (implying that a domain-wide allocation would be needed). In this case, any stream to a particular MPLS egress node could make use of the label of that node (implying that label values do not need to be swapped at intermediate nodes).

With this method of label allocation, there is a choice to be made regarding the scope over which a label is unique. One approach is to configure each node in an MPLS domain with a label which is unique in that domain. Another approach is to use a truly global identifier (for example the IEEE 48 bit identifier), where each MPLS-capable node would be stamped at birth with a truly globally unique identifier.
The point of this global approach is to simplify configuration in each MPLS domain by eliminating the need to configure label IDs.

4.1.2 Piggybacking on Other Control Messages

While we have discussed use of an explicit MPLS LDP, we note that there are several existing protocols that can be easily modified to distribute both routing/control and label information. This could be done with any of OSPF, BGP, RSVP and/or PIM. A particular architectural elegance of these schemes is that label distribution uses the same mechanisms as are used in distribution of the underlying routing or control information.

When explicit label distribution is used, the routing computation and label distribution are decoupled. This implies a possibility that at some point you may either have a route to a specific destination without an associated label, and/or a label for a specific destination which makes use of a path which you are no longer using. Piggybacking label distribution on the operation of the routing protocol is one way to eliminate this decoupling.

Piggybacking label distribution on the routing protocol introduces an issue regarding how to negotiate acceptable label values and what to do if an invalid label is received. This is discussed in section 4.1.3.

4.1.3 Acceptable Label Values

There are some constraints on which label values may be used in either allocation mode. Clearly the label values must lie within the allowable range described in the encapsulation standards that the MPLS WG will produce. The label value used must also, however, lie within a range that the peer LSR is capable of supporting.
We imagine that certain machines (for example, ATM switches operating as LSRs) may, due to operational or implementation restrictions, support a label space more limited than that bounded by the valid range found in the encapsulation standard. This implies that an advertisement or negotiation mechanism for the usable label range may be a part of the MPLS LDP. When operating over ATM using ATM forwarding hardware, due to the need for compatibility with the existing use of the ATM VPI/VCI space, it is quite likely that an explicit mechanism will be needed for label range negotiation.

In addition we note that LDP may be one of a number of mechanisms used to distribute labels between any given pair of LSRs. Clearly where such multiple mechanisms exist, care must be taken to coordinate the allocation of label values. A single label value must have a unique meaning to the LSR that distributes it.

There is an issue regarding how to allow negotiation of acceptable label values if label distribution is piggybacked on the routing protocol. In this case it may be necessary either to require equipment to accept any possible label value, or to configure devices to know which range of label values may be selected. It is not clear in this case what to do if an invalid label value is received, as there may be no means of sending a NAK.

A similar issue occurs with multicast traffic over broadcast media, where there may be multiple nodes which receive the same transmission (using a single label value). Here again it may be "non-trivial" how to allow n-party negotiation of acceptable label values.

4.1.4 LDP Reliability

The need for reliable label distribution depends upon the relative performance of L2 and L3 forwarding, as well as the relationship between label distribution and the routing protocol operation.
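A label-range advertisement/negotiation of the sort suggested above can be sketched as an intersection of the ranges each peer supports (hypothetical function names; no specific LDP message format is implied):

```python
# Sketch of label-range negotiation: each peer advertises the label
# range it can support, and both sides restrict themselves to the
# intersection of the two ranges.

def negotiate_range(lo_a, hi_a, lo_b, hi_b):
    """Return the label range both peers can use, or fail if none."""
    lo, hi = max(lo_a, lo_b), min(hi_a, hi_b)
    if lo > hi:
        raise ValueError("no usable label range in common")
    return lo, hi

def label_acceptable(label, lo, hi):
    return lo <= label <= hi

# e.g. an ATM LSR constrained to a small VCI window, negotiating with
# a frame-based LSR that supports the full encapsulation range:
lo, hi = negotiate_range(32, 1023, 16, 4095)
assert (lo, hi) == (32, 1023)
assert label_acceptable(100, lo, hi)
assert not label_acceptable(2000, lo, hi)
```

When label distribution is piggybacked on a routing protocol, there is no natural place for this exchange, which is exactly the NAK problem noted above.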
If label distribution is tied to the operation of the routing protocol, then a reasonable protocol design would ensure that labels are distributed successfully as long as the associated route and/or reachability advertisement is distributed successfully. This implies that the reliability of label distribution will be the same as the reliability of route distribution.

If there is a very large difference between L2 and L3 forwarding performance, then the cost of failing to deliver a label is significant. In this case it is important to ensure that labels are distributed reliably. Given that LDP needs to operate in a wide variety of environments with a wide variety of equipment, this implies that it is important for any LDP developed by the MPLS WG to ensure reliable delivery of label information.

4.1.5 Label Purge Mechanisms

Another issue to be considered is the "lifetime" of label data once it arrives at an LSR, and the method of purging label data. There are several methods that could be used either separately, or (more likely) in combination.

One approach is for label information to be timed out. With this approach a lifetime is distributed along with the label value. The label value may be refreshed prior to timing out. If the label is not refreshed prior to timing out, it is discarded. In this case each lifetime and timer may apply to a single label, or to a group of labels (e.g., all labels selected by the same node).

Similarly, two peer nodes may make use of an MPLS peer keepalive mechanism. This implies exchange of MPLS control packets between neighbors on a periodic basis. This in general is likely to use a smaller timeout value than label value timers (analogous to the fact that the OSPF HELLO interval is much shorter than the OSPF LSA lifetime).
If the peer session between two MPLS nodes fails (due to expiration of the associated timer prior to reception of the refresh), then the associated label information is discarded.

If label information is piggybacked on the routing protocol, then the timeout mechanisms would also be taken from the associated routing protocol (note that routing protocols in general have mechanisms to invalidate stale routing information).

An alternative method for invalidating labels is to make use of an explicit label removal message.

4.2 Stream Merging

In order to scale as O(n) (rather than O(n-squared)), MPLS makes use of the concept of stream merge. This makes use of multipoint-to-point streams in order to allow multiple streams to be merged into one stream.

Types of Stream Merge

There are several types of stream merge that can be used, depending upon the underlying media.

When MPLS is used over frame based media, merging is straightforward. All that is required for stream merge to take place is for a node to allow multiple upstream labels to be forwarded the same way and mapped into a single downstream label. This is referred to as frame merge.

Operation over ATM media is less straightforward. In ATM, the data packets are encapsulated into an ATM Adaptation Layer, say AAL5; the AAL5 PDU is segmented into ATM cells with a VPI/VCI value, and the cells are transmitted in sequence. It is incumbent on ATM switches to keep the cells of a PDU (or with the same VPI/VCI value) contiguous and in sequence. This is because the device that reassembles the cells to re-form the transmitted PDU expects the cells to be contiguous and in sequence, as there isn't sufficient information in the ATM cell header (unlike IP fragmentation) to reassemble the PDU from cells arriving in arbitrary order.
Hence, if cells from several upstream links are transmitted onto the same downstream VPI/VCI, then cells from one PDU can get interleaved with cells from another PDU on the outgoing VPI/VCI, resulting in corruption of the original PDUs by mis-sequencing the cells of each PDU.

The most straightforward (but erroneous) method of merging in an ATM environment would be to take the cells from two incoming VCs and merge them into a single outgoing VC. If this were done without any buffering of cells, then cells from two or more packets could end up being interleaved into a single AAL5 frame. Therefore the problem when operating over ATM is how to avoid interleaving of cells from multiple sources.

There are two ways to solve this interleaving problem, which are referred to as VC merge and VP merge.

VC merge allows multiple VCs to be merged into a single outgoing VC. In order for this to work, the node performing the merge needs to keep the cells from one AAL5 frame (e.g., corresponding to an IP packet) separate from the cells of other AAL5 frames. This may be done by performing the SAR function in order to reassemble each IP packet before forwarding that packet. In this case VC merge is essentially equivalent to frame merge. An alternative is to buffer the cells of one AAL5 frame together, without actually reassembling them. When the end of frame indicator is reached, that frame can be forwarded. Note however that both forms of VC merge require that the entire AAL5 frame be received before any cells corresponding to that frame are forwarded. VC merge therefore requires capabilities which are generally not available in most existing ATM forwarding hardware.

The alternative for use over ATM media is VP merge. Here multiple VPs can be merged into a single VP.
Separate VCIs within the merged VP are used to distinguish frames (e.g., IP packets) from different sources. In some cases, one VP may be used for the tree from each ingress node to a single egress node.

VP merge requires that the VCIs be coordinated to ensure uniqueness. This may be accomplished either by pre-configuring each node with a unique VCI value (or values), or by having some one node (most likely the root of the multipoint-to-point tree) coordinate the VCI values used within the VP. Note also that if the root coordinates the VCI space, then some protocol mechanism will be needed to allow this to occur. How hard this is to do depends somewhat upon whether the root is otherwise involved in coordinating the multipoint-to-point tree. For example, allowing one node (such as the root) to coordinate the tree may be useful for purposes of coordinating load sharing. Thus whether the issue of coordinating the VCI space is significant or trivial may depend upon other design choices which at first glance may have appeared to be independent protocol design choices.

Buffering Issues Related To Stream Merge

There is an issue regarding the amount of buffering required for frame merge, VC merge, and VP merge. Frame merge and VC merge require that intermediate points buffer incoming packets until the entire packet arrives. This is essentially the same as is required in traditional IP routers.

VP merge allows cells to be transmitted by intermediate nodes as soon as they arrive, reducing the buffering and latency at intermediate nodes. However, the use of VP merge implies that cells from multiple packets will arrive at the egress node interleaved on separate VCIs. This in turn implies that the egress node may have somewhat increased buffering requirements.
To a large extent egress nodes for some destinations will be intermediate nodes for other destinations, implying that an increase in the buffers required for one purpose (egress traffic) will be offset by a reduction in the buffers required for another purpose (transit traffic). Also, routers today typically deal with high-fanout channelized interfaces and with multi-VC ATM interfaces, implying that buffering simultaneously arriving cells from multiple packets and sources is something that routers typically do today. This is not meant to imply that the required buffer size and performance is inexpensive, but rather to observe that this is a solvable issue.

4.3 Loop Handling

Generally, methods for dealing with loops can be split into three categories: Loop Survival makes use of methods which minimize the impact of loops, for example by limiting the amount of network resources which can be consumed by a loop; Loop Detection allows loops to be set up, but later detects these loops and eliminates them; Loop Prevention provides methods for avoiding setting up L2 forwarding in a way which results in an L2 loop.

Note that we are concerned here only with loops that occur in L2 forwarding. Transient loops at L3 will continue to be part of normal IP operation, and will be handled the way that IP has been handling loops for years (see section 3.5).

Loop Survival

Loop Survival refers to methods that are used to allow the network to operate well even though short term transient loops may be formed by the routing protocol. The basic approach to loop survival is to limit the amount of network resources which are consumed by looping packets, and to minimize the effect on other (non-looping) traffic.
Note that loop survival is the method used by conventional IP forwarding, and is therefore based on long and relatively successful experience in the Internet.

The most basic method for loop survival is based on the use of a TTL (Time To Live) field. The TTL field is decremented at each hop. If the TTL field reaches zero, then the packet is discarded. This method works well over those media which have a TTL field. This explicitly includes L3 IP forwarding. Also, assuming that the core MPLS specifications will include definition of a "shim" MPLS header for use over those media which do not have their own labels, in order to carry labels for use in forwarding of user data, it is likely that the shim header will also include a TTL field.

However, there is considerable interest in using MPLS over L2 protocols which provide their own labels, with the L2 label used for MPLS forwarding. Specific L2 protocols which offer a label for this purpose include ATM and Frame Relay. However, neither ATM nor Frame Relay has a TTL field. This implies that this method cannot be used when basic ATM or Frame Relay forwarding is being used.

Another basic method for loop survival is the use of dynamic routing protocols which converge rapidly to non-looping paths. In some instances it is possible that congestion caused by looping data could affect the convergence of the routing protocol (see section 3.5). MPLS should be designed to prevent this problem from occurring. Given that MPLS uses the same routing protocols as are used for IP, this method does not need to be discussed further in this framework document.

Another possible tool for loop survival is the use of fair queuing. This allows unrelated flows of user data to be placed in different queues.
This helps to ensure that a node which is overloaded with looping user data can nonetheless forward unrelated non-looping data, thereby minimizing the effect that looping data has on other data. We cannot assume that fair queuing will always be available. In practice, many fair queuing implementations merge multiple streams into one queue (implying that the number of queues used is less than the number of user data flows which are present in the network). This implies that any data which happens to be in the same queue with looping data may be adversely affected.

Loop Detection

Loop Detection refers to methods whereby a loop may be set up at L2, but the loop is subsequently detected. When the loop is detected, it may be broken at L2 by dropping the label relationship, implying that packets for a set of destinations must be forwarded at L3.

A possible method for loop detection is based on transmitting a "loop detection" control packet (LDCP) along the path towards a specified destination whenever the route to the destination changes. This LDCP is forwarded in the direction that the label specifies, with the labels swapped to the correct next hop value. However, normal L2 forwarding cannot be used, because each hop needs to examine the packet to check for loops. The LDCP is forwarded towards that destination until one of the following happens: (i) the LDCP reaches the last MPLS node along the path (i.e. the next hop is either a router which is not participating in MPLS, or is the final destination host); (ii) the TTL of the LDCP expires (assuming that the control packet uses a TTL, which is optional but not absolutely necessary); or (iii) the LDCP returns to the node which originally transmitted it.
If the latter occurs, then the packet has looped, and the node which originally transmitted the LDCP stops using the associated label, instead using L3 forwarding for the associated destination addresses. One problem with this method is that once a loop is detected, it is not known when the loop clears. One option would be to set a timer, and to transmit a new LDCP when the timer expires.

An alternate method counts the hops to each egress node, based on the routes currently available. Each node advertises its distance (in hop counts) to each destination. An egress node advertises the destinations that it can reach directly with an associated hop count of zero. For each destination, a node computes the hop count to that destination by adding one to the hop count advertised by its actual next hop used for that destination. When the hop count for a particular destination changes, the hop counts need to be readvertised.

In addition, the first of the loop prevention schemes discussed below may be modified to provide loop detection (the details are straightforward, but have not been written down in time to include in this rough draft).

Loop Prevention

Loop prevention makes use of methods to ensure that loops are never set up at L2. This implies that the labels are not used until some method is used to ensure that following the label towards the destination, with associated label swaps at each switch, will not result in a loop. Until the L2 path (making use of assigned labels) is available, packets are forwarded at L3.

Loop prevention requires explicit signaling of some sort to be used when setting up an L2 stream.

One method of loop prevention requires that labels be propagated starting at the egress switch. The egress switch signals to neighboring switches the label to use for a particular destination.
That switch then signals an associated label to its neighbors, etc. The control packets which propagate the labels also include the path to the egress (as a list of router IDs). Any looping control packet can therefore be detected, and the path is not set up to or past the looping point.

Another option is to use source routing to set up label bindings from the egress switch to each ingress switch. This precludes the possibility of looping, since the entire path is computed by one node. This also allows non-looping paths to be set up provided that the egress switch has a view of the topology which is reasonably close to reality (if there are operational links which the egress switch doesn't know about, it will simply pick a path which doesn't use those links; if there are links which have failed but which the egress switch thinks are operational, then there is some chance that the setup attempt will fail, but in this case the attempt can be retried on a separate path). Note therefore that non-looping paths can be set up with this method in many cases where distributed routing plus hop by hop forwarding would not actually result in non-looping paths. This method is similar to the method used by standard ATM routing to ensure that SVCs are non-looping [PNNI].

Source routing is only applicable if the routing protocol gives the egress switch sufficient information to set up the source route, implying that the protocol must be either a link state protocol (such as OSPF) or a path vector protocol (such as BGP). Source routing therefore is not appropriate as a general approach for use in any network regardless of the routing protocol. This method also requires some overhead for the call setup before label-based forwarding can be used.
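The egress-initiated, source-routed setup can be sketched as a path computation over the egress switch's topology view; because one node computes the whole path, the result is loop-free by construction. The breadth-first search and node names below are illustrative:

```python
# Sketch: the egress computes an explicit path from its own topology
# view (a link-state database would provide this), then installs label
# bindings hop by hop from egress back toward the ingress.

from collections import deque

def egress_source_route(topology, egress, ingress):
    """Return the hop list from ingress to egress, or None if unreachable."""
    prev = {egress: None}
    queue = deque([egress])
    while queue:
        node = queue.popleft()
        if node == ingress:
            path = []
            while node is not None:      # walk predecessors back to egress
                path.append(node)
                node = prev[node]
            return path                  # loop-free: each node visited once
        for neighbor in topology.get(node, ()):
            if neighbor not in prev:
                prev[neighbor] = node
                queue.append(neighbor)
    return None                          # setup fails; may retry elsewhere

topo = {"E": ["C", "D"], "C": ["E", "A", "D"],
        "D": ["E", "B", "C"], "A": ["C"], "B": ["D"]}
print(egress_source_route(topo, "E", "A"))   # → ['A', 'C', 'E']
```

If the egress's view is stale (a link in the returned path has in fact failed), the setup attempt fails and a disjoint path can be retried, as discussed above.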
If the network topology changes in a manner which breaks the existing path, then a new path will need to be source routed from the egress switch. Due to this overhead, this method is probably only appropriate if other significant advantages are also going to be obtained from having a single node (the egress switch) coordinate the paths to be used. Examples of other reasons to have one node coordinate the paths to a single egress switch include: (i) coordinating the VCI space where VP merge is used (see section 4.2); and (ii) coordinating the routing of streams from multiple ingress switches to one egress switch so as to balance the load on multiple alternate paths through the network.

In principle the source routing could also be done in the other direction (from ingress to egress). However, this would make it more difficult to merge streams if stream merge is to be used. This would also make it more difficult to coordinate (i) changes to the paths used, (ii) the VCI space assignments, and (iii) load sharing. This therefore makes source routing more difficult, and also reduces the other advantages that could be obtained from the approach.

If label distribution is piggybacked on the routing protocol (see section 4.1.2), then loop prevention is only possible if the routing protocol itself does loop prevention.

What To Do If A Loop Is Detected

With all of these schemes, if a loop is known to exist then the L2 label-swapped path is not set up. This leads to the obvious question of what an MPLS node does when it doesn't have a label for a particular destination and a packet for that destination arrives to be forwarded. If possible, the packet is forwarded using normal L3 (IP) forwarding.
There are two issues that this raises: (i) what about nodes which are not capable of L3 forwarding; (ii) given the relative speeds of L2 and L3 forwarding, does this work?

Nodes which are not capable of L3 forwarding obviously can't forward a packet unless it arrives with a label, and the associated next hop label has been assigned. Such nodes, when they receive a packet for which the next hop label has not been assigned, must discard the packet. It is probably safe to assume that if a node cannot forward an L3 packet, then it is probably also incapable of forwarding an ICMP error report that it originates. This implies that the packet will need to be discarded in this case.

In many cases L2 forwarding will be significantly faster than L3 forwarding (allowing faster forwarding is a significant motivation behind the work on MPLS). This implies that if a node is forwarding a large volume of traffic at L2, and a change in the routing protocol causes the associated labels to be lost (necessitating L3 forwarding), in some cases the node will not be capable of forwarding the same volume of traffic at L3. This will of course require that packets be discarded. However, in some cases only a relatively small volume of traffic will need to be forwarded at L3. Thus forwarding at L3 when L2 is not available is not necessarily always a problem. There may be some nodes which are capable of forwarding equally fast at L2 and L3 (for example, such nodes may contain IP forwarding hardware which is not available in all nodes). Finally, when packets are lost this will cause TCP to back off, which will in turn reduce the load on the network and allow the network to stabilize even at reduced forwarding rates until such time as the label bindings can be reestablished.
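The fallback behavior discussed above can be sketched as follows (illustrative Python; for brevity the FIB lookup is an exact match, where a real router would perform a longest-prefix match):

```python
# Sketch of the fallback: use the L2 label table when a binding exists,
# fall back to an L3 lookup when it does not, and discard on nodes with
# no L3 forwarding capability (e.g. a pure ATM switch).

def forward(packet, label_table, fib, can_do_l3=True):
    label = packet.get("label")
    if label is not None and label in label_table:
        return ("L2", label_table[label])      # fast label-swap path
    if can_do_l3 and packet["dest"] in fib:
        return ("L3", fib[packet["dest"]])     # slower routed path
    return ("DROP", None)                      # no binding, no L3 capability

fib = {"192.0.2.0/24": "if1"}
labels = {17: "if2"}
assert forward({"label": 17, "dest": "192.0.2.0/24"}, labels, fib) == ("L2", "if2")
assert forward({"label": None, "dest": "192.0.2.0/24"}, labels, fib) == ("L3", "if1")
assert forward({"label": 99, "dest": "192.0.2.0/24"}, labels, fib,
               can_do_l3=False) == ("DROP", None)
```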
Note that in most cases loops will be caused either by configuration errors, or by short term transient problems caused by the failure of a link. If only one link goes down, and if routing creates a normal "tree-shaped" set of paths to any one destination, then the failure of one link somewhere in the network will affect only one link's worth of data passing through any one node in the network. This implies that if a node is capable of forwarding one link's worth of data at L3, then in many or most cases it will have sufficient L3 bandwidth to handle looping data.

4.4 Interoperation with NHRP

When label switching is used over ATM, and there exists an LSR which is also operating as a Next Hop Client (NHC), the possibility of direct interaction arises. That is, could one switch cells between the two technologies without reassembly? To enable this, several important issues must be addressed.

The encapsulation must be acceptable to both MPLS and NHRP. If only a single label is used, then the null encapsulation could be used. Other solutions could be developed to handle label stacks.

NHRP must understand and respect the granularity of a stream.

Currently NHRP resolves an IP address to an ATM address. The response may include a mask indicating a range of addresses. However, any VC to the ATM address is considered to be a viable means of packet delivery. Suppose that an NHC issues an NHRP query for IP address A, gets back ATM address 1, and sets up a VC to address 1. Later the same NHC queries for a totally unrelated IP address B and gets back the same ATM address 1. In this case normal NHRP behavior allows the NHC to use the VC (that was set up for destination A) for traffic to B.

Note: In this section we will refer to a VC set up as a result of an NHRP query/response as a shortcut VC.
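The granularity problem described above can be sketched as follows (illustrative Python; the point is that a label switch can only safely label switch traffic for prefixes that were explicitly bound to the shortcut VC):

```python
# Sketch: NHRP may resolve two unrelated prefixes to the same ATM
# address, but a label switch receiving from a shortcut VC must know
# exactly which traffic will arrive on it.

shortcut_vcs = {}     # vc_id -> set of prefixes bound to that VC

def bind_prefix_to_vc(vc_id, prefix):
    """Record that traffic for 'prefix' is expected on this shortcut VC."""
    shortcut_vcs.setdefault(vc_id, set()).add(prefix)

def can_label_switch(vc_id, prefix):
    """Only traffic explicitly bound to the VC can be label switched."""
    return prefix in shortcut_vcs.get(vc_id, set())

bind_prefix_to_vc("vc-1", "10.1.0.0/16")     # VC set up for destination A
assert can_label_switch("vc-1", "10.1.0.0/16")

# Reusing vc-1 for an unrelated prefix B (which normal NHRP behavior
# permits) would deliver traffic the label switch was never told about:
assert not can_label_switch("vc-1", "10.2.0.0/16")
```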
If one expects to be able to label switch the packets being received
from a shortcut VC, then the label switch needs to be informed as to
exactly what traffic will arrive on that VC, and that mapping cannot
change without notice. Currently no mechanism for this exists in the
defined signaling of a shortcut VC. Several means are possible. A
binding, equivalent to the binding in LDP, could be sent in the setup
message. Alternatively, the binding of prefix to label could remain in
an LDP session (or whatever means of label distribution is
appropriate) and the setup could carry a binding of the label to the
VC. This would leave the binding mechanism for shortcut VCs
independent of the label distribution mechanism.

A further architectural challenge exists in that label switching is
inherently unidirectional whereas ATM is bidirectional. The above
binding semantics are fairly straightforward. However, effectively
using the reverse direction of a VC presents further challenges.

Label switching must also respect the granularity of the shortcut VC.
Without VC merge, this means a single label switched flow must map to
a VC. In the case of VC merge, multiple label switched streams could
be merged onto a single shortcut VC. But given the asymmetry involved,
there is perhaps little practical use.

Another issue is one of practicality and usefulness. What is sent over
the VC must be at a fine enough granularity to be label switched
through the receiving domain. One potential place where the two
technologies might come into play is in moving data from one campus
via the wide area to another campus. In such a scenario, the two
technologies would border precisely at the point where summarization
is likely to occur. Each campus would have a detailed understanding of
itself, but not of the other campus.
The wide area is likely to have summarized knowledge only. But at such
a point, level 3 processing becomes the likely solution.

4.5 Operation in a Hierarchy

This section is FFS.

4.6 Stacked Labels in a Flat Routing Environment

This section is FFS.

4.7 Multicast

This section is FFS.

4.8 Multipath

Many IP routing protocols support the notion of equal-cost multipath
routes, in which a router maintains multiple next hops for one
destination prefix when two or more equal-cost paths to the prefix
exist. There are a few possible approaches for handling multipath with
MPLS.

In this discussion we will use the term "multipath node" to mean a
node which is keeping track of multiple switched paths from itself for
a single destination.

The first approach maintains a separate switched path from each
ingress node via one or more multipath nodes to a merge point. This
requires MPLS to distinguish the separate switched paths, so that
learning of a new switched path is not misinterpreted as a replacement
of the same switched path. This also requires that an ingress MPLS
node be capable of distributing the traffic among the multiple
switched paths. This approach preserves switching performance, but at
the cost of proliferating the number of switched paths. For example,
each switched path consumes a distinct label.

The second approach establishes only one switched path from any one
ingress node to a destination. However, when the paths from two
different ingress nodes happen to arrive at the same node, that node
may use different paths for each (implying that the node becomes a
multipath node). Thus the multipath node may assign a different
downstream path to each incoming stream.
This conserves switched paths and maintains switching performance, but
cannot balance loads across downstream links as well as the other
approaches, even if switched paths are selectively assigned. A
drawback of this approach is that the L2 path may be different from
the normal L3 path, as traffic that otherwise would have taken
multiple distinct paths is forced onto a single path.

The third approach allows a single stream arriving at a multipath node
to be split into multiple streams, by using L3 forwarding at the
multipath node. For example, the multipath node might choose to use a
hash function on the source and destination IP addresses, in order to
avoid misordering packets between any one IP source and destination.
This approach conserves switched paths at the cost of switching
performance.

4.9 Host Interactions

There are a range of options for host interaction with MPLS:

The most straightforward approach is no host involvement. Host
operation may then be completely independent of MPLS; rather, hosts
operate according to other IP standards. If there is no host
involvement, then this implies that the first hop requires an L3
lookup.

If the host is ATM-attached and doing NHRP, then this would allow the
host to set up a virtual circuit to a router. However, this brings up
a range of issues, as was discussed in section 4.4 ("Interoperation
with NHRP").

On the ingress side, it is reasonable to consider having the first hop
LSR provide labels to the hosts, and thus have hosts attach labels to
packets that they transmit. This could allow the first hop LSR to
avoid an L3 lookup. It is reasonable here to have the host request
labels only when needed, rather than require the host to remember all
labels assigned for use in the network.

On the egress side, it is questionable whether hosts should be
involved.
For scaling reasons, it would be undesirable to use a different label
for reaching each host.

4.10 Explicit Routing

This section is FFS.

4.11 Traceroute

This section is FFS.

4.12 Security

Security in a network using MPLS should be relatively similar to
security in a normal IP network.

Routing in an MPLS network uses precisely the same IP routing
protocols as are currently used with IP. This implies that route
filtering is unchanged from current operation. Similarly, the security
of the routing protocols is not affected by the use of MPLS.

Packet filtering also may be done as in normal IP. This will require
either (i) that label swapping be terminated prior to any firewalls
performing packet filtering (in which case a separate instance of
label swapping may optionally be started after the firewall); or (ii)
that firewalls "look past the labels", in order to inspect the entire
IP packet contents. In this latter case note that the label may imply
semantics greater than that contained in the packet header: in
particular, a particular label value may imply that the packet is to
take a particular path after the firewall. In environments in which
this is considered to be a security issue, it may be desirable to
terminate the label prior to the firewall.

Note that in principle labels could be used to speed up the operation
of firewalls: in particular, the label could be used as an index into
a table which indicates the characteristics that the packet needs to
have in order to pass through the firewall. Depending upon
implementation considerations, matching the contents of the packet to
the contents of the table may be quicker than parsing the packet in
the absence of the label.

5. References

[1] "ARIS: Aggregate Route-Based IP Switching", A. Viswanathan, N.
Feldman, R. Boivie, R. Woundy, work in progress, Internet Draft,
March 1997.

[2] "ARIS Specification", N. Feldman, A. Viswanathan, work in
progress, Internet Draft, March 1997.

[3] "ARIS Support for LAN Media Switching", S. Blake, A. Ghanwani, W.
Pace, V. Srinivasan, work in progress, Internet Draft, March 1997.

[4] "Tag Switching Architecture - Overview", Rekhter, Davie, Katz,
Rosen, Swallow, Farinacci, work in progress, Internet Draft.

[5] "Tag Distribution Protocol", Doolan, Davie, Katz, Rekhter, Rosen,
work in progress, Internet Draft.

[6] "Use of Tag Switching with ATM", Davie, Doolan, Lawrence,
McCloghrie, Rekhter, Rosen, Swallow, work in progress, Internet Draft.

[7] "Label Switching: Label Stack Encodings", Rosen, Rekhter, Tappan,
Farinacci, Fedorkow, work in progress, Internet Draft.

[8] "Partitioning Tag Space among Multicast Routers on a Common
Subnet", Farinacci, work in progress, Internet Draft.

[9] "Multicast Tag Binding and Distribution using PIM", Farinacci,
Rekhter, work in progress, Internet Draft.

[10] "Toshiba's Router Architecture Extensions for ATM: Overview",
Katsube, Nagami, Esaki, RFC 2098.

[11] "Soft State Switching: A Proposal to Extend RSVP for Switching
RSVP Flows", A. Viswanathan, V. Srinivasan, work in progress, Internet
Draft, March 1997.

[12] "Integrated Services in the Internet Architecture: an Overview",
R. Braden et al., RFC 1633, June 1994.

[13] "Resource ReSerVation Protocol (RSVP), Version 1 Functional
Specification", work in progress, draft-ietf-rsvp-spec-14.txt,
November 1996.

[14] "OSPF Version 2", J. Moy, RFC 1583, March 1994.

[15] "A Border Gateway Protocol 4 (BGP-4)", Y. Rekhter and T. Li,
RFC 1771, March 1995.

[16] "Ipsilon Flow Management Protocol Specification for IPv4 Version
1.0", P. Newman et al., RFC 1953, May 1996.
[17] "ATM Forum Private Network-Network Interface Specification,
Version 1.0", ATM Forum af-pnni-0055.000, March 1996.

[18] "NBMA Next Hop Resolution Protocol (NHRP)", J. Luciani et al.,
work in progress, draft-ietf-rolc-nhrp-11.txt, March 1997.

6. Authors' Addresses

Ross Callon
Cascade Communications Corp.
5 Carlisle Road
Westford, MA 01886
508-952-7412
rcallon@casc.com

Paul Doolan
Cisco Systems, Inc.
250 Apollo Drive
Chelmsford, MA 01824
508-634-1204
pdoolan@cisco.com

Nancy Feldman
IBM Corp.
17 Skyline Drive
Hawthorne, NY 10532
914-784-3254
nkf@vnet.ibm.com

Andre Fredette
Bay Networks, Inc.
3 Federal Street
Billerica, MA 01821
508-916-8524
fredette@baynetworks.com

George Swallow
Cisco Systems, Inc.
250 Apollo Drive
Chelmsford, MA 01824
508-244-8143
swallow@cisco.com

Arun Viswanathan
IBM Corp.
17 Skyline Drive
Hawthorne, NY 10532
914-784-3273
arunv@vnet.ibm.com