Network Working Group                                     Eric C. Rosen
Internet Draft                                      Cisco Systems, Inc.
Expiration Date: February 1998
                                                        Arun Viswanathan
                                                               IBM Corp.

                                                             Ross Callon
                                            Ascend Communications, Inc.

                                                             August 1997

                    A Proposed Architecture for MPLS

                       draft-ietf-mpls-arch-00.txt

Status of this Memo

This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas,
and its working groups. Note that other groups may also distribute
working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time.
It is inappropriate to use Internet-Drafts as reference material or to
cite them other than as "work in progress."

To learn the current status of any Internet-Draft, please check the
"1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
ftp.isi.edu (US West Coast).

Abstract

This Internet Draft contains a draft protocol architecture for
multiprotocol label switching (MPLS). The proposed architecture is
based on other label switching approaches [2-11] as well as on the
MPLS Framework document [1].

Table of Contents

1          Introduction to MPLS
1.1        Overview
1.2        Terminology
1.3        Acronyms and Abbreviations
1.4        Acknowledgments
2          Outline of Approach
2.1        Labels
2.2        Upstream and Downstream LSRs
2.3        Labeled Packet
2.4        Label Assignment and Distribution; Attributes
2.5        Label Distribution Protocol (LDP)
2.6        The Label Stack
2.7        The Next Hop Label Forwarding Entry (NHLFE)
2.8        Incoming Label Map (ILM)
2.9        Stream-to-NHLFE Map (STN)
2.10       Label Swapping
2.11       Label Switched Path (LSP), LSP Ingress, LSP Egress
2.12       LSP Next Hop
2.13       Route Selection
2.14       Time-to-Live (TTL)
2.15       Loop Control
2.15.1     Loop Prevention
2.15.2     Interworking of Loop Control Options
2.16       Merging and Non-Merging LSRs
2.16.1     Stream Merge
2.16.2     Non-merging LSRs
2.16.3     Labels for Merging and Non-Merging LSRs
2.16.4     Merge over ATM
2.16.4.1   Methods of Eliminating Cell Interleave
2.16.4.2   Interoperation: VC Merge, VP Merge, and Non-Merge
2.17       LSP Control: Egress versus Local
2.18       Granularity
2.19       Tunnels and Hierarchy
2.19.1     Hop-by-Hop Routed Tunnel
2.19.2     Explicitly Routed Tunnel
2.19.3     LSP Tunnels
2.19.4     Hierarchy: LSP Tunnels within LSPs
2.19.5     LDP Peering and Hierarchy
2.20       LDP Transport
2.21       Label Encodings
2.21.1     MPLS-specific Hardware and/or Software
2.21.2     ATM Switches as LSRs
2.21.3     Interoperability among Encoding Techniques
2.22       Multicast
3          Some Applications of MPLS
3.1        MPLS and Hop by Hop Routed Traffic
3.1.1      Labels for Address Prefixes
3.1.2      Distributing Labels for Address Prefixes
3.1.2.1    LDP Peers for a Particular Address Prefix
3.1.2.2    Distributing Labels
3.1.3      Using the Hop by Hop path as the LSP
3.1.4      LSP Egress and LSP Proxy Egress
3.1.5      The POP Label
3.1.6      Option: Egress-Targeted Label Assignment
3.2        MPLS and Explicitly Routed LSPs
3.2.1      Explicitly Routed LSP Tunnels: Traffic Engineering
3.3        Label Stacks and Implicit Peering
3.4        MPLS and Multi-Path Routing
3.5        LSPs may be Multipoint-to-Point Entities
3.6        LSP Tunneling between BGP Border Routers
3.7        Other Uses of Hop-by-Hop Routed LSP Tunnels
3.8        MPLS and Multicast
4          LDP Procedures
5          Security Considerations
6          Authors' Addresses
7          References
Appendix A Why Egress Control is Better
Appendix B Why Local Control is Better

1. Introduction to MPLS

1.1. Overview

In connectionless network layer protocols, as a packet travels from
one router hop to the next, an independent forwarding decision is made
at each hop. Each router analyzes the packet header, and runs a
network layer routing algorithm. The next hop for a packet is chosen
based on the header analysis and the result of running the routing
algorithm.

Packet headers contain considerably more information than is needed
simply to choose the next hop. Choosing the next hop can therefore be
thought of as the composition of two functions. The first function
partitions the entire packet forwarding space into "forwarding
equivalence classes (FECs)". The second maps these FECs to a next hop.
Multiple network layer headers which get mapped into the same FEC are
indistinguishable, as far as the forwarding decision is concerned. The
set of packets belonging to the same FEC, traveling from a common
node, will follow the same path and be forwarded in the same manner
(for example, by being placed in a common queue) towards the
destination. This set of packets following the same path, belonging to
the same FEC (and therefore being forwarded in a common manner) may be
referred to as a "stream".

In IP forwarding, multiple packets are typically assigned to the same
Stream by a particular router if there is some address prefix X in
that router's routing tables such that X is the "longest match" for
each packet's destination address.

In MPLS, the mapping from packet headers to stream is performed just
once, as the packet enters the network. The stream to which the packet
is assigned is encoded as a short fixed length value known as a
"label". When a packet is forwarded to its next hop, the label is sent
along with it; that is, the packets are "labeled".
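As a purely illustrative sketch (not part of the architecture), the
following Python fragment shows the two functions described above as
they might operate at an ingress node: a longest-match partition of
packets into FECs, followed by a mapping from FEC to a label and next
hop. The table contents and names are hypothetical.

   import ipaddress

   # Illustrative FEC table: here a FEC is identified by an address
   # prefix, and each FEC is mapped to a (label, next hop) pair.
   FEC_TABLE = {
       ipaddress.ip_network("198.51.100.0/24"):   (17, "LSR-B"),
       ipaddress.ip_network("198.51.100.128/25"): (42, "LSR-C"),
   }

   def classify(dst_addr):
       """Partition function: map a destination address to a FEC by
       longest prefix match."""
       dst = ipaddress.ip_address(dst_addr)
       matches = [p for p in FEC_TABLE if dst in p]
       return max(matches, key=lambda p: p.prefixlen) if matches else None

   def label_at_ingress(dst_addr):
       """The header-to-stream mapping is performed just once, as the
       packet enters the network."""
       fec = classify(dst_addr)
       return None if fec is None else FEC_TABLE[fec]

   # 198.51.100.200 matches both prefixes; the /25 is the longest match.
   assert label_at_ingress("198.51.100.200") == (42, "LSR-C")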
At subsequent hops, there is no further analysis of the network layer
header. Rather, the label is used as an index into a table which
specifies the next hop, and a new label. The old label is replaced
with the new label, and the packet is forwarded to its next hop. This
eliminates the need to perform a longest match computation for each
packet at each hop; the computation can be performed just once.

Some routers analyze a packet's network layer header not merely to
choose the packet's next hop, but also to determine a packet's
"precedence" or "class of service", in order to apply different
discard thresholds or scheduling disciplines to different packets. In
MPLS, this can also be inferred from the label, so that no further
header analysis is needed.

The fact that a packet is assigned to a Stream just once, rather than
at every hop, allows the use of sophisticated forwarding paradigms. A
packet that enters the network at a particular router can be labeled
differently than the same packet entering the network at a different
router, and as a result forwarding decisions that depend on the
ingress point ("policy routing") can be easily made. In fact, the
policy used to assign a packet to a Stream need not have only the
network layer header as input; it may use arbitrary information about
the packet, and/or arbitrary policy information as input. Since this
decouples forwarding from routing, it allows one to use MPLS to
support a large variety of routing policies that are difficult or
impossible to support with just conventional network layer forwarding.

Similarly, MPLS facilitates the use of explicit routing, without
requiring that each IP packet carry the explicit route. Explicit
routes may be useful to support policy routing and traffic
engineering.

MPLS makes use of a routing approach whereby the normal mode of
operation is that L3 routing (e.g., existing IP routing protocols
and/or new IP routing protocols) is used by all nodes to determine the
routed path.

MPLS stands for "Multiprotocol" Label Switching, multiprotocol because
its techniques are applicable to ANY network layer protocol. In this
document, however, we focus on the use of IP as the network layer
protocol.

A router which supports MPLS is known as a "Label Switching Router",
or LSR.

A general discussion of issues related to MPLS is presented in "A
Framework for Multiprotocol Label Switching" [1].

1.2. Terminology

This section gives a general conceptual overview of the terms used in
this document. Some of these terms are more precisely defined in later
sections of the document.

aggregate stream              synonym of "stream"

DLCI                          a label used in Frame Relay networks to
                              identify frame relay circuits

flow                          a single instance of an application to
                              application flow of data (as in the RSVP
                              and IFMP use of the term "flow")

forwarding equivalence class  a group of IP packets which are
                              forwarded in the same manner (e.g., over
                              the same path, with the same forwarding
                              treatment)

frame merge                   stream merge, when it is applied to
                              operation over frame based media, so
                              that the potential problem of cell
                              interleave is not an issue

label                         a short fixed length physically
                              contiguous identifier which is used to
                              identify a stream, usually of local
                              significance
label information base        the database of information containing
                              label bindings

label swap                    the basic forwarding operation,
                              consisting of looking up an incoming
                              label to determine the outgoing label,
                              encapsulation, port, and other data
                              handling information

label swapping                a forwarding paradigm allowing
                              streamlined forwarding of data by using
                              labels to identify streams of data to be
                              forwarded

label switched hop            the hop between two MPLS nodes, on which
                              forwarding is done using labels

label switched path           the path created by the concatenation of
                              one or more label switched hops,
                              allowing a packet to be forwarded by
                              swapping labels from one MPLS node to
                              another MPLS node

layer 2                       the protocol layer under layer 3 (which
                              therefore offers the services used by
                              layer 3). Forwarding, when done by the
                              swapping of short fixed length labels,
                              occurs at layer 2 regardless of whether
                              the label being examined is an ATM
                              VPI/VCI, a frame relay DLCI, or an MPLS
                              label

layer 3                       the protocol layer at which IP and its
                              associated routing protocols operate

link layer                    synonymous with layer 2

loop detection                a method of dealing with loops in which
                              loops are allowed to be set up, and data
                              may be transmitted over the loop, but
                              the loop is later detected and closed

loop prevention               a method of dealing with loops in which
                              data is never transmitted over a loop

label stack                   an ordered set of labels

loop survival                 a method of dealing with loops in which
                              data may be transmitted over a loop, but
                              means are employed to limit the amount
                              of network resources which may be
                              consumed by the looping data

label switched path           the path through one or more LSRs at one
                              level of the hierarchy followed by a
                              stream

label switching router        an MPLS node which is capable of
                              forwarding native L3 packets

merge point                   the node at which multiple streams and
                              switched paths are combined into a
                              single stream sent over a single path

Mlabel                        abbreviation for MPLS label

MPLS core standards           the standards which describe the core
                              MPLS technology

MPLS domain                   a contiguous set of nodes which operate
                              MPLS routing and forwarding and which
                              are also in one Routing or
                              Administrative Domain

MPLS edge node                an MPLS node that connects an MPLS
                              domain with a node which is outside of
                              the domain, either because it does not
                              run MPLS, and/or because it is in a
                              different domain. Note that if an LSR
                              has a neighboring host which is not
                              running MPLS, then that LSR is an MPLS
                              edge node

MPLS egress node              an MPLS edge node in its role in
                              handling traffic as it leaves an MPLS
                              domain

MPLS ingress node             an MPLS edge node in its role in
                              handling traffic as it enters an MPLS
                              domain

MPLS label                    a label placed in a short MPLS shim
                              header used to identify streams

MPLS node                     a node which is running MPLS. An MPLS
                              node will be aware of MPLS control
                              protocols, will operate one or more L3
                              routing protocols, and will be capable
                              of forwarding packets based on labels.
                              An MPLS node may optionally be also
                              capable of forwarding native L3 packets
MultiProtocol Label Switching an IETF working group and the effort
                              associated with the working group

network layer                 synonymous with layer 3

stack                         synonymous with label stack

stream                        an aggregate of one or more flows,
                              treated as one aggregate for the purpose
                              of forwarding in L2 and/or L3 nodes
                              (e.g., may be described using a single
                              label). In many cases a stream may be
                              the aggregate of a very large number of
                              flows. Synonymous with "aggregate
                              stream"

stream merge                  the merging of several smaller streams
                              into a larger stream, such that for some
                              or all of the path the larger stream can
                              be referred to using a single label

switched path                 synonymous with label switched path

virtual circuit               a circuit used by a connection-oriented
                              layer 2 technology such as ATM or Frame
                              Relay, requiring the maintenance of
                              state information in layer 2 switches

VC merge                      stream merge when it is applied to VCs,
                              specifically so as to allow multiple VCs
                              to merge into one single VC

VP merge                      stream merge when it is applied to VPs,
                              specifically so as to allow multiple VPs
                              to merge into one single VP. In this
                              case the VCIs need to be unique. This
                              allows cells from different sources to
                              be distinguished via the VCI

VPI/VCI                       a label used in ATM networks to identify
                              circuits

1.3. Acronyms and Abbreviations

ATM      Asynchronous Transfer Mode
BGP      Border Gateway Protocol
DLCI     Data Link Circuit Identifier
FEC      Forwarding Equivalence Class
STN      Stream to NHLFE Map
IGP      Interior Gateway Protocol
ILM      Incoming Label Map
IP       Internet Protocol
LIB      Label Information Base
LDP      Label Distribution Protocol
L2       Layer 2
L3       Layer 3
LSP      Label Switched Path
LSR      Label Switching Router
MPLS     MultiProtocol Label Switching
MPT      Multipoint to Point Tree
NHLFE    Next Hop Label Forwarding Entry
SVC      Switched Virtual Circuit
SVP      Switched Virtual Path
TTL      Time-To-Live
VC       Virtual Circuit
VCI      Virtual Circuit Identifier
VP       Virtual Path
VPI      Virtual Path Identifier

1.4. Acknowledgments

The ideas and text in this document have been collected from a number
of sources and comments received. We would like to thank Rick Boivie,
Paul Doolan, Nancy Feldman, Yakov Rekhter, Vijay Srinivasan, and
George Swallow for their inputs and ideas.

2. Outline of Approach

In this section, we introduce some of the basic concepts of MPLS and
describe the general approach to be used.

2.1. Labels

A label is a short fixed length locally significant identifier which
is used to identify a stream. The label is based on the stream or
forwarding equivalence class that a packet is assigned to. The label
does not directly encode the network layer address, and is based on
the network layer address only to the extent that the forwarding
equivalence class is based on the address.

If Ru and Rd are neighboring LSRs, they may agree to use label L to
represent Stream S for packets which are sent from Ru to Rd. That is,
they can agree to a "mapping" between label L and Stream S for packets
moving from Ru to Rd. As a result of such an agreement, L becomes Ru's
"outgoing label" corresponding to Stream S for such packets, and Rd's
"incoming label" corresponding to Stream S for such packets.
Note that L does not necessarily correspond to Stream S for any
packets other than those which are being sent from Ru to Rd. Also, L
is not an inherently meaningful value and does not have any network-
wide significance; the particular value assigned to L gets its meaning
solely from the agreement between Ru and Rd.

Sometimes it may be difficult or even impossible for Rd to tell that
an arriving packet carrying label L comes from Ru, rather than from
some other LSR. In such cases, Rd must make sure that the mapping from
label to FEC is one-to-one. That is, in such cases, Rd must not agree
with Ru1 to use L for one purpose, while also agreeing with some other
LSR Ru2 to use L for a different purpose.

A label could be unique per interface, unique per MPLS node, or unique
within a network. If labels are unique within a network, no label
swapping needs to be performed in the MPLS nodes in that domain. The
packets are just label forwarded and not label swapped. The possible
use of labels with network-wide scope is FFS.

2.2. Upstream and Downstream LSRs

Suppose Ru and Rd have agreed to map label L to Stream S, for packets
sent from Ru to Rd. Then with respect to this mapping, Ru is the
"upstream LSR", and Rd is the "downstream LSR".

The notion of upstream and downstream relates to agreements between
nodes on the label values to be assigned for packets belonging to a
particular Stream that might be traveling from an upstream node to a
downstream node. This is independent of whether the routing protocol
actually will cause any packets to be transmitted in that particular
direction. Thus, Rd is the downstream LSR for a particular mapping for
label L if it recognizes L-labeled packets from Ru as being in Stream
S. This may be true even if routing does not actually forward packets
for Stream S between nodes Rd and Ru, or if routing has made Ru
downstream of Rd along the path which is actually used for packets in
Stream S.

2.3. Labeled Packet

A "labeled packet" is a packet into which a label has been encoded.
The encoding can be done by means of an encapsulation which exists
specifically for this purpose, or by placing the label in an available
location in either the data link or network layer header. Of course,
the encoding technique must be agreed to by the entity which encodes
the label and the entity which decodes the label.

2.4. Label Assignment and Distribution; Attributes

For unicast traffic in the MPLS architecture, the decision to bind a
particular label L to a particular Stream S is made by the LSR which
is downstream with respect to that mapping. The downstream LSR then
informs the upstream LSR of the mapping. Thus labels are "downstream-
assigned", and are "distributed upstream".

A particular mapping of label L to Stream S, distributed by Rd to Ru,
may have associated "attributes". If Ru, acting as a downstream LSR,
also distributes a mapping of a label to Stream S, then under certain
conditions, it may be required to also distribute the corresponding
attribute that it received from Rd.
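To illustrate downstream assignment and upstream distribution, here is
a minimal Python sketch of a label binding as the two peers might
record it. The classes, field names, and label values are hypothetical
and are not part of this architecture.

   from dataclasses import dataclass, field

   @dataclass
   class LabelMapping:
       """A mapping of label L to Stream S, distributed by Rd to Ru."""
       stream: str                # e.g., an address prefix naming the FEC
       label: int                 # chosen by the downstream LSR, Rd
       attributes: dict = field(default_factory=dict)

   class Lsr:
       def __init__(self, name):
           self.name = name
           self.incoming = {}     # label -> stream (bindings we assigned)
           self.outgoing = {}     # stream -> label (bindings peers sent us)
           self._next_label = 16

       def assign_binding(self, stream):
           """Acting as the downstream LSR: bind a local label to a
           stream, to be distributed upstream."""
           label, self._next_label = self._next_label, self._next_label + 1
           self.incoming[label] = stream
           return LabelMapping(stream, label)

       def receive_binding(self, mapping):
           """Acting as the upstream LSR: record the peer's binding."""
           self.outgoing[mapping.stream] = mapping.label

   ru, rd = Lsr("Ru"), Lsr("Rd")
   m = rd.assign_binding("198.51.100.0/24")   # downstream-assigned ...
   ru.receive_binding(m)                      # ... distributed upstream
   assert ru.outgoing["198.51.100.0/24"] == m.label == 16
   assert rd.incoming[16] == "198.51.100.0/24"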
2.5. Label Distribution Protocol (LDP)

A Label Distribution Protocol (LDP) is a set of procedures by which
one LSR informs another of the label/Stream mappings it has made. Two
LSRs which use an LDP to exchange label/Stream mapping information are
known as "LDP Peers" with respect to the mapping information they
exchange; we will speak of there being an "LDP Adjacency" between
them.

(N.B.: two LSRs may be LDP Peers with respect to some set of mappings,
but not with respect to some other set of mappings.)

The LDP also encompasses any negotiations in which two LDP Peers need
to engage in order to learn of each other's MPLS capabilities.

2.6. The Label Stack

So far, we have spoken as if a labeled packet carries only a single
label. As we shall see, it is useful to have a more general model in
which a labeled packet carries a number of labels, organized as a
last-in, first-out stack. We refer to this as a "label stack".

At a particular LSR, the decision as to how to forward a labeled
packet is always based exclusively on the label at the top of the
stack.

An unlabeled packet can be thought of as a packet whose label stack is
empty (i.e., whose label stack has depth 0).

If a packet's label stack is of depth m, we refer to the label at the
bottom of the stack as the level 1 label, to the label above it (if
such exists) as the level 2 label, and to the label at the top of the
stack as the level m label.

The utility of the label stack will become clear when we introduce the
notion of LSP Tunnel and the MPLS Hierarchy (sections 2.19.3 and
2.19.4).

2.7. The Next Hop Label Forwarding Entry (NHLFE)

The "Next Hop Label Forwarding Entry" (NHLFE) is used when forwarding
a labeled packet. It contains the following information:

   1. the packet's next hop

   2. the data link encapsulation to use when transmitting the packet

   3. the way to encode the label stack when transmitting the packet

   4. the operation to perform on the packet's label stack; this is
      one of the following operations:

      a) replace the label at the top of the label stack with a
         specified new label

      b) pop the label stack

      c) replace the label at the top of the label stack with a
         specified new label, and then push one or more specified new
         labels onto the label stack.

Note that at a given LSR, the packet's "next hop" might be that LSR
itself. In this case, the LSR would need to pop the top level label,
and then examine and operate on the encapsulated packet, which may
carry a lower level label or may be a native IP packet. This implies
that in some cases the LSR may need to operate on the IP header in
order to forward the packet. If the packet's "next hop" is the current
LSR, then the label stack operation MUST be to "pop the stack".

2.8. Incoming Label Map (ILM)

The "Incoming Label Map" (ILM) is a mapping from incoming labels to
NHLFEs. It is used when forwarding packets that arrive as labeled
packets.

2.9. Stream-to-NHLFE Map (STN)

The "Stream-to-NHLFE Map" (STN) is a mapping from Streams to NHLFEs.
It is used when forwarding packets that arrive unlabeled, but which
are to be labeled before being forwarded.
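To make the relationship among these structures concrete, the
following Python sketch (illustrative only; all names are
hypothetical) models an NHLFE with the stack operations of section
2.7, an ILM indexed by incoming label, and an STN indexed by stream.
The label swapping procedures which use these maps are described in
section 2.10.

   from dataclasses import dataclass
   from typing import Callable, List

   @dataclass
   class Nhlfe:
       next_hop: str                          # the packet's next hop
       encapsulation: str                     # data link encapsulation
       stack_op: Callable[[List[int]], None]  # label stack operation

   def swap(new_label):
       def op(stack): stack[-1] = new_label   # replace the top label
       return op

   def pop():
       def op(stack): stack.pop()             # pop the label stack
       return op

   def swap_and_push(new_label, pushed):
       def op(stack):
           stack[-1] = new_label              # replace the top label ...
           stack.extend(pushed)               # ... then push new labels
       return op

   def push(labels):
       # At an ingress (empty stack), labeling amounts to a push.
       def op(stack): stack.extend(labels)
       return op

   ilm = {17: Nhlfe("LSR-C", "ppp", swap(42))}   # incoming label -> NHLFE
   stn = {"198.51.100.0/24":                     # stream -> NHLFE
          Nhlfe("LSR-B", "ppp", push([17]))}

   # Example: a labeled packet with top label 17 is swapped to 42.
   stack = [17]
   nhlfe = ilm[stack[-1]]
   nhlfe.stack_op(stack)
   assert (nhlfe.next_hop, stack) == ("LSR-C", [42])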
2.10. Label Swapping

Label swapping is the use of the following procedures to forward a
packet.

In order to forward a labeled packet, an LSR examines the label at the
top of the label stack. It uses the ILM to map this label to an NHLFE.
Using the information in the NHLFE, it determines where to forward the
packet, and performs an operation on the packet's label stack. It then
encodes the new label stack into the packet, and forwards the result.

In order to forward an unlabeled packet, an LSR analyzes the network
layer header, to determine the packet's Stream. It then uses the STN
to map this to an NHLFE. Using the information in the NHLFE, it
determines where to forward the packet, and performs an operation on
the packet's label stack. (Popping the label stack would, of course,
be illegal in this case.) It then encodes the new label stack into the
packet, and forwards the result.

It is important to note that when label swapping is in use, the next
hop is always taken from the NHLFE; this may in some cases be
different from what the next hop would be if MPLS were not in use.

2.11. Label Switched Path (LSP), LSP Ingress, LSP Egress

A "Label Switched Path (LSP) of level m" for a particular packet P is
a sequence of LSRs,

   <R1, ..., Rn>

with the following properties:

   1. R1, the "LSP Ingress", pushes a label onto P's label stack,
      resulting in a label stack of depth m;

   2. For all i, 1 < i < n: P has a label stack of depth m when
      received by LSR Ri;

   3. At no time during P's transit from R1 to R[n-1] does its label
      stack ever have a depth of less than m;

   4. For all i, 1 < i < n: Ri transmits P to R[i+1] by means of MPLS,
      i.e., by using the label at the top of the label stack (the
      level m label) as an index into an ILM;

   5. For all i, 1 < i < n: if a system S receives and forwards P
      after P is transmitted by Ri but before P is received by R[i+1]
      (e.g., Ri and R[i+1] might be connected via a switched data link
      subnetwork, and S might be one of the data link switches), then
      S's forwarding decision is not based on the level m label, or on
      the network layer header. This may be because:

      a) the decision is not based on the label stack or the network
         layer header at all;

      b) the decision is based on a label stack on which additional
         labels have been pushed (i.e., on a level m+k label, where
         k > 0).

In other words, we can speak of the level m LSP for Packet P as the
sequence of LSRs:

   1. which begins with an LSR (an "LSP Ingress") that pushes on a
      level m label,

   2. all of whose intermediate LSRs make their forwarding decision by
      label switching on a level m label,

   3. which ends (at an "LSP Egress") when a forwarding decision is
      made by label switching on a level m-k label, where k > 0, or
      when a forwarding decision is made by "ordinary", non-MPLS
      forwarding procedures.

A consequence (or perhaps a presupposition) of this is that whenever
an LSR pushes a label onto an already labeled packet, it needs to make
sure that the new label corresponds to a FEC whose LSP Egress is the
LSR that assigned the label which is now second in the stack.

Note that according to these definitions, if <R1, ..., Rn> is a level
m LSP for packet P, P may be transmitted from R[n-1] to Rn with a
label stack of depth m-1. That is, the label stack may be popped at
the penultimate LSR of the LSP, rather than at the LSP Egress. This is
appropriate, since the level m label has served its function of
getting the packet to Rn, and Rn's forwarding decision cannot be made
until the level m label is popped. If the label stack is not popped by
R[n-1], then Rn must do two label lookups; this is an overhead which
is best avoided. However, some hardware switching engines may not be
able to pop the label stack.

The penultimate node pops the label stack only if this is specifically
requested by the egress node. Having the penultimate node pop the
label stack has an implication on the assignment of labels: For any
one node Rn, operating at level m in the MPLS hierarchy, there may be
some LSPs which terminate at that node (i.e., for which Rn is the
egress node) and some other LSPs which continue beyond that node
(i.e., for which Rn is an intermediate node). If the penultimate node
R[n-1] pops the stack for those LSPs which terminate at Rn, then node
Rn will receive some packets for which the top of the stack is a level
m label (i.e., packets destined for other egress nodes), and some
packets for which the top of the stack is a level m-1 label (i.e.,
packets for which Rn is the egress). This implies that in order for
node R[n-1] to pop the stack, node Rn must assign labels such that
level m and level m-1 labels are distinguishable (i.e., use unique
values across multiple levels of the MPLS hierarchy).
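As a purely illustrative trace (hypothetical label values and LSR
names), consider a level 1 LSP <R1, R2, R3, R4> in which R4 has
requested that the penultimate node pop the stack:

   stack = []            # packet arrives at R1 unlabeled (depth 0)
   stack.append(20)      # R1, the LSP Ingress, pushes a level 1 label
   stack[-1] = 35        # R2 swaps the top label via its ILM
   stack.pop()           # R3, the penultimate LSR, pops at R4's request
   assert stack == []    # R4 forwards using ordinary network layer
                         # procedures, avoiding a second label lookup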
Note that if m = 1, the LSP Egress may receive an unlabeled packet,
and in fact need not even be capable of supporting MPLS. In this case,
assuming that we are using globally meaningful IP addresses, the
confusion of labels at multiple levels is not possible. However, it is
possible that the label may still be of value to the egress node. One
example is that the label may be used to assign the packet to a
particular Forwarding Equivalence Class (for example, to identify the
packet as a high priority packet). Another example is that the label
may assign the packet to a particular virtual private network (for
example, the virtual private network may make use of local IP
addresses, and the label may be necessary to disambiguate the
addresses). Therefore, even when there is only a single label value,
the stack is nonetheless popped only when requested by the egress
node.

We will call a sequence of LSRs the "LSP for a particular Stream S" if
it is an LSP of level m for a particular packet P when P's level m
label is a label corresponding to Stream S.

2.12. LSP Next Hop

The LSP Next Hop for a particular labeled packet in a particular LSR
is the LSR which is the next hop, as selected by the NHLFE entry used
for forwarding that packet.

The LSP Next Hop for a particular Stream is the next hop as selected
by the NHLFE entry indexed by a label which corresponds to that
Stream.

2.13. Route Selection

Route selection refers to the method used for selecting the LSP for a
particular stream. The proposed MPLS protocol architecture supports
two options for Route Selection: (1) hop by hop routing, and (2)
explicit routing.

Hop by hop routing allows each node to independently choose the next
hop for the path for a stream. This is the normal mode today in
existing datagram IP networks. A hop by hop routed LSP refers to an
LSP whose route is selected using hop by hop routing.

An explicitly routed LSP is an LSP where, at a given LSR, the LSP next
hop is not chosen by each local node, but rather is chosen by a single
node (usually the ingress or egress node of the LSP). The sequence of
LSRs followed by an explicitly routed LSP may be chosen by
configuration, or by a protocol selected by a single node (for
example, the egress node may make use of the topological information
learned from a link state database in order to compute the entire path
for the tree ending at that egress node). Explicit routing may be
useful for a number of purposes, such as allowing policy routing
and/or facilitating traffic engineering. With MPLS the explicit route
needs to be specified at the time that labels are assigned, but the
explicit route does not have to be specified with each IP packet. This
implies that explicit routing with MPLS is relatively efficient (when
compared with the efficiency of explicit routing for pure datagrams).

For any one LSP (at any one level of hierarchy), there are two
possible options: (i) The entire LSP may be hop by hop routed from
ingress to egress; (ii) The entire LSP may be explicitly routed from
ingress to egress.
Intermediate cases do not make sense: In general, an LSP will be
explicitly routed specifically because there is a good reason to use
an alternative to the hop by hop routed path. This implies that if
some of the nodes along the path follow an explicit route but some of
the nodes make use of hop by hop routing, then inconsistent routing
will result and loops (or severely inefficient paths) may form.

For this reason, it is important that if an explicit route is
specified for an LSP, then that route must be followed. Note that it
is relatively simple to *follow* an explicit route which is specified
in an LDP setup. We therefore propose that the LDP specification
require that all MPLS nodes implement the ability to follow an
explicit route if this is specified.

It is not necessary for a node to be able to create an explicit route.
However, in order to ensure interoperability it is necessary to ensure
that either (i) every node knows how to use hop by hop routing; or
(ii) every node knows how to create and follow an explicit route. We
propose that, due to the common use of hop by hop routing in networks
today, it is reasonable to make hop by hop routing the default that
all nodes need to be able to use.

2.14. Time-to-Live (TTL)

In conventional IP forwarding, each packet carries a "Time To Live"
(TTL) value in its header. Whenever a packet passes through a router,
its TTL gets decremented by 1; if the TTL reaches 0 before the packet
has reached its destination, the packet gets discarded.

This provides some level of protection against forwarding loops that
may exist due to misconfigurations, or due to failure or slow
convergence of the routing algorithm. TTL is sometimes used for other
functions as well, such as multicast scoping, and supporting the
"traceroute" command. This implies that there are two TTL-related
issues that MPLS needs to deal with: (i) TTL as a way to suppress
loops; (ii) TTL as a way to accomplish other functions, such as
limiting the scope of a packet.

When a packet travels along an LSP, it should emerge with the same TTL
value that it would have had if it had traversed the same sequence of
routers without having been label switched. If the packet travels
along a hierarchy of LSPs, the total number of LSR-hops traversed
should be reflected in its TTL value when it emerges from the
hierarchy of LSPs.

The way that TTL is handled may vary depending upon whether the MPLS
label values are carried in an MPLS-specific "shim" header, or whether
the MPLS labels are carried in an L2 header such as an ATM header or a
frame relay header.

If the label values are encoded in a "shim" that sits between the data
link and network layer headers, then this shim should have a TTL field
that is initially loaded from the network layer header TTL field, is
decremented at each LSR-hop, and is copied into the network layer
header TTL field when the packet emerges from its LSP.

If the label values are encoded in an L2 header (e.g., the VPI/VCI
field of the ATM cell header), and the labeled packets are forwarded
by an L2 switch (e.g., an ATM switch), then unless the data link layer
itself has a TTL field (unlike ATM), it will not be possible to
decrement a packet's TTL at each LSR-hop. An LSP segment which
consists of a sequence of LSRs that cannot decrement a packet's TTL
will be called a "non-TTL LSP segment".

When a packet emerges from a non-TTL LSP segment, it should however be
given a TTL that reflects the number of LSR-hops it traversed. In the
unicast case, this can be achieved by propagating a meaningful LSP
length to ingress nodes, enabling the ingress to decrement the TTL
value before forwarding packets into a non-TTL LSP segment.

Sometimes it can be determined, upon ingress to a non-TTL LSP segment,
that a particular packet's TTL will expire before the packet reaches
the egress of that non-TTL LSP segment. In this case, the LSR at the
ingress to the non-TTL LSP segment must not label switch the packet.
This means that special procedures must be developed to support
traceroute functionality; for example, traceroute packets may be
forwarded using conventional hop by hop forwarding.
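The intended TTL arithmetic can be summarized in a short sketch
(illustrative only; the function names are hypothetical):

   def enter_lsp(ip_ttl):
       """Ingress to a TTL-capable LSP: the shim TTL is loaded from
       the network layer header TTL field."""
       return ip_ttl                         # initial shim TTL

   def lsr_hop(shim_ttl):
       """Each LSR-hop decrements the shim TTL by 1."""
       if shim_ttl <= 1:
           raise ValueError("TTL expired; discard the packet")
       return shim_ttl - 1

   def leave_lsp(shim_ttl):
       """Egress: the shim TTL is copied back into the network layer
       header TTL field."""
       return shim_ttl                       # new IP TTL

   def enter_non_ttl_segment(ip_ttl, segment_hops):
       """Ingress to a non-TTL LSP segment (e.g., ATM): decrement by
       the propagated LSP length up front, and refuse to label switch
       a packet whose TTL would expire inside the segment."""
       if ip_ttl <= segment_hops:
           raise ValueError("would expire; use hop by hop forwarding")
       return ip_ttl - segment_hops

   # A packet with IP TTL 64 crossing three LSR-hops emerges with TTL
   # 61, just as if it had not been label switched.
   ttl = enter_lsp(64)
   for _ in range(3):
       ttl = lsr_hop(ttl)
   assert leave_lsp(ttl) == 61
   assert enter_non_ttl_segment(64, 3) == 61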
2.15. Loop Control

On a non-TTL LSP segment, by definition, TTL cannot be used to protect
against forwarding loops. The importance of loop control may depend on
the particular hardware being used to provide the LSR functions along
the non-TTL LSP segment.

Suppose, for instance, that ATM switching hardware is being used to
provide MPLS switching functions, with the label being carried in the
VPI/VCI field. Since ATM switching hardware cannot decrement TTL,
there is no protection against loops. If the ATM hardware is capable
of providing fair access to the buffer pool for incoming cells
carrying different VPI/VCI values, this looping may not have any
deleterious effect on other traffic. If the ATM hardware cannot
provide fair buffer access of this sort, however, then even transient
loops may cause severe degradation of the LSR's total performance.

Even if fair buffer access can be provided, it is still worthwhile to
have some means of detecting loops that last "longer than possible".
In addition, even where TTL and/or per-VC fair queuing provides a
means for surviving loops, it still may be desirable where practical
to avoid setting up LSPs which loop.

The MPLS architecture will therefore provide a technique for ensuring
that looping LSP segments can be detected, and a technique for
ensuring that looping LSP segments are never created.

2.15.1. Loop Prevention

LSRs maintain, for each of their LSPs, an LSR id list. This list
contains all of the LSRs downstream from this LSR on a given LSP. The
LSR id list is used to prevent the formation of switched path loops.
It is propagated upstream from a node to its neighbor nodes, and is
used to prevent loops as follows:

When a node, R, detects a change in the next hop for a given stream,
it asks its new next hop for a label and the associated LSR id list
for that stream.

The new next hop responds with a label for the stream and an
associated LSR id list.

R looks in the LSR id list. If R determines that it, R, is in the
list, then we have a route loop. In this case, we do nothing and the
old LSP will continue to be used until the route protocols break the
loop. The means by which the old LSP is replaced by a new LSP after
the route protocols break the loop is described below.

If R is not in the LSR id list, R will start a "diffusion" computation
[12].
The purpose of the diffusion computation is to prune the tree upstream
of R so that we remove all LSRs from the tree that would be on a
looping path if R were to switch over to the new LSP. After those LSRs
are removed from the tree, it is safe for R to replace the old LSP
with the new LSP (and the old LSP can be released).

The diffusion computation works as follows:

R adds its LSR id to the list and sends a query message to each of its
"upstream" neighbors (i.e., to each of its neighbors that is not the
new "downstream" next hop).

A node S that receives such a query will process the query as follows:

- If node R is not node S's next hop for the given stream, node S will
  respond to node R with an "OK" message, meaning that as far as node
  S is concerned it is safe for node R to switch over to the new LSP.

- If node R is node S's next hop for the stream, node S will check to
  see if it, node S, is in the LSR id list that it received from node
  R. If it is, we have a route loop, and S will respond with a "LOOP"
  message. R will unsplice the connection to S, pruning S from the
  tree. The mechanism by which S will get a new LSP for the stream
  after the route protocols break the loop is described below.

- If node S is not in the LSR id list, S will add its LSR id to the
  LSR id list and send a new query message further upstream. The
  diffusion computation will continue to propagate upstream along each
  of the paths in the tree upstream of S until either a loop is
  detected, in which case the node is pruned as described above, or we
  get to a point where a node gets a response ("OK" or "LOOP") from
  each of its neighbors, perhaps because none of those neighbors
  considers the node in question to be its downstream next hop. Once a
  node has received a response from each of its upstream neighbors, it
  returns an "OK" message to its downstream neighbor. When the
  original node, node R, gets a response from each of its neighbors,
  it is safe to replace the old LSP with the new one, because all the
  paths that would loop have been pruned from the tree.

There are a couple of details to discuss:

- First, we need to do something about nodes that for one reason or
  another do not produce a timely response to a query message. If a
  node Y does not respond to a query from node X because of a failure
  of some kind, X will not be able to respond to its downstream
  neighbors (if any), or to switch over to a new LSP if X is, like R
  above, the node that has detected the route change. This problem is
  handled by timing out the query message. If a node doesn't receive a
  response within a "reasonable" period of time, it "unsplices" its VC
  to the upstream neighbor that is not responding and proceeds as it
  would if it had received the "LOOP" message.

- We also need to be concerned about multiple concurrent routing
  updates. What happens, for example, when a node M receives a request
  for an LSP from an upstream neighbor, N, while M is in the middle of
  a diffusion computation, i.e., it has sent a query upstream but
  hasn't received all the responses? Since a downstream node, node R,
  is about to change from one LSP to another, M needs to pass to N an
  LSR id list corresponding to the union of the old and new LSPs if it
  is to avoid loops both before and after the transition.
  This is easily accomplished, since M already has the LSR id list for
  the old LSP, and it gets the LSR id list for the new LSP in the
  query message. After R makes the switch from the old LSP to the new
  one, R sends a new establish message upstream with the LSR id list
  of (just) the new LSP. At this point, the nodes upstream of R know
  that R has switched over to the new LSP and that they can return the
  id list for (just) the new LSP in response to any new requests for
  LSPs. They can also grow the tree to include additional nodes that
  would not have been valid for the combined LSR id list.

- We also need to discuss how a node that doesn't have an LSP for a
  given stream at the end of a diffusion computation (because it would
  have been on a looping LSP) gets one after the routing protocols
  break the loop. If node L has been pruned from the tree and its
  local route protocol processing entity breaks the loop by changing
  L's next hop, L will request a new LSP from its new downstream
  neighbor, which it will use once it executes the diffusion
  computation as described above. If the loop is broken by a route
  change at another point in the loop, i.e., at a point "downstream"
  of L, L will get a new LSP as the new LSP tree grows upstream from
  the point of the route change, as discussed in the previous
  paragraph.

- Note that when a node is pruned from the tree, the switched path
  upstream of that node remains "connected". This is important since
  it allows the switched path to get "reconnected" to a downstream
  switched path after a route change with a minimal amount of
  unsplicing and resplicing once the appropriate diffusion
  computation(s) have taken place.

The LSR id list can also be used to provide a "loop detection"
capability. To use it in this manner, an LSR which sees that it is
already in the LSR id list for a particular stream will immediately
unsplice itself from the switched path for that stream, and will NOT
pass the LSR id list further upstream. The LSR can rejoin a switched
path for the stream when it changes its next hop for that stream, or
when it receives a new LSR id list from its current next hop, in which
it is not contained. The diffusion computation would be omitted.
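The LSR id list check and the pruning performed by the diffusion
computation can be illustrated with a toy, synchronous Python sketch.
Real LSRs would run this as an asynchronous protocol with query and
response messages; the class and method names here are hypothetical.

   class Lsr:
       def __init__(self, name):
           self.name = name
           self.next_hop = None      # downstream neighbor for the stream
           self.upstream = []        # upstream neighbors for the stream

       def id_list(self):
           """The LSR id list: every LSR downstream of (and including)
           this node on the LSP."""
           down = self.next_hop.id_list() if self.next_hop else []
           return down + [self.name]

       def change_next_hop(self, new_next_hop):
           """On a route change, ask the new next hop for its LSR id
           list; refuse to switch over if that would form a loop."""
           if self.name in new_next_hop.id_list():
               return False          # route loop: keep using the old LSP
           self.diffusion(new_next_hop.id_list() + [self.name])
           self.next_hop = new_next_hop
           return True               # safe: looping paths were pruned

       def diffusion(self, id_list):
           """Prune any upstream LSR that would be on a looping path."""
           for s in list(self.upstream):
               if s.name in id_list:
                   self.upstream.remove(s)    # "LOOP": unsplice s
               else:
                   s.diffusion(id_list + [s.name])

   # A -> B -> C toward the egress. If C's route changes to point at
   # A, the LSR id list <C, B, A> reveals the loop and C keeps the old
   # LSP; if it changes to a fresh egress D, the switch is safe.
   a, b, c, d = Lsr("A"), Lsr("B"), Lsr("C"), Lsr("D")
   a.next_hop, b.next_hop = b, c
   b.upstream, c.upstream = [a], [b]
   assert c.change_next_hop(a) is False
   assert c.change_next_hop(d) is True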
2.15.2. Interworking of Loop Control Options

The MPLS protocol architecture allows some nodes to be using loop
prevention while some other nodes are not (i.e., the choice of whether
or not to use loop prevention may be a local decision). When this mix
is used, it is not possible for a loop to form which includes only
nodes which do loop prevention. However, it is possible for loops to
form which contain a combination of some nodes which do loop
prevention, and some nodes which do not.

There are at least four identified cases in which it makes sense to
combine nodes which do loop prevention with nodes which do not: (i)
for transition, in intermediate states while transitioning from all
non-loop-prevention to all loop prevention, or vice versa; (ii) for
interoperability, where one vendor implements loop prevention but
another vendor does not; (iii) where there is a mixed ATM and datagram
media network, and where loop prevention is desired over the ATM
portions of the network but not over the datagram portions; (iv) where
some of the ATM switches can do fair access to the buffer pool on a
per-VC basis and some cannot, and loop prevention is desired over the
ATM portions of the network which cannot.

Note that interworking is straightforward. If an LSR is not doing loop
prevention, and it receives from a downstream LSR a label mapping
which contains loop prevention information, it (a) accepts the label
mapping, (b) does NOT pass the loop prevention information upstream,
and (c) informs the downstream neighbor that the path is loop-free.

Similarly, if an LSR R which is doing loop prevention receives from a
downstream LSR a label mapping which does not contain any loop
prevention information, then R passes the label mapping upstream with
loop prevention information included, as if R were the egress for the
specified stream.

Optionally, a node is permitted to implement the ability of either
doing or not doing loop prevention as options, and is permitted to
choose which to use for any one particular LSP based on the
information obtained from downstream nodes. When the label mapping
arrives from downstream, the node may choose whether to use loop
prevention so as to continue to use the same approach as was used in
the information passed to it. Note that regardless of whether loop
prevention is used, the egress node (for any particular LSP) always
initiates the exchange of label mapping information without waiting
for other nodes to act.

2.16. Merging and Non-Merging LSRs

Merge allows multiple upstream LSPs to be merged into a single
downstream LSP. When implemented by multiple nodes, this results in
the traffic going to a particular egress node, for one particular
Stream, following a multipoint to point tree (MPT), with the MPT
rooted at the egress node and associated with the Stream. This can
significantly reduce the number of labels that need to be maintained
by any one particular node.

If merge were not used at all, it would be necessary for each node to
provide its upstream neighbors with a label for each Stream for each
upstream node which may be forwarding traffic over the link. This
implies that the number of labels needed might not in general be known
a priori. However, the use of merge allows a single label to be used
per Stream, therefore allowing label assignment to be done in a common
way without regard for the number of upstream nodes which will be
using the downstream LSP.

The proposed MPLS protocol architecture supports LSP merge, while also
allowing for nodes which do not support LSP merge. This leads to the
issue of ensuring correct interoperation between nodes which implement
merge and those which do not. The issue is somewhat different in the
case of datagram media versus the case of ATM. The different media
types will therefore be discussed separately.
2.16.1. Stream Merge

Let us say that an LSR is capable of Stream Merge if it can receive
two packets from different incoming interfaces, and/or with different
labels, and send both packets out the same outgoing interface with the
same label. This in effect takes two incoming streams and merges them
into one. Once the packets are transmitted, the information that they
arrived from different interfaces and/or with different incoming
labels is lost.

Let us say that an LSR is not capable of Stream Merge if, for any two
packets which arrive from different interfaces, or with different
labels, the packets must either be transmitted out different
interfaces, or must have different labels.

An LSR which is capable of Stream Merge (a "Merging LSR") needs to
maintain only one outgoing label for each FEC. An LSR which is not
capable of Stream Merge (a "Non-merging LSR") may need to maintain as
many as N outgoing labels per FEC, where N is the number of LSRs in
the network. Hence by supporting Stream Merge, an LSR can reduce its
number of outgoing labels by a factor of O(N). Since each label in use
requires the dedication of some amount of resources, this can be a
significant savings.

2.16.2. Non-merging LSRs

The MPLS forwarding procedures are very similar to the forwarding
procedures used by such technologies as ATM and Frame Relay. That is,
a unit of data arrives, a label (VPI/VCI or DLCI) is looked up in a
"cross-connect table", on the basis of that lookup an output port is
chosen, and the label value is rewritten. In fact, it is possible to
use such technologies for MPLS forwarding; LDP can be used as the
"signalling protocol" for setting up the cross-connect tables.

Unfortunately, these technologies do not necessarily support the
Stream Merge capability. In ATM, if one attempts to perform Stream
Merge, the result may be the interleaving of cells from various
packets. If cells from different packets get interleaved, it is
impossible to reassemble the packets. Some Frame Relay switches use
cell switching on their backplanes. These switches may also be
incapable of supporting Stream Merge, for the same reason -- cells of
different packets may get interleaved, and there is then no way to
reassemble the packets.

We propose to support two solutions to this problem. First, MPLS will
contain procedures which allow the use of non-merging LSRs. Second,
MPLS will support procedures which allow certain ATM switches to
function as merging LSRs.

Since MPLS supports both merging and non-merging LSRs, MPLS also
contains procedures to ensure correct interoperation between them.

2.16.3. Labels for Merging and Non-Merging LSRs

An upstream LSR which supports Stream Merge needs to be sent only one
label per FEC. An upstream neighbor which does not support Stream
Merge needs to be sent multiple labels per FEC. However, there is no
way of knowing a priori how many labels it needs. This will depend on
how many LSRs are upstream of it with respect to the FEC in question.

In the MPLS architecture, if a particular upstream neighbor does not
support Stream Merge, it is not sent any labels for a particular FEC
unless it explicitly asks for a label for that FEC. The upstream
neighbor may make multiple such requests, and is given a new label
each time. When a downstream neighbor receives such a request from
upstream, and the downstream neighbor does not itself support Stream
Merge, then it must in turn ask its downstream neighbor for another
label for the FEC in question.

It is possible that there may be some nodes which support merge, but
have a limited number of upstream streams which may be merged into a
single downstream stream. Suppose, for example, that due to some
hardware limitation a node is capable of merging four upstream LSPs
into a single downstream LSP. Suppose however that this particular
node has six upstream LSPs arriving at it for a particular Stream. In
this case, this node may merge these into two downstream LSPs
(corresponding to two labels that need to be obtained from the
downstream neighbor). In this case, the normal operation of the LDP
implies that the downstream neighbor will supply this node with a
single label for the Stream. This node can then ask its downstream
neighbor for one additional label for the Stream, implying that the
node will thereby obtain the required two labels.

The interaction between explicit routing and merge is FFS.
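The label counting above can be made concrete with a small sketch
(illustrative only; it ignores any label for traffic the node itself
originates):

   import math

   def downstream_labels_needed(upstream_lsps, merge_capacity):
       """Labels an LSR must obtain from its downstream neighbor for
       one Stream, given the number of upstream LSPs arriving for that
       Stream and the number of upstream LSPs the hardware can merge
       into one downstream LSP (None means unlimited merge)."""
       if merge_capacity is None:
           return 1                   # merging LSR: one label per FEC
       if merge_capacity <= 1:
           return upstream_lsps       # non-merging: one per upstream LSP
       return math.ceil(upstream_lsps / merge_capacity)

   assert downstream_labels_needed(6, None) == 1  # full merge
   assert downstream_labels_needed(6, 1) == 6     # no merge
   assert downstream_labels_needed(6, 4) == 2     # the example above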
It is possible that there may be some nodes which support merge, but
can merge only a limited number of upstream streams into a single
downstream stream. Suppose, for example, that due to some hardware
limitation a node is capable of merging four upstream LSPs into a
single downstream LSP, but that six upstream LSPs arrive at it for a
particular Stream. In this case, the node may merge these into two
downstream LSPs (corresponding to two labels that need to be obtained
from the downstream neighbor). The normal operation of LDP implies
that the downstream neighbor will supply this node with a single
label for the Stream; the node then asks its downstream neighbor for
one additional label for the Stream, thereby obtaining the required
two labels.

The interaction between explicit routing and merge is FFS.

2.16.4. Merge over ATM

2.16.4.1. Methods of Eliminating Cell Interleave

There are several methods that can be used to eliminate the cell
interleaving problem in ATM, thereby allowing ATM switches to support
stream merge:

1. VP merge

   When VP merge is used, multiple virtual paths are merged into a
   single virtual path, but packets from different sources are
   distinguished by using different VCs within the VP.

2. VC merge

   When VC merge is used, switches are required to buffer cells
   from one packet until the entire packet is received (this may
   be determined by looking for the AAL5 end of frame indicator).

VP merge has the advantage that it is compatible with a higher
percentage of existing ATM switch implementations. This makes it
more likely that VP merge can be used in existing networks. Unlike
VC merge, VP merge does not incur any delays at the merge points and
also does not impose any buffer requirements. However, it has the
disadvantage that it requires coordination of the VCI space within
each VP. There are a number of ways that this can be accomplished;
selection of one or more methods is FFS.

This tradeoff between compatibility with existing equipment on the
one hand, and protocol complexity and scalability on the other,
implies that it is desirable for the MPLS protocol to support both VP
merge and VC merge. In order to do so, each ATM switch participating
in MPLS needs to know whether its immediate ATM neighbors perform VP
merge, VC merge, or no merge.

2.16.4.2. Interoperation: VC Merge, VP Merge, and Non-Merge

The interoperation of the various forms of merging over ATM is most
easily described by first describing the interoperation of VC merge
with non-merge.

In the case where VC merge and non-merge nodes are interconnected,
the forwarding of cells is based in all cases on a VC (i.e., the
concatenation of the VPI and VCI).
For each node, if an upstream neighbor is doing VC merge, then that
upstream neighbor requires only a single VPI/VCI for a particular
Stream (this is analogous to the requirement for a single label in
the case of operation over frame media). If the upstream neighbor is
not doing merge, then the neighbor will require a single VPI/VCI per
Stream for itself, plus enough VPI/VCIs to pass to its upstream
neighbors. The number required is determined by allowing the
upstream nodes to request additional VPI/VCIs from their downstream
neighbors (this is again analogous to the method used with frame
merge).

A similar method is possible to support nodes which perform VP merge.
In this case the VP merge node, rather than requesting a single
VPI/VCI or a number of VPI/VCIs from its downstream neighbor, instead
may request a single VP (identified by a VPI), but several VCIs
within the VP. Furthermore, suppose that a non-merge node is
downstream from two different VP merge nodes. This node may need to
request one VPI/VCI (for traffic originating from itself) plus two
VPs (one for each upstream node), each associated with a specified
set of VCIs (as requested from the upstream node).

In order to support all of VP merge, VC merge, and non-merge, it is
therefore necessary to allow upstream nodes to request a combination
of zero or more VC identifiers (each consisting of a VPI/VCI), plus
zero or more VPs (identified by VPIs), each containing a specified
number of VCs (identified by a set of VCIs which are significant
within a VP). VP merge nodes would therefore request one VP, with a
contained VCI for traffic they originate (if appropriate), plus a VCI
for each VC requested from above (regardless of whether or not the VC
is part of a containing VP). VC merge nodes would request only a
single VPI/VCI (since they can merge all upstream traffic into a
single VC). Non-merge nodes would pass on any requests that they get
from above, plus request a VPI/VCI for traffic that they originate
(if appropriate). Such a combined request is illustrated below.
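A combined label request of this kind might be represented as in the
following sketch. This is only an illustration of the information
content described above; the class and field names are invented, and
are not part of any proposed LDP message format.

      from dataclasses import dataclass, field
      from typing import List, Set

      @dataclass
      class VcRequest:            # one point-to-point VC
          vpi: int
          vci: int

      @dataclass
      class VpRequest:            # one VP, with VCIs significant within it
          vpi: int
          vcis: Set[int] = field(default_factory=set)

      @dataclass
      class LabelRequest:         # zero or more VCs, plus zero or more VPs
          vcs: List[VcRequest] = field(default_factory=list)
          vps: List[VpRequest] = field(default_factory=list)

      # A VP merge node with three VCs requested from above, which also
      # originates traffic itself, might ask its downstream neighbor for
      # one VP containing four VCIs:
      req = LabelRequest(vps=[VpRequest(vpi=7, vcis={32, 33, 34, 35})])

      # A VC merge node needs only a single VC, whatever is upstream:
      req2 = LabelRequest(vcs=[VcRequest(vpi=0, vci=41)])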
2.17. LSP Control: Egress versus Local

There is a choice to be made regarding whether the initial setup of
LSPs is initiated by the egress node, or locally by each individual
node.

When LSP control is done locally, each node may at any time pass
label bindings to its neighbors for each FEC recognized by that node.
In the normal case that the neighboring nodes recognize the same
FECs, nodes may map incoming labels to outgoing labels as part of the
normal label swapping forwarding method.

When LSP control is done by the egress, initially only the egress
node passes label bindings to its neighbors, for those FECs which
leave the MPLS network at that egress node. Other nodes wait until
they get a label from downstream for a particular FEC before passing
a corresponding label for the same FEC to upstream nodes.

With local control, since each LSR is (at least initially)
independently assigning labels to FECs, it is possible that different
LSRs will make inconsistent decisions. For example, an upstream LSR
may make a coarse decision (map multiple IP address prefixes to a
single label) while its downstream neighbor makes a finer grain
decision (map each individual IP address prefix to a separate label).
With downstream label assignment this can be corrected by having LSRs
withdraw labels that they have assigned which are inconsistent with
downstream labels, and replace them with new, consistent label
assignments.

Even with egress control it is possible that the choice of egress
node may change, or the egress may (based on a change in
configuration) change its mind in terms of the granularity which is
to be used. This implies that the same mechanism will be necessary
to allow changes in granularity to bubble up to upstream nodes. The
choice of egress or local control may therefore affect the frequency
with which this mechanism is used, but does not affect the need for a
mechanism to achieve consistency of label granularity. Generally
speaking, the choice of local versus egress control does not appear
to have any effect on the LDP mechanisms which need to be defined.

Egress control and local control can interwork in a very
straightforward manner (although some of the advantages ascribed to
egress control may be lost; see appendices A and B). With either
approach (assuming downstream label assignment), the egress node will
initially assign labels for particular FECs and will pass these
labels to its neighbors. With either approach these label
assignments will bubble upstream, with the upstream nodes choosing
labels that are consistent with the labels that they receive from
downstream. The difference between the two approaches is therefore
primarily an issue of what each node does prior to obtaining a label
assignment for a particular FEC from downstream nodes: does it wait,
or does it assign a preliminary label in the expectation that the
label will (probably) be correct?

Regardless of which method is used (local control or egress control),
each node needs to know (possibly by configuration) what granularity
to use for the labels that it assigns. Where egress control is used,
this requires each node to know the granularity only for streams
which leave the MPLS network at that node. For local control, in
order to avoid the need to withdraw inconsistent labels, each node in
the network would need to be configured consistently to know the
granularity for each stream. However, in many cases this may be done
by using a single level of granularity which applies to all streams
(such as "one label per IP prefix in the forwarding table"). The
choice between local control and egress control could similarly be
left as a configuration option.

Future versions of the MPLS architecture will need to choose between
three options: (i) requiring local control; (ii) requiring egress
control; or (iii) allowing a choice of local control or egress
control. Arguments for local versus egress control are contained in
appendices A and B.
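The difference between the two control modes can be reduced to a
single predicate governing when a node may distribute a label for a
FEC, as in this illustrative sketch (the function and parameter names
are invented):

      def may_distribute_label(control, is_egress_for_fec,
                               have_downstream_label):
          """May this LSR pass a label binding for a FEC upstream yet?

          control: "local" or "egress".
          """
          if control == "local":
              # Local control: a node may bind and distribute at any time.
              return True
          # Egress control: only the egress starts; everyone else waits
          # for a label from downstream before offering one upstream.
          return is_egress_for_fec or have_downstream_label

      # An interior node under egress control must wait...
      assert not may_distribute_label("egress", False, False)
      # ...until the downstream mapping arrives.
      assert may_distribute_label("egress", False, True)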
2.18. Granularity

When forwarding by label swapping, a stream of packets arriving from
upstream may be mapped into an equal or coarser grain stream.
However, a coarse grain stream (for example, one containing packets
destined for a short IP address prefix covering many subnets) cannot
be mapped directly into a finer grain stream (for example, one
containing packets destined for a longer IP address prefix covering a
single subnet). This implies that there needs to be some mechanism
for ensuring consistency between the granularity of LSPs in an MPLS
network.

The method used for ensuring compatibility of granularity may depend
upon the method used for LSP control.

When LSP control is local, it is possible that a node may pass a
coarse grain label to its upstream neighbor(s), and subsequently
receive a finer grain label from its downstream neighbor. In this
case the node has two options: (i) it may forward the corresponding
packets using normal IP datagram forwarding (i.e., by examination of
the IP header); (ii) it may withdraw the label mappings that it has
passed to its upstream neighbors, and replace them with finer grain
label mappings.

When LSP control is egress based, the label setup originates at the
egress node and passes upstream. It is therefore straightforward
with this approach to maintain equally-grained mappings along the
route.

2.19. Tunnels and Hierarchy

Sometimes a router Ru takes explicit action to cause a particular
packet to be delivered to another router Rd, even though Ru and Rd
are not consecutive routers on the Hop-by-hop path for that packet,
and Rd is not the packet's ultimate destination. For example, this
may be done by encapsulating the packet inside a network layer packet
whose destination address is the address of Rd itself. This creates
a "tunnel" from Ru to Rd. We refer to any packet so handled as a
"Tunneled Packet".

2.19.1. Hop-by-Hop Routed Tunnel

If a Tunneled Packet follows the Hop-by-hop path from Ru to Rd, we
say that it is in a "Hop-by-Hop Routed Tunnel" whose "transmit
endpoint" is Ru and whose "receive endpoint" is Rd.

2.19.2. Explicitly Routed Tunnel

If a Tunneled Packet travels from Ru to Rd over a path other than the
Hop-by-hop path, we say that it is in an "Explicitly Routed Tunnel"
whose "transmit endpoint" is Ru and whose "receive endpoint" is Rd.
For example, we might send a packet through an Explicitly Routed
Tunnel by encapsulating it in a packet which is source routed.

2.19.3. LSP Tunnels

It is possible to implement a tunnel as an LSP, and to use label
switching rather than network layer encapsulation to cause the packet
to travel through the tunnel. The tunnel would be an LSP <R1, ...,
Rn>, where R1 is the transmit endpoint of the tunnel, and Rn is the
receive endpoint of the tunnel. This is called an "LSP Tunnel".

The set of packets which are to be sent through the LSP tunnel
becomes a Stream, and each LSR in the tunnel must assign a label to
that Stream (i.e., must assign a label to the tunnel). The criteria
for assigning a particular packet to an LSP tunnel are a local matter
at the tunnel's transmit endpoint. To put a packet into an LSP
tunnel, the transmit endpoint pushes a label for the tunnel onto the
label stack and sends the labeled packet to the next hop in the
tunnel.

If it is not necessary for the tunnel's receive endpoint to be able
to determine which packets it receives through the tunnel, as
discussed earlier, the label stack may be popped at the penultimate
LSR in the tunnel.

A "Hop-by-Hop Routed LSP Tunnel" is a Tunnel that is implemented as a
hop-by-hop routed LSP between the transmit endpoint and the receive
endpoint.

An "Explicitly Routed LSP Tunnel" is an LSP Tunnel that is also an
Explicitly Routed LSP.
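A label stack, together with the push used at the tunnel's transmit
endpoint and the pop at the penultimate LSR, can be sketched as
follows. This is illustrative Python, not a packet format; labels
are plain integers and the helper names are invented.

      def enter_tunnel(label_stack, tunnel_label):
          """Transmit endpoint: push the tunnel's label onto the stack."""
          return [tunnel_label] + label_stack

      def swap(label_stack, new_top_label):
          """Interior LSR in the tunnel: rewrite the top label only."""
          return [new_top_label] + label_stack[1:]

      def penultimate_pop(label_stack):
          """Penultimate LSR: pop, exposing whatever the LSR after the
          tunnel should switch on (nothing, for an unlabeled packet)."""
          return label_stack[1:]

      stack = []                      # an unlabeled packet
      stack = enter_tunnel(stack, 17) # transmit endpoint pushes label 17
      stack = swap(stack, 23)         # next LSR swaps 17 -> 23
      stack = penultimate_pop(stack)  # penultimate LSR pops
      assert stack == []              # receive endpoint gets it unlabeled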
2.19.4. Hierarchy: LSP Tunnels within LSPs

Consider an LSP <R1, R2, R3, R4>. Let us suppose that R1 receives
unlabeled packet P, and pushes on its label stack the label needed to
cause it to follow this path, and that this is in fact the Hop-by-hop
path. However, let us further suppose that R2 and R3 are not
directly connected, but are "neighbors" by virtue of being the
endpoints of an LSP tunnel. So the actual sequence of LSRs traversed
by P is <R1, R2, R21, R22, R23, R3, R4>.

When P travels from R1 to R2, it will have a label stack of depth 1.
R2, switching on the label, determines that P must enter the tunnel.
R2 first replaces the incoming label with a label that is meaningful
to R3. Then it pushes on a new label. This level 2 label has a
value which is meaningful to R21. Switching is done on the level 2
label by R21, R22, and R23. R23, which is the penultimate hop in the
R2-R3 tunnel, pops the label stack before forwarding the packet to
R3. When R3 sees packet P, P has only a level 1 label, having now
exited the tunnel. Since R3 is the penultimate hop in P's level 1
LSP, it pops the label stack, and R4 receives P unlabeled.

The label stack mechanism allows LSP tunneling to nest to any depth.

2.19.5. LDP Peering and Hierarchy

Suppose that packet P travels along a Level 1 LSP <R1, R2, R3, R4>,
and when going from R2 to R3 travels along a Level 2 LSP <R2, R21,
R22, R23, R3>. From the perspective of the Level 2 LSP, R2's LDP
peer is R21. From the perspective of the Level 1 LSP, R2's LDP peers
are R1 and R3. One can have LDP peers at each layer of hierarchy.
We will see in sections 3.6 and 3.7 some ways to make use of this
hierarchy. Note that in this example, R2 and R21 must be IGP
neighbors, but R2 and R3 need not be.

When two LSRs are IGP neighbors, we will refer to them as "Local LDP
Peers". When two LSRs may be LDP peers, but are not IGP neighbors,
we will refer to them as "Remote LDP Peers". In the above example,
R2 and R21 are local LDP peers, but R2 and R3 are remote LDP peers.

The MPLS architecture supports two ways to distribute labels at
different layers of the hierarchy: Explicit Peering and Implicit
Peering.

One performs label distribution with one's Local LDP Peers by opening
LDP connections to them. One can perform label distribution with
one's Remote LDP Peers in one of two ways:

1. Explicit Peering

   In explicit peering, one sets up LDP connections between Remote
   LDP Peers, exactly as one would do for Local LDP Peers. This
   technique is most useful when the number of Remote LDP Peers is
   small, or the number of higher level label mappings is large,
   or the Remote LDP Peers are in distinct routing areas or
   domains. Of course, one needs to know which labels to
   distribute to which peers; this is addressed in section 3.1.2.

   Examples of the use of explicit peering are found in sections
   3.2.1 and 3.6.

2. Implicit Peering

   In implicit peering, one does not have LDP connections to one's
   remote LDP peers, but only to one's local LDP peers. To
   distribute higher level labels to one's remote LDP peers, one
   encodes the higher level labels as an attribute of the lower
   level labels, and distributes the lower level label, along with
   this attribute, to the local LDP peers.
   The local LDP peers then propagate the information to their
   peers. This process continues until the information reaches the
   remote LDP peers. Note that the intermediary nodes may also be
   remote LDP peers.

   This technique is most useful when the number of Remote LDP
   Peers is large. Implicit peering does not require an n-squared
   peering mesh to distribute labels to the remote LDP peers,
   because the information is piggybacked through the local LDP
   peering. However, implicit peering requires the intermediate
   nodes to store information that they might not be directly
   interested in.

   An example of the use of implicit peering is found in section
   3.3.

2.20. LDP Transport

LDP is used between nodes in an MPLS network to establish and
maintain label mappings. In order for LDP to operate correctly, LDP
information needs to be transmitted reliably, and the LDP messages
pertaining to a particular FEC need to be transmitted in sequence.
This may potentially be accomplished either by using an existing
reliable transport protocol such as TCP, or by specifying reliability
mechanisms as part of LDP (for example, the reliability mechanisms
which are defined in IDRP could potentially be "borrowed" for use
with LDP). The precise means of accomplishing transport reliability
for LDP are for further study, but will be specified by the MPLS
Protocol Architecture before the architecture may be considered
complete.

2.21. Label Encodings

In order to transmit a label stack along with the packet whose label
stack it is, it is necessary to define a concrete encoding of the
label stack. The architecture supports several different encoding
techniques; the choice of encoding technique depends on the
particular kind of device being used to forward labeled packets.

2.21.1. MPLS-specific Hardware and/or Software

If one is using MPLS-specific hardware and/or software to forward
labeled packets, the most obvious way to encode the label stack is to
define a new protocol to be used as a "shim" between the data link
layer and network layer headers. This shim would really be just an
encapsulation of the network layer packet; it would be "protocol-
independent", such that it could be used to encapsulate any network
layer. Hence we will refer to it as the "generic MPLS
encapsulation".

The generic MPLS encapsulation would in turn be encapsulated in a
data link layer protocol.

The generic MPLS encapsulation should contain the following fields:

1. the label stack,

2. a Time-to-Live (TTL) field, and

3. a Class of Service (CoS) field.

The TTL field permits MPLS to provide a TTL function similar to what
is provided by IP.

The CoS field permits LSRs to apply various packet scheduling
disciplines to labeled packets, without requiring separate labels for
separate disciplines.

This section is not intended to rule out the use of alternative
mechanisms in network environments where such alternatives may be
appropriate.
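To make the shim concrete, the sketch below packs one label stack
entry into 32 bits, using the layout proposed in the label stack
encoding draft [8]: a 20-bit label, a 3-bit CoS field, a 1-bit
bottom-of-stack flag, and an 8-bit TTL. The field widths come from
that companion draft, not from this architecture, which deliberately
leaves the encoding open.

      def encode_entry(label: int, cos: int, bottom: bool, ttl: int) -> bytes:
          """Pack one shim entry: label(20) | CoS(3) | S(1) | TTL(8)."""
          assert 0 <= label < 2 ** 20 and 0 <= cos < 8 and 0 <= ttl < 256
          word = (label << 12) | (cos << 9) | (int(bottom) << 8) | ttl
          return word.to_bytes(4, "big")

      def decode_entry(data: bytes):
          word = int.from_bytes(data[:4], "big")
          return (word >> 12,             # label
                  (word >> 9) & 0x7,      # CoS
                  bool((word >> 8) & 1),  # bottom of stack
                  word & 0xFF)            # TTL

      # A two-entry stack: level 2 label on top, level 1 label at bottom.
      shim = encode_entry(23, 0, False, 63) + encode_entry(17, 0, True, 63)
      assert decode_entry(shim[4:]) == (17, 0, True, 63)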
2.21.2. ATM Switches as LSRs

It will be noted that the MPLS forwarding procedures are similar to
those of legacy "label swapping" switches such as ATM switches. ATM
switches use the input port and the incoming VPI/VCI value as the
index into a "cross-connect" table, from which they obtain an output
port and an outgoing VPI/VCI value. Therefore, if one or more labels
can be encoded directly into the fields which are accessed by these
legacy switches, then the legacy switches can, with suitable software
upgrades, be used as LSRs. We will refer to such devices as "ATM-
LSRs".

There are three obvious ways to encode labels in the ATM cell header
(presuming the use of AAL5):

1. SVC Encoding

   Use the VPI/VCI field to encode the label which is at the top
   of the label stack. This technique can be used in any network.
   With this encoding technique, each LSP is realized as an ATM
   SVC, and LDP becomes the ATM "signaling" protocol. With this
   encoding technique, the ATM-LSRs cannot perform "push" or "pop"
   operations on the label stack.

2. SVP Encoding

   Use the VPI field to encode the label which is at the top of
   the label stack, and the VCI field to encode the second label
   on the stack, if one is present. This technique has some
   advantages over the previous one, in that it permits the use of
   ATM "VP-switching". That is, the LSPs are realized as ATM
   SVPs, with LDP serving as the ATM signaling protocol.

   However, this technique cannot always be used. If the network
   includes an ATM Virtual Path through a non-MPLS ATM network,
   then the VPI field is not necessarily available for use by
   MPLS.

   When this encoding technique is used, the ATM-LSR at the egress
   of the VP effectively does a "pop" operation.

3. SVP Multipoint Encoding

   Use the VPI field to encode the label which is at the top of
   the label stack, use part of the VCI field to encode the second
   label on the stack, if one is present, and use the remainder of
   the VCI field to identify the LSP ingress. If this technique
   is used, conventional ATM VP-switching capabilities can be used
   to provide multipoint-to-point VPs. Cells from different
   packets will then carry different VCI values, so multipoint-
   to-point VPs can be provided without any cell interleaving
   problems.

   This technique depends on the existence of a capability for
   assigning small unique values to each ATM switch.

If there are more labels on the stack than can be encoded in the ATM
header, the ATM encodings must be combined with the generic
encapsulation. This does presuppose that it is possible to tell,
when reassembling the ATM cells into packets, whether the generic
encapsulation is also present.
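The first two encodings can be pictured as a simple mapping from the
top of the label stack onto the cell header fields, as in this sketch
(illustrative Python with invented names; any remainder of the stack
would travel in the generic encapsulation, as noted above):

      def atm_encode(stack, technique):
          """Map the top of the label stack onto (VPI, VCI).

          stack: label stack, top first; each label a small integer.
          Returns (vpi, vci, labels left over for the shim).
          """
          if technique == "SVC":
              # The whole VPI/VCI field carries the top label.
              return stack[0] >> 16, stack[0] & 0xFFFF, stack[1:]
          if technique == "SVP":
              # VPI carries the top label, VCI the second (if present).
              vci = stack[1] if len(stack) > 1 else 0
              return stack[0], vci, stack[2:]
          raise ValueError("SVP multipoint encoding omitted for brevity")

      # SVP encoding of a two-label stack: nothing left for the shim.
      assert atm_encode([9, 41], "SVP") == (9, 41, [])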
2.21.3. Interoperability among Encoding Techniques

If <R1, R2, R3> is a segment of an LSP, it is possible that R1 will
use one encoding of the label stack when transmitting packet P to R2,
but R2 will use a different encoding when transmitting packet P to
R3. In general, the MPLS architecture supports LSPs with different
label stack encodings used on different hops. Therefore, when we
discuss the procedures for processing a labeled packet, we speak in
abstract terms of operating on the packet's label stack. When a
labeled packet is received, the LSR must decode it to determine the
current value of the label stack, then must operate on the label
stack to determine the new value of the stack, and then encode the
new value appropriately before transmitting the labeled packet to its
next hop.

Unfortunately, ATM switches have no capability for translating from
one encoding technique to another. The MPLS architecture therefore
requires that whenever it is possible for two ATM switches to be
successive LSRs along a level m LSP for some packet, those two ATM
switches use the same encoding technique.

Naturally there will be MPLS networks which contain a combination of
ATM switches operating as LSRs, and other LSRs which operate using an
MPLS shim header. In such networks there may be some LSRs which have
ATM interfaces as well as "MPLS Shim" interfaces. This is one
example of an LSR with different label stack encodings on different
hops. Such an LSR may swap off an ATM encoded label stack on an
incoming interface and replace it with an MPLS shim header encoded
label stack on the outgoing interface.

2.22. Multicast

This section is for further study.

3. Some Applications of MPLS

3.1. MPLS and Hop by Hop Routed Traffic

One use of MPLS is to simplify the process of forwarding packets
using hop by hop routing.

3.1.1. Labels for Address Prefixes

In general, router R determines the next hop for packet P by finding
the address prefix X in its routing table which is the longest match
for P's destination address. That is, the packets in a given Stream
are just those packets which match a given address prefix in R's
routing table. In this case, a Stream can be identified with an
address prefix.

If packet P must traverse a sequence of routers, and at each router
in the sequence P matches the same address prefix, MPLS simplifies
the forwarding process by enabling all routers but the first to avoid
executing the best match algorithm; they need only look up the label.

3.1.2. Distributing Labels for Address Prefixes

3.1.2.1. LDP Peers for a Particular Address Prefix

LSRs R1 and R2 are considered to be LDP Peers for address prefix X if
and only if one of the following conditions holds:

1. R1's route to X is a route which it learned about via a
   particular instance of a particular IGP, and R2 is a neighbor
   of R1 in that instance of that IGP, or

2. R1's route to X is a route which it learned about via some
   instance of routing algorithm A1, and that route is
   redistributed into an instance of routing algorithm A2, and R2
   is a neighbor of R1 in that instance of A2, or

3. R1 is the receive endpoint of an LSP Tunnel that is within
   another LSP, and R2 is a transmit endpoint of that tunnel, and
   R1 and R2 are participants in a common instance of an IGP, and
   are in the same IGP area (if the IGP in question has areas),
   and R1's route to X was learned via that IGP instance, or is
   redistributed by R1 into that IGP instance, or

4. R1's route to X is a route which it learned about via BGP, and
   R2 is a BGP peer of R1.

In general, these rules ensure that if the route to a particular
address prefix is distributed via an IGP, the LDP peers for that
address prefix are the IGP neighbors. If the route to a particular
address prefix is distributed via BGP, the LDP peers for that address
prefix are the BGP peers. In other cases of LSP tunneling, the
tunnel endpoints are LDP peers.

3.1.2.2. Distributing Labels

In order to use MPLS for the forwarding of normally routed traffic,
each LSR MUST:

1. bind one or more labels to each address prefix that appears in
   its routing table;

2. for each such address prefix X, use LDP to distribute the
   mapping of a label to X to each of its LDP Peers for X.
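The per-prefix binding that these two rules require is sketched
below. The longest match is the standard IP lookup; the table, the
label allocator, and the example prefixes are invented for the
illustration (rule 2 would then distribute each mapping to the LDP
Peers for that prefix, per section 3.1.2.1).

      import ipaddress, itertools

      _labels = itertools.count(16)

      def bind_labels(routing_table):
          """Rule 1: bind a label to each address prefix in the table."""
          return {prefix: next(_labels) for prefix in routing_table}

      def longest_match(routing_table, dest):
          """Best-match lookup used to map a packet to a Stream."""
          nets = [ipaddress.ip_network(p) for p in routing_table]
          best = max((n for n in nets if ipaddress.ip_address(dest) in n),
                     key=lambda n: n.prefixlen)
          return str(best)

      table = ["203.0.113.0/24", "203.0.113.0/28", "0.0.0.0/0"]
      bindings = bind_labels(table)
      assert set(bindings) == set(table)
      assert longest_match(table, "203.0.113.5") == "203.0.113.0/28"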
There is also one circumstance in which an LSR must distribute a
label mapping for an address prefix, even if it is not the LSR which
bound that label to that address prefix:

3. If R1 uses BGP to distribute a route to X, naming some other
   LSR R2 as the BGP Next Hop to X, and if R1 knows that R2 has
   assigned label L to X, then R1 must distribute the mapping
   between L and X to any BGP peer to which it distributes that
   route.

These rules ensure that labels corresponding to address prefixes
which correspond to BGP routes are distributed to IGP neighbors if
and only if the BGP routes are distributed into the IGP. Otherwise,
the labels bound to BGP routes are distributed only to the other BGP
speakers.

These rules are intended to indicate which label mappings must be
distributed by a given LSR to which other LSRs, NOT to indicate the
conditions under which the distribution is to be made. That is
discussed in section 2.17.

3.1.3. Using the Hop by Hop path as the LSP

If the hop-by-hop path that packet P needs to follow is <R1, ...,
Rn>, then <R1, ..., Rn> can be an LSP as long as:

1. there is a single address prefix X such that, for all i,
   1<=i<n, X is the longest match in Ri's routing table for P's
   destination address.

3.5. LSP Trees as Multipoint-to-Point Entities

Consider packets P1 and P2, each of which has a destination address
whose longest match, throughout a particular routing domain, is
address prefix X. Suppose that the Hop-by-hop path for P1 is <R1,
R2, R3>, and the Hop-by-hop path for P2 is <R4, R2, R3>. Let's
suppose that R3 binds label L3 to X, and distributes this mapping to
R2. R2 binds label L2 to X, and distributes this mapping to both R1
and R4. When R2 receives packet P1, its incoming label will be L2.
R2 will overwrite L2 with L3, and send P1 to R3. When R2 receives
packet P2, its incoming label will also be L2. R2 again overwrites
L2 with L3, and sends P2 on to R3.

Note then that when P1 and P2 are traveling from R2 to R3, they carry
the same label, and as far as MPLS is concerned, they cannot be
distinguished. Thus instead of talking about two distinct LSPs, <R1,
R2, R3> and <R4, R2, R3>, we might talk of a single "Multipoint-to-
Point LSP", which we might denote as <{R1, R4}, R2, R3>.

This creates a difficulty when we attempt to use conventional ATM
switches as LSRs. Since conventional ATM switches do not support
multipoint-to-point connections, there must be procedures to ensure
that each LSP is realized as a point-to-point VC. However, if ATM
switches which do support multipoint-to-point VCs are in use, then
the LSPs can be most efficiently realized as multipoint-to-point VCs.
Alternatively, if the SVP Multipoint Encoding (section 2.21) can be
used, the LSPs can be realized as multipoint-to-point SVPs.

3.6. LSP Tunneling between BGP Border Routers

Consider the case of an Autonomous System, A, which carries transit
traffic between other Autonomous Systems. Autonomous System A will
have a number of BGP Border Routers, and a mesh of BGP connections
among them, over which BGP routes are distributed. In many such
cases, it is desirable to avoid distributing the BGP routes to
routers which are not BGP Border Routers. If this can be avoided,
the "route distribution load" on those routers is significantly
reduced. However, there must be some means of ensuring that the
transit traffic will be delivered from Border Router to Border Router
by the interior routers. This can easily be done by means of LSP
Tunnels.
Suppose that BGP routes are distributed only to BGP Border Routers,
and not to the interior routers that lie along the Hop-by-hop path
from Border Router to Border Router. LSP Tunnels can then be used as
follows:

1. Each BGP Border Router distributes, to every other BGP Border
   Router in the same Autonomous System, a label for each address
   prefix that it distributes to that router via BGP.

2. The IGP for the Autonomous System maintains a host route for
   each BGP Border Router. Each interior router distributes its
   labels for these host routes to each of its IGP neighbors.

3. Suppose that:

   a) BGP Border Router B1 receives an unlabeled packet P,

   b) address prefix X in B1's routing table is the longest
      match for the destination address of P,

   c) the route to X is a BGP route,

   d) the BGP Next Hop for X is B2,

   e) B2 has bound label L1 to X, and has distributed this
      mapping to B1,

   f) the IGP next hop for the address of B2 is I1,

   g) the address of B2 is in B1's and I1's IGP routing tables
      as a host route, and

   h) I1 has bound label L2 to the address of B2, and
      distributed this mapping to B1.

   Then before sending packet P to I1, B1 must create a label
   stack for P, then push on label L1, and then push on label L2.

4. Suppose that BGP Border Router B1 receives a labeled packet P,
   where the label on the top of the label stack corresponds to an
   address prefix, X, to which the route is a BGP route, and that
   conditions 3b, 3c, 3d, and 3e all hold. Then before sending
   packet P to I1, B1 must replace the label at the top of the
   label stack with L1, and then push on label L2.

With these procedures, a given packet P follows a level 1 LSP all of
whose members are BGP Border Routers, and between each pair of BGP
Border Routers in the level 1 LSP, it follows a level 2 LSP.

These procedures effectively create a Hop-by-Hop Routed LSP Tunnel
between the BGP Border Routers.

Since the BGP border routers are exchanging label mappings for
address prefixes that are not even known to the IGP routing, the BGP
routers should become explicit LDP peers with each other.

3.7. Other Uses of Hop-by-Hop Routed LSP Tunnels

The use of Hop-by-Hop Routed LSP Tunnels is not restricted to tunnels
between BGP Next Hops. Any situation in which one might otherwise
have used an encapsulation tunnel is one in which it is appropriate
to use a Hop-by-Hop Routed LSP Tunnel. Instead of encapsulating the
packet with a new header whose destination address is the address of
the tunnel's receive endpoint, the label corresponding to the address
prefix which is the longest match for the address of the tunnel's
receive endpoint is pushed on the packet's label stack. The packet
which is sent into the tunnel may or may not already be labeled.

If the transmit endpoint of the tunnel wishes to put a labeled packet
into the tunnel, it must first replace the label value at the top of
the stack with a label value that was distributed to it by the
tunnel's receive endpoint. Then it must push on the label which
corresponds to the tunnel itself, as distributed to it by the next
hop along the tunnel. To allow this, the tunnel endpoints should be
explicit LDP peers. The label mappings they need to exchange are of
no interest to the LSRs along the tunnel.
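The transmit endpoint's behavior in the two cases (steps 3 and 4 of
section 3.6, and the labeled case of this section) reduces to a
replace-then-push on the label stack, sketched here with invented
names; labels are plain integers:

      def send_into_tunnel(stack, endpoint_label, tunnel_label):
          """Put a packet into an LSP tunnel at the transmit endpoint.

          endpoint_label: label bound by the tunnel's receive endpoint
                          (L1 in section 3.6).
          tunnel_label:   label for the tunnel itself, from the next hop
                          along the tunnel (L2 in section 3.6).
          """
          if stack:
              stack = [endpoint_label] + stack[1:]  # replace top (step 4)
          else:
              stack = [endpoint_label]              # unlabeled case (step 3)
          return [tunnel_label] + stack             # then push tunnel label

      assert send_into_tunnel([], 101, 202) == [202, 101]
      assert send_into_tunnel([7], 101, 202) == [202, 101]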
3.8. MPLS and Multicast

Multicast routing proceeds by constructing multicast trees. The tree
along which a particular multicast packet must be forwarded depends
in general on the packet's source address and its destination
address. Whenever a particular LSR is a node in a particular
multicast tree, it binds a label to that tree. It then distributes
that mapping to its parent on the multicast tree. (If the node in
question is on a LAN, and has siblings on that LAN, it must also
distribute the mapping to its siblings. This allows the parent to
use a single label value when multicasting to all children on the
LAN.)

When a multicast labeled packet arrives, the NHLFE corresponding to
the label indicates the set of output interfaces for that packet, as
well as the outgoing label. If the same label encoding technique is
used on all the outgoing interfaces, the very same packet can be sent
to all the children.

4. LDP Procedures

This section is FFS.

5. Security Considerations

Security considerations are not discussed in this version of this
draft.

6. Authors' Addresses

Eric C. Rosen
Cisco Systems, Inc.
250 Apollo Drive
Chelmsford, MA, 01824
E-mail: erosen@cisco.com

Arun Viswanathan
IBM Corp.
17 Skyline Drive
Hawthorne, NY 10532
914-784-3273
E-mail: arunv@vnet.ibm.com

Ross Callon
Ascend Communications, Inc.
1 Robbins Road
Westford, MA 01886
508-952-7412
E-mail: rcallon@casc.com

7. References

[1] "A Framework for Multiprotocol Label Switching", R. Callon,
P. Doolan, N. Feldman, A. Fredette, G. Swallow, and A. Viswanathan,
work in progress, Internet Draft, July 1997.

[2] "ARIS: Aggregate Route-Based IP Switching", A. Viswanathan,
N. Feldman, R. Boivie, R. Woundy, work in progress, Internet Draft,
March 1997.

[3] "ARIS Specification", N. Feldman, A. Viswanathan, work in
progress, Internet Draft, March 1997.

[4] "ARIS Support for LAN Media Switching", S. Blake, A. Ghanwani,
W. Pace, V. Srinivasan, work in progress, Internet Draft, March 1997.

[5] "Tag Switching Architecture - Overview", Rekhter, Davie, Katz,
Rosen, Swallow, Farinacci, work in progress, Internet Draft, January
1997.

[6] "Tag Distribution Protocol", Doolan, Davie, Katz, Rekhter, Rosen,
work in progress, Internet Draft, May 1997.

[7] "Use of Tag Switching with ATM", Davie, Doolan, Lawrence,
McGloghrie, Rekhter, Rosen, Swallow, work in progress, Internet
Draft, January 1997.

[8] "Label Switching: Label Stack Encodings", Rosen, Rekhter, Tappan,
Farinacci, Fedorkow, Li, work in progress, Internet Draft, June 1997.

[9] "Partitioning Tag Space among Multicast Routers on a Common
Subnet", Farinacci, work in progress, Internet Draft, December 1996.

[10] "Multicast Tag Binding and Distribution using PIM", Farinacci,
Rekhter, work in progress, Internet Draft, December 1996.

[11] "Toshiba's Router Architecture Extensions for ATM: Overview",
Katsube, Nagami, Esaki, RFC 2098, February 1997.

[12] "Loop-Free Routing Using Diffusing Computations", J.J. Garcia-
Luna-Aceves, IEEE/ACM Transactions on Networking, Vol. 1, No. 1,
February 1993.

Appendix A Why Egress Control is Better

This section is written by Arun Viswanathan.
It is demonstrated here that egress control is a necessary and
sufficient mechanism for LDP, and is therefore the optimal method for
setting up LSPs.

The necessity is established by citing examples that can be achieved
*only* by egress control, and by explaining why these typical
scenarios are vital requirements for a multiprotocol LDP. The
sufficiency is established by showing that egress control subsumes
local control.

Finally, some discussion is offered to mitigate the concerns
expressed against not having local control. It is shown that local
control has clearly undesirable properties which may lead to severe
scalability and robustness problems. It is also shown that having
both egress control and local control simultaneously in a network
leads to interoperability problems, and that local control abrogates
the essential benefits of egress control.

A complete and self-contained case is presented here that clearly
establishes that egress control is the preponderant mechanism for
LDP, and that it suffices to support egress control alone as the
distribution paradigm.

A.1 Definition of an Egress

A node is identified as an "egress" for a Stream if:

1) it is at a routing boundary for that Stream,
2) the next hop for that Stream is non-MPLS, or
3) the Stream is for a directly attached network, or for the node
   itself.

Nodes that satisfy condition 1 or 2 for a Stream will by default
start behaving as egress for that Stream. Note that conditions 1 and
2 can be learned dynamically. For condition 3, nodes will not by
default act as an egress for themselves or for directly attached
networks. If condition 3 is made the default as well, the LSPs set
up by egress control will be identical to the LSPs created by local
control.

A.2 Overview of Egress Control

When a node is an egress for a Stream, it originates an LSP setup
message for that particular Stream. The setup message is sent to all
MPLS neighbors, except the next hop neighbor. Each of these messages
to the neighbors carries an appropriate label for that Stream. When
a node in an MPLS domain receives a setup message from a neighbor for
a particular Stream, it checks whether that neighbor is the next hop
for the given Stream. If so, it propagates the message to all its
MPLS neighbors, except the next hop from which the message arrived.
If not, the node may keep the label provided in the setup message for
future use, or negatively acknowledge the node that sent the message
in order to release the label assignment; but it must not forward the
setup message from the incorrect next hop to any of its neighbors.
This flooding scheme is similar in mechanism to Reverse Path
Multicast, and is sketched below.

When the next hop for a Stream changes due to a change in network
topology, or a new node joins the topology, the node is locally
appended to the existing LSP, without requiring egress intervention.
The node may either request a label mapping from the new next hop, or
use the previously stored (but unused) label from that next hop. In
the former case, the new next hop immediately responds with a label
mapping for that Stream if it has its own downstream mapping for that
Stream.
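The flooding rule of A.2 can be sketched as follows (illustrative
Python; the topology representation and all names are invented):

      import itertools

      _labels = itertools.count(101)

      def allocate_label(node, stream):
          return next(_labels)

      def handle_setup(node, stream, from_neighbor, next_hop, neighbors):
          """Process an egress-control LSP setup message at one node.

          next_hop:  this node's routing next hop for the stream.
          neighbors: this node's MPLS neighbors.
          Returns the (neighbor, label) setup messages to send, or None
          if the message must not be propagated (label kept or NAKed).
          """
          if from_neighbor != next_hop:
              # Came from a non-next-hop neighbor: keep the label for
              # future use or NAK it, but never propagate (RPM-style).
              return None
          # Propagate to every MPLS neighbor except the one it came
          # from, each message carrying a freshly assigned local label.
          return [(n, allocate_label(node, stream))
                  for n in neighbors if n != from_neighbor]

      msgs = handle_setup("R2", "X", from_neighbor="R3",
                          next_hop="R3", neighbors=["R1", "R3", "R4"])
      assert [n for n, _ in msgs] == ["R1", "R4"]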
A.3 Why Egress Control is Necessary

There are some important situations in which egress control is
necessary:

- Shutting off an LSP

  If for some reason a network administrator needs to "shut off"
  the LSP setup for a particular Stream, s/he can configure the
  egress node for that Stream to achieve the desired result. Note
  that the requirement to shut off an LSP is a very fundamental
  one. If a destination has network layer reachability but no
  MPLS layer reachability (because of a problem in the MPLS
  layer), shutting off the LSP provides the only means to reach
  that destination. This mode of operation can be used by LSRs in
  a network that are not a sink for large amounts of data; such
  LSRs usually require only occasional telnet or network
  management traffic. It is important to provide the capability
  for such nodes to be reached through hop-by-hop connectivity,
  avoiding the MPLS layer optimization. Reachability is more
  important than optimization in instances like this. The MPLS
  architecture MUST provide this capability.

  Note that this is only possible in local control when each node
  in the entire network is configured to shut off the LSP setup
  for a particular Stream, which is neither desirable nor
  scalable.

- Egress Aggregation

  In some networks, due to the absence of route summarization,
  aggregation may not be possible through routing information.
  However, with egress control, it is possible to aggregate *all*
  Streams that exit the network through a common egress node into
  a single LSP. This is achieved easily, because the egress can
  simply use the same label for all such Streams.

  This is simply not possible with local control; with only local
  knowledge, LSRs cannot map several Streams to a single label,
  because it is unknown whether the Streams will diverge at some
  subsequent downstream node.

  Egress aggregation works for both distance vector protocols and
  link state protocols; it is protocol independent. Note that
  when using VP switching in conjunction with some distance vector
  protocols, it becomes essential that such aggregation be
  possible, as there are many vendor switches that don't have VC
  merging capability, and have limited VP switching capability.
  Egress control provides such vendors with a level playing field
  to compete with MPLS products. Moreover, this capability can be
  very useful in enterprise networks, where several legacy LANs at
  a site can be aggregated to the egress LSR at that site.
  Furthermore, this approach can drastically reduce signalling and
  LSP state maintenance overheads in the entire network.

- Loop Prevention

  The loop prevention mechanism only works from the egress node
  for multipoint-to-point LSPs, since the loop prevention
  mechanism requires the list of LSRs through which the setup
  message has already traversed in order to identify and prevent
  LSP loops.

  Such a loop prevention scheme is not possible through local
  control.

- De-aggregation

  Egress control provides the capability to de-aggregate one or
  more Streams from an aggregated Stream. For example, if a
  network is aggregating all CIDRs of an EBGP node into a single
  LSP, then with egress control a specific CIDR from this bundle
  can be given its own dedicated LSP.
  This enables one to apply special policies to specific CIDRs
  when required.

  With local control this can be achieved only by configuring
  every node in the network with the specific de-aggregation
  information and the associated policy. That approach can lead
  to severe scalability problems.

- Unique Labels

  As is known, when using VP merging, all ingresses must have
  unique VCI values to prevent cell interleaving. With egress
  control, it is possible to distribute unique VCI values to the
  ingress nodes, avoiding the need to configure each ingress node:
  the egress node can pick a unique VCI for each ingress node.
  Another benefit of egress control is that each egress can be
  configured with a unique label value in the case of egress
  aggregation (as described above). Since the label value is
  unique, the same label value can be used on all the segments of
  an LSP. This makes it possible to identify, anywhere in a
  network, each LSP that is associated with a certain egress node,
  thus easing network debugging.

  This, again, is not possible in local control, because of the
  lack of a single coordinating node.

A.4 Examples that work better through egress control

Local control needs to propagate attributes that come from the
downstream node to all upstream nodes. This behavior itself can be
likened to egress control. Nevertheless, local control can achieve
this only in a severely inefficient manner. Since each node knows
only local information, it creates and distributes an LSP with
incorrect attributes. As each node learns of new downstream
attributes, a correction is made as the attributes are propagated
upstream again. This can lead to a worst case of O(n-squared) setup
messages to create a single LSP, where n is the number of nodes in
the LSP.

With egress control, the attribute distribution is achieved during
the initial LSP setup, with a single message from the egress to the
ingresses. The following cases illustrate this; the hop-count and
MTU attributes are sketched in the example after this list.

- TTL/Traceroute

  The ingress requires a proper LSP hop-count value to decrement
  TTL in packets that use a particular LSP, in environments such
  as ATM which do not have a TTL equivalent. This simulates the
  TTL decrement which exists in an IP network, and also enables
  scoping utilities, such as traceroute, to work as they do today
  in IP networks. With egress control, the LSP hop-count is known
  at the ingress as a by-product of the LSP setup message, since
  an LSP setup message traverses from egress to ingress, and
  increments the hop-count at each node along the path.

- MTU

  When the MTU at the egress node is smaller than the MTU at some
  of the ingress nodes, packets originated at those ingress nodes
  will be dropped when they reach the egress node. Hosts not
  using MTU discovery have no means to recover from this.
  However, similar to the hop-count, the minimum LSP MTU can be
  propagated to the ingresses via the egress control LSP setup
  messages, enabling the ingress to do fragmentation when
  required.

- Implicit Peering

  Implicit peering is the mechanism through which higher level
  stack labels are communicated to the ingress nodes. These label
  values are piggybacked in the LSP setup messages. This works
  best with egress control; when the egress creates the setup
  message, it can piggyback the stack labels at the same time.

- ToS/CoS Based LSPs

  When certain LSPs require higher or lower precedence or priority
  through a network, the single egress node for each such LSP can
  be configured with the required priority, and this can be
  communicated in the egress control LSP setup message. With
  local control, each and every node in the network must be
  configured per LSP to achieve the same result.
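As an illustration of the TTL/Traceroute and MTU items above, a setup
message might accumulate these attributes on its way from egress to
ingress as in this sketch (the message fields are invented, and this
is illustrative only):

      from dataclasses import dataclass

      @dataclass
      class SetupMsg:
          stream: str
          label: int
          hop_count: int      # incremented at each node; egress = 0
          min_mtu: int        # smallest MTU seen so far along the LSP

      def forward_setup(msg, local_label, link_mtu):
          """Relay a setup message one hop further upstream, updating the
          hop-count and minimum-MTU attributes as described above."""
          return SetupMsg(msg.stream, local_label,
                          msg.hop_count + 1, min(msg.min_mtu, link_mtu))

      m = SetupMsg("X", label=52, hop_count=0, min_mtu=4470)  # at egress
      m = forward_setup(m, 61, link_mtu=1500)   # interior node
      m = forward_setup(m, 77, link_mtu=9180)   # next node upstream
      assert (m.hop_count, m.min_mtu) == (2, 1500)  # known at the ingress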
Local control initially distributes labels to its neighbors willy-
nilly, and then waits for attributes to come through egress control.
Thus, local control is completely dependent on egress control to
provide fully functional LSPs; otherwise, local control requires that
attributes be configured throughout the entire network for each
Stream. This is the most compelling argument that local control is
*not sufficient*; or, conversely, that egress control is necessary.
It demonstrates that egress control subsumes local control.
Moreover, distribution of labels without the associated attributes
may not be appropriate, and may lead to undesired results.

A.5 Egress Control is Sufficient

The argument for sufficiency is made by demonstrating that the
required LSPs can be created with egress control, which is not the
case with local control.

Egress control can create an LSP for every route entry made by the
routing protocols:

1. A route can be learned from another routing domain, in which
   case the LSR at the routing domain boundary will act as an
   egress for the route and originate an LSP setup for that route.

2. A route can be for a locally attached network, or can be a host
   route for the LSR itself. In this case, the LSR to which such
   a route is attached originates the LSP setup message.

3. An LSR with a non-MPLS next hop behaves as an egress for all
   those routes whose next hop is the non-MPLS neighbor.

These three methods can create an LSP for each route entry in a
network. Moreover, policy specific LSPs, as described previously,
can *only* be achieved with egress control. Thus, egress control is
necessary and sufficient for creating LSPs. QED.

A.6 Discussions

A.6.1 Is Local control faster than Egress control?

During topology changes, such as links going down, coming up, or
changing cost, there is no difference in setup latency between egress
control and local control. This is due to the fact that the node
(Ru) which undergoes a change in next hop for a Stream immediately
requests a label assignment from the new next hop node (Rd). The new
next hop node then immediately supplies the label mapping for the
requested Stream. As explained in the overview of egress control
(section A.2), the node Ru may already have stored label assignments
from the node Rd, in which case Ru can immediately splice itself onto
the multipoint-to-point tree. Hence, new nodes are spliced into
existing LSPs locally. In the scenario where a network initially
learns of a new route, although local control may set up LSPs faster
than egress control, this difference in latency has no perceived
advantage.
Since routing itself may take several seconds to propagate and
converge on the new route information, the potential latency of
egress control is small compared to the routing protocol propagation
time, and the initial setup time at route propagation time is
unimportant, since these are long lived LSPs.

Moreover, the hurried distribution of labels in local control may not
carry much meaning, because:

1. The associated attributes are not applied or propagated to the
   ingress.

2. While the ingress may believe it has an LSP, in reality the
   packets may be blackholed in the middle of the network if the
   full LSP is not established.

3. Policy based LSPs, which can only be achieved via egress
   control as described above, may undo an unused label
   assignment established by local control.

A.6.2 Scalability and Robustness

It has been alleged that egress control does not have the scalability
and robustness properties required by distributed processing.
However, the egress uses a root distribution paradigm commonly used
by many other standard routing protocols. For example, in the case
of OSPF, LSAs are flooded through a domain originating at the
"egress", the difference being that the flooding in the case of OSPF
is contained through a sequence number, while in egress control it is
contained by the next hop validation. In the case of PIM (and some
other multicast protocols), the distribution mechanism is essentially
the same. Even in BGP with route reflection, updates originate at
the root and traverse a tree structure to reach the peers, as opposed
to an n-square mesh. The commonality is the distribution paradigm,
in which the distribution originates at the root of a tree and
traverses the branches until it reaches all the leaves. None of the
above mentioned protocols have scalability or robustness problems
because of this distribution paradigm.

The only concern expressed against egress control is that if the
setup message does not propagate upstream from a certain node, then
the subtree upstream of that node will not be added to the LSP. It
is a reasonable concern, but further analysis shows that it is not a
realistic problem. The impact of this problem and the impact of a
similar problem in local control are exactly the same when the LSRs
employed in an MPLS domain have little or no forwarding capability
(for example, ATM-LSRs), since in both cases packets are blackholed.
In fact, with egress control the packets for afflicted LSPs will be
dropped right at the ingress, while with local control the packets
will be dropped at the point of breakage, causing packets to
unnecessarily traverse part way through the network. When reasonable
forwarding capability exists in the MPLS domain, with egress control
the packets may be forwarded hop-by-hop to the point where the LSP
setup ended, whereas with local control the packets will be label
switched to the point of breakage and forwarded hop-by-hop until the
LSP segment resumes. Since egress control has advantages when there
is no forwarding capability, and local control has advantages when
there is forwarding capability, there is an equal tradeoff between
them, and thus neither is superior nor inferior in this regard.
The latter case is simply a loss in optimization, since the network
has reasonable forwarding capabilities. Hence the robustness issue
is not a problem in either type of network. As mentioned before,
local control is dependent on egress control for distributing
attributes. The attribute distribution could then also face the same
problem of stalled propagation, which would lead to erroneous LSP
setup. So local control can also be seen as afflicted with this
problem, if it exists.

Moreover, if stalled propagation were truly a problem, there are
other schemes in MPLS that would face the same issue. For example,
label distribution through PIM, Explicit Route setup, and RSVP would
also not work, and therefore should be withdrawn :-).

Note that exhaustion of label space cannot stall the propagation of
messages to the upstream nodes. Appropriate indications can be given
to the upstream nodes in the setup message that no label allocation
was made because of exhaustion of label space, so that correct action
can be taken at the upstream nodes, and yet the LSP setup can
continue.

A.6.3 Conclusion

The attempt here is not to deride local control; but if one method
subsumes the features and properties of the other, then why support
both and complicate implementation, interoperability and maintenance?
In fact RFC 1925 says, "In protocol design, perfection has been
reached not when there is nothing left to add, but when there is
nothing left to take away". A usual diplomatic resolution for such a
controversy is to make accommodations for both. We feel that it is a
poor choice of architecture to support both. That is why we feel
strongly that this must be evaluated by the MPLS WG.

In a way, controlling the network behavior as to which LSPs are
formed, which Streams map to which LSPs, and the associated
attributes, can be compared to applying policies at the edges of an
AS. This is precisely what egress control provides: a rich and
varied policy control at the egress node of LSPs.

Appendix B Why Local Control is Better

This section is written by Eric Rosen.

The remaining area of dispute between advocates of "local control"
and advocates of "egress control" is relatively small. In
particular, there is agreement on the following points:

1. If LSR R1's next hop for address prefix X is LSR R2, and R2 is
   in a different area or in a different routing domain than R1,
   then R1 may assign and distribute a label for X, even if R2 has
   not done so.

   This means that even under egress control, the border routers
   in one autonomous system do not have to wait, before
   distributing labels, for any downstream routers which are in
   other autonomous systems.

2. If LSR R1's next hop for address prefix X is LSR R2, but R1
   receives a label mapping for X from LSR R3, then R1 may
   remember R3's mapping. If, at some later time, R3 becomes R1's
   next hop for X, then (if R1 is not using loop prevention) R1
   may immediately begin using R3 as the LSP next hop for X, using
   the remembered mapping from R3.

3. Attributes which are passed upstream from the egress may change
   over time, as a result of reconfiguration of the egress, or of
   other events.
The dispute is centered on the situation in which all of the following conditions hold:

- LSR R1's next hop for address prefix X is within the same administrative domain as R1, and

- R1's next hop for X has not distributed to R1 a label for X, and

- R1 has not yet distributed to its neighbors any labels for X.

With local control, R1 is permitted to distribute a label for X to its neighbors; with egress control it is not.

From an implementation perspective, then, the difference between egress control and local control is relatively small. Egress control simply creates an additional state in the label distribution process, and prohibits label distribution in that state.
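That single additional state can be written down as one predicate. The following Python sketch is illustrative only (the function and parameter names are invented, not from any MPLS specification); it also encodes the inter-domain exception of agreed point 1 above.

   # Sketch of the "one extra state" point: local control may
   # always distribute a label; egress control adds a waiting
   # state in which distribution is prohibited.

   def may_distribute(mode: str,
                      have_downstream_label: bool,
                      next_hop_in_same_domain: bool) -> bool:
       if mode == "local":
           return True                   # no waiting state at all
       if mode == "egress":
           # Agreed point 1: if the next hop is in another area
           # or routing domain, there is no need to wait for it.
           if not next_hop_in_same_domain:
               return True
           return have_downstream_label  # otherwise, wait
       raise ValueError(mode)

   # The disputed case: same domain, no label yet from the next hop.
   print(may_distribute("local",  False, True))   # True
   print(may_distribute("egress", False, True))   # False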
From the perspective of network behavior, however, the difference is more significant:

- Egress control adds latency to the initial construction of an LSP, because the path must be set up serially, node by node from the egress. With local control, all LSRs along the path may perform their setup activities in parallel.

- Egress control adds interdependencies among nodes: there is something that one node cannot do until some other node does something else first, which it in turn cannot do until yet another node does something first, and so on. This is problematic for a number of reasons.

  * In robust system design, one tries to avoid such interdependencies, since they always bring along robustness and scalability problems.

  * In some situations, it is advantageous for a node to use MPLS even if some node downstream is not functioning properly, and hence is not assigning labels as it should.

These disadvantages might be tolerable if there were some significant problem which could be solved by egress control but not by local control. So it is worth looking to see whether there is such a problem.

There are a number of situations in which it may be desirable for an LSP ingress node to know certain attributes of the LSP, e.g., the number of hops in the LSP. It is sometimes claimed that obtaining such information requires the use of egress control. However, this is not true. Any attribute of an LSP is liable to change after the LSP exists, so procedures to detect and communicate the change must exist, and these procedures CANNOT be tied to the initial construction of the LSP, since they must execute after the LSP has already been constructed. The ability to pass control information upstream along a path towards an ingress node does not presuppose anything about the procedures used to construct the path.

The fundamental issue separating the advocates of egress control from the advocates of local control is really a network management issue. To advocates of egress control, setting up an LSP for a particular address prefix is analogous to setting up a PVC in an ATM network. When setting up a PVC, one goes to one of the PVC endpoints and enters certain configuration information. Similarly, one might think that to set up an LSP for a particular address prefix, one goes to the LSR which is the egress for that address prefix and enters configuration information. This gives the network administrator complete control over which address prefixes are assigned LSPs and which are not. If this is one's management model, egress control does simplify the configuration issues.

On the other hand, if one's model is that LSPs get set up automatically by the network, as a result of the operation of the routing algorithm, then egress control is of no utility at all. When one hears the claim that "egress control allows you to control your network from a few nodes", what is really being claimed is that "egress control simplifies the job of manually configuring all the LSPs in your network". Of course, if you don't intend to manually configure all the LSPs in your network, this is irrelevant.

So before an egress control scheme is adopted, one should ask whether complete manual configuration of the set of address prefixes which get assigned LSPs is necessary. That is, is this capability needed to solve a real problem?

It is sometimes claimed that egress control is needed if one wants to conserve labels by assigning a single label to all address prefixes which have the same egress. This is not true. If the network is running a link state routing algorithm, each LSR already knows which address prefixes have a common egress, and hence can assign a common label (see the sketch at the end of this appendix). If the network is running a distance vector routing protocol, information about which address prefixes have a common egress can be made to "bubble up" from the egress, using LDP, even if local control is used.

It is only in the case where the number of available labels is so small that their use must be manually administered that egress control has an advantage. It may be arguable that egress control should be an option that can be used for the special cases in which it provides value. In most cases, there is no reason to have it at all.
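As a sketch of the label-conservation point above: given link-state knowledge of which egress advertises each prefix, an LSR can assign one label per egress with no egress-control machinery at all. The prefix-to-egress table, node names, and label values below are invented for illustration, using documentation address prefixes.

   # One label per egress, derived from link-state knowledge.
   from itertools import count

   prefix_egress = {
       "192.0.2.0/24":    "R7",
       "198.51.100.0/24": "R7",
       "203.0.113.0/24":  "R9",
   }

   next_label = count(100)
   label_for_egress = {}             # one label per egress LSR
   label_for_prefix = {}

   for prefix, egress in prefix_egress.items():
       if egress not in label_for_egress:
           label_for_egress[egress] = next(next_label)
       label_for_prefix[prefix] = label_for_egress[egress]

   print(label_for_prefix)
   # The two prefixes behind R7 share a single label.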