Internet Draft                                             Yakov Rekhter
Expiration date: January 1998                              cisco Systems
                                                             Bruce Davie
                                                           cisco Systems
                                                               Dave Katz
                                                   Juniper Networks Inc.
                                                              Eric Rosen
                                                           cisco Systems
                                                          George Swallow
                                                           cisco Systems
                                                          Dino Farinacci
                                                           cisco Systems
                                                               July 1997

                 Tag Switching Architecture - Overview

                  draft-rekhter-tagswitch-arch-01.txt

1. Status of this Memo

This document is an Internet Draft. Internet Drafts are working documents of the Internet Engineering Task Force (IETF), its Areas, and its Working Groups. Note that other groups may also distribute working documents as Internet Drafts.

Internet Drafts are draft documents valid for a maximum of six months. Internet Drafts may be updated, replaced, or obsoleted by other documents at any time. It is not appropriate to use Internet Drafts as reference material or to cite them other than as a "working draft" or "work in progress."

Please check the 1id-abstracts.txt listing contained in the internet-drafts Shadow Directories on nic.ddn.mil, nnsc.nsf.net, nic.nordu.net, ftp.nisc.sri.com, or munnari.oz.au to learn the current status of any Internet Draft.

2. Abstract

This document provides an overview of tag switching. Tag switching is a way to combine the label-swapping forwarding paradigm with network layer routing. This has several advantages. Tags can have a wide spectrum of forwarding granularities, so at one end of the spectrum a tag could be associated with a group of destinations, while at the other a tag could be associated with a single application flow. At the same time, forwarding based on tag switching, due to its simplicity, is well suited to high performance forwarding. These factors facilitate the development of a routing system which is both functionally rich and scalable. Finally, tag switching simplifies integration of routers and ATM switches by employing common addressing, routing, and management procedures.

3. Introduction

Continuous growth of the Internet demands higher bandwidth within the Internet Service Providers (ISPs). However, growth of the Internet is not the only driving factor for higher bandwidth - demand for higher bandwidth also comes from emerging multimedia applications. Demand for higher bandwidth, in turn, requires higher forwarding performance for both multicast and unicast traffic.

The growth of the Internet also demands improved scaling properties of the Internet routing system. The ability to contain the volume of routing information maintained by individual routers and the ability to build a hierarchy of routing knowledge are essential to support a high quality, scalable routing system.

While the destination-based forwarding paradigm is adequate in many situations, we already see examples where it is no longer adequate.
The ability to overcome the rigidity of destination-based forwarding and to have more flexible control over how traffic is routed is likely to become more and more important.

We see the need to improve forwarding performance while at the same time adding routing functionality to support multicast, allowing more flexible control over how traffic is routed, and providing the ability to build a hierarchy of routing knowledge. Moreover, it becomes more and more crucial to have a routing system that can support graceful evolution to accommodate new and emerging requirements.

Tag switching is a technology that provides an efficient solution to these challenges. Tag switching blends the flexibility and rich functionality provided by Network Layer routing with the simplicity provided by the label swapping forwarding paradigm. The simplicity of the tag switching forwarding paradigm (label swapping) enables improved forwarding performance, while maintaining competitive price/performance. By associating a wide range of forwarding granularities with a tag, the same forwarding paradigm can be used to support a wide variety of routing functions, such as destination-based routing, multicast, hierarchy of routing knowledge, and flexible routing control. Finally, a combination of simple forwarding, a wide range of forwarding granularities, and the ability to evolve routing functionality while preserving the same forwarding paradigm enables a routing system that can gracefully evolve to accommodate new and emerging requirements.

4. Tag Switching components

Tag switching consists of two components: forwarding and control. The forwarding component uses the tag information (tags) carried by packets and the tag forwarding information maintained by a tag switch to perform packet forwarding. The control component is responsible for maintaining correct tag forwarding information among a group of interconnected tag switches.

Segregating control and forwarding into separate components promotes modularity, which in turn makes it possible to build a system that can gracefully evolve to accommodate new and emerging requirements.

5. Forwarding component

The fundamental forwarding paradigm employed by tag switching is based on the notion of label swapping. When a packet with a tag is received by a tag switch, the switch uses the tag as an index in its Tag Information Base (TIB). Each entry in the TIB consists of an incoming tag, and one or more sub-entries of the form (outgoing tag, outgoing interface, outgoing link level information). If the switch finds an entry with the incoming tag equal to the tag carried in the packet, then for each (outgoing tag, outgoing interface, outgoing link level information) sub-entry in the entry the switch replaces the tag in the packet with the outgoing tag, replaces the link level information (e.g., MAC address) in the packet with the outgoing link level information, and forwards the packet over the outgoing interface.

From the above description of the forwarding component we can make several observations. First, the forwarding decision is based on an exact match algorithm using a fixed length, fairly short tag as an index. This enables a simplified forwarding procedure, relative to the longest match forwarding traditionally used at the network layer. This in turn enables higher forwarding performance (higher packets per second). The forwarding procedure is simple enough to allow a straightforward hardware implementation.
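
The following short Python sketch (illustrative only - the data structures and names are ours, not the draft's) shows the exact-match, label-swapping step just described:

   # A minimal, illustrative sketch of the exact-match, label-swapping
   # forwarding step.  Tags are plain integers and "packets" are dicts;
   # a real implementation would rewrite headers in place.
   from dataclasses import dataclass

   @dataclass
   class SubEntry:
       outgoing_tag: int
       outgoing_interface: str
       outgoing_link_info: str      # e.g., the next-hop MAC address

   # TIB: incoming tag -> one or more sub-entries
   # (one sub-entry for unicast, possibly several for multicast).
   tib: dict[int, list[SubEntry]] = {
       17: [SubEntry(outgoing_tag=42, outgoing_interface="atm0",
                     outgoing_link_info="00:00:0c:aa:bb:cc")],
   }

   def forward(packet: dict) -> None:
       subs = tib.get(packet["tag"])
       if subs is None:
           # No TIB entry: discard, or optionally strip the tag and hand
           # the packet to network layer forwarding (support is optional).
           print("no entry; dropping")
           return
       for sub in subs:
           out = dict(packet)
           out["tag"] = sub.outgoing_tag                # swap the tag
           out["link_info"] = sub.outgoing_link_info    # rewrite link-level info
           print(f"send on {sub.outgoing_interface}: {out}")

   forward({"tag": 17, "payload": "..."})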

A second observation is that the forwarding decision is independent of the tag's forwarding granularity. For example, the same forwarding algorithm applies to both unicast and multicast - a unicast entry would just have a single (outgoing tag, outgoing interface, outgoing link level information) sub-entry, while a multicast entry may have one or more (outgoing tag, outgoing interface, outgoing link level information) sub-entries. (For multi-access links, the outgoing link level information in this case would include a multicast MAC address.) This illustrates how with tag switching the same forwarding paradigm can be used to support different routing functions (e.g., unicast, multicast, etc.).

The simple forwarding procedure is thus essentially decoupled from the control component of tag switching. New routing (control) functions can readily be deployed without disturbing the forwarding paradigm. This means that it is not necessary to re-optimize forwarding performance (by modifying either hardware or software) as new routing functionality is added.

In the tag switching architecture, various implementation options are acceptable. For example, support for network layer forwarding by a tag switch (i.e., forwarding based on the network layer header as opposed to a tag) is optional. Moreover, use of network layer forwarding may be constrained to handling network layer control traffic only. (Note, however, that a tag switch must be able to source and sink network layer packets, e.g., to participate in network layer routing protocols.)

For the purpose of handling the network layer hop count (time-to-live), the architecture allows two alternatives: network layer hops may correspond directly to hops formed by tag switches, or one network layer hop may correspond to several tag switched hops.

When a switch receives a packet with a tag, and the TIB maintained by the switch has no entry with the incoming tag equal to the tag carried by the packet, or the entry exists but its outgoing tag information is empty and the entry does not indicate local delivery to the switch, the switch may either (a) discard the packet, or (b) strip the tag information and submit the packet for network layer processing. Support for the latter is optional (as support for network layer forwarding is optional). Note that it may not always be possible to successfully forward a packet after stripping a tag, even if a tag switch supports network layer forwarding.

The architecture allows a tag switch to maintain either a single TIB per tag switch, or a TIB per interface. Moreover, a tag switch could mix both of these options - some tags could be maintained in a single TIB, while other tags could be maintained in a TIB associated with individual interfaces.

5.1. Tag encapsulation

Tag switching clearly requires a tag to be carried in each packet. The tag information can be carried in a variety of ways:

- as a small "shim" tag header inserted between the layer 2 and the Network Layer headers;

- as part of the layer 2 header, if the layer 2 header provides adequate semantics (e.g., Frame Relay, or ATM);

- as part of the Network Layer header (e.g., using the Flow Label field in IPv6 with appropriately modified semantics).
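
The draft does not specify a shim format here. Purely as an illustration, assuming a hypothetical 4-octet shim that carries a 20-bit tag, a 3-bit class-of-service field, a 1-bit bottom-of-stack flag, and an 8-bit time-to-live, encoding and decoding could look like this:

   # Illustrative only: a hypothetical 4-octet shim tag header, inserted
   # between the layer 2 and network layer headers.  The field layout is
   # an assumption for the example, not something defined by this draft.
   import struct

   def encode_shim(tag: int, cos: int = 0, bottom: bool = True, ttl: int = 64) -> bytes:
       assert 0 <= tag < 2**20 and 0 <= cos < 8 and 0 <= ttl < 256
       word = (tag << 12) | (cos << 9) | (int(bottom) << 8) | ttl
       return struct.pack("!I", word)

   def decode_shim(shim: bytes) -> tuple[int, int, bool, int]:
       (word,) = struct.unpack("!I", shim)
       return word >> 12, (word >> 9) & 0x7, bool((word >> 8) & 0x1), word & 0xFF

   shim = encode_shim(tag=42, cos=1)
   print(decode_shim(shim))   # (42, 1, True, 64)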

It is therefore possible to implement tag switching over virtually any media type, including point-to-point links, multi-access links, and ATM. At the same time the forwarding component allows specific optimizations for particular media (e.g., ATM).

Observe also that the tag forwarding component is Network Layer independent. Use of control component(s) specific to a particular Network Layer protocol enables the use of tag switching with different Network Layer protocols.

6. Control component

Essential to tag switching is the notion of a binding between a tag and Network Layer routing (routes). The control component is responsible for creating tag bindings, and then distributing the tag binding information among tag switches. Creating a tag binding involves allocating a tag, and then binding the tag to a route. The distribution of tag binding information among tag switches could be accomplished via several options:

- piggybacking on existing routing protocols

- using a separate Tag Distribution Protocol (TDP)

While the architecture supports distribution of tag binding information that is independent of the underlying routing protocols, the architecture acknowledges that considerable optimizations can be achieved in some cases by small enhancements of existing protocols to enable piggybacking tag binding information on these protocols.

One important characteristic of the tag switching architecture is that creation of tag bindings is driven primarily by control traffic rather than by data traffic. Control traffic driven creation of tag bindings has several advantages, as compared to data traffic driven creation of tag bindings. For one thing, it minimizes the amount of additional control traffic needed to distribute tag binding information, as tag binding information is distributed only in response to control traffic, independent of data traffic. It also makes the overall scheme independent of and insensitive to the data traffic profile/pattern. Control traffic driven creation of tag bindings improves forwarding performance, as tags are precomputed (prebound) before data traffic arrives, rather than being created as data traffic arrives. It also simplifies the overall system behavior, as the control plane is controlled solely by control traffic, rather than by a mix of control and data traffic.

Another important characteristic of the tag switching architecture is that distribution and maintenance of tag binding information is consistent with distribution and maintenance of the associated routing information. For example, distribution of tag binding information for tags associated with unicast routing is based on the technique of incremental updates with explicit acknowledgment. This is very similar to the way unicast routing information gets distributed by such protocols as OSPF and BGP. In contrast, distribution of tag binding information for tags associated with multicast routing is based on periodic updates/refreshes, without any explicit acknowledgments. This is consistent with the way multicast routing information is distributed by such protocols as PIM.

To provide good scaling characteristics, while also accommodating diverse routing functionality, tag switching supports a wide range of forwarding granularities.
At one extreme a tag could be associated (bound) with a group of routes (more specifically, with the Network Layer Reachability Information of the routes in the group). At the other extreme a tag could be bound to an individual application flow (e.g., an RSVP flow). A tag could also be bound to a multicast tree. In addition, a tag may be bound to a path that has been selected for a certain set of packets based on some policy (e.g., an explicit route).

The control component is organized as a collection of modules, each designed to support a particular routing function. To support new routing functions, new modules can be added. The architecture does not mandate a prescribed set of modules that have to be supported by every tag switch.

The following describes some of the modules.

6.1. Destination-based routing

In this section we describe how tag switching can support destination-based routing. Recall that with destination-based routing a router makes a forwarding decision based on the destination address carried in a packet and the information stored in the Forwarding Information Base (FIB) maintained by the router. A router constructs its FIB by using the information it receives from routing protocols (e.g., OSPF, BGP).

To support destination-based routing with tag switching, a tag switch, just like a router, participates in routing protocols (e.g., OSPF, BGP), and constructs its FIB using the information it receives from these protocols.

There are three permitted methods for tag allocation and Tag Information Base (TIB) management: (a) downstream tag allocation, (b) downstream tag allocation on demand, and (c) upstream tag allocation. In all cases, a switch allocates tags and binds them to address prefixes in its FIB. In downstream allocation, the tag that is carried in a packet is generated and bound to a prefix by the switch at the downstream end of the link (with respect to the direction of data flow). On demand allocation means that tags will only be allocated and distributed by the downstream switch when it is requested to do so by the upstream switch. Method (b) is most useful in ATM networks (see Section 8). In upstream allocation, tags are allocated and bound at the upstream end of the link. Note that in downstream allocation, a switch is responsible for creating tag bindings that apply to incoming data packets, and receives tag bindings for outgoing packets from its neighbors. In upstream allocation, a switch is responsible for creating tag bindings for outgoing tags, i.e., tags that are applied to data packets leaving the switch, and receives bindings for incoming tags from its neighbors.

The downstream tag allocation scheme operates as follows: for each route in its FIB the switch allocates a tag, creates an entry in its Tag Information Base (TIB) with the incoming tag set to the allocated tag, and then advertises the binding between the (incoming) tag and the route to other adjacent tag switches. The advertisement could be accomplished by either piggybacking the binding on top of the existing routing protocols, or by using a separate Tag Distribution Protocol (TDP).
When a tag switch receives tag binding information for a route, and that information was originated by the next hop for that route, the switch places the tag (carried as part of the binding information) into the outgoing tag of the TIB entry associated with the route. This creates the binding between the outgoing tag and the route.

With the downstream on demand tag allocation scheme, operation is as follows. For each route in its FIB, the switch identifies the next hop for that route. It then issues a request (via TDP) to the next hop for a tag binding for that route. When the next hop receives the request, it allocates a tag, creates an entry in its TIB with the incoming tag set to the allocated tag, and then returns the binding between the (incoming) tag and the route to the switch that sent the original request. When the switch receives the binding information, the switch creates an entry in its TIB, and sets the outgoing tag in the entry to the value received from the next hop. Handling of data packets is as for downstream allocation. The main application for this mode of operation is with ATM switches, as described in Section 8.

The upstream tag allocation scheme is used as follows. If a tag switch has one or more point-to-point interfaces, then for each route in its FIB whose next hop is reachable via one of these interfaces, the switch allocates a tag, creates an entry in its TIB with the outgoing tag set to the allocated tag, and then advertises to the next hop (via TDP) the binding between the (outgoing) tag and the route. When a tag switch that is the next hop receives the tag binding information, the switch places the tag (carried as part of the binding information) into the incoming tag of the TIB entry associated with the route.

Note that, while we have described upstream allocation for the sake of completeness, we have found the two downstream allocation methods adequate for all practical purposes so far.

Independent of which tag allocation method is used, once a TIB entry is populated with both incoming and outgoing tags, the tag switch can forward packets for routes bound to the tags by using the tag switching forwarding algorithm (as described in Section 5).

When a tag switch creates a binding between an outgoing tag and a route, the switch, in addition to populating its TIB, also updates its FIB with the binding information. This enables the switch to add tags to previously untagged packets.
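
As an illustration only (the class and method names below are invented, not taken from the draft or from TDP), downstream tag allocation might be sketched like this: a switch allocates an incoming tag for every FIB route, advertises the binding to its neighbors, and fills in the outgoing tag when a binding arrives from the route's next hop.

   # Illustrative sketch of downstream tag allocation and TIB population.
   import itertools

   class TagSwitch:
       def __init__(self, name, fib, tag_base=16):
           self.name = name
           self.fib = fib                         # prefix -> next hop (by name)
           self.tib = {}                          # prefix -> {"in": tag, "out": tag}
           self._tags = itertools.count(tag_base) # simple local tag allocator
           self.neighbors = []                    # adjacent TagSwitch objects

       def allocate_and_advertise(self):
           # Allocate an incoming tag for every FIB route and advertise the
           # (route, tag) binding to all adjacent tag switches.
           for prefix in self.fib:
               tag = next(self._tags)
               self.tib.setdefault(prefix, {"in": None, "out": None})["in"] = tag
               for nbr in self.neighbors:
                   nbr.receive_binding(prefix, tag, advertiser=self.name)

       def receive_binding(self, prefix, tag, advertiser):
           # Use the advertised tag as the outgoing tag only if it was
           # originated by the next hop for that route.
           if self.fib.get(prefix) == advertiser:
               self.tib.setdefault(prefix, {"in": None, "out": None})["out"] = tag

   # B is A's next hop for 10.0.0.0/8; each switch allocates its own tags.
   a = TagSwitch("A", fib={"10.0.0.0/8": "B"})
   b = TagSwitch("B", fib={"10.0.0.0/8": "C"}, tag_base=100)
   a.neighbors, b.neighbors = [b], [a]
   a.allocate_and_advertise()
   b.allocate_and_advertise()
   print(a.tib)   # {'10.0.0.0/8': {'in': 16, 'out': 100}}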

So far we have described how a tag could be bound to a single route, creating a one-to-one mapping between routes and tags. However, under certain conditions it is possible to bind a tag not just to a single route, but to a group of routes, creating a many-to-one mapping between routes and tags. Consider a tag switch that is connected to a router. It is quite possible that the switch uses the router as the next hop not just for one route, but for a group of routes. Under these conditions the switch does not have to allocate distinct tags to each of these routes - one tag would suffice. The distribution of tag binding information is unaffected by whether there is a one-to-one or one-to-many mapping between tags and routes.

Now consider a tag switch that receives from one of its neighbors (tag switching peers) tag binding information for a set of routes, such that the set is bound to a single tag. If the switch decides to use some or all of the routes in the set, then for these routes the switch does not need to allocate individual tags - one tag would suffice. Such an approach may be valuable when tags are a precious resource. Note that the ability to support many-to-one mapping makes no assumptions about the routing protocols being used.

When a tag switch adds a tag to a previously untagged packet, the tag could be associated either with the route to the destination address carried in the packet, or with the route to some other tag switch along the path to the destination (in some cases the address of that other tag switch could be gleaned from network layer routing protocols). The latter option provides yet another way of mapping multiple routes into a single tag. However, this option is either dependent on particular routing protocols, or would require a separate mechanism for discovering tag switches along a path.

To understand the scaling properties of tag switching in conjunction with destination-based routing, observe that the total number of tags that a tag switch has to maintain cannot be greater than the number of routes in the switch's FIB. Moreover, as we have just seen, the number of tags can be much less than the number of routes. Thus, much less state is required than would be the case if tags were allocated to individual flows.

In general, a tag switch will try to populate its TIB with incoming and outgoing tags for all routes to which it has reachability, so that all packets can be forwarded by simple label swapping. Tag allocation is thus driven by topology (routing), not data traffic - it is the existence of a FIB entry that causes tag allocations, not the arrival of data packets.

Use of tags associated with routes, rather than flows, also means that there is no need to perform flow classification procedures for all the flows to determine whether to assign a tag to a flow. That, in turn, simplifies the overall scheme, and makes it more robust and stable in the presence of changing traffic patterns.

Note that when tag switching is used to support destination-based routing, tag switching does not completely eliminate the need to perform normal Network Layer forwarding at some network elements. First of all, adding a tag to a previously untagged packet requires normal Network Layer forwarding. This function could be performed by the first hop router, or by the first router on the path that is able to participate in tag switching. In addition, whenever a tag switch aggregates a set of routes (e.g., by using the technique of hierarchical routing) into a single route, and the routes do not share a common next hop, the switch needs to perform Network Layer forwarding for packets carrying the tag associated with the aggregated route. However, one could observe that the number of places where routes get aggregated is smaller than the total number of places where forwarding decisions have to be made. Moreover, quite often aggregation is applied to only a subset of the routes maintained by a tag switch. As a result, on average a packet can be forwarded most of the time using the tag switching algorithm. Note that many tag switches may not need to perform any network layer forwarding.
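
The many-to-one mapping, and the resulting reduction in state, can be pictured with a small sketch (illustrative names only): instead of allocating one tag per route, a switch can allocate one tag per next hop and bind every prefix that shares that next hop to it.

   # Illustrative: one tag per next hop rather than one tag per route,
   # giving a many-to-one mapping between routes and tags.
   import itertools

   fib = {                       # prefix -> next hop
       "10.1.0.0/16": "R1",
       "10.2.0.0/16": "R1",
       "10.3.0.0/16": "R2",
   }

   tags = itertools.count(16)
   tag_for_next_hop: dict[str, int] = {}
   binding: dict[str, int] = {}  # prefix -> incoming tag

   for prefix, next_hop in fib.items():
       if next_hop not in tag_for_next_hop:
           tag_for_next_hop[next_hop] = next(tags)
       binding[prefix] = tag_for_next_hop[next_hop]

   print(binding)  # {'10.1.0.0/16': 16, '10.2.0.0/16': 16, '10.3.0.0/16': 17}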

6.2. Hierarchy of routing knowledge

The IP routing architecture models a network as a collection of routing domains. Within a domain, routing is provided via interior routing (e.g., OSPF), while routing across domains is provided via exterior routing (e.g., BGP). However, all routers within domains that carry transit traffic (e.g., domains formed by Internet Service Providers) have to maintain information provided not just by interior routing, but by exterior routing as well, even if only some of these routers participate in exterior routing. That creates certain problems. First of all, the amount of this information is not insignificant. Thus it places additional demand on the resources required by the routers. Moreover, an increase in the volume of routing information quite often increases routing convergence time. This, in turn, degrades the overall performance of the system.

Tag switching allows complete decoupling of interior and exterior routing. With tag switching, only tag switches at the border of a domain would be required to maintain routing information provided by exterior routing - all other switches within the domain would just maintain routing information provided by the domain's interior routing (which is usually significantly smaller than the exterior routing information), with no "leaking" of exterior routing information into interior routing. This, in turn, reduces the routing load on non-border switches, and shortens routing convergence time.

To support this functionality, tag switching allows a packet to carry not one but a set of tags, organized as a stack. A tag switch could either swap the tag at the top of the stack, or pop the stack, or swap the tag and push one or more tags onto the stack.

Consider a tag switch that is at the border of a routing domain. This switch maintains both exterior and interior routes. The interior routes provide routing information and tags to all the other tag switches within the domain. For each exterior route that the switch receives from some other border tag switch that is in the same domain as the local switch, the switch maintains not just a tag associated with the route, but also a tag associated with the route to that other border tag switch. Moreover, for inter-domain routing protocols that are capable of passing "third-party" next hop information, the switch would maintain a tag associated with the route to the next hop, rather than with the route to the border tag switch from whom the local switch received the exterior route.

When a packet is forwarded between two (border) tag switches in different domains, the tag stack in the packet contains just one tag (associated with an exterior route). However, when a packet is forwarded within a domain, the tag stack in the packet contains not one, but two tags (the second tag is pushed by the domain's ingress border tag switch). The tag at the top of the stack provides packet forwarding to an appropriate egress border tag switch (or the "third-party" next hop), while the next tag in the stack provides correct packet forwarding at the egress switch (or at the "third-party" next hop). The stack is popped by either the egress switch (or the "third-party" next hop) or by the penultimate (with respect to the egress switch/"third-party" next hop) switch.
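
The stack operations just described (swap the top tag, pop the stack, or swap and push) can be sketched as follows; this is purely illustrative, with the packet modeled as a list of tags whose top of stack is the last element.

   # Illustrative sketch of tag stack operations and their use in a
   # two-level hierarchy of routing knowledge.
   def swap(stack: list[int], new_tag: int) -> None:
       stack[-1] = new_tag

   def pop(stack: list[int]) -> int:
       return stack.pop()

   def swap_and_push(stack: list[int], new_tag: int, pushed: int) -> None:
       stack[-1] = new_tag
       stack.append(pushed)

   # A packet enters a transit domain carrying one tag bound to an exterior route.
   stack = [201]                  # tag for the exterior route
   swap_and_push(stack, 202, 57)  # ingress border switch: swap, push interior tag
   swap(stack, 58)                # interior switches swap only the top (interior) tag
   pop(stack)                     # penultimate or egress switch pops the interior tag
   print(stack)                   # [202] - the exterior tag is used at the egress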

One could observe that when tag switching is confined to a single routing domain, the above could still be used to decouple interior from exterior routing, similar to what was described above. However, in this case a border tag switch wouldn't maintain tags associated with each exterior route, and forwarding between domains would be performed at the network layer.

The control component used in this scenario is fairly similar to the one used with destination-based routing. In fact, the only essential difference is that in this scenario the tag binding information is distributed both among physically adjacent tag switches, and among border tag switches within a single domain. One could also observe that the latter (distribution among border switches) could be trivially accommodated by very minor extensions to BGP.

The notion of supporting a hierarchy of routing knowledge with tag switching is not limited to the case of exterior/interior routing, but could be applicable to other cases where a hierarchy of routing knowledge is possible. Moreover, while the above describes only a two-level hierarchy of routing knowledge, the tag switching architecture does not impose limits on the depth of the hierarchy.

In the presence of a hierarchy of routing knowledge, a tag switched path at level N in the hierarchy has to have its endpoints at tag switches that are at the border between level N and level (N-1) in the hierarchy (level 0 in the hierarchy corresponds to an untagged path).

6.3. Multicast

Essential to multicast routing is the notion of spanning trees. Multicast routing procedures (e.g., PIM) are responsible for constructing such trees (with receivers as leaves), while multicast forwarding is responsible for forwarding multicast packets along such trees. Thus, to support a multicast forwarding function with tag switching we need to be able to associate a tag with a multicast tree. The following describes the procedures for allocation and distribution of tags for multicast.

When tag switching is used for multicast, it is important that tag switching be able to utilize the multicast capabilities provided by the Data Link layer (e.g., the multicast capabilities provided by Ethernet). To be able to do this, an (upstream) tag switch connected to a given Data Link subnetwork should use the same tag when forwarding a multicast packet to all of the (downstream) switches on that subnetwork. This way the packet will be multicast at the Data Link layer over the subnetwork. To support this, all tag switches that are part of a given multicast tree and are on a common subnetwork must agree on a common tag that would be used for forwarding multicast packets along the tree over the subnetwork. Moreover, since multicast forwarding is based on Reverse Path Forwarding (RPF), it is crucial that, when a tag switch receives a multicast packet, the tag carried in the packet enable the switch to identify both (a) the particular multicast group, and (b) the previous hop (upstream) tag switch that sent the packet.

To support the requirements outlined in the previous paragraph, the tag switching architecture assumes that (a) multicast tags are associated with interfaces on a tag switch (rather than with a tag switch as a whole), (b) the tag space that a tag switch could use for allocating tags for multicast is partitioned into non-overlapping regions among all the tag switches connected to a common Data Link subnetwork, and (c) there are procedures by which tag switches that belong to a common multicast tree and are on a common Data Link subnetwork agree on the tag switch that is responsible for allocating a tag for the tree.

One possible way of partitioning the tag space into non-overlapping regions among tag switches connected to a common subnetwork is for each tag switch to claim a region of the space and announce this region to its neighbors. Conflicts are resolved based on the IP addresses of the contending switches (the higher address wins, the lower retries). Once the tag space is partitioned among tag switches, the switches may create bindings between tags and multicast trees (routes).

At least in principle there are two possible ways to create bindings between tags and multicast trees (routes). With the first alternative, for a set of tag switches that share a common Data Link subnetwork, the tag switch that is upstream with respect to a particular multicast tree allocates a tag (out of its own region that does not overlap with the regions of other switches on the subnetwork), binds the tag to a multicast route, and then advertises the binding to all the (downstream) switches on the subnetwork. With the second alternative, one of the tag switches that is downstream with respect to a particular multicast tree allocates a tag (out of its own region that does not overlap with the regions of other switches on the subnetwork), binds the tag to a multicast route, and then advertises the binding to all the switches (both downstream and upstream) on the subnetwork. Usually the first tag switch to join the group is the one that performs the allocation.

Each of the above alternatives has its own trade-offs. The first alternative is fairly simple - one upstream router does the tag binding and multicasts the binding downstream. However, the first alternative may create an uneven distribution of allocated tags, as some tag switches on a common subnetwork may have more upstream multicast sources than others. Also, changes in topology could result in upstream neighbor changes, which in turn would require tag re-binding. Finally, one could observe that distributing tag bindings from upstream towards downstream is inconsistent with the direction of multicast routing information distribution (from downstream towards upstream).

The second alternative, even if more complex than the first one, has its own advantages. For one thing, it makes the distribution of multicast tag bindings consistent with the distribution of unicast tag bindings. It also makes the distribution of multicast tag bindings consistent with the distribution of multicast routing information. This, in turn, allows the piggybacking of tag binding information on existing multicast routing protocols (PIM). This alternative also avoids the need for tag re-binding when there are changes in the upstream neighbor. Finally, it is more likely to provide a more even distribution of allocated tags, as compared to the first alternative. Note that this approach does require a mechanism to choose the tag allocator from among the downstream tag switches on the subnetwork.
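
The tag-space partitioning rule above ("the higher address wins, the lower retries") could be simulated roughly as follows. This is only a sketch under our own assumptions: the region size, the numbering of regions, and the centralized simulation of what would really be a distributed exchange of announcements are all invented for the example.

   # Illustrative sketch of partitioning a multicast tag space into
   # non-overlapping regions among the switches on a common subnetwork.
   REGION_SIZE = 1024   # size of each tag region (an assumption)

   def ip_key(ip: str) -> tuple[int, ...]:
       return tuple(int(part) for part in ip.split("."))

   def claim_regions(switch_ips: list[str]) -> dict[str, range]:
       owner_of: dict[int, str] = {}        # region index -> owning switch
       to_place = list(switch_ips)
       while to_place:
           ip, region = to_place.pop(0), 0
           while True:
               owner = owner_of.get(region)
               if owner is None:
                   owner_of[region] = ip
                   break
               if ip_key(ip) > ip_key(owner):   # the higher address wins ...
                   owner_of[region] = ip
                   to_place.append(owner)       # ... and the lower address retries
                   break
               region += 1                      # region taken; try the next one
       return {ip: range(r * REGION_SIZE, (r + 1) * REGION_SIZE)
               for r, ip in owner_of.items()}

   print(claim_regions(["10.0.0.2", "10.0.0.7", "10.0.0.5"]))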

6.4. Quality of service

Two mechanisms are needed for providing a range of qualities of service to packets passing through a router or a tag switch. First, we need to classify packets into different classes. Second, we need to ensure that the handling of packets is such that the appropriate QOS characteristics (bandwidth, loss, etc.) are provided to each class.

Tag switching provides an easy way to mark packets as belonging to a particular class after they have been classified the first time. Initial classification could be done using configuration information (e.g., all traffic from a certain interface) or using information carried in the network layer or higher layer headers (e.g., all packets between a certain pair of hosts). A tag corresponding to the resultant class would then be applied to the packet. Tagged packets can then be efficiently handled by the tag switching routers in their path without needing to be reclassified. The actual scheduling and queueing of packets is largely orthogonal - the key point here is that tag switching enables simple logic to be used to find the state that identifies how the packet should be scheduled.

Tag switching can, for example, be used to support a small number of classes of service in a service provider network (e.g., premium and standard). On frame-based media, the class can be encoded by a field in the tag header. On ATM tag switches, additional tags can be allocated to differentiate the different classes. For example, rather than having one tag for each destination prefix in the FIB, an ATM tag switch could have two tags per prefix, one to be used by premium traffic and one by standard. Thus a tag binding in this case is a triple consisting of (destination prefix, class of service, tag). Such a tag would be used both to make a forwarding decision and to make a scheduling decision, e.g., by selecting the appropriate queue in a weighted fair queueing (WFQ) scheduler.

To provide a finer granularity of QOS, tag switching can be used with RSVP. We propose a simple extension to RSVP in which a tag object is defined. Such an object can be carried in an RSVP reservation message and thus associated with a session. Each tag capable router assigns a tag to the session and passes it upstream with the reservation message. Thus the association of tags with RSVP sessions works very much like the binding of tags to routes with downstream allocation. Note, however, that binding is accomplished using RSVP rather than TDP. (It would be possible to use TDP, but it is simpler to extend RSVP to carry tags, and this ensures that tags and reservation information are communicated in a similar manner.)

When data packets are transmitted, the first router in the path that is tag-capable applies the tag that it received from its downstream neighbor. This tag can be used at the next hop to find the corresponding reservation state, to forward and schedule the packet appropriately, and to find the suitable outgoing tag value provided by the next hop. Note that tag imposition could also be performed at the sending host.
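
As a purely illustrative sketch (the values and queue names are invented, not taken from the draft), per-class tags on an ATM tag switch can be modeled as a binding table keyed by (destination prefix, class of service), with the selected tag driving both the forwarding and the scheduling decision:

   # Illustrative: bindings keyed by (destination prefix, class of service),
   # so one prefix may have one tag per class; the same tag selects both the
   # outgoing behaviour and the WFQ queue.
   bindings = {
       # (prefix, class)          -> tag
       ("192.0.2.0/24", "premium"):  101,
       ("192.0.2.0/24", "standard"): 102,
   }

   wfq_queue_for_tag = {101: "queue-premium", 102: "queue-standard"}

   def impose_tag(prefix: str, cos: str) -> int:
       return bindings[(prefix, cos)]

   def schedule(tag: int) -> str:
       # the tag that drives forwarding also selects the scheduling queue
       return wfq_queue_for_tag[tag]

   tag = impose_tag("192.0.2.0/24", "premium")
   print(tag, schedule(tag))   # 101 queue-premium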

6.5. Flexible routing (explicit routes)

One of the fundamental properties of destination-based routing is that the only information from a packet that is used to forward the packet is the destination address. While this property enables highly scalable routing, it also limits the ability to influence the actual paths taken by packets. This, in turn, limits the ability to evenly distribute traffic among multiple links, taking the load off highly utilized links and shifting it towards less utilized links. For Internet Service Providers (ISPs) who support different classes of service, destination-based routing also limits their ability to segregate different classes with respect to the links used by these classes. Some of the ISPs today use Frame Relay or ATM to overcome the limitations imposed by destination-based routing. Tag switching, because of the flexible granularity of tags, is able to overcome these limitations without using either Frame Relay or ATM.

Another application where destination-based routing is no longer adequate is routing with resource reservations (QOS routing). Increasing the number of ways by which a particular reservation could traverse a network may improve the success of the reservation. Increasing the number of ways, in turn, requires the ability to explore paths that are not constrained to the ones constructed solely based on the destination.

To provide forwarding along paths that are different from the paths determined by destination-based routing, the control component of tag switching allows installation of tag bindings in tag switches that do not correspond to the destination-based routing paths.

One possible alternative for supporting explicit routes is to allow TDP to carry information about an explicit route, where such a route could be expressed as a sequence of tag switches. Another alternative is to use tag-capable RSVP (see Section 6.4) as a mechanism to distribute tag bindings, and to augment RSVP with the ability to steer the PATH message along a particular (explicit) route. Finally, it is also possible in principle to use some form of source route (e.g., SDRP, GRE) to steer RSVP PATH messages carrying tag bindings along a particular path. Note, however, that this would require a change to the way in which RSVP handles PATH messages, as it would be necessary to store the source route as part of the PATH state.

7. Tag Forwarding Granularities and Forwarding Equivalence Classes

A conventional router has some sort of structure or set of structures which may be called a "forwarding table", which has a finite number of entries. Whenever a packet is received, the router applies a classification algorithm which maps the packet to one of the forwarding table entries. This entry specifies how to forward the packet.

We can think of this classification algorithm as a means of partitioning the universe of possible packets into a finite set of "Forwarding Equivalence Classes" (FECs).

Each router along a path must have some way of determining the next hop for that FEC. For a given FEC, the corresponding entry in the forwarding table may be created dynamically, by operation of the routing protocols (unicast or multicast), or it might be created by configuration, or it might be created by some combination of configuration and protocol.

In tag switching, if a pair of tag switches are adjacent along a tag switched path, they must agree on an assignment of tags to FECs. Once this agreement is made, all tag switches on the tag switched path other than the first are spared the work of actually executing the classification algorithm. In fact, subsequent tag switches need not even have the code which would be necessary to do this.

There are a large number of different ways in which one may choose to partition a set of packets into FECs. Some examples:

1. Consider two packets to be in the same FEC if there is a single address prefix in the routing table which is the longest match for the destination address of each packet;

2. Consider two packets to be in the same FEC if these packets have to traverse a common router/tag switch;

3. Consider two packets to be in the same FEC if they have the same source address and the same destination address;

4. Consider two packets to be in the same FEC if they have the same source address, the same destination address, the same transport protocol, the same source port, and the same destination port;

5. Consider two packets to be in the same FEC if they are alike in some arbitrary manner determined by policy. Note that the assignment of a packet to a FEC by policy need not be done solely by examining the network layer header. One might want, for example, all packets arriving over a certain interface to be classified into a single FEC, so that those packets all get tunnelled through the network to a particular exit point.

Other examples can easily be thought of.

In case 1, the FEC can be identified by an address prefix (as described in Section 6.1). In case 2, the FEC can be identified by the address of a tag switch (as described in Section 6.1). Both 1 and 2 are useful for binding tags to unicast routes - tags are bound to FECs, and an address prefix, or an address, identifies a particular FEC. Case 3 is useful for binding tags to multicast trees that are constructed by protocols such as PIM (as described in Section 6.3). Case 4 is useful for binding tags to individual flows, using, say, RSVP (as described in Section 6.4). Case 5 is useful as a way of connecting two pieces of a private network across a public backbone (without even assuming that the private network is an IP network) (as described in Section 6.5).

Any number of different kinds of FEC can co-exist in a single tag switch, as long as the result is to partition the universe of packets seen by that tag switch. Likewise, the procedures which different tag switches use to classify (hitherto untagged) packets into FECs need not be identical.

Networks could be organized around a hierarchy of FECs. For example, (non-adjacent) tag switches TSa and TSb may classify packets into some set of FECs FEC1,...,FECn. However, from the point of view of the intermediate tag switches between TSa and TSb, all of these FECs may be treated indistinguishably. That is, as far as the intermediate tag switches are concerned, the union of FEC1,...,FECn is a single FEC. Each intermediate tag switch may then prefer to use a single tag for this union (rather than maintaining individual tags for each member of this union). Tag switching accommodates this by providing a hierarchy of tags, organized in a stack.
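
To make the idea concrete, here is a small illustrative sketch (not from the draft) in which two different classifiers - a longest-match classifier (case 1 above) and a five-tuple classifier (case 4 above) - both map packets into FECs, and a single tag table is keyed by FEC regardless of which classifier produced it:

   # Illustrative: two different FEC classifiers feeding one tag table.
   # FECs are represented as hashable tuples; real classifiers would of
   # course operate on parsed packet headers.
   import ipaddress

   prefixes = [ipaddress.ip_network("10.0.0.0/8"), ipaddress.ip_network("10.1.0.0/16")]

   def fec_by_longest_match(packet: dict) -> tuple:     # case 1: FEC = address prefix
       dst = ipaddress.ip_address(packet["dst"])
       best = max((p for p in prefixes if dst in p), key=lambda p: p.prefixlen)
       return ("prefix", str(best))

   def fec_by_flow(packet: dict) -> tuple:              # case 4: FEC = 5-tuple flow
       return ("flow", packet["src"], packet["dst"], packet["proto"],
               packet["sport"], packet["dport"])

   tag_for_fec = {}   # FEC -> tag, filled by whichever control module owns the FEC
   tag_for_fec[("prefix", "10.1.0.0/16")] = 33

   pkt = {"src": "192.0.2.1", "dst": "10.1.2.3", "proto": 6, "sport": 1024, "dport": 80}
   print(tag_for_fec[fec_by_longest_match(pkt)])   # 33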

Much of the power of tag switching arises from the facts that:

- there are so many different ways to partition the packets into FECs,

- different tag switches can partition the hitherto untagged packets in different ways,

- the route to be used for a particular FEC can be chosen in different ways,

- a hierarchy of tags, organized as a stack, can be used to represent the network's hierarchy of FECs.

Note that tag switching does not specify, as an element of any particular protocol, a general notion of "FEC identifier". Even if it were possible to have such a thing, there is no need for it, since there is no "one size fits all" setup protocol which works for any arbitrary combination of packet classifier and routing protocol. That's why tag distribution is sometimes done with TDP, sometimes with BGP, sometimes with PIM, and sometimes with RSVP.

8. Tag switching with ATM

Since the tag switching forwarding paradigm is based on label swapping, and since ATM forwarding is also based on label swapping, tag switching technology can readily be applied to ATM switches by implementing the control component of tag switching.

The tag information needed for tag switching can be carried in the VCI field. If two levels of tagging are needed, then the VPI field could be used as well, although the size of the VPI field limits the size of networks in which this would be practical. However, for most applications of one level of tagging, the VCI field is adequate.

To obtain the necessary control information, the switch should be able to support the tag switching control component. Moreover, if the switch has to perform routing information aggregation, then to support destination-based unicast routing the switch should be able to perform Network Layer forwarding for some fraction of the traffic as well.

Supporting the destination-based routing function with tag switching on an ATM switch may require the switch to maintain not one, but several tags associated with a route (or a group of routes with the same next hop). This is necessary to avoid the interleaving of packets which arrive from different upstream tag switches, but are sent concurrently to the same next hop.

If an ATM switch has built-in mechanism(s) to suppress cell interleave, then the switch could implement the destination-based routing function precisely the way it was described in Section 6.1. This would eliminate the need to maintain several tags per route. Note, however, that suppressing cell interleave is not part of the ATM User Plane, as defined by the ATM Forum.

Yet another alternative that eliminates the need to maintain several tags per route is to carry the tag information in the VPI field, and use the VCI field for identifying cells that were sent by different tag switches. Note, however, that the scalability of this alternative is constrained by the size of the VPI space (4096 tags total). Moreover, this alternative assumes that for a set of ATM tag switches that form a contiguous segment of a network topology there exists a mechanism to assign to each ATM tag switch around the edge of the segment a set of unique VCIs that would be used by this switch alone.
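
The need for several tags per route on an ATM switch can be illustrated with a short sketch (names and tag values are invented): with downstream-on-demand allocation, the switch hands out a distinct incoming tag (VCI) per (route, upstream neighbor) pair, so that cells from different upstream switches are never interleaved on the same outgoing VC.

   # Illustrative sketch: one incoming tag per (route, upstream neighbour)
   # on an ATM tag switch, allocated on demand.
   import itertools

   class AtmTagSwitch:
       def __init__(self):
           self._vcis = itertools.count(33)   # simple VCI/tag allocator
           self.tib = {}                      # (route, upstream) -> incoming tag

       def request_binding(self, route: str, upstream: str) -> int:
           # Downstream-on-demand: allocate a distinct tag per requester.
           key = (route, upstream)
           if key not in self.tib:
               self.tib[key] = next(self._vcis)
           return self.tib[key]

   sw = AtmTagSwitch()
   print(sw.request_binding("10.0.0.0/8", upstream="A"))  # 33
   print(sw.request_binding("10.0.0.0/8", upstream="B"))  # 34 - a separate VC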

The downstream tag allocation on demand scheme is likely to be a preferred scheme for the tag allocation and TIB maintenance procedures with ATM switches, as this scheme allows efficient use of entries in the cross-connect tables maintained by ATM switches.

Implementing tag switching on an ATM switch simplifies the integration of ATM switches and routers. From a routing peering point of view an ATM switch capable of tag switching would appear as a router to an adjacent router; this reduces the number of routing peers a router would have to maintain (relative to the common arrangement where a large number of routers are fully meshed over an ATM cloud). Tag switching enables better routing, as it exposes the underlying physical topology to the Network Layer routing. Finally, tag switching simplifies overall operations by employing common addressing, routing, and management procedures among both routers and ATM switches. That could provide a viable, more scalable alternative to the overlay model. Because creation of tag bindings is driven by control traffic, rather than data traffic, application of this approach to ATM switches does not produce high call setup rates, nor does it depend on the longevity of flows.

Implementing tag switching on an ATM switch does not preclude the ability to support a traditional ATM control plane (e.g., PNNI) on the same switch. The two components, tag switching and the ATM control plane, would operate in a Ships In the Night mode (with VPI/VCI space and other resources partitioned so that the components do not interact).

9. Tag switching migration strategies

Since tag switching is performed between a pair of adjacent tag switches, and since the tag binding information can be distributed on a pairwise basis, tag switching could be introduced in a fairly simple, incremental fashion. For example, once a pair of adjacent routers are converted into tag switches, each of the switches would tag packets destined to the other, thus enabling the other switch to use tag switching. Since tag switches use the same routing protocols as routers, the introduction of tag switches has no impact on routers. In fact, a tag switch connected to a router acts just as a router from the router's perspective.

As more and more routers are upgraded to enable tag switching, the scope of functionality provided by tag switching widens. For example, once all the routers within a domain are upgraded to support tag switching, it becomes possible to start using the hierarchy of routing knowledge function.

10. Summary

In this paper we described the tag switching technology. Tag switching is not constrained to a particular Network Layer protocol - it is a multiprotocol solution. The forwarding component of tag switching is simple enough to facilitate high performance forwarding, and may be implemented on high performance forwarding hardware such as ATM switches. The control component is flexible enough to support a wide variety of routing functions, such as destination-based routing, multicast routing, hierarchy of routing knowledge, and explicitly defined routes. By allowing a wide range of forwarding granularities that could be associated with a tag, we provide both scalable and functionally rich routing.
A combination of a wide range of forwarding granularities and the ability to evolve the control component fairly independently from the forwarding component results in a solution that enables graceful introduction of new routing functionality to meet the demands of a rapidly evolving computer networking environment.

11. Security Considerations

Security considerations are not addressed in this document.

12. Intellectual Property Considerations

Cisco Systems may seek patent or other intellectual property protection for some or all of the technologies disclosed in this document. If any standards arising from this document are or become protected by one or more patents assigned to Cisco Systems, Cisco intends to disclose those patents and license them under openly specified and non-discriminatory terms, for no fee.

13. Acknowledgments

Significant contributions to this work have been made by Anthony Alles, Fred Baker, Paul Doolan, Guy Fedorkow, Jeremy Lawrence, Arthur Lin, Morgan Littlewood, Keith McCloghrie, and Dan Tappan.

14. References

15. Authors' Addresses

Yakov Rekhter
Cisco Systems, Inc.
170 Tasman Drive
San Jose, CA, 95134
E-mail: yakov@cisco.com

Bruce Davie
Cisco Systems, Inc.
250 Apollo Drive
Chelmsford, MA, 01824
E-mail: bsd@cisco.com

Dave Katz
Juniper Networks
3260 Jay Street
Santa Clara, CA 95051
E-mail: dkatz@jnx.com

Eric Rosen
Cisco Systems, Inc.
250 Apollo Drive
Chelmsford, MA, 01824
E-mail: erosen@cisco.com

George Swallow
Cisco Systems, Inc.
250 Apollo Drive
Chelmsford, MA, 01824
E-mail: swallow@cisco.com

Dino Farinacci
Cisco Systems, Inc.
170 West Tasman Drive
San Jose, CA 95134
E-mail: dino@cisco.com