Internet Draft                                            Yakov Rekhter
Expiration date: July 1997                                  Bruce Davie
                                                              Dave Katz
                                                             Eric Rosen
                                                         George Swallow
                                                         Dino Farinacci
                                                          cisco Systems
                                                            January 1997

                  Tag Switching Architecture - Overview

                   draft-rekhter-tagswitch-arch-00.txt

1. Status of this Memo

This document is an Internet Draft. Internet Drafts are working documents of the Internet Engineering Task Force (IETF), its Areas, and its Working Groups. Note that other groups may also distribute working documents as Internet Drafts.

Internet Drafts are draft documents valid for a maximum of six months. Internet Drafts may be updated, replaced, or obsoleted by other documents at any time. It is not appropriate to use Internet Drafts as reference material or to cite them other than as a "working draft" or "work in progress."

Please check the 1id-abstracts.txt listing contained in the internet-drafts Shadow Directories on nic.ddn.mil, nnsc.nsf.net, nic.nordu.net, ftp.nisc.sri.com, or munnari.oz.au to learn the current status of any Internet Draft.

2. Abstract

This document provides an overview of tag switching. Tag switching is a way to combine the label-swapping forwarding paradigm with network layer routing. This has several advantages. Tags can have a wide spectrum of forwarding granularities, so at one end of the spectrum a tag could be associated with a group of destinations, while at the other a tag could be associated with a single application flow. At the same time forwarding based on tag switching, due to its simplicity, is well suited to high performance forwarding. These factors facilitate the development of a routing system which is both functionally rich and scalable. Finally, tag switching simplifies integration of routers and ATM switches by employing common addressing, routing, and management procedures.

3. Introduction

Continuous growth of the Internet demands higher bandwidth within the Internet Service Providers (ISPs). However, growth of the Internet is not the only driving factor for higher bandwidth - demand for higher bandwidth also comes from emerging multimedia applications. Demand for higher bandwidth, in turn, requires higher forwarding performance for both multicast and unicast traffic.

The growth of the Internet also demands improved scaling properties of the Internet routing system. The ability to contain the volume of routing information maintained by individual routers and the ability to build a hierarchy of routing knowledge are essential to support a high quality, scalable routing system.

While the destination-based forwarding paradigm is adequate in many situations, we already see examples where it is no longer adequate. The ability to overcome the rigidity of destination-based forwarding and to have more flexible control over how traffic is routed is likely to become more and more important.
We see the need to improve forwarding performance while at the same time adding routing functionality to support multicast, allowing more flexible control over how traffic is routed, and providing the ability to build a hierarchy of routing knowledge. Moreover, it becomes more and more crucial to have a routing system that can support graceful evolution to accommodate new and emerging requirements.

Tag switching is a technology that provides an efficient solution to these challenges. Tag switching blends the flexibility and rich functionality provided by Network Layer routing with the simplicity provided by the label swapping forwarding paradigm. The simplicity of the tag switching forwarding paradigm (label swapping) enables improved forwarding performance, while maintaining competitive price/performance. By associating a wide range of forwarding granularities with a tag, the same forwarding paradigm can be used to support a wide variety of routing functions, such as destination-based routing, multicast, hierarchy of routing knowledge, and flexible routing control. Finally, a combination of simple forwarding, a wide range of forwarding granularities, and the ability to evolve routing functionality while preserving the same forwarding paradigm enables a routing system that can gracefully evolve to accommodate new and emerging requirements.

4. Tag Switching components

Tag switching consists of two components: forwarding and control. The forwarding component uses the tag information (tags) carried by packets and the tag forwarding information maintained by a tag switch to perform packet forwarding. The control component is responsible for maintaining correct tag forwarding information among a group of interconnected tag switches.

Segregating control and forwarding into separate components promotes modularity, which in turn makes it possible to build a system that can gracefully evolve to accommodate new and emerging requirements.

5. Forwarding component

The fundamental forwarding paradigm employed by tag switching is based on the notion of label swapping. When a packet with a tag is received by a tag switch, the switch uses the tag as an index in its Tag Information Base (TIB). Each entry in the TIB consists of an incoming tag, and one or more sub-entries of the form (outgoing tag, outgoing interface, outgoing link level information). If the switch finds an entry with the incoming tag equal to the tag carried in the packet, then for each (outgoing tag, outgoing interface, outgoing link level information) sub-entry in the entry the switch replaces the tag in the packet with the outgoing tag, replaces the link level information (e.g., MAC address) in the packet with the outgoing link level information, and forwards the packet over the outgoing interface.

From the above description of the forwarding component we can make several observations. First, the forwarding decision is based on an exact match algorithm using a fixed length, fairly short tag as an index. This enables a simplified forwarding procedure, relative to the longest match forwarding traditionally used at the network layer. This in turn enables higher forwarding performance (higher packets per second). The forwarding procedure is simple enough to allow a straightforward hardware implementation.

A second observation is that the forwarding decision is independent of the tag's forwarding granularity.
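To make the forwarding procedure just described concrete, the following is a minimal sketch (in Python, used here purely for illustration). The names TagSwitch, TIBEntry, and SubEntry are invented, it assumes a single TIB per switch, and send() merely stands in for transmission of the re-tagged frame; nothing in it is prescribed by the architecture.

   from dataclasses import dataclass
   from typing import Dict, List, Optional

   @dataclass
   class SubEntry:
       outgoing_tag: int
       outgoing_interface: str
       outgoing_link_info: bytes      # e.g., the outgoing MAC address

   @dataclass
   class TIBEntry:
       incoming_tag: int
       sub_entries: List[SubEntry]    # one for unicast, one or more for multicast

   class TagSwitch:
       def __init__(self) -> None:
           self.tib: Dict[int, TIBEntry] = {}   # indexed by incoming tag

       def send(self, interface: str, link_info: bytes, tag: int,
                payload: bytes) -> None:
           # Placeholder for transmitting the re-tagged frame on the interface.
           print(f"out {interface}: tag={tag} dst={link_info.hex()}")

       def forward_tagged(self, tag: int, payload: bytes) -> bool:
           # Label swapping: exact-match lookup on the fixed-length tag.
           entry: Optional[TIBEntry] = self.tib.get(tag)
           if entry is None:
               return False            # no entry: discard, or hand to the network layer
           for sub in entry.sub_entries:   # the same loop serves unicast and multicast
               self.send(sub.outgoing_interface, sub.outgoing_link_info,
                         sub.outgoing_tag, payload)
           return True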
For example, the same forwarding algorithm applies to both unicast and multicast - a unicast entry would just have a single (outgoing tag, outgoing interface, outgoing link level information) sub-entry, while a multicast entry may have one or more (outgoing tag, outgoing interface, outgoing link level information) sub-entries. (For multi-access links, the outgoing link level information in this case would include a multicast MAC address.) This illustrates how with tag switching the same forwarding paradigm can be used to support different routing functions (e.g., unicast, multicast, etc.).

The simple forwarding procedure is thus essentially decoupled from the control component of tag switching. New routing (control) functions can readily be deployed without disturbing the forwarding paradigm. This means that it is not necessary to re-optimize forwarding performance (by modifying either hardware or software) as new routing functionality is added.

In the tag switching architecture, various implementation options are acceptable. For example, support for network layer forwarding by a tag switch (i.e., forwarding based on the network layer header as opposed to a tag) is optional. Moreover, use of network layer forwarding may be constrained to handling network layer control traffic only. (Note, however, that a tag switch must be able to source and sink network layer packets, e.g., to participate in network layer routing protocols.)

For the purpose of handling the network layer hop count (time-to-live) the architecture allows two alternatives: network layer hops may correspond directly to hops formed by tag switches, or one network layer hop may correspond to several tag switched hops.

When a switch receives a packet with a tag, and either the TIB maintained by the switch has no entry with the incoming tag equal to the tag carried by the packet, or such an entry exists but contains no usable outgoing tag and does not indicate local delivery to the switch, the switch may either (a) discard the packet, or (b) strip the tag information and submit the packet for network layer processing. Support for the latter is optional (as support for network layer forwarding is optional). Note that it may not always be possible to successfully forward a packet after stripping a tag even if a tag switch supports network layer forwarding.

The architecture allows a tag switch to maintain either a single TIB per tag switch, or a TIB per interface. Moreover, a tag switch could mix both of these options - some tags could be maintained in a single TIB, while other tags could be maintained in a TIB associated with individual interfaces.

5.1. Tag encapsulation

Tag switching clearly requires a tag to be carried in each packet. The tag information can be carried in a variety of ways:

   - as a small "shim" tag header inserted between the layer 2 and
     the Network Layer headers;

   - as part of the layer 2 header, if the layer 2 header provides
     adequate semantics (e.g., Frame Relay, or ATM);

   - as part of the Network Layer header (e.g., using the Flow Label
     field in IPv6 with appropriately modified semantics).

It is therefore possible to implement tag switching over virtually any media type including point-to-point links, multi-access links, and ATM.
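As one illustration of the first of the options listed above, the sketch below packs a tag into a hypothetical 32-bit shim placed between the layer 2 and Network Layer headers. The field layout (a 20-bit tag, a 3-bit class-of-service field, a bottom-of-stack bit, and an 8-bit time-to-live) is an assumption made for illustration only; this document does not define an encoding.

   import struct

   def encode_shim(tag: int, cos: int, bottom_of_stack: bool, ttl: int) -> bytes:
       # Pack one hypothetical 32-bit shim entry: tag(20) | CoS(3) | S(1) | TTL(8).
       assert 0 <= tag < 2**20 and 0 <= cos < 8 and 0 <= ttl < 256
       word = (tag << 12) | (cos << 9) | (int(bottom_of_stack) << 8) | ttl
       return struct.pack("!I", word)

   def decode_shim(data: bytes) -> dict:
       # Unpack the same hypothetical layout from the first four bytes.
       (word,) = struct.unpack("!I", data[:4])
       return {
           "tag": word >> 12,
           "cos": (word >> 9) & 0x7,
           "bottom_of_stack": bool((word >> 8) & 0x1),
           "ttl": word & 0xFF,
       }

   # Example: a shim carrying tag 42, inserted ahead of the network layer packet.
   shim = encode_shim(tag=42, cos=0, bottom_of_stack=True, ttl=64)
   assert decode_shim(shim)["tag"] == 42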
At the same time the forwarding component allows specific 196 optimizations for particular media (e.g., ATM). 198 Observe also that the tag forwarding component is Network Layer 199 independent. Use of control component(s) specific to a particular 200 Network Layer protocol enables the use of tag switching with 201 different Network Layer protocols. 203 6. Control component 205 Essential to tag switching is the notion of binding between a tag and 206 Network Layer routing (routes). The control component is responsible 207 for creating tag bindings, and then distributing the tag binding 208 information among tag switches. Creating a tag binding involves 209 allocating a tag, and then binding a tag to a route. The distribution 210 of tag binding information among tag switches could be accomplished 211 via several options: 213 - piggybacking on existing routing protocols 215 - using a separate Tag Distribution Protocol (TDP) 217 While the architecture supports distribution of tag binding 218 information that is independent of the underlying routing protocols, 219 the architecture acknowledges that considerable optimizations can be 220 achieved in some cases by small enhancements of existing protocols to 221 enable piggybacking tag binding information on these protocols. 223 One important characteristic of the tag switching architecture is 224 that creation of tag bindings is driven primarily by control traffic 225 rather than by data traffic. Control traffic driven creation of tag 226 bindings has several advantages, as compared to data traffic driven 227 creation of tag bindings. For one thing, it minimizes the amount of 228 additional control traffic needed to distribute tag binding 229 information, as tag binding information is distributed only in 230 response to control traffic, independent of data traffic. It also 231 makes the overall scheme independent of and insensitive to the data 232 traffic profile/pattern. Control traffic driven creation of tag 233 binding improves forwarding performance, as tags are precomputed 234 (prebound) before data traffic arrives, rather than being created as 235 data traffic arrives. It also simplifies the overall system behavior, 236 as the control plane is controlled solely by control traffic, rather 237 than by a mix of control and data traffic. 239 Another important characteristic of the tag switching architecture is 240 that distribution and maintenance of tag binding information is 241 consistent with distribution and maintenance of the associated 242 routing information. For example, distribution of tag binding 243 information for tags associated with unicast routing is based on the 244 technique of incremental updates with explicit acknowledgment. This 245 is very similar to the way unicast routing information gets 246 distributed by such protocols as OSPF and BGP. In contrast, 247 distribution of tag binding information for tags associated with 248 multicast routing is based on period updates/ refreshes, without any 249 explicit acknowledgments. This is consistent with the way multicast 250 routing information is distributed by such protocols as PIM. 252 To provide good scaling characteristics, while also accommodating 253 diverse routing functionality, tag switching supports a wide range of 254 forwarding granularities. At one extreme a tag could be associated 255 (bound) to a group of routes (more specifically to the Network Layer 256 Reachability Information of the routes in the group). 
At the other 257 extreme a tag could be bound to an individual application flow (e.g., 258 an RSVP flow). A tag could also be bound to a multicast tree. In 259 addition, a tag may be bound to a path that has been selected for a 260 certain set of packets based on some policy (e.g. an explicit route). 262 The control component is organized as a collection of modules, each 263 designed to support a particular routing function. To support new 264 routing functions, new modules can be added. The architecture does 265 not mandate a prescribed set of modules that have to be supported by 266 every tag switch. 268 The following describes some of the modules. 270 6.1. Destination-based routing 272 In this section we describe how tag switching can support 273 destination-based routing. Recall that with destination-based routing 274 a router makes a forwarding decision based on the destination address 275 carried in a packet and the information stored in the Forwarding 276 Information Base (FIB) maintained by the router. A router constructs 277 its FIB by using the information it receives from routing protocols 278 (e.g., OSPF, BGP). 280 To support destination-based routing with tag switching, a tag 281 switch, just like a router, participates in routing protocols (e.g., 282 OSPF, BGP), and constructs its FIB using the information it receives 283 from these protocols. 285 There are three permitted methods for tag allocation and Tag 286 Information Base (TIB) management: (a) downstream tag allocation, (b) 287 downstream tag allocation on demand, and (c) upstream tag allocation. 288 In all cases, a switch allocates tags and binds them to address 289 prefixes in its FIB. In downstream allocation, the tag that is 290 carried in a packet is generated and bound to a prefix by the switch 291 at the downstream end of the link (with respect to the direction of 292 data flow). On demand allocation means that tags will only be 293 allocated and distributed by the downstream switch when it is 294 requested to do so by the upstream switch. Method (b) is most useful 295 in ATM networks (see Section 8). In upstream allocation, tags are 296 allocated and bound at the upstream end of the link. Note that in 297 downstream allocation, a switch is responsible for creating tag 298 bindings that apply to incoming data packets, and receives tag 299 bindings for outgoing packets from its neighbors. In upstream 300 allocation, a switch is responsible for creating tag bindings for 301 outgoing tags, i.e. tags that are applied to data packets leaving the 302 switch, and receives bindings for incoming tags from its neighbors. 304 The downstream tag allocation scheme operates as follows: for each 305 route in its FIB the switch allocates a tag, creates an entry in its 306 Tag Information Base (TIB) with the incoming tag set to the allocated 307 tag, and then advertises the binding between the (incoming) tag and 308 the route to other adjacent tag switches. The advertisement could be 309 accomplished by either piggybacking the binding on top of the 310 existing routing protocols, or by using a separate Tag Distribution 311 Protocol (TDP). When a tag switch receives tag binding information 312 for a route, and that information was originated by the next hop for 313 that route, the switch places the tag (carried as part of the binding 314 information) into the outgoing tag of the TIB entry associated with 315 the route. This creates the binding between the outgoing tag and the 316 route. 
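A rough sketch of the downstream tag allocation scheme just described is given below. The advertise callback stands in for either TDP or piggybacking on an existing routing protocol, tag values start at an arbitrary number, and all names are invented for illustration.

   import itertools

   class DownstreamAllocator:
       # Per-switch state for downstream tag allocation (illustrative only).

       def __init__(self, advertise):
           self._next_tag = itertools.count(16)  # starting value chosen arbitrarily
           self.fib = {}        # prefix -> next hop
           self.tib = {}        # prefix -> {"incoming": tag, "outgoing": tag or None}
           self.advertise = advertise            # e.g., TDP, or a routing protocol

       def add_route(self, prefix: str, next_hop: str) -> None:
           # For each route in the FIB: allocate an incoming tag, create the TIB
           # entry, and advertise the binding to adjacent tag switches.
           tag = next(self._next_tag)
           self.fib[prefix] = next_hop
           self.tib[prefix] = {"incoming": tag, "outgoing": None}
           self.advertise(prefix, tag)

       def binding_received(self, prefix: str, tag: int, from_neighbor: str) -> None:
           # Install the neighbor's tag as our outgoing tag, but only if the
           # binding was originated by the next hop for that route.
           if self.fib.get(prefix) == from_neighbor and prefix in self.tib:
               self.tib[prefix]["outgoing"] = tag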
318 With the downstream on demand tag allocation scheme, operation is as 319 follows. For each route in its FIB, the switch identifies the next 320 hop for that route. It then issues a request (via TDP) to the next 321 hop for a tag binding for that route. When the next hop receives the 322 request, it allocates a tag, creates an entry in its TIB with the 323 incoming tag set to the allocated tag, and then returns the binding 324 between the (incoming) tag and the route to the switch that sent the 325 original request. When the switch receives the binding information, 326 the switch creates an entry in its TIB, and sets the outgoing tag in 327 the entry to the value received from the next hop. Handling of data 328 packets is as for downstream allocation. The main application for 329 this mode of operation is with ATM switches, as described in Section 330 8. 332 The upstream tag allocation scheme is used as follows. If a tag 333 switch has one or more point-to-point interfaces, then for each route 334 in its FIB whose next hop is reachable via one of these interfaces, 335 the switch allocates a tag, creates an entry in its TIB with the 336 outgoing tag set to the allocated tag, and then advertises to the 337 next hop (via TDP) the binding between the (outgoing) tag and the 338 route. When a tag switch that is the next hop receives the tag 339 binding information, the switch places the tag (carried as part of 340 the binding information) into the incoming tag of the TIB entry 341 associated with the route. 343 Note that, while we have described upstream allocation for the sake 344 of completeness, we have found the two downstream allocation methods 345 adequate for all practical purposes so far. 347 Independent of which tag allocation method is used, once a TIB entry 348 is populated with both incoming and outgoing tags, the tag switch can 349 forward packets for routes bound to the tags by using the tag 350 switching forwarding algorithm (as described in Section 5). 352 When a tag switch creates a binding between an outgoing tag and a 353 route, the switch, in addition to populating its TIB, also updates 354 its FIB with the binding information. This enables the switch to add 355 tags to previously untagged packets. 357 So far we have described how a tag could be bound to a single route, 358 creating a one-to-one mapping between routes and tags. However, under 359 certain conditions it is possible to bind a tag not just to a single 360 route, but to a group of routes, creating a many-to-one mapping 361 between routes and tags. Consider a tag switch that is connected to a 362 router. It is quite possible that the switch uses the router as the 363 next hop not just for one route, but for a group of routes. Under 364 these conditions the switch does not have to allocate distinct tags 365 to each of these routes - one tag would suffice. The distribution of 366 tag binding information is unaffected by whether there is a one-to- 367 one or one-to-many mapping between tags and routes. Now consider a 368 tag switch that receives from one of its neighbors (tag switching 369 peers) tag binding information for a set of routes, such that the set 370 is bound to a single tag. If the switch decides to use some or all of 371 the routes in the set, then for these routes the switch does not need 372 to allocate individual tags - one tag would suffice. Such an approach 373 may be valuable when tags are a precious resource. 
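The many-to-one case lends itself to a simple illustration: routes that share a next hop can share a single tag. The sketch below (all names invented for illustration) groups FIB prefixes by next hop and allocates one tag per group rather than one per prefix.

   import itertools

   def allocate_shared_tags(fib: dict) -> dict:
       # fib: {prefix: next_hop}; returns {prefix: tag}, one tag per next hop.
       next_tag = itertools.count(16)
       tag_for_next_hop = {}
       bindings = {}
       for prefix, next_hop in fib.items():
           if next_hop not in tag_for_next_hop:
               tag_for_next_hop[next_hop] = next(next_tag)
           bindings[prefix] = tag_for_next_hop[next_hop]
       return bindings

   fib = {"10.1.0.0/16": "routerA", "10.2.0.0/16": "routerA",
          "192.0.2.0/24": "routerB"}
   bindings = allocate_shared_tags(fib)
   assert bindings["10.1.0.0/16"] == bindings["10.2.0.0/16"]  # one tag, two routes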
Note that the 374 ability to support many-to-one mapping makes no assumptions about the 375 routing protocols being used. 377 When a tag switch adds a tag to a previously untagged packet the tag 378 could be either associated with the route to the destination address 379 carried in the packet, or with the route to some other tag switch 380 along the path to the destination (in some cases the address of that 381 other tag switch could be gleaned from network layer routing 382 protocols). The latter option provides yet another way of mapping 383 multiple routes into a single tag. However, this option is either 384 dependent on particular routing protocols, or would require a 385 separate mechanism for discovering tag switches along a path. 387 To understand the scaling properties of tag switching in conjunction 388 with destination-based routing, observe that the total number of tags 389 that a tag switch has to maintain can not be greater than the number 390 of routes in the switch's FIB. Moreover, as we have just seen, the 391 number of tags can be much less than the number of routes. Thus, much 392 less state is required than would be the case if tags were allocated 393 to individual flows. 395 In general, a tag switch will try to populate its TIB with incoming 396 and outgoing tags for all routes to which it has reachability, so 397 that all packets can be forwarded by simple label swapping. Tag 398 allocation is thus driven by topology (routing), not data traffic - 399 it is the existence of a FIB entry that causes tag allocations, not 400 the arrival of data packets. 402 Use of tags associated with routes, rather than flows, also means 403 that there is no need to perform flow classification procedures for 404 all the flows to determine whether to assign a tag to a flow. That, 405 in turn, simplifies the overall scheme, and makes it more robust and 406 stable in the presence of changing traffic patterns. 408 Note that when tag switching is used to support destination-based 409 routing, tag switching does not completely eliminate the need to 410 perform normal Network Layer forwarding at some network elements. 411 First of all, to add a tag to a previously untagged packet requires 412 normal Network Layer forwarding. This function could be performed by 413 the first hop router, or by the first router on the path that is able 414 to participate in tag switching. In addition, whenever a tag switch 415 aggregates a set of routes (e.g., by using the technique of 416 hierarchical routing), into a single tag, and the routes do not share 417 a common next hop, the switch needs to perform Network Layer 418 forwarding for packets carrying that tag. However, one could observe 419 that the number of places where routes get aggregated is smaller than 420 the total number of places where forwarding decisions have to be 421 made. Moreover, quite often aggregation is applied to only a subset 422 of the routes maintained by a tag switch. As a result, on average a 423 packet can be forwarded most of the time using the tag switching 424 algorithm. Note that many tag switches may not need to perform any 425 network layer forwarding. 427 6.2. Hierarchy of routing knowledge 429 The IP routing architecture models a network as a collection of 430 routing domains. Within a domain, routing is provided via interior 431 routing (e.g., OSPF), while routing across domains is provided via 432 exterior routing (e.g., BGP). 
However, all routers within domains 433 that carry transit traffic (e.g., domains formed by Internet Service 434 Providers) have to maintain information provided by not just interior 435 routing, but exterior routing as well, even if only some of these 436 routers participate in exterior routing. That creates certain 437 problems. First of all, the amount of this information is not 438 insignificant. Thus it places additional demand on the resources 439 required by the routers. Moreover, increase in the volume of routing 440 information quite often increases routing convergence time. This, in 441 turn, degrades the overall performance of the system. 443 Tag switching allows complete decoupling of interior and exterior 444 routing. With tag switching only tag switches at the border of a 445 domain would be required to maintain routing information provided by 446 exterior routing - all other switches within the domain would just 447 maintain routing information provided by the domains interior routing 448 (which is usually significantly smaller than the exterior routing 449 information), with no "leaking" of exterior routing information into 450 interior routing. This, in turn, reduces the routing load on non- 451 border switches, and shortens routing convergence time. 453 To support this functionality, tag switching allows a packet to carry 454 not one but a set of tags, organized as a stack. A tag switch could 455 either swap the tag at the top of the stack, or pop the stack, or 456 swap the tag and push one or more tags into the stack. 458 Consider a tag switch that is at the border of a routing domain. This 459 switch maintains both exterior and interior routes. The interior 460 routes provide routing information and tags to all the other tag 461 switches within the domain. For each exterior route that the switch 462 receives from some other border tag switch that is in the same domain 463 as the local switch, the switch maintains not just a tag associated 464 with the route, but also a tag associated with the route to that 465 other border tag switch. Moreover, for inter-domain routing protocols 466 that are capable of passing the "third-party" next hop information 467 the switch would maintain a tag associated with the route to the next 468 hop, rather than with the route to the border tag switch from whom 469 the local switch received the exterior route. 471 When a packet is forwarded between two (border) tag switches in 472 different domains, the tag stack in the packet contains just one tag 473 (associated with an exterior route). However, when a packet is 474 forwarded within a domain, the tag stack in the packet contains not 475 one, but two tags (the second tag is pushed by the domain's ingress 476 border tag switch). The tag at the top of the stack provides packet 477 forwarding to an appropriate egress border tag switch (or the 478 "third-party" next hop), while the next tag in the stack provides 479 correct packet forwarding at the egress switch (or at the "third- 480 party" next hop). The stack is popped by either the egress switch (or 481 the "third-party" next hop) or by the penultimate (with respect to 482 the egress switch/"third-party" next hop) switch. 484 One could observe that when tag switching is confined to a single 485 routing domain, the above still could be used to decouple interior 486 from exterior routing, similar to what was described above. 
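The tag stack manipulations used in this section (swapping the tag at the top of the stack, pushing, and popping) can be sketched as follows; the packet's tags are modeled as a Python list with the top of the stack first, and the tag values are invented purely for illustration.

   from typing import List

   def swap_top(stack: List[int], new_tag: int) -> None:
       # Replace the tag at the top of the stack (ordinary label swapping).
       stack[0] = new_tag

   def push(stack: List[int], tag: int) -> None:
       # Push an additional tag, e.g. at a domain's ingress border tag switch.
       stack.insert(0, tag)

   def pop(stack: List[int]) -> int:
       # Pop the top tag, e.g. at the egress border switch or the penultimate hop.
       return stack.pop(0)

   # A packet entering a transit domain with one (exterior) tag:
   tags = [901]             # tag bound to an exterior route
   push(tags, 17)           # ingress border switch adds an interior tag
   swap_top(tags, 23)       # interior switches swap only the top tag
   pop(tags)                # popped at (or just before) the egress border switch
   assert tags == [901]     # the exterior tag is exposed again for forwarding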
However, when tag switching is confined to a single routing domain in this way, a border tag switch wouldn't maintain tags associated with each exterior route, and forwarding between domains would be performed at the network layer.

The control component used in this scenario is fairly similar to the one used with destination-based routing. In fact, the only essential difference is that in this scenario the tag binding information is distributed both among physically adjacent tag switches, and among border tag switches within a single domain. One could also observe that the latter (distribution among border switches) could be trivially accommodated by very minor extensions to BGP.

The notion of supporting hierarchy of routing knowledge with tag switching is not limited to the case of exterior/interior routing, but could be applicable to other cases where a hierarchy of routing knowledge is possible. Moreover, while the above describes only a two-level hierarchy of routing knowledge, the tag switching architecture does not impose limits on the depth of the hierarchy.

6.3. Multicast

Essential to multicast routing is the notion of spanning trees. Multicast routing procedures (e.g., PIM) are responsible for constructing such trees (with receivers as leaves), while multicast forwarding is responsible for forwarding multicast packets along such trees. Thus, to support a multicast forwarding function with tag switching we need to be able to associate a tag with a multicast tree. The following describes the procedures for allocation and distribution of tags for multicast.

When tag switching is used for multicast, it is important that tag switching be able to utilize multicast capabilities provided by the Data Link layer (e.g., multicast capabilities provided by Ethernet). To be able to do this, an (upstream) tag switch connected to a given Data Link subnetwork should use the same tag when forwarding a multicast packet to all of the (downstream) switches on that subnetwork. This way the packet will be multicast at the Data Link layer over the subnetwork. To support this, all tag switches that are part of a given multicast tree and are on a common subnetwork must agree on a common tag that would be used for forwarding multicast packets along the tree over the subnetwork. Moreover, since multicast forwarding is based on Reverse Path Forwarding (RPF), it is crucial that, when a tag switch receives a multicast packet, the tag carried in the packet enable the switch to identify both (a) the particular multicast group and (b) the previous hop (upstream) tag switch that sent the packet.

To support the requirements outlined in the previous paragraph, the tag switching architecture assumes that (a) multicast tags are associated with interfaces on a tag switch (rather than with a tag switch as a whole), (b) the tag space that a tag switch could use for allocating tags for multicast is partitioned into non-overlapping regions among all the tag switches connected to a common Data Link subnetwork, and (c) there are procedures by which tag switches that belong to a common multicast tree and are on a common Data Link subnetwork agree on the tag switch that is responsible for allocating a tag for the tree.
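To illustrate assumptions (a) and (b) above, the following sketch keys multicast forwarding state by (incoming interface, tag); because the tag space on a subnetwork is partitioned among the attached switches, the tag also identifies the upstream neighbor that allocated it, which is what the RPF check needs. All names are invented for illustration and nothing here is prescribed by the architecture.

   from dataclasses import dataclass, field
   from typing import Dict, List, Tuple

   @dataclass
   class McastEntry:
       group: str                  # the multicast tree (group) the tag is bound to
       upstream: str               # switch that allocated the tag on this subnetwork
       out_legs: List[Tuple[str, int]] = field(default_factory=list)
                                   # (outgoing interface, outgoing tag) per leg

   class McastTIB:
       # Multicast tags are per interface, so state is keyed by (interface, tag).
       def __init__(self) -> None:
           self.entries: Dict[Tuple[str, int], McastEntry] = {}

       def install(self, interface: str, tag: int, entry: McastEntry) -> None:
           self.entries[(interface, tag)] = entry

       def forward(self, interface: str, tag: int, rpf_neighbor: str):
           entry = self.entries.get((interface, tag))
           if entry is None or entry.upstream != rpf_neighbor:
               return []            # fails the RPF check: drop rather than loop
           return entry.out_legs    # replicate onto each downstream leg of the tree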
545 One possible way of partitioning tag space into non-overlapping 546 regions among tag switches connected to a common subnetwork is for 547 each tag switch to claim a region of the space and announce this 548 region to its neighbors. Conflicts are resolved based on the IP 549 address of the contending switches (the higher address wins, the 550 lower retries). Once the tag space is partitioned among tag switches, 551 the switches may create bindings between tags and multicast trees 552 (routes). 554 At least in principle there are two possible ways to create bindings 555 between tags and multicast trees (routes). With the first alternative 556 for a set of tag switches that share a common Data Link subnetwork, 557 the tag switch that is upstream with respect to a particular 558 multicast tree allocates a tag (out of its own region that does not 559 overlap with the regions of other switches on the subnetwork), binds 560 the tag to a multicast route, and then advertises the binding to all 561 the (downstream) switches on the subnetwork. With the second 562 alternative, one of the tag switches that is downstream with respect 563 to a particular multicast tree allocates a tag (out of its own region 564 that does not overlap with the regions of other switches on the 565 subnetwork), binds the tag to a multicast route, and then advertises 566 the binding to all the switches (both downstream and upstream) on the 567 subnetwork. Usually the first tag switch to join the group is the one 568 that performs the allocation. 570 Each of the above alternatives has its own trade-offs. The first 571 alternative is fairly simple - one upstream router does the tag 572 binding and multicasts the binding downstream. However, the first 573 alternative may create uneven distribution of allocated tags, as some 574 tag switches on a common subnetwork may have more upstream multicast 575 sources than the others. Also, changes in topology could result in 576 upstream neighbor changes, which in turn would require tag re- 577 binding. Finally, one could observe that distributing tag binding 578 from upstream towards downstream is inconsistent with the direction 579 of multicast routing information distribution (from downstream 580 towards upstream). 582 The second alternative, even if more complex that the first one, has 583 its own advantages. For one thing, it makes distribution of multicast 584 tag binding consistent with the distribution of unicast tag binding. 585 It also makes distribution of multicast tag binding consistent with 586 the distribution of multicast routing information. This, in turn, 587 allows the piggybacking of tag binding information on existing 588 multicast routing protocols (PIM). This alternative also avoids the 589 need for tag re-binding when there are changes in upstream neighbor. 590 Finally it is more likely to provide more even distribution of 591 allocated tags, as compared to the first alternative. Note that this 592 approach does require a mechanism to choose the tag allocator from 593 among the downstream tag switches on the subnetwork. 595 6.4. Quality of service 597 Two mechanisms are needed for providing a range of qualities of 598 service to packets passing through a router or a tag switch. First, 599 we need to classify packets into different classes. Second, we need 600 to ensure that the handling of packets is such that the appropriate 601 QOS characteristics (bandwidth, loss, etc.) are provided to each 602 class. 
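As a toy illustration of how tags can tie these two mechanisms together, the sketch below classifies a packet once at the edge, applies a per-class tag, and lets every subsequent switch select a queue from the tag alone; the classes, tag values, and queue names are invented for illustration.

   PREMIUM, STANDARD = "premium", "standard"

   # Edge classification rules (configuration- or header-based), applied once.
   def classify(src: str, dst: str, in_interface: str) -> str:
       if in_interface == "serial0":     # e.g., all traffic from a given interface
           return PREMIUM
       if (src, dst) == ("192.0.2.1", "198.51.100.7"):   # e.g., a host pair
           return PREMIUM
       return STANDARD

   # One tag per class; interior switches never re-classify.
   TAG_BY_CLASS = {PREMIUM: 101, STANDARD: 102}
   QUEUE_BY_TAG = {101: "wfq-high", 102: "wfq-low"}

   def impose_tag(src: str, dst: str, in_interface: str) -> int:
       return TAG_BY_CLASS[classify(src, dst, in_interface)]

   tag = impose_tag("192.0.2.1", "198.51.100.7", "ethernet1")
   assert QUEUE_BY_TAG[tag] == "wfq-high"   # scheduling decided from the tag alone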
Tag switching provides an easy way to mark packets as belonging to a particular class after they have been classified the first time. Initial classification could be done using configuration information (e.g., all traffic from a certain interface) or using information carried in the network layer or higher layer headers (e.g., all packets between a certain pair of hosts). A tag corresponding to the resultant class would then be applied to the packet. Tagged packets can then be efficiently handled by the tag switching routers in their path without needing to be reclassified. The actual scheduling and queueing of packets is largely orthogonal - the key point here is that tag switching enables simple logic to be used to find the state that identifies how the packet should be scheduled.

Tag switching can, for example, be used to support a small number of classes of service in a service provider network (e.g., premium and standard). On frame-based media, the class can be encoded by a field in the tag header. On ATM tag switches, additional tags can be allocated to differentiate the different classes. For example, rather than having one tag for each destination prefix in the FIB, an ATM tag switch could have two tags per prefix, one to be used by premium traffic and one by standard. Thus a tag binding in this case is a triple consisting of (prefix, class of service, tag). Such a tag would be used both to make a forwarding decision and to make a scheduling decision, e.g., by selecting the appropriate queue in a weighted fair queueing (WFQ) scheduler.

To provide a finer granularity of QOS, tag switching can be used with RSVP. We propose a simple extension to RSVP in which a tag object is defined. Such an object can be carried in an RSVP reservation message and thus associated with a session. Each tag capable router assigns a tag to the session and passes it upstream with the reservation message. Thus the association of tags with RSVP sessions works very much like the binding of tags to routes with downstream allocation. Note, however, that binding is accomplished using RSVP rather than TDP. (It would be possible to use TDP, but it is simpler to extend RSVP to carry tags and this ensures that tags and reservation information are communicated in a similar manner.)

When data packets are transmitted, the first router in the path that is tag-capable applies the tag that it received from its downstream neighbor. This tag can be used at the next hop to find the corresponding reservation state, to forward and schedule the packet appropriately, and to find the suitable outgoing tag value provided by the next hop. Note that tag imposition could also be performed at the sending host.

6.5. Flexible routing (explicit routes)

One of the fundamental properties of destination-based routing is that the only information from a packet that is used to forward the packet is the destination address. While this property enables highly scalable routing, it also limits the ability to influence the actual paths taken by packets. This, in turn, limits the ability to evenly distribute traffic among multiple links, taking the load off highly utilized links, and shifting it towards less utilized links.
For 659 Internet Service Providers (ISPs) who support different classes of 660 service, destination-based routing also limits their ability to 661 segregate different classes with respect to the links used by these 662 classes. Some of the ISPs today use Frame Relay or ATM to overcome 663 the limitations imposed by destination-based routing. Tag switching, 664 because of the flexible granularity of tags, is able to overcome 665 these limitations without using either Frame Relay or ATM. 667 Another application where destination-based routing is no longer 668 adequate is routing with resource reservations (QOS routing). 669 Increasing the number of ways by which a particular reservation could 670 traverse a network may improve the success of the reservation. 671 Increasing the number of ways, in turn, requires the ability to 672 explore paths that are not constrained to the ones constructed solely 673 based on destination. 675 To provide forwarding along paths that are different from the paths 676 determined by destination-based routing, the control component of tag 677 switching allows installation of tag bindings in tag switches that do 678 not correspond to the destination-based routing paths. 680 One possible alternative for supporting explicit routes is to allow 681 TDP to carry information about an explicit route, where such a route 682 could be expressed as a sequence of tag switches. Another alternative 683 is to use tag-capable RSVP (see Section 6.4) as a mechanism to 684 distribute tag bindings, and to augment RSVP with the ability to 685 steer the PATH message along a particular (explicit) route. Finally, 686 it is also possible in principle to use some form of source route 687 (e.g., SDRP, GRE) to steer RSVP PATH messages carrying tag bindings 688 along a particular path. Note, however, that this would require a 689 change to the way in which RSVP handles PATH messages, as it would be 690 necessary to store the source route as part of the PATH state. 692 7. Tag Forwarding Granularities and Forwarding Equivalence Classes 694 A conventional router has some sort of structure or set of structures 695 which may be called a "forwarding table", which has a finite number 696 of entries. Whenever a packet is received, the router applies a 697 classification algorithm which maps the packet to one of the 698 forwarding table entries. This entry specifies how to forward the 699 packet. 701 We can think of this classification algorithm as a means of 702 partitioning the universe of possible packets into a finite set of 703 "Forwarding Equivalence Classes" (FECs). 705 Each router along a path must have some way of determining the next 706 hop for that FEC. For a given FEC, the corresponding entry in the 707 forwarding table may be created dynamically, by operation of the 708 routing protocols (unicast or multicast), or it might be created by 709 configuration, or it might be created by some combination of 710 configuration and protocol. 712 In tag switching, if a pair of tag switches are adjacent along a tag 713 switched path, they must agree on an assignment of tags to FECs. Once 714 this agreement is made, all tag switches on the tag switched path 715 other than the first are spared the work of actually executing the 716 classification algorithm. In fact, subsequent tag switches need not 717 even have the code which would be necessary to do this. 719 There are a large number of different ways in which one may choose to 720 partition a set of packets into FECs. Some examples: 722 1. 
Consider two packets to be in the same FEC if there is a single 723 address prefix in the routing table which is the longest match for 724 the destination address of each packet; 726 2. Consider two packets to be in the same FEC if these packets 727 have to traverse through a common router/tag switch; 729 3. Consider two packets to be in the same FEC if they have the 730 same source address and the same destination address; 732 4. Consider two packets to be in the same FEC if they have the 733 same source address, the same destination address, the same 734 transport protocol, the same source port, and the same destination 735 port. 737 5. Consider two packets to be in the same FEC if they are alike in 738 some arbitrary manner determined by policy. Note that the 739 assignment of a packet to a FEC by policy need not be done solely 740 by examining the network layer header. One might want, for 741 example, all packets arriving over a certain interface to be 742 classified into a single FEC, so that those packets all get 743 tunnelled through the network to a particular exit point. 745 Other examples can easily be thought of. 747 In case 1, the FEC can be identified by an address prefix (as 748 described in Section 6.1). In case 2, the FEC can be identified by 749 the address of a tag switch (as described in Section 6.1). Both 1 and 750 2 are useful for binding tags to unicast routes - tags are bound to 751 FECs, and an address prefix, or an address identifies a particular 752 FEC. Case 3 is useful for binding tags to multicast trees that are 753 constructed by protocols such as PIM (as described in Section 6.3). 754 Case 4 is useful for binding tags to individual flows, using, say, 755 RSVP (as described in Section 6.4). Case 5 is useful as a way of 756 connecting two pieces of a private network across a public backbone 757 (without even assuming that the private network is an IP network) (as 758 described in Section 6.5). 760 Any number of different kinds of FEC can co-exist in a single tag 761 switch, as long as the result is to partition the universe of packets 762 seen by that tag switch. Likewise, the procedures which different tag 763 switches use to classify (hitherto untagged) packets into FECs need 764 not be identical. 766 Networks could be organized around a hierarchy of FECs. For example, 767 (non-adjacent) tag switches TSa and TSb may classify packets into 768 some set of FECs FEC1,...,FECn. However from the point of view of 769 the intermediate tag switches between TSa and TSb, all of these FECs 770 may be treated indistinguishably. That is, as far as the intermediate 771 tag switches are concerned, the union of the FEC1,...,FECn is a 772 single FEC. Each intermediate tag switch may then prefer to use a 773 single tag for this union (rather than maintaining individual tags 774 for each member of this union). Tag switching accommodates this by 775 providing a hierarchy of tags, organized in a stack. 777 Much of the power of tag switching arises from the facts that: 779 - there are so many different ways to partition the packets into 780 FECs, 782 - different tag switches can partition the hitherto untagged 783 packets in different ways, 785 - the route to be used for a particular FEC can be chosen in 786 different ways, 788 - a hierarchy of tags, organized as a stack, can be used to 789 represent the network's hierarchy of FECs. 791 Note that tag switching does not specify, as an element of any 792 particular protocol, a general notion of "FEC identifier". 
Even if it 793 were possible to have such a thing, there is no need for it, since 794 there is no "one size fits all" setup protocol which works for any 795 arbitrary combination of packet classifier and routing protocol. 796 That's why tag distribution is sometimes done with TDP, sometimes 797 with BGP, sometimes with PIM, sometimes with RSVP. 799 8. Tag switching with ATM 801 Since the tag switching forwarding paradigm is based on label 802 swapping, and since ATM forwarding is also based on label swapping, 803 tag switching technology can readily be applied to ATM switches by 804 implementing the control component of tag switching. 806 The tag information needed for tag switching can be carried in the 807 VCI field. If two levels of tagging are needed, then the VPI field 808 could be used as well, although the size of the VPI field limits the 809 size of networks in which this would be practical. However, for most 810 applications of one level of tagging the VCI field is adequate. 812 To obtain the necessary control information, the switch should be 813 able to support the tag switching control component. Moreover, if the 814 switch has to perform routing information aggregation, then to 815 support destination-based unicast routing the switch should be able 816 to perform Network Layer forwarding for some fraction of the traffic 817 as well. 819 Supporting the destination-based routing function with tag switching 820 on an ATM switch may require the switch to maintain not one, but 821 several tags associated with a route (or a group of routes with the 822 same next hop). This is necessary to avoid the interleaving of 823 packets which arrive from different upstream tag switches, but are 824 sent concurrently to the same next hop. 826 If an ATM switch has built-in mechanism(s) to suppress cell 827 interleave, then the switch could implement the destination-based 828 routing function precisely the way it was described in Section 6.1. 829 This would eliminate the need to maintain several tags per route. 830 Note, however, that suppressing cell interleave is not part of the 831 ATM User Plane, as defined by the ATM Forum. 833 Yet another alternative that eliminates the need to maintain several 834 tags per route is to carry the tag information in the VPI field, and 835 use the VCI field for identifying cells that were sent by different 836 tag switches. Note, however, that the scalability of this alternative 837 is constrained by the size of the VPI space (4096 tags total). 838 Moreover, this alternative assumes that for a set of ATM tag switches 839 that form a contiguous segment of a network topology there exists a 840 mechanism to assign to each ATM tag switch around the edge of the 841 segment a set of unique VCIs that would be used by this switch alone. 843 The downstream tag allocation on demand scheme is likely to be a 844 preferred scheme for the tag allocation and TIB maintenance 845 procedures with ATM switches, as this scheme allows efficient use of 846 entries in the cross-connect tables maintained by ATM switches. 848 Implementing tag switching on an ATM switch simplifies integration of 849 ATM switches and routers. From a routing peering point of view an ATM 850 switch capable of tag switching would appear as a router to an 851 adjacent router; this reduces the number of routing peers a router 852 would have to maintain (relative to the common arrangement where a 853 large number of routers are fully meshed over an ATM cloud). 
Tag switching enables better routing, as it exposes the underlying physical topology to the Network Layer routing. Finally, tag switching simplifies overall operations by employing common addressing, routing, and management procedures among both routers and ATM switches. That could provide a viable, more scalable alternative to the overlay model. Because creation of tag binding is driven by control traffic, rather than data traffic, application of this approach to ATM switches does not produce high call setup rates, nor does it depend on the longevity of flows.

Implementing tag switching on an ATM switch does not preclude the ability to support a traditional ATM control plane (e.g., PNNI) on the same switch. The two components, tag switching and the ATM control plane, would operate in a Ships In the Night mode (with VPI/VCI space and other resources partitioned so that the components do not interact).

9. Tag switching migration strategies

Since tag switching is performed between a pair of adjacent tag switches, and since the tag binding information can be distributed on a pairwise basis, tag switching could be introduced in a fairly simple, incremental fashion. For example, once a pair of adjacent routers are converted into tag switches, each of the switches would tag packets destined to the other, thus enabling the other switch to use tag switching. Since tag switches use the same routing protocols as routers, the introduction of tag switches has no impact on routers. In fact, a tag switch connected to a router acts just as a router from the router's perspective.

As more and more routers are upgraded to enable tag switching, the scope of functionality provided by tag switching widens. For example, once all the routers within a domain are upgraded to support tag switching, it becomes possible to start using the hierarchy of routing knowledge function.

10. Summary

In this paper we described the tag switching technology. Tag switching is not constrained to a particular Network Layer protocol - it is a multiprotocol solution. The forwarding component of tag switching is simple enough to facilitate high performance forwarding, and may be implemented on high performance forwarding hardware such as ATM switches. The control component is flexible enough to support a wide variety of routing functions, such as destination-based routing, multicast routing, hierarchy of routing knowledge, and explicitly defined routes. By allowing a wide range of forwarding granularities that could be associated with a tag, we provide both scalable and functionally rich routing. A combination of a wide range of forwarding granularities and the ability to evolve the control component fairly independently from the forwarding component results in a solution that enables graceful introduction of new routing functionality to meet the demands of a rapidly evolving computer networking environment.

11. Security Considerations

Security considerations are not addressed in this document.

12. Intellectual Property Considerations

Cisco Systems may seek patent or other intellectual property protection for some or all of the technologies disclosed in this document.
If any standards arising from this document are or become 918 protected by one or more patents assigned to Cisco Systems, Cisco 919 intends to disclose those patents and license them under openly 920 specified and non-discriminatory terms, for no fee. 922 13. Acknowledgments 924 Significant contributions to this work have been made by Anthony 925 Alles, Fred Baker, Paul Doolan, Guy Fedorkow, Jeremy Lawrence, Arthur 926 Lin, Morgan Littlewood, Keith McCloghrie, and Dan Tappan. 928 14. References 930 15. Authors' Addresses 932 Yakov Rekhter 933 Cisco Systems, Inc. 934 170 Tasman Drive 935 San Jose, CA, 95134 936 E-mail: yakov@cisco.com 938 Bruce Davie 939 Cisco Systems, Inc. 940 250 Apollo Drive 941 Chelmsford, MA, 01824 942 E-mail: bsd@cisco.com 944 Dave Katz 945 Cisco Systems, Inc. 946 170 Tasman Drive 947 San Jose, CA, 95134 948 E-mail: dkatz@cisco.com 950 Eric Rosen 951 Cisco Systems, Inc. 952 250 Apollo Drive 953 Chelmsford, MA, 01824 954 E-mail: erosen@cisco.com 956 George Swallow 957 Cisco Systems, Inc. 958 250 Apollo Drive 959 Chelmsford, MA, 01824 960 E-mail: swallow@cisco.com 962 Dino Farinacci 963 Cisco Systems, Inc. 964 170 West Tasman Drive 965 San Jose, CA 95134 966 E-mail: dino@cisco.com