Internet Draft                                            Yakov Rekhter
Expiration date: July 1997                                  Bruce Davie
                                                              Dave Katz
                                                             Eric Rosen
                                                         George Swallow
                                                         Dino Farinacci
                                                          cisco Systems
                                                            January 1997

                  Tag Switching Architecture - Overview

                   draft-rekhter-tagswitch-arch-00.txt

1. Status of this Memo

This document is an Internet Draft. Internet Drafts are working documents of the Internet Engineering Task Force (IETF), its Areas, and its Working Groups. Note that other groups may also distribute working documents as Internet Drafts.

Internet Drafts are draft documents valid for a maximum of six months. Internet Drafts may be updated, replaced, or obsoleted by other documents at any time. It is not appropriate to use Internet Drafts as reference material or to cite them other than as a "working draft" or "work in progress."

Please check the 1id-abstracts.txt listing contained in the internet-drafts Shadow Directories on nic.ddn.mil, nnsc.nsf.net, nic.nordu.net, ftp.nisc.sri.com, or munnari.oz.au to learn the current status of any Internet Draft.

2. Abstract

This document provides an overview of tag switching. Tag switching is a way to combine the label-swapping forwarding paradigm with network layer routing. This has several advantages. Tags can have a wide spectrum of forwarding granularities, so at one end of the spectrum a tag could be associated with a group of destinations, while at the other a tag could be associated with a single application flow. At the same time forwarding based on tag switching, due to its simplicity, is well suited to high performance forwarding. These factors facilitate the development of a routing system which is both functionally rich and scalable. Finally, tag switching simplifies integration of routers and ATM switches by employing common addressing, routing, and management procedures.

3. Introduction

Continuous growth of the Internet demands higher bandwidth within the Internet Service Providers (ISPs). However, growth of the Internet is not the only driving factor for higher bandwidth - demand for higher bandwidth also comes from emerging multimedia applications. Demand for higher bandwidth, in turn, requires higher forwarding performance for both multicast and unicast traffic.

The growth of the Internet also demands improved scaling properties of the Internet routing system. The ability to contain the volume of routing information maintained by individual routers and the ability to build a hierarchy of routing knowledge are essential to support a high quality, scalable routing system.

While the destination-based forwarding paradigm is adequate in many situations, we already see examples where it is no longer adequate. The ability to overcome the rigidity of destination-based forwarding and to have more flexible control over how traffic is routed is likely to become more and more important.
We see the need to improve forwarding performance while at the same time adding routing functionality to support multicast, allowing more flexible control over how traffic is routed, and providing the ability to build a hierarchy of routing knowledge. Moreover, it becomes more and more crucial to have a routing system that can support graceful evolution to accommodate new and emerging requirements.

Tag switching is a technology that provides an efficient solution to these challenges. Tag switching blends the flexibility and rich functionality provided by Network Layer routing with the simplicity provided by the label swapping forwarding paradigm. The simplicity of the tag switching forwarding paradigm (label swapping) enables improved forwarding performance, while maintaining competitive price/performance. By associating a wide range of forwarding granularities with a tag, the same forwarding paradigm can be used to support a wide variety of routing functions, such as destination-based routing, multicast, hierarchy of routing knowledge, and flexible routing control. Finally, a combination of simple forwarding, a wide range of forwarding granularities, and the ability to evolve routing functionality while preserving the same forwarding paradigm enables a routing system that can gracefully evolve to accommodate new and emerging requirements.

4. Tag Switching components

Tag switching consists of two components: forwarding and control. The forwarding component uses the tag information (tags) carried by packets and the tag forwarding information maintained by a tag switch to perform packet forwarding. The control component is responsible for maintaining correct tag forwarding information among a group of interconnected tag switches.

Segregating control and forwarding into separate components promotes modularity, which in turn makes it possible to build a system that can gracefully evolve to accommodate new and emerging requirements.

5. Forwarding component

The fundamental forwarding paradigm employed by tag switching is based on the notion of label swapping. When a packet with a tag is received by a tag switch, the switch uses the tag as an index in its Tag Information Base (TIB). Each entry in the TIB consists of an incoming tag, and one or more sub-entries of the form (outgoing tag, outgoing interface, outgoing link level information). If the switch finds an entry with the incoming tag equal to the tag carried in the packet, then for each (outgoing tag, outgoing interface, outgoing link level information) sub-entry in the entry the switch replaces the tag in the packet with the outgoing tag, replaces the link level information (e.g., MAC address) in the packet with the outgoing link level information, and forwards the packet over the outgoing interface.

From the above description of the forwarding component we can make several observations. First, the forwarding decision is based on an exact match algorithm using a fixed length, fairly short tag as an index. This enables a simplified forwarding procedure, relative to the longest match forwarding traditionally used at the network layer. This in turn enables higher forwarding performance (higher packets per second). The forwarding procedure is simple enough to allow a straightforward hardware implementation.

A second observation is that the forwarding decision is independent of the tag's forwarding granularity.
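To make the forwarding procedure just described concrete, the following is a minimal sketch (in Python, used here purely for illustration). The names TagSwitch, TIBEntry, and SubEntry are invented, it assumes a single TIB per switch, and send() merely stands in for transmission of the re-tagged frame; nothing in it is prescribed by the architecture.

   from dataclasses import dataclass
   from typing import Dict, List, Optional

   @dataclass
   class SubEntry:
       outgoing_tag: int
       outgoing_interface: str
       outgoing_link_info: bytes      # e.g., the outgoing MAC address

   @dataclass
   class TIBEntry:
       incoming_tag: int
       sub_entries: List[SubEntry]    # one for unicast, one or more for multicast

   class TagSwitch:
       def __init__(self) -> None:
           self.tib: Dict[int, TIBEntry] = {}   # indexed by incoming tag

       def send(self, interface: str, link_info: bytes, tag: int,
                payload: bytes) -> None:
           # Placeholder for transmitting the re-tagged frame on the interface.
           print(f"out {interface}: tag={tag} dst={link_info.hex()}")

       def forward_tagged(self, tag: int, payload: bytes) -> bool:
           # Label swapping: exact-match lookup on the fixed-length tag.
           entry: Optional[TIBEntry] = self.tib.get(tag)
           if entry is None:
               return False            # no entry: discard, or hand to the network layer
           for sub in entry.sub_entries:   # the same loop serves unicast and multicast
               self.send(sub.outgoing_interface, sub.outgoing_link_info,
                         sub.outgoing_tag, payload)
           return True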
For example, the same forwarding algorithm applies to both unicast and multicast - a unicast entry would just have a single (outgoing tag, outgoing interface, outgoing link level information) sub-entry, while a multicast entry may have one or more (outgoing tag, outgoing interface, outgoing link level information) sub-entries. (For multi-access links, the outgoing link level information in this case would include a multicast MAC address.) This illustrates how with tag switching the same forwarding paradigm can be used to support different routing functions (e.g., unicast, multicast, etc.).

The simple forwarding procedure is thus essentially decoupled from the control component of tag switching. New routing (control) functions can readily be deployed without disturbing the forwarding paradigm. This means that it is not necessary to re-optimize forwarding performance (by modifying either hardware or software) as new routing functionality is added.

In the tag switching architecture, various implementation options are acceptable. For example, support for network layer forwarding by a tag switch (i.e., forwarding based on the network layer header as opposed to a tag) is optional. Moreover, use of network layer forwarding may be constrained to handling network layer control traffic only. (Note, however, that a tag switch must be able to source and sink network layer packets, e.g., to participate in network layer routing protocols.)

For the purpose of handling the network layer hop count (time-to-live) the architecture allows two alternatives: network layer hops may correspond directly to hops formed by tag switches, or one network layer hop may correspond to several tag switched hops.

When a switch receives a packet with a tag, and either the TIB maintained by the switch has no entry with the incoming tag equal to the tag carried by the packet, or such an entry exists but contains no usable outgoing tag and does not indicate local delivery to the switch, the switch may either (a) discard the packet, or (b) strip the tag information and submit the packet for network layer processing. Support for the latter is optional (as support for network layer forwarding is optional). Note that it may not always be possible to successfully forward a packet after stripping a tag even if a tag switch supports network layer forwarding.

The architecture allows a tag switch to maintain either a single TIB per tag switch, or a TIB per interface. Moreover, a tag switch could mix both of these options - some tags could be maintained in a single TIB, while other tags could be maintained in a TIB associated with individual interfaces.

5.1. Tag encapsulation

Tag switching clearly requires a tag to be carried in each packet. The tag information can be carried in a variety of ways:

   - as a small "shim" tag header inserted between the layer 2 and
     the Network Layer headers;

   - as part of the layer 2 header, if the layer 2 header provides
     adequate semantics (e.g., Frame Relay, or ATM);

   - as part of the Network Layer header (e.g., using the Flow Label
     field in IPv6 with appropriately modified semantics).

It is therefore possible to implement tag switching over virtually any media type including point-to-point links, multi-access links, and ATM.
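As one illustration of the first of the options listed above, the sketch below packs a tag into a hypothetical 32-bit shim placed between the layer 2 and Network Layer headers. The field layout (a 20-bit tag, a 3-bit class-of-service field, a bottom-of-stack bit, and an 8-bit time-to-live) is an assumption made for illustration only; this document does not define an encoding.

   import struct

   def encode_shim(tag: int, cos: int, bottom_of_stack: bool, ttl: int) -> bytes:
       # Pack one hypothetical 32-bit shim entry: tag(20) | CoS(3) | S(1) | TTL(8).
       assert 0 <= tag < 2**20 and 0 <= cos < 8 and 0 <= ttl < 256
       word = (tag << 12) | (cos << 9) | (int(bottom_of_stack) << 8) | ttl
       return struct.pack("!I", word)

   def decode_shim(data: bytes) -> dict:
       # Unpack the same hypothetical layout from the first four bytes.
       (word,) = struct.unpack("!I", data[:4])
       return {
           "tag": word >> 12,
           "cos": (word >> 9) & 0x7,
           "bottom_of_stack": bool((word >> 8) & 0x1),
           "ttl": word & 0xFF,
       }

   # Example: a shim carrying tag 42, inserted ahead of the network layer packet.
   shim = encode_shim(tag=42, cos=0, bottom_of_stack=True, ttl=64)
   assert decode_shim(shim)["tag"] == 42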
At the same time the forwarding component allows specific 196 optimizations for particular media (e.g., ATM). 198 Observe also that the tag forwarding component is Network Layer 199 independent. Use of control component(s) specific to a particular 200 Network Layer protocol enables the use of tag switching with 201 different Network Layer protocols. 203 6. Control component 205 Essential to tag switching is the notion of binding between a tag and 206 Network Layer routing (routes). The control component is responsible 207 for creating tag bindings, and then distributing the tag binding 208 information among tag switches. Creating a tag binding involves 209 allocating a tag, and then binding a tag to a route. The distribution 210 of tag binding information among tag switches could be accomplished 211 via several options: 213 - piggybacking on existing routing protocols 215 - using a separate Tag Distribution Protocol (TDP) 217 While the architecture supports distribution of tag binding 218 information that is independent of the underlying routing protocols, 219 the architecture acknowledges that considerable optimizations can be 220 achieved in some cases by small enhancements of existing protocols to 221 enable piggybacking tag binding information on these protocols. 223 One important characteristic of the tag switching architecture is 224 that creation of tag bindings is driven primarily by control traffic 225 rather than by data traffic. Control traffic driven creation of tag 226 bindings has several advantages, as compared to data traffic driven 227 creation of tag bindings. For one thing, it minimizes the amount of 228 additional control traffic needed to distribute tag binding 229 information, as tag binding information is distributed only in 230 response to control traffic, independent of data traffic. It also 231 makes the overall scheme independent of and insensitive to the data 232 traffic profile/pattern. Control traffic driven creation of tag 233 binding improves forwarding performance, as tags are precomputed 234 (prebound) before data traffic arrives, rather than being created as 235 data traffic arrives. It also simplifies the overall system behavior, 236 as the control plane is controlled solely by control traffic, rather 237 than by a mix of control and data traffic. 239 Another important characteristic of the tag switching architecture is 240 that distribution and maintenance of tag binding information is 241 consistent with distribution and maintenance of the associated 242 routing information. For example, distribution of tag binding 243 information for tags associated with unicast routing is based on the 244 technique of incremental updates with explicit acknowledgment. This 245 is very similar to the way unicast routing information gets 246 distributed by such protocols as OSPF and BGP. In contrast, 247 distribution of tag binding information for tags associated with 248 multicast routing is based on period updates/ refreshes, without any 249 explicit acknowledgments. This is consistent with the way multicast 250 routing information is distributed by such protocols as PIM. 252 To provide good scaling characteristics, while also accommodating 253 diverse routing functionality, tag switching supports a wide range of 254 forwarding granularities. At one extreme a tag could be associated 255 (bound) to a group of routes (more specifically to the Network Layer 256 Reachability Information of the routes in the group). 
At the other 257 extreme a tag could be bound to an individual application flow (e.g., 258 an RSVP flow). A tag could also be bound to a multicast tree. In 259 addition, a tag may be bound to a path that has been selected for a 260 certain set of packets based on some policy (e.g. an explicit route). 262 The control component is organized as a collection of modules, each 263 designed to support a particular routing function. To support new 264 routing functions, new modules can be added. The architecture does 265 not mandate a prescribed set of modules that have to be supported by 266 every tag switch. 268 The following describes some of the modules. 270 6.1. Destination-based routing 272 In this section we describe how tag switching can support 273 destination-based routing. Recall that with destination-based routing 274 a router makes a forwarding decision based on the destination address 275 carried in a packet and the information stored in the Forwarding 276 Information Base (FIB) maintained by the router. A router constructs 277 its FIB by using the information it receives from routing protocols 278 (e.g., OSPF, BGP). 280 To support destination-based routing with tag switching, a tag 281 switch, just like a router, participates in routing protocols (e.g., 282 OSPF, BGP), and constructs its FIB using the information it receives 283 from these protocols. 285 There are three permitted methods for tag allocation and Tag 286 Information Base (TIB) management: (a) downstream tag allocation, (b) 287 downstream tag allocation on demand, and (c) upstream tag allocation. 288 In all cases, a switch allocates tags and binds them to address 289 prefixes in its FIB. In downstream allocation, the tag that is 290 carried in a packet is generated and bound to a prefix by the switch 291 at the downstream end of the link (with respect to the direction of 292 data flow). On demand allocation means that tags will only be 293 allocated and distributed by the downstream switch when it is 294 requested to do so by the upstream switch. Method (b) is most useful 295 in ATM networks (see Section 8). In upstream allocation, tags are 296 allocated and bound at the upstream end of the link. Note that in 297 downstream allocation, a switch is responsible for creating tag 298 bindings that apply to incoming data packets, and receives tag 299 bindings for outgoing packets from its neighbors. In upstream 300 allocation, a switch is responsible for creating tag bindings for 301 outgoing tags, i.e. tags that are applied to data packets leaving the 302 switch, and receives bindings for incoming tags from its neighbors. 304 The downstream tag allocation scheme operates as follows: for each 305 route in its FIB the switch allocates a tag, creates an entry in its 306 Tag Information Base (TIB) with the incoming tag set to the allocated 307 tag, and then advertises the binding between the (incoming) tag and 308 the route to other adjacent tag switches. The advertisement could be 309 accomplished by either piggybacking the binding on top of the 310 existing routing protocols, or by using a separate Tag Distribution 311 Protocol (TDP). When a tag switch receives tag binding information 312 for a route, and that information was originated by the next hop for 313 that route, the switch places the tag (carried as part of the binding 314 information) into the outgoing tag of the TIB entry associated with 315 the route. This creates the binding between the outgoing tag and the 316 route. 
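A rough sketch of the downstream tag allocation scheme just described is given below. The advertise callback stands in for either TDP or piggybacking on an existing routing protocol, tag values start at an arbitrary number, and all names are invented for illustration.

   import itertools

   class DownstreamAllocator:
       # Per-switch state for downstream tag allocation (illustrative only).

       def __init__(self, advertise):
           self._next_tag = itertools.count(16)  # starting value chosen arbitrarily
           self.fib = {}        # prefix -> next hop
           self.tib = {}        # prefix -> {"incoming": tag, "outgoing": tag or None}
           self.advertise = advertise            # e.g., TDP, or a routing protocol

       def add_route(self, prefix: str, next_hop: str) -> None:
           # For each route in the FIB: allocate an incoming tag, create the TIB
           # entry, and advertise the binding to adjacent tag switches.
           tag = next(self._next_tag)
           self.fib[prefix] = next_hop
           self.tib[prefix] = {"incoming": tag, "outgoing": None}
           self.advertise(prefix, tag)

       def binding_received(self, prefix: str, tag: int, from_neighbor: str) -> None:
           # Install the neighbor's tag as our outgoing tag, but only if the
           # binding was originated by the next hop for that route.
           if self.fib.get(prefix) == from_neighbor and prefix in self.tib:
               self.tib[prefix]["outgoing"] = tag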
318 With the downstream on demand tag allocation scheme, operation is as 319 follows. For each route in its FIB, the switch identifies the next 320 hop for that route. It then issues a request (via TDP) to the next 321 hop for a tag binding for that route. When the next hop receives the 322 request, it allocates a tag, creates an entry in its TIB with the 323 incoming tag set to the allocated tag, and then returns the binding 324 between the (incoming) tag and the route to the switch that sent the 325 original request. When the switch receives the binding information, 326 the switch creates an entry in its TIB, and sets the outgoing tag in 327 the entry to the value received from the next hop. Handling of data 328 packets is as for downstream allocation. The main application for 329 this mode of operation is with ATM switches, as described in Section 330 8. 332 The upstream tag allocation scheme is used as follows. If a tag 333 switch has one or more point-to-point interfaces, then for each route 334 in its FIB whose next hop is reachable via one of these interfaces, 335 the switch allocates a tag, creates an entry in its TIB with the 336 outgoing tag set to the allocated tag, and then advertises to the 337 next hop (via TDP) the binding between the (outgoing) tag and the 338 route. When a tag switch that is the next hop receives the tag 339 binding information, the switch places the tag (carried as part of 340 the binding information) into the incoming tag of the TIB entry 341 associated with the route. 343 Note that, while we have described upstream allocation for the sake 344 of completeness, we have found the two downstream allocation methods 345 adequate for all practical purposes so far. 347 Independent of which tag allocation method is used, once a TIB entry 348 is populated with both incoming and outgoing tags, the tag switch can 349 forward packets for routes bound to the tags by using the tag 350 switching forwarding algorithm (as described in Section 5). 352 When a tag switch creates a binding between an outgoing tag and a 353 route, the switch, in addition to populating its TIB, also updates 354 its FIB with the binding information. This enables the switch to add 355 tags to previously untagged packets. 357 So far we have described how a tag could be bound to a single route, 358 creating a one-to-one mapping between routes and tags. However, under 359 certain conditions it is possible to bind a tag not just to a single 360 route, but to a group of routes, creating a many-to-one mapping 361 between routes and tags. Consider a tag switch that is connected to a 362 router. It is quite possible that the switch uses the router as the 363 next hop not just for one route, but for a group of routes. Under 364 these conditions the switch does not have to allocate distinct tags 365 to each of these routes - one tag would suffice. The distribution of 366 tag binding information is unaffected by whether there is a one-to- 367 one or one-to-many mapping between tags and routes. Now consider a 368 tag switch that receives from one of its neighbors (tag switching 369 peers) tag binding information for a set of routes, such that the set 370 is bound to a single tag. If the switch decides to use some or all of 371 the routes in the set, then for these routes the switch does not need 372 to allocate individual tags - one tag would suffice. Such an approach 373 may be valuable when tags are a precious resource. 
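The many-to-one case lends itself to a simple illustration: routes that share a next hop can share a single tag. The sketch below (all names invented for illustration) groups FIB prefixes by next hop and allocates one tag per group rather than one per prefix.

   import itertools

   def allocate_shared_tags(fib: dict) -> dict:
       # fib: {prefix: next_hop}; returns {prefix: tag}, one tag per next hop.
       next_tag = itertools.count(16)
       tag_for_next_hop = {}
       bindings = {}
       for prefix, next_hop in fib.items():
           if next_hop not in tag_for_next_hop:
               tag_for_next_hop[next_hop] = next(next_tag)
           bindings[prefix] = tag_for_next_hop[next_hop]
       return bindings

   fib = {"10.1.0.0/16": "routerA", "10.2.0.0/16": "routerA",
          "192.0.2.0/24": "routerB"}
   bindings = allocate_shared_tags(fib)
   assert bindings["10.1.0.0/16"] == bindings["10.2.0.0/16"]  # one tag, two routes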
Note that the 374 ability to support many-to-one mapping makes no assumptions about the 375 routing protocols being used. 377 When a tag switch adds a tag to a previously untagged packet the tag 378 could be either associated with the route to the destination address 379 carried in the packet, or with the route to some other tag switch 380 along the path to the destination (in some cases the address of that 381 other tag switch could be gleaned from network layer routing 382 protocols). The latter option provides yet another way of mapping 383 multiple routes into a single tag. However, this option is either 384 dependent on particular routing protocols, or would require a 385 separate mechanism for discovering tag switches along a path. 387 To understand the scaling properties of tag switching in conjunction 388 with destination-based routing, observe that the total number of tags 389 that a tag switch has to maintain can not be greater than the number 390 of routes in the switch's FIB. Moreover, as we have just seen, the 391 number of tags can be much less than the number of routes. Thus, much 392 less state is required than would be the case if tags were allocated 393 to individual flows. 395 In general, a tag switch will try to populate its TIB with incoming 396 and outgoing tags for all routes to which it has reachability, so 397 that all packets can be forwarded by simple label swapping. Tag 398 allocation is thus driven by topology (routing), not data traffic - 399 it is the existence of a FIB entry that causes tag allocations, not 400 the arrival of data packets. 402 Use of tags associated with routes, rather than flows, also means 403 that there is no need to perform flow classification procedures for 404 all the flows to determine whether to assign a tag to a flow. That, 405 in turn, simplifies the overall scheme, and makes it more robust and 406 stable in the presence of changing traffic patterns. 408 Note that when tag switching is used to support destination-based 409 routing, tag switching does not completely eliminate the need to 410 perform normal Network Layer forwarding at some network elements. 411 First of all, to add a tag to a previously untagged packet requires 412 normal Network Layer forwarding. This function could be performed by 413 the first hop router, or by the first router on the path that is able 414 to participate in tag switching. In addition, whenever a tag switch 415 aggregates a set of routes (e.g., by using the technique of 416 hierarchical routing), into a single tag, and the routes do not share 417 a common next hop, the switch needs to perform Network Layer 418 forwarding for packets carrying that tag. However, one could observe 419 that the number of places where routes get aggregated is smaller than 420 the total number of places where forwarding decisions have to be 421 made. Moreover, quite often aggregation is applied to only a subset 422 of the routes maintained by a tag switch. As a result, on average a 423 packet can be forwarded most of the time using the tag switching 424 algorithm. Note that many tag switches may not need to perform any 425 network layer forwarding. 427 6.2. Hierarchy of routing knowledge 429 The IP routing architecture models a network as a collection of 430 routing domains. Within a domain, routing is provided via interior 431 routing (e.g., OSPF), while routing across domains is provided via 432 exterior routing (e.g., BGP). 
However, all routers within domains 433 that carry transit traffic (e.g., domains formed by Internet Service 434 Providers) have to maintain information provided by not just interior 435 routing, but exterior routing as well, even if only some of these 436 routers participate in exterior routing. That creates certain 437 problems. First of all, the amount of this information is not 438 insignificant. Thus it places additional demand on the resources 439 required by the routers. Moreover, increase in the volume of routing 440 information quite often increases routing convergence time. This, in 441 turn, degrades the overall performance of the system. 443 Tag switching allows complete decoupling of interior and exterior 444 routing. With tag switching only tag switches at the border of a 445 domain would be required to maintain routing information provided by 446 exterior routing - all other switches within the domain would just 447 maintain routing information provided by the domains interior routing 448 (which is usually significantly smaller than the exterior routing 449 information), with no "leaking" of exterior routing information into 450 interior routing. This, in turn, reduces the routing load on non- 451 border switches, and shortens routing convergence time. 453 To support this functionality, tag switching allows a packet to carry 454 not one but a set of tags, organized as a stack. A tag switch could 455 either swap the tag at the top of the stack, or pop the stack, or 456 swap the tag and push one or more tags into the stack. 458 Consider a tag switch that is at the border of a routing domain. This 459 switch maintains both exterior and interior routes. The interior 460 routes provide routing information and tags to all the other tag 461 switches within the domain. For each exterior route that the switch 462 receives from some other border tag switch that is in the same domain 463 as the local switch, the switch maintains not just a tag associated 464 with the route, but also a tag associated with the route to that 465 other border tag switch. Moreover, for inter-domain routing protocols 466 that are capable of passing the "third-party" next hop information 467 the switch would maintain a tag associated with the route to the next 468 hop, rather than with the route to the border tag switch from whom 469 the local switch received the exterior route. 471 When a packet is forwarded between two (border) tag switches in 472 different domains, the tag stack in the packet contains just one tag 473 (associated with an exterior route). However, when a packet is 474 forwarded within a domain, the tag stack in the packet contains not 475 one, but two tags (the second tag is pushed by the domain's ingress 476 border tag switch). The tag at the top of the stack provides packet 477 forwarding to an appropriate egress border tag switch (or the 478 "third-party" next hop), while the next tag in the stack provides 479 correct packet forwarding at the egress switch (or at the "third- 480 party" next hop). The stack is popped by either the egress switch (or 481 the "third-party" next hop) or by the penultimate (with respect to 482 the egress switch/"third-party" next hop) switch. 484 One could observe that when tag switching is confined to a single 485 routing domain, the above still could be used to decouple interior 486 from exterior routing, similar to what was described above. 
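The tag stack manipulations used in this section (swapping the tag at the top of the stack, pushing, and popping) can be sketched as follows; the packet's tags are modeled as a Python list with the top of the stack first, and the tag values are invented purely for illustration.

   from typing import List

   def swap_top(stack: List[int], new_tag: int) -> None:
       # Replace the tag at the top of the stack (ordinary label swapping).
       stack[0] = new_tag

   def push(stack: List[int], tag: int) -> None:
       # Push an additional tag, e.g. at a domain's ingress border tag switch.
       stack.insert(0, tag)

   def pop(stack: List[int]) -> int:
       # Pop the top tag, e.g. at the egress border switch or the penultimate hop.
       return stack.pop(0)

   # A packet entering a transit domain with one (exterior) tag:
   tags = [901]             # tag bound to an exterior route
   push(tags, 17)           # ingress border switch adds an interior tag
   swap_top(tags, 23)       # interior switches swap only the top tag
   pop(tags)                # popped at (or just before) the egress border switch
   assert tags == [901]     # the exterior tag is exposed again for forwarding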
However, when tag switching is confined to a single routing domain in this way, a border tag switch wouldn't maintain tags associated with each exterior route, and forwarding between domains would be performed at the network layer.

The control component used in this scenario is fairly similar to the one used with destination-based routing. In fact, the only essential difference is that in this scenario the tag binding information is distributed both among physically adjacent tag switches, and among border tag switches within a single domain. One could also observe that the latter (distribution among border switches) could be trivially accommodated by very minor extensions to BGP.

The notion of supporting hierarchy of routing knowledge with tag switching is not limited to the case of exterior/interior routing, but could be applicable to other cases where a hierarchy of routing knowledge is possible. Moreover, while the above describes only a two-level hierarchy of routing knowledge, the tag switching architecture does not impose limits on the depth of the hierarchy.

6.3. Multicast

Essential to multicast routing is the notion of spanning trees. Multicast routing procedures (e.g., PIM) are responsible for constructing such trees (with receivers as leaves), while multicast forwarding is responsible for forwarding multicast packets along such trees. Thus, to support a multicast forwarding function with tag switching we need to be able to associate a tag with a multicast tree. The following describes the procedures for allocation and distribution of tags for multicast.

When tag switching is used for multicast, it is important that tag switching be able to utilize multicast capabilities provided by the Data Link layer (e.g., multicast capabilities provided by Ethernet). To be able to do this, an (upstream) tag switch connected to a given Data Link subnetwork should use the same tag when forwarding a multicast packet to all of the (downstream) switches on that subnetwork. This way the packet will be multicast at the Data Link layer over the subnetwork. To support this, all tag switches that are part of a given multicast tree and are on a common subnetwork must agree on a common tag that would be used for forwarding multicast packets along the tree over the subnetwork. Moreover, since multicast forwarding is based on Reverse Path Forwarding (RPF), it is crucial that, when a tag switch receives a multicast packet, the tag carried in the packet enable the switch to identify both (a) the particular multicast group and (b) the previous hop (upstream) tag switch that sent the packet.

To support the requirements outlined in the previous paragraph, the tag switching architecture assumes that (a) multicast tags are associated with interfaces on a tag switch (rather than with a tag switch as a whole), (b) the tag space that a tag switch could use for allocating tags for multicast is partitioned into non-overlapping regions among all the tag switches connected to a common Data Link subnetwork, and (c) there are procedures by which tag switches that belong to a common multicast tree and are on a common Data Link subnetwork agree on the tag switch that is responsible for allocating a tag for the tree.
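To illustrate assumptions (a) and (b) above, the following sketch keys multicast forwarding state by (incoming interface, tag); because the tag space on a subnetwork is partitioned among the attached switches, the tag also identifies the upstream neighbor that allocated it, which is what the RPF check needs. All names are invented for illustration and nothing here is prescribed by the architecture.

   from dataclasses import dataclass, field
   from typing import Dict, List, Tuple

   @dataclass
   class McastEntry:
       group: str                  # the multicast tree (group) the tag is bound to
       upstream: str               # switch that allocated the tag on this subnetwork
       out_legs: List[Tuple[str, int]] = field(default_factory=list)
                                   # (outgoing interface, outgoing tag) per leg

   class McastTIB:
       # Multicast tags are per interface, so state is keyed by (interface, tag).
       def __init__(self) -> None:
           self.entries: Dict[Tuple[str, int], McastEntry] = {}

       def install(self, interface: str, tag: int, entry: McastEntry) -> None:
           self.entries[(interface, tag)] = entry

       def forward(self, interface: str, tag: int, rpf_neighbor: str):
           entry = self.entries.get((interface, tag))
           if entry is None or entry.upstream != rpf_neighbor:
               return []            # fails the RPF check: drop rather than loop
           return entry.out_legs    # replicate onto each downstream leg of the tree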
545 One possible way of partitioning tag space into non-overlapping 546 regions among tag switches connected to a common subnetwork is for 547 each tag switch to claim a region of the space and announce this 548 region to its neighbors. Conflicts are resolved based on the IP 549 address of the contending switches (the higher address wins, the 550 lower retries). Once the tag space is partitioned among tag switches, 551 the switches may create bindings between tags and multicast trees 552 (routes). 554 At least in principle there are two possible ways to create bindings 555 between tags and multicast trees (routes). With the first alternative 556 for a set of tag switches that share a common Data Link subnetwork, 557 the tag switch that is upstream with respect to a particular 558 multicast tree allocates a tag (out of its own region that does not 559 overlap with the regions of other switches on the subnetwork), binds 560 the tag to a multicast route, and then advertises the binding to all 561 the (downstream) switches on the subnetwork. With the second 562 alternative, one of the tag switches that is downstream with respect 563 to a particular multicast tree allocates a tag (out of its own region 564 that does not overlap with the regions of other switches on the 565 subnetwork), binds the tag to a multicast route, and then advertises 566 the binding to all the switches (both downstream and upstream) on the 567 subnetwork. Usually the first tag switch to join the group is the one 568 that performs the allocation. 570 Each of the above alternatives has its own trade-offs. The first 571 alternative is fairly simple - one upstream router does the tag 572 binding and multicasts the binding downstream. However, the first 573 alternative may create uneven distribution of allocated tags, as some 574 tag switches on a common subnetwork may have more upstream multicast 575 sources than the others. Also, changes in topology could result in 576 upstream neighbor changes, which in turn would require tag re- 577 binding. Finally, one could observe that distributing tag binding 578 from upstream towards downstream is inconsistent with the direction 579 of multicast routing information distribution (from downstream 580 towards upstream). 582 The second alternative, even if more complex that the first one, has 583 its own advantages. For one thing, it makes distribution of multicast 584 tag binding consistent with the distribution of unicast tag binding. 585 It also makes distribution of multicast tag binding consistent with 586 the distribution of multicast routing information. This, in turn, 587 allows the piggybacking of tag binding information on existing 588 multicast routing protocols (PIM). This alternative also avoids the 589 need for tag re-binding when there are changes in upstream neighbor. 590 Finally it is more likely to provide more even distribution of 591 allocated tags, as compared to the first alternative. Note that this 592 approach does require a mechanism to choose the tag allocator from 593 among the downstream tag switches on the subnetwork. 595 6.4. Quality of service 597 Two mechanisms are needed for providing a range of qualities of 598 service to packets passing through a router or a tag switch. First, 599 we need to classify packets into different classes. Second, we need 600 to ensure that the handling of packets is such that the appropriate 601 QOS characteristics (bandwidth, loss, etc.) are provided to each 602 class. 
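As a toy illustration of how tags can tie these two mechanisms together, the sketch below classifies a packet once at the edge, applies a per-class tag, and lets every subsequent switch select a queue from the tag alone; the classes, tag values, and queue names are invented for illustration.

   PREMIUM, STANDARD = "premium", "standard"

   # Edge classification rules (configuration- or header-based), applied once.
   def classify(src: str, dst: str, in_interface: str) -> str:
       if in_interface == "serial0":     # e.g., all traffic from a given interface
           return PREMIUM
       if (src, dst) == ("192.0.2.1", "198.51.100.7"):   # e.g., a host pair
           return PREMIUM
       return STANDARD

   # One tag per class; interior switches never re-classify.
   TAG_BY_CLASS = {PREMIUM: 101, STANDARD: 102}
   QUEUE_BY_TAG = {101: "wfq-high", 102: "wfq-low"}

   def impose_tag(src: str, dst: str, in_interface: str) -> int:
       return TAG_BY_CLASS[classify(src, dst, in_interface)]

   tag = impose_tag("192.0.2.1", "198.51.100.7", "ethernet1")
   assert QUEUE_BY_TAG[tag] == "wfq-high"   # scheduling decided from the tag alone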
Tag switching provides an easy way to mark packets as belonging to a particular class after they have been classified the first time. Initial classification could be done using configuration information (e.g., all traffic from a certain interface) or using information carried in the network layer or higher layer headers (e.g., all packets between a certain pair of hosts). A tag corresponding to the resultant class would then be applied to the packet. Tagged packets can then be efficiently handled by the tag switching routers in their path without needing to be reclassified. The actual scheduling and queueing of packets is largely orthogonal - the key point here is that tag switching enables simple logic to be used to find the state that identifies how the packet should be scheduled.

Tag switching can, for example, be used to support a small number of classes of service in a service provider network (e.g., premium and standard). On frame-based media, the class can be encoded by a field in the tag header. On ATM tag switches, additional tags can be allocated to differentiate the different classes. For example, rather than having one tag for each destination prefix in the FIB, an ATM tag switch could have two tags per prefix, one to be used by premium traffic and one by standard. Thus a tag binding in this case is a triple consisting of (prefix, class of service, tag). Such a tag would be used both to make a forwarding decision and to make a scheduling decision, e.g., by selecting the appropriate queue in a weighted fair queueing (WFQ) scheduler.

To provide a finer granularity of QOS, tag switching can be used with RSVP. We propose a simple extension to RSVP in which a tag object is defined. Such an object can be carried in an RSVP reservation message and thus associated with a session. Each tag capable router assigns a tag to the session and passes it upstream with the reservation message. Thus the association of tags with RSVP sessions works very much like the binding of tags to routes with downstream allocation. Note, however, that binding is accomplished using RSVP rather than TDP. (It would be possible to use TDP, but it is simpler to extend RSVP to carry tags and this ensures that tags and reservation information are communicated in a similar manner.)

When data packets are transmitted, the first router in the path that is tag-capable applies the tag that it received from its downstream neighbor. This tag can be used at the next hop to find the corresponding reservation state, to forward and schedule the packet appropriately, and to find the suitable outgoing tag value provided by the next hop. Note that tag imposition could also be performed at the sending host.

6.5. Flexible routing (explicit routes)

One of the fundamental properties of destination-based routing is that the only information from a packet that is used to forward the packet is the destination address. While this property enables highly scalable routing, it also limits the ability to influence the actual paths taken by packets. This, in turn, limits the ability to evenly distribute traffic among multiple links, taking the load off highly utilized links, and shifting it towards less utilized links.
For 659 Internet Service Providers (ISPs) who support different classes of 660 service, destination-based routing also limits their ability to 661 segregate different classes with respect to the links used by these 662 classes. Some of the ISPs today use Frame Relay or ATM to overcome 663 the limitations imposed by destination-based routing. Tag switching, 664 because of the flexible granularity of tags, is able to overcome 665 these limitations without using either Frame Relay or ATM. 667 Another application where destination-based routing is no longer 668 adequate is routing with resource reservations (QOS routing). 669 Increasing the number of ways by which a particular reservation could 670 traverse a network may improve the success of the reservation. 671 Increasing the number of ways, in turn, requires the ability to 672 explore paths that are not constrained to the ones constructed solely 673 based on destination. 675 To provide forwarding along paths that are different from the paths 676 determined by destination-based routing, the control component of tag 677 switching allows installation of tag bindings in tag switches that do 678 not correspond to the destination-based routing paths. 680 One possible alternative for supporting explicit routes is to allow 681 TDP to carry information about an explicit route, where such a route 682 could be expressed as a sequence of tag switches. Another alternative 683 is to use tag-capable RSVP (see Section 6.4) as a mechanism to 684 distribute tag bindings, and to augment RSVP with the ability to 685 steer the PATH message along a particular (explicit) route. Finally, 686 it is also possible in principle to use some form of source route 687 (e.g., SDRP, GRE) to steer RSVP PATH messages carrying tag bindings 688 along a particular path. Note, however, that this would require a 689 change to the way in which RSVP handles PATH messages, as it would be 690 necessary to store the source route as part of the PATH state. 692 7. Tag Forwarding Granularities and Forwarding Equivalence Classes 694 A conventional router has some sort of structure or set of structures 695 which may be called a "forwarding table", which has a finite number 696 of entries. Whenever a packet is received, the router applies a 697 classification algorithm which maps the packet to one of the 698 forwarding table entries. This entry specifies how to forward the 699 packet. 701 We can think of this classification algorithm as a means of 702 partitioning the universe of possible packets into a finite set of 703 "Forwarding Equivalence Classes" (FECs). 705 Each router along a path must have some way of determining the next 706 hop for that FEC. For a given FEC, the corresponding entry in the 707 forwarding table may be created dynamically, by operation of the 708 routing protocols (unicast or multicast), or it might be created by 709 configuration, or it might be created by some combination of 710 configuration and protocol. 712 In tag switching, if a pair of tag switches are adjacent along a tag 713 switched path, they must agree on an assignment of tags to FECs. Once 714 this agreement is made, all tag switches on the tag switched path 715 other than the first are spared the work of actually executing the 716 classification algorithm. In fact, subsequent tag switches need not 717 even have the code which would be necessary to do this. 719 There are a large number of different ways in which one may choose to 720 partition a set of packets into FECs. Some examples: 722 1. 
Consider two packets to be in the same FEC if there is a single 723 address prefix in the routing table which is the longest match for 724 the destination address of each packet; 726 2. Consider two packets to be in the same FEC if these packets 727 have to traverse through a common router/tag switch; 729 3. Consider two packets to be in the same FEC if they have the 730 same source address and the same destination address; 732 4. Consider two packets to be in the same FEC if they have the 733 same source address, the same destination address, the same 734 transport protocol, the same source port, and the same destination 735 port. 737 5. Consider two packets to be in the same FEC if they are alike in 738 some arbitrary manner determined by policy. Note that the 739 assignment of a packet to a FEC by policy need not be done solely 740 by examining the network layer header. One might want, for 741 example, all packets arriving over a certain interface to be 742 classified into a single FEC, so that those packets all get 743 tunnelled through the network to a particular exit point. 745 Other examples can easily be thought of. 747 In case 1, the FEC can be identified by an address prefix (as 748 described in Section 6.1). In case 2, the FEC can be identified by 749 the address of a tag switch (as described in Section 6.1). Both 1 and 750 2 are useful for binding tags to unicast routes - tags are bound to 751 FECs, and an address prefix, or an address identifies a particular 752 FEC. Case 3 is useful for binding tags to multicast trees that are 753 constructed by protocols such as PIM (as described in Section 6.3). 754 Case 4 is useful for binding tags to individual flows, using, say, 755 RSVP (as described in Section 6.4). Case 5 is useful as a way of 756 connecting two pieces of a private network across a public backbone 757 (without even assuming that the private network is an IP network) (as 758 described in Section 6.5). 760 Any number of different kinds of FEC can co-exist in a single tag 761 switch, as long as the result is to partition the universe of packets 762 seen by that tag switch. Likewise, the procedures which different tag 763 switches use to classify (hitherto untagged) packets into FECs need 764 not be identical. 766 Networks could be organized around a hierarchy of FECs. For example, 767 (non-adjacent) tag switches TSa and TSb may classify packets into 768 some set of FECs FEC1,...,FECn. However from the point of view of 769 the intermediate tag switches between TSa and TSb, all of these FECs 770 may be treated indistinguishably. That is, as far as the intermediate 771 tag switches are concerned, the union of the FEC1,...,FECn is a 772 single FEC. Each intermediate tag switch may then prefer to use a 773 single tag for this union (rather than maintaining individual tags 774 for each member of this union). Tag switching accommodates this by 775 providing a hierarchy of tags, organized in a stack. 777 Much of the power of tag switching arises from the facts that: 779 - there are so many different ways to partition the packets into 780 FECs, 782 - different tag switches can partition the hitherto untagged 783 packets in different ways, 785 - the route to be used for a particular FEC can be chosen in 786 different ways, 788 - a hierarchy of tags, organized as a stack, can be used to 789 represent the network's hierarchy of FECs. 791 Note that tag switching does not specify, as an element of any 792 particular protocol, a general notion of "FEC identifier". 
Even if it 793 were possible to have such a thing, there is no need for it, since 794 there is no "one size fits all" setup protocol which works for any 795 arbitrary combination of packet classifier and routing protocol. 796 That's why tag distribution is sometimes done with TDP, sometimes 797 with BGP, sometimes with PIM, sometimes with RSVP. 799 8. Tag switching with ATM 801 Since the tag switching forwarding paradigm is based on label 802 swapping, and since ATM forwarding is also based on label swapping, 803 tag switching technology can readily be applied to ATM switches by 804 implementing the control component of tag switching. 806 The tag information needed for tag switching can be carried in the 807 VCI field. If two levels of tagging are needed, then the VPI field 808 could be used as well, although the size of the VPI field limits the 809 size of networks in which this would be practical. However, for most 810 applications of one level of tagging the VCI field is adequate. 812 To obtain the necessary control information, the switch should be 813 able to support the tag switching control component. Moreover, if the 814 switch has to perform routing information aggregation, then to 815 support destination-based unicast routing the switch should be able 816 to perform Network Layer forwarding for some fraction of the traffic 817 as well. 819 Supporting the destination-based routing function with tag switching 820 on an ATM switch may require the switch to maintain not one, but 821 several tags associated with a route (or a group of routes with the 822 same next hop). This is necessary to avoid the interleaving of 823 packets which arrive from different upstream tag switches, but are 824 sent concurrently to the same next hop. 826 If an ATM switch has built-in mechanism(s) to suppress cell 827 interleave, then the switch could implement the destination-based 828 routing function precisely the way it was described in Section 6.1. 829 This would eliminate the need to maintain several tags per route. 830 Note, however, that suppressing cell interleave is not part of the 831 ATM User Plane, as defined by the ATM Forum. 833 Yet another alternative that eliminates the need to maintain several 834 tags per route is to carry the tag information in the VPI field, and 835 use the VCI field for identifying cells that were sent by different 836 tag switches. Note, however, that the scalability of this alternative 837 is constrained by the size of the VPI space (4096 tags total). 838 Moreover, this alternative assumes that for a set of ATM tag switches 839 that form a contiguous segment of a network topology there exists a 840 mechanism to assign to each ATM tag switch around the edge of the 841 segment a set of unique VCIs that would be used by this switch alone. 843 The downstream tag allocation on demand scheme is likely to be a 844 preferred scheme for the tag allocation and TIB maintenance 845 procedures with ATM switches, as this scheme allows efficient use of 846 entries in the cross-connect tables maintained by ATM switches. 848 Implementing tag switching on an ATM switch simplifies integration of 849 ATM switches and routers. From a routing peering point of view an ATM 850 switch capable of tag switching would appear as a router to an 851 adjacent router; this reduces the number of routing peers a router 852 would have to maintain (relative to the common arrangement where a 853 large number of routers are fully meshed over an ATM cloud). 
Tag switching enables better routing, as it exposes the underlying physical topology to the Network Layer routing. Finally, tag switching simplifies overall operations by employing common addressing, routing, and management procedures among both routers and ATM switches. That could provide a viable, more scalable alternative to the overlay model. Because creation of tag binding is driven by control traffic, rather than data traffic, application of this approach to ATM switches does not produce high call setup rates, nor does it depend on the longevity of flows.

Implementing tag switching on an ATM switch does not preclude the ability to support a traditional ATM control plane (e.g., PNNI) on the same switch. The two components, tag switching and the ATM control plane, would operate in a Ships In the Night mode (with VPI/VCI space and other resources partitioned so that the components do not interact).

9. Tag switching migration strategies

Since tag switching is performed between a pair of adjacent tag switches, and since the tag binding information can be distributed on a pairwise basis, tag switching could be introduced in a fairly simple, incremental fashion. For example, once a pair of adjacent routers are converted into tag switches, each of the switches would tag packets destined to the other, thus enabling the other switch to use tag switching. Since tag switches use the same routing protocols as routers, the introduction of tag switches has no impact on routers. In fact, a tag switch connected to a router acts just as a router from the router's perspective.

As more and more routers are upgraded to enable tag switching, the scope of functionality provided by tag switching widens. For example, once all the routers within a domain are upgraded to support tag switching, it becomes possible to start using the hierarchy of routing knowledge function.

10. Summary

In this paper we described the tag switching technology. Tag switching is not constrained to a particular Network Layer protocol - it is a multiprotocol solution. The forwarding component of tag switching is simple enough to facilitate high performance forwarding, and may be implemented on high performance forwarding hardware such as ATM switches. The control component is flexible enough to support a wide variety of routing functions, such as destination-based routing, multicast routing, hierarchy of routing knowledge, and explicitly defined routes. By allowing a wide range of forwarding granularities that could be associated with a tag, we provide both scalable and functionally rich routing. A combination of a wide range of forwarding granularities and the ability to evolve the control component fairly independently from the forwarding component results in a solution that enables graceful introduction of new routing functionality to meet the demands of a rapidly evolving computer networking environment.

11. Security Considerations

Security considerations are not addressed in this document.

12. Intellectual Property Considerations

Cisco Systems may seek patent or other intellectual property protection for some or all of the technologies disclosed in this document.
If any standards arising from this document are or become 918 protected by one or more patents assigned to Cisco Systems, Cisco 919 intends to disclose those patents and license them under openly 920 specified and non-discriminatory terms, for no fee. 922 13. Acknowledgments 924 Significant contributions to this work have been made by Anthony 925 Alles, Fred Baker, Paul Doolan, Guy Fedorkow, Jeremy Lawrence, Arthur 926 Lin, Morgan Littlewood, Keith McCloghrie, and Dan Tappan. 928 14. References 930 15. Authors' Addresses 932 Yakov Rekhter 933 Cisco Systems, Inc. 934 170 Tasman Drive 935 San Jose, CA, 95134 936 E-mail: yakov@cisco.com 938 Bruce Davie 939 Cisco Systems, Inc. 940 250 Apollo Drive 941 Chelmsford, MA, 01824 942 E-mail: bsd@cisco.com 944 Dave Katz 945 Cisco Systems, Inc. 946 170 Tasman Drive 947 San Jose, CA, 95134 948 E-mail: dkatz@cisco.com 950 Eric Rosen 951 Cisco Systems, Inc. 952 250 Apollo Drive 953 Chelmsford, MA, 01824 954 E-mail: erosen@cisco.com 956 George Swallow 957 Cisco Systems, Inc. 958 250 Apollo Drive 959 Chelmsford, MA, 01824 960 E-mail: swallow@cisco.com 962 Dino Farinacci 963 Cisco Systems, Inc. 964 170 West Tasman Drive 965 San Jose, CA 95134 966 E-mail: dino@cisco.com