Internet Draft                                             Yakov Rekhter
Expiration date: January 1998                              cisco Systems
                                                             Bruce Davie
                                                           cisco Systems
                                                               Dave Katz
                                                   Juniper Networks Inc.
                                                              Eric Rosen
                                                           cisco Systems
                                                          George Swallow
                                                           cisco Systems
                                                          Dino Farinacci
                                                           cisco Systems
                                                               July 1997

                 Tag Switching Architecture - Overview

                  draft-rekhter-tagswitch-arch-01.txt

1. Status of this Memo

This document is an Internet Draft. Internet Drafts are working documents of the Internet Engineering Task Force (IETF), its Areas, and its Working Groups. Note that other groups may also distribute working documents as Internet Drafts.

Internet Drafts are draft documents valid for a maximum of six months. Internet Drafts may be updated, replaced, or obsoleted by other documents at any time. It is not appropriate to use Internet Drafts as reference material or to cite them other than as a "working draft" or "work in progress."

Please check the 1id-abstracts.txt listing contained in the internet-drafts Shadow Directories on nic.ddn.mil, nnsc.nsf.net, nic.nordu.net, ftp.nisc.sri.com, or munnari.oz.au to learn the current status of any Internet Draft.

2. Abstract

This document provides an overview of tag switching. Tag switching is a way to combine the label-swapping forwarding paradigm with network layer routing. This has several advantages. Tags can have a wide spectrum of forwarding granularities, so at one end of the spectrum a tag could be associated with a group of destinations, while at the other a tag could be associated with a single application flow. At the same time, forwarding based on tag switching, due to its simplicity, is well suited to high performance forwarding. These factors facilitate the development of a routing system which is both functionally rich and scalable. Finally, tag switching simplifies integration of routers and ATM switches by employing common addressing, routing, and management procedures.

3. Introduction

Continuous growth of the Internet demands higher bandwidth within the Internet Service Providers (ISPs). However, growth of the Internet is not the only driving factor for higher bandwidth - demand for higher bandwidth also comes from emerging multimedia applications. Demand for higher bandwidth, in turn, requires higher forwarding performance for both multicast and unicast traffic.

The growth of the Internet also demands improved scaling properties of the Internet routing system. The ability to contain the volume of routing information maintained by individual routers and the ability to build a hierarchy of routing knowledge are essential to support a high quality, scalable routing system.

While the destination-based forwarding paradigm is adequate in many situations, we already see examples where it is no longer adequate.
The ability to overcome the rigidity of destination-based forwarding and to have more flexible control over how traffic is routed is likely to become more and more important.

We see the need to improve forwarding performance while at the same time adding routing functionality to support multicast, allowing more flexible control over how traffic is routed, and providing the ability to build a hierarchy of routing knowledge. Moreover, it becomes more and more crucial to have a routing system that can support graceful evolution to accommodate new and emerging requirements.

Tag switching is a technology that provides an efficient solution to these challenges. Tag switching blends the flexibility and rich functionality provided by Network Layer routing with the simplicity provided by the label swapping forwarding paradigm. The simplicity of the tag switching forwarding paradigm (label swapping) enables improved forwarding performance, while maintaining competitive price/performance. By associating a wide range of forwarding granularities with a tag, the same forwarding paradigm can be used to support a wide variety of routing functions, such as destination-based routing, multicast, hierarchy of routing knowledge, and flexible routing control. Finally, a combination of simple forwarding, a wide range of forwarding granularities, and the ability to evolve routing functionality while preserving the same forwarding paradigm enables a routing system that can gracefully evolve to accommodate new and emerging requirements.

4. Tag Switching components

Tag switching consists of two components: forwarding and control. The forwarding component uses the tag information (tags) carried by packets and the tag forwarding information maintained by a tag switch to perform packet forwarding. The control component is responsible for maintaining correct tag forwarding information among a group of interconnected tag switches.

Segregating control and forwarding into separate components promotes modularity, which in turn makes it possible to build a system that can gracefully evolve to accommodate new and emerging requirements.

5. Forwarding component

The fundamental forwarding paradigm employed by tag switching is based on the notion of label swapping. When a packet with a tag is received by a tag switch, the switch uses the tag as an index in its Tag Information Base (TIB). Each entry in the TIB consists of an incoming tag, and one or more sub-entries of the form (outgoing tag, outgoing interface, outgoing link level information). If the switch finds an entry with the incoming tag equal to the tag carried in the packet, then for each (outgoing tag, outgoing interface, outgoing link level information) sub-entry in the entry the switch replaces the tag in the packet with the outgoing tag, replaces the link level information (e.g., MAC address) in the packet with the outgoing link level information, and forwards the packet over the outgoing interface.

From the above description of the forwarding component we can make several observations. First, the forwarding decision is based on an exact match algorithm using a fixed length, fairly short tag as an index. This enables a simplified forwarding procedure, relative to the longest match forwarding traditionally used at the network layer. This in turn enables higher forwarding performance (higher packets per second). The forwarding procedure is simple enough to allow a straightforward hardware implementation.
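
The following short Python sketch (illustrative only - the data structures and names are ours, not the draft's) shows the exact-match, label-swapping step just described:

   # A minimal, illustrative sketch of the exact-match, label-swapping
   # forwarding step.  Tags are plain integers and "packets" are dicts;
   # a real implementation would rewrite headers in place.
   from dataclasses import dataclass

   @dataclass
   class SubEntry:
       outgoing_tag: int
       outgoing_interface: str
       outgoing_link_info: str      # e.g., the next-hop MAC address

   # TIB: incoming tag -> one or more sub-entries
   # (one sub-entry for unicast, possibly several for multicast).
   tib: dict[int, list[SubEntry]] = {
       17: [SubEntry(outgoing_tag=42, outgoing_interface="atm0",
                     outgoing_link_info="00:00:0c:aa:bb:cc")],
   }

   def forward(packet: dict) -> None:
       subs = tib.get(packet["tag"])
       if subs is None:
           # No TIB entry: discard, or optionally strip the tag and hand
           # the packet to network layer forwarding (support is optional).
           print("no entry; dropping")
           return
       for sub in subs:
           out = dict(packet)
           out["tag"] = sub.outgoing_tag                # swap the tag
           out["link_info"] = sub.outgoing_link_info    # rewrite link-level info
           print(f"send on {sub.outgoing_interface}: {out}")

   forward({"tag": 17, "payload": "..."})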

A second observation is that the forwarding decision is independent of the tag's forwarding granularity. For example, the same forwarding algorithm applies to both unicast and multicast - a unicast entry would just have a single (outgoing tag, outgoing interface, outgoing link level information) sub-entry, while a multicast entry may have one or more (outgoing tag, outgoing interface, outgoing link level information) sub-entries. (For multi-access links, the outgoing link level information in this case would include a multicast MAC address.) This illustrates how with tag switching the same forwarding paradigm can be used to support different routing functions (e.g., unicast, multicast, etc.).

The simple forwarding procedure is thus essentially decoupled from the control component of tag switching. New routing (control) functions can readily be deployed without disturbing the forwarding paradigm. This means that it is not necessary to re-optimize forwarding performance (by modifying either hardware or software) as new routing functionality is added.

In the tag switching architecture, various implementation options are acceptable. For example, support for network layer forwarding by a tag switch (i.e., forwarding based on the network layer header as opposed to a tag) is optional. Moreover, use of network layer forwarding may be constrained to handling network layer control traffic only. (Note, however, that a tag switch must be able to source and sink network layer packets, e.g., to participate in network layer routing protocols.)

For the purpose of handling the network layer hop count (time-to-live), the architecture allows two alternatives: network layer hops may correspond directly to hops formed by tag switches, or one network layer hop may correspond to several tag switched hops.

When a switch receives a packet with a tag, and the TIB maintained by the switch has no entry with the incoming tag equal to the tag carried by the packet, or the entry exists but its outgoing tag information is empty and the entry does not indicate local delivery to the switch, the switch may either (a) discard the packet, or (b) strip the tag information and submit the packet for network layer processing. Support for the latter is optional (as support for network layer forwarding is optional). Note that it may not always be possible to successfully forward a packet after stripping a tag, even if a tag switch supports network layer forwarding.

The architecture allows a tag switch to maintain either a single TIB per tag switch, or a TIB per interface. Moreover, a tag switch could mix both of these options - some tags could be maintained in a single TIB, while other tags could be maintained in a TIB associated with individual interfaces.

5.1. Tag encapsulation

Tag switching clearly requires a tag to be carried in each packet. The tag information can be carried in a variety of ways:

- as a small "shim" tag header inserted between the layer 2 and the Network Layer headers;

- as part of the layer 2 header, if the layer 2 header provides adequate semantics (e.g., Frame Relay, or ATM);

- as part of the Network Layer header (e.g., using the Flow Label field in IPv6 with appropriately modified semantics).
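
The draft does not specify a shim format here. Purely as an illustration, assuming a hypothetical 4-octet shim that carries a 20-bit tag, a 3-bit class-of-service field, a 1-bit bottom-of-stack flag, and an 8-bit time-to-live, encoding and decoding could look like this:

   # Illustrative only: a hypothetical 4-octet shim tag header, inserted
   # between the layer 2 and network layer headers.  The field layout is
   # an assumption for the example, not something defined by this draft.
   import struct

   def encode_shim(tag: int, cos: int = 0, bottom: bool = True, ttl: int = 64) -> bytes:
       assert 0 <= tag < 2**20 and 0 <= cos < 8 and 0 <= ttl < 256
       word = (tag << 12) | (cos << 9) | (int(bottom) << 8) | ttl
       return struct.pack("!I", word)

   def decode_shim(shim: bytes) -> tuple[int, int, bool, int]:
       (word,) = struct.unpack("!I", shim)
       return word >> 12, (word >> 9) & 0x7, bool((word >> 8) & 0x1), word & 0xFF

   shim = encode_shim(tag=42, cos=1)
   print(decode_shim(shim))   # (42, 1, True, 64)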

It is therefore possible to implement tag switching over virtually any media type, including point-to-point links, multi-access links, and ATM. At the same time the forwarding component allows specific optimizations for particular media (e.g., ATM).

Observe also that the tag forwarding component is Network Layer independent. Use of control component(s) specific to a particular Network Layer protocol enables the use of tag switching with different Network Layer protocols.

6. Control component

Essential to tag switching is the notion of a binding between a tag and Network Layer routing (routes). The control component is responsible for creating tag bindings, and then distributing the tag binding information among tag switches. Creating a tag binding involves allocating a tag, and then binding the tag to a route. The distribution of tag binding information among tag switches could be accomplished via several options:

- piggybacking on existing routing protocols

- using a separate Tag Distribution Protocol (TDP)

While the architecture supports distribution of tag binding information that is independent of the underlying routing protocols, the architecture acknowledges that considerable optimizations can be achieved in some cases by small enhancements of existing protocols to enable piggybacking tag binding information on these protocols.

One important characteristic of the tag switching architecture is that creation of tag bindings is driven primarily by control traffic rather than by data traffic. Control traffic driven creation of tag bindings has several advantages, as compared to data traffic driven creation of tag bindings. For one thing, it minimizes the amount of additional control traffic needed to distribute tag binding information, as tag binding information is distributed only in response to control traffic, independent of data traffic. It also makes the overall scheme independent of and insensitive to the data traffic profile/pattern. Control traffic driven creation of tag bindings improves forwarding performance, as tags are precomputed (prebound) before data traffic arrives, rather than being created as data traffic arrives. It also simplifies the overall system behavior, as the control plane is controlled solely by control traffic, rather than by a mix of control and data traffic.

Another important characteristic of the tag switching architecture is that distribution and maintenance of tag binding information is consistent with distribution and maintenance of the associated routing information. For example, distribution of tag binding information for tags associated with unicast routing is based on the technique of incremental updates with explicit acknowledgment. This is very similar to the way unicast routing information gets distributed by such protocols as OSPF and BGP. In contrast, distribution of tag binding information for tags associated with multicast routing is based on periodic updates/refreshes, without any explicit acknowledgments. This is consistent with the way multicast routing information is distributed by such protocols as PIM.

To provide good scaling characteristics, while also accommodating diverse routing functionality, tag switching supports a wide range of forwarding granularities.
At one extreme a tag could be associated (bound) with a group of routes (more specifically, with the Network Layer Reachability Information of the routes in the group). At the other extreme a tag could be bound to an individual application flow (e.g., an RSVP flow). A tag could also be bound to a multicast tree. In addition, a tag may be bound to a path that has been selected for a certain set of packets based on some policy (e.g., an explicit route).

The control component is organized as a collection of modules, each designed to support a particular routing function. To support new routing functions, new modules can be added. The architecture does not mandate a prescribed set of modules that have to be supported by every tag switch.

The following describes some of the modules.

6.1. Destination-based routing

In this section we describe how tag switching can support destination-based routing. Recall that with destination-based routing a router makes a forwarding decision based on the destination address carried in a packet and the information stored in the Forwarding Information Base (FIB) maintained by the router. A router constructs its FIB by using the information it receives from routing protocols (e.g., OSPF, BGP).

To support destination-based routing with tag switching, a tag switch, just like a router, participates in routing protocols (e.g., OSPF, BGP), and constructs its FIB using the information it receives from these protocols.

There are three permitted methods for tag allocation and Tag Information Base (TIB) management: (a) downstream tag allocation, (b) downstream tag allocation on demand, and (c) upstream tag allocation. In all cases, a switch allocates tags and binds them to address prefixes in its FIB. In downstream allocation, the tag that is carried in a packet is generated and bound to a prefix by the switch at the downstream end of the link (with respect to the direction of data flow). On demand allocation means that tags will only be allocated and distributed by the downstream switch when it is requested to do so by the upstream switch. Method (b) is most useful in ATM networks (see Section 8). In upstream allocation, tags are allocated and bound at the upstream end of the link. Note that in downstream allocation, a switch is responsible for creating tag bindings that apply to incoming data packets, and receives tag bindings for outgoing packets from its neighbors. In upstream allocation, a switch is responsible for creating tag bindings for outgoing tags, i.e., tags that are applied to data packets leaving the switch, and receives bindings for incoming tags from its neighbors.

The downstream tag allocation scheme operates as follows: for each route in its FIB the switch allocates a tag, creates an entry in its Tag Information Base (TIB) with the incoming tag set to the allocated tag, and then advertises the binding between the (incoming) tag and the route to other adjacent tag switches. The advertisement could be accomplished by either piggybacking the binding on top of the existing routing protocols, or by using a separate Tag Distribution Protocol (TDP).
When a tag switch receives tag binding information for a route, and that information was originated by the next hop for that route, the switch places the tag (carried as part of the binding information) into the outgoing tag of the TIB entry associated with the route. This creates the binding between the outgoing tag and the route.

With the downstream on demand tag allocation scheme, operation is as follows. For each route in its FIB, the switch identifies the next hop for that route. It then issues a request (via TDP) to the next hop for a tag binding for that route. When the next hop receives the request, it allocates a tag, creates an entry in its TIB with the incoming tag set to the allocated tag, and then returns the binding between the (incoming) tag and the route to the switch that sent the original request. When the switch receives the binding information, the switch creates an entry in its TIB, and sets the outgoing tag in the entry to the value received from the next hop. Handling of data packets is as for downstream allocation. The main application for this mode of operation is with ATM switches, as described in Section 8.

The upstream tag allocation scheme is used as follows. If a tag switch has one or more point-to-point interfaces, then for each route in its FIB whose next hop is reachable via one of these interfaces, the switch allocates a tag, creates an entry in its TIB with the outgoing tag set to the allocated tag, and then advertises to the next hop (via TDP) the binding between the (outgoing) tag and the route. When a tag switch that is the next hop receives the tag binding information, the switch places the tag (carried as part of the binding information) into the incoming tag of the TIB entry associated with the route.

Note that, while we have described upstream allocation for the sake of completeness, we have found the two downstream allocation methods adequate for all practical purposes so far.

Independent of which tag allocation method is used, once a TIB entry is populated with both incoming and outgoing tags, the tag switch can forward packets for routes bound to the tags by using the tag switching forwarding algorithm (as described in Section 5).

When a tag switch creates a binding between an outgoing tag and a route, the switch, in addition to populating its TIB, also updates its FIB with the binding information. This enables the switch to add tags to previously untagged packets.
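
As an illustration only (the class and method names below are invented, not taken from the draft or from TDP), downstream tag allocation might be sketched like this: a switch allocates an incoming tag for every FIB route, advertises the binding to its neighbors, and fills in the outgoing tag when a binding arrives from the route's next hop.

   # Illustrative sketch of downstream tag allocation and TIB population.
   import itertools

   class TagSwitch:
       def __init__(self, name, fib, tag_base=16):
           self.name = name
           self.fib = fib                         # prefix -> next hop (by name)
           self.tib = {}                          # prefix -> {"in": tag, "out": tag}
           self._tags = itertools.count(tag_base) # simple local tag allocator
           self.neighbors = []                    # adjacent TagSwitch objects

       def allocate_and_advertise(self):
           # Allocate an incoming tag for every FIB route and advertise the
           # (route, tag) binding to all adjacent tag switches.
           for prefix in self.fib:
               tag = next(self._tags)
               self.tib.setdefault(prefix, {"in": None, "out": None})["in"] = tag
               for nbr in self.neighbors:
                   nbr.receive_binding(prefix, tag, advertiser=self.name)

       def receive_binding(self, prefix, tag, advertiser):
           # Use the advertised tag as the outgoing tag only if it was
           # originated by the next hop for that route.
           if self.fib.get(prefix) == advertiser:
               self.tib.setdefault(prefix, {"in": None, "out": None})["out"] = tag

   # B is A's next hop for 10.0.0.0/8; each switch allocates its own tags.
   a = TagSwitch("A", fib={"10.0.0.0/8": "B"})
   b = TagSwitch("B", fib={"10.0.0.0/8": "C"}, tag_base=100)
   a.neighbors, b.neighbors = [b], [a]
   a.allocate_and_advertise()
   b.allocate_and_advertise()
   print(a.tib)   # {'10.0.0.0/8': {'in': 16, 'out': 100}}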

So far we have described how a tag could be bound to a single route, creating a one-to-one mapping between routes and tags. However, under certain conditions it is possible to bind a tag not just to a single route, but to a group of routes, creating a many-to-one mapping between routes and tags. Consider a tag switch that is connected to a router. It is quite possible that the switch uses the router as the next hop not just for one route, but for a group of routes. Under these conditions the switch does not have to allocate distinct tags to each of these routes - one tag would suffice. The distribution of tag binding information is unaffected by whether there is a one-to-one or one-to-many mapping between tags and routes.

Now consider a tag switch that receives from one of its neighbors (tag switching peers) tag binding information for a set of routes, such that the set is bound to a single tag. If the switch decides to use some or all of the routes in the set, then for these routes the switch does not need to allocate individual tags - one tag would suffice. Such an approach may be valuable when tags are a precious resource. Note that the ability to support many-to-one mapping makes no assumptions about the routing protocols being used.

When a tag switch adds a tag to a previously untagged packet, the tag could be associated either with the route to the destination address carried in the packet, or with the route to some other tag switch along the path to the destination (in some cases the address of that other tag switch could be gleaned from network layer routing protocols). The latter option provides yet another way of mapping multiple routes into a single tag. However, this option is either dependent on particular routing protocols, or would require a separate mechanism for discovering tag switches along a path.

To understand the scaling properties of tag switching in conjunction with destination-based routing, observe that the total number of tags that a tag switch has to maintain cannot be greater than the number of routes in the switch's FIB. Moreover, as we have just seen, the number of tags can be much less than the number of routes. Thus, much less state is required than would be the case if tags were allocated to individual flows.

In general, a tag switch will try to populate its TIB with incoming and outgoing tags for all routes to which it has reachability, so that all packets can be forwarded by simple label swapping. Tag allocation is thus driven by topology (routing), not data traffic - it is the existence of a FIB entry that causes tag allocations, not the arrival of data packets.

Use of tags associated with routes, rather than flows, also means that there is no need to perform flow classification procedures for all the flows to determine whether to assign a tag to a flow. That, in turn, simplifies the overall scheme, and makes it more robust and stable in the presence of changing traffic patterns.

Note that when tag switching is used to support destination-based routing, tag switching does not completely eliminate the need to perform normal Network Layer forwarding at some network elements. First of all, adding a tag to a previously untagged packet requires normal Network Layer forwarding. This function could be performed by the first hop router, or by the first router on the path that is able to participate in tag switching. In addition, whenever a tag switch aggregates a set of routes (e.g., by using the technique of hierarchical routing) into a single route, and the routes do not share a common next hop, the switch needs to perform Network Layer forwarding for packets carrying the tag associated with the aggregated route. However, one could observe that the number of places where routes get aggregated is smaller than the total number of places where forwarding decisions have to be made. Moreover, quite often aggregation is applied to only a subset of the routes maintained by a tag switch. As a result, on average a packet can be forwarded most of the time using the tag switching algorithm. Note that many tag switches may not need to perform any network layer forwarding.
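
The many-to-one mapping, and the resulting reduction in state, can be pictured with a small sketch (illustrative names only): instead of allocating one tag per route, a switch can allocate one tag per next hop and bind every prefix that shares that next hop to it.

   # Illustrative: one tag per next hop rather than one tag per route,
   # giving a many-to-one mapping between routes and tags.
   import itertools

   fib = {                       # prefix -> next hop
       "10.1.0.0/16": "R1",
       "10.2.0.0/16": "R1",
       "10.3.0.0/16": "R2",
   }

   tags = itertools.count(16)
   tag_for_next_hop: dict[str, int] = {}
   binding: dict[str, int] = {}  # prefix -> incoming tag

   for prefix, next_hop in fib.items():
       if next_hop not in tag_for_next_hop:
           tag_for_next_hop[next_hop] = next(tags)
       binding[prefix] = tag_for_next_hop[next_hop]

   print(binding)  # {'10.1.0.0/16': 16, '10.2.0.0/16': 16, '10.3.0.0/16': 17}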

6.2. Hierarchy of routing knowledge

The IP routing architecture models a network as a collection of routing domains. Within a domain, routing is provided via interior routing (e.g., OSPF), while routing across domains is provided via exterior routing (e.g., BGP). However, all routers within domains that carry transit traffic (e.g., domains formed by Internet Service Providers) have to maintain information provided not just by interior routing, but by exterior routing as well, even if only some of these routers participate in exterior routing. That creates certain problems. First of all, the amount of this information is not insignificant. Thus it places additional demand on the resources required by the routers. Moreover, an increase in the volume of routing information quite often increases routing convergence time. This, in turn, degrades the overall performance of the system.

Tag switching allows complete decoupling of interior and exterior routing. With tag switching, only tag switches at the border of a domain would be required to maintain routing information provided by exterior routing - all other switches within the domain would just maintain routing information provided by the domain's interior routing (which is usually significantly smaller than the exterior routing information), with no "leaking" of exterior routing information into interior routing. This, in turn, reduces the routing load on non-border switches, and shortens routing convergence time.

To support this functionality, tag switching allows a packet to carry not one but a set of tags, organized as a stack. A tag switch could either swap the tag at the top of the stack, or pop the stack, or swap the tag and push one or more tags onto the stack.

Consider a tag switch that is at the border of a routing domain. This switch maintains both exterior and interior routes. The interior routes provide routing information and tags to all the other tag switches within the domain. For each exterior route that the switch receives from some other border tag switch that is in the same domain as the local switch, the switch maintains not just a tag associated with the route, but also a tag associated with the route to that other border tag switch. Moreover, for inter-domain routing protocols that are capable of passing "third-party" next hop information, the switch would maintain a tag associated with the route to the next hop, rather than with the route to the border tag switch from whom the local switch received the exterior route.

When a packet is forwarded between two (border) tag switches in different domains, the tag stack in the packet contains just one tag (associated with an exterior route). However, when a packet is forwarded within a domain, the tag stack in the packet contains not one, but two tags (the second tag is pushed by the domain's ingress border tag switch). The tag at the top of the stack provides packet forwarding to an appropriate egress border tag switch (or the "third-party" next hop), while the next tag in the stack provides correct packet forwarding at the egress switch (or at the "third-party" next hop). The stack is popped by either the egress switch (or the "third-party" next hop) or by the penultimate (with respect to the egress switch/"third-party" next hop) switch.
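
The stack operations just described (swap the top tag, pop the stack, or swap and push) can be sketched as follows; this is purely illustrative, with the packet modeled as a list of tags whose top of stack is the last element.

   # Illustrative sketch of tag stack operations and their use in a
   # two-level hierarchy of routing knowledge.
   def swap(stack: list[int], new_tag: int) -> None:
       stack[-1] = new_tag

   def pop(stack: list[int]) -> int:
       return stack.pop()

   def swap_and_push(stack: list[int], new_tag: int, pushed: int) -> None:
       stack[-1] = new_tag
       stack.append(pushed)

   # A packet enters a transit domain carrying one tag bound to an exterior route.
   stack = [201]                  # tag for the exterior route
   swap_and_push(stack, 202, 57)  # ingress border switch: swap, push interior tag
   swap(stack, 58)                # interior switches swap only the top (interior) tag
   pop(stack)                     # penultimate or egress switch pops the interior tag
   print(stack)                   # [202] - the exterior tag is used at the egress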

One could observe that when tag switching is confined to a single routing domain, the above could still be used to decouple interior from exterior routing, similar to what was described above. However, in this case a border tag switch wouldn't maintain tags associated with each exterior route, and forwarding between domains would be performed at the network layer.

The control component used in this scenario is fairly similar to the one used with destination-based routing. In fact, the only essential difference is that in this scenario the tag binding information is distributed both among physically adjacent tag switches, and among border tag switches within a single domain. One could also observe that the latter (distribution among border switches) could be trivially accommodated by very minor extensions to BGP.

The notion of supporting a hierarchy of routing knowledge with tag switching is not limited to the case of exterior/interior routing, but could be applicable to other cases where a hierarchy of routing knowledge is possible. Moreover, while the above describes only a two-level hierarchy of routing knowledge, the tag switching architecture does not impose limits on the depth of the hierarchy.

In the presence of a hierarchy of routing knowledge, a tag switched path at level N in the hierarchy has to have its endpoints at tag switches that are at the border between level N and level (N-1) in the hierarchy (level 0 in the hierarchy corresponds to an untagged path).

6.3. Multicast

Essential to multicast routing is the notion of spanning trees. Multicast routing procedures (e.g., PIM) are responsible for constructing such trees (with receivers as leaves), while multicast forwarding is responsible for forwarding multicast packets along such trees. Thus, to support a multicast forwarding function with tag switching we need to be able to associate a tag with a multicast tree. The following describes the procedures for allocation and distribution of tags for multicast.

When tag switching is used for multicast, it is important that tag switching be able to utilize the multicast capabilities provided by the Data Link layer (e.g., the multicast capabilities provided by Ethernet). To be able to do this, an (upstream) tag switch connected to a given Data Link subnetwork should use the same tag when forwarding a multicast packet to all of the (downstream) switches on that subnetwork. This way the packet will be multicast at the Data Link layer over the subnetwork. To support this, all tag switches that are part of a given multicast tree and are on a common subnetwork must agree on a common tag that would be used for forwarding multicast packets along the tree over the subnetwork. Moreover, since multicast forwarding is based on Reverse Path Forwarding (RPF), it is crucial that, when a tag switch receives a multicast packet, the tag carried in the packet enable the switch to identify both (a) the particular multicast group, and (b) the previous hop (upstream) tag switch that sent the packet.

To support the requirements outlined in the previous paragraph, the tag switching architecture assumes that (a) multicast tags are associated with interfaces on a tag switch (rather than with a tag switch as a whole), (b) the tag space that a tag switch could use for allocating tags for multicast is partitioned into non-overlapping regions among all the tag switches connected to a common Data Link subnetwork, and (c) there are procedures by which tag switches that belong to a common multicast tree and are on a common Data Link subnetwork agree on the tag switch that is responsible for allocating a tag for the tree.

One possible way of partitioning the tag space into non-overlapping regions among tag switches connected to a common subnetwork is for each tag switch to claim a region of the space and announce this region to its neighbors. Conflicts are resolved based on the IP addresses of the contending switches (the higher address wins, the lower retries). Once the tag space is partitioned among tag switches, the switches may create bindings between tags and multicast trees (routes).

At least in principle there are two possible ways to create bindings between tags and multicast trees (routes). With the first alternative, for a set of tag switches that share a common Data Link subnetwork, the tag switch that is upstream with respect to a particular multicast tree allocates a tag (out of its own region that does not overlap with the regions of other switches on the subnetwork), binds the tag to a multicast route, and then advertises the binding to all the (downstream) switches on the subnetwork. With the second alternative, one of the tag switches that is downstream with respect to a particular multicast tree allocates a tag (out of its own region that does not overlap with the regions of other switches on the subnetwork), binds the tag to a multicast route, and then advertises the binding to all the switches (both downstream and upstream) on the subnetwork. Usually the first tag switch to join the group is the one that performs the allocation.

Each of the above alternatives has its own trade-offs. The first alternative is fairly simple - one upstream router does the tag binding and multicasts the binding downstream. However, the first alternative may create an uneven distribution of allocated tags, as some tag switches on a common subnetwork may have more upstream multicast sources than others. Also, changes in topology could result in upstream neighbor changes, which in turn would require tag re-binding. Finally, one could observe that distributing tag bindings from upstream towards downstream is inconsistent with the direction of multicast routing information distribution (from downstream towards upstream).

The second alternative, even if more complex than the first one, has its own advantages. For one thing, it makes the distribution of multicast tag bindings consistent with the distribution of unicast tag bindings. It also makes the distribution of multicast tag bindings consistent with the distribution of multicast routing information. This, in turn, allows the piggybacking of tag binding information on existing multicast routing protocols (PIM). This alternative also avoids the need for tag re-binding when there are changes in the upstream neighbor. Finally, it is more likely to provide a more even distribution of allocated tags, as compared to the first alternative. Note that this approach does require a mechanism to choose the tag allocator from among the downstream tag switches on the subnetwork.
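
The tag-space partitioning rule above ("the higher address wins, the lower retries") could be simulated roughly as follows. This is only a sketch under our own assumptions: the region size, the numbering of regions, and the centralized simulation of what would really be a distributed exchange of announcements are all invented for the example.

   # Illustrative sketch of partitioning a multicast tag space into
   # non-overlapping regions among the switches on a common subnetwork.
   REGION_SIZE = 1024   # size of each tag region (an assumption)

   def ip_key(ip: str) -> tuple[int, ...]:
       return tuple(int(part) for part in ip.split("."))

   def claim_regions(switch_ips: list[str]) -> dict[str, range]:
       owner_of: dict[int, str] = {}        # region index -> owning switch
       to_place = list(switch_ips)
       while to_place:
           ip, region = to_place.pop(0), 0
           while True:
               owner = owner_of.get(region)
               if owner is None:
                   owner_of[region] = ip
                   break
               if ip_key(ip) > ip_key(owner):   # the higher address wins ...
                   owner_of[region] = ip
                   to_place.append(owner)       # ... and the lower address retries
                   break
               region += 1                      # region taken; try the next one
       return {ip: range(r * REGION_SIZE, (r + 1) * REGION_SIZE)
               for r, ip in owner_of.items()}

   print(claim_regions(["10.0.0.2", "10.0.0.7", "10.0.0.5"]))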

6.4. Quality of service

Two mechanisms are needed for providing a range of qualities of service to packets passing through a router or a tag switch. First, we need to classify packets into different classes. Second, we need to ensure that the handling of packets is such that the appropriate QOS characteristics (bandwidth, loss, etc.) are provided to each class.

Tag switching provides an easy way to mark packets as belonging to a particular class after they have been classified the first time. Initial classification could be done using configuration information (e.g., all traffic from a certain interface) or using information carried in the network layer or higher layer headers (e.g., all packets between a certain pair of hosts). A tag corresponding to the resultant class would then be applied to the packet. Tagged packets can then be efficiently handled by the tag switching routers in their path without needing to be reclassified. The actual scheduling and queueing of packets is largely orthogonal - the key point here is that tag switching enables simple logic to be used to find the state that identifies how the packet should be scheduled.

Tag switching can, for example, be used to support a small number of classes of service in a service provider network (e.g., premium and standard). On frame-based media, the class can be encoded by a field in the tag header. On ATM tag switches, additional tags can be allocated to differentiate the different classes. For example, rather than having one tag for each destination prefix in the FIB, an ATM tag switch could have two tags per prefix, one to be used by premium traffic and one by standard. Thus a tag binding in this case is a triple consisting of (destination prefix, class of service, tag). Such a tag would be used both to make a forwarding decision and to make a scheduling decision, e.g., by selecting the appropriate queue in a weighted fair queueing (WFQ) scheduler.

To provide a finer granularity of QOS, tag switching can be used with RSVP. We propose a simple extension to RSVP in which a tag object is defined. Such an object can be carried in an RSVP reservation message and thus associated with a session. Each tag capable router assigns a tag to the session and passes it upstream with the reservation message. Thus the association of tags with RSVP sessions works very much like the binding of tags to routes with downstream allocation. Note, however, that binding is accomplished using RSVP rather than TDP. (It would be possible to use TDP, but it is simpler to extend RSVP to carry tags, and this ensures that tags and reservation information are communicated in a similar manner.)

When data packets are transmitted, the first router in the path that is tag-capable applies the tag that it received from its downstream neighbor. This tag can be used at the next hop to find the corresponding reservation state, to forward and schedule the packet appropriately, and to find the suitable outgoing tag value provided by the next hop. Note that tag imposition could also be performed at the sending host.
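
As a purely illustrative sketch (the values and queue names are invented, not taken from the draft), per-class tags on an ATM tag switch can be modeled as a binding table keyed by (destination prefix, class of service), with the selected tag driving both the forwarding and the scheduling decision:

   # Illustrative: bindings keyed by (destination prefix, class of service),
   # so one prefix may have one tag per class; the same tag selects both the
   # outgoing behaviour and the WFQ queue.
   bindings = {
       # (prefix, class)          -> tag
       ("192.0.2.0/24", "premium"):  101,
       ("192.0.2.0/24", "standard"): 102,
   }

   wfq_queue_for_tag = {101: "queue-premium", 102: "queue-standard"}

   def impose_tag(prefix: str, cos: str) -> int:
       return bindings[(prefix, cos)]

   def schedule(tag: int) -> str:
       # the tag that drives forwarding also selects the scheduling queue
       return wfq_queue_for_tag[tag]

   tag = impose_tag("192.0.2.0/24", "premium")
   print(tag, schedule(tag))   # 101 queue-premium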

6.5. Flexible routing (explicit routes)

One of the fundamental properties of destination-based routing is that the only information from a packet that is used to forward the packet is the destination address. While this property enables highly scalable routing, it also limits the ability to influence the actual paths taken by packets. This, in turn, limits the ability to evenly distribute traffic among multiple links, taking the load off highly utilized links and shifting it towards less utilized links. For Internet Service Providers (ISPs) who support different classes of service, destination-based routing also limits their ability to segregate different classes with respect to the links used by these classes. Some of the ISPs today use Frame Relay or ATM to overcome the limitations imposed by destination-based routing. Tag switching, because of the flexible granularity of tags, is able to overcome these limitations without using either Frame Relay or ATM.

Another application where destination-based routing is no longer adequate is routing with resource reservations (QOS routing). Increasing the number of ways by which a particular reservation could traverse a network may improve the success of the reservation. Increasing the number of ways, in turn, requires the ability to explore paths that are not constrained to the ones constructed solely based on the destination.

To provide forwarding along paths that are different from the paths determined by destination-based routing, the control component of tag switching allows installation of tag bindings in tag switches that do not correspond to the destination-based routing paths.

One possible alternative for supporting explicit routes is to allow TDP to carry information about an explicit route, where such a route could be expressed as a sequence of tag switches. Another alternative is to use tag-capable RSVP (see Section 6.4) as a mechanism to distribute tag bindings, and to augment RSVP with the ability to steer the PATH message along a particular (explicit) route. Finally, it is also possible in principle to use some form of source route (e.g., SDRP, GRE) to steer RSVP PATH messages carrying tag bindings along a particular path. Note, however, that this would require a change to the way in which RSVP handles PATH messages, as it would be necessary to store the source route as part of the PATH state.

7. Tag Forwarding Granularities and Forwarding Equivalence Classes

A conventional router has some sort of structure or set of structures which may be called a "forwarding table", which has a finite number of entries. Whenever a packet is received, the router applies a classification algorithm which maps the packet to one of the forwarding table entries. This entry specifies how to forward the packet.

We can think of this classification algorithm as a means of partitioning the universe of possible packets into a finite set of "Forwarding Equivalence Classes" (FECs).

Each router along a path must have some way of determining the next hop for that FEC. For a given FEC, the corresponding entry in the forwarding table may be created dynamically, by operation of the routing protocols (unicast or multicast), or it might be created by configuration, or it might be created by some combination of configuration and protocol.

In tag switching, if a pair of tag switches are adjacent along a tag switched path, they must agree on an assignment of tags to FECs. Once this agreement is made, all tag switches on the tag switched path other than the first are spared the work of actually executing the classification algorithm. In fact, subsequent tag switches need not even have the code which would be necessary to do this.

There are a large number of different ways in which one may choose to partition a set of packets into FECs. Some examples:

1. Consider two packets to be in the same FEC if there is a single address prefix in the routing table which is the longest match for the destination address of each packet;

2. Consider two packets to be in the same FEC if these packets have to traverse a common router/tag switch;

3. Consider two packets to be in the same FEC if they have the same source address and the same destination address;

4. Consider two packets to be in the same FEC if they have the same source address, the same destination address, the same transport protocol, the same source port, and the same destination port;

5. Consider two packets to be in the same FEC if they are alike in some arbitrary manner determined by policy. Note that the assignment of a packet to a FEC by policy need not be done solely by examining the network layer header. One might want, for example, all packets arriving over a certain interface to be classified into a single FEC, so that those packets all get tunnelled through the network to a particular exit point.

Other examples can easily be thought of.

In case 1, the FEC can be identified by an address prefix (as described in Section 6.1). In case 2, the FEC can be identified by the address of a tag switch (as described in Section 6.1). Both 1 and 2 are useful for binding tags to unicast routes - tags are bound to FECs, and an address prefix, or an address, identifies a particular FEC. Case 3 is useful for binding tags to multicast trees that are constructed by protocols such as PIM (as described in Section 6.3). Case 4 is useful for binding tags to individual flows, using, say, RSVP (as described in Section 6.4). Case 5 is useful as a way of connecting two pieces of a private network across a public backbone (without even assuming that the private network is an IP network) (as described in Section 6.5).

Any number of different kinds of FEC can co-exist in a single tag switch, as long as the result is to partition the universe of packets seen by that tag switch. Likewise, the procedures which different tag switches use to classify (hitherto untagged) packets into FECs need not be identical.

Networks could be organized around a hierarchy of FECs. For example, (non-adjacent) tag switches TSa and TSb may classify packets into some set of FECs FEC1,...,FECn. However, from the point of view of the intermediate tag switches between TSa and TSb, all of these FECs may be treated indistinguishably. That is, as far as the intermediate tag switches are concerned, the union of FEC1,...,FECn is a single FEC. Each intermediate tag switch may then prefer to use a single tag for this union (rather than maintaining individual tags for each member of this union). Tag switching accommodates this by providing a hierarchy of tags, organized in a stack.
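
To make the idea concrete, here is a small illustrative sketch (not from the draft) in which two different classifiers - a longest-match classifier (case 1 above) and a five-tuple classifier (case 4 above) - both map packets into FECs, and a single tag table is keyed by FEC regardless of which classifier produced it:

   # Illustrative: two different FEC classifiers feeding one tag table.
   # FECs are represented as hashable tuples; real classifiers would of
   # course operate on parsed packet headers.
   import ipaddress

   prefixes = [ipaddress.ip_network("10.0.0.0/8"), ipaddress.ip_network("10.1.0.0/16")]

   def fec_by_longest_match(packet: dict) -> tuple:     # case 1: FEC = address prefix
       dst = ipaddress.ip_address(packet["dst"])
       best = max((p for p in prefixes if dst in p), key=lambda p: p.prefixlen)
       return ("prefix", str(best))

   def fec_by_flow(packet: dict) -> tuple:              # case 4: FEC = 5-tuple flow
       return ("flow", packet["src"], packet["dst"], packet["proto"],
               packet["sport"], packet["dport"])

   tag_for_fec = {}   # FEC -> tag, filled by whichever control module owns the FEC
   tag_for_fec[("prefix", "10.1.0.0/16")] = 33

   pkt = {"src": "192.0.2.1", "dst": "10.1.2.3", "proto": 6, "sport": 1024, "dport": 80}
   print(tag_for_fec[fec_by_longest_match(pkt)])   # 33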

Much of the power of tag switching arises from the facts that:

- there are so many different ways to partition the packets into FECs,

- different tag switches can partition the hitherto untagged packets in different ways,

- the route to be used for a particular FEC can be chosen in different ways,

- a hierarchy of tags, organized as a stack, can be used to represent the network's hierarchy of FECs.

Note that tag switching does not specify, as an element of any particular protocol, a general notion of "FEC identifier". Even if it were possible to have such a thing, there is no need for it, since there is no "one size fits all" setup protocol which works for any arbitrary combination of packet classifier and routing protocol. That's why tag distribution is sometimes done with TDP, sometimes with BGP, sometimes with PIM, and sometimes with RSVP.

8. Tag switching with ATM

Since the tag switching forwarding paradigm is based on label swapping, and since ATM forwarding is also based on label swapping, tag switching technology can readily be applied to ATM switches by implementing the control component of tag switching.

The tag information needed for tag switching can be carried in the VCI field. If two levels of tagging are needed, then the VPI field could be used as well, although the size of the VPI field limits the size of networks in which this would be practical. However, for most applications of one level of tagging, the VCI field is adequate.

To obtain the necessary control information, the switch should be able to support the tag switching control component. Moreover, if the switch has to perform routing information aggregation, then to support destination-based unicast routing the switch should be able to perform Network Layer forwarding for some fraction of the traffic as well.

Supporting the destination-based routing function with tag switching on an ATM switch may require the switch to maintain not one, but several tags associated with a route (or a group of routes with the same next hop). This is necessary to avoid the interleaving of packets which arrive from different upstream tag switches, but are sent concurrently to the same next hop.

If an ATM switch has built-in mechanism(s) to suppress cell interleave, then the switch could implement the destination-based routing function precisely the way it was described in Section 6.1. This would eliminate the need to maintain several tags per route. Note, however, that suppressing cell interleave is not part of the ATM User Plane, as defined by the ATM Forum.

Yet another alternative that eliminates the need to maintain several tags per route is to carry the tag information in the VPI field, and use the VCI field for identifying cells that were sent by different tag switches. Note, however, that the scalability of this alternative is constrained by the size of the VPI space (4096 tags total). Moreover, this alternative assumes that for a set of ATM tag switches that form a contiguous segment of a network topology there exists a mechanism to assign to each ATM tag switch around the edge of the segment a set of unique VCIs that would be used by this switch alone.
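
The need for several tags per route on an ATM switch can be illustrated with a short sketch (names and tag values are invented): with downstream-on-demand allocation, the switch hands out a distinct incoming tag (VCI) per (route, upstream neighbor) pair, so that cells from different upstream switches are never interleaved on the same outgoing VC.

   # Illustrative sketch: one incoming tag per (route, upstream neighbour)
   # on an ATM tag switch, allocated on demand.
   import itertools

   class AtmTagSwitch:
       def __init__(self):
           self._vcis = itertools.count(33)   # simple VCI/tag allocator
           self.tib = {}                      # (route, upstream) -> incoming tag

       def request_binding(self, route: str, upstream: str) -> int:
           # Downstream-on-demand: allocate a distinct tag per requester.
           key = (route, upstream)
           if key not in self.tib:
               self.tib[key] = next(self._vcis)
           return self.tib[key]

   sw = AtmTagSwitch()
   print(sw.request_binding("10.0.0.0/8", upstream="A"))  # 33
   print(sw.request_binding("10.0.0.0/8", upstream="B"))  # 34 - a separate VC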

The downstream tag allocation on demand scheme is likely to be a preferred scheme for the tag allocation and TIB maintenance procedures with ATM switches, as this scheme allows efficient use of entries in the cross-connect tables maintained by ATM switches.

Implementing tag switching on an ATM switch simplifies the integration of ATM switches and routers. From a routing peering point of view an ATM switch capable of tag switching would appear as a router to an adjacent router; this reduces the number of routing peers a router would have to maintain (relative to the common arrangement where a large number of routers are fully meshed over an ATM cloud). Tag switching enables better routing, as it exposes the underlying physical topology to the Network Layer routing. Finally, tag switching simplifies overall operations by employing common addressing, routing, and management procedures among both routers and ATM switches. That could provide a viable, more scalable alternative to the overlay model. Because creation of tag bindings is driven by control traffic, rather than data traffic, application of this approach to ATM switches does not produce high call setup rates, nor does it depend on the longevity of flows.

Implementing tag switching on an ATM switch does not preclude the ability to support a traditional ATM control plane (e.g., PNNI) on the same switch. The two components, tag switching and the ATM control plane, would operate in a Ships In the Night mode (with VPI/VCI space and other resources partitioned so that the components do not interact).

9. Tag switching migration strategies

Since tag switching is performed between a pair of adjacent tag switches, and since the tag binding information can be distributed on a pairwise basis, tag switching could be introduced in a fairly simple, incremental fashion. For example, once a pair of adjacent routers are converted into tag switches, each of the switches would tag packets destined to the other, thus enabling the other switch to use tag switching. Since tag switches use the same routing protocols as routers, the introduction of tag switches has no impact on routers. In fact, a tag switch connected to a router acts just as a router from the router's perspective.

As more and more routers are upgraded to enable tag switching, the scope of functionality provided by tag switching widens. For example, once all the routers within a domain are upgraded to support tag switching, it becomes possible to start using the hierarchy of routing knowledge function.

10. Summary

In this paper we described the tag switching technology. Tag switching is not constrained to a particular Network Layer protocol - it is a multiprotocol solution. The forwarding component of tag switching is simple enough to facilitate high performance forwarding, and may be implemented on high performance forwarding hardware such as ATM switches. The control component is flexible enough to support a wide variety of routing functions, such as destination-based routing, multicast routing, hierarchy of routing knowledge, and explicitly defined routes. By allowing a wide range of forwarding granularities that could be associated with a tag, we provide both scalable and functionally rich routing.
A combination of a wide range of forwarding granularities and the ability to evolve the control component fairly independently from the forwarding component results in a solution that enables graceful introduction of new routing functionality to meet the demands of a rapidly evolving computer networking environment.

11. Security Considerations

Security considerations are not addressed in this document.

12. Intellectual Property Considerations

Cisco Systems may seek patent or other intellectual property protection for some or all of the technologies disclosed in this document. If any standards arising from this document are or become protected by one or more patents assigned to Cisco Systems, Cisco intends to disclose those patents and license them under openly specified and non-discriminatory terms, for no fee.

13. Acknowledgments

Significant contributions to this work have been made by Anthony Alles, Fred Baker, Paul Doolan, Guy Fedorkow, Jeremy Lawrence, Arthur Lin, Morgan Littlewood, Keith McCloghrie, and Dan Tappan.

14. References

15. Authors' Addresses

Yakov Rekhter
Cisco Systems, Inc.
170 Tasman Drive
San Jose, CA, 95134
E-mail: yakov@cisco.com

Bruce Davie
Cisco Systems, Inc.
250 Apollo Drive
Chelmsford, MA, 01824
E-mail: bsd@cisco.com

Dave Katz
Juniper Networks
3260 Jay Street
Santa Clara, CA 95051
E-mail: dkatz@jnx.com

Eric Rosen
Cisco Systems, Inc.
250 Apollo Drive
Chelmsford, MA, 01824
E-mail: erosen@cisco.com

George Swallow
Cisco Systems, Inc.
250 Apollo Drive
Chelmsford, MA, 01824
E-mail: swallow@cisco.com

Dino Farinacci
Cisco Systems, Inc.
170 West Tasman Drive
San Jose, CA 95134
E-mail: dino@cisco.com