[Isis-wg] Clarification needed on domain-wide prefix distribution
Quaizar Vohra
qv@juniper.net
Tue, 5 Oct 1999 12:57:16 -0700 (PDT)
>
> Let's steer the recent threads a little bit here:
>
> 1. draft-ietf-isis-domain-wide solves one single problem for deployments that don't have the need to migrate towards
> TE extensions or don't want a "flag-day" solution but want to preserve 1195 compatibility. So discussion about TE
> draft and this draft are orthogonal.
> 2. I'm attaching the last version of the domain-wide draft to this mail that Henk didn't post yet so people can get on
> the same page. This is _not_ an official draft but a floating version between authors that has been stable for a
> while. Route preferences are clearly stated. Henk, please submit the new version ASAP. Thank you.
> 3. I'm against using bit 8 in draft-ietf-isis-domain-wide since
> this will break 1195.
It breaks rfc 1195 as much as setting the I/E bit in TLV 128.
> 4. the numbering discussion is getting very confusing and I definitely got suspicious after a 9th and higher route
> type started to show up. There are 2 levels of 2 TLVs (128 and 130) with 2 bit combinations (I/E = 0 or
> 1), so the maximum number of different route types is 2^3 = 8, and from there the encoding forces you to not differentiate
> things anymore! I would strongly suggest to align the terminology to something saying Level:TLV Type:Bit like
> written in the attached draft to clarify the discussion, otherwise we'll name the same bit combinations with many
> different names (and if we do, let's update the table in the draft so one can translate easily). With preferences,
> Henk was pretty much on the point except that I think that the fact was missed that L2,128,Internal routes MUST be
> preferred over anything level 1 external (of course only of interest for L1L2 and L2-only routers), otherwise we
> have external metric types overriding the internal ones and that has to lead to routing loops. I'd suggest to refer
> to the figure specifying the preference across the 8 types and let's discuss it from there. Route preference is
> absolutely critical (as are the 3 rules) so let's get that one
> nailed clearly.
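The counting argument in item 4 can be checked with a short sketch. It assumes nothing beyond what the item states: two levels, two TLVs (128 and 130), and two values of the I/E bit. The label format follows the suggested Level:TLV:Bit naming; the function name is only illustrative.

```python
from itertools import product

LEVELS = ("L1", "L2")
TLVS = (128, 130)
IE_BITS = ("I", "E")  # I/E = 0 (internal) or 1 (external)


def route_types():
    """Return every Level:TLV:Bit label the encoding can express."""
    return [f"{level}:{tlv}:{bit}"
            for level, tlv, bit in product(LEVELS, TLVS, IE_BITS)]


types = route_types()
print(len(types))
for label in types:
    print(label)
```

Any ninth route type therefore has to share one of these labels with an existing type, which is the point being made about the numbering discussion.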
Well, if one starts distinguishing routes which are leaked down, i.e.
from L2 to L1, you end up getting 4 more types of routes. The domain
wide draft tries to create the leaking down distinction by
distorting the external vs. internal route distinction. Using the 8th bit
does that more cleanly by not obliterating old semantics.
The route preference I had come up with agrees exactly with
what you are saying, that internal routes be preferred over external
routes.
Quaizar
> 5. critique on OSPF does not contribute to the quality of the discussions on this list :-)
>
> --
> thanks
>
> --- tony
>
>
>
>
> Network Working Group Tony Li
> INTERNET DRAFT Juniper Networks
>
> Tony Przygienda
> Bell Labs
>
> Henk Smit
> Cisco Systems
> April 1999
>
>
> Domain-wide Prefix Distribution with Multi-Level IS-IS
>
> <draft-ietf-isis-domain-wide-01.txt>
>
>
> Status
>
> This document is an Internet-Draft and is in full conformance with
> all provisions of Section 10 of RFC2026.
>
> Internet-Drafts are working documents of the Internet Engineering
> Task Force (IETF), its areas, and its working groups. Note that
> other groups may also distribute working documents as Internet-
> Drafts.
>
> Internet-Drafts are draft documents valid for a maximum of six months
> and may be updated, replaced, or obsoleted by other documents at any
> time. It is inappropriate to use Internet-Drafts as reference
> material or to cite them other than as "work in progress."
>
> The list of current Internet-Drafts can be accessed at
> http://www.ietf.org/ietf/1id-abstracts.txt
>
> The list of Internet-Draft Shadow Directories can be accessed at
> http://www.ietf.org/shadow.html.
>
>
> 1.0 Abstract
>
> This document describes extensions to the IS-IS protocol to support
> optimal routing within a multi-level domain. The IS-IS protocol is
> specified in ISO 10589 [1], with extensions for supporting IPv4
> specified in RFC 1195 [2].
>
> This document extends the semantics presented in RFC 1195 so that a
> routing domain running with both Level 1 and Level 2 Intermediate
> Systems (IS) [routers] can distribute IP prefixes between Level 1 and
> Level 2 and vice versa. This distribution requires certain
> restrictions to ensure that persistent forwarding loops do not form.
> The goal of this domain-wide prefix distribution is to increase the
> granularity of the routing information within the domain.
>
>
> 2.0 Introduction
>
> An IS-IS routing domain (a.k.a., an autonomous system running IS-IS)
> can be partitioned into multiple level 1 (L1) areas, and a level 2
> (L2) connected subset of the topology that interconnects all of the
> L1 areas. Within each L1 area, all routers exchange link state
> information. L2 routers also exchange L2 link state information to
> compute routes between areas.
>
> RFC 1195 [2] defines the Type, Length and Value (TLV) tuples that are
> used to transport IPv4 routing information in IS-IS. RFC 1195 also
> specifies the semantics and procedures for interactions between
> levels. Specifically, routers in a L1 area will exchange information
> within the L1 area. For IP destinations not found in the prefixes in
> the L1 database, the L1 router should forward packets to the nearest
> router that is in both L1 and L2 (i.e., an L1L2 router) with the
> 'attach' bit set in its L1 Link State Protocol Data Unit (LSP).
>
> Also per RFC 1195, an L1L2 router should be manually configured with
> a set of prefixes that summarize the IP prefixes found in that L1
> area. These summaries are injected into L2. RFC 1195 specifies no
> further interactions between L1 and L2 for IPv4 prefixes.
>
>
> 2.1 Motivations for domain-wide prefix distribution
>
> The mechanisms specified in RFC 1195 are appropriate in many
> situations, and lead to excellent scalability properties. However,
> in certain circumstances, the domain administrator may wish to
> sacrifice some amount of scalability and distribute more specific
> information than is described by RFC 1195. This section discusses
> the various reasons why the domain administrator may wish to make
> such a tradeoff.
>
> One major reason for distributing more prefix information is to
> improve the quality of the resulting routes. A well-known property of
> prefix summarization or any abstraction mechanism is that it
> necessarily results in a loss of information. This loss of
> information in turn results in the computation of a route based upon
> less information, which will frequently result in routes that are not
> optimal.
>
> A simple example can serve to demonstrate this adequately. Suppose
> that a L1 area has two L1L2 routers that both advertise a single
> summary of all prefixes within the L1 area. To reach a destination
> inside the L1 area, any other L2 router is going to compute the
> shortest path to one of the two L1L2 routers for that area. Suppose,
> for example, that both of the L1L2 routers are equidistant from the
> L2 source, and that the L2 source arbitrarily selects one L1L2
> router. This router may not be the optimal router when viewed from
> the L1 topology. In fact, it may be the case that the path from the
> selected L1L2 router to the destination router may traverse the L1L2
> router that was not selected. If more detailed topological
> information or more detailed metric information was available to the
> L2 source router, it could make a more optimal route computation.
>
> This situation is symmetric in that an L1 router has no information
> about prefixes in L2 or within a different L1 area. In using the
> nearest L1L2 router, that L1L2 is effectively injecting a default
> route without metric information into the L1 area. The route
> computation that the L1 router performs is similarly suboptimal.
>
> Besides the optimality of the routes computed, there are two other
> significant drivers for the domain-wide distribution of prefix
> information.
>
> When a router learns multiple possible paths to external destinations
> via BGP, it will select only one of those routes to be installed in
> the forwarding table. One of the factors in the BGP route selection
> is the IGP cost to the BGP next hop address. Many ISP networks
> depend on this technique, which is known as "shortest exit routing".
> If a L1 router does not know the exact IGP metric to all BGP speakers
> in other L1 areas, it cannot do effective shortest exit routing.
>
> The third driver is the current practice of using the IGP (IS-IS)
> metric as part of the BGP Multi-Exit Discriminator (MED). The value
> in the MED is advertised to other domains and is used to inform other
> domains of the optimal entry point into the current domain. Current
> practice is to take the IS-IS metric and insert it as the MED value.
> This tends to cause external traffic to enter the domain at the point
> closest to the exit router. Note that the receiving domain may,
> based upon policy, choose to ignore the MED that is advertised.
> However, current practice is to distribute the IGP metric in this way
> in order to optimize routing wherever possible. This is possible in
> current networks that consist of only a single area, but becomes
> problematic if hierarchy is to be introduced into the network. This is again
> because the loss of end-to-end metric information means that the MED
> value will not reflect the true distance across the advertising
> domain. Full distribution of prefix information within the domain
> would alleviate this problem as it would allow accurate computation
> of the IS-IS metric across the domain, resulting in an accurate value
> presented in the MED.
>
>
> 2.2 Scalability
>
> The disadvantage to performing the domain-wide prefix distribution
> described above is that it has an impact on the scalability of IS-IS.
> Areas within IS-IS help scalability in that LSPs are contained within
> a single area. This limits the size of the link state database, which
> in turn limits the complexity of the shortest path computation.
>
> Further, the summarization of the prefix information aids scalability
> in that the abstraction of the prefix information reduces the
> number of data items to be transported and the number of routes to be
> computed.
>
> It should be noted quite strongly that the distribution of prefixes
> on a domain-wide basis impacts the scalability of IS-IS in the second
> respect. It will increase the number of prefixes throughout the
> domain. This will result in increased memory consumption,
> transmission requirements and computation requirements throughout the
> domain.
>
> It must also be noted that the domain-wide distribution of prefixes
> has no effect whatsoever on the first aspect of scalability, namely
> the existence of areas and the limitation of the distribution of the
> link state database.
>
> Thus, the net result is that the introduction of domain-wide prefix
> distribution into a formerly flat, single area network is a clear
> benefit to the scalability of that network. However, it is a
> compromise and does not provide the maximum scalability available
> with IS-IS. Domains that choose to make use of this facility should
> be aware of the tradeoff that they are making between scalability and
> optimality and provision and monitor their networks accordingly.
> Normal provisioning guidelines that would apply to a fully
> hierarchical deployment of IS-IS will not apply to this type of
> configuration.
>
>
> 3.0 New semantics for external type metrics
>
> RFC 1195 defines two TLVs for carrying IP prefixes. TLV 128 is
> defined to carry 'internal' prefixes and TLV 130 is defined to carry
> 'external' prefixes. The original intent of RFC 1195 was to carry
> intra-domain routes within the internal prefix TLV and inter-domain
> routes or intra-domain routes from alternate IGPs in an external
> prefix TLV. Interestingly, TLV type 130 is not documented to exist
> in Level 1 LSPs.
>
> In addition to this distinction, RFC 1195 provides for a bit in each
> of these TLVs that distinguishes between an internal metric type and
> an external metric type. Similarly, the clear intent was that the
> internal metric type should reflect a total metric that is the sum of
> the metrics to the advertising router plus the metric to the prefix.
> Further, for an external metric type, the total metric should simply
> be the metric advertised to the prefix, not including the total
> metric necessary to reach the exit router. Prefixes with internal
> metrics are always preferred over external metrics, regardless of the
> value of the metrics.
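A minimal sketch of the two metric types described in the paragraph above. The class and field names are illustrative assumptions and not part of RFC 1195; the logic is only what the text states: an internal metric adds the cost to the advertising router, an external metric does not, and internal always wins regardless of value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    cost_to_advertiser: int  # SPF cost to the router advertising the prefix
    advertised_metric: int   # metric carried with the prefix in the TLV
    external_metric: bool    # True when the I/E bit marks the metric external


def total_metric(c):
    """Internal: path cost plus advertised metric. External: advertised only."""
    if c.external_metric:
        return c.advertised_metric
    return c.cost_to_advertiser + c.advertised_metric


def preference_key(c):
    # False sorts before True, so internal metric types always come first;
    # the metric value only breaks ties within the same metric type.
    return (c.external_metric, total_metric(c))


def best(candidates):
    return min(candidates, key=preference_key)


internal = Candidate(cost_to_advertiser=90, advertised_metric=10, external_metric=False)
external = Candidate(cost_to_advertiser=90, advertised_metric=5, external_metric=True)
print(total_metric(internal), total_metric(external))
print(best([external, internal]) is internal)
```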
>
> It should be noted that the combination of an internal prefix with an
> external metric type is not obviously useful, and is not allowed by
> RFC 1195.
>
> It should also be noted that as of this writing, the author knows of
> no deployed implementations that make use of either the external
> prefix or the external metric type. The implication is that this
> proposal is free to redefine the semantics of the external metric
> type bit without conflicting with existing protocol deployment.
>
> An essential property when redistributing prefixes between levels is
> to ensure that no persistent loops form in the distribution of
> information (i.e., a routing loop), as this would lead to the
> indefinite propagation of the information, even in the event that the
> information was no longer originated by some system in the domain.
> Further, a routing loop is likely to form a forwarding loop, where
> actual traffic traverses the network in a cycle in the topology.
> Forwarding loops are known to consume large amounts of resources and
> are to be avoided.
>
>
> 3.1 Proposed semantics for inter-area routes
>
> To provide the above properties, this proposal defines the following
> syntax and semantics. The encoding is being extended compared to
> RFC1195 and is almost symmetric.
>
> An intra-area route is a route computed based on a prefix advertised
> by some IS-IS router in the area. Thus, a prefix advertised in the
> L1 link state database may become a L1 intra-area route within the
> area of the advertiser. Similarly, a prefix advertised in the L2
> link state database may become a L2 intra-area route within L2.
> Prefixes associated with an intra-area route are also said to be
> intra-area prefixes. These types of prefixes are being encoded in the
> same way as in RFC1195 within TLV 128 using internal metric types.
>
> An inter-area route is a route computed based on a prefix advertised
> by an IS-IS router not in the local area. Inter-area routes exist
> either in L2, in which case they are L1->L2 inter-area routes, or in
> L1, in which case they are L2->L1 inter-area routes. Prefixes
> associated with an inter-area route are also said to be inter-area
> prefixes. Such prefixes are encoded within L1 within TLV 128 using
> external metric type which is an extension to RFC 1195. Within L2
> these prefixes are indistinguishable from L2 intra-area routes and
> are therefore encoded within TLV 128 with internal metrics.
>
> External prefixes are reserved for prefixes originating outside of
> the IS-IS system, usually learned from another routing protocol. To
> allow for leaking of routes originating outside the IS-IS domain
> within L1, TLV 130 is allowed in L1 with both internal and external
> metric types. Observe that introducing externals in L1 makes sense
> since those can be leaked into L2 and into other L1 areas.
>
> The following tables describe all types of prefixes now defined
> within IS-IS and how they are encoded:
>
> Level-1 LSPs         | Internal TLV (128) | External TLV (130) |
> ----------------------------------------------------------------
> Internal metric-type | L1 intra-area      | external           |
> ----------------------------------------------------------------
> External metric-type | L2->L1 inter-area  | external           |
> ----------------------------------------------------------------
>
>
> Level-2 LSPs         | Internal TLV (128) | External TLV (130) |
> ----------------------------------------------------------------
> Internal metric-type | L2 intra-area or   | external           |
>                      | L1->L2 inter-area  |                    |
> ----------------------------------------------------------------
> External metric-type | should not exist   | external           |
> ----------------------------------------------------------------
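The two encoding tables can be written down as a lookup, which makes the one combination that "should not exist" explicit. This is only a sketch: the dictionary layout and the function name `classify` are assumptions made for illustration.

```python
# (level, TLV, metric type) -> prefix type, copied from the two tables.
ENCODING = {
    ("L1", 128, "I"): "L1 intra-area",
    ("L1", 128, "E"): "L2->L1 inter-area",
    ("L1", 130, "I"): "external",
    ("L1", 130, "E"): "external",
    ("L2", 128, "I"): "L2 intra-area or L1->L2 inter-area",
    ("L2", 128, "E"): None,  # "should not exist" in the Level-2 table
    ("L2", 130, "I"): "external",
    ("L2", 130, "E"): "external",
}


def classify(level, tlv, metric_type):
    """Return the prefix type for an encoding, or raise if it should not exist."""
    kind = ENCODING[(level, tlv, metric_type)]
    if kind is None:
        raise ValueError(f"{level}:{tlv}:{metric_type} should not exist")
    return kind


print(classify("L1", 128, "E"))
print(classify("L2", 128, "I"))
```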
>
> Based on these definitions and encodings, this proposal defines the
> following redistribution rules:
>
> 1) Only L1 intra-area prefixes and external prefixes are
> redistributed from L1 into L2.
>
> 2) All prefixes can be redistributed from L2 into L1 and become
> L2->L1 inter-area routes. A L2 prefix must not be
> redistributed into a L1 area if that same prefix exists as L1
> intra-area route or L1 external route to prevent routing loops.
>
> 3) Within L1, an intra-area prefix is preferred over an inter-area
> prefix, regardless of the comparison of the metrics.
>
> Based on these rules, we first observe that this proposal is free
> from routing loops. No prefix can be redistributed from L2 to L1 and
> back into L2, because the route first becomes an L1 inter-area prefix
> by rule (2) and by rule (1) cannot be redistributed into L2.
> Similarly, a prefix redistributed from L1 to L2 becomes an L2 inter-
> area prefix by rule (1) but will not be redistributed into the
> original L1 area by rule (2).
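The loop-freedom argument can be traced with a small sketch of rules (1) and (2). Everything here is an illustrative assumption (the function names, the dictionary that stands in for an L1 area's prefix table, and the example prefix); it only mirrors the two rules as worded in the draft.

```python
INTRA = "L1 intra-area"
INTER = "L2->L1 inter-area"
EXTERNAL = "external"


def may_leak_up(kind):
    """Rule (1): only L1 intra-area and external prefixes go from L1 into L2."""
    return kind in (INTRA, EXTERNAL)


def may_leak_down(prefix, l1_area):
    """Rule (2): do not leak an L2 prefix into an L1 area that already holds
    the same prefix as an L1 intra-area or L1 external route.

    l1_area maps a prefix to the set of route kinds present in that area.
    """
    return not (l1_area.get(prefix, set()) & {INTRA, EXTERNAL})


def leak_down(prefix, l1_area):
    """Apply rule (2); a leaked prefix is recorded as L2->L1 inter-area."""
    if may_leak_down(prefix, l1_area):
        l1_area.setdefault(prefix, set()).add(INTER)
        return True
    return False


area_x = {"10.1.0.0/16": {INTRA}}  # prefix originated inside area X
area_y = {}                         # area Y has no route for it yet

print(may_leak_up(INTRA))                # area X may advertise it into L2
print(leak_down("10.1.0.0/16", area_x))  # it is not leaked back into area X
print(leak_down("10.1.0.0/16", area_y))  # it is leaked into area Y
print(may_leak_up(INTER))                # and cannot return to L2 from area Y
```

Neither direction of leaking can complete a round trip, which is the observation made in the paragraph above.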
>
> Even when following all the indicated rules, there is the possibility
> of a transient routing loop when the original prefix is withdrawn and
> the inter-area prefix is selected. However, all link state protocols
> are subject to transient routing loops, so this is no worse than the
> status quo.
>
> Note that this proposal is not radically different from the current
> semantics for RFC 1195: internal metric types are always preferred
> over externals, so rule (3) is an extension that allows external
> metric types in internal prefix TLVs. It does not introduce a new
> comparison between internal and external metric values. Expressed in
> a more abstract way by introducing a relationship denoted by '>' that
> stands for 'is preferred over', all the possible routes can be
> ordered unambiguously, starting with intra-area L1 route and ending
> with external L2 route with external metric. The routes in the
> following figures are denoted in the notation <level:tlv:value of I/E
> bit>.
>
>
> <-- more preferred ---
>
> L1:128:I>L2:128:I>L1:128:E>L1:130:I>L2:130:I>L1:130:E>L2:130:E
>
> --- less preferred -->
>
>
> Rule (2) asserts that a route cannot override an already present,
> more preferred route in a L1 area when being leaked into it.
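The ordering in the figure can be applied mechanically. The sketch below assumes routes are given as (label, metric) pairs in the <level:tlv:value of I/E bit> notation; the names `PREFERENCE`, `RANK` and `select` are illustrative. The route type decides first, and the metric only matters between routes of the same type.

```python
# Most preferred first, copied from the figure.
PREFERENCE = [
    "L1:128:I", "L2:128:I", "L1:128:E",
    "L1:130:I", "L2:130:I", "L1:130:E", "L2:130:E",
]
RANK = {label: position for position, label in enumerate(PREFERENCE)}


def select(routes):
    """Pick the best (label, metric) pair: route type first, then metric."""
    return min(routes, key=lambda route: (RANK[route[0]], route[1]))


candidates = [("L1:128:E", 10), ("L2:128:I", 200), ("L2:130:I", 1)]
print(select(candidates))
```

With these candidates the L2:128:I route is chosen even though its metric is the largest, which matches the requirement that L2 internal routes beat anything L1 inter-area or external.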
>
>
> 3.2 Transition issues
>
> Because no implementations currently make use of the external metric
> type, the deployment of prefixes with an external metric type is
> somewhat problematic. There is the possibility that the new type of
> advertisement may result in software instability in systems that do
> not deal with even the original semantics correctly. Further, there
> is a danger that haphazard deployment of systems supporting this
> proposal and legacy systems would have an unfortunate interaction.
> It is required, for any L1 area that should perform the mutual
> redistribution described in this proposal, that the L1L2 systems be
> updated first. If these systems operate correctly, this is
> sufficient to ensure that there are no persistent routing loops. In
> the case where not all L1L2 systems are being upgraded, persistent
> routing loops are possible. Consider the following figure that gives
> an example of such a looping scenario with a setup where (A) has been
> upgraded and (B) has not.
>
>     route through L2
>     to 1/8 at cost 200
>        |
>        |     +-- L2 link cost 1 --+
>        |     |                    |
>        | computes 1/8             |
>        | at 64 through (B)        |
>       [1]    |                    |
>        V     V                    ^
>     +--+--------+           +-----+-----+
>     | new style |           | old style |
> (A) | L1/L2     |           | L1/L2     | (B)
>     | leaks 1/8 |           | leaks up  |
>     | at 63 Ext |           | at 63 Ext |
>     +----+------+           +-------+---+
>          V                          ^
>          |                          |
>          |                  computes L1 route
>          |                   1/8 at cost 128
>          |                          |
>          +---- L1 link with cost 1 -+
>
> Originally a prefix 1/8 in L2 with a cost of 200 is being computed by
> upgraded L1L2 router (A) as best route towards 1/8 through interface
> 1. The prefix leaks at maximum cost of 63 (and with the I/E bit being
> set) into L1 domain and is used by L1L2 router (B) (which has not
> been upgraded) to compute best route to 1/8 at cost 128 in L1. We
> assume that (B) is not masking the I/E bit out but is using it as
> part of the metric, however the scenario holds as well in case (B)
> believes the metric to be 63. This L1 route will be obviously
> preferred by (B) to a computed L2 route. Assuming that (B) leaks 1/8
> into L2 domain again, (A) will use it for another L2 computation that
> ends up with a shorter L2 route to 1/8 through (B) and probably leaks
> it again into L1. Hence, a routing and forwarding loop has been
> formed.
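The numbers in this scenario follow from how the metric octet is read. The sketch below assumes the I/E flag has bit value 0x40 above a 6-bit metric, which is what the step from 63 to 127 in the text implies; the helper names are illustrative only.

```python
IE_BIT = 0x40       # assumed position of the I/E flag in the metric octet
METRIC_MASK = 0x3F  # 6-bit metric, maximum 63


def encode_default_metric(metric, external):
    """Build the metric octet from a 6-bit metric and the I/E flag."""
    if not 0 <= metric <= METRIC_MASK:
        raise ValueError("metric must fit in 6 bits")
    return metric | (IE_BIT if external else 0)


def read_masking_ie(octet):
    """Upgraded router: separate the flag from the metric."""
    return octet & METRIC_MASK, bool(octet & IE_BIT)


def read_without_masking(octet):
    """Legacy router (B) as assumed in the text: the flag counts as metric."""
    return octet & 0x7F


octet = encode_default_metric(63, external=True)  # what (A) leaks into L1
cost_at_b = read_without_masking(octet) + 1       # plus the L1 link of cost 1
cost_at_a_via_b = 63 + 1                          # (B) leaks up at 63, L2 link cost 1

print(read_masking_ie(octet))
print(cost_at_b)
print(cost_at_a_via_b < 200)  # (A) now prefers the path through (B)
```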
>
> As described in the previous section, rule (2) must be followed to
> prevent looping when this extension is deployed using L1 routers
> understanding the semantics of the L1 inter-area route mixed with L1
> routers that treat all routes within 128 TLV as intra-area. The
> following example visualizes a forwarding loop encountered in a
> scenario where (A) leaks an L1 inter-area route despite the fact that
> (D) advertises the same route as L1 intra-area.
>
>                  route through L2 to
>                  1/8 at cost 200
>                           /
>                          /
>                         v
>                     +============+
>                     | L1/L2      |  routing table for 1/8
>                     | leaking    |  (top is active route)
>                 (A) | 1/8 down   |    L1 at 131 active
>                     | with cost  |    L2 at 200
>                     | of 63 as   |    L1 inter-area at 127
>                     | inter-area |
>                     +====+====+-=+
>                          |     \
>                          |      \
>                          |       \  cost 1
>                          |        \
>                  cost 1  |       +-+--+   routing table
>                          |   (C) | L1 |    inter-area L1 at 128 active
>                          |       +---++    L1 at 130
>                          |            \
>                          |             \
>                          |          path with
>                          |          total cost
>     routing table    +---+-+          of 130
>     inter-area L1    | L1  | (B)           \
>     at 128 active    +---+-+                \
>     L1 at 131            |                   \
>                          |                    \
>                          |                   +-+--+
>                          +- path with -------+ L1 | advertises
>                             total cost       +----+ 1/8 as L1
>                             of 131            (D)   attached
>                                                     prefix
>
> (A) is an upgraded L1L2 router and leaks into L1 a prefix 1/8 that it
> computed through L2 at the maximum cost of 127 (or expressed
> differently, as inter-area with cost 63), which violates rule (2).
> Routers (B), (C), and (D) are all purely RFC1195 compliant routers, so
> they perceive the leaked prefix as internal L1. At the same time, (D)
> advertises the same prefix 1/8 as an L1 directly attached subnet into
> L1. To distinguish the different copies, the leaked prefix is shown
> as inter-area L1. (B) computes the inter-area L1 route at a cost of
> 127+1 and prefers it to the one through (D) since such a route has a
> cost of 131 and (B) does not understand that L1 intra-area routes
> should always be preferred over L1 inter-area routes. Therefore (B)
> forwards a packet to 1/8 towards (A). (A) cannot prefer the inter-area
> L1 route since it could not really forward using it but has to
> use L2 to get the packet into the L2 backbone. However, it must
> prefer the L1 route computed to (D) over the L2 route. Hence, (A)
> forwards the packet towards (C). (C) has the L1 inter-area route as
> preferred (since it looks to it like a cheaper L1 intra-area route to
> 1/8 than the L1 route through (D)) and forwards the packet back to (A).
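The forwarding decisions in this example can be traced hop by hop. The sketch takes the costs from the figure for the legacy routers (B) and (C), and hard-codes the choice of (A) as the text describes it (it forwards towards (C)). The table layout and function names are assumptions made for illustration.

```python
# Cost to 1/8 per next hop, as (B) and (C) see it in the figure.
LEGACY_CANDIDATES = {
    "B": {"A": 128, "D": 131},
    "C": {"A": 128, "D": 130},
}


def next_hop(router):
    if router == "A":
        return "C"   # (A) uses its L1 route towards (D), which runs via (C)
    if router == "D":
        return None  # (D) has the prefix directly attached
    costs = LEGACY_CANDIDATES[router]
    return min(costs, key=costs.get)  # legacy routers just take the cheapest


def trace(start, max_hops=16):
    """Follow next hops; report the path and whether a router repeats."""
    path = [start]
    while len(path) <= max_hops:
        hop = next_hop(path[-1])
        if hop is None:
            return path, False
        looped = hop in path
        path.append(hop)
        if looped:
            return path, True
    return path, False


path, looped = trace("B")
print(" -> ".join(path), "(loop)" if looped else "(delivered)")
```

Starting at (B), the packet reaches (A), goes to (C), and is sent back to (A), which is the forwarding loop described above.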
>
> Finally, it should be obvious that if a deployment uses L1 external
> routes in an area that still contains strictly RFC1195 compliant
> routers, forwarding loops can be easily formed. Routers that are able
> to interpret TLV 130 in L1 may try to forward towards the announcer
> of such external routes whereas RFC1195 compliant routers will always
> forward towards the closest, connected L2 capable router.
>
>
> 4.0 Comparisons with other proposals
>
> Another proposal is currently being discussed which is similar to
> this one in nature.
>
> In [3], a new TLV is proposed to transport IP prefix information.
> Because this is a new TLV, it is somewhat harder to deploy, requiring
> that all systems understand the new TLV before it can become
> effective. For this reason, this proposal provides an alternative
> that can be deployed sooner. There is no effective semantic
> difference between the two proposals. In [3], a bit is defined to
> mark a prefix as 'up' or 'down'. This is essentially the same
> semantics as is proposed here.
>
> 5.0 Security Considerations
>
> This document raises no new security issues for IS-IS.
>
> 6.0 References
>
> [1] ISO 10589, "Intermediate System to Intermediate System Intra-
> Domain Routeing Exchange Protocol for use in Conjunction with the
> Protocol for Providing the Connectionless-mode Network Service (ISO
> 8473)" [Also republished as RFC 1142]
>
> [2] RFC 1195, "Use of OSI IS-IS for routing in TCP/IP and dual
> environments", R.W. Callon, Dec. 1990
>
> [3] Smit, H., Li, T. "IS-IS extensions for Traffic Engineering",
> draft-ietf-isis-traffic-00.txt, work in progress
>
> 7.0 Authors' Addresses
>
> Tony Li
> Juniper Networks, Inc.
> 385 Ravendale Dr.
> Mountain View, CA 94043
> Email: tli@juniper.net
> Fax: +1 650 526 8001
> Voice: +1 650 526 8006
>
> Tony Przygienda
> Bell Labs, Lucent Technologies
> 101 Crawfords Corner Road
> Holmdel, NJ 07733-3030
> Email: prz@dnrc.bell-labs.com
>
> Henk Smit
> Cisco Systems, Inc.
> 210 West Tasman Drive
> San Jose, CA 95134
> Email: hsmit@cisco.com
> Voice: +31 20 342 3736
>
>