North-Bound Distribution of Link-State and TE Information using BGP

Juniper Networks, Inc.
1194 N. Mathilda Ave.
Sunnyvale, CA 94089
US
hannes@juniper.net

Cisco Systems, Inc.
170, West Tasman Drive
San Jose, CA 95134
US
jmedved@cisco.com

Cisco Systems, Inc.
Via Del Serafico, 200
Rome 00142
Italy
sprevidi@cisco.com

Juniper Networks, Inc.
1194 N. Mathilda Ave.
Sunnyvale, CA 94089
US
afarrel@juniper.net

Cisco Systems, Inc.
170, West Tasman Drive
San Jose, CA 95134
US
sairay@cisco.com

Inter-Domain Routing

In a number of environments, a component external to a network is
called upon to perform computations based on the network topology and
current state of the connections within the network, including traffic
engineering information. This is information typically distributed by IGP
routing protocols within the network.

This document describes a mechanism by which link-state and traffic
engineering information can be collected from networks and shared with
external components using the BGP routing protocol. This is achieved using
a new BGP Network Layer Reachability Information (NLRI) encoding
format. The mechanism is applicable to physical and virtual IGP links. The
mechanism described is subject to policy control.

Applications of this technique include Application Layer Traffic
Optimization (ALTO) servers and Path Computation Elements (PCEs).

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119.

The contents of a Link State Database (LSDB) or a Traffic Engineering
Database (TED) have the scope of an IGP area. Some applications, such as
end-to-end Traffic Engineering (TE), would benefit from visibility outside
one area or Autonomous System (AS) in order to make better decisions.

The IETF has defined the Path Computation Element (PCE) as a
mechanism for achieving the computation of
end-to-end TE paths that cross the visibility of more than one TED or
which require CPU-intensive or coordinated computations. The IETF has also
defined the ALTO Server as an entity that
generates an abstracted network topology and provides it to network-aware
applications.

Both a PCE and an ALTO Server need to gather information about the
topologies and capabilities of the network in order to be able to fulfill
their function.

This document describes a mechanism by which link-state and TE
information can be collected from networks and shared with external
components using the BGP routing protocol. This is achieved using a
new BGP Network Layer
Reachability Information (NLRI) encoding format. The mechanism is
applicable to physical and virtual links. The mechanism described is
subject to policy control.

A router maintains one or more databases for storing link-state
information about nodes and links in any given area. Link attributes
stored in these databases include: local/remote IP addresses, local/remote
interface identifiers, link metric and TE metric, link bandwidth,
reservable bandwidth, per CoS class reservation state, preemption and
Shared Risk Link Groups (SRLG). The router's BGP process can retrieve
topology from these LSDBs and distribute it to a consumer, either directly
or via a peer BGP Speaker (typically a dedicated Route Reflector), using
the encoding specified in this document.

The collection of link-state and TE information and its
distribution to consumers is shown in the following figure.

A BGP Speaker may apply configurable policy to the information that it
distributes. Thus, it may distribute the real physical topology from the
LSDB or the TED. Alternatively, it may create an abstracted topology,
where virtual, aggregated nodes are connected by virtual paths. Aggregated
nodes can be created, for example, out of multiple routers in a
POP. Abstracted topology can also be a mix of physical and virtual nodes
and physical and virtual links. Furthermore, the BGP Speaker can apply
policy to determine when information is updated to the consumer, so as
to reduce the flow of information from the network to the
consumers. Mechanisms through which topologies can be aggregated or
virtualized are outside the scope of this document.

This section describes use cases from which the requirements can be
derived.

As described in a PCE can be used to
compute MPLS-TE paths within a "domain" (such as an IGP area) or
across multiple domains (such as a multi-area AS, or multiple ASes).
Within a single area, the PCE offers enhanced computational power
that may not be available on individual routers, sophisticated policy
control and algorithms, and coordination of computation across the
whole area.

If a router wants to compute an MPLS-TE path across IGP areas, then its
own TED lacks visibility of the complete topology. That means that the
router cannot determine the end-to-end path, and cannot even select
the right exit router (Area Border Router - ABR) for an optimal
path. This is an issue for large-scale networks that need to segment
their core networks into distinct areas, but still want to take
advantage of MPLS-TE.

Previous solutions used per-domain path computation. The source
router could only compute the path
for the first area because the router only has full topological
visibility for the first area along the path, but not for subsequent
areas. Per-domain path computation uses a technique called
"loose-hop-expansion" , and selects the
exit ABR and other ABRs or AS Border Routers (ASBRs) using the IGP
computed shortest path topology for the remainder of the path. This may
lead to sub-optimal paths, makes alternate/back-up path computation
hard, and might result in no TE path being found when one really does
exist.

The PCE presents a computation server that may have visibility into
more than one IGP area or AS, or may cooperate with other PCEs to
perform distributed path computation. The PCE obviously needs access to
the TED for the area(s) it serves, but
does not describe how this is achieved. Many implementations make the
PCE a passive participant in the IGP so that it can learn the latest
state of the network, but this may be sub-optimal when the network is
subject to a high degree of churn, or when the PCE is responsible for
multiple areas.

The following figure shows how a PCE can get its TED information
using the mechanism described in this document.

The mechanism in this document allows the necessary TED information
to be collected from the IGP within the network, filtered according to
configurable policy, and distributed to the PCE as necessary.

An ALTO Server is an entity that
generates an abstracted network topology and provides it to
network-aware applications over a web service based API. Example
applications are p2p clients or trackers, or CDNs. The abstracted
network topology comes in the form of two maps: a Network Map that
specifies allocation of prefixes to Partition Identifiers (PIDs), and a
Cost Map that specifies the cost between PIDs listed in the Network
Map. For more details, see .

ALTO abstract network topologies can be auto-generated from the
physical topology of the underlying network. The generation would
typically be based on policies and rules set by the operator. Both
prefix and TE data are required: prefix data is required to generate
ALTO Network Maps, TE (topology) data is required to generate ALTO Cost
Maps. Prefix data is carried and originated in BGP; TE data is
originated and carried in an IGP. The mechanism defined in this document
provides a single interface through which an ALTO Server can retrieve
all the necessary prefix and network topology data from the underlying
network. Note that an ALTO Server can use other mechanisms to get network
data, for example, peering with multiple IGP and BGP Speakers.

The following figure shows how an ALTO Server can get network
topology information from the underlying network using the mechanism
described in this document.

This specification contains two parts: definition of a new BGP NLRI
that describes links, nodes and prefixes comprising IGP link state
information, and definition of a new BGP path attribute (BGP-LS attribute)
that carries link, node and prefix properties and attributes, such as the
link and prefix metric or auxiliary Router-IDs of nodes.

It is desirable to keep the dependencies on the protocol source
of these attributes to a minimum and to represent any content in an
IGP-neutral way, such that applications that want to learn
about a link-state topology do not need to know about any OSPF
or IS-IS protocol specifics.
Information in the new Link-State NLRIs and attributes is encoded in
Type/Length/Value triplets. The TLV format is shown in .

The Length field defines the length of the value portion in octets
(thus a TLV with no value portion would have a length of zero). The TLV
is not padded to four-octet alignment. Unrecognized types are preserved
and propagated. In order to compare NLRIs with unknown TLVs, all TLVs
MUST be ordered in ascending order by TLV Type. If there are multiple
TLVs of the same type, then those TLVs MUST be ordered in ascending
order of the TLV value. All TLVs that are not
specified as mandatory are considered optional.
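
The ordering rule above can be illustrated with a minimal Python sketch. It is not part of the specification: the 2-octet Type and 2-octet Length widths are assumptions (the TLV format figure is not reproduced here), byte-wise comparison of values is one possible reading of "ascending order of the TLV value", and the function names are hypothetical.

```python
import struct


def encode_tlv(tlv_type: int, value: bytes) -> bytes:
    """Encode one TLV; the 2-octet Type and Length widths are assumptions."""
    # Length counts only the value portion; no padding is appended.
    return struct.pack("!HH", tlv_type, len(value)) + value


def encode_tlvs(tlvs) -> bytes:
    """Encode (type, value) pairs in ascending Type order, then value order."""
    # Byte-wise comparison of values is an assumption made for this sketch.
    ordered = sorted(tlvs, key=lambda tv: (tv[0], tv[1]))
    return b"".join(encode_tlv(t, v) for t, v in ordered)
```

Because every speaker emits the TLVs in the same order, two encodings of the same object can be compared as plain byte strings, even when some of the TLV types are unknown to the receiver.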
The MP_REACH_NLRI and MP_UNREACH_NLRI attributes are BGP's containers for
carrying opaque information. Each Link-State NLRI describes either a
node, a link or a prefix.

All non-VPN link, node and prefix information SHALL be encoded using
AFI 16388 / SAFI 71. VPN link, node and prefix information SHALL be
encoded using AFI 16388 / SAFI TBD.

In order for two BGP speakers to exchange Link-State NLRI, they MUST
use BGP Capabilities Advertisement to ensure that they both are capable
of properly processing such NLRI. This is done as specified in , by
using capability code 1 (multi-protocol
BGP), with AFI 16388 / SAFI 71, and AFI 16388 / SAFI TBD for the VPN
flavor.

The format of the Link-State NLRI is shown in the following
figure.

The 'Total NLRI Length' field contains the cumulative
length, in octets, of the rest of the NLRI, not including the NLRI
Type field or itself. For VPN applications, it also includes
the length of the Route Distinguisher.

   +------+---------------------------+
   | Type | NLRI Type                 |
   +------+---------------------------+
   |  1   | Node NLRI                 |
   |  2   | Link NLRI                 |
   |  3   | IPv4 Topology Prefix NLRI |
   |  4   | IPv6 Topology Prefix NLRI |
   +------+---------------------------+

The Node NLRI (NLRI Type = 1) is shown in the following figure.

The Link NLRI (NLRI Type = 2) is shown in the following figure.

The IPv4 and IPv6 Prefix NLRIs (NLRI Type = 3 and Type = 4) use the
same format as shown in the following figure.

The 'Protocol-ID' field can contain one of the following values:

   +-------------+----------------------------------+
   | Protocol-ID | NLRI information source protocol |
   +-------------+----------------------------------+
   |      1      | IS-IS Level 1                    |
   |      2      | IS-IS Level 2                    |
   |      3      | OSPFv2                           |
   |      4      | Direct                           |
   |      5      | Static configuration             |
   |      6      | OSPFv3                           |
   +-------------+----------------------------------+

The 'Direct' and 'Static' protocol types SHOULD be used
when BGP-LS is sourcing local information. For all
information derived from other protocols, the corresponding
Protocol-ID MUST be used. If BGP-LS has
direct access to interface information and wants to
advertise a local link, then the Protocol-ID
'Direct' SHOULD be used. For modeling virtual links,
as described in
the Protocol-ID 'Static configuration' SHOULD be used.
Both OSPF and IS-IS MAY run multiple routing protocol instances over
the same link. See and . These instances define independent "routing
universes". The 64-Bit 'Identifier' field is used to identify the
"routing universe" where the NLRI belongs. The NLRIs representing Link-state
objects (nodes, links or prefixes) from the same routing universe MUST
have the same 'Identifier' value; NLRIs with different 'Identifier'
values MUST be considered to be from different routing universes. The
following table lists the 'Identifier'
values that are defined as well-known in this draft.

   +------------+----------------------------------+
   | Identifier | Routing Universe                 |
   +------------+----------------------------------+
   |     0      | Default Layer 3 Routing topology |
   |    1-31    | Reserved for future use          |
   +------------+----------------------------------+

If a given protocol does not support multiple routing
universes, then it SHOULD set the 'Identifier' field according to
the table above. However, an
implementation MAY make the 'Identifier' configurable
for a given protocol.
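
As a rough illustration of how the common NLRI fields fit together, the following Python sketch encodes and decodes a non-VPN Link-State NLRI. The NLRI Type and Protocol-ID values and the 64-bit Identifier come from the text above; the 2-octet NLRI Type, 2-octet Total NLRI Length, 1-octet Protocol-ID and the field order are assumptions, since the NLRI figures are not reproduced here. All names are hypothetical.

```python
import struct
from enum import IntEnum


class NlriType(IntEnum):
    NODE = 1
    LINK = 2
    IPV4_PREFIX = 3
    IPV6_PREFIX = 4


class ProtocolId(IntEnum):
    ISIS_L1 = 1
    ISIS_L2 = 2
    OSPFV2 = 3
    DIRECT = 4
    STATIC = 5
    OSPFV3 = 6


def encode_nlri(nlri_type, protocol_id, identifier, descriptors: bytes) -> bytes:
    """Build a non-VPN NLRI (no Route Distinguisher); widths are assumed."""
    # Body: 1-octet Protocol-ID, 64-bit Identifier, then descriptor TLVs.
    body = struct.pack("!BQ", protocol_id, identifier) + descriptors
    # Total NLRI Length excludes the NLRI Type field and the length itself.
    return struct.pack("!HH", nlri_type, len(body)) + body


def decode_nlri(data: bytes):
    """Split an NLRI into (type, protocol, identifier, descriptor bytes)."""
    nlri_type, total_len = struct.unpack_from("!HH", data, 0)
    if total_len < 9 or len(data) < 4 + total_len:
        raise ValueError("truncated Link-State NLRI")
    protocol_id, identifier = struct.unpack_from("!BQ", data, 4)
    descriptors = data[13:4 + total_len]
    return NlriType(nlri_type), ProtocolId(protocol_id), identifier, descriptors
```

An Identifier of 0 in this sketch corresponds to the default Layer 3 routing topology.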
Each Node Descriptor and Link Descriptor consists of one or more TLVs
described in the following sections.
Each link is anchored by a pair of Router-IDs that are used by the
underlying IGP, namely, 48 Bit ISO System-ID for IS-IS and 32 bit
Router-ID for OSPFv2 and OSPFv3. An IGP may use one or more additional
auxiliary Router-IDs, mainly for traffic engineering purposes. For
example, IS-IS may have one or more IPv4 and IPv6 TE Router-IDs , . These auxiliary
Router-IDs MUST be included in the link attribute described in Section
.
It is desirable that the Router-ID assignments inside the Node
Descriptor are globally unique. However there may be Router-ID spaces
(e.g. ISO) where no global registry exists, or worse, Router-IDs have
been allocated following private-IP RFC
1918 allocation. We use Autonomous System (AS) Number and
BGP-LS Identifier
in order to disambiguate the Router-IDs, as
described in .

One problem that needs to be addressed is the ability to identify
an IGP node globally (by "global", we mean within the BGP-LS
database collected by all BGP-LS speakers that talk to each other).
This can be expressed through the following two requirements:

   (A)  The same node must not be represented by two keys (otherwise
        one node will look like two nodes).

   (B)  Two different nodes must not be represented by the same key
        (otherwise, two nodes will look like one node).

We define an "IGP domain" to be the set of nodes (hence, by
extension links and prefixes), within which, each node has a unique
IGP representation by using the combination of Area-ID, Router-ID,
Protocol, Topology-ID, and Instance ID. The problem is that BGP may
receive node/link/prefix information from multiple independent "IGP
domains" and we need to distinguish between them. Moreover, we
can't assume there is always one and only one IGP domain per
AS. During IGP transitions it may happen that two redundant IGPs are
in place.

In section a set of
sub-TLVs is described, which allows specification of a flexible key for
any given Node/Link information such that global uniqueness of the
NLRI is ensured.
The Local Node Descriptors TLV contains Node Descriptors for the
node anchoring the local end of the link. This is a mandatory TLV in
all three types of NLRIs. The length of this TLV is variable. The
value contains one or more Node Descriptor Sub-TLVs defined in .

The Remote Node Descriptors TLV contains Node Descriptors for the
node anchoring the remote end of the link. This is a mandatory TLV
for link NLRIs. The length of this TLV is variable. The value
contains one or more Node Descriptor Sub-TLVs defined in .

The Node Descriptor Sub-TLV type codepoints and lengths are
listed in the following table:

   +--------------------+-------------------+----------+
   | Sub-TLV Code Point | Description       |   Length |
   +--------------------+-------------------+----------+
   |        512         | Autonomous System |        4 |
   |        513         | BGP-LS Identifier |        4 |
   |        514         | OSPF Area-ID      |        4 |
   |        515         | IGP Router-ID     | Variable |
   +--------------------+-------------------+----------+

The sub-TLV values in Node Descriptor TLVs are defined as
follows:

Autonomous System: opaque value (32 Bit AS
Number).

BGP-LS Identifier: opaque value (32 Bit ID). In
conjunction with ASN, uniquely identifies the BGP-LS domain. The
combination of ASN and BGP-LS ID MUST be globally unique. All
BGP-LS speakers within an IGP flooding-set (set of IGP nodes
within which an LSP/LSA is flooded) MUST use the same ASN, BGP-LS
ID tuple. If an IGP domain consists of multiple flooding-sets,
then all BGP-LS speakers within the IGP domain SHOULD use the same
ASN, BGP-LS ID tuple. The ASN, BGP Router-ID tuple (which is
globally unique) of one of the
BGP-LS speakers within the flooding-set (or IGP domain) may be
used for all BGP-LS speakers in that flooding-set (or IGP domain).

OSPF Area-ID: It is used to identify the 32 Bit area to
which the NLRI belongs. Area Identifier allows the different NLRIs
of the same router to be discriminated.

IGP Router-ID: opaque value. This is a mandatory
TLV. For an IS-IS non-Pseudonode, this contains 6 octet ISO
node-ID (ISO system-ID).
For an IS-IS Pseudonode corresponding to a LAN, this contains 6
octet ISO node-ID of the "Designated Intermediate System" (DIS)
followed by one octet nonzero PSN identifier (7 octets in total).
For an OSPFv2 or OSPFv3 non-"Pseudonode", this contains the 4 octet
Router-ID. For an OSPFv2 "Pseudonode" representing a LAN, this
contains the 4 octet Router-ID of the designated router (DR) followed
by the 4 octet IPv4 address of the DR's interface to the LAN (8 octets
in total). Similarly, for an OSPFv3 "Pseudonode", this contains the 4
octet Router-ID of the DR followed by the 4 octet interface identifier
of the DR's interface to the LAN (8 octets in total). The TLV size
in combination with protocol identifier enables the decoder to
determine the type of the node.
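
The way the size of the IGP Router-ID, together with the Protocol-ID, identifies the kind of node can be sketched as follows. This is only an illustration: the function name and the returned labels are hypothetical, and the Protocol-ID values are those listed for the 'Protocol-ID' field.

```python
ISIS_PROTOCOL_IDS = {1, 2}  # IS-IS Level 1, IS-IS Level 2
OSPF_PROTOCOL_IDS = {3, 6}  # OSPFv2, OSPFv3


def classify_igp_router_id(protocol_id: int, router_id: bytes) -> str:
    """Infer the node kind from the Protocol-ID and the Router-ID size."""
    size = len(router_id)
    if protocol_id in ISIS_PROTOCOL_IDS:
        if size == 6:
            return "isis-node"  # 6-octet ISO system-ID
        if size == 7 and router_id[6] != 0:
            return "isis-pseudonode"  # DIS system-ID + nonzero PSN identifier
    elif protocol_id in OSPF_PROTOCOL_IDS:
        if size == 4:
            return "ospf-node"  # 4-octet Router-ID
        if size == 8:
            # DR Router-ID + DR interface address (OSPFv2) or
            # DR Router-ID + DR interface identifier (OSPFv3)
            return "ospf-pseudonode"
    raise ValueError("unexpected IGP Router-ID size for this Protocol-ID")
```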
There can be at most one instance of each sub-TLV type present
in any Node Descriptor. The sub-TLVs within a Node descriptor
MUST be arranged in ascending order by sub-TLV type. This
needs to be done in order to compare NLRIs, even when an
implementation encounters an unknown sub-TLV. Using stable sorting
an implementation can do binary comparison of NLRIs and hence
allow incremental deployment of new key sub-TLVs.

The Multi-Topology ID (MT-ID) TLV carries one or more IS-IS or
OSPF Multi-Topology IDs for a link, node or prefix.

Semantics of the IS-IS MT-ID are defined in RFC5120, Section 7.2.
Semantics of the OSPF
MT-ID are defined in RFC4915, Section
3.7. If the value in the MT-ID TLV is derived from OSPF, then
the upper 9 bits MUST be set to 0. Bits R are reserved, SHOULD be
set to 0 when originated and ignored on receipt.

The format of the MT-ID TLV is shown in the following figure,
where Type is 263, Length is 2*n and n is the number of MT-IDs
carried in the TLV.

The MT-ID TLV MAY be present in a Link Descriptor, a Prefix
Descriptor, or in the BGP-LS attribute of a node NLRI. In a Link or
Prefix Descriptor, only a single MT-ID TLV containing the MT-ID of
the topology where the link or the prefix is reachable is
allowed. To advertise multiple
topologies for a given Link Descriptor or Prefix Descriptor, multiple
NLRIs need to be generated, where each NLRI contains a unique MT-ID.
In the BGP-LS attribute of a node NLRI, one MT-ID TLV containing the array
of MT-IDs of all topologies where the node is reachable is allowed.
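
A minimal sketch of building the MT-ID TLV follows. Type 263 and Length 2*n are taken from the text; the layout of each 2-octet entry (4 reserved R bits followed by a 12-bit MT-ID, which leaves 7 usable bits for OSPF once the upper 9 bits are zero) is an assumption, as are the 2-octet Type and Length widths and the function name.

```python
import struct

MT_ID_TLV_TYPE = 263


def encode_mt_id_tlv(mt_ids, from_ospf: bool = False) -> bytes:
    """Encode one MT-ID TLV carrying len(mt_ids) 2-octet entries."""
    if not mt_ids:
        raise ValueError("the MT-ID TLV carries one or more MT-IDs")
    # OSPF-derived values keep the upper 9 bits clear (7-bit MT-ID);
    # otherwise the assumed 4 reserved bits leave 12 bits for the MT-ID.
    limit = 0x7F if from_ospf else 0x0FFF
    for mt_id in mt_ids:
        if not 0 <= mt_id <= limit:
            raise ValueError(f"MT-ID {mt_id} out of range")
    # Reserved bits are left at 0, as recommended on origination.
    value = b"".join(struct.pack("!H", mt_id) for mt_id in mt_ids)
    return struct.pack("!HH", MT_ID_TLV_TYPE, len(value)) + value
```

In a Link or Prefix Descriptor the list would hold exactly one MT-ID; in the BGP-LS attribute of a node NLRI it may hold several.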
The 'Link Descriptor' field is a set of Type/Length/Value (TLV)
triplets. The format of each TLV is shown in . The 'Link descriptor' TLVs uniquely
identify a link among multiple parallel links between a pair of anchor
routers. A link described by the Link descriptor TLVs actually is a
"half-link", a unidirectional representation of a logical link. In
order to fully describe a single logical link, two originating routers
advertise a half-link each, i.e., two link NLRIs are advertised for a
given point-to-point link.

The format and semantics of the 'value' fields in most 'Link
Descriptor' TLVs correspond to the format and semantics of value
fields in IS-IS Extended IS Reachability sub-TLVs, defined in , and .
Although the encodings for 'Link Descriptor'
TLVs were originally defined for IS-IS, the TLVs can carry data
sourced either by IS-IS or OSPF.

The following TLVs are valid as Link Descriptors in the Link
NLRI:

   +----------+-------------------------------+-----------+------------+
   | TLV Code | Description                   | IS-IS TLV | Value      |
   | Point    |                               | /Sub-TLV  | defined in |
   +----------+-------------------------------+-----------+------------+
   |   258    | Link Local/Remote Identifiers | 22/4      | /1.1       |
   |   259    | IPv4 interface address        | 22/6      | /3.2       |
   |   260    | IPv4 neighbor address         | 22/8      | /3.3       |
   |   261    | IPv6 interface address        | 22/12     | /4.2       |
   |   262    | IPv6 neighbor address         | 22/13     | /4.3       |
   |   263    | Multi-Topology Identifier     | ---       |            |
   +----------+-------------------------------+-----------+------------+

The information about a link present in the LSA/LSP
originated by the local node of the link determines the set
of TLVs in the Link Descriptor of the link.
If interface and neighbor addresses, either IPv4 or
IPv6, are present, then the IP address TLVs are included
in the link descriptor, but not the link local/remote
Identifier TLV. The link local/remote identifiers MAY be
included in the link attribute.
If interface and neighbor addresses are not present and
the link local/remote identifiers are present, then the
link local/remote Identifier TLV is included in the link
descriptor.
The Multi-Topology Identifier TLV is included in link
descriptor if that information is present.
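
The selection rules above can be sketched in Python as follows. The dictionary keys and the function name are hypothetical, and the sketch treats any present interface or neighbor address as satisfying the first rule, which is one interpretation of the text rather than a normative reading.

```python
LINK_LOCAL_REMOTE_IDS = 258
IPV4_INTERFACE_ADDR = 259
IPV4_NEIGHBOR_ADDR = 260
IPV6_INTERFACE_ADDR = 261
IPV6_NEIGHBOR_ADDR = 262
MULTI_TOPOLOGY_ID = 263

# Hypothetical keys describing what the local node's LSA/LSP contains.
_ADDRESS_TLVS = (
    (IPV4_INTERFACE_ADDR, "ipv4_interface"),
    (IPV4_NEIGHBOR_ADDR, "ipv4_neighbor"),
    (IPV6_INTERFACE_ADDR, "ipv6_interface"),
    (IPV6_NEIGHBOR_ADDR, "ipv6_neighbor"),
)


def select_link_descriptor_tlvs(link: dict) -> list:
    """Return the Link Descriptor TLV code points to emit for a link."""
    # Address TLVs take precedence over the local/remote identifier TLV.
    selected = [code for code, key in _ADDRESS_TLVS if link.get(key) is not None]
    if not selected and link.get("local_remote_ids") is not None:
        selected.append(LINK_LOCAL_REMOTE_IDS)
    if link.get("mt_id") is not None:
        selected.append(MULTI_TOPOLOGY_ID)
    return sorted(selected)  # ascending TLV Type order
```

When addresses are present, the local/remote identifiers are left out of the descriptor and may instead be carried in the link attribute.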
The 'Prefix Descriptor' field is a set of Type/Length/Value (TLV)
triplets. 'Prefix Descriptor' TLVs uniquely identify an IPv4 or IPv6
Prefix originated by a Node. The following TLVs are valid as Prefix
Descriptors in the IPv4/IPv6 Prefix NLRI:

   +----------+-----------------------------+----------+------------+
   | TLV Code | Description                 | Length   | Value      |
   | Point    |                             |          | defined in |
   +----------+-----------------------------+----------+------------+
   |   263    | Multi-Topology Identifier   | variable |            |
   |   264    | OSPF Route Type             | 1        |            |
   |   265    | IP Reachability Information | variable |            |
   +----------+-----------------------------+----------+------------+

OSPF Route Type is an optional TLV that MAY be present in Prefix
NLRIs. It is used to identify the OSPF route-type of the prefix. It
is used when an OSPF prefix is advertised in the OSPF domain with
multiple route-types. The Route Type TLV allows these advertisements
to be discriminated. The format of the OSPF Route Type
TLV is shown in the following figure,
where the Type and Length fields of the TLV are defined in
. The OSPF Route Type
field values are defined in the OSPF protocol, and can be one of
the following:

   Intra-Area (0x1)
   Inter-Area (0x2)
   External 1 (0x3)
   External 2 (0x4)
   NSSA 1 (0x5)
   NSSA 2 (0x6)

The IP Reachability Information is a mandatory TLV that contains
one IP address prefix (IPv4 or IPv6) originally advertised in the
IGP topology. Its purpose is to glue a particular BGP service NLRI
by virtue of its BGP next-hop to a given Node in the LSDB. A router
SHOULD advertise an IP Prefix NLRI for each of its BGP Next-hops.
The format of the IP Reachability Information TLV is shown in the
following figure:

The Type and Length fields of the TLV are defined in . The following
two fields
determine the address-family reachability information. The 'Prefix
Length' field contains the length of the prefix in bits. The 'IP
Prefix' field contains the most significant octets of the prefix;
i.e., 1 octet for prefix length 1 up to 8, 2 octets for prefix
length 9 to 16, 3 octets for prefix length 17 up to 24 and 4 octets
for prefix length 25 up to 32, etc.

This is an optional, non-transitive BGP attribute that is used to
carry link, node and prefix parameters and attributes. It is defined as
a set of Type/Length/Value (TLV) triplets, described in the following
section. This attribute SHOULD only be included with Link-State
NLRIs. This attribute MUST be ignored for all other
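address-families.

Node attribute TLVs are the TLVs that may be encoded in the BGP-LS
attribute with a node NLRI. The following node attribute TLVs are
defined:

   +----------+------------------------------+----------+------------+
   | TLV Code | Description                  | Length   | Value      |
   | Point    |                              |          | defined in |
   +----------+------------------------------+----------+------------+
   |   263    | Multi-Topology Identifier    | variable |            |
   |   1024   | Node Flag Bits               | 1        |            |
   |   1025   | Opaque Node Properties       | variable |            |
   |   1026   | Node Name                    | variable |            |
   |   1027   | IS-IS Area Identifier        | variable |            |
   |   1028   | IPv4 Router-ID of Local Node | 4        | /4.3       |
   |   1029   | IPv6 Router-ID of Local Node | 16       | /4.1       |
   +----------+------------------------------+----------+------------+

The Node Flag Bits TLV carries a bit mask describing node
attributes. The value is a variable length bit array of flags, where
each bit represents a node capability.

The bits are defined as follows:

   +----------+-------------------------+-----------+
   | Bit      | Description             | Reference |
   +----------+-------------------------+-----------+
   | 'O'      | Overload Bit            |           |
   | 'T'      | Attached Bit            |           |
   | 'E'      | External Bit            |           |
   | 'B'      | ABR Bit                 |           |
   | Reserved | Reserved for future use |           |
   +----------+-------------------------+-----------+

An IS-IS node can be part of one or more IS-IS areas. Each of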
these area addresses is carried in the IS-IS Area Identifier TLV. If
multiple Area Addresses are present, multiple TLVs are used to
encode them. The IS-IS Area Identifier TLV may be present in the
BGP-LS attribute only when advertised in the Link-State Node NLRI.
The Node Name TLV is optional. Its structure and encoding have
been borrowed from . The value field
identifies the symbolic name of the router node. This symbolic name
can be the FQDN for the router, it can be a subset of the FQDN, or
it can be any string operators want to use for the router. The use
of FQDN or a subset of it is strongly RECOMMENDED.
The Value field is encoded in 7-bit ASCII. If a user-interface
for configuring or displaying this field permits Unicode characters,
that user-interface is responsible for applying the ToASCII and/or
ToUnicode algorithm as described in to
achieve the correct format for transmission or display.
Although is an IS-IS specific extension,
usage of the Node Name TLV is possible for all protocols. How a
router derives and injects node names for e.g. OSPF nodes, is
outside of the scope of this document.
The local IPv4/IPv6 Router-ID TLVs are used to describe auxiliary
Router-IDs that the IGP might be using, e.g., for TE and migration
purposes like correlating a Node-ID between different protocols. If
there is more than one auxiliary Router-ID of a given type, then
each one is encoded in its own TLV.
The Opaque Node Attribute TLV is an envelope that transparently
carries optional node attribute TLVs advertised by a router. An
originating router shall use this TLV for encoding information
specific to the protocol advertised in the NLRI header Protocol-ID
field or new protocol extensions to the protocol as advertised in
the NLRI header Protocol-ID field for which there is no protocol
neutral representation in the BGP link-state NLRI.
The primary use of the Opaque Node Attribute TLV is to bridge the
document lag between, e.g., a new IGP link-state attribute being
defined and the 'protocol-neutral' BGP-LS extensions being published.
A router could, for example, use this extension to advertise
the native protocol's node attribute TLVs, such as the OSPF Router
Informational Capabilities TLV defined in , or the IGP TE Node
Capability Descriptor
TLV described in .

Link attribute TLVs are TLVs that may be encoded in the BGP-LS
attribute with a link NLRI. Each 'Link Attribute' is a
Type/Length/Value (TLV) triplet formatted as defined in . The format and semantics of the 'value'
fields in some 'Link Attribute' TLVs correspond to the format and
semantics of value fields in IS-IS Extended IS Reachability sub-TLVs,
defined in and . Other 'Link Attribute' TLVs are defined in
this document. Although the encodings for 'Link Attribute' TLVs were
originally defined for IS-IS, the TLVs can carry data sourced either
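by IS-IS or OSPF.

The following 'Link Attribute' TLVs are valid in the
LINK_STATE attribute:

   +----------+--------------------------------+-----------+---------+
   | TLV Code | Description                    | IS-IS TLV | Defined |
   | Point    |                                | /Sub-TLV  | in      |
   +----------+--------------------------------+-----------+---------+
   |   1028   | IPv4 Router-ID of Local Node   | 134/---   | /4.3    |
   |   1029   | IPv6 Router-ID of Local Node   | 140/---   | /4.1    |
   |   1030   | IPv4 Router-ID of Remote Node  | 134/---   | /4.3    |
   |   1031   | IPv6 Router-ID of Remote Node  | 140/---   | /4.1    |
   |   1088   | Administrative group (color)   | 22/3      | /3.1    |
   |   1089   | Maximum link bandwidth         | 22/9      | /3.3    |
   |   1090   | Max. reservable link bandwidth | 22/10     | /3.5    |
   |   1091   | Unreserved bandwidth           | 22/11     | /3.6    |
   |   1092   | TE Default Metric              | 22/18     | /       |
   |   1093   | Link Protection Type           | 22/20     | /1.2    |
   |   1094   | MPLS Protocol Mask             | ---       |         |
   |   1095   | IGP Metric                     | ---       |         |
   |   1096   | Shared Risk Link Group         | ---       |         |
   |   1097   | Opaque link attribute          | ---       |         |
   |   1098   | Link Name attribute            | ---       |         |
   +----------+--------------------------------+-----------+---------+

The local/remote IPv4/IPv6 Router-ID TLVs are used to describe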
auxiliary Router-IDs that the IGP might be using, e.g., for TE
purposes. All auxiliary Router-IDs of both the local and the remote
node MUST be included in the link attribute of each link NLRI. If
there is more than one auxiliary Router-ID of a given type, then
multiple TLVs are used to encode them.
The MPLS Protocol Mask TLV carries a bit mask describing which MPLS
signaling protocols are enabled. The length of this TLV is 1. The
value is a bit array of 8 flags, where each bit represents an MPLS
Protocol capability.

Generation of the MPLS Protocol Mask TLV is only valid for
originators that have local link insight, for example
Protocol-IDs 'Static' or 'Direct' as per
. The 'MPLS Protocol
Mask' TLV MUST NOT be included in NLRIs with Protocol-IDs
'IS-IS L1', 'IS-IS L2', 'OSPFv2' or 'OSPFv3' as per
.

The following bits are defined:

   +------------+----------------------------------+-----------+
   | Bit        | Description                      | Reference |
   +------------+----------------------------------+-----------+
   | 'L'        | Label Distribution Protocol      |           |
   |            | (LDP)                            |           |
   | 'R'        | Extension to RSVP for LSP        |           |
   |            | Tunnels (RSVP-TE)                |           |
   | 'Reserved' | Reserved for future use          |           |
   +------------+----------------------------------+-----------+

The TE Default Metric TLV carries the TE-metric for this link.
The length of this TLV is fixed at 4 octets. If a source protocol
(e.g. IS-IS) does not support a Metric width of 32 bits then the high
order octet MUST be set to zero.
The IGP Metric TLV carries the metric for this link. The length
of this TLV is variable, depending on the metric width of the
underlying protocol. IS-IS small metrics have a length of 1 octet
(the two most significant bits are ignored). OSPF link metrics have a
length of two octets. IS-IS wide-metrics have a length of three
octets.
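
The variable length of the IGP Metric TLV can be illustrated with a small Python sketch. TLV code point 1095 and the three metric widths come from the text; the 2-octet Type and Length widths, the `source` labels and the function names are assumptions made for illustration.

```python
import struct

IGP_METRIC_TLV_TYPE = 1095

# Metric width in octets for each (hypothetical) source label.
_METRIC_WIDTH = {"isis-small": 1, "ospf": 2, "isis-wide": 3}


def encode_igp_metric_tlv(metric: int, source: str) -> bytes:
    """Encode the IGP Metric TLV with the width of the source protocol."""
    width = _METRIC_WIDTH[source]
    # IS-IS small metrics use only the low 6 bits of their single octet.
    max_value = 0x3F if source == "isis-small" else (1 << (8 * width)) - 1
    if not 0 <= metric <= max_value:
        raise ValueError("metric does not fit the source protocol's width")
    return struct.pack("!HH", IGP_METRIC_TLV_TYPE, width) + metric.to_bytes(width, "big")


def decode_igp_metric_value(value: bytes) -> int:
    """Decode the value portion; a 1-octet value is an IS-IS small metric."""
    if len(value) not in (1, 2, 3):
        raise ValueError("unexpected IGP Metric length")
    metric = int.from_bytes(value, "big")
    if len(value) == 1:
        metric &= 0x3F  # the two most significant bits are ignored
    return metric
```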
The Shared Risk Link Group (SRLG) TLV carries the Shared Risk
Link Group information (see Section 2.3, "Shared Risk Link Group
Information", of ). It contains a data
structure consisting of a (variable) list of SRLG values, where each
element in the list has 4 octets, as shown in . The length of this TLV is 4 * (number of SRLG
values).

Note that there is no SRLG TLV in OSPF-TE. In IS-IS the SRLG
information is carried in two different TLVs: the IPv4 (SRLG) TLV
(Type 138) defined in , and the IPv6
SRLG TLV (Type 139) defined in . In
Link-State NLRI both IPv4 and IPv6 SRLG information are carried in a
single TLV.

The Opaque Link Attribute TLV is an envelope that transparently
carries optional link attribute TLVs advertised by a router. An
originating router shall use this TLV for encoding information
specific to the protocol advertised in the NLRI header Protocol-ID
field or new protocol extensions to the protocol as advertised in
the NLRI header Protocol-ID field for which there is no protocol
neutral representation in the BGP link-state NLRI.
The primary use of the Opaque Link Attribute TLV is to bridge the
document lag between e.g. a new IGP Link-state attribute being
defined and the 'protocol-neutral' BGP-LS extensions being published.
The Link Name TLV is optional. The value field identifies the
symbolic name of the router link. This symbolic name can be the
FQDN for the link, it can be a subset of the FQDN, or it can be any
string operators want to use for the link. The use of FQDN or a
subset of it is strongly RECOMMENDED.
The Value field is encoded in 7-bit ASCII. If a user-interface
for configuring or displaying this field permits Unicode characters,
that user-interface is responsible for applying the ToASCII and/or
ToUnicode algorithm as described in to
achieve the correct format for transmission or display.
How a router derives and injects link names is outside of the
scope of this document.
Prefixes are learned from the IGP topology (IS-IS or OSPF) with a
set of IGP attributes (such as metric, route tags, etc.) that MUST be
reflected into the LINK_STATE attribute. This section describes the
different attributes related to the IPv4/IPv6 prefixes. Prefix
Attributes TLVs SHOULD be used when advertising NLRI types 3 and 4
only. The following attribute TLVs are defined:

   +----------+-------------------------+----------+-----------+
   | TLV Code | Description             | Length   | Reference |
   | Point    |                         |          |           |
   +----------+-------------------------+----------+-----------+
   |   1152   | IGP Flags               | 1        |           |
   |   1153   | Route Tag               | 4*n      |           |
   |   1154   | Extended Tag            | 8*n      |           |
   |   1155   | Prefix Metric           | 4        |           |
   |   1156   | OSPF Forwarding Address | 4        |           |
   |   1157   | Opaque Prefix Attribute | variable |           |
   +----------+-------------------------+----------+-----------+

The IGP Flags TLV contains IS-IS and OSPF flags and bits originally
assigned to the prefix. The IGP Flags TLV is encoded as follows:

The value field contains bits defined according to the table
below:

   +----------+---------------------------+-----------+
   | Bit      | Description               | Reference |
   +----------+---------------------------+-----------+
   | 'D'      | IS-IS Up/Down Bit         |           |
   | 'N'      | OSPF "no unicast" Bit     |           |
   | 'L'      | OSPF "local address" Bit  |           |
   | 'P'      | OSPF "propagate NSSA" Bit |           |
   | Reserved | Reserved for future use.  |           |
   +----------+---------------------------+-----------+

The Route Tag TLV carries original IGP TAGs (IS-IS or OSPF) of the
prefix and is encoded as
follows:

Length is a multiple of 4.

The value field contains one or more Route Tags as learned in the
IGP topology.

The Extended Route Tag TLV carries IS-IS Extended Route TAGs of the
prefix and is encoded as follows:

Length is a multiple of 8.

The 'Extended Route Tag' field contains one or more Extended
Route Tags as learned in the IGP topology.

The Prefix Metric TLV is an optional attribute and may only appear once.
If present, it carries the metric of the prefix as known in the IGP
topology (and therefore
represents the reachability cost to the prefix).
If not present, it means that the prefix is advertised
without any reachability.

Length is 4.

The OSPF Forwarding Address TLV
carries the OSPF forwarding address as known in the original OSPF
advertisement. The forwarding address can be either IPv4 or IPv6.

Length is 4 for an IPv4 forwarding address and 16 for an IPv6
forwarding address.

The Opaque Prefix Attribute TLV is an envelope that transparently
carries optional prefix attribute TLVs advertised by a router. An
originating router shall use this TLV for encoding information
specific to the protocol advertised in the NLRI header Protocol-ID
field or new protocol extensions to the protocol as advertised in
the NLRI header Protocol-ID field for which there is no protocol
neutral representation in the BGP link-state NLRI.
The primary use of the Opaque Prefix Attribute TLV is to bridge the
document lag between e.g. a new IGP Link-state attribute being
defined and the 'protocol-neutral' BGP-LS extensions being published.
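
The prefix attribute encodings in this section can be illustrated with a short Python sketch. It is not a reference implementation. It assumes the common BGP-LS TLV layout of a 2-octet Type followed by a 2-octet Length, and it assumes that the 'D', 'N', 'L' and 'P' bits occupy the four most significant bits of the single IGP Flags octet in table order; neither detail is restated in this section. All function names are hypothetical.

```python
import ipaddress
import struct

# Code points taken from the prefix attribute TLV table in this section.
TLV_IGP_FLAGS = 1152
TLV_ROUTE_TAG = 1153
TLV_EXTENDED_TAG = 1154
TLV_PREFIX_METRIC = 1155
TLV_OSPF_FWD_ADDR = 1156
TLV_OPAQUE_PREFIX = 1157


def encode_tlv(tlv_type: int, value: bytes) -> bytes:
    """Assumed layout: 2-octet Type, 2-octet Length, then the value."""
    return struct.pack("!HH", tlv_type, len(value)) + value


def igp_flags(up_down=False, no_unicast=False, local_address=False,
              propagate_nssa=False) -> bytes:
    # Assumed bit order: D, N, L, P starting at the most significant bit.
    bits = (up_down << 7) | (no_unicast << 6) | (local_address << 5) | (propagate_nssa << 4)
    return encode_tlv(TLV_IGP_FLAGS, bytes([bits]))


def route_tags(tags) -> bytes:
    # One 32-bit tag per entry, so the length is a multiple of 4.
    return encode_tlv(TLV_ROUTE_TAG, b"".join(struct.pack("!I", t) for t in tags))


def extended_tags(tags) -> bytes:
    # One 64-bit tag per entry, so the length is a multiple of 8.
    return encode_tlv(TLV_EXTENDED_TAG, b"".join(struct.pack("!Q", t) for t in tags))


def prefix_metric(metric: int) -> bytes:
    # Fixed length of 4 octets.
    return encode_tlv(TLV_PREFIX_METRIC, struct.pack("!I", metric))


def ospf_forwarding_address(addr: str) -> bytes:
    # .packed yields 4 octets for IPv4 and 16 octets for IPv6.
    return encode_tlv(TLV_OSPF_FWD_ADDR, ipaddress.ip_address(addr).packed)


def opaque_prefix_attribute(payload: bytes) -> bytes:
    # The payload is carried unchanged inside the envelope.
    return encode_tlv(TLV_OPAQUE_PREFIX, payload)
```

For example, `route_tags([1, 2])` yields a TLV whose Length field is 8, and `ospf_forwarding_address("2001:db8::1")` yields a TLV whose Length field is 16.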
The format of the TLV is as follows:

Type is as specified in the table of Prefix Attribute TLVs above, and
Length is variable.

BGP link-state information for both IPv4 and IPv6 networks can be
carried over either an IPv4 BGP session, or an IPv6 BGP session. If an IPv4
BGP session is used, then the next hop in the MP_REACH_NLRI SHOULD be an
IPv4 address. Similarly, if an IPv6 BGP session is used, then the next hop
in the MP_REACH_NLRI SHOULD be an IPv6 address. Usually the next hop
will be set to the local end-point address of the BGP session. The next
hop address MUST be encoded as described in . The length field of the next hop address will
specify the next hop address-family. If the next hop length is 4, then
the next hop is an IPv4 address; if the next hop length is 16, then it
is a global IPv6 address and if the next hop length is 32, then there is
one global IPv6 address followed by a link-local IPv6 address. The
link-local IPv6 address should be used as described in . For VPN SAFI, as per custom, an 8 byte
route-distinguisher set to all zero is prepended to the next hop.
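
The length-based interpretation of the next hop described above can be sketched in Python as follows. This is an illustration under stated assumptions, not a definitive parser: the function name is hypothetical, and for the VPN SAFI the sketch only handles a single address behind one all-zero route distinguisher, since the text does not spell out the two-address case.

```python
import ipaddress

RD_LEN = 8  # all-zero route distinguisher prepended for the VPN SAFI


def parse_next_hop(data: bytes, vpn: bool = False):
    """Return the list of next hop addresses found in `data`."""
    if vpn:
        if len(data) < RD_LEN or data[:RD_LEN] != bytes(RD_LEN):
            raise ValueError("expected an all-zero route distinguisher")
        data = data[RD_LEN:]
        if len(data) == 32:
            # Assumption: the dual-address case is out of scope for this sketch.
            raise ValueError("dual-address VPN next hop not handled")
    if len(data) == 4:
        return [ipaddress.IPv4Address(data)]
    if len(data) == 16:
        return [ipaddress.IPv6Address(data)]
    if len(data) == 32:
        # Global IPv6 address followed by a link-local IPv6 address.
        return [ipaddress.IPv6Address(data[:16]),
                ipaddress.IPv6Address(data[16:])]
    raise ValueError(f"unsupported next hop length {len(data)}")
```

Any other length is rejected rather than guessed at, which keeps the address family decision tied strictly to the length field.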
The BGP Next Hop attribute is used by each BGP-LS speaker to validate
the NLRI it receives. In case identical NLRIs are sourced
by multiple originators the BGP next hop attribute is used to tie-break
as per the standard BGP path decision process.
This specification doesn't mandate any
rule regarding the re-write of the BGP Next Hop attribute.

The main source of TE information is the IGP, which is not active on
inter-AS links. In some cases, the IGP may have information of inter-AS
links (, ). In other cases, an implementation SHOULD
provide a means to inject inter-AS links into BGP-LS. The exact
mechanism used to provision the inter-AS links is outside the scope of
this document.

Encoding of a broadcast LAN in IS-IS provides a good example of how
Router-IDs are encoded. Consider a broadcast LAN
between a pair of routers. The "real" (i.e., non-pseudonode) routers have
both an IPv4 Router-ID and an IS-IS Node-ID. The pseudonode does not have
an IPv4 Router-ID. Node1 is the DIS for the LAN. Two unidirectional
links, (Node1, Pseudonode1) and (Pseudonode1, Node2), are
generated.

The link NLRI of (Node1, Pseudonode1) is encoded as follows: the IGP
Router-ID TLV of the local node descriptor is 6 octets long containing
ISO-ID of Node1, 1920.0000.2001; the IGP Router-ID TLV of the remote
node descriptor is 7 octets long containing the ISO-ID of Pseudonode1,
1920.0000.2001.02. The BGP-LS attribute of this link contains one local
IPv4 Router-ID TLV (TLV type 1028) containing 192.0.2.1, the IPv4
Router-ID of Node1.
The link NLRI of (Pseudonode1, Node2) is encoded as follows: the IGP
Router-ID TLV of the local node descriptor is 7 octets long containing
the ISO-ID of Pseudonode1, 1920.0000.2001.02; the IGP Router-ID TLV of
the remote node descriptor is 6 octets long containing ISO-ID of Node2,
1920.0000.2002. The BGP-LS attribute of this link contains one remote
IPv4 Router-ID TLV (TLV type 1030) containing 192.0.2.2, the IPv4
Router-ID of Node2.
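
The octet counts quoted in the IS-IS example follow directly from the dotted notation of the identifiers. The following Python sketch, with hypothetical helper and field names, converts the identifiers from the example into octets and lays out the two link NLRIs as plain dictionaries. It assumes that a 7-octet IGP Router-ID denotes a pseudonode, as in the example above.

```python
def iso_id_to_octets(dotted: str) -> bytes:
    """Convert dotted hex such as '1920.0000.2001.02' into raw octets."""
    octets = bytes.fromhex(dotted.replace(".", ""))
    if len(octets) not in (6, 7):
        raise ValueError("expected a 6-octet ISO-ID or a 7-octet pseudonode ID")
    return octets


def is_pseudonode(router_id: bytes) -> bool:
    # Assumption taken from the example: 7 octets means ISO-ID plus pseudonode number.
    return len(router_id) == 7


node1 = iso_id_to_octets("1920.0000.2001")
pseudonode1 = iso_id_to_octets("1920.0000.2001.02")
node2 = iso_id_to_octets("1920.0000.2002")

# Each link carries only the IPv4 Router-ID of its "real" end:
# TLV type 1028 for the local node, TLV type 1030 for the remote node.
links = [
    {"local": node1, "remote": pseudonode1, "attr": {1028: "192.0.2.1"}},
    {"local": pseudonode1, "remote": node2, "attr": {1030: "192.0.2.2"}},
]

for link in links:
    print(len(link["local"]), len(link["remote"]), link["attr"])
```

The pseudonode end of each link has no IPv4 Router-ID TLV, which mirrors the statement that the pseudonode does not have an IPv4 Router-ID.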
Encoding of a broadcast LAN in OSPF provides a good example of how
Router-IDs and local Interface IPs are encoded. Consider a broadcast LAN
between a pair of routers. The "real" (i.e., non-pseudonode) routers have
both an IPv4 Router-ID and an Area Identifier. The pseudonode does have
an IPv4 Router-ID, an IPv4 interface address (for
disambiguation) and an OSPF Area. Node1 is the DR for the
LAN, hence its local IP address 10.1.1.1 is used both as the
Router-ID and Interface IP for the Pseudonode keys.
Two unidirectional links, (Node1, Pseudonode1) and
(Pseudonode1, Node2), are generated.

The link NLRI of (Node1, Pseudonode1) is encoded as follows:

   Local Node Descriptor
      TLV #515: IGP Router ID: 11.11.11.11
      TLV #514: OSPF Area-ID: ID:0.0.0.0
   Remote Node Descriptor
      TLV #515: IGP Router ID: 10.1.1.1:10.1.1.1
      TLV #514: OSPF Area-ID: ID:0.0.0.0

The link NLRI of (Pseudonode1, Node2) is encoded as follows:

   Local Node Descriptor
      TLV #515: IGP Router ID: 10.1.1.1:10.1.1.1
      TLV #514: OSPF Area-ID: ID:0.0.0.0
   Remote Node Descriptor
      TLV #515: IGP Router ID: 33.33.33.34
      TLV #514: OSPF Area-ID: ID:0.0.0.0

Graceful migration from one IGP to another requires coordinated
operation of both protocols during the migration period. Such
coordination requires identifying a given physical link in both
IGPs. The IPv4 Router-ID provides that "glue" which is present in the
node descriptors of the OSPF link NLRI and in the link attribute of the
IS-IS link NLRI.
Consider a point-to-point link between two routers, A and B, that
initially were OSPFv2-only routers and then IS-IS is enabled on
them. Node A has IPv4 Router-ID and ISO-ID; node B has IPv4 Router-ID,
IPv6 Router-ID and ISO-ID. Each protocol generates one link NLRI for
the link (A, B), both of which are carried by BGP-LS. The OSPFv2 link
NLRI for the link is encoded with the IPv4 Router-ID of nodes A and B in
the local and remote node descriptors, respectively. The IS-IS link
NLRI for the link is encoded with the ISO-ID of nodes A and B in the
local and remote node descriptors, respectively. In addition, the BGP-LS
attribute of the IS-IS link NLRI contains the TLV type 1028
containing the IPv4 Router-ID of node A; TLV type 1030 containing the
IPv4 Router-ID of node B and TLV type 1031 containing the IPv6 Router-ID
of node B. In this case, by using the IPv4 Router-ID, the link (A, B) can be
identified in both the IS-IS and OSPF protocols.
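
A consumer could use this "glue" roughly as sketched below in Python. The class, field names, addresses and ISO-IDs are all hypothetical and serve only to illustrate the idea: the key is taken from the node descriptors for the OSPFv2 link NLRI and from the link attribute (TLV types 1028 and 1030) for the IS-IS link NLRI. Note that the key identifies the pair of nodes, so parallel links between the same pair would need an additional discriminator that this sketch does not model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LinkNLRI:
    protocol: str                                 # "OSPFv2" or "IS-IS"
    local_node: str                               # IGP Router-ID, local node descriptor
    remote_node: str                              # IGP Router-ID, remote node descriptor
    local_ipv4_router_id: Optional[str] = None    # link attribute TLV 1028
    remote_ipv4_router_id: Optional[str] = None   # link attribute TLV 1030


def ipv4_key(link: LinkNLRI):
    """Return the (local, remote) IPv4 Router-ID pair for a link NLRI."""
    if link.protocol == "OSPFv2":
        # OSPFv2 node descriptors already carry the IPv4 Router-IDs.
        return (link.local_node, link.remote_node)
    # IS-IS node descriptors carry ISO-IDs; the IPv4 Router-IDs are attributes.
    return (link.local_ipv4_router_id, link.remote_ipv4_router_id)


def correlate(links):
    """Group link NLRIs that describe the same (A, B) node pair."""
    groups = {}
    for link in links:
        groups.setdefault(ipv4_key(link), []).append(link)
    return groups


# Hypothetical values for nodes A and B.
ospf_link = LinkNLRI("OSPFv2", "192.0.2.1", "192.0.2.2")
isis_link = LinkNLRI("IS-IS", "1920.0000.2001", "1920.0000.2002",
                     local_ipv4_router_id="192.0.2.1",
                     remote_ipv4_router_id="192.0.2.2")

groups = correlate([ospf_link, isis_link])
```

Here both NLRIs end up under the single key `("192.0.2.1", "192.0.2.2")`, which is how the link (A, B) is recognized in both protocols during the migration.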
Distribution of all links available in the global Internet is certainly
possible, but it is not desirable from a scaling and privacy point of
view. Therefore an implementation may support link-to-path
aggregation. Rather than advertising all specific links of a domain, an
ASBR may advertise an "aggregate link" between a non-adjacent pair of
nodes. The "aggregate link" represents the aggregated set of link
properties between a pair of non-adjacent nodes. The actual methods to
compute the path properties (of bandwidth, metric) are outside the scope
of this document. The decision whether to advertise all specific links or
aggregated links is an operator's policy choice. To highlight the varying
levels of exposure, the following deployment examples are discussed.

Consider two ASes, AS1 and AS2, whose operators both want to protect their
inter-AS links {R1, R3} and {R2, R4} using RSVP-FRR LSPs. If R1 wants to
compute its link-protection LSP to R3, it needs to "see" an alternate path
to R3. Therefore the AS2 operator exposes its topology. All BGP TE enabled
routers in AS1 "see" the full topology of AS2 and therefore can compute a
backup path. Note that the decision whether the direct link between
{R3, R4} or the {R4, R5, R3} path is used is made by the computing router.

The difference between the "no-link aggregation" example and this example
is that no specific link gets exposed. The only link advertised by AS2 is
an "aggregate" link between R3 and R4. This is enough to tell AS1 that
there is a backup path. However, the actual links being used are hidden
from the topology.

Service providers in control of multiple ASes may even decide not to
expose their internal inter-AS links. In that case, AS3 is modeled as a
single node which connects to the border routers of the aggregated domain.
This document requests a code point from the registry of Address Family
Numbers. As per early allocation procedure this is AFI 16388.

This document requests a code point from the registry of Subsequent
Address Family Numbers named 'BGP-LS'. As per early allocation procedure
this is SAFI 71.

This document requests a code point from the registry of Subsequent
Address Family Numbers named 'BGP-LS-VPN'.

This document requests a code point from the BGP Path Attributes
registry. As per early allocation procedure this is Path
Attribute 29.

This document requests creation of a new registry for BGP-LS
NLRI-Types. Value 0 is reserved. The registry will be initialized as
shown in . Allocations within the registry
will require documentation of the proposed use of the allocated value and
approval by the Designated Expert assigned by the IESG (see ).

This document requests creation of a new registry for BGP-LS
Protocol-IDs. Value 0 is reserved. The registry will be initialized as
shown in . Allocations within the registry
will require documentation of the proposed use of the allocated value and
approval by the Designated Expert assigned by the IESG (see ).

This document requests creation of a new registry for BGP-LS
Well-known Instance-IDs. The registry will be initialized as
shown in . Allocations within the registry
will require documentation of the proposed use of the allocated value and
approval by the Designated Expert assigned by the IESG (see ).

This document requests creation of a new registry for node anchor, link
descriptor and link attribute TLVs. Values 0-255 are reserved. Values
256-65535 will be used for code points. The registry will be initialized as
shown in . Allocations within the registry
will require documentation of the proposed use of the allocated value and
approval by the Designated Expert assigned by the IESG (see ).

Note to RFC Editor: this section may be removed on publication as an
RFC.

This section is structured as recommended in .

Existing BGP operational procedures apply. No new operational
procedures are defined in this document. It is noted that the NLRI
information present in this document purely carries application level
data that has no immediate corresponding forwarding state impact. As
such, any churn in reachability information has different impact than
regular BGP updates which need to change forwarding state for an
entire router. Furthermore it is anticipated that distribution of this
NLRI will be handled by dedicated route-reflectors providing a level
of isolation and fault-containment between different NLRI types.

Configuration parameters defined in SHOULD be initialized to
the following default values:

   o  The Link-State NLRI capability is turned off for all neighbors.

   o  The maximum rate at which Link-State NLRIs will be
      advertised/withdrawn from neighbors is set to 200 updates per
      second.

The proposed extension is only activated between BGP peers after
capability negotiation. Moreover, the extensions can be turned
on/off on an individual peer basis (see ), so the extension can be
gradually rolled out in the network.

The protocol extension defined in this document does not put new
requirements on other protocols or functional components.

The frequency of Link-State NLRI updates could interfere with regular
BGP prefix distribution. A network operator MAY use a dedicated
Route-Reflector infrastructure to distribute Link-State NLRIs.

Distribution of Link-State NLRIs SHOULD be limited to a single
admin domain, which can consist of multiple areas within an AS or
multiple ASes.

Existing BGP procedures apply. In addition, an implementation
SHOULD allow an operator to:

   o  List neighbors with whom the Speaker is exchanging Link-State
      NLRIs

This document does not mandate any new MIB information or NETCONF/YANG models.
If an implementation of BGP-LS detects a malformed
attribute, then it SHOULD use the 'Attribute Discard' action
as per
Section 2.
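
Detecting a malformed attribute largely comes down to walking its TLVs and comparing lengths. The following Python sketch shows one way a receiver might do this and then fall back to 'Attribute Discard'. It assumes a 2-octet Type and 2-octet Length per TLV, models the discard action simply as returning `None`, and uses hypothetical function names; the fixed lengths are the ones listed for IGP Flags (1152) and Prefix Metric (1155).

```python
import struct

# Fixed-length TLVs and their expected value lengths (illustrative subset).
FIXED_LENGTHS = {1152: 1, 1155: 4}


def walk_tlvs(data: bytes):
    """Yield (type, value) pairs; raise ValueError if the lengths do not add up."""
    offset = 0
    while offset < len(data):
        if len(data) - offset < 4:
            raise ValueError("truncated TLV header")
        tlv_type, length = struct.unpack_from("!HH", data, offset)
        offset += 4
        if offset + length > len(data):
            raise ValueError("TLV overruns the enclosing attribute")
        yield tlv_type, data[offset:offset + length]
        offset += length


def parse_or_discard(attribute: bytes):
    """Return the list of TLVs, or None to signal 'Attribute Discard'."""
    try:
        tlvs = list(walk_tlvs(attribute))
        for tlv_type, value in tlvs:
            expected = FIXED_LENGTHS.get(tlv_type)
            if expected is not None and len(value) != expected:
                raise ValueError("fixed-length TLV has the wrong length")
    except ValueError:
        # Malformed: drop the attribute and keep processing the rest of the update.
        return None
    return tlvs
```

Because the walk only succeeds when the TLVs consume the attribute exactly, a successful return implies that the sum of the TLVs matches the enclosing length.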
An implementation of BGP-LS MUST perform the following
syntactic checks for determining if a message is malformed.
   o  Does the sum of all TLVs found in the BGP LS
      attribute correspond to the BGP LS path attribute length?

   o  Does the sum of all TLVs found in the BGP MP_REACH_NLRI
      attribute correspond to the BGP MP_REACH_NLRI length?

   o  Does the sum of all TLVs found in the BGP MP_UNREACH_NLRI
      attribute correspond to the BGP MP_UNREACH_NLRI length?

   o  Does the sum of all TLVs found in a Node-, Link- or
      Prefix Descriptor NLRI attribute correspond to the
      Node-, Link- or Prefix Descriptors 'Total NLRI Length' field?

   o  Does any fixed length TLV correspond to the TLV Length
      field in this document?

An implementation SHOULD allow the operator to specify neighbors
to which Link-State NLRIs will be advertised and from which
Link-State NLRIs will be accepted.

An implementation SHOULD allow the operator to specify the
maximum rate at which Link-State NLRIs will be advertised/withdrawn
from neighbors.

An implementation SHOULD allow the operator to specify the
maximum number of Link-State NLRIs stored in the router's RIB.

An implementation SHOULD allow the operator to create abstracted
topologies that are advertised to neighbors, and to create different
abstractions for different neighbors.

An implementation SHOULD allow the operator to configure a 64-bit
instance ID.

An implementation SHOULD allow the operator to configure a pair
of ASN and BGP-LS
identifier per flooding set in which the node participates.

Not Applicable.

An implementation SHOULD provide the following statistics:

   o  Total number of Link-State NLRI updates sent/received

   o  Number of Link-State NLRI updates sent/received, per
      neighbor

   o  Number of errored received Link-State NLRI updates, per
      neighbor

   o  Total number of locally originated Link-State NLRIs

An operator SHOULD define ACLs to limit inbound updates as
follows:

   o  Drop all updates from Consumer peers

This section contains the global table of all TLVs/Sub-TLVs defined in
this document.

   TLV Code   Description                      IS-IS TLV/   Value
   Point                                       Sub-TLV      defined in:
   --------   ------------------------------   ----------   -----------
   256        Local Node Descriptors           ---
   257        Remote Node Descriptors          ---
   258        Link Local/Remote Identifiers    22/4         /1.1
   259        IPv4 interface address           22/6         /3.2
   260        IPv4 neighbor address            22/8         /3.3
   261        IPv6 interface address           22/12        /4.2
   262        IPv6 neighbor address            22/13        /4.3
   263        Multi-Topology ID                ---
   264        OSPF Route Type                  ---
   265        IP Reachability Information      ---
   512        Autonomous System                ---
   513        BGP-LS Identifier                ---
   514        OSPF Area ID                     ---
   515        IGP Router-ID                    ---
   1024       Node Flag Bits                   ---
   1025       Opaque Node Properties           ---
   1026       Node Name                        variable
   1027       IS-IS Area Identifier            variable
   1028       IPv4 Router-ID of Local Node     134/---      /4.3
   1029       IPv6 Router-ID of Local Node     140/---      /4.1
   1030       IPv4 Router-ID of Remote Node    134/---      /4.3
   1031       IPv6 Router-ID of Remote Node    140/---      /4.1
   1088       Administrative group (color)     22/3         /3.1
   1089       Maximum link bandwidth           22/9         /3.3
   1090       Max. reservable link bandwidth   22/10        /3.5
   1091       Unreserved bandwidth             22/11        /3.6
   1092       TE Default Metric                22/18
   1093       Link Protection Type             22/20        /1.2
   1094       MPLS Protocol Mask               ---
   1095       IGP Metric                       ---
   1096       Shared Risk Link Group           ---
   1097       Opaque link attribute            ---
   1098       Link Name attribute              ---
   1152       IGP Flags                        ---
   1153       Route Tag                        ---
   1154       Extended Tag                     ---
   1155       Prefix Metric                    ---
   1156       OSPF Forwarding Address          ---
   1157       Opaque Prefix Attribute          ---

Procedures and protocol extensions defined in this document do not
affect the BGP security model. See the 'Security Considerations' section
of for a discussion of BGP security. Also refer
to and
for analysis of security issues for BGP.

In the context of the BGP peerings associated with this document, a BGP
Speaker SHOULD NOT accept updates from a Consumer peer. That is, a
participating BGP Speaker should be aware of the nature of its
link-state relationships and should protect itself from
peers sending updates that either represent erroneous information feedback
loops or are false input. Such protection can be achieved by manual
configuration of Consumer peers at the BGP Speaker.

An operator SHOULD employ a mechanism to protect a BGP Speaker against
DDoS attacks from Consumers. The principal attack a Consumer may apply is
to attempt to start multiple sessions either sequentially or
simultaneously. Protection can be applied by imposing rate limits.

Additionally, it may be considered that the export of link state and TE
information as described in this document constitutes a risk to
confidentiality of mission-critical or commercially-sensitive information
about the network. BGP peerings are not automatic and require
configuration, thus it is the responsibility of the network operator to
ensure that only trusted Consumers are configured to receive such
information.
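
The two protections discussed in this section, refusing updates from Consumer peers and rate-limiting session attempts, can be sketched in Python as follows. This is only an illustration: the token-bucket scheme, the class and function names, and the chosen rate and burst values are assumptions and are not specified by this document.

```python
import time


class SessionRateLimiter:
    """Per-peer token bucket limiting how often new sessions may be started."""

    def __init__(self, rate_per_sec: float, burst: int, clock=time.monotonic):
        self.rate = rate_per_sec
        self.burst = burst
        self.clock = clock
        self.state = {}  # peer -> (tokens, timestamp of last attempt)

    def allow(self, peer: str) -> bool:
        now = self.clock()
        tokens, last = self.state.get(peer, (float(self.burst), now))
        # Refill according to elapsed time, capped at the burst size.
        tokens = min(float(self.burst), tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self.state[peer] = (tokens - 1.0, now)
            return True
        self.state[peer] = (tokens, now)
        return False


def accept_update(peer: str, consumer_peers: set) -> bool:
    """Updates from manually configured Consumer peers are not accepted."""
    return peer not in consumer_peers


# Hypothetical usage: at most 1 new session per second, burst of 2.
limiter = SessionRateLimiter(rate_per_sec=1.0, burst=2)
consumers = {"198.51.100.7"}
```

A Consumer that opens sessions in quick succession exhausts its bucket and is refused until enough time has passed, while its updates are dropped regardless of rate.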
We would like to thank Robert Varga for his significant contribution
to this document.

We would like to thank Nischal Sheth, Alia Atlas, David Ward, Derek
Yeung, Murtuza Lightwala, John Scudder, Kaliraj Vairavakkalai, Les
Ginsberg, Liem Nguyen, Manish Bhardwaj, Mike Shand, Peter Psenak, Rex
Fernando, Richard Woundy, Steven Luong, Tamas Mondal, Waqas Alam, Vipin
Kumar, Naiming Shen, Balaji Rajagopalan and Yakov Rekhter for their comments.