Advertising P2MP Policies in BGP

Nokia, Ottawa, Canada, hooman.bidgoli@nokia.com
Bell Canada, Montreal, Canada, daniel.yover@bell.ca
Nokia, Ottawa, Canada, andrew.stone@nokia.com
Cisco Systems, San Jose, USA, riparekh@cisco.com
Cisco Systems, Inc., Rixensart, Belgium, sekrier@cisco.com
Cisco Systems, Inc., Ottawa, Canada, arvvenka@cisco.com

SR P2MP policies are a set of policies that enable an architecture for
P2MP service delivery. A P2MP policy consists of candidate paths that
connect the Root of the tree to a set of Leaves. The P2MP policy is
composed of replication segments. A replication segment is a forwarding
instruction for a candidate path, which is downloaded to the Root, the
transit nodes, and the leaves.

This document specifies a new BGP SAFI with a new NLRI in order to
advertise a P2MP policy from a controller to a set of nodes.

This document introduces two new route types within this NLRI: one for
the P2MP policy and its candidate paths, which need to be programmed on
the Root node, and another for the replication segments and forwarding
instructions, which need to be programmed on the Root and, optionally,
on Transit and Leaf nodes.

It should be noted that this document does not specify how the Root and
the Leaves are discovered on the controller; it only describes how the
P2MP Policy and Replication Segments are programmed from the controller
to the nodes.

The draft defines a
variant of the SR Policy for constructing a P2MP segment to support
multicast service delivery.

A Point-to-Multipoint (P2MP) Policy contains a set of candidate paths
and identifies a Root node and a set of Leaf nodes in a Segment Routing
domain. The draft also defines a Replication segment, which corresponds
to the state of a P2MP segment on a particular node. The Replication
segment is the forwarding instruction for a P2MP LSP at the Root,
Transit, and Leaf nodes.

For a P2MP segment, a controller may be used to compute a tree from a
Root node to a set of Leaf nodes, optionally via a set of replication
nodes. A packet is replicated at the Root node and, optionally, on
Replication nodes towards each Leaf node.

We define two types of P2MP segment: Ingress Replication (aka Spray)
and Downstream Replication (aka TreeSID).

Point-to-Multipoint service delivery can use Ingress Replication (aka
Spray in some SR contexts), i.e., the root unicasts individual copies of
traffic to each leaf. The corresponding P2MP segment consists of
replication segments only for the root and the leaves.

Point-to-Multipoint service delivery can also use Downstream
Replication (aka TreeSID in some SR contexts), i.e., the root and some
downstream replication nodes replicate the traffic along the way as it
travels toward the leaves.

It should be noted that two replication nodes can be connected
directly, or via a unicast SR segment or a segment list. The leaves and
the root of a P2MP policy can be discovered via multicast protocols or
procedures like NG-MVPN, or manually configured on the PCC (CLI) or the
PCE.

Based on the discovered root and leaves, the controller builds a P2MP
policy and advertises it to the head-end router (i.e., the root of the
P2MP tree). The advertisement uses the BGP extensions defined in this
document. The controller also calculates the tree path, builds the
replication segments for each node of the tree (Root, Transit, and Leaf
nodes), and downloads the forwarding instructions to the nodes via the
BGP extensions defined in this document.

An SR P2MP policy is a variant of the SR policy and, as such, it reuses
the concept of a candidate path. This draft reuses some of the concepts
and TLVs mentioned in .

A candidate path within the P2MP policy can contain multiple
path-instances. A path-instance can be viewed as a P2MP LSP. For
candidate path global optimization purposes, two or more path-instances
can be used to execute make-before-break procedures.

Each path-instance is a P2MP LSP; as such, each path-instance needs a
set of replication segments to construct its forwarding
instructions.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119].

This document defines a new BGP NLRI, called the P2MP-POLICY NLRI.

A new SAFI is defined: the SR P2MP Policy SAFI (codepoint TBD, assigned
by IANA). The following is the format of the P2MP-POLICY NLRI:

  Route Type: defines the encoding of the rest of the P2MP-POLICY NLRI.

  Length: the length in octets of the route-type-specific data,
  excluding the Route Type and Length fields.

This document defines the following route types:

  P2MP Policy route: TBD1
  Replication Segment route: TBD2

The NLRI containing the SR P2MP Policy is carried in a BGP UPDATE
message using BGP multiprotocol extensions with an AFI of 1 or 2 (IPv4
or IPv6) and with a SAFI of "TBD" (assigned by IANA from the
"Subsequent Address Family Identifiers (SAFI) Parameters" registry).

All other recommendations of the SR Policy SAFI and NLRI section should
be taken into account for the P2MP policy.

The P2MP Policy route contains the following fields:

  Root-ID: IPv4/IPv6 address of the head-end (Root) of the P2MP tree,
  based on the AFI.

  Tree-ID: a unique 4-octet identifier of the P2MP tree on the head-end
  (Root) router.

  Distinguisher: a 4-octet value uniquely identifying the policy in the
  context of the <Tree-ID, Originating Router's IP> tuple. The
  distinguisher has no semantic value and is used solely by the SR P2MP
  Policy originator to make multiple occurrences of the same SR P2MP
  Policy unique (from an NLRI perspective).
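
The field descriptions above can be illustrated with a short encoder and decoder. This is a minimal sketch, not a normative encoding: the figure with the exact field widths is not reproduced here, so it assumes 1-octet Route Type and Length fields, the field order Root-ID, Tree-ID, Distinguisher, and the hypothetical value 1 for the TBD1 route type.

```python
import ipaddress
import struct

# Hypothetical placeholder: the real code point is TBD1 (to be assigned by IANA).
ROUTE_TYPE_P2MP_POLICY = 1


def encode_p2mp_policy_route(root_id: str, tree_id: int, distinguisher: int) -> bytes:
    """Route Type (1) | Length (1) | Root-ID (4/16) | Tree-ID (4) | Distinguisher (4).

    The 1-octet Route Type/Length widths and the field order are assumptions.
    """
    root = ipaddress.ip_address(root_id)  # IPv4 -> AFI 1, IPv6 -> AFI 2
    body = root.packed + struct.pack("!II", tree_id, distinguisher)
    # Length counts only the route-type-specific data.
    return struct.pack("!BB", ROUTE_TYPE_P2MP_POLICY, len(body)) + body


def decode_p2mp_policy_route(nlri: bytes, afi: int) -> dict:
    route_type, length = nlri[0], nlri[1]
    body = nlri[2:2 + length]
    addr_len = 4 if afi == 1 else 16
    if route_type != ROUTE_TYPE_P2MP_POLICY or len(body) != addr_len + 8:
        raise ValueError("not a well-formed P2MP Policy route")
    root_cls = ipaddress.IPv4Address if afi == 1 else ipaddress.IPv6Address
    tree_id, distinguisher = struct.unpack("!II", body[addr_len:])
    return {"root_id": str(root_cls(body[:addr_len])),
            "tree_id": tree_id, "distinguisher": distinguisher}
```

Under these assumptions, Root-ID 192.0.2.1, Tree-ID 100 and Distinguisher 7 encode to the 14 octets 010cc00002010000006400000007; an IPv6 Root-ID grows the Length from 12 to 24.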

There can be two types of replication segment: shared and non-shared. A
shared replication segment can carry multiple MVPN services, or it can
be used for facility fast reroute protecting multiple P2MP trees. A
non-shared tree is used when the label field of the PMSI Tunnel
Attribute (PTA) is set to 0 as per . The following route type can be
encoded as per for both shared and non-shared replication segments.

The Replication Segment route contains the following fields:

  Root-ID: IPv4/IPv6 address of the head-end (Root) of the P2MP tree,
  based on the AFI.

  Tree-ID: a unique 4-octet identifier of the P2MP tree on the head-end
  router (Root).

  Instance-ID: identifies the path-instance within the P2MP policy. Each
  candidate path can have one, two, or more path-instances. A
  path-instance is used for global optimization of the candidate path
  via make-before-break procedures. Instance-ID can be used

  Distinguisher: a 4-octet value uniquely identifying the policy in the
  context of the <Root-ID, Tree-ID> tuple. The distinguisher has no
  semantic value and is used solely by the SR P2MP Policy originator to
  make multiple occurrences of the same SR P2MP Policy unique (from an
  NLRI perspective).

  Node-ID: the node's IPv4/IPv6 address.
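
A similar sketch for the Replication Segment route, under the same assumptions: 1-octet Route Type and Length, fields in the order listed, Instance-ID taken as 4 octets to match the path-instance sub-TLV, and the hypothetical value 2 for TBD2. It also shows one plausible controller pattern, one route per tree node for a given path-instance, with the routes differing only in Node-ID. That Node-ID uses the same address family as Root-ID is an additional assumption.

```python
import ipaddress
import struct

# Hypothetical placeholder: the real code point is TBD2 (to be assigned by IANA).
ROUTE_TYPE_REPLICATION_SEGMENT = 2


def encode_replication_segment_route(root_id: str, tree_id: int, instance_id: int,
                                     distinguisher: int, node_id: str) -> bytes:
    """Route Type (1) | Length (1) | Root-ID | Tree-ID (4) | Instance-ID (4) |
    Distinguisher (4) | Node-ID, with Root-ID and Node-ID 4 or 16 octets.

    Field widths and order are assumptions made for this sketch.
    """
    root = ipaddress.ip_address(root_id)
    node = ipaddress.ip_address(node_id)
    if root.version != node.version:
        raise ValueError("Root-ID and Node-ID must use the same address family")
    body = (root.packed
            + struct.pack("!III", tree_id, instance_id, distinguisher)
            + node.packed)
    return struct.pack("!BB", ROUTE_TYPE_REPLICATION_SEGMENT, len(body)) + body


def routes_for_tree(root_id, tree_id, instance_id, distinguisher, tree_nodes):
    """One Replication Segment route per tree node (Root, Transit, Leaf)."""
    return [encode_replication_segment_route(root_id, tree_id, instance_id,
                                             distinguisher, node)
            for node in tree_nodes]
```

Keying each route by Node-ID keeps the NLRIs for the same tree and path-instance distinct, so route targets can then steer each one to its own node.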

The content of this new NLRI is encoded in the Tunnel Encapsulation
Attribute, originally defined in , using two new Tunnel-Type TLVs
(codepoints TBD, assigned by IANA from the "BGP Tunnel Encapsulation
Attribute Tunnel Types" registry): one for the P2MP Policy and another
for the Replication Segment.

P2MP Policy encoding (relevant only at the Root):

  The NLRI is the SR P2MP-POLICY NLRI with the P2MP Policy route type.

  The Tunnel Encapsulation Attribute is defined in .

  The Tunnel-Type is set to the P2MP-Policy Tunnel-Type, TBD (assigned
  by IANA from the "BGP Tunnel Encapsulation Attribute Tunnel Types"
  registry).

  The Policy Name and Policy Candidate Path Name sub-TLVs are defined
  in . The Preference, leaf-list, remote endpoint, path-instance, and
  instance-id sub-TLVs are defined in this document.

  Additional sub-TLVs may be defined in the future.

Replication Segment encoding:

  The NLRI is the SR P2MP-POLICY NLRI with the non-shared tree
  Replication Segment route type or the shared tree Replication Segment
  route type.

  The Tunnel Encapsulation Attribute is defined in .

  The Tunnel-Type is set to the Replication Segment Tunnel-Type, TBD
  (assigned by IANA from the "BGP Tunnel Encapsulation Attribute Tunnel
  Types" registry).

  The tree-identifier, replication-SID (Binding SID), SRv6
  replication-SID, downstream-nodes, and segment-list sub-TLVs are
  defined in this document.

  Additional sub-TLVs may be defined in the future.

Each P2MP Policy NLRI represents a candidate path for a P2MP policy. A
P2MP policy can have multiple candidate paths and needs multiple P2MP
Policy NLRIs to download all of its candidate paths.

As defined in the Preference sub-TLV section in , the candidate path
with the highest preference is the active candidate path.
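
The selection rule above can be sketched as follows. This is an illustrative model, assuming candidate paths are grouped per policy by <Root-ID, Tree-ID> and made unique by their Distinguisher; the tie-break on equal preference (lowest Distinguisher wins) is a hypothetical choice, since the text does not define one.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidatePath:
    root_id: str
    tree_id: int
    distinguisher: int   # makes each candidate-path NLRI unique
    preference: int


def active_candidate_paths(paths):
    """Map each policy (root_id, tree_id) to its active candidate path."""
    by_policy = defaultdict(list)
    for cp in paths:
        by_policy[(cp.root_id, cp.tree_id)].append(cp)
    # Highest preference wins; on a tie, the lowest distinguisher is picked
    # (a hypothetical tie-break, not defined by the text).
    return {policy: max(cps, key=lambda cp: (cp.preference, -cp.distinguisher))
            for policy, cps in by_policy.items()}
```

Because every candidate path arrives as its own NLRI, the Root can rerun this selection whenever a single candidate path is added, updated or withdrawn.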

The leaf-list sub-TLV identifies a set of leaves for the tree. Each leaf
is a remote endpoint as defined in . The leaf-list sub-TLV is optional.
The PCE can choose to download the leaf list every time a new leaf is
configured on, or learned by, the PCE. If the PCE chooses to download
this optional sub-TLV, it should download the entire set of endpoints
every time the endpoint list is modified. The leaf list has
informational value only, which is why it is optional; it is not
required for the Root PE to operate. Note, however, that in some cases
the endpoint list can become very large, with hundreds of leaves.

  Type: TBD, 1 octet.

  Length: 2 octets, the total length (not including the Type and Length
  fields) of the sub-TLVs encoded within the leaf-list sub-TLV.

  RESERVED: 1 octet of reserved bits. SHOULD be unset on transmission
  and MUST be ignored on receipt.

  sub-TLVs: one or more remote endpoint sub-TLVs. Note that the remote
  endpoint object is defined in .
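
A minimal encoder for the leaf-list sub-TLV, assuming a hypothetical placeholder value for the TBD type and treating the remote endpoint sub-TLVs as already-encoded opaque bytes, since their format is defined elsewhere. It reads the Length literally, as the size of the contained sub-TLVs only, excluding RESERVED; a comment marks the one-line change if RESERVED is meant to be counted. The size check reflects the 2-octet Length, which bounds how many leaves a single sub-TLV can carry.

```python
import struct

# Hypothetical placeholder for the TBD leaf-list sub-TLV type.
LEAF_LIST_TYPE = 0xF0


def encode_leaf_list(endpoint_subtlvs):
    """Type (1) | Length (2) | RESERVED (1) | remote endpoint sub-TLVs.

    endpoint_subtlvs: already-encoded remote endpoint sub-TLVs, treated as
    opaque bytes because their format is defined elsewhere.
    """
    payload = b"".join(endpoint_subtlvs)
    # Literal reading: Length covers the sub-TLVs only. If it were meant to
    # include RESERVED, this would be len(payload) + 1.
    length = len(payload)
    if length > 0xFFFF:
        raise ValueError("leaf list too large for a 2-octet Length field")
    # RESERVED is sent as zero ("SHOULD be unset on transmission").
    return struct.pack("!BHB", LEAF_LIST_TYPE, length, 0) + payload
```

Because the PCE resends the complete list after every change, a receiver can replace its stored leaf set wholesale instead of applying deltas.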

The path-instance sub-TLV contains a set of instance-ids (P2MP LSPs).
These LSPs can be used for the MBB procedure under a candidate path.
Each LSP instance-id is a unique identifier (4 octets) within the <root
node, P2MP policy>; in other words, it is unique per <root node,
tree-id>. The PCE SHOULD always download all instance-ids to the node.
The active instance is identified via the active instance-id sub-TLV.

The new P2MP LSP and its replication segments should be configured
from the root to the leaves before the PCE switches the active
instance-id to this new instance.

  Type: TBD, 1 octet.

  Length: 2 octets, the total length (not including the Type and Length
  fields) of the sub-TLVs encoded within the path-instance sub-TLV.

  RESERVED: 1 octet of reserved bits. SHOULD be unset on transmission
  and MUST be ignored on receipt.

  sub-TLVs:
    * one active instance-id sub-TLV
    * one or more instance-id sub-TLVs

The active instance-id sub-TLV identifies the P2MP LSP that should be
active among the collection of instances.

  Type: TBD.

  Length: the total length of the sub-TLV, not including the Type and
  Length fields.

  RESERVED: 1 octet of reserved bits. SHOULD be unset on transmission
  and MUST be ignored on receipt.

  Active instance-id: the identifier of the active instance.

Multiple instance-id sub-TLVs can be programmed for a candidate path.

  Type: TBD.

  Length: the total length of the sub-TLV, not including the Type and
  Length fields.

  RESERVED: 1 octet of reserved bits. SHOULD be unset on transmission
  and MUST be ignored on receipt.

  Instance-id: a 32-bit unique identifier. The instance-id is unique
  within the context of the <root node, P2MP policy>.
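
The nesting of the path-instance sub-TLV can be sketched as an encoder and decoder pair. Because the exact widths and code points are not given, this sketch makes several assumptions: hypothetical placeholder type values, a 1-octet Type and 2-octet Length on all three sub-TLVs, an inner Length of 5 (RESERVED plus the 32-bit id), and an outer Length that counts only the contained sub-TLVs. The decoder skips unknown sub-TLVs by their Length. It also rejects an active instance-id that is not among the downloaded instance-ids, which is this sketch's reading of "active amongst the collection of instances".

```python
import struct

# Hypothetical placeholders for the TBD sub-TLV type codes.
PATH_INSTANCE_TYPE = 0xF1
ACTIVE_INSTANCE_ID_TYPE = 0xF2
INSTANCE_ID_TYPE = 0xF3


def _id_subtlv(tlv_type: int, instance_id: int) -> bytes:
    # Type (1) | Length (2) = 5 | RESERVED (1) | 32-bit instance-id (4)
    return struct.pack("!BHBI", tlv_type, 5, 0, instance_id)


def encode_path_instance(active_id: int, instance_ids) -> bytes:
    inner = _id_subtlv(ACTIVE_INSTANCE_ID_TYPE, active_id)
    inner += b"".join(_id_subtlv(INSTANCE_ID_TYPE, i) for i in instance_ids)
    # Outer Length counts only the contained sub-TLVs (literal reading).
    return struct.pack("!BHB", PATH_INSTANCE_TYPE, len(inner), 0) + inner


def decode_path_instance(data: bytes):
    """Return (active_instance_id, [instance_ids]) from a path-instance sub-TLV."""
    tlv_type, length, _reserved = struct.unpack_from("!BHB", data, 0)
    if tlv_type != PATH_INSTANCE_TYPE:
        raise ValueError("not a path-instance sub-TLV")
    active, instances = None, []
    offset, end = 4, 4 + length
    while offset < end:
        sub_type, sub_len = struct.unpack_from("!BH", data, offset)
        if sub_type in (ACTIVE_INSTANCE_ID_TYPE, INSTANCE_ID_TYPE):
            (value,) = struct.unpack_from("!I", data, offset + 4)  # skip RESERVED
            if sub_type == ACTIVE_INSTANCE_ID_TYPE:
                active = value
            else:
                instances.append(value)
        offset += 3 + sub_len  # unknown sub-TLVs are skipped by their Length
    if active is None or active not in instances:
        raise ValueError("active instance-id missing or not among the instance-ids")
    return active, instances
```

During an MBB the controller would send, for example, instance-ids 1 and 2 with active instance-id 2, and later a sub-TLV carrying only instance-id 2.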

The Replication SID is a form of Binding SID as defined in . The
Replication SID within a P2MP Policy is defined in . On transit and
leaf nodes, the Replication SID can be used to identify the replication
segment and the forwarding information at the node. On the head-end
node (Root), however, the Replication SID acts as a Binding SID that
steers traffic into the P2MP tree. It should be noted that two
replication SIDs can be connected directly or via a unicast SR segment
list; in the latter case, the replication SID needs to be at the bottom
of the SID stack.

The SR-TE policy Binding SID and SRv6 Binding SID sub-TLVs are used for
the replication SID. This draft defines a new flag for the replication
SID at transit and leaf nodes:

  R-Flag: indicates a Replication SID. The Replication SID can be used
  to define the forwarding information of the transit or leaf node.

The downstream nodes sub-TLV is the list of downstream nodes
that an arriving packet needs to be replicated to. For example, an
arriving packet that needs to be replicated to downstream nodes A and B
will have two downstream node sub-TLVs. Each downstream node sub-TLV
can have a single segment list or multiple segment lists. Multiple
segment lists can be used for ECMP or fast reroute. In the case of fast
reroute, the P bit in the downstream node flags needs to be set to
explicitly indicate that this downstream node is protected, and the
protection sub-TLV needs to be included with every segment list.

  Type: TBD.

  Length: the total length (not including the Type and Length fields)
  of the sub-TLVs encoded within the downstream nodes sub-TLV.

  RESERVED: 1 octet of reserved bits. SHOULD be unset on transmission
  and MUST be ignored on receipt.

  Flags:
    P: this downstream node has a protected segment list.

  Downstream node id: an identifier that uniquely identifies the
  downstream node for this sub-TLV, for example the loopback IPv4/IPv6
  address of the node.

  sub-TLVs: one or more segment list sub-TLVs. For example, there can be
  two segment lists for ECMP or FRR.

The segment list sub-TLV is defined in . The segment list sub-TLV
contains one or more segment sub-TLVs. Two replication segments can be
directly connected via a replication SID, or they can be connected via
a unicast segment list and a replication SID. In the latter case, the
replication SID needs to be at the bottom of the unicast segment list.

The Weight sub-TLV is optional and is as defined in . Within the
downstream node sub-TLV, there can be one or more segment lists used
for ECMP; in this case, the Weight sub-TLV can provide weighted ECMP.

The Protection sub-TLV is optional. If FRR is desired for the
downstream node, this sub-TLV can be used to identify the protection
segment list; to do so, it carries a segment list identifier. If
protection is desired under the endpoint, all the segment lists should
carry this sub-TLV. A protection segment list cannot have a Weight
sub-TLV and cannot participate in ECMP. However, a segment list that is
being protected can have a Weight sub-TLV and participate in ECMP.

In general, a protection segment list is used only if the replication
segments are directly connected and no unicast segment list connects
the two replication segments. If a unicast segment list connects the
two replication SIDs, the unicast protection mechanism can be exercised
and there is no need for this protection sub-TLV, which is why this
sub-TLV is optional.

  Type: TBD, 1 octet.

  Length: 8.

  Flags: 1 octet. The P bit is set when this segment list is protected
  by another segment list for the downstream node.

  Segment list id: the identifier of this segment list.

  Protection segment list id: the identifier of the segment list that
  is used as protection.
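
The interaction of ECMP weights and protection within one downstream node can be modeled in a few lines. This is an illustrative sketch, not an implementation of the wire format. Segment lists are plain objects, and protected_by stands in for the protection segment list id carried in the Protection sub-TLV. Three further assumptions: an absent Weight is treated as 1, the flow hash (CRC-32 over a caller-supplied flow key) is an arbitrary choice, and is_up is a hypothetical liveness callback. It enforces two rules from the text: a protection segment list carries no Weight and takes no part in ECMP, and the P flag is set exactly when some segment list is protected.

```python
import zlib
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class SegmentList:
    list_id: int
    sids: List[str]
    weight: Optional[int] = None        # Weight sub-TLV; None when absent
    protected_by: Optional[int] = None  # protection segment list id, if protected


@dataclass
class DownstreamNode:
    node_id: str
    p_flag: bool                        # P bit of the downstream node flags
    segment_lists: List[SegmentList]


def _protection_ids(node: DownstreamNode) -> set:
    return {sl.protected_by for sl in node.segment_lists if sl.protected_by is not None}


def _weight(sl: SegmentList) -> int:
    return 1 if sl.weight is None else sl.weight  # absent Weight -> 1 (assumed)


def validate(node: DownstreamNode) -> None:
    protection = _protection_ids(node)
    for sl in node.segment_lists:
        if sl.list_id in protection and sl.weight is not None:
            raise ValueError(f"protection list {sl.list_id} must not carry a Weight")
    if node.p_flag != bool(protection):
        raise ValueError("P flag must be set exactly when a segment list is protected")


def select_segment_list(node: DownstreamNode, flow_key: bytes,
                        is_up: Callable[[int], bool]) -> SegmentList:
    """Pick the segment list used for one flow toward this downstream node."""
    validate(node)
    protection = _protection_ids(node)
    by_id = {sl.list_id: sl for sl in node.segment_lists}
    ecmp = [sl for sl in node.segment_lists if sl.list_id not in protection]
    # Weighted ECMP: map the flow hash onto the cumulative weight range.
    bucket = zlib.crc32(flow_key) % sum(_weight(sl) for sl in ecmp)
    for chosen in ecmp:
        if bucket < _weight(chosen):
            break
        bucket -= _weight(chosen)
    if not is_up(chosen.list_id) and chosen.protected_by is not None:
        return by_id[chosen.protected_by]  # FRR: fall back to the protection list
    return chosen
```

With lists 1 and 2 sharing the load and list 3 protecting list 1, flows hashed onto list 1 move to list 3 while list 1 is down, and flows on list 2 are unaffected.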

The segment sub-TLV is identified in . As mentioned before, two
replication segments can be connected directly to each other or via a
segment list. If they are connected directly to each other, the segment
list can be constructed as follows:

  If the replication segment is steered via IPv4 or IPv6 next hops or
  an interface, segment type E or G can be used with the new R flag
  set.

  If the replication segment is steered via an SR unicast node or
  adjacency SID, segment type A can be used with the new R flag set.
  Unicast SR segment types can also be configured for steering.

If they are connected via an SR domain, the segment list can contain
multiple different types of SIDs, such as Node, Adjacency, or Binding
SIDs. In this case, the replication SID is at the bottom of the stack
and is of type A with the R flag set. The SR Node, Adjacency, or
Binding SIDs steer the packet through the SR domain until it reaches
another replication segment, where the bottom-of-stack replication SID
identifies the forwarding information for that replication segment.

A replication segment can use the same segment types defined in . To
identify a replication segment explicitly, a new flag is defined:

  R-Flag: set for a segment sub-TLV that identifies a Replication
  Segment.

It should be noted that in a segment list only the last segment can
have the R flag set; multiple replication segments cannot be stacked on
top of each other. There can, however, be special cases for link
protection where a bypass tunnel is built via shared replication
segments, for example when the PCE downloads a bypass tunnel for link
protection that is constructed only from shared replication segments to
protect a group of non-shared replication segments.
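
The segment-list rules above can be expressed as a small validator. It encodes this sketch's reading of the text: the bottom segment is the replication SID with the R flag; behind an SR path that SID is type A; when directly connected it is type A, E, or G; and only the last segment may carry the R flag. The shared_bypass parameter is a hypothetical switch that relaxes only the stacking rule for the link-protection bypass case, whose exact constraints are not specified here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    seg_type: str        # segment type letter, e.g. "A", "E" or "G"
    value: str           # label, address or interface; opaque here
    r_flag: bool = False


def check_replication_segment_list(segments: List[Segment],
                                   shared_bypass: bool = False) -> List[str]:
    """Return rule violations for a segment list that reaches a replication segment."""
    if not segments:
        return ["empty segment list"]
    *steering, last = segments
    errors = []
    if not shared_bypass and any(s.r_flag for s in steering):
        errors.append("only the last segment may have the R flag set")
    if not last.r_flag:
        errors.append("the bottom segment must be the replication SID (R flag)")
    elif steering and last.seg_type != "A":
        errors.append("behind an SR path, the replication SID must be type A")
    elif not steering and last.seg_type not in ("A", "E", "G"):
        errors.append("a directly connected replication SID uses type A, E or G")
    return errors
```

Returning a list of violations rather than raising on the first one lets a controller report every problem in a downloaded segment list at once.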

The consumer of a P2MP Policy is not the BGP process, in line with .
The BGP process is used for distributing the P2MP Policy NLRI and its
route types, but their installation and use are outside the scope of
BGP. The details of the P2MP Policy can be found in .

The controller is usually connected to the receivers via a route
reflector. As such, one or more route-targets SHOULD be attached to the
advertisement of the P2MP Policy NLRI and its route types. The route
targets identify the head-end (Root) node for a P2MP Policy route, or
one or more head-end, transit, and leaf nodes for a non-shared or
shared tree Replication Segment route of the advertised P2MP Policy.

When a BGP speaker receives a P2MP Policy NLRI, the following rules
apply:

  * The P2MP Policy update MUST have either the NO_ADVERTISE community
    or at least one route-target extended community in IPv4-address
    format. If a router supporting this document receives a P2MP Policy
    update with no route-target extended communities and no
    NO_ADVERTISE community, the update MUST NOT be processed.
    Furthermore, it SHOULD be considered malformed, and the
    "treat-as-withdraw" strategy of is applied.

  * If one or more route-targets are present, at least one
    route-target MUST match one of the BGP Identifiers of the receiver
    for the update to be considered usable. The BGP Identifier is
    defined in as a 4-octet IPv4 address; therefore, the route-target
    extended community MUST be of the same format.

  * If one or more route-targets are present and none matches any of
    the local BGP Identifiers, the P2MP Policy NLRI is acceptable but
    not usable on the receiver node.
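
The three rules can be sketched as a classification function. It is a simplified model, not a BGP implementation, and makes these assumptions: communities arrive as plain integers and 8-octet extended communities, and only Transitive IPv4-Address-Specific Route Targets (type 0x01, sub-type 0x02) are considered, so other route-target formats are ignored. Matching compares only the IPv4 global administrator field against the local BGP Identifiers. An update carrying only NO_ADVERTISE is treated as usable, which is this sketch's reading of the direct-peering case.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import List, Set

NO_ADVERTISE = 0xFFFFFF02  # well-known community from RFC 1997


@dataclass
class Update:
    communities: Set[int] = field(default_factory=set)
    ext_communities: List[bytes] = field(default_factory=list)  # 8 octets each


def ipv4_route_target(ec: bytes):
    """Return the IPv4 global administrator of a Route Target, else None."""
    if len(ec) == 8 and ec[0] == 0x01 and ec[1] == 0x02:
        return ipaddress.IPv4Address(ec[2:6])
    return None


def route_target(bgp_id: str, local_admin: int = 0) -> bytes:
    """Build an IPv4-address-specific Route Target extended community."""
    return (bytes([0x01, 0x02]) + ipaddress.IPv4Address(bgp_id).packed
            + local_admin.to_bytes(2, "big"))


def classify(update: Update, local_bgp_ids: Set[ipaddress.IPv4Address]) -> str:
    rts = [rt for rt in map(ipv4_route_target, update.ext_communities) if rt is not None]
    if not rts:
        if NO_ADVERTISE in update.communities:
            return "usable"              # assumption: direct peering, no RT needed
        return "treat-as-withdraw"       # malformed: neither RT nor NO_ADVERTISE
    if any(rt in local_bgp_ids for rt in rts):
        return "usable"
    return "acceptable-not-usable"       # valid NLRI, meant for another node
```

The distinction between "acceptable-not-usable" and "treat-as-withdraw" matters on a route reflector, which keeps and reflects the former even though it does not install it.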

When a P2MP LSP needs to be optimized for any reason (e.g., it is using
an FRR path, or new routers are added to the network), a global
optimization is possible. Note that optimization works per candidate
path: each candidate path is capable of global optimization. To do so,
each candidate path can contain two or more path-instances. Each
path-instance is a P2MP LSP, and each P2MP LSP is identified by a
path-instance-id (equivalent to an LSP-ID [RFC3209]). After calculating
an optimized P2MP LSP path, the PCE programs the candidate path with a
second path-instance and the set of replication segments for this
path-instance on the root, transit, and leaf nodes. After the
replication segments of the optimized LSP are downloaded, an MBB
procedure is performed: the previous path-instance is deleted from the
head-end node, and its corresponding replication segments are removed
from the head-end, transit, and leaf nodes.
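
The ordering constraints of this make-before-break procedure can be sketched as a small simulation. It is a hypothetical model with three stand-ins: a dict of node name to installed instance-ids represents the per-node Replication Segment routes, the first node of the new tree is taken to be the Root, and program is a caller-supplied placeholder for whatever confirms that a node accepted its replication segment. The old instance stays active unless every node of the new tree was programmed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass
class CandidatePath:
    instance_ids: Set[int]
    active_instance_id: int


def make_before_break(cp: CandidatePath, installed: Dict[str, Set[int]],
                      old_tree: List[str], new_tree: List[str], new_id: int,
                      program: Callable[[str, int], bool] = lambda node, iid: True) -> List[str]:
    """Move one candidate path to a new path-instance; returns an action log.

    installed maps node name -> instance-ids whose replication segments are
    programmed there. new_tree[0] is taken to be the Root.
    """
    old_id = cp.active_instance_id
    log = []
    # Make: program the new instance's replication segments on every tree node.
    ok = True
    for node in new_tree:
        if program(node, new_id):
            installed.setdefault(node, set()).add(new_id)
            log.append(f"install {new_id} on {node}")
        else:
            ok = False
            log.append(f"install {new_id} on {node} failed")
    if not ok:
        # Never switch to a partially programmed tree: roll back, keep the old one.
        for node in new_tree:
            installed.get(node, set()).discard(new_id)
        log.append(f"abort: {old_id} stays active")
        return log
    # Switch: download all instance-ids and mark the new one active at the Root.
    cp.instance_ids.add(new_id)
    cp.active_instance_id = new_id
    log.append(f"activate {new_id} at {new_tree[0]}")
    # Break: remove the previous instance from the Root, transit and leaf nodes.
    cp.instance_ids.discard(old_id)
    for node in old_tree:
        installed.get(node, set()).discard(old_id)
        log.append(f"withdraw {old_id} from {node}")
    return log
```

The old and new trees are passed separately because an optimized path may use different transit nodes; in the sketch, a transit node that only served the old instance ends up with no replication segments at all.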

A new SAFI is defined: the SR P2MP Policy SAFI (codepoint TBD, assigned
by IANA).

Two new route types define the encoding of the rest of the P2MP-POLICY
NLRI:

  P2MP Policy Route
  Replication Segment

Two new Tunnel Types are to be assigned by IANA.

TBD

D. Voyer, C. Filsfils, R. Parekh, H. Bidgoli, Z. Zhang,
"draft-ietf-pim-sr-p2mp-policy".