Routing Area Working Group                                 A. Atlas, Ed.
Internet-Draft                                                 R. Kebler
Intended status: Standards Track                        Juniper Networks
Expires: January 13, 2014                                   IJ. Wijnands
                                                     Cisco Systems, Inc.
                                                              A. Csaszar
                                                                      G.
Enyedi 9 Ericsson 10 July 12, 2013 12 An Architecture for Multicast Protection Using Maximally Redundant Trees 13 draft-atlas-rtgwg-mrt-mc-arch-02 15 Abstract 17 Failure protection is desirable for multicast traffic, whether 18 signaled via PIM or mLDP. Different mechanisms are suitable for 19 different use-cases and deployment scenarios. This document 20 describes the architecture for global protection (aka multicast live- 21 live) and for local protection (aka fast-reroute). 23 The general methods for global protection and local protection using 24 alternate-trees are dependent upon the use of Maximally Redundant 25 Trees. Local protection can also tunnel traffic in unicast tunnels 26 to take advantage of the routing and fast-reroute mechanisms 27 available for IP/LDP unicast destinations. 29 The failures protected against are single link or node failures. 30 While the basic architecture might support protection against shared 31 risk group failures, algorithms to dynamically compute MRTs 32 supporting this are for future study. 34 Status of this Memo 36 This Internet-Draft is submitted in full conformance with the 37 provisions of BCP 78 and BCP 79. 39 Internet-Drafts are working documents of the Internet Engineering 40 Task Force (IETF). Note that other groups may also distribute 41 working documents as Internet-Drafts. The list of current Internet- 42 Drafts is at http://datatracker.ietf.org/drafts/current/. 44 Internet-Drafts are draft documents valid for a maximum of six months 45 and may be updated, replaced, or obsoleted by other documents at any 46 time. It is inappropriate to use Internet-Drafts as reference 47 material or to cite them other than as "work in progress." 48 This Internet-Draft will expire on January 13, 2014. 50 Copyright Notice 52 Copyright (c) 2013 IETF Trust and the persons identified as the 53 document authors. All rights reserved. 55 This document is subject to BCP 78 and the IETF Trust's Legal 56 Provisions Relating to IETF Documents 57 (http://trustee.ietf.org/license-info) in effect on the date of 58 publication of this document. Please review these documents 59 carefully, as they describe your rights and restrictions with respect 60 to this document. Code Components extracted from this document must 61 include Simplified BSD License text as described in Section 4.e of 62 the Trust Legal Provisions and are provided without warranty as 63 described in the Simplified BSD License. 65 Table of Contents 67 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 68 1.1. Maximally Redundant Trees (MRTs) . . . . . . . . . . . . . 4 69 1.2. MRTs and Multicast . . . . . . . . . . . . . . . . . . . . 6 70 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 6 71 3. Use-Cases and Applicability . . . . . . . . . . . . . . . . . 8 72 4. Global Protection: Multicast Live-Live . . . . . . . . . . . . 9 73 4.1. Creation of MRMTs . . . . . . . . . . . . . . . . . . . . 10 74 4.2. Traffic Self-Identification . . . . . . . . . . . . . . . 11 75 4.2.1. Merging MRMTs for PIM if Traffic Doesn't 76 Self-Identify . . . . . . . . . . . . . . . . . . . . 12 77 4.3. Convergence Behavior . . . . . . . . . . . . . . . . . . . 13 78 4.4. Inter-area/level Behavior . . . . . . . . . . . . . . . . 14 79 4.4.1. Inter-area Node Protection with 2 border routers . . . 15 80 4.4.2. Inter-area Node Protection with > 2 Border Routers . . 16 81 4.5. PIM . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 82 4.5.1. Traffic Handling: RPF Checks . . . . . . . . . . . . . 
17 83 4.6. mLDP . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 84 5. Local Repair: Fast-Reroute . . . . . . . . . . . . . . . . . . 17 85 5.1. PLR-driven Unicast Tunnels . . . . . . . . . . . . . . . . 18 86 5.1.1. Learning the MPs . . . . . . . . . . . . . . . . . . . 19 87 5.1.2. Using Unicast Tunnels and Indirection . . . . . . . . 19 88 5.1.3. MP Alternate Traffic Handling . . . . . . . . . . . . 20 89 5.1.4. Merge Point Reconvergence . . . . . . . . . . . . . . 21 90 5.1.5. PLR termination of alternate traffic . . . . . . . . . 21 91 5.2. MP-driven Unicast Tunnels . . . . . . . . . . . . . . . . 21 92 5.3. MP-driven Alternate Trees . . . . . . . . . . . . . . . . 22 93 6. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 23 94 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 23 95 8. Security Considerations . . . . . . . . . . . . . . . . . . . 23 96 9. Appendix A . . . . . . . . . . . . . . . . . . . . . . . . . . 23 97 9.1. MP-driven Alternate Trees . . . . . . . . . . . . . . . . 23 98 9.1.1. PIM details for Alternate-Trees . . . . . . . . . . . 26 99 9.1.2. mLDP details for Alternate-Trees . . . . . . . . . . . 26 100 9.1.3. Traffic Handling by PLR . . . . . . . . . . . . . . . 26 101 9.2. Methods Compared for PIM . . . . . . . . . . . . . . . . . 27 102 9.3. Methods Compared for mLDP . . . . . . . . . . . . . . . . 27 103 10. References . . . . . . . . . . . . . . . . . . . . . . . . . . 27 104 10.1. Normative References . . . . . . . . . . . . . . . . . . . 27 105 10.2. Informative References . . . . . . . . . . . . . . . . . . 28 106 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 29 108 1. Introduction 110 This document describes how the algorithms in 111 [I-D.enyedi-rtgwg-mrt-frr-algorithm], which are used in 112 [I-D.ietf-rtgwg-mrt-frr-architecture] for unicast IP/LDP fast- 113 reroute, can be used to provide protection for multicast traffic. It 114 specifically applies to multicast state signaled by PIM[RFC4601] or 115 mLDP[RFC6388]. There are additional protocols that depend upon these 116 (e.g. VPLS, mVPN, etc.) and consideration of the applicability to 117 such traffic will be in a future version. 119 In this document, global protection is used to refer to the method of 120 having two maximally disjoint multicast trees where traffic may be 121 sent on both and resolved by the receiver. This is similar to the 122 ability with RSVP-TE LSPs to have a primary and a hot standby, except 123 that it can operate in 1+1 mode. This capability is also referred to 124 as multicast live-live and is a generalized form of that discussed in 125 [I-D.ietf-rtgwg-mofrr]. In this document, local protection refers to 126 the method of having alternate ways of reaching the pre-identified 127 merge points upon detection of a local failure. This capability is 128 also referred to as fast-reroute. 130 This document describes the general architecture, framework, and 131 trade-offs of the different approaches to solving these general 132 problems. It will recommend how to generally provide global 133 protection and local protection for mLDP and PIM traffic. Where 134 protocol extensions are necessary, they will be defined in separate 135 documents as follows. 137 o Global 1+1 Protection Using PIM 139 o Global 1+1 Protection Using mLDP 141 o Local Protection Using mLDP: 142 [I-D.wijnands-mpls-mldp-node-protection]This document describes 143 how to provide node-protection and the necessary extensions using 144 targeted LDP session. 
o  Local Protection Using PIM

1.1.  Maximally Redundant Trees (MRTs)

Maximally Redundant Trees (MRTs) are described in [I-D.enyedi-rtgwg-mrt-frr-algorithm]; here we give only a brief description of the concept.  A pair of MRTs is a pair of directed spanning trees (a red tree and a blue tree) with a common root, directed so that each node can be reached from the root on both trees.  Moreover, these trees are redundant, since they are constructed so that no single link or single node failure can separate any node from the root on both trees, unless that failed link or node splits the network into completely separate components (i.e. the link or node was a cut-edge or cut-vertex).

Although for multicast the arcs (directed links) are directed away from the root instead of towards the root, the same MRT computations are used and apply.  This is similar to how multicast uses unicast routing's next-hops as the upstream-hops.  Thus this definition differs slightly from the one presented in [I-D.enyedi-rtgwg-mrt-frr-algorithm], since the arcs are directed away from and not towards the root.  When we need two paths towards a given destination rather than two paths away from it (e.g. for unicast detours for local repair solutions), we only need to reverse the arcs from how they are used for the unicast routing case; thus constructing MRTs towards or away from the root is the same problem.  A pair of MRTs is depicted in Figure 1.

   [E]---[D]---|    |---[J]
    |     |    |    |    |
    |     |    |    |    |
   [R]   [F]  [C]---[G]  |
    |     |    |    |    |
    |     |    |    |    |
   [A]---[B]---|    |---[H]

           (a) a network

   [E]-->[D]---|    |-->[J]      [E]<--[D]            [J]
    ^     |    |    |    |              ^              ^
    |     V    V    |    |              |              |
   [R]   [F]  [C]-->[G]  |       [R]   [F]  [C]-->[G]  |
          |              |        |     ^    ^    |    |
          V              V        V     |    |    |    |
   [A]<--[B]            [H]      [A]-->[B]---|    |-->[H]

    (b) Blue MRT of root R        (c) Red MRT of root R

          Figure 1: A network and two MRTs found in it

It is important to realize that this redundancy criterion does not imply that, after a failure, either of the MRTs remains intact, since a node failure must affect any spanning tree.  Redundancy here means that there will be one set of nodes that can still be reached along the blue MRT and another set that remains reachable along the red MRT.  As an example, suppose that node F goes down; that would separate B and A on the blue MRT and D and E on the red MRT.  Naturally, it is possible that the intersection of these two sets is not empty; e.g. C, G, H and J will remain reachable on both MRTs.  Additionally, observe that a single link can be used in both of the trees in different directions, so even a link failure can cut both trees.  In this example, the failure of link F<->B leads to the same reachability sets.

Finally, it is critical to recall that a pair of MRTs is always constructed together, and they are not SPTs.  While it would be useful to have an algorithm that could find a redundant pair for a given tree (e.g. for the SPT), that is impossible in general.  Moreover, if there is a failure and at least one of the trees changes, the other tree may need to change as well.  Therefore, even if a node still receives the traffic along the red tree, it cannot keep the old red tree and simply construct a new blue pair for it; there can be reconfiguration in cases where traditional shortest-path-based thinking would not expect it.  To converge to a new pair of disjoint MRTs, it is generally necessary to update both the blue MRT and the red MRT.
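The maximal disjointness illustrated in Figure 1 can be checked mechanically.  The short Python sketch below is purely illustrative and is not part of any MRT algorithm: it simply encodes the Blue and Red MRTs of Figure 1 as parent maps and reports which transit nodes the two root-to-node paths share.  Only the cut-vertices C and G ever appear as shared transit nodes.

   # Illustrative only: the parent maps are read off Figure 1 by hand.
   BLUE = {'E': 'R', 'D': 'E', 'F': 'D', 'C': 'D', 'B': 'F', 'A': 'B',
           'G': 'C', 'J': 'G', 'H': 'J'}
   RED  = {'A': 'R', 'B': 'A', 'F': 'B', 'C': 'B', 'D': 'F', 'E': 'D',
           'G': 'C', 'H': 'G', 'J': 'H'}

   def path_from_root(parent, node):
       """Return the path root -> node implied by a parent map."""
       path = [node]
       while node != 'R':
           node = parent[node]
           path.append(node)
       return list(reversed(path))

   for node in sorted(BLUE):
       blue_path = path_from_root(BLUE, node)
       red_path = path_from_root(RED, node)
       # Shared transit nodes (excluding the root and the node itself)
       # can only be cut-vertices: here C and G for the right-hand block.
       shared = set(blue_path[1:-1]) & set(red_path[1:-1])
       print(node, 'blue:', '->'.join(blue_path),
             'red:', '->'.join(red_path), 'shared:', sorted(shared))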
The two MRTs provide two separate forwarding topologies that can be used in addition to the default shortest-path-tree (SPT) forwarding topology (usually MT-ID 0).  There is a Blue MRT forwarding topology represented by one MT-ID; similarly, there is a Red MRT forwarding topology represented by a different MT-ID.  Naturally, a multicast protocol is required to use the forwarding topology information to build the desired multicast trees.  The multicast protocol can simply request the appropriate upstream interfaces, including the MT-ID when needed.

1.2.  MRTs and Multicast

Maximally Redundant Trees (MRTs) provide two advantages for protecting multicast traffic.  First, for global protection, MRTs are precisely what needs to be computed to have maximally redundant multicast distribution trees.  Second, for local repair, MRTs ensure that there will be protection to the merge points; the certainty of a path from any merge point to the PLR that avoids the failed node allows for the creation of alternate trees.

A known disadvantage of MRTs, and of redundant trees in general, is that the trees do not necessarily provide shortest detour paths.  Modeling is underway to investigate and compare the MRT lengths for the different algorithm options [I-D.enyedi-rtgwg-mrt-frr-algorithm].

2.  Terminology

2-connected:  A graph that has no cut-vertices.  This is a graph that requires two nodes to be removed before the network is partitioned.

2-connected cluster:  A maximal set of nodes that are 2-connected.

2-edge-connected:  A network graph where at least two links must be removed to partition the network.

ADAG:  Almost Directed Acyclic Graph - a graph that, if all links incoming to the root were removed, would be a DAG.

block:  Either a 2-connected cluster, a cut-edge, or an isolated vertex.

cut-link:  A link whose removal partitions the network.  A cut-link by definition must connect two cut-vertices.  If there are multiple parallel links, then they are referred to as cut-links in this document if removing the set of parallel links would partition the network.

cut-vertex:  A vertex whose removal partitions the network.

DAG:  Directed Acyclic Graph - a graph where all links are directed and there are no cycles in it.

GADAG:  Generalized ADAG - a graph that is the combination of the ADAGs of all blocks.

Maximally Redundant Trees (MRT):  A pair of trees where the path from any node X to the root R along the first tree and the path from the same node X to the root along the second tree share the minimum number of nodes and the minimum number of links.  Each such shared node is a cut-vertex.  Any shared links are cut-links.  Any RT is an MRT but many MRTs are not RTs.

Maximally Redundant Multicast Trees (MRMT):  A pair of multicast trees built of the sub-set of MRTs that is needed to reach all interested receivers.

network graph:  A graph that reflects the network topology where all links connect exactly two nodes and broadcast links have been transformed into the standard pseudo-node representation.

Redundant Trees (RT):  A pair of trees where the path from any node X to the root R along the first tree is node-disjoint with the path from the same node X to the root along the second tree.  These can be computed in 2-connected graphs.
Merge Point (MP):  For local repair, a router at which the alternate traffic rejoins the primary multicast tree.  For global protection, a router that receives traffic on multiple trees and must decide which stream to forward on.

Point of Local Repair (PLR):  The router that detects a local failure and decides whether and when to forward traffic on appropriate alternates.

MT-ID:  Multi-topology identifier.  The default shortest-path-tree topology is MT-ID 0.

MultiCast Ingress (MCI):  Multicast Ingress, the node where the multicast stream enters the current transport technology (MPLS-mLDP or IP-PIM) domain.  This may be the router attached to the multicast source, the PIM Rendezvous Point (RP), or the mLDP Root node.

Upstream Multicast Hop (UMH):  Upstream Multicast Hop, a candidate next-hop that can be used to reach the MCI of the tree.

Stream Selection:  The process by which a router determines which of the multiple primary multicast streams to accept and forward.  The router can decide on a packet-by-packet basis or simply per-stream.  This is done for global protection 1+1 and described in [I-D.ietf-rtgwg-mofrr].

MultiCast Egress (MCE):  Multicast Egress, a node where the multicast stream exits the current transport technology (MPLS-mLDP or IP-PIM) domain.  This is usually a receiving router that may forward the multicast traffic on towards receivers based upon IGMP or other technology.

3.  Use-Cases and Applicability

Protection of multicast streams has gained importance with the use of multicast to distribute video, including live video such as IP-TV.  There are a number of different scenarios and uses of multicast that require protection.  A few preliminary examples are described below.

o  When video is distributed via IP or MPLS for a cable application, it is desirable to have global protection 1+1 so that the customer-perceived impact is limited.  A QAM can join two multicast groups and determine which stream to use based upon the stream quality.  A network implementing this may be custom-engineered for this particular purpose.

o  In financial markets, stock ticker data is distributed via multicast.  The loss of data can have a significant financial impact.  Depending on the network, either global protection 1+1 or local protection can minimize the impact.

o  Several solutions for updating the software or firmware of a large number of end-user or operator-owned networking devices are based on IP multicast.  Since IP multicast is based on datagram transport, recovering lost data is cumbersome and decreases the advantages offered by multicast.  Some solutions rely on sending the updates several times; in a properly protected network, fewer repetitions are required.  Other solutions rely on the recipient asking for lost data segments explicitly on-demand.  A network failure could cause data loss for a significant number of receivers, which in turn would start requesting the lost data in a burst that could overload the server.  Properly engineered multicast fast-reroute would minimize such impacts.

o  Some providers offer multicast VPN services to their customers.  SLAs between the customer and provider may set low packet loss requirements.
In such cases, interruptions longer than the outage timescales targeted by FRR could cause direct financial losses for the provider.

Global protection 1+1 uses maximally redundant multicast trees (MRMTs) to simultaneously distribute a multicast stream on both MRMTs.  The disadvantage is the extra state and bandwidth requirements of always sending the traffic twice.  The advantage is that the latency of each MRMT can be known and the receiver can select the best stream.

Local protection provides a patch around the fault while the multicast tree reconverges.  When PLR replication is used, there is no extra multicast state in the network, but the bandwidth requirements vary based upon how many potential merge points must be provided for.  When alternate-trees are used, there is extra multicast state, but the bandwidth requirements on a link can be minimized to no more than once for the primary multicast tree traffic and once for the alternate-tree traffic.

4.  Global Protection: Multicast Live-Live

In MoFRR [I-D.ietf-rtgwg-mofrr], the idea of joining both a primary and a secondary tree is introduced with the requirement that the primary and secondary trees be link and node disjoint.  This works well for networks where there are dual planes, as explained in [I-D.ietf-rtgwg-mofrr].  For other networks, it is still desirable to have two disjoint multicast trees and allow a receiver to join both and make its own decision about which traffic to accept.

Using MRTs gives the ability to guarantee that the two trees are as disjoint as possible and are dynamically recomputed whenever the topology changes.  The MRTs used are rooted at the MultiCast Ingress (MCI).  One multicast tree is created using the Blue MRT forwarding topology.  The second multicast tree is created using the Red MRT forwarding topology.  This can be accomplished by specifying the appropriate MT-ID associated with each forwarding topology.

There are four different aspects of using MRTs for 1+1 Global Protection that must be considered.  They are as follows.

1.  Creation of the maximally redundant multicast trees (MRMTs) based upon the forwarding topologies.

2.  Traffic Identification: How to handle traffic when the two MRMTs overlap due to a cut-vertex or cut-link.

3.  Convergence: How to converge after a network change and get back to a protected state.

4.  Inter-area/inter-level Behavior: How to compute and use MRMTs when the multicast source is outside the area/level and how to provide border-router protection.

4.1.  Creation of MRMTs

The creation of the two maximally redundant multicast trees occurs as described below; an illustrative sketch of these steps follows the list.  This assumes that the next-hops to the MCI associated with the Blue and Red forwarding topologies have already been computed and stored.

1.  A receiving router determines that it wants to join both the Blue tree and the Red tree.  The details of how it makes this decision are not covered in this document and could be based on configuration, additional protocols, etc.

2.  The router selects among the Blue next-hops an Upstream Multicast Hop (UMH) to reach the MCI node.  The router joins the tree towards the selected UMH, including a multi-topology id (MT-ID) identifying the Blue MRT.

3.  The router selects among the Red next-hops an Upstream Multicast Hop (UMH) to reach the MCI node.  The router joins the tree towards the selected UMH, including a multi-topology id (MT-ID) identifying the Red MRT.

4.  When a router receives a tree setup request specifying a particular MT-ID (e.g. Color), then the router selects among the Color next-hops to the MCI a UMH node, creates the necessary multicast state, and joins the tree towards the UMH node.
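The following Python sketch illustrates the steps above.  It is not a protocol specification: the per-color next-hop table, the MT-ID strings, and the send_join()/on_join_received() helpers are hypothetical stand-ins for the PIM or mLDP machinery.

   # Illustrative sketch of the Section 4.1 steps; all names are invented.
   MTID = {'blue': 'MRT-Blue', 'red': 'MRT-Red'}

   # Pre-computed next-hops towards the MCI for each MRT forwarding
   # topology (assumed to be already computed and stored).
   next_hops_to_mci = {'blue': ['nbr1', 'nbr2'], 'red': ['nbr3']}

   def send_join(umh, mt_id, group):
       """Stand-in for sending a PIM Join / mLDP label mapping upstream."""
       print('join %s towards %s with MT-ID %s' % (group, umh, mt_id))

   def join_both_mrmts(group):
       # Steps 2 and 3: pick a UMH among the Blue (then Red) next-hops
       # and join the tree towards it, carrying the corresponding MT-ID.
       for color in ('blue', 'red'):
           umh = next_hops_to_mci[color][0]   # any selection policy works
           send_join(umh, MTID[color], group)

   def on_join_received(mt_id, group):
       # Step 4: create local multicast state (not shown) and propagate
       # the tree setup request upstream on the same forwarding topology.
       color = 'blue' if mt_id == MTID['blue'] else 'red'
       send_join(next_hops_to_mci[color][0], mt_id, group)

   join_both_mrmts('(S,G)')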
4.2.  Traffic Self-Identification

Two maximally redundant trees will share any cut-vertices and cut-links in the network.  In the multicast global protection 1+1 case, this means that the potential single failures of the other nodes and links in the network are still protected against.  If a cut-vertex cannot associate traffic with a particular MRMT, then the traffic would be incorrectly replicated onto both MRMTs, resulting in complete duplication of traffic.  An example of such MRTs is given earlier in Figure 1 and repeated below in Figure 2, where there are two cut-vertices C and G and a cut-link C<->G.

   [E]---[D]---|    |---[J]
    |     |    |    |    |
    |     |    |    |    |
   [R]   [F]  [C]---[G]  |
    |     |    |    |    |
    |     |    |    |    |
   [A]---[B]---|    |---[H]

           (a) a network

   [E]-->[D]---|    |-->[J]      [E]<--[D]            [J]
    ^     |    |    |    |              ^              ^
    |     V    V    |    |              |              |
   [R]   [F]  [C]-->[G]  |       [R]   [F]  [C]-->[G]  |
          |              |        |     ^    ^    |    |
          V              V        V     |    |    |    |
   [A]<--[B]            [H]      [A]-->[B]---|    |-->[H]

    (b) Blue MRT of root R        (c) Red MRT of root R

          Figure 2: A network and two MRTs found in it

In this example, traffic from the multicast source R to a receiver G, J, or H will cross link C<->G on both the Blue and Red MRMTs.  When this occurs, there are several different possibilities depending upon the protocol.

mLDP:  Different label bindings will be created for the Blue and Red MRMTs.  As specified in [I-D.iwijnand-mpls-mldp-multi-topology], the P2MP FEC Element will use the MT IP Address Family to encode the Root node address and MRT MT-ID.  Each MRMT will therefore have a different P2MP FEC Element and be assigned an independent label.

PIM:  There are three different ways to handle IP traffic forwarded based upon PIM when that traffic will overlap on a link.

A.  Different Groups:  If different multicast groups are used for each MRMT, then the traffic clearly indicates which MRMT it belongs to.  In this case, traffic on the Blue MRMT would use multicast group G-blue and traffic on the Red MRMT would use multicast group G-red.

B.  Different Source Loopbacks:  Another option is to use different IP addresses for the source S, so S might announce S-red and S-blue.  In this case, traffic on the Blue MRMT would have an IP source of S-blue and traffic on the Red MRMT would have an IP source of S-red.

C.  Stream Selection and Merging:  The third option, described in Section 4.2.1, is to have a router that gets (S,G) Joins for both the Blue MT-ID and the Red MT-ID merge those into a single tree.  The router may need to select which upstream stream to use, just as if it were a receiving router.

There are three options presented for PIM.  The most appropriate will depend upon the deployment scenario as well as router capabilities.
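As an illustration of options A and B, the sketch below shows one way a cut-vertex such as C could classify received traffic to a color, falling back to the stream selection of option C when the traffic does not self-identify.  The addresses, interface names, and helper functions are invented for the example and are not specified by this document.

   # Illustrative classification sketch; the group/source values follow
   # the G-blue/G-red and S-blue/S-red naming of options A and B.
   BLUE, RED, UNKNOWN = 'blue', 'red', 'unknown'

   GROUP_COLOR = {'232.1.1.1': BLUE, '232.1.1.2': RED}    # option A
   SOURCE_COLOR = {'192.0.2.1': BLUE, '192.0.2.2': RED}   # option B

   def classify(src, group):
       """Map a received (S,G) to an MRMT color if it self-identifies."""
       return GROUP_COLOR.get(group) or SOURCE_COLOR.get(src) or UNKNOWN

   # Option C: no self-identification, so the router merges the Blue and
   # Red joins and performs stream selection (Section 4.2.1), e.g. by
   # preferring whichever upstream interface is currently delivering.
   def select_stream(active_interfaces, blue_iface, red_iface):
       if blue_iface in active_interfaces:
           return blue_iface
       if red_iface in active_interfaces:
           return red_iface
       return None

   print(classify('192.0.2.1', '239.0.0.1'))     # 'blue' via source address
   print(classify('198.51.100.7', '232.1.1.2'))  # 'red' via group address
   print(select_stream({'if-blue'}, 'if-blue', 'if-red'))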
4.2.1.  Merging MRMTs for PIM if Traffic Doesn't Self-Identify

When traffic doesn't self-identify, the cut-vertices must follow specific rules to avoid traffic duplication.  This section describes that behavior, which allows the same (S,G) to be used for both the Blue MT-ID and Red MT-ID (i.e. when the traffic doesn't self-identify as to its MT-ID).

The behavior described in this section differs from the conflict resolution described in [RFC6420] because these rules apply to the Global Protection 1+1 case.  Specifically, it is not sufficient for an upstream router to pick only one of the two MT-IDs to join, because that does not maximize the protection provided.

As described in [RFC6420], a router that receives (S,G) Joins for both the Blue MT-ID and the Red MT-ID can merge the set of downstream interfaces in its forwarding entry.  Unlike the procedures defined in [RFC6420], the router must send a Join upstream for each MT-ID.  If a router has different upstream interfaces for these MRMTs, then the router will need to do stream selection and forward the selected stream to its outgoing interfaces, just as if it were an MCE.  The stream selection methods for detecting failures and handling traffic discarding are described in [I-D.ietf-rtgwg-mofrr].

This method does not work if the MRMTs merge on a common LAN with different upstream routers.  In this case, the traffic cannot be distinguished on the LAN, resulting in duplication on the LAN.  The normal PIM Assert procedure would stop one of the upstream routers from transmitting duplicates onto the LAN once it is detected.  This, in turn, may cause the duplicate stream to be pruned back to the source.  Thus, end-to-end protection in this case of the MRMTs converging on a single LAN with different upstream interfaces can only be accomplished by the methods of traffic self-identification.

4.3.  Convergence Behavior

It is necessary to handle topology changes and get back to having two MRMTs that provide global protection.  To understand the requirements and what can be computed, recall the following facts.

a.  It is not generally possible to compute a single tree that is maximally redundant to an existing tree.

b.  The pair of MRTs must be computed simultaneously.

c.  After a single link or node failure, there is one set of nodes that can be reached from the root on the Blue MRMT and a second set of nodes that can be reached from the root on the Red MRMT.  If the failure wasn't a cut-vertex or cut-edge, all nodes will be in at least one of these two sets.

To gracefully converge, it is necessary to never have a router where both its red MRMT and blue MRMT are broken.  There are three different ways in which this could be done.  These options are being more fully explored to see which is most practical and provides the best set of trade-offs.

Ordered Convergence  When a single failure occurs, each receiver determines whether it was affected or unaffected.  First, the affected receivers identify the broken MRMT color (e.g. blue) and join the MRMT via their new UMH for that MRT color.  Once the affected receivers receive confirmation that the new MRMT has been successfully created back to the MCI, then the affected receivers switch to using that MRMT.  The affected receivers tear down the old broken MRMT state and join the MRMT via their new UMH for the other MRT color (e.g. red).  Finally, once the affected receivers receive confirmation that the new MRMT has been successfully created back to the MCI, the affected receivers can tear down the old working MRMT state.
Once the affected receivers have updated 584 their state, the unaffected receivers need to also do the same 585 staging - first joining the MRMT via their new UMH for the Blue 586 MRT, waiting for confirmation, switching to using traffic from the 587 Blue MRMT, tearing down the old Blue MRMT state, joining the MRMT 588 via their new UMH for the Red MRT, waiting for confirmation, and 589 tearing down the old Red MRMT state. There are complexities 590 remaining, such as determining how an Unaffected Receiver decides 591 that the Affected Receivers are done. When the topology change 592 isn't a failure, all receivers are unaffected and the same process 593 can apply. 595 Protocol Make-Before-Break In the control plane, a router joins the 596 tree on the new Blue topology but does not stop receiving traffic 597 on the old Blue topology. Once traffic is observed from the new 598 Blue UMH, then the router accepts traffic on the new Blue UMH and 599 removes the old Blue UMH. This behavior can happen simultaneously 600 with both Blue and Red forwarding topologies. An advantage is 601 that it works regardless of the type of topology change and 602 existing traffic streams aren't broken. Another advantage is that 603 the complexity is limited and this method is well understood. The 604 disadvantage is that the number of traffic-affecting events 605 depends upon the number of hops to the MCI. 607 Multicast Source Make-Before-Break On a topology change, routers 608 would create new MRMTs using new MRT forwarding state and leaving 609 the old MRMTs as they are. After the new MRMTs are complete, the 610 multicast source could switch from sending on the old MRMTs to 611 sending on the new MRMTs. After a time, the old MRMTs could be 612 torn down. There are a number of details to still investigate. 614 4.4. Inter-area/level Behavior 616 A source outside of the IGP area/level can be treated as a proxy 617 node. When the join request reaches a border router (whether ABR for 618 OSPF or LBR for ISIS), that border router needs to determine whether 619 to use the Blue or Red forwarding topology in the next selected area/ 620 level. 622 |-------------------| 623 | | 624 |---[S]---| [BR1]-----[ X ] | 625 | | | | | 626 [ A ]-----[ B ] | | | 627 | | [ Y ]-----[BR2]--(proxy for S) 628 | | 629 [BR1]-----[BR2] (b) Area 10 630 Y's Red next-hop: BR1 631 (a) Area 0 Y's Blue next-hop: BR2 632 Red Next-Hops to S 633 BR1's is BR2 634 BR2's is B 635 B's is S 637 Blue Next-Hops to S 638 BR1's is A 639 BR2's is BR1 640 A's is S 642 Figure 3: Inter-area Selection - next-hops towards S 644 Achieving maximally node-disjoint trees across multiple areas is hard 645 due to the information-hiding and abstraction. If there is only one 646 border router, it is trivial but protection of the border router is 647 not possible. With exactly 2 border routers, inter-area/level node 648 protection is reasonably straightforward but can require that the BR 649 rewrite the (S,G) for PIM. With more than 2 border routers, inter- 650 area node protection is possible at the cost of additional bandwidth 651 and border router complexity. These two solutions are described in 652 the following sub-sections. 654 4.4.1. Inter-area Node Protection with 2 border routers 656 If there are exactly two border routers between the areas, then the 657 solution and necessary computation is straightforward. In that 658 specific case, each BR knows that only the other BR must specifically 659 be avoided in the second area when a forwarding topology is selected. 
660 As described in [I-D.enyedi-rtgwg-mrt-frr-algorithm], it is possible 661 for a node X to determine whether the Red or Blue forwarding topology 662 should be used to reach a node D while avoiding another node Y. 664 The results of this computation and the resulting changes in MT-ID 665 from Red to Blue or Blue to Red are illustrated in Figure 3. It 666 shows an example where BR1 must modify joins received from Area 10 667 for the Red MT-ID to use the Blue MT-ID in Area 0. Similarly, BR2 668 must modify joins received from Area 10 for the Blue MT-ID to use the 669 Red MT-ID in Area 0. 671 For mLDP, modifying the MT-ID in the control-plane is all that is 672 needed. For PIM, if the same (S,G) is used for both the Blue MT-ID 673 and the Red MT-ID, then only control-plane changes are needed. 674 However, for PIM, if different group IDs (e.g. G-red and G-blue) or 675 different source loopback addresses (S-red and S-blue) are used, it 676 is necessary to modify the traffic to reflect the MT-ID included in 677 the join message received on that interface. An alternative could be 678 to use an MPLS label that indicates the MT-ID instead of different 679 group IDs or source loopback addresses. 681 To summarize the necessary logic, when a BR1 receives a join from a 682 neighbor in area N to a destination D in area M on the Color MT-ID, 683 the BR1: 685 a. Identifies the BR2 at the other end of the proxy node in area N. 687 b. Determines which forwarding topology may avoid BR2 to reach D in 688 area M. Refer to that as Color-2 MT-ID. 690 c. Uses Color-2 MT-ID to determine the next-hops to S. When a join 691 is sent upstream, the MT-ID used is that for Color-2. 693 4.4.2. Inter-area Node Protection with > 2 Border Routers 695 If there are more than two BRs between areas, then the problem of 696 ensuring inter-area node-disjointness is not solved. Instead, once a 697 request to join the multicast tree has been received by a BR from an 698 area that isn't closest to the multicast source, the BR must join 699 both the Red MT-ID and the Blue MT-ID in the area closest to the 700 multicast source. Regardless of what single link or node failure 701 happens, each BR will receive the multicast stream. Then, the BR can 702 use the stream-selection techniques specified in 703 [I-D.ietf-rtgwg-mofrr] to pick either the Blue or Red stream and 704 forward it to downstream routers in the other area. Each of the BRs 705 for the other area should be attached to a proxy-node representing 706 the other area. 708 This approach ensures that a BR will receive the multicast stream in 709 the closest area as long as the single link or node failure isn't a 710 single point of failure. Thus, each area or level is independently 711 protected. The BR is required to be able to select among the 712 multicast streams and, if necessary for PIM, translate the traffic to 713 contain the correct (S,G) for forwarding. 715 4.5. PIM 717 Capabilities need to be exchanged to determine that a neighbor 718 supports using MRT forwarding topologies with PIM. Additional 719 signaling extensions are not necessary to PIM to support Global 720 Protection. [RFC6420] already defines how to specify an MT-ID as a 721 Join Attribute. 723 4.5.1. Traffic Handling: RPF Checks 725 For PIM, RPF checks would still be enabled by the control plane. The 726 control plane can program different forwarding entries on the G-blue 727 incoming interface and on the G-red incoming interface. The other 728 interfaces would still discard both G-blue and G-red traffic. 
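The per-color RPF programming described above amounts to a per-(interface, group) accept filter.  The following minimal sketch, with invented interface names, is meant only to illustrate that control-plane programming, not an actual forwarding-plane implementation.

   # Illustrative sketch only: real implementations program this state
   # into the multicast FIB.
   # (incoming interface, group) -> accept?  Only the interface on which
   # a given colored stream is expected passes the RPF check.
   rpf_accept = {
       ('if-blue', 'G-blue'): True,
       ('if-red', 'G-red'): True,
   }

   def rpf_check(iif, group):
       """Accept a packet only on the interface programmed for its MRMT."""
       return rpf_accept.get((iif, group), False)

   assert rpf_check('if-blue', 'G-blue')
   assert not rpf_check('if-red', 'G-blue')    # wrong incoming interface
   assert not rpf_check('if-other', 'G-red')   # other interfaces discard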
730 The receiver would still need to detect failures and handle traffic 731 discarding as is specified in [I-D.ietf-rtgwg-mofrr]. 733 4.6. mLDP 735 Capabilities need to be exchanged to determine that a neighbor 736 supports using MRT forwarding topologies with mLDP. The basic 737 mechansims for mLDP to support multi-topology are already described 738 in [I-D.iwijnand-mpls-mldp-multi-topology]. It may be desirable to 739 extend the capability defined in this draft to indicate that MRT is 740 or is not supported. 742 5. Local Repair: Fast-Reroute 744 Local repair for multicast traffic is different from unicast in 745 several important ways. 747 o There is more than a single final destination. The full set of 748 receiving routers may not be known by the PLR and may be extremely 749 large. Therefore, it makes sense to repair to the immediate next- 750 hops for link-repair and the next-next-hops for node-repair. 751 These are the potential merge points (MPs). 753 o If a failure cannot be positively identified as a node-failure, 754 then it is important to repair to the immediate next-hops since 755 they may have receivers attached. 757 o If a failure cannot be positively identified as a link-failure and 758 node protection is desired, then it is important to repair to the 759 next-next-hops since they may not receive traffic from the 760 immediate next-hops. 762 o Updating multicast forwarding state may take significantly longer 763 than updating unicast state, since the multicast state is updated 764 tree by tree based on control-plane signaling. 766 o For tunnel-based IP/LDP approaches, neither the PLR nor the MP may 767 be able to specify which interface the alternate traffic will 768 arrive at the MP on. The simplest reason is the unicast 769 forwarding includes the use of ECMP and the path selection is 770 based upon internal router behavior for all paths between the PLR 771 and the MP. 773 For multicast fast-reroute, there are three different mechanisms that 774 can be used. As long as the necessary signaling is available, these 775 methods can be combined in the same network and even for the same PLR 776 and failure point. 778 PLR-driven Unicast Tunnels: The PLR learns the set of MPs that need 779 protection. On a failure, the PLR replicates the traffic and 780 tunnels it to each MP using the unicast route. If desired, an 781 RSVP-TE tunnel could be used instead of relying upon unicast 782 routing. 784 MP-driven Unicast Tunnels: Each MP learns the identity of the PLR. 785 Before failure, each MP independently signals to the PLR the 786 desire for protection and other information to use. On a failure, 787 the PLR replicates the traffic and tunnels it to each MP using the 788 unicast route. If desired, an RSVP-TE tunnel could be used 789 instead of relying upon unicast routing. 791 MP-driven Alternate Trees: Each MP learns the identity of the PLR 792 and the failure point (node and interface) to be protected 793 against. Each MP selects an upstream interface and forwarding 794 topology where the path will avoid the failure point; each MP 795 signals a join towards that upstream interface to create that 796 state. 798 Each of these options is described in more detail in their respective 799 sections. Then the methods are compared and contrasted for PIM and 800 for mLDP. 802 5.1. 
PLR-driven Unicast Tunnels 804 With PLR-driven unicast tunnels, the PLR learns the set of merge 805 points (MPs) and, on a locally detected failure, uses the existing 806 unicast routing to tunnel the multicast traffic to those merge 807 points. The failure being protected against may be link or node 808 failure. If unicast forwarding can provide an SRLG-protecting 809 alternate, then SRLG-protection is also possible. 811 There are five aspects to making this work. 813 1. PLR needs to learn the MPs and their associated MPLS labels to 814 create protection state. 816 2. Unicast routing has to offer alternates or have dedicated tunnels 817 to reach the MPs. The PLR encapsulates the multicast traffic and 818 directs it to be forwarded via unicast routing. 820 3. The MP must identify alternate traffic and decide when to accept 821 and forward it or drop it. 823 4. When the MP reconverges, it must move to its new UMH using make- 824 before-break so that traffic loss is minimized. 826 5. The PLR must know when to stop sending traffic on the alternates. 828 5.1.1. Learning the MPs 830 If link-protection is all that is desired, then the PLR already knows 831 the identities of the MPs. For node-protection, this is not 832 sufficient. In the PLR-driven case, there is no direct communication 833 possible between the PLR and the next-next-hops on the multicast 834 tree. (For mLDP, when targeted LDP sessions are used, this is 835 considered to be MP-driven and is covered in Section 5.2.) 837 In addition to learning the identities of the MPs, the PLR must also 838 learn the MPLS label, if any, associated with each MP. For mLDP, a 839 different label should be supplied for the alternate traffic; this 840 allows the MP to distinguish between the primary and alternate 841 traffic. For PIM, an MPLS label is used to identify that traffic is 842 the alternate. The unicast tunnel used to send traffic to the MP may 843 have penultimate-hop-popping done; thus without an explicit MPLS 844 label, there is no certainty that a packet could be conclusively 845 identified as primary traffic or as alternate traffic. 847 A router must tell its UMH the identity of all downstream multicast 848 routers, and their associated alternate labels, on the particular 849 multicast tree. This clearly requires protocol extensions. The 850 extensions for PIM are given in [I-D.kebler-pim-mrt-protection]. 852 5.1.2. Using Unicast Tunnels and Indirection 854 The PLR must encapsulate the multicast traffic and tunnel it towards 855 each MP. The key point is how that traffic then reaches the MP. 856 There are basically two possibilities. It is possible that a 857 dedicated RSVP-TE tunnel exists and can be used to reach the MP for 858 just this traffic; such an RSVP-TE tunnel would be explicitly routed 859 to avoid the failure point. The second possibility is that the 860 packet is tunneled via LDP and uses unicast routing. The second case 861 is explored here. 863 It is necessary to assume that unicast LDP fast-reroute 864 [I-D.ietf-rtgwg-mrt-frr-architecture][RFC5714][RFC5286] is supported 865 by the PLR. Since multicast convergence takes longer than unicast 866 convergence, the PLR may have two different routes to the MP over 867 time. When the failure happens, the PLR will have an alternate, 868 whether LFA or MRT, to reach the MP. Then the unicast routing 869 converges and the PLR will have a new primary route to the MP. 
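Because the PLR's unicast route to the MP changes over time, the protection entry is naturally expressed through a level of indirection, as the following sketch illustrates (all names are invented for the example); the next paragraphs explain why this indirection matters.

   # Illustrative sketch: the PLR's protection entry references a shared
   # unicast next-hop object for the MP instead of a fixed interface, so
   # when unicast routing re-converges the tunneled traffic follows.
   unicast_nh = {'MP1': {'out_if': 'ge-0/0/1', 'label': 100}}  # LDP route to MP1

   protection_entry = {
       'alt_label_for_mp': 2001,   # label the MP supplied for alternate traffic
       'via': 'MP1',               # indirection: resolve through unicast_nh
   }

   def encapsulate(packet, entry):
       nh = unicast_nh[entry['via']]        # resolved at forwarding time
       return {'payload': packet,
               'inner_label': entry['alt_label_for_mp'],
               'outer_label': nh['label'],
               'out_if': nh['out_if']}

   print(encapsulate('mcast-pkt', protection_entry))
   # When unicast routing converges, only unicast_nh changes; the
   # multicast protection entry does not need to be rewritten.
   unicast_nh['MP1'] = {'out_if': 'ge-0/0/7', 'label': 250}
   print(encapsulate('mcast-pkt', protection_entry))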
Once 870 the routing has converged, it is important that alternate traffic is 871 no longer carried on the MRT forwarding topologies. This rule allows 872 the MRT forwarding topologies to reconverge and be available for the 873 next failure. Therefore, it is also necessary for the tunneled 874 multicast traffic to move from the alternate route to the new primary 875 route when the PLR reconverges. Therefore, the tunneled multicast 876 traffic should use indirection to obtain the unicast routing's 877 current next-hops to the MP. If physical indirection is not 878 feasible, then when the unicast LIB is updated, the associated 879 multicast alternate tunnel state should be as well. 881 When the PLR detects a local failure, the PLR replicates each 882 multicast packet, swaps or adds the alternate MPLS label needed by 883 the MP, and finally pushes the appropriate label for the MP based 884 upon the outgoing interface selected by the unicast routing. 886 For PIM, if no alternate labels are supplied by the MPs, then the 887 multicast traffic could be tunneled in IP. This would require 888 unicast IP fast-reroute. 890 5.1.3. MP Alternate Traffic Handling 892 A potential Merge Point must determine when and if to accept 893 alternate traffic. There are two critical components to this 894 decision. First, the MP must know the state of all links to its UMH. 895 This allows the MP to determine whether the multicast stream could be 896 received from the UMH. Second, the MP must be able to distinguish 897 between a normal multicast tree packet and an alternate packet. 899 The logic is similar for PIM and mLDP, but in PIM there is only one 900 RPF-interface or interface of interest to the UMH. In mLDP, all the 901 directly connected interfaces to the UMH are of interest. When the 902 MP detects a local failure, if that interface was the last connected 903 to the UMH and used for the multicast group, then the MP must rapidly 904 switch from accepting the normal multicast tree traffic to accepting 905 the alternate traffic. This rapid change must happen within the same 906 approximately 50 milliseconds that the PLR switching to send traffic 907 on the alternate takes and for the same reasons. It does no good for 908 the PLR to send alternate traffic if the MP doesn't accept it when it 909 is needed. 911 The MP can identify alternate traffic based upon the MPLS label. 912 This will be the alternate label that the MP supplied to its UMH for 913 this purpose. 915 5.1.4. Merge Point Reconvergence 917 After a failure, the MP will want to join the multicast tree 918 according to the new topology. It is critical that the MP does this 919 in a way that minimizes the traffic disruption. Whenever paths 920 change, there is also the possibility for a traffic-affecting event 921 due to different latencies. However, traffic impact above that 922 should be avoided. 924 The MP must do make-before-break. Until the MP knows that its new 925 UMH is fully connected to the MCI, the MP should continue to accept 926 its old alternate traffic. The MP could learn that the new UMH is 927 sufficient either via control-plane mechanisms or data-driven. In 928 the latter case, the reception of traffic from the new UMH can 929 trigger the change-over. If the data-driven approach is used, a 930 time-out to force the switch should apply to handle multicast trees 931 that have long quiet periods. 933 5.1.5. PLR termination of alternate traffic 935 The PLR sends traffic on the alternates for a configurable time-out. 
There is no clean way for the next-hop routers and/or next-next-hop routers to indicate that the traffic is no longer needed.

If better control were desired, each MP could tell its UMH what the desired time-out is.  The UMH could forward this to the PLR as well.  Then the PLR could send alternate traffic to different MPs based upon each MP's individual timer.  This would only be an advantage if some of the MPs were expected to have a longer multicast reconvergence time than others - either due to load or router capabilities.

5.2.  MP-driven Unicast Tunnels

MP-driven unicast tunnels are only relevant for mLDP, where targeted LDP sessions are feasible.  For PIM, there is no mechanism to communicate beyond a router's immediate neighbors; these techniques could work for link-protection, but even then there would not be a way of requesting that the PLR stop sending traffic.

There are three differences between MP-driven unicast tunnels and PLR-driven unicast tunnels.

1.  The MPs learn the identity of the PLR from their UMH.  The PLR does not learn the identities of the MPs.

2.  The MPs create direct connections to the PLR and communicate their alternate labels.

3.  When the MPs have converged, each explicitly tells the PLR to stop sending alternate traffic.

The first means that a router communicates its UMH to all its downstream multicast hops.  Then each MP communicates to the PLR(s) (one for link-protection and one for node-protection) and indicates the multicast tree that protection is desired for and the associated alternate label.

When the PLR learns about a new MP, it adds that MP and the associated information to the set of MPs to be protected.  On a failure, the PLR behaves the same as for the PLR-driven unicast tunnels.

After the failure, the MP reconverges using make-before-break.  Then the MP explicitly communicates to the PLR(s) that alternate traffic is no longer needed for that multicast tree.  When the node-protecting PLR hasn't changed for an MP, it may be necessary to withdraw the old alternate label, which tells the PLR to stop transmitting alternate traffic, and then provide a new alternate label.
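The following sketch illustrates the MP-driven interaction described above.  The classes, label values, and method names are invented for the example; the actual signaling would run over targeted LDP sessions.

   # Illustrative sketch only; not an LDP message definition.
   class PLR:
       def __init__(self):
           self.protected = {}          # (mp, tree) -> alternate label

       def request_protection(self, mp, tree, alt_label):
           self.protected[(mp, tree)] = alt_label     # difference 2

       def withdraw(self, mp, tree):
           self.protected.pop((mp, tree), None)       # difference 3

       def on_failure(self):
           # Same data-plane behavior as PLR-driven tunnels: replicate and
           # tunnel a copy to each MP with its alternate label.
           for (mp, tree), label in self.protected.items():
               print('tunnel %s traffic to %s with label %d'
                     % (tree, mp, label))

   plr = PLR()
   plr.request_protection('MP1', 'mldp-tree-1', alt_label=3001)
   plr.request_protection('MP2', 'mldp-tree-1', alt_label=3002)
   plr.on_failure()
   # After MP1 reconverges (make-before-break), it tells the PLR to stop.
   plr.withdraw('MP1', 'mldp-tree-1')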
5.3.  MP-driven Alternate Trees

In this document we have defined different solutions to achieve fast convergence for multicast link and node protection based on MRTs.  At a high level, these solutions can be separated into local and global protection.  Alternate Trees, which are a local protection scheme, initially looked like an attractive solution for multicast node protection, since they avoid having the PLR replicate the packet to each of the receivers of the protected node and wasting bandwidth.  However, this comes at the expense of extra multicast state and complexity.  To mitigate the extra multicast state, it is possible to aggregate the Alternate Trees by creating one Alternate Tree per protected node and reusing it for all the multicast trees going through this node.  This further complicates the procedures, and upstream-assigned labels are required to de-aggregate the trees.  With aggregation we are also introducing an unwanted side effect: the receiver population of the aggregated trees will very likely not be the same.  That means multicast packets will be forwarded on the Alternate Tree to nodes that may not have receivers for the protected tree.  The more protected trees are aggregated, the higher the risk of forwarding unwanted multicast packets, which again wastes bandwidth.

Considering the complexity of this solution and the unwanted side-effects, the authors of this document believe it is better to solve multicast node protection using a global protection scheme, as documented in Section 4.  The solution previously defined in this section has been moved to Appendix A (Section 9).

6.  Acknowledgements

The authors would like to thank Kishore Tiruveedhula, Santosh Esale, and Maciek Konstantynowicz for their suggestions and review.

7.  IANA Considerations

This document includes no request to IANA.

8.  Security Considerations

This architecture is not currently believed to introduce new security concerns.

9.  Appendix A

9.1.  MP-driven Alternate Trees

For some networks, it is highly desirable not to have the PLR perform replication to each MP.  PLR replication can cause substantial congestion on links used by alternates to different MPs.  At the same time, it is also desirable to have minimal extra state created in the network.  This can be resolved by creating alternate-trees that can protect multiple multicast groups as a bypass-alternate-tree.  An alternate-tree can also be created per multicast group, PLR and failure point.

It is not possible to merge alternate-trees for different PLRs or for different neighbors.  This is shown in Figure 4, where G can't select an acceptable upstream node on the alternate tree that doesn't violate either the need to avoid C (for PLR A) or D (for PLR B).

       |--------[S]--------|       Alternate from A must avoid C
       V                   V       Alternate from B must avoid D
      [A]-------[E]-------[B]
       |         |         |
       V         |         V
    |-[C]-------[F]-------[D]-|
    |  |                   |  |
    |  |--------[G]--------|  |
    |            |            |
    |            |            |
    |->[R1]-----[H]-----[R2]<-|

         (a) Multicast tree from S
         S->A->C->R1 and S->B->D->R2

      Figure 4: Alternate Trees from PLR A and B can't be merged

An MP that joins an alternate-tree for a particular multicast stream should not expect or request PLR-replicated tunneled alternate traffic for that same multicast stream.

Each alternate-tree is identified by the PLR that sources the traffic and the failure point (node and link) (FP) to be avoided.  Different multicast groups with the same PLR and FP may have different sets of MPs - but they are all at most going to include the FP (for link protection) and the neighbors of FP except for the PLR.  For a bypass-alternate-tree to work, it must be acceptable to temporarily send a multicast group's traffic to FP's neighbors that do not need it.  This is the trade-off required to reduce alternate-tree state and use bypass-alternate-trees.  As discussed in Section 5.1.3, a potential MP can determine whether to accept alternate traffic based upon the state of its normal upstream links.  Alternate traffic for a group the MP hasn't joined can just be discarded.

      [S]......[PLR]--[ A ]
                 | |     |
                1| |2    |
                [ FP]--[MP3]
                 |  \    |
                 |   \   |
                [MP1]--[MP2]

            Figure 5: Alternate Tree Scenario

For any router, knowing the PLR and the FP to avoid will force selection of either the Blue MRT or the Red MRT.
It is possible that the FP doesn't actually appear in either MRT path, but the FP will always be in either the set of nodes that might be used for the Blue MRT path or the set of nodes that might be used for the Red MRT path.  The FP's membership in one of the sets is a function of the partial ordering and topological ordering created by the MRT algorithm and is consistent between routers in the network graph.

To create an alternate-tree, the following must happen:

1.  For node-protection, the MP learns from its upstream (the FP) the node-id of its upstream (the PLR) and, optionally, a link identifier for the link used to the PLR.  The link-id is only needed for traffic handling in PIM, since mLDP can have targeted sessions between the MP and the PLR.

2.  For link-protection, the MP needs to know the node-id of its upstream (the PLR) and, optionally, its identifier for the link used to the PLR.

3.  The MP determines whether to use the Blue or Red forwarding topology to reach the PLR while avoiding the FP and the associated interface.  This gives the MP its alternate-tree upstream interface.

4.  The MP signals a backup-join to its alternate-tree upstream interface.  The backup-join specifies the PLR, the FP and, for PIM, the FP-PLR link identifier.  If the alternate-tree is not to be used as a bypass-alternate-tree, then the multicast group (e.g. (S,G) or Opaque-Value) must be specified.

5.  A router that receives a backup-join and is not the PLR needs to create multicast state and send a backup-join towards the PLR on the appropriate Blue or Red forwarding topology, as is locally determined to avoid the FP and the FP-PLR link.

6.  Backup-joins for the same (PLR, FP, PLR-FP link-id) that reference the same multicast group can be merged into a single alternate-tree.  Similarly, backup-joins for the same (PLR, FP, PLR-FP link-id) that reference no multicast group can be merged into a single alternate-tree.

7.  When the PLR receives the backup-join, it associates either the specified multicast group with that alternate-tree, if such is given, or all multicast groups that go to the FP via the specified FP-PLR link with the alternate-tree.

For an example, look at Figure 5.  FP would send a backup-join to MP3 indicating (PLR, FP, PLR-FP link-1).  MP3 sends a backup-join to A.  MP1 sends a backup-join to MP2 and MP2 sends a backup-join to MP3.  A short illustrative sketch of this exchange appears at the end of this section.

It is necessary that traffic on each alternate-tree self-identify as to which alternate-tree it is part of.  This is because an alternate-tree for a multicast group and a particular (PLR, FP, PLR-FP link-id) can easily overlap with an alternate-tree for the same multicast group and a different (PLR, FP, PLR-FP link-id).  The best way of doing this depends upon whether PIM or mLDP is being used.
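The sketch below walks the backup-joins of the Figure 5 example through steps 4-7.  The upstream table stands in for the "pick the Blue or Red topology that avoids the FP" computation, and send_backup_join() is a hypothetical helper rather than a defined protocol message.

   # Illustrative sketch of backup-join propagation for Figure 5.
   # Upstream neighbor towards the PLR on the MRT topology that avoids
   # the FP and the FP-PLR link-1 (derived here by hand from Figure 5).
   ALT_UPSTREAM = {'MP1': 'MP2', 'MP2': 'MP3', 'MP3': 'A', 'FP': 'MP3',
                   'A': 'PLR'}

   def send_backup_join(router, plr, fp, link_id, group=None):
       """Create alternate-tree state at 'router' and signal upstream."""
       upstream = ALT_UPSTREAM[router]
       print('%s -> %s: backup-join(PLR=%s, FP=%s, link=%s, group=%s)'
             % (router, upstream, plr, fp, link_id, group))
       return upstream

   # MP1 joins a bypass alternate-tree for (PLR, FP, link-1); every
   # router that is not the PLR creates state and forwards the
   # backup-join towards the PLR (step 5).
   hop = 'MP1'
   while hop != 'PLR':
       hop = send_backup_join(hop, 'PLR', 'FP', 'link-1')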
   It is necessary that traffic on each alternate-tree self-identify
   the alternate-tree to which it belongs.  This is because an
   alternate-tree for a multicast group and a particular (PLR, FP,
   PLR-FP link-id) can easily overlap with an alternate-tree for the
   same multicast group and a different (PLR, FP, PLR-FP link-id).  The
   best way of doing this depends upon whether PIM or mLDP is being
   used.

9.1.1. PIM details for Alternate-Trees

   For PIM, the (S,G) of the IP packet is a globally unique identifier
   and is understood.  To identify the alternate-tree, the most
   straightforward way is to use MPLS labels distributed in the PIM
   backup-join messages.  An MP can use the incoming label to indicate
   the set of RPF-interfaces for which the traffic may be an alternate.
   If the alternate-tree isn't a bypass-alternate-tree, then only one
   RPF interface is referenced.  If the alternate-tree is a bypass-
   alternate-tree, then multiple RPF-interfaces (parallel links to the
   FP) might be intended.  Alternate-tree traffic may cross an
   interface multiple times, either because the interface is a
   broadcast interface and different downstream-assigned labels are
   provided and/or because an MP may provide different labels.

9.1.2. mLDP details for Alternate-Trees

   For mLDP, if bypass-alternate-trees are used, then the PLR must
   provide upstream-assigned labels for each multicast stream.  The MP
   provides the label for the alternate-tree; if the alternate-tree is
   not a bypass-alternate-tree, this label also identifies the
   multicast stream.  If the alternate-tree is a bypass-alternate-tree,
   then this label provides the context for the PLR-assigned labels for
   each multicast stream.  If there are targeted LDP sessions between
   the PLR and the MPs, then the PLR could provide the necessary
   upstream-assigned labels.
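   To make the label relationships in this section concrete, here is a
   small Python sketch of the two-level lookup implied by a bypass-
   alternate-tree: the MP-assigned alternate-tree label selects a
   context, and the PLR's upstream-assigned label is then resolved
   within that context to a multicast stream.  All label values, table
   names and FEC strings are invented for illustration and carry no
   protocol meaning.

      # Illustrative only: two-level label lookup at an MP for a
      # bypass-alternate-tree.

      # Outer label, assigned by the MP for the alternate-tree itself;
      # it identifies a context-specific label space.
      alt_tree_context = {
          100: "bypass-tree(PLR, FP, link-1)",
      }

      # Inner labels, upstream-assigned by the PLR, one per multicast
      # stream carried over that bypass-alternate-tree.
      upstream_labels = {
          "bypass-tree(PLR, FP, link-1)": {
              200: "mLDP FEC <root=S, opaque-value=G1>",
              201: "mLDP FEC <root=S, opaque-value=G2>",
          },
      }

      def resolve(outer_label, inner_label):
          """Map (alternate-tree label, PLR-assigned label) to the
          multicast stream the alternate traffic belongs to."""
          context = alt_tree_context.get(outer_label)
          if context is None:
              return None               # not alternate-tree traffic
          return upstream_labels[context].get(inner_label)

      # A packet arriving with label stack (100, 200) is alternate
      # traffic for the stream with opaque value G1.
      stream = resolve(100, 200)

   For a non-bypass alternate-tree, the outer label alone identifies
   the stream and no upstream-assigned inner label is needed.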
9.1.3. Traffic Handling by PLR

   One issue is how long the PLR should continue to send alternate
   traffic.  With an alternate-tree, the PLR can know to stop
   forwarding alternate traffic on the alternate-tree when that
   alternate-tree's state is torn down.  This provides a clear signal
   that alternate traffic is no longer needed.

9.2. Methods Compared for PIM

   The two approaches that are feasible for PIM are PLR-driven Unicast
   Tunnels and MP-driven Alternate-Trees.

   +-------------------------+-------------------+---------------------+
   | Aspect                  | PLR-driven        | MP-driven           |
   |                         | Unicast Tunnels   | Alternate-Trees     |
   +-------------------------+-------------------+---------------------+
   | Worst-case Traffic      | 1 + number of MPs | 2                   |
   | Replication Per Link    |                   |                     |
   | PLR alternate-traffic   | timer-based       | control-plane       |
   |                         |                   | terminated          |
   | Extra multicast state   | none              | per (PLR,FP,S) for  |
   |                         |                   | bypass mode         |
   +-------------------------+-------------------+---------------------+

   Which approach is preferred may be network-dependent.  It should
   also be possible to use both in the same network.

9.3. Methods Compared for mLDP

   All three approaches are feasible for mLDP.  Below is a brief
   comparison of various aspects of each.

   +-------------------+---------------+-------------+-----------------+
   | Aspect            | MP-driven     | PLR-driven  | MP-driven       |
   |                   | Unicast       | Unicast     | Alternate-Trees |
   |                   | Tunnels       | Tunnels     |                 |
   +-------------------+---------------+-------------+-----------------+
   | Worst-case        | 1 + number of | 1 + number  | 2               |
   | Traffic           | MPs           | of MPs      |                 |
   | Replication Per   |               |             |                 |
   | Link              |               |             |                 |
   | PLR               | control-plane | timer-based | control-plane   |
   | alternate-traffic | terminated    |             | terminated      |
   | Extra multicast   | none          | none        | per (PLR,FP,S)  |
   | state             |               |             | for bypass mode |
   +-------------------+---------------+-------------+-----------------+

10. References

10.1. Normative References

   [I-D.enyedi-rtgwg-mrt-frr-algorithm]
              Atlas, A., Enyedi, G., Csaszar, A., and A. Gopalan,
              "Algorithms for computing Maximally Redundant Trees for
              IP/LDP Fast-Reroute",
              draft-enyedi-rtgwg-mrt-frr-algorithm-03 (work in
              progress), July 2013.

   [I-D.ietf-rtgwg-mrt-frr-architecture]
              Atlas, A., Kebler, R., Enyedi, G., Csaszar, A., Tantsura,
              J., Konstantynowicz, M., White, R., and M. Shand, "An
              Architecture for IP/LDP Fast-Reroute Using Maximally
              Redundant Trees",
              draft-ietf-rtgwg-mrt-frr-architecture-03 (work in
              progress), July 2013.

   [RFC4601]  Fenner, B., Handley, M., Holbrook, H., and I. Kouvelas,
              "Protocol Independent Multicast - Sparse Mode (PIM-SM):
              Protocol Specification (Revised)", RFC 4601, August 2006.

   [RFC6388]  Wijnands, IJ., Minei, I., Kompella, K., and B. Thomas,
              "Label Distribution Protocol Extensions for Point-to-
              Multipoint and Multipoint-to-Multipoint Label Switched
              Paths", RFC 6388, November 2011.

   [RFC6420]  Cai, Y. and H. Ou, "PIM Multi-Topology ID (MT-ID) Join
              Attribute", RFC 6420, November 2011.

10.2. Informative References

   [I-D.ietf-rtgwg-mofrr]
              Karan, A., Filsfils, C., Farinacci, D., Wijnands, I.,
              Decraene, B., Joorde, U., and W. Henderickx, "Multicast
              only Fast Re-Route", draft-ietf-rtgwg-mofrr-02 (work in
              progress), June 2013.

   [I-D.iwijnand-mpls-mldp-multi-topology]
              Wijnands, I. and K. Raza, "mLDP Extensions for Multi
              Topology Routing",
              draft-iwijnand-mpls-mldp-multi-topology-03 (work in
              progress), June 2013.

   [I-D.kebler-pim-mrt-protection]
              Kebler, R., Atlas, A., Wijnands, IJ., and G. Enyedi, "PIM
              Extensions for Protection Using Maximally Redundant
              Trees", draft-kebler-pim-mrt-protection-00 (work in
              progress), March 2012.

   [I-D.wijnands-mpls-mldp-node-protection]
              Wijnands, I., Rosen, E., Raza, K., Tantsura, J., Atlas,
              A., and Q. Zhao, "mLDP Node Protection",
              draft-wijnands-mpls-mldp-node-protection-04 (work in
              progress), June 2013.

   [RFC5286]  Atlas, A. and A. Zinin, "Basic Specification for IP Fast
              Reroute: Loop-Free Alternates", RFC 5286, September 2008.

   [RFC5714]  Shand, M. and S. Bryant, "IP Fast Reroute Framework",
              RFC 5714, January 2010.

Authors' Addresses

   Alia Atlas (editor)
   Juniper Networks
   10 Technology Park Drive
   Westford, MA  01886
   USA

   Email: akatlas@juniper.net

   Robert Kebler
   Juniper Networks
   10 Technology Park Drive
   Westford, MA  01886
   USA

   Email: rkebler@juniper.net

   IJsbrand Wijnands
   Cisco Systems, Inc.

   Email: ice@cisco.com

   Andras Csaszar
   Ericsson
   Konyves Kalman krt 11
   Budapest  1097
   Hungary

   Email: Andras.Csaszar@ericsson.com

   Gabor Sandor Enyedi
   Ericsson
   Konyves Kalman krt 11.
   Budapest  1097
   Hungary

   Email: Gabor.Sandor.Enyedi@ericsson.com