INTERNET-DRAFT                                                Mingui Zhang
Intended Status: Proposed Standard                                  Huawei
Updates: 6325                                           Tissa Senevirathne
                                                                     Cisco
                                                      Janardhanan Pathangi
                                                                      DELL
                                                             Ayan Banerjee
                                                                     Cisco
                                                            Anoop Ghanwani
                                                                      DELL
Expires: January 3, 2016                                      July 2, 2015

                    TRILL Resilient Distribution Trees
                  draft-ietf-trill-resilient-trees-03.txt

Abstract

   The TRILL protocol provides multicast data forwarding based on
   IS-IS link state routing.  Distribution trees are computed from the
   link state information through Shortest Path First calculation.
   When a link on a distribution tree fails, a campus-wide
   reconvergence of this distribution tree takes place, which can be
   time consuming and may cause considerable disruption to the ongoing
   multicast service.

   This document specifies how to build backup distribution trees to
   protect links on the primary distribution tree.  Since the backup
   distribution tree is built up ahead of the link failure, when a
   link on the primary distribution tree fails, the pre-installed
   backup forwarding table is used to deliver multicast packets
   without waiting for the campus-wide reconvergence, which minimizes
   the service disruption.  This document updates RFC 6325.
Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with
   the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time.  It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as "work in
   progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

Copyright and License Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1. Introduction
      1.1. Conventions used in this document
      1.2. Terminology
   2. Usage of Affinity Sub-TLV
      2.1. Allocating Affinity Links
      2.2. Distribution Tree Calculation with Affinity Links
   3. Resilient Distribution Trees Calculation
      3.1. Designating Roots for Backup Trees
           3.1.1. Conjugate Trees
           3.1.2. Explicitly Advertising Tree Roots
      3.2. Backup DT Calculation
           3.2.1. Backup DT Calculation with Affinity Links
                  3.2.1.1. Algorithm for Choosing Affinity Links
                  3.2.1.2. Affinity Links Advertisement
           3.2.2. Backup DT Calculation without Affinity Links
   4. Resilient Distribution Trees Installation
      4.1. Pruning the Backup Distribution Tree
      4.2. RPF Filters Preparation
   5. Protection Mechanisms with Resilient Distribution Trees
      5.1. Global 1:1 Protection
      5.2. Global 1+1 Protection
           5.2.1. Failure Detection
           5.2.2. Traffic Forking and Merging
      5.3. Local Protection
           5.3.1. Start Using the Backup Distribution Tree
           5.3.2. Duplication Suppression
           5.3.3. An Example to Walk Through
      5.4. Switching Back to the Primary Distribution Tree
   6. Security Considerations
   7. IANA Considerations
   Acknowledgements
   8. References
      8.1. Normative References
      8.2. Informative References
   Authors' Addresses

1. Introduction

   Much multicast traffic is generated by applications that are
   sensitive to interruption and latency, e.g., video distribution,
   including IPTV and video conferencing.  Normally, a network fault
   is recovered through a network-wide reconvergence of the forwarding
   states, but this process is too slow to meet tight Service Level
   Agreement (SLA) requirements on the duration of service disruption.
   What is worse, updating multicast forwarding states may take
   significantly longer than unicast convergence, since multicast
   states are updated based on control-plane signaling [mMRT].

   Protection mechanisms are commonly used to reduce the service
   disruption caused by network faults.  With backup forwarding states
   installed in advance, a protection mechanism can restore an
   interrupted multicast stream in tens of milliseconds, which meets
   stringent SLAs on service disruption.  Several protection
   mechanisms for multicast traffic have been developed for IP/MPLS
   networks [mMRT] [MoFRR].  However, the way that TRILL constructs
   distribution trees (DTs) is different from the way that multicast
   trees are computed under IP/MPLS; therefore, a multicast protection
   mechanism suitable for TRILL is required.

   This document proposes "Resilient Distribution Trees" (RDT), in
   which backup trees are installed in advance for the purpose of fast
   failure repair.  Three types of protection mechanisms are proposed.

   o  Global 1:1 protection refers to the mechanism where the
      multicast source RBridge normally injects one multicast stream
      onto the primary DT.  When an interruption of this stream is
      detected, the source RBridge switches to the backup DT to inject
      subsequent multicast traffic until the primary DT is recovered.

   o  Global 1+1 protection refers to the mechanism where the
      multicast source RBridge always injects two copies of each
      multicast stream, one onto the primary DT and one onto the
      backup DT.  In the normal case, each multicast receiver picks
      the stream sent along the primary DT and egresses it to its
      local link.  When a link failure interrupts the primary stream,
      the backup stream is picked until the primary DT is recovered.

   o  Local protection refers to the mechanism where the RBridge
      attached to the failed link locally repairs the failure.

   RDT may greatly reduce the service disruption caused by link
   failures.  In global 1:1 protection, the time spent on DT
   recalculation and installation is saved.  Global 1+1 protection and
   local protection further save the time spent on failure
   propagation.  A failed link can be repaired in tens of
   milliseconds.  Although it is possible to use RDT to achieve load
   balancing of multicast traffic, this document leaves that for
   future study.

   [RFC7176] specifies the Affinity Sub-TLV.  An "Affinity Link" can
   be explicitly assigned to a distribution tree or trees.  This
   offers a way to manipulate the calculation of distribution trees.
   With intentional assignment of Affinity Links, a backup
   distribution tree can be set up to protect links on a primary
   distribution tree.

1.1. Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
   this document are to be interpreted as described in RFC 2119
   [RFC2119].

1.2. Terminology

   DT: Distribution Tree

   IS-IS: Intermediate System to Intermediate System

   PLR: Point of Local Repair.  In this document, the PLR is the
   multicast upstream RBridge attached to the failed link.  It is
   meaningful only for local protection.

   RDT: Resilient Distribution Tree

   RPF: Reverse Path Forwarding

   SLA: Service Level Agreement

   TRILL: TRansparent Interconnection of Lots of Links

2. Usage of Affinity Sub-TLV

   This document uses the Affinity Sub-TLV [RFC7176] to assign a
   parent to an RBridge in a tree, as discussed below.

2.1. Allocating Affinity Links

   The Affinity Sub-TLV explicitly assigns parents for RBridges on
   distribution trees.  It can be recognized by each RBridge in the
   campus.  The originating RBridge becomes the parent, and the
   nickname contained in the Affinity Record identifies the child.
   This explicitly provides an "Affinity Link" on a distribution tree
   or trees.  The "Tree-num of roots" field of the Affinity Record
   identifies the distribution trees that adopt this Affinity Link
   [RFC7176].

   Affinity Links may be configured or automatically determined using
   an algorithm [CMT].  Suppose link RB2-RB3 is chosen as an Affinity
   Link on the distribution tree rooted at RB1.  RB2 should send out
   the Affinity Sub-TLV with an Affinity Record like {Nickname=RB3,
   Num of Trees=1, Tree-num of roots=RB1}.  In this document, RB3 does
   not have to be a leaf node on a distribution tree; therefore, an
   Affinity Link can be used to identify any link on a distribution
   tree.  This kind of assignment offers RBridges flexibility in
   distribution tree calculation: they are allowed to choose a child
   even when they do not lie on the shortest path from the root to
   that child.  This document uses this flexibility to increase the
   reliability of distribution trees.

   Note that an Affinity Link MUST NOT be misused to connect two
   RBridges that are not adjacent.  If it is, the Affinity Link is
   ignored and has no effect on tree building.

2.2. Distribution Tree Calculation with Affinity Links

   When RBridges receive an Affinity Sub-TLV with an Affinity Link
   whose child is RB2, RB2's incoming links other than the Affinity
   Link are removed from the full graph of the campus to obtain a
   sub-graph.  RBridges perform the Shortest Path First calculation on
   this sub-graph to compute the distribution tree.  In this way, the
   Affinity Link is guaranteed to appear on the distribution tree.
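   The following Python sketch illustrates this calculation (Figure
   2.1 below then gives a concrete topology).  It is illustrative only
   and not part of the protocol specification: it assumes a simple
   adjacency-map representation of the campus graph and omits TRILL's
   deterministic tie-breaking; all names are hypothetical.

      import heapq

      def spf_with_affinity(graph, root, affinity_links):
          """Compute the tree rooted at 'root', forced to contain
          every (parent, child) pair in 'affinity_links'.
          graph: {node: {neighbor: metric}}, both directions stored.
          Returns {child: parent} for the resulting tree."""
          child_of = {c: p for (p, c) in affinity_links}
          # Sub-graph: for each Affinity child, remove all of its
          # incoming links except the Affinity Link itself.
          sub = {u: {v: m for v, m in nbrs.items()
                     if v not in child_of or child_of[v] == u}
                 for u, nbrs in graph.items()}
          # Ordinary Shortest Path First (Dijkstra) on the sub-graph;
          # the Affinity Link is now the only way to reach its child.
          dist, parent = {root: 0}, {}
          heap = [(0, root)]
          while heap:
              d, u = heapq.heappop(heap)
              if d > dist.get(u, float("inf")):
                  continue
              for v, m in sub[u].items():
                  if d + m < dist.get(v, float("inf")):
                      dist[v], parent[v] = d + m, u
                      heapq.heappush(heap, (d + m, v))
          return parent

   With the topology of Figure 2.1, unit metrics, and
   affinity_links = {("RB4", "RB5")}, RB4-RB5 becomes the only way to
   reach RB5 in the sub-graph, so it necessarily appears on the
   computed tree.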
        Root                               Root
        +---+ -> +---+ -> +---+           +---+ -> +---+ -> +---+
        |RB1|    |RB2|    |RB3|           |RB1|    |RB2|    |RB3|
        +---+ <- +---+ <- +---+           +---+ <- +---+ <- +---+
         ^ |      ^ |      ^ |             ^ |      ^        ^ |
         | v      | v      | v             | v      |        | v
        +---+ -> +---+ -> +---+           +---+ -> +---+ -> +---+
        |RB4|    |RB5|    |RB6|           |RB4|    |RB5|    |RB6|
        +---+ <- +---+ <- +---+           +---+ <- +---+    +---+

               Full Graph                        Sub Graph

          Root 1                            Root 1
              / \                               / \
             /   \                             /   \
            4     2                           4     2
                 / \                          |     |
                /   \                         |     |
               5     3                        5     3
               |                              |
               |                              |
               6                              6

    Shortest Path Tree of Full Graph  Shortest Path Tree of Sub Graph

        Figure 2.1: DT Calculation with the Affinity Link RB4-RB5

   Take Figure 2.1 as an example.  Suppose RB1 is the root and link
   RB4-RB5 is the Affinity Link.  RB5's other incoming links, RB2-RB5
   and RB6-RB5, are removed from the Full Graph to obtain the Sub
   Graph.  Since RB4-RB5 is then the unique link through which RB5 can
   be reached, the Shortest Path Tree inevitably contains this link.

3. Resilient Distribution Trees Calculation

   RBridges use IS-IS to detect and advertise network faults.  A node
   or link failure triggers a campus-wide reconvergence of
   distribution trees.  The reconvergence generally includes the
   following procedures:

   1. The failure is detected through the exchange of IS-IS control
      messages (HELLOs) or some other method such as BFD [RFC7175]
      [RBmBFD];

   2. IS-IS link state flooding takes place, so that each RBridge
      learns about the failure;

   3. Each RBridge recalculates the affected distribution trees
      independently;

   4. RPF filters are updated according to the new distribution trees.
      The recomputed distribution trees are pruned and installed into
      the multicast forwarding tables.

   The reconvergence can be slow, which disrupts ongoing multicast
   traffic.  In protection mechanisms, alternative paths prepared
   ahead of potential node or link failures are used to detour around
   a failure as soon as it is detected; therefore, service disruption
   can be minimized.

   This document focuses only on link protection.  The construction of
   backup DTs for the purpose of node protection is out of the scope
   of this document.  In order to protect a node on the primary tree,
   a backup tree would have to be set up without this node, so that
   when this node fails, the backup tree could safely be used to
   forward multicast traffic around it.  However, TRILL distribution
   trees are shared among all VLANs and Fine-Grained Labels [RFC7172],
   and they have to cover all RBridge nodes in the campus [RFC6325].
   A DT that does not span all RBridges in the campus may not cover
   all receivers of many multicast groups.  (This is different from
   the construction of multicast trees signaled by PIM [RFC4601] or
   mLDP [RFC6388].)

3.1. Designating Roots for Backup Trees

   Operators MAY manually configure the roots for the backup DTs.
   Nevertheless, this document aims to provide a mechanism with
   minimum configuration.  Two options are offered as follows.

3.1.1. Conjugate Trees

   [RFC6325] and [RFC7180] specify how distribution tree roots are
   selected.  When a backup DT is computed for a primary DT, its root
   is set to be the root of this primary DT.  In order to distinguish
   the primary DT from the backup DT, the root RBridge MUST own
   multiple nicknames.

3.1.2. Explicitly Advertising Tree Roots

   The RBridge RB1 holding the highest-priority root nickname might
   explicitly advertise a list of nicknames identifying the roots of
   the primary and backup trees (see Section 4.5 of [RFC6325]).
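   The sketch below shows one plausible shape of such a root list, in
   which each primary tree root is paired with its backup tree root.
   It is illustrative only: the function and its arguments are
   hypothetical and do not reflect the on-the-wire encoding of
   [RFC7176].

      def build_root_list(primary_roots, backup_root_of):
          """primary_roots: ordered list of primary DT root nicknames.
          backup_root_of: maps each primary root nickname to the
          nickname of its backup DT root (a second nickname of the
          same RBridge under the conjugate-tree option of Section
          3.1.1, or possibly a different RBridge's nickname)."""
          roots = []
          for r in primary_roots:
              roots.append(r)                  # primary DT root
              roots.append(backup_root_of[r])  # its backup DT root
          return roots

      # e.g., a primary DT rooted at nickname 0x1111 protected by a
      # backup DT rooted at 0x2222:
      print(build_root_list([0x1111], {0x1111: 0x2222}))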
3.2. Backup DT Calculation

3.2.1. Backup DT Calculation with Affinity Links

            2                                    1
           /                                      \
     Root 1___                                 ___2 Root
         /|\  \                               /  /|\
        / | \  \                             /  / | \
       3  4  5  6                           3  4  5  6
       |  |  |  |                            \/    \/
       |  |  |  |                            /\    /\
       7  8  9  10                          7  8  9  10

         Primary DT                           Backup DT

      Figure 3.1: An Example of a Primary DT and its Backup DT

   TRILL supports the computation of multiple distribution trees by
   RBridges.  With the intentional assignment of Affinity Links in DT
   calculation, this document proposes a method to construct RDTs.
   For example, in Figure 3.1, the backup DT is set up to be maximally
   disjoint from the primary DT.  (The full topology is the
   combination of these two DTs; it is not shown in the figure.)
   Except for the link between RB1 and RB2, no link on the primary DT
   overlaps with a link on the backup DT.  This means that every link
   on the primary DT, except link RB1-RB2, can be protected by the
   backup DT.

3.2.1.1. Algorithm for Choosing Affinity Links

   Operators MAY configure Affinity Links to intentionally protect a
   specific link, such as the link connected to a gateway.  But it is
   desirable that every RBridge independently computes the Affinity
   Links for a backup DT across the whole campus.  This enables a
   distributed deployment and also minimizes configuration.

   Algorithms for Maximally Redundant Trees [MRT] may be used to
   figure out the Affinity Links of a backup DT that is maximally
   disjoint from the primary DT, but they provide only a subset of all
   possible solutions, i.e., the conjugate trees described in Section
   3.1.1.  In TRILL, RDT does not restrict the root of the backup DT
   to be the same as that of the primary DT.  Two disjoint (or
   maximally disjoint) trees may be rooted at different nodes, which
   significantly enlarges the solution space.

   This document RECOMMENDS realizing this independent computation
   through a slight change to the conventional DT calculation process
   of TRILL.  Basically, after the primary DT is calculated, the
   RBridge knows which links are used by that tree.  When the backup
   DT is calculated, each RBridge increases the metric of these links
   by a proper value (for safety, it is recommended to use the sum of
   all original link metrics in the campus, but not more than 2**23),
   which lowers the priority of these links being chosen for the
   backup DT by the Shortest Path First calculation.  All links on
   this backup DT could be assigned as Affinity Links, but this is
   unnecessary.  In order to reduce the number of Affinity Sub-TLVs
   flooded across the campus, only those links NOT picked by the
   conventional DT calculation process ought to be advertised as
   Affinity Links.

3.2.1.2. Affinity Links Advertisement

   Similar to [CMT], the parent RBridge of an Affinity Link takes
   charge of announcing this link in an Affinity Sub-TLV.  When an
   RBridge plays the role of parent for several Affinity Links, it is
   natural to advertise them together in the same Affinity Sub-TLV,
   with each Affinity Link structured as one Affinity Record.

   Affinity Links are announced in the Affinity Sub-TLV, which is
   recognized by every RBridge.  Since each RBridge computes
   distribution trees as the Affinity Sub-TLV requires, the backup DT
   will be built up consistently.
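   The Python sketch below puts Sections 3.2.1.1 and 3.2.1.2 together:
   it computes the primary DT, penalizes the links that tree uses,
   computes the backup DT on the penalized metrics, and returns the
   links that need to be advertised as Affinity Links (those the
   conventional calculation from the backup root would not pick).  It
   is illustrative only and, like the sketch in Section 2.2, omits
   TRILL's tie-breaking rules.

      import heapq

      def spf(graph, root):
          """Plain Dijkstra; returns {child: parent}."""
          dist, parent = {root: 0}, {}
          heap = [(0, root)]
          while heap:
              d, u = heapq.heappop(heap)
              if d > dist.get(u, float("inf")):
                  continue
              for v, m in graph[u].items():
                  if d + m < dist.get(v, float("inf")):
                      dist[v], parent[v] = d + m, u
                      heapq.heappush(heap, (d + m, v))
          return parent

      def backup_dt_affinity_links(graph, primary_root, backup_root):
          primary = spf(graph, primary_root)
          on_primary = {frozenset((c, p)) for c, p in primary.items()}
          # Penalty: the sum of all original link metrics in the
          # campus, capped at 2**23 (Section 3.2.1.1).  Each
          # undirected link is stored twice, hence the division by 2.
          penalty = min(sum(m for u in graph
                            for m in graph[u].values()) // 2, 2**23)
          penalized = {u: {v: m + (penalty if frozenset((u, v))
                                   in on_primary else 0)
                           for v, m in graph[u].items()}
                       for u in graph}
          backup = spf(penalized, backup_root)
          # Advertise only the links that the conventional calculation
          # from the backup root would NOT choose (Section 3.2.1.1).
          conventional = spf(graph, backup_root)
          return {(p, c) for c, p in backup.items()
                  if conventional.get(c) != p}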
3.2.2. Backup DT Calculation without Affinity Links

   This section provides an alternative method to set up a disjoint
   backup DT.

   After the primary DT is calculated, each RBridge increases the cost
   of the links that are already in the primary DT by a multiplier
   (for safety, 64x is RECOMMENDED).  This ensures that a link appears
   in both trees if and only if there is no other way to reach the
   node (i.e., the graph would become disconnected if it were pruned
   of the links in the first tree).  In other words, the two trees
   will be maximally disjoint.

   The above algorithm is similar to that defined in Section 3.2.1.1.
   All RBridges MUST agree on the same algorithm; then the backup DT
   can be calculated by each RBridge consistently, and configuration
   is unnecessary.

4. Resilient Distribution Trees Installation

   As specified in Section 4.5.2 of [RFC6325], an ingress RBridge MUST
   announce the distribution trees that it may choose to ingress
   multicast frames.  Thus, other RBridges in the campus can limit the
   amount of state that is necessary for the RPF check.  Also,
   [RFC6325] recommends that an ingress RBridge by default choose the
   DT or DTs whose root or roots are least cost from the ingress
   RBridge.  To sum up, RBridges do pre-compute all the trees that
   might be used, so that they can properly forward multi-destination
   packets, but they install RPF state only for some combinations of
   ingress and tree.

   This document states that the backup DT MUST be contained in an
   ingress RBridge's DT announcement list and included in this ingress
   RBridge's LSP.  In order to reduce the service disruption time,
   RBridges SHOULD install backup DTs in advance, which also includes
   setting up the RPF filters needed for the RPF check.

   Since the backup DT is intentionally built to be maximally disjoint
   from the primary DT, when a link fails and interrupts the ongoing
   multicast traffic sent along the primary DT, it is probable that
   the backup DT is not affected.  Therefore, the backup DT installed
   in advance can be used to deliver multicast packets immediately.

4.1. Pruning the Backup Distribution Tree

   The way that a backup DT is pruned is different from the way that
   the primary DT is pruned.  Even though a branch contains no
   downstream receivers, it probably should not be pruned, for the
   purpose of protection.  The rule for backup DT pruning is that the
   backup DT should be pruned, eliminating branches that have no
   potential downstream RBridges appearing on the pruned primary DT.

   It is probable that the primary DT is not optimally pruned in
   practice.  In this case, the backup DT SHOULD be pruned presuming
   that the primary DT is optimally pruned.  Those redundant links
   that ought to have been pruned will not be protected.
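   A minimal sketch of this pruning rule follows (Figure 4.1 below
   then works through an example).  It is illustrative only and
   assumes the trees are represented as child maps; the names are
   hypothetical.

      def prune_backup_dt(backup_children, backup_root, primary_nodes):
          """Keep a backup DT branch only if its subtree contains at
          least one RBridge on the (optimally) pruned primary DT.
          backup_children: {node: [child, ...]} of the unpruned
          backup DT; primary_nodes: set of RBridges on the pruned
          primary DT.  Returns the pruned {node: [child, ...]}."""
          pruned = {}

          def keep(node):
              kept = [c for c in backup_children.get(node, [])
                      if keep(c)]
              if kept:
                  pruned[node] = kept
              return bool(kept) or node in primary_nodes

          keep(backup_root)
          return pruned

   Applied to the trees of Figure 3.1 with primary_nodes = {RB1, RB3,
   RB5, RB6, RB7, RB9, RB10}, this keeps branches RB1-RB2 and RB3-RB2
   but drops RB8-RB3, matching the result shown in Figure 4.1.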
                                           1
                                            \
     Root 1___                           ___2 Root
         / \  \                         /  /|\
        /   \  \                       /  / | \
       3     5  6                     3  4  5  6
       |     |  |                       /    \/
       |     |  |                      /     /\
       7     9  10                    7     9  10

      Pruned Primary DT                Pruned Backup DT

   Figure 4.1: The Backup DT is Pruned Based on the Pruned Primary DT

   Suppose RB7, RB9, and RB10 constitute a multicast group MGx.  The
   pruned primary DT and backup DT are shown in Figure 4.1.  Referring
   back to Figure 3.1, branches RB2-RB1 and RB4-RB1 on the primary DT
   are pruned for the distribution of MGx traffic, since there are no
   potential receivers on these two branches.  Although branches
   RB1-RB2 and RB3-RB2 on the backup DT have no potential multicast
   receivers either, their RBridges appear on the pruned primary DT
   and may be used to repair link failures of the primary DT.
   Therefore, they are not pruned from the backup DT.  Branch RB8-RB3
   can be safely pruned because it does not appear on the pruned
   primary DT.

4.2. RPF Filters Preparation

   RB2 includes in its LSP the information indicating which trees RB2
   might choose to ingress multicast frames [RFC6325].  When RB2
   specifies the trees that it might choose to ingress multicast
   traffic, it SHOULD include the backup DT.  Other RBridges then
   prepare the RPF check states for both the primary DT and the backup
   DT.  When a multicast packet is sent along either the primary DT or
   the backup DT, it will pass the RPF check.  This works when global
   1:1 protection is used.  However, when global 1+1 protection or
   local protection is applied, traffic duplication will happen if
   multicast receivers accept the copies of the multicast packets
   matching both RPF filters.  In order to avoid such duplication,
   egress RBridge multicast receivers MUST act as merge points,
   activating a single RPF filter and discarding the duplicate packets
   matching the other RPF filter.  In the normal case, the RPF state
   is set up according to the primary DT.  When a link fails, the RPF
   filter based on the backup DT is activated.
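   The following sketch shows this merge-point behavior.  It is
   illustrative only: the class and its fields are hypothetical, and a
   real implementation would keep this state in the forwarding plane.

      class MergePoint:
          """RPF state for both trees is installed in advance, but
          only one filter is active, so only one copy of each
          multicast packet is egressed (Section 4.2)."""
          def __init__(self, primary_filter, backup_filter):
              # Each filter maps (tree_root, ingress_nickname) to the
              # set of links on which packets are expected to arrive.
              self.filters = {"primary": primary_filter,
                              "backup": backup_filter}
              self.active = "primary"  # the normal case

          def on_link_failure(self):
              # Activate the backup RPF filter upon failure.
              self.active = "backup"

          def accept(self, tree_root, ingress, arrival_link):
              expected = self.filters[self.active].get(
                  (tree_root, ingress), set())
              # A duplicate arriving on the other tree fails the
              # active filter and is discarded.
              return arrival_link in expected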
5. Protection Mechanisms with Resilient Distribution Trees

   Protection mechanisms can be developed to make use of the backup DT
   installed in advance.  But the protection mechanisms already
   developed using PIM or mLDP for multicast in IP/MPLS networks are
   not applicable to TRILL, due to the following fundamental
   differences in their distribution tree calculation.

   o  A link on a TRILL distribution tree is bidirectional, while a
      link on a distribution tree in IP/MPLS networks is
      unidirectional.

   o  In TRILL, a multicast source node does not have to be the root
      of the distribution tree.  It is just the opposite in IP/MPLS
      networks.

   o  In IP/MPLS networks, distribution trees, as well as their backup
      distribution trees, are constructed for each multicast source
      node.  In TRILL, a small number of core distribution trees are
      shared among multicast groups.  A backup DT does not have to
      share the same root as the primary DT.

   Therefore, a TRILL-specific multicast protection mechanism is
   needed.

   Global 1:1 protection, global 1+1 protection, and local protection
   are developed in this section.  In Figure 4.1, assume RB7 is the
   ingress RBridge of the multicast stream, while RB9 and RB10 are the
   multicast receivers.  Suppose link RB1-RB5 fails during the
   multicast forwarding.  The backup DT rooted at RB2 does not include
   link RB1-RB5; therefore, it can be used to protect this link.  In
   global 1:1 protection, RB7 switches the subsequent multicast
   traffic to this backup DT when it is notified about the link
   failure.  In global 1+1 protection, RB7 injects two copies of the
   multicast stream and lets the multicast receivers RB9 and RB10
   merge them.  In local protection, when link RB1-RB5 fails, RB1
   locally replicates the multicast traffic and sends it on the backup
   DT.

5.1. Global 1:1 Protection

   In global 1:1 protection, the ingress RBridge of the multicast
   traffic is responsible for switching the failure-affected traffic
   from the primary DT over to the backup DT.  Since the backup DT has
   been installed in advance, this global protection need not wait for
   the DT recalculation and installation.  When the ingress RBridge is
   notified about the failure, it immediately makes this switch-over.

   This type of protection is simple and duplication safe.  However,
   depending on the topology of the RBridge campus, the time spent on
   failure detection and propagation through the IS-IS control plane
   may still cause a considerable service disruption.

   The BFD (Bidirectional Forwarding Detection) protocol can be used
   to reduce the failure detection time.  Link failures can be rapidly
   detected with one-hop BFD [RFC7175].  [RBmBFD] introduces fast
   failure detection for multicast paths.  It can be used to reduce
   both the failure detection time and the propagation time in global
   protection.  In [RBmBFD], the ingress RBridge needs to send BFD
   control packets to poll each receiver, and receivers return BFD
   control packets to the ingress as responses.  If no response is
   received from a specific receiver within a detection time, the
   ingress judges that the connectivity to this receiver is broken.
   [RBmBFD] is therefore used to detect the connectivity of a path
   rather than of a link.  The ingress RBridge determines a minimum
   failed branch that contains this receiver and switches the ongoing
   multicast traffic based on this judgment.  For example, in Figure
   4.1, if RB9 does not respond while RB10 still responds, RB7 will
   presume that links RB1-RB5 and RB5-RB9 have failed.  Multicast
   traffic will be switched to a backup DT that can protect these two
   links.  Accurate link failure detection might help ingress RBridges
   make smarter decisions, but it is out of the scope of this
   document.

5.2. Global 1+1 Protection

   In global 1+1 protection, the multicast source RBridge always
   replicates the multicast packets and sends them onto both the
   primary and the backup DT.  This may sacrifice capacity efficiency,
   but given the ample connection redundancy and inexpensive bandwidth
   in Data Center Networks, this kind of protection can be popular
   [MoFRR].

5.2.1. Failure Detection

   Egress RBridges (merge points) SHOULD learn of a link failure as
   early as possible, so that the affected egress RBridges can update
   their RPF filters quickly to minimize the traffic disruption.
   Three options are provided as follows; a sketch of the first option
   is given after this list.

   1. Egress RBridges assume a minimum known packet rate for a given
      data stream [MoFRR].  A failure detection timer Td is set to the
      interval between two continuous packets.  Td is reinitialized
      each time a packet is received.  If Td expires while packets are
      arriving at the egress RBridge on the backup DT (within the time
      frame Td), the egress RBridge updates its RPF filters and starts
      to receive the packets forwarded on the backup DT.

   2. With [RBmBFD], when a link failure happens, the affected egress
      RBridges detect a lack of connectivity from the ingress.
      Therefore, these egress RBridges are able to update their RPF
      filters promptly.

   3. Egress RBridges can always rely on the IS-IS control plane to
      learn of the failure and determine whether their RPF filters
      should be updated.
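   Below is a minimal sketch of option 1, assuming the MergePoint
   object sketched in Section 4.2; the timer granularity and all names
   are illustrative.

      import threading

      class StreamWatchdog:
          """Td is derived from the stream's minimum known packet
          rate and restarted on every packet received on the primary
          DT; if it expires while the backup DT is still delivering
          packets, the RPF filter is switched."""
          def __init__(self, td_seconds, merge_point, backup_alive):
              self.td = td_seconds
              self.merge_point = merge_point
              self.backup_alive = backup_alive  # callable: packets
              self.timer = None                 # seen on backup DT?

          def packet_from_primary(self):
              # Reinitialize Td each time a packet is received.
              if self.timer is not None:
                  self.timer.cancel()
              self.timer = threading.Timer(self.td, self._expired)
              self.timer.start()

          def _expired(self):
              # Switch only if the backup DT is still delivering.
              if self.backup_alive():
                  self.merge_point.on_link_failure()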
5.2.2. Traffic Forking and Merging

   For the sake of protection, transit RBridges SHOULD activate both
   the primary and the backup RPF filters; therefore, both copies of
   the multicast packets will pass through transit RBridges.

   Multicast receivers (egress RBridges) MUST act as "merge points"
   and egress only one copy of each multicast packet.  This is
   achieved by activating only a single RPF filter.  In the normal
   case, egress RBridges activate the primary RPF filter.  When a link
   on the pruned primary DT fails, the ingress RBridge cannot reach
   some of the receivers.  When these unreachable receivers realize
   it, they SHOULD update their RPF filters so as to receive the
   packets sent on the backup DT.

5.3. Local Protection

   In local protection, the Point of Local Repair (PLR) is the
   upstream RBridge attached to the failed link.  It is this RBridge
   that makes the decision to replicate the multicast traffic in order
   to repair the link failure.  Local protection further saves the
   time otherwise spent on failure notification through the flooding
   of LSPs across the campus.  In addition, failure detection can be
   sped up using [RFC7175]; therefore, local protection can keep the
   service disruption within 50 milliseconds.

   Since the ingress RBridge is not necessarily the root of the
   distribution tree in TRILL, a multicast downstream point may not be
   a descendant of the ingress point on the distribution tree.
   Moreover, distribution trees in TRILL are bidirectional and need
   not share the same root.  There are fundamental differences between
   the distribution tree calculation of TRILL and those used in PIM
   and mLDP; therefore, the local protection mechanisms used for PIM
   and mLDP, such as [mMRT] and [MoFRR], are not applicable here.

5.3.1. Start Using the Backup Distribution Tree

   The egress nickname field in the TRILL header of the replicated
   multicast TRILL data packets specifies the tree on which they are
   being distributed.  This field is rewritten to the backup DT's root
   nickname by the PLR, but the ingress nickname of the multicast
   frame MUST remain unchanged.  This is a halfway change of the DT
   for multicast packets.  Afterwards, the PLR forwards the multicast
   traffic along the backup DT.  This updates [RFC6325], which
   specifies that the egress nickname in the TRILL header of a
   multi-destination TRILL data packet must not be changed by transit
   RBridges.

   In the above example, the PLR RB1 locally determines to send the
   replicated multicast packets according to the backup DT.  It sends
   them to the next hop, RB2.
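   The sketch below shows this halfway change, assuming a minimal
   model of the relevant TRILL header fields (the field names are
   illustrative, not the RFC 6325 encoding).

      from dataclasses import dataclass, replace

      @dataclass(frozen=True)
      class TrillHeader:
          egress_nickname: int   # tree root for multi-destination
          ingress_nickname: int  # ingress RBridge; MUST stay unchanged

      def plr_repair(header, backup_root_nickname):
          # Halfway change of DT: only the tree identifier (egress
          # nickname) is rewritten; the ingress nickname is kept.
          return replace(header, egress_nickname=backup_root_nickname)

      # e.g., a packet on the primary DT rooted at RB1 (nickname
      # 0x1111), ingressed by RB7 (0x7777), moved to the backup DT
      # rooted at RB2 (0x2222):
      hdr = TrillHeader(egress_nickname=0x1111,
                        ingress_nickname=0x7777)
      print(plr_repair(hdr, backup_root_nickname=0x2222))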
5.3.2. Duplication Suppression

   When a PLR starts to send replicated multicast packets on the
   backup DT, some multicast packets may still be traveling along the
   primary DT, so some egress RBridges might receive duplicate
   multicast packets.  The traffic forking and merging method of
   global 1+1 protection can be adopted to suppress this duplication.

5.3.3. An Example to Walk Through

   The pieces of the local protection example used above are put
   together into a complete walk-through below.

   In the normal case, multicast frames ingressed by RB7, with pruned
   distribution on the primary DT rooted at RB1, are received by RB9
   and RB10.  When the link RB1-RB5 fails, the PLR RB1 begins to
   replicate and forward subsequent multicast packets using the pruned
   backup DT rooted at RB2.  When RB2 gets the multicast packets from
   the link RB1-RB2, it accepts them, since the RPF filter {DT=RB2,
   ingress=RB7, receiving links=RB1-RB2, RB3-RB2, RB4-RB2, RB5-RB2,
   and RB6-RB2} is installed on RB2.  RB2 forwards the replicated
   multicast packets to its neighbors except RB1.  The multicast
   packets reach RB6, where both RPF filters {DT=RB1, ingress=RB7,
   receiving link=RB1-RB6} and {DT=RB2, ingress=RB7, receiving
   links=RB2-RB6 and RB9-RB6} are active.  RB6 lets both multicast
   streams through.  The multicast packets finally reach RB9, where
   the RPF filter is updated from {DT=RB1, ingress=RB7, receiving
   link=RB5-RB9} to {DT=RB2, ingress=RB7, receiving link=RB6-RB9}.
   RB9 egresses the multicast packets onto its local link.

5.4. Switching Back to the Primary Distribution Tree

   Assume an RBridge receives the LSP that indicates a link failure.
   This RBridge starts to calculate the new primary DT based on the
   new topology, without the failed link.  Suppose the new primary DT
   is installed at time t1.

   The propagation of LSPs around the campus takes some time.  For
   safety, we assume that all RBridges in the campus will have
   converged to the new primary DT at t1+Ts.  By default, Ts (the
   "settling time") is set to 30 seconds, but it is configurable.  At
   t1+Ts, the ingress RBridge switches the traffic from the backup DT
   back to the new primary DT.

   After another Ts (at t1+2*Ts), no multicast packets are being
   forwarded along the old primary DT.  The backup DT should then be
   updated (recalculated and reinstalled) according to the new primary
   DT.  The process of this update under the different protection
   types is discussed as follows; a sketch of the resulting timeline
   is given after this list.

   a) For global 1:1 protection, the backup DT is simply updated at
      t1+2*Ts.

   b) For global 1+1 protection, the ingress RBridge stops replicating
      the multicast packets onto the old backup DT at t1+Ts.  The
      backup DT is updated at t1+2*Ts.  The ingress RBridge MUST then
      wait for another Ts, during which time period all RBridges
      converge to the new backup DT.  At t1+3*Ts, it is safe for the
      ingress RBridge to start replicating multicast packets onto the
      new backup DT.

   c) For local protection, the PLR stops replicating and sending
      packets on the old backup DT at t1+Ts.  It is safe for RBridges
      to start updating the backup DT at t1+2*Ts.
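   The following sketch condenses the timeline above.  It is
   illustrative only; the event names are hypothetical and times are
   in seconds.

      def switchback_schedule(t1, ts=30.0, protection="1+1"):
          """Return {event: time} for the Section 5.4 timeline, with
          Ts defaulting to the 30-second settling time."""
          events = {
              "new primary DT installed": t1,
              "ingress switches back to new primary DT": t1 + ts,
          }
          if protection == "1:1":
              events["backup DT updated"] = t1 + 2 * ts
          elif protection == "1+1":
              events["stop replicating onto old backup DT"] = t1 + ts
              events["backup DT updated"] = t1 + 2 * ts
              events["start replicating onto new backup DT"] = \
                  t1 + 3 * ts
          elif protection == "local":
              events["PLR stops sending on old backup DT"] = t1 + ts
              events["backup DT update may start"] = t1 + 2 * ts
          return events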
6. Security Considerations

   This document raises no new security issues for TRILL.

   For general TRILL security considerations, see [RFC6325].

7. IANA Considerations

   No new registry or registry entries are requested to be assigned by
   IANA.  The Affinity Sub-TLV has already been defined in [RFC7176];
   this document does not change its definition.  RFC Editor: please
   remove this section before publication.

Acknowledgements

   The careful review from Gayle Noble is gratefully acknowledged.
   The authors would like to thank the comments and suggestions from
   Donald Eastlake, Erik Nordmark, Fangwei Hu, Hongjun Zhai, and
   Xudong Zhang.

8. References

8.1. Normative References

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC4601] Fenner, B., Handley, M., Holbrook, H., and I. Kouvelas,
             "Protocol Independent Multicast - Sparse Mode (PIM-SM):
             Protocol Specification (Revised)", RFC 4601, August 2006.

   [RFC6325] Perlman, R., Eastlake 3rd, D., Dutt, D., Gai, S., and A.
             Ghanwani, "Routing Bridges (RBridges): Base Protocol
             Specification", RFC 6325, July 2011.

   [RFC6388] Wijnands, IJ., Minei, I., Kompella, K., and B. Thomas,
             "Label Distribution Protocol Extensions for Point-to-
             Multipoint and Multipoint-to-Multipoint Label Switched
             Paths", RFC 6388, November 2011.

   [RFC7175] Manral, V., Eastlake 3rd, D., Ward, D., and A. Banerjee,
             "Transparent Interconnection of Lots of Links (TRILL):
             Bidirectional Forwarding Detection (BFD) Support",
             RFC 7175, May 2014.

   [RFC7176] Eastlake 3rd, D., Senevirathne, T., Ghanwani, A., Dutt,
             D., and A. Banerjee, "Transparent Interconnection of Lots
             of Links (TRILL) Use of IS-IS", RFC 7176, May 2014.

   [RFC7180] Eastlake 3rd, D., Zhang, M., Ghanwani, A., Manral, V.,
             and A. Banerjee, "Transparent Interconnection of Lots of
             Links (TRILL): Clarifications, Corrections, and Updates",
             RFC 7180, May 2014.

   [CMT]     Senevirathne, T., Pathangi, J., et al., "Coordinated
             Multicast Trees (CMT) for TRILL", draft-ietf-trill-cmt,
             work in progress.

   [RBmBFD]  Zhang, M., Pallagatti, S., and V. Govindan, "TRILL
             Support of Point to Multipoint BFD",
             draft-ietf-trill-p2mp-bfd, work in progress.

8.2. Informative References

   [mMRT]    Atlas, A., Kebler, R., et al., "An Architecture for
             Multicast Protection Using Maximally Redundant Trees",
             draft-atlas-rtgwg-mrt-mc-arch, work in progress.

   [MRT]     Atlas, A., Ed., Kebler, R., et al., "An Architecture for
             IP/LDP Fast-Reroute Using Maximally Redundant Trees",
             draft-ietf-rtgwg-mrt-frr-architecture, work in progress.

   [MoFRR]   Karan, A., Filsfils, C., et al., "Multicast only Fast
             Re-Route", draft-ietf-rtgwg-mofrr, work in progress.

   [mBFD]    Katz, D. and D. Ward, "BFD for Multipoint Networks",
             draft-ietf-bfd-multipoint, work in progress.

   [RFC7172] Eastlake 3rd, D., Zhang, M., Agarwal, P., Perlman, R.,
             and D. Dutt, "Transparent Interconnection of Lots of
             Links (TRILL): Fine-Grained Labeling", RFC 7172, May
             2014.

Authors' Addresses

   Mingui Zhang
   Huawei Technologies Co., Ltd
   Huawei Building, No. 156 Beiqing Rd.
   Beijing 100095
   P.R. China

   Email: zhangmingui@huawei.com

   Tissa Senevirathne
   Cisco Systems
   375 East Tasman Drive
   San Jose, CA 95134
   USA

   Phone: +1-408-853-2291
   Email: tsenevir@cisco.com

   Janardhanan Pathangi
   Dell/Force10 Networks
   Olympia Technology Park
   Guindy, Chennai 600 032
   India

   Phone: +91 44 4220 8400
   Email: Pathangi_Janardhanan@Dell.com

   Ayan Banerjee
   Cisco

   Email: ayabaner@cisco.com

   Anoop Ghanwani
   Dell
   350 Holger Way
   San Jose, CA 95134
   USA

   Phone: +1-408-571-3500
   Email: Anoop@alumni.duke.edu