idnits 2.17.1 

draft-ietf-spring-segment-protection-sr-te-paths-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** There are 18 instances of too long lines in the document, the longest
     one being 7 characters in excess of 72.

  == There are 4 instances of lines with non-RFC6890-compliant IPv4 addresses
     in the document.  If these are example addresses, they should be changed.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (September 30, 2020) is 1302 days in the past.  Is
     this intentional?


  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Missing Reference: '1000-2000' is mentioned on line 246, but not defined

  == Missing Reference: '3000-4000' is mentioned on line 246, but not defined

  -- Looks like a reference, but probably isn't: '1100' on line 201

  -- Looks like a reference, but probably isn't: '1005' on line 201

  == Missing Reference: '400000-405000' is mentioned on line 651, but not
     defined

  == Outdated reference: A later version (-16) exists of
     draft-bashandy-rtgwg-segment-routing-uloop-09

  == Outdated reference: A later version (-13) exists of
     draft-ietf-rtgwg-segment-routing-ti-lfa-04

  == Outdated reference: A later version (-09) exists of
     draft-li-rtgwg-enhanced-ti-lfa-02


     Summary: 1 error (**), 0 flaws (~~), 8 warnings (==), 3 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Routing area                                                    S. Hegde
3	Internet-Draft                                                 C. Bowers
4	Intended status: Informational                     Juniper Networks Inc.
5	Expires: April 3, 2021                                      S. Litkowski
6	                                                           Cisco Systems
7	                                                                   X. Xu
8	                                                            Alibaba Inc.
9	                                                                   F. Xu
10	                                                                 Tencent
11	                                                      September 30, 2020

13	                   Segment Protection for SR-TE Paths
14	          draft-ietf-spring-segment-protection-sr-te-paths-00

16	Abstract

18	   Segment Routing supports the creation of explicit paths using
19	   Adjacency SIDs (Adj-SIDs), Node-SIDs, and Binding SIDs (BSIDs).  It
20	   is important to provide fast reroute (FRR) mechanisms to respond to
21	   failures of links and nodes in the Segment-Routed Traffic-Engineered
22	   (SR-TE) path.  A point of local repair (PLR) can provide FRR
23	   protection against the failure of a link in an SR-TE path by
24	   examining only the first (top) label in the SR label stack.  To
25	   protect against the failure of a node, a PLR may also need to
26	   examine the second label in the stack in order to determine the
27	   SR-TE path beyond the failed node.  This document specifies how a
28	   PLR can use the first and second labels of an SR-MPLS label stack
29	   that describes an SR-TE path to protect against node failures.

31	Status of This Memo

33	   This Internet-Draft is submitted in full conformance with the
34	   provisions of BCP 78 and BCP 79.

36	   Internet-Drafts are working documents of the Internet Engineering
37	   Task Force (IETF).  Note that other groups may also distribute
38	   working documents as Internet-Drafts.  The list of current Internet-
39	   Drafts is at https://datatracker.ietf.org/drafts/current/.

41	   Internet-Drafts are draft documents valid for a maximum of six months
42	   and may be updated, replaced, or obsoleted by other documents at any
43	   time.  It is inappropriate to use Internet-Drafts as reference
44	   material or to cite them other than as "work in progress."

46	   This Internet-Draft will expire on April 3, 2021.

48	Copyright Notice

50	   Copyright (c) 2020 IETF Trust and the persons identified as the
51	   document authors.  All rights reserved.

53	   This document is subject to BCP 78 and the IETF Trust's Legal
54	   Provisions Relating to IETF Documents
55	   (https://trustee.ietf.org/license-info) in effect on the date of
56	   publication of this document.  Please review these documents
57	   carefully, as they describe your rights and restrictions with respect
58	   to this document.  Code Components extracted from this document must
59	   include Simplified BSD License text as described in Section 4.e of
60	   the Trust Legal Provisions and are provided without warranty as
61	   described in the Simplified BSD License.

63	Table of Contents

65	   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   2
66	   2.  Node Failures Along SR-TE Paths . . . . . . . . . . . . . . .   3
67	     2.1.  Segment protection for explicit paths with Node-SIDs  . .   4
68	     2.2.  Segment Protection for Anycast-SIDs . . . . . . . . . . .   4
69	     2.3.  Segment protection for explicit paths with Adj-SIDs . . .   5
70	   3.  Detailed Solution using Context Tables  . . . . . . . . . . .   7
71	     3.1.  Building Context Tables . . . . . . . . . . . . . . . . .   7
72	     3.2.  Segment protection for Node-SIDs  . . . . . . . . . . . .   8
73	     3.3.  Segment protection for Adj-SIDs . . . . . . . . . . . . .   9
74	     3.4.  Segment protection for edge nodes . . . . . . . . . . . .  10
75	       3.4.1.  Detailed Example for Segment protection for edge
76	               nodes . . . . . . . . . . . . . . . . . . . . . . . .  11
77	   4.  Determining node can be bypassed  . . . . . . . . . . . . . .  12
78	   5.  Hold timers for Node-SID/Prefix-SIDs and Adj-SIDs . . . . . .  13
79	     5.1.  Interaction with micro-loop avoidance . . . . . . . . . .  14
80	   6.  Optimization Considerations . . . . . . . . . . . . . . . . .  14
81	     6.1.  Segment Protection Example with Common SRGB . . . . . . .  15
82	   7.  Operational Considerations  . . . . . . . . . . . . . . . . .  16
83	   8.  Security Considerations . . . . . . . . . . . . . . . . . . .  16
84	   9.  IANA Considerations . . . . . . . . . . . . . . . . . . . . .  17
85	   10. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . .  17
86	   11. References  . . . . . . . . . . . . . . . . . . . . . . . . .  17
87	     11.1.  Normative References . . . . . . . . . . . . . . . . . .  17
88	     11.2.  Informative References . . . . . . . . . . . . . . . . .  17
89	   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .  18

91	1.  Introduction

93	   It is possible for a routing device to completely go out of service
94	   abruptly due to power failure, hardware failure or software crashes.
95	   Node protection is an important property of the Fast Reroute
96	   mechanism.  It provides protection against a node failure by
97	   rerouting traffic around the failed node.  For example, the
98	   mechanisms described in Loop Free Alternates ([RFC5286]), Remote Loop
99	   Free Alternates ([RFC8102]), and
100	   [I-D.ietf-rtgwg-segment-routing-ti-lfa] can be used to provide node
101	   protection to ensure minimal traffic loss after a node failure.

103	   Section 2 describes problems with SR-TE paths and the need for a
104	   specialized mechanism to provide node protection for SR-TE paths.
105	   Section 3 describes the solution applied to paths built using Adj-
106	   SIDs and Node-SIDs.  To distinguish protection against the failure
107	   of segment endpoints (midpoints) in an SR-TE path from the usual
108	   node protection provided by the various LFA mechanisms, this
109	   document uses the term Segment Protection.

111	2.  Node Failures Along SR-TE Paths

113	   Figure 1 illustrates an example network topology with Segment
114	   Routing enabled on each node.

116	      Node          Node          Node          Node          Node
117	      SID:1         SID:2         SID:3         SID:4         SID:5
118	      +----+   10   +----+   10   +----+   10   +----+   10   +----+
119	      | R1 |--------| R2 |--------| R3 |--------| R4 |--------| R5 |
120	      +----+        +----+        +----+        +----+        +----+
121	          \                           \          /
122	           \ 10                        \ 100    / 60
123	            \                           \      /
124	             \   +----+                  +----+
125	              +--| R7 |------------------| R8 |
126	                 +----+    30            +----+
127	                / Node                   Node             Label stack:
128	               /  SID:7                  SID:8            +------------+
129	         +----+                          SRGB:            |  1008 (top)|
130	         | R6 |                          3000-4000        +------------+
131	         +----+                                           |  3005      |
132	         Node                                             +------------+
133	         SID:6

135	             * Numbers on the links represent the symmetric link cost

137	   Figure 1: Example topology.  The segment index for each node is shown
138	     in the diagram.  All nodes have SRGB = [1000-2000], except for R8
139	   which has SRGB = [3000-4000].  A label stack that represents the path
140	                   R1->R7->R8->R4->R5 is shown as well.

142	2.1.  Segment protection for explicit paths with Node-SIDs

144	   Consider an explicit path in the topology in Figure 1 from R1->R5 via
145	   R1->R7->R8->R4->R5.  This path can be built using the shortest paths
146	   from R1-to-R8 and R8-to-R5.  The label stack to instantiate this path
147	   contains two Node-SIDs 1008 and 3005.  The 1008 label will take the
148	   packet from R1 to R8 via R7 and get popped.  The next label in the
149	   stack 3005 will take the packet from R8 to the destination R5 via R4.
150	   If the node R8 goes down, it is not possible for R7 to perform FRR
151	   without examining the second label in the incoming label stack
152	   (3005).
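   The label values above follow the usual SR-MPLS encoding of a
   Prefix-SID: the SRGB base of the node that processes the label plus
   the SID index.  The Python sketch below is illustrative only.  The
   dictionaries transcribe the SRGBs and Node-SID indices of Figure 1,
   the function names are hypothetical, and it assumes that R7 is R1's
   first hop toward R8, so label 1008 is encoded in R7's SRGB and label
   3005 in R8's SRGB.

```python
# Sketch: derive the label stack for R1->R7->R8->R4->R5 (Figure 1).
# Assumption: each label is encoded in the SRGB of the node that will
# process it (the first hop for the top label, then each segment endpoint).

SRGB_BASE = {f"R{i}": 1000 for i in range(1, 9)}    # SRGB = [1000-2000]
SRGB_BASE["R8"] = 3000                              # R8: [3000-4000]
NODE_SID_INDEX = {f"R{i}": i for i in range(1, 9)}  # index i for node Ri


def prefix_sid_label(target: str, reader: str) -> int:
    """Label that `reader` maps to the Node-SID of `target`."""
    return SRGB_BASE[reader] + NODE_SID_INDEX[target]


def encode_stack(first_hop: str, segment_endpoints: list) -> list:
    """Encode one Node-SID label per segment endpoint, top label first."""
    stack = []
    reader = first_hop
    for endpoint in segment_endpoints:
        stack.append(prefix_sid_label(endpoint, reader))
        reader = endpoint  # the next label is processed by this endpoint
    return stack


print(encode_stack("R7", ["R8", "R5"]))  # [1008, 3005]
```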

154	   Note that in the absence of a failure, R7 does not need to understand
155	   the meaning of the second label (3005) in order to perform normal
156	   forwarding.  However, in order to support segment protection, R7 will
157	   need to understand the meaning of label 3005 in order to determine
158	   where the packet is headed after R8.

160	   The mechanisms used to detect whether a node or a link has failed
161	   are outside the scope of this document.  The options for detecting
162	   node failures, and the resulting forwarding state, described in
163	   Section 5.2 of [RFC8679] are applicable to this document as
164	   well.

166	2.2.  Segment Protection for Anycast-SIDs

168	   A prefix segment advertised as a Node-SID may only be advertised by
169	   one node in the network.  In contrast, an anycast prefix segment
170	   may be advertised by more than one node.  In some situations, one
171	   can use Anycast-SIDs to construct SR-TE paths that are protected
172	   against node failure, without the need for the mechanism described
173	   in this document.

175	      +----+   10   +----+   10   +----+   10   +----+   10   +----+
176	      | R1 |--------| R2 |--------| R3 |--------| R4 |--------| R5 |
177	      +----+        +----+        +----+        +----+        +----+
178	          \                           \          / |
179	           \ 10                        \100   60/  |
180	            \                           \      /   |
181	             \   +----+    30            +----+    |
182	              +--| R7 |------------------| R8 |    |
183	                 +----+                  +----+    |
184	                /    \                  Anycast    +
185	               /      \                 SID:100   /
186	         +----+        \                         /
187	         | R6 |         \    40          +----+ /60
188	         +----+          +---------------| R9 |+          Label stack:
189	                                         +----+           +------------+
190	                                        Anycast           |  1100 (top)|
191	                                        SID:100           +------------+
192	                                                          |  1005      |
193	                                                          +------------+
194	           * Numbers on the links represent the symmetric link cost

196	      Figure 2: Topology illustrating use of Anycast-SIDs to protect
197	        against node failures.  All nodes have SRGB = [1000-2000].

199	   An example of this is shown in Figure 2.  In this example, R8 and R9
200	   advertise an Anycast-SID of 100.  The label stack in this example is
201	   [1100, 1005].  The top label (1100) corresponds to the Anycast-SID
202	   advertised by both R8 and R9.  In the absence of a failure, the
203	   packet sent by R1 with this label stack will follow the path from
204	   R1->R5 along R1->R7->R8->R4->R5.

206	   If R7 is performing a per-prefix LFA calculation [RFC5286], then R7
207	   will install a backup next-hop to R9 for this Anycast-SID, protecting
208	   against the failure of the primary next-hop to R8.  This backup path
209	   does not pass through R8, so it would not be affected by a
210	   complete failure of node R8.  As illustrated by this example, for
211	   some topologies segment-protecting SR-TE paths can be constructed
212	   through the use of Anycast-SIDs, as opposed to the mechanism
213	   described in this document.
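   R7's choice of R9 can be checked against the LFA inequalities of
   [RFC5286].  The Python sketch below is a simplification under stated
   assumptions: link costs are transcribed from Figure 2 (R6 is
   omitted), the distance to Anycast-SID 100 is taken as the distance to
   the closer of R8 and R9, and the additional rules that [RFC5286]
   applies to multi-homed prefixes are ignored.  All function names are
   hypothetical.

```python
import heapq

# Figure 2 links with symmetric costs (R6 omitted: its link cost is not shown).
LINKS = [
    ("R1", "R2", 10), ("R2", "R3", 10), ("R3", "R4", 10), ("R4", "R5", 10),
    ("R1", "R7", 10), ("R3", "R8", 100), ("R4", "R8", 60), ("R7", "R8", 30),
    ("R7", "R9", 40), ("R4", "R9", 60),
]
ANYCAST_100 = {"R8", "R9"}  # routers advertising Anycast-SID 100


def build_graph(links):
    """Adjacency map {router: {neighbor: cost}} for symmetric links."""
    graph = {}
    for a, b, cost in links:
        graph.setdefault(a, {})[b] = cost
        graph.setdefault(b, {})[a] = cost
    return graph


def spf(graph, source):
    """Dijkstra: shortest distance from source to every reachable router."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for nbr, cost in graph[node].items():
            if d + cost < dist.get(nbr, float("inf")):
                dist[nbr] = d + cost
                heapq.heappush(heap, (d + cost, nbr))
    return dist


def dist_to_anycast(graph, node, advertisers):
    """Distance to an anycast prefix: distance to the closest advertiser."""
    d = spf(graph, node)
    return min(d[a] for a in advertisers)


def is_node_protecting_lfa(graph, s, e, n, advertisers):
    """RFC 5286 Inequality 1 (loop-free) and Inequality 3 (node-protecting)
    for source s, primary next hop e, and candidate alternate n."""
    d_n = spf(graph, n)
    d_nd = dist_to_anycast(graph, n, advertisers)
    loop_free = d_nd < d_n[s] + dist_to_anycast(graph, s, advertisers)
    node_protecting = d_nd < d_n[e] + dist_to_anycast(graph, e, advertisers)
    return loop_free and node_protecting


g = build_graph(LINKS)
for alt in ("R9", "R1"):
    print(alt, is_node_protecting_lfa(g, "R7", "R8", alt, ANYCAST_100))
# prints "R9 True", then "R1 False"
```

   R9 passes both checks because it advertises the anycast prefix
   itself.  R1 fails the loop-free check because its distance to the
   prefix (40) is not less than 10 + 30.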

215	2.3.  Segment protection for explicit paths with Adj-SIDs
216	                                  Adj-SID:
217	                                  R3-R8:9044

219	      Node-         Node          Node          Node          Node
220	      SID:1         SID:2         SID:3         SID:4         SID:5
221	      +----+   10   +----+   10   +----+   10   +----+   10   +----+
222	      | R1 |--------| R2 |--------| R3 |--------| R4 |--------| R5 |
223	      +----+        +----+        +----+        +----+        +----+
224	          \                           \          /              |
225	           \ 10                        \ 100    / 60            | 10
226	            \                           \      /                |
227	             \   +----+                  +----+               +----+
228	              +--| R7 |------------------| R8 |---------------| R9 |
229	                 +----+    30            +----+      10       +----+
230	                / Node                   Node                 Node
231	               /  SID:7                  SID:8                SID:9
232	         +----+                          SRGB:
233	         | R6 |                          3000-4000        Label stack:
234	         +----+                                           +------------+
235	         Node                            Adj-SIDs:        |  1003 (top)|
236	         SID:6                           R8-R4:9054       +------------+
237	                                                          |  9044      |
238	                                                          +------------+
239	                                                          |  9054      |
240	                                                          +------------+
241	                                                          |  1005      |
242	                                                          +------------+
243	         * Numbers on the links represent the symmetric link cost

245	     Figure 3: Explicit path using an Adj-SID.  All nodes have SRGB =
246	         [1000-2000], except for R8 which has SRGB = [3000-4000].

248	   Consider an explicit path from R1->R5 via R1->R2->R3->R8->R4->R5.
249	   This path can be built using a combination of Node-SIDs and Adj-SIDs,
250	   as shown in Figure 3.  The diagram shows the label stack needed to
251	   instantiate this path, as well as several Adj-SIDs advertised by
252	   nodes involved in this path.  When a packet leaving R1 with this
253	   label stack reaches R3, the top label is 9044, which will take the
254	   packet to R8.  The next-next-hop in the path is R4.  To provide
255	   protection for the failure of node R8, R3 would need to send the
256	   packet to R4 without going through R8.  However, the only way R3 can
257	   learn that the packet needs to go to R4 is to examine the next
258	   label in the stack, label 9054.  Since R3 knows that R8 has
259	   advertised label 9054 as the adjacency segment for the link from R8
260	   to R4, R3 knows that a backup path can merge back into the original
261	   explicit path at R4.
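   The lookup that R3 performs can be sketched in a few lines of Python.
   This is a hypothetical illustration, not a specified data structure.
   It assumes that R3 stores its own Adj-SIDs and the Adj-SIDs each
   neighbor advertises in the IGP, and it holds only the two Adj-SIDs
   shown in Figure 3.

```python
from typing import Optional, Tuple

# Hypothetical tables at R3, holding only the Adj-SIDs shown in Figure 3.
LOCAL_ADJ_SIDS = {9044: "R8"}             # R3's own Adj-SID toward R8
NEIGHBOR_ADJ_SIDS = {"R8": {9054: "R4"}}  # Adj-SIDs advertised by R8


def merge_point(top: int, below: int) -> Optional[Tuple[str, str]]:
    """Return (protected next hop, next-next hop), or None if unknown."""
    next_hop = LOCAL_ADJ_SIDS.get(top)
    if next_hop is None:
        return None  # top label is not one of this PLR's Adj-SIDs
    next_next_hop = NEIGHBOR_ADJ_SIDS.get(next_hop, {}).get(below)
    if next_next_hop is None:
        return None  # second label is not an Adj-SID of that neighbor
    return next_hop, next_next_hop


print(merge_point(9044, 9054))  # ('R8', 'R4')
```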

263	3.  Detailed Solution using Context Tables

265	   This section provides a detailed description of how to construct
266	   node-protecting backup paths for SR-TE paths using context tables.
267	   The end result of this description is externally visible forwarding
268	   behavior that can be specified as a packet arriving at a PLR with a
269	   particular incoming label stack and leaving the PLR on a particular
270	   outgoing interface with a particular outgoing label stack.  There may
271	   be other methods of arriving at the same externally visible
272	   forwarding behavior, such as the one described in Section 6.2 of
273	   [I-D.ietf-rtgwg-segment-routing-ti-lfa].  It is not the
274	   intent of this document to exclude other methods, as long as the
275	   externally visible forwarding behavior is the same as produced by
276	   this method.

278	3.1.  Building Context Tables

280	   [RFC5331] introduced the concept of Context-Specific Label Spaces,
281	   and various applications make use of this concept.  A context label
282	   table on a router represents the Label Forwarding Information Base
283	   (LFIB) from the point of view of a particular neighbor.  Context
284	   tables are built by constructing incoming label mappings advertised
285	   by the neighbor and the actions corresponding to those labels.  The
286	   labels advertised by each node are local to the node and may not be
287	   unique across the segment routing domain.  The context tables are
288	   separate tables built on a per-neighbor basis on every node to ensure
289	   they represent LFIBs of a particular neighbor.

291	   When a PLR needs to protect an SR-TE path against the failure of a
292	   neighbor N, it creates a context table associated with N.  This
293	   context table is populated with the following segment routing
294	   forwarding entries:

296	      - All the Prefix-SIDs of the network.  The programmed incoming
297	      label map uses the SRGB of N to compute the input label value.
298	      The NHLFE (Next Hop Label Forwarding Entry) is then constructed by
299	      looking at all the next hops for the Prefix-SID and choosing a
300	      loop-free path, as explained in Section 3.2.

302	      - All the Adj-SIDs advertised by N.  The NHLFE is constructed as
303	      explained in Section 3.3.
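   The two population rules can be illustrated with a short Python
   sketch that builds only the incoming-label side of a context table.
   The outgoing actions are derived as in Sections 3.2 and 3.3.  The
   class and function names are hypothetical, and the sketch assumes
   that Adj-SID labels never overlap labels in N's SRGB.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextKey:
    kind: str    # "prefix" or "adjacency"
    target: str  # prefix owner, or far end of the adjacency


def build_context_in_labels(n_srgb_base, prefix_sid_index, n_adj_sids):
    """Map each label that neighbor N would accept to the segment it names.

    n_srgb_base:      base of the SRGB advertised by N
    prefix_sid_index: {router: Prefix-SID index} for the whole network
    n_adj_sids:       {label: far-end router} for Adj-SIDs advertised by N
    """
    table = {}
    # Rule 1: every Prefix-SID, with the in-label computed from N's SRGB.
    for router, index in prefix_sid_index.items():
        table[n_srgb_base + index] = ContextKey("prefix", router)
    # Rule 2: every Adj-SID advertised by N (assumed not to overlap the SRGB).
    for label, far_end in n_adj_sids.items():
        table[label] = ContextKey("adjacency", far_end)
    return table


# R8 in Figure 3: SRGB [3000-4000], Adj-SID 9054 toward R4.
ctx_r8 = build_context_in_labels(
    3000, {f"R{i}": i for i in range(1, 10)}, {9054: "R4"})
print(ctx_r8[3005])  # ContextKey(kind='prefix', target='R5')
print(ctx_r8[9054])  # ContextKey(kind='adjacency', target='R4')
```

   In this sketch, label 3008 identifies R8 itself, which corresponds to
   the drop entry for 3008 in Figure 4.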

305	   The following sections illustrate how the context table is
306	   constructed to allow the PLR to provide node-protecting paths for
307	   the next-next hops in the topologies shown in Figure 1 and Figure 3.

309	3.2.  Segment protection for Node-SIDs

311	   Figure 4 shows the routing table entries on R7 corresponding to the
312	   Node-SIDs to reach R1 and R8 for the topology in Figure 1.  In the
313	   absence of a failure, a packet with a label stack whose top label is
314	   1008 will have its top label popped by R7 (assuming PHP behavior),
315	   and R7 will forward the packet to R8.  When the interface to R8 is
316	   down, the backup next-hop entry is used.  R7 will pop the top label
317	   of 1008, and use the context table that R7 computed for R8 to
318	   evaluate the next label on the stack.

320	       R7's Routing Table (partial)
321	       Transit routes for Node-SIDs for R1 and R8
322	      +=============+=============================================+
323	      | In label    | Outgoing label action                       |
324	      +=============+=============================================+
325	      | 1001        | Primary: pop, fwd to R1                     |
326	      |             | Backup: pop, lookup context.r1              |
327	      +-------------+---------------------------------------------+
328	      | 1008        | Primary: pop, fwd to R8                     |
329	      |             | Backup: pop, lookup context.r8              |
330	      +-------------+---------------------------------------------+

332	       R7's Context Table for R8 (context.r8, partial)
333	      +=============+=============================================+
334	      | In label    | Outgoing label action                       |
335	      +=============+=============================================+
336	      | 3004        | swap 1004, fwd to R1                        |
337	      +-------------+---------------------------------------------+
338	      | 3005        | swap 1005, fwd to R1                        |
339	      +-------------+---------------------------------------------+
340	      | 3008        | drop                                        |
341	      +-------------+---------------------------------------------+

343	      Figure 4: Building node-protecting backup paths for SR-TE paths
344	                            involving Node-SIDs

346	   R7 builds the context table for R8 using the following process.  R7
347	   computes the mapping from incoming labels to Node-SIDs that R8
348	   expects to see, based on the SRGB advertised by R8.  In the example
349	   in Figure 1, R7 can determine that R8 interprets an incoming label
350	   of 3005 as mapping to the Node-SID for R5.

352	   R7 then computes a loop-free backup path to reach R5 which is node-
353	   protecting with respect to the failure of R8.  In this example, the
354	   backup path computed by R7 to reach R5 without passing through R8 can
355	   be achieved by forwarding the packet to R1 with a top label of 1005,
356	   corresponding to the Node-SID for R5 in the context of R1's SRGB.

358	   The loop-free path computation may be based on a mechanism such as
359	   LFA, R-LFA, TI-LFA, or a constraint-based SPF that avoids the failed
360	   node.  To populate the context table for R8, R7 associates the
361	   outgoing label actions of the backup path to R5 with incoming label
362	   3005.  This results in the entry for label 3005 shown in context.r8
363	   in Figure 4.
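   The constraint-based SPF option listed above can be sketched as
   follows.  This minimal Python example assumes the link costs of
   Figure 1 (R6 is omitted because its link cost is not shown).  It
   removes R8 from the topology, runs SPF from R7, and encodes the
   repair label in the SRGB of the first hop.  The function names are
   hypothetical, and the single-label check is a simplification: in
   other topologies a mechanism such as TI-LFA may need a longer repair
   segment list.

```python
import heapq

# Figure 1 links with symmetric costs (R6 omitted: its link cost is not shown).
LINKS = [
    ("R1", "R2", 10), ("R2", "R3", 10), ("R3", "R4", 10), ("R4", "R5", 10),
    ("R1", "R7", 10), ("R3", "R8", 100), ("R4", "R8", 60), ("R7", "R8", 30),
]
SRGB_BASE = {r: 1000 for r in ("R1", "R2", "R3", "R4", "R5", "R7")}
SRGB_BASE["R8"] = 3000
NODE_SID_INDEX = {"R5": 5}


def spf(source, exclude=frozenset()):
    """Dijkstra over LINKS minus `exclude`; returns {node: (cost, first hop)}."""
    graph = {}
    for a, b, cost in LINKS:
        if a not in exclude and b not in exclude:
            graph.setdefault(a, {})[b] = cost
            graph.setdefault(b, {})[a] = cost
    best = {source: (0, None)}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > best[node][0]:
            continue  # stale heap entry
        for nbr, cost in graph.get(node, {}).items():
            if d + cost < best.get(nbr, (float("inf"), None))[0]:
                first_hop = nbr if node == source else best[node][1]
                best[nbr] = (d + cost, first_hop)
                heapq.heappush(heap, (d + cost, nbr))
    return best


def node_protecting_backup(plr, failed, dest):
    """First hop and outgoing label for `dest` on a path avoiding `failed`."""
    _, first_hop = spf(plr, exclude={failed})[dest]
    # One label is enough only if first_hop's normal shortest path to dest
    # already avoids `failed` (compared by cost; ECMP is ignored here).
    if spf(first_hop)[dest][0] != spf(first_hop, exclude={failed})[dest][0]:
        raise ValueError("repair needs more than one label")
    return first_hop, SRGB_BASE[first_hop] + NODE_SID_INDEX[dest]


print(node_protecting_backup("R7", "R8", "R5"))  # ('R1', 1005)
```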

365	   Therefore, when a packet arrives at R7 with the label stack [1008,
366	   3005] and the link from R7 to R8 has recently failed, R7 uses the
367	   backup next-hop entry for label 1008 in its main routing table.
368	   Based on this entry, R7 pops label 1008 and uses context.r8 to look
369	   up the new top label, 3005.  R7 swaps label 3005 for 1005 and
370	   forwards the packet to R1.  This gets the packet to R5 on a node-
371	   protecting backup path.
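   The lookup sequence described above can be traced with a small Python
   model of R7's tables.  The entries are transcribed from Figure 4.
   The dictionary layout, action tuples, and function name are
   hypothetical.  The failure is modeled only as loss of the primary
   next hop R8, because R7 cannot tell a link failure from a node
   failure.

```python
# Entries from Figure 4; actions are (operation, context table, next hop)
# in the main table and (operation, new label, next hop) in context tables.
MAIN_TABLE = {
    1001: {"primary": ("pop", None, "R1"), "backup": ("pop", "context.r1", None)},
    1008: {"primary": ("pop", None, "R8"), "backup": ("pop", "context.r8", None)},
}
CONTEXT_TABLES = {
    "context.r8": {
        3004: ("swap", 1004, "R1"),
        3005: ("swap", 1005, "R1"),
        3008: ("drop", None, None),
    },
}


def forward(stack, failed_neighbors):
    """Return (outgoing stack, next hop), or None if the packet is dropped."""
    stack = list(stack)
    entry = MAIN_TABLE[stack[0]]
    op, context, next_hop = entry["primary"]
    if next_hop in failed_neighbors:
        op, context, next_hop = entry["backup"]
    if op == "pop":
        stack.pop(0)
    if context is None:
        return stack, next_hop
    # Backup path: interpret the new top label in the neighbor's label space.
    op, label, next_hop = CONTEXT_TABLES[context][stack[0]]
    if op == "drop":
        return None
    stack[0] = label  # swap
    return stack, next_hop


print(forward([1008, 3005], failed_neighbors=set()))   # ([3005], 'R8')
print(forward([1008, 3005], failed_neighbors={"R8"}))  # ([1005], 'R1')
```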

373	   Note that R7 activates the node-protecting backup path when it
374	   detects that the link to R8 has failed.  R7 does not know that node
375	   R8 has actually failed.  However, the node-protecting backup path is
376	   computed assuming that the failure of the link to R8 implies that R8
377	   has failed.

379	3.3.  Segment protection for Adj-SIDs

381	   This section gives an example of how to construct node-protecting
382	   backup paths when the SR-TE path uses Adj-SIDs.  Figure 5 shows some
383	   of the routing table entries for R3 corresponding to the sample
384	   network shown in Figure 3.  When the top label of the label stack is
385	   an Adj-SID, the PLR needs to recognize that in order to provide a
386	   node-protecting backup path, it needs to pop the top label and
387	   examine the next label in the context of the next-hop router
388	   identified by the top label Adj-SID.  In this example, when R3 is
389	   constructing its routing table, it recognizes that label 9044
390	   corresponds to a next-hop of R8, so it installs a backup entry,
391	   corresponding to the failure of the link to R8, which pops label
392	   9044 and then examines the new top label in the context of R8.

394	       R3's Routing Table (partial)
395	       Transit route for Adj-SID
396	      +=============+=============================================+
397	      | In label    | Outgoing label action                       |
398	      +=============+=============================================+
399	      | 9044        | Primary: pop, fwd to R8                     |
400	      |             | Backup: pop, lookup context.r8              |
401	      +-------------+---------------------------------------------+

403	       R3's Context Table for R8 (context.r8, partial)
404	      +=============+=============================================+
405	      | In label    | Outgoing label action                       |
406	      +=============+=============================================+
407	      | 3005        | swap 1005, fwd to R4                        |
408	      +-------------+---------------------------------------------+
409	      | 9054        | pop, fwd to R4                              |
410	      +-------------+---------------------------------------------+

412	      Figure 5: Building node-protecting backup paths for SR-TE paths
413	                            involving Adj-SIDs

415	   R3 constructs its context table for R8 by determining which labels R8
416	   expects to receive to accomplish different forwarding actions.  The
417	   entry for incoming label 3005 in context.r8 in Figure 5 corresponds
418	   to a Node-SID.  This entry is computed using the methods described in
419	   Section 3.2.

421	   The entry for incoming label 9054 in context.r8 corresponds to an
422	   Adj-SID.  R3 recognizes that R8 has advertised this Adj-SID for the
423	   link from R8 to R4 in Figure 3.  So R3 determines the outgoing label
424	   action needed to reach R4 without passing through R8.  This can be
425	   accomplished by popping the label 9054, and forwarding the packet
426	   directly on the link from R3 to R4.
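   The derivation of the 9054 entry can be sketched as a constraint-based
   SPF from R3 with R8 removed.  This hypothetical Python example uses
   the link costs of Figure 3 (R6 omitted).  An Adj-SID whose far end is
   directly reachable is simply popped.  The fallback for a non-adjacent
   far end, which swaps in the far end's Node-SID encoded in the first
   hop's SRGB, is an assumption of this sketch and is not specified by
   this document.

```python
import heapq

# Figure 3 links with symmetric costs (R6 omitted: its link cost is not shown).
LINKS = [
    ("R1", "R2", 10), ("R2", "R3", 10), ("R3", "R4", 10), ("R4", "R5", 10),
    ("R1", "R7", 10), ("R3", "R8", 100), ("R4", "R8", 60), ("R7", "R8", 30),
    ("R8", "R9", 10), ("R5", "R9", 10),
]
SRGB_BASE = {f"R{i}": 1000 for i in range(1, 10)}
SRGB_BASE["R8"] = 3000
NODE_SID_INDEX = {f"R{i}": i for i in range(1, 10)}


def first_hop_avoiding(source, dest, failed):
    """Dijkstra from source with `failed` removed; first hop toward dest."""
    graph = {}
    for a, b, cost in LINKS:
        if failed not in (a, b):
            graph.setdefault(a, {})[b] = cost
            graph.setdefault(b, {})[a] = cost
    best = {source: 0}
    hop = {source: None}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > best[node]:
            continue  # stale heap entry
        for nbr, cost in graph[node].items():
            if d + cost < best.get(nbr, float("inf")):
                best[nbr] = d + cost
                hop[nbr] = nbr if node == source else hop[node]
                heapq.heappush(heap, (d + cost, nbr))
    return hop[dest]  # assumes dest stays reachable


def adj_sid_context_action(plr, protected, far_end):
    """(operation, new label, next hop) for an Adj-SID of `protected`."""
    nh = first_hop_avoiding(plr, far_end, protected)
    if nh == far_end:
        return ("pop", None, nh)  # far end is adjacent: just pop the Adj-SID
    # Assumed fallback: steer to far_end with its Node-SID in nh's SRGB.
    return ("swap", SRGB_BASE[nh] + NODE_SID_INDEX[far_end], nh)


print(adj_sid_context_action("R3", "R8", "R4"))  # ('pop', None, 'R4')
```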

428	3.4.  Segment protection for edge nodes

430	   The segment protection mechanism described in the previous sections
431	   depends on the assumption that the label immediately below the top
432	   label in the label stack is understood in the IGP domain.  When the
433	   provider edge routers exchange service labels via BGP or some other
434	   non-IGP mechanism, the bottom label is not understood in the IGP
435	   domain.

437	   EPE-SIDs, as described in [I-D.ietf-idr-bgpls-segment-routing-epe],
438	   are used to choose an egress interface among a set of egress paths.
439	   An EPE-SID can be the bottom-most label in an SR-TE path.  EPE-SIDs
440	   are not understood in the IGP domain.  In order to support the
441	   procedures described in this document, an EPE-SID should always be
442	   placed after the Anycast-SID of the nodes that advertise that
443	   EPE-SID.  The same EPE-SID should be configured on all of these
444	   anycast nodes so that, in case of node failure, the traffic is
445	   correctly forwarded by the other protector nodes.  If a Node-SID is
446	   used instead of an Anycast-SID above the EPE-SID while the
447	   procedures in this document are in use, packets may be dropped.
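   One way to check the placement rule above is sketched below in
   Python.  The segment kinds and label values are hypothetical.
   "After" is read as "immediately below in the label stack", and the
   sketch does not verify that the Anycast-SID actually belongs to the
   nodes advertising the EPE-SID.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    label: int
    kind: str  # hypothetical tags: "node", "anycast", "adjacency", "epe"


def epe_placement_ok(stack):
    """True if every EPE-SID is directly below an Anycast-SID (top first)."""
    for i, seg in enumerate(stack):
        if seg.kind == "epe" and (i == 0 or stack[i - 1].kind != "anycast"):
            return False
    return True


good_stack = [Segment(1003, "node"), Segment(1100, "anycast"),
              Segment(24001, "epe")]
bad_stack = [Segment(1003, "node"), Segment(1009, "node"),
             Segment(24001, "epe")]
print(epe_placement_ok(good_stack), epe_placement_ok(bad_stack))  # True False
```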

449	   The egress node protection mechanisms described in [RFC8679] are
450	   applicable to this use case, and no additional changes are required
451	   for SR-based networks.

453	3.4.1.  Detailed Example for Segment protection for edge nodes

455	        sid:1    sid:2     sid:3       sid:4      sid:5
456	 1000-2000   1000-2000 1000-2000   1000-2000  1000-2000
457	   R2:1024    R3:1034   R8:1044     R5:1064
458	       R4:2014 =========================
459	   +----+ 10 +----+ 10 +----+  10   +----+ 10 +----+ Primary
460	   | PE1|----| R2 |----| R3 |-------| R4  |-- | PE2| context 1.1.1.1: sid 10
461	   +----+    +----+    +----+       +----+    +----+\
462	       \                  \          /               \+-----+
463	        \ 10               \ 100    / 60             /| CE1 |
464	         \                  \      /               /  +-----+
465	          \   +----+         +----+ R4:1054 +-----+
466	           +--| R7 |---------| R8 | --------| PE3 |context 1.1.1.1 sid 10
467	              +----+    30   +----+         +-----+ Protector mirror SID 100
468	               /   sid:7       sid:8         sid:9
469	              /    1000-2000   3000-4000     1000-2000
470	             / 10
471	          +----+
472	          | R6 |
473	          +----+
474	          sid:6
475	          1000-2000

477	                  R4's Context Table for PE2 (context.PE2, partial)
478	   +=============+=============================================+
479	   | In label    | Outgoing label action                       |
480	   +=============+=============================================+
481	   | 1010        | swap 1100(mirror sid), push 1010 fwd to R8  |
482	   +-------------+---------------------------------------------+

484	    * Numbers on the links represent the symmetric link cost

486	             Figure 6: Node protection for edge nodes Adj-SIDs

   The segment protection mechanisms described in the previous sections
   depend on the assumption that the label below the top label in the
   label stack is understood in the IGP domain.  If an edge node goes
   down, the label below the top label representing the edge node could
   be a BGP service label or a label representing another application.
   The service mirroring use case is described in Section 5.1 of
   [RFC8402].  The customer edges (CEs) are multi-homed to the provider
   edges (PEs); one PE acts in the primary role and the other in the
   protector role.  The two PEs advertise a context IP address for each
   customer site and attach an Anycast-SID to the context.  The
   protector PE advertises a Binding SID with the M bit set
   (Mirror-SID), which indicates mirroring capability for the context.
   The protector PE builds a context table for the BGP service labels
   advertised by the primary PE for the same context.  The BGP service
   resolves over a transport path whose label stack has the context SID
   at the bottom.  Any penultimate node of PE2 builds a context table
   for PE2 as explained in Section 3.1.  This context table contains an
   entry for the SID of the context-id, and its action is to pop the
   top label and replace it with the Mirror-SID that the protector PE
   advertised for the context 1.1.1.1.  As shown in the example in
   Section 3.4.1, SID 10, attached to context-id 1.1.1.1, has been
   programmed in the context table context.PE2 on the penultimate
   router R4.  The action is to swap 1010 with Mirror-SID 1100 and push
   1010, which is PE2's context SID.  When the packet reaches PE2, its
   top label is 1100, which is a Mirror-SID (context label) on PE2 and
   directs the protector PE to look up the BGP service labels in the
   context table of the primary PE.
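
   The context-table action above can be illustrated with a small
   Python model.  It is a sketch, not an implementation: the labels 1010
   and 1100 come from the example in the text, while the service label
   value 24001 and the names ContextEntry and apply_context_entry are
   hypothetical.  A label stack is modeled as a list with the top label
   at index 0.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ContextEntry:
    """Hypothetical context-table entry: swap the top label, then push."""
    swap_to: int             # replaces the incoming top label
    push: Tuple[int, ...]    # pushed afterwards; the last element ends on top


def apply_context_entry(stack: List[int],
                        table: Dict[int, ContextEntry]) -> List[int]:
    """Apply the context-table action to a label stack (index 0 is top)."""
    entry = table[stack[0]]
    result = [entry.swap_to] + stack[1:]
    for label in entry.push:
        result.insert(0, label)
    return result


# context.PE2 on R4, per the example in the text: SID 10 (label 1010)
# bound to context-id 1.1.1.1 is swapped with Mirror-SID 1100, and
# 1010 is pushed on top.
CONTEXT_PE2 = {1010: ContextEntry(swap_to=1100, push=(1010,))}

SERVICE_LABEL = 24001  # hypothetical BGP service label at the bottom
out = apply_context_entry([1010, SERVICE_LABEL], CONTEXT_PE2)
print(out)  # [1010, 1100, 24001]
```

   The service label is left untouched below the Mirror-SID, which is
   what lets the protector PE resolve it in the primary PE's context
   table.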

4.  Determining node can be bypassed

   In certain scenarios, a node in the label stack may represent an
   important function, such as a firewall filter, that must be
   performed.  Bypassing such a function may cause major security
   issues.  When the segment protection mechanisms described in this
   document are applied and the firewall goes down, traffic may be
   rerouted via the next label in the stack, skipping the firewall.
   There are multiple ways to solve this problem.

   The procedures described in this document should be optional and
   should take effect only when devices are explicitly configured to
   apply them and examine the next label in the stack.  The feature
   should be controllable at per-neighbor granularity.  When certain
   devices offer a critical function, their neighbors may disable
   segment protection for that particular neighbor.

   IGP protocol extensions proposed in [I-D.li-rtgwg-enhanced-ti-lfa]
   define a "no bypass" (NB) flag for SIDs.  Nodes that provide
   critical functions may advertise their SIDs with the NB bit set.
   The segment protection procedures described in this document should
   not be applied to these SIDs; in case of failure, either a
   link-protecting backup path can be programmed or the packet can be
   dropped with no protection.
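
   The per-neighbor control and the NB flag handling described in this
   section can be combined into one decision, sketched below in Python.
   The names are hypothetical, the NB flag is reduced to a boolean, and
   applying the same link-protection-or-drop fallback to neighbors for
   which the feature is disabled is an assumption of this sketch, not a
   requirement of this document.

```python
from dataclasses import dataclass
from enum import Enum


class BackupAction(Enum):
    SEGMENT_PROTECTION = "pop the failed node's label, use the next label"
    LINK_PROTECTION = "link-protecting backup path"
    DROP = "drop the packet, no protection"


@dataclass(frozen=True)
class NeighborConfig:
    """Hypothetical per-neighbor settings on the PLR."""
    segment_protection: bool            # feature explicitly enabled?
    link_protect_fallback: bool = True  # False: drop instead


def choose_backup(cfg: NeighborConfig, sid_no_bypass: bool) -> BackupAction:
    """Select the backup behavior for a SID of a protected neighbor."""
    if cfg.segment_protection and not sid_no_bypass:
        return BackupAction.SEGMENT_PROTECTION
    # Feature disabled for this neighbor, or the SID carries the NB flag:
    # never skip ahead to the next label in the stack.
    if cfg.link_protect_fallback:
        return BackupAction.LINK_PROTECTION
    return BackupAction.DROP


firewall = NeighborConfig(segment_protection=True, link_protect_fallback=False)
print(choose_backup(firewall, sid_no_bypass=True))  # BackupAction.DROP
```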

5.  Hold timers for Node-SID/Prefix-SIDs and Adj-SIDs

   SR-TE paths may be computed by a controller or by the head-end
   router.  When a node fails in the network, the controller or
   head-end router has to learn about the failure, recompute the label
   stacks of any affected SR-TE paths, and get the new label stacks
   programmed into the forwarding plane of the head-end router.  This
   process may be slow compared to the speed with which routers in the
   network react to the event.  After learning about a node failure,
   the non-PLR routers in the network will no longer be able to compute
   a path to the failed node.  If no special precautions are taken,
   these non-PLR routers will remove the forwarding entries
   corresponding to the Node-SID and Prefix-SIDs advertised by the
   failed node.  If the head-end router is still sending traffic with
   that Node-SID or Prefix-SID in the stack, the traffic can be
   blackholed at a non-PLR router before it reaches the PLR.  In this
   case, the node-protection FRR mechanisms do not deliver their full
   benefit.

   To solve this problem, hold timers are recommended.  The hold-timer
   corresponds to the maximum time that the controller and head-end
   router together, or the head-end router alone, take to compute and
   install the label stacks for new SR-TE paths in the event of a node
   failure.  The hold-timer should be applied to forwarding entries for
   Node-SIDs and Prefix-SIDs that are advertised by a single node in the
   network.  If such a Node-SID or Prefix-SID becomes unreachable, the
   event and the resulting forwarding changes should not be
   communicated to the forwarding planes of all configured routers
   (including the PLRs for the failed node) until the hold-timer
   expires.  Traffic therefore continues to follow the previous path
   and receives FRR protection at the PLR.

   A route corresponding to a global Adj-SID advertised by a node that
   becomes unreachable should also be left in the forwarding table for
   the duration of the hold-timer.

   The node-protecting backup forwarding entry on the PLR corresponding
   to the local Adj-SID from the PLR to the failed node should also be
   left in the forwarding table for the duration of the hold-timer.
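
   The hold-timer behavior can be sketched as a forwarding table that
   defers removals.  In this Python sketch, the class name, the label
   values, and the 30-second hold time are hypothetical, and time is
   passed in explicitly instead of being driven by a real timer.
   Labels reported unreachable keep their entries, and therefore their
   FRR backups, until the hold-timer expires.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class HeldFib:
    """Forwarding table that defers removals caused by a node failure."""
    hold_time: float                                       # seconds
    entries: Dict[int, str] = field(default_factory=dict)  # label -> action
    deadlines: Dict[int, float] = field(default_factory=dict)

    def node_unreachable(self, labels: Set[int], now: float) -> None:
        """Start the hold-timer for entries tied to the failed node.

        `labels` would include the Node-SID/Prefix-SIDs advertised only by
        the failed node, its global Adj-SIDs, and (on the PLR) the local
        Adj-SID toward it, whose node-protecting backup must stay.
        """
        for label in labels:
            if label in self.entries:
                # Assumption: a repeated event does not restart the timer.
                self.deadlines.setdefault(label, now + self.hold_time)

    def expire(self, now: float) -> None:
        """Remove entries whose hold-timer has run out."""
        for label, deadline in list(self.deadlines.items()):
            if now >= deadline:
                self.entries.pop(label, None)
                del self.deadlines[label]

    def lookup(self, label: int) -> Optional[str]:
        return self.entries.get(label)


# Hypothetical labels: 16003 (Node-SID of the failed node) and
# 24100 (a global Adj-SID it advertised).
fib = HeldFib(hold_time=30.0,
              entries={16003: "fwd to R3", 24100: "fwd to R3"})
fib.node_unreachable({16003, 24100}, now=0.0)
fib.expire(now=10.0)
print(fib.lookup(16003))  # fwd to R3 (hold-timer still running)
fib.expire(now=30.0)
print(fib.lookup(16003))  # None
```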

5.1.  Interaction with micro-loop avoidance

   During network convergence, the micro-loop avoidance mechanisms
   described in [I-D.bashandy-rtgwg-segment-routing-uloop] may be
   applied.  For a failed node, all nodes in the network should
   consistently detect the failure and maintain the pre-failure
   shortest path in the forwarding plane, so that traffic follows the
   pre-failure shortest path and takes the node-protecting backup path
   at the PLR of the failed node.

6.  Optimization Considerations

   The solution described in this document requires that a PLR build a
   context table for each neighbor for which node protection is
   desired.  The context table for each protected neighbor needs to
   contain route entries for all of the Prefix-SIDs in the network, as
   well as the route entries corresponding to the Adj-SIDs advertised by
   the protected neighbor.  Although the size of an IGP domain is
   limited, this may result in considerable additional memory
   consumption on the routers.  An optimization allows the PLR to avoid
   creating context tables when all of the nodes in the network
   advertise the same Segment Routing Global Block (SRGB) and all
   Adj-SIDs in the network are advertised as global Adj-SIDs.  In this
   case, all labels in the stack representing an SR-TE path are
   globally unique.  Protection against node failure in such a
   deployment can be achieved by a lookup of the first label and,
   potentially, a second lookup of the second label, both in a common
   route table with primary and backup entries for all Prefix-SIDs as
   well as for all global Adj-SIDs.
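
   With a common SRGB, every node maps a SID index to the same label
   (label = SRGB base + index), which is why the labels in the stack are
   globally unique.  The Python sketch below computes labels for the
   SRGB used in the example of Section 6.1 and checks the two
   preconditions of the optimization; the function names are
   hypothetical.

```python
from typing import Dict, Tuple

SRGB_BASE = 400000  # common SRGB used in the example of Section 6.1
SRGB_SIZE = 5000


def sid_label(index: int, base: int = SRGB_BASE, size: int = SRGB_SIZE) -> int:
    """Map a SID index to its MPLS label: SRGB base + index."""
    if not 0 <= index < size:
        raise ValueError(f"SID index {index} is outside the SRGB")
    return base + index


def optimization_applicable(srgbs: Dict[str, Tuple[int, int]],
                            all_adj_sids_global: bool) -> bool:
    """Identical (base, size) SRGB on every node and only global Adj-SIDs."""
    return all_adj_sids_global and len(set(srgbs.values())) == 1


print(sid_label(1003))  # 401003: R3's Node-SID in Figure 7
print(sid_label(2203))  # 402203: R2's global Adj-SID toward R3
print(optimization_applicable({"R0": (400000, 5000),
                               "R1": (400000, 5000)}, True))  # True
```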

   The primary route entries for global Adj-SIDs not advertised by the
   PLR will be the shortest path to the node advertising the global
   Adj-SID.  The backup route entries for these global Adj-SIDs will
   generally correspond to the node-protecting backup path to the node
   advertising the global Adj-SID.  However, for a global Adj-SID
   advertised by a direct neighbor of the PLR, the node-protecting
   backup route entry will correspond to the backup path to the node on
   the far end of the Adj-SID.

   With the common route table constructed in this manner, when the PLR
   receives a packet whose first label is a global Adj-SID advertised by
   the failed neighbor of the PLR, the lookup of the first label
   produces the correct backup path directly.  When the PLR receives a
   packet whose first label is the Node-SID of the failed neighbor, or
   an Adj-SID advertised by the PLR toward the failed neighbor, the
   route entry instructs the PLR to look up the second label in the
   common route table.  Finally, when the PLR receives a packet whose
   first label is a global Adj-SID or a Node-SID advertised by a node
   that is neither the PLR nor the failed neighbor, the usual
   link-protecting backup path is produced based on a lookup of the
   first label only.

6.1.  Segment Protection Example with Common SRGB

Node               Node               Node               Node             Node
sid:1000           sid:1001           sid:1002           sid:1003      sid:1004
+----+2001 1 2100+----+2102  1  2201+----+2203  1  2302+----+2304  1 2403+----+
| R0 |-----------| R1 |-------------| R2 |-------------| R3 |------------| R4 |
+----+           +----+             +----+             +----+            +----+
    \ 2005                              \ 2206         / 2306          2407 |
     \                                   \            /                     |
      \ 1                                 \ 10       / 6                  1 |
       \                                   \        /                       |
        \                              2602 \      / 2603              2704 |
         \ 2500+----+ 2506               2605+----+2607              2706+----+
          +----| R5 |------------------------| R6 |----------------------| R7 |
               +----+           3            +----+            1         +----+
               Node                          Node                          Node
               sid:1005                      sid:1006                  sid:1007

      * Numbers on the links represent the symmetric link cost
          * All nodes have SRGB = [400000-405000] size 5000

            R2's Routing Table (partial)

   +=============+=============================================+
   | In label    | Outgoing label action                       |
   +=============+=============================================+
   | 401003      | Primary: pop, fwd to R3                     |
   |             | Backup: pop, look up ILM table or IP table  |
   |             |              based on BOS bit               |
   +-------------+---------------------------------------------+
   | 401007      | Primary: swap 401007, fwd to R6             |
   |             | Backup: swap 401007, push 401005 (top),     |
   |             |         fwd to R1                           |
   +-------------+---------------------------------------------+
   | 402203      | Primary: pop, fwd to R3                     |
   |             | Backup: pop, look up ILM table or IP table  |
   |             |              based on BOS bit               |
   +-------------+---------------------------------------------+

Label Stack 1:
   +-------------+
   |401003 (top) |
   +-------------+
   |   401007    |
   +-------------+                                Label Stack 2:
                                                       +-------------+
                                                       |402203 (top) |
                                                       +-------------+
                                                       |   401007    |
                                                       +-------------+

                           Figure 7: Common SRGB

   Figure 7 shows an example where the optimized segment protection
   mechanism is deployed.  All nodes have a common SRGB of 400000 to
   405000.  The Node-SID indexes are 1000 through 1007, and the global
   Adj-SID indexes are 2001, 2005, and so on.  R2's partial ILM table,
   consisting of primary and backup next hops, is also shown in the
   figure.  The Node-SID of R3, represented by label 401003, has a
   primary next hop pointing to R3 and a backup next hop that pops the
   label and looks up the next label of the packet in the ILM table.
   For example, consider a path from R0 to R7 with a label stack
   consisting of 401003 and 401007.  When node R3 fails, R2, which is
   the PLR, pops label 401003 and looks up the next label, 401007, in
   the same table.  Based on the primary next hop for 401007, traffic is
   forwarded to R6.  Another example label stack starts with the global
   Adj-SID 402203 (the Adj-SID from R2 to R3).  As shown in the partial
   ILM table on R2, 402203 also has a backup next hop that pops the
   label and looks up the next label in the packet.  On R3's failure,
   traffic is forwarded via R6.
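
   The three lookup cases described in Section 6 can be exercised
   against R2's partial table in Figure 7.  This Python sketch encodes
   those entries with labels computed as SRGB base 400000 plus the SID
   index.  The Action and Route classes, the forward function, and the
   modeling of a failure as a set of unreachable next hops are
   hypothetical simplifications, and the IP-table lookup at the bottom
   of the stack is not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Action:
    """A forwarding action; next_hop=None means 'pop, look up next label'."""
    next_hop: Optional[str]
    swap: Optional[int] = None    # label replacing the top label; None = pop
    push: Tuple[int, ...] = ()    # labels pushed afterwards, last one on top


@dataclass(frozen=True)
class Route:
    primary: Action
    backup: Action


POP_AND_LOOKUP = Action(next_hop=None)

# R2's partial common table from Figure 7 (label = 400000 + SID index).
R2_TABLE: Dict[int, Route] = {
    401003: Route(Action("R3"), POP_AND_LOOKUP),           # Node-SID of R3
    401007: Route(Action("R6", swap=401007),               # Node-SID of R7
                  Action("R1", swap=401007, push=(401005,))),
    402203: Route(Action("R3"), POP_AND_LOOKUP),           # Adj-SID R2->R3
}


def forward(stack: List[int], table: Dict[int, Route],
            failed: Set[str]) -> Tuple[str, List[int]]:
    """Return (next hop, outgoing label stack) for a packet at the PLR.

    The stack is a list with the top label first.  A route's backup is
    used when its primary next hop is in `failed`.
    """
    while stack:
        route = table[stack[0]]
        use_backup = route.primary.next_hop in failed
        action = route.backup if use_backup else route.primary
        rest = stack[1:]
        if action.next_hop is None:
            stack = rest          # second lookup in the same common table
            continue
        out = ([action.swap] if action.swap is not None else []) + rest
        for label in action.push:
            out.insert(0, label)
        return action.next_hop, out
    raise LookupError("bottom of stack: IP table lookup is not modeled")


print(forward([401003, 401007], R2_TABLE, failed=set()))   # ('R3', [401007])
print(forward([401003, 401007], R2_TABLE, failed={"R3"}))  # ('R6', [401007])
print(forward([402203, 401007], R2_TABLE, failed={"R3"}))  # ('R6', [401007])
```

   Because the table is shared by all neighbors, the second lookup
   after the pop uses the same table as the first, so no per-neighbor
   context table is needed.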

7.  Operational Considerations

   The procedures described in this document should be configurable and
   applied only when explicitly enabled.  To satisfy the scenarios
   described in Section 4, the feature should be controllable on a
   per-neighbor basis.  The optimization procedures described in
   Section 6 should be applied only when the entire network has a
   common SRGB and all nodes advertise global Adj-SIDs.  This
   optimization should be applied based on explicit configuration.

8.  Security Considerations

   In most cases, the procedures described in this document will be
   deployed inside an IGP domain under a single ownership.  These
   procedures expose no new security risks.  The security
   considerations for SR-MPLS with label stacking described in detail
   in [RFC8402] apply to this document as well.  This document
   introduces the context table lookup for labels in the label stack.
   As described in [RFC8402], MPLS packet filtering at the domain
   boundaries ensures that operations on MPLS labels inside the domain,
   including the context table lookup, are safe.  The security
   procedures applicable to IGPs also apply to the Segment Routing
   extensions described in [RFC8667] and [RFC8665], and they provide
   the required protection for the segment protection procedures
   described in this document.

9.  IANA Considerations

   This document has no IANA actions.

10.  Acknowledgments

   The authors would like to thank Peter Psenak, Bruno Decraene,
   Alexander Vainshtein, Huzibo, Dhruv Dhody, and Ketan Talaulikar for
   their review and suggestions.

11.  References

11.1.  Normative References

   [RFC5286]  Atlas, A., Ed. and A. Zinin, Ed., "Basic Specification for
              IP Fast Reroute: Loop-Free Alternates", RFC 5286,
              DOI 10.17487/RFC5286, September 2008,
              <https://www.rfc-editor.org/info/rfc5286>.

   [RFC5331]  Aggarwal, R., Rekhter, Y., and E. Rosen, "MPLS Upstream
              Label Assignment and Context-Specific Label Space",
              RFC 5331, DOI 10.17487/RFC5331, August 2008,
              <https://www.rfc-editor.org/info/rfc5331>.

   [RFC8402]  Filsfils, C., Ed., Previdi, S., Ed., Ginsberg, L.,
              Decraene, B., Litkowski, S., and R. Shakir, "Segment
              Routing Architecture", RFC 8402, DOI 10.17487/RFC8402,
              July 2018, <https://www.rfc-editor.org/info/rfc8402>.

11.2.  Informative References

   [I-D.bashandy-rtgwg-segment-routing-uloop]
              Bashandy, A., Filsfils, C., Litkowski, S., Decraene, B.,
              Francois, P., and P. Psenak, "Loop avoidance using Segment
              Routing", draft-bashandy-rtgwg-segment-routing-uloop-09
              (work in progress), June 2020.

   [I-D.ietf-idr-bgpls-segment-routing-epe]
              Previdi, S., Talaulikar, K., Filsfils, C., Patel, K., Ray,
              S., and J. Dong, "BGP-LS extensions for Segment Routing
              BGP Egress Peer Engineering", draft-ietf-idr-bgpls-
              segment-routing-epe-19 (work in progress), May 2019.

   [I-D.ietf-rtgwg-segment-routing-ti-lfa]
              Litkowski, S., Bashandy, A., Filsfils, C., Decraene, B.,
              Francois, P., Voyer, D., Clad, F., and P. Camarillo,
              "Topology Independent Fast Reroute using Segment Routing",
              draft-ietf-rtgwg-segment-routing-ti-lfa-04 (work in
              progress), August 2020.

   [I-D.li-rtgwg-enhanced-ti-lfa]
              Li, C. and Z. Hu, "Enhanced Topology Independent Loop-free
              Alternate Fast Re-route", draft-li-rtgwg-enhanced-ti-
              lfa-02 (work in progress), August 2020.

   [RFC8102]  Sarkar, P., Ed., Hegde, S., Bowers, C., Gredler, H., and
              S. Litkowski, "Remote-LFA Node Protection and
              Manageability", RFC 8102, DOI 10.17487/RFC8102, March
              2017, <https://www.rfc-editor.org/info/rfc8102>.

   [RFC8665]  Psenak, P., Ed., Previdi, S., Ed., Filsfils, C., Gredler,
              H., Shakir, R., Henderickx, W., and J. Tantsura, "OSPF
              Extensions for Segment Routing", RFC 8665,
              DOI 10.17487/RFC8665, December 2019,
              <https://www.rfc-editor.org/info/rfc8665>.

   [RFC8667]  Previdi, S., Ed., Ginsberg, L., Ed., Filsfils, C.,
              Bashandy, A., Gredler, H., and B. Decraene, "IS-IS
              Extensions for Segment Routing", RFC 8667,
              DOI 10.17487/RFC8667, December 2019,
              <https://www.rfc-editor.org/info/rfc8667>.

   [RFC8679]  Shen, Y., Jeganathan, M., Decraene, B., Gredler, H.,
              Michel, C., and H. Chen, "MPLS Egress Protection
              Framework", RFC 8679, DOI 10.17487/RFC8679, December 2019,
              <https://www.rfc-editor.org/info/rfc8679>.

Authors' Addresses
   Shraddha Hegde
   Juniper Networks Inc.
   Exora Business Park
   Bangalore, KA  560103
   India

   Email: shraddha@juniper.net

   Chris Bowers
   Juniper Networks Inc.

   Email: cbowers@juniper.net

   Stephane Litkowski
   Cisco Systems

   Email: slitkows.ietf@gmail.com

   Xiaohu Xu
   Alibaba Inc.
   Beijing
   China

   Email: xiaohu.xxh@alibaba-inc.com

   Feng Xu
   Tencent
   China

   Email: oliverxu@tencent.com