Routing area                                                    S. Hegde
Internet-Draft                                                 C. Bowers
Intended status: Informational                     Juniper Networks Inc.
Expires: April 3, 2021                                      S. Litkowski
                                                           Cisco Systems
                                                                   X. Xu
                                                            Alibaba Inc.
                                                                   F. Xu
                                                                 Tencent
                                                      September 30, 2020


                   Segment Protection for SR-TE Paths
          draft-ietf-spring-segment-protection-sr-te-paths-00

Abstract

   Segment routing supports the creation of explicit paths using
   Adjacency Segment Identifiers (Adj-SIDs), Node-SIDs, and Binding
   SIDs (BSIDs).  It is important to provide fast reroute (FRR)
   mechanisms to respond to failures of links and nodes in the
   Segment-Routed Traffic-Engineered (SR-TE) path.  A point of local
   repair (PLR) can provide FRR protection against the failure of a
   link in an SR-TE path by examining only the first (top) label in
   the SR label stack.  To protect against the failure of a node, a
   PLR may also need to examine the second label in the stack in order
   to determine the SR-TE path beyond the failed node.
   This document specifies how a PLR can use the first and second
   labels in the SR-MPLS label stack describing an SR-TE path to
   provide protection against node failures.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 3, 2021.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Node Failures Along SR-TE Paths
     2.1.  Segment protection for explicit paths with Node-SIDs
     2.2.  Segment Protection for Anycast-SIDs
     2.3.  Segment protection for explicit paths with Adj-SIDs
   3.  Detailed Solution using Context Tables
     3.1.  Building Context Tables
     3.2.  Segment protection for Node-SIDs
     3.3.  Segment protection for Adj-SIDs
     3.4.  Segment protection for edge nodes
       3.4.1.  Detailed Example for Segment protection for edge nodes
   4.  Determining node can be bypassed
   5.  Hold timers for Node-SID/Prefix-SIDs and Adj-SIDs
     5.1.  Interaction with micro-loop avoidance
   6.  Optimization Considerations
     6.1.  Segment Protection Example with Common SRGB
   7.  Operational Considerations
   8.  Security Considerations
   9.  IANA Considerations
   10. Acknowledgments
   11. References
     11.1.  Normative References
     11.2.  Informative References
   Authors' Addresses

1.  Introduction

   It is possible for a routing device to go completely out of service
   abruptly due to a power failure, hardware failure, or software
   crash.  Node protection is an important property of the Fast
   Reroute mechanism.  It provides protection against a node failure
   by rerouting traffic around the failed node.  For example, the
   mechanisms described in Loop-Free Alternates ([RFC5286]), Remote
   Loop-Free Alternates ([RFC8102]), and
   [I-D.ietf-rtgwg-segment-routing-ti-lfa] can be used to provide node
   protection to ensure minimal traffic loss after a node failure.

   Section 2 describes problems with SR-TE paths and the need for a
   specialized mechanism to provide node protection for SR-TE paths.
   Section 3 describes the solution applied to paths built using
   Adj-SIDs and Node-SIDs.  To distinguish protection against the
   failure of segment endpoints (midpoints) of an SR-TE path from the
   usual node protection provided by the various LFA mechanisms, this
   document uses the term Segment Protection.

2.  Node Failures Along SR-TE Paths

   Figure 1 illustrates an example network topology with Segment
   Routing enabled on each node.

    Node          Node          Node          Node          Node
    SID:1         SID:2         SID:3         SID:4         SID:5
   +----+   10   +----+   10   +----+   10   +----+   10   +----+
   | R1 |--------| R2 |--------| R3 |--------| R4 |--------| R5 |
   +----+        +----+        +----+        +----+        +----+
       \                            \         /
        \ 10                         \ 100   / 60
         \                            \     /
          \   +----+                   +----+
           +--| R7 |-------------------| R8 |
              +----+        30         +----+
             /  Node                    Node       Label stack:
            /   SID:7                   SID:8      +------------+
      +----+                            SRGB:      | 1008 (top) |
      | R6 |                          3000-4000    +------------+
      +----+                                       |    3005    |
       Node                                        +------------+
       SID:6

   * Numbers on the links represent the symmetric link cost

   Figure 1: Example topology.  The segment index for each node is shown
   in the diagram.  All nodes have SRGB = [1000-2000], except for R8
   which has SRGB = [3000-4000].  A label stack that represents the path
   R1->R7->R8->R4->R5 is shown as well.

2.1.  Segment protection for explicit paths with Node-SIDs

   Consider an explicit path in the topology in Figure 1 from R1 to R5
   via R1->R7->R8->R4->R5.  This path can be built using the shortest
   paths from R1 to R8 and from R8 to R5.  The label stack that
   instantiates this path contains two Node-SIDs, 1008 and 3005.  Label
   1008 takes the packet from R1 to R8 via R7 and is then popped.  The
   next label in the stack, 3005, takes the packet from R8 to the
   destination R5 via R4.
   If node R8 goes down, R7 cannot perform FRR without examining the
   second label in the incoming label stack (3005).

   Note that in the absence of a failure, R7 does not need to understand
   the meaning of the second label (3005) in order to perform normal
   forwarding.  However, in order to support segment protection, R7
   needs to understand the meaning of label 3005 to determine where the
   packet is headed after R8.

   The mechanism used to detect whether a node or a link has failed is
   outside the scope of this document.  The options for node failure
   detection and the resulting forwarding state described in
   Section 5.2 of [RFC8679] are applicable to this document as well.

2.2.  Segment Protection for Anycast-SIDs

   A prefix segment advertised as a Node-SID may only be advertised by
   one node in the network.  In contrast, an anycast prefix segment may
   be advertised by more than one node.  In some situations, one can use
   Anycast-SIDs to construct SR-TE paths that are protected against node
   failure, without the need for the mechanism described in this
   document.

   +----+   10   +----+   10   +----+   10   +----+   10   +----+
   | R1 |--------| R2 |--------| R3 |--------| R4 |--------| R5 |
   +----+        +----+        +----+        +----+        +----+
       \                            \         /              |
        \ 10                         \100  60/               |
         \                            \     /                |
          \   +----+        30         +----+                |
           +--| R7 |-------------------| R8 |                |
              +----+                   +----+                |
             /      \                  Anycast               +
            /        \                 SID:100              /
      +----+          \                                    /
      | R6 |           \ 40                        +----+ /60
      +----+            +--------------------------| R9 |+
                                                   +----+
                                                   Anycast
                                                   SID:100

   Label stack:
   +------------+
   | 1100 (top) |
   +------------+
   |    1005    |
   +------------+

   * Numbers on the links represent the symmetric link cost

   Figure 2: Topology illustrating use of Anycast-SIDs to protect
   against node failures.  All nodes have SRGB = [1000-2000].

   An example of this is shown in Figure 2.
   In this example, R8 and R9 advertise an Anycast-SID of 100.  The
   label stack in this example is [1100, 1005].  The top label (1100)
   corresponds to the Anycast-SID advertised by both R8 and R9.  In the
   absence of a failure, the packet sent by R1 with this label stack
   follows the path from R1 to R5 along R1->R7->R8->R4->R5.

   If R7 is performing a per-prefix LFA calculation [RFC5286], then R7
   will install a backup next-hop to R9 for this Anycast-SID, protecting
   against the failure of the primary next-hop to R8.  This backup path
   does not pass through R8, so it is not affected by a complete failure
   of node R8.  As this example illustrates, for some topologies
   segment-protecting SR-TE paths can be constructed through the use of
   Anycast-SIDs, as opposed to the mechanism described in this document.

2.3.  Segment protection for explicit paths with Adj-SIDs

                                Adj-SID:
                                R3-R8:9044

    Node          Node          Node          Node          Node
    SID:1         SID:2         SID:3         SID:4         SID:5
   +----+   10   +----+   10   +----+   10   +----+   10   +----+
   | R1 |--------| R2 |--------| R3 |--------| R4 |--------| R5 |
   +----+        +----+        +----+        +----+        +----+
       \                            \         /              |
        \ 10                         \ 100   / 60            | 10
         \                            \     /                |
          \   +----+                   +----+              +----+
           +--| R7 |-------------------| R8 |--------------| R9 |
              +----+        30         +----+      10      +----+
             /  Node                    Node                Node
            /   SID:7                   SID:8               SID:9
      +----+                            SRGB:
      | R6 |                          3000-4000     Label stack:
      +----+                                        +------------+
       Node                       Adj-SIDs:         | 1003 (top) |
       SID:6                      R8-R4:9054        +------------+
                                                    |    9044    |
                                                    +------------+
                                                    |    9054    |
                                                    +------------+
                                                    |    1005    |
                                                    +------------+

   * Numbers on the links represent the symmetric link cost

   Figure 3: Explicit path using an Adj-SID.  All nodes have SRGB =
   [1000-2000], except for R8 which has SRGB = [3000-4000].

   Consider an explicit path from R1 to R5 via R1->R2->R3->R8->R4->R5.
   This path can be built using a combination of Node-SIDs and Adj-SIDs,
   as shown in Figure 3.
   The diagram shows the label stack needed to instantiate this path,
   as well as several Adj-SIDs advertised by nodes involved in this
   path.  When a packet leaving R1 with this label stack reaches R3,
   the top label is 9044, which will take the packet to R8.  The
   next-next-hop in the path is R4.  To provide protection against the
   failure of node R8, R3 would need to send the packet to R4 without
   going through R8.  However, the only way R3 can learn that the
   packet needs to go to R4 is to examine the next label in the stack,
   label 9054.  Since R3 knows that R8 has advertised label 9054 as the
   adjacency segment for the link from R8 to R4, R3 knows that a backup
   path can merge back into the original explicit path at R4.

3.  Detailed Solution using Context Tables

   This section provides a detailed description of how to construct
   node-protecting backup paths for SR-TE paths using context tables.
   The end result of this description is externally visible forwarding
   behavior that can be specified as a packet arriving at a PLR with a
   particular incoming label stack and leaving the PLR on a particular
   outgoing interface with a particular outgoing label stack.  There may
   be other methods of arriving at the same externally visible
   forwarding behavior, as described in Section 6.2 of
   [I-D.ietf-rtgwg-segment-routing-ti-lfa].  It is not the intent of
   this document to exclude other methods, as long as the externally
   visible forwarding behavior is the same as that produced by this
   method.

3.1.  Building Context Tables

   [RFC5331] introduced the concept of Context-Specific Label Spaces,
   and various applications make use of this concept.  A context label
   table on a router represents the Label Forwarding Information Base
   (LFIB) from the point of view of a particular neighbor.
   Context tables are built by constructing the incoming label mappings
   advertised by the neighbor and the actions corresponding to those
   labels.  The labels advertised by each node are local to the node
   and may not be unique across the segment routing domain.  The
   context tables are separate tables built on a per-neighbor basis on
   every node, so that each one represents the LFIB of a particular
   neighbor.

   When a PLR needs to protect an SR-TE path against the failure of a
   neighbor N, it creates a context table associated with N.  This
   context table is populated with the following segment routing
   forwarding entries:

   -  All the Prefix-SIDs of the network.  The programmed incoming
      label map uses the SRGB of N to compute the input label value.
      The NHLFE (Next Hop Label Forwarding Entry) is then constructed by
      looking into all the next hops for the Prefix-SID and choosing a
      loop-free path as explained in Section 3.2.

   -  All the Adj-SIDs advertised by N.  The NHLFE is constructed as
      explained in Section 3.3.

   The following sections illustrate how the context table is
   constructed to allow the PLR to provide node-protecting paths for the
   next-next hops in the topologies shown in Figure 1 and Figure 3.

3.2.  Segment protection for Node-SIDs

   Figure 4 shows the routing table entries on R7 corresponding to the
   Node-SIDs to reach R1 and R8 for the topology in Figure 1.  In the
   absence of a failure, a packet with a label stack whose top label is
   1008 will have its top label popped by R7 (assuming PHP behavior),
   and R7 will forward the packet to R8.  When the interface to R8 is
   down, the backup next-hop entry is used.  R7 will pop the top label
   of 1008 and use the context table that R7 computed for R8 to
   evaluate the next label on the stack.
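
   The incoming-label computation in the first bullet of Section 3.1
   can be sketched as below.  This is only an illustration, not part
   of the specification: the names Srgb, label_for, and
   context_in_labels are hypothetical, and the SRGB size of 1001
   assumes that the ranges [3000-4000] and [1000-2000] of Figure 1 are
   inclusive.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Srgb:
    """An SRGB described by its first label and its size (hypothetical type)."""
    base: int
    size: int

    def label_for(self, sid_index: int) -> int:
        # Label = SRGB base + SID index, valid only inside the block.
        if not 0 <= sid_index < self.size:
            raise ValueError(f"SID index {sid_index} is outside the SRGB")
        return self.base + sid_index


def context_in_labels(neighbor_srgb: Srgb,
                      sid_indexes: Iterable[int]) -> Dict[int, int]:
    """Map each label the neighbor expects to the Prefix-SID index it means."""
    return {neighbor_srgb.label_for(i): i for i in sid_indexes}


# Values taken from Figure 1; the inclusive size is an assumption.
R8_SRGB = Srgb(base=3000, size=1001)
R1_SRGB = Srgb(base=1000, size=1001)

in_labels = context_in_labels(R8_SRGB, [4, 5, 8])
print(in_labels)                             # {3004: 4, 3005: 5, 3008: 8}
# Label for the same SID (R5) as seen by the backup next hop R1.
print(R1_SRGB.label_for(in_labels[3005]))    # 1005
```

   The keys of the resulting map are the "In label" values of the
   context table; the outgoing action for each key still has to come
   from a loop-free path computation that avoids the protected
   neighbor.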

                    R7's Routing Table (partial)
              Transit routes for Node-SIDs for R1 and R8
   +=============+=============================================+
   | In label    | Outgoing label action                       |
   +=============+=============================================+
   | 1001        | Primary: pop, fwd to R1                     |
   |             | Backup: pop, lookup context.r1              |
   +-------------+---------------------------------------------+
   | 1008        | Primary: pop, fwd to R8                     |
   |             | Backup: pop, lookup context.r8              |
   +-------------+---------------------------------------------+

           R7's Context Table for R8 (context.r8, partial)
   +=============+=============================================+
   | In label    | Outgoing label action                       |
   +=============+=============================================+
   | 3004        | swap 1004, fwd to R1                        |
   +-------------+---------------------------------------------+
   | 3005        | swap 1005, fwd to R1                        |
   +-------------+---------------------------------------------+
   | 3008        | drop                                        |
   +-------------+---------------------------------------------+

   Figure 4: Building node-protecting backup paths for SR-TE paths
   involving Node-SIDs

   R7 builds the context table for R8 using the following process.  R7
   computes the mapping of incoming label to Node-SID that R8 expects to
   see, based on the SRGB advertised by R8.  In the example in Figure 1,
   R7 can determine that R8 interprets an incoming label of 3005 as
   mapping to the Node-SID for R5.

   R7 then computes a loop-free backup path to reach R5 that is node-
   protecting with respect to the failure of R8.  In this example, the
   backup path computed by R7 to reach R5 without passing through R8
   is realized by forwarding the packet to R1 with a top label of 1005,
   corresponding to the Node-SID for R5 in the context of R1's SRGB.

   The loop-free path computation may be based on a mechanism such as
   LFA, R-LFA, TI-LFA, or a constraint-based SPF that avoids the failed
   node.
   To populate the context table for R8, R7 maps the outgoing label
   actions corresponding to the backup path to R5 to the incoming label
   3005.  This results in the entry for label 3005 shown in context.r8
   in Figure 4.

   Therefore, when a packet arrives at R7 with label stack = [1008,
   3005], and the link from R7 to R8 has recently failed, R7 will use
   the backup next-hop entry for label 1008 in its main routing table.
   Based on this entry, R7 will pop label 1008 and use context.r8 to
   look up the new top label, 3005.  R7 will swap label 3005 for 1005
   and forward the packet to R1.  This will get the packet to R5 on a
   node-protecting backup path.

   Note that R7 activates the node-protecting backup path when it
   detects that the link to R8 has failed.  R7 does not know that node
   R8 has actually failed.  However, the node-protecting backup path is
   computed assuming that the failure of the link to R8 implies that R8
   has failed.

3.3.  Segment protection for Adj-SIDs

   This section gives an example of how to construct node-protecting
   backup paths when the SR-TE path uses Adj-SIDs.  Figure 5 shows some
   of the routing table entries for R3 corresponding to the sample
   network shown in Figure 3.  When the top label of the label stack is
   an Adj-SID, the PLR needs to recognize that in order to provide a
   node-protecting backup path, it needs to pop the top label and
   examine the next label in the context of the next-hop router
   identified by the top label Adj-SID.  In this example, when R3 is
   constructing its routing table, it recognizes that label 9044
   corresponds to a next-hop of R8, so it installs a backup entry,
   corresponding to the failure of the link to R8, which pops label
   9044 and then examines the new top label in the context of R8.
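
   The two-step lookup walked through for R7 in Section 3.2 (pop the
   top label, then resolve the next label in the context table of the
   failed neighbor) can be sketched as follows.  This is a minimal
   model under stated assumptions: the table contents are copied from
   Figure 4, while the function name forward, the tuple encoding of
   actions, and the representation of a failure as a set of neighbor
   names are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

# Main table of R7 (Figure 4): in label -> neighbor.
# Primary action: pop, fwd to neighbor.
# Backup action:  pop, lookup context.<neighbor>.
MAIN_TABLE: Dict[int, str] = {1001: "R1", 1008: "R8"}

# Context entries: (action, out label or None, next hop or None).
ContextEntry = Tuple[str, Optional[int], Optional[str]]
CONTEXT_TABLES: Dict[str, Dict[int, ContextEntry]] = {
    "R8": {
        3004: ("swap", 1004, "R1"),
        3005: ("swap", 1005, "R1"),
        3008: ("drop", None, None),
    },
}


def forward(stack: List[int],
            failed: Set[str]) -> Optional[Tuple[str, List[int]]]:
    """Return (next hop, outgoing stack), or None if the packet is dropped."""
    if not stack or stack[0] not in MAIN_TABLE:
        return None
    neighbor, rest = MAIN_TABLE[stack[0]], stack[1:]
    if neighbor not in failed:
        return neighbor, rest                 # primary: pop, fwd
    if not rest:
        return None                           # no second label to examine
    entry = CONTEXT_TABLES.get(neighbor, {}).get(rest[0])
    if entry is None or entry[0] == "drop":
        return None
    action, out_label, next_hop = entry
    if action == "swap":
        return next_hop, [out_label] + rest[1:]
    return next_hop, rest[1:]                 # "pop" (used for Adj-SIDs)


print(forward([1008, 3005], failed=set()))    # ('R8', [3005])
print(forward([1008, 3005], failed={"R8"}))   # ('R1', [1005])
print(forward([1008, 3008], failed={"R8"}))   # None
```

   The same function covers the Adj-SID case of this section if the
   main table maps the Adj-SID label to the neighbor at the far end of
   the adjacency and the context table holds "pop" entries for the
   Adj-SIDs advertised by that neighbor.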

                    R3's Routing Table (partial)
                      Transit route for Adj-SID
   +=============+=============================================+
   | In label    | Outgoing label action                       |
   +=============+=============================================+
   | 9044        | Primary: pop, fwd to R8                     |
   |             | Backup: pop, lookup context.r8              |
   +-------------+---------------------------------------------+

           R3's Context Table for R8 (context.r8, partial)
   +=============+=============================================+
   | In label    | Outgoing label action                       |
   +=============+=============================================+
   | 3005        | swap 1005, fwd to R4                        |
   +-------------+---------------------------------------------+
   | 9054        | pop, fwd to R4                              |
   +-------------+---------------------------------------------+

   Figure 5: Building node-protecting backup paths for SR-TE paths
   involving Adj-SIDs

   R3 constructs its context table for R8 by determining which labels R8
   expects to receive to accomplish different forwarding actions.  The
   entry for incoming label 3005 in context.r8 in Figure 5 corresponds
   to a Node-SID.  This entry is computed using the methods described in
   Section 3.2.

   The entry for incoming label 9054 in context.r8 corresponds to an
   Adj-SID.  R3 recognizes that R8 has advertised this Adj-SID for the
   link from R8 to R4 in Figure 3.  So R3 determines the outgoing label
   action needed to reach R4 without passing through R8.  This can be
   accomplished by popping the label 9054 and forwarding the packet
   directly on the link from R3 to R4.

3.4.  Segment protection for edge nodes

   The segment protection mechanism described in the previous sections
   depends on the assumption that the label immediately below the top
   label in the label stack is understood in the IGP domain.  When the
   provider edge routers exchange service labels via BGP or some other
   non-IGP mechanism, the bottom label is not understood in the IGP
   domain.

   The EPE-SIDs described in [I-D.ietf-idr-bgpls-segment-routing-epe]
   are used to choose an egress interface among a set of egress paths.
   An EPE-SID can be the bottom-most label in an SR-TE path.  EPE-SIDs
   are not understood in the IGP domain.  In order to support the
   procedures described in this document, EPE-SIDs should always be
   added after the Anycast-SID for the nodes that advertised the
   EPE-SIDs.  The same EPE-SID should be configured on all these anycast
   nodes so that, in case of node failure, the traffic is correctly
   forwarded by the other protector nodes.  If a Node-SID is used
   instead of an Anycast-SID above the EPE-SID in the label stack while
   the procedures in this document are in use, packets may be dropped.

   The egress node protection mechanisms described in [RFC8679] are
   applicable to this use case, and no additional changes are required
   for SR-based networks.

3.4.1.  Detailed Example for Segment protection for edge nodes

   sid:1     sid:2     sid:3        sid:4     sid:5
   1000-2000 1000-2000 1000-2000    1000-2000 1000-2000
   R2:1024   R3:1034   R8:1044      R5:1064
                       R4:2014      =========================
   +----+ 10 +----+ 10 +----+   10  +----+ 10 +----+  Primary
   | PE1|----| R2 |----| R3 |-------| R4 |----| PE2|  context 1.1.1.1:
   +----+    +----+    +----+       +----+    +----+\ sid 10
       \                   \          /              \+-----+
        \ 10                \ 100    / 60            /| CE1 |
         \                   \      /               / +-----+
          \   +----+          +----+ R4:1054  +-----+
           +--| R7 |----------| R8 |----------| PE3 | context 1.1.1.1
              +----+    30    +----+          +-----+ sid 10, Protector
             / sid:7          sid:8           sid:9   Mirror-SID 100
            /  1000-2000      3000-4000       1000-2000
           / 10
     +----+
     | R6 |
     +----+
     sid:6
     1000-2000

          R4's Context Table for PE2 (context.PE2, partial)
   +=============+=============================================+
   | In label    | Outgoing label action                       |
   +=============+=============================================+
   | 1010        | swap 1100 (Mirror-SID), push 1010, fwd R8   |
   +-------------+---------------------------------------------+

   * Numbers on the links represent the symmetric link cost

   Figure 6: Node protection for edge nodes Adj-SIDs

   The segment protection mechanisms described in the previous sections
   depend on the assumption that the label below the top label in the
   label stack is understood in the IGP domain.  If the edge node goes
   down, the label below the top label representing the edge node could
   be a BGP service label or a label representing another application.
   The service mirroring use case is described in Section 5.1 of
   [RFC8402].  The customer edges are multi-homed to provider edges;
   one of the PEs acts in the primary role and the other in the
   protector role.  The two PEs advertise a context IP address for each
   customer site and attach an Anycast-SID to the context.
   The protector PE advertises a Binding SID with the M bit set
   (Mirror-SID), which implies mirroring capability for the context.
   The protector PE builds the context table for the BGP service labels
   advertised by the primary PE for the same context.  The BGP service
   resolves on a transport that has a stack of labels with the
   context-SID at the bottom of the label stack.  Any penultimate node
   of PE2 builds a context table for PE2 as explained in Section 3.1.
   This context table contains the SID for the context-id, and the
   output action is to pop the top label and replace it with the
   Mirror-SID that the protector PE advertised for the context 1.1.1.1.
   As shown in Figure 6, the SID 10 attached to context-id 1.1.1.1 has
   been programmed in context.PE2 on the penultimate router R4.  The
   action is to swap 1010 with Mirror-SID 1100 and push 1010, which is
   the context SID.  When the packet reaches the protector PE (PE3 in
   Figure 6), it has a top label of 1100, which is a Mirror-SID
   (context label) on the protector PE and directs it to look up the
   context table of the primary PE for the BGP service labels.

4.  Determining node can be bypassed

   In certain scenarios, a node in the label stack may represent an
   important function, such as a firewall filter, which must be
   performed.  Bypassing such a function may cause major security
   issues.  When the segment protection mechanisms described in this
   document are applied, it is possible that if the firewall goes down,
   traffic is rerouted via the next label in the stack.  There are
   multiple ways this problem could be solved.

   The procedures described in this document should be optional and
   should be enabled only when devices are configured to apply the
   procedures and examine the next label in the stack.  The feature
   should be controllable at per-neighbor granularity.
   When certain devices offer a critical function, the neighbors of
   those devices may disable segment protection for the particular
   neighbor that provides the critical function.

   IGP protocol extensions are proposed in
   [I-D.li-rtgwg-enhanced-ti-lfa] that define a "no bypass" flag for
   SIDs.  Nodes that provide critical functions may advertise SIDs with
   the "NB" bit set.  The segment protection procedures described in
   this document should not be applied to these SIDs; in case of
   failure, either link-protecting backup paths can be programmed or
   the packet can be dropped with no protection.

5.  Hold timers for Node-SID/Prefix-SIDs and Adj-SIDs

   SR-TE paths may be computed by a controller or by the head-end
   router.  When there is a node failure in the network, the controller
   or head-end router has to learn about the failure, recompute the
   label stacks of any affected SR-TE paths, and get the new label
   stacks programmed into the forwarding plane of the head-end router.
   This process may be slow compared to the speed with which routers in
   the network react to the event.  After learning about a node failure,
   the non-PLR routers in the network will no longer be able to compute
   a path to reach the failed node.  If no special precautions are
   taken, these non-PLR routers will remove the forwarding entries
   corresponding to the Node-SIDs and Prefix-SIDs advertised by the
   failed node.  If the head-end router is still sending traffic with
   that Node-SID/Prefix-SID in the stack, traffic can be blackholed at a
   non-PLR router.  In this case, the node-protection FRR mechanisms do
   not bring their full benefit.

   In order to solve the above problem, hold timers are recommended.
   The hold timer corresponds to the maximum time that a combination of
   controller and head-end router, or a head-end router alone, takes to
   compute and install the label stacks corresponding to new SR-TE paths
   in the event of a node failure.  The hold timers should be applied to
   forwarding entries for Node-SIDs and Prefix-SIDs that are advertised
   by a single node in the network.  If the Node-SID or Prefix-SID
   becomes unreachable, the event and the resulting forwarding changes
   should not be communicated to the forwarding planes on all configured
   routers (including PLRs for the failed node) until the hold timer
   expires.  The traffic will continue to follow the previous path and
   get FRR protection on the PLR.

   A route corresponding to a global Adj-SID advertised by a node that
   becomes unreachable should also be left in the forwarding table for
   the duration of the hold timer.

   The node-protecting backup forwarding entry on the PLR corresponding
   to the local Adj-SID from the PLR to the failed node should also be
   left in the forwarding table for the duration of the hold timer.

5.1.  Interaction with micro-loop avoidance

   During network convergence, the micro-loop avoidance mechanisms
   described in [I-D.bashandy-rtgwg-segment-routing-uloop] may be
   applied.  For the failed node, all the nodes in the network should
   consistently detect the failure and maintain the pre-failure shortest
   path in the forwarding plane, so that traffic can follow the pre-
   failure shortest path and take the node-protecting backup path at the
   PLR of the failed node.

6.  Optimization Considerations

   The solution described in this document requires that a PLR build a
   context table for each neighbor for which node protection is desired.
   The context table for each protected neighbor needs to contain route
   entries for all of the Prefix-SIDs in the network, as well as the
   route entries corresponding to the Adj-SIDs advertised by the
   protected neighbor.  Although the scale of an IGP domain is limited,
   this may result in considerable additional memory consumption on the
   routers.  It is possible to take advantage of an optimization that
   allows the PLR to avoid creating context tables when all of the nodes
   in the network advertise the same Segment Routing Global Block (SRGB)
   and all Adj-SIDs in the network are advertised as global Adj-SIDs.
   In this case, all labels in the stack representing an SR-TE path are
   globally unique.  Protection for node failure cases in such a
   deployment can be achieved by doing a lookup of the first label and
   potentially a second lookup of the second label, using a common route
   table with primary and backup entries for all Prefix-SIDs as well as
   for all of the global Adj-SIDs.

   The primary route entries for global Adj-SIDs not advertised by the
   PLR will be the shortest path to the node advertising the global
   Adj-SID.  The backup route entries for these global Adj-SIDs will
   generally correspond to the node-protecting backup path to the node
   advertising the global Adj-SID.  However, for a global Adj-SID
   advertised by a direct neighbor of the PLR, the node-protecting
   backup route entry will correspond to the backup path to the node on
   the far end of the Adj-SID.

   With the common route table constructed in this manner, when the PLR
   receives a packet whose first label is a global Adj-SID advertised by
   the failed neighbor of the PLR, the lookup of the first label will
   produce the correct backup path directly.

   When the PLR receives a packet whose first label is the Node-SID of
   the failed neighbor, or an Adj-SID advertised by the PLR
   corresponding to the failed neighbor, the route entry will instruct
   the PLR to look up the second label using the common route table.
   Finally, when the PLR receives a packet whose first label is a global
   Adj-SID or a Node-SID advertised by a node which is neither the PLR
   nor the failed neighbor, then the usual link-protecting backup path
   will be produced based on a lookup of the first label only.

6.1.  Segment Protection Example with Common SRGB

 Node             Node               Node               Node              Node
sid:1000        sid:1001           sid:1002           sid:1003         sid:1004
+----+2001 1 2100+----+2102  1  2201+----+2203  1  2302+----+2304  1 2403+----+
| R0 |-----------| R1 |-------------| R2 |-------------| R3 |------------| R4 |
+----+           +----+             +----+             +----+            +----+
    \ 2005                              \ 2206         / 2306          2407 |
     \                                   \            /                     |
      \ 1                                 \ 10       / 6                  1 |
       \                                   \        /                       |
        \                              2602 \      / 2603              2704 |
         \ 2500+----+2506                2605+----+2607              2706+----+
          +----| R5 |------------------------| R6 |----------------------| R7 |
               +----+           3            +----+          1           +----+
                Node                          Node                        Node
              sid:1005                      sid:1006                   sid:1007

   * Numbers on the links represent the symmetric link cost
   * All nodes have SRGB = [400000-405000] size 5000

   R2's Routing Table (partial)

   +=============+=============================================+
   | In label    | Outgoing label action                       |
   +=============+=============================================+
   | 401003      | Primary: pop, fwd to R3                     |
   |             | Backup: pop, lookup ILM table or IP table   |
   |             |         based on BOS bit                    |
   +-------------+---------------------------------------------+
   | 401007      | Primary: swap 401007, fwd to R6             |
   |             | Backup: swap 401007, push 401005 (top),     |
   |             |         fwd to R1                           |
   +-------------+---------------------------------------------+
   | 402203      | Primary: pop, fwd to R3                     |
   |             | Backup: pop, lookup ILM table or IP table   |
   |             |         based on BOS bit                    |
   +-------------+---------------------------------------------+

   Label Stack 1:
   +-------------+
   |401003 (top) |
   +-------------+
   |   401007    |
   +-------------+

   Label Stack 2:
   +-------------+
   |402203 (top) |
   +-------------+
   |   401007    |
   +-------------+

                       Figure 7: Common SRGB

   Figure 7 shows an example in which the optimized segment protection
   mechanism is deployed.  All the nodes have a common SRGB of 400000 to
   405000.  The Node-SIDs are in the range 1001, 1002, etc., and the
   global Adj-SIDs are in the range 2001, 2005, and so on.  A label is
   the SRGB base plus the SID index, so the Node-SID 1003 of R3 is
   represented by label 401003.  R2's partial ILM table, consisting of
   primary and backup next hops, is also shown in the figure.  Label
   401003 has a primary next hop pointing to R3 and a backup next hop
   which pops the label and looks up the ILM table with the next label
   in the packet.  For example, consider a path from R0 to R7 with a
   label stack consisting of 401003 and 401007.  When the node R3 fails,
   R2, which is the PLR, will pop the label 401003 and look up the next
   label in the same table.  The next label in this example is 401007.
   Based on the primary next hop for 401007, traffic is forwarded to R6.
   Another example label stack starts with the global Adj-SID label
   402203 (the Adj-SID from R2 to R3).  As shown in the partial ILM
   table on R2, 402203 also has a backup next hop which pops the label
   and looks up the next label in the packet.  On R3's failure, traffic
   will get forwarded via R6.

7.  Operational Considerations

   The procedures described in this document should be configurable and
   applied only when enabled explicitly.  In order to satisfy the
   scenarios described in Section 4, the feature should be controllable
   on a per-neighbor basis.  The optimization procedures described in
   Section 6 should be applied only when the entire network has a common
   SRGB and all nodes advertise global Adj-SIDs.
   This optimization should be applied based on explicit configuration.

8.  Security Considerations

   The procedures described in this document will in most common cases
   be deployed inside a single-ownership IGP domain.  No new security
   risks are exposed due to the procedures described in this document.
   The security considerations for SR-MPLS with label stacking, which
   are described in detail in [RFC8402], are applicable to this document
   as well.  This document introduces the context table lookup for the
   labels in the label stack.  As described in [RFC8402], MPLS packet
   filtering at the domain boundaries ensures that operations on the
   MPLS labels inside the domain, including the context table lookup
   operation, are safe.  The security procedures applicable to IGP
   protocols are also applicable to the segment routing extensions
   described in [RFC8667] and [RFC8665], and they ensure the required
   protection for the segment protection procedures described in this
   document.

9.  IANA Considerations

10.  Acknowledgments

   The authors would like to thank Peter Psenak, Bruno Decraene,
   Alexander Vainshtein, Huzibo, Dhruv Dhody, and Ketan Talaulikar for
   their review and suggestions.

11.  References

11.1.  Normative References

   [RFC5286]  Atlas, A., Ed. and A. Zinin, Ed., "Basic Specification for
              IP Fast Reroute: Loop-Free Alternates", RFC 5286,
              DOI 10.17487/RFC5286, September 2008.

   [RFC5331]  Aggarwal, R., Rekhter, Y., and E. Rosen, "MPLS Upstream
              Label Assignment and Context-Specific Label Space",
              RFC 5331, DOI 10.17487/RFC5331, August 2008.

   [RFC8402]  Filsfils, C., Ed., Previdi, S., Ed., Ginsberg, L.,
              Decraene, B., Litkowski, S., and R. Shakir, "Segment
              Routing Architecture", RFC 8402, DOI 10.17487/RFC8402,
              July 2018.

11.2.  Informative References

   [I-D.bashandy-rtgwg-segment-routing-uloop]
              Bashandy, A., Filsfils, C., Litkowski, S., Decraene, B.,
              Francois, P., and P.
              Psenak, "Loop avoidance using Segment Routing",
              draft-bashandy-rtgwg-segment-routing-uloop-09 (work in
              progress), June 2020.

   [I-D.ietf-idr-bgpls-segment-routing-epe]
              Previdi, S., Talaulikar, K., Filsfils, C., Patel, K., Ray,
              S., and J. Dong, "BGP-LS extensions for Segment Routing
              BGP Egress Peer Engineering",
              draft-ietf-idr-bgpls-segment-routing-epe-19 (work in
              progress), May 2019.

   [I-D.ietf-rtgwg-segment-routing-ti-lfa]
              Litkowski, S., Bashandy, A., Filsfils, C., Decraene, B.,
              Francois, P., Voyer, D., Clad, F., and P. Camarillo,
              "Topology Independent Fast Reroute using Segment Routing",
              draft-ietf-rtgwg-segment-routing-ti-lfa-04 (work in
              progress), August 2020.

   [I-D.li-rtgwg-enhanced-ti-lfa]
              Li, C. and Z. Hu, "Enhanced Topology Independent Loop-free
              Alternate Fast Re-route",
              draft-li-rtgwg-enhanced-ti-lfa-02 (work in progress),
              August 2020.

   [RFC8102]  Sarkar, P., Ed., Hegde, S., Bowers, C., Gredler, H., and
              S. Litkowski, "Remote-LFA Node Protection and
              Manageability", RFC 8102, DOI 10.17487/RFC8102, March
              2017.

   [RFC8665]  Psenak, P., Ed., Previdi, S., Ed., Filsfils, C., Gredler,
              H., Shakir, R., Henderickx, W., and J. Tantsura, "OSPF
              Extensions for Segment Routing", RFC 8665,
              DOI 10.17487/RFC8665, December 2019.

   [RFC8667]  Previdi, S., Ed., Ginsberg, L., Ed., Filsfils, C.,
              Bashandy, A., Gredler, H., and B. Decraene, "IS-IS
              Extensions for Segment Routing", RFC 8667,
              DOI 10.17487/RFC8667, December 2019.

   [RFC8679]  Shen, Y., Jeganathan, M., Decraene, B., Gredler, H.,
              Michel, C., and H. Chen, "MPLS Egress Protection
              Framework", RFC 8679, DOI 10.17487/RFC8679, December 2019.

Authors' Addresses

   Shraddha Hegde
   Juniper Networks Inc.
   Exora Business Park
   Bangalore, KA  560103
   India

   Email: shraddha@juniper.net


   Chris Bowers
   Juniper Networks Inc.

   Email: cbowers@juniper.net


   Stephane Litkowski
   Cisco Systems

   Email: slitkows.ietf@gmail.com


   Xiaohu Xu
   Alibaba Inc.
   Beijing
   China

   Email: xiaohu.xxh@alibaba-inc.com


   Feng Xu
   Tencent
   China

   Email: oliverxu@tencent.com