SPRING Working Group                                         C. Filsfils
Internet-Draft                                        K. Talaulikar, Ed.
Intended status: Informational                       Cisco Systems, Inc.
Expires: October 17, 2019                                        P. Krol
                                                            Google, Inc.
                                                            M. Horneffer
                                                        Deutsche Telekom
                                                               P. Mattes
                                                               Microsoft
                                                          April 15, 2019


         SR Policy Implementation and Deployment Considerations
          draft-filsfils-spring-sr-policy-considerations-03.txt

Abstract

   Segment Routing (SR) allows a headend node to steer a packet flow
   along any path.  Intermediate per-flow states are eliminated thanks
   to source routing.  The SR Policy framework enables the instantiation
   and management of the necessary state on the headend node for flows
   along source-routed paths, using the ordered list of segments
   associated with their specific SR Policies.  This document describes
   some of the implementation and deployment aspects that are useful for
   operationalizing the SR Policy architecture.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on October 17, 2019.

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  SR Policy Headend Architecture
   3.  Dynamic Path Computation
     3.1.  Optimization Objective
     3.2.  Constraints
     3.3.  SR Native Algorithm
     3.4.  Path to SID
   4.  Candidate Path Selection
   5.  Distributed and/or Centralized Control Plane
     5.1.  Distributed Control Plane within a single Link-State IGP area
     5.2.  Distributed Control Plane across several Link-State IGP areas
     5.3.  Centralized Control Plane
     5.4.  Distributed and Centralized Control Plane
   6.  Binding SID Aspects
     6.1.  Benefits of Binding SID
     6.2.  Centralized Discovery of available BSID
   7.  Flex-Algorithm Based SR Policies
   8.  Layer 2 and Optical Transport
   9.  Security Considerations
   10. IANA Considerations
   11. Acknowledgement
   12. Contributors
   13. References
     13.1.  Normative References
     13.2.  Informative References
   Authors' Addresses

1.  Introduction

   Segment Routing (SR) allows a headend node to steer a packet flow
   along any path.  Intermediate per-flow states are eliminated thanks
   to source routing [RFC8402].

   The headend node steers a flow into a Segment Routing Policy (SR
   Policy) by augmenting packet headers with the ordered list of
   segments associated with that SR Policy.
   [I-D.ietf-spring-segment-routing-policy] defines the SR Policy
   architecture and details the concepts of SR Policy and steering into
   an SR Policy.

   This document describes some implementation aspects of the SR Policy
   framework, which should be considered as suggestions: the same
   behavior, as defined in [I-D.ietf-spring-segment-routing-policy],
   may also be realized with other approaches.  The deployment aspects
   described in this document are likewise meant to serve only as
   guidelines.  These aspects and other considerations related to SR
   Policy concepts are described here because they are important for
   facilitating interoperable multi-vendor deployments for various SR
   Policy use-cases.

   These aspects apply equally to the MPLS
   [I-D.ietf-spring-segment-routing-mpls] and SRv6
   [I-D.filsfils-spring-srv6-network-programming] instantiations of
   segment routing.

   For simplicity, the illustrations are provided for the MPLS
   instantiation.

2.  SR Policy Headend Architecture

   This section provides a conceptual overview of the components (or
   functions) that interact to implement SR Policy on a headend.

              +--------+  +--------+
              |  BGP   |  |  PCEP  |
              +--------+  +--------+
                     \      /
      +--------+  +----------+  +--------+
      |        |  |    SR    |  |        |
      |  CLI   |--|  Policy  |--| NETCONF|
      |        |  |          |  |        |
      +--------+  +----------+  +--------+
                       |
                  +----------+
                  |   FIB    |
                  +----------+

             Figure 1: SR Policy Architecture at a Headend

   The SR Policy functionality at a headend can be implemented in an SR
   Policy (SRP) process, as illustrated in Figure 1.

   The SRP process interacts with other processes to learn candidate
   paths.

   The SRP process selects the active path of an SR Policy.

   The SRP process interacts with the RIB/FIB process to install an
   active SR Policy in the dataplane.

   In order to validate explicit candidate paths and compute dynamic
   candidate paths, the SRP process maintains an SR Database (SR-DB) as
   specified in [I-D.ietf-spring-segment-routing-policy].  The SRP
   process interacts with other processes, as shown in Figure 2, to
   collect the SR-DB information.

      +--------+    +--------+    +--------+
      | BGP SR |    | BGP-LS |    |  IGP   |
      | Policy |    +--------+    +--------+
      +--------+ \       |       /
      +--------+   +-----------+  +--------+
      |  PCEP  |---|    SRP    |--| NETCONF|
      +--------+   +-----------+  +--------+

          Figure 2: Topology/link-state database architecture

   The SR Policy architecture supports both centralized and distributed
   control planes.

3.  Dynamic Path Computation

   A dynamic candidate path of an SR Policy is specified as an
   optimization objective and a set of constraints, and needs to be
   computed either by the headend or by a Path Computation Element
   (PCE).  Distributed and centralized computation are described
   further in Section 5.
   This section describes the computation aspects of a dynamic path.

3.1.  Optimization Objective

   This document describes two optimization objectives:

   o  Min-Metric - requests computation of a solution Segment-List
      optimized for a selected metric.

   o  Min-Metric with margin and maximum number of SIDs - Min-Metric
      with two changes: a margin by which two paths with similar
      metrics are considered equal, and a constraint on the maximum
      number of SIDs in the Segment-List.

   The "Min-Metric" optimization objective requests the computation of
   a solution Segment-List such that packets flowing through the
   solution Segment-List use ECMP-aware paths optimized for the
   selected metric.  The "Min-Metric" objective can be instantiated for
   exactly one of the following: the IGP metric ([RFC1195] [RFC2328]
   [RFC5340]), the TE metric ([RFC5305] [RFC3630]), or the latency
   extended TE metric ([RFC7810] [RFC7471]).  This metric is called the
   O metric (the optimized metric) to distinguish it from the IGP
   metric.  The solution Segment-List must be computed to minimize the
   number of SIDs and the number of Segment-Lists.

   If the selected O metric is the IGP metric and the headend and
   tailend are in the same IGP domain, then the solution Segment-List
   consists of the single prefix SID of the tailend.

   When the selected O metric is not the IGP metric, the solution
   Segment-List is made of prefix SIDs of intermediate nodes, Adjacency
   SIDs along intermediate links, and potentially Binding SIDs (BSIDs)
   of intermediate policies.

   In many deployments, there are insignificant metric differences
   between nearly equal paths (e.g., a difference of 100 usec of
   latency between two paths from NYC to SFO would not matter in most
   cases).  The "Min-Metric with margin" objective addresses this
   requirement.
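
   The following Python sketch illustrates the selection logic of the
   "Min-Metric with margin and maximum number of SIDs" objective as
   specified in the rest of this subsection.  It is not a path
   computation: it assumes a hypothetical, pre-enumerated, non-empty set
   of candidate Segment-Lists that already includes the shortest path.
   The Candidate type, the function name, the best_effort flag, and the
   sample values are illustrative assumptions.

```python
from typing import List, NamedTuple, Optional


class Candidate(NamedTuple):
    """A candidate solution: an encoded Segment-List and its cumulative O metric."""
    segments: List[str]
    o_metric: int  # e.g., latency in usec when the O metric is latency


def min_metric_with_margin(candidates: List[Candidate], margin: int,
                           max_sids: int,
                           best_effort: bool = False) -> Optional[Candidate]:
    """Select a solution for "Min-Metric with margin and maximum number of SIDs".

    Assumes `candidates` is non-empty and contains the shortest path, so its
    minimum O metric is the shortest-path O metric.
    """
    shortest = min(c.o_metric for c in candidates)
    fits = [c for c in candidates if len(c.segments) <= max_sids]
    # Paths within the margin are considered equal to the shortest path.
    within = [c for c in fits if c.o_metric <= shortest + margin]
    if within:
        # Among "equal" paths, prefer the fewest SIDs, then the lowest O metric.
        return min(within, key=lambda c: (len(c.segments), c.o_metric))
    if best_effort and fits:
        # Non-default option: lowest O metric that still respects the SID limit.
        return min(fits, key=lambda c: c.o_metric)
    # Default option: no solution unless the desired SLA is guaranteed.
    return None


if __name__ == "__main__":
    # Hypothetical NYC-to-SFO candidates: 4 SIDs at 30000 usec, 1 SID at 30100 usec.
    multi = Candidate(["X1", "X2", "X3", "SFO"], 30000)
    single = Candidate(["SFO"], 30100)
    print(min_metric_with_margin([multi, single], margin=100, max_sids=10).segments)
    # -> ['SFO']
    print(min_metric_with_margin([multi, single], margin=0, max_sids=2))
    # -> None
```

   Preferring the fewest SIDs among the paths within the margin follows
   the requirement that the solution Segment-List minimize the number
   of SIDs; a fuller implementation might derive the candidates from an
   SPF over the SR-DB instead of taking them as input.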

   The "Min-Metric with margin and maximum number of SIDs" optimization
   objective requests the computation of a solution Segment-List such
   that packets flowing through the solution Segment-List do not use a
   path whose cumulative O metric is larger than the shortest-path O
   metric + margin.

   If this is not possible because of the constraint on the number of
   SIDs, then one option is for the solution Segment-List to minimize
   the O metric while meeting the maximum-number-of-SIDs constraint
   (i.e., the path with the lowest O metric that uses at most the
   specified number of SIDs).  The other option, which is the default,
   is not to provide a solution unless the desired SLA is guaranteed.

   Section 7 describes another approach, for when the O metric is not
   the IGP metric, in which the solution Segment-List consists of a
   single segment: the Flex-Algorithm Prefix-SID of the tailend.

3.2.  Constraints

   The following constraints can be specified:

   o  Inclusion and/or exclusion of TE affinity.

   o  Inclusion and/or exclusion of IP address.

   o  Inclusion and/or exclusion of SRLG.

   o  Inclusion and/or exclusion of admin-tag.

   o  Maximum accumulated metric (IGP, TE and latency).

   o  Maximum number of SIDs in the solution Segment-List.

   o  Maximum number of weighted Segment-Lists in the solution set.

   o  Diversity to another service instance (e.g., link, node, or SRLG
      disjoint paths originating from different headends).

3.3.  SR Native Algorithm

          1----------------2----------------3
          |\                               /
          | \                             /
          |  4-------------5-------------7
          |   \                         /|
          |    +-----------6-----------+ |
          8------------------------------9

       Figure 3: Illustration used to describe SR native algorithm

   Let us assume that all the links have the same IGP metric of 10, and
   let us consider the dynamic path defined as: Min-Metric(from 1, to
   3, IGP metric, margin 0) with constraint "avoid link 2-to-3".

   A classical circuit implementation would prune the graph, compute
   the shortest path, pick a single non-ECMP branch of the ECMP-aware
   shortest path, and encode it as a Segment-List.  The solution
   Segment-List would be <4, 5, 7, 3>.

   An SR-native algorithm would find a Segment-List that minimizes the
   number of SIDs and maximizes the use of all the ECMP branches along
   the ECMP shortest path.  In this illustration, the solution
   Segment-List would be <7, 3>.

   In the vast majority of SR use-cases, SR-native algorithms should be
   preferred: they preserve the native ECMP of IP and minimize the
   dataplane header overhead.

   In some specific use-cases (e.g., TDM migration over IP, where the
   circuit notion prevails), one may prefer a classic circuit
   computation followed by an encoding into SIDs (potentially using
   only unprotected Adj SIDs that pin the path to specific links and
   avoid ECMP, to reflect the TDM paradigm).

   SR-native algorithms are a local node behavior and are thus outside
   the scope of this document.

3.4.  Path to SID

   Consider the diagram below, where all the links have an IGP metric
   of 10 and a TE metric of 10, except link AB, which has an IGP metric
   of 20, and link AD, which has a TE metric of 100.  Consider
   Min-Metric(from A, to D, TE metric, margin 0).

                                 B---C
                                 |   |
                                 A---D

         Figure 4: Illustration used to describe path to SID conversion

   The solution path to this problem is ABCD.

   This path can be expressed in SIDs as where B and D are the IGP
   prefix SIDs respectively associated with nodes B and D in the
   diagram.

   Indeed, from A, the IGP path to B is AB (IGP metric 20, better than
   ADCB with IGP metric 30).  From B, the IGP path to D is BCD (IGP
   metric 20, better than BAD with IGP metric 30).
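
   The hop-by-hop reasoning above can be checked with the Python sketch
   below, which also follows the high-level procedure given in the next
   paragraph.  It is a simplified sketch under stated assumptions: the
   function names are hypothetical, the graph carries IGP metrics only,
   SIDs are rendered as strings, and a prefix SID is accepted only when
   the IGP shortest path to it is unique (no ECMP) and coincides with
   the desired sub-path; otherwise the Adj SID to the next neighbor is
   used.

```python
import heapq
from typing import Dict, List, Tuple

Graph = Dict[str, Dict[str, int]]  # node -> {neighbor: IGP metric}


def igp_spf(graph: Graph, src: str) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Dijkstra from src: IGP distance and number of equal-cost shortest paths."""
    dist, count = {src: 0}, {src: 1}
    heap, done = [(0, src)], set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v, w in graph[u].items():
            if v not in dist or d + w < dist[v]:
                dist[v], count[v] = d + w, count[u]
                heapq.heappush(heap, (d + w, v))
            elif d + w == dist[v]:
                count[v] += count[u]  # another ECMP branch towards v
    return dist, count


def path_to_sids(graph: Graph, path: List[str]) -> List[str]:
    """Greedy encoding of an explicit path into prefix SIDs and Adj SIDs."""
    sids, i = [], 0
    while i < len(path) - 1:
        dist, count = igp_spf(graph, path[i])
        for j in range(len(path) - 1, i, -1):  # try the farthest node first
            sub = path[i:j + 1]
            cost = sum(graph[a][b] for a, b in zip(sub, sub[1:]))
            # Accept only if the IGP path to path[j] is unique and equals sub.
            if cost == dist[path[j]] and count[path[j]] == 1:
                sids.append(f"Prefix-SID({path[j]})")
                i = j
                break
        else:
            # No usable prefix SID: pin the next link with its Adj SID.
            sids.append(f"Adj-SID({path[i]}->{path[i + 1]})")
            i += 1
    return sids


if __name__ == "__main__":
    # Figure 4 with IGP metrics: AB = 20, all other links = 10.
    igp = {"A": {"B": 20, "D": 10}, "B": {"A": 20, "C": 10},
           "C": {"B": 10, "D": 10}, "D": {"A": 10, "C": 10}}
    print(path_to_sids(igp, ["A", "B", "C", "D"]))
    # -> ['Prefix-SID(B)', 'Prefix-SID(D)']
```

   For Figure 4, the sketch returns the prefix SIDs of B and D, in line
   with the IGP paths AB and BCD discussed above.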

   While the details of the algorithm remain a local node behavior, a
   high-level description follows: start at the headend and find an IGP
   prefix SID that leads as far down the desired path as possible
   (without using any link not included in the desired path).  If no
   such prefix SID exists, use the Adj SID to the first neighbor along
   the path.  Restart from the node that was reached.

4.  Candidate Path Selection

   An SR Policy may have multiple candidate paths that are provisioned
   or signaled [I-D.ietf-idr-segment-routing-te-policy]
   [I-D.ietf-pce-segment-routing] from one or more sources.  The
   tie-breaker rules defined in [I-D.ietf-spring-segment-routing-policy]
   formally determine a single "active path".

   This section describes some examples of candidate path selection
   based on these rules.

   Example 1:

   Consider headend H where two candidate paths of the same SR Policy
   are signaled via BGP [I-D.ietf-idr-segment-routing-te-policy] and
   whose respective NLRIs have the same route distinguisher:

   NLRI A with distinguisher = RD1, color = C, endpoint = N, preference
   P1.

   NLRI B with distinguisher = RD1, color = C, endpoint = N, preference
   P2.

   o  Because the NLRIs are identical (same distinguisher), BGP will
      perform bestpath selection.  Note that there are no changes to
      the BGP best path selection algorithm.

   o  H installs one advertisement as bestpath into the BGP table.

   o  A single advertisement is passed to the SR Policy instantiation
      process.

   o  The SRP process does not perform any path selection.

   Note that the candidate path's preference value does not have any
   effect on the BGP bestpath selection process.
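
   The examples in this section can be reproduced with the Python
   sketch below.  It is illustrative only: the class and function names
   and the numeric Protocol-Origin values are hypothetical; the "higher
   wins" orderings for Protocol-Origin, Originator, and Discriminator
   are inferred from Examples 4, 6, and 7 of this section rather than
   quoted from [I-D.ietf-spring-segment-routing-policy]; BGP best-path
   selection is represented by a placeholder; and retaining a current
   active path that is still tied for best is how the sketch models the
   configured override of Example 5.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical Protocol-Origin values: the examples only need local
# configuration (CLI/NETCONF) to rank above BGP (Example 6).
PROTO_BGP = 20
PROTO_LOCAL = 30


@dataclass(frozen=True)
class CandidatePath:
    """A candidate path of one SR Policy (all share the same color and endpoint)."""
    name: str
    preference: int
    protocol_origin: int
    originator: int     # e.g., controller identity; higher wins (Example 7)
    discriminator: int  # e.g., BGP route distinguisher; higher wins (Example 4)


def bgp_to_srp(bgp_ads: List[CandidatePath]) -> List[CandidatePath]:
    """BGP passes one advertisement per NLRI to the SRP process.

    With color and endpoint fixed, the NLRI reduces to the distinguisher.
    Keeping the first advertisement is only a placeholder for the unchanged
    BGP best-path algorithm; candidate path preference plays no role here.
    """
    per_nlri: Dict[int, CandidatePath] = {}
    for ad in bgp_ads:
        per_nlri.setdefault(ad.discriminator, ad)
    return list(per_nlri.values())


def select_active(candidates: List[CandidatePath],
                  current: Optional[CandidatePath] = None,
                  ignore_discriminator: bool = False) -> CandidatePath:
    """Highest preference, then Protocol-Origin, Originator, Discriminator.

    A current active path that is still tied for best is retained; this is
    what keeps NLRI A active in Example 5, where the discriminator is ignored.
    """
    def rank(c: CandidatePath) -> tuple:
        key = (c.preference, c.protocol_origin, c.originator)
        return key if ignore_discriminator else key + (c.discriminator,)

    best = max(rank(c) for c in candidates)
    finalists = [c for c in candidates if rank(c) == best]
    return current if current in finalists else finalists[0]


if __name__ == "__main__":
    a = CandidatePath("NLRI A", 100, PROTO_BGP, originator=1, discriminator=1)
    b = CandidatePath("NLRI B", 100, PROTO_BGP, originator=1, discriminator=2)
    print(select_active(bgp_to_srp([a, b]), current=a).name)  # Example 4
    # -> NLRI B
    print(select_active([a, b], current=a, ignore_discriminator=True).name)  # Example 5
    # -> NLRI A
```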

   Example 2:

   Consider headend H where two candidate paths of the same SR Policy
   are signaled via BGP and whose respective NLRIs have different route
   distinguishers:

   NLRI A with distinguisher = RD1, color = C, endpoint = N, preference
   P1.

   NLRI B with distinguisher = RD2, color = C, endpoint = N, preference
   P2.

   o  Because the NLRIs are different (different distinguishers), BGP
      will not perform bestpath selection.

   o  H installs both advertisements into the BGP table.

   o  Both advertisements are passed to the SR Policy instantiation
      process.

   o  The SRP process at H selects the candidate path advertised by
      NLRI B as the active path for the SR Policy, since P2 is greater
      than P1.

   Note that the recommended approach is to use NLRIs with different
   distinguishers when several candidate paths for the same SR Policy
   (color, endpoint) are signaled via BGP to a headend.

   Example 3:

   Consider a headend H that learns two candidate paths of the same SR
   Policy, one signaled via BGP and another via local configuration:

   NLRI A with distinguisher = RD1, color = C, endpoint = N, preference
   P1.

   Local "foo" with color = C, endpoint = N, preference P2.

   o  H installs NLRI A into the BGP table.

   o  NLRI A and "foo" are both passed to the SRP process.

   o  The SRP process at H selects the candidate path indicated by
      "foo" as the active path for the SR Policy, since P2 is greater
      than P1.

   Now consider cases where an SR Policy has multiple valid candidate
   paths with the same best preference.  In such cases, the SRP process
   at a headend uses the rules described in Section 2.9 of
   [I-D.ietf-spring-segment-routing-policy] to select the active path,
   as illustrated in the following examples.

   Example 4:

   Consider headend H with two candidate paths of the same SR Policy
   and the same preference value, received from the same controller R,
   where RD2 is higher than RD1:

   o  NLRI A with distinguisher RD1, color C, endpoint N, preference P1
      (selected as active path at time t0).

   o  NLRI B with distinguisher RD2 (RD2 is greater than RD1), color C,
      endpoint N, preference P1 (passed to the SR Policy instantiation
      process at time t1 > t0).

   After t1, the SRP process at H selects the candidate path associated
   with NLRI B as the active path of the SR Policy, since RD2 is higher
   than RD1.  The time at which the headend receives a candidate path
   via BGP is not a factor in the selection.

   Note that, in such a scenario where there are redundant sessions to
   the same controller, the recommended approach is to use the same RD
   value for conveying the same candidate paths and let the BGP best
   path algorithm pick the best path.

   Example 5:

   Consider headend H with two candidate paths of the same SR Policy
   and the same preference value, both received from the same
   controller R, where RD2 is higher than RD1.

   Consider also that headend H is configured to override the
   discriminator tiebreaker specified in Section 2.9 of
   [I-D.ietf-spring-segment-routing-policy].

   o  NLRI A with distinguisher RD1, color C, endpoint N, preference P1
      (selected as active path at time t0).

   o  NLRI B with distinguisher RD2, color C, endpoint N, preference P1
      (passed to the SR Policy instantiation process at time t1).

   Even after t1, the SRP process at H retains the candidate path
   associated with NLRI A as the active path of the SR Policy, since
   the discriminator tiebreaker is disabled at H.

   Example 6:

   Consider headend H with two candidate paths of the same SR Policy
   and the same preference value:

   o  Local "foo" with color C, endpoint N, preference P1 (selected as
      active path at time t0).

   o  NLRI A with distinguisher RD1, color C, endpoint N, preference P1
      (passed to the SRP process at time t1).

   Even after t1, the SRP process at H retains the local candidate path
   "foo" as the active path of the SR Policy, since by default the
   Local protocol is preferred over BGP based on its higher protocol
   identifier value.

   Example 7:

   Consider headend H with two candidate paths of the same SR Policy
   and the same preference value, but received via NETCONF from two
   controllers R and S (where S > R):

   o  Path A from R with distinguisher D1, color C, endpoint N,
      preference P1 (selected as active path at time t0).

   o  Path B from S with distinguisher D2, color C, endpoint N,
      preference P1 (passed to the SRP process at time t1).

   Note that the NETCONF process sends both paths to the SRP process,
   since it does not have any tiebreaker logic.  After t1, the SRP
   process at H selects the candidate path associated with Path B as
   the active path of the SR Policy.

5.  Distributed and/or Centralized Control Plane

5.1.  Distributed Control Plane within a single Link-State IGP area

   Consider a single-area IGP with per-link latency measurement and
   advertisement of the measured latency in the extended-TE IGP TLV.

   A headend H is configured with a single dynamic candidate path for
   SR Policy P with a low-latency optimization objective and endpoint
   E.

   The SRP process at H learns the topology (and the extended-TE
   latency information) from the IGP and computes the solution
   Segment-List providing the low-latency path to E.

   No centralized controller is involved in such a deployment.

   The SR-DB at H only uses the Link-State DataBase (LSDB) provided by
   the IGP.

5.2.  Distributed Control Plane across several Link-State IGP areas

   Consider a domain D composed of two single-area link-state IGP
   instances (I1 and I2), where each sub-domain benefits from per-link
   latency measurement and advertisement of the measured latency in the
   related IGP.
   The link-state information of each IGP is advertised via BGP-LS
   [RFC7752] towards a set of BGP-LS route reflectors (RRs).

   H is a headend in the IGP I1 sub-domain, and E is an endpoint in the
   IGP I2 sub-domain.

   Using a BGP-LS session to any BGP-LS RR, H's SRP process may learn
   the link-state information of the remote domain I2.  H can thus
   compute the low-latency path from H to E as a solution Segment-List
   that spans the two domains I1 and I2.

   The SR-DB at H collects the LSDB from both sub-domains (I1 and I2).

   No centralized controller is required.

5.3.  Centralized Control Plane

   Considering the same domain D as in the previous section, let us now
   assume that H does not have a BGP-LS session to the BGP-LS RRs.
   Instead, let us assume that a controller "C" has at least one BGP-LS
   session to the BGP-LS RRs.

   The controller C learns the topology and extended latency
   information from both sub-domains via BGP-LS.  It computes a
   low-latency path from H to E as a Segment-List and programs H with
   the related explicit candidate path.

   The headend H does not compute the solution Segment-List (it cannot,
   since its SR-DB does not contain the topology of I2).  The headend
   only validates the received explicit candidate path.  Most probably,
   the controller encodes the SIDs of the Segment-List with Type-1.  In
   that case, the headend's validation simply consists of resolving the
   first SID to an outgoing interface and next-hop.

   The SR-DB at H only includes the LSDB provided by the IGP I1.

   The SR-DB of the controller collects the LSDB from both sub-domains
   (I1 and I2).

5.4.  Distributed and Centralized Control Plane

   Consider the same domain D as in the previous section.

   H's SRP process is configured to associate color C1 with a
   low-latency optimization objective.

   H's BGP process is configured to steer a route R/r carrying the
   extended color community C1 and with next-hop N via the SR Policy
   (N, C1).
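
   A minimal Python sketch of this color-based steering follows,
   anticipating the on-demand behavior described in the remainder of
   this section: the first route with a given (next-hop, color) pair
   triggers the SR Policy (N, C1), whose Segment-List is computed by the
   headend when N is in its SR-DB and delegated to a controller
   otherwise.  The Headend class, its fields, the numeric color value,
   and the handling of colors without a configured objective are
   hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple

PolicyKey = Tuple[str, int]  # (endpoint N, color C)


@dataclass
class Headend:
    """Hypothetical model of headend H's SRP and BGP steering behavior."""
    color_objectives: Dict[int, str]  # color -> configured optimization objective
    sr_db_nodes: Set[str]             # nodes present in H's SR-DB
    # SR Policy -> who computes its Segment-List ("headend" or "controller")
    policies: Dict[PolicyKey, str] = field(default_factory=dict)
    steering: Dict[str, PolicyKey] = field(default_factory=dict)  # route -> SR Policy

    def on_bgp_route(self, prefix: str, color: int,
                     next_hop: str) -> Optional[PolicyKey]:
        if color not in self.color_objectives:
            return None  # assumption: no objective for this color, no SR Policy
        key = (next_hop, color)
        if key not in self.policies:
            # First route with this (next-hop, color): instantiate SR Policy (N, C).
            # Compute locally if N is in the SR-DB, else delegate (e.g., via PCEP).
            self.policies[key] = ("headend" if next_hop in self.sr_db_nodes
                                  else "controller")
        self.steering[prefix] = key
        return key


if __name__ == "__main__":
    h = Headend(color_objectives={100: "low-latency"}, sr_db_nodes={"H", "E1"})
    print(h.on_bgp_route("203.0.113.0/24", 100, "N"), h.policies[("N", 100)])
    # -> ('N', 100) controller
```

   Later routes with the same next-hop and color reuse the existing SR
   Policy rather than triggering a new computation.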

   Upon receiving a first BGP route of color C1 and next-hop N, H
   recognizes the need for an SR Policy (N, C1) with a low-latency
   objective to N.  As N is outside the SR-DB of H, H requests a
   controller to compute such a Segment-List (e.g., via PCEP
   [I-D.ietf-pce-segment-routing]).

   This is an example of a hybrid control plane: the distributed BGP
   control plane signals the routes and their TE requirements.  Upon
   receiving these BGP routes, a headend either computes the solution
   Segment-List itself (entirely distributed, when the endpoint is in
   the SR-DB of the headend) or delegates the computation to a
   controller (hybrid distributed/centralized control plane).

   The SR-DB at H only includes the LSDB provided by the IGP.

   The SR-DB of the controller collects the LSDB from both sub-domains.

6.  Binding SID Aspects

   The Binding SID (BSID) is fundamental to Segment Routing.  It
   provides scaling, network opacity, and service independence.

   This section describes implementation and operational aspects
   related to the Binding SID.

6.1.  Benefits of Binding SID

   A simplified illustration is provided on the basis of Figure 5,
   where it is assumed that S, A, B, and the Data Center Interconnect
   nodes DCI1 and DCI2 share the same IGP-SR instance in data center 1
   (DC1); DCI1, DCI2, C, D, E, F, G, DCI3, and DCI4 share the same
   IGP-SR domain in the core; and DCI3, DCI4, H, K, and Z share the same
   IGP-SR domain in data center 2 (DC2).

      A---DCI1----C----D----E----DCI3---H
     /     |                      |      \
    S      |                      |       Z
     \     |                      |      /
      B---DCI2----F---------G----DCI4---K
   <==DC1==><=========Core========><==DC2==>

                 Figure 5: A Simple Datacenter Topology

   In this example, it is assumed that there is no redistribution
   between the IGPs and no BGP-LU.  Inter-domain communication is
   provided only by SR through SR Policies.

   The latency from S to DCI1 equals the latency from S to DCI2.  The
   latency from Z to DCI3 equals the latency from Z to DCI4.
   All the intra-DC links have the same IGP metric of 10.

   The path DCI1, C, D, E, DCI3 has a lower latency and a lower
   capacity than the path DCI2, F, G, DCI4.

   The IGP metrics of all the core links are set to 10, except link
   D-E, which is set to 100.

   A low-latency multi-domain policy from S to Z may be expressed as
   where:

   o  DCI1 is the prefix SID of DCI1.

   o  BSID is the Binding SID bound to an SR Policy instantiated at
      DCI1.

   o  Z is the prefix SID of Z.

   Without the use of an intermediate core SR Policy (efficiently
   summarized by a single BSID), S would need to steer its low-latency
   flow into the policy .

   The use of a BSID (and the intermediate bound SR Policy) decreases
   the number of segments imposed by the source.

   A BSID acts as a stable anchor point that isolates one domain from
   the churn of another domain.  Upon topology changes within the core
   of the network, the low-latency path from DCI1 to DCI3 may change.
   While the path of an intermediate policy changes, its BSID does not.
   Hence, the policy used by the source does not change, and the source
   is shielded from the churn in another domain.

   A BSID provides opacity and independence between domains.  The
   administrative authority of the core domain may not want to share
   information about its topology.  The use of a BSID keeps the service
   opaque: S is not aware of the details of how the low-latency service
   is provided by the core domain, nor of the need of the core
   authority to temporarily change the intermediate path.

6.2.  Centralized Discovery of available BSID

   This section explains how controllers can discover the local SIDs
   available at a node N so as to pick an explicit BSID for an SR Policy
   to be instantiated at headend N.

   Any controller can discover the following properties of a node N
   (e.g., via BGP-LS, NETCONF, etc.):

   o  its local topology [RFC7752].

   o  its topology-related SIDs (Prefix SIDs, Adj SIDs, and EPE SIDs
      [I-D.ietf-idr-bgp-ls-segment-routing-ext]
      [I-D.ietf-idr-bgpls-segment-routing-epe]).

   o  its Segment Routing Local Block (SRLB).

   o  its SR Policies and their BSIDs ([I-D.ietf-pce-segment-routing]
      [I-D.sivabalan-pce-binding-label-sid]
      [I-D.ietf-idr-te-lsp-distribution]).

   Any controller can thus infer the available SIDs in the SRLB of any
   node, assuming that all SIDs allocated from the SRLB on that node
   are advertised by it to the controller via some protocol or
   mechanism.

   As an example, a controller discovers the following characteristics
   of N: SRLB (4000, 8000), 3 Adj SIDs (4001, 4002, 4003), 2 EPE SIDs
   (4004, 4005), and 3 SRTE policies (whose BSIDs are respectively
   4006, 4007, and 4008).  This controller can deduce that the SRLB
   sub-range (4009, 8000) is free for allocation.

   A controller is not restricted to using the next numerically
   available SID in the available SRLB sub-range.  It can pick any label
   in the subset of available labels.  Such a random pick makes a
   collision unlikely.

   An operator could also sub-allocate the SRLB between different
   controllers (e.g., (4000-4499) to controller 1 and (4500-5000) to
   controller 2).

   Inter-controller state synchronization may be used to avoid or
   detect BSID collisions.

   All these techniques make a collision between different controllers
   very unlikely.

   In the unlikely case of a collision, the controllers will detect it
   through system alerts, BGP-LS reporting using
   [I-D.ietf-idr-te-lsp-distribution], or PCEP notification [RFC8231].
   They then have the choice to continue the operation of their SR
   Policy with the dynamically allocated BSID or to retry with another
   explicit pick.
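
   The inference and random pick described above can be sketched in
   Python as follows.  The function names, the treatment of the SRLB
   bounds as inclusive, and the use of Python's random module for the
   random pick are assumptions made for illustration.  Under the
   inclusive reading, label 4000 is also reported as unallocated in
   this example, in addition to the sub-range (4009, 8000).

```python
import random
from typing import Iterable, List, Optional, Tuple

Range = Tuple[int, int]  # inclusive (first, last) label range


def free_ranges(srlb: Range, used: Iterable[int]) -> List[Range]:
    """Sub-ranges of the SRLB not covered by any advertised local SID."""
    first, last = srlb
    taken = sorted({sid for sid in used if first <= sid <= last})
    free, start = [], first
    for sid in taken:
        if sid > start:
            free.append((start, sid - 1))
        start = sid + 1
    if start <= last:
        free.append((start, last))
    return free


def pick_bsid(srlb: Range, used: Iterable[int],
              allowed: Optional[Range] = None,
              rng: Optional[random.Random] = None) -> Optional[int]:
    """Randomly pick a free label, optionally within this controller's share."""
    rng = rng or random.Random()
    lo, hi = allowed or srlb
    pool = [label
            for a, b in free_ranges(srlb, used)
            for label in range(max(a, lo), min(b, hi) + 1)]
    return rng.choice(pool) if pool else None


if __name__ == "__main__":
    used = [4001, 4002, 4003, 4004, 4005, 4006, 4007, 4008]
    print(free_ranges((4000, 8000), used))
    # -> [(4000, 4000), (4009, 8000)]
    print(pick_bsid((4000, 8000), used, allowed=(4500, 5000)))
```

   The optional allowed range models the sub-allocation of the SRLB
   between controllers, e.g., (4500-5000) for controller 2.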
   Note: in deployments where the Path Computation Element
   Communication Protocol (PCEP) is used between the headend and the
   controller (PCE), a headend can report the BSID, as well as policy
   attributes (e.g., type of disjointness) and operational and
   administrative states, to the controller.  Similarly, a controller
   can assign or update the BSID of a policy via PCEP when instantiating
   or updating an SR Policy.

7.  Flex-Algorithm Based SR Policies

   SR allows algorithms to be associated with Prefix SIDs [RFC8402].
   [I-D.ietf-lsr-flex-algo] defines the IGP-based Flex-Algorithm
   solution, which allows the IGPs themselves to compute
   constraint-based paths over the network.  The Prefix SIDs of a node
   for a specific Flex-Algorithm are used in the forwarding plane to
   steer traffic along the corresponding constrained path to that node.

   As specified in [RFC8402], these IGP Flex-Algorithm Prefix SIDs can
   be used as segments within SR Policies, thereby leveraging the
   underlying IGP Flex-Algorithm solution.

                        1--RED--2-------6
                        |       |       |
                        4-------3--RED--9

                Figure 6: Illustration for Flex-Alg SID

   Now let us assume that:

   o  nodes 1, 2, 3 and 4 are part of IGP 1.

   o  nodes 2, 6, 9 and 3 are part of IGP 2.

   o  all IGP link costs are 10.

   o  links 1-2 and 3-9 are colored with the IGP link affinity Red.

   o  FlexAlg1 is defined in both IGPs as: avoid Red, minimize IGP
      metric.

   o  all nodes of each IGP domain are enabled for FlexAlg1.

   o  SID(k, 0) represents the Prefix SID of node k according to
      Alg=0.

   o  SID(k, FlexAlg1) represents the Prefix SID of node k according to
      FlexAlg1.

   A controller can steer a flow from node 1 to node 9 through an
   end-to-end path that avoids the Red links of both IGP domains, thanks
   to an explicit SR Policy built from FlexAlg1 Prefix SIDs, such as
   <SID(3, FlexAlg1), SID(9, FlexAlg1)>: SID(3, FlexAlg1) follows
   1-4-3 in IGP 1 (avoiding link 1-2), and SID(9, FlexAlg1) follows
   3-2-6-9 in IGP 2 (avoiding link 3-9).

8.  Layer 2 and Optical Transport

                    1----2----3----4----5
           I2(lambda L241)\       / I4(lambda L241)
                           Optical

                Figure 7: SR Policy with integrated DWDM

   An explicit candidate path can express a path through a transport
   layer beneath IP, such as ATM, Frame Relay (FR), DWDM or
   back-to-back Ethernet.  The transport path is modeled as a link
   between two IP nodes, with the specific assumption that no
   distributed IP routing protocol runs over the link.  The link may
   have an IP address or be IP unnumbered.  Depending on the transport
   technology, the link can be a physical DWDM interface and a lambda
   (integrated solution), an Ethernet interface and a VLAN, an ATM
   interface with a VPI/VCI, an FR interface with a DLCI, etc.

   Using the integrated DWDM use case of Figure 7 as an illustration,
   let us assume that:

   o  nodes 1, 2, 3, 4 and 5 are IP routers running an SR-enabled IGP
      on the links 1-2, 2-3, 3-4 and 4-5.

   o  the SRGB is homogeneous (16000, 24000).

   o  node K's Prefix SID is 16000+K.

   o  node 2 has an integrated DWDM interface I2 with lambda L1.

   o  node 4 has an integrated DWDM interface I4 with lambda L2.

   o  the optical network is provisioned with a circuit from node 2 to
      node 4 with continuous lambda L241 (details outside the scope of
      this document).

   o  node 2 is provisioned with an SR Policy with segment list
      <I2(L241)> and Binding SID B, where I2(L241) is a segment of type
      5 (IPv4) or type 7 (IPv6); see Section 4 of
      [I-D.ietf-spring-segment-routing-policy].

   o  node 1 steers a packet P1 towards the Prefix SID of node 5
      (16005).

   o  node 1 steers a packet P2 onto an SR Policy with segment list
      <16002, B, 16005>.

   In such a case, the journey of P1 is 1-2-3-4-5, while the journey of
   P2 is 1-2-lambda(L241)-4-5.  P2 skips the IP hop 3 and leverages the
   DWDM circuit from node 2 to node 4.  P1 follows the shortest path
   computed by the distributed routing protocol.
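   The journeys of P1 and P2 can be traced with a toy Python model of
   the Figure 7 topology.  It is a simplified sketch under stated
   assumptions: Prefix SIDs are modeled as integers, while the Binding
   SID "B" and the explicit segment "I2(L241)" are modeled as strings
   because their label values are not given here; a Prefix SID is
   popped only at the node that owns it (penultimate hop popping is
   ignored); and the names IGP_LINKS, OPTICAL_ADJ, POLICIES and forward
   are hypothetical.

```python
from collections import deque

SRGB_BASE = 16000  # homogeneous SRGB; node K's Prefix SID is 16000 + K

# Links running the SR-enabled IGP; the optical circuit is deliberately absent.
IGP_LINKS = {(1, 2), (2, 3), (3, 4), (4, 5)}

# Hypothetical model of segment I2(L241): (head node, tail node, hop label).
OPTICAL_ADJ = {"I2(L241)": (2, 4, "lambda(L241)")}

# SR Policy on node 2: Binding SID "B" expands to segment list <I2(L241)>.
POLICIES = {2: {"B": ["I2(L241)"]}}


def neighbors(node):
    """IGP neighbours of a node (links are bidirectional)."""
    return [b if a == node else a for a, b in IGP_LINKS if node in (a, b)]


def next_hop(src, dst):
    """First hop of an IGP shortest path; all costs are equal, so BFS suffices."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nbr in sorted(neighbors(node)):
            if nbr not in parent:
                parent[nbr] = node
                queue.append(nbr)
    hop = dst
    while parent[hop] != src:  # walk back to the node adjacent to src
        hop = parent[hop]
    return hop


def forward(ingress, segments):
    """Return the hop-by-hop journey of a packet carrying `segments`."""
    node, stack, journey = ingress, list(segments), [str(ingress)]
    while stack:
        top = stack[0]
        if top in POLICIES.get(node, {}):
            # Binding SID of a local SR Policy: replace it by the policy's list.
            stack = POLICIES[node][top] + stack[1:]
        elif top in OPTICAL_ADJ and OPTICAL_ADJ[top][0] == node:
            # Explicit segment over the transport layer: cross the lambda.
            _, node, hop_name = OPTICAL_ADJ[top]
            stack = stack[1:]
            journey += [hop_name, str(node)]
        elif isinstance(top, int) and top - SRGB_BASE == node:
            stack = stack[1:]  # this node owns the Prefix SID: pop it
        elif isinstance(top, int):
            node = next_hop(node, top - SRGB_BASE)  # IGP shortest path
            journey.append(str(node))
        else:
            raise ValueError(f"node {node} cannot process segment {top!r}")
    return "-".join(journey)


print(forward(1, [16005]))              # 1-2-3-4-5
print(forward(1, [16002, "B", 16005]))  # 1-2-lambda(L241)-4-5
```
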
   The path of P1 is unaltered by the addition, modification or deletion
   of optical bypass circuits.

   The salient point of this example is that the SR Policy architecture
   seamlessly supports explicit candidate paths through any transport
   sub-layer.

   BGP-LS extensions to describe the sub-IP-layer characteristics of the
   SR Policy (e.g., in Figure 7, the DWDM characteristics of the SR
   Policy at node 2 in terms of latency, loss, security, domain/country
   traversed by the circuit, etc.) are out of scope of this document.

   Further details of the SR Policy use case for packet-optical networks
   are specified in [I-D.anand-spring-poi-sr].

9.  Security Considerations

   The security considerations related to the Segment Routing
   architecture are described in [RFC8402], and those related to the SR
   Policy architecture are described in
   [I-D.ietf-spring-segment-routing-policy]; they apply to this document
   as well.

10.  IANA Considerations

   This document has no actions for IANA.

11.  Acknowledgement

   The authors would like to thank Tarek Saad, Dhanendra Jain, Muhammad
   Durrani and Rob Shakir for their valuable comments and suggestions.

12.  Contributors

   The following people have contributed to this document:

   Siva Sivabalan
   Cisco Systems
   Email: msiva@cisco.com

   Zafar Ali
   Cisco Systems
   Email: zali@cisco.com

   Jose Liste
   Cisco Systems
   Email: jliste@cisco.com

   Francois Clad
   Cisco Systems
   Email: fclad@cisco.com

   Kamran Raza
   Cisco Systems
   Email: skraza@cisco.com

   Shraddha Hegde
   Juniper Networks
   Email: shraddha@juniper.net

   Steven Lin
   Google, Inc.
   Email: stevenlin@google.com

   Alex Bogdanov
   Google, Inc.
   Email: bogdanov@google.com

   Daniel Voyer
   Bell Canada
   Email: daniel.voyer@bell.ca

   Dirk Steinberg
   Steinberg Consulting
   Email: dws@steinbergnet.net

   Bruno Decraene
   Orange Business Services
   Email: bruno.decraene@orange.com

   Stephane Litkowski
   Orange Business Services
   Email: stephane.litkowski@orange.com

   Luay Jalil
   Verizon
   Email: luay.jalil@verizon.com

13.  References

13.1.  Normative References

   [I-D.ietf-spring-segment-routing-policy]
              Filsfils, C., Sivabalan, S., Voyer, D., Bogdanov, A., and
              P. Mattes, "Segment Routing Policy Architecture",
              draft-ietf-spring-segment-routing-policy-02 (work in
              progress), October 2018.

   [RFC8402]  Filsfils, C., Ed., Previdi, S., Ed., Ginsberg, L.,
              Decraene, B., Litkowski, S., and R. Shakir, "Segment
              Routing Architecture", RFC 8402, DOI 10.17487/RFC8402,
              July 2018, <https://www.rfc-editor.org/info/rfc8402>.

13.2.  Informative References

   [I-D.anand-spring-poi-sr]
              Anand, M., Bardhan, S., Subrahmaniam, R., Tantsura, J.,
              Mukhopadhyaya, U., and C. Filsfils, "Packet-Optical
              Integration in Segment Routing",
              draft-anand-spring-poi-sr-07 (work in progress), January
              2019.

   [I-D.filsfils-spring-srv6-network-programming]
              Filsfils, C., Camarillo, P., Leddy, J., Voyer, D.,
              Matsushima, S., and Z. Li, "SRv6 Network Programming",
              draft-filsfils-spring-srv6-network-programming-07 (work in
              progress), February 2019.

   [I-D.ietf-idr-bgp-ls-segment-routing-ext]
              Previdi, S., Talaulikar, K., Filsfils, C., Gredler, H.,
              and M. Chen, "BGP Link-State extensions for Segment
              Routing", draft-ietf-idr-bgp-ls-segment-routing-ext-12
              (work in progress), March 2019.

   [I-D.ietf-idr-bgpls-segment-routing-epe]
              Previdi, S., Talaulikar, K., Filsfils, C., Patel, K., Ray,
              S., and J.
              Dong, "BGP-LS extensions for Segment Routing BGP Egress
              Peer Engineering",
              draft-ietf-idr-bgpls-segment-routing-epe-18 (work in
              progress), March 2019.

   [I-D.ietf-idr-segment-routing-te-policy]
              Previdi, S., Filsfils, C., Jain, D., Mattes, P., Rosen,
              E., and S. Lin, "Advertising Segment Routing Policies in
              BGP", draft-ietf-idr-segment-routing-te-policy-05 (work in
              progress), November 2018.

   [I-D.ietf-idr-te-lsp-distribution]
              Previdi, S., Talaulikar, K., Dong, J., Chen, M., Gredler,
              H., and J. Tantsura, "Distribution of Traffic Engineering
              (TE) Policies and State using BGP-LS",
              draft-ietf-idr-te-lsp-distribution-10 (work in progress),
              February 2019.

   [I-D.ietf-lsr-flex-algo]
              Psenak, P., Hegde, S., Filsfils, C., Talaulikar, K., and
              A. Gulko, "IGP Flexible Algorithm",
              draft-ietf-lsr-flex-algo-01 (work in progress), November
              2018.

   [I-D.ietf-pce-segment-routing]
              Sivabalan, S., Filsfils, C., Tantsura, J., Henderickx, W.,
              and J. Hardwick, "PCEP Extensions for Segment Routing",
              draft-ietf-pce-segment-routing-16 (work in progress),
              March 2019.

   [I-D.ietf-spring-segment-routing-mpls]
              Bashandy, A., Filsfils, C., Previdi, S., Decraene, B.,
              Litkowski, S., and R. Shakir, "Segment Routing with MPLS
              data plane", draft-ietf-spring-segment-routing-mpls-19
              (work in progress), March 2019.

   [I-D.sivabalan-pce-binding-label-sid]
              Sivabalan, S., Filsfils, C., Tantsura, J., Hardwick, J.,
              Previdi, S., and C. Li, "Carrying Binding Label/Segment-ID
              in PCE-based Networks.",
              draft-sivabalan-pce-binding-label-sid-06 (work in
              progress), February 2019.

   [RFC1195]  Callon, R., "Use of OSI IS-IS for routing in TCP/IP and
              dual environments", RFC 1195, DOI 10.17487/RFC1195,
              December 1990, <https://www.rfc-editor.org/info/rfc1195>.

   [RFC2328]  Moy, J., "OSPF Version 2", STD 54, RFC 2328,
              DOI 10.17487/RFC2328, April 1998,
              <https://www.rfc-editor.org/info/rfc2328>.

   [RFC3630]  Katz, D., Kompella, K., and D.
              Yeung, "Traffic Engineering (TE) Extensions to OSPF
              Version 2", RFC 3630, DOI 10.17487/RFC3630, September
              2003, <https://www.rfc-editor.org/info/rfc3630>.

   [RFC5305]  Li, T. and H. Smit, "IS-IS Extensions for Traffic
              Engineering", RFC 5305, DOI 10.17487/RFC5305, October
              2008, <https://www.rfc-editor.org/info/rfc5305>.

   [RFC5340]  Coltun, R., Ferguson, D., Moy, J., and A. Lindem, "OSPF
              for IPv6", RFC 5340, DOI 10.17487/RFC5340, July 2008,
              <https://www.rfc-editor.org/info/rfc5340>.

   [RFC7471]  Giacalone, S., Ward, D., Drake, J., Atlas, A., and S.
              Previdi, "OSPF Traffic Engineering (TE) Metric
              Extensions", RFC 7471, DOI 10.17487/RFC7471, March 2015,
              <https://www.rfc-editor.org/info/rfc7471>.

   [RFC7752]  Gredler, H., Ed., Medved, J., Previdi, S., Farrel, A., and
              S. Ray, "North-Bound Distribution of Link-State and
              Traffic Engineering (TE) Information Using BGP", RFC 7752,
              DOI 10.17487/RFC7752, March 2016,
              <https://www.rfc-editor.org/info/rfc7752>.

   [RFC7810]  Previdi, S., Ed., Giacalone, S., Ward, D., Drake, J., and
              Q. Wu, "IS-IS Traffic Engineering (TE) Metric Extensions",
              RFC 7810, DOI 10.17487/RFC7810, May 2016,
              <https://www.rfc-editor.org/info/rfc7810>.

   [RFC8231]  Crabbe, E., Minei, I., Medved, J., and R. Varga, "Path
              Computation Element Communication Protocol (PCEP)
              Extensions for Stateful PCE", RFC 8231,
              DOI 10.17487/RFC8231, September 2017,
              <https://www.rfc-editor.org/info/rfc8231>.

Authors' Addresses

   Clarence Filsfils
   Cisco Systems, Inc.
   Pegasus Parc
   De kleetlaan 6a, DIEGEM  BRABANT 1831
   BELGIUM

   Email: cfilsfil@cisco.com

   Ketan Talaulikar (editor)
   Cisco Systems, Inc.

   Email: ketant@cisco.com

   Przemyslaw Krol
   Google, Inc.

   Email: pkrol@google.com

   Martin Horneffer
   Deutsche Telekom

   Email: martin.horneffer@telekom.de

   Paul Mattes
   Microsoft
   One Microsoft Way
   Redmond, WA  98052-6399
   USA

   Email: pamattes@microsoft.com