SPRING Working Group                            C. Filsfils
Internet-Draft                                  S. Sivabalan
Intended status: Informational                  Cisco Systems, Inc.
Expires: November 22, 2018                      S. Hegde
                                                Juniper Networks, Inc.
                                                D. Voyer
                                                Bell Canada
                                                S. Lin
                                                A. Bogdanov
                                                P. Krol
                                                Google, Inc.
                                                M. Horneffer
                                                Deutsche Telekom
                                                D. Steinberg
                                                Steinberg Consulting
                                                B. Decraene
                                                S. Litkowski
                                                Orange Business Services
                                                P. Mattes
                                                Microsoft
                                                Z. Ali
                                                K. Talaulikar
                                                J. Liste
                                                F. Clad
                                                K. Raza
                                                Cisco Systems, Inc.
                                                May 21, 2018


         SR Policy Implementation and Deployment Considerations
         draft-filsfils-spring-sr-policy-considerations-00.txt

Abstract

   Segment Routing (SR) allows a headend node to steer a packet flow
   along any path. Intermediate per-flow states are eliminated thanks
   to source routing. The SR Policy framework enables the instantiation
   and management of the necessary state on the headend node for flows
   along source-routed paths, using an ordered list of segments
   associated with their specific SR Policies. This document describes
   some of the implementation and deployment aspects that are useful
   for operationalizing the SR Policy architecture.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts.
   The list of current Internet-Drafts is at
   https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on November 22, 2018.

Copyright Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  SR Policy Headend Architecture
   3.  Dynamic Path Computation
     3.1.  Optimization Objective
     3.2.  Constraints
     3.3.  SR Native Algorithm
     3.4.  Path to SID
   4.  Candidate Path Selection
   5.  Distributed and/or Centralized Control Plane
     5.1.  Distributed Control Plane within a single Link-State IGP
           area
     5.2.  Distributed Control Plane across several Link-State IGP
           areas
     5.3.  Centralized Control Plane
     5.4.  Distributed and Centralized Control Plane
   6.  Binding SID Aspects
     6.1.  Benefits of Binding SID
     6.2.  Centralized Discovery of available BSID
   7.  Flex-Algorithm Based SR Policies
   8.  Layer 2 and Optical Transport
   9.  Security Considerations
   10. IANA Considerations
   11. Acknowledgement
   12. References
     12.1.  Normative References
     12.2.  Informative References
   Authors' Addresses

1.  Introduction

   Segment Routing (SR) allows a headend node to steer a packet flow
   along any path. Intermediate per-flow states are eliminated with
   source routing [I-D.ietf-spring-segment-routing].

   The headend node steers a flow into a Segment Routing Policy (SR
   Policy) by augmenting packet headers with the ordered list of
   segments associated with that SR Policy.
   [I-D.filsfils-spring-segment-routing-policy] defines the SR Policy
   architecture and details the concepts of SR Policy and steering into
   an SR Policy.

   This document describes some implementation aspects of the SR Policy
   framework, which should be considered as suggestions. The same
   behavior, as defined in [I-D.filsfils-spring-segment-routing-policy],
   may in fact be realized with alternate approaches. The deployment
   aspects described in this document are likewise meant to serve only
   as guidelines.
   This document describes these aspects and other considerations
   related to SR Policy concepts because they are important to
   facilitate multi-vendor interoperable deployments for various SR
   Policy use-cases.

   These apply equally to the MPLS
   [I-D.ietf-spring-segment-routing-mpls] and SRv6
   [I-D.filsfils-spring-srv6-network-programming] instantiations of
   segment routing.

   For reading simplicity, the illustrations are provided for the MPLS
   instantiations.

2.  SR Policy Headend Architecture

          +--------+        +--------+
          |  BGP   |        |  PCEP  |
          +--------+        +--------+
                    \      /
      +--------+  +----------+  +--------+
      |        |  | SR Policy|  |        |
      |  CLI   |--|  Module  |--| NETCONF|
      |        |  |  (SRPM)  |  |        |
      +--------+  +----------+  +--------+
                       |
                   +--------+
                   |  FIB   |
                   +--------+

           Figure 1: SR Policy Architecture at a Headend

   The SR Policy functionality at a headend can be implemented in an SR
   Policy Module (SRPM) process as illustrated in Figure 1.

   The SRPM process interacts with other processes to learn candidate
   paths.

   The SRPM process selects the active path of an SR Policy.

   The SRPM process interacts with the RIB/FIB process to install an
   active SR Policy in the dataplane.

   In order to validate explicit candidate paths and compute dynamic
   candidate paths, the SRPM process maintains an SR Database (SR-DB) as
   specified in [I-D.filsfils-spring-segment-routing-policy]. The SRPM
   process interacts with other processes as shown in Figure 2 to
   collect the SR-DB information.

      +--------+    +--------+    +--------+
      | BGP SR |    | BGP-LS |    |  IGP   |
      | Policy |    +--------+    +--------+
      +--------+ \       |       /
      +--------+  +-----------+  +--------+
      |  PCEP  |--|   SRPM    |--| NETCONF|
      +--------+  +-----------+  +--------+

         Figure 2: Topology/link-state database architecture

   The SR Policy architecture supports both centralized and distributed
   control planes.

3.  Dynamic Path Computation

   A dynamic candidate path for an SR Policy is specified as an
   optimization objective and a set of constraints, and needs to be
   computed by either the headend or a Path Computation Element (PCE).
   The distributed or centralized computation aspect is described
   further in Section 5. This section describes the computation aspects
   of a dynamic path.

3.1.  Optimization Objective

   This document describes two optimization objectives:

   o  Min-Metric - requests computation of a solution SID-List optimized
      for a selected metric.

   o  Min-Metric with margin and maximum number of SIDs - Min-Metric
      with two changes: a margin by which two paths with similar
      metrics would be considered equal, and a constraint on the maximum
      number of SIDs in the SID-List.

   The "Min-Metric" optimization objective requests computation of a
   solution SID-List such that packets flowing through the solution
   SID-List use ECMP-aware paths optimized for the selected metric. The
   "Min-Metric" objective can be instantiated for the IGP metric
   ([RFC1195] [RFC2328] [RFC5340]) xor the TE metric ([RFC5305]
   [RFC3630]) xor the latency extended TE metric ([RFC7810] [RFC7471]).
   This metric is called the O metric (the optimized metric) to
   distinguish it from the IGP metric. The solution SID-List must be
   computed to minimize the number of SIDs and the number of SID-Lists.

   If the selected O metric is the IGP metric and the headend and
   tailend are in the same IGP domain, then the solution SID-List is
   made of the single prefix-SID of the tailend.

   When the selected O metric is not the IGP metric, then the solution
   SID-List is made of prefix SIDs of intermediate nodes, Adjacency SIDs
   along intermediate links and potentially Binding SIDs (BSIDs) of
   intermediate policies.

   In many deployments there are insignificant metric differences
   between mostly equal paths (e.g. a difference of 100 usec of latency
   between two paths from NYC to SFO would not matter in most cases).
   The "Min-Metric with margin" objective supports such a requirement.

   The "Min-Metric with margin and maximum number of SIDs" optimization
   objective requests computation of a solution SID-List such that
   packets flowing through the solution SID-List do not use a path whose
   cumulative O metric is larger than the shortest-path O metric +
   margin.

   If this is not possible because of the number of SIDs constraint,
   then the solution SID-List minimizes the O metric while meeting the
   maximum number of SIDs constraint (i.e. the path with the least value
   of O metric while using <= the number of SIDs specified).

3.2.  Constraints

   The following constraints can be described:

   o  Inclusion and/or exclusion of TE affinity.

   o  Inclusion and/or exclusion of IP address.

   o  Inclusion and/or exclusion of SRLG.

   o  Inclusion and/or exclusion of admin-tag.

   o  Maximum accumulated metric (IGP, TE and latency).

   o  Maximum number of SIDs in the solution SID-List.

   o  Maximum number of weighted SID-Lists in the solution set.

   o  Diversity to another service instance (e.g., link, node, or SRLG
      disjoint paths originating from different head-ends).

3.3.  SR Native Algorithm

   1----------------2----------------3
   |\                               /
   | \                             /
   |  4-------------5-------------7
   |   \                         /|
   |    +-----------6-----------+ |
   8------------------------------9

       Figure 3: Illustration used to describe SR native algorithm

   Let us assume that all the links have the same IGP metric of 10 and
   let us consider the dynamic path defined as: Min-Metric(from 1, to 3,
   IGP metric, margin 0) with constraint "avoid link 2-to-3".

   A classical circuit implementation would do: prune the graph, compute
   the shortest-path, pick a single non-ECMP branch of the ECMP-aware
   shortest-path and encode it as a SID-List.
   The solution SID-List would be <4, 5, 7, 3>.

   An SR-native algorithm would find a SID-List that minimizes the
   number of SIDs and maximizes the use of all the ECMP branches along
   the ECMP shortest path. In this illustration, the solution SID-List
   would be <7, 3>.

   In the vast majority of SR use-cases, SR-native algorithms should be
   preferred: they preserve the native ECMP of IP and they minimize the
   dataplane header overhead.

   In some specific use-cases (e.g. TDM migration over IP where the
   circuit notion prevails), one may prefer a classic circuit
   computation followed by an encoding into SIDs (potentially only using
   non-protected Adj SIDs that pin the path to specific links and avoid
   ECMP to reflect the TDM paradigm).

   SR-native algorithms are a local node behavior and are thus outside
   the scope of this document.

3.4.  Path to SID

   Let us assume the below diagram where all the links have an IGP
   metric of 10 and a TE metric of 10 except the link AB which has an
   IGP metric of 20 and the link AD which has a TE metric of 100. Let
   us consider the min-metric(from A, to D, TE metric, margin 0).

   B---C
   |   |
   A---D

     Figure 4: Illustration used to describe path to SID conversion

   The solution path to this problem is ABCD.

   This path can be expressed in SIDs as <B, D> where B and D are the
   IGP prefix SIDs respectively associated with nodes B and D in the
   diagram.

   Indeed, from A, the IGP path to B is AB (IGP metric 20 better than
   ADCB of IGP metric 30). From B, the IGP path to D is BCD (IGP metric
   20 better than BAD of IGP metric 30).

   While the details of the algorithm remain a local node behavior, a
   high-level description follows: start at the headend and find an IGP
   prefix SID that leads as far down the desired path as possible
   (without using any link not included in the desired path).
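
   The greedy conversion outlined in this section can be sketched in a
   few lines of Python. This is only an illustration under simplifying
   assumptions, not the algorithm of any implementation: the function
   names, the graph encoding, and the rule that a prefix SID is usable
   only when the IGP shortest path to its node is unique and equal to
   the desired sub-path are all choices made for this sketch.

```python
import heapq

def shortest(graph, src):
    """Dijkstra returning IGP distances and equal-cost shortest-path counts."""
    dist, count, done = {src: 0}, {src: 1}, set()
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v, metric in graph[u].items():
            nd = d + metric
            if v not in dist or nd < dist[v]:
                dist[v], count[v] = nd, count[u]
                heapq.heappush(heap, (nd, v))
            elif nd == dist[v]:
                count[v] += count[u]
    return dist, count

def path_to_sids(graph, path):
    """Encode `path` (a list of nodes) as prefix SIDs, falling back to Adj SIDs."""
    sids, i = [], 0
    while i < len(path) - 1:
        dist, count = shortest(graph, path[i])
        cost, reach = 0, None
        for j in range(i + 1, len(path)):
            cost += graph[path[j - 1]][path[j]]
            # Usable only if the IGP reaches path[j] solely over the desired links.
            if dist[path[j]] != cost or count[path[j]] != 1:
                break
            reach = j
        if reach is None:
            sids.append(("adj", path[i], path[i + 1]))  # no prefix SID fits
            i += 1
        else:
            sids.append(("prefix", path[reach]))
            i = reach                                   # restart from the node reached
    return sids

# IGP metrics of Figure 4: every link is 10 except A-B, which is 20.
igp = {
    "A": {"B": 20, "D": 10},
    "B": {"A": 20, "C": 10},
    "C": {"B": 10, "D": 10},
    "D": {"A": 10, "C": 10},
}
print(path_to_sids(igp, ["A", "B", "C", "D"]))
```

   With the metrics of Figure 4, the sketch encodes the path ABCD as the
   prefix SID of B followed by the prefix SID of D.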

   If no prefix SID exists, use the Adj SID to the first neighbor along
   the path. Restart from the node that was reached.

4.  Candidate Path Selection

   An SR Policy may have multiple candidate paths that are provisioned
   or signaled [I-D.ietf-idr-segment-routing-te-policy]
   [I-D.ietf-pce-segment-routing] from one or more sources. The
   tie-breaker rules defined in
   [I-D.filsfils-spring-segment-routing-policy] formally determine a
   single "active path".

   This section describes some examples of candidate path selection
   based on the same rules.

   Example 1:

   Consider headend H where two candidate paths of the same SR Policy
   are signaled via BGP [I-D.ietf-idr-segment-routing-te-policy] and
   whose respective NLRIs have the same route distinguishers:

      NLRI A with distinguisher = RD1, color = C, endpoint = N,
      preference P1.

      NLRI B with distinguisher = RD1, color = C, endpoint = N,
      preference P2.

   o  Because the NLRIs are identical (same distinguisher), BGP will
      perform bestpath selection. Note that there are no changes to the
      BGP best path selection algorithm.

   o  H installs one advertisement as bestpath into the BGP table.

   o  A single advertisement is passed to the SR Policy instantiation
      process.

   o  The SRPM process does not perform any path selection.

   Note that the candidate path's preference value does not have any
   effect on the BGP bestpath selection process.

   Example 2:

   Consider headend H where two candidate paths of the same SR Policy
   are signaled via BGP and whose respective NLRIs have different route
   distinguishers:

      NLRI A with distinguisher = RD1, color = C, endpoint = N,
      preference P1.

      NLRI B with distinguisher = RD2, color = C, endpoint = N,
      preference P2.

   o  Because the NLRIs are different (different distinguisher), BGP
      will not perform bestpath selection.

   o  H installs both advertisements into the BGP table.

   o  Both advertisements are passed to the SR Policy instantiation
      process.

   o  The SRPM process at H selects the candidate path advertised by
      NLRI B as the active path for the SR policy since P2 is greater
      than P1.

   Note that the recommended approach is to use NLRIs with different
   distinguishers when several candidate paths for the same SR Policy
   (endpoint, color) are signaled via BGP to a headend.

   Example 3:

   Consider that a headend H learns two candidate paths of the same SR
   Policy, one signaled via BGP and another via local configuration.

      NLRI A with distinguisher = RD1, color = C, endpoint = N,
      preference P1.

      Local "foo" with color = C, endpoint = N, preference P2.

   o  H installs NLRI A into the BGP table.

   o  NLRI A and "foo" are both passed to the SRPM process.

   o  The SRPM process at H selects the candidate path indicated by
      "foo" as the active path for the SR policy since P2 is greater
      than P1.

   Now let us consider cases where an SR Policy has multiple valid
   candidate paths with the same best preference. The SRPM process at a
   headend then uses the rules described in
   [I-D.filsfils-spring-segment-routing-policy] section 2.9 to select
   the active path. This is explained in the following examples:

   Example 4:

   Consider headend H with two candidate paths of the same SR Policy
   and the same preference value received from the same controller R
   and where RD2 is higher than RD1.

   o  NLRI A with distinguisher RD1, color C, endpoint N, preference P1
      (selected as active path at time t0).

   o  NLRI B with distinguisher RD2 (RD2 is greater than RD1), color C,
      endpoint N, preference P1 (passed to SR Policy instantiation
      process at time t1 > t0).

   After t1, the SRPM process at H selects the candidate path associated
   with NLRI B as active path of the SR policy since RD2 is higher than
   RD1.
   Here the time when the headend receives the candidate path via BGP is
   not a factor in the selection.

   Note that, in such a scenario where there are redundant sessions to
   the same controller, the recommended approach is to use the same RD
   value for conveying the same candidate paths and let the BGP best
   path algorithm pick the best path.

   Example 5:

   Consider headend H with two candidate paths of the same SR Policy
   and the same preference value both received from the same controller
   R and where RD2 is higher than RD1.

   Consider also that headend H is configured to override the
   discriminator tiebreaker specified in
   [I-D.filsfils-spring-segment-routing-policy] section 2.9.

   o  NLRI A with distinguisher RD1, color C, endpoint N, preference P1
      (selected as active path at time t0).

   o  NLRI B with distinguisher RD2, color C, endpoint N, preference P1
      (passed to SR Policy instantiation process at time t1).

   Even after t1, the SRPM process at H retains the candidate path
   associated with NLRI A as active path of the SR policy since the
   discriminator tiebreaker is disabled at H.

   Example 6:

   Consider headend H with two candidate paths of the same SR Policy
   and the same preference value.

   o  Local "foo" with color C, endpoint N, preference P1 (selected as
      active path at time t0).

   o  NLRI A with distinguisher RD1, color C, endpoint N, preference P1
      (passed to SRPM process at time t1).

   Even after t1, the SRPM process at H retains the candidate path
   associated with local candidate path "foo" as active path of the SR
   policy since the Local protocol is preferred over BGP by default
   based on its higher protocol identifier value.
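
   The selection behavior shown in Examples 2 through 6 can be condensed
   into a short Python sketch. It is not a rendering of the normative
   tiebreaker rules of [I-D.filsfils-spring-segment-routing-policy]: the
   class and function names, the numeric protocol-origin values, the
   integer discriminators and the position of the "keep the current
   path" override are assumptions made for illustration, and the
   originator comparison is left out entirely.

```python
from dataclasses import dataclass

# Assumed numeric protocol-origin values; only their relative order matters here.
PROTOCOL_ORIGIN = {"BGP": 20, "LOCAL": 30}

@dataclass(frozen=True)
class CandidatePath:
    name: str
    protocol: str       # key into PROTOCOL_ORIGIN
    discriminator: int  # stands in for the BGP NLRI distinguisher
    preference: int

def select_active(paths, current=None, keep_current_on_tie=False):
    """Return the active path among the valid candidate paths of one SR Policy."""
    best_pref = max(p.preference for p in paths)
    tied = [p for p in paths if p.preference == best_pref]
    best_origin = max(PROTOCOL_ORIGIN[p.protocol] for p in tied)
    tied = [p for p in tied if PROTOCOL_ORIGIN[p.protocol] == best_origin]
    if keep_current_on_tie and current in tied:
        return current  # Example 5: discriminator tiebreaker overridden
    return max(tied, key=lambda p: p.discriminator)

nlri_a = CandidatePath("NLRI A", "BGP", discriminator=1, preference=100)
nlri_b = CandidatePath("NLRI B", "BGP", discriminator=2, preference=100)
foo = CandidatePath("foo", "LOCAL", discriminator=1, preference=100)

print(select_active([nlri_a, nlri_b]).name)                # Example 4
print(select_active([nlri_a, nlri_b], nlri_a, True).name)  # Example 5
print(select_active([foo, nlri_a], current=foo).name)      # Example 6
```

   The three calls print "NLRI B", "NLRI A" and "foo", matching the
   outcomes of Examples 4, 5 and 6 respectively.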

   Example 7:

   Consider headend H with two candidate paths of the same SR Policy
   and the same preference value but received via NETCONF from two
   controllers R and S (where S > R).

   o  Path A from R with distinguisher D1, color C, endpoint N,
      preference P1 (selected as active path at time t0).

   o  Path B from S with distinguisher D2, color C, endpoint N,
      preference P1 (passed to SRPM process at time t1).

   Note that the NETCONF process sends both paths to the SRPM process
   since it does not have any tiebreaker logic. After t1, the SRPM
   process at H selects the candidate path associated with Path B as
   active path of the SR policy.

5.  Distributed and/or Centralized Control Plane

5.1.  Distributed Control Plane within a single Link-State IGP area

   Consider a single-area IGP with per-link latency measurement and
   advertisement of the measured latency in the extended-TE IGP TLV.

   A head-end H is configured with a single dynamic candidate path for
   SR policy P with a low-latency optimization objective and endpoint E.

   Clearly the SRPM process at H learns the topology (and extended TE
   latency information) from the IGP and computes the solution SID list
   providing the low-latency path to E.

   No centralized controller is involved in such a deployment.

   The SR-DB at H only uses the Link-State DataBase (LSDB) provided by
   the IGP.

5.2.  Distributed Control Plane across several Link-State IGP areas

   Consider a domain D composed of two link-state IGP single-area
   instances (I1 and I2) where each sub-domain benefits from per-link
   latency measurement and advertisement of the measured latency in the
   related IGP. The link-state information of each IGP is advertised
   via BGP-LS [RFC7752] towards a set of BGP-LS route reflectors (RR).
   H is a headend in IGP I1 sub-domain and E is an endpoint in IGP I2
   sub-domain.
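
   As a rough illustration of the computation a headend performs over
   its SR-DB, the Python sketch below merges per-instance link-state
   views and runs a plain Dijkstra search on link latency. Everything
   except the names H, E, I1 and I2 is hypothetical: the intermediate
   nodes X, Y, Z, the border node B assumed to sit in both instances,
   the latency values and the function names are invented for this
   example. The sketch stops at the node path and does not encode it
   as a SID list.

```python
import heapq

def merge_lsdbs(*lsdbs):
    """Union several {node: {neighbor: latency_usec}} views into one SR-DB."""
    srdb = {}
    for lsdb in lsdbs:
        for node, links in lsdb.items():
            srdb.setdefault(node, {}).update(links)
    return srdb

def min_latency_path(srdb, src, dst):
    """Return (latency, node path) of the lowest-latency path, or None."""
    heap, seen = [(0, src, [src])], set()
    while heap:
        latency, node, path = heapq.heappop(heap)
        if node == dst:
            return latency, path
        if node in seen:
            continue
        seen.add(node)
        for nbr, link_latency in srdb.get(node, {}).items():
            if nbr not in seen:
                heapq.heappush(heap, (latency + link_latency, nbr, path + [nbr]))
    return None  # the endpoint is not in this SR-DB

# Hypothetical link-state views of the two IGP instances (latency in usec).
i1 = {"H": {"X": 100, "Y": 300}, "X": {"H": 100, "B": 100},
      "Y": {"H": 300, "B": 50}, "B": {"X": 100, "Y": 50}}
i2 = {"B": {"Z": 50, "E": 200}, "Z": {"B": 50, "E": 100},
      "E": {"B": 200, "Z": 100}}

print(min_latency_path(i1, "H", "E"))                   # SR-DB limited to I1
print(min_latency_path(merge_lsdbs(i1, i2), "H", "E"))  # SR-DB spanning I1 and I2
```

   With only the I1 view the search returns None, since E is unknown to
   the headend. With both views merged it returns a latency of 350 usec
   over the node path H, X, B, Z, E.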

   Using a BGP-LS session to any BGP-LS RR, H's SRPM process may learn
   the link-state information of the remote domain I2. H can thus
   compute the low-latency path from H to E as a solution SID list that
   spans the two domains I1 and I2.

   The SR-DB at H collects the LSDB from both sub-domains (I1 and I2).

   No centralized controller is required.

5.3.  Centralized Control Plane

   Considering the same domain D as in the previous section, let us now
   assume that H does not have a BGP-LS session to the BGP-LS RRs.
   Instead, let us assume a controller "C" has at least one BGP-LS
   session to the BGP-LS RRs.

   The controller C learns the topology and extended latency information
   from both sub-domains via BGP-LS. It computes a low-latency path
   from H to E as a SID list and programs H with the related explicit
   candidate path.

   The headend H does not compute the solution SID list (it cannot).
   The headend only validates the received explicit candidate path.
   Most probably, the controller encodes the SIDs of the SID-List with
   Type-1. In that case, the headend's validation simply consists in
   resolving the first SID on an outgoing interface and next-hop.

   The SR-DB at H only includes the LSDB provided by the IGP I1.

   The SR-DB of the controller collects the LSDB from both sub-domains
   (I1 and I2).

5.4.  Distributed and Centralized Control Plane

   Consider the same domain D as in the previous section.

   H's SRPM process is configured to associate color C1 with a
   low-latency optimization objective.

   H's BGP process is configured to steer a Route R/r of extended-color
   community C1 and of next-hop N via an SR policy (N, C1).

   Upon receiving a first BGP route of color C1 and of next-hop N, H
   recognizes the need for an SR Policy (N, C1) with a low-latency
   objective to N. As N is outside the SR-DB of H, H requests a
   controller to compute such a SID list (e.g., via PCEP
   [I-D.ietf-pce-segment-routing]).

   This is an example of a hybrid control-plane: the BGP distributed
   control plane signals the routes and their TE requirements. Upon
   receiving these BGP routes, a local headend either computes the
   solution SID list itself (entirely distributed, when the endpoint is
   in the SR-DB of the headend) or delegates the computation to a
   controller (hybrid distributed/centralized control-plane).

   The SR-DB at H only includes the LSDB provided by the IGP.

   The SR-DB of the controller collects the LSDB from both sub-domains.

6.  Binding SID Aspects

   The Binding SID (BSID) is fundamental to Segment Routing. It
   provides scaling, network opacity and service independence.

   This section describes implementation and operational aspects related
   to the Binding SID.

6.1.  Benefits of Binding SID

   A simplified illustration is provided on the basis of Figure 5 where
   it is assumed that S, A, B, Data Center Interconnect DCI1 and DCI2
   share the same IGP-SR instance in the data-center 1 (DC1). DCI1,
   DCI2, C, D, E, F, G, DCI3 and DCI4 share the same IGP-SR domain in
   the core. DCI3, DCI4, H, K and Z share the same IGP-SR domain in the
   data-center 2 (DC2).

        A---DCI1----C----D----E----DCI3---H
       /            |         |            \
      S             |         |             Z
       \            |         |            /
        B---DCI2----F---------G----DCI4---K
      <==DC1==><=========Core========><==DC2==>

                Figure 5: A Simple Datacenter Topology

   In this example, it is assumed that there is no redistribution
   between the IGPs and no BGP-LU. The inter-domain communication is
   only provided by SR through SR Policies.

   The latency from S to DCI1 equals the latency from S to DCI2. The
   latency from Z to DCI3 equals the latency from Z to DCI4. All the
   intra-DC links have the same IGP metric of 10.

   The path DCI1, C, D, E, DCI3 has a lower latency and lower capacity
   than the path DCI2, F, G, DCI4.

   The IGP metrics of all the core links are set to 10 except the link
   D-E, which is set to 100.

   A low-latency multi-domain policy from S to Z may be expressed as
   <DCI1, BSID, Z> where:

   o  DCI1 is the prefix SID of DCI1.

   o  BSID is the Binding SID bound to an SR policy instantiated at
      DCI1.

   o  Z is the prefix SID of Z.

   Without the use of an intermediate core SR Policy (efficiently
   summarized by a single BSID), S would need to steer its low-latency
   flow into a policy whose SID list explicitly encodes the core
   segments between DCI1 and DCI3, in addition to DCI1 and Z.

   The use of a BSID (and the intermediate bound SR Policy) decreases
   the number of segments imposed by the source.

   A BSID acts as a stable anchor point which isolates one domain from
   the churn of another domain. Upon topology changes within the core
   of the network, the low-latency path from DCI1 to DCI3 may change.
   While the path of an intermediate policy changes, its BSID does not
   change. Hence the policy used by the source does not change, and
   the source is shielded from the churn in another domain.

   A BSID provides opacity and independence between domains. The
   administrative authority of the core domain may not want to share
   information about its topology. The use of a BSID allows keeping the
   service opaque. S is not aware of the details of how the low-latency
   service is provided by the core domain. S is not aware of the need
   of the core authority to temporarily change the intermediate path.

6.2.  Centralized Discovery of available BSID

   This section explains how controllers can discover the local SIDs
   available at a node N so as to pick an explicit BSID for an SR Policy
   to be instantiated at headend N.

   Any controller can discover the following properties of a node N
   (e.g., via BGP-LS, NETCONF, etc.):

   o  its local topology [RFC7752].

   o  its topology-related SIDs (Prefix SIDs, Adj SID and EPE SID
      [I-D.ietf-idr-bgp-ls-segment-routing-ext]
      [I-D.ietf-idr-bgpls-segment-routing-epe]).

   o  its Segment Routing Label Block (SRLB).

   o  its SR Policies and their BSID ([I-D.ietf-pce-segment-routing]
      [I-D.sivabalan-pce-binding-label-sid]
      [I-D.ietf-idr-te-lsp-distribution]).

   Any controller can thus infer the available SIDs in the SRLB of any
   node.

   As an example, a controller discovers the following characteristics
   of N: SRLB (4000, 8000), 3 Adj SIDs (4001, 4002, 4003), 2 EPE SIDs
   (4004, 4005) and 3 SRTE policies (whose BSIDs are respectively 4006,
   4007 and 4008). This controller can deduce that the SRLB sub-range
   (4009, 5000) is free for allocation.

   A controller is not restricted to use the next numerically available
   SID in the available SRLB sub-range. It can pick any label in the
   subset of available labels. Such a random pick makes a collision
   unlikely.

   An operator could also sub-allocate the SRLB between different
   controllers (e.g. (4000-4499) to controller 1 and (4500-5000) to
   controller 2).

   Inter-controller state-synchronization may be used to avoid/detect
   collisions in BSID.

   All these techniques make a collision between different controllers
   very unlikely.

   In the unlikely case of a collision, the controllers will detect it
   through system alerts, BGP-LS reporting using
   [I-D.ietf-idr-te-lsp-distribution] or PCEP notification [RFC8231].
   They then have the choice to continue the operation of their SR
   Policy with the dynamically allocated BSID or re-try with another
   explicit pick.

   Note: in deployments where PCE Protocol (PCEP) is used between
   head-end and controller (PCE), a head-end can report the BSID as
   well as policy attributes (e.g., type of disjointness) and
   operational and administrative states to the controller. Similarly,
   a controller can also assign/update the BSID of a policy via PCEP
   when instantiating or updating an SR Policy.

7.  Flex-Algorithm Based SR Policies

   SR allows for association of algorithms to Prefix SIDs
   [I-D.ietf-spring-segment-routing]. [I-D.ietf-lsr-flex-algo] defines
   the IGP based Flex-Algorithm solution which allows IGPs themselves to
   compute constraint based paths over the network. Prefix SIDs for the
   specific flex-algorithm and associated with a node are used in the
   forwarding plane to steer along the specific constraint path to that
   node.

   As specified in [I-D.ietf-spring-segment-routing] these IGP Flex Algo
   Prefix SIDs can be used as segments within SR Policies thereby
   leveraging the underlying IGP Flex Algo solution.

   1--RED--2-------6
   |       |       |
   4-------3--RED--9

              Figure 6: Illustration for Flex-Alg SID

   Now let us assume that

   o  1, 2, 3 and 4 are part of IGP 1.

   o  2, 6, 9 and 3 are part of IGP 2.

   o  All the IGP link costs are 10.

   o  Links 1to2 and 3to9 are colored with IGP Link Affinity Red.

   o  Flex-Alg1 is defined in both IGPs as: avoid red, minimize IGP
      metric.

   o  All nodes of each IGP domain are enabled for Flex-Alg1.

   o  SID(k, 0) represents the PrefixSID of node k according to Alg=0.

   o  SID(k, FlexAlg1) represents the PrefixSID of node k according to
      Flex-Alg1.

   A controller can steer a flow from 1 to 9 through an end-to-end path
   that avoids the RED links of both IGP domains thanks to an explicit
   SR Policy built from Flex-Alg1 Prefix SIDs, i.e. segments of the
   form SID(k, FlexAlg1).

8.  Layer 2 and Optical Transport

             1----2----3----4----5
   I2(lambda L241)\         / I4(lambda L241)
                    Optical

               Figure 7: SR Policy with integrated DWDM

   An explicit candidate path can express a path through a transport
   layer beneath IP. The transport layer could be ATM, FR, DWDM,
   back-to-back Ethernet, etc. The transport path is modelled as a link
   between two IP nodes with the specific assumption that no
   distributed IP routing protocol runs over the link. The link may
   have an IP address or be IP unnumbered.
   Depending on the transport protocol, the link can be a physical DWDM
   interface and a lambda (integrated solution), an Ethernet interface
   and a VLAN, an ATM interface with a VPI/VCI, a FR interface with a
   DLCI etc.

   Using the DWDM integrated use-case of Figure 7 as an illustration,
   let us assume

   o  Nodes 1, 2, 3, 4 and 5 are IP routers running an SR-enabled IGP on
      the links 1-2, 2-3, 3-4 and 4-5.

   o  The SRGB is homogeneous (16000, 24000).

   o  Node K's prefix SID is 16000+K.

   o  Node 2 has an integrated DWDM interface I2 with Lambda L1.

   o  Node 4 has an integrated DWDM interface I4 with Lambda L2.

   o  The optical network is provisioned with a circuit from 2 to 4 with
      continuous lambda L241 (details outside the scope of this
      document).

   o  Node 2 is provisioned with an SR policy with SID list <I2(L241)>
      and Binding SID B where I2(L241) is of type 5 (IPv4) or type 7
      (IPv6), see section 4.

   o  Node 1 steers a packet P1 towards the prefix SID of node 5
      (16005).

   o  Node 1 steers a packet P2 on the SR policy <16002, B, 16005>.

   In such a case, the journey of P1 will be 1-2-3-4-5 while the journey
   of P2 will be 1-2-lambda(L241)-4-5.  P2 skips the IP hop 3 and
   leverages the DWDM circuit from node 2 to node 4.  P1 follows the
   shortest path computed by the distributed routing protocol.  The path
   of P1 is unaltered by the addition, modification or deletion of
   optical bypass circuits.

   The salient point of this example is that the SR Policy architecture
   seamlessly supports explicit candidate paths through any transport
   sub-layer.

   BGP-LS Extensions to describe the sub-IP-layer characteristics of the
   SR Policy are out of scope of this document (e.g. in Figure 7, the
   DWDM characteristics of the SR Policy at node 2 in terms of latency,
   loss, security, domain/country traversed by the circuit etc.).
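
   The journeys of P1 and P2 in the Figure 7 example can be reproduced
   with a minimal sketch such as the one below.  It is an illustration
   only and not an implementation of any SR data plane.  It assumes the
   linear IP topology 1-2-3-4-5, models the optical circuit as a single
   hop from node 2 to node 4, and uses hypothetical names (journey,
   next_hop, BSID_TABLE) as well as a hypothetical numeric value for the
   Binding SID B, none of which come from this document.

```python
# Illustrative model of the Figure 7 example; all names are hypothetical.
SRGB_BASE = 16000   # homogeneous SRGB (16000, 24000); prefix SID = 16000+K
BSID_B = 4242       # assumed value standing in for Binding SID "B"

# Node 2's SR Policy: Binding SID B -> SID list <I2(L241)>, modelled here
# as "cross the optical circuit and arrive at node 4".
BSID_TABLE = {2: {BSID_B: ("lambda(L241)", 4)}}


def next_hop(node, dest):
    """Shortest-path next hop on the linear IP topology 1-2-3-4-5."""
    return node + 1 if dest > node else node - 1


def journey(start, sid_list):
    """Return the hops visited by a packet carrying sid_list from start."""
    path, node, stack = [str(start)], start, list(sid_list)
    while stack:
        top = stack[0]
        binding = BSID_TABLE.get(node, {}).get(top)
        if binding is not None:
            # Binding SID owned by this node: pop it, follow the policy.
            stack.pop(0)
            via, node = binding
            path += [via, str(node)]
            continue
        dest = top - SRGB_BASE          # node that owns this prefix SID
        if dest == node:
            stack.pop(0)                # segment completed at this node
            continue
        node = next_hop(node, dest)     # plain IGP shortest-path forwarding
        path.append(str(node))
    return "-".join(path)


if __name__ == "__main__":
    print("P1:", journey(1, [16005]))
    print("P2:", journey(1, [16002, BSID_B, 16005]))
```

   Under these assumptions, P1 follows the IGP shortest path through
   node 3, while P2 has its Binding SID resolved at node 2 and reaches
   node 4 over the lambda without visiting node 3.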

   Further details of the SR Policy use-case for Packet Optical networks
   are specified in [I-D.anand-spring-poi-sr].

9.  Security Considerations

   The security considerations related to the Segment Routing
   architecture are described in [I-D.ietf-spring-segment-routing] and
   those related to the SR Policy architecture are described in
   [I-D.filsfils-spring-segment-routing-policy].  They apply to this
   document as well.

10.  IANA Considerations

   This document has no actions for IANA.

11.  Acknowledgement

   The authors would like to thank Tarek Saad, Dhanendra Jain and
   Muhammad Durrani for their valuable comments and suggestions.

12.  References

12.1.  Normative References

   [I-D.filsfils-spring-segment-routing-policy]
              Filsfils, C., Sivabalan, S., Raza, K., Liste, J., Clad,
              F., Talaulikar, K., Ali, Z., Hegde, S., Voyer, D., Lin,
              S., Bogdanov, A., Krol, P., Horneffer, M., Steinberg, D.,
              Decraene, B., Litkowski, S., and P. Mattes, "Segment
              Routing Policy for Traffic Engineering",
              draft-filsfils-spring-segment-routing-policy-05 (work in
              progress), February 2018.

   [I-D.ietf-spring-segment-routing]
              Filsfils, C., Previdi, S., Ginsberg, L., Decraene, B.,
              Litkowski, S., and R. Shakir, "Segment Routing
              Architecture", draft-ietf-spring-segment-routing-15 (work
              in progress), January 2018.

12.2.  Informative References

   [I-D.anand-spring-poi-sr]
              Anand, M., Bardhan, S., Subrahmaniam, R., Tantsura, J.,
              Mukhopadhyaya, U., and C. Filsfils, "Packet-Optical
              Integration in Segment Routing",
              draft-anand-spring-poi-sr-05 (work in progress), February
              2018.

   [I-D.filsfils-spring-srv6-network-programming]
              Filsfils, C., Li, Z., Leddy, J., Voyer, D., Bernier, D.,
              Steinberg, D., Raszuk, R., Matsushima, S., Lebrun, D.,
              Decraene, B., Peirens, B., Salsano, S., Naik, G.,
              Elmalky, H., Jonnalagadda, P., and M.
              Sharif, "SRv6 Network Programming",
              draft-filsfils-spring-srv6-network-programming-04 (work in
              progress), March 2018.

   [I-D.ietf-idr-bgp-ls-segment-routing-ext]
              Previdi, S., Talaulikar, K., Filsfils, C., Gredler, H.,
              and M. Chen, "BGP Link-State extensions for Segment
              Routing", draft-ietf-idr-bgp-ls-segment-routing-ext-07
              (work in progress), May 2018.

   [I-D.ietf-idr-bgpls-segment-routing-epe]
              Previdi, S., Filsfils, C., Patel, K., Ray, S., and J.
              Dong, "BGP-LS extensions for Segment Routing BGP Egress
              Peer Engineering",
              draft-ietf-idr-bgpls-segment-routing-epe-15 (work in
              progress), March 2018.

   [I-D.ietf-idr-segment-routing-te-policy]
              Previdi, S., Filsfils, C., Jain, D., Mattes, P., Rosen,
              E., and S. Lin, "Advertising Segment Routing Policies in
              BGP", draft-ietf-idr-segment-routing-te-policy-03 (work in
              progress), May 2018.

   [I-D.ietf-idr-te-lsp-distribution]
              Previdi, S., Dong, J., Chen, M., Gredler, H., and J.
              Tantsura, "Distribution of Traffic Engineering (TE)
              Policies and State using BGP-LS",
              draft-ietf-idr-te-lsp-distribution-08 (work in progress),
              December 2017.

   [I-D.ietf-lsr-flex-algo]
              Psenak, P., Hegde, S., Filsfils, C., Talaulikar, K., and
              A. Gulko, "IGP Flexible Algorithm",
              draft-ietf-lsr-flex-algo-00 (work in progress), May 2018.

   [I-D.ietf-pce-segment-routing]
              Sivabalan, S., Filsfils, C., Tantsura, J., Henderickx, W.,
              and J. Hardwick, "PCEP Extensions for Segment Routing",
              draft-ietf-pce-segment-routing-11 (work in progress),
              November 2017.

   [I-D.ietf-spring-segment-routing-mpls]
              Bashandy, A., Filsfils, C., Previdi, S., Decraene, B.,
              Litkowski, S., and R. Shakir, "Segment Routing with MPLS
              data plane", draft-ietf-spring-segment-routing-mpls-13
              (work in progress), April 2018.

   [I-D.sivabalan-pce-binding-label-sid]
              Sivabalan, S., Tantsura, J., Filsfils, C., Previdi, S.,
              Hardwick, J., and D.
              Dhody, "Carrying Binding Label/Segment-ID in PCE-based
              Networks.", draft-sivabalan-pce-binding-label-sid-04 (work
              in progress), March 2018.

   [RFC1195]  Callon, R., "Use of OSI IS-IS for routing in TCP/IP and
              dual environments", RFC 1195, DOI 10.17487/RFC1195,
              December 1990.

   [RFC2328]  Moy, J., "OSPF Version 2", STD 54, RFC 2328,
              DOI 10.17487/RFC2328, April 1998.

   [RFC3630]  Katz, D., Kompella, K., and D. Yeung, "Traffic Engineering
              (TE) Extensions to OSPF Version 2", RFC 3630,
              DOI 10.17487/RFC3630, September 2003.

   [RFC5305]  Li, T. and H. Smit, "IS-IS Extensions for Traffic
              Engineering", RFC 5305, DOI 10.17487/RFC5305, October
              2008.

   [RFC5340]  Coltun, R., Ferguson, D., Moy, J., and A. Lindem, "OSPF
              for IPv6", RFC 5340, DOI 10.17487/RFC5340, July 2008.

   [RFC7471]  Giacalone, S., Ward, D., Drake, J., Atlas, A., and S.
              Previdi, "OSPF Traffic Engineering (TE) Metric
              Extensions", RFC 7471, DOI 10.17487/RFC7471, March 2015.

   [RFC7752]  Gredler, H., Ed., Medved, J., Previdi, S., Farrel, A., and
              S. Ray, "North-Bound Distribution of Link-State and
              Traffic Engineering (TE) Information Using BGP", RFC 7752,
              DOI 10.17487/RFC7752, March 2016.

   [RFC7810]  Previdi, S., Ed., Giacalone, S., Ward, D., Drake, J., and
              Q. Wu, "IS-IS Traffic Engineering (TE) Metric Extensions",
              RFC 7810, DOI 10.17487/RFC7810, May 2016.

   [RFC8231]  Crabbe, E., Minei, I., Medved, J., and R. Varga, "Path
              Computation Element Communication Protocol (PCEP)
              Extensions for Stateful PCE", RFC 8231,
              DOI 10.17487/RFC8231, September 2017.

Authors' Addresses

   Clarence Filsfils
   Cisco Systems, Inc.
   Pegasus Parc
   De kleetlaan 6a, DIEGEM BRABANT 1831
   BELGIUM

   Email: cfilsfil@cisco.com


   Siva Sivabalan
   Cisco Systems, Inc.
   2000 Innovation Drive
   Kanata, Ontario K2K 3E8
   Canada

   Email: msiva@cisco.com


   Shraddha Hegde
   Juniper Networks, Inc.
   Embassy Business Park
   Bangalore, KA 560093
   India

   Email: shraddha@juniper.net


   Daniel Voyer
   Bell Canada.
   671 de la gauchetiere W
   Montreal, Quebec H3B 2M8
   Canada

   Email: daniel.voyer@bell.ca


   Steven Lin
   Google, Inc.

   Email: stevenlin@google.com


   Alex Bogdanov
   Google, Inc.

   Email: bogdanov@google.com


   Przemyslaw Krol
   Google, Inc.

   Email: pkrol@google.com


   Martin Horneffer
   Deutsche Telekom

   Email: martin.horneffer@telekom.de


   Dirk Steinberg
   Steinberg Consulting

   Email: dws@steinbergnet.net


   Bruno Decraene
   Orange Business Services

   Email: bruno.decraene@orange.com


   Stephane Litkowski
   Orange Business Services

   Email: stephane.litkowski@orange.com


   Paul Mattes
   Microsoft
   One Microsoft Way
   Redmond, WA 98052-6399
   USA

   Email: pamattes@microsoft.com


   Zafar Ali
   Cisco Systems, Inc.

   Email: zali@cisco.com


   Ketan Talaulikar
   Cisco Systems, Inc.

   Email: ketant@cisco.com


   Jose Liste
   Cisco Systems, Inc.
   821 Alder Drive
   Milpitas, California 95035
   USA

   Email: jliste@cisco.com


   Francois Clad
   Cisco Systems, Inc.

   Email: fclad@cisco.com


   Kamran Raza
   Cisco Systems, Inc.
   2000 Innovation Drive
   Kanata, Ontario K2K 3E8
   Canada

   Email: skraza@cisco.com