Network Working Group                           C. Filsfils
Internet-Draft                                  S. Sivabalan
Intended status: Standards Track                K. Raza
Expires: May 3, 2018                            Cisco Systems, Inc.
                                                S. Hegde
                                                Juniper Networks, Inc.
                                                D. Yoyer
                                                Bell Canada.
                                                S. Lin
                                                A. Bogdanov
                                                Google, Inc.
                                                M. Horneffer
                                                Deutsche Telekom
                                                F. Clad
                                                Cisco Systems, Inc.,
                                                D. Steinberg
                                                Steinberg Consulting
                                                B. Decraene
                                                S. Litkowski
                                                Orange Business Services
                                                M. Nanduri
                                                ebay Corporation.
                                                October 30, 2017

             Segment Routing Policy for Traffic Engineering
          draft-filsfils-spring-segment-routing-policy-02.txt

Abstract

   Segment Routing (SR) allows a headend node to steer a packet flow
   along any path.  Intermediate per-flow states are eliminated thanks
   to source routing.  The headend node steers a flow into an SR Policy.
   The header of a packet steered in an SR Policy is augmented with the
   ordered list of segments associated with that SR Policy.  This
   document details the concepts of SR Policy and steering into an SR
   Policy.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on May 3, 2018.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  SR Traffic Engineering Architecture
   3.  SR Policy
   4.  Segment-list
     4.1.  Explicit Null
   5.  SR Policy Multi-Domain Database
   6.  Operations
     6.1.  W-ECMP
     6.2.  Path Validation
     6.3.  Fast Convergence
   7.  Binding SID
     7.1.  Benefits
     7.2.  Allocation
       7.2.1.  Dynamic BSID Allocation
       7.2.2.  Explicit BSID Allocation
       7.2.3.  Generic BSID Allocation
   8.  Centralized Discovery
   9.  Dynamic Path
     9.1.  Optimization Objective
     9.2.  Constraints
     9.3.  SR Native Algorithm
     9.4.  Path to SID
     9.5.  PCE Computed Path
   10. Signaling Paths of an SR Policy to a Head-end
     10.1.  BGP
     10.2.  PCEP
     10.3.  NETCONF
     10.4.  CLI
   11. Steering into an SR Policy
     11.1.  Incoming Active SID is a BSID
     11.2.  Recursion on a BSID
       11.2.1.  Multiple Colors
     11.3.  Recursion on an on-demand dynamic BSID
       11.3.1.  Multiple Colors
     11.4.  An array of BSIDs associated with an IGP entry
     11.5.  A Routing Policy on a BSID
   12. Optional Steering Modes for BGP Destinations
     12.1.  Color-Only BGP Destination Steering
     12.2.  Drop on Invalid
   13. Multipoint SR Policy
     13.1.  Spray SR Policy
   14. Reporting SR Policy
   15. Work in Progress
   16. Acknowledgement
   17. Normative References
   Authors' Addresses

1.  Introduction

   Segment Routing (SR) allows a headend node to steer a packet flow
   along any path.  Intermediate per-flow states are eliminated thanks
   to source routing [I-D.ietf-spring-segment-routing].

   The headend node is said to steer a flow into a Segment Routing
   Policy (SR Policy).

   The header of a packet steered in an SR Policy is augmented with the
   ordered list of segments associated with that SR Policy.

   This document details the concepts of SR Policy and steering into an
   SR Policy.  These apply equally to the MPLS and SRv6 instantiations
   of segment routing.

   For ease of reading, the illustrations are provided for the MPLS
   instantiation.

2.  SR Traffic Engineering Architecture

           +--------+    +--------+
           |  BGP   |    |  PCEP  |
           +--------+    +--------+
                    \    /
      +--------+  +--------+  +--------+
      |  CLI   |--|  SRTE  |--| NETCONF|
      +--------+  +--------+  +--------+
                      |
                  +--------+
                  |  FIB   |
                  +--------+

                 Figure 1: SR Policy architecture

   The Segment Routing Traffic Engineering (SRTE) process installs a
   Segment Routing Policy (SR Policy) in the forwarding plane (FIB).

   An SR policy is represented in FIB as a BSID-keyed entry.  For
   traffic-steering purposes, a suitable SR policy is identified using
   either a BSID or an IP prefix.

   For a given SR policy, the SRTE process MAY learn multiple candidate
   paths from different sources: NETCONF with OpenConfig or YANG model
   (work in progress), PCEP [I-D.ietf-pce-pce-initiated-lsp], local
   configuration or BGP [I-D.previdi-idr-segment-routing-te-policy].

   The SRTE process selects the best candidate path and installs it in
   FIB.

      +--------+    +--------+
      | BGP-LS |    |  IGP   |
      +--------+    +--------+
               \    /
             +--------+  +--------+
             |  SRTE  |--| NETCONF|
             +--------+  +--------+

        Figure 2: Topology/link-state database architecture

   The SRTE process maintains an SRTE database (SRTE-DB).

   The SRTE-DB is multi-domain capable.

   The attached domain topology MAY be learned via IGP, BGP-LS or
   NETCONF.

   A non-attached (remote) domain topology MAY be learned via BGP-LS or
   NETCONF.

   In some use-cases, the SRTE-DB may only contain the attached domain
   topology while in others, the SRTE-DB may contain the topology of
   multiple domains.

3.  SR Policy

   An SR Policy is identified through the following tuple:

   o  The head-end where the policy is instantiated/implemented.

   o  The endpoint (i.e., the destination of the policy).

   o  The color (an arbitrary numerical value).

   At a given head-end, an SR Policy is fully identified by the
   <color, endpoint> tuple.

   An endpoint can be specified as an IPv4 or IPv6 address.

   An SR Policy is associated with one or more candidate paths.  A path
   refers to a list of segments (i.e., a Segment-list) or a set of
   Segment-lists.  A Segment-list represents a specific way to send
   traffic from the head-end to the end-point of the corresponding SR
   policy.  If a path contains multiple Segment-lists, each list can be
   associated with a weight for equal or unequal cost load-balancing
   (default is equal cost load-balancing).

   For each SR Policy, at most one candidate path is selected, and only
   the selected path is used for forwarding traffic that is being
   steered onto that policy.

   A candidate path is either dynamic or explicit.

   A dynamic path expresses an optimization objective and a set of
   constraints.  The headend computes a solution to the optimization
   problem as a Segment-list or a set of Segment-lists.  When the
   headend does not have enough topological information (e.g., a
   multi-domain problem), the headend may delegate the computation to
   a PCE.  Whenever the network state changes, the path is recomputed.

   An explicit path is a Segment-list or a set of Segment-lists.

   A candidate path has a preference.
   If not specified, the default preference is 100.

   A candidate path is associated with a single Binding SID (BSID) in
   the context of the corresponding SR policy.  If a candidate path is
   selected (active path), its BSID must be the same as that of the
   corresponding SR policy.  If more than one SR policy is associated
   with an identical candidate path, each candidate path MUST be
   associated with a unique BSID to ensure that each policy has a
   distinct BSID.

   A candidate path is valid if it is usable.  A common path validity
   criterion is the reachability of its constituent SIDs.  The
   validation rules are defined in a later section.

   A Path is selected (i.e., it is the best path of the policy) when it
   is valid and its preference is the best (highest value) among all the
   candidate paths of the SR Policy.  The selected path is referred to
   as the "active path" of the SR policy in this document.

   Whenever a new path is learned or the validity of an existing path
   changes or an existing path is changed, the selection process must be
   re-executed.

   A headend may be informed about a candidate path for a policy
   by various means including: local configuration, NETCONF,
   PCEP or BGP.  The protocol source of the path does not matter to how
   an active path is chosen.  For a given Policy, when comparing a
   candidate path learned via one means to a candidate path learned via
   another means (e.g., one via BGP another via PCEP), a valid path will
   be regarded as preferable to the other based on the preference.  If
   there are multiple valid paths with equal preference, the selection
   of active path is a matter of local preference.  If multiple paths
   for a given color and end-point are distributed via BGP, the BGP path
   selection process is used to select the best candidate path among all
   BGP-distributed paths.
   The best candidate path distributed via BGP
   will then be compared against the paths learned via other means
   (e.g., PCEP) to select the active path for the color and end-point in
   question.

   In the vast majority of use-cases known to date, a path is associated
   with a single Segment-list and each path of a policy has a different
   preference.

   The BSID of an SR Policy refers to its selected path.

   At any given time, a given BSID MUST map to a single SR policy and
   indirectly map to its selected path.  However, the mapping from a
   given BSID to an SR Policy may change over the life of the SR policy,
   and the true identification of a policy is the tuple
   <headend, color, endpoint>.

   An SR Policy <color, endpoint> is active at a headend as soon as this
   head-end knows about a valid path for this policy.

   An active SR Policy installs a BSID-keyed entry in the forwarding
   plane with the action of steering the packets matching this entry to
   the selected path of the SR Policy.

   If a set of Segment-lists is associated with the selected path of the
   policy, then the steering is flow and W-ECMP based according to the
   relative weight of each Segment-list.

   In summary, the information model is the following:

   SR policy FOO
      Candidate-paths
         path preference 200 (selected)
            BSID1
            Weight W1, Segment-list1: SID11...SID1i
            Weight W2, Segment-list2: SID21...SID2j
         path preference 100
            BSID2
            Weight W3, Segment-list3: SID31...SID3i
            Weight W4, Segment-list4: SID41...SID4j

   The numbers 200 and 100 are preferences of the paths associated with
   the policy.

   In general, BSID1 = BSID2 = ... = BSIDn.  If paths of an SR policy
   have different BSIDs, then the BSID of the SR policy is that of the
   selected path.

4.  Segment-list

   A Segment-list includes segments of different types (1 to 8) and an
   optional weight value that is used for W-ECMP.
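
   As an illustration only, the information model and the active-path
   selection rule of Section 3 could be sketched as follows.  The class
   and field names, the "valid" flag (assumed to be set by a separate
   validation step) and the tie-break between equal preferences are
   assumptions of this sketch; the document leaves the tie-break to
   local preference.  The relative weight is read here as the weight of
   a Segment-list divided by the sum of the weights of the path.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

DEFAULT_PREFERENCE = 100  # default preference given in the text


@dataclass
class SegmentList:
    sids: List[int]
    weight: int = 1  # assumption: positive integer, equal by default


@dataclass
class CandidatePath:
    segment_lists: List[SegmentList]
    preference: int = DEFAULT_PREFERENCE
    bsid: Optional[int] = None
    valid: bool = True  # assumption: set by a separate validation step


@dataclass
class SRPolicy:
    color: int
    endpoint: str
    candidate_paths: List[CandidatePath] = field(default_factory=list)

    def active_path(self) -> Optional[CandidatePath]:
        """Return the valid candidate path with the highest preference."""
        valid = [p for p in self.candidate_paths if p.valid]
        if not valid:
            return None  # no valid path: the policy is not active
        # Ties are a local matter in the text; max() keeps the first one.
        return max(valid, key=lambda p: p.preference)


def weight_shares(path: CandidatePath) -> List[Tuple[List[int], float]]:
    """Share of flows per Segment-list: weight / sum of the weights."""
    total = sum(sl.weight for sl in path.segment_lists)
    return [(sl.sids, sl.weight / total) for sl in path.segment_lists]


# Hypothetical policy mirroring "SR policy FOO" (all values invented).
foo = SRPolicy(color=10, endpoint="192.0.2.4", candidate_paths=[
    CandidatePath([SegmentList([16002, 16004], weight=1),
                   SegmentList([16003, 16004], weight=3)],
                  preference=200, bsid=40001),
    CandidatePath([SegmentList([16004])], bsid=40001),
])
```

   With these values, foo.active_path() is the path of preference 200;
   if that path becomes invalid, selection falls back to the path of
   preference 100.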

   The following segment types are defined:

      Type 1: SID only, in the form of MPLS Label.
      Type 2: SID only, in the form of IPv6 address.
      Type 3: IPv4 Node Address with optional SID.
      Type 4: IPv6 Node Address with optional SID.
      Type 5: IPv4 Address + index with optional SID.
      Type 6: IPv4 Local and Remote addresses with optional SID.
      Type 7: IPv6 Address + index with optional SID.
      Type 8: IPv6 Local and Remote addresses with optional SID.

   The optional SID can be an MPLS label (SR applied to the MPLS
   dataplane) or an IPv6 SID (SRv6, SR applied to the IPv6 dataplane).

   When building the MPLS label stack or the IPv6 Segment list from the
   Segment List, the node instantiating the policy MUST interpret the
   set of Segments as follows:

   o  The first Segment represents the topmost label or the first IPv6
      segment.  It identifies the first segment the traffic will be
      directed toward along the SR explicit path.
   o  The last Segment represents the bottommost label or the last IPv6
      segment the traffic will be directed toward along the SR explicit
      path.

   A Segment-list is represented as <S1, S2, ... Sn> where S1 is the
   first SID.

4.1.  Explicit Null

   A Type 1 SID may be any MPLS label, including reserved labels.

   For example, assuming that the desired traffic-engineered path from a
   headend 1 to an endpoint 4 can be expressed by the Segment-list
   <16002, 16003, 16004> where 16002, 16003 and 16004 respectively refer
   to the IPv4 Prefix SIDs bound to nodes 2, 3 and 4, then IPv6 traffic
   can be traffic-engineered from node 1 to node 4 via the previously
   described path using an SRTE Policy with Segment-list <16002, 16003,
   16004, 2> where the MPLS label value 2 represents the "IPv6 Explicit
   NULL Label".

   The penultimate node before node 4 will pop 16004 and will forward
   the frame on its directly connected interface to node 4.

   The endpoint receives the traffic with top label "2" which indicates
   that the payload is an IPv6 packet.

5.  SR Policy Multi-Domain Database

   A headend can learn an attached domain topology via its IGP or a BGP-
   LS session.  A headend can learn a non-attached domain topology via a
   BGP-LS session.

   A headend collects all these topologies in the SR-TE database (SRTE-
   DB).

   The SRTE-DB is multi-domain capable.

   In some deployments, the SRTE-DB may only contain the attached domain
   topology while in others, the SRTE-DB may contain the topology of
   multiple domains.

6.  Operations

6.1.  W-ECMP

   Packets steered to an SR Policy (i.e., to its BSID either via
   presence in the packet header as active segment or via FIB recursion)
   are load-balanced on a weighted basis among the Segment-lists
   associated with the selected path of the SR Policy.

   The fraction of the flows associated with a given Segment-list is
   w/Sw where w is the weight of the Segment-list and Sw is the sum of
   the weights of the Segment-lists of the selected path of the SR
   Policy.

   The accuracy of the weighted load-balancing depends on the platform
   implementation.

6.2.  Path Validation

   A Segment-list is invalid as soon as:

   o  It is empty.
   o  The headend is unable to resolve the first SID into one or more
      outgoing interface(s) and next-hop(s).
   o  The headend is unable to resolve any non-first SID of type 3-to-8
      into an MPLS label or an SRv6 SID.

   Unreachable means that the headend has no path to the SID in its
   SRTE-DB.

   In multi-domain deployments, it is expected that the headend be
   unable to verify the reachability of the SIDs in remote domains.
   Types 1 and 2 MUST be used for the SIDs for which the reachability
   cannot be verified.  Note that the first SID must always be reachable
   whatever its type.

   A Path is invalid as soon as it has no valid Segment-list.
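
   The validation rules above can be sketched as below.  This is a
   minimal illustration, not a definitive implementation: the Segment
   class and the two callbacks (first_hop_resolves and resolve_to_sid)
   are hypothetical stand-ins for the headend's forwarding and SRTE-DB
   lookups, which this document does not specify.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass(frozen=True)
class Segment:
    seg_type: int                  # 1 to 8, as listed in Section 4
    sid: Optional[int] = None      # optional for types 3 to 8
    address: Optional[str] = None  # node/interface address, types 3 to 8


def segment_list_is_valid(
    segments: Sequence[Segment],
    first_hop_resolves: Callable[[Segment], bool],
    resolve_to_sid: Callable[[Segment], Optional[int]],
) -> bool:
    """Apply the three invalidity conditions of Section 6.2."""
    if not segments:
        return False  # rule 1: empty Segment-list
    if not first_hop_resolves(segments[0]):
        return False  # rule 2: first SID has no interface/next-hop
    for seg in segments[1:]:
        # rule 3: only non-first segments of type 3 to 8 are checked;
        # types 1 and 2 are taken as given (reachability unverified).
        if 3 <= seg.seg_type <= 8 and resolve_to_sid(seg) is None:
            return False
    return True


def path_is_valid(
    segment_lists: Sequence[Sequence[Segment]],
    first_hop_resolves: Callable[[Segment], bool],
    resolve_to_sid: Callable[[Segment], Optional[int]],
) -> bool:
    """A path is valid when at least one of its Segment-lists is valid."""
    return any(
        segment_list_is_valid(sl, first_hop_resolves, resolve_to_sid)
        for sl in segment_lists
    )
```

   Passing the lookups as callbacks keeps the rules independent of how
   a given platform resolves SIDs.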

   The headend of an SR Policy updates the validity of a Segment-list
   upon network topological change.

   A path of an SR Policy is invalid when all its Segment-lists are
   invalid.

   An SR Policy is invalid when all its paths are invalid.

6.3.  Fast Convergence

   Upon topological change, many policies could be recomputed.  An
   implementation MAY provide a per-policy priority field.  The operator
   MAY set this field to indicate in which order the policies should be
   re-computed.  Such a priority may be represented by an integer in the
   range [0, 254] where the lowest value is the highest priority.

7.  Binding SID

7.1.  Benefits

   The Binding SID (BSID) is fundamental to Segment Routing.  It
   provides scaling, network opacity and service independence.

        A---DCI1----C----D----E----DCI3---H
       /            |         |            \
      S             |         |             Z
       \            |         |            /
        B---DCI2----F---------G----DCI4---K
      <==DC1==><=========Core========><==DC2==>

              Figure 3: A Simple Datacenter Topology

   A simplified illustration is provided on the basis of the previous
   diagram where we assume that S, A, B, Data Center Interconnect DCI1
   and DCI2 share the same IGP-SR instance in the data-center 1 (DC1).
   DCI1, DCI2, C, D, E, F, G, DCI3 and DCI4 share the same IGP-SR domain
   in the core.  DCI3, DCI4, H, K and Z share the same IGP-SR domain in
   the data-center 2 (DC2).

   In this example, we assume no redistribution between the IGPs and no
   presence of BGP.  The inter-domain communication is only provided by
   SR through SR Policies.

   The latency from S to DCI1 equals the latency from S to DCI2.  The
   latency from Z to DCI3 equals the latency from Z to DCI4.  All the
   intra-DC links have the same IGP metric 10.

   The path DCI1, C, D, E, DCI3 has a lower latency and lower capacity
   than the path DCI2, F, G, DCI4.

   The IGP metrics of all the core links are set to 10 except the link
   D-E, which is set to 100.

   A low-latency multi-domain policy from S to Z may be expressed as
   <DCI1, BSID, Z> where:

   o  DCI1 is the prefix SID of DCI1.
   o  BSID is the Binding SID bound to an SRTE policy <D, E, DCI3>
      instantiated at DCI1.
   o  Z is the prefix SID of Z.

   Without the use of an intermediate core SR Policy (efficiently
   summarized by a single BSID), S would need to steer its low-latency
   flow into the policy <DCI1, D, E, DCI3, Z>.

   The use of a BSID (and the intermediate bound SR Policy) decreases
   the number of segments imposed by the source.

   A BSID acts as a stable anchor point which isolates one domain from
   the churn of another domain.  Upon topology changes within the core
   of the network, the low-latency path from DCI1 to DCI3 may change.
   While the path of an intermediate policy changes, its BSID does not
   change.  Hence the policy used by the source does not change, and
   the source is shielded from the churn in another domain.

   A BSID provides opacity and independence between domains.  The
   administrative authority of the core domain may not want to share
   information about its topology.  The use of a BSID allows keeping the
   service opaque.  S is not aware of the details of how the low-latency
   service is provided by the core domain.  S is not aware of the need
   of the core authority to temporarily change the intermediate path.

7.2.  Allocation

   There are three approaches to allocate a BSID to an SR Policy: all
   the paths have no explicit BSID (called dynamic allocation), all the
   paths have the same explicit BSID (explicit allocation) and finally a
   mix of paths with and without explicit BSID (generic allocation).

   In practice, all the use-cases seen to date either use the explicit
   allocation or the dynamic allocation.  The explicit allocation is
   most often associated with controller-instantiated SR Policies.  The
   dynamic allocation is most often associated with router-based
   on-demand SR Policies.

7.2.1.  Dynamic BSID Allocation

   No path of the SR Policy has a specified BSID.

   In such a case, the SR-TE implementation allocates a SID to the SR
   Policy and keeps it for the whole lifetime of the policy.

   In the case of SR-MPLS, the SR-TE implementation binds a local
   dynamic label in the same way LDP, RSVP-TE or BGP would do.

7.2.2.  Explicit BSID Allocation

   All the paths of the SR Policy have the same specified BSID, with the
   same behavioral preference in case this specified BSID is not
   available.

   If the specified BSID is available, then it is bound to the SR Policy
   and used for the lifetime of the policy.

   If the specified BSID is not available, then a SYSLOG/NETCONF message
   is generated and if the preferred behavior is to fall back on the
   dynamic allocation, then the dynamic allocation is performed.

   If the specified BSID is not available and the operator-requested
   behavior is to not fall back on the dynamic allocation, then a
   SYSLOG/NETCONF message is generated and the SR Policy does not
   install any BSID entry in the forwarding plane.

   A later section will explain how controllers can discover the local
   SIDs available at a node N so as to pick an explicit BSID for an SR
   Policy to be instantiated at headend N.

7.2.3.  Generic BSID Allocation

   This section details the BSID allocation when a policy is made of
   paths with different BSID allocation behaviors (e.g., a mix of paths
   with and without an explicit BSID, potentially with different
   explicit BSIDs).

   When the selected path has a specified BSID, the SR Policy uses that
   BSID if this value (label in MPLS, IPv6 address in SRv6) is available
   (i.e., not associated with any other usage: e.g., to another MPLS
   client, to another SID, to another SR Policy).

   If the selected path's BSID is not available, then the SR Policy
   keeps the previous BSID.
   If the SR Policy did not have a previous
   BSID, then the SR Policy dynamically binds a BSID to itself.

   Note that a path may request that only its specified BSID be used.
   In that case, if that BSID is not available and that path is active,
   then no BSID is bound to the policy and a SYSLOG/NETCONF message is
   generated.  In this case, the SR Policy does not install any entry
   indexed by a BSID in the forwarding plane.

   When an SR Policy has multiple valid paths with the best
   preference but with different BSIDs, it is left to the implementation
   to decide which BSID to install.  This case is unlikely in practice
   for two reasons.  First, all known use-cases share the same BSID
   across all the paths of a given SR Policy.  Second, all known use-
   cases have a different preference for each path.  Hence in practice a
   single path will be active and with a stable BSID on a per-policy
   basis.

8.  Centralized Discovery

   This section explains how controllers can discover the local SIDs
   available at a node N so as to pick an explicit BSID for an SR Policy
   to be instantiated at headend N.

   Any controller can discover the following properties of a node N
   (e.g., via BGP-LS, NETCONF etc.):

   o  its local Segment Routing Label Block (SRLB).
   o  its local topology.
   o  its topology-related SIDs (Adj SID and EPE SID).
   o  its SR Policies and their BSID
      ([I-D.ietf-idr-te-lsp-distribution]).

   Any controller can thus infer the available SIDs in the SRLB of any
   node.

   As an example, a controller discovers the following characteristics
   of N: SRLB [4000, 8000], 3 Adj SIDs (4001, 4002, 4003), 2 EPE SIDs
   (4004, 4005) and 3 SRTE policies (whose BSIDs are respectively 4006,
   4007 and 4008).  This controller can deduce that the SRLB sub-range
   [4009, 5000] is free for allocation.
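
   The deduction in this example can be sketched as a simple set
   difference.  The function name and the treatment of the SRLB bounds
   as inclusive are assumptions of this sketch, not part of the
   document.

```python
from typing import Iterable, Set


def free_srlb_sids(srlb_low: int, srlb_high: int,
                   *used_groups: Iterable[int]) -> Set[int]:
    """SIDs of the SRLB (bounds assumed inclusive) not already in use."""
    used: Set[int] = set().union(*used_groups)
    return set(range(srlb_low, srlb_high + 1)) - used


# Values taken from the example above.
adj_sids = [4001, 4002, 4003]
epe_sids = [4004, 4005]
policy_bsids = [4006, 4007, 4008]
free = free_srlb_sids(4000, 8000, adj_sids, epe_sids, policy_bsids)
```

   With the values of the example, every SID of the sub-range
   [4009, 5000] quoted above belongs to the computed free set.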

   The natural next question is how to ensure that different
   controllers do not pick the same available SID at the same time for
   different SR Policies.

   Clearly, a controller is not restricted to use the next numerically
   available SID in the available SRLB sub-range.  It can pick any label
   in the subset of available labels.  This random pick makes a
   collision unlikely.

   An operator could also sub-allocate the SRLB between different
   controllers (e.g., [4000-4499] to controller 1 and [4500-5000] to
   controller 2).

   Inter-controller state-synchronization may be used to avoid/detect
   collision in BSID.

   All these techniques make a collision between different controllers
   very unlikely.

   In the unlikely case of a collision, the controllers will detect it
   through SYSLOG/NETCONF, BGP-LS reporting
   ([I-D.ietf-idr-te-lsp-distribution]) or PCEP notification.  They then
   have the choice to continue the operation of their SR Policy with the
   dynamically allocated BSID or re-try with another explicit pick.

   Note: in deployments where PCE Protocol (PCEP) is used between head-
   end and controller (PCE), a head-end can report the BSID as well as
   policy attributes (e.g., type of disjointness) and operational and
   administrative states to the controller.  Similarly, a controller can
   also assign/update the BSID of a policy via PCEP when instantiating
   or updating an SR Policy.

9.  Dynamic Path

   A dynamic path is a path that expresses an optimization objective and
   constraints.

   The headend of the policy is responsible for computing a Segment-list
   ("solution Segment-list") that fits this optimization problem.  The
   headend is responsible for computing the solution Segment-list any
   time the inputs to the problem change (e.g., topology changes).

9.1.  Optimization Objective

   We define two optimization objectives:

   o  Min-Metric - requests computation of a solution Segment-list
      optimized for a selected metric.

   o  Min-Metric with margin and maximum number of SIDs - Min-Metric
      with two changes: a margin by which two paths with similar
      metrics would be considered equal, a constraint on the max number
      of SIDs in the Segment-list.

   The "Min-Metric" optimization objective requests to compute a
   solution Segment-list such that packets flowing through the solution
   Segment-list use ECMP-aware paths optimized for the selected metric.
   The "Min-Metric" objective can be instantiated for the IGP metric xor
   the TE metric xor the latency extended TE metric.  This metric is
   called the O metric (the optimized metric) to distinguish it from the
   IGP metric.  The solution Segment-list must be computed to minimize
   the number of SIDs and the number of Segment-lists.

   If the selected O metric is the IGP metric and the headend and
   tailend are in the same IGP domain, then the solution Segment-list is
   made of the single prefix-SID of the tailend.

   When the selected O metric is not the IGP metric, then the solution
   Segment-list is made of prefix SIDs of intermediate nodes, Adjacency
   SIDs along intermediate links and potentially BSIDs of intermediate
   policies.

   In many deployments there are insignificant metric differences
   between mostly equal paths (e.g., a difference of 100 usec of latency
   between two paths from NYC to SFO would not matter in most cases).
   The "Min-Metric with margin" objective supports such a requirement.

   The "Min-Metric with margin and maximum number of SIDs" optimization
   objective requests to compute a solution Segment-list such that
   packets flowing through the solution Segment-list do not use a path
   whose cumulated O metric is larger than the shortest-path O metric +
   margin.
   If this is not possible because of the constraint on the number of
   SIDs, then the solution Segment-list minimizes the O metric while
   meeting the maximum number of SIDs constraint.

9.2. Constraints

   The following constraints can be defined:

   o Inclusion and/or exclusion of TE affinity.
   o Inclusion and/or exclusion of IP address.
   o Inclusion and/or exclusion of SRLG.
   o Inclusion and/or exclusion of admin-tag.
   o Maximum accumulated metric (IGP, TE and latency).
   o Maximum number of SIDs in the solution Segment-list.
   o Maximum number of weighted Segment-lists in the solution set.
   o Diversity to another service instance (e.g., link, node, or SRLG
     disjoint paths originating from different head-ends).

9.3. SR Native Algorithm

      1----------------2----------------3
      |\                               /
      | \                             /
      |  4-------------5-------------7
      |   \                         /|
      |    +-----------6-----------+ |
      8------------------------------9

        Figure 4: Illustration used to describe SR native algorithm

   Let us assume that all the links have the same IGP metric of 10 and
   let us consider the dynamic path defined as: Min-Metric(from 1, to 3,
   IGP metric, margin 0) with constraint "avoid link 2-to-3".

   A classical circuit implementation would prune the graph, compute
   the shortest path, pick a single non-ECMP branch of the ECMP-aware
   shortest path and encode it as a Segment-list. The solution
   Segment-list would be <4, 5, 7, 3>.

   An SR-native algorithm would find a Segment-list that minimizes the
   number of SIDs and maximizes the use of all the ECMP branches along
   the ECMP shortest path. In this illustration, the solution
   Segment-list would be <7, 3>.

   In the vast majority of SR use-cases, SR-native algorithms should be
   preferred: they preserve the native ECMP of IP and they minimize the
   dataplane header overhead.

   In some specific use-cases (e.g.
   TDM migration over IP where the
   circuit notion prevails), one may prefer a classic circuit
   computation followed by an encoding into SIDs.

   SR-native algorithms are a local node behavior and are thus outside
   the scope of this document.

9.4. Path to SID

   Let us consider the diagram below, where all the links have an IGP
   metric of 10 and a TE metric of 10, except the link AB which has an
   IGP metric of 20 and the link AD which has a TE metric of 100. Let
   us consider the dynamic path Min-Metric(from A, to D, TE metric,
   margin 0).

      B---C
      |   |
      A---D

       Figure 5: Illustration used to describe path to SID conversion

   The solution path to this problem is ABCD.

   This path can be expressed in SIDs as <B, D> where B and D are the
   IGP prefix SIDs respectively associated with nodes B and D in the
   diagram.

   Indeed, from A, the IGP path to B is AB (IGP metric 20, better than
   ADCB of IGP metric 30). From B, the IGP path to D is BCD (IGP metric
   20, better than BAD of IGP metric 30).

   While the details of the algorithm remain a local node behavior, a
   high-level description follows: start at the headend and find an IGP
   prefix SID that leads as far down the desired path as possible
   (without using any link not included in the desired path). If no
   such prefix SID exists, use the Adj SID to the first neighbor along
   the path. Restart from the node that was reached.

9.5. PCE Computed Path

   A local computation should be preferred whenever possible. When
   local computation is not possible (e.g., a policy's tail-end is
   outside the topology known to the head-end), the head-end may send a
   path computation request to a PCE supporting the PCEP extension
   specified in [I-D.ietf-pce-segment-routing].

10. Signaling Paths of an SR Policy to a Head-end

   A headend H can be informed about a candidate path for an SR policy
   (endpoint, color) via several means: BGP, PCEP, CLI, NETCONF.
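
   As an informal illustration of the high-level path-to-SID procedure
   of Section 9.4, the following Python sketch applies it to the
   topology of Figure 5. It is only one possible reading of that
   description: the function names, the graph encoding and the
   ("prefix", node) / ("adj", from, to) tuples are assumptions made for
   this example. A prefix SID is accepted only when the IGP shortest
   path to its node is unique and follows the desired path.

```python
import heapq


def shortest(graph, src):
    """Dijkstra returning (dist, count); count is the number of ECMP paths."""
    dist, count, done = {src: 0}, {src: 1}, set()
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v, w in graph[u].items():
            nd = d + w
            if v not in dist or nd < dist[v]:
                dist[v], count[v] = nd, count[u]
                heapq.heappush(heap, (nd, v))
            elif nd == dist[v]:
                count[v] += count[u]
    return dist, count


def path_to_sids(graph, path):
    """Greedy conversion of a node path into prefix/adjacency SIDs."""
    sids, i = [], 0
    while i < len(path) - 1:
        dist, count = shortest(graph, path[i])
        best, cost = None, 0
        for j in range(i + 1, len(path)):
            cost += graph[path[j - 1]][path[j]]
            # The IGP must follow the desired sub-path, with no ECMP detour.
            if dist.get(path[j]) == cost and count[path[j]] == 1:
                best = j
        if best is None:
            sids.append(("adj", path[i], path[i + 1]))  # fall back to Adj SID
            i += 1
        else:
            sids.append(("prefix", path[best]))
            i = best
    return sids


# IGP metrics of Figure 5: all links 10 except AB, which is 20.
igp = {
    "A": {"B": 20, "D": 10},
    "B": {"A": 20, "C": 10},
    "C": {"B": 10, "D": 10},
    "D": {"C": 10, "A": 10},
}
sids = path_to_sids(igp, ["A", "B", "C", "D"])
```

   For the path ABCD this yields [("prefix", "B"), ("prefix", "D")],
   which matches the Segment-list <B, D> derived in Section 9.4.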
   Recall that the selection of the best path for a policy is
   independent of the protocol source of the path.

10.1. BGP

   Please refer to [I-D.previdi-idr-segment-routing-te-policy].

10.2. PCEP

   Please refer to [I-D.ietf-pce-pce-initiated-lsp].

10.3. NETCONF

   An operator MUST be able to install a policy via NETCONF with
   OpenConfig/YANG models (work in progress).

10.4. CLI

   An operator MUST be able to install a policy via CLI.

11. Steering into an SR Policy

   A headend can steer a packet flow on an SR Policy in various ways:

   o Incoming packets have an active SID matching a local BSID at the
     head-end.
   o Incoming packets match a BGP/Service route which recurses on the
     BSID of a local policy.
   o Incoming packets match a BGP/Service route which recurses on an
     array of paths to the BGP next-hop where some of the paths in the
     array are local SR Policies.
   o Incoming packets match a routing policy which directs them on a
     local SR policy.

   For simplicity of illustration, we will use the SR-MPLS example.

11.1. Incoming Active SID is a BSID

   Let us assume that headend H has a local SR Policy P of Segment-list
   <S1, S2, S3> and BSID B.

   When H receives a packet with label stack <B, L2, L3>, H pops B and
   pushes <S1, S2, S3>. H sends the resulting packet with label stack
   <S1, S2, S3, L2, L3> along the path to S1.

   H has steered the packet into the policy P.

   H did not have to classify the packet. The classification was done
   by a node upstream of H (e.g. the source of the packet or an
   intermediate ingress edge node of the SR domain) and the result of
   this classification was efficiently encoded in the packet header as a
   BSID.

   This is another key benefit of segment routing in general and of the
   binding SID in particular: the ability to encode a classification and
   the resulting steering in the packet header so as to better scale
   and simplify intermediate aggregation nodes.

11.2.
Recursion on a BSID

   Let us assume that headend H:

   o learns about a BGP route R/r via next-hop N, extended-color
     community C and label V.
   o has a local SR Policy P to (endpoint = N, color = C) of
     Segment-list <S1, S2, S3> and BSID B.
   o has a local BGP policy which matches on the extended-color
     community C and allows its usage as SR-TE SLA steering
     information.

   In such a case, H installs R/r in RIB/FIB with next-hop = B (instead
   of N).

   Indeed, H's local BGP policy and the received BGP route indicate that
   the headend should associate R/r with an SR-TE path to N with the SLA
   associated with color C. The headend therefore installs the BGP
   route on that policy.

   This can be implemented by using the BSID as a generalized next-hop
   and installing the BGP route on that generalized next-hop.

   When H receives a packet with a destination matching R/r, H pushes
   the label stack <S1, S2, S3, V> and sends the resulting packet along
   the path to S1.

   Note that any label associated with the BGP route is pushed after the
   Segment-list of the SR Policy.

11.2.1. Multiple Colors

   When a BGP route has multiple extended-color communities each with a
   valid SR-TE policy, the BGP process installs the route on the Binding
   SID corresponding to the SR-TE policy whose color is of highest
   numerical value.

   Let us assume that headend H:

   o learns about a BGP route R/r via next-hop N, extended-color
     communities C1 and C2 and label V.
   o has a local SR Policy P1 to (endpoint = N, color = C1) of SID list
     <S1, S2, S3> and BSID B1.
   o has a local SR Policy P2 to (endpoint = N, color = C2) of SID list
     <S4, S5, S6> and BSID B2.
   o has a local BGP policy which matches on the extended-color
     communities C1 and C2 and allows their usage as SR-TE SLA
     steering information.

   In such a case, H installs R/r in RIB/FIB with next-hop = B2 (instead
   of N) because C2 > C1.

11.3.
Recursion on an on-demand dynamic BSID

   In the previous section, we assumed that H had a pre-established
   "explicit" SR Policy (endpoint N, color C).

   In this section, independently of the a priori existence of any
   explicit path of the SR-TE policy (N, C), we note that the BGP
   process at node H triggers the SR-TE process at node H to instantiate
   a dynamic path for the SR-TE policy (N, C) as soon as:

   o the BGP process learns of a route R/r via N and with color C.
   o a local policy at node H authorizes the on-demand SR-TE path
     instantiation and maps the color to a dynamic SR-TE optimization
     template.

11.3.1. Multiple Colors

   When a BGP route R/r via N has multiple extended-color communities Ci
   (with i=1 ... n), an individual on-demand SR-TE dynamic path request
   (endpoint N, color Ci) is triggered for each color Ci.

11.4. An array of BSIDs associated with an IGP entry

   Let us assume that head-end H:

   o learns about a BGP route R/r via next-hop N and label V.
   o has a local SR Policy P1 to (endpoint = N, color = C1) of
     Segment-list <S1, S2, S3> and BSID B1.
   o has a local SR Policy P2 to (endpoint = N, color = C2) of
     Segment-list <S4, S5, S6> and BSID B2.
   o is configured to instantiate an array of paths to N where the
     entry 0 is the IGP path to N, color C1 is the first entry and
     color C2 is the second entry. The index into the array is called
     a Forwarding Class (FC). The index can have values 0 to 7.
   o is configured to match flows in its ingress interfaces (upon any
     field such as Ethernet destination/source/vlan/tos or IP
     destination/source/DSCP or transport ports etc.) and color them
     with an internal per-packet forwarding-class variable (0, 1 or 2
     in this example).

   In such a case, H installs in RIB/FIB:

   o R/r with next-hop N (as usual).
   o N via a recursion on an array A (instead of the immediate outgoing
     link associated with the IGP shortest-path to N).
   o Entry A(0) set to the immediate outgoing link of the IGP
     shortest-path to N.
   o Entry A(1) set to B1.
   o Entry A(2) set to B2.

   H receives three packets P, P1 and P2 on its incoming interface. H
   colors them respectively with forwarding-class 0, 1 and 2. As a
   result:

   o H pushes <V> on packet P and forwards the resulting frame along
     the shortest-path to N (which in SR-MPLS results in the pushing of
     the prefix-SID of N).
   o H pushes <S1, S2, S3, V> on packet P1 and forwards the resulting
     frame along the shortest-path to S1.
   o H pushes <S4, S5, S6, V> on packet P2 and forwards the resulting
     frame along the shortest-path to S4.

   If the local configuration does not specify any explicit forwarding
   information for an entry of the array, then this entry is filled with
   the same information as entry 0 (i.e. the IGP shortest-path).

   This realizes per-flow steering: different flows bound to the same
   BGP destination R/r are steered on different SR-TE paths.

11.5. A Routing Policy on a BSID

   Finally, headend H may be configured with a local routing policy
   which overrides any BGP/IGP path and steers a specified flow on an SR
   Policy.

12. Optional Steering Modes for BGP Destinations

12.1. Color-Only BGP Destination Steering

   In Section 11.2 ("Recursion on a BSID"), we have seen that the
   steering on an SR Policy is governed by the matching of the BGP
   route's next-hop N and the authorized color C with a local SR Policy
   defined by the tuple (N, C).

   This is the most likely form of BGP destination steering and the one
   we recommend.

   In this section, we define an alternative steering mechanism based
   only on the color.

   This color-only steering variation is governed by two new flags "C"
   and "O" defined in the color extended community.

   The Color-Only flags "CO" are set to 00 by default.
   When 00, the BGP destination is preferably steered onto a valid SR
   Policy (N, C), where N is an IPv4/6 endpoint address and C is a color
   value; else it is steered on the IGP path to the next-hop N. This is
   the classic case we described before and that we recommend.

   When 01, the BGP destination is preferably steered onto a valid SR
   Policy (N, C), else onto a valid SR Policy (null endpoint, C), else
   on the IGP path to the next-hop N.

   When 10, the BGP destination is preferably steered onto a valid SR
   Policy (N, C), else onto a valid SR Policy (null endpoint, C), else
   on any valid SR Policy (any endpoint, C), else on the IGP path to the
   next-hop N.

   The null endpoint is 0.0.0.0 for IPv4 and ::0 for IPv6 (all bits set
   to the 0 value).

   When 11, it is treated like 00.

12.2. Drop on Invalid

   The local BGP policy that authorizes steering on an SR Policy based
   on an extended color community may specify that, if the related SR
   Policy becomes invalid, the related BSID should remain in RIB/FIB
   and point to null0 (i.e., any packet recursing on that BSID is
   dropped).

   Recall that, by default, for a BGP route R/r via next-hop N with
   extended-color community C, when the SR Policy (N, C) becomes
   invalid, BGP re-installs R/r in RIB/FIB via N (the IGP path to N).

13. Multipoint SR Policy

13.1. Spray SR Policy

   A Spray SR-TE policy is a variant of an SR-TE policy which involves
   packet replication.

   Any traffic steered into a Spray SR Policy is replicated along the
   Segment-lists of its selected path.

   In the context of a Spray SR Policy, the selected path SHOULD have
   more than one Segment-list. The weights of the Segment-lists are not
   applicable for a Spray SR Policy. They MUST be set to 1.

   Like any SR policy, a Spray SR Policy has a BSID instantiated into
   the forwarding plane.
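
   As an informal illustration of the replication behavior described
   above, the following Python sketch models a Spray SR Policy in
   SR-MPLS terms for a packet arriving with the policy's BSID on top.
   The function name, the list-of-labels packet model and the label
   values are assumptions made for this example and are not defined by
   this document.

```python
def spray(label_stack, bsid, segment_lists):
    """Return one outgoing label stack per Segment-list of the selected path.

    label_stack   -- incoming labels, top of stack first (assumed model)
    bsid          -- BSID of the Spray SR Policy
    segment_lists -- Segment-lists of the selected path
    """
    if not label_stack or label_stack[0] != bsid:
        return []  # the packet is not steered into this policy via its BSID
    payload = label_stack[1:]  # pop the BSID
    # Replicate: every Segment-list gets its own copy; weights play no role.
    return [list(sl) + payload for sl in segment_lists]


# Hypothetical labels: BSID 9000, service label 77, two Segment-lists.
copies = spray([9000, 77], bsid=9000,
               segment_lists=[[16001, 16002], [16003, 16004]])
```

   Each copy carries one of the Segment-lists followed by the labels
   that were below the BSID, so the example produces two outgoing
   packets.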
   Traffic is typically steered into a Spray SR Policy in two ways:

   o local policy-based routing at the headend of the policy.
   o remote classification and steering via the BSID of the Spray SR
     Policy.

14. Reporting SR Policy

   Stateful PCEP ([I-D.ietf-pce-stateful-pce] and
   [I-D.sivabalan-pce-binding-label-sid]) provides the ability for a
   head-end to report the BSID, attributes, and operational and
   administrative states. Using this protocol, a PCE can also update an
   existing SR Policy whose path computation is delegated to it, as well
   as instantiate a new SR Policy on a head-end.

   BGP-LS reports an SR Policy via [I-D.ietf-idr-te-lsp-distribution].

15. Work in Progress

   o Open configuration model.
   o YANG model.

16. Acknowledgement

17. Normative References

   [GLOBECOM]
              Filsfils, C., Nainar, N., Pignataro, C., Cardona, J., and
              P. Francois, "The Segment Routing Architecture", IEEE
              Global Communications Conference (GLOBECOM), 2015.

   [I-D.ietf-idr-te-lsp-distribution]
              Previdi, S., Dong, J., Chen, M., Gredler, H., and J.
              Tantsura, "Distribution of Traffic Engineering (TE)
              Policies and State using BGP-LS",
              draft-ietf-idr-te-lsp-distribution-07 (work in progress),
              July 2017.

   [I-D.ietf-isis-segment-routing-extensions]
              Previdi, S., Filsfils, C., Bashandy, A., Gredler, H.,
              Litkowski, S., Decraene, B., and J. Tantsura, "IS-IS
              Extensions for Segment Routing",
              draft-ietf-isis-segment-routing-extensions-13 (work in
              progress), June 2017.

   [I-D.ietf-pce-pce-initiated-lsp]
              Crabbe, E., Minei, I., Sivabalan, S., and R. Varga, "PCEP
              Extensions for PCE-initiated LSP Setup in a Stateful PCE
              Model", draft-ietf-pce-pce-initiated-lsp-11 (work in
              progress), October 2017.

   [I-D.ietf-pce-segment-routing]
              Sivabalan, S., Filsfils, C., Tantsura, J., Henderickx, W.,
              and J.
              Hardwick, "PCEP Extensions for Segment Routing",
              draft-ietf-pce-segment-routing-10 (work in progress),
              October 2017.

   [I-D.ietf-pce-stateful-pce]
              Crabbe, E., Minei, I., Medved, J., and R. Varga, "PCEP
              Extensions for Stateful PCE",
              draft-ietf-pce-stateful-pce-21 (work in progress), June
              2017.

   [I-D.ietf-spring-segment-routing]
              Filsfils, C., Previdi, S., Ginsberg, L., Decraene, B.,
              Litkowski, S., and R. Shakir, "Segment Routing
              Architecture", draft-ietf-spring-segment-routing-13 (work
              in progress), October 2017.

   [I-D.previdi-idr-segment-routing-te-policy]
              Previdi, S., Filsfils, C., Mattes, P., Rosen, E., and S.
              Lin, "Advertising Segment Routing Policies in BGP",
              draft-previdi-idr-segment-routing-te-policy-07 (work in
              progress), June 2017.

   [I-D.sivabalan-pce-binding-label-sid]
              Sivabalan, S., Filsfils, C., Previdi, S., Tantsura, J.,
              Hardwick, J., and D. Dhody, "Carrying Binding
              Label/Segment-ID in PCE-based Networks",
              draft-sivabalan-pce-binding-label-sid-03 (work in
              progress), July 2017.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

   [SIGCOMM]  Hartert, R., Vissicchio, S., Schaus, P., Bonaventure, O.,
              Filsfils, C., Telkamp, T., and P. Francois, "A Declarative
              and Expressive Approach to Control Forwarding Paths in
              Carrier-Grade Networks", ACM SIGCOMM, 2015.

Authors' Addresses

   Clarence Filsfils
   Cisco Systems, Inc.
   Pegasus Parc
   De kleetlaan 6a, DIEGEM BRABANT 1831
   BELGIUM

   Email: cfilsfil@cisco.com

   Siva Sivabalan
   Cisco Systems, Inc.
   2000 Innovation Drive
   Kanata, Ontario K2K 3E8
   Canada

   Email: msiva@cisco.com

   Kamran Raza
   Cisco Systems, Inc.
   2000 Innovation Drive
   Kanata, Ontario K2K 3E8
   Canada

   Email: skraza@cisco.com

   Shraddha Hegde
   Juniper Networks, Inc.
   Embassy Business Park
   Bangalore, KA 560093
   India

   Email: shraddha@juniper.net

   Daniel Yoyer
   Bell Canada.

   Email: daniel.yoyer@bell.ca

   Steven Lin
   Google, Inc.

   Email: stevenlin@google.com

   Alex Bogdanov
   Google, Inc.

   Email: bogdanov@google.com

   Martin Horneffer
   Deutsche Telekom

   Email: martin.horneffer@telekom.de

   Francois Clad
   Cisco Systems, Inc.

   Email: fclad@cisco.com

   Dirk Steinberg
   Steinberg Consulting

   Email: dws@steinbergnet.net

   Bruno Decraene
   Orange Business Services

   Email: bruno.decraene@orange.com

   Stephane Litkowski
   Orange Business Services

   Email: stephane.litkowski@orange.com

   Mohan Nanduri
   ebay Corporation.
   2025 Hamilton Avenue
   San Jose, CA 98052
   USA

   Email: mnanduri@ebay.com