idnits 2.17.1 draft-filsfils-spring-segment-routing-policy-05.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack a Security Considerations section. ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) == There are 4 instances of lines with non-RFC2606-compliant FQDNs in the document. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document date (February 28, 2018) is 2242 days in the past. Is this intentional? 
Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) -- Looks like a reference, but probably isn't: '0' on line 460 -- Looks like a reference, but probably isn't: '255' on line 460 == Missing Reference: 'RFC5305' is mentioned on line 500, but not defined == Missing Reference: 'RFC3630' is mentioned on line 500, but not defined == Missing Reference: 'ID.draft-ietf-idr-tunnel-encaps-07' is mentioned on line 972, but not defined == Missing Reference: 'RFC6830' is mentioned on line 973, but not defined ** Obsolete undefined reference: RFC 6830 (Obsoleted by RFC 9300, RFC 9301) -- Looks like a reference, but probably isn't: '16000' on line 1207 -- Looks like a reference, but probably isn't: '24000' on line 1207 -- Looks like a reference, but probably isn't: '4000' on line 2099 -- Looks like a reference, but probably isn't: '8000' on line 2099 -- Looks like a reference, but probably isn't: '4009' on line 2102 -- Looks like a reference, but probably isn't: '5000' on line 2102 == Missing Reference: '4000-4499' is mentioned on line 2110, but not defined == Missing Reference: '4500-5000' is mentioned on line 2110, but not defined == Unused Reference: 'GLOBECOM' is defined on line 2139, but no explicit reference was found in the text == Unused Reference: 'I-D.ietf-isis-segment-routing-extensions' is defined on line 2150, but no explicit reference was found in the text == Unused Reference: 'I-D.previdi-idr-segment-routing-te-policy' is defined on line 2180, but no explicit reference was found in the text == Unused Reference: 'SIGCOMM' is defined on line 2197, but no explicit reference was found in the text -- Possible downref: Non-RFC (?) normative reference: ref. 
'GLOBECOM' == Outdated reference: A later version (-19) exists of draft-ietf-idr-te-lsp-distribution-08 == Outdated reference: A later version (-25) exists of draft-ietf-isis-segment-routing-extensions-15 == Outdated reference: A later version (-16) exists of draft-ietf-pce-segment-routing-11 == Outdated reference: A later version (-07) exists of draft-sivabalan-pce-binding-label-sid-03 -- Possible downref: Non-RFC (?) normative reference: ref. 'SIGCOMM' Summary: 3 errors (**), 0 flaws (~~), 16 warnings (==), 11 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group C. Filsfils 3 Internet-Draft S. Sivabalan 4 Intended status: Standards Track K. Raza 5 Expires: September 1, 2018 J. Liste 6 F. Clad 7 K. Talaulikar 8 Z. Ali 9 Cisco Systems, Inc. 10 S. Hegde 11 Juniper Networks, Inc. 12 D. Voyer 13 Bell Canada. 14 S. Lin 15 A. Bogdanov 16 P. Krol 17 Google, Inc. 18 M. Horneffer 19 Deutsche Telekom 20 D. Steinberg 21 Steinberg Consulting 22 B. Decraene 23 S. Litkowski 24 Orange Business Services 25 P. Mattes 26 Microsoft 27 February 28, 2018 29 Segment Routing Policy for Traffic Engineering 30 draft-filsfils-spring-segment-routing-policy-05.txt 32 Abstract 34 Segment Routing allows a headend node to steer a packet flow along 35 any path. Intermediate per-flow states are eliminated thanks to 36 source routing. The headend node steers a flow into an SR Policy. 37 The header of a packet steered in an SR Policy is augmented with the 38 ordered list of segments associated with that SR Policy. This 39 document details the concepts of SR Policy and steering into an SR 40 Policy. 42 Requirements Language 44 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 45 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 46 document are to be interpreted as described in [RFC2119]. 
48 Status of This Memo 50 This Internet-Draft is submitted in full conformance with the 51 provisions of BCP 78 and BCP 79. 53 Internet-Drafts are working documents of the Internet Engineering 54 Task Force (IETF). Note that other groups may also distribute 55 working documents as Internet-Drafts. The list of current Internet- 56 Drafts is at https://datatracker.ietf.org/drafts/current/. 58 Internet-Drafts are draft documents valid for a maximum of six months 59 and may be updated, replaced, or obsoleted by other documents at any 60 time. It is inappropriate to use Internet-Drafts as reference 61 material or to cite them other than as "work in progress." 63 This Internet-Draft will expire on September 1, 2018. 65 Copyright Notice 67 Copyright (c) 2018 IETF Trust and the persons identified as the 68 document authors. All rights reserved. 70 This document is subject to BCP 78 and the IETF Trust's Legal 71 Provisions Relating to IETF Documents 72 (https://trustee.ietf.org/license-info) in effect on the date of 73 publication of this document. Please review these documents 74 carefully, as they describe your rights and restrictions with respect 75 to this document. Code Components extracted from this document must 76 include Simplified BSD License text as described in Section 4.e of 77 the Trust Legal Provisions and are provided without warranty as 78 described in the Simplified BSD License. 80 Table of Contents 82 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 5 83 2. SR Policy . . . . . . . . . . . . . . . . . . . . . . . . . . 5 84 2.1. Identification of an SR Policy . . . . . . . . . . . . . 5 85 2.2. Candidate Path and Segment List . . . . . . . . . . . . . 6 86 2.3. Protocol-Origin of a Candidate Path . . . . . . . . . . . 6 87 2.4. Originator of a Candidate Path . . . . . . . . . . . . . 7 88 2.5. Discriminator of a Candidate Path . . . . . . . . . . . . 7 89 2.6. Identification of a Candidate Path . . . . . . . . . . . 8 90 2.7. 
Preference of a Candidate Path . . . . . . . . . . . . . 8 91 2.8. Validity of a Candidate Path . . . . . . . . . . . . . . 8 92 2.9. Active Candidate Path . . . . . . . . . . . . . . . . . . 8 93 2.10. Validity of an SR Policy . . . . . . . . . . . . . . . . 10 94 2.11. Instantiation of an SR Policy in the Forwarding Plane . . 10 95 2.12. Priority of an SR Policy . . . . . . . . . . . . . . . . 10 96 2.13. Summary . . . . . . . . . . . . . . . . . . . . . . . . . 10 97 3. Segment Routing Database . . . . . . . . . . . . . . . . . . 11 98 4. Segment Types . . . . . . . . . . . . . . . . . . . . . . . . 12 99 4.1. Explicit Null . . . . . . . . . . . . . . . . . . . . . . 15 100 5. Validity of a Candidate Path . . . . . . . . . . . . . . . . 15 101 5.1. Explicit Candidate Path . . . . . . . . . . . . . . . . . 16 102 5.2. Dynamic Candidate Path . . . . . . . . . . . . . . . . . 17 103 6. Binding SID . . . . . . . . . . . . . . . . . . . . . . . . . 17 104 6.1. BSID of a candidate path . . . . . . . . . . . . . . . . 17 105 6.2. BSID of an SR Policy . . . . . . . . . . . . . . . . . . 17 106 6.2.1. Frequent use-cases : unspecified BSID . . . . . . . . 18 107 6.2.2. Frequent use-case: all specified to the same BSID . . 18 108 6.2.3. Specified-BSID-only . . . . . . . . . . . . . . . . . 18 109 6.3. Forwarding Plane . . . . . . . . . . . . . . . . . . . . 18 110 6.4. Not an identification . . . . . . . . . . . . . . . . . . 19 111 7. SR Policy State . . . . . . . . . . . . . . . . . . . . . . . 19 112 8. Steering into an SR Policy . . . . . . . . . . . . . . . . . 19 113 8.1. Validity of an SR Policy . . . . . . . . . . . . . . . . 19 114 8.2. Drop upon invalid SR Policy . . . . . . . . . . . . . . . 20 115 8.3. Incoming Active SID is a BSID . . . . . . . . . . . . . . 20 116 8.4. Per-Destination Steering . . . . . . . . . . . . . . . . 21 117 8.4.1. Multiple Colors . . . . . . . . . . . . . . . . . . . 21 118 8.5. Recursion on an on-demand dynamic BSID . . . . . . . . . 
22 119 8.5.1. Multiple Colors . . . . . . . . . . . . . . . . . . . 22 120 8.6. Per-Flow Steering . . . . . . . . . . . . . . . . . . . . 22 121 8.7. Policy-based Routing . . . . . . . . . . . . . . . . . . 23 122 8.8. Optional Steering Modes for BGP Destinations . . . . . . 24 123 8.8.1. Color-Only BGP Destination Steering . . . . . . . . . 24 124 8.8.2. Multiple Colors and CO flags . . . . . . . . . . . . 25 125 8.8.3. Drop upon Invalid . . . . . . . . . . . . . . . . . . 25 126 9. Other type of SR Policies . . . . . . . . . . . . . . . . . . 26 127 9.1. Layer 2 and Optical Transport . . . . . . . . . . . . . . 26 128 9.2. Spray SR Policy . . . . . . . . . . . . . . . . . . . . . 27 129 10. 50msec Local Protection . . . . . . . . . . . . . . . . . . . 27 130 10.1. Leveraging TI-LFA local protection of the constituent 131 IGP segments . . . . . . . . . . . . . . . . . . . . . . 27 132 10.2. Using an SR Policy to locally protect a link . . . . . . 28 133 11. Other types of Segments . . . . . . . . . . . . . . . . . . . 28 134 11.1. Service SID . . . . . . . . . . . . . . . . . . . . . . 28 135 11.2. Flex-Alg IGP SID . . . . . . . . . . . . . . . . . . . . 29 136 12. Binding SID to a tunnel . . . . . . . . . . . . . . . . . . . 29 137 13. Traffic Accounting . . . . . . . . . . . . . . . . . . . . . 29 138 13.1. Traffic Counters Naming convention . . . . . . . . . . . 30 139 13.2. Per-Interface SR Counters . . . . . . . . . . . . . . . 31 140 13.2.1. Per interface, per protocol aggregate egress SR 141 traffic counters (SR.INT.E.PRO) . . . . . . . . . . 31 142 13.2.2. Per interface, per traffic-class, per protocol 143 aggregate egress SR traffic counters 144 (SR.INT.E.PRO.TC) . . . . . . . . . . . . . . . . . 31 145 13.2.3. Per interface aggregate ingress SR traffic counter 146 (SR.INT.I) . . . . . . . . . . . . . . . . . . . . . 31 147 13.2.4. Per interface, per TC aggregate ingress SR traffic 148 counter (SR.INT.I.TC) . . . . . . . . . . . . . . . 32 149 13.3. 
Prefix SID Counters . . . . . . . . . . . . . . . . . . 32 150 13.3.1. Per-prefix SID egress traffic counter (PSID.E) . . . 32 151 13.3.2. Per-prefix SID per-TC egress traffic counter 152 (PSID.E.TC) . . . . . . . . . . . . . . . . . . . . 32 153 13.3.3. Per-prefix SID, per egress interface traffic counter 154 (PSID.INT.E) . . . . . . . . . . . . . . . . . . . . 32 155 13.3.4. Per-prefix SID per TC per egress interface traffic 156 counter (PSID.INT.E.TC) . . . . . . . . . . . . . . 32 157 13.3.5. Per-prefix SID, per ingress interface traffic 158 counter (PSID.INT.I) . . . . . . . . . . . . . . . . 33 159 13.3.6. Per-prefix SID, per TC, per ingress interface 160 traffic counter (PSID.INT.I.TC) . . . . . . . . . . 33 161 13.4. Traffic Matrix Counters . . . . . . . . . . . . . . . . 33 162 13.4.1. Per-Prefix SID Traffic Matrix counter (PSID.E.TM) . 33 163 13.4.2. Per-Prefix, Per TC SID Traffic Matrix counter 164 (PSID.E.TM.TC) . . . . . . . . . . . . . . . . . . . 33 165 13.5. SR Policy Counters . . . . . . . . . . . . . . . . . . . 34 166 13.5.1. Per-SR Policy Aggregate traffic counter (POL) . . . 34 167 13.5.2. Per-SR Policy labelled steered aggregate traffic 168 counter (POL.BSID) . . . . . . . . . . . . . . . . . 34 169 13.5.3. Per-SR Policy, per TC Aggregate traffic counter 170 (POL.TC) . . . . . . . . . . . . . . . . . . . . . . 34 171 13.5.4. Per-SR Policy, per TC labelled steered aggregate 172 traffic counter (POL.BSID.TC) . . . . . . . . . . . 34 173 13.5.5. Per-SR Policy, Per-Segment-List Aggregate traffic 174 counter (POL.SL) . . . . . . . . . . . . . . . . . . 35 175 13.5.6. Per-SR Policy, Per-Segment-List labelled steered 176 aggregate traffic counter (POL.SL.BSID) . . . . . . 35 177 14. Appendix A . . . . . . . . . . . . . . . . . . . . . . . . . 35 178 14.1. SRTE headend architecture . . . . . . . . . . . . . . . 35 179 14.2. Distributed and/or Centralized Control Plane . . . . . . 36 180 14.2.1. 
Distributed Control Plane within a single Link-State 181 IGP area . . . . . . . . . . . . . . . . . . . . . . 36 182 14.2.2. Distributed Control Plane across several Link-State 183 IGP areas . . . . . . . . . . . . . . . . . . . . . 36 184 14.2.3. Centralized Control Plane . . . . . . . . . . . . . 37 185 14.2.4. Distributed and Centralized Control Plane . . . . . 37 186 14.3. Examples of Candidate Path Selection . . . . . . . . . . 38 187 14.4. More on Dynamic Path . . . . . . . . . . . . . . . . . . 41 188 14.4.1. Optimization Objective . . . . . . . . . . . . . . . 41 189 14.4.2. Constraints . . . . . . . . . . . . . . . . . . . . 42 190 14.4.3. SR Native Algorithm . . . . . . . . . . . . . . . . 42 191 14.4.4. Path to SID . . . . . . . . . . . . . . . . . . . . 43 193 14.5. Benefits of Binding SID . . . . . . . . . . . . . . . . 44 194 14.6. Centralized Discovery of available SID in SRLB . . . . . 45 195 15. Acknowledgement . . . . . . . . . . . . . . . . . . . . . . . 46 196 16. Normative References . . . . . . . . . . . . . . . . . . . . 46 197 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 48 199 1. Introduction 201 Segment Routing (SR) allows a headend node to steer a packet flow 202 along any path. Intermediate per-flow states are eliminated thanks 203 to source routing [I-D.ietf-spring-segment-routing]. 205 The headend node is said to steer a flow into a Segment Routing 206 Policy (SR Policy). 208 The header of a packet steered in an SR Policy is augmented with the 209 ordered list of segments associated with that SR Policy. 211 This document details the concepts of SR Policy and steering into an 212 SR Policy. These apply equally to the MPLS and SRv6 instantiations 213 of segment routing. 215 For reading simplicity, the illustrations are provided for the MPLS 216 instantiations. 218 2. SR Policy 220 2.1. Identification of an SR Policy 222 An SR Policy is identified through the tuple <headend, color, endpoint>.
In the context of a specific headend, one may identify an 224 SR policy by the <color, endpoint> tuple. 226 The headend is the node where the policy is instantiated/implemented. 227 The headend is specified as an IPv4 or IPv6 address. 229 The endpoint indicates the destination of the policy. The endpoint 230 is specified as an IPv4 or IPv6 address. In a specific case (refer 231 to section 8.8.1), the endpoint can be the null address (0.0.0.0 for 232 IPv4, ::0 for IPv6). 234 The color is a 32-bit numerical value that associates the SR Policy 235 with an intent (e.g., low-latency). 237 The endpoint and the color are used to automate the steering of 238 service or transport routes on SR Policies (refer to section 8). 240 2.2. Candidate Path and Segment List 242 An SR Policy is associated with one or more candidate paths. 244 A candidate path is itself associated with a Segment-List (SID-List) 245 or a set of SID-Lists. In the latter case, each SID-List is 246 associated with a weight for weighted load balancing (refer to 247 section 2.11 for details). The default weight is 1. 249 A SID-List represents a specific source-routed way to send traffic 250 from the headend to the endpoint of the corresponding SR policy. 252 A candidate path is either dynamic or explicit. 254 An explicit candidate path is associated with a SID-List or a set of 255 SID-Lists. 257 A dynamic candidate path expresses an optimization objective and a 258 set of constraints. The headend (potentially with the help of a PCE) 259 computes the solution SID-List (or set of SID-Lists) that solves the 260 optimization problem. 262 2.3. Protocol-Origin of a Candidate Path 264 A headend may be informed about a candidate path for an SR Policy 265 by various means, including configuration, PCEP 266 [I-D.ietf-pce-pce-initiated-lsp] or BGP [I-D.draft-ietf-idr-segment- 267 routing-te-policy].
269 Protocol-Origin of a candidate path is an 8-bit value which 270 identifies the component or protocol that originates or signals the 271 candidate path. The table below specifies the RECOMMENDED default 272 values. Implementations MAY allow modifications of these default 273 values assigned to protocols on the SRTE headend as long as no two 274 protocols share the same value. 276 The default values are listed below:
278 +-------+---------------------------------------------------------+
279 | Value | Protocol-Origin                                         |
280 +-------+---------------------------------------------------------+
281 |   10  | PCEP                                                    |
282 |   20  | BGP-SRTE                                                |
283 |   30  | Local (via CLI, YANG model through NETCONF, gRPC, etc.) |
284 +-------+---------------------------------------------------------+
286 Table 1: Protocol-origin Identifier
288 2.4. Originator of a Candidate Path 290 Originator identifies the node which provisioned or signalled the 291 candidate path on the SRTE headend. The originator is expressed in 292 the form of a 160 bit numerical value formed by the concatenation of 293 the fields of the tuple <ASN, node-address> as below: 295 o ASN : represented as a 4 byte number. 297 o Node Address : represented as a 128 bit value. IPv4 addresses are 298 encoded in the lowest 32 bits. 300 When Protocol-Origin is Local, the ASN and node address MAY be set to 301 either the SRTE headend or the provisioning controller/node ASN and 302 address. Default value is 0 for both ASN and node address. 304 When Protocol-Origin is PCEP, it is the IPv4 or IPv6 address of the 305 PCE and the AS number SHOULD be set to 0 by default when not 306 available or known.
308 When Protocol-Origin is BGP-SRTE, it is provided by the BGP component on 309 the headend and is: 311 o the BGP Router ID and ASN of the node/controller signalling the 312 candidate path when it has a BGP session to the headend, OR 314 o the BGP Router ID of the eBGP peer signalling the candidate path 315 along with ASN of origin when the signalling is done via one or 316 more intermediate eBGP routers, OR 318 o the BGP Originator ID [RFC4456] and the ASN of the node/controller 319 when the signalling is done via one or more route-reflectors over 320 an iBGP session. 322 2.5. Discriminator of a Candidate Path 324 The Discriminator is a 32 bit value associated with a candidate path 325 that uniquely identifies it within the context of an SR Policy from a 326 specific Protocol-Origin as specified below: 328 When Protocol-Origin is Local, this is an implementation's 329 configuration model specific unique identifier for a candidate path. 331 When PCEP is the Protocol-Origin, the method to uniquely identify 332 the signalled path will be specified in an upcoming PCEP draft. 334 When BGP-SRTE is the Protocol-Origin, it is the distinguisher 335 specified in Section 2.1 of [I.D.draft-ietf-idr-segment-routing-te- 336 policy]. 338 2.6. Identification of a Candidate Path 340 A candidate path is identified in the context of a single SR Policy. 342 A candidate path is not shared across SR Policies. 344 A candidate path is not identified by its SID-List(s). 346 If CP1 is a candidate path of SR Policy Pol1 and CP2 is a 347 candidate path of SR Policy Pol2, then these two candidate paths 348 are independent, even if they happen to have the same SID-List. 349 The SID-List does not identify a candidate path. The SID-List is 350 an attribute of a candidate path. 352 The identity of a candidate path MUST be uniquely established in the 353 context of an SR Policy <headend, color, endpoint> in order to handle 354 add, delete or modify operations on candidate paths in an unambiguous manner 355 regardless of their source(s).
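
The originator encoding of section 2.4 and the identifying fields of sections 2.3 to 2.5 can be sketched as follows. This is a minimal illustration and not part of the specification: the names encode_originator and CandidatePathKey are hypothetical, the sample values (ASN 100, address 1.1.1.1, discriminator 1) are illustrative only, and placing the ASN in the most significant 32 bits is an assumption based on the order in which the two fields are listed.

```python
import ipaddress
from dataclasses import dataclass


def encode_originator(asn: int, node_address: str) -> int:
    """Pack a 4-byte ASN and a 128-bit node address into one 160-bit integer.

    Assumption: the ASN occupies the most significant 32 bits.
    """
    if not 0 <= asn < 2**32:
        raise ValueError("ASN must fit in 4 bytes")
    addr = ipaddress.ip_address(node_address)
    # An IPv4 address is below 2**32, so it lands in the lowest 32 bits of
    # the 128-bit node address field; an IPv6 address fills the whole field.
    return (asn << 128) | int(addr)


@dataclass(frozen=True)
class CandidatePathKey:
    """Hypothetical key for a candidate path within one SR Policy."""
    protocol_origin: int  # 8-bit value, e.g. 10 = PCEP, 20 = BGP-SRTE, 30 = Local
    originator: int       # 160-bit value from encode_originator()
    discriminator: int    # 32-bit value


# Illustrative values only.
key = CandidatePathKey(20, encode_originator(100, "1.1.1.1"), 1)
```

Keeping the originator as a single integer makes the "lower value of originator" comparison used during path selection a plain integer comparison.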
357 The tuple <Protocol-Origin, originator, discriminator> uniquely 358 identifies a candidate path. 360 2.7. Preference of a Candidate Path 362 The preference of the candidate path is used to select the best 363 candidate path for an SR Policy. The default preference is 100. 365 It is recommended that each candidate path of a given SR policy has a 366 different preference. 368 2.8. Validity of a Candidate Path 370 A candidate path is valid if it is usable. A common path validity 371 criterion is the reachability of its constituent SIDs. The 372 validation rules are specified in section 5. 374 2.9. Active Candidate Path 376 A candidate path is selected when it is valid and it is determined to 377 be the best path of the SR Policy. The selected path is referred to 378 as the "active path" of the SR policy in this document. 380 Whenever a new path is learned, an active path is deleted, the 381 validity of an existing path changes, or an existing path is changed, 382 the selection process MUST be re-executed. 384 The candidate path selection process operates on the candidate path 385 Preference. A candidate path is selected when it is valid and it has 386 the highest preference value among all the candidate paths of the SR 387 Policy. 389 In the case of multiple valid candidate paths of the same preference, 390 the tie-breaking rules are evaluated on the identification tuple in 391 the following order until only one valid best path is selected: 393 1. Higher value of Protocol-Origin is selected. 395 2. Lower value of originator is selected. 397 3. Finally, the higher value of discriminator is selected. 399 An implementation MAY choose to override any of the tie-breaking 400 rules above and maintain the already selected candidate path as 401 active path. 403 The rules are framed with multiple protocols and sources in mind and 404 hence may not follow the logic of a single protocol (e.g. BGP best 405 path selection).
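
The selection and tie-breaking order above can be sketched as a single comparison key. This is a minimal sketch under stated assumptions: the CandidatePath record and the select_active_path function are hypothetical names, the sample values are illustrative, and the optional behaviour of keeping an already selected path is not modelled.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class CandidatePath:
    """Hypothetical record holding only the fields used by selection."""
    name: str
    preference: int
    protocol_origin: int
    originator: int
    discriminator: int
    valid: bool = True


def select_active_path(paths: Iterable[CandidatePath]) -> Optional[CandidatePath]:
    """Return the active candidate path, or None if no path is valid."""
    valid = [p for p in paths if p.valid]
    if not valid:
        return None
    # Highest preference first; ties go to the higher Protocol-Origin,
    # then the lower originator, then the higher discriminator.
    return max(
        valid,
        key=lambda p: (p.preference, p.protocol_origin, -p.originator, p.discriminator),
    )


paths = [
    CandidatePath("CP1", 200, 20, 1, 1, valid=False),  # best preference but invalid
    CandidatePath("CP2", 100, 20, 1, 1),
    CandidatePath("CP3", 100, 30, 1, 1),
    CandidatePath("CP4", 100, 30, 2, 9),
]
active = select_active_path(paths)  # CP3: same preference, higher origin, lower originator
```

Negating the originator inside the key is what turns "lower value is selected" into a maximum search alongside the other fields.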
The motivation behind these rules is as follows: 407 The Protocol-Origin allows an operator to set up a default selection 408 mechanism across protocol sources, e.g., to prefer locally 409 provisioned paths over paths signalled via BGP-SRTE or PCEP. 411 The preference, being the first tiebreaker, allows an operator to 412 influence selection across paths thus allowing provisioning of 413 multiple path options, e.g., CP1 is preferred and if it becomes 414 invalid then fall back to CP2 and so on. Since preference works 415 across protocol sources it also enables (where necessary) selective 416 override of the default protocol-origin preference, e.g., to prefer a 417 path signalled via BGP-SRTE over what is locally provisioned. 419 The originator allows an operator to have multiple redundant 420 controllers and still maintain a deterministic behaviour over which 421 of them is preferred even if they are providing the same candidate 422 paths for the same SR policies to the headend. 424 The discriminator performs the final tiebreaking step to ensure a 425 deterministic outcome of selection regardless of the order in which 426 candidate paths are signalled across multiple transport channels or 427 sessions. 429 Section 14.3 provides a set of examples to illustrate the active 430 candidate path selection rules. 432 2.10. Validity of an SR Policy 434 An SR Policy is valid when it has at least one valid candidate path. 436 2.11. Instantiation of an SR Policy in the Forwarding Plane 438 A valid SR Policy is instantiated in the forwarding plane. 440 Only the active candidate path is used for forwarding traffic that is 441 being steered onto that policy. 443 If a set of SID-Lists is associated with the active path of the 444 policy, then the steering is per flow and W-ECMP based according to 445 the relative weight of each SID-List.
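
Per-flow weighted steering can be sketched as follows, with each SID-List receiving a share of flows equal to its weight divided by the sum of the weights. This is only an illustration: the function names, the flow tuple and the SHA-256 based hash are assumptions, all weights are assumed to be positive, and real platforms use their own hashing with platform-dependent accuracy.

```python
import hashlib
from typing import Dict, Tuple


def flow_shares(weights: Dict[str, int]) -> Dict[str, float]:
    """Share of flows per SID-List: its weight over the sum of all weights."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def pick_sid_list(flow: Tuple, weights: Dict[str, int]) -> str:
    """Map a flow deterministically to one SID-List, in proportion to the weights."""
    total = sum(weights.values())
    # Hash the flow identifier into one of `total` equally likely buckets.
    digest = hashlib.sha256(repr(flow).encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % total
    # Each SID-List owns as many consecutive buckets as its weight.
    for name, w in weights.items():
        if bucket < w:
            return name
        bucket -= w
    raise AssertionError("unreachable when all weights are positive")


weights = {"SID-List1": 1, "SID-List2": 3}
shares = flow_shares(weights)  # {"SID-List1": 0.25, "SID-List2": 0.75}
flow = ("10.0.0.1", "10.0.0.2", 6, 1234, 80)  # hypothetical 5-tuple
chosen = pick_sid_list(flow, weights)
```

Because the choice depends only on the flow identifier, all packets of one flow follow the same SID-List.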
447 The fraction of the flows associated with a given SID-List is w/Sw 448 where w is the weight of the SID-List and Sw is the sum of the 449 weights of the SID-Lists of the selected path of the SR Policy. 451 The accuracy of the weighted load-balancing depends on the platform 452 implementation. 454 2.12. Priority of an SR Policy 456 Upon topological change, many policies could be recomputed. An 457 implementation MAY provide a per-policy priority field. The operator 458 MAY set this field to indicate the order in which the policies should be 459 re-computed. Such a priority is represented by an integer in the 460 range [0, 255] where the lowest value is the highest priority. The 461 default value of priority is 128. 463 2.13. Summary 465 In summary, the information model is the following:
467 SR policy POL1
468     Candidate-path CP1
470         Preference 200
471         Weight W1, SID-List1
472         Weight W2, SID-List2
473     Candidate-path CP2
475         Preference 100
476         Weight W3, SID-List3
477         Weight W4, SID-List4
479 The SR Policy POL1 is identified by the tuple <headend, color, endpoint>. It has two candidate paths CP1 and CP2. Each is 481 identified by a tuple <protocol-origin, originator, discriminator>. 482 CP1 is the active candidate path (it is valid and it has the highest 483 preference). The two SID-Lists of CP1 are installed as the 484 forwarding instantiation of SR policy POL1. Traffic steered on POL1 485 is flow-based hashed on SID-List1 with a ratio 486 W1/(W1+W2) and on SID-List2 with a ratio W2/(W1+W2). 488 3. Segment Routing Database 490 An SR headend maintains the Segment Routing Traffic Engineering 491 Database (SRTE-DB). 493 An SR headend leverages the SRTE-DB to validate explicit candidate 494 paths and compute dynamic candidate paths. 496 The information in the SRTE-DB MAY include: 498 o IGP information (topology, IGP metrics). 499 o TE Link Attributes (such as TE metric, SRLG, attribute-flag, 500 extended admin group) [RFC5305, RFC3630]. 501 o Extended TE Link attributes (such as latency, loss) [RFC7810, 502 RFC7471].
503 o Inter-Domain Topology information [I.D.draft-ietf-idr-bgpls- 504 segment-routing-epe]. 505 o Segment Routing information (such as SRGB, SRLB, Prefix-SIDs, Adj- 506 SIDs, BGP Peering SID, SRv6 SIDs). 508 The SRTE-DB is multi-domain capable. 510 The attached domain topology MAY be learned via IGP, BGP-LS or 511 NETCONF. 513 A non-attached (remote) domain topology MAY be learned via BGP-LS or 514 NETCONF. 516 In some use-cases, the SRTE-DB may only contain the attached domain 517 topology while in others, the SRTE-DB may contain the topology of 518 multiple domains. The SRTE-DB MAY also contain the SR Policies 519 instantiated in the network. This can be collected via BGP-LS ([I- 520 D.ietf-idr-te-lsp-distribution]) or PCEP ([I-D.ietf-pce-stateful-pce] 521 and [I-D.sivabalan-pce-binding-label-sid]). 523 This information allows an end-to-end policy to be built on the basis of 524 intermediate SR policies (Section 6). 526 The SRTE-DB MAY also contain the Maximum SID Depth (MSD) capability 527 of nodes in the topology. This can be collected via ISIS [draft- 528 ietf-isis-segment-routing-msd], OSPF [draft-ietf-ospf-segment- 529 routing-msd], BGP-LS [draft-ietf-idr-bgp-ls-segment-routing-msd] or 530 PCEP [I-D.ietf-pce-segment-routing]. 532 4. Segment Types 534 A SID-List is an ordered set of segments represented as <S1, S2, ... Sn> where S1 is the first segment. 537 Based on the desired dataplane, either the MPLS label stack or the 538 SRv6 SRH is built from the SID-List. However, the SID-List itself 539 can be specified using different segment-descriptor types and the 540 following are defined: 542 Type 1: SR-MPLS Label: 543 SR-MPLS label corresponding to any of the segment types defined 544 in [I.D.draft-ietf-spring-segment-routing] can be used. 545 Additionally, reserved labels like explicit-null or in general 546 any MPLS label may also be used. For example, this type can be used to 547 specify a label representation which maps to an optical 548 transport path on a packet transport node.
This type does not 549 require the SRTE process on the headend to perform any 550 resolution. 552 Type 2: SRv6 SID: 553 IPv6 address corresponding to any of the segment types defined 554 in [I.D.draft-filsfils-spring-srv6-network-programming] can be 555 used. This type does not require the SRTE process on the 556 headend to perform any resolution. 558 Type 3: IPv4 Prefix with optional SR Algorithm: 559 The SRTE process on the headend is required to resolve the 560 specified IPv4 Prefix Address to the SR-MPLS label 561 corresponding to its Prefix SID segment. The SR algorithm 562 (refer to Section 3.1.1 of [I.D.draft-ietf-spring-segment- 563 routing]) to be used MAY also be provided. When algorithm is 564 not specified, the SRTE process is expected to use the Prefix 565 SID signalled for the Strict Shortest Path algorithm when 566 available and if not then use the Shortest Path or default 567 algorithm. 569 Type 4: IPv6 Global Prefix with optional SR Algorithm for SR-MPLS: 570 In this case the SRTE process on the headend is required to 571 resolve the specified IPv6 Global Prefix Address to the SR-MPLS 572 label corresponding to its Prefix SID segment. The SR 573 Algorithm (refer to Section 3.1.1 of [I.D.draft-ietf-spring- 574 segment-routing]) to be used MAY also be provided. When 575 algorithm is not specified, the SRTE process is expected to use 576 the Prefix SID signalled for the Strict Shortest Path algorithm 577 when available and if not then use the Shortest Path or default 578 algorithm. 580 Type 5: IPv4 Prefix with Local Interface ID: 581 This type allows identification of Adjacency SID or BGP EPE 582 Peer Adjacency SID label for point-to-point links including IP 583 unnumbered links. The SRTE process on the headend is required 584 to resolve the specified IPv4 Prefix Address to the Node 585 originating it and then use the Local Interface ID to identify 586 the point-to-point link whose adjacency is being referred to. 
587 The Local Interface ID link descriptor follows semantics as 588 specified in RFC7752. This type can also be used to indicate 589 indirection into a layer 2 interface (i.e. without IP address) 590 like a representation of an optical transport path or a layer 2 591 Ethernet port or circuit at the specified node. 593 Type 6: IPv4 Addresses for link endpoints as Local, Remote pair: 594 This type allows identification of Adjacency SID or BGP EPE 595 Peer Adjacency SID label for links. The SRTE process on the 596 headend is required to resolve the specified IPv4 Local Address 597 to the Node originating it and then use the IPv4 Remote Address 598 to identify the link adjacency being referred to. The Local 599 and Remote Address pair link descriptors follow semantics as 600 specified in RFC7752. 602 Type 7: IPv6 Prefix and Interface ID for link endpoints as Local, 603 Remote pair for SR-MPLS: 604 This type allows identification of Adjacency SID or BGP EPE 605 Peer Adjacency SID label for links including those with only 606 Link Local IPv6 addresses. The SRTE process on the headend is 607 required to resolve the specified IPv6 Prefix Address to the 608 Node originating it and then use the Local Interface ID to 609 identify the point-to-point link whose adjacency is being 610 referred to. For other than point-to-point links, additionally 611 the specific adjacency over the link needs to be resolved using 612 the Remote Prefix and Interface ID. The Local and Remote pair 613 of Prefix and Interface ID link descriptor follows semantics as 614 specified in RFC7752. This type can also be used to indicate 615 indirection into a layer 2 interface (i.e. without IP address) 616 like a representation of an optical transport path or a layer 2 617 Ethernet port or circuit at the specified node.
619 Type 8: IPv6 Addresses for link endpoints as Local, Remote pair for 620 SR-MPLS: 621 This type allows identification of Adjacency SID or BGP EPE 622 Peer Adjacency SID label for links with Global IPv6 addresses. 623 The SRTE process on the headend is required to resolve the 624 specified Local IPv6 Address to the Node originating it and 625 then use the Remote IPv6 Address to identify the link adjacency 626 being referred to. The Local and Remote Address pair link 627 descriptors follow semantics as specified in RFC7752. 629 Type 9: IPv6 Global Prefix with optional SR Algorithm for SRv6: 630 The SRTE process on the headend is required to resolve the 631 specified IPv6 Global Prefix Address to the SRv6 END function 632 SID corresponding to the node which is originating the prefix. 633 The SR Algorithm (refer to Section 3.1.1 of [I.D.draft-ietf- 634 spring-segment-routing]) to be used MAY also be provided. When 635 algorithm is not specified, the SRTE process is expected to use 636 the Prefix SID signalled for the Strict Shortest Path algorithm 637 when available and if not then use the Shortest Path or default 638 algorithm. 640 Type 10: IPv6 Prefix and Interface ID for link endpoints as Local, 641 Remote pair for SRv6: 642 This type allows identification of SRv6 END.X SID for links 643 with only Link Local IPv6 addresses. The SRTE process on the 644 headend is required to resolve the specified IPv6 Prefix 645 Address to the Node originating it and then use the Local 646 Interface ID to identify the point-to-point link whose 647 adjacency is being referred to. For other than point-to-point 648 links, additionally the specific adjacency needs to be resolved 649 using the Remote Prefix and Interface ID. The Local and Remote 650 pair of Prefix and Interface ID link descriptor follows 651 semantics as specified in RFC7752.
653 Type 11: IPv6 Addresses for link endpoints as Local, Remote pair for 654 SRv6: 655 This type allows identification of SRv6 END.X SID for links 656 with Global IPv6 addresses. The SRTE process on the headend is 657 required to resolve the specified Local IPv6 Address to the 658 Node originating it and then use the Remote IPv6 Address to 659 identify the link adjacency being referred to. The Local and 660 Remote Address pair link descriptors follow semantics as 661 specified in RFC7752. 663 When building the MPLS label stack or the IPv6 Segment list from the 664 Segment List, the node instantiating the policy MUST interpret the 665 set of Segments as follows: 667 o The first Segment represents the topmost label or the first IPv6 668 segment. It identifies the first segment the traffic will be 669 directed toward along the SR explicit path. 670 o The last Segment represents the bottommost label or the last IPv6 671 segment the traffic will be directed toward along the SR explicit 672 path. 674 4.1. Explicit Null 676 A Type 1 SID may be any MPLS label, including reserved labels. 678 For example, assuming that the desired traffic-engineered path from a 679 headend 1 to an endpoint 4 can be expressed by the SID-List <16002, 680 16003, 16004> where 16002, 16003 and 16004 respectively refer to the 681 IPv4 Prefix SIDs bound to node 2, 3 and 4, then IPv6 traffic can be 682 traffic-engineered from node 1 to node 4 via the previously described 683 path using an SR Policy with SID-List <16002, 16003, 16004, 2> where 684 the MPLS label value 2 represents the "IPv6 Explicit NULL Label". 686 The penultimate node before node 4 will pop 16004 and will forward 687 the frame on its directly connected interface to node 4. 689 The endpoint receives the traffic with top label "2" which indicates 690 that the payload is an IPv6 packet.
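As a minimal sketch of the example above, the following Python snippet builds the label stack a headend might impose when IPv6 traffic is steered over IPv4 Prefix SIDs. The function name build_label_stack and the is_ipv6_payload flag are illustrative assumptions and are not defined by this document; only the label value 2 ("IPv6 Explicit NULL Label") comes from the text.

```python
IPV6_EXPLICIT_NULL = 2  # MPLS label value of the "IPv6 Explicit NULL Label"


def build_label_stack(sid_list, is_ipv6_payload):
    """Return the label stack to impose, topmost label first.

    Hypothetical helper: when the payload is IPv6, the IPv6 Explicit
    NULL label is appended as bottom-of-stack label unless the SID list
    already ends with it.
    """
    stack = list(sid_list)
    if is_ipv6_payload and (not stack or stack[-1] != IPV6_EXPLICIT_NULL):
        stack.append(IPV6_EXPLICIT_NULL)
    return stack


# SID-List of the example: IPv4 Prefix SIDs of nodes 2, 3 and 4.
print(build_label_stack([16002, 16003, 16004], is_ipv6_payload=True))
```

With the SID-List of the example, the sketch yields the stack 16002, 16003, 16004, 2, matching the SR Policy described in the text.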
692 When steering unlabeled IPv6 BGP destination traffic using an SR 693 policy composed of SID-List(s) based on IPv4 SIDs, the Explicit Null 694 Label Policy is processed as specified in draft-idr-segment-routing- 695 te-policy Section 2.4.4. When this is not present then the headend 696 SHOULD automatically impose the "IPv6 Explicit NULL Label" as bottom 697 of stack label. Refer to "Steering" section later in this document. 699 5. Validity of a Candidate Path 700 5.1. Explicit Candidate Path 702 An explicit candidate path is associated with a SID-List or a set of 703 SID-Lists. 705 An explicit candidate path is provisioned by the operator directly or 706 via a controller. 708 The computation/logic that leads to the choice of the SID list is 709 external to the SR Policy headend. The SR Policy headend does not 710 compute the SID list. The SR Policy headend only confirms its 711 validity. 713 A SID-List of an explicit candidate path MUST be declared invalid 714 when: 716 o It is empty. 717 o Its weight is 0. 718 o The headend is unable to resolve the first SID into one or more 719 outgoing interface(s) and next-hop(s). 720 o The headend is unable to resolve any non-first SID of type 3-to-11 721 into an MPLS label or an SRv6 SID. 723 "Unable to resolve" means that the headend has no path to the SID in 724 its SRTE-DB. 726 In multi-domain deployments, it is expected that the headend be 727 unable to verify the reachability of the SIDs in remote domains. 728 Types 1 and 2 MUST be used for the SIDs for which the reachability 729 cannot be verified. Note that the first SID must always be reachable 730 regardless of its type. 732 In addition, a SID-List MAY be declared invalid when: 734 o Its last segment is not a Prefix SID (including BGP Peer Node-SID) 735 advertised by the node specified as the endpoint of the 736 corresponding SR policy. 
737 o Its last segment is not an Adjacency SID (including BGP Peer 738 Adjacency SID) of any of the links present on neighbor nodes and 739 that terminate on the node specified as the endpoint of the 740 corresponding SR policy. 742 An explicit candidate path is invalid as soon as it has no valid SID- 743 List. 745 5.2. Dynamic Candidate Path 747 A dynamic candidate path is specified as an optimization objective 748 and constraints. 750 The headend of the policy leverages its SRTE-DB to compute a SID-List 751 ("solution SID-List") that solves this optimization problem. 753 The headend re-computes the solution SID-List any time the inputs to 754 the problem change (e.g., topology changes). 756 When local computation is not possible (e.g., a policy's tail-end is 757 outside the topology known to the head-end) or not desired, the head- 758 end MAY send a path computation request to a PCE supporting the PCEP 759 extension specified in [I-D.ietf-pce-segment-routing]. 761 If no solution is found to the optimization objective and 762 constraints, then the dynamic candidate path is declared invalid. 764 Section 14.4 lists some of the optimization objectives and 765 constraints that may be considered by a dynamic candidate path. It 766 illustrates some of the desirable properties of the computation of 767 the solution SID list. 769 6. Binding SID 771 The Binding SID (BSID) is fundamental to Segment Routing [I.D.draft- 772 ietf-spring-segment-routing]. It provides scaling, network opacity 773 and service independence. Section 14.5 illustrates these benefits. 775 6.1. BSID of a candidate path 777 Each candidate path MAY be defined with a BSID. 779 Candidate Paths of the same SR policy SHOULD have the same BSID. 781 Candidate Paths of different SR policies MUST NOT have the same BSID. 783 6.2. BSID of an SR Policy 785 The BSID of an SR policy is the BSID of its active candidate path.
787 When the active candidate path has a specified BSID, the SR Policy 788 uses that BSID if this value (label in MPLS, IPv6 address in SRv6) is 789 available (i.e., not associated with any other usage: e.g. to another 790 MPLS client, to another SID, to another SR Policy). 792 Optionally, instead of only checking that the BSID of the active path 793 is available, a headend MAY check that it is available within a given 794 SID range (i.e., SRLB). 796 When the specified BSID is not available (optionally is not in the 797 SRLB), an alert message is generated. 799 In the cases (as described above) where the SR Policy does not have a 800 BSID available, then the SR Policy MAY dynamically bind a BSID to 801 itself. A dynamically bound BSID SHOULD use an available SID outside 802 the SRLB. 804 Assuming that at time t the BSID of the SR Policy is B1, if at time 805 t+dt a different candidate path becomes active and this new active 806 path does not have a specified BSID or its BSID is specified but is 807 not available, then the SR Policy keeps the previous BSID B1. 809 6.2.1. Frequent use-case: unspecified BSID 811 All the candidate paths of the same SR Policy have unspecified BSID. 813 In such a case, a BSID MAY be dynamically bound to the SR Policy as 814 soon as the first valid candidate path is received. That BSID is 815 kept for the entire life of the SR Policy and across changes of active 816 path. 818 6.2.2. Frequent use-case: all specified to the same BSID 820 All the paths of the SR Policy have the same specified BSID. 822 6.2.3. Specified-BSID-only 824 A headend MAY be configured with the Specified-BSID-only restrictive 825 behavior. 827 When this restrictive behavior is enabled, if the candidate path has 828 an unspecified BSID or if the specified BSID is not available when 829 the candidate path becomes active then no BSID is bound to it and it 830 is considered invalid. An alert is triggered.
Other candidate paths 831 can then be evaluated for becoming the active candidate path. 833 6.3. Forwarding Plane 835 A valid SR Policy installs a BSID-keyed entry in the forwarding plane 836 with the action of steering the packets matching this entry to the 837 selected path of the SR Policy. 839 If the Specified-BSID-only restrictive behavior is enabled and the 840 BSID of the active path is not available (optionally not in the 841 SRLB), then the SR Policy does not install any entry indexed by a 842 BSID in the forwarding plane. 844 6.4. Not an identification 846 The association of an SR Policy to a BSID MAY change over the life of 847 the SR policy (e.g., upon active path change). The BSID of an SR 848 Policy is not an identification of an SR policy. The identification 849 of an SR Policy is the tuple . 851 7. SR Policy State 853 The SR Policy State is maintained on the headend by the SRTE process. It 854 represents the state of the policy and its candidate paths to provide 855 an accurate representation of whether the policy is being 856 instantiated in the forwarding plane and which of the candidate paths 857 is active. The SR Policy state MUST also reflect the reason when a 858 policy and/or its candidate path is not active due to validation 859 errors or not being preferred. 861 Implementations MAY support an administrative state to control 862 locally provisioned policies via mechanisms like CLI or NETCONF. 864 8. Steering into an SR Policy 866 A headend can steer a packet flow into a valid SR Policy in various 867 ways: 869 o Incoming packets have an active SID matching a local BSID at the 870 head-end. 871 o Per-destination Steering: incoming packets match a BGP/Service 872 route which recurses on an SR policy. 873 o Per-flow Steering: incoming packets match or recurse on a 874 forwarding array where some of the entries are SR Policies. 875 o Policy-based Steering: incoming packets match a routing policy 876 which directs them on an SR policy.
878 For simplicity of illustration, this document uses the SR-MPLS 879 example. 881 8.1. Validity of an SR Policy 883 An SR Policy is invalid when all its candidate paths are invalid. 885 By default, upon transitioning to the invalid state, 886 o an SR Policy and its BSID are removed from the forwarding plane. 887 o any steering of a service (PW), destination (BGP-VPN), flow or 888 packet on the related SR policy is disabled and the related 889 service, destination, flow or packet is routed per the classic 890 forwarding table (e.g. longest-match to the destination or the 891 recursing next-hop). 893 8.2. Drop upon invalid SR Policy 895 An SR Policy MAY be enabled for the Drop-Upon-Invalid behavior: 897 o an invalid SR Policy and its BSID is kept in the forwarding plane 898 with an action to drop. 899 o any steering of a service (PW), destination (BGP-VPN), flow or 900 packet on the related SR policy is maintained with the action to 901 drop all of this traffic. 903 The drop-upon-invalid behavior has been deployed in use-cases where 904 the operator wants some PW to only be transported on a path with 905 specific constraints. When these constraints are no longer met, the 906 operator wants the PW traffic to be dropped. Specifically, the 907 operator does not want the PW to be routed according to the IGP 908 shortest-path to the PW endpoint. 910 8.3. Incoming Active SID is a BSID 912 Let us assume that headend H has a valid SR Policy P of SID-List and BSID B. 915 When H receives a packet K with label stack , H pops B and 916 pushes and forwards the resulting packet according to 917 SID S1. 919 "Forwarding the resulting packet according to S1" means: If S1 is 920 an Adj SID or a PHP-enabled prefix SID advertised by a neighbor, H 921 sends the resulting packet with label stack on 922 the outgoing interface associated with S1; Else H sends the 923 resulting packet with label stack along the 924 path of S1. 926 H has steered the packet in the SR policy P. 
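The BSID processing described above can be sketched as follows. This is only an illustration under stated assumptions: the function steer_on_bsid, the label values and the first_sid_is_local flag (true when the first SID is an Adj SID or a PHP-enabled prefix SID advertised by a neighbor) are hypothetical, and the sketch assumes that in this case the first SID is not carried on the wire.

```python
def steer_on_bsid(label_stack, bsid, sid_list, first_sid_is_local):
    """Sketch of BSID steering at headend H (SR-MPLS).

    label_stack: incoming labels, topmost first.
    Returns the outgoing label stack, or None when the top label is
    not the BSID (that case is outside the scope of this sketch).
    """
    if not label_stack or label_stack[0] != bsid:
        return None
    remaining = list(label_stack[1:])  # H pops the BSID
    pushed = list(sid_list)            # H pushes the SID list of policy P
    if first_sid_is_local:
        # Assumption: the first SID selects the outgoing interface and
        # is therefore not sent to the neighbor.
        pushed = pushed[1:]
    return pushed + remaining


# Hypothetical values: BSID 40001, policy SID list 16002, 16003, 16004.
print(steer_on_bsid([40001, 100, 200], 40001, [16002, 16003, 16004], False))
```

The upstream node only needs to know the BSID; the SID list of the policy stays local to H.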
928 H did not have to classify the packet. The classification was done 929 by a node upstream of H (e.g., the source of the packet or an 930 intermediate ingress edge node of the SR domain) and the result of 931 this classification was efficiently encoded in the packet header as a 932 BSID. 934 This is another key benefit of the segment routing in general and the 935 binding SID in particular: the ability to encode a classification and 936 the resulting steering in the packet header to better scale and 937 simplify intermediate aggregation nodes. 939 If the SR Policy P is invalid, the BSID B is not in the forwarding 940 plane and hence the packet K is dropped by H. 942 8.4. Per-Destination Steering 944 Let us assume that headend H: 946 o learns a BGP route R/r via next-hop N, extended-color community C 947 and VPN label V. 948 o has a valid SR Policy P to (endpoint = N, color = C) of SID-List 949 and BSID B. 950 o has a BGP policy which matches on the extended-color community C 951 and allows its usage as an SRTE SLA steering information. 953 If all these conditions are met, H installs R/r in RIB/FIB with next- 954 hop = SR Policy P of BSID B instead of via N. 956 Indeed, H's local BGP policy and the received BGP route indicate that 957 the headend should associate R/r with an SRTE path to N with the SLA 958 associated with color C. The headend therefore installs the BGP 959 route on that policy. 961 This can be implemented by using the BSID as a generalized next-hop 962 and installing the BGP route on that generalized next-hop. 964 When H receives a packet K with a destination matching R/r, H pushes 965 the label stack and sends the resulting packet along 966 the path to S1. 968 Note that any SID associated with the BGP route is inserted after the 969 SID-List of the SR Policy (i.e., ). 
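A minimal sketch of the per-destination steering decision described above, assuming a hypothetical in-memory table of SR Policies keyed by (endpoint, color). The SRPolicy class, the function resolve_bgp_route and all label values are illustrative and not defined by this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SRPolicy:
    endpoint: str
    color: int
    sid_list: tuple
    valid: bool = True


def resolve_bgp_route(next_hop, color, vpn_label, policies):
    """Return the label stack to push (topmost first) for a BGP route.

    policies: dict mapping (endpoint, color) to SRPolicy.
    Returns None when no valid SR Policy (next_hop, color) exists; the
    route then stays on the classic path via the next-hop.
    """
    policy = policies.get((next_hop, color))
    if policy is None or not policy.valid:
        return None
    # The SID associated with the BGP route goes after the SID-List.
    return list(policy.sid_list) + [vpn_label]


policies = {("192.0.2.4", 100): SRPolicy("192.0.2.4", 100, (16002, 16003, 16004))}
print(resolve_bgp_route("192.0.2.4", 100, 30001, policies))
```

The BGP policy check (whether color C is authorized for SRTE steering) is assumed to have happened before this lookup.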
971 The same behavior is applicable to any type of service route: any 972 AFI/SAFI of BGP ([ID.draft-ietf-idr-tunnel-encaps-07], [I.D.draft- 973 ietf-idr-segment-routing-te-policy]), any AFI/SAFI of LISP [RFC6830]. 975 8.4.1. Multiple Colors 977 When a BGP route has multiple extended-color communities each with a 978 valid SRTE policy, the BGP process installs the route on the SR 979 policy whose color is of highest numerical value. 981 Let us assume that headend H: 983 o learns a BGP route R/r via next-hop N, extended-color communities 984 C1 and C2 and VPN label V. 985 o has a valid SR Policy P1 to (endpoint = N, color = C1) of SID list 986 and BSID B1. 987 o has a valid SR Policy P2 to (endpoint = N, color = C2) of SID list 988 and BSID B2. 989 o has a BGP policy which matches on the extended-color communities 990 C1 and C2 and allows their usage as an SRTE SLA steering 991 information. 993 If all these conditions are met, H installs R/r in RIB/FIB with next- 994 hop = SR Policy P2 of BSID=B2 (instead of N) because C2 > C1. 996 8.5. Recursion on an on-demand dynamic BSID 998 In the previous section, it was assumed that H had a pre-established 999 "explicit" SR Policy (endpoint N, color C). 1001 In this section, independently of the a-priori existence of any 1002 explicit candidate path of the SR policy (N, C), it is to be noted 1003 that the BGP process at node H triggers the SRTE process at node H to 1004 instantiate a dynamic candidate path for the SR policy (N, C) as soon 1005 as: 1007 o the BGP process learns of a route R/r via N and with color C. 1008 o a local policy at node H authorizes the on-demand SRTE path 1009 instantiation and maps the color to a dynamic SRTE path 1010 optimization template. 1012 8.5.1. Multiple Colors 1014 When a BGP route R/r via N has multiple extended-color communities Ci 1015 (with i=1 ... n), an individual on-demand SRTE dynamic path request 1016 (endpoint N, color Ci) is triggered for each color Ci. 1018 8.6.
Per-Flow Steering 1020 Let us assume that head-end H: 1022 o has a valid SR Policy P1 to (endpoint = N, color = C1) of SID-List 1023 and BSID B1. 1024 o has a valid SR Policy P2 to (endpoint = N, color = C2) of SID-List 1025 and BSID B2. 1026 o is configured to instantiate an array of paths to N where the 1027 entry 0 is the IGP path to N, color C1 is the first entry and 1028 Color C2 is the second entry. The index into the array is called 1029 a Forwarding Class (FC). The index can have values 0 to 7. 1031 o is configured to match flows in its ingress interfaces (upon any 1032 field such as Ethernet destination/source/vlan/tos or IP 1033 destination/source/DSCP or transport ports etc.) and color them 1034 with an internal per-packet forwarding-class variable (0, 1 or 2 1035 in this example). 1037 If all these conditions are met, H installs in RIB/FIB: 1039 o N via a recursion on an array A (instead of the immediate outgoing 1040 link associated with the IGP shortest-path to N). 1041 o Entry A(0) set to the immediate outgoing link of the IGP shortest- 1042 path to N. 1043 o Entry A(1) set to SR Policy P1 of BSID=B1. 1044 o Entry A(2) set to SR Policy P2 of BSID=B2. 1046 H receives three packets K, K1 and K2 on its incoming interface. 1047 These three packets either longest-match on N or more likely on a 1048 BGP/service route which recurses on N. H colors these 3 packets 1049 respectively with forwarding-class 0, 1 and 2. As a result: 1051 o H forwards K along the shortest-path to N (which in SR-MPLS 1052 results in the pushing of the prefix-SID of N). 1053 o H pushes on packet K1 and forwards the resulting 1054 frame along the shortest-path to S1. 1055 o H pushes on packet K2 and forwards the resulting 1056 frame along the shortest-path to S4. 1058 If the local configuration does not specify any explicit forwarding 1059 information for an entry of the array, then this entry is filled with 1060 the same information as entry 0 (i.e. the IGP shortest-path). 
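The forwarding array used for per-flow steering can be sketched as follows. The names build_fc_array and IGP_PATH, and the representation of SR Policies as plain strings, are assumptions made for illustration. The sketch applies the rule stated above for unconfigured entries, and the same fallback when a configured SR Policy is invalid, as this section specifies.

```python
IGP_PATH = "igp-shortest-path"  # placeholder for the content of entry 0


def build_fc_array(configured, valid_policies, size=8):
    """Build the array A indexed by Forwarding Class (0 to size - 1).

    configured: dict mapping an FC index to an SR Policy name.
    valid_policies: set of SR Policy names that are currently valid.
    Entries without configuration, or whose SR Policy is invalid, are
    filled with the same information as entry 0.
    """
    array = [IGP_PATH] * size
    for fc, policy in configured.items():
        if 1 <= fc < size and policy in valid_policies:
            array[fc] = policy
    return array


# P1 is valid, P2 is not: A(2) falls back to the IGP path.
array = build_fc_array({1: "P1", 2: "P2"}, valid_policies={"P1"})
print(array[0], array[1], array[2])
```

A packet colored with forwarding-class i is then forwarded according to entry i of the array.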
1062 If the SR Policy mapped to an entry of the array becomes invalid, 1063 then this entry is filled with the same information as entry 0. When 1064 all the array entries have the same information as entry 0, the 1065 forwarding entry for N is updated to bypass the array and point 1066 directly to its outgoing interface and next-hop. 1068 This realizes per-flow steering: different flows bound to the same 1069 BGP endpoint are steered on different IGP or SRTE paths. 1071 8.7. Policy-based Routing 1073 Finally, headend H may be configured with a local routing policy 1074 which overrides any BGP/IGP path and steers a specified packet on an 1075 SR Policy. This includes the use of mechanisms like IGP Shortcut for 1076 automatic routing of IGP prefixes over SR Policies intended for such 1077 purpose. 1079 8.8. Optional Steering Modes for BGP Destinations 1081 8.8.1. Color-Only BGP Destination Steering 1083 In the previous section, it is seen that the steering on an SR Policy 1084 is governed by the matching of the BGP route's next-hop N and the 1085 authorized color C with an SR Policy defined by the tuple (N, C). 1087 This is the most likely form of BGP destination steering and the one 1088 recommended for most use-cases. 1090 This section defines an alternative steering mechanism based only on 1091 the color. 1093 This color-only steering variation is governed by two new flags "C" 1094 and "O" defined in the color extended community [ref draft-ietf-idr- 1095 segment-routing-te-policy section 3]. 1097 The Color-Only flags "CO" are set to 00 by default. 1099 When 00, the BGP destination is steered as follows: 1101 IF there is a valid SR Policy (N, C) where N is the IPv4/v6 1102 endpoint address and C is a color; 1103 Steer into SR Policy (N, C); 1104 ELSE; 1105 Steer on the IGP path to the next-hop N. 1107 This is the classic case described in this document previously and 1108 what is recommended in most scenarios.
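The lookup order for each value of the CO flags defined in this section (00, 01 and 10) can be sketched as a single function. This is an illustrative sketch: the function select_policy and the representation of valid SR Policies as (endpoint, color) pairs are assumptions, and the reserved value 11 is handled like 00 as this section requires.

```python
import ipaddress


def select_policy(next_hop, color, co, valid_policies):
    """Return the (endpoint, color) to steer into, or None for the IGP path.

    valid_policies: iterable of (endpoint string, color) pairs, one per
    valid SR Policy. co: Color-Only flags as 0b00, 0b01, 0b10 or 0b11.
    """
    nh = ipaddress.ip_address(next_hop)
    candidates = [(ipaddress.ip_address(ep), c)
                  for ep, c in valid_policies if c == color]
    if co == 0b11:
        co = 0b00  # reserved value, treated like 00

    # Every CO value starts with the exact (N, C) match.
    steps = [lambda ep: ep == nh]
    if co in (0b01, 0b10):
        # Null endpoint of the same address family, then of any family.
        steps.append(lambda ep: ep.is_unspecified and ep.version == nh.version)
        steps.append(lambda ep: ep.is_unspecified)
    if co == 0b10:
        # Any endpoint of the same address family, then of any family.
        steps.append(lambda ep: ep.version == nh.version)
        steps.append(lambda ep: True)

    for matches in steps:
        for ep, c in candidates:
            if matches(ep):
                return (str(ep), c)
    return None  # steer on the IGP path to the next-hop


policies = [("10.0.0.1", 100), ("0.0.0.0", 200), ("::", 300), ("10.0.0.9", 400)]
print(select_policy("10.0.0.2", 200, 0b01, policies))
```

When several SR Policies match the same step, the sketch returns the first one found; how such ties are broken is not specified here and is left as an assumption.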
1110 When 01, the BGP destination is steered as follows: 1112 IF there is a valid SR Policy (N, C) where N is the IPv4/6 1113 endpoint address and C is a color; 1114 Steer into SR Policy (N, C); 1115 ELSE IF there is a valid SR Policy (null endpoint, C) of the 1116 same address-family of N; 1117 Steer into SR Policy (null endpoint, C); 1118 ELSE IF there is any valid SR Policy 1119 (any address-family null endpoint, C); 1120 Steer into SR Policy (any null endpoint, C); 1121 ELSE; 1122 Steer on the IGP path to the next-hop N. 1124 When 10, the BGP destination is steered as follows: 1126 IF there is a valid SR Policy (N, C) where N is an IPv4/6 1127 endpoint address and C is a color; 1128 Steer into SR Policy (N, C); 1129 ELSE IF there is a valid SR Policy (null endpoint, C) 1130 of the same address-family of N; 1131 Steer into SR Policy (null endpoint, C); 1132 ELSE IF there is any valid SR Policy 1133 (any address-family null endpoint, C); 1134 Steer into SR Policy (any null endpoint, C); 1135 ELSE IF there is any valid SR Policy (any endpoint, C) 1136 of the same address-family of N; 1137 Steer into SR Policy (any endpoint, C); 1138 ELSE IF there is any valid SR Policy 1139 (any address-family endpoint, C); 1140 Steer into SR Policy (any address-family endpoint, C); 1141 ELSE; 1142 Steer on the IGP path to the next-hop N. 1144 The null endpoint is 0.0.0.0 for IPv4 and ::0 for IPv6 (all bits set 1145 to the 0 value). 1147 The value 11 is reserved for future use and SHOULD NOT be used. Upon 1148 reception, an implementation MUST treat it like 00. 1150 8.8.2. Multiple Colors and CO flags 1152 The steering preference is first based on highest color value and 1153 then CO-dependent for the color. Assuming a Prefix via (NH, 1154 C1(CO=01), C2(CO=01)) with C1>C2, the steering preference order is: 1156 o SR policy (NH, C1). 1157 o SR policy (null, C1). 1158 o SR policy (NH, C2). 1159 o SR policy (null, C2). 1160 o IGP to NH. 1162 8.8.3.
Drop upon Invalid 1164 This document defined earlier that when all the following conditions 1165 are met, H installs R/r in RIB/FIB with next-hop = SR Policy P of 1166 BSID B instead of via N. 1168 o H learns a BGP route R/r via next-hop N, extended-color community 1169 C and VPN label V. 1170 o H has a valid SR Policy P to (endpoint = N, color = C) of SID-List 1171 and BSID B. 1172 o H has a BGP policy which matches on the extended-color community C 1173 and allows its usage as an SRTE SLA steering information. 1175 This behavior is extended by noting that the BGP policy may require 1176 the BGP steering to always stay on the SR policy whatever its 1177 validity. 1179 This is the "drop upon invalid" option described in Section 8.2 1180 applied to BGP-based steering. 1182 9. Other types of SR Policies 1184 9.1. Layer 2 and Optical Transport 1186 1----2----3----4----5 1187 I2(lambda L241)\ / I4(lambda L241) 1188 Optical 1190 Figure 1: SR Policy with integrated DWDM 1192 An explicit candidate path can express a path through a transport 1193 layer beneath IP. The transport layer could be ATM, 1194 FR, DWDM, back-to-back Ethernet etc. The transport path is modelled 1195 as a link between two IP nodes with the specific assumption that no 1196 distributed IP routing protocol runs over the link. The link may 1197 have an IP address or be IP unnumbered. Depending on the transport 1198 protocol case, the link can be a physical DWDM interface and a lambda 1199 (integrated solution), an Ethernet interface and a VLAN, an ATM 1200 interface with a VPI/VCI, a FR interface with a DLCI etc. 1202 Using the DWDM integrated use-case of Figure 1 as an illustration, 1203 let us assume 1205 o nodes 1, 2, 3, 4 and 5 are IP routers running an SR-enabled IGP on 1206 the links 1-2, 2-3, 3-4 and 4-5. 1207 o The SRGB is homogeneous [16000, 24000]. 1208 o Node K's prefix SID is 16000+K. 1209 o node 2 has an integrated DWDM interface I2 with Lambda L1.
1210 o node 4 has an integrated DWDM interface I4 with Lambda L2. 1211 o the optical network is provisioned with a circuit from 2 to 4 with 1212 continuous lambda L241 (details outside the scope of this 1213 document). 1214 o Node 2 is provisioned with an SR policy with SID list 1215 and Binding SID B where I2(L241) is of type 5 (IPv4) or type 7 1216 (IPv6), see section 4. 1217 o node 1 steers a packet P1 towards the prefix SID of node 5 1218 (16005). 1219 o node 1 steers a packet P2 on the SR policy <16002, B, 16005>. 1221 In such a case, the journey of P1 will be 1-2-3-4-5 while the journey 1222 of P2 will be 1-2-lambda(L241)-4-5. P2 skips the IP hop 3 and 1223 leverages the DWDM circuit from node 2 to node 4. P1 follows the 1224 shortest-path computed by the distributed routing protocol. The path 1225 of P1 is unaltered by the addition, modification or deletion of 1226 optical bypass circuits. 1228 The salient point of this example is that the SRTE architecture 1229 seamlessly supports explicit candidate paths through any transport 1230 sub-layer. 1232 BGP-LS Extensions to describe the sub-IP-layer characteristics of the 1233 SR Policy are out of scope of this document (e.g. in Figure 1, the 1234 DWDM characteristics of the SR Policy at node 2 in terms of latency, 1235 loss, security, domain/country traversed by the circuit etc.). 1237 9.2. Spray SR Policy 1239 A Spray SRTE policy is a variant of an SRTE policy which involves 1240 packet replication. 1242 Any traffic steered into a Spray SR Policy is replicated along the 1243 SID-Lists of its selected path. 1245 In the context of a Spray SR Policy, the selected path SHOULD have 1246 more than one SID-List. The weights of the SID-Lists are not 1247 applicable for a Spray SR Policy. They MUST be set to 1. 1249 Like any SR policy, a Spray SR Policy has a BSID instantiated into 1250 the forwarding plane.
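The replication performed by a Spray SR Policy can be sketched as follows. The function spray, the packet representation and the label values are hypothetical and only illustrate that one copy of the packet is produced per SID-List of the selected path, independently of any weight.

```python
def spray(payload, inner_labels, sid_lists):
    """Return one (label stack, payload) copy per SID-List.

    inner_labels: labels already on the packet below the BSID, topmost
    first. sid_lists: the SID-Lists of the selected path.
    """
    return [(list(sid_list) + list(inner_labels), payload)
            for sid_list in sid_lists]


# Two hypothetical SID-Lists: the packet is sent once along each.
for stack, payload in spray("pkt", [30001], [[16002, 16005], [16003, 16005]]):
    print(stack, payload)
```

This contrasts with a regular SR Policy, where a given packet follows a single SID-List chosen according to the weights.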
1252 Traffic is typically steered into a Spray SR Policy in two ways: 1254 o local policy-based routing at the headend of the policy. 1255 o remote classification and steering via the BSID of the Spray SR 1256 Policy. 1258 10. 50msec Local Protection 1260 10.1. Leveraging TI-LFA local protection of the constituent IGP 1261 segments 1263 In any topology, Topology-Independent LFA (TI-LFA) [I.D.draft- 1264 bashandy-rtgwg-segment-routing-ti-lfa] provides a 50msec local 1265 protection technique for IGP SIDs. The backup path is computed on a 1266 per IGP SID basis along the post-convergence path. 1268 In a network that has deployed TI-LFA, an SR Policy built on the 1269 basis of TI-LFA protected IGP segments leverages the local protection 1270 of the constituent segments. 1272 In a network that has deployed TI-LFA, an SR Policy instantiated only 1273 with non-protected Adj SIDs does not benefit from any local 1274 protection. 1276 10.2. Using an SR Policy to locally protect a link 1278 1----2-----6----7 1279 | | | | 1280 4----3-----9----8 1282 Figure 2: Local protection using SR Policy 1284 An SR Policy can be instantiated at node 2 to protect the link 2to6. 1285 A typical explicit SID list would be <3, 9, 6>. 1287 A typical use-case occurs for links outside an IGP domain: e.g. 1, 2, 1288 3 and 4 are part of IGP/SR sub-domain 1 while 6, 7, 8 and 9 are part 1289 of IGP/SR sub-domain 2. In such a case, links 2to6 and 3to9 cannot 1290 benefit from TI-LFA automated local protection. 1292 11. Other types of Segments 1294 The Segment Routing architecture specifies that any instruction can 1295 be bound to a segment. 1297 Similarly, an SR Policy can be composed of SIDs of any types. 1299 On top of the classic IGP SIDs, BGP SIDs and BSIDs, this section 1300 highlights the use of service SIDs and IGP-Flex-Alg SIDs. 1302 11.1. Service SID 1304 A Service Segment is a Segment associated with a service, either 1305 directly or via an SR proxy.
A service may be a physical appliance 1306 running on dedicated hardware, a virtualized service inside an 1307 isolated environment such as a VM, container or namespace, or any 1308 process running on a compute element [I.D.draft-clad-spring-segment- 1309 routing-service-chaining]. 1311 An SR Policy can be composed of a mix of segments of various types: 1312 IGP segments, BGP segments, Binding SIDs and Service Segments. 1314 Similarly to other segments, service segments can be discovered via 1315 BGP-LS [I.D.draft-dawra-idr-bgp-sr-service-chaining]. 1317 11.2. Flex-Alg IGP SID 1319 1--RED--2-------6 1320 | | | 1321 4-------3--RED--9 1323 Figure 3: Illustration for Flex-Alg SID 1325 Let us assume that 1327 o 1, 2, 3 and 4 are part of IGP 1. 1328 o 2, 6, 9 and 3 are part of IGP 2. 1329 o All the IGP link costs are 10. 1330 o Links 1to2 and 3to9 are colored with IGP Link Affinity Red. 1331 o Flex-Alg1 is defined in both IGPs as: avoid red, minimize IGP 1332 metric. 1333 o All nodes of each IGP domain are enabled for FlexAlg1 1334 o SID(k, 0) represents the PrefixSID of node k according to Alg=0. 1335 o SID(k, FlexAlg1) represents the PrefixSID of node k according to 1336 Flex-Alg1. 1338 A controller can steer a flow from 1 to 9 through an end-to-end path 1339 that avoids the RED links of both IGP domains thanks to the explicit 1340 SR Policy . 1342 12. Binding SID to a tunnel 1344 A Binding SID can be bound to any type of tunnel: IP tunnel, GRE 1345 tunnel, IP/UDP tunnel, MPLS RSVP-TE tunnel, etc. 1347 13. Traffic Accounting 1349 This section describes counters for traffic accounting in segment 1350 routing networks. The essence of Segment Routing consists in scaling 1351 the network by only maintaining per-flow state at the source or edge 1352 of the network. Specifically, only the headend of an SR policy 1353 maintains the related per-policy state. Egress and Midpoints along 1354 the source route do not maintain any per-policy state. 
The traffic 1355 counters described in this section respect the architecture 1356 principles of SR, while giving visibility to the service provider for 1357 network operation and capacity planning. The traffic counters are 1358 divided into four categories: interface counters, prefix counters, 1359 counters to measure the traffic (demand) matrix and SR policy 1360 counters at the policy head-end. 1362 13.1. Traffic Counters Naming convention 1364 This section uses the following naming convention when referring to 1365 the various counters. This is done in order to assign mnemonic names 1366 to SR counters. 1368 o The term counter(s) in all of the definitions specified in this 1369 document refers either to the (packet, byte) counters or the byte 1370 counter. 1371 o SR: any traffic whose FIB lookup is a segment (IGP prefix/Adj 1372 segments, BGP segments, any type of segments) or the matched FIB 1373 entry is steered on an SR Policy. 1374 o INT in name indicates a counter is implemented at a per interface 1375 level. 1376 o E in name refers to egress direction (with respect to the traffic 1377 flow). 1378 o I in name refers to ingress direction (with respect to the traffic 1379 flow). 1380 o TC in name indicates a counter is implemented on a Traffic Class 1381 (TC) basis. 1382 o TM in name refers to a Traffic Matrix (TM) counter. 1383 o PRO in name indicates that the counter is implemented on a per 1384 protocol/adjacency type basis. Per PRO counters in this document 1385 can account for: 1387 * LAB (Labelled Traffic): the matched FIB entry is a segment, and 1388 the outgoing packet has at least one label (that label does not 1389 have to be a segment label, e.g., the label may be a VPN 1390 label). 1391 * V4 (IPv4 Traffic): the matched FIB entry is a segment which is 1392 popped. The outgoing packet is IPv4. 1393 * V6 (IPv6 Traffic): the matched FIB entry is a segment which is 1394 popped. The outgoing packet is IPv6.
1395 o POL in name refers to a Policy counter. 1396 o BSID in name indicates a policy counter for labelled traffic. 1397 o SL in name indicates a policy counter is implemented at a Segment- 1398 List (SL) level. 1400 Counter nomenclature is exemplified using the following example: 1402 o SR.INT.E.PRO: Per-interface per-protocol aggregate egress SR 1403 traffic. 1404 o POL.BSID: Per-SR Policy labelled steered aggregate traffic 1405 counter. 1407 13.2. Per-Interface SR Counters 1409 For each local interface, node N maintains the following per- 1410 interface SR counters. These counters include accounting due to 1411 push, pop or swap operations on SR traffic. 1413 13.2.1. Per interface, per protocol aggregate egress SR traffic 1414 counters (SR.INT.E.PRO) 1416 The following counters are included under this category. 1418 o SR.INT.E.LAB: For each egress interface (INT.E), N MUST maintain 1419 counter(s) for the aggregate SR traffic forwarded over the (INT.E) 1420 interface as labelled traffic. 1421 o SR.INT.E.V4: For each egress interface (INT.E), N MUST maintain 1422 counter(s) for the aggregate SR traffic forwarded over the (INT.E) 1423 interface as IPv4 traffic (due to the pop operation). 1424 o SR.INT.E.V6: For each egress interface (INT.E), N MUST maintain 1425 counter(s) for the aggregate SR traffic forwarded over the (INT.E) 1426 interface as IPv6 traffic (due to the pop operation). 1428 13.2.2. Per interface, per traffic-class, per protocol aggregate egress 1429 SR traffic counters (SR.INT.E.PRO.TC) 1431 This counter provides per Traffic Class (TC) breakdown of 1432 SR.INT.E.PRO. The following counters are included under this 1433 category. 1435 o SR.INT.E.LAB.TC: For each egress interface (INT.E) and a given 1436 Traffic Class (TC), N SHOULD maintain counter(s) for the aggregate 1437 SR traffic forwarded over the (INT.E) interface as labelled 1438 traffic. 
   o  SR.INT.E.V4.TC: For each egress interface (INT.E) and a given
      Traffic Class (TC), N SHOULD maintain counter(s) for the aggregate
      SR traffic forwarded over the (INT.E) interface as IPv4 traffic
      (due to the pop operation).
   o  SR.INT.E.V6.TC: For each egress interface (INT.E) and a given
      Traffic Class (TC), N SHOULD maintain counter(s) for the aggregate
      SR traffic forwarded over the (INT.E) interface as IPv6 traffic
      (due to the pop operation).

13.2.3.  Per interface aggregate ingress SR traffic counter (SR.INT.I)

   The SR.INT.I counter is defined as follows:

   For each ingress interface (INT.I), N SHOULD maintain counter(s) for
   the aggregate SR traffic received on the (INT.I) interface.

13.2.4.  Per interface, per TC aggregate ingress SR traffic counter
         (SR.INT.I.TC)

   This counter provides a per Traffic Class (TC) breakdown of
   SR.INT.I.  It is defined as follows:

   For each ingress interface (INT.I) and a given Traffic Class (TC), N
   MAY maintain counter(s) for the aggregate SR traffic (matching the
   traffic class TC criteria) received on the (INT.I) interface.

13.3.  Prefix SID Counters

   For a remote prefix SID S, node N maintains the following prefix SID
   counters.  These counters include accounting due to push, pop or swap
   operations on the SR traffic.

13.3.1.  Per-prefix SID egress traffic counter (PSID.E)

   This counter is defined as follows:

   For a remote prefix SID S, N MUST maintain counter(s) for aggregate
   traffic forwarded towards S.

13.3.2.  Per-prefix SID per-TC egress traffic counter (PSID.E.TC)

   This counter provides a per Traffic Class (TC) breakdown of PSID.E.
   It is defined as follows:

   For a given Traffic Class (TC) and a remote prefix SID S, N SHOULD
   maintain counter(s) for traffic forwarded towards S.

13.3.3.  Per-prefix SID, per egress interface traffic counter
         (PSID.INT.E)

   This counter is defined as follows:

   For a given egress interface (INT.E) and a remote prefix SID S, N
   SHOULD maintain counter(s) for traffic forwarded towards S over the
   (INT.E) interface.

13.3.4.  Per-prefix SID per TC per egress interface traffic counter
         (PSID.INT.E.TC)

   This counter provides a per Traffic Class (TC) breakdown of
   PSID.INT.E.  It is defined as follows:

   For a given Traffic Class (TC), an egress interface (INT.E) and a
   remote prefix SID S, N MAY maintain counter(s) for traffic forwarded
   towards S over the (INT.E) interface.

13.3.5.  Per-prefix SID, per ingress interface traffic counter
         (PSID.INT.I)

   This counter is defined as follows:

   For a given ingress interface (INT.I) and a remote prefix SID S, N
   MAY maintain counter(s) for the traffic received on the (INT.I)
   interface and forwarded towards S.

13.3.6.  Per-prefix SID, per TC, per ingress interface traffic counter
         (PSID.INT.I.TC)

   This counter provides a per Traffic Class (TC) breakdown of
   PSID.INT.I.  It is defined as follows:

   For a given Traffic Class (TC), ingress interface (INT.I), and a
   remote prefix SID S, N MAY maintain counter(s) for the traffic
   received on the (INT.I) interface and forwarded towards S.

13.4.  Traffic Matrix Counters

   A Traffic Matrix (TM) provides, for every ingress point N into the
   network and every egress point M out of the network, the volume of
   traffic T(N, M) from N to M over a given time interval.  To measure
   the traffic matrix, each node in an SR network designates its
   interfaces as either internal or external.

   When node N receives a packet destined to remote prefix SID M, N
   maintains the following counters.  These counters include accounting
   due to push, pop or swap operations.

13.4.1.  Per-Prefix SID Traffic Matrix counter (PSID.E.TM)

   This counter is defined as follows:

   For a given remote prefix SID M, N SHOULD maintain counter(s) for all
   the traffic received on any external interface and forwarded towards
   M.

13.4.2.  Per-Prefix SID, Per-TC Traffic Matrix counter (PSID.E.TM.TC)

   This counter provides a per Traffic Class (TC) breakdown of
   PSID.E.TM.  It is defined as follows:

   For a given Traffic Class (TC) and a remote prefix SID M, N SHOULD
   maintain counter(s) for all the traffic received on any external
   interface and forwarded towards M.

13.5.  SR Policy Counters

   Per-policy counters are only maintained at the policy head-end node.
   For each SR policy, the head-end node maintains the following
   counters.

13.5.1.  Per-SR Policy Aggregate traffic counter (POL)

   This counter includes both labelled and unlabelled steered traffic.
   It is defined as:

   For each SR policy (P), head-end node N MUST maintain counter(s) for
   the aggregate traffic steered onto P.

13.5.2.  Per-SR Policy labelled steered aggregate traffic counter
         (POL.BSID)

   This counter is defined as:

   For each SR policy (P), head-end node N SHOULD maintain counter(s)
   for the aggregate labelled traffic steered onto P.  Note that
   labelled steered traffic refers to incoming packets with an active
   SID matching a local BSID of an SR policy at the head-end.

13.5.3.  Per-SR Policy, per TC Aggregate traffic counter (POL.TC)

   This counter provides a per Traffic Class (TC) breakdown of POL.  It
   is defined as follows:

   For each SR policy (P) and a given Traffic Class (TC), head-end node
   N SHOULD maintain counter(s) for the aggregate traffic (matching the
   traffic class TC criteria) steered onto P.

13.5.4.  Per-SR Policy, per TC labelled steered aggregate traffic
         counter (POL.BSID.TC)

   This counter provides a per Traffic Class (TC) breakdown of
   POL.BSID.  It is defined as follows:

   For each SR policy (P) and a given Traffic Class (TC), head-end node
   N MAY maintain counter(s) for the aggregate labelled traffic steered
   onto P.

13.5.5.  Per-SR Policy, Per-Segment-List Aggregate traffic counter
         (POL.SL)

   This counter is defined as:

   For each SR policy (P) and a given Segment-List (SL), head-end node N
   SHOULD maintain counter(s) for the aggregate traffic steered onto the
   Segment-List (SL) of P.

13.5.6.  Per-SR Policy, Per-Segment-List labelled steered aggregate
         traffic counter (POL.SL.BSID)

   This counter is defined as:

   For each SR policy (P) and a given Segment-List (SL), head-end node N
   MAY maintain counter(s) for the aggregate labelled traffic steered
   onto the Segment-List SL of P.  Note that labelled steered traffic
   refers to incoming packets with an active SID matching a local BSID
   of an SR policy at the head-end.

14.  Appendix A

14.1.  SRTE headend architecture

      +--------+        +--------+
      |  BGP   |        |  PCEP  |
      +--------+        +--------+
                \      /
   +--------+  +--------+  +--------+
   |  CLI   |--|  SRTE  |--| NETCONF|
   +--------+  +--------+  +--------+
                   |
               +--------+
               |  FIB   |
               +--------+

              Figure 4: SRTE Architecture at a Headend

   The SRTE functionality at a headend can be implemented in an SRTE
   process as illustrated in Figure 4.

   The SRTE process interacts with other processes to learn candidate
   paths.

   The SRTE process selects the active path of an SR Policy.

   The SRTE process interacts with the RIB/FIB process to install an
   active SR Policy in the dataplane.

   In order to validate explicit candidate paths and compute dynamic
   candidate paths, the SRTE process maintains an SRTE-DB.
   The SRTE process interacts with other processes (Figure 5) to
   collect the SRTE-DB information.

      +--------+        +--------+
      | BGP-LS |        |  IGP   |
      +--------+        +--------+
                \      /
   +--------+  +--------+  +--------+
   |  PCEP  |--|  SRTE  |--| NETCONF|
   +--------+  +--------+  +--------+

         Figure 5: Topology/link-state database architecture

   The SRTE architecture supports both centralized and distributed
   control planes.

14.2.  Distributed and/or Centralized Control Plane

14.2.1.  Distributed Control Plane within a single Link-State IGP area

   Consider a single-area IGP with per-link latency measurement and
   advertisement of the measured latency in the extended-TE IGP TLV.

   A head-end H is configured with a single dynamic candidate path for
   SR policy P with a low-latency optimization objective and endpoint E.

   The SRTE process at H learns the topology (and extended TE latency
   information) from the IGP and computes the solution SID list
   providing the low-latency path to E.

   No centralized controller is involved in such a deployment.

   The SRTE-DB at H only uses the Link-State DataBase (LSDB) provided by
   the IGP.

14.2.2.  Distributed Control Plane across several Link-State IGP areas

   Consider a domain D composed of two link-state IGP single-area
   instances (I1 and I2) where each sub-domain benefits from per-link
   latency measurement and advertisement of the measured latency in the
   related IGP.  The link-state information of each IGP is advertised
   via BGP-LS towards a set of BGP-LS route reflectors (RR).  H is a
   headend in IGP I1 sub-domain and E is an endpoint in IGP I2
   sub-domain.

   Thanks to a BGP-LS session to any BGP-LS RR, H's SRTE process may
   learn the link-state information of the remote domain I2.  H can thus
   compute the low-latency path from H to E as a solution SID list that
   spans the two domains I1 and I2.

   The SRTE-DB at H collects the LSDB from both sub-domains (I1 and I2).

   No centralized controller is required.

14.2.3.  Centralized Control Plane

   Considering the same domain D as in the previous section, let us now
   assume that H does not have a BGP-LS session to the BGP-LS RRs.
   Instead, let us assume a controller "C" has at least one BGP-LS
   session to the BGP-LS RRs.

   The controller C learns the topology and extended latency information
   from both sub-domains via BGP-LS.  It computes a low-latency path
   from H to E as a SID list and programs H with the related explicit
   candidate path.

   The headend H does not compute the solution SID list (it cannot, as
   it has no visibility into sub-domain I2).  The headend only validates
   the received explicit candidate path.  Most probably, the controller
   encodes the SIDs of the SID-List with Type-1.  In that case, the
   headend's validation simply consists of resolving the first SID on
   an outgoing interface and next-hop.

   The SRTE-DB at H only uses the LSDB provided by the IGP I1.

   The SRTE-DB of the controller collects the LSDB from both
   sub-domains (I1 and I2).

14.2.4.  Distributed and Centralized Control Plane

   Consider the same domain D as in the previous section.

   H's SRTE process is configured to associate color C1 with a
   low-latency optimization objective.

   H's BGP process is configured to steer a Route R/r of extended-color
   community C1 and of next-hop N via an SR policy (N, C1).

   Upon receiving a first BGP route of color C1 and of next-hop N, H
   recognizes the need for an SR Policy (N, C1) with a low-latency
   objective to N.  As N is outside the SRTE DB of H, H requests a
   controller to compute such a SID list (e.g., via PCEP).

   This is an example of a hybrid control-plane: the BGP distributed
   control plane signals the routes and their TE requirements.
   Upon receiving these BGP routes, a local headend either computes the
   solution SID list itself (entirely distributed, when the endpoint is
   in the SRTE DB of the headend) or delegates the computation to a
   controller (hybrid distributed/centralized control-plane).

   The SRTE-DB at H only uses the LSDB provided by the IGP.

   The SRTE-DB of the controller collects the LSDB from both
   sub-domains.

14.3.  Examples of Candidate Path Selection

   Example 1:

   Consider headend H where two candidate paths of the same SR Policy
   are signaled via BGP and whose respective NLRIs have the same route
   distinguisher:

   NLRI A with distinguisher = RD1, color = C, endpoint = N, preference
   P1.

   NLRI B with distinguisher = RD1, color = C, endpoint = N, preference
   P2.

   o  Because the NLRIs are identical (same distinguisher), BGP will
      perform bestpath selection.  Note that there are no changes to
      the BGP best path selection algorithm.
   o  H installs one advertisement as bestpath into the BGP table.
   o  A single advertisement is passed to the SRTE process.
   o  SRTE process does not perform any path selection.

   Note that the candidate path's preference value does not have any
   effect on the BGP bestpath selection process.

   Example 2:

   Consider headend H where two candidate paths of the same SR Policy
   are signaled via BGP and whose respective NLRIs have different route
   distinguishers:

   NLRI A with distinguisher = RD1, color = C, endpoint = N, preference
   P1.

   NLRI B with distinguisher = RD2, color = C, endpoint = N, preference
   P2.

   o  Because the NLRIs are different (different distinguisher), BGP
      will not perform bestpath selection.
   o  H installs both advertisements into the BGP table.
   o  Both advertisements are passed to the SRTE process.
   o  SRTE process at H selects the candidate path advertised by NLRI B
      as the active path for the SR policy since P2 is greater than P1.

   Note that the recommended approach is to use NLRIs with different
   distinguishers when several candidate paths for the same SR Policy
   (endpoint, color) are signaled via BGP to a headend.

   Example 3:

   Consider that a headend H learns two candidate paths of the same SR
   Policy, one signaled via BGP and another via Local configuration.

   NLRI A with distinguisher = RD1, color = C, endpoint = N, preference
   P1.

   Local "foo" with color = C, endpoint = N, preference P2.

   o  H installs NLRI A into the BGP table.
   o  NLRI A and "foo" are both passed to the SRTE process.
   o  SRTE process at H selects the candidate path indicated by "foo" as
      the active path for the SR policy since P2 is greater than P1.

   When an SR Policy has multiple valid candidate paths with the same
   best preference, the SRTE process at a headend uses the rules
   described in section 2.9 to select the active path as explained in
   the following examples:

   Example 4:

   Consider headend H with two candidate paths of the same SR Policy
   and the same preference value, both received from the same
   controller R, and where RD2 is higher than RD1.

   o  NLRI A with distinguisher RD1, color C, endpoint N, preference
      P1 (selected as active path at time t0).
   o  NLRI B with distinguisher RD2 (RD2 is greater than RD1), color C,
      endpoint N, preference P1 (passed to SRTE process at time t1).

   After t1, SRTE process at H selects candidate path associated with
   NLRI B as active path of the SR policy since RD2 is higher than RD1.
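
   As a non-normative illustration, the selection behaviour shown in
   Examples 2 and 4 can be sketched as follows.  The sketch is a
   simplification: it models only the preference comparison and the
   distinguisher tiebreaker, and omits the other rules of section 2.9.
   The class name, the function name and the numeric values are
   hypothetical and are not defined by this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidatePath:
    """Hypothetical, simplified view of a candidate path."""
    name: str
    preference: int      # higher preference wins (Examples 2 and 3)
    distinguisher: int   # higher value breaks ties (Example 4)


def select_active(paths):
    """Pick the active path: highest preference first, then the
    highest distinguisher.  Other section 2.9 rules are not modeled."""
    return max(paths, key=lambda p: (p.preference, p.distinguisher))


# Example 2: different preferences (values are assumptions, P2 > P1).
ex2 = select_active([
    CandidatePath("NLRI A", preference=100, distinguisher=1),
    CandidatePath("NLRI B", preference=200, distinguisher=2),
])

# Example 4: same preference, RD2 higher than RD1.
ex4 = select_active([
    CandidatePath("NLRI A", preference=100, distinguisher=1),
    CandidatePath("NLRI B", preference=100, distinguisher=2),
])
```

   In both cases the sketch returns NLRI B, matching the outcome
   described in the examples.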

   Note that, in such a scenario where there are redundant sessions to
   the same controller, the recommended approach is to use the same RD
   value for conveying the same candidate paths and let the BGP best
   path algorithm pick the best path.

   Example 5:

   Consider headend H with two candidate paths of the same SR Policy
   and the same preference value, both received from the same
   controller R, and where RD2 is higher than RD1.

   Consider also that headend H is configured to override the
   discriminator tiebreaker specified in section 2.9.

   o  NLRI A with distinguisher RD1, color C, endpoint N, preference P1
      (selected as active path at time t0).
   o  NLRI B with distinguisher RD2, color C, endpoint N, preference P1
      (passed to SRTE process at time t1).

   Even after t1, SRTE process at H retains candidate path associated
   with NLRI A as active path of the SR policy since the discriminator
   tiebreaker is disabled at H.

   Example 6:

   Consider headend H with two candidate paths of the same SR Policy
   and the same preference value.

   o  Local "foo" with color C, endpoint N, preference P1 (selected as
      active path at time t0).
   o  NLRI A with distinguisher RD1, color C, endpoint N, preference P1
      (passed to SRTE process at time t1).

   Even after t1, SRTE process at H retains candidate path associated
   with local candidate path "foo" as active path of the SR policy since
   the Local protocol is preferred over BGP by default based on its
   higher protocol identifier value.

   Example 7:

   Consider headend H with two candidate paths of the same SR Policy
   and the same preference value but received via NETCONF from two
   controllers R and S (where S > R).

   o  Path A from R with distinguisher D1, color C, endpoint N,
      preference P1 (selected as active path at time t0).
   o  Path B from S with distinguisher D2, color C, endpoint N,
      preference P1 (passed to SRTE process at time t1).

   Note that the NETCONF process sends both paths to the SRTE process
   since it does not have any tiebreaker logic.  After t1, SRTE process
   at H selects candidate path associated with Path B as active path of
   the SR policy since S is higher than R.

14.4.  More on Dynamic Path

14.4.1.  Optimization Objective

   This document defines two optimization objectives:

   o  Min-Metric - requests computation of a solution SID-List optimized
      for a selected metric.
   o  Min-Metric with margin and maximum number of SIDs - Min-Metric
      with two changes: a margin by which two paths with similar
      metrics are considered equal, and a constraint on the maximum
      number of SIDs in the SID-List.

   The "Min-Metric" optimization objective requests to compute a
   solution SID-List such that packets flowing through the solution
   SID-List use ECMP-aware paths optimized for the selected metric.  The
   "Min-Metric" objective can be instantiated for exactly one of the
   IGP metric, the TE metric, or the latency extended TE metric.  This
   metric is called the O metric (the optimized metric) to distinguish
   it from the IGP metric.  The solution SID-List must be computed to
   minimize the number of SIDs and the number of SID-Lists.

   If the selected O metric is the IGP metric and the headend and
   tailend are in the same IGP domain, then the solution SID-List is
   made of the single prefix-SID of the tailend.

   When the selected O metric is not the IGP metric, then the solution
   SID-List is made of prefix SIDs of intermediate nodes, Adjacency SIDs
   along intermediate links and potentially BSIDs of intermediate
   policies.

   In many deployments there are insignificant metric differences
   between nearly equal paths (e.g.,
   a difference of 100 usec of latency between two paths from NYC to
   SFO would not matter in most cases).  The "Min-Metric with margin"
   objective addresses this requirement.

   The "Min-Metric with margin and maximum number of SIDs" optimization
   objective requests to compute a solution SID-List such that packets
   flowing through the solution SID-List do not use a path whose
   cumulative O metric is larger than the shortest-path O metric +
   margin.

   If this is not possible because of the number of SIDs constraint,
   then the solution SID-List minimizes the O metric while meeting the
   constraint on the maximum number of SIDs.

14.4.2.  Constraints

   The following constraints can be defined:

   o  Inclusion and/or exclusion of TE affinity.
   o  Inclusion and/or exclusion of IP address.
   o  Inclusion and/or exclusion of SRLG.
   o  Inclusion and/or exclusion of admin-tag.
   o  Maximum accumulated metric (IGP, TE and latency).
   o  Maximum number of SIDs in the solution SID-List.
   o  Maximum number of weighted SID-Lists in the solution set.
   o  Diversity to another service instance (e.g., link, node, or SRLG
      disjoint paths originating from different head-ends).

14.4.3.  SR Native Algorithm

   1----------------2----------------3
   |\                               /
   | \                             /
   |  4-------------5-------------7
   |   \                         /|
   |    +-----------6-----------+ |
   8------------------------------9

       Figure 6: Illustration used to describe SR native algorithm

   Let us assume that all the links have the same IGP metric of 10 and
   let us consider the dynamic path defined as: Min-Metric(from 1, to 3,
   IGP metric, margin 0) with constraint "avoid link 2-to-3".

   A classical circuit implementation would do: prune the graph, compute
   the shortest-path, pick a single non-ECMP branch of the ECMP-aware
   shortest-path and encode it as a SID-List.  The solution SID-List
   would be <4, 5, 7, 3>.
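
   The classical circuit steps above (prune, compute the shortest path,
   keep a single branch) can be sketched as follows.  This is a
   non-normative illustration.  The link set is one reading of
   Figure 6, and the function name and the tie-breaking order (the
   lexicographically smallest node sequence among equal-cost paths) are
   assumptions made for this sketch only.

```python
import heapq

# Links as read from Figure 6 (an assumption); every IGP metric is 10.
LINKS = {
    (1, 2): 10, (2, 3): 10, (1, 4): 10, (4, 5): 10, (5, 7): 10,
    (7, 3): 10, (4, 6): 10, (6, 7): 10, (1, 8): 10, (8, 9): 10,
    (9, 7): 10,
}


def circuit_path(links, src, dst, avoid):
    """Prune the links in `avoid`, then run Dijkstra and return
    (cost, node list) for one single shortest path, or None."""
    adj = {}
    for (a, b), metric in links.items():
        if (a, b) in avoid or (b, a) in avoid:
            continue  # pruned by the constraint
        adj.setdefault(a, []).append((b, metric))
        adj.setdefault(b, []).append((a, metric))

    heap = [(0, [src])]   # entries are (cumulative metric, path so far)
    settled = set()
    while heap:
        cost, path = heapq.heappop(heap)
        node = path[-1]
        if node == dst:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for nbr, metric in adj.get(node, []):
            if nbr not in settled:
                heapq.heappush(heap, (cost + metric, path + [nbr]))
    return None


cost, path = circuit_path(LINKS, src=1, dst=3, avoid={(2, 3)})
sid_list = path[1:]   # one SID per hop after the headend
```

   Under these assumptions the pruned graph has three equal-cost
   branches between node 1 and node 7, and the sketch keeps only the
   one through nodes 4 and 5, which corresponds to the SID-List
   <4, 5, 7, 3> given above.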

   An SR-native algorithm would find a SID-List that minimizes the
   number of SIDs and maximizes the use of all the ECMP branches along
   the ECMP shortest path.  In this illustration, the solution SID-List
   would be <7, 3>.

   In the vast majority of SR use-cases, SR-native algorithms should be
   preferred: they preserve the native ECMP of IP and they minimize the
   dataplane header overhead.

   In some specific use-cases (e.g., TDM migration over IP where the
   circuit notion prevails), one may prefer a classic circuit
   computation followed by an encoding into SIDs (potentially only using
   non-protected Adj SIDs to reflect the TDM paradigm).

   SR-native algorithms are a local node behavior and are thus outside
   the scope of this document.

14.4.4.  Path to SID

   Let us assume the below diagram where all the links have an IGP
   metric of 10 and a TE metric of 10 except the link AB which has an
   IGP metric of 20 and the link AD which has a TE metric of 100.  Let
   us consider the min-metric(from A, to D, TE metric, margin 0).

   B---C
   |   |
   A---D

     Figure 7: Illustration used to describe path to SID conversion

   The solution path to this problem is ABCD.

   This path can be expressed in SIDs as <B, D> where B and D are the
   IGP prefix SIDs respectively associated with nodes B and D in the
   diagram.

   Indeed, from A, the IGP path to B is AB (IGP metric 20 better than
   ADCB of IGP metric 30).  From B, the IGP path to D is BCD (IGP metric
   20 better than BAD of IGP metric 30).

   While the details of the algorithm remain a local node behavior, a
   high-level description follows: start at the headend and find an IGP
   prefix SID that leads as far down the desired path as possible
   (without using any link not included in the desired path).  If no
   prefix SID exists, use the Adj SID to the first neighbor along the
   path.  Restart from the node that was reached.

14.5.  Benefits of Binding SID

   The Binding SID (BSID) is fundamental to Segment Routing.  It
   provides scaling, network opacity and service independence.

     A---DCI1----C----D----E----DCI3---H
    /            |         |            \
   S             |         |             Z
    \            |         |            /
     B---DCI2----F---------G----DCI4---K
   <==DC1==><=========Core========><==DC2==>

                Figure 8: A Simple Datacenter Topology

   A simplified illustration is provided on the basis of the previous
   diagram where it is assumed that S, A, B, Data Center Interconnect
   DCI1 and DCI2 share the same IGP-SR instance in the data-center 1
   (DC1).  DCI1, DCI2, C, D, E, F, G, DCI3 and DCI4 share the same
   IGP-SR domain in the core.  DCI3, DCI4, H, K and Z share the same
   IGP-SR domain in the data-center 2 (DC2).

   In this example, it is assumed that there is no redistribution
   between the IGPs and no presence of BGP.  The inter-domain
   communication is only provided by SR through SR Policies.

   The latency from S to DCI1 equals the latency from S to DCI2.  The
   latency from Z to DCI3 equals the latency from Z to DCI4.  All the
   intra-DC links have the same IGP metric 10.

   The path DCI1, C, D, E, DCI3 has a lower latency and lower capacity
   than the path DCI2, F, G, DCI4.

   The IGP metrics of all the core links are set to 10 except the link
   D-E, which is set to 100.

   A low-latency multi-domain policy from S to Z may be expressed as
   <DCI1, BSID, Z> where:

   o  DCI1 is the prefix SID of DCI1.
   o  BSID is the Binding SID bound to an SR policy <D, E, DCI3>
      instantiated at DCI1.
   o  Z is the prefix SID of Z.

   Without the use of an intermediate core SR Policy (efficiently
   summarized by a single BSID), S would need to steer its low-latency
   flow into the policy <DCI1, D, E, DCI3, Z>.

   The use of a BSID (and the intermediate bound SR Policy) decreases
   the number of segments imposed by the source.

   A BSID acts as a stable anchor point which isolates one domain from
   the churn of another domain.
   Upon topology changes within the core of the network, the
   low-latency path from DCI1 to DCI3 may change.  While the path of an
   intermediate policy changes, its BSID does not change.  Hence the
   policy used by the source does not change, and the source is
   shielded from the churn in another domain.

   A BSID provides opacity and independence between domains.  The
   administrative authority of the core domain may not want to share
   information about its topology.  The use of a BSID allows keeping the
   service opaque.  S is not aware of the details of how the low-latency
   service is provided by the core domain.  S is not aware of the need
   of the core authority to temporarily change the intermediate path.

14.6.  Centralized Discovery of available SID in SRLB

   This section explains how controllers can discover the local SIDs
   available at a node N so as to pick an explicit BSID for an SR Policy
   to be instantiated at headend N.

   Any controller can discover the following properties of a node N
   (e.g., via BGP-LS, NETCONF, etc.):

   o  its local Segment Routing Label Block (SRLB).
   o  its local topology.
   o  its topology-related SIDs (Adj SID and EPE SID).
   o  its SR Policies and their BSID
      ([I-D.ietf-idr-te-lsp-distribution]).

   Any controller can thus infer the available SIDs in the SRLB of any
   node.

   As an example, a controller discovers the following characteristics
   of N: SRLB [4000, 8000], 3 Adj SIDs (4001, 4002, 4003), 2 EPE SIDs
   (4004, 4005) and 3 SRTE policies (whose BSIDs are respectively 4006,
   4007 and 4008).  This controller can deduce that the SRLB sub-range
   [4009, 5000] is free for allocation.

   A controller is not restricted to use the next numerically available
   SID in the available SRLB sub-range.  It can pick any label in the
   subset of available labels.  This random pick makes a collision
   unlikely.
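
   The inference and the random pick described above can be sketched as
   follows (non-normative).  The function name is hypothetical, and the
   sketch assumes that the SRLB bounds are inclusive.  With the values
   of the example, the computed free set contains the sub-range
   [4009, 5000] named above.

```python
import random


def free_sids(srlb, allocated):
    """Return the sorted SIDs of the SRLB (inclusive bounds assumed)
    that are not already allocated on the node."""
    low, high = srlb
    return sorted(set(range(low, high + 1)) - set(allocated))


# Values discovered by the controller in the example above.
srlb = (4000, 8000)
adj_sids = [4001, 4002, 4003]
epe_sids = [4004, 4005]
bsids = [4006, 4007, 4008]

available = free_sids(srlb, adj_sids + epe_sids + bsids)

# Pick any available label rather than the next numerical one.
explicit_bsid = random.choice(available)
```

   A controller restricted to a sub-allocated range would apply the
   same computation to its own sub-range instead of the whole SRLB.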

   An operator could also sub-allocate the SRLB between different
   controllers (e.g., [4000-4499] to controller 1 and [4500-5000] to
   controller 2).

   Inter-controller state-synchronization may be used to avoid/detect
   collision in BSID.

   All these techniques make a collision between different controllers
   very unlikely.

   In the unlikely case of a collision, the controllers will detect it
   through system alerts, BGP-LS reporting
   ([I-D.ietf-idr-te-lsp-distribution]) or PCEP notification.  They then
   have the choice to continue the operation of their SR Policy with the
   dynamically allocated BSID or re-try with another explicit pick.

   Note: in deployments where PCE Protocol (PCEP) is used between
   head-end and controller (PCE), a head-end can report BSID as well as
   policy attributes (e.g., type of disjointness) and operational and
   administrative states to the controller.  Similarly, a controller can
   also assign/update the BSID of a policy via PCEP when instantiating
   or updating an SR Policy.

15.  Acknowledgement

   The authors would like to thank Tarek Saad and Dhanendra Jain for
   their valuable comments and suggestions.

16.  Normative References

   [GLOBECOM]
              Filsfils, C., Nainar, N., Pignataro, C., Cardona, J., and
              P. Francois, "The Segment Routing Architecture, IEEE
              Global Communications Conference (GLOBECOM)", 2015.

   [I-D.ietf-idr-te-lsp-distribution]
              Previdi, S., Dong, J., Chen, M., Gredler, H., and J.
              Tantsura, "Distribution of Traffic Engineering (TE)
              Policies and State using BGP-LS", draft-ietf-idr-te-lsp-
              distribution-08 (work in progress), December 2017.

   [I-D.ietf-isis-segment-routing-extensions]
              Previdi, S., Ginsberg, L., Filsfils, C., Bashandy, A.,
              Gredler, H., Litkowski, S., Decraene, B., and J.
              Tantsura, "IS-IS Extensions for Segment Routing",
              draft-ietf-isis-segment-routing-extensions-15 (work in
              progress), December 2017.

   [I-D.ietf-pce-pce-initiated-lsp]
              Crabbe, E., Minei, I., Sivabalan, S., and R. Varga, "PCEP
              Extensions for PCE-initiated LSP Setup in a Stateful PCE
              Model", draft-ietf-pce-pce-initiated-lsp-11 (work in
              progress), October 2017.

   [I-D.ietf-pce-segment-routing]
              Sivabalan, S., Filsfils, C., Tantsura, J., Henderickx, W.,
              and J. Hardwick, "PCEP Extensions for Segment Routing",
              draft-ietf-pce-segment-routing-11 (work in progress),
              November 2017.

   [I-D.ietf-pce-stateful-pce]
              Crabbe, E., Minei, I., Medved, J., and R. Varga, "PCEP
              Extensions for Stateful PCE", draft-ietf-pce-stateful-
              pce-21 (work in progress), June 2017.

   [I-D.ietf-spring-segment-routing]
              Filsfils, C., Previdi, S., Ginsberg, L., Decraene, B.,
              Litkowski, S., and R. Shakir, "Segment Routing
              Architecture", draft-ietf-spring-segment-routing-15 (work
              in progress), January 2018.

   [I-D.previdi-idr-segment-routing-te-policy]
              Previdi, S., Filsfils, C., Mattes, P., Rosen, E., and S.
              Lin, "Advertising Segment Routing Policies in BGP",
              draft-previdi-idr-segment-routing-te-policy-07 (work in
              progress), June 2017.

   [I-D.sivabalan-pce-binding-label-sid]
              Sivabalan, S., Filsfils, C., Previdi, S., Tantsura, J.,
              Hardwick, J., and D. Dhody, "Carrying Binding Label/
              Segment-ID in PCE-based Networks.", draft-sivabalan-pce-
              binding-label-sid-03 (work in progress), July 2017.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [SIGCOMM]  Hartert, R., Vissicchio, S., Schaus, P., Bonaventure, O.,
              Filsfils, C., Telkamp, T., and P.
              Francois, "A Declarative and Expressive Approach to
              Control Forwarding Paths in Carrier-Grade Networks, ACM
              SIGCOMM", 2015.

Authors' Addresses

   Clarence Filsfils
   Cisco Systems, Inc.
   Pegasus Parc
   De kleetlaan 6a, DIEGEM BRABANT 1831
   BELGIUM

   Email: cfilsfil@cisco.com


   Siva Sivabalan
   Cisco Systems, Inc.
   2000 Innovation Drive
   Kanata, Ontario K2K 3E8
   Canada

   Email: msiva@cisco.com


   Kamran Raza
   Cisco Systems, Inc.
   2000 Innovation Drive
   Kanata, Ontario K2K 3E8
   Canada

   Email: skraza@cisco.com


   Jose Liste
   Cisco Systems, Inc.
   821 Alder Drive
   Milpitas, California 95035
   USA

   Email: jliste@cisco.com


   Francois Clad
   Cisco Systems, Inc.

   Email: fclad@cisco.com


   Ketan Talaulikar
   Cisco Systems, Inc.

   Email: ketant@cisco.com


   Zafar Ali
   Cisco Systems, Inc.

   Email: zali@cisco.com


   Shraddha Hegde
   Juniper Networks, Inc.
   Embassy Business Park
   Bangalore, KA 560093
   India

   Email: shraddha@juniper.net


   Daniel Voyer
   Bell Canada.
   671 de la gauchetiere W
   Montreal, Quebec H3B 2M8
   Canada

   Email: daniel.voyer@bell.ca


   Steven Lin
   Google, Inc.

   Email: stevenlin@google.com


   Alex Bogdanov
   Google, Inc.

   Email: bogdanov@google.com


   Przemyslaw Krol
   Google, Inc.

   Email: pkrol@google.com


   Martin Horneffer
   Deutsche Telekom

   Email: martin.horneffer@telekom.de


   Dirk Steinberg
   Steinberg Consulting

   Email: dws@steinbergnet.net


   Bruno Decraene
   Orange Business Services

   Email: bruno.decraene@orange.com


   Stephane Litkowski
   Orange Business Services

   Email: stephane.litkowski@orange.com


   Paul Mattes
   Microsoft
   One Microsoft Way
   Redmond, WA 98052-6399
   USA

   Email: pamattes@microsoft.com