Network Working Group                                  C. Filsfils, Ed.
Internet-Draft                                      Cisco Systems, Inc.
Intended status: Standards Track                       P. Francois, Ed.
Expires: September 28, 2014                              IMDEA Networks
                                                             S. Previdi
                                                    Cisco Systems, Inc.
                                                            B. Decraene
                                                           S. Litkowski
                                                                 Orange
                                                           M. Horneffer
                                                       Deutsche Telekom
                                                           I. Milojevic
                                                         Telekom Srbija
                                                              R. Shakir
                                                        British Telecom
                                                                S. Ytti
                                                                 TDC Oy
                                                          W. Henderickx
                                                         Alcatel-Lucent
                                                            J. Tantsura
                                                                S. Kini
                                                               Ericsson
                                                              E. Crabbe
                                                           Google, Inc.
                                                         March 27, 2014

                       Segment Routing Use Cases
           draft-filsfils-spring-segment-routing-use-cases-00

Abstract

   Segment Routing (SR) leverages the source routing and tunneling
   paradigms.  A node steers a packet through a controlled set of
   instructions, called segments, by prepending the packet with an SR
   header.  A segment can represent any instruction, topological or
   service-based.  SR allows a flow to be enforced through any
   topological path and service chain while maintaining per-flow state
   only at the ingress node of the SR domain.

   The Segment Routing architecture can be directly applied to the MPLS
   dataplane with no change to the forwarding plane.  It requires only
   minor extensions to the existing link-state routing protocols.
   Segment Routing can also be applied to IPv6 with a new type of
   routing extension header.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).
   Note that other groups may also distribute working documents as
   Internet-Drafts.  The list of current Internet-Drafts is at
   http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 28, 2014.

Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Companion Documents
     1.2.  Editorial simplification
   2.  IGP-based MPLS Tunneling
   3.  Fast Reroute
     3.1.  Protecting node and adjacency segments
     3.2.  Protecting a node segment upon the failure of its
           advertising node
       3.2.1.  Advertisement of the Mirroring Capability
       3.2.2.  Mirroring Table
       3.2.3.  LFA FRR at the Point of Local Repair
       3.2.4.  Modified IGP Convergence upon Node deletion
       3.2.5.  Conclusions
   4.  Traffic Engineering
     4.1.  Traffic Engineering without Bandwidth Admission Control
       4.1.1.  Anycast Node Segment
       4.1.2.  Distributed CSPF-based Traffic Engineering
       4.1.3.  Egress Peering Traffic Engineering
       4.1.4.  Deterministic non-ECMP Path
       4.1.5.  Load-balancing among non-parallel links
     4.2.  Traffic Engineering with Bandwidth Admission Control
       4.2.1.  Capacity Planning Process
       4.2.2.  SDN/SR use-case
       4.2.3.  Residual Bandwidth
   5.  Service chaining
   6.  OAM
     6.1.  Monitoring a remote bundle
     6.2.  Monitoring a remote peering link
   7.  IANA Considerations
   8.  Manageability Considerations
   9.  Security Considerations
   10. Acknowledgements
   11. References
     11.1.  Normative References
     11.2.  Informative References
   Authors' Addresses

1.  Introduction

   The objective of this document is to illustrate the properties and
   benefits of the SR architecture, through the documentation of
   various SR use-cases.
   Section 2 illustrates the ability to tunnel traffic towards remote
   service points without any protocol other than the IGP.

   Section 3 reports various FRR use-cases leveraging the SR
   functionality.

   Section 4 documents traffic-engineering use-cases, with and without
   support of bandwidth admission control.

   Section 5 documents the use of SR to perform service chaining.

   Section 6 illustrates OAM use-cases.

1.1.  Companion Documents

   The main reference for this document is the SR architecture defined
   in [I-D.filsfils-rtgwg-segment-routing].

   The SR instantiation in the MPLS dataplane is described in
   [I-D.filsfils-spring-segment-routing-mpls].

   [I-D.filsfils-spring-segment-routing-ldp-interop] documents the
   co-existence and interworking with MPLS signaling protocols.

   IS-IS protocol extensions for Segment Routing are described in
   [I-D.previdi-isis-segment-routing-extensions].

   OSPF protocol extensions for Segment Routing are defined in
   [I-D.psenak-ospf-segment-routing-extensions].

   Fast-Reroute for Segment Routing is described in
   [I-D.francois-sr-frr].

   The PCEP protocol extensions for Segment Routing are defined in
   [I-D.sivabalan-pce-segment-routing].

   The SR instantiation in the IPv6 dataplane will be described in a
   future draft.

1.2.  Editorial simplification

   A unique index is allocated to each IGP Prefix Segment.  The
   absolute segment associated with an IGP Prefix-SID is determined by
   summing the index and the base of the SRGB.  In the SR architecture,
   each node can be configured with a different SRGB, and hence the
   absolute SID associated with an IGP Prefix Segment can change from
   node to node.

   We have described the first use-case of this document in the most
   generic way, i.e. with a different SRGB at each node in the SR IGP
   domain.  We have detailed the packet path, highlighting that the SID
   of a Prefix Segment may change hop by hop.
   For editorial simplification purposes, we will assume for all the
   other use-cases that the operator ensures a single consistent SRGB
   across all the nodes in the SR IGP domain.  In that case, all the
   nodes associate the same absolute SID with the same index, and hence
   one can use the absolute SID value instead of the index to refer to
   a Prefix-SID.

   Several operators have indicated that they would deploy the SR
   technology in this way: with a single consistent SRGB across all the
   nodes.  They motivated their choice based on operational simplicity
   (e.g. troubleshooting across different nodes).

   While this document notes this operator feedback and uses this
   deployment model to simplify the text, we highlight that the SR
   architecture is not limited to this specific deployment use-case
   (different nodes may have different SRGBs thanks to the indexation
   of Prefix-SIDs).

2.  IGP-based MPLS Tunneling

   SR, applied to the MPLS dataplane, offers the ability to tunnel
   services (VPN, VPLS, VPWS) from an ingress PE to an egress PE
   without any protocol other than IS-IS or OSPF.  The LDP and RSVP-TE
   signaling protocols are not required.

   The operator only needs to allocate one node segment per PE, and
   the SR IGP control-plane automatically builds the required MPLS
   forwarding constructs from any PE to any PE.

                             P1---P2
                            /       \
              A---CE1---PE1         PE2---CE2---Z
                            \       /
                             P4---P3

                  Figure 1: IGP-based MPLS Tunneling

   In Figure 1 above, the four nodes A, CE1, CE2 and Z are part of the
   same VPN.  CE2 advertises to PE2 a route to Z.  PE2 binds a local
   label LZ to that route and propagates the route and its label via
   MP-BGP to PE1 with next-hop 192.168.0.2.  PE1 installs the VPN
   prefix Z in the appropriate VRF and resolves the next-hop onto the
   node segment associated with PE2.
   Upon receiving a packet from A destined to Z, PE1 pushes two labels
   onto the packet: the top label is the Prefix-SID attached to
   192.168.0.2/32, the bottom label is the VPN label LZ attached to
   the VPN route Z.

   The Prefix-SID attached to prefix 192.168.0.2 is a shared segment
   within the IGP domain; as such, it is indexed.

   Let us assume that:

   -  the operator allocated the index 2 to the prefix 192.168.0.2/32

   -  the operator allocated SRGB [100, 199] at PE1

   -  the operator allocated SRGB [200, 299] at P1

   -  the operator allocated SRGB [300, 399] at P2

   -  the operator allocated SRGB [400, 499] at P3

   -  the operator allocated SRGB [500, 599] at P4

   -  the operator allocated SRGB [600, 699] at PE2

   With this context, any SR-capable IGP node in the domain can
   determine the segment associated with the Prefix-SID attached to
   prefix 192.168.0.2/32:

   -  PE1's SID is 100+2=102

   -  P1's SID is 200+2=202

   -  P2's SID is 300+2=302

   -  P3's SID is 400+2=402

   -  P4's SID is 500+2=502

   -  PE2's SID is 600+2=602

   Specifically, in our example this means that PE1 load-balances the
   traffic to VPN route Z between P1 and P4.  The packets sent to P1
   have a top label 202 while the packets sent to P4 have a top label
   502.  P1 swaps 202 for 302 and forwards to P2.  P2 pops 302 and
   forwards to PE2.  P4 swaps 502 for 402 and forwards the packets to
   P3.  P3 pops the top label and forwards the packets to PE2.
   Eventually, all the packets reach PE2 with one single label: LZ,
   the VPN label attached to VPN route Z.

   This scenario illustrates how supporting MPLS services (VPN, VPLS,
   VPWS) with SR has the following benefits:

   -  Simple operation: one single intra-domain protocol to operate:
      the IGP.  No need to support IGP synchronization extensions as
      described in [RFC5443] and [RFC6138].

   -  Excellent scaling: one Node-SID per PE.
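   The per-node SID computation walked through above can be sketched in
   a few lines.  This Python illustration is not part of the draft; the
   base table simply mirrors the example SRGB allocations.

```python
# Per-node SRGB bases from the example above (hypothetical allocation).
SRGB_BASE = {"PE1": 100, "P1": 200, "P2": 300, "P3": 400, "P4": 500, "PE2": 600}

def absolute_sid(node, index):
    """Absolute SID for a Prefix-SID index at a node: SRGB base + index."""
    return SRGB_BASE[node] + index

# Index 2 is allocated to prefix 192.168.0.2/32 (PE2's loopback), so each
# node derives its own label for that Prefix-SID independently.
sids = {node: absolute_sid(node, 2) for node in SRGB_BASE}
```

   With a single consistent SRGB across all nodes (the simplification
   assumed in Section 1.2), every entry of this table would collapse to
   the same value, which is why the later sections can refer to a
   Prefix-SID by a single absolute number.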
3.  Fast Reroute

   Segment Routing aims at supporting services with tight SLA
   guarantees [I-D.filsfils-rtgwg-segment-routing].  To meet this
   goal, local protection mechanisms can be useful to provide fast
   connectivity restoration after the sudden failure of network
   components.  Protection mechanisms for segments aim at letting a
   point of local repair (PLR) pre-compute and install state allowing
   it to locally recover the delivery of packets when the primary
   outgoing interface corresponding to the protected active segment is
   down.

   This section describes use-cases leading to the definition of
   different protection mechanisms for node, adjacency, and service
   segments to be supported by the SR architecture.

3.1.  Protecting node and adjacency segments

   Node and adjacency segments are used to determine the path that a
   packet should follow from an ingress node to an egress node of the
   SR domain or a service node.

   Ensuring fast recovery of the packet delivery service may bear
   different requirements depending on the application using the
   segment.  For this reason, the SR architecture should be able to
   accommodate multiple protection mechanisms and provide means for
   the operator to configure the protection scheme applied to the
   segments that are advertised in the SR domain.

   The operator may want to achieve fast recovery in case of failures
   with as little management effort as possible, using a protection
   mechanism provided by the Segment Routing architecture itself.  In
   this case, a Segment Routing node is in charge of discovering "by
   default" protection paths for each of its adjacent network
   components, with minimal operational impact.  Approaches for such
   applications, typically in line with classical IP-FRR solutions,
   are discussed in [I-D.francois-sr-frr].
   The operator of a Segment Routing network may also have strict
   policies on how a given network component should be protected
   against failures.  A typical case is the knowledge, by an external
   controller (or through any other tool used by the operator), of
   shared risk among different components, which should not be used to
   protect each other.  An operator could notably use
   [I-D.sivabalan-pce-segment-routing] for this purpose.

   Third, some SR applications have strict requirements in terms of
   guaranteed performance, disjointness in the infrastructure
   components used for different services, or redundant provisioning
   of such services.  An approach for providing resiliency in these
   contexts is explained in
   [I-D.shakir-rtgwg-sr-performance-engineered-lsps].  It basically
   aims at letting the ingress node of the SR domain be in charge of
   the recovery of the Segment Routing paths that it uses to support
   these services.

   The protection behavior applied to a given SID must be advertised
   in the routing information that is propagated in the SR domain for
   that SID, e.g., as in
   [I-D.previdi-isis-segment-routing-extensions].  Nodes injecting
   traffic into the SR domain can hence select segments based on the
   protection mechanism that is required for their application.

3.2.  Protecting a node segment upon the failure of its advertising
      node

   Service segments can also benefit from a fast restoration mechanism
   provided by the SR architecture.

   Referring to the figure below, let us assume:

      A is identified by IP address 192.0.2.1/32, to which Node-SID
      101 is attached.

      B is identified by IP address 192.0.2.2/32, to which Node-SID
      102 is attached.

      A and B host the same set of services.

      Each service is identified by a local segment at each node: i.e.
      node A allocates a local service segment 9001 to identify a
      specific service S, while the same service is identified by a
      local service segment 9002 at B.  Specifically, for the sake of
      this illustration, let us assume that service S is a BGP-VPN
      service where A announces a VPN route V with BGP next-hop
      192.0.2.1/32 and local VPN label 9001, and B announces the same
      VPN route V with BGP next-hop 192.0.2.2/32 and local VPN label
      9002.

      A generic mesh interconnects the three nodes M, Q and B.

      N prefers to use the service S offered by A and hence sends its
      S-destined traffic with segment list {101, 9001}.

      Q is a node connected to A.

      Q has a method to detect the loss of node A within a few tens of
      msec.

                        __
                       {  }---Q---A(service S)
                 N--M--{  }
                       {__}---B(service S)

                      Figure 2: Service Mirroring

   In that context, we would like to protect the traffic destined to
   service S upon the failure of node A.

   The solution is built upon several components:

   1.  B advertises its mirroring capability for mirrored Node-SID
       101.

   2.  B pre-installs a mirroring table in order to process the
       packets originally destined to 101.

   3.  Q and any other neighbor of A pre-install the Mirror_FRR LFA
       extension.

   4.  All nodes implement a modified SRDB convergence upon Node-SID
       101 deletion.

3.2.1.  Advertisement of the Mirroring Capability

   B advertises a MIRROR sub-TLV in its IGP Link-State Router
   Capability TLV with the values (TTT=000, MIRRORED_OBJECT=101,
   CONTEXT_SEGMENT=10002); see [I-D.filsfils-rtgwg-segment-routing],
   [I-D.previdi-isis-segment-routing-extensions] and
   [I-D.psenak-ospf-segment-routing-extensions] for more details on
   the encodings.
   Doing so, B advertises within the routing domain that it is willing
   to back up any traffic originally sent to Node-SID 101, provided
   that this rerouted traffic gets to B with the context segment 10002
   directly preceding any local service segment advertised by A.
   10002 is a local context segment allocated by B to identify traffic
   that was originally meant for A.  This allows B to match the
   subsequent service segment (e.g. 9001) correctly.

3.2.2.  Mirroring Table

   We assume that B is able to discover all the local service segments
   allocated by A (e.g. via BGP route reflection and add-path).  B
   maps all the services advertised by A to its similar service
   representations.  For example, service 9001 advertised by A is
   mapped to service 9002 advertised by B, as both relate to the same
   service S (the same VPN route V).  Hence, B applies the same
   service treatment to a packet received with top segments {102,
   10002, 9001} or with top segments {102, 9002}.  Basically, B
   treats {10002, 9001} as a synonym of {9002}.

3.2.3.  LFA FRR at the Point of Local Repair

   In advance of any failure of A, Q (and any other node connected to
   A) learns the identity of the IGP Mirroring node for each Node-SID
   advertised by A (MIRROR_TLV advertised by B) and pre-installs the
   following new MIRROR_FRR entry:

   -  Trigger condition: the loss of next-hop A

   -  Incoming active segment: 101 (a Node-SID advertised by A)

   -  Primary Segment processing: pop 101

   -  Backup Segment processing: pop 101, push {102, 10002}

   -  Primary next-hop: A

   -  Backup next-hop: primary path to node B

   Upon detecting the loss of node A, Q intercepts any traffic
   destined to Node-SID 101, pops the segment to A (101) and pushes a
   repair tunnel {102, 10002}.  Node-SID 102 steers the repaired
   traffic to B while context segment 10002 allows B to process the
   following service segment {9001} in the right context table.
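   As a sketch (hypothetical Python, not part of the draft), the
   MIRROR_FRR entry pre-installed at Q behaves like a conditional
   label rewrite on the segment list:

```python
PROTECTED_NODE_SID = 101        # Node-SID advertised by A
REPAIR_SEGMENTS = [102, 10002]  # Node-SID of B, then B's context segment

def mirror_frr_process(segments, primary_nhop_up):
    """Pop the active segment 101; if next-hop A is down, push the repair
    tunnel {102, 10002} so the traffic reaches B in the right context."""
    assert segments[0] == PROTECTED_NODE_SID
    remainder = segments[1:]
    if primary_nhop_up:
        return remainder                 # primary: pop 101, forward to A
    return REPAIR_SEGMENTS + remainder   # backup: steer to B via context 10002
```

   For the traffic N sends with {101, 9001}, the backup processing
   yields {102, 10002, 9001}, matching the repair tunnel described
   above.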
3.2.4.  Modified IGP Convergence upon Node deletion

   Upon the failure of A, all the neighbors of A will flood the loss
   of their adjacency to A, and eventually every node within the IGP
   domain will delete 192.0.2.1/32 from its RIB.

   The RIB deletion of 192.0.2.1/32 at N is beneficial, as it triggers
   the BGP FRR protection onto the precomputed backup next-hop
   [I-D.rtgwg-bgp-pic].

   The RIB deletion at node M, if it occurs before the RIB deletion at
   N, would be disastrous, as it would lead to the loss of the traffic
   from N to A before Q is able to apply the Mirroring protection.

   The solution consists in delaying the deletion of the SRDB entry
   for 101 by 2 seconds while still deleting the IP RIB 192.0.2.1/32
   entry immediately.

   The RIB deletion triggers the BGP FRR and BGP Convergence.  This is
   beneficial and must occur without delay.

   The deletion of the SRDB entry for Node-SID 101 is delayed to
   ensure that the traffic still in transit towards Node-SID 101 is
   not dropped.

   The delay timer should be long enough to ensure that either the BGP
   FRR or the BGP Convergence has taken place at N.

3.2.5.  Conclusions

   In our reference figure, N sends its packets towards A with the
   segment list {101, 9001}.  The shortest-path from N to A transits
   via M and Q.

   Within a few msec of the loss of A, Q activates its pre-installed
   Mirror_FRR entry and reroutes the traffic to B with the segment
   list {102, 10002, 9001}.

   Within a few hundreds of msec, any IGP node deletes its RIB entry
   to A but keeps its SRDB entry for Node-SID 101 for an extra 2
   seconds.

   Upon deleting its RIB entry to 192.0.2.1/32, N activates its BGP
   FRR entry and reroutes its S-destined traffic towards B with
   segment list {102, 9002}.

   By the time any IGP node deletes the SRDB entry for Node-SID 101, N
   no longer sends any traffic with Node-SID 101.
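   Throughout this sequence, B delivers both the repaired traffic
   {102, 10002, 9001} and N's rerouted traffic {102, 9002} to the same
   service.  A minimal sketch of that synonym lookup at B (hypothetical
   Python; the table names are invented for illustration):

```python
MIRROR_CONTEXT = 10002           # context segment B allocated for A's traffic
SERVICE_SYNONYM = {9001: 9002}   # A's VPN label -> B's VPN label for route V

def service_label_at_b(segments):
    """Segments remaining after B pops its own Node-SID 102.
    Mirrored traffic carries {10002, 9001}; native traffic carries {9002}."""
    if segments[0] == MIRROR_CONTEXT:
        return SERVICE_SYNONYM[segments[1]]  # lookup in the mirroring table
    return segments[0]
```

   Either arrival form resolves to B's local VPN label 9002, which is
   why the repaired and the converged traffic receive the same service
   treatment.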
   In conclusion, the traffic loss only depends on the ability of Q to
   detect the node failure of its adjacent node A.

4.  Traffic Engineering

   In this section, we describe Traffic Engineering use-cases for SR,
   distinguishing use-cases for traffic engineering with bandwidth
   admission control from those without.

4.1.  Traffic Engineering without Bandwidth Admission Control

   This section describes traffic-engineering use-cases which do not
   require bandwidth admission control.

   The first sub-section illustrates the use of anycast segments to
   express macro policies.  Two examples are provided: one involving a
   disjointness enforcement within a so-called dual-plane network, and
   the other involving CoS-based policies.

   The second sub-section illustrates how a head-end router can
   combine a distributed CSPF computation with SR.  Various examples
   are provided where the CSPF constraint or objective is either a TE
   affinity, an SRLG or a latency metric.

   The third sub-section illustrates how SR can help traffic-engineer
   outbound traffic among different external peers, overriding the
   best installed IP path at the egress border routers.

   The fourth sub-section describes how SR can be used to express
   deterministic non-ECMP paths.  Several techniques to compress the
   related segment lists are also introduced.

   The fifth sub-section describes a use-case where a node attaches an
   Adj-SID to a set of its interfaces that do not share the same
   neighbor.  The illustrated benefit relates to load-balancing.

4.1.1.  Anycast Node Segment

   The SR architecture defines an anycast segment as a segment
   attached to an anycast IP prefix ([RFC4786]).
   The anycast node segment is an interesting tool for traffic
   engineering:

      Macro-policy support: anycast segments allow the expression of
      policies such as "go via plane 1 of a dual-plane network"
      (Section 4.1.1.1) or "go via Region3" (Section 4.1.3).

      Implicit node resiliency: the traffic-engineering policy is not
      anchored to a specific node whose failure could impact the
      service.  It is anchored to an anycast address/Anycast-SID, and
      hence the flow automatically reroutes on any ECMP-aware
      shortest-path to any other router that is part of the anycast
      set.

   The two following sub-sections illustrate two traffic-engineering
   use-cases leveraging the Anycast-SID.

4.1.1.1.  Disjointness in dual-plane networks

   Many networks are built according to the dual-plane design:

      Each access region k is connected to the core by two C routers
      (C(1,k) and C(2,k)).

         C(1,k) is part of plane 1 and aggregation region k.

         C(2,k) is part of plane 2 and aggregation region k.

         C(1,k) has a link to C(2,j) iff k = j.

      The core nodes of a given region are directly connected.
      Inter-region links only connect core nodes of the same plane.

         {C(1,k) has a link to C(1,j)} iff {C(2,k) has a link to
         C(2,j)}.

         The distribution of these links depends on the topological
         properties of the core of the AS.  The design rule presented
         above specifies that these links appear in both core planes.

   We assume a common design rule found in such deployments: the
   inter-plane link costs (C(i,k)-C(j,k) where i<>j) are set such that
   the route to an edge destination from a given plane stays within
   the plane unless the plane is partitioned.
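   The connectivity rules above can be sketched as a small topology
   builder (hypothetical Python, purely illustrative; the region and
   core-pair inputs are invented, not taken from the draft):

```python
def dual_plane_links(regions, core_pairs):
    """Build the link set implied by the dual-plane design rules:
    C(1,k)--C(2,k) inside each region, and every inter-region core
    link mirrored in both planes."""
    links = set()
    for k in regions:
        links.add((f"C1{k}", f"C2{k}"))     # C(1,k) has a link to C(2,j) iff k = j
    for plane in ("1", "2"):                # inter-region links appear in
        for k, j in core_pairs:             # both planes by construction
            links.add((f"C{plane}{k}", f"C{plane}{j}"))
    return links
```

   The second loop is the "iff" mirroring rule: a plane-1 inter-region
   link exists exactly when the corresponding plane-2 link does.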
                           Edge Router A
                               /  \
                              /    \
                             /      \   Agg Region A
                            /        \
                           /          \
                        C1A----------C2A
                         | \          | \
                         |  \         |  \
                         |  C1B----------C2B
                 Plane1  |   |        |   |  Plane2
                         |   |        |   |
                        C1C--|-----C2C    |
                          \  |       \    |
                           \ |        \   |
                            C1Z----------C2Z
                              \          /
                               \        /  Agg Region Z
                                \      /
                                 \    /
                           Edge Router Z

            Figure 3: Dual-Plane Network and Disjointness

   In the above network diagram, let us assume that the operator
   configures:

      The four routers (C1A, C1B, C1C, C1Z) with an anycast loopback
      address 192.0.2.1/32 and an Anycast-SID 101.

      The four routers (C2A, C2B, C2C, C2Z) with an anycast loopback
      address 192.0.2.2/32 and an Anycast-SID 102.

      Edge router Z with Node-SID 109.

   A can then use the three following segment lists to control its
   Z-destined traffic:

      {109}: the traffic is load-balanced across any ECMP path through
      the network.

      {101, 109}: the traffic is load-balanced across any ECMP path
      within Plane1 of the network.

      {102, 109}: the traffic is load-balanced across any ECMP path
      within Plane2 of the network.

   Most of the data traffic to Z would use the first segment list, so
   as to exploit the capacity efficiently.  The operator would use the
   two other segment lists for specific premium traffic that has
   requested disjoint transport.

   For example, let us assume a bank or a government customer has
   requested that the two flows F1 and F2 injected at A and destined
   to Z be transported across disjoint paths.  The operator could
   classify F1 (F2) at A and impose an SR header with the second
   (third) segment list.  Focusing on F1 for the sake of illustration,
   A would route the packets based on the active segment, Anycast-SID
   101, which steers the traffic along the ECMP-aware shortest-path to
   the closest router that is part of the Anycast-SID 101 set, C1A in
   this example.
   Once the packets have reached C1A, the second segment becomes
   active, Node-SID 109, which steers the traffic on the ECMP-aware
   shortest-path to Z.  C1A load-balances the traffic between C1B-C1Z
   and C1C-C1Z, and then C1Z forwards to Z.

   This SR use-case has the following benefits:

      Zero per-service state and signaling on midpoint and tail-end
      routers.

      Only two additional node segments (one Anycast-SID per plane).

      ECMP-awareness.

      Node resiliency property: the traffic-engineering policy is not
      anchored to a specific core node whose failure could impact the
      service.

4.1.1.2.  CoS-based Traffic Engineering

   Frequently, different classes of service need different path
   characteristics.

   In the example below, a single-area international network with
   presence in four different regions of the world has lots of cheap
   network capacity from Region4 to Region1 via Region2 and some
   scarce expensive capacity via Region3.

            +-------[Region2]-------+
            |                       |
      A----[Region4]           [Region1]----Z
            |                       |
            +-------[Region3]-------+

           Figure 4: International Topology Example

   In such a case, the IGP metrics would be tuned to have a
   shortest-path from A to Z via Region2.

   This would provide efficient capacity planning usage while
   fulfilling the requirements of most of the traffic demands.
   However, it may not suit the latency requirements of the voice
   traffic between the two cities.

   Let us illustrate how this can be solved with Segment Routing.

   The operator would configure:

   -  All the core routers in Region3 with an anycast loopback
      192.0.2.3/32 to which Anycast-SID 333 is attached.

   -  A loopback 192.0.2.9/32 on Z, and would attach Node-SID 109 to
      it.
   -  The IGP metrics such that the shortest-path from Region4 to
      Region1 is via Region2, the shortest-path from Region4 to
      Region3 is direct, and the shortest-path from Region3 to Region1
      is not back via Region4 and Region2 but straight to Region1.

   With this in mind, the operator would instruct A to apply the
   following policy for its Z-destined traffic:

   -  Voice traffic: impose segment-list {333, 109}.

         Anycast-SID 333 steers the Voice traffic along the ECMP-aware
         shortest-path to the closest core router in Region3, then
         Node-SID 109 steers the Voice traffic along the ECMP-aware
         shortest-path to Z.  Hence the Voice traffic reaches Z from A
         via the low-latency path through Region3.

   -  Any other traffic: impose segment-list {109}.

         Node-SID 109 steers the traffic along the ECMP-aware
         shortest-path to Z.  Hence the bulk traffic reaches Z from A
         via the cheapest path for the operator.

   This SR use-case has the following benefits:

      Zero per-service state and signaling at midpoint and tail-end
      nodes.

      One additional anycast segment per region.

      ECMP-awareness.

      Node resiliency property: the traffic-engineering policy is not
      anchored to a specific core node whose failure could impact the
      service.

4.1.2.  Distributed CSPF-based Traffic Engineering

   In this section, we illustrate how a head-end router can map the
   result of its distributed CSPF computation into an SR segment list.

                  +---E---+
                  |       |
            A-----B-------C-----Z
                  |       |
                  +---D---+

                  Figure 5: SRLG-based CSPF

   Let us assume that in the above network diagram:

      The operator configures a policy on A such that its Z-destined
      traffic must avoid SRLG1.

      The operator configures SRLG1 on the link BC (or it is learned
      dynamically from the IP/Optical interaction with the DWDM
      network).

      The SRLGs are flooded in the link-state IGP.
742 The operator respectively configures the Node-SIDs 101, 102, 103, 743 104, 105 and 109 at nodes A, B, C, D, E and Z. 745 In that context, A can apply the following CSPF behavior: 747 - It prunes all the links affected by SRLG1, computes an SPF 748 on the remaining topology and picks one of the SPF paths. 749 - In our example, A finds two possible paths ABECZ and ABDCZ 750 and let us assume it takes the ABDCZ path. 752 - It translates the path into a list of segments 753 - In our example, ABDCZ can be expressed as {104, 109}: a 754 shortest path to node D, followed by a shortest-path to 755 node Z. 757 - It monitors the status of the LSDB and upon any change 758 impacting the policy, it either recomputes a path meeting the 759 policy or updates its translation into a list of segments. 760 - For example, upon the loss of the link DC, the shortest-path 761 to Z from D (Node-SID 109) goes via the undesired link BC. 762 After a transient time immediately following such failure, 763 node A would determine that the chosen path is no longer 764 valid and would instead select ABECZ, which is translated as 765 {105, 109}: a shortest path to node E, followed by a 766 shortest-path to node Z. 767 - This behavior is a local matter at node A and hence the details 768 are outside the scope of this document. 770 The same use-case can be derived from any other CSPF objective or 771 constraint (TE affinity, TE latency, SRLG, etc.) as defined in 772 [RFC5305] and [I-D.ietf-isis-te-metric-extensions]. Note that the 773 bandwidth case is specific and hence is treated in Section 4.2. 775 4.1.3. Egress Peering Traffic Engineering 776 +------+ 777 | | 778 +---D F 779 +---------+ / | AS 2 |\ +------+ 780 | |/ +------+ \| Z | 781 A C | | 782 | |\ +------+ /| AS 4 | 783 B AS1 | \ | |/ +------+ 784 | | +---E G 785 +---------+ | AS 3 | 786 +------+\ 788 Figure 6: Egress peering traffic engineering 790 Let us assume that: 792 C in AS1 learns about destination Z of AS4 via two BGP paths 793 (AS2, AS4) and (AS3, AS4).
795 C sets next-hop-self before propagating the paths within AS1. 797 C propagates all the paths to Z within AS1 (add-path). 799 C only installs the path via AS2 in its RIB. 801 In that context, the operator of AS1 cannot apply the following 802 traffic-engineering policy: 804 Steer 60% of the Z-destined traffic received at A via AS2 and 40% 805 via AS3. 807 Steer 80% of the Z-destined traffic received at B via AS2 and 20% 808 via AS3. 810 This traffic-engineering policy can be supported thanks to the 811 following SR configuration. 813 The operator configures: 815 C with a loopback 192.0.2.1/32 and attaches Node-SID 101 to it. 817 C to bind an external adjacency segment 818 ([I-D.filsfils-rtgwg-segment-routing]) to each of its peering 819 interfaces. 821 For the sake of this illustration, let us assume that the external 822 adjacency segments bound by C for its peering interfaces to (D, AS2) 823 and (E, AS3) are respectively 9001 and 9002. 825 These external adjacencies (and their attached segments) are flooded 826 within the IGP domain of AS1 [RFC5316]. 828 As a result, the following information is available within AS1: 829 ISIS Link State Database: 831 - Node-SID 101 is attached to IP address 192.0.2.1/32 advertised 832 by C. 833 - C is connected to a peer D with external adjacency segment 9001. 834 - C is connected to a peer E with external adjacency segment 9002. 835 BGP Database: 837 - Z is reachable via 192.0.2.1 with AS Path {AS2, AS4}. 838 - Z is reachable via 192.0.2.1 with AS Path {AS3, AS4}. 840 The operator of AS1 can thus meet its traffic-engineering objective 841 by enforcing the following policies: 843 A should apply the segment list {101, 9001} to 60% of the 844 Z-destined traffic and the segment list {101, 9002} to the rest. 846 B should apply the segment list {101, 9001} to 80% of the 847 Z-destined traffic and the segment list {101, 9002} to the rest. 849 Node segment 101 steers the traffic to C.
851 External adjacency segment 9001 forces the traffic from C to (D, 852 AS2), without any IP lookup at C. 854 External adjacency segment 9002 forces the traffic from C to (E, 855 AS3), without any IP lookup at C. 857 A and B can also use the described segments to assess the liveness of 858 the remote peering links; see the OAM section. 860 4.1.4. Deterministic non-ECMP Path 862 The previous sections have illustrated the ability to steer traffic 863 along ECMP-aware shortest-paths. SR is also able to express a 864 deterministic non-ECMP path, i.e. as a list of adjacency segments. 865 We illustrate such a use-case in this section. 866 A-B-C-D-E-F-G-H-Z 867 | | 868 +-I-J-K-L-M-+ 870 Figure 7: Non-ECMP deterministic path 872 In the above figure, it is assumed all nodes are SR capable and only 873 the following SIDs are advertised: 874 - A advertises Adj-SID 9001 for its adjacency to B 875 - B advertises Adj-SID 9002 for its adjacency to C 876 - C advertises Adj-SID 9003 for its adjacency to D 877 - D advertises Adj-SID 9004 for its adjacency to E 878 - E advertises Adj-SID 9001 for its adjacency to F 879 - F advertises Adj-SID 9002 for its adjacency to G 880 - G advertises Adj-SID 9003 for its adjacency to H 881 - H advertises Adj-SID 9004 for its adjacency to Z 882 - E advertises Node-SID 101 883 - Z advertises Node-SID 109 885 The operator can steer the traffic from A to Z via a specific non- 886 ECMP path ABCDEFGHZ by imposing the segment list {9001, 9002, 9003, 887 9004, 9001, 9002, 9003, 9004}. 889 The following sub-sections illustrate how the segment list can be 890 compressed. 892 4.1.4.1. Node Segment 894 Clearly, the exact same path can be expressed with a two-entry segment 895 list {101, 109}. 897 This example illustrates that a Node Segment can also be used to 898 express a deterministic non-ECMP path when the underlying shortest path is unique. 900 4.1.4.2. Forwarding Adjacency 902 The operator can configure Node B to create a forwarding-adjacency to 903 node H along an explicit path BCDEFGH.
The following behaviors can 904 then be automated by B: 906 B attaches an Adj-SID (e.g. 9007) to that forwarding adjacency 907 together with an ERO sub-sub-TLV which describes the explicit path 908 BCDEFGH. 910 B installs in its Segment Routing Database the following entry: 912 Active segment: 9007. 914 Operation: NEXT and PUSH {9002, 9003, 9004, 9001, 9002, 9003} 916 As a result, the operator can configure node A with the following 917 compressed segment list: {9001, 9007, 9004}. 919 4.1.5. Load-balancing among non-parallel links 921 A given node may assign the same Adj-SID to multiple of its 922 adjacencies, even if they lead to different neighbors. This 923 may be useful to support traffic-engineering policies. 925 +---C---D---+ 926 | | 927 PE1---A---B-----F-----E---PE2 929 Figure 8: Adj-SID For Multiple (non-parallel) Adjacencies 931 In the above example, let us assume that the operator: 933 Requires PE1 to load-balance its PE2-destined traffic between the 934 ABCDE and ABFE paths. 936 Configures B with Node-SID 102 and E with Node-SID 202. 938 Configures B to advertise an individual Adj-SID per adjacency 939 (e.g. 9001 for BC and 9002 for BF) and, in addition, an Adj-SID 940 for the adjacency set (BC, BF) (e.g. 9003). 942 With this context in mind, the operator achieves its objective by 943 configuring the following traffic-engineering policy at PE1 for the 944 PE2-destined traffic: {102, 9003, 202}: 946 Node-SID 102 steers the traffic to B. 948 Adj-SID 9003 load-balances the traffic to C or F. 950 From either C or F, Node-SID 202 steers the traffic to E, which then forwards it to PE2. 952 In conclusion, the traffic is load-balanced between the ABCDE and 953 ABFE paths, as desired. 955 4.2. Traffic Engineering with Bandwidth Admission Control 957 The implementation of bandwidth admission control within a network 958 (and its possible routing consequence, which consists of routing along 959 explicit paths where the bandwidth is available) requires a capacity 960 planning process.
962 The spreading of load among ECMP paths is a key attribute of the 963 capacity planning processes applied to packet-based networks. 965 The first sub-section details the capacity planning process and the 966 role of ECMP load-balancing. We highlight the relevance of SR in 967 that context. 969 The next two sub-sections document two use-cases of SR-based traffic 970 engineering with bandwidth admission control. 972 The second sub-section documents a concrete SR applicability 973 involving centralized admission control. This is often 974 referred to as the "SDN/SR use-case". 976 The third sub-section introduces a future research topic involving 977 the notion of residual bandwidth introduced in 978 [I-D.ietf-mpls-te-express-path]. 980 4.2.1. Capacity Planning Process 982 Capacity Planning anticipates the routing of the traffic matrix onto 983 the network topology, for a set of expected traffic and topology 984 variations. The heart of the process consists of simulating the 985 placement of the traffic along ECMP-aware shortest-paths and 986 accounting for the resulting bandwidth usage. 988 The bandwidth accounting of a demand along its shortest-path is a 989 basic capability of any planning tool or PCE server. 991 For example, in the network topology described below, and assuming a 992 default IGP metric of 1 and an IGP metric of 2 for link GF, a 1600Mbps 993 A-to-Z flow is accounted as consuming 1600Mbps on links AB and FZ, 994 800Mbps on links BC, BG and GF, and 400Mbps on links CD, DF, CE and 995 EF. 996 C-----D 997 / \ \ 998 A---B +--E--F--Z 999 \ / 1000 G------+ 1002 Figure 9: Capacity Planning an ECMP-based demand 1004 ECMP is extremely frequent in SP, Enterprise and DC architectures and 1005 it is not rare to see as many as 128 different ECMP paths between a 1006 source and a destination within a single network domain. It is a key 1007 efficiency objective to spread the traffic among as many ECMP paths 1008 as possible.
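The accounting rule above can be reproduced in a few lines: compute the distance of every node to the destination, then recursively split the demand equally among the ECMP next-hops. The sketch below is purely illustrative (the topology encoding and the function names are assumptions, not part of any SR specification):

```python
import heapq
from collections import defaultdict

# Figure 9 topology; IGP metric 1 everywhere except link G-F (metric 2).
LINKS = {("A","B"):1, ("B","C"):1, ("B","G"):1, ("C","D"):1, ("C","E"):1,
         ("D","F"):1, ("E","F"):1, ("G","F"):2, ("F","Z"):1}
ADJ = defaultdict(dict)
for (u, v), m in LINKS.items():
    ADJ[u][v] = ADJ[v][u] = m

def dist_to(dest):
    """Dijkstra: IGP distance from every node to `dest`."""
    dist = {dest: 0}
    pq = [(0, dest)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue
        for v, m in ADJ[u].items():
            if d + m < dist.get(v, float("inf")):
                dist[v] = d + m
                heapq.heappush(pq, (d + m, v))
    return dist

def account(src, dest, mbps):
    """Spread `mbps` over all ECMP shortest paths; per-link accounting."""
    dist = dist_to(dest)
    load = defaultdict(float)

    def spread(node, amount):
        if node == dest:
            return
        # ECMP next-hops: neighbors lying on a shortest path to dest.
        nhops = [v for v, m in ADJ[node].items() if m + dist[v] == dist[node]]
        for v in nhops:                  # equal split among ECMP next-hops
            load[tuple(sorted((node, v)))] += amount / len(nhops)
            spread(v, amount / len(nhops))

    spread(src, mbps)
    return dict(load)

load = account("A", "Z", 1600)
```

Run on Figure 9, this yields exactly the figures quoted above: 1600Mbps on AB and FZ, 800Mbps on BC, BG and GF, and 400Mbps on CD, CE, DF and EF.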
1010 This is illustrated in the network diagram below, which consists of a 1011 subset of a network where 5 ECMP paths are already observed from A to 1012 M. 1013 C 1014 / \ 1015 B-D-L-- 1016 / \ / \ 1017 A E \ 1018 \ M 1019 \ G / 1020 \ / \ / 1021 F K 1022 \ / 1023 I 1025 Figure 10: ECMP Topology Example 1027 Segment Routing offers simple support for such ECMP-based shortest- 1028 path placement: a node segment. A single node segment enumerates all 1029 the ECMP paths along the shortest-path. 1031 When the capacity planning process detects that a traffic growth 1032 scenario and topology variation would lead to congestion, a capacity 1033 increase is triggered and, if it cannot be deployed in due time, a 1034 traffic engineering solution is activated within the network. 1036 A basic traffic engineering objective consists of finding the 1037 smallest set of demands that need to be routed off their shortest 1038 path to eliminate the congestion, then computing an explicit path 1039 for each of them and instantiating these traffic-engineered policies 1040 in the network. 1042 Segment Routing offers simple support for explicit path policies. 1043 Let us provide two examples based on Figure 10. 1045 First example: let us assume that the process has selected the flow 1046 AM for traffic-engineering away from its ECMP-enabled shortest path 1047 and flow AM must avoid consuming resources on the LM and the FG 1048 links. 1050 The solution is straightforward: A sends its M-destined traffic 1051 towards the next-hop F with a two-label stack where the top label is the 1052 adjacency segment FI and the next label is the node segment to M. 1053 Alternatively, a three-label stack with adjacency segments FI, IK and 1054 KM could have been used. 1056 Second example: let us assume that AM is still the selected flow but 1057 the constraint is relaxed to only avoid using resources from the LM 1058 link.
1060 The solution is straightforward: A sends its M-destined traffic 1061 towards the next-hop F with a one-label stack where the label is the node 1062 segment to M. Note that while the AM flow has been traffic-engineered 1063 away from its natural shortest-path (ECMP across three paths), the 1064 traffic-engineered path is still ECMP-aware and leverages two of the 1065 three initial paths. This is accomplished with a single-label stack 1066 and without the enumeration of one tunnel per path. 1068 In the light of these examples, Segment Routing offers an 1069 interesting solution for Capacity Planning because: 1071 One node segment represents the set of ECMP-aware shortest paths. 1073 Adjacency segments allow any explicit path to be expressed. 1075 The combination of node and adjacency segments allows any path to 1076 be expressed without having to enumerate all the ECMP options. 1078 The capacity planning process ensures that the majority of the 1079 traffic rides on node segments (ECMP-based shortest path), while a 1080 minority of the traffic is routed off its shortest-path. 1082 The explicitly-engineered traffic (which is a minority) still 1083 benefits from the ECMP-awareness of the node segments within their 1084 segment list. 1086 Only the head-end of a traffic-engineering policy maintains state. 1087 The midpoints and tail-ends do not maintain any state. 1089 4.2.2. SDN/SR use-case 1091 The heart of the application of SR to the SDN use-case lies in the 1092 SDN controller, also called Stateful PCE 1093 ([I-D.ietf-pce-stateful-pce]). 1095 The SDN controller is responsible for controlling the evolution of the 1096 traffic matrix and topology. It accepts or denies the addition of 1097 new traffic into the network. It decides how to route the accepted 1098 traffic. It monitors the topology and, upon failure, determines the 1099 minimum traffic that should be rerouted on an alternate path to 1100 alleviate a bandwidth congestion issue.
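One recurring, well-defined step in such a workflow is translating a computed explicit path into a segment list, as done in Section 4.1.2 (ABDCZ into {104, 109}). The greedy encoding sketched below is an illustrative assumption, not a normative algorithm: it anchors a node segment only where the shortest path between the two anchor nodes is unique and equal to the desired sub-path (with ECMP, a node segment would spread traffic over all equal-cost paths), and falls back to an adjacency segment otherwise. It reuses the Figure 5 topology and Node-SIDs:

```python
import heapq
from collections import defaultdict

# Figure 5 topology, all IGP metrics 1, Node-SIDs as in Section 4.1.2.
ADJ = defaultdict(dict)
for u, v in [("A","B"), ("B","C"), ("B","E"), ("E","C"),
             ("B","D"), ("D","C"), ("C","Z")]:
    ADJ[u][v] = ADJ[v][u] = 1

NODE_SID = {"A": 101, "B": 102, "C": 103, "D": 104, "E": 105, "Z": 109}
INF = 1 << 30

def spf_dist(src):
    """Plain Dijkstra: IGP distance from src to every node."""
    dist = {src: 0}
    pq = [(0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, INF):
            continue
        for v, m in ADJ[u].items():
            if d + m < dist.get(v, INF):
                dist[v] = d + m
                heapq.heappush(pq, (d + m, v))
    return dist

def unique_spf(s, t):
    """Return the s->t shortest path if it is unique, else None."""
    dist = spf_dist(s)
    path, u = [t], t
    while u != s:
        preds = [v for v, m in ADJ[u].items()
                 if dist.get(v, INF) + m == dist[u]]
        if len(preds) != 1:       # ECMP: a node segment would spray here
            return None
        u = preds[0]
        path.append(u)
    return path[::-1]

def encode(path):
    """Greedily cover the path with node segments; fall back to a
    (hypothetical) adjacency segment for a hop no node segment covers."""
    sids, i = [], 0
    while i < len(path) - 1:
        k = len(path) - 1
        while k > i and unique_spf(path[i], path[k]) != path[i:k + 1]:
            k -= 1
        if k == i:                # no node segment fits this hop
            sids.append(("Adj", path[i], path[i + 1]))
            k = i + 1
        else:
            sids.append(NODE_SID[path[k]])
        i = k
    return sids

print(encode(["A", "B", "D", "C", "Z"]))   # prints [104, 109]
```

On Figure 5, ABDCZ encodes as {104, 109}, matching Section 4.1.2, and the alternate path ABECZ encodes as {105, 109} (Node-SID 105 identifying E).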
1102 The algorithms supporting this behavior are a local matter of the SDN 1103 controller and are outside the scope of this document. 1105 The means of collecting traffic and topology information are the same 1106 as what would be used with other SDN-based traffic-engineering 1107 solutions (e.g. [RFC7011] and [I-D.ietf-idr-ls-distribution]). 1109 The means of instantiating policy information at a traffic- 1110 engineering head-end are the same as what would be used with other 1111 SDN-based traffic-engineering solutions (e.g.: 1112 [I-D.ietf-i2rs-architecture], [I-D.ietf-pce-pce-initiated-lsp] and 1113 [I-D.sivabalan-pce-segment-routing]). 1115 4.2.2.1. Illustration 1116 _______________ 1117 { } 1118 +--C--+ V { SDN Controller } 1119 |/ \| / {_______________} 1120 A===B--G--D==F--Y 1121 |\ /| \ 1122 +--E--+ Z 1124 SDN/SR use-case 1126 Let us assume that in the above network diagram: 1128 An SDN Controller (SC) is connected to the network and is able to 1129 retrieve the topology and traffic information, as well as set 1130 traffic-engineering policies on the network nodes. 1132 The operator (likely via the SDN Controller) has provisioned the 1133 Node-SIDs 101, 102, 103, 104, 105, 106, 107, 201, 202 and 203 1134 respectively at nodes A, B, C, D, E, F, G, V, Y and Z. 1136 All the links have the same BW (e.g. 10G) and IGP cost (e.g. 10) 1137 except the links BG and GD, which have IGP cost 50. 1139 Each described node adjacency is formed of a bundle of two 1140 links, except (B, G) and (G, D), which are formed of a single link 1141 each. 1143 Flow FV is traveling from A to destinations behind V. 1145 Flow FY is traveling from A to destinations behind Y. 1147 Flow FZ is traveling from A to destinations behind Z. 1149 The SDN Controller has admitted all these flows and has let A 1150 apply the default SR policy: "map a flow onto its ECMP-aware 1151 shortest-path".
1153 In this example, this means that A respectively maps the flows 1154 FV onto segment list {201}, FY onto segment list {202} and FZ 1155 onto segment list {203}. 1157 The reader should note that the SDN Controller 1158 knows what A would do and hence knows and controls that none of 1159 these flows is mapped through G. 1161 Let us describe what happens upon the failure of one of the two links 1162 E-D. 1164 The SDN Controller monitors the link-state database and detects a 1165 congestion risk due to the reduced capacity between E and D. 1166 Specifically, SC updates its simulation of the traffic according to 1167 the policies it instructed the network to use and discovers that too 1168 much traffic is mapped on the remaining link E-D. 1170 The SDN Controller then computes the minimum number of flows that 1171 should be deviated from their existing path. For example, let us 1172 assume that the flow FZ is selected. 1174 The SDN controller then computes an explicit path for this flow. For 1175 example, let us assume that the chosen path is ABGDFZ. 1177 The SDN controller then maps the chosen path into an SR-based policy. 1178 In our example, the path ABGDFZ is translated into a segment list 1179 {107, 203}. Node-SID 107 steers the traffic along ABG and then Node-SID 1180 203 steers the traffic along GDFZ. 1182 The SDN controller then applies the following traffic-engineering 1183 policy at A: "map any packet of the classified flow FZ onto segment- 1184 list {107, 203}". The SDN Controller uses PCEP extensions to 1185 instantiate that policy at A ([I-D.sivabalan-pce-segment-routing]). 1187 As soon as A receives the PCEP message, it enforces the policy and 1188 the traffic classified as FZ is immediately mapped onto segment list 1189 {107, 203}. 1191 This immediately eliminates the congestion risk. Flows FV and FY were 1192 untouched and keep using the ECMP-aware shortest-path. The minimum 1193 amount of traffic was rerouted (FZ).
No hop-by-hop signaling through 1194 the network from A to Z is required. No hop-by-hop admission control 1195 is required. No state needs to be maintained by B, G, D, F or Z. The 1196 only maintained state is within the SDN controller and the head-end 1197 node (A). 1199 4.2.2.2. Benefits 1201 In the context of Centralized Optimization and the SDN use- 1202 case, here are the benefits provided by the SR architecture: 1204 Explicit routing capability with or without ECMP-awareness. 1206 No hop-by-hop signaling through the network. 1208 State is only maintained at the policy head-end. No state is 1209 maintained at mid-points and tail-ends. 1211 Automated guaranteed FRR for any topology (Section 3). 1213 Optimum virtualization: the policy state is in the packet header 1214 and not in the intermediate nodes along the policy. The policy is 1215 completely virtualized away from midpoints and tail-ends. 1217 Highly responsive to change: the SDN Controller only needs to 1218 apply a policy change at the head-end. No time is lost 1219 programming the midpoints and tail-ends along the policy. 1221 4.2.2.3. Dataset analysis 1223 A future version of this document will report some analysis of the 1224 application of the SDN/SR use-case to real operator data sets. 1226 A first, incomplete, report is available below. 1228 4.2.2.3.1. Example 1 1230 The first data-set consists of a full mesh of 12000 explicitly-routed 1231 tunnels observed on a real network. These tunnels resulted from 1232 distributed headend-based CSPF computation. 1234 We measured that only 65% of the traffic is riding on its shortest 1235 path. 1237 Three well-known defects are illustrated in this data set: 1239 The lack of ECMP support in explicitly-routed tunnels: ATM-like 1240 traffic-steering mechanisms steer the traffic along a non-ECMP 1241 path. 1243 The increase of the number of explicitly-routed non-ECMP tunnels 1244 to enumerate all the ECMP options.
1246 The inefficiency of distributed optimization: too much traffic is 1247 riding off its shortest path. 1249 We applied the SDN/SR use-case to this dataset. This means that: 1251 The distributed CSPF computation is replaced by centralized 1252 optimization and BW admission control, supported by the SDN 1253 Controller. 1255 As part of the optimization, we also tuned the IGP metrics 1256 so as to maximize the traffic spread among ECMP 1257 paths by default. 1259 The traffic-engineering policies are supported by SR segment- 1260 lists. 1262 As a result, we measured that 98% of the traffic would be kept on its 1263 normal policy (riding its shortest-path) and only 2% of the traffic 1264 requires a path away from the shortest-path. 1266 Let us highlight a few benefits: 1268 98% of the traffic-engineering head-end policies are eliminated. 1270 Indeed, by default, an SR-capable ingress edge node maps the 1271 traffic onto a single Node-SID leading to the egress edge node. No 1272 configuration or policy needs to be maintained at the ingress 1273 edge node to realize this. 1275 100% of the states at mid/tail nodes are eliminated. 1277 4.2.3. Residual Bandwidth 1279 The notion of Residual Bandwidth (RBW) is introduced by 1280 [I-D.ietf-mpls-te-express-path]. 1282 A future version of this document will describe the SR/RBW research 1283 opportunity. 1285 5. Service chaining 1287 Segment Routing can be used to steer packets through services offered 1288 by middleboxes to perform specific actions such as DPI, accounting, 1289 etc. 1291 I---A---B---C---E 1292 \ | / \ / 1293 \ | / F 1294 \|/ 1295 D 1297 Figure 11 1299 For example, as illustrated in Figure 11, an ingress node I selects 1300 an egress node E for a packet P. An application, however, requires that 1301 P undergoes a specific treatment (DPI, firewalling, ...) offered by a 1302 node D, reachable in the SR domain.
In the SR architecture, this 1303 application can be supported through the use of a service segment 1304 with a scope local to D, say SS, following the nodal segment which 1305 corresponds to D. The ingress node keeps control of the egress 1306 node through which the packet needs to exit the network by placing a 1307 nodal segment identifying the egress node after the service segment. 1309 This would be achieved by letting I forward the packet P with the 1310 following sequence of segments: {D, SS, E}. D is a nodal segment, SS 1311 is the service segment corresponding to the service to apply to the 1312 packet P, and E is the nodal segment corresponding to the egress node 1313 selected by I for that packet. 1315 6. OAM 1317 6.1. Monitoring a remote bundle 1319 This section documents a few representative SR/OAM use-cases. 1320 +--+ _ +--+ +-------+ 1321 | | { } | |---991---L1---662---| | 1322 |MS|--{ }-|R1|---992---L2---663---|R2 (72)| 1323 | | {_} | |---993---L3---664---| | 1324 +--+ +--+ +-------+ 1326 Figure 12: Probing all the links of a remote bundle 1328 In the above figure, a monitoring system (MS) needs to assess the 1329 dataplane availability of all the links within a remote bundle 1330 connected to routers R1 and R2. 1332 The monitoring system retrieves the segment information from the IGP 1333 LSDB and applies the following segment list: {72, 662, 992, 664} to 1334 its IP probe (whose source and destination addresses are the address 1335 of MS). 1337 MS sends the probe to its connected router. If the connected router 1338 is not SR compliant, a tunneling technique can be used to tunnel the 1339 SR-based probe to the first SR router. The SR domain forwards the 1340 probe to R2 (72 is the node segment of R2). R2 forwards the probe to 1341 R1 over link L1 (adjacency segment 662). R1 forwards the probe to R2 1342 over link L2 (adjacency segment 992). R2 forwards the probe to R1 1343 over link L3 (adjacency segment 664).
R1 then forwards the IP probe 1344 to MS as per classic IP forwarding. 1346 6.2. Monitoring a remote peering link 1348 In Figure 6, node A can monitor the dataplane liveness of the 1349 unidirectional peering link from C to D of AS2 by sending an IP probe 1350 with destination address A and segment list {101, 9001}. Node-SID 1351 101 steers the probe to C and External Adj-SID 9001 steers the probe 1352 from C over the desired peering link to D of AS2. The SR header is 1353 removed by C and D receives a plain IP packet with destination 1354 address A. D returns the probe to A through classic IP forwarding. 1355 BFD Echo mode ([RFC5880]) would support such a unidirectional 1356 link liveness probing application. 1358 7. IANA Considerations 1360 TBD 1362 8. Manageability Considerations 1364 TBD 1366 9. Security Considerations 1368 TBD 1370 10. Acknowledgements 1372 We would like to thank Dave Ward, Dan Frost, Stewart Bryant, Thomas 1373 Telkamp, Ruediger Geib and Les Ginsberg for their contribution to the 1374 content of this document. 1376 11. References 1378 11.1. Normative References 1380 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1381 Requirement Levels", BCP 14, RFC 2119, March 1997. 1383 [RFC4786] Abley, J. and K. Lindqvist, "Operation of Anycast 1384 Services", BCP 126, RFC 4786, December 2006. 1386 [RFC5305] Li, T. and H. Smit, "IS-IS Extensions for Traffic 1387 Engineering", RFC 5305, October 2008. 1389 [RFC5316] Chen, M., Zhang, R., and X. Duan, "ISIS Extensions in 1390 Support of Inter-Autonomous System (AS) MPLS and GMPLS 1391 Traffic Engineering", RFC 5316, December 2008. 1393 [RFC5880] Katz, D. and D. Ward, "Bidirectional Forwarding Detection 1394 (BFD)", RFC 5880, June 2010. 1396 [RFC7011] Claise, B., Trammell, B., and P. Aitken, "Specification of 1397 the IP Flow Information Export (IPFIX) Protocol for the 1398 Exchange of Flow Information", STD 77, RFC 7011, 1399 September 2013. 1401 11.2.
Informative References 1403 [I-D.filsfils-rtgwg-segment-routing] 1404 Filsfils, C., Previdi, S., Bashandy, A., Decraene, B., 1405 Litkowski, S., Horneffer, M., Milojevic, I., Shakir, R., 1406 Ytti, S., Henderickx, W., Tantsura, J., and E. Crabbe, 1407 "Segment Routing Architecture", 1408 draft-filsfils-rtgwg-segment-routing-01 (work in 1409 progress), October 2013. 1411 [I-D.filsfils-spring-segment-routing-ldp-interop] 1412 Filsfils, C., Previdi, S., Bashandy, A., Decraene, B., 1413 Litkowski, S., Horneffer, M., Milojevic, I., Shakir, R., 1414 Ytti, S., Henderickx, W., Tantsura, J., and E. Crabbe, 1415 "Segment Routing interoperability with LDP", 1416 draft-filsfils-spring-segment-routing-ldp-interop-00 (work 1417 in progress), October 2013. 1419 [I-D.filsfils-spring-segment-routing-mpls] 1420 Filsfils, C., Previdi, S., Bashandy, A., Decraene, B., 1421 Litkowski, S., Horneffer, M., Milojevic, I., Shakir, R., 1422 Ytti, S., Henderickx, W., Tantsura, J., and E. Crabbe, 1423 "Segment Routing with MPLS data plane", 1424 draft-filsfils-spring-segment-routing-mpls-00 (work in 1425 progress), October 2013. 1427 [I-D.francois-sr-frr] 1428 Francois, P., Filsfils, C., Bashandy, A., Previdi, S., and 1429 B. Decraene, "Segment Routing Fast Reroute", 1430 draft-francois-sr-frr-00 (work in progress), July 2013. 1432 [I-D.ietf-i2rs-architecture] 1433 Atlas, A., Halpern, J., Hares, S., Ward, D., and T. 1434 Nadeau, "An Architecture for the Interface to the Routing 1435 System", draft-ietf-i2rs-architecture-02 (work in 1436 progress), February 2014. 1438 [I-D.ietf-idr-ls-distribution] 1439 Gredler, H., Medved, J., Previdi, S., Farrel, A., and S. 1440 Ray, "North-Bound Distribution of Link-State and TE 1441 Information using BGP", draft-ietf-idr-ls-distribution-04 1442 (work in progress), November 2013. 1444 [I-D.ietf-isis-te-metric-extensions] 1445 Previdi, S., Giacalone, S., Ward, D., Drake, J., Atlas, 1446 A., Filsfils, C., and W. 
Wu, "IS-IS Traffic Engineering 1447 (TE) Metric Extensions", 1448 draft-ietf-isis-te-metric-extensions-01 (work in 1449 progress), October 2013. 1451 [I-D.ietf-mpls-te-express-path] 1452 Atlas, A., Drake, J., Giacalone, S., Ward, D., Previdi, 1453 S., and C. Filsfils, "Performance-based Path Selection for 1454 Explicitly Routed LSPs using TE Metric Extensions", 1455 draft-ietf-mpls-te-express-path-00 (work in progress), 1456 October 2013. 1458 [I-D.ietf-pce-pce-initiated-lsp] 1459 Crabbe, E., Minei, I., Sivabalan, S., and R. Varga, "PCEP 1460 Extensions for PCE-initiated LSP Setup in a Stateful PCE 1461 Model", draft-ietf-pce-pce-initiated-lsp-00 (work in 1462 progress), December 2013. 1464 [I-D.ietf-pce-stateful-pce] 1465 Crabbe, E., Medved, J., Minei, I., and R. Varga, "PCEP 1466 Extensions for Stateful PCE", 1467 draft-ietf-pce-stateful-pce-08 (work in progress), 1468 February 2014. 1470 [I-D.previdi-isis-segment-routing-extensions] 1471 Previdi, S., Filsfils, C., Bashandy, A., Gredler, H., 1472 Litkowski, S., and J. Tantsura, "IS-IS Extensions for 1473 Segment Routing", 1474 draft-previdi-isis-segment-routing-extensions-05 (work in 1475 progress), February 2014. 1477 [I-D.psenak-ospf-segment-routing-extensions] 1478 Psenak, P., Previdi, S., Filsfils, C., Gredler, H., 1479 Shakir, R., and W. Henderickx, "OSPF Extensions for 1480 Segment Routing", 1481 draft-psenak-ospf-segment-routing-extensions-04 (work in 1482 progress), February 2014. 1484 [I-D.rtgwg-bgp-pic] 1485 Bashandy, A., Filsfils, C., and P. Mohapatra, "BGP Prefix Independent Convergence", 1486 draft-rtgwg-bgp-pic-02 (work in progress), October 2013. 1488 [I-D.shakir-rtgwg-sr-performance-engineered-lsps] 1489 Shakir, R., Vernals, D., and A. Capello, "Performance 1490 Engineered LSPs using the Segment Routing Data-Plane", 1491 draft-shakir-rtgwg-sr-performance-engineered-lsps-00 (work 1492 in progress), July 2013. 1494 [I-D.sivabalan-pce-segment-routing] 1495 Sivabalan, S., Medved, J., Filsfils, C., Crabbe, E., and 1496 R.
Raszuk, "PCEP Extensions for Segment Routing", 1497 draft-sivabalan-pce-segment-routing-02 (work in progress), 1498 October 2013. 1500 [RFC5443] Jork, M., Atlas, A., and L. Fang, "LDP IGP 1501 Synchronization", RFC 5443, March 2009. 1503 [RFC6138] Kini, S. and W. Lu, "LDP IGP Synchronization for Broadcast 1504 Networks", RFC 6138, February 2011. 1506 Authors' Addresses 1508 Clarence Filsfils (editor) 1509 Cisco Systems, Inc. 1510 Brussels, 1511 BE 1513 Email: cfilsfil@cisco.com 1515 Pierre Francois (editor) 1516 IMDEA Networks 1517 Leganes, 1518 ES 1520 Email: pierre.francois@imdea.org 1522 Stefano Previdi 1523 Cisco Systems, Inc. 1524 Via Del Serafico, 200 1525 Rome 00142 1526 Italy 1528 Email: sprevidi@cisco.com 1530 Bruno Decraene 1531 Orange 1532 FR 1534 Email: bruno.decraene@orange.com 1536 Stephane Litkowski 1537 Orange 1538 FR 1540 Email: stephane.litkowski@orange.com 1541 Martin Horneffer 1542 Deutsche Telekom 1543 Hammer Str. 216-226 1544 Muenster 48153 1545 DE 1547 Email: Martin.Horneffer@telekom.de 1549 Igor Milojevic 1550 Telekom Srbija 1551 Takovska 2 1552 Belgrade 1553 RS 1555 Email: igormilojevic@telekom.rs 1557 Rob Shakir 1558 British Telecom 1559 London 1560 UK 1562 Email: rob.shakir@bt.com 1564 Saku Ytti 1565 TDC Oy 1566 Mechelininkatu 1a 1567 TDC 00094 1568 FI 1570 Email: saku@ytti.fi 1572 Wim Henderickx 1573 Alcatel-Lucent 1574 Copernicuslaan 50 1575 Antwerp 2018 1576 BE 1578 Email: wim.henderickx@alcatel-lucent.com 1579 Jeff Tantsura 1580 Ericsson 1581 300 Holger Way 1582 San Jose, CA 95134 1583 US 1585 Email: Jeff.Tantsura@ericsson.com 1587 Sriganesh Kini 1588 Ericsson 1589 300 Holger Way 1590 San Jose, CA 95134 1591 US 1593 Email: sriganesh.kini@ericsson.com 1595 Edward Crabbe 1596 Google, Inc. 1597 1600 Amphitheatre Parkway 1598 Mountain View, CA 94043 1599 US 1601 Email: edc@google.com