Network Working Group                                  C. Filsfils, Ed.
Internet-Draft                                      Cisco Systems, Inc.
Intended status: Standards Track                       P. Francois, Ed.
Expires: September 28, 2014                              IMDEA Networks
                                                             S. Previdi
                                                    Cisco Systems, Inc.
                                                            B. Decraene
                                                           S. Litkowski
                                                                 Orange
                                                           M. Horneffer
                                                       Deutsche Telekom
                                                           I. Milojevic
                                                         Telekom Srbija
                                                              R. Shakir
                                                        British Telecom
                                                                S. Ytti
                                                                 TDC Oy
                                                          W. Henderickx
                                                         Alcatel-Lucent
                                                            J. Tantsura
                                                                S. Kini
                                                               Ericsson
                                                              E. Crabbe
                                                           Google, Inc.
                                                         March 27, 2014

                       Segment Routing Use Cases
           draft-filsfils-spring-segment-routing-use-cases-00

Abstract

   Segment Routing (SR) leverages the source routing and tunneling
   paradigms.  A node steers a packet through a controlled set of
   instructions, called segments, by prepending the packet with an SR
   header.  A segment can represent any instruction, topological or
   service-based.  SR allows a flow to be enforced through any
   topological path and service chain while maintaining per-flow state
   only at the ingress node of the SR domain.

   The Segment Routing architecture can be directly applied to the MPLS
   dataplane with no change to the forwarding plane.  It requires only
   minor extensions to the existing link-state routing protocols.
   Segment Routing can also be applied to IPv6 with a new type of
   routing extension header.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).
   Note that other groups may also distribute working documents as
   Internet-Drafts.  The list of current Internet-Drafts is at
   http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 28, 2014.

Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Companion Documents
     1.2.  Editorial simplification
   2.  IGP-based MPLS Tunneling
   3.  Fast Reroute
     3.1.  Protecting node and adjacency segments
     3.2.  Protecting a node segment upon the failure of its
           advertising node
       3.2.1.  Advertisement of the Mirroring Capability
       3.2.2.  Mirroring Table
       3.2.3.  LFA FRR at the Point of Local Repair
       3.2.4.  Modified IGP Convergence upon Node deletion
       3.2.5.  Conclusions
   4.  Traffic Engineering
     4.1.  Traffic Engineering without Bandwidth Admission Control
       4.1.1.  Anycast Node Segment
       4.1.2.  Distributed CSPF-based Traffic Engineering
       4.1.3.  Egress Peering Traffic Engineering
       4.1.4.  Deterministic non-ECMP Path
       4.1.5.  Load-balancing among non-parallel links
     4.2.  Traffic Engineering with Bandwidth Admission Control
       4.2.1.  Capacity Planning Process
       4.2.2.  SDN/SR use-case
       4.2.3.  Residual Bandwidth
   5.  Service chaining
   6.  OAM
     6.1.  Monitoring a remote bundle
     6.2.  Monitoring a remote peering link
   7.  IANA Considerations
   8.  Manageability Considerations
   9.  Security Considerations
   10. Acknowledgements
   11. References
     11.1.  Normative References
     11.2.  Informative References
   Authors' Addresses

1.  Introduction

   The objective of this document is to illustrate the properties and
   benefits of the SR architecture, through the documentation of
   various SR use-cases.
   Section 2 illustrates the ability to tunnel traffic towards remote
   service points without any protocol other than the IGP.

   Section 3 reports various FRR use-cases leveraging the SR
   functionality.

   Section 4 documents traffic-engineering use-cases, with and without
   support of bandwidth admission control.

   Section 5 documents the use of SR to perform service chaining.

   Section 6 illustrates OAM use-cases.

1.1.  Companion Documents

   The main reference for this document is the SR architecture defined
   in [I-D.filsfils-rtgwg-segment-routing].

   The SR instantiation in the MPLS dataplane is described in
   [I-D.filsfils-spring-segment-routing-mpls].

   [I-D.filsfils-spring-segment-routing-ldp-interop] documents the
   co-existence and interworking with MPLS signaling protocols.

   IS-IS protocol extensions for Segment Routing are described in
   [I-D.previdi-isis-segment-routing-extensions].

   OSPF protocol extensions for Segment Routing are defined in
   [I-D.psenak-ospf-segment-routing-extensions].

   Fast-Reroute for Segment Routing is described in
   [I-D.francois-sr-frr].

   The PCEP protocol extensions for Segment Routing are defined in
   [I-D.sivabalan-pce-segment-routing].

   The SR instantiation in the IPv6 dataplane will be described in a
   future draft.

1.2.  Editorial simplification

   A unique index is allocated to each IGP Prefix Segment.  The
   absolute segment associated with an IGP Prefix-SID is determined by
   summing the index and the base of the SRGB.  In the SR architecture,
   each node can be configured with a different SRGB, and hence the
   absolute SID associated with an IGP Prefix Segment can change from
   node to node.

   We have described the first use-case of this document in the most
   generic way, i.e. with a different SRGB at each node in the SR IGP
   domain.  We have detailed the packet path, highlighting that the SID
   of a Prefix Segment may change hop by hop.
   For editorial simplification purposes, we will assume for all the
   other use-cases that the operator ensures a single consistent SRGB
   across all the nodes in the SR IGP domain.  In that case, all the
   nodes associate the same absolute SID with the same index, and hence
   one can use the absolute SID value instead of the index to refer to
   a Prefix-SID.

   Several operators have indicated that they would deploy the SR
   technology in this way: with a single consistent SRGB across all the
   nodes.  They motivated their choice based on operational simplicity
   (e.g. troubleshooting across different nodes).

   While this document notes this operator feedback and uses this
   deployment model to simplify the text, we highlight that the SR
   architecture is not limited to this specific deployment use-case
   (different nodes may have different SRGBs thanks to the indexation
   of Prefix-SIDs).

2.  IGP-based MPLS Tunneling

   SR, applied to the MPLS dataplane, offers the ability to tunnel
   services (VPN, VPLS, VPWS) from an ingress PE to an egress PE
   without any protocol other than IS-IS or OSPF.  The LDP and RSVP-TE
   signaling protocols are not required.

   The operator only needs to allocate one node segment per PE, and
   the SR IGP control-plane automatically builds the required MPLS
   forwarding constructs from any PE to any PE.

                             P1---P2
                            /       \
              A---CE1---PE1         PE2---CE2---Z
                            \       /
                             P4---P3

                  Figure 1: IGP-based MPLS Tunneling

   In Figure 1 above, the four nodes A, CE1, CE2 and Z are part of the
   same VPN.  CE2 advertises to PE2 a route to Z.  PE2 binds a local
   label LZ to that route and propagates the route and its label via
   MP-BGP to PE1 with next-hop 192.168.0.2.  PE1 installs the VPN
   prefix Z in the appropriate VRF and resolves the next-hop onto the
   node segment associated with PE2.
   Upon receiving a packet from A destined to Z, PE1 pushes two labels
   onto the packet: the top label is the Prefix-SID attached to
   192.168.0.2/32, the bottom label is the VPN label LZ attached to
   the VPN route Z.

   The Prefix-SID attached to prefix 192.168.0.2 is a shared segment
   within the IGP domain; as such, it is indexed.

   Let us assume that:

   -  the operator allocated the index 2 to the prefix 192.168.0.2/32

   -  the operator allocated SRGB [100, 199] at PE1

   -  the operator allocated SRGB [200, 299] at P1

   -  the operator allocated SRGB [300, 399] at P2

   -  the operator allocated SRGB [400, 499] at P3

   -  the operator allocated SRGB [500, 599] at P4

   -  the operator allocated SRGB [600, 699] at PE2

   With this context, any SR-capable IGP node in the domain can
   determine the segment associated with the Prefix-SID attached to
   prefix 192.168.0.2/32:

   -  PE1's SID is 100+2=102

   -  P1's SID is 200+2=202

   -  P2's SID is 300+2=302

   -  P3's SID is 400+2=402

   -  P4's SID is 500+2=502

   -  PE2's SID is 600+2=602

   Specifically, in our example this means that PE1 load-balances the
   traffic to VPN route Z between P1 and P4.  The packets sent to P1
   have a top label 202 while the packets sent to P4 have a top label
   502.  P1 swaps 202 for 302 and forwards to P2.  P2 pops 302 and
   forwards to PE2.  P4 swaps 502 for 402 and forwards the packets to
   P3.  P3 pops the top label and forwards the packets to PE2.
   Eventually, all the packets reach PE2 with one single label: LZ,
   the VPN label attached to VPN route Z.

   This scenario illustrates how supporting MPLS services (VPN, VPLS,
   VPWS) with SR has the following benefits:

   -  Simple operation: one single intra-domain protocol to operate:
      the IGP.  No need to support IGP synchronization extensions as
      described in [RFC5443] and [RFC6138].

   -  Excellent scaling: one Node-SID per PE.
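   The per-node SID computation walked through above can be sketched in
   a few lines.  This Python illustration is not part of the draft; the
   base table simply mirrors the example SRGB allocations.

```python
# Per-node SRGB bases from the example above (hypothetical allocation).
SRGB_BASE = {"PE1": 100, "P1": 200, "P2": 300, "P3": 400, "P4": 500, "PE2": 600}

def absolute_sid(node, index):
    """Absolute SID for a Prefix-SID index at a node: SRGB base + index."""
    return SRGB_BASE[node] + index

# Index 2 is allocated to prefix 192.168.0.2/32 (PE2's loopback), so each
# node derives its own label for that Prefix-SID independently.
sids = {node: absolute_sid(node, 2) for node in SRGB_BASE}
```

   With a single consistent SRGB across all nodes (the simplification
   assumed in Section 1.2), every entry of this table would collapse to
   the same value, which is why the later sections can refer to a
   Prefix-SID by a single absolute number.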
3.  Fast Reroute

   Segment Routing aims at supporting services with tight SLA
   guarantees [I-D.filsfils-rtgwg-segment-routing].  To meet this
   goal, local protection mechanisms can be useful to provide fast
   connectivity restoration after the sudden failure of network
   components.  Protection mechanisms for segments aim at letting a
   point of local repair (PLR) pre-compute and install state allowing
   it to locally recover the delivery of packets when the primary
   outgoing interface corresponding to the protected active segment is
   down.

   This section describes use-cases leading to the definition of
   different protection mechanisms for node, adjacency, and service
   segments to be supported by the SR architecture.

3.1.  Protecting node and adjacency segments

   Node and adjacency segments are used to determine the path that a
   packet should follow from an ingress node to an egress node of the
   SR domain or a service node.

   Ensuring fast recovery of the packet delivery service may bear
   different requirements depending on the application using the
   segment.  For this reason, the SR architecture should be able to
   accommodate multiple protection mechanisms and provide means for
   the operator to configure the protection scheme applied to the
   segments that are advertised in the SR domain.

   The operator may want to achieve fast recovery in case of failures
   with as little management effort as possible, using a protection
   mechanism provided by the Segment Routing architecture itself.  In
   this case, a Segment Routing node is in charge of discovering "by
   default" protection paths for each of its adjacent network
   components, with minimal operational impact.  Approaches for such
   applications, typically in line with classical IP-FRR solutions,
   are discussed in [I-D.francois-sr-frr].
   The operator of a Segment Routing network may also have strict
   policies on how a given network component should be protected
   against failures.  A typical case is the knowledge, by an external
   controller (or through any other tool used by the operator), of
   shared risk among different components, which should not be used to
   protect each other.  An operator could notably use
   [I-D.sivabalan-pce-segment-routing] for this purpose.

   Third, some SR applications have strict requirements in terms of
   guaranteed performance, disjointness in the infrastructure
   components used for different services, or redundant provisioning
   of such services.  An approach for providing resiliency in these
   contexts is explained in
   [I-D.shakir-rtgwg-sr-performance-engineered-lsps].  It basically
   aims at letting the ingress node of the SR domain be in charge of
   the recovery of the Segment Routing paths that it uses to support
   these services.

   The protection behavior applied to a given SID must be advertised
   in the routing information that is propagated in the SR domain for
   that SID, e.g., as in
   [I-D.previdi-isis-segment-routing-extensions].  Nodes injecting
   traffic into the SR domain can hence select segments based on the
   protection mechanism that is required for their application.

3.2.  Protecting a node segment upon the failure of its advertising
      node

   Service segments can also benefit from a fast restoration mechanism
   provided by the SR architecture.

   Referring to the figure below, let us assume:

      A is identified by IP address 192.0.2.1/32, to which Node-SID
      101 is attached.

      B is identified by IP address 192.0.2.2/32, to which Node-SID
      102 is attached.

      A and B host the same set of services.

      Each service is identified by a local segment at each node: i.e.
      node A allocates a local service segment 9001 to identify a
      specific service S, while the same service is identified by a
      local service segment 9002 at B.  Specifically, for the sake of
      this illustration, let us assume that service S is a BGP-VPN
      service where A announces a VPN route V with BGP next-hop
      192.0.2.1/32 and local VPN label 9001, and B announces the same
      VPN route V with BGP next-hop 192.0.2.2/32 and local VPN label
      9002.

      A generic mesh interconnects the three nodes M, Q and B.

      N prefers to use the service S offered by A and hence sends its
      S-destined traffic with segment list {101, 9001}.

      Q is a node connected to A.

      Q has a method to detect the loss of node A within a few tens of
      msec.

                        __
                       {  }---Q---A(service S)
                 N--M--{  }
                       {__}---B(service S)

                      Figure 2: Service Mirroring

   In that context, we would like to protect the traffic destined to
   service S upon the failure of node A.

   The solution is built upon several components:

   1.  B advertises its mirroring capability for mirrored Node-SID
       101.

   2.  B pre-installs a mirroring table in order to process the
       packets originally destined to 101.

   3.  Q and any other neighbor of A pre-install the Mirror_FRR LFA
       extension.

   4.  All nodes implement a modified SRDB convergence upon Node-SID
       101 deletion.

3.2.1.  Advertisement of the Mirroring Capability

   B advertises a MIRROR sub-TLV in its IGP Link-State Router
   Capability TLV with the values (TTT=000, MIRRORED_OBJECT=101,
   CONTEXT_SEGMENT=10002); see [I-D.filsfils-rtgwg-segment-routing],
   [I-D.previdi-isis-segment-routing-extensions] and
   [I-D.psenak-ospf-segment-routing-extensions] for more details on
   the encodings.
   Doing so, B advertises within the routing domain that it is willing
   to back up any traffic originally sent to Node-SID 101, provided
   that this rerouted traffic gets to B with the context segment 10002
   directly preceding any local service segment advertised by A.
   10002 is a local context segment allocated by B to identify traffic
   that was originally meant for A.  This allows B to match the
   subsequent service segment (e.g. 9001) correctly.

3.2.2.  Mirroring Table

   We assume that B is able to discover all the local service segments
   allocated by A (e.g. via BGP route reflection and add-path).  B
   maps all the services advertised by A to its similar service
   representations.  For example, service 9001 advertised by A is
   mapped to service 9002 advertised by B, as both relate to the same
   service S (the same VPN route V).  Hence, B applies the same
   service treatment to a packet received with top segments {102,
   10002, 9001} or with top segments {102, 9002}.  Basically, B
   treats {10002, 9001} as a synonym of {9002}.

3.2.3.  LFA FRR at the Point of Local Repair

   In advance of any failure of A, Q (and any other node connected to
   A) learns the identity of the IGP Mirroring node for each Node-SID
   advertised by A (MIRROR_TLV advertised by B) and pre-installs the
   following new MIRROR_FRR entry:

   -  Trigger condition: the loss of next-hop A

   -  Incoming active segment: 101 (a Node-SID advertised by A)

   -  Primary Segment processing: pop 101

   -  Backup Segment processing: pop 101, push {102, 10002}

   -  Primary next-hop: A

   -  Backup next-hop: primary path to node B

   Upon detecting the loss of node A, Q intercepts any traffic
   destined to Node-SID 101, pops the segment to A (101) and pushes a
   repair tunnel {102, 10002}.  Node-SID 102 steers the repaired
   traffic to B while context segment 10002 allows B to process the
   following service segment {9001} in the right context table.
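   As a sketch (hypothetical Python, not part of the draft), the
   MIRROR_FRR entry pre-installed at Q behaves like a conditional
   label rewrite on the segment list:

```python
PROTECTED_NODE_SID = 101        # Node-SID advertised by A
REPAIR_SEGMENTS = [102, 10002]  # Node-SID of B, then B's context segment

def mirror_frr_process(segments, primary_nhop_up):
    """Pop the active segment 101; if next-hop A is down, push the repair
    tunnel {102, 10002} so the traffic reaches B in the right context."""
    assert segments[0] == PROTECTED_NODE_SID
    remainder = segments[1:]
    if primary_nhop_up:
        return remainder                 # primary: pop 101, forward to A
    return REPAIR_SEGMENTS + remainder   # backup: steer to B via context 10002
```

   For the traffic N sends with {101, 9001}, the backup processing
   yields {102, 10002, 9001}, matching the repair tunnel described
   above.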
3.2.4.  Modified IGP Convergence upon Node deletion

   Upon the failure of A, all the neighbors of A will flood the loss
   of their adjacency to A, and eventually every node within the IGP
   domain will delete 192.0.2.1/32 from its RIB.

   The RIB deletion of 192.0.2.1/32 at N is beneficial, as it triggers
   the BGP FRR protection onto the precomputed backup next-hop
   [I-D.rtgwg-bgp-pic].

   The RIB deletion at node M, if it occurs before the RIB deletion at
   N, would be disastrous, as it would lead to the loss of the traffic
   from N to A before Q is able to apply the Mirroring protection.

   The solution consists in delaying the deletion of the SRDB entry
   for 101 by 2 seconds while still deleting the IP RIB 192.0.2.1/32
   entry immediately.

   The RIB deletion triggers the BGP FRR and BGP Convergence.  This is
   beneficial and must occur without delay.

   The deletion of the SRDB entry for Node-SID 101 is delayed to
   ensure that the traffic still in transit towards Node-SID 101 is
   not dropped.

   The delay timer should be long enough to ensure that either the BGP
   FRR or the BGP Convergence has taken place at N.

3.2.5.  Conclusions

   In our reference figure, N sends its packets towards A with the
   segment list {101, 9001}.  The shortest-path from N to A transits
   via M and Q.

   Within a few msec of the loss of A, Q activates its pre-installed
   Mirror_FRR entry and reroutes the traffic to B with the segment
   list {102, 10002, 9001}.

   Within a few hundreds of msec, any IGP node deletes its RIB entry
   to A but keeps its SRDB entry for Node-SID 101 for an extra 2
   seconds.

   Upon deleting its RIB entry to 192.0.2.1/32, N activates its BGP
   FRR entry and reroutes its S-destined traffic towards B with
   segment list {102, 9002}.

   By the time any IGP node deletes the SRDB entry for Node-SID 101, N
   no longer sends any traffic with Node-SID 101.
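   Throughout this sequence, B delivers both the repaired traffic
   {102, 10002, 9001} and N's rerouted traffic {102, 9002} to the same
   service.  A minimal sketch of that synonym lookup at B (hypothetical
   Python; the table names are invented for illustration):

```python
MIRROR_CONTEXT = 10002           # context segment B allocated for A's traffic
SERVICE_SYNONYM = {9001: 9002}   # A's VPN label -> B's VPN label for route V

def service_label_at_b(segments):
    """Segments remaining after B pops its own Node-SID 102.
    Mirrored traffic carries {10002, 9001}; native traffic carries {9002}."""
    if segments[0] == MIRROR_CONTEXT:
        return SERVICE_SYNONYM[segments[1]]  # lookup in the mirroring table
    return segments[0]
```

   Either arrival form resolves to B's local VPN label 9002, which is
   why the repaired and the converged traffic receive the same service
   treatment.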
   In conclusion, the traffic loss only depends on the ability of Q to
   detect the node failure of its adjacent node A.

4.  Traffic Engineering

   In this section, we describe Traffic Engineering use-cases for SR,
   distinguishing use-cases for traffic engineering with bandwidth
   admission control from those without.

4.1.  Traffic Engineering without Bandwidth Admission Control

   This section describes traffic-engineering use-cases which do not
   require bandwidth admission control.

   The first sub-section illustrates the use of anycast segments to
   express macro policies.  Two examples are provided: one involving a
   disjointness enforcement within a so-called dual-plane network, and
   the other involving CoS-based policies.

   The second sub-section illustrates how a head-end router can
   combine a distributed CSPF computation with SR.  Various examples
   are provided where the CSPF constraint or objective is either a TE
   affinity, an SRLG or a latency metric.

   The third sub-section illustrates how SR can help traffic-engineer
   outbound traffic among different external peers, overriding the
   best installed IP path at the egress border routers.

   The fourth sub-section describes how SR can be used to express
   deterministic non-ECMP paths.  Several techniques to compress the
   related segment lists are also introduced.

   The fifth sub-section describes a use-case where a node attaches an
   Adj-SID to a set of its interfaces that do not share the same
   neighbor.  The illustrated benefit relates to load-balancing.

4.1.1.  Anycast Node Segment

   The SR architecture defines an anycast segment as a segment
   attached to an anycast IP prefix ([RFC4786]).
   The anycast node segment is an interesting tool for traffic
   engineering:

      Macro-policy support: anycast segments allow the expression of
      policies such as "go via plane 1 of a dual-plane network"
      (Section 4.1.1.1) or "go via Region3" (Section 4.1.3).

      Implicit node resiliency: the traffic-engineering policy is not
      anchored to a specific node whose failure could impact the
      service.  It is anchored to an anycast address/Anycast-SID, and
      hence the flow automatically reroutes on any ECMP-aware
      shortest-path to any other router that is part of the anycast
      set.

   The two following sub-sections illustrate two traffic-engineering
   use-cases leveraging the Anycast-SID.

4.1.1.1.  Disjointness in dual-plane networks

   Many networks are built according to the dual-plane design:

      Each access region k is connected to the core by two C routers
      (C(1,k) and C(2,k)).

         C(1,k) is part of plane 1 and aggregation region k.

         C(2,k) is part of plane 2 and aggregation region k.

         C(1,k) has a link to C(2,j) iff k = j.

      The core nodes of a given region are directly connected.
      Inter-region links only connect core nodes of the same plane.

         {C(1,k) has a link to C(1,j)} iff {C(2,k) has a link to
         C(2,j)}.

         The distribution of these links depends on the topological
         properties of the core of the AS.  The design rule presented
         above specifies that these links appear in both core planes.

   We assume a common design rule found in such deployments: the
   inter-plane link costs (C(i,k)-C(j,k) where i<>j) are set such that
   the route to an edge destination from a given plane stays within
   the plane unless the plane is partitioned.
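   The connectivity rules above can be sketched as a small topology
   builder (hypothetical Python, purely illustrative; the region and
   core-pair inputs are invented, not taken from the draft):

```python
def dual_plane_links(regions, core_pairs):
    """Build the link set implied by the dual-plane design rules:
    C(1,k)--C(2,k) inside each region, and every inter-region core
    link mirrored in both planes."""
    links = set()
    for k in regions:
        links.add((f"C1{k}", f"C2{k}"))     # C(1,k) has a link to C(2,j) iff k = j
    for plane in ("1", "2"):                # inter-region links appear in
        for k, j in core_pairs:             # both planes by construction
            links.add((f"C{plane}{k}", f"C{plane}{j}"))
    return links
```

   The second loop is the "iff" mirroring rule: a plane-1 inter-region
   link exists exactly when the corresponding plane-2 link does.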
                           Edge Router A
                               /  \
                              /    \
                             /      \   Agg Region A
                            /        \
                           /          \
                        C1A----------C2A
                         | \          | \
                         |  \         |  \
                         |  C1B----------C2B
                 Plane1  |   |        |   |  Plane2
                         |   |        |   |
                        C1C--|-----C2C    |
                          \  |       \    |
                           \ |        \   |
                            C1Z----------C2Z
                              \          /
                               \        /  Agg Region Z
                                \      /
                                 \    /
                           Edge Router Z

            Figure 3: Dual-Plane Network and Disjointness

   In the above network diagram, let us assume that the operator
   configures:

      The four routers (C1A, C1B, C1C, C1Z) with an anycast loopback
      address 192.0.2.1/32 and an Anycast-SID 101.

      The four routers (C2A, C2B, C2C, C2Z) with an anycast loopback
      address 192.0.2.2/32 and an Anycast-SID 102.

      Edge router Z with Node-SID 109.

   A can then use the three following segment lists to control its
   Z-destined traffic:

      {109}: the traffic is load-balanced across any ECMP path through
      the network.

      {101, 109}: the traffic is load-balanced across any ECMP path
      within Plane1 of the network.

      {102, 109}: the traffic is load-balanced across any ECMP path
      within Plane2 of the network.

   Most of the data traffic to Z would use the first segment list, so
   as to exploit the capacity efficiently.  The operator would use the
   two other segment lists for specific premium traffic that has
   requested disjoint transport.

   For example, let us assume a bank or a government customer has
   requested that the two flows F1 and F2 injected at A and destined
   to Z be transported across disjoint paths.  The operator could
   classify F1 (F2) at A and impose an SR header with the second
   (third) segment list.  Focusing on F1 for the sake of illustration,
   A would route the packets based on the active segment, Anycast-SID
   101, which steers the traffic along the ECMP-aware shortest-path to
   the closest router that is part of the Anycast-SID 101 set, C1A in
   this example.
   Once the packets have reached C1A, the second segment becomes
   active, Node-SID 109, which steers the traffic on the ECMP-aware
   shortest-path to Z.  C1A load-balances the traffic between C1B-C1Z
   and C1C-C1Z, and then C1Z forwards to Z.

   This SR use-case has the following benefits:

      Zero per-service state and signaling on midpoint and tail-end
      routers.

      Only two additional node segments (one Anycast-SID per plane).

      ECMP-awareness.

      Node resiliency property: the traffic-engineering policy is not
      anchored to a specific core node whose failure could impact the
      service.

4.1.1.2.  CoS-based Traffic Engineering

   Frequently, different classes of service need different path
   characteristics.

   In the example below, a single-area international network with
   presence in four different regions of the world has lots of cheap
   network capacity from Region4 to Region1 via Region2 and some
   scarce expensive capacity via Region3.

            +-------[Region2]-------+
            |                       |
      A----[Region4]           [Region1]----Z
            |                       |
            +-------[Region3]-------+

           Figure 4: International Topology Example

   In such a case, the IGP metrics would be tuned to have a
   shortest-path from A to Z via Region2.

   This would provide efficient capacity planning usage while
   fulfilling the requirements of most of the traffic demands.
   However, it may not suit the latency requirements of the voice
   traffic between the two cities.

   Let us illustrate how this can be solved with Segment Routing.

   The operator would configure:

   -  All the core routers in Region3 with an anycast loopback
      192.0.2.3/32 to which Anycast-SID 333 is attached.

   -  A loopback 192.0.2.9/32 on Z, and would attach Node-SID 109 to
      it.
   -  The IGP metrics such that the shortest-path from Region4 to
      Region1 is via Region2, the shortest-path from Region4 to
      Region3 is direct, and the shortest-path from Region3 to Region1
      is not back via Region4 and Region2 but straight to Region1.

   With this in mind, the operator would instruct A to apply the
   following policy for its Z-destined traffic:

   -  Voice traffic: impose segment-list {333, 109}.

         Anycast-SID 333 steers the Voice traffic along the ECMP-aware
         shortest-path to the closest core router in Region3, then
         Node-SID 109 steers the Voice traffic along the ECMP-aware
         shortest-path to Z.  Hence the Voice traffic reaches Z from A
         via the low-latency path through Region3.

   -  Any other traffic: impose segment-list {109}.

         Node-SID 109 steers the traffic along the ECMP-aware
         shortest-path to Z.  Hence the bulk traffic reaches Z from A
         via the cheapest path for the operator.

   This SR use-case has the following benefits:

      Zero per-service state and signaling at midpoint and tail-end
      nodes.

      One additional anycast segment per region.

      ECMP-awareness.

      Node resiliency property: the traffic-engineering policy is not
      anchored to a specific core node whose failure could impact the
      service.

4.1.2.  Distributed CSPF-based Traffic Engineering

   In this section, we illustrate how a head-end router can map the
   result of its distributed CSPF computation into an SR segment list.

                  +---E---+
                  |       |
            A-----B-------C-----Z
                  |       |
                  +---D---+

                  Figure 5: SRLG-based CSPF

   Let us assume that in the above network diagram:

      The operator configures a policy on A such that its Z-destined
      traffic must avoid SRLG1.

      The operator configures SRLG1 on the link BC (or it is learned
      dynamically from the IP/Optical interaction with the DWDM
      network).

      The SRLGs are flooded in the link-state IGP.
742 The operator respectively configures the Node-SIDs 101, 102, 103, 743 104, 105 and 109 at nodes A, B, C, D, E and Z. 745 In that context, A can apply the following CSPF behavior: 747 - It prunes all the links affected by SRLG1, computes an SPF 748 on the remaining topology and picks one of the SPF paths. 749 - In our example, A finds two possible paths ABECZ and ABDCZ 750 and let us assume it takes the ABDCZ path. 752 - It translates the path into a list of segments 753 - In our example, ABDCZ can be expressed as {104, 109}: a 754 shortest path to node D, followed by a shortest-path to 755 node Z. 757 - It monitors the status of the LSDB and upon any change 758 impacting the policy, it either recomputes a path meeting the 759 policy or updates its translation into a list of segments. 760 - For example, upon the loss of the link DC, the shortest-path 761 to Z from D (Node-SID 109) goes via the undesired link BC. 762 After a transient time immediately following such failure, 763 node A would determine that the chosen path is no longer 764 valid and would instead select ABECZ, which is translated as 765 {105, 109}: a shortest path to node E, followed by a 766 shortest-path to node Z. 767 - This behavior is a local matter at node A and hence the details 768 are outside the scope of this document. 770 The same use-case can be derived from any other CSPF objective or 771 constraint (TE affinity, TE latency, SRLG, etc.) as defined in 772 [RFC5305] and [I-D.ietf-isis-te-metric-extensions]. Note that the 773 bandwidth case is specific and hence is treated in Section 4.2. 775 4.1.3. Egress Peering Traffic Engineering 776 +------+ 777 | | 778 +---D F 779 +---------+ / | AS 2 |\ +------+ 780 | |/ +------+ \| Z | 781 A C | | 782 | |\ +------+ /| AS 4 | 783 B AS1 | \ | |/ +------+ 784 | | +---E G 785 +---------+ | AS 3 | 786 +------+\ 788 Figure 6: Egress peering traffic engineering 790 Let us assume that: 792 C in AS1 learns about destination Z of AS4 via two BGP paths 793 (AS2, AS4) and (AS3, AS4).
795 C sets next-hop-self before propagating the paths within AS1. 797 C propagates all the paths to Z within AS1 (add-path). 799 C only installs the path via AS2 in its RIB. 801 In that context, the operator of AS1 cannot apply the following 802 traffic-engineering policy: 804 Steer 60% of the Z-destined traffic received at A via AS2 and 40% 805 via AS3. 807 Steer 80% of the Z-destined traffic received at B via AS2 and 20% 808 via AS3. 810 This traffic-engineering policy can be supported thanks to the 811 following SR configuration. 813 The operator configures: 815 C with a loopback 192.0.2.1/32 and attaches Node-SID 101 to it. 817 C to bind an external adjacency segment 818 ([I-D.filsfils-rtgwg-segment-routing]) to each of its peering 819 interfaces. 821 For the sake of this illustration, let us assume that the external 822 adjacency segments bound by C for its peering interfaces to (D, AS2) 823 and (E, AS3) are respectively 9001 and 9002. 825 These external adjacencies (and their attached segments) are flooded 826 within the IGP domain of AS1 [RFC5316]. 828 As a result, the following information is available within AS1: 829 ISIS Link State Database: 831 - Node-SID 101 is attached to IP address 192.0.2.1/32 advertised 832 by C. 833 - C is connected to a peer D with external adjacency segment 9001. 834 - C is connected to a peer E with external adjacency segment 9002. 835 BGP Database: 837 - Z is reachable via 192.0.2.1 with AS Path {AS2, AS4}. 838 - Z is reachable via 192.0.2.1 with AS Path {AS3, AS4}. 840 The operator of AS1 can thus meet its traffic-engineering objective 841 by enforcing the following policies: 843 A should apply the segment list {101, 9001} to 60% of the 844 Z-destined traffic and the segment list {101, 9002} to the rest. 846 B should apply the segment list {101, 9001} to 80% of the 847 Z-destined traffic and the segment list {101, 9002} to the rest. 849 Node segment 101 steers the traffic to C.
851 External adjacency segment 9001 forces the traffic from C to (D, 852 AS2), without any IP lookup at C. 854 External adjacency segment 9002 forces the traffic from C to (E, 855 AS3), without any IP lookup at C. 857 A and B can also use the described segments to assess the liveness of 858 the remote peering links; see the OAM section. 860 4.1.4. Deterministic non-ECMP Path 862 The previous sections have illustrated the ability to steer traffic 863 along ECMP-aware shortest-paths. SR is also able to express a 864 deterministic non-ECMP path, i.e. as a list of adjacency segments. 865 We illustrate such a use-case in this section. 866 A-B-C-D-E-F-G-H-Z 867 | | 868 +-I-J-K-L-M-+ 870 Figure 7: Non-ECMP deterministic path 872 In the above figure, it is assumed all nodes are SR capable and only 873 the following SIDs are advertised: 874 - A advertises Adj-SID 9001 for its adjacency to B 875 - B advertises Adj-SID 9002 for its adjacency to C 876 - C advertises Adj-SID 9003 for its adjacency to D 877 - D advertises Adj-SID 9004 for its adjacency to E 878 - E advertises Adj-SID 9001 for its adjacency to F 879 - F advertises Adj-SID 9002 for its adjacency to G 880 - G advertises Adj-SID 9003 for its adjacency to H 881 - H advertises Adj-SID 9004 for its adjacency to Z 882 - E advertises Node-SID 101 883 - Z advertises Node-SID 109 885 The operator can steer the traffic from A to Z via a specific non- 886 ECMP path ABCDEFGHZ by imposing the segment list {9001, 9002, 9003, 887 9004, 9001, 9002, 9003, 9004}. 889 The following sub-sections illustrate how the segment list can be 890 compressed. 892 4.1.4.1. Node Segment 894 Clearly, the exact same path can be expressed with a two-entry segment 895 list {101, 109}. 897 This example illustrates that a Node Segment can also be used to 898 express a deterministic non-ECMP path when the underlying shortest path is unique. 900 4.1.4.2. Forwarding Adjacency 902 The operator can configure Node B to create a forwarding-adjacency to 903 node H along an explicit path BCDEFGH.
The following behaviors can 904 then be automated by B: 906 B attaches an Adj-SID (e.g. 9007) to that forwarding adjacency 907 together with an ERO sub-sub-TLV which describes the explicit path 908 BCDEFGH. 910 B installs in its Segment Routing Database the following entry: 912 Active segment: 9007. 914 Operation: NEXT and PUSH {9002, 9003, 9004, 9001, 9002, 9003} 916 As a result, the operator can configure node A with the following 917 compressed segment list: {9001, 9007, 9004}. 919 4.1.5. Load-balancing among non-parallel links 921 A given node may assign the same Adj-SID to multiple of its 922 adjacencies, even if they lead to different neighbors. This 923 may be useful to support traffic-engineering policies. 925 +---C---D---+ 926 | | 927 PE1---A---B-----F-----E---PE2 929 Figure 8: Adj-SID For Multiple (non-parallel) Adjacencies 931 In the above example, let us assume that the operator: 933 Requires PE1 to load-balance its PE2-destined traffic between the 934 ABCDE and ABFE paths. 936 Configures B with Node-SID 102 and E with Node-SID 202. 938 Configures B to advertise an individual Adj-SID per adjacency 939 (e.g. 9001 for BC and 9002 for BF) and, in addition, an Adj-SID 940 for the adjacency set (BC, BF) (e.g. 9003). 942 With this context in mind, the operator achieves its objective by 943 configuring the following traffic-engineering policy at PE1 for the 944 PE2-destined traffic: {102, 9003, 202}: 946 Node-SID 102 steers the traffic to B. 948 Adj-SID 9003 load-balances the traffic to C or F. 950 From either C or F, Node-SID 202 steers the traffic to E, which then forwards it to PE2. 952 In conclusion, the traffic is load-balanced between the ABCDE and 953 ABFE paths, as desired. 955 4.2. Traffic Engineering with Bandwidth Admission Control 957 The implementation of bandwidth admission control within a network 958 (and its possible routing consequence, which consists of routing along 959 explicit paths where the bandwidth is available) requires a capacity 960 planning process.
962 The spreading of load among ECMP paths is a key attribute of the 963 capacity planning processes applied to packet-based networks. 965 The first sub-section details the capacity planning process and the 966 role of ECMP load-balancing. We highlight the relevance of SR in 967 that context. 969 The next two sub-sections document two use-cases of SR-based traffic 970 engineering with bandwidth admission control. 972 The second sub-section documents a concrete SR applicability 973 involving centralized admission control. This is often 974 referred to as the "SDN/SR use-case". 976 The third sub-section introduces a future research topic involving 977 the notion of residual bandwidth introduced in 978 [I-D.ietf-mpls-te-express-path]. 980 4.2.1. Capacity Planning Process 982 Capacity Planning anticipates the routing of the traffic matrix onto 983 the network topology, for a set of expected traffic and topology 984 variations. The heart of the process consists of simulating the 985 placement of the traffic along ECMP-aware shortest-paths and 986 accounting for the resulting bandwidth usage. 988 The bandwidth accounting of a demand along its shortest-path is a 989 basic capability of any planning tool or PCE server. 991 For example, in the network topology described below, and assuming a 992 default IGP metric of 1 and an IGP metric of 2 for link GF, a 1600Mbps 993 A-to-Z flow is accounted as consuming 1600Mbps on links AB and FZ, 994 800Mbps on links BC, BG and GF, and 400Mbps on links CD, DF, CE and 995 EF. 996 C-----D 997 / \ \ 998 A---B +--E--F--Z 999 \ / 1000 G------+ 1002 Figure 9: Capacity Planning an ECMP-based demand 1004 ECMP is extremely frequent in SP, Enterprise and DC architectures and 1005 it is not rare to see as many as 128 different ECMP paths between a 1006 source and a destination within a single network domain. It is a key 1007 efficiency objective to spread the traffic among as many ECMP paths 1008 as possible.
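The accounting rule above can be reproduced in a few lines: compute the distance of every node to the destination, then recursively split the demand equally among the ECMP next-hops. The sketch below is purely illustrative (the topology encoding and the function names are assumptions, not part of any SR specification):

```python
import heapq
from collections import defaultdict

# Figure 9 topology; IGP metric 1 everywhere except link G-F (metric 2).
LINKS = {("A","B"):1, ("B","C"):1, ("B","G"):1, ("C","D"):1, ("C","E"):1,
         ("D","F"):1, ("E","F"):1, ("G","F"):2, ("F","Z"):1}
ADJ = defaultdict(dict)
for (u, v), m in LINKS.items():
    ADJ[u][v] = ADJ[v][u] = m

def dist_to(dest):
    """Dijkstra: IGP distance from every node to `dest`."""
    dist = {dest: 0}
    pq = [(0, dest)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue
        for v, m in ADJ[u].items():
            if d + m < dist.get(v, float("inf")):
                dist[v] = d + m
                heapq.heappush(pq, (d + m, v))
    return dist

def account(src, dest, mbps):
    """Spread `mbps` over all ECMP shortest paths; per-link accounting."""
    dist = dist_to(dest)
    load = defaultdict(float)

    def spread(node, amount):
        if node == dest:
            return
        # ECMP next-hops: neighbors lying on a shortest path to dest.
        nhops = [v for v, m in ADJ[node].items() if m + dist[v] == dist[node]]
        for v in nhops:                  # equal split among ECMP next-hops
            load[tuple(sorted((node, v)))] += amount / len(nhops)
            spread(v, amount / len(nhops))

    spread(src, mbps)
    return dict(load)

load = account("A", "Z", 1600)
```

Run on Figure 9, this yields exactly the figures quoted above: 1600Mbps on AB and FZ, 800Mbps on BC, BG and GF, and 400Mbps on CD, CE, DF and EF.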
1010 This is illustrated in the network diagram below, which consists of a 1011 subset of a network where 5 ECMP paths are already observed from A to 1012 M. 1013 C 1014 / \ 1015 B-D-L-- 1016 / \ / \ 1017 A E \ 1018 \ M 1019 \ G / 1020 \ / \ / 1021 F K 1022 \ / 1023 I 1025 Figure 10: ECMP Topology Example 1027 Segment Routing offers simple support for such ECMP-based shortest- 1028 path placement: a node segment. A single node segment enumerates all 1029 the ECMP paths along the shortest-path. 1031 When the capacity planning process detects that a traffic growth 1032 scenario and topology variation would lead to congestion, a capacity 1033 increase is triggered and, if it cannot be deployed in due time, a 1034 traffic engineering solution is activated within the network. 1036 A basic traffic engineering objective consists of finding the 1037 smallest set of demands that need to be routed off their shortest 1038 path to eliminate the congestion, then computing an explicit path 1039 for each of them and instantiating these traffic-engineered policies 1040 in the network. 1042 Segment Routing offers simple support for explicit path policies. 1043 Let us provide two examples based on Figure 10. 1045 First example: let us assume that the process has selected the flow 1046 AM for traffic-engineering away from its ECMP-enabled shortest path 1047 and flow AM must avoid consuming resources on the LM and the FG 1048 links. 1050 The solution is straightforward: A sends its M-destined traffic 1051 towards the next-hop F with a two-label stack where the top label is the 1052 adjacency segment FI and the next label is the node segment to M. 1053 Alternatively, a three-label stack with adjacency segments FI, IK and 1054 KM could have been used. 1056 Second example: let us assume that AM is still the selected flow but 1057 the constraint is relaxed to only avoid using resources from the LM 1058 link.
1060 The solution is straightforward: A sends its M-destined traffic 1061 towards the next-hop F with a one-label stack where the label is the node 1062 segment to M. Note that while the AM flow has been traffic-engineered 1063 away from its natural shortest-path (ECMP across three paths), the 1064 traffic-engineered path is still ECMP-aware and leverages two of the 1065 three initial paths. This is accomplished with a single-label stack 1066 and without the enumeration of one tunnel per path. 1068 In the light of these examples, Segment Routing offers an 1069 interesting solution for Capacity Planning because: 1071 One node segment represents the set of ECMP-aware shortest paths. 1073 Adjacency segments allow any explicit path to be expressed. 1075 The combination of node and adjacency segments allows any path to 1076 be expressed without having to enumerate all the ECMP options. 1078 The capacity planning process ensures that the majority of the 1079 traffic rides on node segments (ECMP-based shortest path), while a 1080 minority of the traffic is routed off its shortest-path. 1082 The explicitly-engineered traffic (which is a minority) still 1083 benefits from the ECMP-awareness of the node segments within their 1084 segment list. 1086 Only the head-end of a traffic-engineering policy maintains state. 1087 The midpoints and tail-ends do not maintain any state. 1089 4.2.2. SDN/SR use-case 1091 The heart of the application of SR to the SDN use-case lies in the 1092 SDN controller, also called Stateful PCE 1093 ([I-D.ietf-pce-stateful-pce]). 1095 The SDN controller is responsible for controlling the evolution of the 1096 traffic matrix and topology. It accepts or denies the addition of 1097 new traffic into the network. It decides how to route the accepted 1098 traffic. It monitors the topology and, upon failure, determines the 1099 minimum traffic that should be rerouted on an alternate path to 1100 alleviate a bandwidth congestion issue.
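One recurring, well-defined step in such a workflow is translating a computed explicit path into a segment list, as done in Section 4.1.2 (ABDCZ into {104, 109}). The greedy encoding sketched below is an illustrative assumption, not a normative algorithm: it anchors a node segment only where the shortest path between the two anchor nodes is unique and equal to the desired sub-path (with ECMP, a node segment would spread traffic over all equal-cost paths), and falls back to an adjacency segment otherwise. It reuses the Figure 5 topology and Node-SIDs:

```python
import heapq
from collections import defaultdict

# Figure 5 topology, all IGP metrics 1, Node-SIDs as in Section 4.1.2.
ADJ = defaultdict(dict)
for u, v in [("A","B"), ("B","C"), ("B","E"), ("E","C"),
             ("B","D"), ("D","C"), ("C","Z")]:
    ADJ[u][v] = ADJ[v][u] = 1

NODE_SID = {"A": 101, "B": 102, "C": 103, "D": 104, "E": 105, "Z": 109}
INF = 1 << 30

def spf_dist(src):
    """Plain Dijkstra: IGP distance from src to every node."""
    dist = {src: 0}
    pq = [(0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, INF):
            continue
        for v, m in ADJ[u].items():
            if d + m < dist.get(v, INF):
                dist[v] = d + m
                heapq.heappush(pq, (d + m, v))
    return dist

def unique_spf(s, t):
    """Return the s->t shortest path if it is unique, else None."""
    dist = spf_dist(s)
    path, u = [t], t
    while u != s:
        preds = [v for v, m in ADJ[u].items()
                 if dist.get(v, INF) + m == dist[u]]
        if len(preds) != 1:       # ECMP: a node segment would spray here
            return None
        u = preds[0]
        path.append(u)
    return path[::-1]

def encode(path):
    """Greedily cover the path with node segments; fall back to a
    (hypothetical) adjacency segment for a hop no node segment covers."""
    sids, i = [], 0
    while i < len(path) - 1:
        k = len(path) - 1
        while k > i and unique_spf(path[i], path[k]) != path[i:k + 1]:
            k -= 1
        if k == i:                # no node segment fits this hop
            sids.append(("Adj", path[i], path[i + 1]))
            k = i + 1
        else:
            sids.append(NODE_SID[path[k]])
        i = k
    return sids

print(encode(["A", "B", "D", "C", "Z"]))   # prints [104, 109]
```

On Figure 5, ABDCZ encodes as {104, 109}, matching Section 4.1.2, and the alternate path ABECZ encodes as {105, 109} (Node-SID 105 identifying E).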
1102 The algorithms supporting this behavior are a local matter of the SDN 1103 controller and are outside the scope of this document. 1105 The means of collecting traffic and topology information are the same 1106 as what would be used with other SDN-based traffic-engineering 1107 solutions (e.g. [RFC7011] and [I-D.ietf-idr-ls-distribution]). 1109 The means of instantiating policy information at a traffic- 1110 engineering head-end are the same as what would be used with other 1111 SDN-based traffic-engineering solutions (e.g.: 1112 [I-D.ietf-i2rs-architecture], [I-D.ietf-pce-pce-initiated-lsp] and 1113 [I-D.sivabalan-pce-segment-routing]). 1115 4.2.2.1. Illustration 1116 _______________ 1117 { } 1118 +--C--+ V { SDN Controller } 1119 |/ \| / {_______________} 1120 A===B--G--D==F--Y 1121 |\ /| \ 1122 +--E--+ Z 1124 SDN/SR use-case 1126 Let us assume that in the above network diagram: 1128 An SDN Controller (SC) is connected to the network and is able to 1129 retrieve the topology and traffic information, as well as set 1130 traffic-engineering policies on the network nodes. 1132 The operator (likely via the SDN Controller) has provisioned the 1133 Node-SIDs 101, 102, 103, 104, 105, 106, 107, 201, 202 and 203 1134 respectively at nodes A, B, C, D, E, F, G, V, Y and Z. 1136 All the links have the same BW (e.g. 10G) and IGP cost (e.g. 10) 1137 except the links BG and GD, which have IGP cost 50. 1139 Each described node adjacency is formed of a bundle of two 1140 links, except (B, G) and (G, D), which are formed of a single link 1141 each. 1143 Flow FV is traveling from A to destinations behind V. 1145 Flow FY is traveling from A to destinations behind Y. 1147 Flow FZ is traveling from A to destinations behind Z. 1149 The SDN Controller has admitted all these flows and has let A 1150 apply the default SR policy: "map a flow onto its ECMP-aware 1151 shortest-path".
1153 In this example, this means that A respectively maps the flows 1154 FV onto segment list {201}, FY onto segment list {202} and FZ 1155 onto segment list {203}. 1157 The reader should note that the SDN Controller 1158 knows what A would do and hence knows and controls that none of 1159 these flows is mapped through G. 1161 Let us describe what happens upon the failure of one of the two links 1162 E-D. 1164 The SDN Controller monitors the link-state database and detects a 1165 congestion risk due to the reduced capacity between E and D. 1166 Specifically, SC updates its simulation of the traffic according to 1167 the policies it instructed the network to use and discovers that too 1168 much traffic is mapped on the remaining link E-D. 1170 The SDN Controller then computes the minimum number of flows that 1171 should be deviated from their existing path. For example, let us 1172 assume that the flow FZ is selected. 1174 The SDN controller then computes an explicit path for this flow. For 1175 example, let us assume that the chosen path is ABGDFZ. 1177 The SDN controller then maps the chosen path into an SR-based policy. 1178 In our example, the path ABGDFZ is translated into a segment list 1179 {107, 203}. Node-SID 107 steers the traffic along ABG and then Node-SID 1180 203 steers the traffic along GDFZ. 1182 The SDN controller then applies the following traffic-engineering 1183 policy at A: "map any packet of the classified flow FZ onto segment- 1184 list {107, 203}". The SDN Controller uses PCEP extensions to 1185 instantiate that policy at A ([I-D.sivabalan-pce-segment-routing]). 1187 As soon as A receives the PCEP message, it enforces the policy and 1188 the traffic classified as FZ is immediately mapped onto segment list 1189 {107, 203}. 1191 This immediately eliminates the congestion risk. Flows FV and FY were 1192 untouched and keep using the ECMP-aware shortest-path. The minimum 1193 amount of traffic was rerouted (FZ).
No hop-by-hop signaling through 1194 the network from A to Z is required. No hop-by-hop admission control 1195 is required. No state needs to be maintained by B, G, D, F or Z. The 1196 only maintained state is within the SDN controller and the head-end 1197 node (A). 1199 4.2.2.2. Benefits 1201 In the context of Centralized Optimization and the SDN use- 1202 case, here are the benefits provided by the SR architecture: 1204 Explicit routing capability with or without ECMP-awareness. 1206 No hop-by-hop signaling through the network. 1208 State is only maintained at the policy head-end. No state is 1209 maintained at mid-points and tail-ends. 1211 Automated guaranteed FRR for any topology (Section 3). 1213 Optimum virtualization: the policy state is in the packet header 1214 and not in the intermediate nodes along the policy. The policy is 1215 completely virtualized away from midpoints and tail-ends. 1217 Highly responsive to change: the SDN Controller only needs to 1218 apply a policy change at the head-end. No time is lost 1219 programming the midpoints and tail-ends along the policy. 1221 4.2.2.3. Dataset analysis 1223 A future version of this document will report some analysis of the 1224 application of the SDN/SR use-case to real operator data sets. 1226 A first, incomplete, report is available below. 1228 4.2.2.3.1. Example 1 1230 The first data-set consists of a full mesh of 12000 explicitly-routed 1231 tunnels observed on a real network. These tunnels resulted from 1232 distributed headend-based CSPF computation. 1234 We measured that only 65% of the traffic is riding on its shortest 1235 path. 1237 Three well-known defects are illustrated in this data set: 1239 The lack of ECMP support in explicitly-routed tunnels: ATM-like 1240 traffic-steering mechanisms steer the traffic along a non-ECMP 1241 path. 1243 The increase of the number of explicitly-routed non-ECMP tunnels 1244 to enumerate all the ECMP options.
1246 The inefficiency of distributed optimization: too much traffic is 1247 riding off its shortest path. 1249 We applied the SDN/SR use-case to this dataset. This means that: 1251 The distributed CSPF computation is replaced by centralized 1252 optimization and BW admission control, supported by the SDN 1253 Controller. 1255 As part of the optimization, we also tuned the IGP metrics 1256 so as to maximize the traffic spread among ECMP 1257 paths by default. 1259 The traffic-engineering policies are supported by SR segment- 1260 lists. 1262 As a result, we measured that 98% of the traffic would be kept on its 1263 normal policy (riding its shortest-path) and only 2% of the traffic 1264 requires a path away from the shortest-path. 1266 Let us highlight a few benefits: 1268 98% of the traffic-engineering head-end policies are eliminated. 1270 Indeed, by default, an SR-capable ingress edge node maps the 1271 traffic onto a single Node-SID leading to the egress edge node. No 1272 configuration or policy needs to be maintained at the ingress 1273 edge node to realize this. 1275 100% of the states at mid/tail nodes are eliminated. 1277 4.2.3. Residual Bandwidth 1279 The notion of Residual Bandwidth (RBW) is introduced by 1280 [I-D.ietf-mpls-te-express-path]. 1282 A future version of this document will describe the SR/RBW research 1283 opportunity. 1285 5. Service chaining 1287 Segment Routing can be used to steer packets through services offered 1288 by middleboxes to perform specific actions such as DPI, accounting, 1289 etc. 1291 I---A---B---C---E 1292 \ | / \ / 1293 \ | / F 1294 \|/ 1295 D 1297 Figure 11 1299 For example, as illustrated in Figure 11, an ingress node I selects 1300 an egress node E for a packet P. An application, however, requires that 1301 P undergoes a specific treatment (DPI, firewalling, ...) offered by a 1302 node D, reachable in the SR domain.
In the SR architecture, this 1303 application can be supported through the use of a service segment 1304 with a scope local to D, say SS, following the nodal segment which 1305 corresponds to D. The ingress node keeps control of the egress 1306 node through which the packet needs to exit the network by placing a 1307 nodal segment identifying the egress node after the service segment. 1309 This would be achieved by letting I forward the packet P with the 1310 following sequence of segments: {D, SS, E}. D is a nodal segment, SS 1311 is the service segment corresponding to the service to apply to the 1312 packet P, and E is the nodal segment corresponding to the egress node 1313 selected by I for that packet. 1315 6. OAM 1317 6.1. Monitoring a remote bundle 1319 This section documents a few representative SR/OAM use-cases. 1320 +--+ _ +--+ +-------+ 1321 | | { } | |---991---L1---662---| | 1322 |MS|--{ }-|R1|---992---L2---663---|R2 (72)| 1323 | | {_} | |---993---L3---664---| | 1324 +--+ +--+ +-------+ 1326 Figure 12: Probing all the links of a remote bundle 1328 In the above figure, a monitoring system (MS) needs to assess the 1329 dataplane availability of all the links within a remote bundle 1330 connected to routers R1 and R2. 1332 The monitoring system retrieves the segment information from the IGP 1333 LSDB and applies the following segment list: {72, 662, 992, 664} to 1334 its IP probe (whose source and destination addresses are the address 1335 of MS). 1337 MS sends the probe to its connected router. If the connected router 1338 is not SR compliant, a tunneling technique can be used to tunnel the 1339 SR-based probe to the first SR router. The SR domain forwards the 1340 probe to R2 (72 is the node segment of R2). R2 forwards the probe to 1341 R1 over link L1 (adjacency segment 662). R1 forwards the probe to R2 1342 over link L2 (adjacency segment 992). R2 forwards the probe to R1 1343 over link L3 (adjacency segment 664).
R1 then forwards the IP probe 1344 to MS as per classic IP forwarding. 1346 6.2. Monitoring a remote peering link 1348 In Figure 6, node A can monitor the dataplane liveness of the 1349 unidirectional peering link from C to D of AS2 by sending an IP probe 1350 with destination address A and segment list {101, 9001}. Node-SID 1351 101 steers the probe to C and External Adj-SID 9001 steers the probe 1352 from C over the desired peering link to D of AS2. The SR header is 1353 removed by C and D receives a plain IP packet with destination 1354 address A. D returns the probe to A through classic IP forwarding. 1355 BFD Echo mode ([RFC5880]) would support such a unidirectional 1356 link liveness probing application. 1358 7. IANA Considerations 1360 TBD 1362 8. Manageability Considerations 1364 TBD 1366 9. Security Considerations 1368 TBD 1370 10. Acknowledgements 1372 We would like to thank Dave Ward, Dan Frost, Stewart Bryant, Thomas 1373 Telkamp, Ruediger Geib and Les Ginsberg for their contribution to the 1374 content of this document. 1376 11. References 1378 11.1. Normative References 1380 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1381 Requirement Levels", BCP 14, RFC 2119, March 1997. 1383 [RFC4786] Abley, J. and K. Lindqvist, "Operation of Anycast 1384 Services", BCP 126, RFC 4786, December 2006. 1386 [RFC5305] Li, T. and H. Smit, "IS-IS Extensions for Traffic 1387 Engineering", RFC 5305, October 2008. 1389 [RFC5316] Chen, M., Zhang, R., and X. Duan, "ISIS Extensions in 1390 Support of Inter-Autonomous System (AS) MPLS and GMPLS 1391 Traffic Engineering", RFC 5316, December 2008. 1393 [RFC5880] Katz, D. and D. Ward, "Bidirectional Forwarding Detection 1394 (BFD)", RFC 5880, June 2010. 1396 [RFC7011] Claise, B., Trammell, B., and P. Aitken, "Specification of 1397 the IP Flow Information Export (IPFIX) Protocol for the 1398 Exchange of Flow Information", STD 77, RFC 7011, 1399 September 2013. 1401 11.2.
Informative References 1403 [I-D.filsfils-rtgwg-segment-routing] 1404 Filsfils, C., Previdi, S., Bashandy, A., Decraene, B., 1405 Litkowski, S., Horneffer, M., Milojevic, I., Shakir, R., 1406 Ytti, S., Henderickx, W., Tantsura, J., and E. Crabbe, 1407 "Segment Routing Architecture", 1408 draft-filsfils-rtgwg-segment-routing-01 (work in 1409 progress), October 2013. 1411 [I-D.filsfils-spring-segment-routing-ldp-interop] 1412 Filsfils, C., Previdi, S., Bashandy, A., Decraene, B., 1413 Litkowski, S., Horneffer, M., Milojevic, I., Shakir, R., 1414 Ytti, S., Henderickx, W., Tantsura, J., and E. Crabbe, 1415 "Segment Routing interoperability with LDP", 1416 draft-filsfils-spring-segment-routing-ldp-interop-00 (work 1417 in progress), October 2013. 1419 [I-D.filsfils-spring-segment-routing-mpls] 1420 Filsfils, C., Previdi, S., Bashandy, A., Decraene, B., 1421 Litkowski, S., Horneffer, M., Milojevic, I., Shakir, R., 1422 Ytti, S., Henderickx, W., Tantsura, J., and E. Crabbe, 1423 "Segment Routing with MPLS data plane", 1424 draft-filsfils-spring-segment-routing-mpls-00 (work in 1425 progress), October 2013. 1427 [I-D.francois-sr-frr] 1428 Francois, P., Filsfils, C., Bashandy, A., Previdi, S., and 1429 B. Decraene, "Segment Routing Fast Reroute", 1430 draft-francois-sr-frr-00 (work in progress), July 2013. 1432 [I-D.ietf-i2rs-architecture] 1433 Atlas, A., Halpern, J., Hares, S., Ward, D., and T. 1434 Nadeau, "An Architecture for the Interface to the Routing 1435 System", draft-ietf-i2rs-architecture-02 (work in 1436 progress), February 2014. 1438 [I-D.ietf-idr-ls-distribution] 1439 Gredler, H., Medved, J., Previdi, S., Farrel, A., and S. 1440 Ray, "North-Bound Distribution of Link-State and TE 1441 Information using BGP", draft-ietf-idr-ls-distribution-04 1442 (work in progress), November 2013. 1444 [I-D.ietf-isis-te-metric-extensions] 1445 Previdi, S., Giacalone, S., Ward, D., Drake, J., Atlas, 1446 A., Filsfils, C., and W. 
Wu, "IS-IS Traffic Engineering 1447 (TE) Metric Extensions", 1448 draft-ietf-isis-te-metric-extensions-01 (work in 1449 progress), October 2013. 1451 [I-D.ietf-mpls-te-express-path] 1452 Atlas, A., Drake, J., Giacalone, S., Ward, D., Previdi, 1453 S., and C. Filsfils, "Performance-based Path Selection for 1454 Explicitly Routed LSPs using TE Metric Extensions", 1455 draft-ietf-mpls-te-express-path-00 (work in progress), 1456 October 2013. 1458 [I-D.ietf-pce-pce-initiated-lsp] 1459 Crabbe, E., Minei, I., Sivabalan, S., and R. Varga, "PCEP 1460 Extensions for PCE-initiated LSP Setup in a Stateful PCE 1461 Model", draft-ietf-pce-pce-initiated-lsp-00 (work in 1462 progress), December 2013. 1464 [I-D.ietf-pce-stateful-pce] 1465 Crabbe, E., Medved, J., Minei, I., and R. Varga, "PCEP 1466 Extensions for Stateful PCE", 1467 draft-ietf-pce-stateful-pce-08 (work in progress), 1468 February 2014. 1470 [I-D.previdi-isis-segment-routing-extensions] 1471 Previdi, S., Filsfils, C., Bashandy, A., Gredler, H., 1472 Litkowski, S., and J. Tantsura, "IS-IS Extensions for 1473 Segment Routing", 1474 draft-previdi-isis-segment-routing-extensions-05 (work in 1475 progress), February 2014. 1477 [I-D.psenak-ospf-segment-routing-extensions] 1478 Psenak, P., Previdi, S., Filsfils, C., Gredler, H., 1479 Shakir, R., and W. Henderickx, "OSPF Extensions for 1480 Segment Routing", 1481 draft-psenak-ospf-segment-routing-extensions-04 (work in 1482 progress), February 2014. 1484 [I-D.rtgwg-bgp-pic] 1485 Bashandy, A., Filsfils, C., and P. Mohapatra, "BGP Prefix Independent Convergence", 1486 draft-rtgwg-bgp-pic-02 (work in progress), October 2013. 1488 [I-D.shakir-rtgwg-sr-performance-engineered-lsps] 1489 Shakir, R., Vernals, D., and A. Capello, "Performance 1490 Engineered LSPs using the Segment Routing Data-Plane", 1491 draft-shakir-rtgwg-sr-performance-engineered-lsps-00 (work 1492 in progress), July 2013. 1494 [I-D.sivabalan-pce-segment-routing] 1495 Sivabalan, S., Medved, J., Filsfils, C., Crabbe, E., and 1496 R.
Raszuk, "PCEP Extensions for Segment Routing", 1497 draft-sivabalan-pce-segment-routing-02 (work in progress), 1498 October 2013. 1500 [RFC5443] Jork, M., Atlas, A., and L. Fang, "LDP IGP 1501 Synchronization", RFC 5443, March 2009. 1503 [RFC6138] Kini, S. and W. Lu, "LDP IGP Synchronization for Broadcast 1504 Networks", RFC 6138, February 2011. 1506 Authors' Addresses 1508 Clarence Filsfils (editor) 1509 Cisco Systems, Inc. 1510 Brussels, 1511 BE 1513 Email: cfilsfil@cisco.com 1515 Pierre Francois (editor) 1516 IMDEA Networks 1517 Leganes, 1518 ES 1520 Email: pierre.francois@imdea.org 1522 Stefano Previdi 1523 Cisco Systems, Inc. 1524 Via Del Serafico, 200 1525 Rome 00142 1526 Italy 1528 Email: sprevidi@cisco.com 1530 Bruno Decraene 1531 Orange 1532 FR 1534 Email: bruno.decraene@orange.com 1536 Stephane Litkowski 1537 Orange 1538 FR 1540 Email: stephane.litkowski@orange.com 1541 Martin Horneffer 1542 Deutsche Telekom 1543 Hammer Str. 216-226 1544 Muenster 48153 1545 DE 1547 Email: Martin.Horneffer@telekom.de 1549 Igor Milojevic 1550 Telekom Srbija 1551 Takovska 2 1552 Belgrade 1553 RS 1555 Email: igormilojevic@telekom.rs 1557 Rob Shakir 1558 British Telecom 1559 London 1560 UK 1562 Email: rob.shakir@bt.com 1564 Saku Ytti 1565 TDC Oy 1566 Mechelininkatu 1a 1567 TDC 00094 1568 FI 1570 Email: saku@ytti.fi 1572 Wim Henderickx 1573 Alcatel-Lucent 1574 Copernicuslaan 50 1575 Antwerp 2018 1576 BE 1578 Email: wim.henderickx@alcatel-lucent.com 1579 Jeff Tantsura 1580 Ericsson 1581 300 Holger Way 1582 San Jose, CA 95134 1583 US 1585 Email: Jeff.Tantsura@ericsson.com 1587 Sriganesh Kini 1588 Ericsson 1589 300 Holger Way 1590 San Jose, CA 95134 1591 US 1593 Email: sriganesh.kini@ericsson.com 1595 Edward Crabbe 1596 Google, Inc. 1597 1600 Amphitheatre Parkway 1598 Mountain View, CA 94043 1599 US 1601 Email: edc@google.com