idnits 2.17.1 draft-bookham-rtgwg-nfix-arch-04.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- == There is 1 instance of lines with non-ascii characters in the document. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == The document doesn't use any RFC 2119 keywords, yet has text resembling RFC 2119 boilerplate text. -- The document date (5 January 2022) is 840 days in the past. Is this intentional? Checking references for intended status: Informational ---------------------------------------------------------------------------- == Missing Reference: 'ELI' is mentioned on line 755, but not defined == Missing Reference: 'EL' is mentioned on line 755, but not defined == Missing Reference: 'RFC7130' is mentioned on line 1059, but not defined == Outdated reference: A later version (-10) exists of draft-ietf-bess-evpn-ipvpn-interworking-06 == Outdated reference: A later version (-22) exists of draft-ietf-spring-segment-routing-policy-14 == Outdated reference: A later version (-13) exists of draft-ietf-rtgwg-segment-routing-ti-lfa-07 == Outdated reference: A later version (-19) exists of draft-ietf-idr-te-lsp-distribution-16 == Outdated reference: A later version (-09) exists of draft-filsfils-spring-sr-policy-considerations-08 == Outdated reference: A later version (-20) exists of draft-ietf-rtgwg-bgp-pic-17 == Outdated reference: A later version (-08) exists of draft-ietf-idr-next-hop-capability-07 == Outdated reference: A later version (-06) exists of draft-ietf-idr-long-lived-gr-00 -- Obsolete informational reference (is this intentional?): RFC 7752 (Obsoleted by RFC 9552) Summary: 0 errors (**), 0 flaws (~~), 14 warnings (==), 2 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 RTG Working Group C. Bookham, Ed. 3 Internet-Draft A. Stone 4 Intended status: Informational Nokia 5 Expires: 9 July 2022 J. Tantsura 6 Microsoft 7 M. Durrani 8 Equinix Inc 9 B. Decraene 10 Orange 11 5 January 2022 13 An Architecture for Network Function Interconnect 14 draft-bookham-rtgwg-nfix-arch-04 16 Abstract 18 The emergence of technologies such as 5G, the Internet of Things 19 (IoT), and Industry 4.0, coupled with the move towards network 20 function virtualization, means that the service requirements demanded 21 from networks are changing. This document describes an architecture 22 for a Network Function Interconnect (NFIX) that allows for 23 interworking of physical and virtual network functions in a unified 24 and scalable manner across wide-area network and data center domains 25 while maintaining the ability to deliver against SLAs. 
27 Requirements Language 29 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 30 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 31 document are to be interpreted as described in BCP 14 32 [RFC2119][RFC8174] when, and only when, they appear in all capitals, 33 as shown here. 35 Status of This Memo 37 This Internet-Draft is submitted in full conformance with the 38 provisions of BCP 78 and BCP 79. 40 Internet-Drafts are working documents of the Internet Engineering 41 Task Force (IETF). Note that other groups may also distribute 42 working documents as Internet-Drafts. The list of current Internet- 43 Drafts is at https://datatracker.ietf.org/drafts/current/. 45 Internet-Drafts are draft documents valid for a maximum of six months 46 and may be updated, replaced, or obsoleted by other documents at any 47 time. It is inappropriate to use Internet-Drafts as reference 48 material or to cite them other than as "work in progress." 49 This Internet-Draft will expire on 9 July 2022. 51 Copyright Notice 53 Copyright (c) 2022 IETF Trust and the persons identified as the 54 document authors. All rights reserved. 56 This document is subject to BCP 78 and the IETF Trust's Legal 57 Provisions Relating to IETF Documents (https://trustee.ietf.org/ 58 license-info) in effect on the date of publication of this document. 59 Please review these documents carefully, as they describe your rights 60 and restrictions with respect to this document. Code Components 61 extracted from this document must include Revised BSD License text as 62 described in Section 4.e of the Trust Legal Provisions and are 63 provided without warranty as described in the Revised BSD License. 65 Table of Contents 67 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 68 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 4 69 3. Motivation . . . . . . . . . . . . . . . . . . . . . . . . . 4 70 4. Requirements . . . . . . . . . . . . . . . . . . . . . . . . 7 71 5. Theory of Operation . . . . . . . . . . . . . . . . . . . . . 8 72 5.1. VNF Assumptions . . . . . . . . . . . . . . . . . . . . . 8 73 5.2. Overview . . . . . . . . . . . . . . . . . . . . . . . . 9 74 5.3. Use of a Centralized Controller . . . . . . . . . . . . . 9 75 5.4. Routing and LSP Underlay . . . . . . . . . . . . . . . . 11 76 5.4.1. Intra-Domain Routing . . . . . . . . . . . . . . . . 11 77 5.4.2. Inter-Domain Routing . . . . . . . . . . . . . . . . 13 78 5.4.3. Intra-Domain and Inter-Domain Traffic-Engineering . . 15 79 5.5. Service Layer . . . . . . . . . . . . . . . . . . . . . . 17 80 5.6. Service Differentiation . . . . . . . . . . . . . . . . . 19 81 5.7. Automated Service Activation . . . . . . . . . . . . . . 20 82 5.8. Service Function Chaining . . . . . . . . . . . . . . . . 21 83 5.9. Stability and Availability . . . . . . . . . . . . . . . 23 84 5.9.1. IGP Reconvergence . . . . . . . . . . . . . . . . . . 23 85 5.9.2. Data Center Reconvergence . . . . . . . . . . . . . . 23 86 5.9.3. Exchange of Inter-Domain Routes . . . . . . . . . . . 24 87 5.9.4. Controller Redundancy . . . . . . . . . . . . . . . . 25 88 5.9.5. Path and Segment Liveliness . . . . . . . . . . . . . 27 89 5.10. Scalability . . . . . . . . . . . . . . . . . . . . . . . 28 90 5.10.1. Asymmetric Model B for VPN Families . . . . . . . . 30 91 6. Illustration of Use . . . . . . . . . . . . . . . . . . . . . 32 92 6.1. Reference Topology . . . . . . . . . . . . . . . . . . . 32 93 6.2. PNF to PNF Connectivity . . . . . . . . . . 
. . . . . . . 34 94 6.3. VNF to PNF Connectivity . . . . . . . . . . . . . . . . . 35 95 6.4. VNF to VNF Connectivity . . . . . . . . . . . . . . . . . 36 96 7. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 37 97 8. Security Considerations . . . . . . . . . . . . . . . . . . . 38 98 9. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 38 99 10. Contributors . . . . . . . . . . . . . . . . . . . . . . . . 38 100 11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 39 101 12. References . . . . . . . . . . . . . . . . . . . . . . . . . 39 102 12.1. Normative References . . . . . . . . . . . . . . . . . . 39 103 12.2. Informative References . . . . . . . . . . . . . . . . . 39 104 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 45 106 1. Introduction 108 With the introduction of technologies such as 5G, the Internet of 109 Things (IoT), and Industry 4.0, service requirements are changing. 110 In addition to the ever-increasing demand for more capacity, these 111 services have other stringent service requirements that need to be 112 met, such as ultra-reliable and/or low-latency communication. 114 Parallel to this, there is a continued trend to move towards network 115 function virtualization. Operators are building digitalized 116 infrastructure capable of hosting numerous virtualized network 117 functions (VNFs); infrastructure that can scale in and scale out 118 depending on application demand and can deliver flexibility and 119 service velocity. Much of this virtualization activity is driven by 120 the aforementioned emerging technologies as new infrastructure is 121 deployed in support of them. To try to meet the new service 122 requirements, some of these VNFs are becoming more dispersed, so it is 123 common for networks to have a mix of centralized medium- or large- 124 sized data centers together with more distributed smaller 125 'edge-clouds'. VNFs hosted within these data centers require 126 seamless connectivity to each other, and to their existing physical 127 network function (PNF) counterparts. This connectivity also needs to 128 deliver against agreed SLAs. 130 Coupled with the deployment of virtualization is automation. Many of 131 these VNFs are deployed within SDN-enabled data centers where 132 automation is simply a must-have capability to improve service 133 activation lead-times. The expectation is that services will be 134 instantiated in an abstract point-and-click manner and be 135 automatically created by the underlying network, dynamically adapting 136 to service connectivity changes as virtual entities move between 137 hosts. 139 This document describes an architecture for a Network Function 140 Interconnect (NFIX) that allows for interworking of physical and 141 virtual network functions in a unified and scalable manner. It 142 describes a mechanism for establishing connectivity across multiple 143 discrete domains in both the wide-area network (WAN) and the data 144 center (DC) while maintaining the ability to deliver against SLAs. 145 To achieve this, NFIX works with the underlying topology to build a 146 unified over-the-top topology. 148 The NFIX architecture described in this document does not define any 149 new protocols but rather outlines an architecture utilizing a 150 collaboration of existing standards-based protocols. 152 2.
Terminology 154 * A physical network function (PNF) refers to a network device such 155 as a Provider Edge (PE) router that connects physically to the 156 wide-area network. 158 * A virtualized network function (VNF) refers to a network device 159 such as a provider edge (PE) router that is hosted on an 160 application server. The VNF may be bare-metal in that it consumes 161 the entire resources of the server, or it may be one of numerous 162 virtual functions instantiated as a VM or a number of containers on 163 a given server that is controlled by a hypervisor or container 164 management platform. 166 * A Data Center Border (DCB) router refers to the network function 167 that spans the border between the wide-area and the data center 168 networks, typically interworking the different encapsulation 169 techniques employed within each domain. 171 * An Interconnect controller is the controller responsible for 172 managing the NFIX fabric and services. 174 * A DC controller is the term used for a controller that resides 175 within an SDN-enabled data center and is responsible for the DC 176 network(s). 178 3. Motivation 180 Industrial automation and business-critical environments use 181 applications that are demanding on the network. These applications 182 present different requirements, from low latency to high throughput 183 to application-specific traffic conditioning, or a combination. The 184 evolution to 5G equally presents challenges for mobile back-, front- 185 and mid-haul networks. The requirement for ultra-reliable low- 186 latency communication means that operators need to re-evaluate their 187 network architectures to meet these requirements. 189 At the same time, the service edge is evolving. Where the service 190 edge device was historically a PNF, the adoption of virtualization 191 means VNFs are becoming more commonplace. Typically, these VNFs are 192 hosted in some form of data center environment but require end-to-end 193 connectivity to other VNFs and/or other PNFs. This represents a 194 challenge because transport layer connectivity generally differs 195 between the WAN and the data center environment. The WAN includes 196 all levels of hierarchy (core, aggregation, access) that form the 197 network's footprint, where transport layer connectivity using IP/MPLS 198 is commonplace. In the data center, native IP is commonplace, 199 utilizing network virtualization overlay (NVO) technologies such as 200 virtual extensible LAN (VXLAN) [RFC7348], network virtualization 201 using generic routing encapsulation (NVGRE) [RFC7637], or generic 202 network virtualization encapsulation (GENEVE) [I-D.ietf-nvo3-geneve]. 203 There is a requirement to seamlessly integrate these islands, avoid 204 heavy-lifting at interconnects, and provide a means to 205 provision end-to-end services with a single touch point at the edge. 207 The service edge boundary is also changing. Some functions that were 208 previously reasonably centralized are now becoming more distributed. 209 One reason for this is to attempt to deal with low-latency 210 requirements. Another reason is that operators seek to reduce costs 211 by deploying low/medium-capacity VNFs closer to the edge. Equally, 212 virtualization also sees some of the access network moving towards 213 the core. Examples of this include cloud-RAN or Software-Defined 214 Access Networks. 216 Historically, service providers have architected data centers 217 independently from the wide-area network, creating two independent 218 domains or islands.
As VNFs become part of the service landscape, the 219 service data-path must be extended across the WAN into the data 220 center infrastructure, but in a manner that still allows operators to 221 meet deterministic performance requirements. Methods for stitching 222 WAN and DC infrastructures together with some form of service- 223 interworking at the data center border have been implemented and 224 deployed, but this service-interworking approach has several 225 limitations: 227 * The data center environment typically uses encapsulation 228 techniques such as VXLAN or NVGRE while the WAN typically uses 229 encapsulation techniques such as MPLS [RFC3031]. Underlying 230 optical infrastructure might also need to be programmed. These 231 are incompatible and require interworking at the service layer. 233 * It typically requires heavy-touch service provisioning on the data 234 center border. In an end-to-end service, midpoint provisioning is 235 undesirable and should be avoided. 237 * Automation is difficult, largely due to the first two points but 238 with additional contributing factors. In the virtualization world 239 automation is a must-have capability. 241 * When a service is operating at Layer 3 in a data center with 242 redundant interconnects, the risk of routing loops exists. There 243 is no inherent loop avoidance mechanism when redistributing routes 244 between address families, so extreme care must be taken. Proposals 245 such as the Domain Path (D-PATH) attribute 246 [I-D.ietf-bess-evpn-ipvpn-interworking] attempt to address this 247 issue but as yet are not widely implemented or deployed. 249 * Some or all of the above make the service-interworking gateway 250 cumbersome with questionable scaling attributes. 252 Hence there is a requirement to create an open, scalable, and unified 253 network architecture that brings together the wide-area network and 254 data center domains. It is not an architecture exclusively targeted 255 at greenfield deployments, nor does it require a flag-day upgrade to 256 deploy in a brownfield network. It is an evolutionary step to a 257 consolidated network that uses the constructs of seamless MPLS 258 [I-D.ietf-mpls-seamless-mpls] as a baseline and extends upon that to 259 include topologies that may not be link-state based and to provide 260 end-to-end path control. Overall, the NFIX 261 architecture: 263 * Allows for an evolving service edge boundary without having to 264 constantly restructure the architecture. 266 * Provides a mechanism for seamless connectivity between 267 VNF and VNF, VNF and PNF, and PNF and PNF, with deterministic SLAs, 268 and with the ability to provide differentiated SLAs to suit 269 different service requirements. 271 * Delivers a unified transport fabric using Segment Routing (SR) 272 [RFC8402] where service delivery mandates touching only the 273 service edge without imposing additional encapsulation 274 requirements in the DC. 276 * Embraces automation by providing an environment where any end-to- 277 end connectivity can be instantiated in a single-request manner 278 while maintaining SLAs. 280 4. Requirements 282 The following section outlines the requirements that the proposed 283 solution must meet. From an overall perspective, the proposed 284 generic architecture must: 286 * Deliver end-to-end transport LSPs using traffic-engineering (TE) 287 as required to meet appropriate SLAs for the service(s) 288 using those LSPs.
End-to-end refers to VNF and/or PNF 289 connectivity or a combination of both. 291 * Provide a solution that allows for optimal end-to-end path 292 placement; where optimal not only meets the requirements of the 293 path in question but also meets the global network objectives. 295 * Support varying types of VNF physical network attachment and 296 logical (underlay/overlay) connectivity. 298 * Facilitate automation of service provision. As such the solution 299 should avoid heavy-touch service provisioning and decapsulation/ 300 encapsulation at data center border routers. 302 * Provide a framework for delivering logical end-to-end networks 303 using differentiated logical topologies and/or constraints. 305 * Provide a high level of stability; faults in one domain should not 306 propagate to another domain. 308 * Provide a mechanism for homogeneous end-to-end OAM. 310 * Hide/localize instabilities in the different domains that 311 participate in the end-to-end service. 313 * Provide a mechanism to minimize the label-stack depth required at 314 path head-ends for SR-TE LSPs. 316 * Offer a high level of scalability. 318 * Although not considered in-scope of the current version of this 319 document, the solution should not preclude the deployment of 320 multicast. This subject may be covered in later versions of this 321 document. 323 5. Theory of Operation 325 This section describes the NFIX architecture including the building 326 blocks and protocol machinery that is used to form the fabric. Where 327 considered appropriate rationale is given for selection of an 328 architectural component where other seemingly applicable choices 329 could have been made. 331 5.1. VNF Assumptions 333 For the sake of simplicity, references to VNF are made in a broad 334 sense. Equally, the differences between VNF and Container Network 335 Function (CNF) are largely immaterial for the purposes of this 336 document, therefore VNF is used to represent both. The way in which 337 a VNF is instantiated and provided network connectivity will differ 338 based on environment and VNF capability, but for conciseness this is 339 not explicitly detailed with every reference to a VNF. Common 340 examples of VNF variants include but are not limited to: 342 * A VNF that functions as a routing device and has full IP routing 343 and MPLS capabilities. It can be connected simultaneously to the 344 data center fabric underlay and overlay and serves as the NVO 345 tunnel endpoint [RFC8014]. Examples of this might be a 346 virtualized PE router, or a virtualized Broadband Network Gateway 347 (BNG). 349 * A VNF that functions as a device (host or router) with limited IP 350 routing capability. It does not connect directly to the data 351 center fabric underlay but rather connects to one or more external 352 physical or virtual devices that serve as the NVO tunnel 353 endpoint(s). It may however have single or multiple connections 354 to the overlay. Examples of this might be a mobile network 355 control or management plane function. 357 * A VNF that has no routing capability. It is a virtualized 358 function hosted within an application server and is managed by a 359 hypervisor or container host. The hypervisor/container host acts 360 as the NVO endpoint and interfaces to some form of SDN controller 361 responsible for programming the forwarding plane of the 362 virtualization host using, for example, OpenFlow. 
Examples of 363 this might be an Enterprise application server or a web server 364 running as a virtual machine and front-ended by a virtual routing 365 function such as OVS/xVRS/VTF. 367 Where considered necessary, exceptions to the examples provided above 368 or a focus on a particular scenario will be highlighted. 370 5.2. Overview 372 The NFIX architecture makes no assumptions about how the network is 373 physically composed, nor does it impose any dependencies upon it. It 374 also makes no assumptions about IGP hierarchies, and the use of areas/ 375 levels or discrete IGP instances within the WAN is fully endorsed to 376 enhance scalability and constrain fault propagation. This could 377 apply, for instance, to a hierarchical WAN from core to edge or from 378 WAN to LAN connections. The overall architecture uses the constructs 379 of seamless MPLS as a baseline and extends upon that. The concept of 380 decomposing the network into multiple domains is one that has been 381 widely deployed and has been proven to scale in networks with large 382 numbers of nodes. 384 The proposed architecture uses segment routing (SR) as its preferred 385 choice of transport. Segment routing is chosen for construction of 386 end-to-end LSPs given its ability to traffic-engineer through source- 387 routing while concurrently scaling exceptionally well due to its lack 388 of network state other than at the ingress node. This document uses SR 389 instantiated on an MPLS forwarding plane (SR-MPLS), although it does 390 not preclude the use of SRv6 either now or at some point in the 391 future. The rationale for selecting SR-MPLS is simply maturity and 392 more widespread applicability across a potentially broad range of 393 network devices. This document may be updated in future versions to 394 include more description of SRv6 applicability. 396 5.3. Use of a Centralized Controller 398 It is recognized that for most operators the move towards the use of 399 a controller within the wide-area network is a significant change in 400 operating model. In the NFIX architecture it is a necessary 401 component. Its use is not simply to offload inter-domain path 402 calculation from network elements; it provides many more benefits: 404 * It offers the ability to enforce constraints on paths that 405 originate/terminate on different network elements, thereby 406 providing path diversity, and/or bidirectionality/co-routing, and/ 407 or disjointness. 409 * It avoids collisions, re-tries, and packing problems that have been 410 observed in networks using distributed TE path calculation, where 411 head-ends make autonomous decisions. 413 * A controller can take a global view of path placement strategies, 414 including the ability to make path placement decisions over a high 415 number of LSPs concurrently as opposed to considering each LSP 416 independently. In turn, this allows for 'global' optimization of 417 network resources such as available capacity. 419 * A controller can make decisions based on near-real-time network 420 state and optimize paths accordingly. For example, if a network 421 link becomes congested, it may recompute some of the paths 422 transiting that link to other links that may not be quite as 423 optimal but do have available capacity. Or, if a link's latency 424 crosses a certain threshold, it may choose to reoptimize some 425 latency-sensitive paths away from that link. 427 * The logic of a controller can be extended beyond pure path 428 computation and placement.
If the controller is aware of 429 services, service requirements, and available paths within the 430 network, it can cross-correlate between them and ensure that the 431 appropriate paths are used for the appropriate services. 433 * The controller can provide assurance and verification of the 434 underlying SLA provided to a given service. 436 As the main objective of the NFIX architecture is to unify the data 437 center and wide-area network domains, using the term 'controller' is 438 not sufficiently precise. The centralized controller may need to 439 interface to other controllers that potentially reside within an SDN- 440 enabled data center. Therefore, to avoid interchangeably using the 441 term controller for both functions, we distinguish between them 442 simply by using the terms 'DC controller', which, as the name suggests, 443 is responsible for the DC, and 'Interconnect controller', responsible 444 for managing the extended SR fabric and services. 446 The Interconnect controller learns wide-area network topology 447 information and allocation of segment routing SIDs within that domain 448 using BGP link-state [RFC7752] with appropriate SR extensions. 449 Equally, it learns data center topology information and Prefix-SID 450 allocation using BGP labeled unicast [RFC8277] with appropriate SR 451 extensions, or BGP link-state if a link-state IGP is used within the 452 data center. If Route-Reflection is used for exchange of BGP link- 453 state or labeled unicast NLRI within one or more domains, then the 454 Interconnect controller need only peer as a client with those Route- 455 Reflectors in order to learn topology information. 457 Where BGP link-state is used to learn the topology of a data center 458 (or any IGP routing domain), the BGP-LS Instance Identifier (Instance- 459 ID) is carried within Node/Link/Prefix NLRI and is used to identify a 460 given IGP routing domain. Where labeled unicast BGP is used to 461 discover the topology of one or more data center domains, there is no 462 equivalent way for the Interconnect controller to achieve a level of 463 routing domain correlation. The controller may learn some splintered 464 connectivity map consisting of 10 leaf switches, four spine switches, 465 and four DCBs, but it needs some form of key to inform it that leaf 466 switches 1-5, spine switches 1 and 2, and DCBs 1 and 2 belong to 467 data center 1, while leaf switches 6-10, spine switches 3 and 4, and 468 DCBs 3 and 4 belong to data center 2. What is needed is a form of 469 'data center membership identification' to provide this correlation. 470 Optionally, this could be achieved at the BGP level using a standard 471 community to represent each data center, or it could be done at a 472 more abstract level where, for example, the DC controller provides the 473 membership identification to the Interconnect controller through an 474 application programming interface (API). 476 Understanding real-time network state is an important part of the 477 Interconnect controller's role, and only with this information is the 478 controller able to make informed decisions and take preventive or 479 corrective actions as necessary. There are numerous methods 480 implemented and deployed that allow for harvesting of network state, 481 including (but not limited to) IPFIX [RFC7011], Netconf/YANG 482 [RFC6241][RFC6020], streaming telemetry, BGP link-state [RFC7752] 483 [I-D.ietf-idr-te-lsp-distribution], and the BGP Monitoring Protocol 484 (BMP) [RFC7854].
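As a purely illustrative aid, the following sketch shows one way the membership correlation described above could be realized when topology is learned via labeled unicast BGP: nodes are grouped into data centers according to a hypothetical per-data-center standard community. The community values, addresses, and data structures are assumptions made for illustration only; they are not defined by this document.

   # Minimal sketch (Python): correlate nodes learned via BGP labeled
   # unicast into data centers using an assumed per-DC community.
   from collections import defaultdict

   DC_COMMUNITIES = {"64500:1": "dc1", "64500:2": "dc2"}  # assumed values

   def correlate(routes):
       """routes: iterable of (loopback, prefix_sid, [communities])."""
       membership = defaultdict(list)
       for loopback, prefix_sid, communities in routes:
           dc = next((DC_COMMUNITIES[c] for c in communities
                      if c in DC_COMMUNITIES), "unknown")
           membership[dc].append((loopback, prefix_sid))
       return membership

   routes = [("192.0.2.1", 101, ["64500:1"]),   # leaf switch in DC 1
             ("192.0.2.6", 106, ["64500:2"])]   # leaf switch in DC 2
   print(dict(correlate(routes)))

The same correlation could equally be supplied by the DC controller over an API rather than carried in BGP, as noted above.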
486 5.4. Routing and LSP Underlay 488 This section describes the mechanisms and protocols that are used to 489 establish end-to-end LSPs, where end-to-end refers to VNF-to-VNF, 490 PNF-to-PNF, or VNF-to-PNF. 492 5.4.1. Intra-Domain Routing 494 In a seamless MPLS architecture, domains are based on geographic 495 dispersion (core, aggregation, access). Within this document, a 496 domain is considered to be any entity with a captive topology, be it a 497 link-state topology or otherwise. Where reference is made to the 498 wide-area network domain, it refers to one or more domains that 499 constitute the wide-area network domain. 501 This section discusses the basic building blocks required within the 502 wide-area network and the data center, noting from above that the 503 wide-area network may itself consist of multiple domains. 505 5.4.1.1. Wide-Area Network Domains 507 The wide-area network includes all levels of hierarchy (core, 508 aggregation, access) that constitute the network's MPLS footprint as 509 well as the data center border routers. Each domain that constitutes 510 part of the wide-area network runs a link-state interior gateway 511 protocol (IGP) such as ISIS or OSPF, and each domain may use IGP- 512 inherent hierarchy (OSPF areas, ISIS levels) with an assumption that 513 visibility is domain-wide using, for example, L2 to L1 514 redistribution. Alternatively, or additionally, there may be 515 multiple domains that are split by using separate and distinct 516 instances of IGP. There is no requirement for IGP redistribution of 517 any link or loopback addresses between domains. 519 Each IGP should be enabled with the relevant extensions for segment 520 routing [RFC8667][RFC8665], and each SR-capable router should 521 advertise a Node-SID for its loopback address, and an Adjacency-SID 522 (Adj-SID) for every connected interface (unidirectional adjacency) 523 belonging to the SR domain. SR Global Blocks (SRGB) can be allocated 524 to each domain as deemed appropriate to specific network 525 requirements. Border routers belonging to multiple domains have an 526 SRGB for each domain. 528 The default forwarding path for intra-domain LSPs that do not require 529 TE is simply an SR LSP containing a single label advertised by the 530 destination as a Node-SID and representing the ECMP-aware shortest 531 path to that destination. Intra-domain TE LSPs are constructed as 532 required by the Interconnect controller. Once a path is calculated, 533 it is advertised as an explicit SR Policy 534 [I-D.ietf-spring-segment-routing-policy] containing one or more paths 535 expressed as one or more segment-lists, which may optionally contain 536 binding SIDs if requirements dictate. An SR Policy is identified 537 through the tuple [headend, color, endpoint] and this tuple is used 538 extensively by the Interconnect controller to associate services with 539 an underlying SR Policy that meets its objectives. 541 To provide support for ECMP, the Entropy Label [RFC6790][RFC8662] 542 should be utilized. Entropy Label Capability (ELC) should be 543 advertised into the IGP using the IS-IS Prefix Attributes TLV 544 [I-D.ietf-isis-mpls-elc] or the OSPF Extended Prefix TLV 545 [I-D.ietf-ospf-mpls-elc] coupled with the Node MSD Capability sub-TLV 546 to advertise Entropy Readable Label Depth (ERLD) [RFC8491][RFC8476] 547 and the Base MPLS Imposition (BMI). Equally, support for ELC 548 together with the supported ERLD should be signaled in BGP using the 549 BGP Next-Hop Capability [I-D.ietf-idr-next-hop-capability]. Ingress 550 nodes and/or DCBs should ensure sufficient entropy is applied to 551 packets to exercise available ECMP links.
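Because each domain may have its own SRGB, the MPLS label that corresponds to a given Node-SID index depends on the domain in which that SID is resolved. The short sketch below illustrates this arithmetic only; the SRGB bases, domain names, and SID index are hypothetical values chosen for illustration.

   # Minimal sketch (Python): derive the MPLS label for a Node-SID index
   # from the SRGB of the domain in which it is resolved (SR-MPLS).
   SRGB_BASE = {"wan-domain-1": 16000, "wan-domain-2": 20000}  # assumed

   def node_sid_label(domain, sid_index):
       # label = SRGB base of the domain + globally planned SID index
       return SRGB_BASE[domain] + sid_index

   # A border router that belongs to both domains programs one label per
   # SRGB for the same SID index.
   print(node_sid_label("wan-domain-1", 101))   # 16101
   print(node_sid_label("wan-domain-2", 101))   # 20101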
553 5.4.1.2. Data Center Domain 555 The data center domain includes all fabric switches, network 556 virtualization edge (NVE), and the data center border routers. The 557 data center routing design may align with the framework of [RFC7938] 558 running eBGP single-hop sessions established over direct point-to- 559 point links, or it may use an IGP for dissemination of topology 560 information. This document focuses on the former, simply because the 561 use of an IGP largely makes the data center's behaviour analogous to 562 that of a wide-area network domain. 564 The chosen method of transport or encapsulation within the data 565 center for NFIX is SR-MPLS over IP/UDP [RFC8663] or, where possible, 566 native SR-MPLS. The choice of SR-MPLS over IP/UDP or native SR-MPLS 567 allows for good entropy to maximize the use of equal-cost Clos fabric 568 links. Native SR-MPLS encapsulation provides entropy through use of 569 the Entropy Label, and, like the wide-area network, support for ELC 570 together with the supported ERLD should be signaled using the BGP Next- 571 Hop Capability attribute. As described in [RFC6790], the ELC is an 572 indication from the egress node of an MPLS tunnel to the ingress node 573 of the MPLS tunnel that it is capable of processing an Entropy Label. 574 The BGP Next-Hop Capability is a non-transitive attribute which is 575 modified or deleted when the next-hop is changed to reflect the 576 capabilities of the new next-hop. If we assume that the path of a 577 BGP-signaled LSP transits through multiple ASNs, and/or a single ASN 578 with multiple next-hops, then it is not possible for the ingress node 579 to determine the ELC of the egress node. Without this end-to-end 580 signaling capability, the entropy label must only be used when it is 581 explicitly known, through configuration or other means, that the 582 egress node has support for it. Entropy for SR-MPLS over IP/UDP 583 encapsulation uses the source UDP port for IPv4 and the Flow Label 584 for IPv6. Again, the ingress network function should ensure 585 sufficient entropy is applied to exercise available ECMP links. 587 Another significant advantage of the use of native SR-MPLS or SR-MPLS 588 over IP/UDP is that it allows for a lightweight interworking function 589 at the DCB without the requirement for midpoint provisioning; 590 interworking between the data center and the wide-area network 591 domains becomes an MPLS label swap/continue action. 593 Loopback addresses of network elements within the data center are 594 advertised using labeled unicast BGP with the addition of SR Prefix 595 SID extensions [RFC8669] containing a globally unique and persistent 596 Prefix-SID. The data-plane encapsulation of SR-MPLS over IP/UDP or 597 native SR-MPLS allows network elements within the data center to 598 consume BGP Prefix-SIDs and legitimately use those in the 599 encapsulation. 601 5.4.2. Inter-Domain Routing 603 Inter-domain routing is responsible for establishing connectivity 604 between any domains that form the wide-area network, and between the 605 wide-area network and data center domains. It is considered unlikely 606 that every end-to-end LSP will require a TE path; hence there is a 607 requirement for a default end-to-end forwarding path. This default 608 forwarding path may also become the path of last resort in the event 609 of a non-recoverable failure of a TE path. Similar to the seamless 610 MPLS architecture, this inter-domain MPLS connectivity is realized 611 using labeled unicast BGP [RFC8277] with the addition of SR Prefix 612 SID extensions.
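The paragraphs that follow describe how border routers impose next-hop-self on these labeled routes and so insert themselves into the forwarding path. Purely as an illustrative aid, the sketch below models that behaviour for a single prefix; the router addresses, label values, and data structures are hypothetical and are not defined by this document.

   # Minimal sketch (Python): a border router (DCB/ABR/ASBR) that
   # re-advertises a labeled unicast route with next-hop-self allocates a
   # new local label and programs a swap entry towards the received label.
   def readvertise_with_nhs(received, local_label, self_address, lfib):
       # received: {'prefix', 'label', 'next_hop'} learned from the peer
       lfib[local_label] = ("swap", received["label"], received["next_hop"])
       return {"prefix": received["prefix"], "label": local_label,
               "next_hop": self_address}

   lfib = {}
   route_from_dc = {"prefix": "192.0.2.10/32", "label": 20101,
                    "next_hop": "198.51.100.1"}          # assumed values
   advertised = readvertise_with_nhs(route_from_dc, 16201,
                                     "203.0.113.1", lfib)
   print(advertised)
   print(lfib)   # {16201: ('swap', 20101, '198.51.100.1')}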
614 Within each wide-area network domain, all service edge routers, DCBs, 615 and ABRs/ASBRs form part of the labeled BGP mesh, which can be either 616 full-mesh or, more likely, based on the use of route-reflection. Each 617 of these routers advertises its respective loopback addresses into 618 labeled BGP together with an MPLS label and a globally unique Prefix- 619 SID. Routes are advertised between wide-area network domains by 620 ABRs/ASBRs that impose next-hop-self on advertised routes. The 621 function of imposing next-hop-self for labeled routes means that the 622 ABR/ASBR allocates a new label for advertised routes and programs a 623 label-swap entry in the forwarding plane for received and advertised 624 routes. In short, it becomes part of the forwarding path. 626 DCB routers have labeled BGP sessions towards the wide-area network 627 and labeled BGP sessions towards the data center. Routes are 628 bidirectionally advertised between the domains subject to policy, 629 with the DCB imposing itself as next-hop on advertised routes. As 630 above, the function of imposing next-hop-self for labeled routes 631 implies allocation of a new label for advertised routes and a label- 632 swap entry being programmed in the forwarding plane for received and 633 advertised labels. The DCB thereafter becomes the anchor point 634 between the wide-area network domain and the data center domain. 636 Within the wide-area network, next-hops for labeled unicast routes 637 containing Prefix-SIDs are resolved to SR LSPs, and within the data 638 center domain next-hops for labeled unicast routes containing Prefix- 639 SIDs are resolved to SR LSPs or IP/UDP tunnels. This provides end- 640 to-end connectivity without a traffic-engineering capability. 642 +---------------+ +----------------+ +---------------+ 643 | Data Center | | Wide-Area | | Wide-Area | 644 | +-----+ Domain 1 +-----+ Domain 'n' | 645 | | DCB | | ABR | | 646 | +-----+ +-----+ | 647 | | | | | | 648 +---------------+ +----------------+ +---------------+ 649 <-- SR/SRoUDP --> <---- IGP/SR ----> <--- IGP/SR ----> 650 <--- BGP-LU ---> NHS <--- BGP-LU ---> NHS <--- BGP-LU ---> 652 Figure 1 654 Default Inter-Domain Forwarding Path 656 5.4.3. Intra-Domain and Inter-Domain Traffic-Engineering 658 The capability to traffic-engineer intra- and inter-domain end-to-end 659 paths is considered a key requirement in order to meet the service 660 objectives previously outlined. To achieve optimal end-to-end path 661 placement, the key components to be considered are path calculation, 662 path activation, and FEC-to-path binding procedures. 664 In the NFIX architecture, end-to-end path calculation is performed by 665 the Interconnect controller. The mechanics of how the objectives of 666 each path are calculated are beyond the scope of this document. Once a 667 path is calculated based upon its objectives and constraints, the 668 path is advertised from the controller to the LSP headend as an 669 explicit SR Policy containing one or more paths expressed as one or 670 more segment-lists. An SR Policy is identified through the tuple 671 [headend, color, endpoint] and this tuple is used extensively by the 672 Interconnect controller to associate services with an underlying SR 673 Policy that meets its objectives.
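As a purely illustrative aid, the following sketch shows how a controller might index SR Policies by the [headend, color, endpoint] tuple described above and resolve a colored service route to a matching policy, falling back to the default labeled BGP forwarding path where no policy exists. The names, color values, and segment identifiers are hypothetical.

   # Minimal sketch (Python): index SR Policies by (headend, color,
   # endpoint) and resolve a service route to a policy or the default path.
   policies = {}

   def install_policy(headend, color, endpoint, segment_lists):
       policies[(headend, color, endpoint)] = segment_lists

   def resolve(headend, color, endpoint):
       # Fall back to the default BGP-LU forwarding path if no SR Policy
       # exists for this tuple.
       return policies.get((headend, color, endpoint), "default-bgp-lu-path")

   # Hypothetical low-latency (color 100) policy from VNF1 towards VNF2,
   # stitched through a Binding-SID anchored on DCB1.
   install_policy("vnf1", 100, "vnf2",
                  [["dcb1-node-sid", "bsid-n", "vnf2-prefix-sid"]])
   print(resolve("vnf1", 100, "vnf2"))
   print(resolve("vnf1", 200, "vnf2"))   # no such policy -> default path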
675 The segment-list of an SR Policy encodes a source-routed path towards 676 the endpoint. When calculating the segment-list, the Interconnect 677 controller makes comprehensive use of the Binding-SID (BSID), 678 instantiating BSID anchors as necessary at path midpoints when 679 calculating and activating a path. The use of BSID is considered 680 fundamental to segment routing as described in 681 [I-D.filsfils-spring-sr-policy-considerations]. It provides opacity 682 between domains, ensuring that any segment churn is constrained to a 683 single domain. It also reduces the number of segments/labels that 684 the headend needs to impose, which is particularly important given 685 that network elements within a data center generally have limited 686 label imposition capabilities. In the context of the NFIX 687 architecture, it is also the vehicle that allows for removal of heavy 688 midpoint provisioning at the DCB. 690 For example, assume that VNF1 is situated in data center 1, which is 691 interconnected to the wide-area network via DCB1. VNF1 requires 692 connectivity to VNF2, situated in data center 2, which is 693 interconnected to the wide-area network via DCB2. Assuming there is 694 no existing TE path that meets VNF1's requirements, the Interconnect 695 controller will: 697 * Instantiate an SR Policy on DCB1 with BSID n and a segment-list 698 containing the relevant segments of a TE path to DCB2. DCB1 699 therefore becomes a BSID anchor. 701 * Instantiate an SR Policy on VNF1 with BSID m and a segment-list 702 containing segments {DCB1, n, VNF2}. 704 +---------------+ +----------------+ +---------------+ 705 | Data Center 1 | | Wide-Area | | Data Center 2 | 706 | +----+ +----+ 3 +----+ +----+ | 707 | |VNF1| |DCB1|-1 / \ 5--|DCB2| |VNF2| | 708 | +----+ +----+ \ / \ / +----+ +----+ | 709 | | | 2 4 | | | 710 +---------------+ +----------------+ +---------------+ 711 SR Policy SR Policy 712 BSID m BSID n 713 {DCB1,n,VNF2} {1,2,3,4,5,DCB2} 715 Figure 2 717 Traffic-Engineered Path using BSID 719 In the above figure, a single DCB is used to interconnect two domains. 720 Similarly, in the case of two wide-area domains, the DCB would be 721 represented as an ABR or ASBR. In some single-operator environments, 722 domains may be interconnected using adjacent ASBRs connected via a 723 distinct physical link. In this scenario, the procedures outlined 724 above may be extended to incorporate the mechanisms used in Egress 725 Peer Engineering (EPE) [I-D.ietf-spring-segment-routing-central-epe] 726 to form a traffic-engineered path spanning distinct domains. 728 5.4.3.1. Traffic-Engineering and ECMP 730 Where the Interconnect controller is used to place SR policies, 731 providing support for ECMP requires some consideration. An SR Policy 732 is described with one or more segment-lists; each of those 733 segment-lists may or may not provide ECMP as a whole, and 734 each SID itself may or may not support ECMP forwarding. When an 735 individual SID is a BSID, an ECMP path may or may not also be nested 736 within. The Interconnect controller may choose to place a path 737 consisting entirely of non-ECMP-aware Adj-SIDs (each SID representing 738 a single adjacency) such that the controller has explicit hop-by-hop 739 knowledge of where that SR-TE LSP is routed. This is beneficial to 740 allow the controller to take corrective action if the criteria that 741 were used to initially select a particular link in a particular path 742 subsequently change.
For example, a path may need to be rerouted if the latency of a link 743 increases or the link becomes congested. 744 If ECMP-aware SIDs are used in the SR policy segment-list (including 745 Node-SIDs, Adj-SIDs representing parallel links, and Anycast SIDs), SR 746 routers are able to make autonomous decisions about where traffic is 747 forwarded. As a result, it is not possible for the controller to 748 fully understand the impact of a change in network state and react to 749 it. With this in mind, there are a number of approaches that could be 750 adopted: 752 * If there is no requirement for the Interconnect controller to 753 explicitly track paths on a hop-by-hop basis, ECMP-aware SIDs may 754 be used in the SR policy segment-list. This approach may require 755 multiple [ELI, EL] pairs to be inserted at the ingress node; for 756 example, above and below a BSID to provide entropy in multiple 757 domains. 759 * If there is a requirement for the Interconnect controller to 760 explicitly track paths on a hop-by-hop basis to provide the capability 761 to reroute them based on changes in network state, SR policy 762 segment-lists should be constructed of non-ECMP-aware Adj-SIDs. 764 * A hybrid approach that allows for a level of ECMP (at the headend) 765 together with the ability for the Interconnect controller to 766 explicitly track paths is to instantiate an SR policy consisting 767 of a set of segment-lists, each containing non-ECMP-aware Adj- 768 SIDs. Each segment-list will be assigned a weight to allow for 769 ECMP or UCMP. This approach does, however, imply computation and 770 programming of two paths instead of one. 772 * Another hybrid approach might work as follows. Redundant DCBs 773 advertise an Anycast-SID 'A' into the data center, and also 774 instantiate an SR policy with a segment-list consisting of non- 775 ECMP-aware Adj-SIDs meeting the required connectivity and SLA. 776 The BSID value of this SR policy 'B' must be common to both 777 redundant DCBs, but the calculated paths are diverse. Indeed, 778 multiple segment-lists could be used in this SR policy. A VNF 779 could then instantiate an SR policy with a segment-list of {A, B} 780 to achieve ECMP in the data center and TE in the wide-area network, 781 with the option of ECMP at the BSID anchor. 783 5.5. Service Layer 785 The service layer is intended to deliver Layer 2 and/or Layer 3 VPN 786 connectivity between network functions to create an overlay utilizing 787 the routing and LSP underlay described in section 5.4. To do this, 788 the solution employs the EVPN and/or VPN-IPv4/IPv6 address families 789 to exchange Layer 2 and Layer 3 Network Layer Reachability 790 Information (NLRI). When these NLRI are exchanged between domains, it 791 is typical for the border router to set next-hop-self on advertised 792 routes. With the proposed routing and LSP underlay, however, this is 793 not required, and EVPN/VPN-IPv4/IPv6 routes should be passed end-to- 794 end without transit routers modifying the next-hop attribute. 796 Section 5.4.2 describes the use of labeled unicast BGP to exchange 797 inter-domain routes to establish a default forwarding path. Labeled- 798 unicast BGP is used to exchange prefix reachability between service 799 edge routers, with domain border routers imposing next-hop-self on 800 routes advertised between domains.
This provides a default inter- 801 domain forwarding path and provides the required connectivity to 802 establish inter-domain BGP sessions between service edges for the 803 exchange of EVPN and/or VPN-IPv4/IPv6 NLRI. If route-reflection is 804 used for the EVPN and/or VPN-IPv4/IPv6 address families within one or 805 more domains, it may be desirable to create inter-domain BGP sessions 806 between route-reflectors. In this case, the peering addresses of the 807 route-reflectors should also be exchanged between domains using 808 labeled unicast BGP. This creates a connectivity model analogous to 809 BGP/MPLS IP-VPN Inter-AS option C [RFC4364]. 811 +----------------+ +----------------+ +----------------+ 812 | +----+ | | +----+ | | +----+ | 813 +----+ | RR | +----+ | RR | +----+ | RR | +----+ 814 | NF | +----+ | DCI| +----+ | DCI| +----+ | NF | 815 +----+ +----+ +----+ +----+ 816 | Domain | | Domain | | Domain | 817 +----------------+ +----------------+ +----------------+ 818 <-------> <-----> NHS <-- BGP-LU ---> NHS <-----> <------> 819 <-------> <--------- EVPN/VPN-IPv4/v6 ----------> <------> 821 Figure 3 823 Inter-Domain Service Layer 825 EVPN and/or VPN-IPv4/v6 routes received from a peer in a different 826 domain will contain a next-hop equivalent to the router that sourced 827 the route. The next-hop of these routes can be resolved to a labeled- 828 unicast route (default forwarding path) or to an SR policy (traffic- 829 engineered forwarding path) as appropriate to the service 830 requirements. The exchange of EVPN and/or VPN-IPv4/IPv6 routes in 831 this manner implies that Route-Distinguisher and Route-Target values 832 remain intact end-to-end. 834 The use of end-to-end EVPN and/or VPN-IPv4/IPv6 address families 835 without the imposition of next-hop-self at border routers complements 836 the gateway-less transport layer architecture. It negates the 837 requirement for midpoint service provisioning and as such provides 838 the following benefits: 840 * Avoids the translation of MAC/IP EVPN routes to IP-VPN routes (and 841 vice versa) that is typically associated with service 842 interworking. 844 * Avoids instantiation of MAC-VRFs and IP-VPNs for each tenant 845 resident in the DCB. 847 * Avoids provisioning of demarcation functions between the data 848 center and wide-area network such as QoS, access-control, 849 aggregation, and isolation. 851 5.6. Service Differentiation 853 As discussed in section 5.4.3, the use of TE paths is a key 854 capability of the NFIX solution framework described in this document. 855 The Interconnect controller computes end-to-end TE paths between NFs 856 and programs DC nodes, DCBs, ABR/ASBRs, via SR Policy, with the 857 necessary label forwarding entries for each [headend, color, 858 endpoint]. The collection of [headend, endpoint] pairs for the same 859 color constitutes a logical network topology, where each topology 860 satisfies a given SLA requirement. 862 The Interconnect controller discovers the endpoints associated with a 863 given topology (color) upon the reception of EVPN or IPVPN routes 864 advertised by the endpoint. The EVPN and IPVPN NLRIs are advertised 865 by the endpoint nodes along with a color extended community which 866 identifies the topology to which the owner of the NLRI belongs. At a 867 coarse level, all the EVPN/IPVPN routes of the same VPN can be 868 advertised with the same color, and therefore a TE topology would be 869 established on a per-VPN basis. At a finer level, IPVPN and 870 especially EVPN provide a more granular way of coloring routes, which 871 allows the Interconnect controller to associate multiple 872 topologies with the same VPN. For example: 874 * All the EVPN MAC/IP routes for a given VNF may be advertised with 875 the same color. This would allow the Interconnect controller to 876 associate topologies per VNF within the same VPN; that is, VNF1 877 could be blue (e.g., low-latency topology) and VNF2 could be green 878 (e.g., high-throughput). 880 * The EVPN MAC/IP routes and Inclusive Multicast Ethernet Tag (IMET) 881 route for VNF1 may be advertised with different colors, e.g., red 882 and brown, respectively. This would allow the association of, 883 e.g., a low-latency topology for unicast traffic to VNF1 and a best- 884 effort topology for BUM traffic to VNF1. 886 * Each EVPN MAC/IP route or IP-Prefix route from a given VNF may be 887 advertised with a different color. This would allow the association 888 of topologies at the host level or host route granularity.
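Purely as an illustration of the coloring behaviour described above, the sketch below groups service endpoints into logical topologies according to the color extended community carried on their EVPN/IPVPN routes. The endpoint names and color values are hypothetical.

   # Minimal sketch (Python): one logical topology per color; endpoints
   # are grouped by the color community carried on their service routes.
   from collections import defaultdict

   def topology_membership(service_routes):
       """service_routes: iterable of (advertising_endpoint, color)."""
       topologies = defaultdict(set)
       for endpoint, color in service_routes:
           topologies[color].add(endpoint)
       return topologies

   # Hypothetical routes: VNF1 colored 100 (low latency), VNF2 colored 200
   # (high throughput), PE1 also colored 100.
   routes = [("vnf1", 100), ("vnf2", 200), ("pe1", 100)]
   print(dict(topology_membership(routes)))
   # roughly {100: {'vnf1', 'pe1'}, 200: {'vnf2'}}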
890 5.7. Automated Service Activation 892 The automation of network and service connectivity for instantiation 893 and mobility of virtual machines is a highly desirable attribute 894 within data centers. Since this concerns service connectivity, it 895 should be clear that this automation is relevant to virtual functions 896 that belong to a service as opposed to a virtual network function 897 that delivers services, such as a virtual PE router. 899 Within an SDN-enabled data center, a typical hierarchy from top to 900 bottom would include a policy engine (or policy repository), one or 901 more DC controllers, numerous hypervisors/container hosts that 902 function as NVO endpoints, and finally the virtual 903 machines (VMs)/containers, which we'll refer to generically as 904 virtualization hosts. 906 The mechanisms used to communicate between the policy engine and DC 907 controller, and between the DC controller and hypervisor/container, 908 are not relevant here and as such are not discussed further. 909 What is important is the interface and information exchange between 910 the Interconnect controller and the data center SDN functions: 912 * The Interconnect controller interfaces with the data center policy 913 engine and publishes the available colors, where each color 914 represents a topological service connectivity map that meets a set 915 of constraints and SLA objectives. This interface is a 916 straightforward API. 918 * The Interconnect controller interfaces with the DC controller to 919 learn overlay routes. This interface is BGP and uses the EVPN 920 Address Family. 922 With the above framework in place, automation of network and service 923 connectivity can be implemented as follows: 925 * The virtualization host is turned up. The NVO endpoint notifies 926 the DC controller of the startup. 928 * The DC controller retrieves service information, IP addressing 929 information, and service 'color' for the virtualization host from 930 the policy engine. The DC controller subsequently programs the 931 associated forwarding information on the virtualization host. 932 Since the DC controller is now aware of MAC and IP address 933 information for the virtualization host, it advertises that 934 information as an EVPN MAC Advertisement Route into the overlay. 936 * The Interconnect controller receives the EVPN MAC Advertisement 937 Route (potentially via a Route-Reflector) and correlates it with 938 locally held service information and SLA requirements using Route 939 Target and Color communities. If the relevant SR policies are not 940 already in place to support the service requirements and logical 941 connectivity, including any binding-SIDs, they are calculated and 942 advertised to the relevant headends.
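A highly simplified sketch of the final step of this workflow is shown below: on receipt of an EVPN MAC Advertisement Route, the Interconnect controller checks whether an SR Policy already exists for the corresponding [headend, color, endpoint] tuple and requests computation and programming if it does not. The route contents, names, and the compute_and_program hook are hypothetical.

   # Minimal sketch (Python) of the activation step described above.
   def on_evpn_mac_route(route, policies, compute_and_program):
       key = (route["headend"], route["color"], route["endpoint"])
       if key not in policies:
           # Hypothetical controller hook: computes the TE path, creates
           # any required BSID anchors, and programs the headend.
           policies[key] = compute_and_program(*key)
       return policies[key]

   policies = {}
   route = {"headend": "vnf1", "color": 100, "endpoint": "dcb2"}  # assumed
   on_evpn_mac_route(route, policies,
                     lambda h, c, e: [f"segment-list {h}->{e} color {c}"])
   print(policies)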
944 The same automated service activation principles can also be used to 945 support the scenario where virtualization hosts are moved between 946 hypervisors/container hosts for resourcing or other reasons. We 947 refer to this simply as mobility. If a virtualization host is turned 948 down, the parent NVO endpoint notifies the DC controller, which in 949 turn notifies the policy engine and withdraws any EVPN MAC 950 Advertisement Routes. Thereafter, all associated state is removed. 951 When the virtualization host is turned up on a different hypervisor/ 952 container host, the automated service connectivity process outlined 953 above is simply repeated. 955 5.8. Service Function Chaining 957 Service Function Chaining (SFC) defines an ordered set of abstract 958 service functions and the subsequent steering of traffic through 959 them. Packets are classified at ingress for processing by the 960 required set of service functions (SFs) in an SFC-capable domain and 961 are then forwarded through each SF in turn for processing. The 962 ability to dynamically construct SFCs containing the relevant SFs in 963 the right sequence is a key requirement for operators. 965 To enable flexible service function deployment models that support 966 agile service insertion, the NFIX architecture adopts the use of BGP 967 as the control plane to distribute SFC information. The BGP control 968 plane for Network Service Header (NSH) SFC 969 [I-D.ietf-bess-nsh-bgp-control-plane] is used for this purpose and 970 defines two route types: the Service Function Instance Route (SFIR) 971 and the Service Function Path Route (SFPR). 973 The SFIR is used to advertise the presence of a service function 974 instance (SFI) as a function type (e.g., firewall, TCP optimizer) and 975 is advertised by the node hosting that SFI. The SFIR is advertised 976 together with a BGP Tunnel Encapsulation attribute containing details 977 of how to reach that particular service function through the underlay 978 network (i.e., IP address and encapsulation information). 980 The SFPRs contain service function path (SFP) information, and one 981 SFPR is originated for each SFP. Each SFPR contains the service path 982 identifier (SPI) of the path, the sequence of service function types 983 that make up the path (each of which has at least one instance 984 advertised in an SFIR), and the service index (SI) for each listed 985 service function to identify its position in the path. 987 Once a Classifier has determined which flows should be mapped to a 988 given SFP, it imposes an NSH [RFC8300] on those packets, setting the 989 SPI to that of the selected service path (advertised in an SFPR), and 990 the SI to the first hop in the path. As NSH is encapsulation 991 agnostic, the NSH-encapsulated packet is then forwarded through the 992 appropriate tunnel to reach the service function forwarder (SFF) 993 supporting that service function instance (advertised in an SFIR). 994 The SFF removes the tunnel encapsulation and forwards the packet with 995 the NSH to the relevant SF based upon a lookup of the SPI/SI. When 996 it is returned from the SF with a decremented SI value, the SFF 997 forwards the packet to the next hop in the SFP using the tunnel 998 information advertised by that SFI. This procedure is repeated until 999 the last hop of the SFP is reached.
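A purely illustrative sketch of the SPI/SI lookup performed by an SFF is shown below; the path table contents, service function names, and tunnel identifiers are hypothetical and do not represent any particular implementation of [I-D.ietf-bess-nsh-bgp-control-plane].

   # Minimal sketch (Python): an SFF looks up the (SPI, SI) carried in the
   # NSH and forwards on the tunnel associated with the next SF instance.
   SFP_TABLE = {
       # (SPI, SI): (service function type, tunnel to the hosting SFF/SFI)
       (10, 255): ("firewall", "tunnel-to-sfi-1"),
       (10, 254): ("tcp-optimizer", "tunnel-to-sfi-2"),
   }

   def sff_forward(spi, si):
       entry = SFP_TABLE.get((spi, si))
       if entry is None:
           return "end of path: remove NSH and forward natively"
       sf_type, tunnel = entry
       return f"send to {sf_type} via {tunnel}"

   print(sff_forward(10, 255))   # first hop of SFP 10
   print(sff_forward(10, 254))   # after the SF decrements the SI
   print(sff_forward(10, 253))   # no further hops -> end of path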
1001 The use of the NSH in this manner allows for service chaining with 1002 topological and transport independence. It also allows for the 1003 deployment of SFIs in a condensed or dispersed fashion depending on 1004 operator preference or resource availability. Service function 1005 chains are built in their own overlay network and share a common 1006 underlay network, where that common underlay network is the NFIX 1007 fabric described in section 5.4. BGP updates containing an SFIR or 1008 SFPR are advertised in conjunction with one or more Route Targets 1009 (RTs), and each node in a service function overlay network is 1010 configured with one or more import RTs. As a result, nodes will only 1011 import routes that are applicable and that local policy dictates. 1012 This provides the ability to support multiple service function 1013 overlay networks or the construction of service function chains 1014 within L3VPN or EVPN services. 1016 Although SFCs are constructed in a unidirectional manner, the BGP 1017 control plane for NSH SFC allows for the optional association of 1018 multiple paths (SFPRs). This provides the ability to construct a 1019 bidirectional service function chain in the presence of multiple 1020 equal-cost paths between source and destination to avoid problems 1021 that SFs may suffer with traffic asymmetry. 1023 The proposed SFC model can be considered decoupled in that the use of 1024 SR as a transport between SFFs is completely independent of the use 1025 of NSH to define the SFC. That is, it uses an NSH-based SFC and SR 1026 is just one of many encapsulations that could be used between SFFs. 1027 A similar, more integrated approach proposes encoding a service 1028 function as a segment so that an SFC can be constructed as a segment- 1029 list. In this case, it can be considered an SR-based SFC with an NSH- 1030 based service plane since the SF is unaware of the presence of the 1031 SR. Functionally, both approaches are very similar, and as such both 1032 could be adopted and could work in parallel. Construction of SFCs 1033 based purely on SR (SF is SR-aware) is not considered at this time. 1035 5.9. Stability and Availability 1037 Any network architecture should have the capability to self-restore 1038 following the failure of a network element. The time to reconverge 1039 following the failure needs to be minimal to avoid noticeable 1040 disruptions in service. This section discusses protection mechanisms 1041 that are available for use and their applicability to the proposed 1042 architecture. 1044 5.9.1. IGP Reconvergence 1046 Within the construct of an IGP topology, the Topology Independent Loop 1047 Free Alternate (TI-LFA) [I-D.ietf-rtgwg-segment-routing-ti-lfa] can 1048 be used to provide a local repair mechanism that offers both link and 1049 node protection. 1051 TI-LFA is a repair mechanism, and as such it is reactive and 1052 initially needs to detect a given failure. To provide fast failure 1053 detection, Bidirectional Forwarding Detection (BFD) is used. 1054 Consideration needs to be given to the restoration capabilities of 1055 the underlying transmission when deciding values for message 1056 intervals and multipliers to avoid race conditions, but failure 1057 detection in the order of 50 milliseconds can reasonably be 1058 anticipated. Where Link Aggregation Groups (LAG) are used, micro-BFD 1059 [RFC7130] can be used to similar effect. Indeed, to allow for 1060 potential incremental growth in capacity, it is not uncommon for 1061 operators to provision all network links as LAG and use micro-BFD 1062 from the outset.
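As a simple worked example of the detection time mentioned above, BFD declares a failure after roughly the agreed transmit interval multiplied by the detect multiplier has elapsed without received control packets; the interval and multiplier below are illustrative values only, not recommendations.

   # Minimal sketch (Python): approximate BFD detection time.
   def bfd_detection_time_ms(tx_interval_ms, detect_multiplier):
       return tx_interval_ms * detect_multiplier

   # e.g., a 15 ms interval with a multiplier of 3 gives ~45 ms detection,
   # within the ~50 ms order of magnitude anticipated above.
   print(bfd_detection_time_ms(15, 3))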
5.9.2. Data Center Reconvergence

Clos fabrics are extremely common within data centers, and fundamental to a Clos fabric is the ability to load-balance using Equal Cost Multipath (ECMP). The number of ECMP paths will vary dependent on the number of devices in the parent tier but will never be less than two for redundancy purposes, with traffic hashed over the available paths. In this scenario the availability of a backup path in the event of failure is implicit. Commonly within the DC, rather than computing protect paths (like LFA), techniques such as 'fast rehash' are often utilized. In this case the failed next-hop is removed from the multi-path forwarding data structure and traffic is then rehashed over the remaining active paths.

In BGP-only data centers this relies on the implementation of BGP multipath. As network elements in the lower tier of a Clos fabric will frequently belong to different ASNs, this includes the ability to load-balance to a prefix with different AS_PATH attribute values while having the same AS_PATH length; sometimes referred to as 'multipath relax' or 'multipath multiple-AS' [RFC7938].

Failure detection relies upon declaring a BGP session down and removing any prefixes learnt over that session as soon as the link is declared down. As links between network elements predominantly use direct point-to-point fiber, a link failure should be detected within milliseconds. BFD is also commonly used to detect IP layer failures.

5.9.3. Exchange of Inter-Domain Routes

Labeled unicast BGP together with the SR Prefix-SID extensions is used to exchange PNF and/or VNF endpoints between domains to create end-to-end connectivity without TE. When advertising between domains we assume that a given BGP prefix is advertised by at least two border routers (DCBs, ABRs, ASBRs), making prefixes reachable via at least two next-hops.

BGP Prefix Independent Convergence (PIC) [I-D.ietf-rtgwg-bgp-pic] allows failover to a pre-computed and pre-installed secondary next-hop when the primary next-hop fails and is independent of the number of destination prefixes that are affected by the failure. It should be clear that BGP PIC depends on the availability of a secondary next-hop in the pathlist when the primary BGP next-hop fails. To ensure that multiple paths to the same destination are visible, BGP ADD-PATH [RFC7911] can be used to allow for advertisement of multiple paths for the same address prefix. Dual-homed EVPN/IP-VPN prefixes also have the alternative option of allocating different Route-Distinguishers (RDs). To trigger the switch from primary to secondary next-hop, PIC needs to detect the failure, and many implementations support 'next-hop tracking' for this purpose. Next-hop tracking monitors the routing-table and, if the next-hop prefix is removed, will immediately invalidate all BGP prefixes learnt through that next-hop. In the absence of next-hop tracking, multihop BFD [RFC5883] could optionally be used as a fast failure detection mechanism.
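The value of PIC lies in the shared, hierarchical forwarding structure: many BGP prefixes reference a single pathlist, so a single next-hop failure is repaired with one operation rather than with per-prefix updates. The following non-normative Python sketch illustrates that principle only; the data structures and method names are hypothetical.

   # Illustrative sketch of the BGP PIC principle: prefixes share a
   # pathlist, so convergence cost is independent of prefix count.

   class PathList:
       def __init__(self, primary_nh: str, backup_nh: str):
           self.primary_nh = primary_nh
           self.backup_nh = backup_nh
           self.active_nh = primary_nh

       def next_hop_down(self, failed_nh: str):
           # One update repairs forwarding for every prefix that
           # references this pathlist.
           if failed_nh == self.active_nh:
               self.active_nh = self.backup_nh

   class Fib:
       def __init__(self):
           self.pathlists = {}   # (primary, backup) -> shared PathList
           self.prefixes = {}    # prefix -> PathList reference

       def install(self, prefix: str, primary_nh: str, backup_nh: str):
           key = (primary_nh, backup_nh)
           pl = self.pathlists.setdefault(key, PathList(primary_nh, backup_nh))
           self.prefixes[prefix] = pl

       def next_hop_down(self, failed_nh: str):
           # Triggered by next-hop tracking or multihop BFD.
           for pl in self.pathlists.values():
               pl.next_hop_down(failed_nh)

   fib = Fib()
   for i in range(1000):
       fib.install(f"10.0.{i // 256}.{i % 256}/32", "DCB1", "DCB2")
   fib.next_hop_down("DCB1")   # all 1000 prefixes now resolve via DCB2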
5.9.4. Controller Redundancy

With the Interconnect controller providing an integral part of the network's capabilities, a redundant controller design is clearly prudent. To this end we can consider both availability and redundancy. Availability refers to the survivability of a single controller system in a failure scenario. A common strategy for increasing the availability of a single controller system is to build the system as a high-availability cluster such that it becomes a confederation of redundant constituent parts as opposed to a single monolithic system. Should a single part fail, the system can still survive without the requirement to fail over to a standby controller system. Methods for detection of a failure of one or more member parts of the cluster are implementation specific.

To provide contingency for a complete system failure a geo-redundant standby controller system is required. When redundant controllers are deployed a coherent strategy is needed that provides a master/standby election mechanism, the ability to propagate the outcome of that election to network elements as required, an inter-system failure detection mechanism, and the ability to synchronize state across both systems such that the standby controller is fully aware of current state should it need to transition to master controller.

Master/standby election, state synchronization, and failure detection between geo-redundant sites can largely be considered a local implementation matter. The requirement to propagate the outcome of the master/standby election to network elements depends on a) the mechanism that is used to instantiate SR policies, and b) whether the SR policies are controller-initiated or headend-initiated; these are discussed in the following sub-sections. In either scenario, the state of SR policies should be advertised northbound to both the master and standby controllers using either PCEP LSP State Report messages or the SR policy extensions to BGP link-state [I-D.ietf-idr-te-lsp-distribution].

5.9.4.1. SR Policy Initiator

Controller-initiated SR policies are suited for auto-creation of tunnels based on service route discovery and policy-driven route/flow programming, and are ephemeral. Headend-initiated tunnels allow for permanent configuration state to be held on the headend and are suitable for static services that are not subject to dynamic changes. If all SR policies are controller-initiated, it negates the requirement to propagate the outcome of the master/standby election to network elements. This is because headends have no requirement to make unsolicited requests to a controller, and therefore have no requirement to know which controller is master and which one is standby. A headend may respond to a message from a controller, but that response is not unsolicited.

If some or all SR policies are headend-initiated, then the requirement to propagate the outcome of the master/standby election exists. This is further discussed in the following sub-section.
5.9.4.2. SR Policy Instantiation Mechanism

While candidate paths of SR policies may be provided using BGP, PCEP, Netconf, or local policy/configuration, this document primarily considers the use of PCEP or BGP.

When PCEP [RFC5440][RFC8231][RFC8281] is used for instantiation of candidate paths of SR policies [I-D.barth-pce-segment-routing-policy-cp], every headend/PCC should establish a PCEP session with both the master and standby controllers. To signal standby state to the PCC, the standby controller may use a PCEP Notification message to set the PCEP session into overload state. While in this overload state the standby controller will accept path computation LSP state report (PCRpt) messages without delegation, but will reject path computation requests (PCReq) and any PCRpt messages with the delegation bit set. Further, the standby controller will not originate path computation initiate messages (PCInitiate) or path computation update request messages (PCUpd). In the event of the failure of the master controller, the standby controller will transition to active and remove the PCEP overload state. Following expiration of the PCEP redelegation timeout at the PCC, any LSPs will be redelegated to the newly transitioned active controller. LSP state is not impacted unless redelegation is not possible before the state timeout interval expires.

When BGP is used for instantiation of SR policies, every headend should establish a BGP session with the master and standby controllers capable of exchanging the SR TE Policy SAFI. Candidate paths of SR policies are advertised only by the active controller. If the master controller should experience a failure, then SR policies learnt from that controller may be removed before they are re-advertised by the standby (or newly active) controller. To minimize this possibility, BGP speakers that advertise and instantiate SR policies can implement Long-Lived Graceful Restart (LLGR) [I-D.ietf-idr-long-lived-gr], also known as BGP persistence, to retain existing routes, treated as least-preferred, until the new route arrives. In the absence of LLGR, two other alternatives are possible, as sketched after this list:

*  Provide a static backup SR policy.

*  Fall back to the default forwarding path.
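The following non-normative Python sketch illustrates one way a headend might realize the fallback behaviour above, using the usual SR policy rule of selecting the valid candidate path with the highest preference and reverting to the default forwarding path (e.g., the BGP-LU derived path) when no candidate path remains. The structures, origins, and preference values are assumptions for illustration, not a definitive implementation.

   # Illustrative headend fallback logic for an SR policy when the
   # controller-signalled candidate path is lost and LLGR is not in use.
   # Names and preference values are hypothetical.

   from dataclasses import dataclass
   from typing import List, Optional

   @dataclass
   class CandidatePath:
       origin: str             # "controller-bgp", "static-config", ...
       preference: int         # higher is preferred
       segment_list: List[int]
       valid: bool = True      # e.g., first SID resolvable

   def select_active_path(paths: List[CandidatePath]) -> Optional[CandidatePath]:
       valid = [p for p in paths if p.valid]
       if not valid:
           return None         # fall back to the default forwarding path
       return max(valid, key=lambda p: p.preference)

   paths = [
       CandidatePath("controller-bgp", preference=200, segment_list=[16005, 24001]),
       CandidatePath("static-config", preference=50, segment_list=[16008]),
   ]
   print(select_active_path(paths).origin)   # controller-bgp

   # Master controller fails and its route is withdrawn without LLGR:
   paths[0].valid = False
   active = select_active_path(paths)
   print(active.origin if active else "default forwarding path")   # static-config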
5.9.5. Path and Segment Liveliness

When using traffic-engineered SR paths, only the ingress router holds any state. The exception here is where BSIDs are used, which also implies some state is maintained at the BSID anchor. As there is no control plane set-up, it follows that there is no feedback loop from transit nodes of the path to notify the headend when a non-adjacent point of the SR path fails. The Interconnect controller, however, is aware of all paths that are impacted by a given network failure and should take the appropriate action. This action could include withdrawing an SR policy if a suitable candidate path is already in place, or simply sending a new SR policy with a different segment-list and a higher preference value assigned to it.

Verification of data plane liveliness is the responsibility of the path headend. A given SR policy may be associated with multiple candidate paths; for the sake of clarity, we'll assume two for redundancy purposes (which can be diversely routed). Verification of the liveliness of these paths can be achieved using seamless BFD (S-BFD) [RFC7880], which provides an in-band failure detection mechanism capable of detecting failure in the order of tens of milliseconds. Upon failure of the active path, failover to a secondary candidate path can be activated at the path headend. Details of the actual failover and revert mechanisms are a local implementation matter.

S-BFD provides a fast and scalable failure detection mechanism but is unlikely to be implemented in many VNFs given their inability to offload the process to purpose-built hardware. In the absence of an active failure detection mechanism such as S-BFD, the failover from active path to secondary candidate path can be triggered using continuous path validity checks. One of the criteria that a candidate path uses to determine its validity is the ability to perform path resolution for the first SID to one or more outgoing interface(s) and next-hop(s). From the perspective of the VNF headend, the first SID in the segment-list will very likely be the DCB (as BSID anchor) but could equally be another Prefix-SID hop within the data center. Should this segment experience a non-recoverable failure, the headend will be unable to resolve the first SID and the path will be considered invalid. This will trigger a failover action to a secondary candidate path.

Injection of S-BFD packets is not just constrained to the source of an end-to-end LSP. When an S-BFD packet is injected into an SR policy path it is encapsulated with the label stack of the associated segment-list. It is therefore possible to run S-BFD from a BSID anchor for just that section of the end-to-end path (for example, from DCB to DCB). This allows a BSID anchor to detect failure of a path and take corrective action, while maintaining opacity between domains.
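Recapping the validity criterion described above for VNF headends that cannot run S-BFD, the following non-normative Python sketch shows a first-SID resolvability check. The 'rib' mapping and SID values are hypothetical placeholders rather than a defined API.

   # Illustrative validity check for a VNF headend without S-BFD: a
   # candidate path is valid only while its first SID resolves to at
   # least one outgoing interface/next-hop.

   def candidate_path_valid(segment_list, rib) -> bool:
       first_sid = segment_list[0]
       next_hops = rib.get(first_sid, [])
       return len(next_hops) > 0

   # Hypothetical resolution table: SID -> list of resolved next-hops.
   rib = {16001: ["to-DCB1"], 24001: []}
   print(candidate_path_valid([16001, 24001, 16008], rib))   # True
   print(candidate_path_valid([24001, 16008], rib))          # False -> fail over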
5.10. Scalability

There are many aspects to consider regarding scalability of the NFIX architecture. The building blocks of NFIX are standards-based technologies individually designed to scale for internet provider networks. When combined they provide a flexible and scalable solution:

*  BGP has been proven to scale and operate with millions of routes being exchanged. Specifically, BGP labeled unicast has been deployed and proven to scale in existing seamless-MPLS networks.

*  By placing forwarding instructions in the header of a packet, segment routing reduces the amount of state required in the network, allowing it to scale to a greater number of transport tunnels. This aids the feasibility of the NFIX architecture by permitting the automated aspects of SR policy creation without having an impact on the state held in the core of the network.

*  The choice of utilizing native SR-MPLS or SR over IP in the data center continues to permit horizontal scaling without introducing new state inside of the data center fabric, while still permitting seamless end-to-end path forwarding integration.

*  BSIDs play a key role in the NFIX architecture as their use provides the ability to traffic-engineer across large network topologies consisting of many hops regardless of hardware capability at the headend. From a scalability perspective the use of BSIDs facilitates better scale because detailed information about the SR paths in a domain is abstracted and localized to the BSID anchor point only. When BSIDs are re-used amongst one or many headends they reduce the amount of path calculation and updates required at network edges while still providing seamless end-to-end path forwarding.

*  The NFIX architecture continues to use an independent DC controller. This allows continued independent scaling of data center management in both policy and local forwarding functions, while off-loading the end-to-end optimal path placement and automation to the Interconnect controller. Optimal path placement is already a scalable function provided in a PCE architecture. The Interconnect controller must compute paths, but it is not burdened by the management of virtual entity lifecycle and associated forwarding policies.

It must be acknowledged that with the amalgamation of the technology building blocks and the automation required by NFIX, there is an additional burden on the Interconnect controller. The scaling considerations are dependent on many variables, but an implementation of an Interconnect controller shares many overlapping traits and scaling concerns with a PCE, as the controller and PCE both must:

*  Discover and listen to topological state changes of the IP/MPLS topology.

*  Compute traffic-engineered intra-domain and inter-domain paths across large service provider topologies.

*  Synchronize, track and update thousands of LSPs to network devices upon network state changes.

Both entail topologies that contain tens of thousands of nodes and links. The Interconnect controller in an NFIX architecture takes on the additional role of becoming end-to-end service aware and discovering data center entities that were traditionally excluded from a controller's scope. Although not exhaustive, an NFIX Interconnect controller is impacted by factors including the following:

*  The number of individual services, the number of endpoints that may exist in each service, the distribution of endpoints in a virtualized environment, and how many data centers may exist. Medium or large-sized data centers may be capable of hosting more virtual endpoints per host, but with the move to smaller edge-clouds the number of headends that require inter-connectivity increases compared to the density of localized routing in a centralized data center model. The outcome has an impact on the number of headend devices which may require tunnel management by the Interconnect controller.

*  Assuming a given BSID satisfies the SLA, the ability to re-use BSIDs across multiple services reduces the number of paths to track and manage. However, the number of colors or unique SLA definitions, and criteria such as bandwidth constraints, impact WAN traffic distribution requirements. As BSIDs play a key role for VNF connectivity, this potentially increases the number of BSID paths required to permit appropriate traffic distribution. This also impacts the number of tunnels which may be re-used on a given headend for different services.

*  The frequency of virtualized hosts being created and destroyed and the general activity within a given service.
The controller must 1359 analyze, track, and correlate the activity of relevant BGP routes 1360 to track addition and removal of service host or host subnets, and 1361 determine whether new SR policies should be instantiated, or stale 1362 unused SR policies should be removed from the network. 1364 * The choice of SR instantiation mechanism impacts the number of 1365 communication sessions the controller may require. For example, 1366 the BGP based mechanism may only require a small number of 1367 sessions to route reflectors, whereas PCEP may require a 1368 connection to every possible leaf in the network and any BSID 1369 anchors. 1371 * The number of hops within one or many WAN domains may affect the 1372 number of BSIDs required to provide transit for VNF/PNF, PNF/PNF, 1373 or VNF/VNF inter-connectivity. 1375 * Relative to traditional WAN topologies, traditional data centers 1376 are generally topologically denser in node and link connectivity 1377 which is required to be discovered by the Interconnect controller, 1378 resulting in a much larger, dense link-state database on the 1379 Interconnect controller. 1381 5.10.1. Asymmetric Model B for VPN Families 1383 With the instantiation of multiple TE paths between any two VNFs in 1384 the NFIX network, the number of SR Policy (remote endpoint, color) 1385 routes, BSIDs and labels to support on VNFs becomes a choke point in 1386 the architecture. The fact that some VNFs are limited in terms of 1387 forwarding resources makes this aspect an important scale issue. 1389 As an example, if VNF1 and VNF2 in Figure 1 are associated to 1390 multiple topologies 1..n, the Interconnect controller will 1391 instantiate n TE paths in VNF1 to reach VNF2: 1393 [VNF1,color-1,VNF2] --> BSID 1 1395 [VNF1,color-2,VNF2] --> BSID 2 1397 ... 1399 [VNF1,color-n,VNF2] --> BSID n 1401 Similarly, m TE paths may be instantiated on VNF1 to reach VNF3, 1402 another p TE paths to reach VNF4, and so on for all the VNFs that 1403 VNF1 needs to communicate with in DC2. As it can be observed, the 1404 number of forwarding resources to be instantiated on VNF1 may 1405 significantly grow with the number of remote [endpoint, color] pairs, 1406 compared with a best-effort architecture in which the number 1407 forwarding resources in VNF1 grows with the number of endpoints only. 1409 This scale issue on the VNFs can be relieved by the use of an 1410 asymmetric model B service layer. The concept is illustrated in 1411 Figure 3. 1413 +------------+ 1414 <-------------------------------------| WAN | 1415 | SR Policy +-------------------| Controller | 1416 | BSID m | SR Policy +------------+ 1417 v {DCI1,n,DCI2} v BSID n 1418 {1,2,3,4,5,DCI2} 1419 +----------------+ +----------------+ +----------------+ 1420 | +----+ | | | | +----+ | 1421 +----+ | RR | +----+ +----+ | RR | +----+ 1422 |VNF1| +----+ |DCI1| |DCI2| +----+ |VNF2| 1423 +----+ +----+ +----+ +----+ 1424 | DC1 | | WAN | | DC2 | 1425 +----------------+ +----------------+ +----------------+ 1427 <-------- <-------------------------- NHS <------ <------ 1428 EVPN/VPN-IPv4/v6(colored) 1430 +-----------------------------------> +-------------> 1431 TE path to DCI2 ECMP path to VNF2 1432 (BSID to segment-list 1433 expansion on DCI1) 1435 Figure 4 1437 Asymmetric Model B Service Layer 1439 Consider the different n topologies needed between VNF1 and VNF2 are 1440 really only relevant to the different TE paths that exist in the WAN. 
1441 The WAN is the domain in the network where there can be significant 1442 differences in latency, throughput or packet loss depending on the 1443 sequence of nodes and links the traffic goes through. Based on that 1444 assumption, for traffic from VNF1 to DCB2 in Figure 4, traffic from 1445 DCB2 to VNF2 can simply take an ECMP path. In this case an 1446 asymmetric model B Service layer can significantly relieve the scale 1447 pressure on VNF1. 1449 From a service layer perspective, the NFIX architecture described up 1450 to now can be considered 'symmetric', meaning that the EVPN/IPVPN 1451 advertisements from e.g., VNF2 in Figure 2, are received on VNF1 with 1452 the next-hop of VNF2, and vice versa for VNF1's routes on VNF2. SR 1453 Policies to each VNF2 [endpoint, color] are then required on the 1454 VNF1. 1456 In the 'asymmetric' service design illustrated in Figure 4, VNF2's 1457 EVPN/IPVPN routes are received on VNF1 with the next-hop of DCB2, and 1458 VNF1's routes are received on VNF2 with next-hop of DCB1. Now SR 1459 policies instantiated on VNFs can be reduced to only the number of TE 1460 paths required to reach the remote DCB. For example, considering n 1461 topologies, in a symmetric model VNF1 has to be instantiated with n 1462 SR policy paths per remote VNF in DC2, whereas in the asymmetric 1463 model of Figure 4, VNF1 only requires n SR policy paths per DC, i.e., 1464 to DCB2. 1466 Asymmetric model B is a simple design choice that only requires the 1467 ability (on the DCB nodes) to set next-hop-self on the EVPN/IPVPN 1468 routes advertised to the WAN neighbors and not do next-hop-self for 1469 routes advertised to the DC neighbors. With this option, the 1470 Interconnect controller only needs to establish TE paths from VNFs to 1471 remote DCBs, as opposed to VNFs to remote VNFs. 1473 6. Illustration of Use 1475 For the purpose of illustration, this section provides some examples 1476 of how different end-to-end tunnels are instantiated (including the 1477 relevant protocols, SID values/label stacks etc.) and how services 1478 are then overlaid onto those LSPs. 1480 6.1. Reference Topology 1482 The following network diagram illustrates the reference network 1483 topology that is used for illustration purposes in this section. 1484 Within the data centers leaf and spine network elements may be 1485 present but are not shown for the purpose of clarity. 1487 +----------+ 1488 |Controller| 1489 +----------+ 1490 / | \ 1491 +----+ +----+ +----+ +----+ 1492 ~ ~ ~ ~ | R1 |----------| R2 |----------| R3 |-----|AGN1| ~ ~ ~ ~ 1493 ~ +----+ +----+ +----+ +----+ ~ 1494 ~ DC1 | / | | DC2 ~ 1495 +----+ | L=5 +----+ L=5 / | +----+ +----+ 1496 | Sn | | +-------| R4 |--------+ | |AGN2| | Dn | 1497 +----+ | / M=20 +----+ M=20 | +----+ +----+ 1498 ~ | / | | ~ 1499 ~ +----+ +----+ +----+ +----+ +----+ ~ 1500 ~ ~ ~ ~ | R5 |-----| R6 |----| R7 |-----| R8 |-----|AGN3| ~ ~ ~ ~ 1501 +----+ +----+ +----+ +----+ +----+ 1503 Figure 5 1505 Reference Topology 1507 The following applies to the reference topology in figure 5: 1509 * Data center 1 and data center 2 both run BGP/SR. Both data 1510 centers run leaf/spine topologies, which are not shown for the 1511 purpose of clarity. 1513 * R1 and R5 function as data center border routers for DC 1. AGN1 1514 and AGN3 function as data center border routers for DC 2. 1516 * Routers R1 through R8 form an independent ISIS-OSPF/SR instance. 1518 * Routers R3, R8, AGN1, AGN2, and AGN2 form an independent ISIS- 1519 OSPF/SR instance. 
*  All IGP link metrics within the wide-area network are metric 10, except for links R5-R4 and R4-R3 which are both metric 20.

*  All links have a unidirectional latency of 10 milliseconds, except for links R5-R4 and R4-R3 which both have a unidirectional latency of 5 milliseconds.

*  Source 'Sn' and destination 'Dn' represent one or more network functions.

6.2. PNF to PNF Connectivity

The first example demonstrates the simplest form of connectivity: PNF to PNF. The example illustrates the instantiation of a unidirectional TE path from R1 to AGN2 and its consumption by an EVPN service. The service has a requirement for high throughput with no strict latency requirements. These service requirements are catalogued and represented using the color blue.

*  An EVPN service is provisioned at R1 and AGN2.

*  The Interconnect controller computes the path from R1 to AGN2 and calculates that the optimal path, based on the service requirements and overall network optimization, is R1-R5-R6-R7-R8-AGN3-AGN2. The segment-list to represent the calculated path could be constructed in numerous ways. It could be strict hops represented by a series of Adj-SIDs. It could be loose hops using ECMP-aware Node-SIDs, for example {R7, AGN2}, or it could be a combination of both Node-SIDs and Adj-SIDs. Equally, BSIDs could be used to reduce the number of labels that need to be imposed at the headend. In this example, strict Adj-SID hops are used with a BSID at the area border router R8, but this should not be interpreted as the only way a path and segment-list can be represented.

*  The Interconnect controller advertises a BGP SR Policy to R8 with BSID 1000, and a segment-list containing segments {AGN3, AGN2}.

*  The Interconnect controller advertises a BGP SR Policy to R1 with BSID 1001, and a segment-list containing segments {R5, R6, R7, R8, 1000}. The policy is identified using the tuple [headend = R1, color = blue, endpoint = AGN2].

*  AGN2 advertises an EVPN MAC Advertisement Route for MAC M1, which is learned by R1. The route has a next-hop of AGN2, an MPLS label of L1, and it carries a color extended community with the value blue.

*  R1 has a valid SR policy [color = blue, endpoint = AGN2] with segment-list {R5, R6, R7, R8, 1000}. R1 therefore associates the MAC address M1 with that policy and programs the relevant information into the forwarding path.

*  The Interconnect controller also learns the EVPN MAC Route advertised by AGN2. The purpose of this is two-fold. It allows the controller to correlate the service overlay with the underlying transport LSPs, thus creating a service connectivity map. It also allows the controller to dynamically create LSPs based upon service requirements if they do not already exist, or to optimize them if network conditions change.
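The forwarding outcome of this example can be summarized as a simple association step at the headend: the (color, next-hop) of the received service route is matched against installed SR policies keyed on (color, endpoint). The following non-normative Python sketch illustrates that association using the values from the example above; the data structures are hypothetical.

   # Illustrative association of a colored service route with an SR
   # policy at the headend (values taken from the example above).

   sr_policies = {
       # (color, endpoint) -> installed SR policy state at R1
       ("blue", "AGN2"): {"bsid": 1001,
                          "segment_list": ["R5", "R6", "R7", "R8", 1000]},
   }

   def bind_service_route(route, policies):
       # An EVPN/IP-VPN route carrying a color extended community is
       # steered into the SR policy matching (color, BGP next-hop).
       policy = policies.get((route["color"], route["next_hop"]))
       if policy is None:
           return {"route": route, "transport": "default (BGP-LU) path"}
       return {"route": route, "transport": policy["segment_list"],
               "service_label": route["label"]}

   mac_m1 = {"type": "evpn-mac", "mac": "M1", "next_hop": "AGN2",
             "label": "L1", "color": "blue"}
   print(bind_service_route(mac_m1, sr_policies))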
6.3. VNF to PNF Connectivity

The next example demonstrates VNF to PNF connectivity and illustrates the instantiation of a unidirectional TE path from S1 to AGN2. The path is consumed by an IP-VPN service that has a basic set of service requirements and as such simply uses IGP metric as a path computation objective. These basic service requirements are cataloged and represented using the color red.

In this example S1 is a VNF with full IP routing and MPLS capability that interfaces to the data center underlay/overlay and serves as the NVO tunnel endpoint.

*  An IP-VPN service is provisioned at S1 and AGN2.

*  The Interconnect controller computes the path from S1 to AGN2 and calculates that the optimal path based on IGP metric is R1-R2-R3-AGN1-AGN2.

*  The Interconnect controller advertises a BGP SR Policy to R1 with BSID 1002, and a segment-list containing segments {R2, R3, AGN1, AGN2}.

*  The Interconnect controller advertises a BGP SR Policy to S1 with BSID 1003, and a segment-list containing segments {R1, 1002}. The policy is identified using the tuple [headend = S1, color = red, endpoint = AGN2].

*  Source S1 learns a VPN-IPv4 route for prefix P1, next-hop AGN2. The route has a VPN label of L1, and it carries a color extended community with the value red.

*  S1 has a valid SR policy [color = red, endpoint = AGN2] with segment-list {R1, 1002} and BSID 1003. S1 therefore associates the VPN-IPv4 prefix P1 with that policy and programs the relevant information into the forwarding path.

*  As in the previous example, the Interconnect controller also learns the VPN-IPv4 route advertised by AGN2 in order to correlate the service overlay with the underlying transport LSPs, creating or optimizing them as required.

6.4. VNF to VNF Connectivity

The last example demonstrates VNF to VNF connectivity and illustrates the instantiation of a unidirectional TE path from S2 to D2. The path is consumed by an EVPN service that requires low latency as a service requirement and as such uses latency as a path computation objective. This service requirement is cataloged and represented using the color green.

In this example S2 is a VNF that has no routing capability. It is hosted by hypervisor H1, which in turn has an interface to a DC controller through which forwarding instructions are programmed. H1 serves as the NVO tunnel endpoint and overlay next-hop.

D2 is a VNF with partial routing capability that is connected to a leaf switch L1. L1 connects to the underlay/overlay in data center 2 and serves as the NVO tunnel endpoint for D2. L1 advertises BGP Prefix-SID 9001 into the underlay.

*  The relevant details of the EVPN service are entered in the data center policy engines within data centers 1 and 2.

*  Source S2 is turned up. Hypervisor H1 notifies its parent DC controller, which in turn retrieves the service (EVPN) information, color, IP, and MAC information from the policy engine and subsequently programs the associated forwarding entries onto S2. The DC controller also dynamically advertises an EVPN MAC Advertisement Route for S2's IP and MAC into the overlay with next-hop H1. (This would trigger the return path set-up between L1 and H1, which is not covered in this example.)

*  The DC controller in data center 1 learns an EVPN MAC Advertisement Route for D2, MAC M, next-hop L1. The route has an MPLS label of L2, and it carries a color extended community with the value green.

*  The Interconnect controller computes the path between H1 and L1 and calculates that the optimal path based on latency is R5-R4-R3-AGN1.
*  The Interconnect controller advertises a BGP SR Policy to R5 with BSID 1004, and a segment-list containing segments {R4, R3, AGN1}.

*  The Interconnect controller advertises a BGP SR Policy to the DC controller in data center 1 with BSID 1005 and a segment-list containing segments {R5, 1004, 9001}. The policy is identified using the tuple [headend = H1, color = green, endpoint = L1].

*  The DC controller in data center 1 has a valid SR policy [color = green, endpoint = L1] with segment-list {R5, 1004, 9001} and BSID 1005. The controller therefore associates the MAC Advertisement Route with that policy and programs the associated forwarding rules into S2.

*  As in the previous example, the Interconnect controller also learns the MAC Advertisement Route for D2 in order to correlate the service overlay with the underlying transport LSPs, creating or optimizing them as required.

7. Conclusions

The NFIX architecture provides an evolutionary path to a unified network fabric. It uses the base constructs of seamless-MPLS and adds end-to-end LSPs capable of delivering against SLAs, seamless data center interconnect, service differentiation, service function chaining, and a Layer-2/Layer-3 infrastructure capable of interconnecting PNF-to-PNF, PNF-to-VNF, and VNF-to-VNF.

NFIX establishes a dynamic, seamless, and automated connectivity model that overcomes the operational barriers and interworking issues between data centers and the wide-area network and delivers the following using standards-based protocols:

*  A unified routing control plane: Multiprotocol BGP (MP-BGP) to acquire inter-domain NLRI from the IP/MPLS underlay and the virtualized IP-VPN/EVPN service overlay.

*  A unified forwarding control plane: SR provides dynamic service tunnels with fast restoration options to meet deterministic bandwidth, latency and path diversity constraints. SR utilizes the appropriate data path encapsulation for seamless, end-to-end connectivity between distributed edge and core data centers across the wide-area network.

*  Service Function Chaining: Leverages SFC extensions for BGP and segment routing to interconnect network and service functions into SFPs, with support for various data path implementations.

*  Service Differentiation: Provides a framework that allows for construction of logical end-to-end networks with differentiated logical topologies and/or constraints through use of SR policies and coloring.

*  Automation: Facilitates automation of service provisioning and avoids heavy service interworking at DCBs.

NFIX is deployable on existing data center and wide-area network infrastructures and allows the underlying data forwarding plane to evolve with minimal impact on the services plane.

8. Security Considerations

The NFIX architecture based on SR-MPLS is subject to the same security concerns as any MPLS network. No new protocols are introduced; hence security issues of the protocols encompassed by this architecture are addressed within the relevant individual standards documents. It is recommended that the security framework for MPLS and GMPLS networks defined in [RFC5920] is adhered to.
1731 Although [RFC5920] focuses on the use of RSVP-TE and LDP control 1732 plane, the practices and procedures are extendable to an SR-MPLS 1733 domain. 1735 The NFIX architecture makes extensive use of Multiprotocol BGP, and 1736 it is recommended that the TCP Authentication Option (TCP-AO) 1737 [RFC5925] is used to protect the integrity of long-lived BGP sessions 1738 and any other TCP-based protocols. 1740 Where PCEP is used between controller and path headend the use of 1741 PCEPS [RFC8253] is recommended to provide confidentiality to PCEP 1742 communication using Transport Layer Security (TLS). 1744 9. Acknowledgements 1746 The authors would like to acknowledge Mustapha Aissaoui, Wim 1747 Henderickx, and Gunter Van de Velde. 1749 10. Contributors 1751 The following people contributed to the content of this document and 1752 should be considered co-authors. 1754 Juan Rodriguez 1755 Nokia 1756 United States of America 1758 Email: juan.rodriguez@nokia.com 1760 Jorge Rabadan 1761 Nokia 1762 United States of America 1764 Email: jorge.rabadan@nokia.com 1766 Nick Morris 1767 Verizon 1768 United States of America 1770 Email: nicklous.morris@verizonwireless.com 1772 Eddie Leyton 1773 Verizon 1774 United States of America 1776 Email: edward.leyton@verizonwireless.com 1778 Figure 6 1780 11. IANA Considerations 1782 This memo does not include any requests to IANA for allocation. 1784 12. References 1786 12.1. Normative References 1788 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1789 Requirement Levels", BCP 14, RFC 2119, March 1997, 1790 . 1792 [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 1793 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, 1794 May 2017, . 1796 12.2. Informative References 1798 [I-D.ietf-nvo3-geneve] 1799 Gross, J., Ganga, I., and T. Sridhar, "Geneve: Generic 1800 Network Virtualization Encapsulation", Work in Progress, 1801 Internet-Draft, draft-ietf-nvo3-geneve-16, 7 March 2020, 1802 . 1805 [I-D.ietf-mpls-seamless-mpls] 1806 Leymann, N., Decraene, B., Filsfils, C., Konstantynowicz, 1807 M., and D. Steinberg, "Seamless MPLS Architecture", Work 1808 in Progress, Internet-Draft, draft-ietf-mpls-seamless- 1809 mpls-07, 28 June 2014, . 1812 [I-D.ietf-bess-evpn-ipvpn-interworking] 1813 Rabadan, J., Sajassi, A., Rosen, E., Drake, J., Lin, W., 1814 Uttaro, J., and A. Simpson, "EVPN Interworking with 1815 IPVPN", Work in Progress, Internet-Draft, draft-ietf-bess- 1816 evpn-ipvpn-interworking-06, 22 September 2021, 1817 . 1820 [I-D.ietf-spring-segment-routing-policy] 1821 Filsfils, C., Talaulikar, K., Voyer, D., Bogdanov, A., and 1822 P. Mattes, "Segment Routing Policy Architecture", Work in 1823 Progress, Internet-Draft, draft-ietf-spring-segment- 1824 routing-policy-14, 25 October 2021, 1825 . 1828 [I-D.ietf-rtgwg-segment-routing-ti-lfa] 1829 Litkowski, S., Bashandy, A., Filsfils, C., Francois, P., 1830 Decraene, B., and D. Voyer, "Topology Independent Fast 1831 Reroute using Segment Routing", Work in Progress, 1832 Internet-Draft, draft-ietf-rtgwg-segment-routing-ti-lfa- 1833 07, 29 June 2021, . 1836 [I-D.ietf-bess-nsh-bgp-control-plane] 1837 Farrel, A., Drake, J., Rosen, E., Uttaro, J., and L. 1838 Jalil, "BGP Control Plane for the Network Service Header 1839 in Service Function Chaining", Work in Progress, Internet- 1840 Draft, draft-ietf-bess-nsh-bgp-control-plane-18, 21 August 1841 2020, . 1844 [I-D.ietf-idr-te-lsp-distribution] 1845 Previdi, S., Talaulikar, K., Dong, J., Chen, M., Gredler, 1846 H., and J. 
Tantsura, "Distribution of Traffic Engineering 1847 (TE) Policies and State using BGP-LS", Work in Progress, 1848 Internet-Draft, draft-ietf-idr-te-lsp-distribution-16, 22 1849 October 2021, . 1852 [I-D.barth-pce-segment-routing-policy-cp] 1853 Koldychev, M., Sivabalan, S., Barth, C., Peng, S., and H. 1854 Bidgoli, "PCEP extension to support Segment Routing Policy 1855 Candidate Paths", Work in Progress, Internet-Draft, draft- 1856 barth-pce-segment-routing-policy-cp-06, 2 June 2020, 1857 . 1860 [I-D.filsfils-spring-sr-policy-considerations] 1861 Filsfils, C., Talaulikar, K., Krol, P., Horneffer, M., and 1862 P. Mattes, "SR Policy Implementation and Deployment 1863 Considerations", Work in Progress, Internet-Draft, draft- 1864 filsfils-spring-sr-policy-considerations-08, 22 October 1865 2021, . 1868 [I-D.ietf-rtgwg-bgp-pic] 1869 Bashandy, A., Filsfils, C., and P. Mohapatra, "BGP Prefix 1870 Independent Convergence", Work in Progress, Internet- 1871 Draft, draft-ietf-rtgwg-bgp-pic-17, 12 October 2021, 1872 . 1875 [I-D.ietf-isis-mpls-elc] 1876 Xu, X., Kini, S., Psenak, P., Filsfils, C., Litkowski, S., 1877 and M. Bocci, "Signaling Entropy Label Capability and 1878 Entropy Readable Label Depth Using IS-IS", Work in 1879 Progress, Internet-Draft, draft-ietf-isis-mpls-elc-13, 28 1880 May 2020, . 1883 [I-D.ietf-ospf-mpls-elc] 1884 Xu, X., Kini, S., Psenak, P., Filsfils, C., Litkowski, S., 1885 and M. Bocci, "Signaling Entropy Label Capability and 1886 Entropy Readable Label Depth Using OSPF", Work in 1887 Progress, Internet-Draft, draft-ietf-ospf-mpls-elc-15, 1 1888 June 2020, . 1891 [I-D.ietf-idr-next-hop-capability] 1892 Decraene, B., Kompella, K., and W. Henderickx, "BGP Next- 1893 Hop dependent capabilities", Work in Progress, Internet- 1894 Draft, draft-ietf-idr-next-hop-capability-07, 8 December 1895 2021, . 1898 [I-D.ietf-spring-segment-routing-central-epe] 1899 Filsfils, C., Previdi, S., Dawra, G., Aries, E., and D. 1900 Afanasiev, "Segment Routing Centralized BGP Egress Peer 1901 Engineering", Work in Progress, Internet-Draft, draft- 1902 ietf-spring-segment-routing-central-epe-10, 21 December 1903 2017, . 1906 [I-D.ietf-idr-long-lived-gr] 1907 Uttaro, J., Chen, E., Decraene, B., and J. G. Scudder, 1908 "Support for Long-lived BGP Graceful Restart", Work in 1909 Progress, Internet-Draft, draft-ietf-idr-long-lived-gr-00, 1910 5 September 2019, . 1913 [RFC7938] Lapukhov, P., Premji, A., and J. Mitchell, Ed., "Use of 1914 BGP for Routing in Large-Scale Data Centers", RFC 7938, 1915 DOI 10.17487/RFC7938, August 2016, 1916 . 1918 [RFC7752] Gredler, H., Ed., Medved, J., Previdi, S., Farrel, A., and 1919 S. Ray, "North-Bound Distribution of Link-State and 1920 Traffic Engineering (TE) Information Using BGP", RFC 7752, 1921 DOI 10.17487/RFC7752, March 2016, 1922 . 1924 [RFC8277] Rosen, E., "Using BGP to Bind MPLS Labels to Address 1925 Prefixes", RFC 8277, DOI 10.17487/RFC8277, October 2017, 1926 . 1928 [RFC8667] Previdi, S., Ed., Ginsberg, L., Ed., Filsfils, C., 1929 Bashandy, A., Gredler, H., and B. Decraene, "IS-IS 1930 Extensions for Segment Routing", RFC 8667, 1931 DOI 10.17487/RFC8667, December 2019, 1932 . 1934 [RFC8665] Psenak, P., Ed., Previdi, S., Ed., Filsfils, C., Gredler, 1935 H., Shakir, R., Henderickx, W., and J. Tantsura, "OSPF 1936 Extensions for Segment Routing", RFC 8665, 1937 DOI 10.17487/RFC8665, December 2019, 1938 . 1940 [RFC8669] Previdi, S., Filsfils, C., Lindem, A., Ed., Sreekantiah, 1941 A., and H. 
Gredler, "Segment Routing Prefix Segment 1942 Identifier Extensions for BGP", RFC 8669, 1943 DOI 10.17487/RFC8669, December 2019, 1944 . 1946 [RFC8663] Xu, X., Bryant, S., Farrel, A., Hassan, S., Henderickx, 1947 W., and Z. Li, "MPLS Segment Routing over IP", RFC 8663, 1948 DOI 10.17487/RFC8663, December 2019, 1949 . 1951 [RFC7911] Walton, D., Retana, A., Chen, E., and J. Scudder, 1952 "Advertisement of Multiple Paths in BGP", RFC 7911, 1953 DOI 10.17487/RFC7911, July 2016, 1954 . 1956 [RFC7880] Pignataro, C., Ward, D., Akiya, N., Bhatia, M., and S. 1957 Pallagatti, "Seamless Bidirectional Forwarding Detection 1958 (S-BFD)", RFC 7880, DOI 10.17487/RFC7880, July 2016, 1959 . 1961 [RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private 1962 Networks (VPNs)", RFC 4364, DOI 10.17487/RFC4364, February 1963 2006, . 1965 [RFC5920] Fang, L., Ed., "Security Framework for MPLS and GMPLS 1966 Networks", RFC 5920, DOI 10.17487/RFC5920, July 2010, 1967 . 1969 [RFC7011] Claise, B., Ed., Trammell, B., Ed., and P. Aitken, 1970 "Specification of the IP Flow Information Export (IPFIX) 1971 Protocol for the Exchange of Flow Information", STD 77, 1972 RFC 7011, DOI 10.17487/RFC7011, September 2013, 1973 . 1975 [RFC6241] Enns, R., Ed., Bjorklund, M., Ed., Schoenwaelder, J., Ed., 1976 and A. Bierman, Ed., "Network Configuration Protocol 1977 (NETCONF)", RFC 6241, DOI 10.17487/RFC6241, June 2011, 1978 . 1980 [RFC6020] Bjorklund, M., Ed., "YANG - A Data Modeling Language for 1981 the Network Configuration Protocol (NETCONF)", RFC 6020, 1982 DOI 10.17487/RFC6020, October 2010, 1983 . 1985 [RFC7854] Scudder, J., Ed., Fernando, R., and S. Stuart, "BGP 1986 Monitoring Protocol (BMP)", RFC 7854, 1987 DOI 10.17487/RFC7854, June 2016, 1988 . 1990 [RFC8300] Quinn, P., Ed., Elzur, U., Ed., and C. Pignataro, Ed., 1991 "Network Service Header (NSH)", RFC 8300, 1992 DOI 10.17487/RFC8300, January 2018, 1993 . 1995 [RFC5440] Vasseur, JP., Ed. and JL. Le Roux, Ed., "Path Computation 1996 Element (PCE) Communication Protocol (PCEP)", RFC 5440, 1997 DOI 10.17487/RFC5440, March 2009, 1998 . 2000 [RFC7348] Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger, 2001 L., Sridhar, T., Bursell, M., and C. Wright, "Virtual 2002 eXtensible Local Area Network (VXLAN): A Framework for 2003 Overlaying Virtualized Layer 2 Networks over Layer 3 2004 Networks", RFC 7348, DOI 10.17487/RFC7348, August 2014, 2005 . 2007 [RFC7637] Garg, P., Ed. and Y. Wang, Ed., "NVGRE: Network 2008 Virtualization Using Generic Routing Encapsulation", 2009 RFC 7637, DOI 10.17487/RFC7637, September 2015, 2010 . 2012 [RFC3031] Rosen, E., Viswanathan, A., and R. Callon, "Multiprotocol 2013 Label Switching Architecture", RFC 3031, 2014 DOI 10.17487/RFC3031, January 2001, 2015 . 2017 [RFC8014] Black, D., Hudson, J., Kreeger, L., Lasserre, M., and T. 2018 Narten, "An Architecture for Data-Center Network 2019 Virtualization over Layer 3 (NVO3)", RFC 8014, 2020 DOI 10.17487/RFC8014, December 2016, 2021 . 2023 [RFC8402] Filsfils, C., Ed., Previdi, S., Ed., Ginsberg, L., 2024 Decraene, B., Litkowski, S., and R. Shakir, "Segment 2025 Routing Architecture", RFC 8402, DOI 10.17487/RFC8402, 2026 July 2018, . 2028 [RFC5883] Katz, D. and D. Ward, "Bidirectional Forwarding Detection 2029 (BFD) for Multihop Paths", RFC 5883, DOI 10.17487/RFC5883, 2030 June 2010, . 2032 [RFC8231] Crabbe, E., Minei, I., Medved, J., and R. 
Varga, "Path 2033 Computation Element Communication Protocol (PCEP) 2034 Extensions for Stateful PCE", RFC 8231, 2035 DOI 10.17487/RFC8231, September 2017, 2036 . 2038 [RFC8281] Crabbe, E., Minei, I., Sivabalan, S., and R. Varga, "Path 2039 Computation Element Communication Protocol (PCEP) 2040 Extensions for PCE-Initiated LSP Setup in a Stateful PCE 2041 Model", RFC 8281, DOI 10.17487/RFC8281, December 2017, 2042 . 2044 [RFC5925] Touch, J., Mankin, A., and R. Bonica, "The TCP 2045 Authentication Option", RFC 5925, DOI 10.17487/RFC5925, 2046 June 2010, . 2048 [RFC8253] Lopez, D., Gonzalez de Dios, O., Wu, Q., and D. Dhody, 2049 "PCEPS: Usage of TLS to Provide a Secure Transport for the 2050 Path Computation Element Communication Protocol (PCEP)", 2051 RFC 8253, DOI 10.17487/RFC8253, October 2017, 2052 . 2054 [RFC6790] Kompella, K., Drake, J., Amante, S., Henderickx, W., and 2055 L. Yong, "The Use of Entropy Labels in MPLS Forwarding", 2056 RFC 6790, DOI 10.17487/RFC6790, November 2012, 2057 . 2059 [RFC8662] Kini, S., Kompella, K., Sivabalan, S., Litkowski, S., 2060 Shakir, R., and J. Tantsura, "Entropy Label for Source 2061 Packet Routing in Networking (SPRING) Tunnels", RFC 8662, 2062 DOI 10.17487/RFC8662, December 2019, 2063 . 2065 [RFC8491] Tantsura, J., Chunduri, U., Aldrin, S., and L. Ginsberg, 2066 "Signaling Maximum SID Depth (MSD) Using IS-IS", RFC 8491, 2067 DOI 10.17487/RFC8491, November 2018, 2068 . 2070 [RFC8476] Tantsura, J., Chunduri, U., Aldrin, S., and P. Psenak, 2071 "Signaling Maximum SID Depth (MSD) Using OSPF", RFC 8476, 2072 DOI 10.17487/RFC8476, December 2018, 2073 . 2075 Authors' Addresses 2077 Colin Bookham (editor) 2078 Nokia 2079 740 Waterside Drive 2080 Almondsbury, Bristol 2081 United Kingdom 2083 Email: colin.bookham@nokia.com 2085 Andrew Stone 2086 Nokia 2087 600 March Road 2088 Kanata, Ontario 2089 Canada 2091 Email: andrew.stone@nokia.com 2093 Jeff Tantsura 2094 Microsoft 2096 Email: jefftant.ietf@gmail.com 2098 Muhammad Durrani 2099 Equinix Inc 2100 1188 Arques Ave 2101 Sunnyvale CA, 2102 United States of America 2104 Email: mdurrani@equinix.com 2106 Bruno Decraene 2107 Orange 2108 38-40 Rue de General Leclerc 2109 92794 Issey Moulineaux cedex 9 2110 France 2112 Email: bruno.decraene@orange.com