2 RTG Working Group C. Bookham, Ed.
3 Internet-Draft A. Stone
4 Intended status: Informational Nokia
5 Expires: December 26, 2020 J. Tantsura
6 Apstra
7 M. Durrani
8 Equinix Inc
9 B. Decraene
10 Orange
11 June 24, 2020

13 An Architecture for Network Function Interconnect
14 draft-bookham-rtgwg-nfix-arch-01

16 Abstract

18 The emergence of technologies such as 5G, the Internet of Things 19 (IoT), and Industry 4.0, coupled with the move towards network 20 function virtualization, means that the service requirements demanded 21 from networks are changing. This document describes an architecture 22 for a Network Function Interconnect (NFIX) that allows for 23 interworking of physical and virtual network functions in a unified 24 and scalable manner across wide-area network and data center domains 25 while maintaining the ability to deliver against SLAs.
27 Requirements Language 29 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 30 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 31 document are to be interpreted as described in BCP 14 32 [RFC2119][RFC8174] when, and only when, they appear in all capitals, 33 as shown here. 35 Status of This Memo 37 This Internet-Draft is submitted in full conformance with the 38 provisions of BCP 78 and BCP 79. 40 Internet-Drafts are working documents of the Internet Engineering 41 Task Force (IETF). Note that other groups may also distribute 42 working documents as Internet-Drafts. The list of current Internet- 43 Drafts is at https://datatracker.ietf.org/drafts/current/. 45 Internet-Drafts are draft documents valid for a maximum of six months 46 and may be updated, replaced, or obsoleted by other documents at any 47 time. It is inappropriate to use Internet-Drafts as reference 48 material or to cite them other than as "work in progress." 49 This Internet-Draft will expire on December 26, 2020. 51 Copyright Notice 53 Copyright (c) 2020 IETF Trust and the persons identified as the 54 document authors. All rights reserved. 56 This document is subject to BCP 78 and the IETF Trust's Legal 57 Provisions Relating to IETF Documents 58 (https://trustee.ietf.org/license-info) in effect on the date of 59 publication of this document. Please review these documents 60 carefully, as they describe your rights and restrictions with respect 61 to this document. Code Components extracted from this document must 62 include Simplified BSD License text as described in Section 4.e of 63 the Trust Legal Provisions and are provided without warranty as 64 described in the Simplified BSD License. 66 Table of Contents 68 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 69 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 4 70 3. Motivation . . . . . . . . . . . . . . . . . . . . . . . . . 4 71 4. Requirements . . . . . . . . . . . . . . . . . . . . . . . . 6 72 5. Theory of Operation . . . . . . . . . . . . . . . . . . . . . 7 73 5.1. VNF Assumptions . . . . . . . . . . . . . . . . . . . . . 7 74 5.2. Overview . . . . . . . . . . . . . . . . . . . . . . . . 8 75 5.3. Use of a Centralized Controller . . . . . . . . . . . . . 9 76 5.4. Routing and LSP Underlay . . . . . . . . . . . . . . . . 11 77 5.4.1. Intra-Domain Routing . . . . . . . . . . . . . . . . 11 78 5.4.2. Inter-Domain Routing . . . . . . . . . . . . . . . . 13 79 5.4.3. Intra-Domain and Inter-Domain Traffic-Engineering . . 14 80 5.5. Service Layer . . . . . . . . . . . . . . . . . . . . . . 17 81 5.6. Service Differentiation . . . . . . . . . . . . . . . . . 19 82 5.7. Automated Service Activation . . . . . . . . . . . . . . 20 83 5.8. Service Function Chaining . . . . . . . . . . . . . . . . 21 84 5.9. Stability and Availability . . . . . . . . . . . . . . . 23 85 5.9.1. IGP Reconvergence . . . . . . . . . . . . . . . . . . 23 86 5.9.2. Data Center Reconvergence . . . . . . . . . . . . . . 23 87 5.9.3. Exchange of Inter-Domain Routes . . . . . . . . . . . 24 88 5.9.4. Controller Redundancy . . . . . . . . . . . . . . . . 24 89 5.9.5. Path and Segment Liveliness . . . . . . . . . . . . . 26 90 5.10. Scalability . . . . . . . . . . . . . . . . . . . . . . . 28 91 5.10.1. Asymmetric Model B for VPN Families . . . . . . . . 30 92 6. Illustration of Use . . . . . . . . . . . . . . . . . . . . . 32 93 6.1. Reference Topology . . . . . . . . . . . . . . . . . . . 32 94 6.2. PNF to PNF Connectivity . . . 
. . . . . . . . . . . . . . 34 95 6.3. VNF to PNF Connectivity . . . . . . . . . . . . . . . . . 35 96 6.4. VNF to VNF Connectivity . . . . . . . . . . . . . . . . . 36 98 7. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 37 99 8. Security Considerations . . . . . . . . . . . . . . . . . . . 38 100 9. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 38 101 10. Contributors . . . . . . . . . . . . . . . . . . . . . . . . 38 102 11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 39 103 12. References . . . . . . . . . . . . . . . . . . . . . . . . . 39 104 12.1. Normative References . . . . . . . . . . . . . . . . . . 39 105 12.2. Informative References . . . . . . . . . . . . . . . . . 40 106 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 45

108 1. Introduction

110 With the introduction of technologies such as 5G, the Internet of 111 Things (IoT), and Industry 4.0, service requirements are changing. 112 In addition to the ever-increasing demand for more capacity, these 113 services have other stringent service requirements that need to be 114 met, such as ultra-reliable and/or low-latency communication.

116 Parallel to this, there is a continued trend to move towards network 117 function virtualization. Operators are building digitalized 118 infrastructure capable of hosting numerous virtualized network 119 functions (VNFs): infrastructure that can scale in and scale out 120 depending on application demand and can deliver flexibility and 121 service velocity. Much of this virtualization activity is driven by 122 the aforementioned emerging technologies as new infrastructure is 123 deployed in support of them. To try to meet the new service 124 requirements, some of these VNFs are becoming more dispersed, so it is 125 common for networks to have a mix of centralized medium- or large- 126 sized data centers together with more distributed smaller 127 'edge-clouds'. VNFs hosted within these data centers require 128 seamless connectivity to each other, and to their existing physical 129 network function (PNF) counterparts. This connectivity also needs to 130 deliver against agreed SLAs.

132 Coupled with the deployment of virtualization is automation. Many of 133 these VNFs are deployed within SDN-enabled data centers where 134 automation is simply a must-have capability to improve service 135 activation lead-times. The expectation is that services will be 136 instantiated in an abstract point-and-click manner and be 137 automatically created by the underlying network, dynamically adapting 138 to service connectivity changes as virtual entities move between 139 hosts.

141 This document describes an architecture for a Network Function 142 Interconnect (NFIX) that allows for interworking of physical and 143 virtual network functions in a unified and scalable manner. It 144 describes a mechanism for establishing connectivity across multiple 145 discrete domains in both the wide-area network (WAN) and the data 146 center (DC) while maintaining the ability to deliver against SLAs. 147 To achieve this, NFIX works with the underlying topology to build a 148 unified over-the-top topology.

150 The NFIX architecture described in this document does not define any 151 new protocols but rather outlines an architecture utilizing a 152 collaboration of existing standards-based protocols.

154 2.
Terminology

156 o A physical network function (PNF) refers to a network device such 157 as a Provider Edge (PE) router that connects physically to the 158 wide-area network.

160 o A virtualized network function (VNF) refers to a network device 161 such as a Provider Edge (PE) router that is hosted on an 162 application server. The VNF may be bare-metal in that it consumes 163 the entire resources of the server, or it may be one of numerous 164 virtual functions instantiated as a VM or a number of containers on 165 a given server that is controlled by a hypervisor or container 166 management platform.

168 o A Data Center Border (DCB) router refers to the network function 169 that spans the border between the wide-area and the data center 170 networks, typically interworking the different encapsulation 171 techniques employed within each domain.

173 o An Interconnect controller is the controller responsible for 174 managing the NFIX fabric and services.

176 o A DC controller is the term used for a controller that resides 177 within an SDN-enabled data center and is responsible for the DC 178 network(s).

180 3. Motivation

182 Industrial automation and business-critical environments use 183 applications that are demanding on the network. These applications 184 present different requirements, from low-latency to high-throughput 185 to application-specific traffic conditioning, or a combination. The 186 evolution to 5G equally presents challenges for mobile back-, front- 187 and mid-haul networks. The requirement for ultra-reliable low- 188 latency communication means that operators need to re-evaluate their 189 network architecture to meet these requirements.

191 At the same time, the service edge is evolving. Where the service 192 edge device was historically a PNF, the adoption of virtualization 193 means VNFs are becoming more commonplace. Typically, these VNFs are 194 hosted in some form of data center environment but require end-to-end 195 connectivity to other VNFs and/or other PNFs. This represents a 196 challenge because generally transport layer connectivity differs 197 between the WAN and the data center environment. The WAN includes 198 all levels of hierarchy (core, aggregation, access) that form the 199 network's footprint, where transport layer connectivity using IP/MPLS 200 is commonplace. In the data center, native IP is commonplace, 201 utilizing network virtualization overlay (NVO) technologies such as 202 virtual extensible LAN (VXLAN) [RFC7348], network virtualization 203 using generic routing encapsulation (NVGRE) [RFC7637], or generic 204 network virtualization encapsulation (GENEVE) [I-D.ietf-nvo3-geneve]. 205 There is a requirement to seamlessly integrate these islands and 206 avoid heavy-lifting at interconnects as well as providing a means to 207 provision end-to-end services with a single touch point at the edge.

209 The service edge boundary is also changing. Some functions that were 210 previously reasonably centralized are now becoming more distributed. 211 One reason for this is to attempt to deal with low latency 212 requirements. Another reason is that operators seek to reduce costs 213 by deploying low/medium-capacity VNFs closer to the edge. Equally, 214 virtualization also sees some of the access network moving towards 215 the core. Examples of this include cloud-RAN or Software-Defined 216 Access Networks.

218 Historically, service providers have architected data centers 219 independently from the wide-area network, creating two independent 220 domains or islands.
As VNFs become part of the service landscape, the 221 service data-path must be extended across the WAN into the data 222 center infrastructure, but in a manner that still allows operators to 223 meet deterministic performance requirements. Methods for stitching 224 WAN and DC infrastructures together with some form of service- 225 interworking at the data center border have been implemented and 226 deployed, but this service-interworking approach has several 227 limitations:

229 o The data center environment typically uses encapsulation 230 techniques such as VXLAN or NVGRE while the WAN typically uses 231 encapsulation techniques such as MPLS [RFC3031]. Underlying 232 optical infrastructure might also need to be programmed. These 233 are incompatible and require interworking at the service layer.

235 o It typically requires heavy-touch service provisioning on the data 236 center border. In an end-to-end service, midpoint provisioning is 237 undesirable and should be avoided.

239 o Automation is difficult, largely due to the first two points but 240 with additional contributing factors. In the virtualization world 241 automation is a must-have capability.

243 o When a service is operating at Layer 3 in a data center with 244 redundant interconnects, the risk of routing loops exists. There 245 is no inherent loop avoidance mechanism when redistributing routes 246 between address families, so extreme care must be taken. Proposals 247 such as the Domain Path (D-PATH) attribute 248 [I-D.ietf-bess-evpn-ipvpn-interworking] attempt to address this 249 issue but as yet are not widely implemented or deployed.

251 o Some or all of the above make the service-interworking gateway 252 cumbersome with questionable scaling attributes.

254 Hence there is a requirement to create an open, scalable, and unified 255 network architecture that brings together the wide-area network and 256 data center domains. It is not an architecture exclusively targeted 257 at greenfield deployments, nor does it require a flag-day upgrade to 258 deploy in a brownfield network. It is an evolutionary step to a 259 consolidated network that uses the constructs of seamless MPLS 260 [I-D.ietf-mpls-seamless-mpls] as a baseline and extends upon that to 261 include topologies that may not be link-state based and to provide 262 end-to-end path control. Overall, the NFIX architecture does the 263 following:

265 o Allows for an evolving service edge boundary without having to 266 constantly restructure the architecture.

268 o Provides a mechanism for seamless VNF-to-VNF, VNF-to-PNF, and 269 PNF-to-PNF connectivity, with deterministic SLAs, 270 and with the ability to provide differentiated SLAs to suit 271 different service requirements.

273 o Delivers a unified transport fabric using Segment Routing (SR) 274 [RFC8402] where service delivery mandates touching only the 275 service edge without imposing additional encapsulation 276 requirements in the DC.

278 o Embraces automation by providing an environment where any end-to- 279 end connectivity can be instantiated in a single-request manner 280 while maintaining SLAs.

282 4. Requirements

284 The following section outlines the requirements that the proposed 285 solution must meet. From an overall perspective, the proposed 286 generic architecture must:

288 o Deliver end-to-end transport LSPs using traffic-engineering (TE) 289 as required to meet appropriate SLAs for the service(s) 290 using those LSPs.
End-to-end refers to VNF and/or PNF 291 connectivity or a combination of both. 293 o Provide a solution that allows for optimal end-to-end path 294 placement; where optimal not only meets the requirements of the 295 path in question but also meets the global network objectives. 297 o Support varying types of VNF physical network attachment and 298 logical (underlay/overlay) connectivity. 300 o Facilitate automation of service provision. As such the solution 301 should avoid heavy-touch service provisioning and decapsulation/ 302 encapsulation at data center border routers. 304 o Provide a framework for delivering logical end-to-end networks 305 using differentiated logical topologies and/or constraints. 307 o Provide a high level of stability; faults in one domain should not 308 propagate to another domain. 310 o Provide a mechanism for homogeneous end-to-end OAM. 312 o Hide/localize instabilities in the different domains that 313 participate in the end-to-end service. 315 o Provide a mechanism to minimize the label-stack depth required at 316 path head-ends for SR-TE LSPs. 318 o Offer a high level of scalability. 320 o Although not considered in-scope of the current version of this 321 document, the solution should not preclude the deployment of 322 multicast. This subject may be covered in later versions of this 323 document. 325 5. Theory of Operation 327 This section describes the NFIX architecture including the building 328 blocks and protocol machinery that is used to form the fabric. Where 329 considered appropriate rationale is given for selection of an 330 architectural component where other seemingly applicable choices 331 could have been made. 333 5.1. VNF Assumptions 335 For the sake of simplicity, references to VNF are made in a broad 336 sense. Equally, the differences between VNF and Container Network 337 Function (CNF) are largely immaterial for the purposes of this 338 document, therefore VNF is used to represent both. The way in which 339 a VNF is instantiated and provided network connectivity will differ 340 based on environment and VNF capability, but for conciseness this is 341 not explicitly detailed with every reference to a VNF. Common 342 examples of VNF variants include but are not limited to: 344 o A VNF that functions as a routing device and has full IP routing 345 and MPLS capabilities. It can be connected simultaneously to the 346 data center fabric underlay and overlay and serves as the NVO 347 tunnel endpoint [RFC8014]. Examples of this might be a 348 virtualized PE router, or a virtualized Broadband Network Gateway 349 (BNG). 351 o A VNF that functions as a device (host or router) with limited IP 352 routing capability. It does not connect directly to the data 353 center fabric underlay but rather connects to one or more external 354 physical or virtual devices that serve as the NVO tunnel 355 endpoint(s). It may however have single or multiple connections 356 to the overlay. Examples of this might be a mobile network 357 control or management plane function. 359 o A VNF that has no routing capability. It is a virtualized 360 function hosted within an application server and is managed by a 361 hypervisor or container host. The hypervisor/container host acts 362 as the NVO endpoint and interfaces to some form of SDN controller 363 responsible for programming the forwarding plane of the 364 virtualization host using, for example, OpenFlow. 
Examples of 365 this might be an Enterprise application server or a web server 366 running as a virtual machine and front-ended by a virtual routing 367 function such as OVS/xVRS/VTF.

369 Where considered necessary, exceptions to the examples provided above, 370 or a focus on a particular scenario, will be highlighted.

372 5.2. Overview

374 The NFIX architecture makes no assumptions about how the network is 375 physically composed, nor does it impose any dependencies upon it. It 376 also makes no assumptions about IGP hierarchies; the use of areas/ 377 levels or discrete IGP instances within the WAN is fully endorsed to 378 enhance scalability and constrain fault propagation. This could 379 apply, for instance, to a hierarchical WAN from core to edge or from 380 WAN to LAN connections. The overall architecture uses the constructs 381 of seamless MPLS as a baseline and extends upon that. The concept of 382 decomposing the network into multiple domains is one that has been 383 widely deployed and has been proven to scale in networks with large 384 numbers of nodes.

386 The proposed architecture uses segment routing (SR) as its preferred 387 choice of transport. Segment routing is chosen for construction of 388 end-to-end LSPs given its ability to traffic-engineer through source- 389 routing while concurrently scaling exceptionally well due to its lack 390 of network state other than at the ingress node. This document uses SR 391 instantiated on an MPLS forwarding plane (SR-MPLS), although it does 392 not preclude the use of SRv6 either now or at some point in the 393 future. The rationale for selecting SR-MPLS is simply maturity and 394 more widespread applicability across a potentially broad range of 395 network devices. This document may be updated in future versions to 396 include more description of SRv6 applicability.

398 5.3. Use of a Centralized Controller

400 It is recognized that for most operators the move towards the use of 401 a controller within the wide-area network is a significant change in 402 operating model. In the NFIX architecture it is a necessary 403 component. Its use is not simply to offload inter-domain path 404 calculation from network elements; it provides many more benefits:

406 o It offers the ability to enforce constraints on paths that 407 originate/terminate on different network elements, thereby 408 providing path diversity, and/or bidirectionality/co-routing, and/ 409 or disjointness.

411 o It avoids collisions, re-tries, and packing problems that have been 412 observed in networks using distributed TE path calculation, where 413 head-ends make autonomous decisions.

415 o A controller can take a global view of path placement strategies, 416 including the ability to make path placement decisions over a high 417 number of LSPs concurrently as opposed to considering each LSP 418 independently. In turn, this allows for 'global' optimization of 419 network resources such as available capacity.

421 o A controller can make decisions based on near-real-time network 422 state and optimize paths accordingly. For example, if a network 423 link becomes congested, it may recompute some of the paths 424 transiting that link to other links that may not be quite as 425 optimal but do have available capacity. Or if a link latency 426 crosses a certain threshold, it may choose to reoptimize some 427 latency-sensitive paths away from that link.

429 o The logic of a controller can be extended beyond pure path 430 computation and placement.
If the controller is aware of 431 services, service requirements, and available paths within the 432 network, it can cross-correlate between them and ensure that the 433 appropriate paths are used for the appropriate services.

435 o The controller can provide assurance and verification of the 436 underlying SLA provided to a given service.

438 As the main objective of the NFIX architecture is to unify the data 439 center and wide-area network domains, using the term controller is 440 not sufficiently precise. The centralized controller may need to 441 interface to other controllers that potentially reside within an SDN- 442 enabled data center. Therefore, to avoid interchangeably using the 443 term controller for both functions, we distinguish between them 444 simply by using the terms 'DC controller', which, as the name suggests, 445 is responsible for the DC, and 'Interconnect controller', which is responsible 446 for managing the extended SR fabric and services.

448 The Interconnect controller learns wide-area network topology 449 information and allocation of segment routing SIDs within that domain 450 using BGP link-state [RFC7752] with appropriate SR extensions. 451 Equally, it learns data center topology information and Prefix-SID 452 allocation using BGP labeled unicast [RFC8277] with appropriate SR 453 extensions, or BGP link-state if a link-state IGP is used within the 454 data center. If Route-Reflection is used for exchange of BGP link- 455 state or labeled unicast NLRI within one or more domains, then the 456 Interconnect controller need only peer as a client with those Route- 457 Reflectors in order to learn topology information.

459 Where BGP link-state is used to learn the topology of a data center 460 (or any IGP routing domain), the BGP-LS Instance Identifier (Instance- 461 ID) is carried within Node/Link/Prefix NLRI and is used to identify a 462 given IGP routing domain. Where labeled unicast BGP is used to 463 discover the topology of one or more data center domains, there is no 464 equivalent way for the Interconnect controller to achieve a level of 465 routing domain correlation. The controller may learn a splintered 466 connectivity map consisting of 10 leaf switches, four spine switches, 467 and four DCBs, but it needs some form of key to inform it that leaf 468 switches 1-5, spine switches 1 and 2, and DCBs 1 and 2 belong to 469 data center 1, while leaf switches 6-10, spine switches 3 and 4, and 470 DCBs 3 and 4 belong to data center 2. What is needed is a form of 471 'data center membership identification' to provide this correlation. 472 Optionally, this could be achieved at the BGP level using a standard 473 community to represent each data center, or it could be done at a 474 more abstract level where, for example, the DC controller provides the 475 membership identification to the Interconnect controller through an 476 application programming interface (API).

478 Understanding real-time network state is an important part of the 479 Interconnect controller's role, and only with this information is the 480 controller able to make informed decisions and take preventive or 481 corrective actions as necessary. There are numerous methods 482 implemented and deployed that allow for harvesting of network state, 483 including (but not limited to) IPFIX [RFC7011], Netconf/YANG 484 [RFC6241][RFC6020], streaming telemetry, BGP link-state [RFC7752] 485 [I-D.ietf-idr-te-lsp-distribution], and the BGP Monitoring Protocol 486 (BMP) [RFC7854].

488 5.4.
Routing and LSP Underlay

490 This section describes the mechanisms and protocols that are used to 491 establish end-to-end LSPs, where end-to-end refers to VNF-to-VNF, 492 PNF-to-PNF, or VNF-to-PNF.

494 5.4.1. Intra-Domain Routing

496 In a seamless MPLS architecture, domains are based on geographic 497 dispersion (core, aggregation, access). Within this document, a 498 domain is considered to be any entity with a captive topology, be it a 499 link-state topology or otherwise. Where reference is made to the 500 wide-area network domain, it refers to the one or more domains that 501 together constitute the wide-area network.

503 This section discusses the basic building blocks required within the 504 wide-area network and the data center, noting from above that the 505 wide-area network may itself consist of multiple domains.

507 5.4.1.1. Wide-Area Network Domains

509 The wide-area network includes all levels of hierarchy (core, 510 aggregation, access) that constitute the network's MPLS footprint as 511 well as the data center border routers. Each domain that constitutes 512 part of the wide-area network runs a link-state interior gateway 513 protocol (IGP) such as IS-IS or OSPF, and each domain may use IGP- 514 inherent hierarchy (OSPF areas, IS-IS levels) with an assumption that 515 visibility is domain-wide using, for example, L2 to L1 516 redistribution. Alternatively, or additionally, there may be 517 multiple domains that are split by using separate and distinct 518 instances of IGP. There is no requirement for IGP redistribution of 519 any link or loopback addresses between domains.

521 Each IGP should be enabled with the relevant extensions for segment 522 routing [RFC8667][RFC8665], and each SR-capable router should 523 advertise a Node-SID for its loopback address, and an Adjacency-SID 524 (Adj-SID) for every connected interface (unidirectional adjacency) 525 belonging to the SR domain. SR Global Blocks (SRGB) can be allocated 526 to each domain as deemed appropriate to specific network 527 requirements. Border routers belonging to multiple domains have an 528 SRGB for each domain.

530 The default forwarding path for intra-domain LSPs that do not require 531 TE is simply an SR LSP containing a single label advertised by the 532 destination as a Node-SID and representing the ECMP-aware shortest 533 path to that destination. Intra-domain TE LSPs are constructed as 534 required by the Interconnect controller. Once a path is calculated, 535 it is advertised as an explicit SR Policy 536 [I-D.ietf-spring-segment-routing-policy] containing one or more paths 537 expressed as one or more segment-lists, which may optionally contain 538 binding SIDs if requirements dictate. An SR Policy is identified 539 through the tuple [headend, color, endpoint] and this tuple is used 540 extensively by the Interconnect controller to associate services with 541 an underlying SR Policy that meets its objectives.

543 To provide support for ECMP, the Entropy Label [RFC6790][RFC8662] 544 should be utilized. Entropy Label Capability (ELC) should be 545 advertised into the IGP using the IS-IS Prefix Attributes TLV 546 [I-D.ietf-isis-mpls-elc] or the OSPF Extended Prefix TLV 547 [I-D.ietf-ospf-mpls-elc] coupled with the Node MSD Capability sub-TLV 548 to advertise Entropy Readable Label Depth (ERLD) [RFC8491][RFC8476] 549 and the base MPLS Imposition (BMI). Equally, support for ELC 550 together with the supported ERLD should be signaled in BGP using the 551 BGP Next-Hop Capability [I-D.ietf-idr-next-hop-capability].
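As a non-normative illustration of the ELC/ERLD behaviour described above, the following sketch (Python, with hypothetical names and values) shows the kind of check an ingress might perform before imposing an <ELI, EL> pair. It is a deliberate simplification; real placements may require multiple pairs, as discussed later in this document.

   ELI = 7  # Entropy Label Indicator, a special-purpose label value (RFC 6790)

   def impose_entropy(label_stack, egress_elc, min_erld, entropy_label):
       """Return the label stack to impose, optionally with an <ELI, EL> pair.

       label_stack   -- labels (topmost first) for the SR path
       egress_elc    -- True if the egress signalled Entropy Label Capability
       min_erld      -- smallest Entropy Readable Label Depth along the path
       entropy_label -- EL value derived from a hash of the flow
       """
       if not egress_elc:
           # Without knowledge of egress ELC the entropy label must not be used.
           return label_stack
       # Keep the <ELI, EL> pair within min_erld labels of the top of the
       # stack so that load-balancing nodes can read it; real placements
       # may need multiple pairs (e.g. above and below a Binding-SID).
       depth = min(len(label_stack), max(min_erld - 2, 0))
       return label_stack[:depth] + [ELI, entropy_label] + label_stack[depth:]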
Ingress 552 nodes and/or DCBs should ensure sufficient entropy is applied to 553 packets to exercise available ECMP links.

555 5.4.1.2. Data Center Domain

557 The data center domain includes all fabric switches, network 558 virtualization edge (NVE) devices, and the data center border routers. The 559 data center routing design may align with the framework of [RFC7938] 560 running eBGP single-hop sessions established over direct point-to- 561 point links, or it may use an IGP for dissemination of topology 562 information. This document focuses on the former, simply because the 563 use of an IGP largely makes the data center's behaviour analogous to 564 that of a wide-area network domain.

566 The chosen method of transport or encapsulation within the data 567 center for NFIX is SR-MPLS over IP/UDP [RFC8663] or, where possible, 568 native SR-MPLS. The choice of SR-MPLS over IP/UDP or native SR-MPLS 569 allows for good entropy to maximize the use of equal-cost Clos fabric 570 links. Native SR-MPLS encapsulation provides entropy through use of 571 the Entropy Label, and, like the wide-area network, support for ELC 572 together with the supported ERLD should be signaled using the BGP Next- 573 Hop Capability attribute. As described in [RFC6790], the ELC is an 574 indication from the egress node of an MPLS tunnel to the ingress node 575 of the MPLS tunnel that it is capable of processing an Entropy Label. 576 The BGP Next-Hop Capability is a non-transitive attribute which is 577 modified or deleted when the next-hop is changed to reflect the 578 capabilities of the new next-hop. If we assume that the path of a 579 BGP-signaled LSP transits through multiple ASNs, and/or a single ASN 580 with multiple next-hops, then it is not possible for the ingress node 581 to determine the ELC of the egress node. Without this end-to-end 582 signaling capability, the entropy label must only be used when it is 583 explicitly known, through configuration or other means, that the 584 egress node has support for it. Entropy for SR-MPLS over IP/UDP 585 encapsulation uses the source UDP port for IPv4 and the Flow Label 586 for IPv6. Again, the ingress network function should ensure 587 sufficient entropy is applied to exercise available ECMP links.

589 Another significant advantage of the use of native SR-MPLS or SR-MPLS 590 over IP/UDP is that it allows for a lightweight interworking function 591 at the DCB without the requirement for midpoint provisioning; 592 interworking between the data center and the wide-area network 593 domains becomes an MPLS label swap/continue action.

595 Loopback addresses of network elements within the data center are 596 advertised using labeled unicast BGP with the addition of SR Prefix 597 SID extensions [RFC8669] containing a globally unique and persistent 598 Prefix-SID. The data-plane encapsulation of SR-MPLS over IP/UDP or 599 native SR-MPLS allows network elements within the data center to 600 consume BGP Prefix-SIDs and legitimately use those in the 601 encapsulation.

603 5.4.2. Inter-Domain Routing

605 Inter-domain routing is responsible for establishing connectivity 606 between any domains that form the wide-area network, and between the 607 wide-area network and data center domains. It is considered unlikely 608 that every end-to-end LSP will require a TE path, hence there is a 609 requirement for a default end-to-end forwarding path. This default 610 forwarding path may also become the path of last resort in the event 611 of a non-recoverable failure of a TE path.
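As a non-normative illustration of this last point, the following sketch (Python, with purely illustrative structures and attribute names) shows one way a service edge might prefer a traffic-engineered SR Policy and fall back to the labeled BGP path as the path of last resort; the resolution behaviour itself is described in the following sections.

   def resolve_transport(service_route, sr_policies, bgp_lu_labels):
       """Choose the transport used to resolve a service route's next-hop.

       Prefer a matching SR Policy (TE path); otherwise fall back to the
       labeled BGP route, i.e. the default path of last resort described
       above.  All structures and attribute names are illustrative.
       """
       key = (service_route.headend, service_route.color, service_route.next_hop)
       policy = sr_policies.get(key)
       if policy is not None and policy.is_up:
           return ("sr-policy", policy.binding_sid)
       label = bgp_lu_labels.get(service_route.next_hop)
       if label is not None:
           return ("bgp-lu", label)
       return ("unresolved", None)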
Similar to the seamless 612 MPLS architecture, this inter-domain MPLS connectivity is realized 613 using labeled unicast BGP [RFC8277] with the addition of SR Prefix 614 SID extensions.

616 Within each wide-area network domain all service edge routers, DCBs, 617 and ABRs/ASBRs form part of the labeled BGP mesh, which can be either 618 full-mesh, or more likely based on the use of route-reflection. Each 619 of these routers advertises its respective loopback addresses into 620 labeled BGP together with an MPLS label and a globally unique Prefix- 621 SID. Routes are advertised between wide-area network domains by 622 ABRs/ASBRs that impose next-hop-self on advertised routes. The 623 function of imposing next-hop-self for labeled routes means that the 624 ABR/ASBR allocates a new label for advertised routes and programs a 625 label-swap entry in the forwarding plane for received and advertised 626 routes. In short, it becomes part of the forwarding path.

628 DCB routers have labeled BGP sessions towards the wide-area network 629 and labeled BGP sessions towards the data center. Routes are 630 bidirectionally advertised between the domains subject to policy, 631 with the DCB imposing itself as next-hop on advertised routes. As 632 above, the function of imposing next-hop-self for labeled routes 633 implies allocation of a new label for advertised routes and a label- 634 swap entry being programmed in the forwarding plane for received and 635 advertised labels. The DCB thereafter becomes the anchor point 636 between the wide-area network domain and the data center domain.

638 Within the wide-area network, next-hops for labeled unicast routes 639 containing Prefix-SIDs are resolved to SR LSPs, and within the data 640 center domain next-hops for labeled unicast routes containing Prefix- 641 SIDs are resolved to SR LSPs or IP/UDP tunnels. This provides end- 642 to-end connectivity without a traffic-engineering capability.

644 +---------------+ +----------------+ +---------------+
645 | Data Center | | Wide-Area | | Wide-Area |
646 | +-----+ Domain 1 +-----+ Domain 'n' |
647 | | DCB | | ABR | |
648 | +-----+ +-----+ |
649 | | | | | |
650 +---------------+ +----------------+ +---------------+
651 <-- SR/SRoUDP --> <---- IGP/SR ----> <--- IGP/SR ---->
652 <--- BGP-LU ---> NHS <--- BGP-LU ---> NHS <--- BGP-LU --->

654 Default Inter-Domain Forwarding Path

656 Figure 1

658 5.4.3. Intra-Domain and Inter-Domain Traffic-Engineering

660 The capability to traffic-engineer intra- and inter-domain end-to-end 661 paths is considered a key requirement in order to meet the service 662 objectives previously outlined. To achieve optimal end-to-end path 663 placement, the key components to be considered are path calculation, 664 path activation, and FEC-to-path binding procedures.

666 In the NFIX architecture, end-to-end path calculation is performed by 667 the Interconnect controller. The mechanics of how the objectives of 668 each path are calculated are beyond the scope of this document. Once a 669 path is calculated based upon its objectives and constraints, the 670 path is advertised from the controller to the LSP headend as an 671 explicit SR Policy containing one or more paths expressed as one or 672 more segment-lists. An SR Policy is identified through the tuple 673 [headend, color, endpoint] and this tuple is used extensively by the 674 Interconnect controller to associate services with an underlying SR 675 Policy that meets its objectives.
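The following is a minimal, non-normative sketch (Python, with hypothetical names and label values) of how a controller might model an SR Policy keyed by the [headend, color, endpoint] tuple, with one or more weighted segment-lists and an associated Binding-SID:

   from dataclasses import dataclass, field
   from typing import List

   @dataclass(frozen=True)
   class PolicyKey:
       headend: str    # e.g. loopback of the LSP headend
       color: int      # identifies the SLA/logical topology
       endpoint: str   # e.g. loopback of the far-end VNF/PNF

   @dataclass
   class SegmentList:
       segments: List[int]   # label values: Node-, Adj- or Binding-SIDs
       weight: int = 1       # permits ECMP/UCMP across segment-lists

   @dataclass
   class SRPolicy:
       key: PolicyKey
       binding_sid: int                 # BSID anchoring this policy
       segment_lists: List[SegmentList] = field(default_factory=list)

   # Hypothetical policy pushed by the controller to headend "VNF1":
   # steer via a DCB Node-SID, then a Binding-SID, then the endpoint.
   policy = SRPolicy(PolicyKey("VNF1", 10, "VNF2"), binding_sid=24002,
                     segment_lists=[SegmentList([16001, 24001, 17002])])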
677 The segment-list of an SR Policy encodes a source-routed path towards 678 the endpoint. When calculating the segment-list, the Interconnect 679 controller makes comprehensive use of the Binding-SID (BSID), 680 instantiating BSID anchors as necessary at path midpoints when 681 calculating and activating a path. The use of BSID is considered 682 fundamental to segment routing as described in 683 [I-D.filsfils-spring-sr-policy-considerations]. It provides opacity 684 between domains, ensuring that any segment churn is constrained to a 685 single domain. It also reduces the number of segments/labels that 686 the headend needs to impose, which is particularly important given 687 that network elements within a data center generally have limited 688 label imposition capabilities. In the context of the NFIX 689 architecture, it is also the vehicle that allows for removal of heavy 690 midpoint provisioning at the DCB.

692 For example, assume that VNF1 is situated in data center 1, which is 693 interconnected to the wide-area network via DCB1. VNF1 requires 694 connectivity to VNF2, situated in data center 2, which is 695 interconnected to the wide-area network via DCB2. Assuming there is 696 no existing TE path that meets VNF1's requirements, the Interconnect 697 controller will:

699 o Instantiate an SR Policy on DCB1 with BSID n and a segment-list 700 containing the relevant segments of a TE path to DCB2. DCB1 701 therefore becomes a BSID anchor.

703 o Instantiate an SR Policy on VNF1 with BSID m and a segment-list 704 containing segments {DCB1, n, VNF2}.

706 +---------------+ +----------------+ +---------------+
707 | Data Center 1 | | Wide-Area | | Data Center 2 |
708 | +----+ +----+ 3 +----+ +----+ |
709 | |VNF1| |DCB1|-1 / \ 5--|DCB2| |VNF2| |
710 | +----+ +----+ \ / \ / +----+ +----+ |
711 | | | 2 4 | | |
712 +---------------+ +----------------+ +---------------+
713 SR Policy SR Policy
714 BSID m BSID n
715 {DCB1,n,VNF2} {1,2,3,4,5,DCB2}

717 Traffic-Engineered Path using BSID

719 Figure 2

721 In the above figure, a single DCB is used to interconnect two domains. 722 Similarly, in the case of two wide-area domains, the DCB would be 723 represented as an ABR or ASBR. In some single-operator environments, 724 domains may be interconnected using adjacent ASBRs connected via a 725 distinct physical link. In this scenario, the procedures outlined 726 above may be extended to incorporate the mechanisms used in Egress 727 Peer Engineering (EPE) [I-D.ietf-spring-segment-routing-central-epe] 728 to form a traffic-engineered path spanning distinct domains.

730 5.4.3.1. Traffic-Engineering and ECMP

732 Where the Interconnect controller is used to place SR policies, 733 providing support for ECMP requires some consideration. An SR Policy 734 is described with one or more segment-lists, and each of those 735 segment-lists, taken as a whole set of instructions, may or may not provide ECMP, and 736 each SID itself may or may not support ECMP forwarding. When an 737 individual SID is a BSID, an ECMP path may or may not also be nested 738 within. The Interconnect controller may choose to place a path 739 consisting entirely of non-ECMP-aware Adj-SIDs (each SID representing 740 a single adjacency) such that the controller has explicit hop-by-hop 741 knowledge of where that SR-TE LSP is routed. This is beneficial to 742 allow the controller to take corrective action if the criteria that 743 were used to initially select a particular link in a particular path 744 subsequently change.
For example, if the latency of a link 745 increases or a link becomes congested, a path may need to be rerouted. 746 If ECMP-aware SIDs are used in the SR policy segment-list (including 747 Node-SIDs, Adj-SIDs representing parallel links, and Anycast SIDs), SR 748 routers are able to make autonomous decisions about where traffic is 749 forwarded. As a result, it is not possible for the controller to 750 fully understand the impact of a change in network state and react to 751 it. With this in mind, there are a number of approaches that could be 752 adopted:

754 o If there is no requirement for the Interconnect controller to 755 explicitly track paths on a hop-by-hop basis, ECMP-aware SIDs may 756 be used in the SR policy segment-list. This approach may require 757 multiple [ELI, EL] pairs to be inserted at the ingress node; for 758 example, above and below a BSID to provide entropy in multiple 759 domains.

761 o If there is a requirement for the Interconnect controller to 762 explicitly track paths on a hop-by-hop basis to provide the capability 763 to reroute them based on changes in network state, SR policy 764 segment-lists should be constructed of non-ECMP-aware Adj-SIDs.

766 o A hybrid approach that allows for a level of ECMP (at the headend) 767 together with the ability for the Interconnect controller to 768 explicitly track paths is to instantiate an SR policy consisting 769 of a set of segment-lists, each containing non-ECMP-aware Adj- 770 SIDs. Each segment-list will be assigned a weight to allow for 771 ECMP or UCMP. This approach does, however, imply computation and 772 programming of two paths instead of one.

774 o Another hybrid approach might work as follows. Redundant DCBs 775 advertise an Anycast-SID 'A' into the data center, and also 776 instantiate an SR policy with a segment-list consisting of non- 777 ECMP-aware Adj-SIDs meeting the required connectivity and SLA. 778 The BSID value of this SR policy 'B' must be common to both 779 redundant DCBs, but the calculated paths are diverse. Indeed, 780 multiple segment-lists could be used in this SR policy. A VNF 781 could then instantiate an SR policy with a segment-list of {A, B} 782 to achieve ECMP in the data center and TE in the wide-area network 783 with the option of ECMP at the BSID anchor.

785 5.5. Service Layer

787 The service layer is intended to deliver Layer 2 and/or Layer 3 VPN 788 connectivity between network functions to create an overlay utilizing 789 the routing and LSP underlay described in section 5.4. To do this, 790 the solution employs the EVPN and/or VPN-IPv4/IPv6 address families 791 to exchange Layer 2 and Layer 3 Network Layer Reachability 792 Information (NLRI). When these NLRI are exchanged between domains, it 793 is typical for the border router to set next-hop-self on advertised 794 routes. With the proposed routing and LSP underlay, however, this is 795 not required, and EVPN/VPN-IPv4/IPv6 routes should be passed end-to- 796 end without transit routers modifying the next-hop attribute.

798 Section 5.4.2 describes the use of labeled unicast BGP to exchange 799 inter-domain routes to establish a default forwarding path. Labeled- 800 unicast BGP is used to exchange prefix reachability between service 801 edge routers, with domain border routers imposing next-hop-self on 802 routes advertised between domains.
This provides a default inter- 803 domain forwarding path and provides the required connectivity to 804 establish inter-domain BGP sessions between service edges for the 805 exchange of EVPN and/or VPN-IPv4/IPv6 NLRI. If route-reflection is 806 used for the EVPN and/or VPN-IPv4/IPv6 address families within one or 807 more domains, it may be desirable to create inter-domain BGP sessions 808 between route-reflectors. In this case, the peering addresses of the 809 route-reflectors should also be exchanged between domains using 810 labeled unicast BGP. This creates a connectivity model analogous to 811 BGP/MPLS IP-VPN Inter-AS option C [RFC4364].

813 +----------------+ +----------------+ +----------------+
814 | +----+ | | +----+ | | +----+ |
815 +----+ | RR | +----+ | RR | +----+ | RR | +----+
816 | NF | +----+ | DCI| +----+ | DCI| +----+ | NF |
817 +----+ +----+ +----+ +----+
818 | Domain | | Domain | | Domain |
819 +----------------+ +----------------+ +----------------+
820 <-------> <-----> NHS <-- BGP-LU ---> NHS <-----> <------>
821 <-------> <--------- EVPN/VPN-IPv4/v6 ----------> <------>

823 Inter-Domain Service Layer

825 Figure 3

827 EVPN and/or VPN-IPv4/v6 routes received from a peer in a different 828 domain will contain a next-hop equivalent to the router that sourced 829 the route. The next-hop of these routes can be resolved to a labeled- 830 unicast route (default forwarding path) or to an SR policy (traffic- 831 engineered forwarding path) as appropriate to the service 832 requirements. The exchange of EVPN and/or VPN-IPv4/IPv6 routes in 833 this manner implies that Route-Distinguisher and Route-Target values 834 remain intact end-to-end.

836 The use of end-to-end EVPN and/or VPN-IPv4/IPv6 address families 837 without the imposition of next-hop-self at border routers complements 838 the gateway-less transport layer architecture. It negates the 839 requirement for midpoint service provisioning and as such provides 840 the following benefits:

842 o Avoids the translation of MAC/IP EVPN routes to IP-VPN routes (and 843 vice versa) that is typically associated with service 844 interworking.

846 o Avoids instantiation of MAC-VRFs and IP-VPNs for each tenant 847 resident in the DCB.

849 o Avoids provisioning of demarcation functions between the data 850 center and wide-area network such as QoS, access control, 851 aggregation, and isolation.

853 5.6. Service Differentiation

855 As discussed in section 5.4.3, the use of TE paths is a key 856 capability of the NFIX solution framework described in this document. 857 The Interconnect controller computes end-to-end TE paths between NFs 858 and programs DC nodes, DCBs, and ABR/ASBRs, via SR Policy, with the 859 necessary label forwarding entries for each [headend, color, 860 endpoint]. The collection of [headend, endpoint] pairs for the same 861 color constitutes a logical network topology, where each topology 862 satisfies a given SLA requirement.

864 The Interconnect controller discovers the endpoints associated with a 865 given topology (color) upon the reception of EVPN or IPVPN routes 866 advertised by the endpoint. The EVPN and IPVPN NLRIs are advertised 867 by the endpoint nodes along with a color extended community which 868 identifies the topology to which the owner of the NLRI belongs. At a 869 coarse level, all the EVPN/IPVPN routes of the same VPN can be 870 advertised with the same color, and therefore a TE topology would be 871 established on a per-VPN basis.
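Before turning to more granular coloring options, the following minimal sketch (Python, with illustrative structures only) shows how a controller might derive per-color logical topologies from received routes carrying a color extended community:

   from collections import defaultdict

   # color value -> set of (headend, endpoint) pairs forming that topology
   topologies = defaultdict(set)

   def on_vpn_route(route):
       """Process a received EVPN or VPN-IPv4/IPv6 route.

       route.color     -- value of the color extended community
       route.next_hop  -- the advertising endpoint
       route.receivers -- headends importing the route (e.g. derived from
                          Route Targets); all attribute names illustrative.
       """
       for headend in route.receivers:
           topologies[route.color].add((headend, route.next_hop))

   def endpoints_for(color):
       # Pairs for which the controller must ensure suitable SR Policies.
       return sorted(topologies[color])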
At a finer level, IPVPN and 872 especially EVPN provide a more granular way of coloring routes that 873 allows the Interconnect controller to associate multiple 874 topologies with the same VPN. For example:

876 o All the EVPN MAC/IP routes for a given VNF may be advertised with 877 the same color. This would allow the Interconnect controller to 878 associate topologies per VNF within the same VPN; that is, VNF1 879 could be blue (e.g., low-latency topology) and VNF2 could be green 880 (e.g., high-throughput).

882 o The EVPN MAC/IP routes and Inclusive Multicast Ethernet Tag (IMET) 883 route for VNF1 may be advertised with different colors, e.g., red 884 and brown, respectively. This would allow the association of, 885 e.g., a low-latency topology for unicast traffic to VNF1 and a best- 886 effort topology for BUM traffic to VNF1.

888 o Each EVPN MAC/IP route or IP-Prefix route from a given VNF may be 889 advertised with a different color. This would allow the association 890 of topologies at the host level or at host-route granularity.

892 5.7. Automated Service Activation

894 The automation of network and service connectivity for instantiation 895 and mobility of virtual machines is a highly desirable attribute 896 within data centers. Since this concerns service connectivity, it 897 should be clear that this automation is relevant to virtual functions 898 that belong to a service as opposed to a virtual network function 899 that delivers services, such as a virtual PE router.

901 Within an SDN-enabled data center, a typical hierarchy from top to 902 bottom would include a policy engine (or policy repository), one or 903 more DC controllers, numerous hypervisors/container hosts that 904 function as NVO endpoints, and finally the virtual 905 machines (VMs)/containers, which we'll refer to generically as 906 virtualization hosts.

908 The mechanisms used to communicate between the policy engine and DC 909 controller, and between the DC controller and hypervisor/container 910 are not relevant here and as such they are not discussed further. 911 What is important is the interface and information exchange between 912 the Interconnect controller and the data center SDN functions:

914 o The Interconnect controller interfaces with the data center policy 915 engine and publishes the available colors, where each color 916 represents a topological service connectivity map that meets a set 917 of constraints and SLA objectives. This interface is a 918 straightforward API.

920 o The Interconnect controller interfaces with the DC controller to 921 learn overlay routes. This interface is BGP and uses the EVPN 922 Address Family.

924 With the above framework in place, automation of network and service 925 connectivity can be implemented as follows:

927 o The virtualization host is turned up. The NVO endpoint notifies 928 the DC controller of the startup.

930 o The DC controller retrieves service information, IP addressing 931 information, and service 'color' for the virtualization host from 932 the policy engine. The DC controller subsequently programs the 933 associated forwarding information on the virtualization host. 934 Since the DC controller is now aware of MAC and IP address 935 information for the virtualization host, it advertises that 936 information as an EVPN MAC Advertisement Route into the overlay.
938 o The Interconnect controller receives the EVPN MAC Advertisement 939 Route (potentially via a Route-Reflector) and correlates it with 940 locally held service information and SLA requirements using Route 941 Target and Color communities. If the relevant SR policies are not 942 already in place to support the service requirements and logical 943 connectivity, including any binding-SIDs, they are calculated and 944 advertised to the relevant headends.

946 The same automated service activation principles can also be used to 947 support the scenario where virtualization hosts are moved between 948 hypervisors/container hosts for resourcing or other reasons. We 949 refer to this simply as mobility. If a virtualization host is turned 950 down, the parent NVO endpoint notifies the DC controller, which in 951 turn notifies the policy engine and withdraws any EVPN MAC 952 Advertisement Routes. Thereafter, all associated state is removed. 953 When the virtualization host is turned up on a different hypervisor/ 954 container host, the automated service connectivity process outlined 955 above is simply repeated.

957 5.8. Service Function Chaining

959 Service Function Chaining (SFC) defines an ordered set of abstract 960 service functions and the subsequent steering of traffic through 961 them. Packets are classified at ingress for processing by the 962 required set of service functions (SFs) in an SFC-capable domain and 963 are then forwarded through each SF in turn for processing. The 964 ability to dynamically construct SFCs containing the relevant SFs in 965 the right sequence is a key requirement for operators.

967 To enable flexible service function deployment models that support 968 agile service insertion, the NFIX architecture adopts the use of BGP 969 as the control plane to distribute SFC information. The BGP control 970 plane for Network Service Header (NSH) SFC 971 [I-D.ietf-bess-nsh-bgp-control-plane] is used for this purpose and 972 defines two route types: the Service Function Instance Route (SFIR) 973 and the Service Function Path Route (SFPR).

975 The SFIR is used to advertise the presence of a service function 976 instance (SFI) as a function type (e.g., firewall, TCP optimizer) and 977 is advertised by the node hosting that SFI. The SFIR is advertised 978 together with a BGP Tunnel Encapsulation attribute containing details 979 of how to reach that particular service function through the underlay 980 network (i.e., IP address and encapsulation information).

982 The SFPRs contain service function path (SFP) information and one 983 SFPR is originated for each SFP. Each SFPR contains the service path 984 identifier (SPI) of the path, the sequence of service function types 985 that make up the path (each of which has at least one instance 986 advertised in an SFIR), and the service index (SI) for each listed 987 service function to identify its position in the path.

989 Once a Classifier has determined which flows should be mapped to a 990 given SFP, it imposes an NSH [RFC8300] on those packets, setting the 991 SPI to that of the selected service path (advertised in an SFPR), and 992 the SI to the first hop in the path. As NSH is encapsulation 993 agnostic, the NSH encapsulated packet is then forwarded through the 994 appropriate tunnel to reach the service function forwarder (SFF) 995 supporting that service function instance (advertised in an SFIR).
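A minimal, non-normative sketch of the classifier step just described is shown below (Python); the flow-to-SFP mapping and the NSH field handling are simplified assumptions.

   def classify_and_impose_nsh(packet, flow_to_spi, sfp_hops):
       """Classifier sketch (not an implementation of RFC 8300).

       flow_to_spi -- mapping from a flow key to the selected SPI
       sfp_hops    -- SPI -> ordered list of (service_index, function_type)
                      learnt from Service Function Path Routes (SFPRs)
       """
       spi = flow_to_spi.get(packet.flow_key)
       if spi is None:
           return packet                  # flow is not subject to a chain
       first_si = sfp_hops[spi][0][0]     # SI of the first hop in the path
       packet.nsh = {"spi": spi, "si": first_si}
       return packet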
996 The SFF removes the tunnel encapsulation and forwards the packet with 997 the NSH to the relevant SF based upon a lookup of the SPI/SI. When 998 it is returned from the SF with a decremented SI value, the SFF 999 forwards the packet to the next hop in the SFP using the tunnel 1000 information advertised by that SFI. This procedure is repeated until 1001 the last hop of the SFP is reached.

1003 The use of the NSH in this manner allows for service chaining with 1004 topological and transport independence. It also allows for the 1005 deployment of SFIs in a condensed or dispersed fashion depending on 1006 operator preference or resource availability. Service function 1007 chains are built in their own overlay network and share a common 1008 underlay network, where that common underlay network is the NFIX 1009 fabric described in section 5.4. BGP updates containing an SFIR or 1010 SFPR are advertised in conjunction with one or more Route Targets 1011 (RTs), and each node in a service function overlay network is 1012 configured with one or more import RTs. As a result, nodes will only 1013 import routes that are applicable and that local policy dictates. 1014 This provides the ability to support multiple service function 1015 overlay networks or the construction of service function chains 1016 within L3VPN or EVPN services.

1018 Although SFCs are constructed in a unidirectional manner, the BGP 1019 control plane for NSH SFC allows for the optional association of 1020 multiple paths (SFPRs). This provides the ability to construct a 1021 bidirectional service function chain in the presence of multiple 1022 equal-cost paths between source and destination to avoid problems 1023 that SFs may suffer with traffic asymmetry.

1025 The proposed SFC model can be considered decoupled in that the use of 1026 SR as a transport between SFFs is completely independent of the use 1027 of NSH to define the SFC. That is, it uses an NSH-based SFC and SR 1028 is just one of many encapsulations that could be used between SFFs. 1029 A similar, more integrated approach proposes encoding a service 1030 function as a segment so that an SFC can be constructed as a segment- 1031 list. In this case, it can be considered an SR-based SFC with an NSH- 1032 based service plane since the SF is unaware of the presence of the 1033 SR. Functionally, both approaches are very similar, and as such both 1034 could be adopted and could work in parallel. Construction of SFCs 1035 based purely on SR (SF is SR-aware) is not considered at this time.

1037 5.9. Stability and Availability

1039 Any network architecture should have the capability to self-restore 1040 following the failure of a network element. The time to reconverge 1041 following the failure needs to be minimal to avoid noticeable 1042 disruptions in service. This section discusses protection mechanisms 1043 that are available for use and their applicability to the proposed 1044 architecture.

1046 5.9.1. IGP Reconvergence

1048 Within the construct of an IGP topology, the Topology Independent Loop 1049 Free Alternate (TI-LFA) [I-D.ietf-rtgwg-segment-routing-ti-lfa] can 1050 be used to provide a local repair mechanism that offers both link and 1051 node protection.

1053 TI-LFA is a repair mechanism, and as such it is reactive and 1054 initially needs to detect a given failure. To provide fast failure 1055 detection, Bidirectional Forwarding Detection (BFD) is used.
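In asynchronous mode, the BFD detection time is approximately the agreed transmit interval multiplied by the detect multiplier; the following trivial sketch (Python, with hypothetical timer values) illustrates the arithmetic.

   def bfd_detection_time_ms(tx_interval_ms, detect_multiplier):
       # Asynchronous-mode approximation: the peer declares the session
       # down after 'detect_multiplier' consecutive intervals with no
       # received BFD control packet.
       return tx_interval_ms * detect_multiplier

   # Hypothetical timers: roughly 17 ms x 3 gives detection on the order
   # of the 50 milliseconds discussed in the following text.
   assert bfd_detection_time_ms(17, 3) == 51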
1066 5.9.2. Data Center Reconvergence 1068 Clos fabrics are extremely common within data centers, and 1069 fundamental to a Clos fabric is the ability to load-balance using 1070 Equal Cost Multipath (ECMP). The number of ECMP paths will vary 1071 depending on the number of devices in the parent tier, but will never 1072 be less than two for redundancy purposes, with traffic hashed over the 1073 available paths. In this scenario the availability of a backup path 1074 in the event of failure is implicit. Within the DC, rather 1075 than computing protection paths (like LFA), techniques such as 'fast 1076 rehash' are commonly utilized. In this particular case, the failed 1077 next-hop is removed from the multi-path forwarding data structure and 1078 traffic is then rehashed over the remaining active paths. 1080 In BGP-only data centers this relies on the implementation of BGP 1081 multipath. As network elements in the lower tier of a Clos fabric 1082 will frequently belong to different ASNs, this includes the ability 1083 to load-balance to a prefix with different AS_PATH attribute values 1084 while having the same AS_PATH length; sometimes referred to as 1085 'multipath relax' or 'multipath multiple-AS' [RFC7938]. 1087 Failure detection relies upon declaring a BGP session down and 1088 removing any prefixes learnt over that session as soon as the link is 1089 declared down. As links between network elements predominantly use 1090 direct point-to-point fiber, a link failure should be detected within 1091 milliseconds. BFD is also commonly used to detect IP layer failures. 1093 5.9.3. Exchange of Inter-Domain Routes 1095 Labeled unicast BGP together with SR Prefix-SID extensions is used 1096 to exchange PNF and/or VNF endpoints between domains to create end- 1097 to-end connectivity without TE. When advertising between domains we 1098 assume that a given BGP prefix is advertised by at least two border 1099 routers (DCBs, ABRs, ASBRs), making prefixes reachable via at least 1100 two next-hops. 1102 BGP Prefix Independent Convergence (PIC) [I-D.ietf-rtgwg-bgp-pic] 1103 allows failover to a pre-computed and pre-installed secondary next- 1104 hop when the primary next-hop fails and is independent of the number 1105 of destination prefixes that are affected by the failure. It should 1106 be clear that BGP PIC depends on the availability of a secondary 1107 next-hop in the pathlist when the primary BGP next-hop fails. To 1108 ensure that multiple paths to the same destination are visible, 1109 BGP ADD-PATH [RFC7911] can be used to allow for advertisement of 1110 multiple paths for the same address prefix. Dual-homed EVPN/IP-VPN 1111 prefixes also have the alternative option of allocating different 1112 Route-Distinguishers (RDs). To trigger the switch from primary to 1113 secondary next-hop, PIC needs to detect the failure, and many 1114 implementations support 'next-hop tracking' for this purpose. Next- 1115 hop tracking monitors the routing table and, if the next-hop prefix is 1116 removed, will immediately invalidate all BGP prefixes learnt through 1117 that next-hop. In the absence of next-hop tracking, multihop BFD 1118 [RFC5883] could optionally be used as a fast failure detection 1119 mechanism.
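The value of PIC in this design can be illustrated with a short, non-normative sketch of a hierarchical FIB. The structure and names below are illustrative assumptions: service prefixes resolve through a shared pathlist object rather than directly to a next-hop, so the failure of a border router is repaired with a single pathlist update, irrespective of the number of dependent prefixes.

   # Non-normative sketch of BGP PIC using a shared pathlist
   # (hierarchical FIB).  Names, prefix counts and structures are
   # illustrative only.

   class Pathlist:
       """Primary and pre-installed secondary BGP next-hops, e.g. two
       border routers learnt via ADD-PATH or distinct RDs."""
       def __init__(self, primary, secondary):
           self.paths = [primary, secondary]
           self.active = 0                  # index of the path in use

       def next_hop_down(self, failed):
           # next-hop tracking (or multihop BFD) reports the failure;
           # the shared object is flipped once, independent of the
           # number of prefixes that resolve through it
           if self.paths[self.active] == failed:
               self.active = 1 - self.active

   # one pathlist shared by all prefixes reachable via the same border pair
   pl = Pathlist(primary="ABR1", secondary="ABR2")
   fib = {"10.%d.%d.0/24" % (i // 256, i % 256): pl for i in range(10000)}

   pl.next_hop_down("ABR1")                 # one update repairs all 10000
   entry = fib["10.0.42.0/24"]
   print(entry.paths[entry.active])         # -> ABR2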
1121 5.9.4. Controller Redundancy 1123 With the Interconnect controller forming an integral part of the 1124 network's capabilities, a redundant controller design is clearly 1125 prudent. To this end we can consider both availability and 1126 redundancy. Availability refers to the survivability of a single 1127 controller system in a failure scenario. A common strategy for 1128 increasing the availability of a single controller system is to build 1129 the system in a high-availability cluster such that it becomes a 1130 confederation of redundant constituent parts as opposed to a single 1131 monolithic system. Should a single part fail, the system can still 1132 survive without the requirement to failover to a standby controller 1133 system. Methods for detection of a failure of one or more member 1134 parts of the cluster are implementation specific. 1136 To provide contingency for a complete system failure, a geo-redundant 1137 standby controller system is required. When redundant controllers 1138 are deployed, a coherent strategy is needed that provides a master/ 1139 standby election mechanism, the ability to propagate the outcome of 1140 that election to network elements as required, an inter-system 1141 failure detection mechanism, and the ability to synchronize state 1142 across both systems such that the standby controller is fully aware 1143 of current state should it need to transition to master controller. 1145 Master/standby election, state synchronization, and failure detection 1146 between geo-redundant sites can largely be considered a local 1147 implementation matter. The requirement to propagate the outcome of 1148 the master/standby election to network elements depends on a) the 1149 mechanism that is used to instantiate SR policies, and b) whether the 1150 SR policies are controller-initiated or headend-initiated, and these 1151 are discussed in the following sub-sections. In either scenario, 1152 state of SR policies should be advertised northbound to both master/ 1153 standby controllers using either PCEP LSP State Report messages or SR 1154 policy extensions to BGP link-state 1155 [I-D.ietf-idr-te-lsp-distribution]. 1157 5.9.4.1. SR Policy Initiator 1159 Controller-initiated SR policies are suited for auto-creation of 1160 tunnels based on service route discovery and policy-driven route/flow 1161 programming, and are ephemeral. Headend-initiated tunnels allow for 1162 permanent configuration state to be held on the headend and are 1163 suitable for static services that are not subject to dynamic changes. 1164 If all SR policies are controller-initiated, it negates the 1165 requirement to propagate the outcome of the master/standby election 1166 to network elements. This is because headends have no requirement 1167 for unsolicited requests to a controller, and therefore have no 1168 requirement to know which controller is master and which one is 1169 standby. A headend may respond to a message from a controller, but 1170 such a response is solicited rather than unsolicited. 1172 If some or all SR policies are headend-initiated, then the 1173 requirement to propagate the outcome of the master/standby election 1174 exists. This is further discussed in the following sub-section. 1176 5.9.4.2.
SR Policy Instantiation Mechanism 1178 While candidate paths of SR policies may be provided using BGP, PCEP, 1179 Netconf, or local policy/configuration, this document primarily 1180 considers the use of PCEP or BGP. 1182 When PCEP [RFC5440][RFC8231][RFC8281] is used for instantiation of 1183 candidate paths of SR policies 1184 [I-D.barth-pce-segment-routing-policy-cp], every headend/PCC should 1185 establish a PCEP session with the master and standby controllers. To 1186 signal standby state to the PCC, the standby controller may use a PCEP 1187 Notification message to set the PCEP session into overload state. 1188 While in this overload state, the standby controller will accept path 1189 computation LSP state report (PCRpt) messages without delegation but 1190 will reject path computation requests (PCReq) and any path 1191 computation reports (PCRpt) with the delegation bit set. Further, 1192 the standby controller will not originate path computation initiate 1193 messages (PCInit) or path computation update request messages 1194 (PCUpd). In the event of the failure of the master controller, the 1195 standby controller will transition to active and remove the PCEP 1196 overload state. Following expiration of the PCEP redelegation 1197 timeout at the PCC, any LSPs will be redelegated to the newly 1198 transitioned active controller. LSP state is not impacted unless 1199 redelegation is not possible before the state timeout interval 1200 expires. 1202 When BGP is used for instantiation of SR policies, every headend 1203 should establish BGP sessions capable of exchanging the SR TE Policy 1204 SAFI with both the master and standby controllers. Candidate paths of SR 1205 policies are advertised only by the active controller. If the master 1206 controller should experience a failure, then SR policies learnt from 1207 that controller may be removed before they are re-advertised by the 1208 standby (or newly-active) controller. To minimize this possibility, 1209 BGP speakers that advertise and instantiate SR policies can implement 1210 Long Lived Graceful Restart (LLGR) [I-D.ietf-idr-long-lived-gr], also 1211 known as BGP persistence, to retain existing routes, treated as least- 1212 preferred, until a new route arrives. In the absence of LLGR, two 1213 other alternatives are possible: 1215 o Provide a static backup SR policy. 1217 o Fall back to the default forwarding path.
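The standby-controller behavior described in this sub-section can be sketched, for the PCEP case, as a simple message filter. The class and message representations below are illustrative assumptions only and are not an implementation of [RFC5440] or [RFC8231].

   # Non-normative sketch of a standby controller's PCEP handling while
   # the session is in overload state.  Message and field names are
   # illustrative only.

   class StandbyPce:
       def __init__(self):
           self.overloaded = True       # signalled to PCCs via Notification

       def on_receive(self, msg):
           """Handling of messages received while acting as standby."""
           if not self.overloaded:
               return "process"         # active controller: normal handling
           if msg["type"] == "PCRpt" and not msg.get("delegate", False):
               return "accept"          # keep the LSP state database in sync
           return "reject"              # PCReq and delegated PCRpt refused

       def may_send(self, msg_type):
           """A standby controller does not originate PCInit or PCUpd."""
           return not (self.overloaded and msg_type in ("PCInit", "PCUpd"))

       def promote_to_active(self):
           # master failure detected: clear overload and accept any LSPs
           # redelegated by PCCs after their redelegation timeout expires
           self.overloaded = False

   pce = StandbyPce()
   print(pce.on_receive({"type": "PCRpt", "delegate": False}))  # accept
   print(pce.on_receive({"type": "PCRpt", "delegate": True}))   # reject
   print(pce.may_send("PCUpd"))                                 # False
   pce.promote_to_active()
   print(pce.on_receive({"type": "PCReq"}))                     # process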
1219 5.9.5. Path and Segment Liveliness 1221 When using traffic-engineered SR paths, only the ingress router holds 1222 any state. The exception here is where BSIDs are used, which also 1223 implies some state is maintained at the BSID anchor. As there is no 1224 control plane set-up, it follows that there is no feedback loop from 1225 transit nodes of the path to notify the headend when a non-adjacent 1226 point of the SR path fails. The Interconnect controller, however, is 1227 aware of all paths that are impacted by a given network failure and 1228 should take the appropriate action. This action could include 1229 withdrawing an SR policy if a suitable candidate path is already in 1230 place, or simply sending a new SR policy with a different segment- 1231 list and a higher preference value assigned to it. 1233 Verification of data plane liveliness is the responsibility of the 1234 path headend. A given SR policy may be associated with multiple 1235 candidate paths and, for the sake of clarity, we assume two here for 1236 redundancy purposes (which can be diversely routed). Verification of 1237 the liveliness of these paths can be achieved using seamless BFD 1238 (S-BFD) [RFC7880], which provides an in-band failure detection 1239 mechanism capable of detecting failure in the order of tens of 1240 milliseconds. Upon failure of the active path, failover to a 1241 secondary candidate path can be activated at the path headend. 1242 Details of the actual failover and revert mechanisms are a local 1243 implementation matter. 1245 S-BFD provides a fast and scalable failure detection mechanism but is 1246 unlikely to be implemented in many VNFs given their inability to 1247 offload the process to purpose-built hardware. In the absence of an 1248 active failure detection mechanism such as S-BFD, the failover from 1249 active path to secondary candidate path can be triggered using 1250 continuous path validity checks. One of the criteria that a 1251 candidate path uses to determine its validity is the ability to 1252 perform path resolution for the first SID to one or more outgoing 1253 interface(s) and next-hop(s). From the perspective of the VNF 1254 headend, the first SID in the segment-list will very likely be the DCB 1255 (as BSID anchor) but could equally be another Prefix-SID hop within 1256 the data center. Should this segment experience a non-recoverable 1257 failure, the headend will be unable to resolve the first SID and the 1258 path will be considered invalid. This will trigger a failover action 1259 to a secondary candidate path. 1261 Injection of S-BFD packets is not constrained to the source of 1262 an end-to-end LSP. When an S-BFD packet is injected into an SR 1263 policy path, it is encapsulated with the label stack of the associated 1264 segment-list. It is possible therefore to run S-BFD from a BSID 1265 anchor for just that section of the end-to-end path (for example, 1266 from DCB to DCB). This allows a BSID anchor to detect failure of a 1267 path and take corrective action, while maintaining opacity between 1268 domains. 1270 5.10. Scalability 1272 There are many aspects to consider regarding scalability of the NFIX 1273 architecture. The building blocks of NFIX are standards-based 1274 technologies individually designed to scale for internet provider 1275 networks. When combined they provide a flexible and scalable 1276 solution: 1278 o BGP has been proven to scale and operate with millions of routes 1279 being exchanged. Specifically, BGP labeled unicast has been 1280 deployed and proven to scale in existing seamless-MPLS networks. 1282 o By placing forwarding instructions in the header of a packet, 1283 segment routing reduces the amount of state required in the 1284 network, allowing it to scale to a greater number of transport tunnels. 1285 This makes it feasible for the NFIX architecture to automate 1286 SR policy creation without impacting 1287 state in the core of the network. 1289 o The choice of utilizing native SR-MPLS or SR over IP in the data 1290 center continues to permit horizontal scaling without introducing 1291 new state inside the data center fabric while still permitting 1292 seamless end-to-end path forwarding integration. 1294 o BSIDs play a key role in the NFIX architecture as their use 1295 provides the ability to traffic-engineer across large network 1296 topologies consisting of many hops regardless of hardware 1297 capability at the headend.
From a scalability perspective, the use 1298 of BSIDs facilitates better scale because detailed 1299 information about the SR paths in a domain is abstracted and 1300 localized to the BSID anchor point only. When BSIDs are re-used 1301 amongst one or many headends, they reduce the amount of path 1302 calculation and updates required at network edges while still 1303 providing seamless end-to-end path forwarding. 1305 o The architecture of NFIX continues to use an independent DC 1306 controller. This allows continued independent scaling of data 1307 center management in both policy and local forwarding functions, 1308 while off-loading the end-to-end optimal path placement and 1309 automation to the Interconnect controller. The optimal path 1310 placement is already a scalable function provided in a PCE 1311 architecture. The Interconnect controller must compute paths, but 1312 it is not burdened by the management of virtual entity lifecycle 1313 and associated forwarding policies. 1315 It must be acknowledged that with the amalgamation of the technology 1316 building blocks and the automation required by NFIX, there is an 1317 additional burden on the Interconnect controller. The scaling 1318 considerations are dependent on many variables, but an implementation 1319 of an Interconnect controller shares many traits and 1320 scaling concerns with a PCE; the controller and PCE both must: 1322 o Discover and listen to state changes of the IP/MPLS 1323 topology. 1325 o Compute traffic-engineered intra- and inter-domain paths across 1326 large service provider topologies. 1328 o Synchronize, track, and update thousands of LSPs to network devices 1329 upon network state changes. 1331 Both entail topologies that contain tens of thousands of nodes and 1332 links. The Interconnect controller in an NFIX architecture takes on 1333 the additional role of becoming end-to-end service aware and 1334 discovering data center entities that were traditionally excluded 1335 from a controller's scope. Although not exhaustive, an NFIX 1336 Interconnect controller is impacted by some of the following: 1338 o The number of individual services, the number of endpoints that 1339 may exist in each service, the distribution of endpoints in a 1340 virtualized environment, and how many data centers may exist. 1341 Medium or large data centers may be capable of hosting more 1342 virtual endpoints per host, but with the move to smaller edge- 1343 clouds the number of headends that require inter-connectivity 1344 increases compared to the density of localized routing in a 1345 centralized data center model. The outcome has an impact on the 1346 number of headend devices that may require tunnel management by 1347 the Interconnect controller. 1349 o Assuming a given BSID satisfies the SLA, the ability to re-use BSIDs 1350 across multiple services reduces the number of paths to track and 1351 manage. However, the number of colors or unique SLA definitions, 1352 and criteria such as bandwidth constraints, impact WAN traffic 1353 distribution requirements. As BSIDs play a key role for VNF 1354 connectivity, this potentially increases the number of BSID paths 1355 required to permit appropriate traffic distribution. This also 1356 impacts the number of tunnels which may be re-used on a given 1357 headend for different services. 1359 o The frequency of virtualized hosts being created and destroyed and 1360 the general activity within a given service.
The controller must 1361 analyze and correlate the activity of relevant BGP routes 1362 to track the addition and removal of service hosts or host subnets, and 1363 determine whether new SR policies should be instantiated, or stale 1364 unused SR policies should be removed from the network. 1366 o The choice of SR instantiation mechanism impacts the number of 1367 communication sessions the controller may require. For example, 1368 the BGP-based mechanism may only require a small number of 1369 sessions to route reflectors, whereas PCEP may require a 1370 connection to every possible leaf in the network and any BSID 1371 anchors. 1373 o The number of hops within one or many WAN domains may affect the 1374 number of BSIDs required to provide transit for VNF/PNF, PNF/PNF, 1375 or VNF/VNF inter-connectivity. 1377 o Relative to traditional WAN topologies, data centers 1378 are generally topologically denser in node and link connectivity, 1379 all of which must be discovered by the Interconnect controller, 1380 resulting in a much larger and denser link-state database on the 1381 Interconnect controller. 1383 5.10.1. Asymmetric Model B for VPN Families 1385 With the instantiation of multiple TE paths between any two VNFs in 1386 the NFIX network, the number of SR Policy (remote endpoint, color) 1387 routes, BSIDs, and labels to support on VNFs becomes a choke point in 1388 the architecture. The fact that some VNFs are limited in terms of 1389 forwarding resources makes this aspect an important scale issue. 1391 As an example, if VNF1 and VNF2 in Figure 1 are associated with 1392 multiple topologies 1..n, the Interconnect controller will 1393 instantiate n TE paths in VNF1 to reach VNF2: 1395 [VNF1,color-1,VNF2] --> BSID 1 1397 [VNF1,color-2,VNF2] --> BSID 2 1399 ... 1401 [VNF1,color-n,VNF2] --> BSID n 1403 Similarly, m TE paths may be instantiated on VNF1 to reach VNF3, 1404 another p TE paths to reach VNF4, and so on for all the VNFs that 1405 VNF1 needs to communicate with in DC2. As can be observed, the 1406 number of forwarding resources to be instantiated on VNF1 may 1407 significantly grow with the number of remote [endpoint, color] pairs, 1408 compared with a best-effort architecture in which the number of 1409 forwarding resources in VNF1 grows with the number of endpoints only. 1411 This scale issue on the VNFs can be relieved by the use of an 1412 asymmetric model B service layer. The concept is illustrated in 1413 Figure 4. 1415 +------------+ 1416 <-------------------------------------| WAN | 1417 | SR Policy +-------------------| Controller | 1418 | BSID m | SR Policy +------------+ 1419 v {DCI1,n,DCI2} v BSID n 1420 {1,2,3,4,5,DCI2} 1421 +----------------+ +----------------+ +----------------+ 1422 | +----+ | | | | +----+ | 1423 +----+ | RR | +----+ +----+ | RR | +----+ 1424 |VNF1| +----+ |DCI1| |DCI2| +----+ |VNF2| 1425 +----+ +----+ +----+ +----+ 1426 | DC1 | | WAN | | DC2 | 1427 +----------------+ +----------------+ +----------------+ 1429 <-------- <-------------------------- NHS <------ <------ 1430 EVPN/VPN-IPv4/v6(colored) 1432 +-----------------------------------> +-------------> 1433 TE path to DCI2 ECMP path to VNF2 1434 (BSID to segment-list 1435 expansion on DCI1) 1437 Asymmetric Model B Service Layer 1439 Figure 4 1441 Consider that the n different topologies needed between VNF1 and VNF2 are 1442 really only relevant to the different TE paths that exist in the WAN.
1443 The WAN is the domain in the network where there can be significant 1444 differences in latency, throughput, or packet loss depending on the 1445 sequence of nodes and links the traffic goes through. Based on that 1446 assumption, traffic from VNF1 in Figure 4 only needs to be traffic-engineered as far as 1447 DCB2, and traffic from DCB2 to VNF2 can simply take an ECMP path. In this case an 1448 asymmetric model B service layer can significantly relieve the scale 1449 pressure on VNF1. 1451 From a service layer perspective, the NFIX architecture described up 1452 to now can be considered 'symmetric', meaning that the EVPN/IPVPN 1453 advertisements from, e.g., VNF2 in Figure 2, are received on VNF1 with 1454 the next-hop of VNF2, and vice versa for VNF1's routes on VNF2. SR 1455 policies to each VNF2 [endpoint, color] are then required on 1456 VNF1. 1458 In the 'asymmetric' service design illustrated in Figure 4, VNF2's 1459 EVPN/IPVPN routes are received on VNF1 with the next-hop of DCB2, and 1460 VNF1's routes are received on VNF2 with the next-hop of DCB1. Now SR 1461 policies instantiated on VNFs can be reduced to only the number of TE 1462 paths required to reach the remote DCB. For example, considering n 1463 topologies, in a symmetric model VNF1 has to be instantiated with n 1464 SR policy paths per remote VNF in DC2, whereas in the asymmetric 1465 model of Figure 4, VNF1 only requires n SR policy paths per DC, i.e., 1466 to DCB2. 1468 Asymmetric model B is a simple design choice that only requires the 1469 ability (on the DCB nodes) to set next-hop-self on the EVPN/IPVPN 1470 routes advertised to the WAN neighbors and not to set next-hop-self for 1471 routes advertised to the DC neighbors. With this option, the 1472 Interconnect controller only needs to establish TE paths from VNFs to 1473 remote DCBs, as opposed to VNFs to remote VNFs. 1475 6. Illustration of Use 1477 For the purpose of illustration, this section provides some examples 1478 of how different end-to-end tunnels are instantiated (including the 1479 relevant protocols, SID values/label stacks, etc.) and how services 1480 are then overlaid onto those LSPs. 1482 6.1. Reference Topology 1484 The following network diagram shows the reference network 1485 topology used in this section. 1486 Within the data centers leaf and spine network elements may be 1487 present but are not shown for the purpose of clarity. 1489 +----------+ 1490 |Controller| 1491 +----------+ 1492 / | \ 1493 +----+ +----+ +----+ +----+ 1494 ~ ~ ~ ~ | R1 |----------| R2 |----------| R3 |-----|AGN1| ~ ~ ~ ~ 1495 ~ +----+ +----+ +----+ +----+ ~ 1496 ~ DC1 | / | | DC2 ~ 1497 +----+ | L=5 +----+ L=5 / | +----+ +----+ 1498 | Sn | | +-------| R4 |--------+ | |AGN2| | Dn | 1499 +----+ | / M=20 +----+ M=20 | +----+ +----+ 1500 ~ | / | | ~ 1501 ~ +----+ +----+ +----+ +----+ +----+ ~ 1502 ~ ~ ~ ~ | R5 |-----| R6 |----| R7 |-----| R8 |-----|AGN3| ~ ~ ~ ~ 1503 +----+ +----+ +----+ +----+ +----+ 1505 Reference Topology 1507 Figure 5 1509 The following applies to the reference topology in Figure 5: 1511 o Data center 1 and data center 2 both run BGP/SR. Both data 1512 centers run leaf/spine topologies, which are not shown for the 1513 purpose of clarity. 1515 o R1 and R5 function as data center border routers for DC 1. AGN1 1516 and AGN3 function as data center border routers for DC 2. 1518 o Routers R1 through R8 form an independent ISIS-OSPF/SR instance. 1520 o Routers R3, R8, AGN1, AGN2, and AGN3 form an independent ISIS- 1521 OSPF/SR instance.
1523 o All IGP link metrics within the wide area network are metric 10, 1524 except for links R5-R4 and R4-R3, which are both metric 20. 1526 o All links have a unidirectional latency of 10 milliseconds, except 1527 for links R5-R4 and R4-R3, which both have a unidirectional latency 1528 of 5 milliseconds. 1530 o Source 'Sn' and destination 'Dn' represent one or more network 1531 functions. 1533 6.2. PNF to PNF Connectivity 1535 The first example demonstrates the simplest form of connectivity: PNF 1536 to PNF. The example illustrates the instantiation of a 1537 unidirectional TE path from R1 to AGN2 and its consumption by an EVPN 1538 service. The service has a requirement for high throughput with no 1539 strict latency requirements. These service requirements are 1540 cataloged and represented using the color blue. 1542 o An EVPN service is provisioned at R1 and AGN2. 1544 o The Interconnect controller computes the path from R1 to AGN2 and 1545 calculates that the optimal path based on the service requirements 1546 and overall network optimization is R1-R5-R6-R7-R8-AGN3-AGN2. The 1547 segment-list to represent the calculated path could be constructed 1548 in numerous ways. It could be strict hops represented by a series 1549 of Adj-SIDs. It could be loose hops using ECMP-aware Node-SIDs, 1550 for example {R7, AGN2}, or it could be a combination of both Node- 1551 SIDs and Adj-SIDs. Equally, BSIDs could be used to reduce the 1552 number of labels that need to be imposed at the headend. In this 1553 example, strict Adj-SID hops are used with a BSID at the area 1554 border router R8, but this should not be interpreted as the only 1555 way a path and segment-list can be represented. 1557 o The Interconnect controller advertises a BGP SR Policy to R8 with 1558 BSID 1000, and a segment-list containing segments {AGN3, AGN2}. 1560 o The Interconnect controller advertises a BGP SR Policy to R1 with 1561 BSID 1001, and a segment-list containing segments {R5, R6, R7, R8, 1562 1000}. The policy is identified using the tuple [headend = R1, 1563 color = blue, endpoint = AGN2]. 1565 o AGN2 advertises an EVPN MAC Advertisement Route for MAC M1, which 1566 is learned by R1. The route has a next-hop of AGN2, an MPLS label 1567 of L1, and it carries a color extended community with the value 1568 blue. 1570 o R1 has a valid SR policy [color = blue, endpoint = AGN2] with 1571 segment-list {R5, R6, R7, R8, 1000}. R1 therefore associates the 1572 MAC address M1 with that policy and programs the relevant 1573 information into the forwarding path. 1575 o The Interconnect controller also learns the EVPN MAC Route 1576 advertised by AGN2. The purpose of this is two-fold. It allows 1577 the controller to correlate the service overlay with the 1578 underlying transport LSPs, thus creating a service connectivity 1579 map. It also allows the controller to dynamically create LSPs 1580 based upon service requirements if they do not already exist, or 1581 to optimize them if network conditions change.
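The associations made in this example can be summarized with a short, non-normative sketch showing how the headend ties a colored service route to an SR policy and how the BSID at R8 stands in for the remainder of the path. The values are taken from the walk-through above; the data structures and helper functions themselves are illustrative assumptions only.

   # Non-normative sketch of the PNF-to-PNF example: a colored EVPN route
   # is matched to an SR policy keyed on (headend, color, endpoint), and
   # the BSID is expanded at its anchor (R8).  Structures are illustrative.

   policies = {
       ("R1", "blue", "AGN2"): {"bsid": 1001,
                                "segments": ["R5", "R6", "R7", "R8", 1000]},
       # R8's policy is shown without color/endpoint for simplicity
       ("R8", None, None):     {"bsid": 1000, "segments": ["AGN3", "AGN2"]},
   }

   def steer(headend, route):
       """Headend: associate a service route with an SR policy using the
       route's color community and BGP next-hop, then append the service
       label at the bottom of the stack."""
       policy = policies.get((headend, route["color"], route["next_hop"]))
       return policy["segments"] + [route["label"]] if policy else None

   def expand(segments):
       """Show the effective end-to-end path by replacing any BSID with
       the segment-list held at its anchor (done in the data plane)."""
       out = []
       for seg in segments:
           anchor = next((p for p in policies.values()
                          if p["bsid"] == seg), None)
           out += anchor["segments"] if anchor else [seg]
       return out

   # EVPN MAC route for M1: next-hop AGN2, MPLS label L1, color blue
   stack = steer("R1", {"color": "blue", "next_hop": "AGN2", "label": "L1"})
   print(stack)          # ['R5', 'R6', 'R7', 'R8', 1000, 'L1']
   print(expand(stack))  # ['R5', 'R6', 'R7', 'R8', 'AGN3', 'AGN2', 'L1']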
1583 6.3. VNF to PNF Connectivity 1585 The next example demonstrates VNF to PNF connectivity and illustrates 1586 the instantiation of a unidirectional TE path from S1 to AGN2. The 1587 path is consumed by an IP-VPN service that has a basic set of service 1588 requirements and as such simply uses IGP metric as a path computation 1589 objective. These basic service requirements are cataloged and 1590 represented using the color red. 1592 In this example S1 is a VNF with full IP routing and MPLS capability 1593 that interfaces to the data center underlay/overlay and serves as the 1594 NVO tunnel endpoint. 1596 o An IP-VPN service is provisioned at S1 and AGN2. 1598 o The Interconnect controller computes the path from S1 to AGN2 and 1599 calculates that the optimal path based on IGP metric is 1600 R1-R2-R3-AGN1-AGN2. 1602 o The Interconnect controller advertises a BGP SR Policy to R1 with 1603 BSID 1002, and a segment-list containing segments {R2, R3, AGN1, 1604 AGN2}. 1606 o The Interconnect controller advertises a BGP SR Policy to S1 with 1607 BSID 1003, and a segment-list containing segments {R1, 1002}. The 1608 policy is identified using the tuple [headend = S1, color = red, 1609 endpoint = AGN2]. 1611 o Source S1 learns a VPN-IPv4 route for prefix P1, next-hop AGN2. 1612 The route has a VPN label of L1, and it carries a color extended 1613 community with the value red. 1615 o S1 has a valid SR policy [color = red, endpoint = AGN2] with 1616 segment-list {R1, 1002} and BSID 1003. S1 therefore associates 1617 the VPN-IPv4 prefix P1 with that policy and programs the relevant 1618 information into the forwarding path. 1620 o As in the previous example, the Interconnect controller also learns 1621 the VPN-IPv4 route advertised by AGN2 in order to correlate the 1622 service overlay with the underlying transport LSPs, creating or 1623 optimizing them as required. 1625 6.4. VNF to VNF Connectivity 1627 The last example demonstrates VNF to VNF connectivity and illustrates 1628 the instantiation of a unidirectional TE path from S2 to D2. The 1629 path is consumed by an EVPN service that requires low latency 1630 and as such uses latency as a path computation 1631 objective. This service requirement is cataloged and represented 1632 using the color green. 1634 In this example S2 is a VNF that has no routing capability. It is 1635 hosted by hypervisor H1, which in turn has an interface to a DC 1636 controller through which forwarding instructions are programmed. H1 1637 serves as the NVO tunnel endpoint and overlay next-hop. 1639 D2 is a VNF with partial routing capability that is connected to a 1640 leaf switch L1. L1 connects to underlay/overlay in data center 2 and 1641 serves as the NVO tunnel endpoint for D2. L1 advertises BGP Prefix- 1642 SID 9001 into the underlay. 1644 o The relevant details of the EVPN service are entered in the data 1645 center policy engines within data centers 1 and 2. 1647 o Source S2 is turned up. Hypervisor H1 notifies its parent DC 1648 controller, which in turn retrieves the service (EVPN) 1649 information, color, and IP and MAC information from the policy engine 1650 and subsequently programs the associated forwarding entries onto 1651 S2. The DC controller also dynamically advertises an EVPN MAC 1652 Advertisement Route for S2's IP and MAC into the overlay with 1653 next-hop H1. (This would trigger the return path set-up between 1654 L1 and H1, which is not covered in this example.) 1656 o The DC controller in data center 1 learns an EVPN MAC 1657 Advertisement Route for D2, MAC M, next-hop L1. The route has an 1658 MPLS label of L2, and it carries a color extended community with 1659 the value green. 1661 o The Interconnect controller computes the path between H1 and L1 1662 and calculates that the optimal path based on latency is 1663 R5-R4-R3-AGN1.
1665 o The Interconnect controller advertises a BGP SR Policy to R5 with 1666 BSID 1004, and a segment-list containing segments {R4, R3, AGN1}. 1668 o The Interconnect controller advertises a BGP SR Policy to the DC 1669 controller in data center 1 with BSID 1005 and a segment-list 1670 containing segments {R5, 1004, 9001}. The policy is identified 1671 using the tuple [headend = H1, color = green, endpoint = L1]. 1673 o The DC controller in data center 1 has a valid SR policy [color = 1674 green, endpoint = L1] with segment-list {R5, 1004, 9001} and BSID 1675 1005. The controller therefore associates the MAC Advertisement 1676 Route with that policy, and programs the associated forwarding 1677 rules into S2. 1679 o As in the previous example, the Interconnect controller also learns 1680 the MAC Advertisement Route advertised for D2 in order to correlate 1681 the service overlay with the underlying transport LSPs, creating 1682 or optimizing them as required. 1684 7. Conclusions 1686 The NFIX architecture provides an evolutionary path to a unified 1687 network fabric. It uses the base constructs of seamless-MPLS and 1688 adds end-to-end LSPs capable of delivering against SLAs, seamless 1689 data center interconnect, service differentiation, service function 1690 chaining, and a Layer-2/Layer-3 infrastructure capable of 1691 interconnecting PNF-to-PNF, PNF-to-VNF, and VNF-to-VNF. 1693 NFIX establishes a dynamic, seamless, and automated connectivity 1694 model that overcomes the operational barriers and interworking issues 1695 between data centers and the wide-area network and delivers the 1696 following using standards-based protocols: 1698 o A unified routing control plane: Multiprotocol BGP (MP-BGP) to 1699 acquire inter-domain NLRI from the IP/MPLS underlay and the 1700 virtualized IP-VPN/EVPN service overlay. 1702 o A unified forwarding control plane: SR provides dynamic service 1703 tunnels with fast restoration options to meet deterministic 1704 bandwidth, latency, and path diversity constraints. SR utilizes 1705 the appropriate data path encapsulation for seamless, end-to-end 1706 connectivity between distributed edge and core data centers across 1707 the wide-area network. 1709 o Service Function Chaining: Leverages SFC extensions for BGP and 1710 segment routing to interconnect network and service functions into 1711 SFPs, with support for various data path implementations. 1713 o Service Differentiation: Provides a framework that allows for 1714 construction of logical end-to-end networks with differentiated 1715 logical topologies and/or constraints through use of SR policies 1716 and coloring. 1718 o Automation: Facilitates automation of service provisioning and 1719 avoids heavy service interworking at DCBs. 1721 NFIX is deployable on existing data center and wide-area network 1722 infrastructures and allows the underlying data forwarding plane to 1723 evolve with minimal impact on the services plane. 1725 8. Security Considerations 1727 The NFIX architecture, being based on SR-MPLS, is subject to the same 1728 security concerns as any MPLS network. No new protocols are 1729 introduced; hence, security issues of the protocols encompassed by 1730 this architecture are addressed within the relevant individual 1731 standards documents. It is recommended that the security framework 1732 for MPLS and GMPLS networks defined in [RFC5920] is adhered to.
1733 Although [RFC5920] focuses on the use of RSVP-TE and LDP control 1734 plane, the practices and procedures are extendable to an SR-MPLS 1735 domain. 1737 The NFIX architecture makes extensive use of Multiprotocol BGP, and 1738 it is recommended that the TCP Authentication Option (TCP-AO) 1739 [RFC5925] is used to protect the integrity of long-lived BGP sessions 1740 and any other TCP-based protocols. 1742 Where PCEP is used between controller and path headend the use of 1743 PCEPS [RFC8253] is recommended to provide confidentiality to PCEP 1744 communication using Transport Layer Security (TLS). 1746 9. Acknowledgements 1748 The authors would like to acknowledge Mustapha Aissaoui, Wim 1749 Henderickx, and Gunter Van de Velde. 1751 10. Contributors 1753 The following people contributed to the content of this document and 1754 should be considered co-authors. 1756 Juan Rodriguez 1757 Nokia 1758 United States of America 1760 Email: juan.rodriguez@nokia.com 1762 Jorge Rabadan 1763 Nokia 1764 United States of America 1766 Email: jorge.rabadan@nokia.com 1768 Nick Morris 1769 Verizon 1770 United States of America 1772 Email: nicklous.morris@verizonwireless.com 1774 Eddie Leyton 1775 Verizon 1776 United States of America 1778 Email: edward.leyton@verizonwireless.com 1780 Figure 6 1782 11. IANA Considerations 1784 This memo does not include any requests to IANA for allocation. 1786 12. References 1788 12.1. Normative References 1790 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1791 Requirement Levels", BCP 14, RFC 2119, March 1997, 1792 . 1794 [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 1795 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, 1796 May 2017, . 1798 12.2. Informative References 1800 [I-D.ietf-nvo3-geneve] 1801 Gross, J., Ganga, I., and T. Sridhar, "Geneve: Generic 1802 Network Virtualization Encapsulation", draft-ietf- 1803 nvo3-geneve-16 (work in progress), March 2020. 1805 [I-D.ietf-mpls-seamless-mpls] 1806 Leymann, N., Decraene, B., Filsfils, C., Konstantynowicz, 1807 M., and D. Steinberg, "Seamless MPLS Architecture", draft- 1808 ietf-mpls-seamless-mpls-07 (work in progress), June 2014. 1810 [I-D.ietf-bess-evpn-ipvpn-interworking] 1811 Rabadan, J., Sajassi, A., Rosen, E., Drake, J., Lin, W., 1812 Uttaro, J., and A. Simpson, "EVPN Interworking with 1813 IPVPN", draft-ietf-bess-evpn-ipvpn-interworking-03 (work 1814 in progress), May 2020. 1816 [I-D.ietf-spring-segment-routing-policy] 1817 Filsfils, C., Sivabalan, S., Voyer, D., Bogdanov, A., and 1818 P. Mattes, "Segment Routing Policy Architecture", draft- 1819 ietf-spring-segment-routing-policy-07 (work in progress), 1820 May 2020. 1822 [I-D.ietf-rtgwg-segment-routing-ti-lfa] 1823 Litkowski, S., Bashandy, A., Filsfils, C., Decraene, B., 1824 Francois, P., Voyer, D., Clad, F., and P. Camarillo, 1825 "Topology Independent Fast Reroute using Segment Routing", 1826 draft-ietf-rtgwg-segment-routing-ti-lfa-03 (work in 1827 progress), March 2020. 1829 [I-D.ietf-bess-nsh-bgp-control-plane] 1830 Farrel, A., Drake, J., Rosen, E., Uttaro, J., and L. 1831 Jalil, "BGP Control Plane for the Network Service Header 1832 in Service Function Chaining", draft-ietf-bess-nsh-bgp- 1833 control-plane-15 (work in progress), June 2020. 1835 [I-D.ietf-idr-te-lsp-distribution] 1836 Previdi, S., Talaulikar, K., Dong, J., Chen, M., Gredler, 1837 H., and J. 
Tantsura, "Distribution of Traffic Engineering 1838 (TE) Policies and State using BGP-LS", draft-ietf-idr-te- 1839 lsp-distribution-13 (work in progress), April 2020. 1841 [I-D.barth-pce-segment-routing-policy-cp] 1842 Koldychev, M., Sivabalan, S., Barth, C., Peng, S., and H. 1843 Bidgoli, "PCEP extension to support Segment Routing Policy 1844 Candidate Paths", draft-barth-pce-segment-routing-policy- 1845 cp-06 (work in progress), June 2020. 1847 [I-D.filsfils-spring-sr-policy-considerations] 1848 Filsfils, C., Talaulikar, K., Krol, P., Horneffer, M., and 1849 P. Mattes, "SR Policy Implementation and Deployment 1850 Considerations", draft-filsfils-spring-sr-policy- 1851 considerations-05 (work in progress), April 2020. 1853 [I-D.ietf-rtgwg-bgp-pic] 1854 Bashandy, A., Filsfils, C., and P. Mohapatra, "BGP Prefix 1855 Independent Convergence", draft-ietf-rtgwg-bgp-pic-11 1856 (work in progress), February 2020. 1858 [I-D.ietf-isis-mpls-elc] 1859 Xu, X., Kini, S., Psenak, P., Filsfils, C., Litkowski, S., 1860 and M. Bocci, "Signaling Entropy Label Capability and 1861 Entropy Readable Label Depth Using IS-IS", draft-ietf- 1862 isis-mpls-elc-13 (work in progress), May 2020. 1864 [I-D.ietf-ospf-mpls-elc] 1865 Xu, X., Kini, S., Psenak, P., Filsfils, C., Litkowski, S., 1866 and M. Bocci, "Signaling Entropy Label Capability and 1867 Entropy Readable Label Depth Using OSPF", draft-ietf-ospf- 1868 mpls-elc-15 (work in progress), June 2020. 1870 [I-D.ietf-idr-next-hop-capability] 1871 Decraene, B., Kompella, K., and W. Henderickx, "BGP Next- 1872 Hop dependent capabilities", draft-ietf-idr-next-hop- 1873 capability-05 (work in progress), June 2019. 1875 [I-D.ietf-spring-segment-routing-central-epe] 1876 Filsfils, C., Previdi, S., Dawra, G., Aries, E., and D. 1877 Afanasiev, "Segment Routing Centralized BGP Egress Peer 1878 Engineering", draft-ietf-spring-segment-routing-central- 1879 epe-10 (work in progress), December 2017. 1881 [I-D.ietf-idr-long-lived-gr] 1882 Uttaro, J., Chen, E., Decraene, B., and J. Scudder, 1883 "Support for Long-lived BGP Graceful Restart", draft-ietf- 1884 idr-long-lived-gr-00 (work in progress), September 2019. 1886 [RFC7938] Lapukhov, P., Premji, A., and J. Mitchell, Ed., "Use of 1887 BGP for Routing in Large-Scale Data Centers", RFC 7938, 1888 DOI 10.17487/RFC7938, August 2016, 1889 . 1891 [RFC7752] Gredler, H., Ed., Medved, J., Previdi, S., Farrel, A., and 1892 S. Ray, "North-Bound Distribution of Link-State and 1893 Traffic Engineering (TE) Information Using BGP", RFC 7752, 1894 DOI 10.17487/RFC7752, March 2016, 1895 . 1897 [RFC8277] Rosen, E., "Using BGP to Bind MPLS Labels to Address 1898 Prefixes", RFC 8277, DOI 10.17487/RFC8277, October 2017, 1899 . 1901 [RFC8667] Previdi, S., Ed., Ginsberg, L., Ed., Filsfils, C., 1902 Bashandy, A., Gredler, H., and B. Decraene, "IS-IS 1903 Extensions for Segment Routing", RFC 8667, 1904 DOI 10.17487/RFC8667, December 2019, 1905 . 1907 [RFC8665] Psenak, P., Ed., Previdi, S., Ed., Filsfils, C., Gredler, 1908 H., Shakir, R., Henderickx, W., and J. Tantsura, "OSPF 1909 Extensions for Segment Routing", RFC 8665, 1910 DOI 10.17487/RFC8665, December 2019, 1911 . 1913 [RFC8669] Previdi, S., Filsfils, C., Lindem, A., Ed., Sreekantiah, 1914 A., and H. Gredler, "Segment Routing Prefix Segment 1915 Identifier Extensions for BGP", RFC 8669, 1916 DOI 10.17487/RFC8669, December 2019, 1917 . 1919 [RFC8663] Xu, X., Bryant, S., Farrel, A., Hassan, S., Henderickx, 1920 W., and Z. 
Li, "MPLS Segment Routing over IP", RFC 8663, 1921 DOI 10.17487/RFC8663, December 2019, 1922 . 1924 [RFC7911] Walton, D., Retana, A., Chen, E., and J. Scudder, 1925 "Advertisement of Multiple Paths in BGP", RFC 7911, 1926 DOI 10.17487/RFC7911, July 2016, 1927 . 1929 [RFC7880] Pignataro, C., Ward, D., Akiya, N., Bhatia, M., and S. 1930 Pallagatti, "Seamless Bidirectional Forwarding Detection 1931 (S-BFD)", RFC 7880, DOI 10.17487/RFC7880, July 2016, 1932 . 1934 [RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private 1935 Networks (VPNs)", RFC 4364, DOI 10.17487/RFC4364, February 1936 2006, . 1938 [RFC5920] Fang, L., Ed., "Security Framework for MPLS and GMPLS 1939 Networks", RFC 5920, DOI 10.17487/RFC5920, July 2010, 1940 . 1942 [RFC7011] Claise, B., Ed., Trammell, B., Ed., and P. Aitken, 1943 "Specification of the IP Flow Information Export (IPFIX) 1944 Protocol for the Exchange of Flow Information", STD 77, 1945 RFC 7011, DOI 10.17487/RFC7011, September 2013, 1946 . 1948 [RFC6241] Enns, R., Ed., Bjorklund, M., Ed., Schoenwaelder, J., Ed., 1949 and A. Bierman, Ed., "Network Configuration Protocol 1950 (NETCONF)", RFC 6241, DOI 10.17487/RFC6241, June 2011, 1951 . 1953 [RFC6020] Bjorklund, M., Ed., "YANG - A Data Modeling Language for 1954 the Network Configuration Protocol (NETCONF)", RFC 6020, 1955 DOI 10.17487/RFC6020, October 2010, 1956 . 1958 [RFC7854] Scudder, J., Ed., Fernando, R., and S. Stuart, "BGP 1959 Monitoring Protocol (BMP)", RFC 7854, 1960 DOI 10.17487/RFC7854, June 2016, 1961 . 1963 [RFC8300] Quinn, P., Ed., Elzur, U., Ed., and C. Pignataro, Ed., 1964 "Network Service Header (NSH)", RFC 8300, 1965 DOI 10.17487/RFC8300, January 2018, 1966 . 1968 [RFC5440] Vasseur, JP., Ed. and JL. Le Roux, Ed., "Path Computation 1969 Element (PCE) Communication Protocol (PCEP)", RFC 5440, 1970 DOI 10.17487/RFC5440, March 2009, 1971 . 1973 [RFC7348] Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger, 1974 L., Sridhar, T., Bursell, M., and C. Wright, "Virtual 1975 eXtensible Local Area Network (VXLAN): A Framework for 1976 Overlaying Virtualized Layer 2 Networks over Layer 3 1977 Networks", RFC 7348, DOI 10.17487/RFC7348, August 2014, 1978 . 1980 [RFC7637] Garg, P., Ed. and Y. Wang, Ed., "NVGRE: Network 1981 Virtualization Using Generic Routing Encapsulation", 1982 RFC 7637, DOI 10.17487/RFC7637, September 2015, 1983 . 1985 [RFC3031] Rosen, E., Viswanathan, A., and R. Callon, "Multiprotocol 1986 Label Switching Architecture", RFC 3031, 1987 DOI 10.17487/RFC3031, January 2001, 1988 . 1990 [RFC8014] Black, D., Hudson, J., Kreeger, L., Lasserre, M., and T. 1991 Narten, "An Architecture for Data-Center Network 1992 Virtualization over Layer 3 (NVO3)", RFC 8014, 1993 DOI 10.17487/RFC8014, December 2016, 1994 . 1996 [RFC8402] Filsfils, C., Ed., Previdi, S., Ed., Ginsberg, L., 1997 Decraene, B., Litkowski, S., and R. Shakir, "Segment 1998 Routing Architecture", RFC 8402, DOI 10.17487/RFC8402, 1999 July 2018, . 2001 [RFC5883] Katz, D. and D. Ward, "Bidirectional Forwarding Detection 2002 (BFD) for Multihop Paths", RFC 5883, DOI 10.17487/RFC5883, 2003 June 2010, . 2005 [RFC8231] Crabbe, E., Minei, I., Medved, J., and R. Varga, "Path 2006 Computation Element Communication Protocol (PCEP) 2007 Extensions for Stateful PCE", RFC 8231, 2008 DOI 10.17487/RFC8231, September 2017, 2009 . 2011 [RFC8281] Crabbe, E., Minei, I., Sivabalan, S., and R. 
Varga, "Path 2012 Computation Element Communication Protocol (PCEP) 2013 Extensions for PCE-Initiated LSP Setup in a Stateful PCE 2014 Model", RFC 8281, DOI 10.17487/RFC8281, December 2017, 2015 . 2017 [RFC5925] Touch, J., Mankin, A., and R. Bonica, "The TCP 2018 Authentication Option", RFC 5925, DOI 10.17487/RFC5925, 2019 June 2010, . 2021 [RFC8253] Lopez, D., Gonzalez de Dios, O., Wu, Q., and D. Dhody, 2022 "PCEPS: Usage of TLS to Provide a Secure Transport for the 2023 Path Computation Element Communication Protocol (PCEP)", 2024 RFC 8253, DOI 10.17487/RFC8253, October 2017, 2025 . 2027 [RFC6790] Kompella, K., Drake, J., Amante, S., Henderickx, W., and 2028 L. Yong, "The Use of Entropy Labels in MPLS Forwarding", 2029 RFC 6790, DOI 10.17487/RFC6790, November 2012, 2030 . 2032 [RFC8662] Kini, S., Kompella, K., Sivabalan, S., Litkowski, S., 2033 Shakir, R., and J. Tantsura, "Entropy Label for Source 2034 Packet Routing in Networking (SPRING) Tunnels", RFC 8662, 2035 DOI 10.17487/RFC8662, December 2019, 2036 . 2038 [RFC8491] Tantsura, J., Chunduri, U., Aldrin, S., and L. Ginsberg, 2039 "Signaling Maximum SID Depth (MSD) Using IS-IS", RFC 8491, 2040 DOI 10.17487/RFC8491, November 2018, 2041 . 2043 [RFC8476] Tantsura, J., Chunduri, U., Aldrin, S., and P. Psenak, 2044 "Signaling Maximum SID Depth (MSD) Using OSPF", RFC 8476, 2045 DOI 10.17487/RFC8476, December 2018, 2046 . 2048 Authors' Addresses 2050 Colin Bookham (editor) 2051 Nokia 2052 740 Waterside Drive 2053 Almondsbury, Bristol 2054 UK 2056 Email: colin.bookham@nokia.com 2058 Andrew Stone 2059 Nokia 2060 600 March Road 2061 Kanata, Ontario 2062 Canada 2064 Email: andrew.stone@nokia.com 2066 Jeff Tantsura 2067 Apstra 2068 333 Middlefield Road #200 2069 Menlo Park, CA 94025 2070 USA 2072 Email: jefftant.ietf@gmail.com 2073 Muhammad Durrani 2074 Equinix Inc 2075 1188 Arques Ave 2076 Sunnyvale CA 2077 USA 2079 Email: mdurrani@equinix.com 2081 Bruno Decraene 2082 Orange 2083 38-40 Rue de General Leclerc 2084 92794 Issey Moulineaux cedex 9 2085 France 2087 Email: bruno.decraene@orange.com