idnits 2.17.1 draft-bookham-rtgwg-nfix-arch-04.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- == There is 1 instance of lines with non-ascii characters in the document. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == The document doesn't use any RFC 2119 keywords, yet has text resembling RFC 2119 boilerplate text. -- The document date (5 January 2022) is 840 days in the past. Is this intentional? Checking references for intended status: Informational ---------------------------------------------------------------------------- == Missing Reference: 'ELI' is mentioned on line 755, but not defined == Missing Reference: 'EL' is mentioned on line 755, but not defined == Missing Reference: 'RFC7130' is mentioned on line 1059, but not defined == Outdated reference: A later version (-10) exists of draft-ietf-bess-evpn-ipvpn-interworking-06 == Outdated reference: A later version (-22) exists of draft-ietf-spring-segment-routing-policy-14 == Outdated reference: A later version (-13) exists of draft-ietf-rtgwg-segment-routing-ti-lfa-07 == Outdated reference: A later version (-19) exists of draft-ietf-idr-te-lsp-distribution-16 == Outdated reference: A later version (-09) exists of draft-filsfils-spring-sr-policy-considerations-08 == Outdated reference: A later version (-20) exists of draft-ietf-rtgwg-bgp-pic-17 == Outdated reference: A later version (-08) exists of draft-ietf-idr-next-hop-capability-07 == Outdated reference: A later version (-06) exists of draft-ietf-idr-long-lived-gr-00 -- Obsolete informational reference (is this intentional?): RFC 7752 (Obsoleted by RFC 9552) Summary: 0 errors (**), 0 flaws (~~), 14 warnings (==), 2 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 RTG Working Group C. Bookham, Ed. 3 Internet-Draft A. Stone 4 Intended status: Informational Nokia 5 Expires: 9 July 2022 J. Tantsura 6 Microsoft 7 M. Durrani 8 Equinix Inc 9 B. Decraene 10 Orange 11 5 January 2022 13 An Architecture for Network Function Interconnect 14 draft-bookham-rtgwg-nfix-arch-04 16 Abstract 18 The emergence of technologies such as 5G, the Internet of Things 19 (IoT), and Industry 4.0, coupled with the move towards network 20 function virtualization, means that the service requirements demanded 21 from networks are changing. This document describes an architecture 22 for a Network Function Interconnect (NFIX) that allows for 23 interworking of physical and virtual network functions in a unified 24 and scalable manner across wide-area network and data center domains 25 while maintaining the ability to deliver against SLAs. 
27 Requirements Language 29 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 30 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 31 document are to be interpreted as described in BCP 14 32 [RFC2119][RFC8174] when, and only when, they appear in all capitals, 33 as shown here. 35 Status of This Memo 37 This Internet-Draft is submitted in full conformance with the 38 provisions of BCP 78 and BCP 79. 40 Internet-Drafts are working documents of the Internet Engineering 41 Task Force (IETF). Note that other groups may also distribute 42 working documents as Internet-Drafts. The list of current Internet- 43 Drafts is at https://datatracker.ietf.org/drafts/current/. 45 Internet-Drafts are draft documents valid for a maximum of six months 46 and may be updated, replaced, or obsoleted by other documents at any 47 time. It is inappropriate to use Internet-Drafts as reference 48 material or to cite them other than as "work in progress." 49 This Internet-Draft will expire on 9 July 2022. 51 Copyright Notice 53 Copyright (c) 2022 IETF Trust and the persons identified as the 54 document authors. All rights reserved. 56 This document is subject to BCP 78 and the IETF Trust's Legal 57 Provisions Relating to IETF Documents (https://trustee.ietf.org/ 58 license-info) in effect on the date of publication of this document. 59 Please review these documents carefully, as they describe your rights 60 and restrictions with respect to this document. Code Components 61 extracted from this document must include Revised BSD License text as 62 described in Section 4.e of the Trust Legal Provisions and are 63 provided without warranty as described in the Revised BSD License. 65 Table of Contents 67 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 68 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 4 69 3. Motivation . . . . . . . . . . . . . . . . . . . . . . . . . 4 70 4. Requirements . . . . . . . . . . . . . . . . . . . . . . . . 7 71 5. Theory of Operation . . . . . . . . . . . . . . . . . . . . . 8 72 5.1. VNF Assumptions . . . . . . . . . . . . . . . . . . . . . 8 73 5.2. Overview . . . . . . . . . . . . . . . . . . . . . . . . 9 74 5.3. Use of a Centralized Controller . . . . . . . . . . . . . 9 75 5.4. Routing and LSP Underlay . . . . . . . . . . . . . . . . 11 76 5.4.1. Intra-Domain Routing . . . . . . . . . . . . . . . . 11 77 5.4.2. Inter-Domain Routing . . . . . . . . . . . . . . . . 13 78 5.4.3. Intra-Domain and Inter-Domain Traffic-Engineering . . 15 79 5.5. Service Layer . . . . . . . . . . . . . . . . . . . . . . 17 80 5.6. Service Differentiation . . . . . . . . . . . . . . . . . 19 81 5.7. Automated Service Activation . . . . . . . . . . . . . . 20 82 5.8. Service Function Chaining . . . . . . . . . . . . . . . . 21 83 5.9. Stability and Availability . . . . . . . . . . . . . . . 23 84 5.9.1. IGP Reconvergence . . . . . . . . . . . . . . . . . . 23 85 5.9.2. Data Center Reconvergence . . . . . . . . . . . . . . 23 86 5.9.3. Exchange of Inter-Domain Routes . . . . . . . . . . . 24 87 5.9.4. Controller Redundancy . . . . . . . . . . . . . . . . 25 88 5.9.5. Path and Segment Liveliness . . . . . . . . . . . . . 27 89 5.10. Scalability . . . . . . . . . . . . . . . . . . . . . . . 28 90 5.10.1. Asymmetric Model B for VPN Families . . . . . . . . 30 91 6. Illustration of Use . . . . . . . . . . . . . . . . . . . . . 32 92 6.1. Reference Topology . . . . . . . . . . . . . . . . . . . 32 93 6.2. PNF to PNF Connectivity . . . . . . . . . . 
. . . . . . . 34 94 6.3. VNF to PNF Connectivity . . . . . . . . . . . . . . . . . 35 95 6.4. VNF to VNF Connectivity . . . . . . . . . . . . . . . . . 36 96 7. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 37 97 8. Security Considerations . . . . . . . . . . . . . . . . . . . 38 98 9. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 38 99 10. Contributors . . . . . . . . . . . . . . . . . . . . . . . . 38 100 11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 39 101 12. References . . . . . . . . . . . . . . . . . . . . . . . . . 39 102 12.1. Normative References . . . . . . . . . . . . . . . . . . 39 103 12.2. Informative References . . . . . . . . . . . . . . . . . 39 104 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 45 106 1. Introduction 108 With the introduction of technologies such as 5G, the Internet of 109 Things (IoT), and Industry 4.0, service requirements are changing. 110 In addition to the ever-increasing demand for more capacity, these 111 services have other stringent service requirements that need to be 112 met, such as ultra-reliable and/or low-latency communication. 114 Parallel to this, there is a continued trend to move towards network 115 function virtualization. Operators are building digitalized 116 infrastructure capable of hosting numerous virtualized network 117 functions (VNFs); infrastructure that can scale in and scale out 118 depending on application demand and can deliver flexibility and 119 service velocity. Much of this virtualization activity is driven by 120 the aforementioned emerging technologies as new infrastructure is 121 deployed in support of them. To try to meet the new service 122 requirements, some of these VNFs are becoming more dispersed, so it is 123 common for networks to have a mix of centralized medium- or large- 124 sized data centers together with more distributed smaller 125 'edge-clouds'. VNFs hosted within these data centers require 126 seamless connectivity to each other, and to their existing physical 127 network function (PNF) counterparts. This connectivity also needs to 128 deliver against agreed SLAs. 130 Coupled with the deployment of virtualization is automation. Many of 131 these VNFs are deployed within SDN-enabled data centers where 132 automation is simply a must-have capability to improve service 133 activation lead-times. The expectation is that services will be 134 instantiated in an abstract point-and-click manner and be 135 automatically created by the underlying network, dynamically adapting 136 to service connectivity changes as virtual entities move between 137 hosts. 139 This document describes an architecture for a Network Function 140 Interconnect (NFIX) that allows for interworking of physical and 141 virtual network functions in a unified and scalable manner. It 142 describes a mechanism for establishing connectivity across multiple 143 discrete domains in both the wide-area network (WAN) and the data 144 center (DC) while maintaining the ability to deliver against SLAs. 145 To achieve this, NFIX works with the underlying topology to build a 146 unified over-the-top topology. 148 The NFIX architecture described in this document does not define any 149 new protocols but rather outlines an architecture utilizing a 150 collaboration of existing standards-based protocols. 152 2.
Terminology 154 * A physical network function (PNF) refers to a network device such 155 as a Provider Edge (PE) router that connects physically to the 156 wide-area network. 158 * A virtualized network function (VNF) refers to a network device 159 such as a provider edge (PE) router that is hosted on an 160 application server. The VNF may be bare-metal in that it consumes 161 the entire resources of the server, or it may be one of numerous 162 virtual functions instantiated as a VM or a number of containers on 163 a given server that is controlled by a hypervisor or container 164 management platform. 166 * A Data Center Border (DCB) router refers to the network function 167 that spans the border between the wide-area and the data center 168 networks, typically interworking the different encapsulation 169 techniques employed within each domain. 171 * An Interconnect controller is the controller responsible for 172 managing the NFIX fabric and services. 174 * A DC controller is the term used for a controller that resides 175 within an SDN-enabled data center and is responsible for the DC 176 network(s). 178 3. Motivation 180 Industrial automation and business-critical environments use 181 applications that are demanding on the network. These applications 182 present different requirements, from low latency to high throughput 183 to application-specific traffic conditioning, or a combination. The 184 evolution to 5G equally presents challenges for mobile back-, front- 185 and mid-haul networks. The requirement for ultra-reliable low- 186 latency communication means that operators need to re-evaluate their 187 network architectures to meet these requirements. 189 At the same time, the service edge is evolving. Where the service 190 edge device was historically a PNF, the adoption of virtualization 191 means VNFs are becoming more commonplace. Typically, these VNFs are 192 hosted in some form of data center environment but require end-to-end 193 connectivity to other VNFs and/or other PNFs. This represents a 194 challenge because transport layer connectivity generally differs 195 between the WAN and the data center environment. The WAN includes 196 all levels of hierarchy (core, aggregation, access) that form the 197 network's footprint, where transport layer connectivity using IP/MPLS 198 is commonplace. In the data center, native IP is commonplace, 199 utilizing network virtualization overlay (NVO) technologies such as 200 virtual extensible LAN (VXLAN) [RFC7348], network virtualization 201 using generic routing encapsulation (NVGRE) [RFC7637], or generic 202 network virtualization encapsulation (GENEVE) [I-D.ietf-nvo3-geneve]. 203 There is a requirement to seamlessly integrate these islands, avoid 204 heavy-lifting at interconnects, and provide a means to 205 provision end-to-end services with a single touch point at the edge. 207 The service edge boundary is also changing. Some functions that were 208 previously reasonably centralized are now becoming more distributed. 209 One reason for this is to attempt to deal with low-latency 210 requirements. Another reason is that operators seek to reduce costs 211 by deploying low/medium-capacity VNFs closer to the edge. Equally, 212 virtualization also sees some of the access network moving towards 213 the core. Examples of this include cloud-RAN or Software-Defined 214 Access Networks. 216 Historically, service providers have architected data centers 217 independently from the wide-area network, creating two independent 218 domains or islands.
As VNFs become part of the service landscape, the 219 service data-path must be extended across the WAN into the data 220 center infrastructure, but in a manner that still allows operators to 221 meet deterministic performance requirements. Methods for stitching 222 WAN and DC infrastructures together with some form of service- 223 interworking at the data center border have been implemented and 224 deployed, but this service-interworking approach has several 225 limitations: 227 * The data center environment typically uses encapsulation 228 techniques such as VXLAN or NVGRE while the WAN typically uses 229 encapsulation techniques such as MPLS [RFC3031]. Underlying 230 optical infrastructure might also need to be programmed. These 231 are incompatible and require interworking at the service layer. 233 * It typically requires heavy-touch service provisioning on the data 234 center border. In an end-to-end service, midpoint provisioning is 235 undesirable and should be avoided. 237 * Automation is difficult, largely due to the first two points but 238 with additional contributing factors. In the virtualization world 239 automation is a must-have capability. 241 * When a service is operating at Layer 3 in a data center with 242 redundant interconnects, the risk of routing loops exists. There 243 is no inherent loop avoidance mechanism when redistributing routes 244 between address families, so extreme care must be taken. Proposals 245 such as the Domain Path (D-PATH) attribute 246 [I-D.ietf-bess-evpn-ipvpn-interworking] attempt to address this 247 issue but as yet are not widely implemented or deployed. 249 * Some or all of the above make the service-interworking gateway 250 cumbersome with questionable scaling attributes. 252 Hence there is a requirement to create an open, scalable, and unified 253 network architecture that brings together the wide-area network and 254 data center domains. It is not an architecture exclusively targeted 255 at greenfield deployments, nor does it require a flag-day upgrade to 256 deploy in a brownfield network. It is an evolutionary step to a 257 consolidated network that uses the constructs of seamless MPLS 258 [I-D.ietf-mpls-seamless-mpls] as a baseline and extends upon that to 259 include topologies that may not be link-state based and to provide 260 end-to-end path control. Overall, the NFIX 261 architecture: 263 * Allows for an evolving service edge boundary without having to 264 constantly restructure the architecture. 266 * Provides a mechanism for seamless connectivity between 267 VNF and VNF, VNF and PNF, and PNF and PNF, with deterministic SLAs, 268 and with the ability to provide differentiated SLAs to suit 269 different service requirements. 271 * Delivers a unified transport fabric using Segment Routing (SR) 272 [RFC8402] where service delivery mandates touching only the 273 service edge without imposing additional encapsulation 274 requirements in the DC. 276 * Embraces automation by providing an environment where any end-to- 277 end connectivity can be instantiated in a single-request manner 278 while maintaining SLAs. 280 4. Requirements 282 The following section outlines the requirements that the proposed 283 solution must meet. From an overall perspective, the proposed 284 generic architecture must: 286 * Deliver end-to-end transport LSPs using traffic-engineering (TE) 287 as required to meet appropriate SLAs for the service(s) 288 using those LSPs.
End-to-end refers to VNF and/or PNF 289 connectivity or a combination of both. 291 * Provide a solution that allows for optimal end-to-end path 292 placement; where optimal not only meets the requirements of the 293 path in question but also meets the global network objectives. 295 * Support varying types of VNF physical network attachment and 296 logical (underlay/overlay) connectivity. 298 * Facilitate automation of service provision. As such the solution 299 should avoid heavy-touch service provisioning and decapsulation/ 300 encapsulation at data center border routers. 302 * Provide a framework for delivering logical end-to-end networks 303 using differentiated logical topologies and/or constraints. 305 * Provide a high level of stability; faults in one domain should not 306 propagate to another domain. 308 * Provide a mechanism for homogeneous end-to-end OAM. 310 * Hide/localize instabilities in the different domains that 311 participate in the end-to-end service. 313 * Provide a mechanism to minimize the label-stack depth required at 314 path head-ends for SR-TE LSPs. 316 * Offer a high level of scalability. 318 * Although not considered in-scope of the current version of this 319 document, the solution should not preclude the deployment of 320 multicast. This subject may be covered in later versions of this 321 document. 323 5. Theory of Operation 325 This section describes the NFIX architecture including the building 326 blocks and protocol machinery that is used to form the fabric. Where 327 considered appropriate rationale is given for selection of an 328 architectural component where other seemingly applicable choices 329 could have been made. 331 5.1. VNF Assumptions 333 For the sake of simplicity, references to VNF are made in a broad 334 sense. Equally, the differences between VNF and Container Network 335 Function (CNF) are largely immaterial for the purposes of this 336 document, therefore VNF is used to represent both. The way in which 337 a VNF is instantiated and provided network connectivity will differ 338 based on environment and VNF capability, but for conciseness this is 339 not explicitly detailed with every reference to a VNF. Common 340 examples of VNF variants include but are not limited to: 342 * A VNF that functions as a routing device and has full IP routing 343 and MPLS capabilities. It can be connected simultaneously to the 344 data center fabric underlay and overlay and serves as the NVO 345 tunnel endpoint [RFC8014]. Examples of this might be a 346 virtualized PE router, or a virtualized Broadband Network Gateway 347 (BNG). 349 * A VNF that functions as a device (host or router) with limited IP 350 routing capability. It does not connect directly to the data 351 center fabric underlay but rather connects to one or more external 352 physical or virtual devices that serve as the NVO tunnel 353 endpoint(s). It may however have single or multiple connections 354 to the overlay. Examples of this might be a mobile network 355 control or management plane function. 357 * A VNF that has no routing capability. It is a virtualized 358 function hosted within an application server and is managed by a 359 hypervisor or container host. The hypervisor/container host acts 360 as the NVO endpoint and interfaces to some form of SDN controller 361 responsible for programming the forwarding plane of the 362 virtualization host using, for example, OpenFlow. 
Examples of 363 this might be an Enterprise application server or a web server 364 running as a virtual machine and front-ended by a virtual routing 365 function such as OVS/xVRS/VTF. 367 Where considered necessary, exceptions to the examples provided above 368 or a focus on a particular scenario will be highlighted. 370 5.2. Overview 372 The NFIX architecture makes no assumptions about how the network is 373 physically composed, nor does it impose any dependencies upon it. It 374 also makes no assumptions about IGP hierarchies, and the use of areas/ 375 levels or discrete IGP instances within the WAN is fully endorsed to 376 enhance scalability and constrain fault propagation. This could 377 apply, for instance, to a hierarchical WAN from core to edge or from 378 WAN to LAN connections. The overall architecture uses the constructs 379 of seamless MPLS as a baseline and extends upon that. The concept of 380 decomposing the network into multiple domains is one that has been 381 widely deployed and has been proven to scale in networks with large 382 numbers of nodes. 384 The proposed architecture uses segment routing (SR) as its preferred 385 choice of transport. Segment routing is chosen for construction of 386 end-to-end LSPs given its ability to traffic-engineer through source- 387 routing while concurrently scaling exceptionally well due to its lack 388 of network state other than at the ingress node. This document uses SR 389 instantiated on an MPLS forwarding plane (SR-MPLS), although it does 390 not preclude the use of SRv6 either now or at some point in the 391 future. The rationale for selecting SR-MPLS is simply maturity and 392 more widespread applicability across a potentially broad range of 393 network devices. This document may be updated in future versions to 394 include more description of SRv6 applicability. 396 5.3. Use of a Centralized Controller 398 It is recognized that for most operators the move towards the use of 399 a controller within the wide-area network is a significant change in 400 operating model. In the NFIX architecture it is a necessary 401 component. Its use is not simply to offload inter-domain path 402 calculation from network elements; it provides many more benefits: 404 * It offers the ability to enforce constraints on paths that 405 originate/terminate on different network elements, thereby 406 providing path diversity, and/or bidirectionality/co-routing, and/ 407 or disjointness. 409 * It avoids collisions, re-tries, and packing problems that have been 410 observed in networks using distributed TE path calculation, where 411 head-ends make autonomous decisions. 413 * A controller can take a global view of path placement strategies, 414 including the ability to make path placement decisions over a high 415 number of LSPs concurrently as opposed to considering each LSP 416 independently. In turn, this allows for 'global' optimization of 417 network resources such as available capacity. 419 * A controller can make decisions based on near-real-time network 420 state and optimize paths accordingly. For example, if a network 421 link becomes congested, it may recompute some of the paths 422 transiting that link to other links that may not be quite as 423 optimal but do have available capacity. Or, if a link's latency 424 crosses a certain threshold, it may choose to reoptimize some 425 latency-sensitive paths away from that link. 427 * The logic of a controller can be extended beyond pure path 428 computation and placement.
If the controller is aware of 429 services, service requirements, and available paths within the 430 network, it can cross-correlate between them and ensure that the 431 appropriate paths are used for the appropriate services. 433 * The controller can provide assurance and verification of the 434 underlying SLA provided to a given service. 436 As the main objective of the NFIX architecture is to unify the data 437 center and wide-area network domains, using the term 'controller' is 438 not sufficiently precise. The centralized controller may need to 439 interface to other controllers that potentially reside within an SDN- 440 enabled data center. Therefore, to avoid interchangeably using the 441 term controller for both functions, we distinguish between them 442 simply by using the terms 'DC controller', which, as the name suggests, 443 is responsible for the DC, and 'Interconnect controller', responsible 444 for managing the extended SR fabric and services. 446 The Interconnect controller learns wide-area network topology 447 information and allocation of segment routing SIDs within that domain 448 using BGP link-state [RFC7752] with appropriate SR extensions. 449 Equally, it learns data center topology information and Prefix-SID 450 allocation using BGP labeled unicast [RFC8277] with appropriate SR 451 extensions, or BGP link-state if a link-state IGP is used within the 452 data center. If Route-Reflection is used for exchange of BGP link- 453 state or labeled unicast NLRI within one or more domains, then the 454 Interconnect controller need only peer as a client with those Route- 455 Reflectors in order to learn topology information. 457 Where BGP link-state is used to learn the topology of a data center 458 (or any IGP routing domain), the BGP-LS Instance Identifier (Instance- 459 ID) is carried within Node/Link/Prefix NLRI and is used to identify a 460 given IGP routing domain. Where labeled unicast BGP is used to 461 discover the topology of one or more data center domains, there is no 462 equivalent way for the Interconnect controller to achieve a level of 463 routing domain correlation. The controller may learn some splintered 464 connectivity map consisting of 10 leaf switches, four spine switches, 465 and four DCBs, but it needs some form of key to inform it that leaf 466 switches 1-5, spine switches 1 and 2, and DCBs 1 and 2 belong to 467 data center 1, while leaf switches 6-10, spine switches 3 and 4, and 468 DCBs 3 and 4 belong to data center 2. What is needed is a form of 469 'data center membership identification' to provide this correlation. 470 Optionally, this could be achieved at the BGP level using a standard 471 community to represent each data center, or it could be done at a 472 more abstract level where, for example, the DC controller provides the 473 membership identification to the Interconnect controller through an 474 application programming interface (API). 476 Understanding real-time network state is an important part of the 477 Interconnect controller's role, and only with this information is the 478 controller able to make informed decisions and take preventive or 479 corrective actions as necessary. There are numerous methods 480 implemented and deployed that allow for harvesting of network state, 481 including (but not limited to) IPFIX [RFC7011], Netconf/YANG 482 [RFC6241][RFC6020], streaming telemetry, BGP link-state [RFC7752] 483 [I-D.ietf-idr-te-lsp-distribution], and the BGP Monitoring Protocol 484 (BMP) [RFC7854].
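As a purely illustrative aid, the following sketch shows one way the membership correlation described above could be realized when topology is learned via labeled unicast BGP: nodes are grouped into data centers according to a hypothetical per-data-center standard community. The community values, addresses, and data structures are assumptions made for illustration only; they are not defined by this document.

   # Minimal sketch (Python): correlate nodes learned via BGP labeled
   # unicast into data centers using an assumed per-DC community.
   from collections import defaultdict

   DC_COMMUNITIES = {"64500:1": "dc1", "64500:2": "dc2"}  # assumed values

   def correlate(routes):
       """routes: iterable of (loopback, prefix_sid, [communities])."""
       membership = defaultdict(list)
       for loopback, prefix_sid, communities in routes:
           dc = next((DC_COMMUNITIES[c] for c in communities
                      if c in DC_COMMUNITIES), "unknown")
           membership[dc].append((loopback, prefix_sid))
       return membership

   routes = [("192.0.2.1", 101, ["64500:1"]),   # leaf switch in DC 1
             ("192.0.2.6", 106, ["64500:2"])]   # leaf switch in DC 2
   print(dict(correlate(routes)))

The same correlation could equally be supplied by the DC controller over an API rather than carried in BGP, as noted above.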
486 5.4. Routing and LSP Underlay 488 This section describes the mechanisms and protocols that are used to 489 establish end-to-end LSPs, where end-to-end refers to VNF-to-VNF, 490 PNF-to-PNF, or VNF-to-PNF. 492 5.4.1. Intra-Domain Routing 494 In a seamless MPLS architecture, domains are based on geographic 495 dispersion (core, aggregation, access). Within this document, a 496 domain is considered to be any entity with a captive topology, be it a 497 link-state topology or otherwise. Where reference is made to the 498 wide-area network domain, it refers to one or more domains that 499 constitute the wide-area network domain. 501 This section discusses the basic building blocks required within the 502 wide-area network and the data center, noting from above that the 503 wide-area network may itself consist of multiple domains. 505 5.4.1.1. Wide-Area Network Domains 507 The wide-area network includes all levels of hierarchy (core, 508 aggregation, access) that constitute the network's MPLS footprint as 509 well as the data center border routers. Each domain that constitutes 510 part of the wide-area network runs a link-state interior gateway 511 protocol (IGP) such as ISIS or OSPF, and each domain may use IGP- 512 inherent hierarchy (OSPF areas, ISIS levels) with an assumption that 513 visibility is domain-wide using, for example, L2 to L1 514 redistribution. Alternatively, or additionally, there may be 515 multiple domains that are split by using separate and distinct 516 instances of IGP. There is no requirement for IGP redistribution of 517 any link or loopback addresses between domains. 519 Each IGP should be enabled with the relevant extensions for segment 520 routing [RFC8667][RFC8665], and each SR-capable router should 521 advertise a Node-SID for its loopback address, and an Adjacency-SID 522 (Adj-SID) for every connected interface (unidirectional adjacency) 523 belonging to the SR domain. SR Global Blocks (SRGB) can be allocated 524 to each domain as deemed appropriate to specific network 525 requirements. Border routers belonging to multiple domains have an 526 SRGB for each domain. 528 The default forwarding path for intra-domain LSPs that do not require 529 TE is simply an SR LSP containing a single label advertised by the 530 destination as a Node-SID and representing the ECMP-aware shortest 531 path to that destination. Intra-domain TE LSPs are constructed as 532 required by the Interconnect controller. Once a path is calculated, 533 it is advertised as an explicit SR Policy 534 [I-D.ietf-spring-segment-routing-policy] containing one or more paths 535 expressed as one or more segment-lists, which may optionally contain 536 binding SIDs if requirements dictate. An SR Policy is identified 537 through the tuple [headend, color, endpoint] and this tuple is used 538 extensively by the Interconnect controller to associate services with 539 an underlying SR Policy that meets its objectives. 541 To provide support for ECMP, the Entropy Label [RFC6790][RFC8662] 542 should be utilized. Entropy Label Capability (ELC) should be 543 advertised into the IGP using the IS-IS Prefix Attributes TLV 544 [I-D.ietf-isis-mpls-elc] or the OSPF Extended Prefix TLV 545 [I-D.ietf-ospf-mpls-elc] coupled with the Node MSD Capability sub-TLV 546 to advertise Entropy Readable Label Depth (ERLD) [RFC8491][RFC8476] 547 and the Base MPLS Imposition (BMI). Equally, support for ELC 548 together with the supported ERLD should be signaled in BGP using the 549 BGP Next-Hop Capability [I-D.ietf-idr-next-hop-capability]. Ingress 550 nodes and/or DCBs should ensure sufficient entropy is applied to 551 packets to exercise available ECMP links.
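Because each domain may have its own SRGB, the MPLS label that corresponds to a given Node-SID index depends on the domain in which that SID is resolved. The short sketch below illustrates this arithmetic only; the SRGB bases, domain names, and SID index are hypothetical values chosen for illustration.

   # Minimal sketch (Python): derive the MPLS label for a Node-SID index
   # from the SRGB of the domain in which it is resolved (SR-MPLS).
   SRGB_BASE = {"wan-domain-1": 16000, "wan-domain-2": 20000}  # assumed

   def node_sid_label(domain, sid_index):
       # label = SRGB base of the domain + globally planned SID index
       return SRGB_BASE[domain] + sid_index

   # A border router that belongs to both domains programs one label per
   # SRGB for the same SID index.
   print(node_sid_label("wan-domain-1", 101))   # 16101
   print(node_sid_label("wan-domain-2", 101))   # 20101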
553 5.4.1.2. Data Center Domain 555 The data center domain includes all fabric switches, network 556 virtualization edge (NVE), and the data center border routers. The 557 data center routing design may align with the framework of [RFC7938] 558 running eBGP single-hop sessions established over direct point-to- 559 point links, or it may use an IGP for dissemination of topology 560 information. This document focuses on the former, simply because the 561 use of an IGP largely makes the data center's behaviour analogous to 562 that of a wide-area network domain. 564 The chosen method of transport or encapsulation within the data 565 center for NFIX is SR-MPLS over IP/UDP [RFC8663] or, where possible, 566 native SR-MPLS. The choice of SR-MPLS over IP/UDP or native SR-MPLS 567 allows for good entropy to maximize the use of equal-cost Clos fabric 568 links. Native SR-MPLS encapsulation provides entropy through use of 569 the Entropy Label, and, like the wide-area network, support for ELC 570 together with the supported ERLD should be signaled using the BGP Next- 571 Hop Capability attribute. As described in [RFC6790], the ELC is an 572 indication from the egress node of an MPLS tunnel to the ingress node 573 of the MPLS tunnel that it is capable of processing an Entropy Label. 574 The BGP Next-Hop Capability is a non-transitive attribute which is 575 modified or deleted when the next-hop is changed to reflect the 576 capabilities of the new next-hop. If we assume that the path of a 577 BGP-signaled LSP transits through multiple ASNs, and/or a single ASN 578 with multiple next-hops, then it is not possible for the ingress node 579 to determine the ELC of the egress node. Without this end-to-end 580 signaling capability, the entropy label must only be used when it is 581 explicitly known, through configuration or other means, that the 582 egress node has support for it. Entropy for SR-MPLS over IP/UDP 583 encapsulation uses the source UDP port for IPv4 and the Flow Label 584 for IPv6. Again, the ingress network function should ensure 585 sufficient entropy is applied to exercise available ECMP links. 587 Another significant advantage of the use of native SR-MPLS or SR-MPLS 588 over IP/UDP is that it allows for a lightweight interworking function 589 at the DCB without the requirement for midpoint provisioning; 590 interworking between the data center and the wide-area network 591 domains becomes an MPLS label swap/continue action. 593 Loopback addresses of network elements within the data center are 594 advertised using labeled unicast BGP with the addition of SR Prefix 595 SID extensions [RFC8669] containing a globally unique and persistent 596 Prefix-SID. The data-plane encapsulation of SR-MPLS over IP/UDP or 597 native SR-MPLS allows network elements within the data center to 598 consume BGP Prefix-SIDs and legitimately use those in the 599 encapsulation. 601 5.4.2. Inter-Domain Routing 603 Inter-domain routing is responsible for establishing connectivity 604 between any domains that form the wide-area network, and between the 605 wide-area network and data center domains. It is considered unlikely 606 that every end-to-end LSP will require a TE path; hence there is a 607 requirement for a default end-to-end forwarding path. This default 608 forwarding path may also become the path of last resort in the event 609 of a non-recoverable failure of a TE path. Similar to the seamless 610 MPLS architecture, this inter-domain MPLS connectivity is realized 611 using labeled unicast BGP [RFC8277] with the addition of SR Prefix 612 SID extensions.
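The paragraphs that follow describe how border routers impose next-hop-self on these labeled routes and so insert themselves into the forwarding path. Purely as an illustrative aid, the sketch below models that behaviour for a single prefix; the router addresses, label values, and data structures are hypothetical and are not defined by this document.

   # Minimal sketch (Python): a border router (DCB/ABR/ASBR) that
   # re-advertises a labeled unicast route with next-hop-self allocates a
   # new local label and programs a swap entry towards the received label.
   def readvertise_with_nhs(received, local_label, self_address, lfib):
       # received: {'prefix', 'label', 'next_hop'} learned from the peer
       lfib[local_label] = ("swap", received["label"], received["next_hop"])
       return {"prefix": received["prefix"], "label": local_label,
               "next_hop": self_address}

   lfib = {}
   route_from_dc = {"prefix": "192.0.2.10/32", "label": 20101,
                    "next_hop": "198.51.100.1"}          # assumed values
   advertised = readvertise_with_nhs(route_from_dc, 16201,
                                     "203.0.113.1", lfib)
   print(advertised)
   print(lfib)   # {16201: ('swap', 20101, '198.51.100.1')}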
614 Within each wide-area network domain, all service edge routers, DCBs, 615 and ABRs/ASBRs form part of the labeled BGP mesh, which can be either 616 full-mesh or, more likely, based on the use of route-reflection. Each 617 of these routers advertises its respective loopback addresses into 618 labeled BGP together with an MPLS label and a globally unique Prefix- 619 SID. Routes are advertised between wide-area network domains by 620 ABRs/ASBRs that impose next-hop-self on advertised routes. The 621 function of imposing next-hop-self for labeled routes means that the 622 ABR/ASBR allocates a new label for advertised routes and programs a 623 label-swap entry in the forwarding plane for received and advertised 624 routes. In short, it becomes part of the forwarding path. 626 DCB routers have labeled BGP sessions towards the wide-area network 627 and labeled BGP sessions towards the data center. Routes are 628 bidirectionally advertised between the domains subject to policy, 629 with the DCB imposing itself as next-hop on advertised routes. As 630 above, the function of imposing next-hop-self for labeled routes 631 implies allocation of a new label for advertised routes and a label- 632 swap entry being programmed in the forwarding plane for received and 633 advertised labels. The DCB thereafter becomes the anchor point 634 between the wide-area network domain and the data center domain. 636 Within the wide-area network, next-hops for labeled unicast routes 637 containing Prefix-SIDs are resolved to SR LSPs, and within the data 638 center domain next-hops for labeled unicast routes containing Prefix- 639 SIDs are resolved to SR LSPs or IP/UDP tunnels. This provides end- 640 to-end connectivity without a traffic-engineering capability. 642 +---------------+ +----------------+ +---------------+ 643 | Data Center | | Wide-Area | | Wide-Area | 644 | +-----+ Domain 1 +-----+ Domain 'n' | 645 | | DCB | | ABR | | 646 | +-----+ +-----+ | 647 | | | | | | 648 +---------------+ +----------------+ +---------------+ 649 <-- SR/SRoUDP --> <---- IGP/SR ----> <--- IGP/SR ----> 650 <--- BGP-LU ---> NHS <--- BGP-LU ---> NHS <--- BGP-LU ---> 652 Figure 1 654 Default Inter-Domain Forwarding Path 656 5.4.3. Intra-Domain and Inter-Domain Traffic-Engineering 658 The capability to traffic-engineer intra- and inter-domain end-to-end 659 paths is considered a key requirement in order to meet the service 660 objectives previously outlined. To achieve optimal end-to-end path 661 placement, the key components to be considered are path calculation, 662 path activation, and FEC-to-path binding procedures. 664 In the NFIX architecture, end-to-end path calculation is performed by 665 the Interconnect controller. The mechanics of how the objectives of 666 each path are calculated are beyond the scope of this document. Once a 667 path is calculated based upon its objectives and constraints, the 668 path is advertised from the controller to the LSP headend as an 669 explicit SR Policy containing one or more paths expressed as one or 670 more segment-lists. An SR Policy is identified through the tuple 671 [headend, color, endpoint] and this tuple is used extensively by the 672 Interconnect controller to associate services with an underlying SR 673 Policy that meets its objectives.
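As a purely illustrative aid, the following sketch shows how a controller might index SR Policies by the [headend, color, endpoint] tuple described above and resolve a colored service route to a matching policy, falling back to the default labeled BGP forwarding path where no policy exists. The names, color values, and segment identifiers are hypothetical.

   # Minimal sketch (Python): index SR Policies by (headend, color,
   # endpoint) and resolve a service route to a policy or the default path.
   policies = {}

   def install_policy(headend, color, endpoint, segment_lists):
       policies[(headend, color, endpoint)] = segment_lists

   def resolve(headend, color, endpoint):
       # Fall back to the default BGP-LU forwarding path if no SR Policy
       # exists for this tuple.
       return policies.get((headend, color, endpoint), "default-bgp-lu-path")

   # Hypothetical low-latency (color 100) policy from VNF1 towards VNF2,
   # stitched through a Binding-SID anchored on DCB1.
   install_policy("vnf1", 100, "vnf2",
                  [["dcb1-node-sid", "bsid-n", "vnf2-prefix-sid"]])
   print(resolve("vnf1", 100, "vnf2"))
   print(resolve("vnf1", 200, "vnf2"))   # no such policy -> default path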
675 The segment-list of an SR Policy encodes a source-routed path towards 676 the endpoint. When calculating the segment-list, the Interconnect 677 controller makes comprehensive use of the Binding-SID (BSID), 678 instantiating BSID anchors as necessary at path midpoints when 679 calculating and activating a path. The use of BSID is considered 680 fundamental to segment routing as described in 681 [I-D.filsfils-spring-sr-policy-considerations]. It provides opacity 682 between domains, ensuring that any segment churn is constrained to a 683 single domain. It also reduces the number of segments/labels that 684 the headend needs to impose, which is particularly important given 685 that network elements within a data center generally have limited 686 label imposition capabilities. In the context of the NFIX 687 architecture, it is also the vehicle that allows for removal of heavy 688 midpoint provisioning at the DCB. 690 For example, assume that VNF1 is situated in data center 1, which is 691 interconnected to the wide-area network via DCB1. VNF1 requires 692 connectivity to VNF2, situated in data center 2, which is 693 interconnected to the wide-area network via DCB2. Assuming there is 694 no existing TE path that meets VNF1's requirements, the Interconnect 695 controller will: 697 * Instantiate an SR Policy on DCB1 with BSID n and a segment-list 698 containing the relevant segments of a TE path to DCB2. DCB1 699 therefore becomes a BSID anchor. 701 * Instantiate an SR Policy on VNF1 with BSID m and a segment-list 702 containing segments {DCB1, n, VNF2}. 704 +---------------+ +----------------+ +---------------+ 705 | Data Center 1 | | Wide-Area | | Data Center 2 | 706 | +----+ +----+ 3 +----+ +----+ | 707 | |VNF1| |DCB1|-1 / \ 5--|DCB2| |VNF2| | 708 | +----+ +----+ \ / \ / +----+ +----+ | 709 | | | 2 4 | | | 710 +---------------+ +----------------+ +---------------+ 711 SR Policy SR Policy 712 BSID m BSID n 713 {DCB1,n,VNF2} {1,2,3,4,5,DCB2} 715 Figure 2 717 Traffic-Engineered Path using BSID 719 In the above figure, a single DCB is used to interconnect two domains. 720 Similarly, in the case of two wide-area domains, the DCB would be 721 represented as an ABR or ASBR. In some single-operator environments, 722 domains may be interconnected using adjacent ASBRs connected via a 723 distinct physical link. In this scenario, the procedures outlined 724 above may be extended to incorporate the mechanisms used in Egress 725 Peer Engineering (EPE) [I-D.ietf-spring-segment-routing-central-epe] 726 to form a traffic-engineered path spanning distinct domains. 728 5.4.3.1. Traffic-Engineering and ECMP 730 Where the Interconnect controller is used to place SR policies, 731 providing support for ECMP requires some consideration. An SR Policy 732 is described with one or more segment-lists; each of those 733 segment-lists may or may not provide ECMP as a whole, and 734 each SID itself may or may not support ECMP forwarding. When an 735 individual SID is a BSID, an ECMP path may or may not also be nested 736 within. The Interconnect controller may choose to place a path 737 consisting entirely of non-ECMP-aware Adj-SIDs (each SID representing 738 a single adjacency) such that the controller has explicit hop-by-hop 739 knowledge of where that SR-TE LSP is routed. This is beneficial to 740 allow the controller to take corrective action if the criteria that 741 were used to initially select a particular link in a particular path 742 subsequently change.
For example, a path may need to be rerouted if the latency of a link 743 increases or the link becomes congested. 744 If ECMP-aware SIDs are used in the SR policy segment-list (including 745 Node-SIDs, Adj-SIDs representing parallel links, and Anycast SIDs), SR 746 routers are able to make autonomous decisions about where traffic is 747 forwarded. As a result, it is not possible for the controller to 748 fully understand the impact of a change in network state and react to 749 it. With this in mind, there are a number of approaches that could be 750 adopted: 752 * If there is no requirement for the Interconnect controller to 753 explicitly track paths on a hop-by-hop basis, ECMP-aware SIDs may 754 be used in the SR policy segment-list. This approach may require 755 multiple [ELI, EL] pairs to be inserted at the ingress node; for 756 example, above and below a BSID to provide entropy in multiple 757 domains. 759 * If there is a requirement for the Interconnect controller to 760 explicitly track paths on a hop-by-hop basis to provide the capability 761 to reroute them based on changes in network state, SR policy 762 segment-lists should be constructed of non-ECMP-aware Adj-SIDs. 764 * A hybrid approach that allows for a level of ECMP (at the headend) 765 together with the ability for the Interconnect controller to 766 explicitly track paths is to instantiate an SR policy consisting 767 of a set of segment-lists, each containing non-ECMP-aware Adj- 768 SIDs. Each segment-list will be assigned a weight to allow for 769 ECMP or UCMP. This approach does, however, imply computation and 770 programming of two paths instead of one. 772 * Another hybrid approach might work as follows. Redundant DCBs 773 advertise an Anycast-SID 'A' into the data center, and also 774 instantiate an SR policy with a segment-list consisting of non- 775 ECMP-aware Adj-SIDs meeting the required connectivity and SLA. 776 The BSID value of this SR policy 'B' must be common to both 777 redundant DCBs, but the calculated paths are diverse. Indeed, 778 multiple segment-lists could be used in this SR policy. A VNF 779 could then instantiate an SR policy with a segment-list of {A, B} 780 to achieve ECMP in the data center and TE in the wide-area network, 781 with the option of ECMP at the BSID anchor. 783 5.5. Service Layer 785 The service layer is intended to deliver Layer 2 and/or Layer 3 VPN 786 connectivity between network functions to create an overlay utilizing 787 the routing and LSP underlay described in section 5.4. To do this, 788 the solution employs the EVPN and/or VPN-IPv4/IPv6 address families 789 to exchange Layer 2 and Layer 3 Network Layer Reachability 790 Information (NLRI). When these NLRI are exchanged between domains, it 791 is typical for the border router to set next-hop-self on advertised 792 routes. With the proposed routing and LSP underlay, however, this is 793 not required, and EVPN/VPN-IPv4/IPv6 routes should be passed end-to- 794 end without transit routers modifying the next-hop attribute. 796 Section 5.4.2 describes the use of labeled unicast BGP to exchange 797 inter-domain routes to establish a default forwarding path. Labeled- 798 unicast BGP is used to exchange prefix reachability between service 799 edge routers, with domain border routers imposing next-hop-self on 800 routes advertised between domains.
This provides a default inter- 801 domain forwarding path and provides the required connectivity to 802 establish inter-domain BGP sessions between service edges for the 803 exchange of EVPN and/or VPN-IPv4/IPv6 NLRI. If route-reflection is 804 used for the EVPN and/or VPN-IPv4/IPv6 address families within one or 805 more domains, it may be desirable to create inter-domain BGP sessions 806 between route-reflectors. In this case, the peering addresses of the 807 route-reflectors should also be exchanged between domains using 808 labeled unicast BGP. This creates a connectivity model analogous to 809 BGP/MPLS IP-VPN Inter-AS option C [RFC4364]. 811 +----------------+ +----------------+ +----------------+ 812 | +----+ | | +----+ | | +----+ | 813 +----+ | RR | +----+ | RR | +----+ | RR | +----+ 814 | NF | +----+ | DCI| +----+ | DCI| +----+ | NF | 815 +----+ +----+ +----+ +----+ 816 | Domain | | Domain | | Domain | 817 +----------------+ +----------------+ +----------------+ 818 <-------> <-----> NHS <-- BGP-LU ---> NHS <-----> <------> 819 <-------> <--------- EVPN/VPN-IPv4/v6 ----------> <------> 821 Figure 3 823 Inter-Domain Service Layer 825 EVPN and/or VPN-IPv4/v6 routes received from a peer in a different 826 domain will contain a next-hop equivalent to the router that sourced 827 the route. The next-hop of these routes can be resolved to a labeled- 828 unicast route (default forwarding path) or to an SR policy (traffic- 829 engineered forwarding path) as appropriate to the service 830 requirements. The exchange of EVPN and/or VPN-IPv4/IPv6 routes in 831 this manner implies that Route-Distinguisher and Route-Target values 832 remain intact end-to-end. 834 The use of end-to-end EVPN and/or VPN-IPv4/IPv6 address families 835 without the imposition of next-hop-self at border routers complements 836 the gateway-less transport layer architecture. It negates the 837 requirement for midpoint service provisioning and as such provides 838 the following benefits: 840 * Avoids the translation of MAC/IP EVPN routes to IP-VPN routes (and 841 vice versa) that is typically associated with service 842 interworking. 844 * Avoids instantiation of MAC-VRFs and IP-VPNs for each tenant 845 resident in the DCB. 847 * Avoids provisioning of demarcation functions between the data 848 center and wide-area network such as QoS, access-control, 849 aggregation, and isolation. 851 5.6. Service Differentiation 853 As discussed in section 5.4.3, the use of TE paths is a key 854 capability of the NFIX solution framework described in this document. 855 The Interconnect controller computes end-to-end TE paths between NFs 856 and programs DC nodes, DCBs, ABR/ASBRs, via SR Policy, with the 857 necessary label forwarding entries for each [headend, color, 858 endpoint]. The collection of [headend, endpoint] pairs for the same 859 color constitutes a logical network topology, where each topology 860 satisfies a given SLA requirement. 862 The Interconnect controller discovers the endpoints associated with a 863 given topology (color) upon the reception of EVPN or IPVPN routes 864 advertised by the endpoint. The EVPN and IPVPN NLRIs are advertised 865 by the endpoint nodes along with a color extended community which 866 identifies the topology to which the owner of the NLRI belongs. At a 867 coarse level, all the EVPN/IPVPN routes of the same VPN can be 868 advertised with the same color, and therefore a TE topology would be 869 established on a per-VPN basis. At a finer level, IPVPN and 870 especially EVPN provide a more granular way of coloring routes, which 871 allows the Interconnect controller to associate multiple 872 topologies with the same VPN. For example: 874 * All the EVPN MAC/IP routes for a given VNF may be advertised with 875 the same color. This would allow the Interconnect controller to 876 associate topologies per VNF within the same VPN; that is, VNF1 877 could be blue (e.g., low-latency topology) and VNF2 could be green 878 (e.g., high-throughput). 880 * The EVPN MAC/IP routes and Inclusive Multicast Ethernet Tag (IMET) 881 route for VNF1 may be advertised with different colors, e.g., red 882 and brown, respectively. This would allow the association of, 883 e.g., a low-latency topology for unicast traffic to VNF1 and a best- 884 effort topology for BUM traffic to VNF1. 886 * Each EVPN MAC/IP route or IP-Prefix route from a given VNF may be 887 advertised with a different color. This would allow the association 888 of topologies at the host level or host route granularity.
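Purely as an illustration of the coloring behaviour described above, the sketch below groups service endpoints into logical topologies according to the color extended community carried on their EVPN/IPVPN routes. The endpoint names and color values are hypothetical.

   # Minimal sketch (Python): one logical topology per color; endpoints
   # are grouped by the color community carried on their service routes.
   from collections import defaultdict

   def topology_membership(service_routes):
       """service_routes: iterable of (advertising_endpoint, color)."""
       topologies = defaultdict(set)
       for endpoint, color in service_routes:
           topologies[color].add(endpoint)
       return topologies

   # Hypothetical routes: VNF1 colored 100 (low latency), VNF2 colored 200
   # (high throughput), PE1 also colored 100.
   routes = [("vnf1", 100), ("vnf2", 200), ("pe1", 100)]
   print(dict(topology_membership(routes)))
   # roughly {100: {'vnf1', 'pe1'}, 200: {'vnf2'}}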
890 5.7. Automated Service Activation 892 The automation of network and service connectivity for instantiation 893 and mobility of virtual machines is a highly desirable attribute 894 within data centers. Since this concerns service connectivity, it 895 should be clear that this automation is relevant to virtual functions 896 that belong to a service as opposed to a virtual network function 897 that delivers services, such as a virtual PE router. 899 Within an SDN-enabled data center, a typical hierarchy from top to 900 bottom would include a policy engine (or policy repository), one or 901 more DC controllers, numerous hypervisors/container hosts that 902 function as NVO endpoints, and finally the virtual 903 machines (VMs)/containers, which we'll refer to generically as 904 virtualization hosts. 906 The mechanisms used to communicate between the policy engine and DC 907 controller, and between the DC controller and hypervisor/container, 908 are not relevant here and as such are not discussed further. 909 What is important is the interface and information exchange between 910 the Interconnect controller and the data center SDN functions: 912 * The Interconnect controller interfaces with the data center policy 913 engine and publishes the available colors, where each color 914 represents a topological service connectivity map that meets a set 915 of constraints and SLA objectives. This interface is a 916 straightforward API. 918 * The Interconnect controller interfaces with the DC controller to 919 learn overlay routes. This interface is BGP and uses the EVPN 920 Address Family. 922 With the above framework in place, automation of network and service 923 connectivity can be implemented as follows: 925 * The virtualization host is turned up. The NVO endpoint notifies 926 the DC controller of the startup. 928 * The DC controller retrieves service information, IP addressing 929 information, and service 'color' for the virtualization host from 930 the policy engine. The DC controller subsequently programs the 931 associated forwarding information on the virtualization host. 932 Since the DC controller is now aware of MAC and IP address 933 information for the virtualization host, it advertises that 934 information as an EVPN MAC Advertisement Route into the overlay. 936 * The Interconnect controller receives the EVPN MAC Advertisement 937 Route (potentially via a Route-Reflector) and correlates it with 938 locally held service information and SLA requirements using Route 939 Target and Color communities. If the relevant SR policies are not 940 already in place to support the service requirements and logical 941 connectivity, including any binding-SIDs, they are calculated and 942 advertised to the relevant headends.
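A highly simplified sketch of the final step of this workflow is shown below: on receipt of an EVPN MAC Advertisement Route, the Interconnect controller checks whether an SR Policy already exists for the corresponding [headend, color, endpoint] tuple and requests computation and programming if it does not. The route contents, names, and the compute_and_program hook are hypothetical.

   # Minimal sketch (Python) of the activation step described above.
   def on_evpn_mac_route(route, policies, compute_and_program):
       key = (route["headend"], route["color"], route["endpoint"])
       if key not in policies:
           # Hypothetical controller hook: computes the TE path, creates
           # any required BSID anchors, and programs the headend.
           policies[key] = compute_and_program(*key)
       return policies[key]

   policies = {}
   route = {"headend": "vnf1", "color": 100, "endpoint": "dcb2"}  # assumed
   on_evpn_mac_route(route, policies,
                     lambda h, c, e: [f"segment-list {h}->{e} color {c}"])
   print(policies)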
944 The same automated service activation principles can also be used to 945 support the scenario where virtualization hosts are moved between 946 hypervisors/container hosts for resourcing or other reasons. We 947 refer to this simply as mobility. If a virtualization host is turned 948 down, the parent NVO endpoint notifies the DC controller, which in 949 turn notifies the policy engine and withdraws any EVPN MAC 950 Advertisement Routes. Thereafter, all associated state is removed. 951 When the virtualization host is turned up on a different hypervisor/ 952 container host, the automated service connectivity process outlined 953 above is simply repeated. 955 5.8. Service Function Chaining 957 Service Function Chaining (SFC) defines an ordered set of abstract 958 service functions and the subsequent steering of traffic through 959 them. Packets are classified at ingress for processing by the 960 required set of service functions (SFs) in an SFC-capable domain and 961 are then forwarded through each SF in turn for processing. The 962 ability to dynamically construct SFCs containing the relevant SFs in 963 the right sequence is a key requirement for operators. 965 To enable flexible service function deployment models that support 966 agile service insertion, the NFIX architecture adopts the use of BGP 967 as the control plane to distribute SFC information. The BGP control 968 plane for Network Service Header (NSH) SFC 969 [I-D.ietf-bess-nsh-bgp-control-plane] is used for this purpose and 970 defines two route types: the Service Function Instance Route (SFIR) 971 and the Service Function Path Route (SFPR). 973 The SFIR is used to advertise the presence of a service function 974 instance (SFI) as a function type (e.g., firewall, TCP optimizer) and 975 is advertised by the node hosting that SFI. The SFIR is advertised 976 together with a BGP Tunnel Encapsulation attribute containing details 977 of how to reach that particular service function through the underlay 978 network (i.e., IP address and encapsulation information). 980 The SFPRs contain service function path (SFP) information, and one 981 SFPR is originated for each SFP. Each SFPR contains the service path 982 identifier (SPI) of the path, the sequence of service function types 983 that make up the path (each of which has at least one instance 984 advertised in an SFIR), and the service index (SI) for each listed 985 service function to identify its position in the path. 987 Once a Classifier has determined which flows should be mapped to a 988 given SFP, it imposes an NSH [RFC8300] on those packets, setting the 989 SPI to that of the selected service path (advertised in an SFPR), and 990 the SI to the first hop in the path. As NSH is encapsulation 991 agnostic, the NSH-encapsulated packet is then forwarded through the 992 appropriate tunnel to reach the service function forwarder (SFF) 993 supporting that service function instance (advertised in an SFIR). 994 The SFF removes the tunnel encapsulation and forwards the packet with 995 the NSH to the relevant SF based upon a lookup of the SPI/SI. When 996 it is returned from the SF with a decremented SI value, the SFF 997 forwards the packet to the next hop in the SFP using the tunnel 998 information advertised by that SFI. This procedure is repeated until 999 the last hop of the SFP is reached.
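A purely illustrative sketch of the SPI/SI lookup performed by an SFF is shown below; the path table contents, service function names, and tunnel identifiers are hypothetical and do not represent any particular implementation of [I-D.ietf-bess-nsh-bgp-control-plane].

   # Minimal sketch (Python): an SFF looks up the (SPI, SI) carried in the
   # NSH and forwards on the tunnel associated with the next SF instance.
   SFP_TABLE = {
       # (SPI, SI): (service function type, tunnel to the hosting SFF/SFI)
       (10, 255): ("firewall", "tunnel-to-sfi-1"),
       (10, 254): ("tcp-optimizer", "tunnel-to-sfi-2"),
   }

   def sff_forward(spi, si):
       entry = SFP_TABLE.get((spi, si))
       if entry is None:
           return "end of path: remove NSH and forward natively"
       sf_type, tunnel = entry
       return f"send to {sf_type} via {tunnel}"

   print(sff_forward(10, 255))   # first hop of SFP 10
   print(sff_forward(10, 254))   # after the SF decrements the SI
   print(sff_forward(10, 253))   # no further hops -> end of path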
1001 The use of the NSH in this manner allows for service chaining with 1002 topological and transport independence. It also allows for the 1003 deployment of SFIs in a condensed or dispersed fashion depending on 1004 operator preference or resource availability. Service function 1005 chains are built in their own overlay network and share a common 1006 underlay network, where that common underlay network is the NFIX 1007 fabric described in section 5.4. BGP updates containing an SFIR or 1008 SFPR are advertised in conjunction with one or more Route Targets 1009 (RTs), and each node in a service function overlay network is 1010 configured with one or more import RTs. As a result, nodes will only 1011 import routes that are applicable and that local policy dictates. 1012 This provides the ability to support multiple service function 1013 overlay networks or the construction of service function chains 1014 within L3VPN or EVPN services. 1016 Although SFCs are constructed in a unidirectional manner, the BGP 1017 control plane for NSH SFC allows for the optional association of 1018 multiple paths (SFPRs). This provides the ability to construct a 1019 bidirectional service function chain in the presence of multiple 1020 equal-cost paths between source and destination to avoid problems 1021 that SFs may suffer with traffic asymmetry. 1023 The proposed SFC model can be considered decoupled in that the use of 1024 SR as a transport between SFFs is completely independent of the use 1025 of NSH to define the SFC. That is, it uses an NSH-based SFC and SR 1026 is just one of many encapsulations that could be used between SFFs. 1027 A similar, more integrated approach proposes encoding a service 1028 function as a segment so that an SFC can be constructed as a segment- 1029 list. In this case, it can be considered an SR-based SFC with an NSH- 1030 based service plane since the SF is unaware of the presence of the 1031 SR. Functionally, both approaches are very similar, and as such both 1032 could be adopted and could work in parallel. Construction of SFCs 1033 based purely on SR (SF is SR-aware) is not considered at this time. 1035 5.9. Stability and Availability 1037 Any network architecture should have the capability to self-restore 1038 following the failure of a network element. The time to reconverge 1039 following the failure needs to be minimal to avoid noticeable 1040 disruptions in service. This section discusses protection mechanisms 1041 that are available for use and their applicability to the proposed 1042 architecture. 1044 5.9.1. IGP Reconvergence 1046 Within the construct of an IGP topology, the Topology Independent Loop 1047 Free Alternate (TI-LFA) [I-D.ietf-rtgwg-segment-routing-ti-lfa] can 1048 be used to provide a local repair mechanism that offers both link and 1049 node protection. 1051 TI-LFA is a repair mechanism, and as such it is reactive and 1052 initially needs to detect a given failure. To provide fast failure 1053 detection, Bidirectional Forwarding Detection (BFD) is used. 1054 Consideration needs to be given to the restoration capabilities of 1055 the underlying transmission when deciding values for message 1056 intervals and multipliers to avoid race conditions, but failure 1057 detection in the order of 50 milliseconds can reasonably be 1058 anticipated. Where Link Aggregation Groups (LAG) are used, micro-BFD 1059 [RFC7130] can be used to similar effect. Indeed, to allow for 1060 potential incremental growth in capacity, it is not uncommon for 1061 operators to provision all network links as LAG and use micro-BFD 1062 from the outset.
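As a simple worked example of the detection time mentioned above, BFD declares a failure after roughly the agreed transmit interval multiplied by the detect multiplier has elapsed without received control packets; the interval and multiplier below are illustrative values only, not recommendations.

   # Minimal sketch (Python): approximate BFD detection time.
   def bfd_detection_time_ms(tx_interval_ms, detect_multiplier):
       return tx_interval_ms * detect_multiplier

   # e.g., a 15 ms interval with a multiplier of 3 gives ~45 ms detection,
   # within the ~50 ms order of magnitude anticipated above.
   print(bfd_detection_time_ms(15, 3))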
5.9.2. Data Center Reconvergence

Clos fabrics are extremely common within data centers, and fundamental to a Clos fabric is the ability to load-balance using Equal Cost Multipath (ECMP). The number of ECMP paths will vary dependent on the number of devices in the parent tier but will never be less than two for redundancy purposes, with traffic hashed over the available paths. In this scenario the availability of a backup path in the event of failure is implicit. Commonly within the DC, rather than computing protect paths (like LFA), techniques such as 'fast rehash' are often utilized. In this case the failed next-hop is removed from the multi-path forwarding data structure and traffic is then rehashed over the remaining active paths.

In BGP-only data centers this relies on the implementation of BGP multipath. As network elements in the lower tier of a Clos fabric will frequently belong to different ASNs, this includes the ability to load-balance to a prefix with different AS_PATH attribute values while having the same AS_PATH length; sometimes referred to as 'multipath relax' or 'multipath multiple-AS' [RFC7938].

Failure detection relies upon declaring a BGP session down and removing any prefixes learnt over that session as soon as the link is declared down. As links between network elements predominantly use direct point-to-point fiber, a link failure should be detected within milliseconds. BFD is also commonly used to detect IP layer failures.

5.9.3. Exchange of Inter-Domain Routes

Labeled unicast BGP together with the SR Prefix-SID extensions is used to exchange PNF and/or VNF endpoints between domains to create end-to-end connectivity without TE. When advertising between domains we assume that a given BGP prefix is advertised by at least two border routers (DCBs, ABRs, ASBRs), making prefixes reachable via at least two next-hops.

BGP Prefix Independent Convergence (PIC) [I-D.ietf-rtgwg-bgp-pic] allows failover to a pre-computed and pre-installed secondary next-hop when the primary next-hop fails and is independent of the number of destination prefixes that are affected by the failure. It should be clear that BGP PIC depends on the availability of a secondary next-hop in the pathlist when the primary BGP next-hop fails. To ensure that multiple paths to the same destination are visible, BGP ADD-PATH [RFC7911] can be used to allow for advertisement of multiple paths for the same address prefix. Dual-homed EVPN/IP-VPN prefixes also have the alternative option of allocating different Route-Distinguishers (RDs). To trigger the switch from primary to secondary next-hop, PIC needs to detect the failure, and many implementations support 'next-hop tracking' for this purpose. Next-hop tracking monitors the routing-table and, if the next-hop prefix is removed, will immediately invalidate all BGP prefixes learnt through that next-hop. In the absence of next-hop tracking, multihop BFD [RFC5883] could optionally be used as a fast failure detection mechanism.
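The value of PIC lies in the shared, hierarchical forwarding structure: many BGP prefixes reference a single pathlist, so a single next-hop failure is repaired with one operation rather than with per-prefix updates. The following non-normative Python sketch illustrates that principle only; the data structures and method names are hypothetical.

   # Illustrative sketch of the BGP PIC principle: prefixes share a
   # pathlist, so convergence cost is independent of prefix count.

   class PathList:
       def __init__(self, primary_nh: str, backup_nh: str):
           self.primary_nh = primary_nh
           self.backup_nh = backup_nh
           self.active_nh = primary_nh

       def next_hop_down(self, failed_nh: str):
           # One update repairs forwarding for every prefix that
           # references this pathlist.
           if failed_nh == self.active_nh:
               self.active_nh = self.backup_nh

   class Fib:
       def __init__(self):
           self.pathlists = {}   # (primary, backup) -> shared PathList
           self.prefixes = {}    # prefix -> PathList reference

       def install(self, prefix: str, primary_nh: str, backup_nh: str):
           key = (primary_nh, backup_nh)
           pl = self.pathlists.setdefault(key, PathList(primary_nh, backup_nh))
           self.prefixes[prefix] = pl

       def next_hop_down(self, failed_nh: str):
           # Triggered by next-hop tracking or multihop BFD.
           for pl in self.pathlists.values():
               pl.next_hop_down(failed_nh)

   fib = Fib()
   for i in range(1000):
       fib.install(f"10.0.{i // 256}.{i % 256}/32", "DCB1", "DCB2")
   fib.next_hop_down("DCB1")   # all 1000 prefixes now resolve via DCB2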
5.9.4. Controller Redundancy

With the Interconnect controller providing an integral part of the network's capabilities, a redundant controller design is clearly prudent. To this end we can consider both availability and redundancy. Availability refers to the survivability of a single controller system in a failure scenario. A common strategy for increasing the availability of a single controller system is to build the system as a high-availability cluster such that it becomes a confederation of redundant constituent parts as opposed to a single monolithic system. Should a single part fail, the system can still survive without the requirement to fail over to a standby controller system. Methods for detection of a failure of one or more member parts of the cluster are implementation specific.

To provide contingency for a complete system failure a geo-redundant standby controller system is required. When redundant controllers are deployed a coherent strategy is needed that provides a master/standby election mechanism, the ability to propagate the outcome of that election to network elements as required, an inter-system failure detection mechanism, and the ability to synchronize state across both systems such that the standby controller is fully aware of current state should it need to transition to master controller.

Master/standby election, state synchronization, and failure detection between geo-redundant sites can largely be considered a local implementation matter. The requirement to propagate the outcome of the master/standby election to network elements depends on a) the mechanism that is used to instantiate SR policies, and b) whether the SR policies are controller-initiated or headend-initiated; these are discussed in the following sub-sections. In either scenario, the state of SR policies should be advertised northbound to both the master and standby controllers using either PCEP LSP State Report messages or the SR policy extensions to BGP link-state [I-D.ietf-idr-te-lsp-distribution].

5.9.4.1. SR Policy Initiator

Controller-initiated SR policies are suited for auto-creation of tunnels based on service route discovery and policy-driven route/flow programming, and are ephemeral. Headend-initiated tunnels allow for permanent configuration state to be held on the headend and are suitable for static services that are not subject to dynamic changes. If all SR policies are controller-initiated, it negates the requirement to propagate the outcome of the master/standby election to network elements. This is because headends have no requirement to make unsolicited requests to a controller, and therefore have no requirement to know which controller is master and which one is standby. A headend may respond to a message from a controller, but that response is not unsolicited.

If some or all SR policies are headend-initiated, then the requirement to propagate the outcome of the master/standby election exists. This is further discussed in the following sub-section.
5.9.4.2. SR Policy Instantiation Mechanism

While candidate paths of SR policies may be provided using BGP, PCEP, Netconf, or local policy/configuration, this document primarily considers the use of PCEP or BGP.

When PCEP [RFC5440][RFC8231][RFC8281] is used for instantiation of candidate paths of SR policies [I-D.barth-pce-segment-routing-policy-cp], every headend/PCC should establish a PCEP session with both the master and standby controllers. To signal standby state to the PCC, the standby controller may use a PCEP Notification message to set the PCEP session into overload state. While in this overload state the standby controller will accept path computation LSP state report (PCRpt) messages without delegation, but will reject path computation requests (PCReq) and any PCRpt messages with the delegation bit set. Further, the standby controller will not originate path computation initiate messages (PCInitiate) or path computation update request messages (PCUpd). In the event of the failure of the master controller, the standby controller will transition to active and remove the PCEP overload state. Following expiration of the PCEP redelegation timeout at the PCC, any LSPs will be redelegated to the newly transitioned active controller. LSP state is not impacted unless redelegation is not possible before the state timeout interval expires.

When BGP is used for instantiation of SR policies, every headend should establish a BGP session with the master and standby controllers capable of exchanging the SR TE Policy SAFI. Candidate paths of SR policies are advertised only by the active controller. If the master controller should experience a failure, then SR policies learnt from that controller may be removed before they are re-advertised by the standby (or newly active) controller. To minimize this possibility, BGP speakers that advertise and instantiate SR policies can implement Long-Lived Graceful Restart (LLGR) [I-D.ietf-idr-long-lived-gr], also known as BGP persistence, to retain existing routes, treated as least-preferred, until the new route arrives. In the absence of LLGR, two other alternatives are possible, as sketched after this list:

*  Provide a static backup SR policy.

*  Fall back to the default forwarding path.
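The following non-normative Python sketch illustrates one way a headend might realize the fallback behaviour above, using the usual SR policy rule of selecting the valid candidate path with the highest preference and reverting to the default forwarding path (e.g., the BGP-LU derived path) when no candidate path remains. The structures, origins, and preference values are assumptions for illustration, not a definitive implementation.

   # Illustrative headend fallback logic for an SR policy when the
   # controller-signalled candidate path is lost and LLGR is not in use.
   # Names and preference values are hypothetical.

   from dataclasses import dataclass
   from typing import List, Optional

   @dataclass
   class CandidatePath:
       origin: str             # "controller-bgp", "static-config", ...
       preference: int         # higher is preferred
       segment_list: List[int]
       valid: bool = True      # e.g., first SID resolvable

   def select_active_path(paths: List[CandidatePath]) -> Optional[CandidatePath]:
       valid = [p for p in paths if p.valid]
       if not valid:
           return None         # fall back to the default forwarding path
       return max(valid, key=lambda p: p.preference)

   paths = [
       CandidatePath("controller-bgp", preference=200, segment_list=[16005, 24001]),
       CandidatePath("static-config", preference=50, segment_list=[16008]),
   ]
   print(select_active_path(paths).origin)   # controller-bgp

   # Master controller fails and its route is withdrawn without LLGR:
   paths[0].valid = False
   active = select_active_path(paths)
   print(active.origin if active else "default forwarding path")   # static-config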
5.9.5. Path and Segment Liveliness

When using traffic-engineered SR paths, only the ingress router holds any state. The exception here is where BSIDs are used, which also implies some state is maintained at the BSID anchor. As there is no control plane set-up, it follows that there is no feedback loop from transit nodes of the path to notify the headend when a non-adjacent point of the SR path fails. The Interconnect controller, however, is aware of all paths that are impacted by a given network failure and should take the appropriate action. This action could include withdrawing an SR policy if a suitable candidate path is already in place, or simply sending a new SR policy with a different segment-list and a higher preference value assigned to it.

Verification of data plane liveliness is the responsibility of the path headend. A given SR policy may be associated with multiple candidate paths; for the sake of clarity, we'll assume two for redundancy purposes (which can be diversely routed). Verification of the liveliness of these paths can be achieved using seamless BFD (S-BFD) [RFC7880], which provides an in-band failure detection mechanism capable of detecting failure in the order of tens of milliseconds. Upon failure of the active path, failover to a secondary candidate path can be activated at the path headend. Details of the actual failover and revert mechanisms are a local implementation matter.

S-BFD provides a fast and scalable failure detection mechanism but is unlikely to be implemented in many VNFs given their inability to offload the process to purpose-built hardware. In the absence of an active failure detection mechanism such as S-BFD, the failover from active path to secondary candidate path can be triggered using continuous path validity checks. One of the criteria that a candidate path uses to determine its validity is the ability to perform path resolution for the first SID to one or more outgoing interface(s) and next-hop(s). From the perspective of the VNF headend, the first SID in the segment-list will very likely be the DCB (as BSID anchor) but could equally be another Prefix-SID hop within the data center. Should this segment experience a non-recoverable failure, the headend will be unable to resolve the first SID and the path will be considered invalid. This will trigger a failover action to a secondary candidate path.

Injection of S-BFD packets is not just constrained to the source of an end-to-end LSP. When an S-BFD packet is injected into an SR policy path it is encapsulated with the label stack of the associated segment-list. It is therefore possible to run S-BFD from a BSID anchor for just that section of the end-to-end path (for example, from DCB to DCB). This allows a BSID anchor to detect failure of a path and take corrective action, while maintaining opacity between domains.
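Recapping the validity criterion described above for VNF headends that cannot run S-BFD, the following non-normative Python sketch shows a first-SID resolvability check. The 'rib' mapping and SID values are hypothetical placeholders rather than a defined API.

   # Illustrative validity check for a VNF headend without S-BFD: a
   # candidate path is valid only while its first SID resolves to at
   # least one outgoing interface/next-hop.

   def candidate_path_valid(segment_list, rib) -> bool:
       first_sid = segment_list[0]
       next_hops = rib.get(first_sid, [])
       return len(next_hops) > 0

   # Hypothetical resolution table: SID -> list of resolved next-hops.
   rib = {16001: ["to-DCB1"], 24001: []}
   print(candidate_path_valid([16001, 24001, 16008], rib))   # True
   print(candidate_path_valid([24001, 16008], rib))          # False -> fail over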
5.10. Scalability

There are many aspects to consider regarding scalability of the NFIX architecture. The building blocks of NFIX are standards-based technologies individually designed to scale for internet provider networks. When combined they provide a flexible and scalable solution:

*  BGP has been proven to scale and operate with millions of routes being exchanged. Specifically, BGP labeled unicast has been deployed and proven to scale in existing seamless-MPLS networks.

*  By placing forwarding instructions in the header of a packet, segment routing reduces the amount of state required in the network, allowing it to scale to a greater number of transport tunnels. This aids the feasibility of the NFIX architecture by permitting the automated aspects of SR policy creation without having an impact on the state held in the core of the network.

*  The choice of utilizing native SR-MPLS or SR over IP in the data center continues to permit horizontal scaling without introducing new state inside of the data center fabric, while still permitting seamless end-to-end path forwarding integration.

*  BSIDs play a key role in the NFIX architecture as their use provides the ability to traffic-engineer across large network topologies consisting of many hops regardless of hardware capability at the headend. From a scalability perspective the use of BSIDs facilitates better scale because detailed information about the SR paths in a domain is abstracted and localized to the BSID anchor point only. When BSIDs are re-used amongst one or many headends they reduce the amount of path calculation and updates required at network edges while still providing seamless end-to-end path forwarding.

*  The NFIX architecture continues to use an independent DC controller. This allows continued independent scaling of data center management in both policy and local forwarding functions, while off-loading the end-to-end optimal path placement and automation to the Interconnect controller. Optimal path placement is already a scalable function provided in a PCE architecture. The Interconnect controller must compute paths, but it is not burdened by the management of virtual entity lifecycle and associated forwarding policies.

It must be acknowledged that with the amalgamation of the technology building blocks and the automation required by NFIX, there is an additional burden on the Interconnect controller. The scaling considerations are dependent on many variables, but an implementation of an Interconnect controller shares many overlapping traits and scaling concerns with a PCE, as the controller and PCE both must:

*  Discover and listen to topological state changes of the IP/MPLS topology.

*  Compute traffic-engineered intra-domain and inter-domain paths across large service provider topologies.

*  Synchronize, track and update thousands of LSPs to network devices upon network state changes.

Both entail topologies that contain tens of thousands of nodes and links. The Interconnect controller in an NFIX architecture takes on the additional role of becoming end-to-end service aware and discovering data center entities that were traditionally excluded from a controller's scope. Although not exhaustive, an NFIX Interconnect controller is impacted by factors including the following:

*  The number of individual services, the number of endpoints that may exist in each service, the distribution of endpoints in a virtualized environment, and how many data centers may exist. Medium or large-sized data centers may be capable of hosting more virtual endpoints per host, but with the move to smaller edge-clouds the number of headends that require inter-connectivity increases compared to the density of localized routing in a centralized data center model. The outcome has an impact on the number of headend devices which may require tunnel management by the Interconnect controller.

*  Assuming a given BSID satisfies the SLA, the ability to re-use BSIDs across multiple services reduces the number of paths to track and manage. However, the number of colors or unique SLA definitions, and criteria such as bandwidth constraints, impact WAN traffic distribution requirements. As BSIDs play a key role for VNF connectivity, this potentially increases the number of BSID paths required to permit appropriate traffic distribution. This also impacts the number of tunnels which may be re-used on a given headend for different services.

*  The frequency of virtualized hosts being created and destroyed and the general activity within a given service.
The controller must 1359 analyze, track, and correlate the activity of relevant BGP routes 1360 to track addition and removal of service host or host subnets, and 1361 determine whether new SR policies should be instantiated, or stale 1362 unused SR policies should be removed from the network. 1364 * The choice of SR instantiation mechanism impacts the number of 1365 communication sessions the controller may require. For example, 1366 the BGP based mechanism may only require a small number of 1367 sessions to route reflectors, whereas PCEP may require a 1368 connection to every possible leaf in the network and any BSID 1369 anchors. 1371 * The number of hops within one or many WAN domains may affect the 1372 number of BSIDs required to provide transit for VNF/PNF, PNF/PNF, 1373 or VNF/VNF inter-connectivity. 1375 * Relative to traditional WAN topologies, traditional data centers 1376 are generally topologically denser in node and link connectivity 1377 which is required to be discovered by the Interconnect controller, 1378 resulting in a much larger, dense link-state database on the 1379 Interconnect controller. 1381 5.10.1. Asymmetric Model B for VPN Families 1383 With the instantiation of multiple TE paths between any two VNFs in 1384 the NFIX network, the number of SR Policy (remote endpoint, color) 1385 routes, BSIDs and labels to support on VNFs becomes a choke point in 1386 the architecture. The fact that some VNFs are limited in terms of 1387 forwarding resources makes this aspect an important scale issue. 1389 As an example, if VNF1 and VNF2 in Figure 1 are associated to 1390 multiple topologies 1..n, the Interconnect controller will 1391 instantiate n TE paths in VNF1 to reach VNF2: 1393 [VNF1,color-1,VNF2] --> BSID 1 1395 [VNF1,color-2,VNF2] --> BSID 2 1397 ... 1399 [VNF1,color-n,VNF2] --> BSID n 1401 Similarly, m TE paths may be instantiated on VNF1 to reach VNF3, 1402 another p TE paths to reach VNF4, and so on for all the VNFs that 1403 VNF1 needs to communicate with in DC2. As it can be observed, the 1404 number of forwarding resources to be instantiated on VNF1 may 1405 significantly grow with the number of remote [endpoint, color] pairs, 1406 compared with a best-effort architecture in which the number 1407 forwarding resources in VNF1 grows with the number of endpoints only. 1409 This scale issue on the VNFs can be relieved by the use of an 1410 asymmetric model B service layer. The concept is illustrated in 1411 Figure 3. 1413 +------------+ 1414 <-------------------------------------| WAN | 1415 | SR Policy +-------------------| Controller | 1416 | BSID m | SR Policy +------------+ 1417 v {DCI1,n,DCI2} v BSID n 1418 {1,2,3,4,5,DCI2} 1419 +----------------+ +----------------+ +----------------+ 1420 | +----+ | | | | +----+ | 1421 +----+ | RR | +----+ +----+ | RR | +----+ 1422 |VNF1| +----+ |DCI1| |DCI2| +----+ |VNF2| 1423 +----+ +----+ +----+ +----+ 1424 | DC1 | | WAN | | DC2 | 1425 +----------------+ +----------------+ +----------------+ 1427 <-------- <-------------------------- NHS <------ <------ 1428 EVPN/VPN-IPv4/v6(colored) 1430 +-----------------------------------> +-------------> 1431 TE path to DCI2 ECMP path to VNF2 1432 (BSID to segment-list 1433 expansion on DCI1) 1435 Figure 4 1437 Asymmetric Model B Service Layer 1439 Consider the different n topologies needed between VNF1 and VNF2 are 1440 really only relevant to the different TE paths that exist in the WAN. 
1441 The WAN is the domain in the network where there can be significant 1442 differences in latency, throughput or packet loss depending on the 1443 sequence of nodes and links the traffic goes through. Based on that 1444 assumption, for traffic from VNF1 to DCB2 in Figure 4, traffic from 1445 DCB2 to VNF2 can simply take an ECMP path. In this case an 1446 asymmetric model B Service layer can significantly relieve the scale 1447 pressure on VNF1. 1449 From a service layer perspective, the NFIX architecture described up 1450 to now can be considered 'symmetric', meaning that the EVPN/IPVPN 1451 advertisements from e.g., VNF2 in Figure 2, are received on VNF1 with 1452 the next-hop of VNF2, and vice versa for VNF1's routes on VNF2. SR 1453 Policies to each VNF2 [endpoint, color] are then required on the 1454 VNF1. 1456 In the 'asymmetric' service design illustrated in Figure 4, VNF2's 1457 EVPN/IPVPN routes are received on VNF1 with the next-hop of DCB2, and 1458 VNF1's routes are received on VNF2 with next-hop of DCB1. Now SR 1459 policies instantiated on VNFs can be reduced to only the number of TE 1460 paths required to reach the remote DCB. For example, considering n 1461 topologies, in a symmetric model VNF1 has to be instantiated with n 1462 SR policy paths per remote VNF in DC2, whereas in the asymmetric 1463 model of Figure 4, VNF1 only requires n SR policy paths per DC, i.e., 1464 to DCB2. 1466 Asymmetric model B is a simple design choice that only requires the 1467 ability (on the DCB nodes) to set next-hop-self on the EVPN/IPVPN 1468 routes advertised to the WAN neighbors and not do next-hop-self for 1469 routes advertised to the DC neighbors. With this option, the 1470 Interconnect controller only needs to establish TE paths from VNFs to 1471 remote DCBs, as opposed to VNFs to remote VNFs. 1473 6. Illustration of Use 1475 For the purpose of illustration, this section provides some examples 1476 of how different end-to-end tunnels are instantiated (including the 1477 relevant protocols, SID values/label stacks etc.) and how services 1478 are then overlaid onto those LSPs. 1480 6.1. Reference Topology 1482 The following network diagram illustrates the reference network 1483 topology that is used for illustration purposes in this section. 1484 Within the data centers leaf and spine network elements may be 1485 present but are not shown for the purpose of clarity. 1487 +----------+ 1488 |Controller| 1489 +----------+ 1490 / | \ 1491 +----+ +----+ +----+ +----+ 1492 ~ ~ ~ ~ | R1 |----------| R2 |----------| R3 |-----|AGN1| ~ ~ ~ ~ 1493 ~ +----+ +----+ +----+ +----+ ~ 1494 ~ DC1 | / | | DC2 ~ 1495 +----+ | L=5 +----+ L=5 / | +----+ +----+ 1496 | Sn | | +-------| R4 |--------+ | |AGN2| | Dn | 1497 +----+ | / M=20 +----+ M=20 | +----+ +----+ 1498 ~ | / | | ~ 1499 ~ +----+ +----+ +----+ +----+ +----+ ~ 1500 ~ ~ ~ ~ | R5 |-----| R6 |----| R7 |-----| R8 |-----|AGN3| ~ ~ ~ ~ 1501 +----+ +----+ +----+ +----+ +----+ 1503 Figure 5 1505 Reference Topology 1507 The following applies to the reference topology in figure 5: 1509 * Data center 1 and data center 2 both run BGP/SR. Both data 1510 centers run leaf/spine topologies, which are not shown for the 1511 purpose of clarity. 1513 * R1 and R5 function as data center border routers for DC 1. AGN1 1514 and AGN3 function as data center border routers for DC 2. 1516 * Routers R1 through R8 form an independent ISIS-OSPF/SR instance. 1518 * Routers R3, R8, AGN1, AGN2, and AGN2 form an independent ISIS- 1519 OSPF/SR instance. 
*  All IGP link metrics within the wide-area network are metric 10, except for links R5-R4 and R4-R3 which are both metric 20.

*  All links have a unidirectional latency of 10 milliseconds, except for links R5-R4 and R4-R3 which both have a unidirectional latency of 5 milliseconds.

*  Source 'Sn' and destination 'Dn' represent one or more network functions.

6.2. PNF to PNF Connectivity

The first example demonstrates the simplest form of connectivity: PNF to PNF. The example illustrates the instantiation of a unidirectional TE path from R1 to AGN2 and its consumption by an EVPN service. The service has a requirement for high throughput with no strict latency requirements. These service requirements are catalogued and represented using the color blue.

*  An EVPN service is provisioned at R1 and AGN2.

*  The Interconnect controller computes the path from R1 to AGN2 and calculates that the optimal path, based on the service requirements and overall network optimization, is R1-R5-R6-R7-R8-AGN3-AGN2. The segment-list to represent the calculated path could be constructed in numerous ways. It could be strict hops represented by a series of Adj-SIDs. It could be loose hops using ECMP-aware Node-SIDs, for example {R7, AGN2}, or it could be a combination of both Node-SIDs and Adj-SIDs. Equally, BSIDs could be used to reduce the number of labels that need to be imposed at the headend. In this example, strict Adj-SID hops are used with a BSID at the area border router R8, but this should not be interpreted as the only way a path and segment-list can be represented.

*  The Interconnect controller advertises a BGP SR Policy to R8 with BSID 1000, and a segment-list containing segments {AGN3, AGN2}.

*  The Interconnect controller advertises a BGP SR Policy to R1 with BSID 1001, and a segment-list containing segments {R5, R6, R7, R8, 1000}. The policy is identified using the tuple [headend = R1, color = blue, endpoint = AGN2].

*  AGN2 advertises an EVPN MAC Advertisement Route for MAC M1, which is learned by R1. The route has a next-hop of AGN2, an MPLS label of L1, and it carries a color extended community with the value blue.

*  R1 has a valid SR policy [color = blue, endpoint = AGN2] with segment-list {R5, R6, R7, R8, 1000}. R1 therefore associates the MAC address M1 with that policy and programs the relevant information into the forwarding path.

*  The Interconnect controller also learns the EVPN MAC Route advertised by AGN2. The purpose of this is two-fold. It allows the controller to correlate the service overlay with the underlying transport LSPs, thus creating a service connectivity map. It also allows the controller to dynamically create LSPs based upon service requirements if they do not already exist, or to optimize them if network conditions change.
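The forwarding outcome of this example can be summarized as a simple association step at the headend: the (color, next-hop) of the received service route is matched against installed SR policies keyed on (color, endpoint). The following non-normative Python sketch illustrates that association using the values from the example above; the data structures are hypothetical.

   # Illustrative association of a colored service route with an SR
   # policy at the headend (values taken from the example above).

   sr_policies = {
       # (color, endpoint) -> installed SR policy state at R1
       ("blue", "AGN2"): {"bsid": 1001,
                          "segment_list": ["R5", "R6", "R7", "R8", 1000]},
   }

   def bind_service_route(route, policies):
       # An EVPN/IP-VPN route carrying a color extended community is
       # steered into the SR policy matching (color, BGP next-hop).
       policy = policies.get((route["color"], route["next_hop"]))
       if policy is None:
           return {"route": route, "transport": "default (BGP-LU) path"}
       return {"route": route, "transport": policy["segment_list"],
               "service_label": route["label"]}

   mac_m1 = {"type": "evpn-mac", "mac": "M1", "next_hop": "AGN2",
             "label": "L1", "color": "blue"}
   print(bind_service_route(mac_m1, sr_policies))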
6.3. VNF to PNF Connectivity

The next example demonstrates VNF to PNF connectivity and illustrates the instantiation of a unidirectional TE path from S1 to AGN2. The path is consumed by an IP-VPN service that has a basic set of service requirements and as such simply uses IGP metric as a path computation objective. These basic service requirements are cataloged and represented using the color red.

In this example S1 is a VNF with full IP routing and MPLS capability that interfaces to the data center underlay/overlay and serves as the NVO tunnel endpoint.

*  An IP-VPN service is provisioned at S1 and AGN2.

*  The Interconnect controller computes the path from S1 to AGN2 and calculates that the optimal path based on IGP metric is R1-R2-R3-AGN1-AGN2.

*  The Interconnect controller advertises a BGP SR Policy to R1 with BSID 1002, and a segment-list containing segments {R2, R3, AGN1, AGN2}.

*  The Interconnect controller advertises a BGP SR Policy to S1 with BSID 1003, and a segment-list containing segments {R1, 1002}. The policy is identified using the tuple [headend = S1, color = red, endpoint = AGN2].

*  Source S1 learns a VPN-IPv4 route for prefix P1, next-hop AGN2. The route has a VPN label of L1, and it carries a color extended community with the value red.

*  S1 has a valid SR policy [color = red, endpoint = AGN2] with segment-list {R1, 1002} and BSID 1003. S1 therefore associates the VPN-IPv4 prefix P1 with that policy and programs the relevant information into the forwarding path.

*  As in the previous example, the Interconnect controller also learns the VPN-IPv4 route advertised by AGN2 in order to correlate the service overlay with the underlying transport LSPs, creating or optimizing them as required.

6.4. VNF to VNF Connectivity

The last example demonstrates VNF to VNF connectivity and illustrates the instantiation of a unidirectional TE path from S2 to D2. The path is consumed by an EVPN service that requires low latency as a service requirement and as such uses latency as a path computation objective. This service requirement is cataloged and represented using the color green.

In this example S2 is a VNF that has no routing capability. It is hosted by hypervisor H1, which in turn has an interface to a DC controller through which forwarding instructions are programmed. H1 serves as the NVO tunnel endpoint and overlay next-hop.

D2 is a VNF with partial routing capability that is connected to a leaf switch L1. L1 connects to the underlay/overlay in data center 2 and serves as the NVO tunnel endpoint for D2. L1 advertises BGP Prefix-SID 9001 into the underlay.

*  The relevant details of the EVPN service are entered in the data center policy engines within data centers 1 and 2.

*  Source S2 is turned up. Hypervisor H1 notifies its parent DC controller, which in turn retrieves the service (EVPN) information, color, IP, and MAC information from the policy engine and subsequently programs the associated forwarding entries onto S2. The DC controller also dynamically advertises an EVPN MAC Advertisement Route for S2's IP and MAC into the overlay with next-hop H1. (This would trigger the return path set-up between L1 and H1, which is not covered in this example.)

*  The DC controller in data center 1 learns an EVPN MAC Advertisement Route for D2, MAC M, next-hop L1. The route has an MPLS label of L2, and it carries a color extended community with the value green.

*  The Interconnect controller computes the path between H1 and L1 and calculates that the optimal path based on latency is R5-R4-R3-AGN1.
*  The Interconnect controller advertises a BGP SR Policy to R5 with BSID 1004, and a segment-list containing segments {R4, R3, AGN1}.

*  The Interconnect controller advertises a BGP SR Policy to the DC controller in data center 1 with BSID 1005 and a segment-list containing segments {R5, 1004, 9001}. The policy is identified using the tuple [headend = H1, color = green, endpoint = L1].

*  The DC controller in data center 1 has a valid SR policy [color = green, endpoint = L1] with segment-list {R5, 1004, 9001} and BSID 1005. The controller therefore associates the MAC Advertisement Route with that policy and programs the associated forwarding rules into S2.

*  As in the previous example, the Interconnect controller also learns the MAC Advertisement Route for D2 in order to correlate the service overlay with the underlying transport LSPs, creating or optimizing them as required.

7. Conclusions

The NFIX architecture provides an evolutionary path to a unified network fabric. It uses the base constructs of seamless-MPLS and adds end-to-end LSPs capable of delivering against SLAs, seamless data center interconnect, service differentiation, service function chaining, and a Layer-2/Layer-3 infrastructure capable of interconnecting PNF-to-PNF, PNF-to-VNF, and VNF-to-VNF.

NFIX establishes a dynamic, seamless, and automated connectivity model that overcomes the operational barriers and interworking issues between data centers and the wide-area network and delivers the following using standards-based protocols:

*  A unified routing control plane: Multiprotocol BGP (MP-BGP) to acquire inter-domain NLRI from the IP/MPLS underlay and the virtualized IP-VPN/EVPN service overlay.

*  A unified forwarding control plane: SR provides dynamic service tunnels with fast restoration options to meet deterministic bandwidth, latency and path diversity constraints. SR utilizes the appropriate data path encapsulation for seamless, end-to-end connectivity between distributed edge and core data centers across the wide-area network.

*  Service Function Chaining: Leverages SFC extensions for BGP and segment routing to interconnect network and service functions into SFPs, with support for various data path implementations.

*  Service Differentiation: Provides a framework that allows for construction of logical end-to-end networks with differentiated logical topologies and/or constraints through use of SR policies and coloring.

*  Automation: Facilitates automation of service provisioning and avoids heavy service interworking at DCBs.

NFIX is deployable on existing data center and wide-area network infrastructures and allows the underlying data forwarding plane to evolve with minimal impact on the services plane.

8. Security Considerations

The NFIX architecture based on SR-MPLS is subject to the same security concerns as any MPLS network. No new protocols are introduced; hence security issues of the protocols encompassed by this architecture are addressed within the relevant individual standards documents. It is recommended that the security framework for MPLS and GMPLS networks defined in [RFC5920] is adhered to.
1731 Although [RFC5920] focuses on the use of RSVP-TE and LDP control 1732 plane, the practices and procedures are extendable to an SR-MPLS 1733 domain. 1735 The NFIX architecture makes extensive use of Multiprotocol BGP, and 1736 it is recommended that the TCP Authentication Option (TCP-AO) 1737 [RFC5925] is used to protect the integrity of long-lived BGP sessions 1738 and any other TCP-based protocols. 1740 Where PCEP is used between controller and path headend the use of 1741 PCEPS [RFC8253] is recommended to provide confidentiality to PCEP 1742 communication using Transport Layer Security (TLS). 1744 9. Acknowledgements 1746 The authors would like to acknowledge Mustapha Aissaoui, Wim 1747 Henderickx, and Gunter Van de Velde. 1749 10. Contributors 1751 The following people contributed to the content of this document and 1752 should be considered co-authors. 1754 Juan Rodriguez 1755 Nokia 1756 United States of America 1758 Email: juan.rodriguez@nokia.com 1760 Jorge Rabadan 1761 Nokia 1762 United States of America 1764 Email: jorge.rabadan@nokia.com 1766 Nick Morris 1767 Verizon 1768 United States of America 1770 Email: nicklous.morris@verizonwireless.com 1772 Eddie Leyton 1773 Verizon 1774 United States of America 1776 Email: edward.leyton@verizonwireless.com 1778 Figure 6 1780 11. IANA Considerations 1782 This memo does not include any requests to IANA for allocation. 1784 12. References 1786 12.1. Normative References 1788 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1789 Requirement Levels", BCP 14, RFC 2119, March 1997, 1790 . 1792 [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 1793 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, 1794 May 2017, . 1796 12.2. Informative References 1798 [I-D.ietf-nvo3-geneve] 1799 Gross, J., Ganga, I., and T. Sridhar, "Geneve: Generic 1800 Network Virtualization Encapsulation", Work in Progress, 1801 Internet-Draft, draft-ietf-nvo3-geneve-16, 7 March 2020, 1802 . 1805 [I-D.ietf-mpls-seamless-mpls] 1806 Leymann, N., Decraene, B., Filsfils, C., Konstantynowicz, 1807 M., and D. Steinberg, "Seamless MPLS Architecture", Work 1808 in Progress, Internet-Draft, draft-ietf-mpls-seamless- 1809 mpls-07, 28 June 2014, . 1812 [I-D.ietf-bess-evpn-ipvpn-interworking] 1813 Rabadan, J., Sajassi, A., Rosen, E., Drake, J., Lin, W., 1814 Uttaro, J., and A. Simpson, "EVPN Interworking with 1815 IPVPN", Work in Progress, Internet-Draft, draft-ietf-bess- 1816 evpn-ipvpn-interworking-06, 22 September 2021, 1817 . 1820 [I-D.ietf-spring-segment-routing-policy] 1821 Filsfils, C., Talaulikar, K., Voyer, D., Bogdanov, A., and 1822 P. Mattes, "Segment Routing Policy Architecture", Work in 1823 Progress, Internet-Draft, draft-ietf-spring-segment- 1824 routing-policy-14, 25 October 2021, 1825 . 1828 [I-D.ietf-rtgwg-segment-routing-ti-lfa] 1829 Litkowski, S., Bashandy, A., Filsfils, C., Francois, P., 1830 Decraene, B., and D. Voyer, "Topology Independent Fast 1831 Reroute using Segment Routing", Work in Progress, 1832 Internet-Draft, draft-ietf-rtgwg-segment-routing-ti-lfa- 1833 07, 29 June 2021, . 1836 [I-D.ietf-bess-nsh-bgp-control-plane] 1837 Farrel, A., Drake, J., Rosen, E., Uttaro, J., and L. 1838 Jalil, "BGP Control Plane for the Network Service Header 1839 in Service Function Chaining", Work in Progress, Internet- 1840 Draft, draft-ietf-bess-nsh-bgp-control-plane-18, 21 August 1841 2020, . 1844 [I-D.ietf-idr-te-lsp-distribution] 1845 Previdi, S., Talaulikar, K., Dong, J., Chen, M., Gredler, 1846 H., and J. 
Tantsura, "Distribution of Traffic Engineering 1847 (TE) Policies and State using BGP-LS", Work in Progress, 1848 Internet-Draft, draft-ietf-idr-te-lsp-distribution-16, 22 1849 October 2021, . 1852 [I-D.barth-pce-segment-routing-policy-cp] 1853 Koldychev, M., Sivabalan, S., Barth, C., Peng, S., and H. 1854 Bidgoli, "PCEP extension to support Segment Routing Policy 1855 Candidate Paths", Work in Progress, Internet-Draft, draft- 1856 barth-pce-segment-routing-policy-cp-06, 2 June 2020, 1857 . 1860 [I-D.filsfils-spring-sr-policy-considerations] 1861 Filsfils, C., Talaulikar, K., Krol, P., Horneffer, M., and 1862 P. Mattes, "SR Policy Implementation and Deployment 1863 Considerations", Work in Progress, Internet-Draft, draft- 1864 filsfils-spring-sr-policy-considerations-08, 22 October 1865 2021, . 1868 [I-D.ietf-rtgwg-bgp-pic] 1869 Bashandy, A., Filsfils, C., and P. Mohapatra, "BGP Prefix 1870 Independent Convergence", Work in Progress, Internet- 1871 Draft, draft-ietf-rtgwg-bgp-pic-17, 12 October 2021, 1872 . 1875 [I-D.ietf-isis-mpls-elc] 1876 Xu, X., Kini, S., Psenak, P., Filsfils, C., Litkowski, S., 1877 and M. Bocci, "Signaling Entropy Label Capability and 1878 Entropy Readable Label Depth Using IS-IS", Work in 1879 Progress, Internet-Draft, draft-ietf-isis-mpls-elc-13, 28 1880 May 2020, . 1883 [I-D.ietf-ospf-mpls-elc] 1884 Xu, X., Kini, S., Psenak, P., Filsfils, C., Litkowski, S., 1885 and M. Bocci, "Signaling Entropy Label Capability and 1886 Entropy Readable Label Depth Using OSPF", Work in 1887 Progress, Internet-Draft, draft-ietf-ospf-mpls-elc-15, 1 1888 June 2020, . 1891 [I-D.ietf-idr-next-hop-capability] 1892 Decraene, B., Kompella, K., and W. Henderickx, "BGP Next- 1893 Hop dependent capabilities", Work in Progress, Internet- 1894 Draft, draft-ietf-idr-next-hop-capability-07, 8 December 1895 2021, . 1898 [I-D.ietf-spring-segment-routing-central-epe] 1899 Filsfils, C., Previdi, S., Dawra, G., Aries, E., and D. 1900 Afanasiev, "Segment Routing Centralized BGP Egress Peer 1901 Engineering", Work in Progress, Internet-Draft, draft- 1902 ietf-spring-segment-routing-central-epe-10, 21 December 1903 2017, . 1906 [I-D.ietf-idr-long-lived-gr] 1907 Uttaro, J., Chen, E., Decraene, B., and J. G. Scudder, 1908 "Support for Long-lived BGP Graceful Restart", Work in 1909 Progress, Internet-Draft, draft-ietf-idr-long-lived-gr-00, 1910 5 September 2019, . 1913 [RFC7938] Lapukhov, P., Premji, A., and J. Mitchell, Ed., "Use of 1914 BGP for Routing in Large-Scale Data Centers", RFC 7938, 1915 DOI 10.17487/RFC7938, August 2016, 1916 . 1918 [RFC7752] Gredler, H., Ed., Medved, J., Previdi, S., Farrel, A., and 1919 S. Ray, "North-Bound Distribution of Link-State and 1920 Traffic Engineering (TE) Information Using BGP", RFC 7752, 1921 DOI 10.17487/RFC7752, March 2016, 1922 . 1924 [RFC8277] Rosen, E., "Using BGP to Bind MPLS Labels to Address 1925 Prefixes", RFC 8277, DOI 10.17487/RFC8277, October 2017, 1926 . 1928 [RFC8667] Previdi, S., Ed., Ginsberg, L., Ed., Filsfils, C., 1929 Bashandy, A., Gredler, H., and B. Decraene, "IS-IS 1930 Extensions for Segment Routing", RFC 8667, 1931 DOI 10.17487/RFC8667, December 2019, 1932 . 1934 [RFC8665] Psenak, P., Ed., Previdi, S., Ed., Filsfils, C., Gredler, 1935 H., Shakir, R., Henderickx, W., and J. Tantsura, "OSPF 1936 Extensions for Segment Routing", RFC 8665, 1937 DOI 10.17487/RFC8665, December 2019, 1938 . 1940 [RFC8669] Previdi, S., Filsfils, C., Lindem, A., Ed., Sreekantiah, 1941 A., and H. 
Gredler, "Segment Routing Prefix Segment 1942 Identifier Extensions for BGP", RFC 8669, 1943 DOI 10.17487/RFC8669, December 2019, 1944 . 1946 [RFC8663] Xu, X., Bryant, S., Farrel, A., Hassan, S., Henderickx, 1947 W., and Z. Li, "MPLS Segment Routing over IP", RFC 8663, 1948 DOI 10.17487/RFC8663, December 2019, 1949 . 1951 [RFC7911] Walton, D., Retana, A., Chen, E., and J. Scudder, 1952 "Advertisement of Multiple Paths in BGP", RFC 7911, 1953 DOI 10.17487/RFC7911, July 2016, 1954 . 1956 [RFC7880] Pignataro, C., Ward, D., Akiya, N., Bhatia, M., and S. 1957 Pallagatti, "Seamless Bidirectional Forwarding Detection 1958 (S-BFD)", RFC 7880, DOI 10.17487/RFC7880, July 2016, 1959 . 1961 [RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private 1962 Networks (VPNs)", RFC 4364, DOI 10.17487/RFC4364, February 1963 2006, . 1965 [RFC5920] Fang, L., Ed., "Security Framework for MPLS and GMPLS 1966 Networks", RFC 5920, DOI 10.17487/RFC5920, July 2010, 1967 . 1969 [RFC7011] Claise, B., Ed., Trammell, B., Ed., and P. Aitken, 1970 "Specification of the IP Flow Information Export (IPFIX) 1971 Protocol for the Exchange of Flow Information", STD 77, 1972 RFC 7011, DOI 10.17487/RFC7011, September 2013, 1973 . 1975 [RFC6241] Enns, R., Ed., Bjorklund, M., Ed., Schoenwaelder, J., Ed., 1976 and A. Bierman, Ed., "Network Configuration Protocol 1977 (NETCONF)", RFC 6241, DOI 10.17487/RFC6241, June 2011, 1978 . 1980 [RFC6020] Bjorklund, M., Ed., "YANG - A Data Modeling Language for 1981 the Network Configuration Protocol (NETCONF)", RFC 6020, 1982 DOI 10.17487/RFC6020, October 2010, 1983 . 1985 [RFC7854] Scudder, J., Ed., Fernando, R., and S. Stuart, "BGP 1986 Monitoring Protocol (BMP)", RFC 7854, 1987 DOI 10.17487/RFC7854, June 2016, 1988 . 1990 [RFC8300] Quinn, P., Ed., Elzur, U., Ed., and C. Pignataro, Ed., 1991 "Network Service Header (NSH)", RFC 8300, 1992 DOI 10.17487/RFC8300, January 2018, 1993 . 1995 [RFC5440] Vasseur, JP., Ed. and JL. Le Roux, Ed., "Path Computation 1996 Element (PCE) Communication Protocol (PCEP)", RFC 5440, 1997 DOI 10.17487/RFC5440, March 2009, 1998 . 2000 [RFC7348] Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger, 2001 L., Sridhar, T., Bursell, M., and C. Wright, "Virtual 2002 eXtensible Local Area Network (VXLAN): A Framework for 2003 Overlaying Virtualized Layer 2 Networks over Layer 3 2004 Networks", RFC 7348, DOI 10.17487/RFC7348, August 2014, 2005 . 2007 [RFC7637] Garg, P., Ed. and Y. Wang, Ed., "NVGRE: Network 2008 Virtualization Using Generic Routing Encapsulation", 2009 RFC 7637, DOI 10.17487/RFC7637, September 2015, 2010 . 2012 [RFC3031] Rosen, E., Viswanathan, A., and R. Callon, "Multiprotocol 2013 Label Switching Architecture", RFC 3031, 2014 DOI 10.17487/RFC3031, January 2001, 2015 . 2017 [RFC8014] Black, D., Hudson, J., Kreeger, L., Lasserre, M., and T. 2018 Narten, "An Architecture for Data-Center Network 2019 Virtualization over Layer 3 (NVO3)", RFC 8014, 2020 DOI 10.17487/RFC8014, December 2016, 2021 . 2023 [RFC8402] Filsfils, C., Ed., Previdi, S., Ed., Ginsberg, L., 2024 Decraene, B., Litkowski, S., and R. Shakir, "Segment 2025 Routing Architecture", RFC 8402, DOI 10.17487/RFC8402, 2026 July 2018, . 2028 [RFC5883] Katz, D. and D. Ward, "Bidirectional Forwarding Detection 2029 (BFD) for Multihop Paths", RFC 5883, DOI 10.17487/RFC5883, 2030 June 2010, . 2032 [RFC8231] Crabbe, E., Minei, I., Medved, J., and R. 
Varga, "Path 2033 Computation Element Communication Protocol (PCEP) 2034 Extensions for Stateful PCE", RFC 8231, 2035 DOI 10.17487/RFC8231, September 2017, 2036 . 2038 [RFC8281] Crabbe, E., Minei, I., Sivabalan, S., and R. Varga, "Path 2039 Computation Element Communication Protocol (PCEP) 2040 Extensions for PCE-Initiated LSP Setup in a Stateful PCE 2041 Model", RFC 8281, DOI 10.17487/RFC8281, December 2017, 2042 . 2044 [RFC5925] Touch, J., Mankin, A., and R. Bonica, "The TCP 2045 Authentication Option", RFC 5925, DOI 10.17487/RFC5925, 2046 June 2010, . 2048 [RFC8253] Lopez, D., Gonzalez de Dios, O., Wu, Q., and D. Dhody, 2049 "PCEPS: Usage of TLS to Provide a Secure Transport for the 2050 Path Computation Element Communication Protocol (PCEP)", 2051 RFC 8253, DOI 10.17487/RFC8253, October 2017, 2052 . 2054 [RFC6790] Kompella, K., Drake, J., Amante, S., Henderickx, W., and 2055 L. Yong, "The Use of Entropy Labels in MPLS Forwarding", 2056 RFC 6790, DOI 10.17487/RFC6790, November 2012, 2057 . 2059 [RFC8662] Kini, S., Kompella, K., Sivabalan, S., Litkowski, S., 2060 Shakir, R., and J. Tantsura, "Entropy Label for Source 2061 Packet Routing in Networking (SPRING) Tunnels", RFC 8662, 2062 DOI 10.17487/RFC8662, December 2019, 2063 . 2065 [RFC8491] Tantsura, J., Chunduri, U., Aldrin, S., and L. Ginsberg, 2066 "Signaling Maximum SID Depth (MSD) Using IS-IS", RFC 8491, 2067 DOI 10.17487/RFC8491, November 2018, 2068 . 2070 [RFC8476] Tantsura, J., Chunduri, U., Aldrin, S., and P. Psenak, 2071 "Signaling Maximum SID Depth (MSD) Using OSPF", RFC 8476, 2072 DOI 10.17487/RFC8476, December 2018, 2073 . 2075 Authors' Addresses 2077 Colin Bookham (editor) 2078 Nokia 2079 740 Waterside Drive 2080 Almondsbury, Bristol 2081 United Kingdom 2083 Email: colin.bookham@nokia.com 2085 Andrew Stone 2086 Nokia 2087 600 March Road 2088 Kanata, Ontario 2089 Canada 2091 Email: andrew.stone@nokia.com 2093 Jeff Tantsura 2094 Microsoft 2096 Email: jefftant.ietf@gmail.com 2098 Muhammad Durrani 2099 Equinix Inc 2100 1188 Arques Ave 2101 Sunnyvale CA, 2102 United States of America 2104 Email: mdurrani@equinix.com 2106 Bruno Decraene 2107 Orange 2108 38-40 Rue de General Leclerc 2109 92794 Issey Moulineaux cedex 9 2110 France 2112 Email: bruno.decraene@orange.com