BGP Enabled Services (bess)                                  R. Fernando
INTERNET-DRAFT                                                     Cisco
Intended status: Standards Track                               S. Mackie
Expires: August 18, 2018                                         Juniper
                                                                  D. Rao
                                                                   Cisco
                                                              B. Rijsman
                                                                 Juniper
                                                            M. Napierala
                                                                    AT&T
                                                                T. Morin
                                                                  Orange

                                                       February 12, 2018

         Service Chaining using Virtual Networks with BGP VPNs
                  draft-ietf-bess-service-chaining-04

Abstract

This document describes how service function chains (SFC) can be
applied to traffic flows using routing in a virtual (overlay) network
to steer traffic between service nodes. Chains can include services
running in routers, on physical appliances or in virtual machines.
Service chains have applicability at the subscriber edge, business
edge and in multi-tenant datacenters. The routing function into SFCs
and between service functions within an SFC can be performed by
physical devices (routers), be virtualized inside hypervisors, or run
as part of a host OS.
A BGP control plane for route distribution is used to create virtual
networks implemented using MPLS, VXLAN or other suitable
encapsulation, where the routes within the virtual networks cause
traffic to flow through a sequence of service nodes that apply packet
processing functions to the flows.

Two techniques are described: in one, the service chain is implemented
as a sequence of distinct VPNs between sets of service nodes that
apply each service function; in the other, the routes within a VPN
are modified through the use of special route targets and modified
next-hop resolution to achieve the desired result.

In both techniques, service chains can be created by manual
configuration of routes and route targets in routing systems, or
through the use of a controller which contains a topological model of
the desired service chains.

This document also discusses load balancing between network
functions, symmetric forward and reverse paths when stateful services
are involved, and the use of classifiers to direct traffic into a
service chain.

Status of this Memo

This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html

This Internet-Draft will expire on August 18, 2018.

Copyright Notice and License Notice

Copyright (c) 2018 IETF Trust and the persons identified as the
document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents

1      Introduction
1.1    Terminology
2      Service Function Chain Architecture Using Virtual Networking
2.1    High Level Architecture
2.2    Service Function Chain Logical Model
2.3    Service Function Implemented in a Set of SF Instances
2.4    SF Instance Connections to VRFs
2.4.1    SF Instance in Physical Appliance
2.4.2    SF Instance in a Virtualized Environment
2.5    Encapsulation Tunneling for Transport
2.6    SFC Creation Procedure
2.6.1    SFC Provisioning Using Sequential VPNs
2.6.2    Modified-Route SFC Creation
2.6.3    Common SFC Provisioning Considerations
2.7    Controller Function
2.8    Variations on Setting Prefixes in an SFC
2.8.1    Using a Default Route
2.8.2    Using a Default Route and a Large Prefix
2.8.3    Disaggregated Gateway Routers
2.8.4    Optimizing VRF Usage
2.8.5    Dynamic Entry and Exit Signaling
2.8.6    Dynamic Re-Advertisements in Intermediate Systems
2.9    Layer-2 Virtual Networks and Service Functions
2.10   Header Transforming Service Functions
3      Load Balancing Along a Service Function Chain
3.1    SF Instances Connected to Separate VRFs
3.2    SF Instances Connected to the Same VRF
3.3    Combination of Egress and Ingress VRF Load Balancing
3.4    Forward and Reverse Flow Load Balancing
3.4.1    Issues with Equal Cost Multi-Path Routing
3.4.2    Modified ECMP with Consistent Hash
3.4.3    ECMP with Flow Table
3.4.4    Dealing with different hash algorithms in an SFC
4      Steering into SFCs Using a Classifier
5      External Domain Co-ordination
6      Fine-grained steering using BGP Flow-Spec
7      Controller Federation
8      Coordination Between SF Instances and Controller using BGP
9      BGP Extended Communities
10     Summary and Conclusion
11     Security Considerations
12     IANA Considerations
13     Informative References
14     Acknowledgments

1 Introduction

The purpose of networks is to allow computing systems to communicate
with each other. Requests are usually made from the client or
customer side of a network, and responses are generated by
applications residing in a datacenter. Over time, the network between
the client and the application has become more complex, and traffic
between the client and the application is acted on by intermediate
systems that apply network services.
Some of these activities, like firewall filtering, subscriber
attachment and network address translation, are generally carried out
in network devices along the traffic path, while others are carried
out by dedicated appliances, such as media proxy and deep packet
inspection (DPI). Deployment of these in-network services is complex,
time-consuming and costly, since they require configuration of
devices with vendor-specific operating systems, sometimes with
co-processing cards, or deployment of physical devices in the
network, which requires cabling and configuration of the devices that
they connect to. Additionally, other devices in the network need to
be configured to ensure that traffic is correctly steered through the
systems that services are running on.

The current mode of operations does not easily allow common
operational processes to be applied to the lifecycle of services in
the network, or for steering of traffic through them.

The recent emergence of Network Functions Virtualization (NFV)
[NFVE2E] to provide a standard deployment model for network services
as software appliances, combined with Software Defined Networking
(SDN) for more dynamic traffic steering, can provide foundational
elements that will allow network services to be deployed and managed
far more efficiently and with more agility than is possible today.

This document describes how the combination of several existing
technologies can be used to create chains of functions, while
preserving the requirements of scale, performance and reliability for
service provider networks. The technologies employed are:

o  Traffic flow between service functions described by routing and
   network policies rather than by static physical or logical
   connectivity

o  Packet header encapsulation in order to create virtual private
   networks using network overlays

o  VRFs on both physical devices and in hypervisors to implement
   forwarding policies that are specific to each virtual network

o  Optional use of a controller to calculate routes to be installed
   in routing systems to form a service chain. The controller uses a
   topological model that stores service function instance
   connectivity to network devices and intended connectivity between
   service functions.

o  MPLS or other labeling to facilitate identification of the next
   interface to send packets to in a service function chain

o  BGP or BGP-style signaling to distribute routes in order to create
   service function chains

o  Distributed load balancing between service functions, performed in
   the VRFs that service function instances connect to

Virtualized environments can be supported without necessarily running
BGP or MPLS natively. Messaging protocols such as NC/YANG, XMPP or
OpenFlow may be used to signal forwarding information. Encapsulation
mechanisms such as VXLAN or GRE may be used for overlay transport.
The term 'BGP-style', above, refers to this type of signaling.

Traffic can be directed into service function chains using IP routing
at each end of the service function chain, or be directed into the
chain by a classifier function that can determine which service chain
a traffic flow should pass through, based on deep packet inspection
(DPI) and/or subscriber identity.
The techniques can support an evolution from services implemented in
physical devices attached to physical forwarding systems (routers) to
fully virtualized implementations, as well as intermediate hybrid
implementations.

1.1 Terminology

This document uses the following acronyms and terms.

Terms       Meaning
-----       -----------------------------------------------
AS          Autonomous System
ASBR        Autonomous System Border Router
FW          Firewall
I2RS        Interface to the Routing System
L3VPN       Layer 3 VPN
LB          Load Balancer
NLRI        Network Layer Reachability Information [RFC4271]
P           Provider backbone router
proxy-arp   proxy-Address Resolution Protocol
RR          Route Reflector
RT          Route Target
SDN         Software Defined Network
vCE         virtual Customer Edge router [I-D.fang-l3vpn-virtual-ce]
vFW         virtual Firewall
vLB         virtual Load Balancer
VM          Virtual Machine
vPC         virtual Private Cloud
vPE         virtual Provider Edge router [I-D.fang-l3vpn-virtual-pe]
VPN         Virtual Private Network
VRF         VPN Routing and Forwarding table [RFC4364]
vRR         virtual Route Reflector

This document follows some of the terminology used in [sfc-arch] and
adds some new terminology:

Network Service: An externally visible service offered by a network
   operator; a service may consist of a single service function or a
   composite built from several service functions executed in one or
   more pre-determined sequences and delivered by software executing
   in physical or virtual devices.

Classification: Customer/network/service policy used to identify and
   select traffic flow(s) requiring certain outbound forwarding
   actions; in particular, to direct specific traffic flows into the
   ingress of a particular service function chain, or to cause
   branching within a service function chain.

Virtual Network: A logical overlay network built using virtual links
   or packet encapsulation, over an existing network (the underlay).

Service Function Chain (SFC): A service function chain defines an
   ordered set of service functions that must be applied to packets
   and/or frames selected as a result of classification. An SFC may be
   either a linear chain or a complex service graph with multiple
   branches. The term 'Service Chain' is often used in place of
   'Service Function Chain'.

SFC Set: The pair of SFCs through which the forward and reverse
   directions of a given classified flow will pass.

Service Function (SF): A logical function that is applied to
   packets. A service function can act at the network layer or other
   OSI layers. A service function can be embedded in one or more
   physical network elements, or can be implemented in one or more
   software instances running on physical or virtual hosts. One or
   multiple service functions can be embedded in the same network
   element or run on the same host. Multiple instances of a service
   function can be enabled in the same administrative domain. For
   simplicity, 'Service Function' is also referred to as, simply,
   'Service'.

   A non-exhaustive list of services includes: firewalls, DDOS
   protection, anti-malware/anti-virus systems, WAN and application
   acceleration, Deep Packet Inspection (DPI), server load balancers,
   network address translation, HTTP Header Enrichment functions,
   video optimization, TCP optimization, etc.
SF Instance: An instance of software that implements the packet
   processing of a service function.

SF Instance Set: A group of SF instances that, in parallel, implement
   a service function in an SFC.

Routing System: A hardware or software system that performs layer 3
   routing and/or forwarding functions. The term includes physical
   routers as well as hypervisor or Host OS implementations of the
   forwarding plane of a conventional router.

Gateway: A routing system attached to the source or destination
   network that peers with the controller, or with the routing system
   at one end of an SFC. A source network gateway directs traffic from
   the source network into an SFC, while a destination network gateway
   distributes traffic towards destinations. The routing systems at
   each end of an SFC can themselves act as gateways, and in a
   bidirectional SF instance set, gateways can act in both directions.

VRF: A subsystem within a routing system as defined in [RFC4364] that
   contains private routing and forwarding tables and has physical
   and/or logical interfaces associated with it. In the case of
   hypervisor/Host OS implementations, the term refers only to the
   forwarding function of a VRF, and this will be referred to as a
   'VPN forwarder.'

Ingress VRF: A VRF containing an ingress interface of an SF instance.

Egress VRF: A VRF containing an egress interface of an SF instance.

Note that in this document the terms 'ingress' and 'egress' are used
with respect to SF instances rather than the tunnels that connect SF
instances. This differs from the usual usage in VPN literature.

Entry VRF: A VRF through which traffic enters the SFC from the source
   network. This VRF may be used to advertise the destination
   network's routes to the source network. It could be placed on a
   gateway router or be collocated with the first ingress VRF.

Exit VRF: A VRF through which traffic exits the SFC into the
   destination network. This VRF contains the routes from the
   destination network and could be located on a gateway router.
   Alternatively, the egress VRF attached to the last SF instance may
   also function as the exit VRF.

2 Service Function Chain Architecture Using Virtual Networking

The techniques described in this document use virtual networks to
implement service function chains. Service function chains can be
implemented on devices that support existing MPLS VPN and BGP
standards [RFC4364, RFC4271, RFC4760], as well as other
encapsulations, such as VXLAN [RFC7348]. Similarly, equivalent
control plane protocols such as BGP-EVPN with type-2 and type-5 route
types can also be used where supported. The set of techniques
described in this document represents one implementation approach to
realize the SFC architecture described in [sfc-arch].

The following sections detail the building blocks of the SFC
architecture, and outline the processes of route installation and
subsequent route exchange to create an SFC.

2.1 High Level Architecture

Service function chains can be deployed with or without a classifier.
Use cases where SFCs may be deployed without a classifier include
multi-tenant datacenters, private and public clouds, and virtual CPE
for business services. Classifiers will primarily be used in mobile
and wireline subscriber edge use cases. Use of a classifier is
discussed in Section 4.
A high-level architecture diagram of an SFC without a classifier,
where traffic is routed into and out of the SFC, is shown in Figure
1, below. An optional controller is shown that contains a topological
model of the SFC and that configures the network resources to
implement the SFC.

          +-------------------------+
          |--- Data plane connection|
          |=== Encapsulation tunnel |
          | O   VRF                 |
          +-------------------------+

Control  +------------------------------------------------+
Plane    |                   Controller                   |
.......  +-+-------------+----------+----------+--------+-+
           |             |          |          |        |
Service    |     +---+   |  +---+   |  +---+   |        |
Plane      |     |SF1|   |  |SF2|   |  |SF3|   |        |
           |     +---+   |  +---+   |  +---+   |        |
.......   /       | |   /    | |   /    | |   /        /
       +-----+ +--|-|--+  +--|-|--+  +--|-|--+  +-----+
       |     | |  | |  |  |  | |  |  |  | |  |  |     |
Net-A-->--O=======O O========O O========O O========O-->Net-B
       |     | |       |  |       |  |       |  |     |
Data   | R-A | |  R-1  |  |  R-2  |  |  R-3  |  | R-B |
Plane  +-----+ +-------+  +-------+  +-------+  +-----+

          ^       ^                       ^        ^
          |       |                       |        |
          |    Ingress                  Egress     |
          |      VRF                     VRF       |
      SFC Entry                                SFC Exit
         VRF                                      VRF

           Figure 1 - High level SFC Architecture

Traffic from Network-A destined for Network-B will pass through the
SFC composed of SF instances SF1, SF2 and SF3. Routing system R-A
contains a VRF (shown as an 'O' symbol) that is the SFC entry point.
This VRF will advertise a route to reach Network-B into Network-A,
causing any traffic from a source in Network-A with a destination in
Network-B to arrive in this VRF. The forwarding table in the VRF in
R-A will direct traffic destined for Network-B into an encapsulation
tunnel with destination R-1 and a label that identifies the ingress
(left) interface of SF1 that R-1 should send the packets out on. The
packets are processed by service instance SF1 and arrive in the
egress (right) VRF in R-1. The forwarding entries in the egress VRF
direct traffic to the next ingress VRF using encapsulation tunneling.
The process is repeated for each service instance in the SFC until
packets arrive at the SFC exit VRF (in R-B). This VRF is peered with
Network-B and routes packets towards their destinations in the user
data plane. In this example, routing systems R-A and R-B are gateway
routing systems.

In the example, each pair of ingress and egress VRFs is configured in
a separate routing system, but such pairs could be collocated in the
same routing system, and it is possible for the ingress and egress
VRFs for a given SF instance to be in different routing systems. The
SFC entry and exit VRFs can be collocated in the same routing system,
and the service instances can be local or remote from either or both
of the routing systems containing the entry and exit VRFs, and from
each other. It is also possible that the ingress and egress VRFs are
implemented using alternative mechanisms.

The controller is responsible for configuring the VRFs in each
routing system, installing the routes in each of the VRFs to
implement the SFC, and, in the case of virtualized services, may
instantiate the service instances.

2.2 Service Function Chain Logical Model

A service function chain is a set of logically connected service
functions through which traffic can flow. Each egress interface of
one service function is logically connected to an ingress interface
of the next service function.
            +------+   +------+   +------+
Network-A-->| SF-1 |-->| SF-2 |-->| SF-3 |-->Network-B
            +------+   +------+   +------+

           Figure 2 - A Chain of Service Functions

In Figure 2, above, a service function chain has been created that
connects Network-A to Network-B, such that traffic from a host in
Network-A to a host in Network-B will traverse the service function
chain.

As defined in [sfc-arch], a service function chain can be
uni-directional or bi-directional. In this document, in order to
allow for the possibility that the forward and reverse paths may not
be symmetrical, SFCs are defined as uni-directional, and the term
'SFC set' is used to refer to a pair of forward and reverse direction
SFCs for some set of routed or classified traffic.

2.3 Service Function Implemented in a Set of SF Instances

A service function instance is a software system that acts on packets
that arrive on an ingress interface of that software system. Service
function instances may run on a physical appliance or in a virtual
machine. A service function instance may be transparent at layer 2
and/or layer 3, and may support branching across multiple egress
interfaces as well as aggregation across ingress interfaces. For
simplicity, the examples in this document have a single ingress and a
single egress interface.

Each service function in a chain can be implemented by a single
service function instance, or by a set of instances in order to
provide scale and resilience.

+------------------------------------------------------------------+
|          Logical Service Functions Connected in a Chain          |
|                                                                  |
|              +--------+                 +--------+               |
|  Net-A--->   |  SF-1  |---------------->|  SF-2  |--->Net-B      |
|              +--------+                 +--------+               |
|                                                                  |
+------------------------------------------------------------------+
|     Service Function Instances Connected by Virtual Networks     |
|      ......                     ......                           |
|      :    :    +------+         :    :                           |
|      :    :--->|SFI-11|-------->:    :             ......        |
|      :    :    +------+         :    :   +------+  :    :        |
|      :    :                     :    :-->|SFI-21|->:    :        |
|      :    :    +------+         :    :   +------+  :    :        |
|  A-->:VN-1:--->|SFI-12|-------->:VN-2:             :VN-3:-->B    |
|      :    :    +------+         :    :   +------+  :    :        |
|      :    :                     :    :-->|SFI-22|->:    :        |
|      :    :    +------+         :    :   +------+  :    :        |
|      :    :--->|SFI-13|-------->:    :             ''''''        |
|      :    :    +------+         :    :                           |
|      ''''''                     ''''''                           |
+------------------------------------------------------------------+

  Figure 3 - Service Functions Are Composed of SF Instances Connected
                         Via Virtual Networks

In Figure 3, service function SF-1 is implemented in three service
function instances, SFI-11, SFI-12, and SFI-13. Service function
SF-2 is implemented in two SF instances. The service function
instances are connected to the next service function in the chain
using a virtual network, VN-2. Additionally, a virtual network (VN-1)
is used to enter the SFC and another (VN-3) is used at the exit.

The logical connection between two service functions is implemented
using a virtual network that contains egress interfaces for instances
of one service function, and ingress interfaces of instances of the
next service function. Traffic is directed across the virtual network
between the two sets of service function instances using layer 3
forwarding (e.g. an MPLS VPN) or layer 2 forwarding (e.g. a VXLAN).
The virtual networks could be described as "directed half-mesh", in
that the egress interface of each SF instance of one service function
can reach any ingress interface of the SF instances of the connected
service function.

Details on how routing across virtual networks is achieved, and
requirements on load balancing across ingress interfaces, are
discussed in later sections of this document.

2.4 SF Instance Connections to VRFs

SF instances can be deployed as software running on physical
appliances, or in virtual machines running on a hypervisor. These two
types are described in more detail in the following sections.

2.4.1 SF Instance in Physical Appliance

The case of an SF instance running on a physical appliance is shown
in Figure 4, below.

+---------------------------------+
|                                 |
| +-----------------------------+ |
| | Service Function Instance   | |
| +-------^-------------|-------+ |
|         |    Host     |         |
+---------|-------------|---------+
          |             |
+---------|-------------|---------+
|         |             |         |
|    +----|----+   +----v----+    |
-----+ Ingress |   | Egress  +-----
-----> VRF     |   | VRF     ----->
-----+         |   |         +-----
|    +---------+   +---------+    |
|         Routing System          |
+---------------------------------+

  Figure 4 - Ingress and Egress VRFs for a Physical Routing System
                      and Physical SF Instance

The routing system is a physical device, and the service function
instance is implemented as software running in a physical appliance
(host) connected to it. The connection between the host and the
routing system may use physical or logical interfaces. Transport
between VRFs on different routing systems that are connected to other
SF instances in an SFC is via encapsulation tunnels, such as MPLS
over GRE or VXLAN.

2.4.2 SF Instance in a Virtualized Environment

In virtualized environments, a routing system with VRFs that act as
VPN forwarders is resident in the hypervisor/Host OS, and is
co-resident in the host with one or more SF instances that run in
virtual machines. The egress VPN forwarder performs tunnel
encapsulation to send packets to other physical or virtual routing
systems with attached SF instances to form an SFC. The tunneled
packets are sent through the physical interfaces of the host to the
other hosts or physical routers. This is illustrated in Figure 5,
below.

+-------------------------------------+
|  +-----------------------------+    |
|  | Service Function Instance   |    |
|  +-------^-------------|-------+    |
|          |             |            |
|  +-------|-------------|----------+ |
|  | +-----|-------------|--------+ | |
|  | |     |             |        | | |
|  | | +---|-----+   +---v-----+  | | |
-------+ Ingress |   | Egress  +-------
-------> VRF     |   | VRF     ------->
-------+         |   |         +-------
|  | | +---------+   +---------+  | | |
|  | |       Routing System       | | |
|  | +----------------------------+ | |
|  |      Hypervisor or Host OS     | |
|  +--------------------------------+ |
|                Host                 |
+-------------------------------------+

  Figure 5 - Ingress and Egress VRFs for a Virtual Routing System
                    and Virtualized SF Instance

When more than one instance of an SF is running on a hypervisor, they
can be connected to the same VRF for scale out of an SF within an
SFC.
The routing mechanisms in the VRFs into and between service function
instances, and the encapsulation tunneling between routing systems,
are identical in the physical and virtual implementations of SFCs and
routing systems described in this document. Physical and virtual
service functions can be mixed as needed, with different combinations
of physical and virtual routing systems, within a single service
chain.

The SF instances are attached to the routing systems via physical,
virtual or logical (e.g., 802.1q) interfaces, and are assumed to
perform basic L3 or L2 forwarding.

A single SF instance can be part of multiple service chains. In this
case, the SF instance will have dedicated interfaces (typically
logical) and forwarding contexts associated with each service chain.

2.5 Encapsulation Tunneling for Transport

Encapsulation tunneling is used to transport packets between SF
instances in the chain and, when a classifier is not used, from the
originating network into the SFC and from the SFC into the
destination network.

The tunnels can be MPLS over GRE [RFC4023], MPLS over UDP
[draft-ietf-mpls-in-udp], MPLS over MPLS [RFC3031], VXLAN [RFC7348],
or other suitable encapsulation methods.

Tunneling capabilities may be enabled in each routing system as part
of a base configuration or may be configured by the controller.
Tunnel encapsulations may be programmed by the controller or signaled
using BGP. The encapsulation to be used for a given route is signaled
in BGP using the procedures described in
[draft-rosen-idr-tunnel-encaps], i.e. typically relying on the BGP
Tunnel Encapsulation Extended Community.

2.6 SFC Creation Procedure

This section describes how service chains are created using two
methods:

o  Sequential VPNs - where a conventional VPN is created between each
   set of SF instances to create the links in the SFC

o  Route Modification - where each routing system modifies advertised
   routes that it receives, to realize the links in an SFC on the
   basis of a special service topology RT and a route-policy that
   describes the service chain logical topology

In both cases the controller, when present, is responsible for
creating ingress and egress VRFs, configuring the interfaces
connected to SF instances in each VRF, and allocating and configuring
import and export RTs for each VRF. Additionally, in the second
method, the controller also sends the route-policy containing the
service chain logical topology to each routing system. If a
controller is not used, these procedures will need to be performed
manually or through scripting, for instance.

The source and destination networks' prefixes can be configured in
the controller, or may be automatically learned through peering
between the controller and each network's gateway. This is further
described in Section 2.8.5 and Section 5.

The following sub-sections describe how RT configuration, local route
installation and route distribution occur in each of the methods.

It should be noted that, depending on the capabilities of the routing
systems, a controller can use one or more techniques to realize
forwarding along the service chain, ranging from fully centralized to
fully distributed. The goal of describing the following two methods
is to illustrate the broad approaches and to provide a base for
various optimization options.
Interoperability between a controller implementing one method and a
controller implementing a different method is achieved by relying on
the techniques described in Section 5 and Section 8, which describe
the use of BGP-style service chaining within domains that are
interconnected using standard BGP VPN route exchanges.

2.6.1 SFC Provisioning Using Sequential VPNs

The task of the controller in this method of SFC provisioning is to
create a set of VPNs that carry traffic to the destination network
through instances of each service function in turn. This is achieved
by allocating and configuring RTs such that the egress VRFs of one
set of SF instances import an RT that is an export RT for the ingress
VRFs of the next, logically connected, set of SF instances.

The process of SFC creation is as follows:

1. The controller creates a VRF in each routing system that is
   connected to a service instance that will be used in the SFC.

2. The controller configures each VRF to contain the logical
   interface that connects to an SF instance.

3. The controller implements route target import and export
   policies in the VRFs, using the same route targets for the
   egress VRFs of a service function and the ingress VRFs of
   the next logically connected service function in the SFC.

4. The controller installs a static route in each ingress VRF
   whose next hop is the interface that an SF instance is
   connected to. The prefix for the route is the destination
   network to be reached by passing through the SFC. The
   following sections describe variations that can be used.

5. Routing systems advertise the static routes via BGP as VPN
   routes, with the next hop being the IP address of the
   router, with an encapsulation specified and a label that
   identifies the service instance interface.

6. Routing systems containing VRFs with matching route targets
   receive the updates.

7. Routes are installed in egress VRFs with matching import
   targets. The egress VRFs of each SF instance will now
   contain VPN routes to one or more routers containing ingress
   VRFs for SF instances of the next service function in the
   SFC.

Routes to the destination network via the first set of SF instances
are advertised into the source network, and the egress VRFs of the
last SF instance set have routes into the destination network.

As discussed further in Section 3, egress VRFs can load balance
across the multiple next hops advertised from the next set of ingress
VRFs.

2.6.2 Modified-Route SFC Creation

In this method of SFC configuration, all the VRFs connected to SF
instances for a given SFC are configured with the same import and
export RT, so they form a VPN-connected mesh between the SF instance
interfaces. This is termed the 'Service VPN'. A route is configured
or learnt in each VRF whose destination is the IP address of a
connected SF instance, reachable via an interface configured in the
VRF. The interface may be a physical or logical interface. The
routing system that hosts such a VRF advertises a VPN route for each
locally connected SF instance, with a forwarding label that enables
it to forward incoming traffic from other routing systems to the
connected SF instance. The VPN routes may be advertised via an RR or
the controller, which sends these updates to all the other routing
systems that have VRFs with the service VPN RT.
At this point all the VRFs have a route to reach every SF instance.
The same virtual IP address may be used for each SF instance in a
set, enabling load-balancing among multiple SF instances in the set.

The controller builds a route-policy for the routing systems in the
VPN that describes the logical topology of each service chain that
they belong to. The route-policy contains entries in the form of a
tuple for each service chain:

   {Service-topology-name, Service-topology-RT,
    Service-node-sequence}

where Service-node-sequence is simply an ordered list of the service
function interface IP addresses that are in the chain.

Every service function chain has a single unique Service-topology-RT
that is allocated and provisioned on all participating routing
systems in the relevant VRFs.

The VRF in the routing system that connects to the destination
network (i.e. the exit VRF) is configured to attach the
Service-topology-RT to exported routes, and the VRF connected to the
source network (i.e. the entry VRF) will import routes using the
Service-topology-RT. The controller may also be used to originate the
routes that carry the Service-topology-RT.

The route-policy may be described in a variety of formats and
installed on the routing system using a suitable mechanism. For
instance, the policy may be defined in YANG and provisioned using
Netconf.

Using Figure 1 for reference, when the gateway R-B advertises a VPN
route to Network-B, it attaches the Service-topology-RT. BGP route
updates are sent to all the routing systems in the service VPN. The
routing systems perform a modified set of actions for next-hop
resolution and route installation in the ingress VRFs, compared to
normal BGP VPN behavior, but no changes are required in the operation
of the BGP protocol itself. The modification of behavior in the
routing systems allows the automatic and constrained flow of traffic
through the service chain.

Each routing system in the service VPN will process the VPN route to
Network-B via R-B as follows:

1. If the routing system contains VRFs that import the
   Service-topology-RT, continue; otherwise, ignore the route.

2. The routing system identifies the position and role
   (ingress/egress) of each of its VRFs in the SFC by comparing
   the IP address of the route in the VRF to the connected SF
   instance with the addresses in the Service-node-sequence in
   the route-policy. Alternatively, the controller may
   provision the specific service node IP address to be used as
   the next hop in each VRF, in the route-policy for the VRF.

3. The routing system modifies the next hop of the imported
   route carrying the Service-topology-RT, to select the
   appropriate next hop as per the route-policy. It ignores the
   next hop and label in the received route. It resolves the
   selected next hop in the local VRF routing table.

   a. The imported route to Network-B in the ingress VRF is
      modified to have a next hop of the IP address of the
      logically connected SF instance.

   b. The imported route to Network-B in the egress VRF is
      modified to have a next hop of the IP address of the
      next SF instance in the SFC.

4. The egress VRFs for the last service function install the
   VPN route via the gateway R-B unmodified.
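As an informal illustration of steps 1 through 4 above, the following
sketch shows the next-hop selection logic in Python. It is not a
representation of any particular BGP implementation; the route-policy
contents, addresses and function names are hypothetical.

   # Hypothetical illustration of steps 1-4 above.  The route-policy
   # tuple and the addresses are examples only.

   ROUTE_POLICY = {
       "service_topology_name": "chain-a-b",
       "service_topology_rt":   "target:64512:1001",
       "service_node_sequence": ["10.0.1.1", "10.0.2.1", "10.0.3.1"],
   }

   def resolve_next_hop(route_rts, attached_sf_ip, vrf_role, policy,
                        received_next_hop):
       """Next hop to install for an imported route carrying the
       Service-topology-RT, resolved in the local VRF table."""
       if policy["service_topology_rt"] not in route_rts:
           return None                   # step 1: not for this SFC
       seq = policy["service_node_sequence"]
       pos = seq.index(attached_sf_ip)   # step 2: position in the chain
       if vrf_role == "ingress":
           return seq[pos]               # step 3a: attached SF instance
       if pos + 1 < len(seq):
           return seq[pos + 1]           # step 3b: next SF instance
       return received_next_hop          # step 4: last egress VRF keeps
                                         # the route via gateway R-B

   # The egress VRF attached to 10.0.2.1 resolves the route to
   # Network-B via the next SF instance, 10.0.3.1:
   resolve_next_hop({"target:64512:1001"}, "10.0.2.1", "egress",
                    ROUTE_POLICY, "R-B")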
Note that the modified routes are not re-advertised into the VPN by
the various intermediate routing systems in the SFC.

2.6.3 Common SFC Provisioning Considerations

In both methods, for physical routers, the creation and configuration
of VRFs, interfaces and local static routes can be performed
programmatically using Netconf, and BGP route distribution can use a
route reflector (which may be part of the controller). In the
virtualized case, where a VPN forwarder is present, creation and
configuration of VRFs, interfaces and installation of routes may
instead be performed using a single protocol like XMPP, NC/YANG or an
equivalent programmatic interface.

Also in the virtualized case, the actual forwarding table entries to
be installed in the ingress and egress VRFs may be calculated by the
controller based on its internal knowledge of the required SFC
topology and the connectivity of SF instances to routing systems. In
this case, the routes may be directly installed in the forwarders
using the programmatic interface and no BGP route advertisement is
necessary, except when coordination with external domains (Section 5)
or federation between controller domains is employed (Section 7).
Note, however, that this is just one typical model for a system based
on virtual forwarders. In general, physical and virtual routing
systems can be treated exactly the same if they have the same
capabilities.

In both methods, the SF instance may also need to be set up
appropriately to forward traffic between its input and output
interfaces, via static, dynamic or policy-based routing. If the
service function is a transparent L2 service, then the static route
installed in the ingress VRF will have a next hop of the IP address
of the routing system interface that the service instance is attached
to on its other interface.

2.7 Controller Function

The purpose of the controller is to manage instantiation of SFCs in
networks and datacenters. When an SFC is to be instantiated, a model
of the desired topology (service functions, number of instances,
connectivity) is built in the controller, either via an API or a GUI.
The controller then selects resources in the infrastructure that will
support the SFC and configures them. This can involve instantiation
of SF instances to implement each service function, the instantiation
of VRFs that will form virtual networks between SF instances, and
installation of routes to cause traffic to flow into and between SF
instances. It can also include provisioning the necessary static,
dynamic or policy-based forwarding on the service function instance
to enable it to forward traffic.

For simplicity, in this document, the controller is assumed to
contain all the required features for management of SFCs. In actual
implementations, these features may be distributed among multiple
inter-connected systems. For example, an overarching orchestrator
might manage the overall SFC model, sending instructions to a
separate virtual machine manager to instantiate service function
instances, and to a virtual network manager to set up the service
chain connections between them.

The controller can also perform the necessary BGP signaling and route
distribution actions described throughout this document.
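To illustrate the controller's provisioning role in the sequential
VPN method (Section 2.6.1), the sketch below derives per-hop route
target assignments from an ordered service chain model. The AS
number, naming scheme and data layout are assumptions made for the
example, not a prescribed controller interface.

   # Hypothetical sketch of the RT chaining in Section 2.6.1: the
   # egress VRFs of one service function import the RT exported by
   # the ingress VRFs of the next.  Link i carries traffic from the
   # egress side of hop i-1 (or the SFC entry) to the ingress of hop i.

   def allocate_rts(asn, chain, base=1000):
       """chain: ordered list of service function names."""
       def link_rt(i):
           return "target:%d:%d" % (asn, base + i)
       plan = {}
       for i, sf in enumerate(chain):
           plan[sf] = {
               # Ingress VRFs export the static route toward the
               # destination network with the RT of the incoming link.
               "ingress_vrf_export": link_rt(i),
               # Egress VRFs import the RT exported by the ingress
               # VRFs of the next service function (or the exit link).
               "egress_vrf_import": link_rt(i + 1),
           }
       return plan

   allocate_rts(64512, ["SF-1", "SF-2", "SF-3"])
   # {'SF-1': {'ingress_vrf_export': 'target:64512:1000',
   #           'egress_vrf_import':  'target:64512:1001'}, ... }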
2.8 Variations on Setting Prefixes in an SFC

Section 2.6 described the basic procedures for two SFC creation
methods. This section describes some techniques that can extend and
optimize the basic procedures.

2.8.1 Using a Default Route

In the methods described above, only the gateway routing systems need
the specific network prefixes to steer traffic in and out of the SFC.
The intermediate systems can direct traffic in the ingress and egress
VRFs using only a default route. Hence, it is possible to avoid
installing the network prefixes in the intermediate systems. This can
be done by splitting the SFC into two sections - one linking the
entry and exit VRFs and the other including the intermediate systems.
For instance, this may be achieved by using two different
Service-topology-RTs in the second method.

2.8.2 Using a Default Route and a Large Prefix

In the configuration methods described above, the network prefixes
for each network (Network-A and Network-B in the example above)
connected to the SFC are used in the routes that direct traffic
through the SFC. This creates an operational linkage between the
implementation of the SFC and the insertion of the SFC into a
network.

For instance, subscriber network prefixes will normally be segmented
across subscriber attachment points such as broadband or mobile
gateways. This means that each SFC would have to be configured with
the subscriber network prefixes whose traffic it is handling.

In a variation of the SFC configuration method described above, the
prefixes used in each direction can be such that they include all
possible addresses at each side of the SFC. For example, in Figure 1,
the prefix for Network-A could include all subscriber IP addresses
and the prefix for Network-B could be the default route, 0/0.

Using this technique, the same routes can be installed in all
instances of an SFC that serve different groups of subscribers in
different geographic locations.

The routes forwarding traffic into an SF instance and to the next SF
instance are installed when an SFC is initially built, and each time
an SF instance is connected into the SFC, but there is no requirement
for VRFs to be reconfigured when traffic from different networks
passes through the service chain, so long as the prefixes of those
networks are included in the prefixes in the VRFs along the SFC.

In this variation, it is assumed that no subscriber-originated
traffic will enter the SFC destined for an IP address also in the
subscriber network address range. This will not be a restriction in
many cases.

2.8.3 Disaggregated Gateway Routers

As a slight variation of the above, a network prefix may be
disaggregated and spread out among various gateway routers, for
instance, in the case of virtual machines in a datacenter. In order
to reduce the scaling requirements on the routing systems along the
SFC, the SFC can again be split into two sections as described above.
In addition, the last egress VRF may act as the exit VRF and install
the destination network's disaggregated routes. If the destination
network's prefixes can be aggregated, for instance into a subnet
prefix, then the aggregate prefix may be advertised and installed in
the entry VRF.
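As an illustration of the variations in Sections 2.8.1 through 2.8.3,
the sketch below shows, for the routing systems of Figure 1, VRF
contents in which only the entry and exit VRFs carry network prefixes
while the intermediate systems forward on a default route. The
prefixes and labels are hypothetical.

   # Illustrative VRF contents (hypothetical prefixes and labels)
   # for a split SFC: intermediate VRFs need only a default route.

   vrf_routes = {
       "R-A entry VRF":   {"NetB-aggregate": "tunnel(R-1, label=SF1-in)"},
       "R-1 ingress VRF": {"0/0": "SF1 ingress interface"},
       "R-1 egress VRF":  {"0/0": "tunnel(R-2, label=SF2-in)"},
       "R-2 ingress VRF": {"0/0": "SF2 ingress interface"},
       "R-2 egress VRF":  {"0/0": "tunnel(R-3, label=SF3-in)"},
       "R-3 ingress VRF": {"0/0": "SF3 ingress interface"},
       "R-3 egress VRF":  {"0/0": "tunnel(R-B, label=exit)"},
       "R-B exit VRF":    {"NetB-disaggregated": "destination network"},
   }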
2.8.4 Optimizing VRF Usage

It may be desirable to avoid using distinct ingress and egress VRFs
for the service instances, in order to make more efficient use of VRF
resources, especially on physical routing systems. The ingress VRF
and egress VRF may be treated as conceptual entities, with the
forwarding realized using one or more of the options described in
this section, combined with the methods described earlier.

For instance, the next-hop forwarding label described earlier serves
the purpose of directing traffic received from other routing systems
directly towards an attached service instance. On the other hand, if
the encapsulation mechanism or the device in use requires an IP
lookup for incoming packets from other routing systems, then the
specific network prefixes may be installed in the intermediate
service VRFs to direct traffic towards the attached service
instances.

Similarly, a per-interface policy-based-routing rule applied to an
access interface can serve to direct traffic coming in from attached
service instances towards the next SF set.

2.8.5 Dynamic Entry and Exit Signaling

When either of the methods of the previous sections is employed, the
prefixes of the attached networks at each end of an SFC can be
signaled into the corresponding VRFs dynamically. This requires that
a BGP session be configured either from the network device at each
end of the SFC into each network, or from the controller.

If dynamic signaling is performed, and a bidirectional SFC set is
configured, and the gateways to the networks connected via the SFC
exchange routes, steps must be taken to ensure that routes to both
networks do not get advertised from both ends of the SFC set by
re-origination. This can be achieved if a new BGP Extended Community
is implemented to control re-origination. When a route is
re-originated, the RTs of the re-originated routes are appended to
the new Route-Target Record Extended Community, and if the RT for the
route already exists in the Extended Community, the route is not
re-originated (see Section 9.1).

2.8.6 Dynamic Re-Advertisements in Intermediate Systems

The intermediate routing systems attached to the service instances
may also use the dynamic signaling technique from the previous
section to re-advertise received routes up the chain. In this case,
the ingress and egress VRFs are combined into one, and a local
route-policy ensures that the re-advertised routes are associated
with labels that direct incoming traffic directly to the attached
service instances on that routing system.

2.9 Layer-2 Virtual Networks and Service Functions

There are SFs that operate at layer 2, in a transparent mode, and
forward traffic based on the MAC destination address (DA). When such
an SF is present in the SFC, the procedures at the routing system are
modified slightly. In this case, the IP address associated with the
SF instance (and used as the next hop of routes in the above
procedures) is actually the one assigned to the routing system
interface attached to the other end of the SF instance, or it could
be a virtual IP address logically associated with the service
function, with a next hop of the other routing system interface. The
routing system interfaces use distinct interface MAC addresses.
This allows the current scheme to be supported, while allowing the
transparent service function to work using its existing behavior.

An SFC may also be set up between end systems or network segments
within the same Layer-2 bridged network. In this case, applying the
procedures described earlier, the segments or groups of end systems
are placed in distinct Layer-2 virtual networks, which are then
inter-connected via a sequence of intermediate Layer-2 virtual
networks that form the links in the SFC. Each virtual network maps to
a pair of ingress and egress MAC VRFs on the routing systems to which
the SF instances are attached. The routing systems at the ends of the
SFC will advertise the locally learnt or installed MAC entries using
BGP-EVPN type-2 routes, which will get installed in the MAC VRFs at
the other end. The intermediate systems may use default MAC routes
installed in the ingress and egress MAC VRFs, or the other variations
described earlier in this document.

2.10 Header Transforming Service Functions

If a service function performs an action that changes the source
address in the packet header (e.g., NAT), the routes that were
installed as described above may not support reverse flow traffic.

The solution to this is for the controller to modify the routes in
the reverse direction to direct traffic into instances of the
transforming service function. The original routes with a source
prefix (Network-A in Figure 2) are replaced with a route that has a
prefix that includes all the possible addresses that the source
address could be mapped to. In the case of network address
translation, this would correspond to the NAT pool.

3 Load Balancing Along a Service Function Chain

One of the key concepts driving NFV [NFVE2E] is the idea that each
service function along an SFC can be separately scaled by changing
the number of service function instances that implement it. This
requires that load balancing be performed before entry into each
service function. In this architecture, load balancing is performed
in either or both of the egress and ingress VRFs, depending on the
type of load balancing being performed and on whether more than one
service instance is connected to the same ingress VRF.

3.1 SF Instances Connected to Separate VRFs

If the SF instances implementing a service in an SFC are each
connected to separate VRFs (e.g. the instances are connected to
different routers or are running on different hosts), load balancing
is performed in the egress VRFs of the previous service, or in the
VRF that is the entry to the SFC. The controller distributes BGP
multi-path routes to the egress VRFs. The destination prefix of each
route is the ultimate destination network, or its representative
aggregate or default. The next hops in the ECMP set are the BGP next
hops of the service instances attached to the ingress VRFs of the
next service in the SFC. The load balancing corresponds to BGP
Multipath, which requires that the route distinguishers for each
route be distinct, in order to recognize that distinct paths should
be used. Hence, each VRF in a distributed SFC environment should have
a unique route distinguisher.
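A minimal sketch of the effect of route distinguisher assignment is
shown below. It abstracts away BGP route selection details (and
mechanisms such as add-path, which can also retain multiple paths);
the values and the helper function are illustrative only.

   # Illustrative-only sketch: VPN routes are keyed by RD:prefix, so
   # without add-path a reflector or receiver may keep one best path
   # per key.  Distinct RDs keep both paths available for ECMP.

   def usable_ecmp_next_hops(advertisements):
       """advertisements: (rd, prefix, next_hop) tuples; keep one
       path per RD:prefix key, then collect distinct next hops."""
       best = {}
       for rd, prefix, next_hop in advertisements:
           best.setdefault((rd, prefix), next_hop)  # first path wins
       return sorted(set(best.values()))

   usable_ecmp_next_hops([("64512:1", "Net-B", "R-1"),
                          ("64512:1", "Net-B", "R-2")])  # ['R-1']
   usable_ecmp_next_hops([("64512:1", "Net-B", "R-1"),
                          ("64512:2", "Net-B", "R-2")])  # ['R-1', 'R-2']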
                  +------+              +-------------------------+
             O----|SFI-11|---O          |--- Data plane connection|
            //    +------+    \\        |=== Encapsulation tunnel |
           //                  \\       | O   VRF                 |
          //                    \\      | *   Load balancer       |
         //        +------+      \\     +-------------------------+
Net-A-->O*====O----|SFI-12|---O====O-->Net-B
         \\        +------+      //
          \\                    //
           \\                  //
            \\    +------+    //
             O----|SFI-13|---O
                  +------+

    Figure 6 - Egress VRF Load Balancing across SF Instances
                    Connected to Different VRFs

In the diagram, above, a service function is implemented in three
service instances, each connected to separate VRFs. Traffic from
Network-A arrives at the VRF at the start of the SFC, and is load
balanced across the service instances using a set of ECMP routes
whose next hops are the addresses of the routing systems containing
the ingress VRFs, and with labels that identify the ingress
interfaces of the service instances.

3.2 SF Instances Connected to the Same VRF

When SF instances implementing a service in an SFC are connected to
the same ingress VRF, load balancing is performed in the ingress VRF
across the service instances connected to it. The controller will
install routes in the ingress VRF to the destination network with the
interfaces connected to each service instance as next hops. The
ingress VRF will then use ECMP to load balance across the service
instances.

                   +------+             +-------------------------+
                   |SFI-11|             |--- Data plane connection|
                   +------+             |=== Encapsulation tunnel |
                  /        \            | O   VRF                 |
                 /          \           | *   Load balancer       |
                /            \          +-------------------------+
               /   +------+   \
Net-A-->O====O*----|SFI-12|----O====O-->Net-B
               \   +------+   /
                \            /
                 \          /
                  \        /
                   +------+
                   |SFI-13|
                   +------+

    Figure 7 - Ingress VRF Load Balancing across SF Instances
                    Connected to the Same VRF

In the diagram, above, a service is implemented by three service
instances that are connected to the same ingress and egress VRFs. The
ingress VRF load balances across the ingress interfaces using ECMP,
and the egress traffic is aggregated in the egress VRF.

If forwarding labels that identify each SFI ingress interface are
used, and if the routes to each SF instance are advertised with
different route distinguishers, then it is possible to perform ECMP
load balancing at the routing instance at the beginning of the
encapsulation tunnel (which could be the egress VRF of the previous
SF in the SFC).

3.3 Combination of Egress and Ingress VRF Load Balancing

In Figure 8, below, an example SFC is shown where load balancing is
performed in both ingress and egress VRFs.
                                         +-------------------------+
                                         |--- Data plane connection|
                      +------+           |=== Encapsulation tunnel |
                      |SFI-11|           | O  VRF                  |
                      +------+           | *  Load balancer        |
                     /        \          +-------------------------+
                    /          \
                   /  +------+  \          +------+
                 O*---|SFI-12|---O*====O---|SFI-21|---O
                //    +------+     \\ //   +------+    \\
               //                  \\//                 \\
              //                    \\                   \\
             //                    //\\                    \\
            //        +------+    //  \\   +------+        \\
   Net-A-->O*====O----|SFI-13|---O*====O---|SFI-22|---O====O-->Net-B
                      +------+             +------+
           ^     ^               ^     ^              ^    ^
           |     |               |     |              |    |
           |  Ingress         Egress   |              |    |
           |                        Ingress        Egress  |
       SFC Entry                               SFC Exit

           Figure 8 - Load Balancing across SF Instances

In Figure 8, above, an SFC is composed of two services implemented
by three service instances and two service instances, respectively.
Service instances SFI-11 and SFI-12 are connected to the same
ingress and egress VRFs, and all the other service instances are
connected to separate VRFs.

Traffic entering the SFC from Network-A is load balanced across the
ingress VRFs of the first service function by the chain entry VRF,
and then load balanced again across the ingress interfaces of SFI-11
and SFI-12 by the shared ingress VRF. Note that the use of standard
ECMP will lead to an uneven distribution of traffic between the
three service instances (25% to SFI-11, 25% to SFI-12, and 50% to
SFI-13). This issue can be mitigated through the use of the BGP Link
Bandwidth Extended Community [draft-ietf-idr-link-bandwidth]. As
described in the previous section, if a next-hop forwarding label is
used, another way to mitigate this effect is to advertise the routes
to each SF instance connected to a VRF with a different route
distinguisher.

After traffic passes through the first set of service instances, it
is load balanced in each of the egress VRFs of the first set of
service instances across the ingress VRFs of the next set of service
instances.

3.4 Forward and Reverse Flow Load Balancing

This section discusses the load balancing requirements for the
forward and reverse paths when stateful service functions are
deployed.

3.4.1 Issues with Equal Cost Multi-Path Routing

As discussed in the previous sections, load balancing in the forward
direction of the SFC in the above example can occur automatically
with standard BGP, if multiple equal-cost routes to Network-B are
installed into all the ingress VRFs and each route directs traffic
through a different service function instance in the next set. The
multiple BGP routes in the routing table translate to Equal Cost
Multi-Path (ECMP) in the forwarding table. The hash used in the load
balancing algorithm (per packet, per flow, or per prefix) is
implementation-specific.

If a service function is stateful, forward flows and reverse flows
must always pass through the same service function instance.
Standard ECMP does not provide this capability, since the hash
calculation sees different input data for the same flow in the
forward and reverse directions (the source and destination fields
are reversed).

Additionally, if the number of SF instances changes, either
increasing to expand capacity or decreasing (planned, or due to an
SF instance failure), the ECMP hash table is recalculated, and most
flows will be directed to a different SF instance and user sessions
will be disrupted.
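As a non-normative illustration of this asymmetry, the following
minimal Python sketch uses SHA-256 as a stand-in for a forwarder's
vendor-specific ECMP hash. Hashing the 5-tuple as seen on the wire
can select different ECMP members for the two directions of one
flow, while normalizing the tuple before hashing, equivalent to the
swapping of source and destination fields described in Section
3.4.2, makes the two directions agree. The addresses are
illustrative only.

      import hashlib

      def bucket(key, n_paths):
          # Stand-in for an ECMP hash; real implementations are
          # vendor-specific.
          digest = hashlib.sha256(repr(key).encode()).digest()
          return int.from_bytes(digest[:4], "big") % n_paths

      fwd = ("10.0.0.1", 40000, "192.0.2.9", 443, "tcp")
      rev = ("192.0.2.9", 443, "10.0.0.1", 40000, "tcp")
      print(bucket(fwd, 3), bucket(rev, 3))    # may differ

      def symmetric_bucket(five_tuple, n_paths):
          # Order the two endpoints before hashing so that both
          # directions of a flow hash to the same bucket.
          src, sport, dst, dport, proto = five_tuple
          a, b = sorted([(src, sport), (dst, dport)])
          return bucket((a, b, proto), n_paths)

      print(symmetric_bucket(fwd, 3) == symmetric_bucket(rev, 3))
      # -> True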
There are a number of ways to satisfy the requirements of symmetric
forward/reverse paths for flows and of minimal disruption when SF
instances are added to or removed from a set. Two techniques that
can be employed are described in the following sections.

3.4.2 Modified ECMP with Consistent Hash

Symmetric forwarding into each side of an SF instance set can be
achieved with a small modification to ECMP, provided that the packet
headers are preserved after passing through the SF instance set, and
that the same hash function, the same hash salt, and the same
ordering association of hash buckets to ECMP routes are used in both
directions. Each packet's 5-tuple data is used to calculate which
hash bucket, and therefore which service instance, the packet will
be sent to, but the source and destination IP address and port
information are swapped in the calculation in the reverse direction.
This method only requires that the list of available service
function instances be consistently maintained in the load balancing
tables of all the routing systems, rather than requiring flow tables
to be maintained. This requirement can be met by the use of a
distinct VPN route for each instance.

In the SFC architecture described in this document, when SF
instances are added or removed, the controller is required to
install (or remove) routes to the SF instances. The controller could
configure the load balancing function in the VRFs that connect to
each added (or removed) SF instance as part of the same network
transaction as the route updates, to ensure that the load balancer
configuration is synchronized with the set of SF instances.

The consistent ordering of ECMP routes among the routing systems
could be achieved through configuration of the routing systems by
the controller using, for instance, NETCONF; or, when the routes are
signaled using BGP by the controller or a routing system, the order
for a given instance can be sent in a new 'Consistent Hash Sort
Order' BGP Extended Community (defined in Section 9.2).

The effect of rehashing when SF instances are added or removed can
be minimized, or even eliminated, using variations of the technique
of consistent hashing [consistent-hash]. Details are outside the
scope of this document.

3.4.3 ECMP with Flow Table

A second refinement that can ensure forward/reverse flow
consistency, and that also provides stability when the number of SF
instances changes ('flow stickiness'), is the use of dynamically
configured IP flow tables in the VRFs. In this technique, flow
tables are used to ensure that existing flows are unaffected if the
number of ECMP routes changes, and that forward and reverse traffic
passes through the same SF instance in each set of SF instances
implementing a service function.

The flow tables are set up as follows (a sketch of the procedure
appears at the end of this section):

   1. User traffic with a new 5-tuple enters an egress VRF from a
      connected SF instance.

   2. The VRF calculates the ECMP hash across the available routes
      (i.e., the ECMP group) to the ingress interfaces of the SF
      instances in the next SF instance set. The consistent hash
      technique described in Section 3.4.2 must be used here and in
      subsequent steps.

   3. The VRF creates a new flow entry for the 5-tuple of the new
      traffic, with the next-hop being the chosen downstream ECMP
      group member (determined in step 2 above). All subsequent
      packets for the same flow will be forwarded using flow lookup
      and, hence, will use the same next-hop.
   4. The encapsulated packet arrives in the routing system that
      hosts the ingress VRF for the selected SF instance.

   5. The ingress VRF of the next service instance determines
      whether the packet came from a routing system that is in an
      ECMP group in the reverse direction (i.e., from this ingress
      VRF back to the previous set of SF instances).

   6. If an ECMP group is found, the ingress VRF creates a flow
      entry for the reversed 5-tuple, with the next-hop being the
      tunnel on which the traffic arrived. This is for the traffic
      in the reverse direction.

   7. If multiple SF instances are connected to the ingress VRF,
      the ECMP consistent hash is used to choose which one to send
      the traffic into.

   8. A forward flow table entry is created for the traffic's
      5-tuple, with the next-hop being the interface of the SF
      instance chosen in the previous step.

   9. The packet is sent into the selected SF instance.

The above method ensures that forward and reverse flows pass through
the same SF instances, and that, if the number of ECMP routes
changes when SF instances are added or removed, all existing flows
will continue to flow through the same SF instances; only new flows
will use the new ECMP hash. The only flows affected will be those
that were passing through an SF instance that was removed, and those
will be spread among the remaining SF instances using the updated
ECMP hash.

If the consistent hash algorithm is used in both directions, then
only the forward flow entries are required, and they can be built
independently in each direction. If distinct VPN routes with
next-hop forwarding labels are used, then the flow table in step 3
alone is sufficient to provide flow stickiness.
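The following simplified Python sketch is a non-normative model of
the procedure above. The class, member names, and tunnel identifiers
are hypothetical, and a real forwarder would additionally handle
flow aging and table limits.

      import hashlib

      def consistent_bucket(key, members):
          # Stand-in for the consistent hash of Section 3.4.2.
          digest = hashlib.sha256(repr(key).encode()).digest()
          return members[int.from_bytes(digest[:4], "big")
                         % len(members)]

      class Vrf:
          def __init__(self, ecmp_members):
              self.ecmp_members = ecmp_members  # ECMP group (step 2)
              self.flow_table = {}              # 5-tuple -> next-hop

          def forward(self, five_tuple):
              # Steps 2-3: hash the first packet of a flow, then
              # pin the choice so later packets reuse the next-hop.
              if five_tuple not in self.flow_table:
                  self.flow_table[five_tuple] = consistent_bucket(
                      five_tuple, self.ecmp_members)
              return self.flow_table[five_tuple]

          def learn_reverse(self, five_tuple, arrival_tunnel):
              # Steps 5-6: pin the reversed 5-tuple to the tunnel
              # the traffic arrived on, for the reverse direction.
              src, sport, dst, dport, proto = five_tuple
              rev = (dst, dport, src, sport, proto)
              self.flow_table.setdefault(rev, arrival_tunnel)

      egress = Vrf(["tunnel-to-sfi-21", "tunnel-to-sfi-22"])
      flow = ("10.0.0.1", 40000, "192.0.2.9", 443, "tcp")
      chosen = egress.forward(flow)          # first packet hashes
      assert egress.forward(flow) == chosen  # later packets pinned

      ingress = Vrf(["sfi-21-if"])
      ingress.learn_reverse(flow, "tunnel-from-egress-1")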
3.4.4 Dealing with Different Hash Algorithms in an SFC

In some cases, there will be two or more hash algorithms in the
forwarders along an SFC, e.g., when a physical router is at the
entry and exit of the chain and virtual forwarders are used within
the chain. Forward and reverse flows will then generally not pass
through the same instance of the first SF, and the SFC will not
operate as intended if the first SF is stateful. It may be
impractical or prohibitively expensive to implement the flow
table-based methods described above to achieve flow stability and
symmetry. This issue can be mitigated by ensuring that the first SF
is not stateful, or by placing a null SF between the physical router
and the first actual SF in the SFC. This ensures that the hash
method on both sides of stateful service instances is the same, and
the SFC will operate with flow stability and symmetry if the methods
described above are employed.

4 Steering into SFCs Using a Classifier

In many applications of SFCs, a classifier will be used to direct
traffic into SFCs. The classifier inspects the first packet, or the
first few packets, in a flow to determine which SFC the flow should
be sent into. The decision criteria can be based on just the IP
5-tuple of the header (i.e., filter-based forwarding), or can
involve analysis of the payload of the packets using deep packet
inspection. Integration with a subscriber management system such as
PCRF or AAA may be required in order to identify which SFC to send
traffic to based on subscriber policy.

An example logical architecture is shown in Figure 9, below, where a
classifier is external to a physical router that hosts the VRFs
forming the ends of two SFCs. In the case of filter-based
forwarding, classification could occur in a VRF on the router.

                 +----------+
                 | PCRF/AAA |
                 +-----+----+
                       :
                       :
   Subscriber  +-------+----+
   Traffic---->| Classifier |
               +------------+
                     |   |
            +--------|---|---------------------------+
            |        |   |                    Router |
            |        |   |                           |
            |        O   O          X--------->Internet
            |        |   |         / \               |
            |        |   |        O   O              |
            +--------|---|--------|---|--------------+
                     |   |        |   |
                     |   |  +---+   +---+  |   |
                     |   +--+ U +---+ V +--+   |
                     |      +---+   +---+      |
                     |                         |
                     |  +---+   +---+   +---+  |
                     +--+ X +---+ Y +---+ Z +--+
                        +---+   +---+   +---+

   Figure 9 - Subscriber/Application-Aware Steering with a Classifier

In the diagram, the classifier receives subscriber traffic and sends
the traffic out of one of two logical interfaces, depending on the
classification criteria. The logical interfaces of the classifier
are connected to VRFs in a router that are the entries to two SFCs
(shown as O in the diagram).

In this scenario, the entry VRF for each chain does not advertise
the destination network prefixes, and the modified method of setting
prefixes described in Section 2.8.2 can be employed. Also, the exit
VRF for each SFC does not peer with a gateway or proxy node in the
destination network, and packets are forwarded using IP lookup in
the main routing table, or in a VRF that the exit traffic from the
SFCs is directed into (shown as X in the diagram). A flow table may
be required to ensure that reverse traffic is sent into the correct
SFC.

An alternative is for the classifier itself to be a distributed,
virtualized service function with multiple egress interfaces. In
that case, each virtual classifier instance could be attached to a
set of VRFs that connect to different SFCs. Each chain entry VRF
would load balance across the first SF instance set in its SFC. The
reverse flow table mechanism described in Section 3.4.3 could be
employed to ensure that flows return to the originating classifier
instance, which may maintain subscriber context and perform charging
and accounting.

5 External Domain Co-ordination

It is likely that SFCs will be managed as a separate administrative
domain from the networks from which they receive traffic and to
which they send it. If the connected networks use BGP for route
distribution, the controller in the SFC domain can join the network
domains by creating BGP peering sessions with routing systems or
route reflectors in those network domains, or with local border
routers that peer with the external domains, in order to exchange
VPN routes. While a controller can modify the route targets for the
VRFs within its SFC domain, it is unlikely to have any control over
the external networks with which it is peering. Hence, the design
does not assume that the RTs of external network domains can be
modified by the controller. It may, however, learn those RTs and use
them in its modified route advertisements.
In order to steer traffic from external network domains into an SFC,
the controller advertises the destination network's prefixes into
the peering source network domain, with a BGP next-hop and label
associated with the SFC entry point, which may be on a routing
system attached to the first SF instance. This advertisement may be
over regular MP-BGP/VPN peering, which assumes existing standard VPN
routing/forwarding behavior on the network domain's routers
(PEs/ASBRs). The controller can learn routes to networks in external
domains at the egress of an SFC and advertise routes to those
networks into other external domains using the first ingress routing
instance as the next-hop, thus allowing dynamic steering through
re-origination of routes.

An operational benefit of this approach is that the SFC topology
within a domain need not be exposed to other domains. Additionally,
using non-specific routes inside an SFC, as described in Section
2.8.1, means that new networks can be attached to an SFC without
needing to configure prefixes inside the chain.

The controller will typically remove the destination network's RTs
and replace them with the RTs of the source network when advertising
the modified routes. Alternatively, an external domain may be
provisioned with an additional export-only RT and an import-only RT
that the controller can use.

6 Fine-Grained Steering Using BGP Flow-Spec

When steering traffic from an external network domain into an SFC
based on attributes of the packet flow, BGP Flow-Spec can be used as
a signaling option.

In this case, the controller can advertise one or more flow-spec
routes into the entry VRF with the appropriate Service-topology-RT
for the SFC. Alternatively, it can use the procedures described in
[RFC5575] or [flowspec-redirect-ip] on the gateway router to
redirect traffic towards the first SF.

If it is desired to steer specific flows from a network domain's
existing routers, the controller can advertise the above flow-spec
routes to the network domain's border routers or route reflectors.
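As a non-normative illustration, the following minimal Python sketch
models the steering behavior that such a flow-spec route expresses;
the dictionary-based rule representation and the VRF names are
hypothetical, and the actual on-the-wire encoding follows the flow
specification NLRI of [RFC5575].

      def matches(flow_spec, pkt):
          # A packet matches if every component of the rule equals
          # the corresponding packet field.
          return all(pkt.get(k) == v for k, v in flow_spec.items())

      flow_spec = {"dst": "203.0.113.10", "proto": "tcp",
                   "dport": 443}
      entry_vrf = "sfc-1-entry"   # carries the Service-topology-RT

      def lookup_table(pkt):
          # Matching flows are steered into the SFC entry VRF;
          # everything else follows normal routing.
          return entry_vrf if matches(flow_spec, pkt) else "main"

      print(lookup_table({"dst": "203.0.113.10", "proto": "tcp",
                          "dport": 443}))   # -> sfc-1-entry
      print(lookup_table({"dst": "198.51.100.1", "proto": "udp",
                          "dport": 53}))    # -> main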
7 Controller Federation

When SFCs are distributed geographically, or in very large-scale
environments, multiple SFC controllers may be present, and they may
variously employ both of the SFC creation methods described in
Section 2.6. If there is a requirement for SFCs to span controller
domains, there may be a requirement to exchange information between
controllers. Again, a BGP session between the controllers can be
used to exchange route information as described in the previous
sections, allowing such domain-spanning SFCs to be created.

8 Coordination Between SF Instances and Controller Using BGP

In many cases, the configuration of an SF instance determines its
network behavior, e.g., when NAT pools are set up, or when an SSL
gateway is configured with a set of enterprise IP addresses to use.
In these cases, the addresses that will be used by the SFs need to
be known in the networks connecting to them so that traffic can be
properly routed. When SFCs are involved, this means that the
controller has to be notified when such configuration changes are
made in SF instances.

Sometimes, the changes will be made by end-customers, and it is
desirable that the controller adjust the SFC routing configuration
automatically when the change is made, without customers needing to
notify the service provider via a portal, for instance, and without
requiring the development of integration modules linking the SF
instances and the controller.

One option for automatic notification, for SFs that support BGP, is
for the connected forwarding system (the physical or virtual SFF) to
also support BGP, and for the SF instances to be configured to peer
with the SFF. When the configuration of an SF instance changes such
that, for example, the SF will accept packets from a particular
network prefix on one of its interfaces, the SF instance sends a BGP
route update to the SFF to which it is connected and with which it
has a BGP session. The controller can then adjust the routes along
the SFCs to ensure that packets with destinations in the new prefix
reach the reconfigured SF instance.

BGP could also be used to signal from the controller to an SF
instance that certain traffic should be sent out from a particular
interface. This could be used to direct suspect traffic to a
security scrubbing center, for example.

Note that the SFF need not support a BGP stack itself; it can proxy
BGP messages to the controller, which will support such a stack.

9 BGP Extended Communities

9.1 Route-Target Record

Route-Target Record (RT Record) is defined as a transitive BGP
Extended Community that contains a Route-Target value representing
one of the RTs that was previously attached to the route, and which
may no longer be attached to the route on subsequent
re-advertisements (see Section 2.8.5).

A Sub-Type code 0x13 is assigned in the three BGP Extended Community
types: Two-Octet AS-Specific (0x00), IPv4-Address-Specific (0x01),
and Four-Octet AS-Specific (0x02). A Sub-Type code 0x0013 is also
assigned in the BGP Transitive IPv6 Address-Specific Extended
Community.

The Extended Community is encoded as follows:

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | 0x00,0x01,0x02| Sub-Type=0x13 |      Route-Target Value       |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                   Route-Target Value contd.                   |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The Type field of the BGP Route-Target Extended Community is copied
into the Type field of the RT Record Extended Community.

The Value field (Global Administrator and Local Administrator) of
the Route-Target Extended Community is copied into the Route-Target
Value field of the RT Record Extended Community.

When comparing an RT Record to a Route-Target, only the Type and
Route-Target Value fields are used in the comparison; the Sub-Type
field is masked out.

When a speaker re-originates a route that contains one or more RTs,
it must add each of these RTs as RT Record Extended Communities in
the re-originated route.

A speaker must not re-originate a route with an RT if this RT is
already present as an RT Record Extended Community.
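The following minimal Python sketch is a non-normative model of
these two rules; the route representation and RT values are
hypothetical.

      # On re-origination, every RT on the route is recorded as an
      # RT Record, and a route is never re-originated with an RT
      # that is already present as an RT Record.

      def re_originate(route, new_rts):
          # Refuse a re-origination that would loop.
          if any(rt in route["rt_records"] for rt in new_rts):
              return None
          return {
              "prefix": route["prefix"],
              "rts": list(new_rts),              # replaced RTs
              # record all RTs previously attached to the route
              "rt_records": route["rt_records"] | set(route["rts"]),
          }

      r1 = {"prefix": "198.51.100.0/24", "rts": ["64512:100"],
            "rt_records": set()}
      r2 = re_originate(r1, ["64512:200"])
      print(r2["rt_records"])                 # -> {'64512:100'}
      print(re_originate(r2, ["64512:100"]))  # -> None (loop)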
9.2 CONSISTENT_HASH_SORT_ORDER

Consistent Hash Sort Order is an optional transitive Opaque BGP
Extended Community of Sub-Type 0x14, defined as follows:

   Type Field  : The value of the high-order octet is determined by
                 provisioning, as per [RFC4360]. The value of the
                 low-order octet is assigned as 0x14 by IANA from
                 the Transitive Opaque Extended Community Sub-Types
                 registry.

   Value Field : The Value field contains a Sort Order sub-field
                 that indicates the relative order of this route
                 among the ECMP set for the prefix, to be sorted in
                 increasing order. It is a 32-bit unsigned integer.
                 The field is encoded as shown below:

                 +------------------------------+
                 |    Sort Order (4 octets)     |
                 +------------------------------+
                 |     Reserved (2 octets)      |
                 +------------------------------+

10 Summary and Conclusion

The architecture for service function chains described in this
document uses virtual networks implemented as overlays in order to
create service function chains. The virtual networks use
standards-based encapsulation tunneling, such as MPLS over GRE/UDP
or VXLAN, to transport packets into an SFC and between service
function instances without routing in the user address space. Two
methods of installing the routes that form service chains are
described.

In environments with physical routers, a controller may operate in
tandem with existing BGP route reflectors; the controller would
contain the SFC topology model and the ability to install the local
static interface routes to the SF instances. In a virtualized
environment, the controller can emulate route reflection internally
and simply install the required routes directly, without
advertisements occurring.

11 Security Considerations

The security considerations for SFCs are broadly similar to those
concerning the data, control, and management planes of any device
placed in a network. Details are out of scope for this document.

12 IANA Considerations

The new BGP Extended Communities in Section 9 are assigned types as
defined above in the IANA registry for extended communities.

13 Informative References

[NFVE2E]   "Network Functions Virtualisation: End to End
           Architecture",
           http://docbox.etsi.org/ISG/NFV/70-DRAFT/0010/NFV-0010v016.zip.

[RFC2328]  Moy, J., "OSPF Version 2", RFC 2328, April 1998.

[sfc-arch] Halpern, J. and C. Pignataro, "Service Function Chaining
           (SFC) Architecture", RFC 7665, October 2015.

[RFC4364]  Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
           Networks (VPNs)", RFC 4364, February 2006.

[RFC4271]  Rekhter, Y., Li, T., and S. Hares, "A Border Gateway
           Protocol 4 (BGP-4)", RFC 4271, January 2006.

[RFC4760]  Bates, T., Chandra, R., Katz, D., and Y. Rekhter,
           "Multiprotocol Extensions for BGP-4", RFC 4760, January
           2007.

[RFC7348]  Mahalingam, M., et al., "VXLAN: A Framework for
           Overlaying Virtualized Layer 2 Networks over Layer 3
           Networks", RFC 7348, August 2014.

[draft-ietf-idr-tunnel-encaps]
           Rosen, E., Ed., et al., "The BGP Tunnel Encapsulation
           Attribute", draft-ietf-idr-tunnel-encaps-08, January 11,
           2018.

[draft-ietf-l3vpn-end-system]
           Marques, P., et al., "BGP-signaled end-system IP/VPNs",
           draft-ietf-l3vpn-end-system-06, December 15, 2016.

[RFC5575]  Marques, P., Sheth, N., Raszuk, R., et al.,
           "Dissemination of Flow Specification Rules", RFC 5575,
           August 2009.
[draft-ietf-bess-evpn-overlay-12]
           Sajassi, A., et al., "A Network Virtualization Overlay
           Solution using EVPN", draft-ietf-bess-evpn-overlay,
           February 2018.

[RFC 8300] Quinn, P., et al., "Network Service Header (NSH)",
           RFC 8300, January 2018.

[draft-niu-sfc-mechanism]
           Niu, L., Li, H., and Y. Jiang, "A Service Function
           Chaining Header and its Mechanism",
           draft-niu-sfc-mechanism-00, January 2014.

[draft-rijsman-sfc-metadata-considerations]
           Rijsman, B., et al., "Metadata Considerations",
           draft-rijsman-sfc-metadata-considerations-00, February
           12, 2014.

[RFC6241]  Enns, R., Bjorklund, M., Schoenwaelder, J., and A.
           Bierman, "Network Configuration Protocol (NETCONF)",
           RFC 6241, June 2011.

[RFC4023]  Worster, T., Rekhter, Y., and E. Rosen, "Encapsulating
           MPLS in IP or Generic Routing Encapsulation (GRE)",
           RFC 4023, March 2005.

[RFC7510]  Xu, X., Sheth, N., et al., "Encapsulating MPLS in UDP",
           RFC 7510, April 2015.

[draft-ietf-i2rs-architecture]
           Atlas, A., Halpern, J., Hares, S., Ward, D., and T.
           Nadeau, "An Architecture for the Interface to the
           Routing System", draft-ietf-i2rs-architecture, work in
           progress, March 2015.

[consistent-hash]
           Karger, D., Lehman, E., Leighton, T., Panigrahy, R.,
           Levine, M., and D. Lewin, "Consistent Hashing and Random
           Trees: Distributed Caching Protocols for Relieving Hot
           Spots on the World Wide Web", Proceedings of the
           Twenty-ninth Annual ACM Symposium on Theory of
           Computing, ACM Press, New York, NY, USA, pp. 654-663,
           1997.

[draft-ietf-idr-link-bandwidth]
           Mohapatra, P. and R. Fernando, "BGP Link Bandwidth
           Extended Community", draft-ietf-idr-link-bandwidth, work
           in progress.

[flowspec-redirect-ip]
           Uttaro, J., et al., "BGP Flow-Spec Redirect to IP
           Action", draft-ietf-idr-flowspec-redirect-ip-02,
           February 2015.

14 Acknowledgments

This document was prepared using 2-Word-v2.0.template.dot.

This document is based on the earlier drafts
[draft-rfernando-bess-service-chaining] and
[draft-mackie-sfc-using-virtual-networking].

The authors would like to thank D. Daino, D.R. Lopez, D. Bernier,
W. Haeffner, A. Farrel, L. Fang, and N. So for their contributions
to the earlier drafts. The authors would also like to thank the
following individuals for their review and feedback on the original
proposals: E. Rosen, J. Guchard, P. Quinn, P. Bosch, D. Ward,
A. Ganesan, N. Seth, G. Pildush, and N. Bitar. The authors also
thank Wim Henderickx for his useful suggestions on several aspects
of the document.

Authors' Addresses

   Rex Fernando
   Cisco
   170 W Tasman Drive
   San Jose, CA 95134
   USA
   Email: rex@cisco.com

   Stuart Mackie
   Juniper Networks
   1133 Innovation Way
   Sunnyvale, CA 94089
   USA
   Email: wsmackie@juniper.net

   Dhananjaya Rao
   Cisco
   170 W. Tasman Drive
   San Jose, CA 95134
   USA
   Email: dhrao@cisco.com

   Bruno Rijsman
   Juniper Networks
   1133 Innovation Way
   Sunnyvale, CA 94089
   USA
   Email: brijsman@juniper.net

   Maria Napierala
   AT&T Labs
   200 Laurel Avenue
   Middletown, NJ 07748
   USA
   Email: mnapierala@att.com

   Thomas Morin
   Orange
   2, avenue Pierre Marzin
   Lannion 22307
   France
   Email: thomas.morin@orange.com