SPRING                                                          S. Hegde
Internet-Draft                                                 C. Bowers
Intended status: Standards Track                   Juniper Networks Inc.
Expires: July 11, 2021                                             X. Xu
                                                            Alibaba Inc.
                                                                A. Gulko
                                                               Refinitiv
                                                             A. Bogdanov
                                                             Google Inc.
                                                               J. Uttaro
                                                                     ATT
                                                                L. Jalil
                                                                 Verizon
                                                               M. Khaddam
                                                      Cox communications
                                                               A. Alston
                                                          Liquid Telecom
                                                         January 7, 2021

                        Seamless Segment Routing
                 draft-hegde-spring-mpls-seamless-sr-04

Abstract

   In order to operate networks with large numbers of devices, network
   operators organize networks into multiple smaller network domains.
   Each network domain typically runs an IGP which has complete
   visibility within its own domain, but limited visibility outside of
   its domain.  Seamless Segment Routing (Seamless SR) provides
   flexible, scalable and reliable end-to-end connectivity for services
   across independent network domains.  Seamless SR accommodates
   domains using SR, LDP, and RSVP for MPLS label distribution, as well
   as domains running IP without MPLS (IP-Fabric).  It also provides
   seamless connectivity across domains having different IPv6
   technologies such as SRv6 and SRm6.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on July 11, 2021.
Copyright Notice

   Copyright (c) 2021 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Use Cases
     3.1.  Service provider network
     3.2.  Large scale WAN networks
     3.3.  Data Center Interconnect (DCI) Networks
     3.4.  Service Function Chaining
     3.5.  Multicast Use cases
   4.  Requirements
     4.1.  MPLS Transport
     4.2.  SLA Guarantee
     4.3.  Scalability
     4.4.  Availability
     4.5.  Operations
     4.6.  Service Mapping
   5.  Alternative Solutions
     5.1.  Centralized Solutions
     5.2.  Distributed solutions
     5.3.  Choice of Solution
   6.  Seamless Segment Routing architecture
     6.1.  Solution Concepts
     6.2.  BGP Classful Transport
     6.3.  Automatically Creating Transport Classes
       6.3.1.  Automatically Creating Transport Classes for BGP-SR-TE
               Intra-domain Tunnels
       6.3.2.  Automatically Creating Transport Classes for Flex-Algo
               Tunnels
       6.3.3.  Auto-deriving Transport Classes for PCEP
     6.4.  Inter-domain flex-algo with BGP-CT
     6.5.  Applicability to color-only policies
     6.6.  Data sovereignty
     6.7.  Interconnecting IP Fabric Data Centers
     6.8.  Translating Transport Classes across Domains
     6.9.  SLA Guarantee
       6.9.1.  Low latency
       6.9.2.  Traffic Engineering (TE) constraints
       6.9.3.  Bandwidth constraints
     6.10. Scalability
       6.10.1.  Access node scalability
       6.10.2.  Label stack depth
       6.10.3.  Label Resources
     6.11. Availability
       6.11.1.  Intra domain link and node protection
       6.11.2.  Egress link and node protection
       6.11.3.  Border Node protection
     6.12. Operations
       6.12.1.  MPLS ping and Traceroute
       6.12.2.  Counters and Statistics
     6.13. Service Mapping
     6.14. Migrations
     6.15. SRv6 interworking with MPLS domains
     6.16. Service Function Chaining
     6.17. BGP based Multicast
   7.  Backward Compatibility
   8.  Security Considerations
   9.  IANA Considerations
   10. Acknowledgements
   11. Contributors
   12. References
     12.1.  Normative References
     12.2.  Informative References
   Authors' Addresses

1.  Introduction

   Evolving wireless access technology and cloud applications are
   expected to place new requirements on packet transport networks.
   These services contribute significantly higher bandwidth throughput,
   which in turn leads to a growing number of transport network
   devices.  As an example, 5G networks are expected to require up to
   250 Gbps in the fronthaul and up to 400 Gbps in the backhaul.  There
   is a desire to allow many network functions to be virtualized and
   cloud native.  In order to support latency-sensitive cloud-native
   network functions, packet transport networks should be capable of
   providing low-latency paths end-to-end.  Some services will require
   low-latency paths while others may require different QoS properties.
   The network should be able to differentiate between the services and
   provide transport paths with the corresponding SLAs.
   In addition, as these applications become more sensitive and less
   loss tolerant, more and more emphasis is placed on overall service
   availability and reliability.

   The Seamless SR architecture builds upon the Seamless MPLS
   architecture and caters to the new requirements imposed by 5G
   transport networks and cloud applications.  Although
   [I-D.ietf-mpls-seamless-mpls] has not been published as an RFC, it
   remains a useful description of the Seamless MPLS architecture,
   which uses LDP and/or RSVP for intra-domain label distribution and
   BGP-LU [RFC3107] for end-to-end label distribution.  Seamless SR
   focuses on using segment routing for intra-domain label
   distribution.  The mechanisms described in this document are equally
   applicable to intra-domain tunneling mechanisms deployed using RSVP
   and/or LDP.

   By using segment routing for intra-domain label distribution,
   Seamless SR is able to easily support SR-MPLS on both IPv4 and IPv6
   networks.  This overcomes a limitation of the classic Seamless MPLS
   architecture, which in practice was limited to running MPLS over
   IPv4 networks.  Seamless SR (like Seamless MPLS) can use BGP-LU
   [RFC3107] to stitch different domains together.  However, Seamless
   SR can also take advantage of the BGP Prefix-SID [RFC8669] to
   provide predictable and deterministic labels for inter-domain
   connectivity.

   The basic functionality of the Seamless SR architecture does not
   require any enhancements to existing protocols.  However, in order
   to support end-to-end service requirements across multiple domains,
   protocol extensions may be needed.  This draft discusses use cases,
   requirements, and potential protocol enhancements.
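   The "predictable and deterministic labels" property of the BGP
   Prefix-SID mentioned above can be sketched as follows.  This is a
   minimal illustration, not part of the draft: the node names, SRGB
   values, and label index are assumptions chosen for the example; the
   label derivation rule (label = SRGB base + label index) is the one
   defined for Prefix-SIDs in RFC 8669.

```python
# Sketch: deterministic label derivation with a BGP Prefix-SID label
# index, assuming every node is configured with a common SRGB.
# Node names, SRGB values, and the label index are illustrative.

def prefix_sid_label(srgb_base: int, srgb_size: int, label_index: int) -> int:
    """Derive the local MPLS label for a prefix from its Prefix-SID
    label index: label = SRGB base + label index."""
    if not 0 <= label_index < srgb_size:
        raise ValueError("label index outside SRGB")
    return srgb_base + label_index

# With a common SRGB on every border node, the label programmed for a
# given PE loopback (label index 2 here) is identical network-wide,
# unlike per-node dynamically allocated BGP-LU labels.
COMMON_SRGB = (16000, 8000)
labels = {node: prefix_sid_label(*COMMON_SRGB, label_index=2)
          for node in ("ASBR1", "ASBR2", "ASBR3", "ASBR4")}
assert len(set(labels.values())) == 1  # the same label on every node
```

   With BGP-LU alone, each hop allocates an arbitrary local label, so
   the label for the same loopback differs per node; the common-SRGB
   derivation above is what makes the inter-domain labels predictable.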
   Section 3 describes use cases and Section 4 describes requirements
   arising out of those use cases.  There may be alternative solutions
   available that solve the same use cases, and this document does not
   exclude other possible solutions.  Section 5 refers to possible
   alternative solutions and describes how the different architectures
   can co-exist in the same network or be deployed independently.

2.  Terminology

   This document uses the following terminology:

   o  Access Node (AN): An access node is a node which processes
      customer frames or packets at Layer 2 or above.  This includes,
      but is not limited to, DSLAMs and Cell Site Routers in 5G
      networks.  Access nodes have only limited MPLS functionality in
      order to reduce complexity in the access network.

   o  Pre-Aggregation Node (P-AGG): A pre-aggregation node (P-AGG) is a
      node which aggregates several access nodes (ANs).

   o  Aggregation Node (AGG): An aggregation node (AGG) is a node which
      aggregates several pre-aggregation nodes (P-AGGs).

   o  Area Border Router (ABR): A router between an aggregation domain
      and the core domain.

   o  Label Switch Router (LSR): Label Switch Routers are pure transit
      nodes.  They ideally have no customer or service state and are
      therefore decoupled from service creation.

   o  Use Case: Describes a typical network including service creation
      points and distribution of remote node loopback prefixes.

                         Figure 1: Terminology

3.  Use Cases

3.1.  Service provider network

   Service provider transport networks use multiple domains to support
   scalability.  For this analysis, we consider a representative
   network design with four levels of hierarchy: access domains,
   pre-aggregation domains, aggregation domains and a core (see
   Figure 2).  5G transport networks in particular are expected to
   scale to a very large number of access nodes due to the shorter
   range of the 5G radio technology.
   The networks are expected to scale up to one million nodes.

             +-------+    +-------+    +------+    +------+
             |       |    |       |    |      |    |      |
          +--+ P-AGG1+----+ AGG1  +----+ ABR1 +----+ LSR1 +--> to ABR
         /   |       | /\ |       |    |      |  / |      |
   +----+/   +-------+/  \+-------+    +------+ /  +------+
   | AN |             \  /                     \  /
   +----+\   +-------+ \/ +-------+    +------+/\  +------+
         \   |       | /\ |       |    |      |  \ |      |
          +--+ P-AGG2+----+ AGG2  +----+ ABR2 +----+ LSR2 +--> to ABR
             |       |    |       |    |      |    |      |
             +-------+    +-------+    +------+    +------+

              ISIS L1       ISIS L2       ISIS L2

   |-Access-|--Aggregation Domain--|-----------Core------------|

                          Figure 2: 5G network

   Many network functions in a 5G network will be virtualized/
   containerized and distributed across multiple data centers.
   Virtualized network functions are instantiated dynamically across
   different compute resources.  This requires that the underlying
   transport network support stringent SLAs on end-to-end paths.

   5G networks support a variety of service use cases that require
   end-to-end slicing.  In certain cases the end-to-end connectivity
   requires differentiated forwarding capabilities.  The Seamless SR
   architecture should provide the ability to establish end-to-end
   paths that satisfy the required SLAs.  For example, an end-user
   requirement could be to establish a low-latency path end-to-end.
   The System Architecture for the 5G System [TS.23.501-3GPP] currently
   defines four standardized Slice/Service Types: Enhanced Mobile
   Broadband (eMBB), Ultra-Reliable Low Latency Communication (URLLC),
   massive Internet of Things (mIoT), and Vehicle to everything (V2X).
   Seamless SR should support end-to-end Service Level Objectives
   (SLOs) to allow the creation of network slices with these four
   Slice/Service Types.

   Many deployments consist of ring topologies in the access and
   aggregation networks.
   In ring topologies, there are at most two forwarding paths for the
   traffic, whereas core networks consist of nodes with denser
   connectivity compared to ring topologies.  Thus core networks may
   have a larger number of TE paths while access networks will have a
   smaller number of TE paths.  The Seamless SR architecture should
   support having more TE paths in one domain and fewer TE paths in
   another, and provide the ability to effectively connect the domains
   end-to-end while satisfying end-to-end constraints.

3.2.  Large scale WAN networks

   As WAN networks grow beyond several thousand nodes, it is often
   useful to divide the network into multiple IGP domains, as
   illustrated in Figure 3.  Separate IGP domains increase service
   availability by establishing a constrained failure domain.  Smaller
   IGP domains may also improve network performance and health by
   reducing the device scale profile (including protocol and FIB
   scale).

      +-------+     +-------+     +-------+
      |       |     |       |     |       |
      |      ABR1  ABR2    ABR3  ABR4     |
      |       |     |       |     |       |
   PE1+DOMAIN1+-----+DOMAIN2+-----+DOMAIN3+PE2
      |       |     |       |     |       |
      |     ABR11  ABR22  ABR33  ABR44    |
      |       |     |       |     |       |
      +-------+     +-------+     +-------+

      |-ISIS1-|     |-ISIS2-|     |-ISIS3-|

                        Figure 3: WAN Network

   These large WAN networks often cross national boundaries.  In order
   to meet data sovereignty requirements, operators need to maintain
   strict control over end-to-end traffic-engineered (TE) paths.
   Segment Routing provides two main solutions to implement highly
   constrained TE paths.  Flex-algo (defined in
   [I-D.ietf-lsr-flex-algo]) uses prefix-SIDs computed by all nodes in
   the IGP domain using the same pruned topology.  Highly constrained
   TE paths for the data sovereignty use case can also be implemented
   using SR-TE policies ([I-D.ietf-spring-segment-routing-policy])
   built using unprotected adjacency SIDs.
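   The flex-algo idea above (every node prunes the topology by the same
   constraint, then runs a normal SPF on what remains) can be sketched
   as follows.  This is an illustrative sketch only: the topology, link
   metrics, and the "domestic"/"foreign" link attribute standing in for
   a data-sovereignty constraint are assumptions, not part of the
   draft or of [I-D.ietf-lsr-flex-algo].

```python
import heapq

def spf(graph, src):
    """Plain Dijkstra SPF over {node: [(neighbor, metric), ...]}."""
    dist, heap = {src: 0}, [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue
        for v, m in graph.get(u, []):
            if d + m < dist.get(v, float("inf")):
                dist[v] = d + m
                heapq.heappush(heap, (d + m, v))
    return dist

def prune(links, allowed):
    """Keep only links whose attribute is in the allowed set,
    mimicking a flex-algo topology pruned by constraint."""
    g = {}
    for u, v, metric, attr in links:
        if attr in allowed:
            g.setdefault(u, []).append((v, metric))
            g.setdefault(v, []).append((u, metric))
    return g

links = [
    ("PE1", "ABR1", 10, "domestic"),
    ("ABR1", "PE2", 10, "domestic"),
    ("PE1", "ABR2", 5, "foreign"),   # cheaper, but violates sovereignty
    ("ABR2", "PE2", 5, "foreign"),
]
constrained = spf(prune(links, {"domestic"}), "PE1")
# PE2 is reached only over domestic links (cost 20); the cheaper
# foreign path (cost 10) is never considered, because it was pruned
# before SPF ran rather than merely de-preferred.
```

   Because every node applies the same pruning rule before computing
   shortest paths, all nodes agree on the constrained paths without any
   per-path signaling, which is what makes the approach attractive for
   the intra-domain case.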
   Both of these approaches work well for intra-domain TE paths.
   However, they both have limitations when one tries to extend them to
   the creation of highly constrained inter-domain TE paths.  A goal of
   Seamless SR is to be able to create highly constrained inter-domain
   TE paths in a scalable manner.

   Some deployments may use a centralized controller to acquire the
   topologies of multiple domains and build end-to-end constrained
   paths.  This can be scaled with hierarchical controllers.  However,
   there is still significant risk of a loss of network connectivity to
   one or more controllers, which can result in a failure to satisfy
   the strict requirements of data sovereignty.  The network should
   have pre-established end-to-end TE paths that do not rely on
   controllers in order to address these failure scenarios.

3.3.  Data Center Interconnect (DCI) Networks

   Data centers are playing an increasingly important role in providing
   access to information and applications.  Geographically diverse data
   centers usually connect via a high-speed, reliable and secure core
   network.

      +-------+     +-------+     +-------+
      |     ASBR1  ASBR2   ASBR3  ASBR4   |
      |       |     |       |     |       |
   PE1+  DC1  +-----+ CORE  +-----+  DC2  +PE2
      |    ASBR11  ASBR22 ASBR33 ASBR44   |
      |       |     |       |     |       |
      +-------+     +-------+     +-------+

      |-ISIS1-|     |-ISIS2-|     |-ISIS3-|

                        Figure 4: DCI Network

   In many data center deployments, applications require end-to-end
   path diversity and/or end-to-end low-latency paths.  In certain
   cases it is desirable to have a uniform technology deployed in the
   core as well as in the data centers to create these SLA paths.  Such
   uniformity simplifies the network to a great extent.  In certain
   other cases, the data center environments may deploy SRv6 while the
   core network is running MPLS.
   It is desirable for a solution to only require service-related
   configuration on the access end-points where services are attached,
   avoiding service-related configuration on the ABR/ASBR nodes.

3.4.  Service Function Chaining

   [RFC7665] defines service function chaining as an ordered set of
   service functions and the automated steering of traffic through this
   set of service functions.  There can be a variety of service
   functions such as firewalls, parental control, CGNAT, etc.  In 5G
   networks these functions may be completely virtualized or could be a
   mix of virtualized functions and physical appliances.  The
   inter-domain solution is required to cater to service function
   chaining requirements.

3.5.  Multicast Use cases

   Multicast services such as IPTV and multicast VPN also need to be
   supported across a multi-domain service provider network.

   +---------+---------+---------+
   |         |         |         |
   S1       ABR1      ABR2      R1
   |  Metro1 |   Core  |  Metro2 |
   |         |         |         |
   S2      ABR11     ABR22      R2
   |         |         |         |
   +---------+---------+---------+

    |-ISIS1-|  |-ISIS2-|  |-ISIS3-|

                     Figure 5: Multicast use cases

   Figure 5 shows a simplified multi-domain network supporting
   multicast.  Multicast sources S1 and S2 lie in a different domain
   from the receivers R1 and R2.  Using multiple IGP domains presents a
   problem for the establishment of multicast replication trees.
   Typically, a multicast receiver does a reverse path forwarding (RPF)
   lookup for a multicast source.  One solution is to leak the routes
   for multicast sources across the IGP domains.  However, this can
   compromise the scaling properties of the multi-domain architecture.
   SR-P2MP [I-D.voyer-pim-sr-p2mp-policy] offers a solution for both
   intra-domain and inter-domain multicast.  However, it does not
   accommodate deployments using existing intra-domain multicast
   technology, such as mLDP [RFC6388], in some of the domains.
   A solution should accommodate a mixture of existing and newer
   technologies to better facilitate coexistence and migration.

4.  Requirements

   This section provides a summary of requirements derived from the use
   cases described in the previous sections.

4.1.  MPLS Transport

   The architecture SHOULD provide MPLS transport between two service
   endpoints regardless of whether the two end-points are in the same
   IGP domain, different IGP domains, or different autonomous systems.

   The MPLS transport SHOULD be supported on IPv4, IPv6, and dual-stack
   networks.

4.2.  SLA Guarantee

   The architecture SHOULD allow the creation of paths that support
   end-to-end SLAs.  The paths should, for example, obey constraints
   related to latency, diversity, bandwidth and availability.

   The architecture SHOULD support end-to-end network slicing as
   described by the 5G transport requirements [TS.23.501-3GPP].

4.3.  Scalability

   The architecture SHOULD be able to support up to 1 million nodes.

   The architecture SHOULD facilitate the use of access nodes with low
   RIB/FIB and low CPU capabilities.

   The architecture SHOULD facilitate the use of access nodes with low
   label stacking capability.

   The architecture SHOULD allow for a scalable response to network
   events.  An individual node SHOULD only need to respond to a limited
   subset of network events.

   Service routes on the border nodes SHOULD be minimized.

4.4.  Availability

   Traffic SHOULD be Fast Reroute (FRR) protected against link, node,
   and SRLG failures within a domain.

   Traffic SHOULD be Fast Reroute (FRR) protected against border node
   failures.

   Traffic SHOULD be Fast Reroute (FRR) protected against egress node
   and egress link failures.

4.5.  Operations

   Each domain SHOULD be independent and SHOULD NOT depend on the
   transport technology in another domain.  This allows for more
   flexible evolution of the network.
   Basic MPLS OAM mechanisms described in [RFC8029] SHOULD be
   supported.

   End-to-end MPLS ping and traceroute procedures SHOULD be supported.

   The ability to validate the path inside each domain SHOULD be
   supported.

   Statistics for inter-domain paths on the ingress and egress PE nodes
   as well as border nodes SHOULD be supported.

4.6.  Service Mapping

   The architecture SHOULD support the automated steering of traffic
   onto transport paths based on communities carried in the service
   prefix advertisements.

   The architecture SHOULD support the steering of traffic onto
   transport paths based on the DSCP value carried in IPv4/IPv6
   packets.

   Traffic steering based on the EXP bits in the MPLS header SHOULD be
   supported.

   Traffic steering based on a 5-tuple packet filter SHOULD be
   supported.  Source address, destination address, source port,
   destination port and protocol fields should be allowed.

   All traffic steering mechanisms SHOULD be supported for all kinds of
   service traffic, including VPN traffic as well as global internet
   traffic.

   The core domain is expected to have more traffic engineering
   constraints compared to the metros.  The ability to map services to
   appropriate transport tunnels at service attachment points SHOULD be
   supported.

5.  Alternative Solutions

   The use cases and requirements discussed in this document may be
   solved using alternative solutions.  The solutions can be divided
   into two broad categories:

      Centralized Solutions

      Distributed Solutions

5.1.  Centralized Solutions

   A centralized solution uses one central entity or a set of central
   entities that have complete visibility into end-to-end paths.  The
   nodes and links used to construct paths may be contained in a single
   topology database or a set of connected topology databases.
   A computing entity is also aware of the resource utilization and
   resource availability in this topology and makes informed
   computation decisions to construct paths.  The solution described in
   "Interconnecting Millions of Endpoints with Segment Routing"
   ([RFC8604]) is an example of a centralized architecture.
   [I-D.saad-sr-fa-link] describes extensions that can be used to
   extend this architecture to brownfield networks and provides
   abstractions to scale the solution.

5.2.  Distributed solutions

   In a distributed solution, there is no central entity with complete
   visibility into the end-to-end paths.  Each domain independently
   computes a portion of an end-to-end path, and these independent
   sub-paths are stitched together at the border nodes between domains.
   This document describes Seamless SR, an example of a distributed
   solution, which uses BGP-based extensions to stitch together
   complete end-to-end paths that satisfy certain properties.  The
   Seamless SR architecture uses BGP-LU [RFC3107] and the BGP
   Prefix-SID [RFC8669] for the end-to-end best path, and BGP-CT
   [I-D.kaliraj-idr-bgp-classful-transport-planes] for multiple SLA
   paths.  The Seamless SR solution does not exclude the possibility of
   future protocol extensions that adhere to the principles of the
   architecture to provide end-to-end paths.

5.3.  Choice of Solution

   The centralized and distributed solutions can independently solve
   the use cases and requirements discussed in the previous sections.
   One architecture may be more suitable for certain use cases, while
   the other may be more suitable for others.  It is solely at the
   discretion of the operator to choose the solution that best solves
   the use cases at hand.

   The two types of solutions are complementary and can co-exist in the
   same network.
   A network operator can use both distributed and centralized
   solutions in the same network to handle traffic with different
   requirements.  For example, a network operator may find it useful to
   use a centralized solution for traffic that requires stringent
   latency-bounded paths across network domains under the complete
   control of the network operator.  However, the same network operator
   may choose to deploy a distributed solution for traffic that crosses
   a co-operating transit domain, where a centralized solution is
   precluded.

6.  Seamless Segment Routing architecture

6.1.  Solution Concepts

   The solution described below makes use of the following concepts.
   The definitions from
   [I-D.kaliraj-idr-bgp-classful-transport-planes] have been reproduced
   here for readability.  In case of any conflicts, the text from
   [I-D.kaliraj-idr-bgp-classful-transport-planes] should be used.

   o  Transport Class (TC): A Transport Class is defined as a
      collection of end-to-end MPLS paths that satisfy a set of
      constraints or Service Level Agreements.

   o  BGP Classful Transport (BGP-CT): A new BGP family used to
      establish Transport Class paths across different domains.

   o  Route Distinguisher (RD): The Route Distinguisher is defined in
      [RFC4364].  In BGP-CT, the RD is used in BGP advertisements to
      differentiate multiple paths to the same loopback address.  It
      may be useful to automatically generate RDs in order to simplify
      configuration.

   o  Route Target (RT): The Route Target extended community is carried
      in BGP-CT advertisements.  The RT represents the Transport Class
      of an advertised path.  Note that the RT is only carried in the
      BGP-CT advertisements.  No BGP-VPN related configuration or VPN
      family advertisements are needed when BGP-CT transport paths are
      used to carry non-VPN traffic.

   o  Mapping Community (MC): The Mapping Community is the BGP extended
      community as defined in [RFC4360].
      In the Seamless SR architecture, an MC is carried by a BGP-CT
      route and/or a service route.  The MC is used to identify the
      specific local policy used to map traffic for a service route to
      different Transport Class paths.  When a mapping community is
      advertised in a BGP-CT route, it identifies the specific local
      policy used to map the BGP-CT route to the intra-domain tunnels.
      The local policy can include additional traffic steering
      properties for placing traffic on different Transport Class
      paths.  The values of the MCs and the corresponding local
      policies for service mapping are defined by the network operator.

                      Figure 6: Solution Concepts

6.2.  BGP Classful Transport

      ----IBGP------EBGP----IBGP------EBGP-----IBGP---
      |           |     |           |     |          |

      +-----------+     +-----------+     +-----------+
      |           |     |           |     |           |
      |      ASBR1+--+ASBR2    ASBR3+--+ASBR4         |
   PE1+    D1     | X   |    D2     | X   |    D3     +PE2
      |      ASBR5+--+ASBR6    ASBR7+--+ASBR8         |
      |           |     |           |     |           |
      +-----+-----+     +-----------+     +-----------+
            PE3

      |---ISIS1---|     |---ISIS2---|     |---ISIS3---|

                        Figure 7: WAN Network

   The above diagram shows a WAN network divided into three different
   domains.  Within each domain, BGP sessions are established between
   the PE nodes and the border nodes as well as between border nodes.
   BGP sessions are also established between border nodes across
   domains.  The goal is for PE1 to have MPLS connectivity to PE2,
   satisfying specific characteristics.  Multiple MPLS paths from PE1
   to PE2 are required in order to satisfy different SLAs.
   [I-D.kaliraj-idr-bgp-classful-transport-planes] defines a new BGP
   family called BGP Classful Transport.  The NLRI for this new family
   consists of a prefix and a Route Distinguisher.  The prefix
   corresponds to the loopback of the destination PE, and the RD is
   used to distinguish different paths to the same PE loopback.  The
   BGP-CT advertisement also carries a Route Target.
The RT specifies the Transport Class to which the BGP-CT
advertisement belongs. BGP-CT mechanisms are applicable to networks
under single ownership that are organized into multiple domains.
They are also applicable to multiple ASes under different ownership
but with closely co-operating administrations. BGP-CT mechanisms are
not expected to be applied to internet peering or between domains
that have completely independent administrations.

   BGP-CT advertisements for red Transport Class

   Prefix:PE2   Prefix:PE2   Prefix:PE2   Prefix:PE2   Prefix:PE2
   RD:RD1       RD:RD1       RD:RD1       RD:RD1       RD:RD1
   RT:Red       RT:Red       RT:Red       RT:Red       RT:Red(100)
   nh:ASBR1     nh:ASBR2     nh:ASBR3     nh:ASBR4     nh:PE2
   Label:L1     Label:L2     Label:L3     Label:L4     Label:L5

   PE1-------ASBR1------ASBR2---------ASBR3-------ASBR4--------PE2

                            VPNa Prefix:
                            10.1.1.1/32
                            RD: RD50
                            RT: RT-VPNa
                            ext-community:
                            Red(100)
                            nh: PE2
                            Label: S1

   +------+                 +------+                 +------+
   | IL71 |                 | IL72 |                 | IL73 |
   +------+    +------+     +------+    +------+     +------+
   |  L1  |    |  L2  |     |  L3  |    |  L4  |     |  L5  |
   +------+    +------+     +------+    +------+     +------+
   |  S1  |    |  S1  |     |  S1  |    |  S1  |     |  S1  |
   +------+    +------+     +------+    +------+     +------+

   Label stacks along end-to-end path
   S1 is the end-to-end service label.
   IL71, IL72, and IL73 are intra-domain labels corresponding to
   red intra-domain paths.
           Figure 8: BGP-CT Advertisements and Label Stacks

   BGP-CT advertisements for blue Transport Class

   Prefix:PE2   Prefix:PE2   Prefix:PE2   Prefix:PE2   Prefix:PE2
   RD:RD2       RD:RD2       RD:RD2       RD:RD2       RD:RD2
   RT:Blue      RT:Blue      RT:Blue      RT:Blue      RT:Blue(200)
   nh:ASBR1     nh:ASBR2     nh:ASBR3     nh:ASBR4     nh:PE2
   Label:L11    Label:L12    Label:L13    Label:L14    Label:L15

   PE1-------ASBR1----ASBR2----------ASBR3-------ASBR4--------PE2

                            VPNb Prefix:
                            10.1.1.1/32
                            RD: RD51
                            RT: RT-VPNb
                            ext-community:
                            Blue(200)
                            nh: PE2
                            Label: S2

   +------+                 +------+                 +------+
   | IL81 |                 | IL82 |                 | IL83 |
   +------+    +------+     +------+    +------+     +------+
   | L11  |    | L12  |     | L13  |    | L14  |     | L15  |
   +------+    +------+     +------+    +------+     +------+
   |  S2  |    |  S2  |     |  S2  |    |  S2  |     |  S2  |
   +------+    +------+     +------+    +------+     +------+

   Label stacks along end-to-end path
   S2 is the end-to-end service label.
   IL81, IL82, and IL83 are intra-domain labels corresponding to
   blue intra-domain paths.

           Figure 9: BGP-CT Advertisements and Label Stacks

For example, consider the diagrams in Figures 8 and 9. The diagrams
show the BGP-CT advertisements corresponding to two different
end-to-end paths between PE1 and PE2. The two paths belong to two
different Transport Classes, red and blue.

The inter-domain paths created by BGP-CT Transport Classes can be
used by any traffic that can be steered using BGP next-hop
resolution, including vanilla IPv4 and IPv6, L2VPN, L3VPN, and EVPN.
In the example above, we show how traffic from two different L3VPNs
(VPNa and VPNb) is mapped onto two different BGP-CT Transport
Classes (Red and Blue). The L3VPN advertisements for VPNa and VPNb
are originated by PE2 as usual. PE1 receives these L3VPN
advertisements and uses the next-hop in the L3VPN advertisements to
determine the path to use.
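As an illustration (not part of the BGP-CT specification), the routes and label stacks of Figure 8 can be modeled in a few lines of Python. The class fields mirror the fields shown in the advertisements; the numeric label values are invented for the example:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class BgpCtRoute:
    # NLRI = (RD, prefix); the RT names the Transport Class (Figure 8).
    prefix: str
    rd: str
    rt: str
    nexthop: str
    label: int

def label_stack_at_ingress(service_label: int,
                           ct_route: BgpCtRoute,
                           intra_domain_label: Optional[int]) -> List[int]:
    """Stack imposed at the ingress PE, top of stack first: the
    intra-domain transport label (when present), then the BGP-CT
    label, then the service label."""
    stack = [ct_route.label, service_label]
    if intra_domain_label is not None:
        stack.insert(0, intra_domain_label)
    return stack

# Red path to PE2 as learned by PE1 from ASBR1; label values invented.
red_to_pe2 = BgpCtRoute(prefix="PE2", rd="RD1", rt="Red",
                        nexthop="ASBR1", label=1000)
```

With a hypothetical intra-domain label 710 and service label 500, the ingress stack is `[710, 1000, 500]`, matching the IL71/L1/S1 ordering in Figure 8.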
In the absence of any BGP-CT Transport Classes in the network, PE1
would likely resolve the L3VPN next-hop over BGP-LU routes
corresponding to the BGP best path. However, when BGP-CT Transport
Classes are used, PE1 will resolve the L3VPN next-hop over a BGP-CT
route.

In the example above, PE2 originates BGP-CT advertisements for the
Red and Blue Transport Classes. These BGP-CT advertisements
propagate across the multiple domains, causing forwarding state for
the two Transport Classes to be installed at the border nodes along
the way. In order to create unique NLRIs for the two advertisements,
PE2 uses two different RDs: the red BGP-CT advertisement has an RD
of RD1, and the blue BGP-CT advertisement has an RD of RD2. Note
that the RD values used in the BGP-CT advertisements are completely
independent of the RD values used in the L3VPN advertisements. In
both cases, the RD values are simply a mechanism to guarantee the
uniqueness of a prefix/RD pair.

The RT values used in the BGP-CT advertisements are unrelated to the
RT values used in the L3VPN advertisements. The L3VPN RT values
identify VPN membership, as usual. The BGP-CT RT values identify
Transport Class membership. However, in order to easily map VPN
traffic into BGP-CT Transport Classes, it can be useful to make an
association between BGP-CT RT values and color extended community
values in the L3VPN advertisements. In the example above, the RT
value carried in the BGP-CT advertisement originated from PE2 for
the red Transport Class is configured to correspond to the color
extended community advertised in the VPN advertisement for VPNa.
Similarly, the RT value for the blue Transport Class corresponds to
the color extended community for VPNb.
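The color-to-RT resolution behaviour described above can be sketched as follows. This is an illustrative model, not a prescribed implementation, and the route contents are hypothetical renderings of Figures 8 and 9:

```python
# Sketch: an ingress PE picks the BGP-CT route whose prefix matches the
# service route's next-hop and whose Transport Class RT matches the
# color extended community on the service route; otherwise it falls
# back to the BGP-LU best path.

def resolve_nexthop(service_color, service_nexthop, ct_routes, lu_routes):
    for route in ct_routes:
        if (route["prefix"] == service_nexthop
                and route["rt"] == service_color):
            return route
    # No matching Transport Class: resolve over the BGP-LU best path.
    return next(r for r in lu_routes if r["prefix"] == service_nexthop)

ct_routes = [
    {"prefix": "PE2", "rd": "RD1", "rt": 100, "label": "L1"},   # Red(100)
    {"prefix": "PE2", "rd": "RD2", "rt": 200, "label": "L11"},  # Blue(200)
]
lu_routes = [{"prefix": "PE2", "label": "LU1"}]

# VPNa carries color Red(100), so its next-hop resolves on the red path.
chosen = resolve_nexthop(100, "PE2", ct_routes, lu_routes)
```

Here `chosen` is the red BGP-CT route carrying label L1, while a service route colored Blue(200) would resolve on the route carrying L11.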
In this way, traffic on PE1 for each VPN can be mapped to a
Transport Class path by associating the value of the color extended
community carried in the VPN advertisement with an RT value carried
in a BGP-CT advertisement.

The example above also shows the label stacks at different points
along the end-to-end paths for the forwarding entries established by
the two advertisements. Labels L1-L4 are red BGP-CT labels
advertised by border nodes ASBR1, ASBR2, ASBR3, and ASBR4, while
label L5 is advertised by PE2 for the red Transport Class. Labels
L11-L14 are blue BGP-CT labels advertised by border nodes ASBR1,
ASBR2, ASBR3, and ASBR4, while label L15 is advertised by PE2 for
the blue Transport Class.

IL71, IL72, and IL73 represent tunnels internal to domains 1, 2, and
3 which correspond to the red Transport Class. IL81, IL82, and IL83
represent tunnels internal to domains 1, 2, and 3 which correspond
to the blue Transport Class. In this example, we assume that the
intra-domain tunnels correspond to SR-TE policies with red and blue
SRTE-policy-colors. Service labels are represented by S1 and S2.

Note that this example focuses on how signalling originated by PE2
results in forwarding state used by PE1 to reach PE2 on a specific
Transport Class path. The solution supports the establishment of
forwarding state for an arbitrary number of PEs to reach PE2. For
example, PE3 in Figure 7 can reach PE2 on a red Transport Class path
established using the same BGP-CT signalling. The signalling and
forwarding state from ASBR1 all the way to PE2 is common to the
paths used by both PE1 and PE3. This merging of signalling and
forwarding state is essential to the good scaling properties of the
Seamless SR architecture. Millions of end-to-end Transport Class
paths can be established in a scalable manner.

6.3.
Automatically Creating Transport Classes

In order to simplify the creation of inter-domain paths, it may be
desirable to automatically advertise a BGP-CT Transport Class based
on the existence of an intra-domain tunnel. The RT value used in the
BGP-CT advertisement is automatically derived from a property of the
intra-domain tunnel that triggered its creation. How the Transport
Class RT value is derived for different types of intra-domain
tunnels is discussed below.

6.3.1. Automatically Creating Transport Classes for BGP-SR-TE
       Intra-domain Tunnels

When the intra-domain tunnel is a BGP-SR-TE policy
[I-D.ietf-idr-segment-routing-te-policy], the value of the Transport
Class RT in the corresponding BGP-CT advertisement is derived from
the Policy Color contained in the SR Policy NLRI. The 32-bit Policy
Color is directly converted to a 32-bit Transport Class RT.

6.3.2. Automatically Creating Transport Classes for Flex-Algo
       Tunnels

When the intra-domain tunnel is created using Flex-Algo
[I-D.ietf-lsr-flex-algo], the value of the Transport Class RT in the
corresponding BGP-CT advertisement is derived from the 8-bit
Algorithm value carried in the SR-Algorithm sub-TLV [RFC8667]. The
conversion from the 8-bit Algorithm value to the 32-bit Transport
Class RT is done by treating both as unsigned integers. Note that
this definition allows for intra-domain tunnels created via
standardized algorithms (0-127) as well as flex-algos (128-255).

6.3.3. Auto-deriving Transport Classes for PCEP

When the intra-domain tunnel is created using PCEP, the value of the
Transport Class RT in the corresponding BGP-CT advertisement is
derived from the Color of the SR Policy Identifiers TLV defined in
[I-D.ietf-pce-segment-routing-policy-cp]. The 32-bit Color is
directly converted to a 32-bit Transport Class RT.

6.4.
Inter-domain flex-algo with BGP-CT

Flex-algo (defined in [I-D.ietf-lsr-flex-algo]) provides a mechanism
to separate routing planes. Multiple algorithms are defined, and
prefix-SIDs are advertised for each algorithm. BGP-CT can be used to
advertise these flex-algo SIDs across domains. The BGP Prefix-SID
[RFC8669] is an attribute and can be carried with the BGP-CT NLRI.
A Transport Class is defined corresponding to each flex-algo in the
IGP domain. These Transport Classes advertise the IGP flex-algo SIDs
in the Prefix-SID attribute of the BGP-CT NLRI.

6.5. Applicability to color-only policies

Color-only policies consist of (null endpoint, color) as specified
in [I-D.ietf-spring-segment-routing-policy]. Special steering
mechanisms are defined with the "CO" flags in the color extended
community [I-D.ietf-idr-segment-routing-te-policy]. Color-only
policies can be advertised in BGP-CT with the prefix being NULL
(0.0.0.0/32 or 0::0/128). A separate RD is advertised for each NULL
advertisement with a different color. The Route Target carries the
Policy Color contained in the SR Policy NLRI. The steering
mechanisms defined in [I-D.ietf-spring-segment-routing-policy] MUST
be honoured when resolving service prefixes on the BGP-CT
advertisements.

6.6. Data sovereignty

      +-----------+      +-----------+      +-----------+
      |           |      |  +-+ AS2  |      |           |
      |         A1+--+A2    | |    A3+--+A4 |           |
   PE1+    AS1    |      |  |Z|      |      |    AS3    +PE3
      |         A5+--+A6    | |    A7+--+A8 |           |
      |           |      |  +-+      |      |           |
      +--A13--A15-+      +-A17--A19--+      +-----------+
          |    |            |    |
          |    |            |    |
          |    |            |    |
      +--A14--A16-+      +-A18--A20--+
      |           |      |           |
      |         A9+--+A10            |
   PE4+    AS4    |      |    AS5    |
      |        A11+-+A12 |           |
      |           |      |           |
      +-----------+      +-----------+

                  Figure 10: Multi domain Network

Consider a WAN network with multiple ASes as shown in Figure 10. The
ASes roughly correspond to the geographical location of the nodes.
In this example, we assume that each AS corresponds to a continent.
The data sovereignty requirement in this example is that certain
traffic from PE1 (in AS1) to PE3 (in AS3) must not pass through
country Z in AS2. As indicated by the location of country Z in the
diagram, all paths that go directly from AS1 to AS3 through AS2
necessarily pass through country Z. Using BGP-LU to provide
connectivity from PE1 to PE3 would generally result in a path that
goes from AS1 to AS2 to AS3, which does not satisfy the data
sovereignty requirement in this example. Instead, the solution using
BGP-CT will go from AS1 to AS4 to AS5 to AS2 to AS3. BGP-CT will
ensure that when the traffic passes through AS2, only intra-domain
paths satisfying the data sovereignty requirement will be used.

Within AS2, there are several different intra-domain TE mechanisms
that can be used to exclude links that pass through country Z. For
example, RSVP-TE or flex-algo can be used to create intra-domain
paths that satisfy the data sovereignty requirement. BGP-CT allows
the constrained intra-domain paths to satisfy requirements for
end-to-end inter-domain paths. LSPs created by RSVP-TE or flex-algo
that satisfy the "exclude country Z" constraint are associated with
a color Green. A Green Transport Class is defined on border nodes in
all ASes. This Green Transport Class is associated with a mapping
community called Not-Z.

In AS2, the ASBRs are configured such that the presence of the
mapping community Not-Z in BGP-CT routes results in a strict route
resolution mechanism for those routes. A BGP-CT route carrying the
mapping community Not-Z will only resolve on the Green Transport
Class, so it will only use Green intra-domain tunnels.

In AS1, AS3, AS4, and AS5, no links pass through country Z, so all
intra-domain paths automatically satisfy the data sovereignty
requirement.
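The per-AS resolution behaviour of the Not-Z mapping community can be sketched as follows. This is an illustrative model only; the tunnel tables and the strict/fallback flag are hypothetical stand-ins for ASBR configuration:

```python
# Sketch: resolving a BGP-CT route onto an intra-domain tunnel, given
# the mapping communities it carries and the locally configured policy.

def resolve_ct_route(communities, tunnels, strict_green):
    """strict_green models the AS2 ASBR configuration: routes carrying
    the Not-Z mapping community may only resolve on Green tunnels.
    In ASes with no links through country Z, the same community
    falls back to best-effort tunnels."""
    if "Not-Z" in communities:
        if "green" in tunnels:
            return tunnels["green"]
        if strict_green:
            return None          # strict: no Green tunnel -> unresolved
    return tunnels.get("best-effort")

# AS2: Green tunnels exist and strict resolution is configured.
as2 = resolve_ct_route({"Not-Z"},
                       {"green": "T-green", "best-effort": "T-be"}, True)
# AS4: no Green tunnels are needed; Not-Z falls back to best effort.
as4 = resolve_ct_route({"Not-Z"}, {"best-effort": "T-be"}, False)
```

In this sketch `as2` is the Green tunnel and `as4` is the best-effort tunnel, mirroring the AS2 and AS1/AS3/AS4/AS5 behaviours described above.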
So there is no need for the creation of Green intra-domain tunnels.
In these ASes, the presence of the mapping community Not-Z in BGP-CT
routes results in resolution on best-effort paths. Even though the
ASBRs in these ASes do not need to create Green intra-domain
tunnels, they still need to allocate labels to identify traffic
using the Green Transport Class. These labels will be used by the
ASBRs in AS2 to put traffic on the Green intra-domain tunnels in
AS2.

The requirement is that only a subset of traffic honor the data
sovereignty requirement. The service prefixes from PE1 to PE3 that
need to honor the data sovereignty requirement are associated with
the Green color extended community in the service advertisements.
This results in PE1 using the BGP-CT labels corresponding to
{PE3, Green} to forward the traffic. BGP-CT labels corresponding to
{PE3, Green} will exist at every ASBR along the path. The traffic
originating on PE1 will be associated with the Green color
community. The bottom-most label in the packet is a VPN label. Above
the VPN label, the BGP-CT label is imposed. Above the BGP-CT label,
the intra-domain transport label is imposed. Let us assume the
traffic from PE1 needs to go to PE3 through AS1, AS4, AS5, AS2, and
AS3. The BGP-CT label for {PE3, Green} will be swapped at the border
nodes.

Note that end-to-end inter-domain data sovereignty can in principle
be accomplished using BGP-LU, with multiple loopbacks and by
associating those loopbacks with appropriate transport tunnels at
every border node in every domain. This is very configuration
intensive and requires multiple loopbacks. BGP-CT builds on the
basic mechanisms of BGP-LU while greatly simplifying such use cases.

6.7.
Interconnecting IP Fabric Data Centers

   Prefix:TOR2  Prefix:TOR2  Prefix:TOR2  Prefix:TOR2  Prefix:TOR2
   RD:RD2       RD:RD2       RD:RD2       RD:RD2       RD:RD2
   RT:Blue      RT:Blue      RT:Blue      RT:Blue      RT:Blue
   nh:ASBR1     nh:ASBR2     nh:ASBR3     nh:ASBR4     nh:TOR2
   Label:L11    Label:L12    Label:L13    Label:L14    Label:L15

       +-----------+        +-----------+        +-----------+
       |       ASBR1        ASBR2   ASBR3        ASBR4       |
       |           |        |           |        |           |
   TOR1+    DC1    +--------+    CORE   +--------+    DC2    +TOR2
       |      ASBR11        ASBR22  ASBR33       ASBR44      |
       |           |        |           |        |           |
       +-----------+        +-----------+        +-----------+

   +------+                 +------+                 +------+
   | UDP  |                 | IL82 |                 | UDP  |
   +------+    +------+     +------+    +------+     +------+
   | L11  |    | L12  |     | L13  |    | L14  |     | L15  |
   +------+    +------+     +------+    +------+     +------+
   |  S2  |    |  S2  |     |  S2  |    |  S2  |     |  S2  |
   +------+    +------+     +------+    +------+     +------+

   Label stacks along end-to-end path
   S2 is the end-to-end service label.
   IL82 is the intra-domain label corresponding to the
   blue intra-domain path in the core.

               Figure 11: Operation in IP fabric

Many data center networks consist of IP fabrics which do not have
MPLS packet processing capability. A common requirement is that
traffic originated from an IP fabric data center needs to satisfy
certain constraints in the MPLS-enabled core, for example, only
using a subset of links (blue links). It is useful for the traffic
originating in an IP fabric DC to carry information that allows the
MPLS-enabled core to treat it accordingly. MPLSoUDP, as defined in
[RFC7510], is a mechanism where a UDP header is imposed on an MPLS
packet at the border nodes. In Figure 11 above, the traffic needs to
take blue paths in the core. The Blue Transport Class is defined on
the ASBRs. In the core, Blue intra-domain tunnels are created. The
BGP-CT advertisements for the Blue Transport Class are as shown in
the diagram. The BGP-CT advertisements originate at TOR2 and
propagate through all the ASBRs, until finally reaching TOR1.
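The MPLSoUDP encapsulation referenced above can be sketched as below. The label stack entry layout comes from RFC 3032 and the UDP destination port 6635 from RFC 7510; the label values, source port, and payload are illustrative, and the UDP checksum is simply left at zero:

```python
import struct

MPLS_UDP_PORT = 6635  # UDP destination port for MPLS-in-UDP (RFC 7510)

def mpls_label_entry(label: int, bottom: bool, tc: int = 0,
                     ttl: int = 64) -> bytes:
    # 4-byte label stack entry (RFC 3032): 20-bit label, 3-bit TC,
    # 1-bit bottom-of-stack, 8-bit TTL.
    return struct.pack("!I",
                       (label << 12) | (tc << 9) | (int(bottom) << 8) | ttl)

def mpls_in_udp(labels, payload: bytes, src_port: int = 49152) -> bytes:
    # UDP header followed by the MPLS label stack and the payload,
    # as carried between a TOR and a fabric border node.
    stack = b"".join(mpls_label_entry(l, bottom=(i == len(labels) - 1))
                     for i, l in enumerate(labels))
    length = 8 + len(stack) + len(payload)
    udp = struct.pack("!HHHH", src_port, MPLS_UDP_PORT, length, 0)
    return udp + stack + payload
```

A packet from TOR1 toward ASBR1 carrying the BGP-CT label and the service label of Figure 11 would be built as `mpls_in_udp([l11, s2], inner_packet)`, with whatever numeric values were advertised for L11 and S2.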
Within DC1, traffic is encapsulated with a UDP header. Traffic with
the UDP header gets decapsulated at ASBR1. The traffic follows Blue
paths in the core. At ASBR4, the MPLS packet gets encapsulated with
a UDP header. The UDP header is removed at TOR2, and the lookup is
done on the service label.

6.8. Translating Transport Classes across Domains

   Prefix:PE2        Prefix:PE2        Prefix:PE2
   RD:RD2            RD:RD2            RD:RD2
   RT:Red            RT:Blue           RT:Blue
   nh:ASBR1          nh:ASBR2          nh:PE2
   Label:L11         Label:L12         Label:L13

       +-----------+        +-----------+
       |       ASBR1        ASBR2      |
       |           |        |          |
    PE1+    AS1    +--------+    AS2   +PE2
       |      ASBR11        ASBR22     |
       |           |        |          |
       +-----------+        +-----------+

   +------+                 +------+
   | IL1  |                 | IL2  |
   +------+    +------+     +------+    +------+
   | L11  |    | L12  |     | L13  |    | L14  |
   +------+    +------+     +------+    +------+
   |  S2  |    |  S2  |     |  S2  |    |  S2  |
   +------+    +------+     +------+    +------+

   Label stacks along end-to-end path
   S2 is the end-to-end service label.
   IL1 and IL2 are intra-domain labels corresponding to the
   Red intra-domain path in AS1 and the Blue intra-domain
   path in AS2.

       Figure 12: Translating Transport Classes across Domains

In certain scenarios, the TE intent represented by Transport Classes
may differ from one domain to another. This could be the result of
two independent organizations merging into one. It could also occur
when two ASes are under different administration but use BGP-CT to
provide an end-to-end service. In both scenarios, the same color may
represent different intent in each domain. When the traffic needs to
satisfy certain TE characteristics, the colors need to be mapped
correctly at the border. In the example in Figure 12, there are two
ASes. The low latency TE intent is represented by the Red Transport
Class in AS1 and by the Blue Transport Class in AS2. PE2 advertises
a BGP-CT prefix with an RT of Blue.
ASBR2 sets the nexthop to self and advertises a new label L12. On
ASBR1, the Blue BGP-CT advertisement is imported into the Red
Transport RIB, and the advertisement from ASBR1 will carry a Red RT.
This ensures that the BGP-CT prefix for PE2 resolves on a Red
intra-domain path in AS1. The detailed protocol procedures for this
use case are described in section 10 of
[I-D.kaliraj-idr-bgp-classful-transport-planes].

6.9. SLA Guarantee

6.9.1. Low latency

Many network functions are virtualized and distributed. Certain
functions are time and latency sensitive. In inter-domain networks,
end-to-end latency measurement is required. Inside a domain, latency
measurement mechanisms such as TWAMP [RFC5357] are used, and link
latency is advertised in the IGP using the extensions described in
[RFC8570] and [RFC7471].

[I-D.ietf-idr-performance-routing] extends the BGP AIGP attribute
[RFC7311] by adding a sub-TLV to carry an accumulated latency
metric. The BGP best path selection algorithm used for a Transport
Class requiring low latency will consider the accumulated latency
metric to choose the lowest latency path.

6.9.2. Traffic Engineering (TE) constraints

TE constraints generally include the ability to send traffic via
certain nodes or links, or to avoid using certain nodes or links. In
the Seamless SR architecture, the intra-domain transport technology
is responsible for ensuring the TE constraints inside the domain,
while BGP-CT ensures that the end-to-end path is constructed from
intra-domain paths and inter-AS links that individually satisfy the
TE constraints.

For example, in order to construct a pair of diverse paths, we can
define a red and a blue Transport Class. Within each domain, the red
and blue Transport Class paths are realized using intra-domain path
diversity mechanisms.
For example, in a domain using flex-algo, red and blue Transport
Classes are realized using red and blue flex-algo definitions (FADs)
which do not share any links. To maintain path diversity on inter-AS
links, BGP policies are used to associate two inter-AS peers with
the red Transport Class and another two inter-AS peers with the blue
Transport Class.

6.9.3. Bandwidth constraints

The Seamless SR architecture does not natively support end-to-end
bandwidth reservations. In this architecture, the bandwidth
utilization characteristics of each domain are managed
independently. The intra-domain bandwidth management can make use of
a variety of tools.

The link bandwidth extended community, as defined in
[I-D.ietf-idr-link-bandwidth], allows for efficient weighted
load-balancing of traffic on multiple BGP-CT paths that belong to
the same Transport Class. For optimized path placement, a
centralized TE system may be deployed, with BGP
policies/communities used for path placement.

6.10. Scalability

6.10.1. Access node scalability

The Seamless SR architecture needs to be able to accommodate very
large numbers of access devices. These access devices are expected
to be low-end devices with limited FIB capacity. The Seamless MPLS
architecture, as described in [I-D.ietf-mpls-seamless-mpls],
recommends the use of LDP DoD mode to limit the size of both the RIB
and the FIB needed on the access devices. In the Seamless SR
architecture, networks use IGP-based label distribution and do not
have this selective label request mechanism. However, RIB
scalability of access nodes has not been a problem in real Seamless
MPLS deployments.
In cases where access devices are low on CPU and memory and unable
to support a large RIB, BGP filtering policies can be applied at the
ABR/ASBR routers to restrict the number of BGP-CT advertisements
sent towards the access devices. The access devices will then
receive only the PE loopbacks that they need to connect to.

6.10.1.1. Automating Filtering of BGP-CT Advertisements using Route
          Target Constraints

When access devices have CPU and memory constraints, it is useful to
be able to filter BGP-CT advertisements using policies on border
nodes so that only a subset of BGP-CT advertisements are sent to a
given access device. While this filtering of BGP-CT advertisements
could be done via explicit configuration, it is desirable to have an
automated filtering mechanism.

When a service prefix advertisement is received on an access device,
the protocol nexthop of the service prefix indicates the remote
loopback address from which the service prefix is originated. An
access device only needs to receive the subset of BGP-CT
advertisements corresponding to the originators of the service
prefixes received by that access device. When an access node
receives a service prefix with a particular remote loopback address
as the protocol nexthop, it can selectively request the BGP-CT
advertisement for this particular loopback address from the Route
Reflector.

This mechanism is similar to the way Route Target Constraints are
used to selectively filter VPN advertisements [RFC4684]. The Route
Target Constraint defined in [RFC4684] currently allows for
filtering based on Route Target information. Applying a similar
mechanism to the filtering of BGP-CT advertisements based on
individual loopback addresses requires an extension.
The minor protocol enhancements required to achieve this are
described in section 11 of
[I-D.kaliraj-idr-bgp-classful-transport-planes].

6.10.2. Label stack depth

The ability of a device to push multiple MPLS labels on a packet
depends on its hardware capabilities. Access devices are expected to
have limited label stack push capabilities. Assuming shortest path
SR-MPLS in the access domain, the access domain transport will use a
single label. Lightweight traffic engineering and slicing can also
be achieved with a single label, as described in
[I-D.ietf-lsr-flex-algo]. The Seamless SR architecture can provide
cross-domain MPLS connectivity with a small label stack. Assuming
the use of a service label, end-to-end connectivity is provided by
pushing one service label, one BGP-CT label, and one intra-domain
transport label (which could also be a Binding-SID). Therefore,
access nodes only need to be able to push 3 labels for most
applications.

6.10.3.
Label Resources

     -----IBGP-----      -----IBGP-----      -----IBGP------
    |              |    |              |    |               |

    BGP-CT Advt:
    Prefix: 2.2.2.2 (PE2 loopback)
    RD:20000
    RT: 128
    Label:100           Label:100           Label:101
    Next hop:ABR3       Next hop:ABR3       Next hop: PE2
    ------------------------------------------------------------

    BGP-CT Advt:
    Prefix: 30.30.30.30 (ABR3 loopback)
    RD:30000
    RT:128
    Label:2000          Label:2001
    Nexthop:ABR1        Nexthop:ABR3

     +-----------+      +------------+      +-----------+
    /             \    /              \    /             \
   |             ABR1                ABR3                 |
   |              |    |              |    |              |
PE1+    Metro1    +         Core      +        Metro2     +PE2
   |              |    |              |    |              |
   |             ABR2                ABR4                 |
    \            / \                /  \                 /
     +------------+  +-------------+    +---------------+

    |-ISIS1-|           |-ISIS2-|           |-ISIS3-|

    +------+            +------+            +------+
    | 11111|            | 22222|            | 33333|   IGP-labels:
    +------+            +------+            +------+   11111,22222,33333
    | 2000 |            | 2001 |            |  101 |   BGP-CT label:
    +------+            +------+            +------+   For ABR3:
    |  100 |            |  100 |            | VPN  |   2000,2001
    +------+            +------+            +------+   For PE2:
    | VPN  |            | VPN  |                       100, 101
    +------+            +------+

              Figure 13: Recursive Route Resolution

Label resources are an important consideration in MPLS networks. On
access devices, labels are consumed by services as well as by
transport loopbacks inside the IGP domain where the access device
resides. For example, in the above diagram PE1 has to allocate label
resources equal to the number of customers connecting (i.e. the
number of L2/L3 VPNs). Based on the size of the IGP domain that PE1
resides in, it will also have to allocate labels for IGP loopbacks.
This number is at most a few thousand. So overall, a typical access
device should have adequate label resources in the Seamless SR
architecture. The P routers need to allocate labels for IGP
loopbacks. This number again is small; at most it will be a few
thousand, based on the number of nodes in the largest IGP domains.
The metro networks connect to the core network through ABRs. It is
possible that a given ABR may end up having to maintain forwarding
entries for a large subset of the transport loopback routes. There
may be a large number of metro networks connecting to a given ABR,
and in this case, the ABR will need forwarding entries for every
access node in the directly connected metros. So this ABR may have
to maintain on the order of 100k routes. With BGP-CT, each Transport
Class will have to be separately allocated a label. So, in the above
example, ABR1 would have to use 300k labels if there were 3
Transport Classes. This large number of label forwarding entries
could be problematic.

In highly scaled scenarios, it is therefore desirable to reduce the
forwarding state on the ABRs. This reduction can be achieved with
label stacking as a result of recursive route resolution. Figure 13
illustrates how the forwarding state on ABRs can be greatly reduced
by removing forwarding state for PEs in remote domains from the
ABRs. In this example, we assume that we are setting up end-to-end
paths for a single Transport Class, for example red. PE2 advertises
a BGP-CT prefix of 2.2.2.2 with a nexthop of 2.2.2.2 and label 101.
2.2.2.2 is PE2's loopback. ABR3 advertises label 100 for BGP-CT
prefix 2.2.2.2 and changes the nexthop to self. When ABR1 receives
the BGP-CT advertisement for 2.2.2.2, it does not change the nexthop
and advertises the same label advertised by ABR3. When PE1 receives
the BGP-CT advertisement for 2.2.2.2 with a nexthop of ABR3, it
resolves the route using its reachability to ABR3.

The reachability of ABR3 has been learned by PE1 as the result of a
BGP-CT advertisement originated by ABR3. As shown in Figure 13, ABR3
advertises BGP-CT prefix 30.30.30.30 with label 2001.
ABR1 advertises label 2000 for BGP-CT prefix 30.30.30.30 and sets
the nexthop to self. PE1 constructs the service data packet with a
VPN label at the bottom, followed by the two BGP-CT labels 100 and
2000. The topmost BGP-CT label, 2000, is the transport label for the
Metro1 domain. Removing the forwarding state for PEs in remote
domains on the ABRs comes at the expense of one additional BGP-CT
label on the data packet.

Recursive route resolution provides significant forwarding state
reduction on the ABRs. ABRs have to allocate label resources only
for the PEs in their local domain. The number of PEs in the same
domain as a given ABR is much lower than the total number of PEs in
the network.

The examples in this draft generally show VPN routes resolving on
BGP-CT prefixes. However, the mechanisms are equally applicable to
non-VPN routes.

6.11. Availability

Transport layer availability is very important in latency and loss
sensitive networks. Any link or node failure must be repaired within
a 50 ms convergence time, which can be achieved with Fast ReRoute
(FRR) mechanisms. The Seamless SR architecture provides protection
against intra-domain link and node failures. Protection against
border node failures and against egress link and node failures is
also provided. Details of the FRR techniques are described in the
sections below.

6.11.1. Intra domain link and node protection

In the Seamless SR architecture, protection against node and link
failure is achieved with the relevant FRR techniques for the
corresponding transport mechanism used inside the domain. In the
case of an IP fabric, ECMP FRR or LFA can be used. In SR networks,
TI-LFA [I-D.ietf-rtgwg-segment-routing-ti-lfa] provides link and
node protection.
For SR-TE transport ([I-D.ietf-spring-segment-routing-policy]), link
and node protection can be achieved using TI-LFA, combined with the
mechanisms described in
[I-D.hegde-spring-node-protection-for-sr-te-paths].

6.11.2. Egress link and node protection

[RFC8679] describes mechanisms for providing protection for border
nodes and PE devices where services are hosted. The mechanism can be
further simplified operationally with anycast SIDs and anycast
service labels, as described in
[I-D.hegde-rtgwg-egress-protection-sr-networks].

6.11.3. Border Node protection

Border node protection is very important in a network consisting of
multiple domains. The Seamless SR architecture can achieve 50 ms FRR
protection in the event of node failure by using anycast addresses
for the ABRs/ASBRs. This requires that a set of ABRs advertise the
same label for a given BGP-CT prefix. The detailed mechanism is
described in [I-D.hegde-rtgwg-egress-protection-sr-networks].

6.12. Operations

6.12.1. MPLS ping and Traceroute

The Seamless SR architecture consists of 3 layers: the service
layer, intra-domain transport, and BGP-CT transport. Within each
layer, connectivity can be verified independently. Within the BGP-CT
transport layer, end-to-end connectivity can be verified using a new
OAM FEC for BGP-CT defined in
[I-D.kaliraj-idr-bgp-classful-transport-planes]. That draft
describes end-to-end connectivity verification as well as fault
isolation. BGP-CT verification happens only on the BGP nodes. The
intra-domain connectivity verification and fault isolation will be
based on the technology deployed in that domain, as defined in
[RFC8029] and [RFC8287].

6.12.2. Counters and Statistics

Traffic accounting and the ability to build a demand matrix for
PE-to-PE traffic are very important.
With BGP-CT, per-label transit 1314 counters should be supported on every transit router. Per-label 1315 transit counters provide details of the total traffic towards a remote PE 1316 measured at every BGP transit router. Per-label egress counters 1317 should be supported on the ingress PE router. Per-label egress counters 1318 provide the total traffic from the ingress PE to a specific remote PE. 1320 6.13. Service Mapping 1322 Service mapping is an important aspect of any architecture. It 1323 provides the means to translate end users' SLA requirements into the 1324 operator's network configuration. The Seamless SR architecture supports 1325 automatic steering with the extended color community. The Transport 1326 Class and the route target carried by the BGP-CT advertisement 1327 directly map to the extended color community. Services that require 1328 a specific SLA carry the extended color community that maps to the 1329 Transport Class to which the BGP-CT advertisement belongs. 1331 Other types of traffic steering, such as DSCP-based forwarding, are 1332 expressed with a mapping community. A mapping community is a standard 1333 BGP community and is completely generic and user-defined. The 1334 mapping community has a specific service mapping feature 1335 associated with it, along with the required fallback behaviour when the 1336 primary transport goes down. The list below provides a general 1337 guideline on the different service mapping features and fallback 1338 options an implementation should provide. 1340 DSCP based mapping with each DSCP mapping to a Transport Class. 1342 DSCP based mapping with default mapping to a best-effort transport. 1344 DSCP based mapping with fallback to best-effort when the primary 1345 transport tunnel goes down. 1347 Extended color community based mapping with fallback to best 1348 effort. 1350 Fallback options with a specific protocol during migrations. 1352 Fallback options to a different Transport Class. 1354 No fallback permitted. 1356 6.14.
Migrations 1358 Networks that migrate from the Seamless MPLS architecture to the Seamless SR 1359 architecture require that all the border nodes and PE devices be 1360 upgraded and enabled with the new family on the BGP session. In cases 1361 where legacy nodes cannot be upgraded, exporting from BGP-LU 1362 into BGP-CT and vice versa SHOULD be supported. Once the entire 1363 network is migrated to support BGP-CT, there is no need to run the BGP-LU 1364 family on the BGP sessions: BGP-CT itself can advertise a best- 1365 effort Transport Class, and the BGP-LU family can be removed. 1367 6.15. SRv6 interworking with MPLS domains 1369 SRv6 defines the Segment Routing architecture for the IPv6 data plane 1370 with a new extension header, as described in [RFC8402]. As described 1371 in Section 3.3 of the current document, data center and access/ 1372 aggregation networks may deploy SRv6 and connect to the WAN networks. 1373 Since current WAN networks predominantly use MPLS, it is important to 1374 provide solutions that interconnect SRv6 and MPLS domains. The 1375 seamless SR architecture supports interconnecting domains that deploy 1376 SRv6 and MPLS. 1378 The SRv6 Network Programming draft 1379 [I-D.ietf-spring-srv6-network-programming] defines an SRv6 SID as 1380 consisting of locator, function, and argument bits. The locator part 1381 of the SRv6 SID is routable, and the route leads to the node that 1382 instantiates the SID. The seamless SR architecture builds on this 1383 concept to enable interworking between SRv6 and other domains. In 1384 the Seamless SR architecture, different domains are loosely coupled, 1385 and prefixes are not leaked from the IGP in one domain into the IGP 1386 of another domain. BGP is used to stitch the different domains 1387 together and build an end-to-end path. In SRv6, a separate locator 1388 is allocated for each color. The service SIDs that need to use a 1389 particular colored path are derived from the corresponding 1390 locator.
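As a rough, hypothetical sketch (illustrative code, not normative), the locator-plus-function structure of a per-color service SID can be expressed as:

```python
# Hypothetical illustration: a per-color service SID is formed from the
# color's locator bits plus function bits in the low-order remainder.
import ipaddress

def service_sid(locator_prefix: str, function: int) -> str:
    """Combine a locator prefix (e.g. a /96) with a function value in
    the remaining low-order bits to form a 128-bit service SID."""
    net = ipaddress.IPv6Network(locator_prefix)
    return str(net.network_address + function)

# Example values: Red locator 5:6::/96 with function 0x16 yields the
# END.DT4 SID 5:6::16 used in the interworking example of this section.
print(service_sid("5:6::/96", 0x16))  # 5:6::16
```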
Locators are IPv6 prefixes of length less than 128 bits. 1391 These locators are advertised in BGP in the AFI 2/SAFI 1 family (IPv6 1392 unicast). BGP will install these locator routes on each border node, 1393 so each border node will have reachability for the SRv6 SIDs. In 1394 order to transparently traverse an MPLS domain, the SRv6 traffic is 1395 encapsulated with MPLS headers at the ingress MPLS border node and 1396 decapsulated at the egress MPLS border node. The association of the 1397 SRv6 locator with a particular color is also carried in the IPv6 1398 unicast advertisement so that specific transport class paths can be 1399 used when desired. This is illustrated in the following example. 1401 Locator for Red Transport Class : 5:6::/96 1402 Locator for Blue Transport Class: 5:7::/96 1404 BGP AFI 2/SAFI 1 advertisements for Red transport class 1406 Pfx:5:6::/96 Pfx:5:6::/96 Pfx:5:6::/96 Pfx:5:6::/96 Pfx:5:6::/96 1407 Ext-Com: Red Ext-Com:Red Ext-Com:Red Ext-Com:Red Ext-Com:Red 1408 nh:ASBR1 nh:ASBR2 nh:ASBR3 nh:ASBR4 nh:PE2 1410 PE1------------ASBR1-----------ASBR2---------ASBR3-------ASBR4--------PE2 1411 | | | | | | 1412 ------SRv6------ -----MPLS----- ----SRv6----- 1414 VPNa Prefix: 1415 10.1.1.0/24 1416 RD: RD50 1417 RT: RT-VPNa 1418 ext-community: 1419 Red(100) 1420 nh: PE2 1421 END.DT4 SID: 5:6::16/128 1422 +-----------+ 1423 | IL1 | 1424 +-----------+ 1425 | IL2 | 1426 +---------+ +------------+ +-----------+ +-----------+ 1427 |src:PE1 | | src:PE1 | |src:PE1 | |src:PE1 | 1428 |dst:ASBR1| | dst:5:6::16| |dst:5:6::16| |dst:5:6::16| 1429 |SRH: SL=1| |SRH: SL = 0 | |SRH: SL=0 | |SRH: SL=0 | 1430 |5:6::16 | |5:6::16 | |5:6::16 | |5:6::16 | 1431 +---------+ +------------+ +-----------+ +-----------+ +----------+ 1432 | orig | | orig | | Orig | | Orig | | Orig | 1433 +---------+ +------------+ +-----------+ +-----------+ +----------+ 1435 Packet format along end-to-end path 1436 Orig is the original packet destined to 10.1.1.1 1437 IL1 and IL2 are intra-domain
labels corresponding to 1438 red intra-domain paths in the MPLS domain. 1440 Figure 14: SRv6 and MPLS interworking 1442 Figure 14 above describes an example where the core is an 1443 MPLS domain and the data centers deploy SRv6. In this example, 1444 an end-to-end path is built for the Red transport class. The SRv6 1445 domains in this example use best-effort paths. On PE2, locator 1446 5:6::/96 represents the Red transport class. PE2 would like 1447 traffic for service prefix 10.1.1.0/24 to use a Red transport class 1448 path. To accomplish this, PE2 creates two BGP advertisements, a VPN 1449 advertisement and an IPv6 unicast advertisement. 1451 PE2 creates a VPN advertisement using an END.DT4 SID derived from its 1452 Red locator 5:6::/96 (END.DT4 SID = 5:6::16/128 in this example). 1453 The VPN advertisement also associates the Red extended color 1454 community with the service prefix 10.1.1.0/24. 1456 PE2 also creates an IPv6 unicast BGP advertisement that associates the 1457 IPv6 prefix of the Red locator (5:6::/96) with the Red extended 1458 community. This advertisement allows PE1 as well as the ASBRs to 1459 have routes for 5:6::/96, and to associate those routes with the Red 1460 transport class where appropriate. 1462 The routes that make up the end-to-end path from PE1 to PE2 are 1463 described below. On PE1, the VPN prefix 10.1.1.0/24 will resolve on 1464 the locator prefix 5:6::/96. The prefix 5:6::/96 will then resolve 1465 on an SRv6/IPv6 tunnel to ASBR1. ASBR1 will have a normal IPv6 route 1466 for 5:6::/96 installed by BGP to reach ASBR2. On ASBR2, the prefix 1467 5:6::/96 will resolve on an MPLS tunnel belonging to the Red transport 1468 class terminating on ASBR3. The route for 5:6::/96 from ASBR3 to 1469 ASBR4 is again a simple IPv6 route installed by BGP. On ASBR4, both 1470 BGP and the IGP will provide a route for 5:6::/96. In general, the 1471 IGP route will be preferred and will become the active 1472 route.
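The recursive resolution on PE1 described above can be sketched as a simple lookup loop (hypothetical tables, illustrative only):

```python
# Hypothetical sketch of recursive route resolution on PE1: the VPN
# prefix resolves on the colored locator, and the locator resolves on
# the next-hop tunnel, mirroring the resolution chain described above.

routes = {
    "10.1.1.0/24": "5:6::/96",      # VPN prefix resolves on the Red locator
    "5:6::/96": "tunnel-to-ASBR1",  # locator resolves on an SRv6/IPv6 tunnel
}

def resolve(prefix: str) -> str:
    """Follow resolution steps until a forwarding object (tunnel) is found."""
    while prefix in routes:
        prefix = routes[prefix]
    return prefix

print(resolve("10.1.1.0/24"))  # tunnel-to-ASBR1
```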
In cases where a traffic engineered path is needed in the 1473 last SRv6 domain, the preference needs to be set appropriately by the 1474 administrator. 1476 Below is a description of the packet forwarding operations along the end- 1477 to-end path. On PE1, the original packet destined to 10.1.1.1 is 1478 encapsulated in an IPv6 header with one segment, the END.DT4 SID. The 1479 destination address is set to ASBR1. On ASBR1, segments left is 1480 decremented and the END.DT4 SID 5:6::16 is copied into the destination 1481 address. On ASBR1, forwarding is based on the locator route 1482 programmed by BGP. Between ASBR1 and ASBR2, normal IPv6 1483 forwarding is used. On ASBR2, an MPLS header corresponding to the Red Transport 1484 Class is pushed onto the packet. The MPLS header is removed when the 1485 packet reaches ASBR3, and normal IPv6 forwarding based on the locator 1486 route is performed. On ASBR4, since the best-effort path for locator 1487 5:6::/96 created by the IGP is used, normal IPv6 forwarding is 1488 used. The packet reaches PE2 with destination 5:6::16, which 1489 is present in the MyLocalSID table. The IPv6 header is decapsulated and a 1490 lookup for 10.1.1.1 is performed in the VPN table. 1492 The example described above has complete domain separation, where SRv6 1493 operations end on one border node and MPLS header operations are 1494 performed on the next border node. There may be cases where a single 1495 border node needs to perform both SRv6 and MPLS operations. A goal 1496 of the Seamless SR architecture is to avoid service routes on border 1497 nodes and provide seamless end-to-end connectivity for the services. 1498 In order to satisfy this goal for the single border node use case, a 1499 new SID type is defined. The END.DTM SID decapsulates the IPv6 1500 header and pushes an MPLS SID list. It is used to determine the MPLS 1501 labels for traffic flowing from an SRv6 domain to an MPLS domain.
1502 [draft-bonica-spring-srv6-end-dtm] describes this new SID 1503 and its operation in detail. 1505 6.16. Service Function Chaining 1507 Service Function Chaining involves steering traffic through an 1508 ordered set of service functions. Virtualized service functions may 1509 be deployed in a single Data Center location or across multiple Data 1510 Centers that are geographically separated. There are several 1511 different service function chaining solutions available. One set of 1512 solutions uses the source routing paradigm as described in 1513 [I-D.ietf-spring-sr-service-programming]. The source routing based 1514 solution may use SR-MPLS or SRv6, as described in the above draft. 1515 Another set of solutions uses stitched tunnels to achieve the traffic 1516 steering through service functions. The tunneling technology can be 1517 MPLS tunneling or IP tunneling. This set of solutions is described 1518 in [draft-hegde-spring-service-chaining-stitched-tunnel]. When a 1519 network deploys Seamless SR-based inter-domain solutions, it can 1520 deploy either of these solutions for service chaining. This section 1521 describes how service chaining is applied in a network that uses 1522 Seamless SR for inter-domain connectivity. For simplicity, the 1523 example below assumes service functions deployed in a single Data 1524 Center. The procedures are equally applicable when the service 1525 functions are spread across multiple geographically separated Data 1526 Centers.
1528 ----------------------------- -------------------- 1529 | --- | | | 1530 | | S1| TOR1 | | | Z 1531 | --- SP1 DCGW1 | / 1532 | --- | | WAN PE2 1533 | | S2| TOR2 | | | 1534 | --- SP2 DCGW2 | 1535 | --- | | | 1536 | | S3| TOR3 | | | 1537 | --- | | | 1538 |----------------------------- ------------------- 1539 BGP-CT BGP-CT 1540 |----------------|---------------------| 1542 Figure 15: SFC in a seamless SR based network 1544 Figure 15 shows a Data Center (DC) network connected to a WAN 1545 network. We assume the traffic is originating at S1 in the DC 1546 network and destined for Z in the WAN network. The traffic should go 1547 through service functions deployed on S2 and S3. DCGW1 and DCGW2 1548 are the border nodes between the DC domain and the WAN domain. BGP-CT is 1549 deployed to provide seamless end-to-end connectivity. We also assume 1550 that the DC network deploys a pure IP underlay, and that the WAN uses an 1551 MPLS underlay. BGP-CT is deployed on the Top-of-Rack switches 1552 (TORs), and BGP-CT sessions are running from the TORs to the DCGWs, 1553 and from the DCGWs to PE2. All the BGP-CT speakers will have SLA- 1554 specific forwarding entries to reach PE2. 1556 When source routed SFC is used 1557 [I-D.ietf-spring-sr-service-programming], a packet originating at S1 1558 will use an SR-MPLS or SRv6 SID-list to achieve service function 1559 chaining. In this example, the packet will have a SID-list 1560 corresponding to the service functions on S2 and S3. The SFC SID- 1561 list is removed by the time the packet leaves S3. The packet 1562 arrives at TOR3 with its original IP header exposed. On TOR3, a 1563 lookup is done for destination Z. The packet follows SLA-specific 1564 BGP-CT paths in both the DC and the WAN.
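The source-routed SFC behaviour described above can be sketched as follows (a hypothetical illustration: the SID names are invented, and each hop simply consumes its own service-function SID, so the list is gone by the time the packet leaves S3):

```python
# Hypothetical sketch of source-routed SFC: the packet from S1 carries
# a SID list for the service functions on S2 and S3; each hop pops the
# active SID, leaving the original IP packet exposed at TOR3.

def traverse_chain(sid_list, payload):
    """Pop one service-function SID per hop; return the hops visited
    and the payload that remains after the chain."""
    visited = []
    while sid_list:
        sid = sid_list.pop(0)   # active segment at this hop
        visited.append(sid)     # service function applied here
    return visited, payload     # original IP packet, looked up at TOR3

hops, pkt = traverse_chain(["SF@S2", "SF@S3"], "ip-packet-to-Z")
print(hops)  # ['SF@S2', 'SF@S3']
print(pkt)   # ip-packet-to-Z
```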
1566 When the stitched tunnel mechanism is used for service chaining 1567 [draft-hegde-spring-service-chaining-stitched-tunnel], it is typical 1568 for an overlay orchestrator to build the tunnels in the DC fabric 1569 for S1->S2 and S2->S3. The overlay orchestrator also provisions 1570 the appropriate firewall filters to steer the traffic across these 1571 stitched tunnels. When the packet arrives at S3, all service 1572 functions have been applied and a lookup on the original IP header is 1573 done. In this case, the packet also follows SLA-specific BGP-CT paths 1574 in both the DC and the WAN. 1576 6.17. BGP based Multicast 1578 BGP based multicast, as described in 1579 [I-D.zzhang-bess-bgp-multicast], serves two main purposes. It can 1580 replace PIM/mLDP inside a domain to provide native BGP based 1581 multicast. It can also serve as an overlay stitching protocol to 1582 stitch multiple P2MP LSPs across domains. This gives the ability 1583 to easily transition each domain independently from one technology to 1584 the other. BGP based multicast defines a new SAFI, the 1585 MULTICAST TREE SAFI. Different route types are defined to support 1586 the various use cases. Section 1.2.6 of 1587 [I-D.zzhang-bess-bgp-multicast] describes the use of the new SAFI for 1588 stitching the multicast tunnels across different domains. 1590 7. Backward Compatibility 1592 8. Security Considerations 1594 TBD 1596 9. IANA Considerations 1598 10. Acknowledgements 1600 Many thanks to Kireeti Kompella, Ron Bonica, Krzysztof Szarkowicz, 1601 Srihari Sangli, Julian Lucek, and Ram Santhanakrishnan for discussions and 1602 inputs. Thanks to Joel Halpern for review and comments. 1604 11. Contributors 1606 1. Kaliraj Vairavakkalai 1608 Juniper Networks 1610 kaliraj@juniper.net 1612 2. Jeffrey Zhang 1614 Juniper Networks 1616 zzhang@juniper.net 1618 12. References 1619 12.1. Normative References 1621 [I-D.hegde-rtgwg-egress-protection-sr-networks] 1622 Hegde, S., Lin, W., and S.
Peng, "Egress Protection for 1623 Segment Routing (SR) networks", draft-hegde-rtgwg-egress- 1624 protection-sr-networks-01 (work in progress), November 1625 2020. 1627 [I-D.ietf-idr-performance-routing] 1628 Xu, X., Hegde, S., Talaulikar, K., Boucadair, M., and C. 1629 Jacquenet, "Performance-based BGP Routing Mechanism", 1630 draft-ietf-idr-performance-routing-03 (work in progress), 1631 December 2020. 1633 [I-D.kaliraj-idr-bgp-classful-transport-planes] 1634 Vairavakkalai, K., Venkataraman, N., Rajagopalan, B., 1635 Mishra, G., Khaddam, M., and X. Xu, "BGP Classful 1636 Transport Planes", draft-kaliraj-idr-bgp-classful- 1637 transport-planes-06 (work in progress), January 2021. 1639 [I-D.zzhang-bess-bgp-multicast] 1640 Zhang, Z., Giuliano, L., Patel, K., Wijnands, I., mishra, 1641 m., and A. Gulko, "BGP Based Multicast", draft-zzhang- 1642 bess-bgp-multicast-03 (work in progress), October 2019. 1644 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1645 Requirement Levels", BCP 14, RFC 2119, 1646 DOI 10.17487/RFC2119, March 1997, 1647 . 1649 [RFC3107] Rekhter, Y. and E. Rosen, "Carrying Label Information in 1650 BGP-4", RFC 3107, DOI 10.17487/RFC3107, May 2001, 1651 . 1653 [RFC8669] Previdi, S., Filsfils, C., Lindem, A., Ed., Sreekantiah, 1654 A., and H. Gredler, "Segment Routing Prefix Segment 1655 Identifier Extensions for BGP", RFC 8669, 1656 DOI 10.17487/RFC8669, December 2019, 1657 . 1659 12.2. Informative References 1661 [I-D.hegde-spring-node-protection-for-sr-te-paths] 1662 Hegde, S., Bowers, C., Litkowski, S., Xu, X., and F. Xu, 1663 "Node Protection for SR-TE Paths", draft-hegde-spring- 1664 node-protection-for-sr-te-paths-07 (work in progress), 1665 July 2020. 1667 [I-D.ietf-idr-link-bandwidth] 1668 Mohapatra, P. and R. Fernando, "BGP Link Bandwidth 1669 Extended Community", draft-ietf-idr-link-bandwidth-07 1670 (work in progress), March 2018. 
1672 [I-D.ietf-idr-segment-routing-te-policy] 1673 Previdi, S., Filsfils, C., Talaulikar, K., Mattes, P., 1674 Rosen, E., Jain, D., and S. Lin, "Advertising Segment 1675 Routing Policies in BGP", draft-ietf-idr-segment-routing- 1676 te-policy-11 (work in progress), November 2020. 1678 [I-D.ietf-idr-tunnel-encaps] 1679 Patel, K., Velde, G., Sangli, S., and J. Scudder, "The BGP 1680 Tunnel Encapsulation Attribute", draft-ietf-idr-tunnel- 1681 encaps-20 (work in progress), November 2020. 1683 [I-D.ietf-lsr-flex-algo] 1684 Psenak, P., Hegde, S., Filsfils, C., Talaulikar, K., and 1685 A. Gulko, "IGP Flexible Algorithm", draft-ietf-lsr-flex- 1686 algo-13 (work in progress), October 2020. 1688 [I-D.ietf-mpls-seamless-mpls] 1689 Leymann, N., Decraene, B., Filsfils, C., Konstantynowicz, 1690 M., and D. Steinberg, "Seamless MPLS Architecture", draft- 1691 ietf-mpls-seamless-mpls-07 (work in progress), June 2014. 1693 [I-D.ietf-pce-segment-routing-policy-cp] 1694 Koldychev, M., Sivabalan, S., Barth, C., Peng, S., and H. 1695 Bidgoli, "PCEP extension to support Segment Routing Policy 1696 Candidate Paths", draft-ietf-pce-segment-routing-policy- 1697 cp-01 (work in progress), October 2020. 1699 [I-D.ietf-rtgwg-segment-routing-ti-lfa] 1700 Litkowski, S., Bashandy, A., Filsfils, C., Decraene, B., 1701 and D. Voyer, "Topology Independent Fast Reroute using 1702 Segment Routing", draft-ietf-rtgwg-segment-routing-ti- 1703 lfa-05 (work in progress), November 2020. 1705 [I-D.ietf-spring-segment-routing-policy] 1706 Filsfils, C., Talaulikar, K., Voyer, D., Bogdanov, A., and 1707 P. Mattes, "Segment Routing Policy Architecture", draft- 1708 ietf-spring-segment-routing-policy-09 (work in progress), 1709 November 2020. 1711 [I-D.ietf-spring-sr-service-programming] 1712 Clad, F., Xu, X., Filsfils, C., daniel.bernier@bell.ca, 1713 d., Li, C., Decraene, B., Ma, S., Yadlapalli, C., 1714 Henderickx, W., and S. 
Salsano, "Service Programming with 1715 Segment Routing", draft-ietf-spring-sr-service- 1716 programming-03 (work in progress), September 2020. 1718 [I-D.ietf-spring-srv6-network-programming] 1719 Filsfils, C., Camarillo, P., Leddy, J., Voyer, D., 1720 Matsushima, S., and Z. Li, "SRv6 Network Programming", 1721 draft-ietf-spring-srv6-network-programming-28 (work in 1722 progress), December 2020. 1724 [I-D.saad-sr-fa-link] 1725 Saad, T., Beeram, V., Barth, C., and S. Sivabalan, 1726 "Segment-Routing over Forwarding Adjacency Links", draft- 1727 saad-sr-fa-link-02 (work in progress), July 2020. 1729 [I-D.voyer-pim-sr-p2mp-policy] 1730 Voyer, D., Filsfils, C., Parekh, R., Bidgoli, H., and Z. 1731 Zhang, "Segment Routing Point-to-Multipoint Policy", 1732 draft-voyer-pim-sr-p2mp-policy-02 (work in progress), July 1733 2020. 1735 [RFC1997] Chandra, R., Traina, P., and T. Li, "BGP Communities 1736 Attribute", RFC 1997, DOI 10.17487/RFC1997, August 1996, 1737 . 1739 [RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private 1740 Networks (VPNs)", RFC 4364, DOI 10.17487/RFC4364, February 1741 2006, . 1743 [RFC4684] Marques, P., Bonica, R., Fang, L., Martini, L., Raszuk, 1744 R., Patel, K., and J. Guichard, "Constrained Route 1745 Distribution for Border Gateway Protocol/MultiProtocol 1746 Label Switching (BGP/MPLS) Internet Protocol (IP) Virtual 1747 Private Networks (VPNs)", RFC 4684, DOI 10.17487/RFC4684, 1748 November 2006, . 1750 [RFC5357] Hedayat, K., Krzanowski, R., Morton, A., Yum, K., and J. 1751 Babiarz, "A Two-Way Active Measurement Protocol (TWAMP)", 1752 RFC 5357, DOI 10.17487/RFC5357, October 2008, 1753 . 1755 [RFC6388] Wijnands, IJ., Ed., Minei, I., Ed., Kompella, K., and B. 1756 Thomas, "Label Distribution Protocol Extensions for Point- 1757 to-Multipoint and Multipoint-to-Multipoint Label Switched 1758 Paths", RFC 6388, DOI 10.17487/RFC6388, November 2011, 1759 . 1761 [RFC7311] Mohapatra, P., Fernando, R., Rosen, E., and J. 
Uttaro, 1762 "The Accumulated IGP Metric Attribute for BGP", RFC 7311, 1763 DOI 10.17487/RFC7311, August 2014, 1764 . 1766 [RFC7471] Giacalone, S., Ward, D., Drake, J., Atlas, A., and S. 1767 Previdi, "OSPF Traffic Engineering (TE) Metric 1768 Extensions", RFC 7471, DOI 10.17487/RFC7471, March 2015, 1769 . 1771 [RFC7510] Xu, X., Sheth, N., Yong, L., Callon, R., and D. Black, 1772 "Encapsulating MPLS in UDP", RFC 7510, 1773 DOI 10.17487/RFC7510, April 2015, 1774 . 1776 [RFC7665] Halpern, J., Ed. and C. Pignataro, Ed., "Service Function 1777 Chaining (SFC) Architecture", RFC 7665, 1778 DOI 10.17487/RFC7665, October 2015, 1779 . 1781 [RFC8029] Kompella, K., Swallow, G., Pignataro, C., Ed., Kumar, N., 1782 Aldrin, S., and M. Chen, "Detecting Multiprotocol Label 1783 Switched (MPLS) Data-Plane Failures", RFC 8029, 1784 DOI 10.17487/RFC8029, March 2017, 1785 . 1787 [RFC8287] Kumar, N., Ed., Pignataro, C., Ed., Swallow, G., Akiya, 1788 N., Kini, S., and M. Chen, "Label Switched Path (LSP) 1789 Ping/Traceroute for Segment Routing (SR) IGP-Prefix and 1790 IGP-Adjacency Segment Identifiers (SIDs) with MPLS Data 1791 Planes", RFC 8287, DOI 10.17487/RFC8287, December 2017, 1792 . 1794 [RFC8402] Filsfils, C., Ed., Previdi, S., Ed., Ginsberg, L., 1795 Decraene, B., Litkowski, S., and R. Shakir, "Segment 1796 Routing Architecture", RFC 8402, DOI 10.17487/RFC8402, 1797 July 2018, . 1799 [RFC8570] Ginsberg, L., Ed., Previdi, S., Ed., Giacalone, S., Ward, 1800 D., Drake, J., and Q. Wu, "IS-IS Traffic Engineering (TE) 1801 Metric Extensions", RFC 8570, DOI 10.17487/RFC8570, March 1802 2019, . 1804 [RFC8604] Filsfils, C., Ed., Previdi, S., Dawra, G., Ed., 1805 Henderickx, W., and D. Cooper, "Interconnecting Millions 1806 of Endpoints with Segment Routing", RFC 8604, 1807 DOI 10.17487/RFC8604, June 2019, 1808 . 1810 [RFC8679] Shen, Y., Jeganathan, M., Decraene, B., Gredler, H., 1811 Michel, C., and H. 
Chen, "MPLS Egress Protection 1812 Framework", RFC 8679, DOI 10.17487/RFC8679, December 2019, 1813 . 1815 [TS.23.501-3GPP] 1816 3rd Generation Partnership Project (3GPP), "System 1817 Architecture for 5G System; Stage 2, 3GPP TS 23.501 1818 v16.4.0", March 2020. 1820 Authors' Addresses 1822 Shraddha Hegde 1823 Juniper Networks Inc. 1824 Exora Business Park 1825 Bangalore, KA 560103 1826 India 1828 Email: shraddha@juniper.net 1830 Chris Bowers 1831 Juniper Networks Inc. 1833 Email: cbowers@juniper.net 1835 Xiaohu Xu 1836 Alibaba Inc. 1837 Beijing 1838 China 1840 Email: xiaohu.xxh@alibaba-inc.com 1842 Arkadiy Gulko 1843 Refinitiv 1845 Email: arkadiy.gulko@refinitiv.com 1846 Alex Bogdanov 1847 Google Inc. 1849 Email: bogdanov@google.com 1851 James Uttaro 1852 ATT 1854 Email: ju1738@att.com 1856 Luay Jalil 1857 Verizon 1859 Email: luay.jalil@verizon.com 1861 Mazen Khaddam 1862 Cox communications 1864 Email: mazen.khaddam@cox.com 1866 Andrew Alston 1867 Liquid Telecom 1869 Email: andrew.alston@liquidtelecom.com