SPRING                                                          S. Hegde
Internet-Draft                                                 C. Bowers
Intended status: Standards Track                   Juniper Networks Inc.
Expires: March 25, 2021                                            X. Xu
                                                            Alibaba Inc.
                                                                A. Gulko
                                                               Refinitiv
                                                             A. Bogdanov
                                                             Google Inc.
                                                               J. Uttaro
                                                                     ATT
                                                                L. Jalil
                                                                 Verizon
                                                      September 21, 2020


                        Seamless Segment Routing
                 draft-hegde-spring-mpls-seamless-sr-02

Abstract

   In order to operate networks with large numbers of devices, network
   operators organize networks into multiple smaller network domains.
   Each network domain typically runs an IGP which has complete
   visibility within its own domain, but limited visibility outside of
   its domain.  Seamless Segment Routing (Seamless SR) provides
   flexible, scalable and reliable end-to-end connectivity for services
   across independent network domains.  Seamless SR accommodates domains
   using SR, LDP, and RSVP for MPLS label distribution as well as
   domains running IP without MPLS (IP-Fabric).

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on March 25, 2021.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Use Cases
     3.1.  Service provider network
     3.2.  Large scale WAN networks
     3.3.  Data Center Interconnect (DCI) Networks
     3.4.  Multicast Use cases
   4.  Requirements
     4.1.  MPLS Transport
     4.2.  SLA Guarantee
     4.3.  Scalability
     4.4.  Availability
     4.5.  Operations
     4.6.  Service Mapping
   5.  Seamless Segment Routing architecture
     5.1.  Solution Concepts
     5.2.  BGP Classful Transport
     5.3.  Automatically Creating Transport Classes
       5.3.1.  Automatically Creating Transport Classes for BGP-SR-TE
               Intra-domain Tunnels
       5.3.2.  Automatically Creating Transport Classes for Flex-Algo
               Tunnels
       5.3.3.  Auto-deriving Transport Classes for PCEP
     5.4.  Inter-domain flex-algo with BGP-CT
     5.5.  Applicability to color-only policies
     5.6.  Data sovereignty
     5.7.  Interconnecting IP Fabric Data Centers
     5.8.  Translating Transport Classes across Domains
     5.9.  SLA Guarantee
       5.9.1.  Low latency
       5.9.2.  Traffic Engineering (TE) constraints
       5.9.3.  Bandwidth constraints
     5.10. Scalability
       5.10.1.  Access node scalability
       5.10.2.  Label stack depth
       5.10.3.  Label Resources
     5.11. Availability
       5.11.1.  Intra domain link and node protection
       5.11.2.  Egress link and node protection
       5.11.3.  Border Node protection
     5.12. Operations
       5.12.1.  MPLS ping and Traceroute
       5.12.2.  Counters and Statistics
     5.13. Service Mapping
     5.14. Migrations
     5.15. Interworking with v6 transport technologies
     5.16. BGP based Multicast
   6.  Backward Compatibility
   7.  Security Considerations
   8.  IANA Considerations
   9.  Acknowledgements
   10. Contributors
   11. References
     11.1.  Normative References
     11.2.  Informative References
   Authors' Addresses

1.  Introduction

   Evolving wireless access technology and cloud applications are
   expected to place new requirements on packet transport networks.
   These services contribute to significantly higher bandwidth
   requirements, which in turn lead to a growing number of transport
   network devices.  As an example, 5G networks are expected to require
   up to 250 Gbps in the fronthaul and up to 400 Gbps in the backhaul.
   There is a desire to allow many network functions to be virtualized
   and cloud native.  In order to support latency-sensitive cloud-native
   network functions, packet transport networks should be capable of
   providing low-latency paths end-to-end.  Some services will require
   low-latency paths while others may require different QoS properties.
   The network should be able to differentiate between the services and
   provide transport paths with the corresponding SLAs.  In addition,
   as these applications become more sensitive and less tolerant of
   loss, increasing emphasis is placed on overall service availability
   and reliability.

   The Seamless SR architecture builds upon the Seamless MPLS
   architecture and addresses new requirements imposed by 5G transport
   networks and cloud applications.  Although
   [I-D.ietf-mpls-seamless-mpls] has not been published as an RFC, it
   provides a useful description of the Seamless MPLS architecture.
   The Seamless MPLS architecture uses LDP and/or RSVP for intra-domain
   label distribution, and BGP-LU [RFC3107] for end-to-end label
   distribution.  Seamless SR focuses on using segment routing for
   intra-domain label distribution.  The mechanisms described in this
   document are equally applicable to intra-domain tunneling mechanisms
   deployed using RSVP and/or LDP.

   By using segment routing for intra-domain label distribution,
   Seamless SR can easily support SR-MPLS on both IPv4 and IPv6
   networks.  This overcomes a limitation of the classic Seamless MPLS
   architecture, which in practice was limited to running MPLS on IPv4
   networks.  Seamless SR (like Seamless MPLS) can use BGP-LU [RFC3107]
   to stitch different domains.  However, Seamless SR can also take
   advantage of BGP Prefix-SID [RFC8669] to provide predictable and
   deterministic labels for inter-domain connectivity.

   The basic functionality of the Seamless SR architecture does not
   require any enhancements to existing protocols.  However, in order to
   support end-to-end service requirements across multiple domains,
   protocol extensions may be needed.  This draft discusses use cases,
   requirements, and potential protocol enhancements.

2.  Terminology

   This document uses the following terminology:

   o  Access Node (AN): An access node is a node which processes
      customers' frames or packets at Layer 2 or above.  This includes,
      but is not limited to, DSLAMs and Cell Site Routers in 5G
      networks.  Access nodes have only limited MPLS functionality in
      order to reduce complexity in the access network.

   o  Pre-Aggregation Node (P-AGG): A pre-aggregation node is a node
      which aggregates several access nodes (ANs).

   o  Aggregation Node (AGG): An aggregation node is a node which
      aggregates several pre-aggregation nodes (P-AGGs).

   o  Area Border Router (ABR): A router between the aggregation and
      core domains.

   o  Label Switch Router (LSR): Label Switch Routers are pure transit
      nodes.  They ideally have no customer or service state and are
      therefore decoupled from service creation.

   o  Use Case: Describes a typical network including service creation
      points and distribution of remote node loopback prefixes.

                         Figure 1: Terminology

3.  Use Cases

3.1.  Service provider network

   Service provider transport networks use multiple domains to support
   scalability.  For this analysis, we consider a representative
   network design with four levels of hierarchy: access domains,
   pre-aggregation domains, aggregation domains and a core (see
   Figure 2).  5G transport networks in particular are expected to
   scale to a very large number of access nodes due to the shorter
   range of 5G radio technology.  These networks are expected to scale
   up to one million nodes.

              +-------+   +-------+   +------+   +------+
              |       |   |       |   |      |   |      |
           +--+ P-AGG1+---+ AGG1  +---+ ABR1 +---+ LSR1 +--> to ABR
          /   |       |  /|       |   |      |   |      |
   +----+/    +-------+\/ +-------+   +------+  /+------+
   | AN |              /\                     \/
   +----+\    +-------+  \+-------+   +------+/\ +------+
          \   |       |   |       |   |      |  \|      |
           +--+ P-AGG2+---+ AGG2  +---+ ABR2 +---+ LSR2 +--> to ABR
              |       |   |       |   |      |   |      |
              +-------+   +-------+   +------+   +------+

                     ISIS L1          ISIS L2    ISIS L2

   |-Access-|--Aggregation Domain--|---------Core-----------------|

                          Figure 2: 5G network

   Many network functions in a 5G network will be virtualized or
   containerized and distributed across multiple data centers.
   Virtualized network functions are instantiated dynamically across
   different compute resources.  This requires the underlying transport
   network to support stringent SLAs on end-to-end paths.

   5G networks support a variety of service use cases that require
   end-to-end slicing.
   In certain cases, the end-to-end connectivity requires
   differentiated forwarding capabilities.  The Seamless SR architecture
   should provide the ability to establish end-to-end paths that satisfy
   the required SLAs.  For example, an end user may require an
   end-to-end low-latency path.  The System Architecture for the 5G
   System [TS.23.501-3GPP] currently defines four standardized
   Slice/Service Types: Enhanced Mobile Broadband (eMBB), Ultra-Reliable
   Low Latency Communication (URLLC), massive Internet of Things (mIoT),
   and Vehicle to everything (V2X).  Seamless SR should support
   end-to-end Service Level Objectives (SLOs) to allow the creation of
   network slices with these four Slice/Service Types.

   Many deployments use ring topologies in the access and aggregation
   networks.  In ring topologies, there are at most two forwarding
   paths for traffic, whereas core networks consist of more densely
   connected nodes.  Thus, core networks may have a larger number of TE
   paths while access networks will have a smaller number of TE paths.
   The Seamless SR architecture should support having more TE paths in
   one domain and fewer TE paths in another domain, and should provide
   the ability to effectively connect the domains end-to-end while
   satisfying end-to-end constraints.

3.2.  Large scale WAN networks

   As WAN networks grow beyond several thousand nodes, it is often
   useful to divide the network into multiple IGP domains, as
   illustrated in Figure 3.  Separate IGP domains increase service
   availability by establishing constrained failure domains.  Smaller
   IGP domains may also improve network performance and health by
   reducing the device scale profile (including protocol and FIB
   scale).

         +-----------+     +-----------+     +-----------+
         |           |     |           |     |           |
         |        ABR1     ABR2     ABR3     ABR4        |
         |           |     |           |     |           |
      PE1+  DOMAIN1  +-----+  DOMAIN2  +-----+  DOMAIN3  +PE2
         |           |     |           |     |           |
         |       ABR11     ABR22   ABR33     ABR44       |
         |           |     |           |     |           |
         +-----------+     +-----------+     +-----------+

         |---ISIS1---|     |---ISIS2---|     |---ISIS3---|

                         Figure 3: WAN Network

   These large WAN networks often cross national boundaries.  In order
   to meet data sovereignty requirements, operators need to maintain
   strict control over end-to-end traffic-engineered (TE) paths.
   Segment Routing provides two main solutions to implement highly
   constrained TE paths.  Flex-algo (defined in
   [I-D.ietf-lsr-flex-algo]) uses prefix-SIDs computed by all nodes in
   the IGP domain using the same pruned topology.  Highly constrained
   TE paths for the data sovereignty use case can also be implemented
   using SR-TE policies ([I-D.ietf-spring-segment-routing-policy])
   built using unprotected adjacency SIDs.

   Both of these approaches work well for intra-domain TE paths.
   However, both have limitations when extended to the creation of
   highly constrained inter-domain TE paths.  A goal of Seamless SR is
   to be able to create highly constrained inter-domain TE paths in a
   scalable manner.

   Some deployments may use a centralized controller to acquire the
   topologies of multiple domains and build end-to-end constrained
   paths.  This can be scaled with hierarchical controllers.  However,
   there is still a significant risk of losing network connectivity to
   one or more controllers, which can result in a failure to satisfy
   the strict requirements of data sovereignty.  To address these
   failure scenarios, the network should have pre-established
   end-to-end TE paths that do not rely on controllers.

3.3.  Data Center Interconnect (DCI) Networks

   Data centers are playing an increasingly important role in providing
   access to information and applications.
   Geographically diverse data centers usually connect via a
   high-speed, reliable and secure core network.

         +-------------+     +-------------+     +-------------+
         |         ASBR1     ASBR2     ASBR3     ASBR4         |
         |             |     |             |     |             |
      PE1+     DC1     +-----+    CORE     +-----+     DC2     +PE2
         |        ASBR11     ASBR22   ASBR33     ASBR44        |
         |             |     |             |     |             |
         +-------------+     +-------------+     +-------------+

         |----ISIS1----|     |----ISIS2----|     |----ISIS3----|

                         Figure 4: DCI Network

   In many Data Center deployments, applications require end-to-end
   path diversity and/or end-to-end low-latency paths.  It is desirable
   to have a uniform technology deployed in the core as well as in the
   Data Centers to create these SLA paths.  Such uniformity simplifies
   the network considerably.  It is also desirable for a solution to
   require service-related configuration only on the access end-points
   where services are attached, avoiding service-related configuration
   on the ABR/ASBR nodes.

3.4.  Multicast Use cases

   Multicast services such as IPTV and multicast VPN also need to be
   supported across a multi-domain service provider network.

      +-----------+-----------+-----------+
      |           |           |           |
      S1         ABR1        ABR2        R1
      |   Metro1  |    Core   |   Metro2  |
      |           |           |           |
      S2        ABR11       ABR22        R2
      |           |           |           |
      +-----------+-----------+-----------+

      |---ISIS1---|---ISIS2---|---ISIS3---|

                     Figure 5: Multicast use cases

   Figure 5 shows a simplified multi-domain network supporting
   multicast.  Multicast sources S1 and S2 lie in a different domain
   from the receivers R1 and R2.  Using multiple IGP domains presents a
   problem for the establishment of multicast replication trees.
   Typically, a multicast receiver does a reverse path forwarding (RPF)
   lookup for a multicast source.  One solution is to leak the routes
   for multicast sources across the IGP domains.  However, this can
   compromise the scaling properties of the multi-domain architecture.
   SR-P2MP [I-D.voyer-pim-sr-p2mp-policy] offers a solution for both
   intra-domain and inter-domain multicast.  However, it does not
   accommodate deployments using existing intra-domain multicast
   technology, such as mLDP [RFC6388], in some of the domains.  A
   solution should accommodate a mixture of existing and newer
   technologies to better facilitate coexistence and migration.

4.  Requirements

   This section summarizes the requirements derived from the use cases
   described in the previous sections.

4.1.  MPLS Transport

   The architecture SHOULD provide MPLS transport between two service
   endpoints regardless of whether the two endpoints are in the same
   IGP domain, different IGP domains, or different autonomous systems.

   The MPLS transport SHOULD be supported on IPv4, IPv6, and dual-stack
   networks.

4.2.  SLA Guarantee

   The architecture SHOULD allow the creation of paths that support
   end-to-end SLAs.  For example, the paths should obey constraints
   related to latency, diversity, bandwidth, and availability.

   The architecture SHOULD support end-to-end network slicing as
   described by 5G transport requirements [TS.23.501-3GPP].

4.3.  Scalability

   The architecture SHOULD be able to support up to 1 million nodes.

   The architecture SHOULD facilitate the use of access nodes with
   limited RIB/FIB capacity and low CPU capabilities.

   The architecture SHOULD facilitate the use of access nodes with a
   limited label stack depth.

   The architecture SHOULD allow for a scalable response to network
   events.  An individual node SHOULD only need to respond to a limited
   subset of network events.

   Service routes on the border nodes SHOULD be minimized.

4.4.  Availability

   Traffic SHOULD be Fast Reroute (FRR) protected against link, node,
   and SRLG failures within a domain.

   Traffic SHOULD be Fast Reroute (FRR) protected against border node
   failures.

   Traffic SHOULD be Fast Reroute (FRR) protected against egress node
   and egress link failures.

4.5.  Operations

   Each domain SHOULD be independent and SHOULD NOT depend on the
   transport technology in another domain.  This allows for more
   flexible evolution of the network.

   Basic MPLS OAM mechanisms described in [RFC8029] SHOULD be
   supported.

   End-to-end MPLS ping and traceroute procedures SHOULD be supported.

   The ability to validate the path inside each domain SHOULD be
   supported.

   Statistics for inter-domain paths on the ingress and egress PE nodes
   as well as on border nodes SHOULD be supported.

4.6.  Service Mapping

   The architecture SHOULD support the automated steering of traffic
   onto transport paths based on communities carried in the service
   prefix advertisements.

   The architecture SHOULD support the steering of traffic onto
   transport paths based on the DSCP value carried in IPv4/IPv6
   packets.

   Traffic steering based on the EXP bits in the MPLS header SHOULD be
   supported.

   Traffic steering based on a 5-tuple packet filter SHOULD be
   supported.  It should be possible to match on the source address,
   destination address, source port, destination port, and protocol
   fields.

   All traffic steering mechanisms SHOULD be supported for all kinds of
   service traffic, including VPN traffic as well as global Internet
   traffic.

   The core domain is expected to have more traffic engineering
   constraints than the metro domains.  The ability to map services to
   appropriate transport tunnels at service attachment points SHOULD be
   supported.

5.  Seamless Segment Routing architecture

5.1.  Solution Concepts

   The solution described below makes use of the following concepts.

   o  Transport Class (TC): A Transport Class is defined as a
      collection of end-to-end MPLS paths that satisfy a set of
      constraints or Service Level Agreements.

   o  BGP-Classful Transport (BGP-CT): A new BGP family used to
      establish Transport Class paths across different domains.

   o  Route Distinguisher (RD): The Route Distinguisher is defined in
      [RFC4364].  In BGP-CT, the RD is used in BGP advertisements to
      differentiate multiple paths to the same loopback address.  It
      may be useful to automatically generate RDs in order to simplify
      configuration.

   o  Route Target (RT): The Route Target extended community is carried
      in BGP-CT advertisements.  The RT represents the Transport Class
      of an advertised path.  Note that the RT is only carried in the
      BGP-CT advertisements.  No BGP-VPN related configuration or VPN
      family advertisements are needed when BGP-CT transport paths are
      used to carry non-VPN traffic.

   o  Mapping Community (MC): The Mapping Community is a BGP extended
      community as defined in RFC 4360.  In the Seamless SR
      architecture, an MC is carried by a BGP-CT route and/or a service
      route.  The MC is used to identify the specific local policy used
      to map traffic for a service route to different Transport Class
      paths.  When a mapping community is advertised in a BGP-CT route,
      it identifies the specific local policy used to map the BGP-CT
      route to the intra-domain tunnels.  The local policy can include
      additional traffic steering properties for placing traffic on
      different Transport Class paths.  The values of the MCs and the
      corresponding local policies for service mapping are defined by
      the network operator.

                      Figure 6: Solution Concepts

5.2.  BGP Classful Transport

      ----IBGP----EBGP----IBGP----EBGP-----IBGP----
      |           |   |           |   |           |

      +-----------+   +-----------+   +-----------+
      |           |   |           |   |           |
      |      ASBR1+---+ASBR2 ASBR3+---+ASBR4      |
   PE1+    D1     | X |    D2     | X |    D3     +PE2
      |      ASBR5+---+ASBR6 ASBR7+---+ASBR8      |
      |           |   |           |   |           |
      +-----+-----+   +-----------+   +-----------+
           PE3

      |---ISIS1---|   |---ISIS2---|   |---ISIS3---|

                         Figure 7: WAN Network

   Figure 7 shows a WAN network divided into three domains.  Within
   each domain, BGP sessions are established between the PE nodes and
   the border nodes as well as between border nodes.  BGP sessions are
   also established between border nodes across domains.  The goal is
   for PE1 to have MPLS connectivity to PE2 with specific
   characteristics.  Multiple MPLS paths from PE1 to PE2 are required in
   order to satisfy different SLAs.

   [I-D.kaliraj-idr-bgp-classful-transport-planes] defines a new BGP
   family called BGP Classful Transport.  The NLRI for this new family
   consists of a prefix and a Route Distinguisher.  The prefix
   corresponds to the loopback of the destination PE, and the RD is
   used to distinguish different paths to the same PE loopback.  The
   BGP-CT advertisement also carries a Route Target.  The RT specifies
   the Transport Class to which the BGP-CT advertisement belongs.
   BGP-CT mechanisms are applicable to single-ownership networks that
   are organized into multiple domains.  They are also applicable to
   multiple ASes with different ownership but closely co-operating
   administration.  BGP-CT mechanisms are not expected to be applied to
   Internet peering or between domains that have completely independent
   administrations.
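
   To make the NLRI structure concrete, the following Python sketch
   models a BGP-CT route as an (RD, prefix) key plus the RT, next hop,
   and label attributes, using the PE2 advertisements shown in
   Figures 8 and 9.  It is an illustrative data model under stated
   assumptions, not the wire encoding defined in
   [I-D.kaliraj-idr-bgp-classful-transport-planes]; the names BgpCtRoute
   and build_rib are hypothetical.  The toy RIB keeps one route per
   NLRI, which is enough to show why PE2 needs distinct RDs for the two
   Transport Classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BgpCtRoute:
    """Hypothetical, simplified model of one BGP-CT advertisement.

    Only the fields discussed in this section are modeled; this is
    not the on-the-wire NLRI or path attribute encoding.
    """

    rd: str        # Route Distinguisher, e.g. "RD1"
    prefix: str    # loopback of the destination PE, e.g. "PE2"
    rt: str        # Route Target naming the Transport Class, e.g. "Red"
    next_hop: str  # BGP next hop of the advertisement
    label: str     # BGP-CT label advertised with the route

    @property
    def nlri(self) -> tuple:
        # The NLRI is the RD plus the prefix; the RT is an attribute.
        return (self.rd, self.prefix)


def build_rib(routes):
    """Index routes by NLRI, keeping the last route seen for each NLRI.

    Real BGP runs best-path selection among paths for the same NLRI;
    this toy version only needs to show when two routes collide.
    """
    rib = {}
    for route in routes:
        rib[route.nlri] = route
    return rib


# PE2 originates one BGP-CT route per Transport Class.
red = BgpCtRoute(rd="RD1", prefix="PE2", rt="Red",
                 next_hop="PE2", label="L5")
blue = BgpCtRoute(rd="RD2", prefix="PE2", rt="Blue",
                  next_hop="PE2", label="L15")
rib = build_rib([red, blue])

# Reusing RD1 for the blue route would make both share one NLRI.
blue_same_rd = BgpCtRoute(rd="RD1", prefix="PE2", rt="Blue",
                          next_hop="PE2", label="L15")
collided = build_rib([red, blue_same_rd])

print(len(rib), len(collided))  # prints "2 1"
```

   With distinct RDs both Transport Classes survive as separate entries;
   with a shared RD one advertisement displaces the other even though
   the RTs differ, because the RT is not part of the NLRI.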

   BGP-CT advertisements for red Transport Class

      Prefix:PE2  Prefix:PE2  Prefix:PE2  Prefix:PE2  Prefix:PE2
      RD:RD1      RD:RD1      RD:RD1      RD:RD1      RD:RD1
      RT:Red      RT:Red      RT:Red      RT:Red      RT:Red(100)
      nh:ASBR1    nh:ASBR2    nh:ASBR3    nh:ASBR4    nh:PE2
      Label:L1    Label:L2    Label:L3    Label:L4    Label:L5

   PE1---------ASBR1-------ASBR2-------ASBR3-------ASBR4-------PE2

                                                      VPNa Prefix:
                                                        10.1.1.1/32
                                                      RD: RD50
                                                      RT: RT-VPNa
                                                      ext-community:
                                                        Red(100)
                                                      nh: PE2
                                                      Label: S1

      +------+                +------+                +------+
      | IL71 |                | IL72 |                | IL73 |
      +------+    +------+    +------+    +------+    +------+
      |  L1  |    |  L2  |    |  L3  |    |  L4  |    |  L5  |
      +------+    +------+    +------+    +------+    +------+
      |  S1  |    |  S1  |    |  S1  |    |  S1  |    |  S1  |
      +------+    +------+    +------+    +------+    +------+

                  Label stacks along end-to-end path
          S1 is the end-to-end service label.
          IL71, IL72, and IL73 are intra-domain labels corresponding to
          red intra-domain paths.

            Figure 8: BGP-CT Advertisements and Label Stacks

   BGP-CT advertisements for blue Transport Class

      Prefix:PE2  Prefix:PE2  Prefix:PE2  Prefix:PE2  Prefix:PE2
      RD:RD2      RD:RD2      RD:RD2      RD:RD2      RD:RD2
      RT:Blue     RT:Blue     RT:Blue     RT:Blue     RT:Blue(200)
      nh:ASBR1    nh:ASBR2    nh:ASBR3    nh:ASBR4    nh:PE2
      Label:L11   Label:L12   Label:L13   Label:L14   Label:L15

   PE1---------ASBR1-------ASBR2-------ASBR3-------ASBR4-------PE2

                                                      VPNb Prefix:
                                                        10.1.1.1/32
                                                      RD: RD51
                                                      RT: RT-VPNb
                                                      ext-community:
                                                        Blue(200)
                                                      nh: PE2
                                                      Label: S2

      +------+                +------+                +------+
      | IL81 |                | IL82 |                | IL83 |
      +------+    +------+    +------+    +------+    +------+
      |  L11 |    |  L12 |    |  L13 |    |  L14 |    |  L15 |
      +------+    +------+    +------+    +------+    +------+
      |  S2  |    |  S2  |    |  S2  |    |  S2  |    |  S2  |
      +------+    +------+    +------+    +------+    +------+

                  Label stacks along end-to-end path
          S2 is the end-to-end service label.
          IL81, IL82, and IL83 are intra-domain labels corresponding to
          blue intra-domain paths.

            Figure 9: BGP-CT Advertisements and Label Stacks

   For example, consider Figures 8 and 9, which show the BGP-CT
   advertisements corresponding to two different end-to-end paths
   between PE1 and PE2.  The two paths belong to two different
   Transport Classes, red and blue.

   The inter-domain paths created by BGP-CT Transport Classes can be
   used by any traffic that can be steered using BGP next-hop
   resolution, including plain IPv4 and IPv6, L2VPN, L3VPN, and EVPN.
   In the example above, traffic from two different L3VPNs (VPNa and
   VPNb) is mapped onto two different BGP-CT Transport Classes (Red and
   Blue).  The L3VPN advertisements for VPNa and VPNb are originated by
   PE2 as usual.  PE1 receives these L3VPN advertisements and uses the
   next-hop in the L3VPN advertisements to determine the path to use.
   In the absence of any BGP-CT Transport Classes in the network, PE1
   would likely resolve the L3VPN next-hop over BGP-LU routes
   corresponding to the BGP best path.  However, when BGP-CT Transport
   Classes are used, PE1 resolves the L3VPN next-hop over a BGP-CT
   route.

   In the example above, PE2 originates BGP-CT advertisements for the
   Red and Blue Transport Classes.  These BGP-CT advertisements
   propagate across the multiple domains, causing forwarding state for
   the two Transport Classes to be installed at the border nodes along
   the way.  In order to create unique NLRIs for the two
   advertisements, PE2 uses two different RDs: the red BGP-CT
   advertisement has an RD of RD1, and the blue BGP-CT advertisement
   has an RD of RD2.  Note that the RD values used in the BGP-CT
   advertisements are completely independent of the RD values used in
   the L3VPN advertisements.  In both cases, the RD values are simply a
   mechanism to guarantee the uniqueness of a prefix/RD pair.
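
   The next-hop resolution step on PE1 can be sketched in Python as
   follows.  The sketch assumes that PE1 already knows which Transport
   Class a service route should use (in this document that choice is a
   matter of local policy, for example driven by a mapping community);
   the names VpnRoute, BgpCtRoute, and resolve_next_hop are
   hypothetical.  It matches BGP-CT routes on prefix and RT only, which
   reflects the point above that BGP-CT RDs and L3VPN RDs are
   independent.  The routes are PE1's view of the ASBR1 advertisements
   in Figures 8 and 9.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class BgpCtRoute:
    rd: str
    prefix: str    # destination PE loopback
    rt: str        # Transport Class
    next_hop: str
    label: str     # BGP-CT label


@dataclass(frozen=True)
class VpnRoute:
    prefix: str
    rd: str
    rt: str
    next_hop: str  # PE loopback to be resolved
    label: str     # service label


def resolve_next_hop(service: VpnRoute, transport_class: str,
                     ct_routes: Iterable[BgpCtRoute]
                     ) -> Optional[BgpCtRoute]:
    """Return a BGP-CT route to the service next hop in the given class.

    RDs are deliberately ignored: only the prefix and the RT matter.
    Returns None when no route of that class reaches the next hop.
    """
    for route in ct_routes:
        if route.prefix == service.next_hop and route.rt == transport_class:
            return route
    return None


# BGP-CT routes for PE2 as received by PE1 from ASBR1.
ct_routes = [
    BgpCtRoute(rd="RD1", prefix="PE2", rt="Red",
               next_hop="ASBR1", label="L1"),
    BgpCtRoute(rd="RD2", prefix="PE2", rt="Blue",
               next_hop="ASBR1", label="L11"),
]
vpna = VpnRoute(prefix="10.1.1.1/32", rd="RD50", rt="RT-VPNa",
                next_hop="PE2", label="S1")

red_path = resolve_next_hop(vpna, "Red", ct_routes)
print(red_path.next_hop, red_path.label)  # prints "ASBR1 L1"

# "Green" is a hypothetical class with no BGP-CT route to PE2.
print(resolve_next_hop(vpna, "Green", ct_routes))  # prints "None"
```

   What PE1 should do when no route of the requested class exists is a
   policy question this sketch does not answer; it simply reports the
   absence.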

   The RT values used in the BGP-CT advertisements are unrelated to the
   RT values used in the L3VPN advertisements. The L3VPN RT values
   identify VPN membership, as usual. The BGP-CT RT values identify
   Transport Class membership. To make it easy to map VPN traffic into
   BGP-CT Transport Classes, however, it can be useful to associate
   BGP-CT RT values with color extended community values in the L3VPN
   advertisements. In the example above, the RT value carried in the
   BGP-CT advertisement originated by PE2 for the red Transport Class
   is configured to correspond to the color extended community
   (Red(100)) advertised in the VPN advertisement for VPNa. Similarly,
   the RT value for the blue Transport Class corresponds to the color
   extended community (Blue(200)) for VPNb. In this way, PE1 can map
   the traffic for each VPN to a Transport Class path by associating
   the value of the color extended community carried in the VPN
   advertisement with an RT value carried in a BGP-CT advertisement.

   The example above also shows the label stacks at different points
   along the end-to-end paths for the forwarding entries established
   by the two advertisements. Labels L1-L4 are red BGP-CT labels
   advertised by the border nodes ASBR1, ASBR2, ASBR3, and ASBR4, while
   label L5 is advertised by PE2 for the red Transport Class. Labels
   L11-L14 are blue BGP-CT labels advertised by ASBR1, ASBR2, ASBR3, and
   ASBR4, while label L15 is advertised by PE2 for the blue Transport
   Class.

   IL71, IL72, and IL73 represent tunnels internal to domains 1, 2, and
   3 that correspond to the red Transport Class. IL81, IL82, and IL83
   represent tunnels internal to domains 1, 2, and 3 that correspond to
   the blue Transport Class. In this example, we assume that the
   intra-domain tunnels correspond to SR-TE policies with a red SR-TE
   policy color and a blue SR-TE policy color, respectively. Service
   labels are represented by S1 and S2.
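
   The color-to-Transport-Class mapping on PE1 and the resulting label
   stacks can be sketched as follows, using the values from Figures 8
   and 9 (color 100 for Red and 200 for Blue). This is a hypothetical
   model, not an implementation. It assumes that each intra-domain
   tunnel contributes exactly one label (IL71, IL72, ...), whereas an
   SR-TE policy may in practice impose a stack of several segment
   labels.

```python
# Data transcribed from Figures 8 and 9.
COLOR_TO_TRANSPORT_CLASS = {100: "Red", 200: "Blue"}

# BGP-CT label used on each of the five segments PE1->ASBR1 ... ASBR4->PE2
CT_LABELS = {
    "Red": ["L1", "L2", "L3", "L4", "L5"],
    "Blue": ["L11", "L12", "L13", "L14", "L15"],
}

# Intra-domain tunnel label per segment index; segments 1 and 3 are inter-AS links
INTRA_DOMAIN = {
    "Red": {0: "IL71", 2: "IL72", 4: "IL73"},
    "Blue": {0: "IL81", 2: "IL82", 4: "IL83"},
}

# VPN name -> (color extended community value, service label)
VPN_ROUTES = {"VPNa": (100, "S1"), "VPNb": (200, "S2")}


def label_stacks(vpn):
    """Return the label stack (top of stack first) seen on each segment."""
    color, service_label = VPN_ROUTES[vpn]
    tclass = COLOR_TO_TRANSPORT_CLASS[color]  # color community -> Transport Class RT
    stacks = []
    for segment, ct_label in enumerate(CT_LABELS[tclass]):
        stack = [ct_label, service_label]
        intra = INTRA_DOMAIN[tclass].get(segment)
        if intra is not None:
            stack.insert(0, intra)  # intra-domain transport label goes on top
        stacks.append(stack)
    return stacks


for stack in label_stacks("VPNa"):
    print(stack)
```

   Running it for VPNa prints the five red stacks shown in Figure 8.
   They range from ['IL71', 'L1', 'S1'] on the PE1-ASBR1 segment to
   ['IL73', 'L5', 'S1'] on the ASBR4-PE2 segment. The inter-AS
   segments carry no intra-domain label.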

   Note that this example focuses on how signaling originated by PE2
   results in forwarding state used by PE1 to reach PE2 on a specific
   Transport Class path. The solution supports the establishment of
   forwarding state for an arbitrary number of PEs to reach PE2. For
   example, a PE3 located in the same domain as PE1 can reach PE2 on a
   red Transport Class path established using the same BGP-CT
   signaling. The signaling and forwarding state from ASBR1 all the way
   to PE2 is common to the paths used by both PE1 and PE3. This merging
   of signaling and forwarding state is essential to the good scaling
   properties of the Seamless SR architecture: it allows millions of
   end-to-end Transport Class paths to be established in a scalable
   manner.

5.3.  Automatically Creating Transport Classes

   To simplify the creation of inter-domain paths, it may be desirable
   to automatically advertise a BGP-CT Transport Class based on the
   existence of an intra-domain tunnel. The RT value used in the
   BGP-CT advertisement is then automatically derived from a property
   of the intra-domain tunnel that triggered its creation. How the
   Transport Class RT value is derived for different types of
   intra-domain tunnels is discussed below.

5.3.1.  Automatically Creating Transport Classes for BGP-SR-TE
        Intra-domain Tunnels

   When the intra-domain tunnel is a BGP SR-TE policy
   [I-D.ietf-idr-segment-routing-te-policy], the value of the Transport
   Class RT in the corresponding BGP-CT advertisement is derived from
   the Policy Color contained in the SR Policy NLRI. The 32-bit Policy
   Color is directly converted to a 32-bit Transport Class RT.

5.3.2.  Automatically Creating Transport Classes for Flex-Algo Tunnels

   When the intra-domain tunnel is created using Flex-Algo
   [I-D.ietf-lsr-flex-algo], the value of the Transport Class RT in the
   corresponding BGP-CT advertisement is derived from the 8-bit
   Algorithm value carried in the SR-Algorithm sub-TLV (RFC 8667). The
   conversion from the 8-bit Algorithm value to the 32-bit Transport
   Class RT is done by treating both as unsigned integers, so that, for
   example, Algorithm 128 maps to a Transport Class RT value of 128.
   Note that this definition covers intra-domain tunnels created via
   standardized algorithms (0-127) as well as flex-algo (128-255).

5.3.3.  Auto-deriving Transport Classes for PCEP

   When the intra-domain tunnel is created using PCEP, the value of the
   Transport Class RT in the corresponding BGP-CT advertisement is
   derived from the Color in the SR Policy Identifiers TLV defined in
   [I-D.ietf-pce-segment-routing-policy-cp]. The 32-bit Color is
   directly converted to a 32-bit Transport Class RT.

5.4.  Inter-domain flex-algo with BGP-CT

   Flex-algo (defined in [I-D.ietf-lsr-flex-algo]) provides a mechanism
   to separate routing planes. Multiple algorithms are defined, and
   prefix-SIDs are advertised for each algorithm. BGP-CT can be used to
   advertise these flex-algo SIDs across domains. The BGP Prefix-SID
   attribute [RFC8669] can be carried with BGP-CT routes. One Transport
   Class is defined for each flex-algo in the IGP domain, and these
   Transport Classes advertise the IGP flex-algo SIDs in the BGP
   Prefix-SID attribute of the BGP-CT routes.

5.5.  Applicability to color-only policies

   Color-only policies consist of (null endpoint, color), as specified
   in [I-D.ietf-spring-segment-routing-policy]. Special steering
   mechanisms are defined using the "CO" flags in the color extended
   community [I-D.ietf-idr-segment-routing-te-policy]. Color-only
   policies can be advertised in BGP-CT with the prefix being NULL
   (0.0.0.0/32 or 0::0/128).
   A separate RD is advertised for each NULL advertisement with a
   different color. The Route Target carries the Policy Color contained
   in the SR Policy NLRI. The steering mechanisms defined in
   [I-D.ietf-spring-segment-routing-policy] MUST be honored when
   resolving service prefixes over the BGP-CT advertisements.

5.6.  Data sovereignty

      +-----------+    +-----------+    +-----------+
      |           |    |   +-+ AS2 |    |           |
      |         A1+----+A2 | |   A3+----+A4         |
   PE1+    AS1    |    |   |Z|     |    |    AS3    +PE3
      |         A5+----+A6 | |   A7+----+A8         |
      |           |    |   +-+     |    |           |
      +--A13--A15-+    +-A17--A19--+    +-----------+
          |    |          |    |
          |    |          |    |
          |    |          |    |
      +--A14--A16-+    +-A18--A20--+
      |           |    |           |
      |         A9+----+A10        |
   PE4+    AS4    |    |    AS5    |
      |        A11+----+A12        |
      |           |    |           |
      +-----------+    +-----------+

                    Figure 10: Multi-domain Network

   Consider a WAN with multiple ASes, as shown in Figure 10. The ASes
   roughly correspond to the geographical locations of the nodes; in
   this example, we assume that each AS corresponds to a continent. The
   data sovereignty requirement in this example is that certain traffic
   from PE1 (in AS1) to PE3 (in AS3) must not pass through country Z in
   AS2. As indicated by the location of country Z in the diagram, all
   paths that go directly from AS1 to AS3 through AS2 necessarily pass
   through country Z. Using BGP-LU to provide connectivity from PE1 to
   PE3 would generally result in a path that goes from AS1 to AS2 to
   AS3, which does not satisfy the data sovereignty requirement.
   Instead, the solution using BGP-CT goes from AS1 to AS4 to AS5 to AS2
   to AS3. BGP-CT ensures that when the traffic passes through AS2,
   only intra-domain paths satisfying the data sovereignty requirement
   are used.

   Within AS2, there are several different intra-domain TE mechanisms
   that can be used to exclude links that pass through country Z.
   For example, RSVP-TE or flex-algo can be used to create intra-domain
   paths that satisfy the data sovereignty requirement. BGP-CT allows
   these constrained intra-domain paths to be used to satisfy the
   requirements of end-to-end inter-domain paths. LSPs created by
   RSVP-TE or flex-algo that satisfy the "exclude country Z" constraint
   are associated with the color Green. A Green Transport Class is
   defined on the border nodes in all ASes. This Green Transport Class
   is associated with a mapping community called Not-Z.

   In AS2, the ASBRs are configured such that the presence of the
   mapping community Not-Z on a BGP-CT route results in a strict route
   resolution mechanism for that route. A BGP-CT route carrying the
   Not-Z mapping community resolves only over the Green Transport
   Class, so it uses only Green intra-domain tunnels.

   In AS1, AS3, AS4, and AS5, no links pass through country Z, so all
   intra-domain paths automatically satisfy the data sovereignty
   requirement, and there is no need to create Green intra-domain
   tunnels in these ASes. In these ASes, the presence of the mapping
   community Not-Z on BGP-CT routes results in resolution over
   best-effort paths. Even though the ASBRs in these ASes do not need
   to create Green intra-domain tunnels, they still need to allocate
   labels to identify traffic using the Green Transport Class. These
   labels are used by the ASBRs in AS2 to put traffic onto the Green
   intra-domain tunnels in AS2.

   The requirement is that only a subset of traffic honor the data
   sovereignty requirement. The service prefixes advertised by PE3 that
   need to honor the requirement are associated with the Green color
   extended community in the service advertisements. This results in
   PE1 using the BGP-CT labels corresponding to {PE3, Green} to forward
   the traffic. BGP-CT labels corresponding to {PE3, Green} exist at
   every ASBR along the path.
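
   The per-AS handling of the Not-Z mapping community can be sketched
   as a small resolution function. This is a hypothetical model: the
   AS set, the tunnel names, and the dictionary shape are illustrative
   assumptions, not configuration from any real implementation. It
   captures the two behaviors described above: strict resolution over
   Green tunnels in AS2 with no fallback, and resolution over
   best-effort paths in every other AS.

```python
from typing import Dict, List, Optional

# Hypothetical configuration: only AS2 resolves Not-Z routes strictly on Green tunnels.
STRICT_GREEN_ASES = {"AS2"}


def resolve_not_z(asn: str, tunnels: Dict[str, List[str]]) -> Optional[str]:
    """Pick the intra-domain tunnel for a BGP-CT route tagged with Not-Z.

    tunnels maps a tunnel color (e.g. "Green", "best-effort") to the
    tunnels available toward the route's next hop.
    """
    if asn in STRICT_GREEN_ASES:
        green = tunnels.get("Green", [])
        return green[0] if green else None  # strict: never fall back
    best_effort = tunnels.get("best-effort", [])
    return best_effort[0] if best_effort else None


print(resolve_not_z("AS2", {"Green": ["green-1"], "best-effort": ["be-1"]}))  # green-1
print(resolve_not_z("AS2", {"best-effort": ["be-1"]}))                        # None
print(resolve_not_z("AS4", {"best-effort": ["be-4"]}))                        # be-4
```

   Returning None in AS2 models a route that stays unresolved rather
   than leaking onto a path through country Z.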

   The traffic originating on PE1 is associated with the Green color
   community. The bottom-most label in the packet is the VPN label. The
   BGP-CT label is imposed above the VPN label, and the intra-domain
   transport label is imposed above the BGP-CT label. Assume that the
   traffic from PE1 to PE3 goes through AS1, AS4, AS5, AS2, and AS3. The
   BGP-CT label for {PE3, Green} is swapped at each border node.

   Note that end-to-end inter-domain data sovereignty can in principle
   be accomplished using BGP-LU with multiple loopbacks, by associating
   those loopbacks with the appropriate transport tunnels at every
   border node in every domain. This is very configuration-intensive
   and requires multiple loopbacks. BGP-CT builds on the basic
   mechanisms of BGP-LU while greatly simplifying such use cases.

5.7.  Interconnecting IP Fabric Data Centers

   Prefix:TOR2   Prefix:TOR2   Prefix:TOR2   Prefix:TOR2   Prefix:TOR2
   RD:RD2        RD:RD2        RD:RD2        RD:RD2        RD:RD2
   RT:Blue       RT:Blue       RT:Blue       RT:Blue       RT:Blue
   nh:ASBR1      nh:ASBR2      nh:ASBR3      nh:ASBR4      nh:TOR2
   Label:L11     Label:L12     Label:L13     Label:L14     Label:L15

      +-----------+       +-----------+        +-----------+
      |       ASBR1       ASBR2   ASBR3        ASBR4       |
      |           |       |           |        |           |
  TOR1+    DC1    +-------+   CORE    +--------+    DC2    +TOR2
      |      ASBR11       ASBR22 ASBR33        ASBR44      |
      |           |       |           |        |           |
      +-----------+       +-----------+        +-----------+

   +------+                +------+                +------+
   | UDP  |                | IL82 |                | UDP  |
   +------+    +------+    +------+    +------+    +------+
   | L11  |    | L12  |    | L13  |    | L14  |    | L15  |
   +------+    +------+    +------+    +------+    +------+
   |  S2  |    |  S2  |    |  S2  |    |  S2  |    |  S2  |
   +------+    +------+    +------+    +------+    +------+

   Label stacks along end-to-end path
   S2 is the end-to-end service label.
   IL82 is the intra-domain label corresponding to the
   blue intra-domain path.

                 Figure 11: Operation in IP Fabric

   Many data center networks consist of IP fabrics that do not have
   MPLS packet processing capability.
   A common requirement is that traffic originating in an IP fabric
   data center must satisfy certain constraints in the MPLS-enabled
   core, for example, using only a subset of links (blue links). It is
   useful for traffic originating in an IP fabric DC to carry
   information that allows the MPLS-enabled core to treat it
   accordingly. MPLS-in-UDP (MPLSoUDP), as defined in [RFC7510], is a
   mechanism in which a UDP header is imposed on MPLS packets so that
   they can cross an IP-only network. In Figure 11, the traffic needs
   to take blue paths in the core. The Blue Transport Class is defined
   on the ASBRs, and Blue intra-domain tunnels are created in the core.
   The BGP-CT advertisements for the Blue Transport Class are shown in
   the diagram; they originate at TOR2 and propagate through all the
   ASBRs until finally reaching TOR1. Within DC1, traffic is
   encapsulated with a UDP header, which is removed at ASBR1. The
   traffic then follows Blue paths in the core. At ASBR4, the MPLS
   packet is again encapsulated with a UDP header. The UDP header is
   removed at TOR2, and a lookup is done on the service label.

5.8.  Translating Transport Classes across Domains

   Prefix:PE2   Prefix:PE2   Prefix:PE2
   RD:RD2       RD:RD2       RD:RD2
   RT:Red       RT:Blue      RT:Blue
   nh:ASBR1     nh:ASBR2     nh:PE2
   Label:L11    Label:L12    Label:L13

      +-----------+                +-----------+
      |       ASBR1                ASBR2       |
      |           |                |           |
   PE1+    AS1    +----------------+    AS2    +PE2
      |      ASBR11                ASBR22      |
      |           |                |           |
      +-----------+                +-----------+

   +------+                +------+
   | IL1  |                | IL2  |
   +------+    +------+    +------+    +------+
   | L11  |    | L12  |    | L13  |    | L14  |
   +------+    +------+    +------+    +------+
   |  S2  |    |  S2  |    |  S2  |    |  S2  |
   +------+    +------+    +------+    +------+

   Label stacks along end-to-end path
   S2 is the end-to-end service label.
   IL1 and IL2 are intra-domain labels corresponding to the
   red intra-domain path in AS1 and the blue intra-domain
   path in AS2.

        Figure 12: Translating Transport Classes across Domains

   In certain scenarios, the TE intent represented by Transport Classes
   may differ from one domain to another. This could be the result of
   two independent organizations merging into one. It could also occur
   when two ASes are under different administration but use BGP-CT to
   provide an end-to-end service. In both scenarios, the same color may
   represent a different intent in each domain. When the traffic needs
   to satisfy a certain TE characteristic, the colors need to be mapped
   correctly at the border. In the example in Figure 12, there are two
   ASes. The low-latency TE intent is represented by the Red Transport
   Class in AS1 and by the Blue Transport Class in AS2. PE2 advertises
   a BGP-CT prefix with an RT of Blue. ASBR2 sets the next hop to self
   and advertises a new label, L12. On ASBR1, the Blue BGP-CT
   advertisement is imported into the Red Transport Class RIB, and the
   advertisement from ASBR1 carries a Red RT. This ensures that the
   BGP-CT prefix for PE2 resolves over a Red intra-domain path in AS1.

5.9.  SLA Guarantee

5.9.1.  Low latency

   Many network functions are virtualized and distributed, and certain
   functions are time- and latency-sensitive. In inter-domain networks,
   end-to-end latency measurement is required. Inside a domain, latency
   measurement mechanisms such as TWAMP [RFC5357] are used, and link
   latency is advertised in the IGP using the extensions described in
   [RFC8570] and [RFC7471].

   [I-D.ietf-idr-performance-routing] extends the BGP AIGP attribute
   [RFC7311] by adding a sub-TLV to carry an accumulated latency metric.
   The BGP best-path selection algorithm used for a Transport Class
   requiring low latency considers the accumulated latency metric to
   choose the lowest-latency path.
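
   A minimal sketch of latency-based best-path selection for a
   low-latency Transport Class follows. It assumes, by analogy with how
   [RFC7311] treats the AIGP metric, that a router adds the latency of
   its own intra-domain path to the next hop to the accumulated latency
   it receives. The microsecond unit, the field names, and the
   candidate values are hypothetical. The tie-break on next-hop name
   only keeps the example deterministic; real BGP best-path selection
   involves many more steps.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LowLatencyCandidate:
    next_hop: str
    aigp_latency_us: int         # accumulated latency received in the AIGP sub-TLV (assumed unit)
    latency_to_next_hop_us: int  # measured latency of the local intra-domain path to the next hop

    @property
    def total_latency_us(self) -> int:
        # As with the AIGP metric, the receiver adds its own cost to reach the next hop
        return self.aigp_latency_us + self.latency_to_next_hop_us


def select_lowest_latency(candidates: List[LowLatencyCandidate]) -> LowLatencyCandidate:
    # Lowest total latency wins; next-hop name breaks ties deterministically
    return min(candidates, key=lambda c: (c.total_latency_us, c.next_hop))


paths = [
    LowLatencyCandidate("ASBR1", aigp_latency_us=900, latency_to_next_hop_us=300),
    LowLatencyCandidate("ASBR11", aigp_latency_us=700, latency_to_next_hop_us=600),
]
best = select_lowest_latency(paths)
print(best.next_hop, best.total_latency_us)  # ASBR1 1200
```

   The candidate via ASBR11 advertises the lower accumulated latency,
   but the path via ASBR1 wins once the local leg to each next hop is
   included.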

5.9.2.  Traffic Engineering (TE) constraints

   TE constraints generally include the ability to send traffic via
   certain nodes or links, or to avoid certain nodes or links. In the
   Seamless SR architecture, the intra-domain transport technology is
   responsible for enforcing the TE constraints inside each domain,
   while BGP-CT ensures that the end-to-end path is constructed from
   intra-domain paths and inter-AS links that individually satisfy the
   TE constraints.

   For example, in order to construct a pair of diverse paths, we can
   define a red and a blue Transport Class. Within each domain, the red
   and blue Transport Class paths are realized using intra-domain path
   diversity mechanisms. For example, in a domain using flex-algo, the
   red and blue Transport Classes are realized using red and blue
   Flexible Algorithm Definitions (FADs) that do not share any links.
   To maintain path diversity on inter-AS links, BGP policies are used
   to associate two inter-AS peers with the red Transport Class and
   another two inter-AS peers with the blue Transport Class.

5.9.3.  Bandwidth constraints

   The Seamless SR architecture does not natively support end-to-end
   bandwidth reservations. In this architecture, the bandwidth
   utilization characteristics of each domain are managed
   independently, and intra-domain bandwidth management can make use
   of a variety of tools.

   The link bandwidth extended community, as defined in
   [I-D.ietf-idr-link-bandwidth], allows efficient weighted load
   balancing of traffic over multiple BGP-CT paths that belong to the
   same Transport Class. For optimized path placement, a centralized TE
   system may be deployed, with BGP policies and communities used for
   path placement.

5.10.  Scalability

5.10.1.  Access node scalability

   The Seamless SR architecture needs to be able to accommodate very
   large numbers of access devices.
   These access devices are expected to be low-end devices with limited
   FIB capacity. The Seamless MPLS architecture, as described in
   [I-D.ietf-mpls-seamless-mpls], recommends the use of LDP
   Downstream-on-Demand (DoD) mode to limit the size of both the RIB
   and the FIB needed on the access devices. In the Seamless SR
   architecture, networks use IGP-based label distribution and do not
   have this selective label request mechanism. However, RIB
   scalability of access nodes has not been a problem in real Seamless
   MPLS deployments. Where access devices are low on CPU and memory and
   unable to support a large RIB, BGP filtering policies can be applied
   at the ABRs/ASBRs to restrict the number of BGP-CT advertisements
   sent towards the access devices. Each access device then receives
   only the PE loopbacks it needs to connect to.

5.10.2.  Label stack depth

   The ability of a device to push multiple MPLS labels onto a packet
   depends on its hardware capabilities, and access devices are
   expected to have limited label-push capabilities. Assuming
   shortest-path SR-MPLS in the access domain, the access domain
   transport uses a single label. Lightweight traffic engineering and
   slicing can also be achieved with a single label, as described in
   [I-D.ietf-lsr-flex-algo]. The Seamless SR architecture can provide
   cross-domain MPLS connectivity with a single BGP-CT label. Assuming
   the use of a service label, end-to-end connectivity is provided by
   pushing one service label, one BGP-CT label, and one intra-domain
   transport label (which could also be a Binding SID). Therefore,
   access nodes only need to be able to push 3 labels for most
   applications.

5.10.3.  Label Resources

    -----IBGP-----  -----IBGP-----  -----IBGP------
   |              |               |                |

   BGP-CT Advt:
   Prefix: 2.2.2.2 (PE2 loopback)
   RD:20000
   RT: 128
   Label:100        Label:100        Label:101
   Next hop:ABR3    Next hop:ABR3    Next hop:PE2
   ----------------------------------------------------------------

   BGP-CT Advt:
   Prefix: 30.30.30.30 (ABR3 loopback)
   RD:30000
   RT:128
   Label:2000       Label:2001
   Nexthop:ABR1     Nexthop:ABR3

        +------------+  +------------+  +------------+
       /              \/              \/              \
      |              ABR1            ABR3              |
      |               |                |               |
   PE1+    Metro1     +      Core      +    Metro2      +PE2
      |               |                |               |
      |              ABR2            ABR4              |
       \              /\              /\              /
        +------------+  +------------+  +------------+

          |-ISIS1-|         |-ISIS2-|         |-ISIS3-|

      +-------+    +-------+    +-------+
      | 11111 |    | 22222 |    | 33333 |    IGP labels:
      +-------+    +-------+    +-------+      11111, 22222, 33333
      | 2000  |    | 2001  |    |  101  |    BGP-CT labels:
      +-------+    +-------+    +-------+      For ABR3: 2000, 2001
      |  100  |    |  100  |    |  VPN  |      For PE2: 100, 101
      +-------+    +-------+    +-------+
      |  VPN  |    |  VPN  |
      +-------+    +-------+

                Figure 13: Recursive Route Resolution

   Label resources are an important consideration in MPLS networks. On
   access devices, labels are consumed by services as well as by
   transport loopbacks inside the IGP domain where the access device
   resides. For example, in Figure 13, PE1 would have to allocate label
   resources equal to the number of customers connecting (i.e., the
   number of L2/L3 VPNs). Depending on the size of the IGP domain that
   PE1 resides in, it will also have to allocate labels for IGP
   loopbacks; this number is at most a few thousand. So, overall, a
   typical access device should have adequate label resources in the
   Seamless SR architecture. The P routers need to allocate labels for
   IGP loopbacks. This number is again small: at most a few thousand,
   depending on the number of nodes in the largest IGP domain.
   The metro networks connect to the core network through ABRs. A given
   ABR may end up having to maintain forwarding entries for a large
   subset of the transport loopback routes. There may be a large number
   of metro networks connecting to a given ABR, and in this case, the
   ABR needs forwarding entries for every access node in the directly
   connected metros. Such an ABR may have to maintain on the order of
   100k routes. With BGP-CT, each Transport Class has to be allocated a
   separate label. So, in the above example, ABR1 would have to use
   300k labels (3 x 100k) if there were 3 Transport Classes. This large
   number of label forwarding entries could be problematic.

   In highly scaled scenarios, it is therefore desirable to reduce the
   forwarding state on the ABRs. This reduction can be achieved with
   label stacking as a result of recursive route resolution. Figure 13
   illustrates how the forwarding state on ABRs can be greatly reduced
   by removing the forwarding state for PEs in remote domains from the
   ABRs. In this example, we assume that we are setting up end-to-end
   paths for a single Transport Class, for example red. PE2 advertises
   a BGP-CT prefix of 2.2.2.2 with a next hop of 2.2.2.2 and label 101;
   2.2.2.2 is PE2's loopback. ABR3 advertises label 100 for BGP-CT
   prefix 2.2.2.2 and changes the next hop to self. When ABR1 receives
   the BGP-CT advertisement for 2.2.2.2, it does not change the next
   hop and advertises the same label advertised by ABR3. When PE1
   receives the BGP-CT advertisement for 2.2.2.2 with a next hop of
   ABR3, it resolves the route using its reachability to ABR3.

   PE1 has learned the reachability of ABR3 from a BGP-CT advertisement
   originated by ABR3. As shown in Figure 13, ABR3 advertises BGP-CT
   prefix 30.30.30.30 with label 2001.
   ABR1 advertises label 2000 for BGP-CT prefix 30.30.30.30 and sets
   the next hop to self. PE1 constructs the service data packet with a
   VPN label at the bottom, followed by two BGP-CT labels, 100 and 2000.
   Label 2000 is the upper BGP-CT label and carries the packet toward
   ABR3; on top of it, PE1 pushes the Metro1 IGP label 11111 to reach
   ABR1. As the label stacks in Figure 13 show, ABR1 swaps label 2000 to
   2001 and pushes the core IGP label 22222. ABR3 then removes label
   2001, swaps label 100 to 101, and pushes the Metro2 IGP label 33333.
   Removing the forwarding state for PEs in remote domains from the
   ABRs comes at the expense of one additional BGP-CT label on the data
   packet.

   Recursive route resolution provides significant forwarding state
   reduction on the ABRs. ABRs have to allocate label resources only
   for the PEs in their local domain. The number of PEs in the same
   domain as a given ABR is much lower than the total number of PEs in
   the network.

   The examples in this document generally show VPN routes resolving
   over BGP-CT prefixes. However, the mechanisms are equally applicable
   to non-VPN routes.

5.11.  Availability

   Transport-layer availability is very important in latency- and
   loss-sensitive networks. Any link or node failure must be repaired
   within 50 ms, which can be achieved with Fast ReRoute (FRR)
   mechanisms. The Seamless SR architecture provides protection against
   intra-domain link and node failures. Protection against border node
   failures and against egress link and node failures is also provided.
   Details of the FRR techniques are described in the sections below.

5.11.1.  Intra-domain link and node protection

   In the Seamless SR architecture, protection against node and link
   failure is achieved with the relevant FRR techniques for the
   transport mechanism used inside the domain. In the case of an IP
   fabric, ECMP FRR or LFA can be used. In SR networks, TI-LFA
   [I-D.ietf-rtgwg-segment-routing-ti-lfa] provides link and node
   protection.
   For SR-TE transport ([I-D.ietf-spring-segment-routing-policy]), link
   and node protection can be achieved using TI-LFA combined with the
   mechanisms described in
   [I-D.hegde-spring-node-protection-for-sr-te-paths].

5.11.2.  Egress link and node protection

   [RFC8679] describes mechanisms for protecting border nodes and the
   PE devices where services are hosted. These mechanisms can be
   further simplified operationally with anycast SIDs and anycast
   service labels, as described in
   [I-D.hegde-rtgwg-egress-protection-sr-networks].

5.11.3.  Border Node protection

   Border node protection is very important in a network consisting of
   multiple domains. The Seamless SR architecture can achieve 50 ms FRR
   protection in the event of a border node failure by using anycast
   addresses for the ABRs/ASBRs. This requires that a set of ABRs
   advertise the same label for a given BGP-CT prefix. The detailed
   mechanism is described in
   [I-D.hegde-rtgwg-egress-protection-sr-networks].

5.12.  Operations

5.12.1.  MPLS ping and Traceroute

   The Seamless SR architecture consists of three layers: the service
   layer, intra-domain transport, and BGP-CT transport. Connectivity
   can be verified independently within each layer. Within the BGP-CT
   transport layer, end-to-end connectivity can be verified using a new
   OAM FEC for BGP-CT defined in
   [I-D.kaliraj-idr-bgp-classful-transport-planes], which describes
   end-to-end connectivity verification as well as fault isolation.
   BGP-CT verification happens only on the BGP nodes. Intra-domain
   connectivity verification and fault isolation are based on the
   technology deployed in that domain, as defined in [RFC8029] and
   [RFC8287].

5.12.2.  Counters and Statistics

   Traffic accounting and the ability to build a demand matrix for
   PE-to-PE traffic are very important.
   With BGP-CT, per-label transit counters should be supported on every
   transit router. Per-label transit counters provide details of the
   total traffic towards a remote PE, measured at every BGP transit
   router. Per-label egress counters should be supported on the ingress
   PE router; they provide the total traffic from the ingress PE to a
   specific remote PE.

5.13.  Service Mapping

   Service mapping is an important aspect of any architecture. It
   provides a means to translate end users' SLA requirements into the
   operator's network configuration. The Seamless SR architecture
   supports automatic steering with the color extended community. The
   Transport Class and the route target carried by the BGP-CT
   advertisement map directly to the color extended community. Services
   that require a specific SLA carry the color extended community that
   maps to the Transport Class to which the BGP-CT advertisement
   belongs.

   Other types of traffic steering, such as DSCP-based forwarding, are
   expressed with a mapping community. A mapping community is a
   standard BGP community and is completely generic and user-defined.
   The mapping community has a specific service mapping feature
   associated with it, along with the required fallback behavior when
   the primary transport goes down. The list below provides general
   guidance on the service mapping features and fallback options an
   implementation should provide:

      DSCP-based mapping, with each DSCP mapping to a Transport Class.

      DSCP-based mapping, with a default mapping to a best-effort
      transport.

      DSCP-based mapping, with fallback to best effort when the primary
      transport tunnel goes down.

      Color extended community based mapping, with fallback to best
      effort.

      Fallback options with a specific protocol during migrations.

      Fallback options to a different Transport Class.

      No fallback permitted.

5.14.  Migrations

   Networks that migrate from the Seamless MPLS architecture to the
   Seamless SR architecture require all border nodes and PE devices to
   be upgraded and to have the new address family enabled on their BGP
   sessions. Where there are legacy nodes that cannot be upgraded,
   exporting routes from BGP-LU into BGP-CT and vice versa SHOULD be
   supported. Once the entire network has been migrated to support
   BGP-CT, there is no need to run the BGP-LU family on the BGP
   sessions: BGP-CT itself can advertise a best-effort Transport Class,
   and the BGP-LU family can be removed.

5.15.  Interworking with IPv6 transport technologies

   A later version of this document will address interworking with
   other IPv6 technologies, including SRv6, SRm6, and MPLS over GRE6.

5.16.  BGP based Multicast

   BGP-based multicast, as described in [I-D.zzhang-bess-bgp-multicast],
   serves two main purposes. It can replace PIM/mLDP inside a domain to
   provide native BGP-based multicast. It can also serve as an overlay
   stitching protocol to stitch multiple P2MP LSPs across domains. This
   makes it easy to transition each domain independently from one
   technology to another. BGP-based multicast defines a new SAFI, the
   MULTICAST TREE SAFI, and different route types are defined to
   support the various use cases.

6.  Backward Compatibility

7.  Security Considerations

   TBD

8.  IANA Considerations

9.  Acknowledgements

   Many thanks to Kireeti Kompella, Ron Bonica, Krzysztof Szarcowitz,
   Srihari Sangli, Julian Lucek, and Ram Santhanakrishnan for
   discussions and inputs. Thanks to Joel Halpern for review and
   comments.

10.  Contributors

   1. Kaliraj Vairavakkalai

      Juniper Networks

      kaliraj@juniper.net

   2. Jeffrey Zhang

      Juniper Networks

      zzhang@juniper.net

11.  References

11.1.  Normative References

   [I-D.hegde-rtgwg-egress-protection-sr-networks]
              Hegde, S. and W.
              Lin, "Egress Protection for Segment Routing (SR)
              networks",
              draft-hegde-rtgwg-egress-protection-sr-networks-00 (work
              in progress), March 2020.

   [I-D.ietf-idr-performance-routing]
              Xu, X., Hegde, S., Talaulikar, K., Boucadair, M., and C.
              Jacquenet, "Performance-based BGP Routing Mechanism",
              draft-ietf-idr-performance-routing-02 (work in progress),
              October 2019.

   [I-D.kaliraj-idr-bgp-classful-transport-planes]
              Vairavakkalai, K., Venkataraman, N., and B. Rajagopalan,
              "BGP Classful Transport Planes",
              draft-kaliraj-idr-bgp-classful-transport-planes-01 (work
              in progress), July 2020.

   [I-D.zzhang-bess-bgp-multicast]
              Zhang, Z., Giuliano, L., Patel, K., Wijnands, I., mishra,
              m., and A. Gulko, "BGP Based Multicast",
              draft-zzhang-bess-bgp-multicast-03 (work in progress),
              October 2019.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC3107]  Rekhter, Y. and E. Rosen, "Carrying Label Information in
              BGP-4", RFC 3107, DOI 10.17487/RFC3107, May 2001.

   [RFC8669]  Previdi, S., Filsfils, C., Lindem, A., Ed., Sreekantiah,
              A., and H. Gredler, "Segment Routing Prefix Segment
              Identifier Extensions for BGP", RFC 8669,
              DOI 10.17487/RFC8669, December 2019.

11.2.  Informative References

   [I-D.hegde-spring-node-protection-for-sr-te-paths]
              Hegde, S., Bowers, C., Litkowski, S., Xu, X., and F. Xu,
              "Node Protection for SR-TE Paths",
              draft-hegde-spring-node-protection-for-sr-te-paths-07
              (work in progress), July 2020.

   [I-D.ietf-idr-link-bandwidth]
              Mohapatra, P. and R. Fernando, "BGP Link Bandwidth
              Extended Community", draft-ietf-idr-link-bandwidth-07
              (work in progress), March 2018.

   [I-D.ietf-idr-segment-routing-te-policy]
              Previdi, S., Filsfils, C., Talaulikar, K., Mattes, P.,
              Rosen, E., Jain, D., and S.
Lin, "Advertising Segment 1341 Routing Policies in BGP", draft-ietf-idr-segment-routing- 1342 te-policy-09 (work in progress), May 2020. 1344 [I-D.ietf-idr-tunnel-encaps] 1345 Patel, K., Velde, G., Sangli, S., and J. Scudder, "The BGP 1346 Tunnel Encapsulation Attribute", draft-ietf-idr-tunnel- 1347 encaps-19 (work in progress), September 2020. 1349 [I-D.ietf-lsr-flex-algo] 1350 Psenak, P., Hegde, S., Filsfils, C., Talaulikar, K., and 1351 A. Gulko, "IGP Flexible Algorithm", draft-ietf-lsr-flex- 1352 algo-11 (work in progress), September 2020. 1354 [I-D.ietf-mpls-seamless-mpls] 1355 Leymann, N., Decraene, B., Filsfils, C., Konstantynowicz, 1356 M., and D. Steinberg, "Seamless MPLS Architecture", draft- 1357 ietf-mpls-seamless-mpls-07 (work in progress), June 2014. 1359 [I-D.ietf-pce-segment-routing-policy-cp] 1360 Koldychev, M., Sivabalan, S., Barth, C., Peng, S., and H. 1361 Bidgoli, "PCEP extension to support Segment Routing Policy 1362 Candidate Paths", draft-ietf-pce-segment-routing-policy- 1363 cp-00 (work in progress), June 2020. 1365 [I-D.ietf-rtgwg-segment-routing-ti-lfa] 1366 Litkowski, S., Bashandy, A., Filsfils, C., Decraene, B., 1367 Francois, P., Voyer, D., Clad, F., and P. Camarillo, 1368 "Topology Independent Fast Reroute using Segment Routing", 1369 draft-ietf-rtgwg-segment-routing-ti-lfa-04 (work in 1370 progress), August 2020. 1372 [I-D.ietf-spring-segment-routing-policy] 1373 Filsfils, C., Talaulikar, K., Voyer, D., Bogdanov, A., and 1374 P. Mattes, "Segment Routing Policy Architecture", draft- 1375 ietf-spring-segment-routing-policy-08 (work in progress), 1376 July 2020. 1378 [I-D.voyer-pim-sr-p2mp-policy] 1379 Voyer, D., Filsfils, C., Parekh, R., Bidgoli, H., and Z. 1380 Zhang, "Segment Routing Point-to-Multipoint Policy", 1381 draft-voyer-pim-sr-p2mp-policy-02 (work in progress), July 1382 2020. 1384 [RFC1997] Chandra, R., Traina, P., and T. Li, "BGP Communities 1385 Attribute", RFC 1997, DOI 10.17487/RFC1997, August 1996, 1386 . 
   [RFC4364]  Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
              Networks (VPNs)", RFC 4364, DOI 10.17487/RFC4364,
              February 2006.

   [RFC5357]  Hedayat, K., Krzanowski, R., Morton, A., Yum, K., and J.
              Babiarz, "A Two-Way Active Measurement Protocol (TWAMP)",
              RFC 5357, DOI 10.17487/RFC5357, October 2008.

   [RFC6388]  Wijnands, IJ., Ed., Minei, I., Ed., Kompella, K., and B.
              Thomas, "Label Distribution Protocol Extensions for
              Point-to-Multipoint and Multipoint-to-Multipoint Label
              Switched Paths", RFC 6388, DOI 10.17487/RFC6388, November
              2011.

   [RFC7311]  Mohapatra, P., Fernando, R., Rosen, E., and J. Uttaro,
              "The Accumulated IGP Metric Attribute for BGP", RFC 7311,
              DOI 10.17487/RFC7311, August 2014.

   [RFC7471]  Giacalone, S., Ward, D., Drake, J., Atlas, A., and S.
              Previdi, "OSPF Traffic Engineering (TE) Metric
              Extensions", RFC 7471, DOI 10.17487/RFC7471, March 2015.

   [RFC7510]  Xu, X., Sheth, N., Yong, L., Callon, R., and D. Black,
              "Encapsulating MPLS in UDP", RFC 7510,
              DOI 10.17487/RFC7510, April 2015.

   [RFC8029]  Kompella, K., Swallow, G., Pignataro, C., Ed., Kumar, N.,
              Aldrin, S., and M. Chen, "Detecting Multiprotocol Label
              Switched (MPLS) Data-Plane Failures", RFC 8029,
              DOI 10.17487/RFC8029, March 2017.

   [RFC8287]  Kumar, N., Ed., Pignataro, C., Ed., Swallow, G., Akiya,
              N., Kini, S., and M. Chen, "Label Switched Path (LSP)
              Ping/Traceroute for Segment Routing (SR) IGP-Prefix and
              IGP-Adjacency Segment Identifiers (SIDs) with MPLS Data
              Planes", RFC 8287, DOI 10.17487/RFC8287, December 2017.

   [RFC8402]  Filsfils, C., Ed., Previdi, S., Ed., Ginsberg, L.,
              Decraene, B., Litkowski, S., and R. Shakir, "Segment
              Routing Architecture", RFC 8402, DOI 10.17487/RFC8402,
              July 2018.

   [RFC8570]  Ginsberg, L., Ed., Previdi, S., Ed., Giacalone, S., Ward,
              D., Drake, J., and Q. Wu, "IS-IS Traffic Engineering (TE)
              Metric Extensions", RFC 8570, DOI 10.17487/RFC8570, March
              2019.

   [RFC8679]  Shen, Y., Jeganathan, M., Decraene, B., Gredler, H.,
              Michel, C., and H. Chen, "MPLS Egress Protection
              Framework", RFC 8679, DOI 10.17487/RFC8679, December 2019.

   [TS.23.501-3GPP]
              3rd Generation Partnership Project (3GPP), "System
              Architecture for 5G System; Stage 2, 3GPP TS 23.501
              v16.4.0", March 2020.

Authors' Addresses

   Shraddha Hegde
   Juniper Networks Inc.
   Exora Business Park
   Bangalore, KA  560103
   India

   Email: shraddha@juniper.net

   Chris Bowers
   Juniper Networks Inc.

   Email: cbowers@juniper.net

   Xiaohu Xu
   Alibaba Inc.
   Beijing
   China

   Email: xiaohu.xxh@alibaba-inc.com

   Arkadiy Gulko
   Refinitiv

   Email: arkadiy.gulko@refinitiv.com

   Alex Bogdanov
   Google Inc.

   Email: bogdanov@google.com

   Jim Uttaro
   ATT

   Email: ju1738@att.com

   Luay Jalil
   Verizon

   Email: luay.jalil@verizon.com