SPRING                                                          S. Hegde
Internet-Draft                                                 C. Bowers
Intended status: Standards Track                   Juniper Networks Inc.
Expires: January 27, 2021                                          X. Xu
                                                            Alibaba Inc.
                                                                A. Gulko
                                                               Refinitiv
                                                             A. Bogdanov
                                                             Google Inc.
                                                               J. Uttaro
                                                                     ATT
                                                           July 26, 2020


                        Seamless Segment Routing
                 draft-hegde-spring-mpls-seamless-sr-01

Abstract

   In order to operate networks with large numbers of devices, network
   operators organize networks into multiple smaller network domains.
   Each network domain typically runs an IGP which has complete
   visibility within its own domain, but limited visibility outside of
   its domain. Seamless Segment Routing (Seamless SR) provides
   flexible, scalable and reliable end-to-end connectivity for services
   across independent network domains. Seamless SR accommodates domains
   using SR, LDP, and RSVP for MPLS label distribution as well as
   domains running IP without MPLS (IP-Fabric).

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 27, 2021.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Use Cases
     3.1.  Service provider network
     3.2.  Large scale WAN networks
     3.3.  Data Center Interconnect (DCI) Networks
     3.4.  Multicast Use cases
   4.  Requirements
     4.1.  MPLS Transport
     4.2.  SLA Guarantee
     4.3.  Scalability
     4.4.  Availability
     4.5.  Operations
     4.6.  Service Mapping
   5.  Seamless Segment Routing architecture
     5.1.  Solution Concepts
     5.2.  BGP Classful Transport
     5.3.  Automatically Creating Transport Classes
       5.3.1.  Automatically Creating Transport Classes for
               BGP-SR-TE Intra-domain Tunnels
       5.3.2.  Automatically Creating Transport Classes for
               Flex-Algo Tunnels
       5.3.3.  Auto-deriving Transport Classes for PCEP
     5.4.  Inter-domain flex-algo with BGP-CT
     5.5.  Data sovereignty
     5.6.  Interconnecting IP Fabric Data Centers
     5.7.  Translating Transport Classes across Domains
     5.8.  SLA Guarantee
       5.8.1.  Low latency
       5.8.2.  Traffic Engineering (TE) constraints
       5.8.3.  Bandwidth constraints
     5.9.  Scalability
       5.9.1.  Access node scalability
       5.9.2.  Label stack depth
       5.9.3.  Label Resources
     5.10. Availability
       5.10.1.  Intra domain link and node protection
       5.10.2.  Egress link and node protection
       5.10.3.  Border Node protection
     5.11. Operations
       5.11.1.  MPLS ping and Traceroute
       5.11.2.  Counters and Statistics
     5.12. Service Mapping
     5.13. Migrations
     5.14. Interworking with v6 transport technologies
     5.15. BGP based Multicast
   6.  Backward Compatibility
   7.  Security Considerations
   8.  IANA Considerations
   9.  Acknowledgements
   10. Contributors
   11. References
     11.1.  Normative References
     11.2.  Informative References
   Authors' Addresses

1.  Introduction

   Evolving wireless access technology and cloud applications are
   expected to place new requirements on packet transport networks.
   These services are contributing to significantly higher bandwidth
   throughput, which in turn leads to a growing number of transport
   network devices. As an example, 5G networks are expected to require
   up to 250Gbps in the fronthaul and up to 400Gbps in the backhaul.
   There is a desire to allow many network functions to be virtualized
   and cloud native. In order to support latency-sensitive cloud-native
   network functions, packet transport networks should be capable of
   providing low-latency paths end-to-end. Some services will require
   low-latency paths while others may require different QoS properties.
   The network should be able to differentiate between the services and
   provide corresponding SLA transport paths. In addition, as these
   applications become more sensitive and less loss tolerant, more and
   more emphasis is placed on overall service availability and
   reliability.

   The Seamless SR architecture builds upon the Seamless MPLS
   architecture and caters to new requirements imposed by 5G transport
   networks and cloud applications. Although
   [I-D.ietf-mpls-seamless-mpls] has not been published as an RFC, it
   contains a good description of the Seamless MPLS architecture.
   That architecture uses LDP and/or RSVP for intra-domain label
   distribution, and BGP-LU [RFC3107] for end-to-end label
   distribution. Seamless SR focuses on using segment routing for
   intra-domain label distribution. The mechanisms described in this
   document are equally applicable to intra-domain tunneling mechanisms
   deployed using RSVP and/or LDP.

   By using segment routing for intra-domain label distribution,
   Seamless SR can easily support SR-MPLS on both IPv4 and IPv6
   networks. This overcomes a limitation of the classic Seamless MPLS
   architecture, which in practice was limited to running MPLS on IPv4
   networks. Seamless SR (like Seamless MPLS) can use BGP-LU [RFC3107]
   to stitch different domains. However, Seamless SR can also take
   advantage of BGP Prefix-SID [RFC8669] to provide predictable and
   deterministic labels for the inter-domain connectivity.

   The basic functionality of the Seamless SR architecture does not
   require any enhancements to existing protocols. However, in order to
   support end-to-end service requirements across multiple domains,
   protocol extensions may be needed. This draft discusses use cases,
   requirements, and potential protocol enhancements.

2.  Terminology

   This document uses the following terminology:

   o  Access Node (AN): An access node is a node which processes
      customer frames or packets at Layer 2 or above. This includes
      but is not limited to DSLAMs and Cell Site Routers in 5G
      networks. Access nodes have only limited MPLS functionality in
      order to reduce complexity in the access network.

   o  Pre-Aggregation Node (P-AGG): A pre-aggregation node is a node
      which aggregates several access nodes (ANs).

   o  Aggregation Node (AGG): An aggregation node is a node which
      aggregates several pre-aggregation nodes (P-AGGs).

   o  Area Border Router (ABR): A router between the aggregation and
      core domains.

   o  Label Switch Router (LSR): Label Switch Routers are pure transit
      nodes. Ideally, they have no customer or service state and are
      therefore decoupled from service creation.

   o  Use Case: Describes a typical network including service creation
      points and distribution of remote node loopback prefixes.

                         Figure 1: Terminology

3.  Use Cases

3.1.  Service provider network

   Service provider transport networks use multiple domains to support
   scalability. For this analysis, we consider a representative
   network design with four levels of hierarchy: access domains,
   pre-aggregation domains, aggregation domains and a core (see
   Figure 2). 5G transport networks in particular are expected to
   scale to a very large number of access nodes due to the shorter
   range of 5G radio technology. These networks are expected to scale
   up to one million nodes.

              +-------+   +-------+   +------+   +------+
              |       |   |       |   |      |   |      |
           +--+ P-AGG1+---+ AGG1  +---+ ABR1 +---+ LSR1 +--> to ABR
          /   |       |  /|       |   |      |   |      |
   +----+/    +-------+\/ +-------+   +------+  /+------+
   | AN |              /\                     \/
   +----+\    +-------+  \+-------+   +------+/\ +------+
          \   |       |   |       |   |      |  \|      |
           +--+ P-AGG2+---+ AGG2  +---+ ABR2 +---+ LSR2 +--> to ABR
              |       |   |       |   |      |   |      |
              +-------+   +-------+   +------+   +------+

    ISIS L1          ISIS L2                    ISIS L2

   |-Access-|--Aggregation Domain--|---------Core-----------------|

                          Figure 2: 5G network

   Many network functions in a 5G network will be virtualized and
   distributed across multiple data centers. Virtualized network
   functions are instantiated dynamically across different compute
   resources. This requires that the underlying transport network
   support stringent SLAs on end-to-end paths.

   5G networks support a variety of service use cases that require
   end-to-end slicing.
   In certain cases the end-to-end connectivity requires
   differentiated forwarding capabilities. The Seamless SR
   architecture should provide the ability to establish end-to-end
   paths that satisfy the required SLAs. For example, an end user
   requirement could be to establish a low-latency path end-to-end.
   The System Architecture for the 5G System [TS.23.501-3GPP] currently
   defines four standardized Slice/Service Types: Enhanced Mobile
   Broadband (eMBB), Ultra-Reliable Low Latency Communication (URLLC),
   massive Internet of Things (mIoT), and Vehicle to Everything (V2X).
   The Seamless SR architecture should support end-to-end Service Level
   Objectives (SLOs) to allow the creation of network slices with these
   four Slice/Service Types.

   Many deployments consist of ring topologies in the access and
   aggregation networks. In a ring topology there are at most two
   forwarding paths for the traffic, whereas core networks consist of
   nodes with denser connectivity. Thus core networks may have a
   larger number of TE paths while access networks will have a smaller
   number of TE paths. The Seamless SR architecture should support
   having more TE paths in one domain and fewer TE paths in another,
   and provide the ability to effectively connect the domains
   end-to-end while satisfying end-to-end constraints.

3.2.  Large scale WAN networks

   As WAN networks grow beyond several thousand nodes, it is often
   useful to divide the network into multiple IGP domains. Separate IGP
   domains increase service availability by establishing a constrained
   failure domain. Smaller IGP domains may also improve network
   performance and health by reducing the device scale profile
   (including protocol and FIB scale).

         +-------+     +-------+     +-------+
         |       |     |       |     |       |
         |     ABR1   ABR2    ABR3  ABR4     |
         |       |     |       |     |       |
      PE1+DOMAIN1+-----+DOMAIN2+-----+DOMAIN3+PE2
         |       |     |       |     |       |
         |     ABR11  ABR22   ABR33 ABR44    |
         |       |     |       |     |       |
         +-------+     +-------+     +-------+

         |-ISIS1-|     |-ISIS2-|     |-ISIS3-|

                         Figure 3: WAN Network

   These large WAN networks often cross national boundaries. In order
   to meet data sovereignty requirements, operators need to maintain
   strict control over end-to-end traffic-engineered (TE) paths.
   Segment Routing provides two main solutions to implement highly
   constrained TE paths. Flex-algo (defined in
   [I-D.ietf-lsr-flex-algo]) uses prefix-SIDs computed by all nodes in
   the IGP domain using the same pruned topology. Highly constrained
   TE paths for the data sovereignty use case can also be implemented
   using SR-TE policies ([I-D.ietf-spring-segment-routing-policy])
   built using unprotected adjacency SIDs.

   Both of these approaches work well for intra-domain TE paths.
   However, they both have limitations when one tries to extend them to
   the creation of highly constrained inter-domain TE paths. A goal of
   Seamless SR is to be able to create highly constrained inter-domain
   TE paths in a scalable manner.

   Some deployments may use a centralized controller to acquire the
   topologies of multiple domains and build end-to-end constrained
   paths. This can be scaled with hierarchical controllers. However,
   there is still significant risk of a loss of network connectivity to
   one or more controllers, which can result in a failure to satisfy the
   strict requirements of data sovereignty. To address these failure
   scenarios, the network should have pre-established end-to-end TE
   paths that do not rely on controllers.

3.3.  Data Center Interconnect (DCI) Networks

   Data centers are playing an increasingly important role in providing
   access to information and applications.
   Geographically diverse data centers usually connect via a high
   speed, reliable and secure core network.

         +-------+     +-------+     +-------+
         |    ASBR1   ASBR2  ASBR3  ASBR4    |
         |       |     |       |     |       |
      PE1+  DC1  +-----+ CORE  +-----+  DC2  +PE2
         |   ASBR11   ASBR22 ASBR33 ASBR44   |
         |       |     |       |     |       |
         +-------+     +-------+     +-------+

         |-ISIS1-|     |-ISIS2-|     |-ISIS3-|

                         Figure 4: DCI Network

   In many Data Center deployments, applications require end-to-end path
   diversity and/or end-to-end low latency paths. It is desirable to
   have a uniform technology deployed in the core as well as in the Data
   Centers to create these SLA paths. Such uniformity simplifies the
   network to a great extent. It is desirable for a solution to only
   require service-related configurations on the access end-points where
   services are attached, avoiding service-related configurations on the
   ABR/ASBR nodes.

3.4.  Multicast Use cases

   Multicast services such as IPTV and multicast VPN also need to be
   supported across a multi-domain service provider network.

         +---------+---------+---------+
         |         |         |         |
        S1       ABR1      ABR2        R1
         |  Metro1 |  Core   |  Metro2 |
         |         |         |         |
        S2       ABR11     ABR22       R2
         |         |         |         |
         +---------+---------+---------+

          |-ISIS1-| |-ISIS2-| |-ISIS3-|

                     Figure 5: Multicast Use Cases

   Figure 5 shows a simplified multi-domain network supporting
   multicast. Multicast sources S1 and S2 lie in a different domain
   from the receivers R1 and R2. Using multiple IGP domains presents a
   problem for the establishment of multicast replication trees.
   Typically, a multicast receiver does a reverse path forwarding (RPF)
   lookup for a multicast source. One solution is to leak the routes
   for multicast sources across the IGP domains. However, this can
   compromise the scaling properties of the multi-domain architecture.
   SR-P2MP [I-D.voyer-pim-sr-p2mp-policy] offers a solution for both
   intra-domain and inter-domain multicast. However, it does not
   accommodate deployments using existing intra-domain multicast
   technology, such as mLDP [RFC6388], in some of the domains. A
   solution should accommodate a mixture of existing and newer
   technologies to better facilitate coexistence and migration.

4.  Requirements

   This section provides a summary of requirements derived from the use
   cases described in previous sections.

4.1.  MPLS Transport

      The architecture SHOULD provide MPLS transport between two service
      endpoints regardless of whether the two endpoints are in the same
      IGP domain, different IGP domains, or in different autonomous
      systems.

      The MPLS transport SHOULD be supported on IPv4, IPv6, and
      dual-stack networks.

4.2.  SLA Guarantee

      The architecture SHOULD allow the creation of paths that support
      end-to-end SLAs. For example, the paths should obey constraints
      related to latency, diversity, and availability.

      The architecture SHOULD support end-to-end network slicing as
      described by 5G transport requirements [TS.23.501-3GPP].

4.3.  Scalability

      The architecture SHOULD be able to support up to 1 million nodes.

      The architecture SHOULD facilitate the use of access nodes with
      low RIB/FIB and low CPU capabilities.

      The architecture SHOULD facilitate the use of access nodes with
      low label stacking capability.

      The architecture SHOULD allow for a scalable response to network
      events. An individual node SHOULD only need to respond to a
      limited subset of network events.

      Service routes on the border nodes SHOULD be minimized.

4.4.  Availability

      Traffic SHOULD be Fast Reroute (FRR) protected against link, node,
      and SRLG failures within a domain.

      Traffic SHOULD be Fast Reroute (FRR) protected against border node
      failures.

      Traffic SHOULD be Fast Reroute (FRR) protected against egress node
      and egress link failures.

4.5.  Operations

      Each domain SHOULD be independent and SHOULD NOT depend on the
      transport technology in another domain. This allows for more
      flexible evolution of the network.

      Basic MPLS OAM mechanisms described in [RFC8029] SHOULD be
      supported.

      End-to-end MPLS ping and traceroute procedures SHOULD be
      supported.

      The ability to validate the path inside each domain SHOULD be
      supported.

      Statistics for inter-domain paths on the ingress and egress PE
      nodes as well as border nodes SHOULD be supported.

4.6.  Service Mapping

      The architecture SHOULD support the automated steering of traffic
      on to transport paths based on communities carried in the service
      prefix advertisements.

      The architecture SHOULD support the steering of traffic on to
      transport paths based on the DSCP value carried in IPv4/IPv6
      packets.

      Traffic steering based on EXP bits in the MPLS header SHOULD be
      supported.

      Traffic steering based on a 5-tuple packet filter SHOULD be
      supported. Matching on source address, destination address,
      source port, destination port, and protocol fields should be
      allowed.

      All traffic steering mechanisms SHOULD be supported for all kinds
      of service traffic including VPN traffic as well as global
      internet traffic.

      The core domain is expected to have more traffic engineering
      constraints as compared to metros. The ability to map the
      services to appropriate transport tunnels at service attachment
      points SHOULD be supported.

5.  Seamless Segment Routing architecture

5.1.  Solution Concepts

   The solution described below makes use of the following concepts.

   o  Transport Class (TC): A Transport Class is defined as a
      collection of end-to-end MPLS paths that satisfy a set of
      constraints or Service Level Agreements.

   o  BGP-Classful Transport (BGP-CT): A new BGP family used to
      establish Transport Class paths across different domains.

   o  Route Distinguisher (RD): The Route Distinguisher is defined in
      [RFC4364]. In BGP-CT, the RD is used in BGP advertisements to
      differentiate multiple paths to the same loopback address. It
      may be useful to automatically generate RDs in order to simplify
      configuration.

   o  Route Target (RT): The Route Target extended community is carried
      in BGP-CT advertisements. The RT represents the Transport Class
      of an advertised path. Note that the RT is only carried in the
      BGP-CT advertisements. No BGP-VPN related configuration or VPN
      family advertisements are needed when BGP-CT transport paths are
      used to carry non-VPN traffic.

   o  Mapping Community (MC): The Mapping Community is a BGP extended
      community as defined in RFC 4360. In the Seamless SR
      architecture, an MC is carried by a BGP-CT route and/or a service
      route. On a service route, the MC identifies the specific local
      policy used to map traffic for that route to different Transport
      Class paths. On a BGP-CT route, the MC identifies the specific
      local policy used to map the BGP-CT route to the intra-domain
      tunnels. The local policy can include additional traffic
      steering properties for placing traffic on different Transport
      Class paths. The values of the MCs and the corresponding local
      policies for service mapping are defined by the network operator.

                      Figure 6: Solution Concepts

5.2.  BGP Classful Transport

    ------IBGP-----EBGP----IBGP----EBGP----IBGP------
    |             |   |           |   |             |

      +-----------+   +-----------+   +-----------+
      |           |   |           |   |           |
      |      ASBR1+---+ASBR2 ASBR3+---+ASBR4      |
   PE1+    D1     | X |    D2     | X |    D3     +PE2
      |      ASBR5+---+ASBR6 ASBR7+---+ASBR8      |
      |           |   |           |   |           |
      +-----+-----+   +-----------+   +-----------+
           PE3

      |---ISIS1---|   |---ISIS2---|   |---ISIS3---|

                         Figure 7: WAN Network

   The above diagram shows a WAN network divided into three domains.
   Within each domain, BGP sessions are established between the PE
   nodes and the border nodes as well as between border nodes. BGP
   sessions are also established between border nodes across domains.
   The goal is for PE1 to have MPLS connectivity to PE2 that satisfies
   specific characteristics. Multiple MPLS paths from PE1 to PE2 are
   required in order to satisfy different SLAs.
   [I-D.kaliraj-idr-bgp-classful-transport-planes] defines a new BGP
   family called BGP-Classful Transport. The NLRI for this new family
   consists of a prefix and a Route Distinguisher. The prefix
   corresponds to the loopback of the destination PE, and the RD is
   used to distinguish different paths to the same PE loopback. The
   BGP-CT advertisement also carries a Route Target. The RT specifies
   the Transport Class to which the BGP-CT advertisement belongs.
   BGP-CT mechanisms are applicable to single ownership networks that
   are organized into multiple domains. They are also applicable to
   multiple ASes with different ownership but closely cooperating
   administrations. BGP-CT mechanisms are not expected to be applied
   on internet peering links or between domains that have completely
   independent administrations.
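
   As a reading aid, the relationship between the NLRI, the RD, and the
   RT described above can be sketched as a small data model. This is
   an illustrative sketch only and not the wire encoding defined in
   [I-D.kaliraj-idr-bgp-classful-transport-planes]. The names CtNlri,
   CtRoute, install, and transport_classes, and the placeholder string
   values, are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CtNlri:
    """Illustrative BGP-CT NLRI: Route Distinguisher plus prefix."""
    rd: str
    prefix: str  # loopback of the destination PE


@dataclass(frozen=True)
class CtRoute:
    """Illustrative BGP-CT advertisement (hypothetical, not wire format)."""
    nlri: CtNlri
    route_target: str  # identifies the Transport Class
    next_hop: str
    label: str


def install(rib: dict, route: CtRoute) -> None:
    # Routes are keyed by the full NLRI, so the same prefix advertised
    # with a different RD occupies a separate entry.
    rib[route.nlri] = route


def transport_classes(rib: dict, prefix: str) -> list:
    # Collect the Transport Classes (RTs) available toward one prefix.
    return sorted(r.route_target for r in rib.values()
                  if r.nlri.prefix == prefix)


rib = {}
install(rib, CtRoute(CtNlri("RD1", "PE2"), "Red", "ASBR1", "L1"))
install(rib, CtRoute(CtNlri("RD2", "PE2"), "Blue", "ASBR1", "L11"))
print(len(rib), transport_classes(rib, "PE2"))
```

   Keying the table on the (RD, prefix) pair is what lets two paths to
   the same PE loopback coexist, while the RT remains an attribute that
   names the Transport Class of each path.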

   BGP-CT advertisements for red Transport Class

      Prefix:PE2    Prefix:PE2   Prefix:PE2   Prefix:PE2   Prefix:PE2
      RD:RD1        RD:RD1       RD:RD1       RD:RD1       RD:RD1
      RT:Red        RT:Red       RT:Red       RT:Red       RT:Red(100)
      nh:ASBR1      nh:ASBR2     nh:ASBR3     nh:ASBR4     nh:PE2
      Label:L1      Label:L2     Label:L3     Label:L4     Label:L5

   PE1----------ASBR1--------ASBR2--------ASBR3--------ASBR4--------PE2

   VPNa Prefix:
     10.1.1.1/32
     RD: RD50
     RT: RT-VPNa
     ext-community:
       Red(100)
     nh: PE2
     Label: S1

       +------+                   +------+                  +------+
       | IL71 |                   | IL72 |                  | IL73 |
       +------+      +------+     +------+     +------+     +------+
       |  L1  |      |  L2  |     |  L3  |     |  L4  |     |  L5  |
       +------+      +------+     +------+     +------+     +------+
       |  S1  |      |  S1  |     |  S1  |     |  S1  |     |  S1  |
       +------+      +------+     +------+     +------+     +------+

   Label stacks along end-to-end path
   S1 is the end-to-end service label.
   IL71, IL72, and IL73 are intra-domain labels corresponding to
   red intra-domain paths.

            Figure 8: BGP-CT Advertisements and Label Stacks

   BGP-CT advertisements for blue Transport Class

      Prefix:PE2    Prefix:PE2   Prefix:PE2   Prefix:PE2   Prefix:PE2
      RD:RD2        RD:RD2       RD:RD2       RD:RD2       RD:RD2
      RT:Blue       RT:Blue      RT:Blue      RT:Blue      RT:Blue(200)
      nh:ASBR1      nh:ASBR2     nh:ASBR3     nh:ASBR4     nh:PE2
      Label:L11     Label:L12    Label:L13    Label:L14    Label:L15

   PE1----------ASBR1--------ASBR2--------ASBR3--------ASBR4--------PE2

   VPNb Prefix:
     10.1.1.1/32
     RD: RD51
     RT: RT-VPNb
     ext-community:
       Blue(200)
     nh: PE2
     Label: S2

       +------+                   +------+                  +------+
       | IL81 |                   | IL82 |                  | IL83 |
       +------+      +------+     +------+     +------+     +------+
       | L11  |      | L12  |     | L13  |     | L14  |     | L15  |
       +------+      +------+     +------+     +------+     +------+
       |  S2  |      |  S2  |     |  S2  |     |  S2  |     |  S2  |
       +------+      +------+     +------+     +------+     +------+

   Label stacks along end-to-end path
   S2 is the end-to-end service label.
   IL81, IL82, and IL83 are intra-domain labels corresponding to
   blue intra-domain paths.

            Figure 9: BGP-CT Advertisements and Label Stacks

   For example, consider the diagrams in Figure 8 and Figure 9. The
   diagrams show the BGP-CT advertisements corresponding to two
   different end-to-end paths between PE1 and PE2. The two paths
   belong to two different Transport Classes, red and blue.

   The inter-domain paths created by BGP-CT Transport Classes can be
   used by any traffic that can be steered using BGP next-hop
   resolution, including vanilla IPv4 and IPv6, L2VPN, L3VPN, and EVPN.
   In the example above, we show how traffic from two different L3VPNs
   (VPNa and VPNb) is mapped onto two different BGP-CT Transport Classes
   (Red and Blue). The L3VPN advertisements for VPNa and VPNb are
   originated by PE2 as usual. PE1 receives these L3VPN advertisements
   and uses the next-hop in the L3VPN advertisements to determine the
   path to use. In the absence of any BGP-CT Transport Classes in the
   network, PE1 would likely resolve the L3VPN next-hop over BGP-LU
   routes corresponding to the BGP best path. However, when BGP-CT
   Transport Classes are used, PE1 will resolve the L3VPN next-hop over
   a BGP-CT route.

   In the example above, PE2 originates BGP-CT advertisements for the
   Red and Blue Transport Classes. These BGP-CT advertisements
   propagate across the multiple domains, causing forwarding state for
   the two Transport Classes to be installed at border nodes along the
   way. In order to create unique NLRIs for the two advertisements,
   PE2 uses two different RDs. In the example above, the red BGP-CT
   advertisement has an RD of RD1 and the blue BGP-CT advertisement has
   an RD of RD2. Note that the RD values used in the BGP-CT
   advertisements are completely independent of the RD values used in
   the L3VPN advertisements. In both cases, the RD values are simply a
   mechanism to guarantee uniqueness of a prefix/RD pair.
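
   The hop-by-hop propagation of the red advertisement can be sketched
   as follows. This is a minimal illustration, assuming (as the nh and
   Label values in Figure 8 suggest) that each border node re-advertises
   the route with itself as next hop and a locally allocated label,
   leaving the prefix, RD, and RT unchanged. The names CtAdvert,
   readvertise, and swap_table are hypothetical and do not come from
   any BGP implementation.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CtAdvert:
    """Illustrative BGP-CT advertisement (hypothetical structure)."""
    prefix: str
    rd: str
    rt: str
    next_hop: str
    label: str


def readvertise(advert, node, local_label, swap_table):
    # Forwarding state at the border node: traffic arriving with
    # local_label is swapped to the label learned from downstream.
    swap_table[(node, local_label)] = (advert.next_hop, advert.label)
    # Prefix, RD, and RT are carried unchanged; only the next hop and
    # the label are rewritten (an assumption read off Figure 8).
    return replace(advert, next_hop=node, label=local_label)


swap_table = {}
# PE2 originates the red advertisement with its own label L5.
advert = CtAdvert("PE2", "RD1", "Red", next_hop="PE2", label="L5")
for node, label in [("ASBR4", "L4"), ("ASBR3", "L3"),
                    ("ASBR2", "L2"), ("ASBR1", "L1")]:
    advert = readvertise(advert, node, label, swap_table)

print(advert)
```

   After the loop, the advertisement as seen by PE1 has ASBR1 as next
   hop and label L1, while each border node holds one swap entry for
   the red Transport Class.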
645 The RT values used in the BGP-CT advertisements are unrelated to the 646 RT values used in the L3VPN advertisements. The L3VPN RT values 647 identify VPN membership, as usual. The BGP-CT RT values identify 648 Transport Class membership. However, in order to easily map VPN 649 traffic into BGP-CT Transport Classes, it can be useful to 650 make an association between BGP-CT RT values and color extended 651 community values in the L3VPN advertisements. In the example 652 above, the RT value carried in the BGP-CT advertisement originated 653 from PE2 for the red Transport Class is configured to correspond to 654 the color extended community advertised in the VPN advertisement for 655 VPNa. Similarly, the RT value for the blue Transport Class 656 corresponds to the color extended community for VPNb. In this way, 657 traffic on PE1 for each VPN can be mapped to a Transport Class path by 658 associating the value of the color extended community carried in the 659 VPN advertisement with an RT value carried in a BGP-CT advertisement. 661 The example above also shows the label stacks at different points 662 along the end-to-end paths for the forwarding entries that are 663 established by the two advertisements. Labels L1-L4 are red BGP-CT 664 labels advertised by border nodes ASBR1, ASBR2, ASBR3, and ASBR4, while label L5 is 665 advertised by PE2 for the red Transport Class. Labels L11-L14 are 666 blue BGP-CT labels advertised by border nodes ASBR1, ASBR2, ASBR3, and ASBR4, while 667 label L15 is advertised by PE2 for the blue Transport Class. 669 IL71, IL72, and IL73 represent tunnels internal to domains 1, 2, 670 and 3 that correspond to the red Transport Class. IL81, IL82, and 671 IL83 represent tunnels internal to domains 1, 2, and 3 that 672 correspond to the blue Transport Class. In this example, we assume 673 that the intra-domain tunnels correspond to SRTE policies having red 674 SRTE-policy-color and blue SRTE-policy-color. Service labels are 675 represented by S1 and S2.
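The color-based mapping and the resulting label stack at PE1 can be sketched as follows. This is an illustrative model only, assuming the values shown in Figure 8 and Figure 9; the dictionary layouts and the `resolve` function are hypothetical and do not represent any implementation's data structures.

```python
RED, BLUE = 100, 200  # color / Transport Class values from Figures 8 and 9

# BGP-CT routes for PE2 as received by PE1 from ASBR1, one per Transport Class.
ct_routes = [
    {"prefix": "PE2", "rt": RED, "next_hop": "ASBR1", "label": "L1"},
    {"prefix": "PE2", "rt": BLUE, "next_hop": "ASBR1", "label": "L11"},
]

# Intra-domain tunnels in domain 1, keyed by (tunnel endpoint, color).
tunnels = {("ASBR1", RED): "IL71", ("ASBR1", BLUE): "IL81"}


def resolve(service_route):
    """Return the label stack (top of stack first) for a service route.

    The color carried by the service route selects the BGP-CT route whose
    RT matches; that route's next hop and color select the intra-domain tunnel.
    """
    ct = next(
        (r for r in ct_routes
         if r["prefix"] == service_route["next_hop"]
         and r["rt"] == service_route["color"]),
        None,
    )
    if ct is None:
        return None  # no BGP-CT route for this next hop and color
    tunnel = tunnels[(ct["next_hop"], ct["rt"])]
    return [tunnel, ct["label"], service_route["label"]]


vpn_a = {"prefix": "10.1.1.1/32", "color": RED, "next_hop": "PE2", "label": "S1"}
vpn_b = {"prefix": "10.1.1.1/32", "color": BLUE, "next_hop": "PE2", "label": "S2"}

stack_a = resolve(vpn_a)
stack_b = resolve(vpn_b)
print(stack_a)  # ['IL71', 'L1', 'S1']
print(stack_b)  # ['IL81', 'L11', 'S2']
```

The two stacks correspond to the leftmost label stacks in Figure 8 and Figure 9: an intra-domain label on top, the BGP-CT label below it, and the service label at the bottom.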
677 Note that this example focuses on how signalling originated by PE2 678 results in forwarding state used by PE1 to reach PE2 on a specific 679 Transport Class path. The solution supports the establishment of 680 forwarding state for an arbitrary number of PEs to reach PE2. For 681 example, PE3 in Figure 8 can reach PE2 on a red Transport Class path 682 established using the same BGP-CT signalling. The signalling and 683 forwarding state from ASBR1 all the way to PE2 are common to the paths 684 used by both PE1 and PE3. This merging of signalling and forwarding 685 state is essential to the good scaling properties of the Seamless 686 SR architecture. Millions of end-to-end Transport Class paths can be 687 established in a scalable manner. 689 5.3. Automatically Creating Transport Classes 691 In order to simplify the creation of inter-domain paths, it may be 692 desirable to automatically advertise a BGP-CT Transport Class based 693 on the existence of an intra-domain tunnel. The RT value used on the 694 BGP-CT advertisement is automatically derived from a property of the 695 intra-domain tunnel that triggered its creation. How the Transport 696 Class RT value is derived for different types of intra-domain tunnels 697 is discussed below. 699 5.3.1. Automatically Creating Transport Classes for BGP-SR-TE 700 Intra-domain Tunnels 702 When the intra-domain tunnel is a BGP-SR-TE policy 703 [I-D.ietf-idr-segment-routing-te-policy], the value of the Transport 704 Class RT in the corresponding BGP-CT advertisement is derived from 705 the Policy Color contained in the SR Policy NLRI. The 32-bit Policy 706 Color is directly converted to a 32-bit Transport Class RT. 708 5.3.2.
Automatically Creating Transport Classes for Flex-Algo Tunnels 710 When the intra-domain tunnel is created using Flex-Algo 711 [I-D.ietf-lsr-flex-algo], the value of the Transport Class RT in the 712 corresponding BGP-CT advertisement is derived from the 8-bit 713 Algorithm value carried in the SR-Algorithm sub-TLV (RFC8667). The 714 conversion from the 8-bit Algorithm value to the 32-bit Transport Class RT is 715 done by treating both as unsigned integers. Note that this 716 definition allows for intra-domain tunnels created via standardized 717 algorithms (0-127) as well as flex-algos (128-255). 719 5.3.3. Auto-deriving Transport Classes for PCEP 721 When the intra-domain tunnel is created using PCEP, the value of the 722 Transport Class RT in the corresponding BGP-CT advertisement is 723 derived from the Color of the SR Policy Identifiers TLV defined in 724 [I-D.ietf-pce-segment-routing-policy-cp]. The 32-bit Color is 725 directly converted to a 32-bit Transport Class RT. 727 5.4. Inter-domain flex-algo with BGP-CT 729 Flex-algo (defined in [I-D.ietf-lsr-flex-algo]) provides a mechanism 730 to separate routing planes. Multiple algorithms are defined and 731 prefix-SIDs are advertised for each algorithm. BGP-CT can be used to 732 advertise these flex-algo SIDs. The BGP Prefix-SID [RFC8669] 733 is an attribute and can be carried in the BGP-CT NLRI. Multiple 734 Transport Classes are defined, one for each flex-algo in the IGP 735 domain. These Transport Classes advertise the IGP 736 flex-algo SIDs in the Prefix-SID attribute in the BGP-CT NLRI. 738 5.5.
Data sovereignty 740 +-----------+ +-----------+ +-----------+ 741 | | | +-+ AS2 | | | 742 | A1+--+A2 | | A3+--+A4 | 743 PE1+ AS1 | | |Z| | | AS3 +PE3 744 | A5+--+A6 | | A7+--+A8 | 745 | | | +-+ | | | 746 +--A13--A15-+ +-A17--A19--+ +-----------+ 747 | | | | 748 | | | | 749 | | | | 750 +--A14--A16-+ +-A18--A20--+ 751 | | | | 752 | A9+--+A10 | 753 PE4+ AS4 | | AS5 | 754 | A11+-+A12 | 755 | | | | 756 +-----------+ +-----------+ 758 Figure 10: Multi-domain Network 760 Consider a WAN with multiple ASes as shown in 761 Figure 10. The ASes roughly correspond to the geographical location 762 of the nodes. In this example, we assume that each AS corresponds to 763 a continent. The data sovereignty requirement in this example is 764 that certain traffic from PE1 (in AS1) to PE3 (in AS3) must not cross 765 through country Z in AS2. As indicated by the location of country Z 766 in the diagram, all paths that go directly from AS1 to AS3 through 767 AS2 necessarily pass through country Z. Using BGP-LU to provide 768 connectivity from PE1 to PE3 would generally result in a path that 769 goes from AS1 to AS2 to AS3, which does not satisfy the data 770 sovereignty requirement in this example. Instead, the solution using 771 BGP-CT will go from AS1 to AS4 to AS5 to AS2 to AS3. BGP-CT will 772 ensure that when the traffic passes through AS2, only intra-domain 773 paths satisfying the data sovereignty requirement will be used. 775 Within AS2, there are several different intra-domain TE mechanisms 776 that can be used to exclude links that pass through country Z. For 777 example, RSVP-TE or flex-algo can be used to create intra-domain 778 paths that satisfy the data sovereignty requirement. BGP-CT allows 779 the constrained intra-domain paths to satisfy requirements for 780 end-to-end inter-domain paths. LSPs created by RSVP-TE or Flex-algo that 781 satisfy the "exclude country Z" constraint are associated with a 782 color Green.
A Green Transport Class is defined on border nodes in 783 all ASes. This Green Transport Class is associated with a mapping 784 community called Not-Z. 786 In AS2, the ASBRs are configured such that the presence of the 787 mapping community Not-Z in BGP-CT routes results in a strict route 788 resolution mechanism for those routes. A BGP-CT route carrying the 789 mapping community Not-Z will only resolve on the Green 790 Transport Class, so it will only use Green intra-domain tunnels. 792 In AS1, AS3, AS4, and AS5, no links pass through country Z, so all 793 intra-domain paths automatically satisfy the data sovereignty 794 requirement. There is therefore no need to create Green 795 intra-domain tunnels. In these ASes, the presence of the mapping community 796 Not-Z in BGP-CT routes results in resolution on best-effort paths. 797 Even though the ASBRs in these ASes do not need to create Green 798 intra-domain tunnels, they still need to allocate labels to identify 799 traffic using the Green Transport Class. These labels will be used 800 by the ASBRs in AS2 to put traffic on the Green intra-domain tunnels 801 in AS2. 803 The requirement is that only a subset of traffic honor the data 804 sovereignty requirement. The service prefixes from PE1 to PE3 that 805 need to honor the data sovereignty requirement will be associated 806 with the Green color extended community in the service advertisements. 807 This will result in PE1 using the BGP-CT labels corresponding to 808 {PE3, Green} to forward the traffic. BGP-CT labels corresponding to 809 {PE3, Green} will exist at every ASBR along the path. The traffic 810 originating on PE1 will be associated with the Green color community. 811 The bottom-most label in the packet is the VPN label. Above 812 the VPN label, the BGP-CT label is imposed. Above the BGP-CT label, the 813 intra-domain transport label is imposed. Let us assume the traffic 814 from PE1 needs to go to PE3 through AS1, AS4, AS5, AS2, and AS3.
The 815 BGP-CT label for {PE3, Green} will be swapped at the border nodes. 817 Note that end-to-end inter-domain data sovereignty can in principle 818 be accomplished using BGP-LU with multiple loopbacks and associating 819 those loopbacks to appropriate transport tunnels at every border node 820 in every domain. This approach is very configuration intensive and requires 821 multiple loopbacks. BGP-CT builds on the basic mechanisms of BGP-LU 822 while greatly simplifying such use cases. 824 5.6. Interconnecting IP Fabric Data Centers 826 Prefix:TOR2 Prefix:TOR2 Prefix:TOR2 Prefix:TOR2 Prefix:TOR2 827 RD:RD2 RD:RD2 RD:RD2 RD:RD2 RD:RD2 828 RT:Blue RT:Blue RT:Blue RT:Blue RT:Blue 829 nh:ASBR1 nh:ASBR2 nh:ASBR3 nh:ASBR4 nh:TOR2 830 Label:L11 Label:L12 Label:L13 Label:L14 Label:L15 832 +-----------+ +-----------+ +-----------+ 833 | ASBR1 ASBR2 ASBR3 ASBR4 | 834 | | | | | | 835 TOR1+ DC1 +-------+ CORE +--------+ DC2 +TOR2 836 | ASBR11 ASBR22 ASBR33 ASBR44 | 837 | | | | | | 838 +-----------+ +-----------+ +-----------+ 840 +------+ +------+ +------+ 841 | UDP | | IL82 | | UDP | 842 +------+ +------+ +------+ +------+ +------+ 843 | L11 | | L12 | | L13 | | L14 | | L15 | 844 +------+ +------+ +------+ +------+ +------+ 845 | S2 | | S2 | | S2 | | S2 | | S2 | 846 +------+ +------+ +------+ +------+ +------+ 848 Label stacks along end-to-end path 849 S2 is the end-to-end service label. 850 IL82 is the intra-domain label corresponding to 851 the blue intra-domain path. 853 Figure 11: Operation in IP fabric 855 Many data center networks consist of IP fabrics which do not have 856 MPLS packet processing capability. A common requirement is that 857 traffic originated from an IP Fabric data center needs to satisfy 858 certain constraints in the MPLS-enabled core, for example, only using 859 a subset of links (blue links). It is useful for the traffic 860 originating in an IP Fabric DC to carry information that allows the 861 MPLS-enabled core to treat it accordingly.
MPLSoUDP, as defined in 862 [RFC7510], is a mechanism where a UDP header is imposed on MPLS 863 packets at the border nodes. In Figure 11 above, the traffic needs 864 to take blue paths in the core. The Blue Transport Class is defined 865 on the ASBRs. In the core, Blue intra-domain tunnels are created. 866 The BGP-CT advertisements for the Blue Transport Class are as shown 867 in the diagram. The BGP-CT advertisements originate at TOR2 and 868 propagate through all the ASBRs, until finally reaching TOR1. Within 869 DC1, traffic is encapsulated with a UDP header. Traffic with the UDP 870 header gets decapsulated at ASBR1. The traffic follows Blue paths in 871 the core. At ASBR4, the MPLS packet gets encapsulated with a UDP 872 header. The UDP header is removed at TOR2, and the lookup will be 873 done for the service label. 875 5.7. Translating Transport Classes across Domains 876 Prefix:PE2 Prefix:PE2 Prefix:PE2 877 RD:RD2 RD:RD2 RD:RD2 878 RT:Red RT:Blue RT:Blue 879 nh:ASBR1 nh:ASBR2 nh:PE2 880 Label:L11 Label:L12 Label:L13 882 +-----------+ +-----------+ 883 | ASBR1 ASBR2 | 884 | | | | 885 PE1+ AS1 +----------------+ AS2 +PE2 886 | ASBR11 ASBR22 | 887 | | | | 888 +-----------+ +-----------+ 890 +------+ +------+ 891 | IL1 | | IL2 | 892 +------+ +------+ +------+ +------+ 893 | L11 | | L12 | | L13 | | L14 | 894 +------+ +------+ +------+ +------+ 895 | S2 | | S2 | | S2 | | S2 | 896 +------+ +------+ +------+ +------+ 898 Label stacks along end-to-end path 899 S2 is the end-to-end service label. 900 IL1 and IL2 are intra-domain labels corresponding to 901 the Red intra-domain path in AS1 and the Blue intra-domain 902 path in AS2. 904 Figure 12: Translating Transport Classes across Domains 906 In certain scenarios, the TE intent represented by Transport Classes 907 may differ from one domain to another. This could be the result of 908 two independent organizations merging into one.
It could also occur 909 when two ASes are under different administration, but use BGP-CT to 910 provide an end-to-end service. In both scenarios, the same color may 911 represent different intent in each domain. When the traffic needs to 912 satisfy a certain TE characteristic, the colors need to be mapped 913 correctly at the border. In the example in Figure 12, there are two 914 ASes. The low latency TE intent is represented with the Red 915 Transport Class in AS1 and with the Blue Transport Class in AS2. PE2 916 advertises a BGP-CT prefix with an RT of Blue. ASBR2 sets the nexthop 917 to self and advertises a new label L12. On ASBR1, the Blue BGP-CT 918 advertisement is imported into the Red Transport RIB and the 919 advertisement from ASBR1 will carry a Red RT. This ensures that the 920 BGP-CT prefix for PE2 resolves on a Red intra-domain path in AS1. 922 5.8. SLA Guarantee 924 5.8.1. Low latency 926 Many network functions are virtualized and distributed. Certain 927 functions are time and latency sensitive. In inter-domain networks, 928 end-to-end latency measurement is required. Inside a domain, latency 929 measurement mechanisms such as TWAMP [RFC5357] are used and link 930 latency is advertised in the IGP using extensions described in 931 [RFC8570] and [RFC7471]. 933 [I-D.ietf-idr-performance-routing] extends the BGP AIGP attribute 934 [RFC7311] by adding a sub-TLV to carry an accumulated latency metric. 935 The BGP best path selection algorithm used for a Transport Class 936 requiring low latency will consider the accumulated latency metric to 937 choose the lowest latency path. 939 5.8.2. Traffic Engineering (TE) constraints 941 TE constraints generally include the ability to send traffic via 942 certain nodes or links or avoid using certain nodes or links.
In the 943 Seamless SR architecture, the intra-domain transport technology is 944 responsible for ensuring the TE constraints inside the domain, while BGP-CT 945 ensures that the end-to-end path is constructed from intra-domain 946 paths and inter-AS links that individually satisfy the TE 947 constraints. 949 For example, in order to construct a pair of diverse paths, we can 950 define a red and a blue Transport Class. Within each domain, the red 951 and blue Transport Class paths are realized using intra-domain path 952 diversity mechanisms. For example, in a domain using flex-algo, red 953 and blue Transport Classes are realized using red and blue flex-algo 954 definitions (FADs) which don't share any links. To maintain path 955 diversity on inter-AS links, BGP policies are used to associate two 956 inter-AS peers with the red Transport Class and another two inter-AS 957 peers with the blue Transport Class. 959 5.8.3. Bandwidth constraints 961 The Seamless SR architecture does not natively support end-to-end 962 bandwidth reservations. In this architecture, the bandwidth 963 utilization characteristics of each domain are managed independently. 964 The intra-domain bandwidth management can make use of a variety of 965 tools. 967 The link bandwidth extended community, as defined in 968 [I-D.ietf-idr-link-bandwidth], allows for efficient weighted 969 load-balancing of traffic on multiple BGP-CT paths that belong to the same 970 Transport Class. For optimized path placement, a centralized TE 971 system may be deployed with BGP policies/communities used for path 972 placement. 974 5.9. Scalability 976 5.9.1. Access node scalability 978 The Seamless SR architecture needs to be able to accommodate very 979 large numbers of access devices. These access devices are expected 980 to be low-end devices with limited FIB capacity.
The Seamless MPLS 981 architecture, as described in [I-D.ietf-mpls-seamless-mpls], 982 recommends the use of LDP DOD mode to limit the size of both the RIB 983 and the FIB needed on the access devices. In the Seamless SR 984 architecture, networks use IGP-based label distribution and do not 985 have this selective label request mechanism. However, RIB 986 scalability of access nodes has not been a problem for real Seamless 987 MPLS deployments. In cases where access devices are low on CPU and 988 memory and unable to support a large RIB, BGP filtering policies can 989 be applied at the ABR/ASBR routers to restrict the number of BGP-CT 990 advertisements towards the access devices. Each access device will 991 receive only the PE loopbacks that it needs to connect to. 993 5.9.2. Label stack depth 995 The ability of a device to push multiple MPLS labels on a packet 996 depends on hardware capabilities. Access devices are expected to 997 have limited label stack push capabilities. Assuming shortest path 998 SR-MPLS in the access domain, the access domain transport will use a 999 single label. Lightweight traffic-engineering and slicing could also 1000 be achieved with a single label as described in 1001 [I-D.ietf-lsr-flex-algo]. The Seamless SR architecture can provide 1002 cross-domain MPLS connectivity with a single label. Assuming the use 1003 of a service label, end-to-end connectivity is provided by pushing 1004 one service label, one BGP-CT label, and one intra-domain transport 1005 label. Therefore, access nodes will only need to be able to push 3 1006 labels for most applications. 1008 5.9.3.
Label Resources 1009 -----IBGP----- -----IBGP----- -----IBGP------ 1010 | | | | 1012 BGP-CT Advt: 1013 Prefix: 2.2.2.2 (PE2 loopback) 1014 RD:20000 1015 RT: 128 1016 Label:100 Label:100 Label:101 1017 Next hop:ABR3 Next hop:ABR3 Next hop: PE2 1018 ---------------------------------------------------------------- 1020 BGP-CT Advt: 1021 Prefix: 30.30.30.30 (ABR3 loopback) 1022 RD:30000 1023 RT:128 1024 Label:2000 Label:2001 1025 Nexthop:ABR1 Nexthop:ABR3 1027 +-----------+ +------------+ +-----------+ 1028 / \ / \/ \ 1029 | ABR1 ABR3 | 1030 | | | | 1031 PE1+ Metro1 + Core + Metro2 +PE2 1032 | | | | 1033 | ABR2 ABR4 | 1034 \ /\ /\ / 1035 +------------+ +-----------+ +------------+ 1037 |-ISIS1-| |-ISIS2-| |-ISIS3-| 1039 +------+ +------+ +------+ 1040 | 11111| | 22222| | 33333| IGP-labels: 1041 +------+ +------+ +------+ 11111,22222,33333 1042 | 2000 | | 2001 | | 101 | BGP-CT label: 1043 +------+ +------+ +------+ For ABR3: 1044 | 100 | | 100 | | VPN | 2000,2001 1045 +------+ +------+ +------+ For PE2: 1046 | VPN | | VPN | 100, 101 1047 +------+ +------+ 1049 Figure 13: Recursive Route Resolution 1051 Label resources are an important consideration in MPLS networks. 1052 On access devices, labels are consumed by services as well as by 1053 transport loopbacks inside the IGP domain where the access device 1054 resides. For example, in the above diagram PE1 would have to 1055 allocate label resources equal to the number of customers connecting 1056 (i.e. the number of L2/L3 VPNs). Based on the size of the IGP domain 1057 that PE1 resides in, it will also have to allocate labels for IGP 1058 loopbacks. This number is at most a few thousand. So overall a 1059 typical access device should have adequate label resources in 1060 the Seamless SR architecture. The P routers need to allocate labels for 1061 IGP loopbacks. This number again is small. At most it will be a few 1062 thousand based on the number of nodes in the largest IGP domains.
The 1063 metro networks connect to the core network through ABRs. It is 1064 possible that a given ABR may end up having to maintain forwarding 1065 entries for a large subset of the transport loopback routes. There 1066 may be a large number of metro networks connecting to a given ABR, 1067 and in this case, the ABR will need forwarding entries for every 1068 access node in the directly connected metros. So, this ABR may have 1069 to maintain on the order of 100k routes. With BGP-CT, labels have to 1070 be allocated separately for each Transport Class. So, in the above 1071 example, ABR1 would have to use 300k labels if there were 3 1072 Transport Classes. This large number of label forwarding entries 1073 could be problematic. 1075 In highly scaled scenarios, it is therefore desirable to reduce the 1076 forwarding state on the ABRs. This reduction can be achieved with 1077 label stacking as a result of recursive route resolution. Figure 13 1078 illustrates how the forwarding state on ABRs can be greatly reduced 1079 by removing forwarding state for PEs in remote domains from the ABRs. 1080 In this example, we assume that we are setting up end-to-end paths 1081 for a single Transport Class, for example red. PE2 advertises a 1082 BGP-CT prefix of 2.2.2.2 with nexthop of 2.2.2.2 and label 101. 2.2.2.2 1083 is PE2's loopback. ABR3 advertises label 100 for BGP-CT prefix 1084 2.2.2.2 and changes the nexthop to self. When ABR1 receives the 1085 BGP-CT advertisement for 2.2.2.2, it does not change the nexthop and 1086 advertises the same label advertised by ABR3. When PE1 receives the 1087 BGP-CT advertisement for 2.2.2.2 with a nexthop of ABR3, it resolves the 1088 route using reachability to ABR3. 1090 The reachability of ABR3 has been learned by PE1 as the result of a 1091 BGP-CT advertisement originated by ABR3. As shown in Figure 13, ABR3 1092 advertises BGP-CT prefix 30.30.30.30 with label 2001.
ABR1 1093 advertises label 2000 for BGP-CT prefix 30.30.30.30 and sets nexthop 1094 to self. PE1 constructs the service data packet with a VPN label at 1095 the bottom followed by 2 BGP-CT labels 100 and 2000. The topmost 1096 label 2000 is the transport label for the metro1 domain. Removing 1097 the forwarding state for PEs in remote domains on the ABRs comes at 1098 the expense of one additional BGP-CT label on the data packet. 1100 Recursive route resolution provides significant forwarding state 1101 reduction on the ABRs. ABRs have to allocate label resources only 1102 for the PEs in their local domain. The number of PEs in the same 1103 domain as a given ABR is much lower than the total number of PEs in 1104 the network. 1106 The examples in this draft generally show VPN routes resolving on 1107 BGP-CT prefixes. However, the mechanisms are equally applicable to 1108 non-VPN routes. 1110 5.10. Availability 1112 Transport layer availability is very important in latency and loss 1113 sensitive networks. Any link or node failure must be repaired within 1114 50 ms. This convergence time can be achieved with 1115 Fast ReRoute (FRR) mechanisms. The Seamless SR architecture provides 1116 protection against intra-domain link and node failures. Protection 1117 against border node failures and egress link and node failures 1118 is also provided. Details of the FRR techniques are described in 1119 the sections below. 1121 5.10.1. Intra domain link and node protection 1123 In the Seamless SR architecture, protection against node and link 1124 failure is achieved with the relevant FRR techniques for the 1125 corresponding transport mechanism used inside the domain. In the 1126 case of an IP fabric, ECMP FRR or LFA can be used. In SR networks, 1127 TI-LFA [I-D.ietf-rtgwg-segment-routing-ti-lfa] provides link and node 1128 protection.
For SR-TE transport 1129 ([I-D.ietf-spring-segment-routing-policy]), link and node protection 1130 can be achieved using TI-LFA, combined with mechanisms described in 1131 [I-D.hegde-spring-node-protection-for-sr-te-paths]. 1133 5.10.2. Egress link and node protection 1135 [RFC8679] describes the mechanisms for providing protection for 1136 border nodes and PE devices where services are hosted. The mechanism 1137 can be further simplified operationally with anycast SIDs and anycast 1138 service labels, as described in 1139 [I-D.hegde-rtgwg-egress-protection-sr-networks]. 1141 5.10.3. Border Node protection 1143 Border node protection is very important in a network consisting of 1144 multiple domains. The Seamless SR architecture can achieve 50 ms FRR 1145 protection in the event of node failure using anycast addresses for 1146 the ABR/ASBRs. This requires that a set of ABRs advertise the same 1147 label for a given BGP-CT Prefix. The detailed mechanism is described 1148 in [I-D.hegde-rtgwg-egress-protection-sr-networks]. 1150 5.11. Operations 1152 5.11.1. MPLS ping and Traceroute 1154 The Seamless SR Architecture consists of 3 layers: the service layer, 1155 intra-domain transport, and BGP-CT transport. Within each layer, 1156 connectivity can be verified independently. Within the BGP-CT 1157 transport layer, end-to-end connectivity can be verified using a new 1158 OAM FEC for BGP-CT defined in 1159 [I-D.kaliraj-idr-bgp-classful-transport-planes]. That draft describes 1160 end-to-end connectivity verification as well as fault isolation. 1161 BGP-CT verification happens only on the BGP nodes. The intra-domain 1162 connectivity verification and fault isolation will be based on the 1163 technology deployed in that domain as defined in [RFC8029] and 1164 [RFC8287]. 1166 5.11.2. Counters and Statistics 1168 Traffic accounting and the ability to build a demand matrix for 1169 PE-to-PE traffic are very important.
With BGP-CT, per-label transit 1170 counters should be supported on every transit router. Per-label 1171 transit counters provide details of total traffic towards a remote PE 1172 measured at every BGP transit router. Per-label egress counters 1173 should be supported on the ingress PE router. Per-label egress counters 1174 provide total traffic from the ingress PE to the specific remote PE. 1176 5.12. Service Mapping 1178 Service mapping is an important aspect of any architecture. It 1179 provides a means to translate end users' SLA requirements into the 1180 operator's network configuration. The Seamless SR architecture supports 1181 automatic steering with the extended color community. The Transport 1182 Class and the route target carried by the BGP-CT advertisement 1183 directly map to the extended color community. Services that require a 1184 specific SLA carry the extended color community which maps to the 1185 Transport Class to which the BGP-CT advertisement belongs. 1187 Other types of traffic steering, such as DSCP based forwarding, are 1188 expressed with a mapping community. A mapping community is a standard 1189 BGP community and is completely generic and user defined. The 1190 mapping community will have a specific service mapping feature 1191 associated with it along with the required fallback behaviour when the 1192 primary transport goes down. The list below provides a general 1193 guideline to the different service mapping features and fallback 1194 options an implementation should provide. 1196 DSCP based mapping with each DSCP mapping to a Transport Class. 1198 DSCP based mapping with default mapping to a best-effort transport 1200 DSCP based mapping with fallback to best-effort when primary 1201 transport tunnel goes down. 1203 Extended color community based mapping with fallback to best 1204 effort 1206 Fallback options with specific protocol during migrations 1208 Fallback options to a different Transport Class. 1210 No Fallback permitted. 1212 5.13.
Migrations 1214 Networks that migrate from the Seamless MPLS architecture to the Seamless SR 1215 architecture require that all the border nodes and PE devices be 1216 upgraded and enabled with the new family on the BGP session. In cases 1217 where legacy nodes cannot be upgraded, exporting from BGP-LU 1218 into BGP-CT and vice versa SHOULD be supported. Once the entire 1219 network is migrated to support BGP-CT, there is no need to run the BGP-LU 1220 family on the BGP sessions. BGP-CT itself can advertise a best 1221 effort Transport Class and the BGP-LU family can be removed. 1223 5.14. Interworking with v6 transport technologies 1225 A later version of this document will address interworking with other 1226 v6 technologies, including SRv6, SRm6, and MPLS over GRE6. 1228 5.15. BGP based Multicast 1230 BGP based multicast as described in 1231 [I-D.zzhang-bess-bgp-multicast] serves two main purposes. It can 1232 replace PIM/mLDP inside a domain to natively do BGP based 1233 multicast. It can also serve as an overlay stitching protocol to 1234 stitch multiple P2MP LSPs across the domain. This gives the ability 1235 to easily transition each domain independently from one technology to 1236 the other. BGP based multicast defines a new SAFI, the 1237 MULTICAST TREE SAFI. Different route types are defined to support 1238 the various use cases. 1240 6. Backward Compatibility 1242 7. Security Considerations 1244 TBD 1246 8. IANA Considerations 1248 9. Acknowledgements 1250 Many thanks to Kireeti Kompella, Ron Bonica, Krzysztof Szarcowitz, 1251 Srihari Sangli, Julian Lucek, and Ram Santhanakrishnan for discussions and 1252 inputs. Thanks to Joel Halpern for review and comments. 1254 10. Contributors 1256 1. Kaliraj Vairavakkalai 1258 Juniper Networks 1260 kaliraj@juniper.net 1262 2. Jeffrey Zhang 1264 Juniper Networks 1266 zzhang@juniper.net 1268 11. References 1270 11.1. Normative References 1272 [I-D.hegde-rtgwg-egress-protection-sr-networks] 1273 Hegde, S. and W.
Lin, "Egress Protection for Segment 1274 Routing (SR) networks", draft-hegde-rtgwg-egress- 1275 protection-sr-networks-00 (work in progress), March 2020. 1277 [I-D.ietf-idr-performance-routing] 1278 Xu, X., Hegde, S., Talaulikar, K., Boucadair, M., and C. 1279 Jacquenet, "Performance-based BGP Routing Mechanism", 1280 draft-ietf-idr-performance-routing-02 (work in progress), 1281 October 2019. 1283 [I-D.kaliraj-idr-bgp-classful-transport-planes] 1284 Vairavakkalai, K., Venkataraman, N., and B. Rajagopalan, 1285 "BGP Classful Transport Planes", draft-kaliraj-idr-bgp- 1286 classful-transport-planes-00 (work in progress), May 2020. 1288 [I-D.zzhang-bess-bgp-multicast] 1289 Zhang, Z., Giuliano, L., Patel, K., Wijnands, I., mishra, 1290 m., and A. Gulko, "BGP Based Multicast", draft-zzhang- 1291 bess-bgp-multicast-03 (work in progress), October 2019. 1293 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1294 Requirement Levels", BCP 14, RFC 2119, 1295 DOI 10.17487/RFC2119, March 1997, 1296 . 1298 [RFC3107] Rekhter, Y. and E. Rosen, "Carrying Label Information in 1299 BGP-4", RFC 3107, DOI 10.17487/RFC3107, May 2001, 1300 . 1302 [RFC8669] Previdi, S., Filsfils, C., Lindem, A., Ed., Sreekantiah, 1303 A., and H. Gredler, "Segment Routing Prefix Segment 1304 Identifier Extensions for BGP", RFC 8669, 1305 DOI 10.17487/RFC8669, December 2019, 1306 . 1308 11.2. Informative References 1310 [I-D.hegde-spring-node-protection-for-sr-te-paths] 1311 Hegde, S., Bowers, C., Litkowski, S., Xu, X., and F. Xu, 1312 "Node Protection for SR-TE Paths", draft-hegde-spring- 1313 node-protection-for-sr-te-paths-06 (work in progress), 1314 July 2020. 1316 [I-D.ietf-idr-link-bandwidth] 1317 Mohapatra, P. and R. Fernando, "BGP Link Bandwidth 1318 Extended Community", draft-ietf-idr-link-bandwidth-07 1319 (work in progress), March 2018. 1321 [I-D.ietf-idr-segment-routing-te-policy] 1322 Previdi, S., Filsfils, C., Talaulikar, K., Mattes, P., 1323 Rosen, E., Jain, D., and S. 
              Lin, "Advertising Segment Routing Policies in BGP",
              draft-ietf-idr-segment-routing-te-policy-09 (work in
              progress), May 2020.

   [I-D.ietf-idr-tunnel-encaps]
              Patel, K., Velde, G., and S. Ramachandra, "The BGP
              Tunnel Encapsulation Attribute",
              draft-ietf-idr-tunnel-encaps-15 (work in progress),
              December 2019.

   [I-D.ietf-lsr-flex-algo]
              Psenak, P., Hegde, S., Filsfils, C., Talaulikar, K.,
              and A. Gulko, "IGP Flexible Algorithm",
              draft-ietf-lsr-flex-algo-08 (work in progress), July
              2020.

   [I-D.ietf-mpls-seamless-mpls]
              Leymann, N., Decraene, B., Filsfils, C.,
              Konstantynowicz, M., and D. Steinberg, "Seamless MPLS
              Architecture", draft-ietf-mpls-seamless-mpls-07 (work
              in progress), June 2014.

   [I-D.ietf-pce-segment-routing-policy-cp]
              Koldychev, M., Sivabalan, S., Barth, C., Peng, S., and
              H. Bidgoli, "PCEP extension to support Segment Routing
              Policy Candidate Paths",
              draft-ietf-pce-segment-routing-policy-cp-00 (work in
              progress), June 2020.

   [I-D.ietf-rtgwg-segment-routing-ti-lfa]
              Litkowski, S., Bashandy, A., Filsfils, C., Decraene,
              B., Francois, P., Voyer, D., Clad, F., and P.
              Camarillo, "Topology Independent Fast Reroute using
              Segment Routing",
              draft-ietf-rtgwg-segment-routing-ti-lfa-03 (work in
              progress), March 2020.

   [I-D.ietf-spring-segment-routing-policy]
              Filsfils, C., Talaulikar, K., Voyer, D., Bogdanov, A.,
              and P. Mattes, "Segment Routing Policy Architecture",
              draft-ietf-spring-segment-routing-policy-08 (work in
              progress), July 2020.

   [I-D.voyer-pim-sr-p2mp-policy]
              Voyer, D., Filsfils, C., Parekh, R., Bidgoli, H., and
              Z. Zhang, "Segment Routing Point-to-Multipoint
              Policy", draft-voyer-pim-sr-p2mp-policy-02 (work in
              progress), July 2020.

   [RFC1997]  Chandra, R., Traina, P., and T. Li, "BGP Communities
              Attribute", RFC 1997, DOI 10.17487/RFC1997, August
              1996.

   [RFC4364]  Rosen, E.
              and Y. Rekhter, "BGP/MPLS IP Virtual Private Networks
              (VPNs)", RFC 4364, DOI 10.17487/RFC4364, February
              2006.

   [RFC5357]  Hedayat, K., Krzanowski, R., Morton, A., Yum, K., and
              J. Babiarz, "A Two-Way Active Measurement Protocol
              (TWAMP)", RFC 5357, DOI 10.17487/RFC5357, October
              2008.

   [RFC6388]  Wijnands, IJ., Ed., Minei, I., Ed., Kompella, K., and
              B. Thomas, "Label Distribution Protocol Extensions for
              Point-to-Multipoint and Multipoint-to-Multipoint Label
              Switched Paths", RFC 6388, DOI 10.17487/RFC6388,
              November 2011.

   [RFC7311]  Mohapatra, P., Fernando, R., Rosen, E., and J. Uttaro,
              "The Accumulated IGP Metric Attribute for BGP",
              RFC 7311, DOI 10.17487/RFC7311, August 2014.

   [RFC7471]  Giacalone, S., Ward, D., Drake, J., Atlas, A., and S.
              Previdi, "OSPF Traffic Engineering (TE) Metric
              Extensions", RFC 7471, DOI 10.17487/RFC7471, March
              2015.

   [RFC7510]  Xu, X., Sheth, N., Yong, L., Callon, R., and D. Black,
              "Encapsulating MPLS in UDP", RFC 7510,
              DOI 10.17487/RFC7510, April 2015.

   [RFC8029]  Kompella, K., Swallow, G., Pignataro, C., Ed., Kumar,
              N., Aldrin, S., and M. Chen, "Detecting Multiprotocol
              Label Switched (MPLS) Data-Plane Failures", RFC 8029,
              DOI 10.17487/RFC8029, March 2017.

   [RFC8287]  Kumar, N., Ed., Pignataro, C., Ed., Swallow, G., Akiya,
              N., Kini, S., and M. Chen, "Label Switched Path (LSP)
              Ping/Traceroute for Segment Routing (SR) IGP-Prefix
              and IGP-Adjacency Segment Identifiers (SIDs) with MPLS
              Data Planes", RFC 8287, DOI 10.17487/RFC8287, December
              2017.

   [RFC8402]  Filsfils, C., Ed., Previdi, S., Ed., Ginsberg, L.,
              Decraene, B., Litkowski, S., and R. Shakir, "Segment
              Routing Architecture", RFC 8402, DOI 10.17487/RFC8402,
              July 2018.

   [RFC8570]  Ginsberg, L., Ed., Previdi, S., Ed., Giacalone, S.,
              Ward, D., Drake, J., and Q.
              Wu, "IS-IS Traffic Engineering (TE) Metric
              Extensions", RFC 8570, DOI 10.17487/RFC8570, March
              2019.

   [RFC8679]  Shen, Y., Jeganathan, M., Decraene, B., Gredler, H.,
              Michel, C., and H. Chen, "MPLS Egress Protection
              Framework", RFC 8679, DOI 10.17487/RFC8679, December
              2019.

   [TS.23.501-3GPP]
              3rd Generation Partnership Project (3GPP), "System
              Architecture for 5G System; Stage 2, 3GPP TS 23.501
              v16.4.0", March 2020.

Authors' Addresses

   Shraddha Hegde
   Juniper Networks Inc.
   Exora Business Park
   Bangalore, KA 560103
   India

   Email: shraddha@juniper.net

   Chris Bowers
   Juniper Networks Inc.

   Email: cbowers@juniper.net

   Xiaohu Xu
   Alibaba Inc.
   Beijing
   China

   Email: xiaohu.xxh@alibaba-inc.com

   Arkadiy Gulko
   Refinitiv

   Email: arkadiy.gulko@refinitiv.com

   Alex Bogdanov
   Google Inc.

   Email: bogdanov@google.com

   Jim Uttaro
   ATT

   Email: ju1738@att.com