TEAS Working Group                                                 Z. Li
Internet-Draft                                                  D. Dhody
Intended status: Informational                       Huawei Technologies
Expires: 28 April 2022                                           Q. Zhao
                                                        Etheric Networks
                                                                   K. Ke
                                                   Tencent Holdings Ltd.
                                                             B. Khasanov
                                                              Yandex LLC
                                                                 L. Fang
                                                           Expedia, Inc.
                                                                 C. Zhou
                                                                     HPE
                                                                B. Zhang
                                                    Telus Communications
                                                           A. Rachitskiy
                                                 Mobile TeleSystems JLLC
                                                               A. Gulida
                                                          LLC "Lifetech"
                                                         25 October 2021

The Use Cases for Path Computation Element (PCE) as a Central Controller
                                (PCECC).
                   draft-ietf-teas-pcecc-use-cases-08

Abstract

   The Path Computation Element (PCE) is a core component of a Software-
   Defined Networking (SDN) system.  It can compute optimal paths for
   traffic across a network and can also update the paths to reflect
   changes in the network or traffic demands.  PCE was developed to
   derive paths for MPLS Label Switched Paths (LSPs), which are supplied
   to the head end of the LSP using the Path Computation Element
   Communication Protocol (PCEP).

   SDN has a broader applicability than signaled MPLS traffic-engineered
   (TE) networks, and the PCE may be used to determine paths in a range
   of use cases including static LSPs, segment routing (SR), Service
   Function Chaining (SFC), and most forms of a routed or switched
   network.  It is, therefore, reasonable to consider PCEP as a control
   protocol for use in these environments to allow the PCE to be fully
   enabled as a central controller.

   This document describes general considerations for PCECC deployment
   and examines its applicability and benefits, as well as its
   challenges and limitations, through a number of use cases.  PCEP
   extensions required for stateful PCE usage are covered in separate
   documents.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 28 April 2022.

Copyright Notice

   Copyright (c) 2021 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components
   extracted from this document must include Simplified BSD License text
   as described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Application Scenarios
     3.1.  Use Cases of PCECC for Label Management
     3.2.  Using PCECC for SR
       3.2.1.  PCECC SID Allocation
       3.2.2.  Use Cases of PCECC for SR Best Effort (BE) Path
       3.2.3.  Use Cases of PCECC for SR Traffic Engineering (TE) Path
       3.2.4.  SR Policy
     3.3.  Use Cases of PCECC for TE LSP
       3.3.1.  PCECC Load Balancing (LB) Use Case
       3.3.2.  PCECC and Inter-AS TE
     3.4.  Use Cases of PCECC for Multicast LSPs
       3.4.1.  Using PCECC for P2MP/MP2MP LSPs' Setup
       3.4.2.  Use Cases of PCECC for the Resiliency of P2MP/MP2MP LSPs
     3.5.  Use Cases of PCECC for LSP in the Network Migration
     3.6.  Use Cases of PCECC for L3VPN and PWE3
     3.7.  Using PCECC for Traffic Classification Information
     3.8.  Use Cases of PCECC for SRv6
     3.9.  Use Cases of PCECC for SFC
     3.10. Use Cases of PCECC for Native IP
     3.11. Use Cases of PCECC for Local Protection (RSVP-TE)
     3.12. Use Cases of PCECC for BIER
   4.  IANA Considerations
   5.  Security Considerations
   6.  Acknowledgments
   7.  References
     7.1.  Normative References
     7.2.  Informative References
   Appendix A.  Using reliable P2MP TE based multicast delivery for
           distributed computations (MapReduce-Hadoop)
   Authors' Addresses

1.  Introduction

   An Architecture for Use of PCE and PCEP [RFC5440] in a Network with
   Central Control [RFC8283] describes an SDN architecture in which the
   Path Computation Element (PCE) determines paths for a variety of
   different use cases, with PCEP as a general southbound communication
   protocol towards all the nodes along the path.

   [RFC9050] introduces the procedures and extensions for PCEP to
   support the PCECC architecture [RFC8283].

   This draft describes the various use cases for the PCECC
   architecture.

   This is a living document to catalog the use cases for PCECC.  There
   is currently no intention to publish this work as an RFC.  [Update:
   Chairs are evaluating if the document should be published instead.]

2.  Terminology

   The following terminology is used in this document.

   IGP:  Interior Gateway Protocol.  Either of the two routing
      protocols, Open Shortest Path First (OSPF) or Intermediate System
      to Intermediate System (IS-IS).

   PCC:  Path Computation Client.  Any client application requesting a
      path computation to be performed by a Path Computation Element.

   PCE:  Path Computation Element.  An entity (component, application,
      or network node) that is capable of computing a network path or
      route based on a network graph and applying computational
      constraints.

   PCECC:  PCE as a central controller.  Extension of PCE to support SDN
      functions as per [RFC8283].

   TE:  Traffic Engineering.

3.  Application Scenarios

   In the following sections, several use cases are described,
   showcasing scenarios that benefit from the deployment of PCECC.

3.1.  Use Cases of PCECC for Label Management

   As per [RFC8283], in some cases, the PCE-based controller can take
   responsibility for managing some part of the MPLS label space for
   each of the routers that it controls, and it may take wider
   responsibility for partitioning the label space for each router and
   allocating different parts for different uses, communicating the
   ranges to the router using PCEP.

   [RFC9050] describes a mode where LSPs are provisioned as explicit
   label instructions at each hop on the end-to-end path.  Each router
   along the path must be told what label forwarding instructions to
   program and what resources to reserve.  The controller uses PCEP to
   communicate with each router along the path of the end-to-end LSP.
   For this to work, the PCE-based controller will take responsibility
   for managing some part of the MPLS label space for each of the
   routers that it controls.  PCEP could be extended to allow a PCC to
   inform the PCE of the label space that it is to control.

   [RFC8664] specifies extensions to PCEP that allow a stateful PCE to
   compute, update, or initiate SR-TE paths.
   [I-D.ietf-pce-pcep-extension-pce-controller-sr] describes the
   mechanism for a PCECC to allocate and provision the node/prefix/
   adjacency label (SID) via PCEP.  To make such an allocation, the PCE
   needs to be aware of the label space from the Segment Routing Global
   Block (SRGB) or Segment Routing Local Block (SRLB) [RFC8402] of the
   node that it controls.  A mechanism for a PCC to inform the PCE of
   such a label space is needed within PCEP.  The full SRGB/SRLB of a
   node could also be learned via existing IGP or BGP-LS mechanisms.

   [I-D.li-pce-controlled-id-space] defines a PCEP extension to support
   advertisement to the PCE of the MPLS label space that it is to
   control.
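
   The label-space bookkeeping described above can be pictured with a
   small sketch.  It is illustrative only and is not taken from
   [RFC9050] or [I-D.li-pce-controlled-id-space]: the class names, the
   "purpose" tags, and the label ranges are all assumptions made for
   the example.  It only shows a controller recording the ranges that a
   PCC has set aside and allocating labels from the matching range.

```python
from dataclasses import dataclass, field


@dataclass
class LabelBlock:
    """A contiguous label range [start, end] delegated to the controller."""
    start: int
    end: int
    in_use: set = field(default_factory=set)

    def allocate(self) -> int:
        # Hand out the lowest label in the block that is not yet in use.
        for label in range(self.start, self.end + 1):
            if label not in self.in_use:
                self.in_use.add(label)
                return label
        raise RuntimeError("label block exhausted")

    def release(self, label: int) -> None:
        self.in_use.discard(label)


class LabelSpaceManager:
    """Hypothetical controller-side view of per-router label blocks."""

    def __init__(self) -> None:
        self._blocks = {}  # (router, purpose) -> LabelBlock

    def learn_block(self, router: str, purpose: str,
                    start: int, end: int) -> None:
        # Record a range that the router (PCC) reported to the controller.
        if start > end:
            raise ValueError("empty label range")
        self._blocks[(router, purpose)] = LabelBlock(start, end)

    def allocate(self, router: str, purpose: str) -> int:
        return self._blocks[(router, purpose)].allocate()


# Assumed ranges: one block for static LSP labels, one carved from the SRLB.
manager = LabelSpaceManager()
manager.learn_block("R2", "lsp", 5000, 5999)
manager.learn_block("R2", "adj-sid", 9000, 9099)

first_adj = manager.allocate("R2", "adj-sid")
second_adj = manager.allocate("R2", "adj-sid")
lsp_label = manager.allocate("R2", "lsp")
```

   Keeping one block per router and per use mirrors the partitioning of
   the label space "for different uses" mentioned above.  A real
   controller would also need persistence and conflict handling, which
   this sketch omits.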

   There have been various proposals for Global Labels.  The PCECC
   architecture could be used as a means to learn the label space of
   nodes, and could also be used to determine and provision the global
   label range.

   +------------------------------+   +------------------------------+
   |         PCE DOMAIN 1         |   |         PCE DOMAIN 2         |
   |          +--------+          |   |          +--------+          |
   |          |        |          |   |          |        |          |
   |          | PCECC1 | ---------PCEP---------- | PCECC2 |          |
   |          |        |          |   |          |        |          |
   |          |        |          |   |          |        |          |
   |          +--------+          |   |          +--------+          |
   |           ^      ^           |   |           ^      ^           |
   |         /          \  PCEP   |   |   PCEP  /          \         |
   |       V              V       |   |       V              V       |
   | +--------+        +--------+ |   | +--------+        +--------+ |
   | |NODE 11 |        | NODE 1n| |   | |NODE 21 |        | NODE 2n| |
   | |        | ...... |        | |   | |        | ...... |        | |
   | | PCECC  |        | PCECC  | |   | | PCECC  |        | PCECC  | |
   | |Enabled |        | Enabled| |   | |Enabled |        |Enabled | |
   | +--------+        +--------+ |   | +--------+        +--------+ |
   |                              |   |                              |
   +------------------------------+   +------------------------------+

                  Figure 1: PCECC for Label Management

   *  The PCC would advertise the PCECC capability to the PCE (central
      controller, PCECC) [RFC9050].

   *  The PCECC could also learn the label range set aside by the PCC
      ([I-D.li-pce-controlled-id-space]).

   *  Optionally, the PCECC could determine the shared MPLS global label
      range for the network.

      -  If the shared global label range needs to be negotiated across
         multiple domains, the central controllers of these domains
         would also need to negotiate a common global label range
         across domains.

      -  The PCECC would need to set the shared global label range on
         all PCC nodes in the network.

3.2.  Using PCECC for SR

   Segment Routing (SR) leverages the source routing paradigm.  Using
   SR, a source node steers a packet through a path without relying on
   hop-by-hop signaling protocols such as LDP or RSVP-TE.  Each path is
   specified as an ordered list of instructions called "segments".
   Each segment is an instruction to route the packet to a specific
   place in the network, or to perform a specific service on the packet.
   A database of segments can be distributed through the network using a
   routing protocol (such as IS-IS or OSPF) or by any other means.  PCEP
   (and PCECC) could be one such means.

   [RFC8664] specifies the SR-specific PCEP extensions.  The PCECC may
   further use PCEP to distribute SR Segment Identifiers (SIDs) to the
   SR nodes (PCCs).  Having the PCECC allocate and maintain the SIDs for
   the nodes and adjacencies in the network, and distribute them to the
   SR nodes directly via the PCEP session, has some advantages over
   configuring each SR node and flooding the SIDs via the IGP,
   especially in an SDN environment.

   When the PCECC is used for the distribution of the node segment ID
   and adjacency segment ID, the node segment ID is allocated from the
   SRGB of the node.  The adjacency segment ID is allocated from the
   SRLB of the node, as described in
   [I-D.ietf-pce-pcep-extension-pce-controller-sr].

   [RFC8355] identifies various protection and resiliency use cases for
   SR.  Path protection lets the ingress node be in charge of the
   failure recovery (used for SR-TE).  Protection can also be performed
   by the node adjacent to the failed component, commonly referred to as
   local protection or fast-reroute (FRR) techniques.  In the case of
   PCECC, the protection paths can be pre-computed and set up by the
   PCE.

   The following example illustrates the use case where the node SID and
   adjacency SID are allocated by the PCECC.

                      192.0.2.1/32
                      +----------+
                      | R1(1001) |
                      +----------+
                           |
                      +----------+
                      | R2(1002) |  192.0.2.2/32
                      +----------+
                    *      |   *   *
                   *       |   *    *
                  *   link1|   *     *
   192.0.2.4/32  *         |   *link2 *  192.0.2.5/32
   +-----------+       9001|   *       +-----------+
   | R4(1004)  |           |   *       | R5(1005)  |
   +-----------+           |   *       +-----------+
                *          |   *9003   *     +
                  *        |   *     *       +
                    *      |   *   *         +
                     +-----------+    +-----------+
        192.0.2.3/32 | R3(1003)  |    |R6(1006)   |192.0.2.6/32
                     +-----------+    +-----------+
                           |
                     +-----------+
                     | R8(1008)  |  192.0.2.8/32
                     +-----------+

3.2.1.  PCECC SID Allocation

   Each node (PCC) is allocated a node SID by the PCECC.  The PCECC
   needs to distribute the label map of each node to all the nodes in
   the domain.  On receiving the label map, each node (PCC) uses its
   local routing information to determine the next hop and downloads the
   label forwarding instructions accordingly.  The forwarding behavior
   and the end result are the same as for an IGP-based Node-SID in SR.
   Thus, from anywhere in the domain, ECMP-aware shortest-path
   forwarding of the packet towards the related node is enforced.

   For each adjacency in the network, the PCECC can allocate an Adj-SID.
   The PCECC sends a PCInitiate message to update the label map of each
   adjacency on the corresponding nodes in the domain.  Each node (PCC)
   downloads the label forwarding instructions accordingly.  The
   forwarding behavior and the end result are similar to those of an
   IGP-based Adj-SID in SR.

   The various mechanisms are described in
   [I-D.ietf-pce-pcep-extension-pce-controller-sr].

3.2.2.  Use Cases of PCECC for SR Best Effort (BE) Path

   In this mode of the solution, the PCECC only needs to allocate the
   node segment ID and adjacency segment ID (without calculating an
   explicit path for the SR path).  The ingress of the forwarding path
   only needs to encapsulate the destination node segment ID on top of
   the packet.
   All the intermediate nodes will forward the packet based on the
   destination node SID.  This is similar to an LDP LSP.

   R1 may send a packet to R8 simply by pushing an SR header with
   segment list {1008} (the node SID for R8).  The path would be based
   on the routing/next-hop calculation on the routers.

3.2.3.  Use Cases of PCECC for SR Traffic Engineering (TE) Path

   SR-TE paths may not follow an IGP SPT.  Such paths may be chosen by a
   PCECC and provisioned on the ingress node of the SR-TE path.  The SR
   header consists of a list of SIDs (or MPLS labels).  The header has
   all the information necessary to guide the packets from the ingress
   node to the egress node of the path; hence, there is no need for any
   signaling protocol.  Where a strict traffic-engineering path is
   needed, all the adjacency SIDs are stacked; otherwise, a combination
   of node SIDs and adjacency SIDs can be used for the SR-TE path.

   Note that bandwidth reservation is guaranteed only at the controller,
   through the enforcement of bandwidth admission control.  In the
   RSVP-TE LSP case, by contrast, the control-plane signaling also
   reserves link bandwidth at each hop of the path.

   The SR traffic-engineering path examples are explained below.

   Note that the node SID for each node is allocated from the SRGB, and
   the adjacency SID for each link is allocated from the SRLB of each
   node.

   Example 1:

   R1 may send a packet P1 to R8 simply by pushing an SR header with
   segment list {1008}.  Based on the best path, it could be:
   R1-R2-R3-R8.

   Example 2:

   R1 may send a packet P2 to R8 by pushing an SR header with segment
   list {1002, 9001, 1008}.  The path should be: R1-R2-link1-R3-R8.

   Example 3:

   R1 may send a packet P3 to R8 via R4 by pushing an SR header with
   segment list {1004, 1008}.
   The path could be: R1-R2-R4-R3-R8.

   The local protection examples for the SR-TE path are explained below.

   Example 4: local link protection:

   *  R1 may send a packet P4 to R8 by pushing an SR header with segment
      list {1002, 9001, 1008}.  The path should be: R1-R2-link1-R3-R8.

   *  When node R2 receives the packet from R1 with the header
      link1-R3-R8 and finds that link1 has failed, R2 can steer the
      traffic over the bypass and send the packet out with the header
      R3-R8 through link2.

   Example 5: local node protection:

   *  R1 may send a packet P5 to R8 by pushing an SR header with segment
      list {1004, 1008}.  The path could be: R1-R2-R4-R3-R8.

   *  When node R2 receives the packet from R1 with the header
      {1004, 1008} and finds that node R4 has failed, it can steer the
      traffic over the bypass and send the packet out with the header
      {1005, 1008} to R5 instead of R4.

3.2.4.  SR Policy

   [TODO: BORIS - will be added in next version to be published after
   the blockade is lifted :) ]

3.3.  Use Cases of PCECC for TE LSP

   Section 3.2 discusses SR paths set up via the PCECC.  Although those
   cases offer simplicity and scalability, there are existing
   traffic-engineering functions, such as bandwidth guarantees and
   monitoring, for which SR-based solutions are complex.  There are also
   cases where the depth of the label stack is an issue for existing
   deployments and certain vendors.

   To address these issues, the PCECC architecture also supports TE LSP
   functionality.  To achieve this, the existing PCEP can be used to
   communicate between the PCECC and the nodes along the path.  This is
   similar to static LSPs, where LSPs can be provisioned as explicit
   label instructions at each hop on the end-to-end path.
   Each router along the path must be told what label-forwarding
   instructions to program and what resources to reserve.  The PCE-based
   controller keeps a view of the network and determines the paths of
   the end-to-end LSPs, and the controller uses PCEP to communicate with
   each router along the path of the end-to-end LSP.

                      192.0.2.1/32
                      +----------+
                      |    R1    |
                      +----------+
                       |      |
                       |link1 |
                       |      |link2
                      +----------+
                      |    R2    |  192.0.2.2/32
                      +----------+
              link3 *      |   *   * link4
                   *       |   *    *
                  *   link5|   *     *
   192.0.2.4/32  *         |   *link6 *  192.0.2.5/32
   +-----------+           |   *       +-----------+
   |    R4     |           |   *       |    R5     |
   +-----------+           |   *       +-----------+
                *          |   *       *     +
           link10 *        |   *     *link7  +
                    *      |   *   *         +
                     +-----------+    +-----------+
        192.0.2.3/32 |    R3     |    |    R6     |192.0.2.6/32
                     +-----------+    +-----------+
                       |      |
                       |link8 |
                       |      |link9
                     +-----------+
                     |    R8     |  192.0.2.8/32
                     +-----------+

                 Figure 2: PCECC TE LSP Setup Example

   *  Based on a path computation request / delegation or PCE
      initiation, the PCECC receives the request with constraints and
      optimization criteria.

   *  The PCECC would calculate the optimal path according to the given
      constraints (e.g., bandwidth).

   *  The PCECC would provision each node along the path and assign
      incoming and outgoing labels from R1 to R8 with the path: {R1,
      link1, 1001}, {1001, R2, link3, 2003}, {2003, R4, link10, 4010},
      {4010, R3, link8, 3008}, {3008, R8}.

   *  For end-to-end protection, the PCECC would program each node along
      the path from R1 to R8 with the secondary path: {R1, link2, 1002},
      {1002, R2, link4, 2004}, {2004, R5, link7, 5007}, {5007, R3,
      link9, 3009}, {3009, R8}.

   *  It is also possible to have a bypass path for local protection set
      up by the PCECC.  For example, with the primary path as above, to
      protect node R4 locally, the PCECC can program the bypass path
      like this: {R2, link5, 2005}, {2005, R3}.
      By doing this, node R4 is locally protected at R2.

3.3.1.  PCECC Load Balancing (LB) Use Case

   Service providers often use TE tunnels to avoid non-deterministic
   paths in their networks.  One example is the use of TE tunnels in the
   mobile backhaul (MBH).  Consider the following topology:

                             TE1 -------------->
+---------+    +--------+    +--------+    +--------+    +------+  +---+
| Access  |----| Access |----| AGG 1  |----| AGG N-1|----|Core 1|--|SR1|
| SubNode1|    | Node 1 |    +--------+    +--------+    +------+  +---+
+---------+    +--------+       |   |           | ^         |
     | Access      | Access     |  AGG Ring 1   | |         |
     | SubRing 1   | Ring 1     |               | |         |
+---------+    +--------+    +--------+         | |         |
| Access  |    | Access |    | AGG 2  |         | |         |
| SubNode2|    | Node 2 |    +--------+         | |         |
+---------+    +--------+       |   |           | |         |
     |             |            |   |           | |         |
     |             |            |   +----TE2----|-+         |
+---------+    +--------+    +--------+    +--------+    +------+  +---+
| Access  |    | Access |----| AGG 3  |----| AGG N  |----|Core N|--|SRn|
| SubNodeN|----| Node N |    +--------+    +--------+    +------+  +---+
+---------+    +--------+

   This MBH architecture uses L2 access rings and sub-rings.  L3 starts
   at the aggregation layer.  For the sake of simplicity, the figure
   shows only one access sub-ring, one access ring, and one aggregation
   ring (AGG1...AGGN), connected by Nx10GE interfaces.  The aggregation
   domain runs its own IGP.  There are two egress routers (AGG N-1 and
   AGG N) that are connected to the Core domain via L2 interfaces.  The
   Core also has connections to service routers.  RSVP-TE tunnels are
   used for MPLS transport inside the ring.  There could be at least two
   tunnels (one way) from each AGG router to the egress AGG routers.
   There are also many L2 access rings connected to AGG routers.

   Services are deployed by means of either L2VPNs (VPLS) or L3VPNs.
   Those services use MPLS TE as transport towards the egress AGG
   routers.
   TE tunnels could also be used as transport towards service routers in
   a seamless-MPLS-based architecture in the future.

   There is a need to solve the following tasks:

   *  Perform automatic load balancing among TE tunnels according to the
      current traffic load.

   *  TE bandwidth (BW) management: provide guaranteed BW for specific
      services (HSI, IPTV, etc.) and time-based BW reservation (BoD) for
      other services.

   *  Simplify the deployment of TE tunnels through automation, without
      any manual intervention.

   *  Provide flexibility for Service Router placement (anywhere in the
      network, by creating transport LSPs to them).

   Since the other tasks are already covered by other PCECC use cases,
   this section focuses on the load-balancing (LB) task.  The LB task
   could be solved by means of the PCECC in the following way:

   *  An application, network service, or operator can ask the SDN
      controller (PCECC), via the Northbound Interface (NBI), for
      LSP-based LB between AGG X and AGG N/AGG N-1 (the egress AGG
      routers that have connections to the core).  Each such request
      would have associated constraints (e.g., Path Setup Type (PST),
      bandwidth, inclusion or exclusion of specific links or nodes,
      number of paths, objective function (OF), need for disjoint LSP
      paths, etc.).

   *  The PCECC could calculate multiple (say N) LSPs according to the
      given constraints.  The calculation is based on the results of the
      Objective Function (OF) [RFC5541], constraints, endpoints, same or
      different bandwidth (BW), different links (in the case of disjoint
      paths), and other constraints.

   *  Depending on the given LSP Path Setup Type (PST), the PCECC would
      download instructions to the PCC.  At this stage, it is assumed
      that the PCECC is aware of the label space it controls and that,
      in the case of SR, the SID allocation and distribution has already
      been done.

   *  The PCECC would send a PCInitiate PCEP message [RFC8281] towards
      the ingress AGG X router (PCC) for each of the N LSPs and receive
      a PCRpt PCEP message [RFC8231] back from the PCC.  If the PST is
      PCECC-SR, the PCECC would include the SID stack as per [RFC8664].
      If the PST is PCECC (basic), the PCECC would assign labels along
      the calculated path, set up the path by sending central controller
      instructions in PCEP messages to each node along the path of the
      LSP as per [RFC9050], and then send a PCUpd message to the ingress
      AGG X router with information about the new LSP.  AGG X (PCC)
      would respond with a PCRpt message carrying the LSP status.

   *  AGG X, as the ingress router, now has N LSPs towards AGG N and AGG
      N-1, which are available for installation in the router's
      forwarding table and for load balancing traffic among them.  The
      traffic distribution among those LSPs depends on the particular
      hash function implemented on that router.

   *  Since the PCECC is aware of the TEDB (TE state) and the LSP-DB, it
      can manage and prevent possible over-subscription and limit the
      number of available LB states.  Via the PCECC mechanism, the
      controller can take quick action in the network by directly
      provisioning the central control instructions.

3.3.2.  PCECC and Inter-AS TE

   There are various signaling options for establishing an Inter-AS TE
   LSP: contiguous TE LSP [RFC5151], stitched TE LSP [RFC5150], and
   nested TE LSP [RFC4206].

   Requirements for PCE-based Inter-AS setup [RFC5376] describe the
   approach and PCEP functionality that are needed for establishing
   Inter-AS TE LSPs.

   [RFC5376] also gives an Inter- and Intra-AS PCE Reference Model,
   which is reproduced below in shortened form for the sake of
   simplicity.

              Inter-AS       Inter-AS
        PCC <-->PCE1<--------->PCE2
         ::      ::             ::
         ::      ::             ::
         R1----ASBR1====ASBR3---R3---ASBR5
         |  AS1  |        |    PCC     |
         |       |        |    AS2     |
         R2----ASBR2====ASBR4---R4---ASBR6
            ::                  ::
            ::                  ::
         Intra-AS            Intra-AS
           PCE3                PCE4

     Figure 3: Shortened Form of the Inter- and Intra-AS PCE Reference
                            Model [RFC5376]

   PCECCs belonging to different domains can cooperate to set up an
   inter-AS TE LSP.  The stateful H-PCE [RFC8751] mechanism could also
   be used to first establish a per-domain PCECC LSP.  These could then
   be stitched together to form an inter-AS TE LSP as described in
   [I-D.ietf-pce-stateful-interdomain].

   For the sake of simplicity, the remainder of this section focuses on
   a simplified Inter-AS case in which both AS1 and AS2 belong to the
   same service provider administration.  In that case, the Inter-AS and
   Intra-AS PCEs could be combined into a single PCE serving both ASes,
   provided that its scalability and performance are sufficient to
   handle all path computation requests and LSP setup.  The PCE would
   require interfaces (PCEP and BGP-LS) to both domains.  PCECC
   redundancy mechanisms are described in [RFC8283].  Thus, routers in
   AS1 and AS2 (PCCs) can send PCEP messages towards the same PCECC.

            +----BGP-LS------+ +------BGP-LS-----+
            |                | |                 |
     +-PCEP-|----++-+-------PCECC-----PCEP--++-+-|-------+
   +-:------|----::-:-+                  +--::-:-|-------:---+
   | :      |    :: : |                  |  :: : |       :   |
   | :     RR1   :: : |                  |  :: : RR2     :   |
   | v           v: : |       LSP1       |  :: v         v   |
   | R1---------ASBR1=======================ASBR3--------R3  |
   | |            v : |                  |  :v            |  |
   | +----------ASBR2=======================ASBR4---------+  |
   | |  Region 1    : |                  |  : Region 1    |  |
   |----------------:-|                  |--:-------------|--|
   | |              v |       LSP2       |  v             |  |
   | +----------ASBR5=======================ASBR6---------+  |
   |    Region 2      |                  |       Region 2    |
   +------------------+ <--------------> +-------------------+
      MPLS Domain 1         Inter-AS         MPLS Domain 2
   <=======AS1=======>                   <========AS2=======>

               Figure 4: Particular Case of Inter-AS PCE

   In the PCECC Inter-AS TE scenario where the service provider controls
   both domains (AS1 and AS2), each domain has its own IGP and MPLS
   transport.  There is a need to set up Inter-AS LSPs for transporting
   different services on top of them (Voice, L3VPN, etc.).  Inter-AS
   links with different capacities exist in several regions.  The task
   is not only to provision those Inter-AS LSPs with the given
   constraints, but also to calculate the paths and pre-set up the
   backup Inter-AS LSPs that will be used if the primary LSP fails.

   As shown in Figure 4, LSP1 from R1 to R3 goes via ASBR1 and ASBR3 and
   is the primary Inter-AS LSP.  LSP2 from R1 to R3, which goes via
   ASBR5 and ASBR6, is the backup.  In addition, a bypass LSP could also
   be set up to protect against ASBR or inter-AS link failure.

   After the addition of PCECC functionality to the PCE (SDN
   controller), the PCECC-based Inter-AS TE model SHOULD follow the
   PCECC use case for TE LSPs and the requirements of [RFC5376], with
   the following details:

   *  Since the PCECC needs to know the topology of both domains (AS1
      and AS2), the PCECC could use BGP-LS peering with routers (or RRs)
      in both domains.

   *  The PCECC needs PCEP connectivity towards all routers in both
      domains (see also Section 4 of [RFC5376]), in a similar manner to
      an SDN controller.

   *  After the operator's application or service orchestrator creates a
      request for a tunnel for a specific service, the PCECC would
      receive that request via the NBI (the NBI type is implementation
      dependent; it could be NETCONF/YANG, REST, etc.).  The PCECC would
      then calculate the optimal path based on the Objective Function
      (OF) and the given constraints (e.g., path setup type, bandwidth,
      etc.), including those from [RFC5376]: priority, AS sequence,
      preferred ASBR, disjoint paths, and protection.  At this step,
      there would be two paths: R1-ASBR1-ASBR3-R3 and R1-ASBR5-ASBR6-R3.

   *  Depending on the given LSP PST (PCECC or PCECC-SR), the PCECC
      would download central controller instructions to the PCC.  At
      this stage, it is assumed that the PCECC is aware of the label
      space it controls and that, in the case of SR, the SID allocation
      and distribution has already been done.

   *  The PCECC would send a PCInitiate PCEP message [RFC8281] towards
      the ingress router R1 (PCC) in AS1 and receive a PCRpt PCEP
      message [RFC8231] back from the PCC.  If the PST is PCECC-SR, the
      PCECC would include the SID stack as per [RFC8664].  It may also
      include a binding SID based on the AS boundary.  The backup SID
      stack could also be installed at the ingress, but, more
      importantly, each node along the SR path could also perform local
      protection based only on the top segment.  If the PST is PCECC
      (basic), the PCECC would assign labels along the calculated paths
      (R1-ASBR1-ASBR3-R3 and R1-ASBR5-ASBR6-R3), set up the paths by
      sending central controller instructions in PCEP messages to each
      node along the paths of the LSPs as per [RFC9050], and then send a
      PCUpd message to the ingress router R1 with information about the
      new LSPs.  R1 would respond with a PCRpt message carrying the
      status of the LSP(s).

   *  After that step, R1 has primary and backup TE LSPs (LSP1 and
      LSP2) towards R3.  How R1 switches over to the backup LSP2 if
      LSP1 fails is up to the router implementation.

3.4.  Use Cases of PCECC for Multicast LSPs

   Multicast LSPs are currently set up using either the RSVP-TE P2MP or
   mLDP protocols.  The setup of these LSPs may require manual
   configuration and complex signaling when protection is considered.
   With the PCECC solution, the multicast LSP can be computed and set
   up through a centralized controller that has the full picture of the
   topology and of the bandwidth usage of each link.  This not only
   reduces the configuration complexity compared to distributed RSVP-TE
   P2MP or mLDP signaling but also allows disjoint primary and
   secondary P2MP paths to be computed efficiently.

3.4.1.  Using PCECC for P2MP/MP2MP LSPs' Setup

   It is assumed that the PCECC is aware of the label space it controls
   for all nodes and makes allocations accordingly.

                           +----------+
                           |    R1    | Root node of the multicast LSP
                           +----------+
                               |6000
                           +----------+
            Transit Node   |    R2    |
            branch         +----------+
                            *  |   * *
                       9001*   |   *  *9002
                          *    |   *   *
              +-----------+    |   *     +-----------+
              |    R4     |    |   *     |    R5     | Transit Nodes
              +-----------+    |   *     +-----------+
                       *       |   *      *     +
                    9003*      |   *     *      +9004
                         *     |   *    *       +
                        +-----------+  +-----------+
                        |    R3     |  |    R6     | Leaf Node
                        +-----------+  +-----------+
                         9005|
                        +-----------+
                        |    R8     | Leaf Node
                        +-----------+

   The P2MP example is explained here, where R1 is the root and R8 and
   R6 are the leaves.

   *  Based on the P2MP path computation request/delegation or PCE
      initiation, the PCECC receives the request with constraints and
      optimization criteria.

   *  The PCECC would calculate the optimal P2MP path according to the
      given constraints (e.g., bandwidth).

   *  The PCECC would provision each node along the path and assign
      incoming and outgoing labels from R1 to {R6, R8} with the path:
      {R1, 6000}, {6000, R2, {9001, 9002}}, {9001, R4, 9003}, {9002,
      R5, 9004}, {9003, R3, 9005}, {9004, R6}, {9005, R8}.  The main
      difference is in the branch node instruction at R2, where two
      copies of the packet are sent towards R4 and R5 with labels 9001
      and 9002, respectively.

   The packet forwarding involves the following steps:

   Step 1:  R1 may send a packet P1 to R2 simply by pushing label 6000
      onto the packet.

   Step 2:  After R2 receives the packet with label 6000, it forwards
      one copy to R4 by swapping the label to 9001 and another copy to
      R5 by swapping the label to 9002.

   Step 3:  After R4 receives the packet with label 9001, it forwards
      it to R3 by swapping the label to 9003.  After R5 receives the
      packet with label 9002, it forwards it to R6 by swapping the
      label to 9004.

   Step 4:  After R3 receives the packet with label 9003, it forwards
      it to R8 by swapping the label to 9005.

   Step 5:  The packet is received at R8, where label 9005 is popped,
      and at R6, where label 9004 is popped.

3.4.2.  Use Cases of PCECC for the Resiliency of P2MP/MP2MP LSPs

3.4.2.1.  PCECC for the End-to-End Protection of the P2MP/MP2MP LSPs

   This section describes the end-to-end managed path protection
   service, as well as local protection with operation management, in
   the PCECC network for P2MP/MP2MP LSPs.

   An end-to-end protection principle can be applied for computing
   backup P2MP or MP2MP LSPs.  During the computation of the primary
   multicast tree, the PCECC may also take the computation of a
   secondary tree into consideration.  A PCE may compute the primary
   and backup P2MP (or MP2MP) LSPs together or sequentially.
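
   The joint handling of a primary and a backup tree described above
   can be illustrated with a small sketch.  It is a minimal
   illustration only, using the node names of this section's example
   (R1, R2, R3, R11, R4, R5).  The parent-to-children dictionary layout
   and the function names (transit_nodes, is_node_disjoint,
   select_tree) are assumptions made for this sketch; they are not
   defined by PCEP or by any PCECC specification.

```python
# Illustrative sketch only: trees are modelled as parent -> children maps.
# The data layout and all function names are hypothetical.
PRIMARY = {"R1": ["R2"], "R2": ["R4", "R5"]}
BACKUP = {"R1": ["R11"], "R11": ["R3"], "R3": ["R4", "R5"]}


def all_nodes(tree):
    """Return every node that appears in the tree."""
    nodes = set(tree)
    for children in tree.values():
        nodes.update(children)
    return nodes


def leaves(tree):
    """Leaves are nodes that never appear as a parent."""
    return {node for node in all_nodes(tree) if node not in tree}


def transit_nodes(tree, root):
    """Transit nodes are neither the root nor a leaf."""
    return all_nodes(tree) - leaves(tree) - {root}


def is_node_disjoint(primary, backup, root):
    """True if the two trees share no transit node."""
    return not (transit_nodes(primary, root) & transit_nodes(backup, root))


def select_tree(primary, backup, root, failed_nodes):
    """Pick the tree the root would use given a set of failed nodes."""
    if transit_nodes(primary, root) & set(failed_nodes):
        return backup
    return primary
```

   With these definitions, the two trees of the example are node
   disjoint, and a failure of R2 causes the backup tree to be selected,
   while the primary tree stays in use when nothing has failed.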

                              +----+  +----+
             Root node of LSP | R1 |--| R11|
                              +----+  +----+
                                /         +
                             10/           +20
                              /             +
                     +----------+        +-----------+
      Transit Node   |    R2    |        |     R3    |
                     +----------+        +-----------+
                       |        \       +         +
                       |         \     +          +
                     10|        10\   +20       20+
                       |           \ +            +
                       |            \             +
                       |          +  \            +
               +-----------+      +-----------+ Leaf Nodes
               |    R4     |      |    R5     | (Downstream LSR)
               +-----------+      +-----------+

   In the example above, when the PCECC sets up the primary multicast
   tree from the root node R1 to the leaves, which is R1->R2->{R4, R5},
   it can set up the backup tree, which is R1->R11->R3->{R4, R5}, at
   the same time.  Both the primary forwarding tree and the secondary
   forwarding tree are downloaded to each router along the primary
   path and the secondary path.  Traffic is normally forwarded through
   the R1->R2->{R4, R5} path.  When a node in the primary tree fails
   (say R2), the root node R1 switches the flow to the backup tree,
   which is R1->R11->R3->{R4, R5}.  With the PCECC, path computation
   and forwarding path downloading can all be done without the complex
   signaling used in P2MP RSVP-TE or mLDP.

3.4.2.2.  PCECC for the Local Protection of the P2MP/MP2MP LSPs

   This section describes the local protection service in the PCECC
   network for P2MP/MP2MP LSPs.

   While the PCECC sets up the primary multicast tree, it can also
   build the backup LSP among the Point of Local Repair (PLR), the
   protected node, and the Merge Points (MPs), which are the downstream
   nodes of the protected node.  In cases where the number of
   downstream nodes is large, this mechanism can avoid unnecessary
   packet duplication on the PLR and protect the network from the risk
   of traffic congestion.

                           +------------+
                           |     R1     | Root Node
                           +------------+
                                  .
                                  .
                                  .
                           +------------+ Point of Local Repair/
                           |     R10    | Switchover Point
                           +------------+ (Upstream LSR)
                             /         +
                          10/           +20
                           /             +
                  +----------+        +-----------+
   Protected Node |    R20   |        |     R30   |
                  +----------+        +-----------+
                    |        \       +         +
                    |         \     +          +
                  10|        10\   +20       20+
                    |           \ +            +
                    |            \             +
                    |          +  \            +
            +-----------+      +-----------+ Merge Point
            |    R40    |      |    R50    | (Downstream LSR)
            +-----------+      +-----------+
                  .                  .
                  .                  .

   In the example above, when the PCECC sets up the primary multicast
   path R10->R20->{R40, R50} through the PLR node R10 and the protected
   node R20, it can set up the backup path R10->R30->{R40, R50} at the
   same time.  Both the primary forwarding path and the secondary
   bypass forwarding path are downloaded to each router along the
   primary path and the secondary bypass path.  Traffic is normally
   forwarded through the R10->R20->{R40, R50} path.  When node R20
   fails, the PLR node R10 switches the flow to the backup path, which
   is R10->R30->{R40, R50}.  With the PCECC, path computation and
   forwarding path downloading can all be done without the complex
   signaling used in P2MP RSVP-TE or mLDP.

3.5.  Use Cases of PCECC for LSP in the Network Migration

   One of the main advantages of the PCECC solution is that it is
   naturally backward compatible, since the PCE server itself can
   function as a proxy node of the MPLS network for all the new nodes
   that may no longer support the signaling protocols.

   As illustrated in the following example, the current network can
   migrate gradually to a fully PCECC-controlled network by replacing
   the legacy nodes.  During the migration, the legacy nodes still need
   to signal using the existing MPLS protocols, such as LDP and
   RSVP-TE, and the new nodes set up their portion of the forwarding
   path through the PCECC directly.
   With the PCECC functioning as the proxy of these new nodes, MPLS
   signaling can propagate through the network as normal.

   The example described in this section is based on the network
   configuration illustrated in the following figure:

  +------------------------------------------------------------------+
  |                         PCE DOMAIN                               |
  |    +-----------------------------------------------------+       |
  |    |                       PCECC                         |       |
  |    +-----------------------------------------------------+       |
  |     ^              ^                      ^            ^         |
  |     |      PCEP    |                      |   PCEP     |         |
  |     V              V                      V            V         |
  | +--------+   +--------+   +--------+   +--------+   +--------+   |
  | | NODE 1 |   | NODE 2 |   | NODE 3 |   | NODE 4 |   | NODE 5 |   |
  | |        |...|        |...|        |...|        |...|        |   |
  | | Legacy |if1| Legacy |if2|Legacy  |if3| PCECC  |if4| PCECC  |   |
  | |  Node  |   |  Node  |   |Enabled |   |Enabled |   | Enabled|   |
  | +--------+   +--------+   +--------+   +--------+   +--------+   |
  |                                                                  |
  +------------------------------------------------------------------+

        Example: PCECC Initiated LSP Setup In the Network Migration

   In this example, there are five nodes on the TE LSP from the head
   end (Node1) to the tail end (Node5), where Node4 and Node5 are
   centrally controlled and the other nodes are legacy nodes.

   *  Node1 sends a path request message for the setup of an LSP
      destined for Node5.

   *  The PCECC sends Node1 a reply message for the LSP setup with the
      path: (Node1, if1), (Node2, if2), (Node3, if3), (Node4, if4),
      Node5.

   *  Node1, Node2, and Node3 set up the LSP to Node5 using local
      labels as usual.  Node3, with the help of the PCECC, could proxy
      the signaling.

   *  The PCECC then programs the out-segment of Node3, the in-segment
      and out-segment of Node4, and the in-segment of Node5.

3.6.  Use Cases of PCECC for L3VPN and PWE3

   As described in [RFC8283], various network services may be offered
   over a network.
   These include protection services, Virtual Private Network (VPN)
   services (such as Layer 3 VPNs [RFC4364] or Ethernet VPNs
   [RFC7432]), and pseudowires [RFC3985].  Delivering services over a
   network in an optimal way requires coordination in the way that
   network resources are allocated to support the services.  A
   PCE-based central controller can consider the whole network and all
   components of a service at once when planning how to deliver the
   service.  It can then use PCEP to manage the network resources and
   to install the necessary associations between those resources.

   In the case of L3VPN, VPN labels can be assigned and distributed
   among the PE routers by the PCECC through PCEP, instead of using
   BGP.

   The example described in this section is based on the network
   configuration illustrated in the following figure:

              +--------------------------------------------+
              |                 PCE DOMAIN                 |
              |   +-----------------------------------+    |
              |   |               PCECC               |    |
              |   +-----------------------------------+    |
              |           ^          ^              ^      |
              |PWE3/L3VPN | PCEP PCEP|LSP PWE3/L3VPN|PCEP  |
              |           V          V              V      |
   +--------+ |     +--------+   +--------+   +--------+   | +--------+
   |   CE   | |     |  PE1   |   | NODE x |   |  PE2   |   | |   CE   |
   |        |...... |        |...|        |...|        |.....|        |
   | Legacy | |if1  | PCECC  |if2| PCECC  |if3| PCECC  |if4| | Legacy |
   |  Node  | |     | Enabled|   |Enabled |   |Enabled |   | |  Node  |
   +--------+ |     +--------+   +--------+   +--------+   | +--------+
              |                                            |
              +--------------------------------------------+

                 Example: Using PCECC for L3VPN and PWE3

   In the case of PWE3, instead of using LDP signaling, the label and
   port pairs for each pseudowire can be assigned by the PCECC among
   the PE routers, and the corresponding forwarding entries are
   distributed to each PE router through the extended PCEP and the
   PCECC mechanism.

3.7.  Using PCECC for Traffic Classification Information

   As described in [RFC8283], traffic classification is an important
   part of traffic engineering.  It is the process of looking at a
   packet to determine how it should be treated as it is forwarded
   through the network.  It applies in many scenarios including MPLS
   traffic engineering (where it determines what traffic is forwarded
   onto which LSPs); segment routing (where it is used to select which
   set of forwarding instructions to add to a packet); and SFC (where
   it indicates along which service function path a packet should be
   forwarded).  In conjunction with traffic engineering, traffic
   classification is an important enabler for load balancing.  Traffic
   classification is closely linked to the computational elements of
   planning for the network functions just listed because it determines
   how traffic load is balanced and distributed through the network.
   Therefore, selecting what traffic classification should be performed
   by a router is an important part of the work done by a PCECC.

   Instructions can be passed from the controller to the routers using
   PCEP.  These instructions tell the routers how to map traffic to
   paths or connections.  Refer to [I-D.ietf-pce-pcep-flowspec].

   Along with traffic classification, there are a few more questions
   that need to be considered once the path is set up:

   *  How to use it

   *  Whether it is a virtual link

   *  Whether to advertise it in the IGP as a virtual link

   *  What bits of this information to signal to the tail end

   These are out of the scope of this document.

3.8.  Use Cases of PCECC for SRv6

   As per [RFC8402], with Segment Routing (SR), a node steers a packet
   through an ordered list of instructions, called segments.  Segment
   Routing can be applied to the IPv6 architecture with the Segment
   Routing Header (SRH) [RFC8754].  A segment is encoded as an IPv6
   address.
   An ordered list of segments is encoded as an ordered list of IPv6
   addresses in the routing header.  The active segment is indicated by
   the Destination Address of the packet.  Upon completion of a
   segment, a pointer in the new routing header is incremented and
   indicates the next segment.

   As per [RFC8754], an SRv6 Segment is a 128-bit value.  "SRv6 SID" or
   simply "SID" are often used as a shorter reference for "SRv6
   Segment".  Further details are provided in [RFC8986], where an SRv6
   SID is represented as LOC:FUNCT.

   [I-D.ietf-pce-segment-routing-ipv6] extends [RFC8664] to support SR
   for the IPv6 data plane.  Further, a PCECC could be extended to
   support SRv6 SID allocation and distribution.

                          2001:db8::1
                          +----------+
                          |    R1    |
                          +----------+
                               |
                          +----------+
                          |    R2    |  2001:db8::2
                          +----------+
                          *    |   *  *
                         *     |   *   *
                        *link1|    *    *
         2001:db8::4   *       |   *link2 *   2001:db8::5
             +-----------+     |   *     +-----------+
             |    R4     |     |   *     |    R5     |
             +-----------+     |   *     +-----------+
                      *        |   *      *     +
                       *       |   *     *      +
                        *      |   *    *       +
                       +-----------+  +-----------+
         2001:db8::3   |    R3     |  |R6         |2001:db8::6
                       +-----------+  +-----------+
                            |
                       +-----------+
                       |    R8     |  2001:db8::8
                       +-----------+

   In this case, the PCECC could assign the SRv6 SIDs (in the form of
   IPv6 addresses) to be used for nodes and adjacencies.  Later, an
   SRv6 path in the form of a list of SRv6 SIDs could be used at the
   ingress.  Some examples:

   *  SRv6 SID-List={2001:db8::8} - The best path towards R8

   *  SRv6 SID-List={2001:db8::5, 2001:db8::8} - The path towards R8 via
      R5

3.9.  Use Cases of PCECC for SFC

   Service Function Chaining (SFC) is described in [RFC7665].  It is the
   process of directing traffic in a network such that it passes through
   specific hardware devices or virtual machines (known as service
   function nodes) that can perform particular desired functions on the
   traffic.
   The set of functions to be performed and the order in which they are
   to be performed is known as a service function chain.  The chain is
   enhanced with the locations at which the service functions are to be
   performed to derive a Service Function Path (SFP).  Each packet is
   marked as belonging to a specific SFP, and that marking lets each
   successive service function node know which functions to perform and
   to which service function node to send the packet next.  To operate
   an SFC network, the service function nodes must be configured to
   understand the packet markings, and the edge nodes must be told how
   to mark packets entering the network.  Additionally, it may be
   necessary to establish tunnels between service function nodes to
   carry the traffic.  Planning an SFC network requires load balancing
   between service function nodes and traffic engineering across the
   network that connects them.  As per [RFC8283], these are operations
   that can be performed by a PCE-based controller, and that controller
   can use PCEP to program the network and install the service function
   chains and any required tunnels.

   A possible mechanism would be to add support for SFC-based central
   control instructions that convey the following to each SFF along the
   SFP:

   *  Service Path Identifier (SPI): Uniquely identifies an SFP.

   *  Service Index (SI): Provides the location within the SFP.

   *  SFC Proxy handling

   The PCECC can play the role of setting the traffic classification
   rules at the classifier to impose the NSH, as well as downloading
   the forwarding instructions to each SFF along the way so that they
   can process the NSH and forward accordingly.  Instructions to the
   service classifier handle the context header, metadata, etc.
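
   As a rough illustration of the mechanism sketched above, the
   following code models a controller downloading (SPI, SI) forwarding
   entries to the SFFs along one SFP.  It is a minimal sketch under
   stated assumptions, not a definition of any PCEP object: the class
   and function names (SffInstruction, Sff, download), the SFF and
   service function names, and the SPI/SI values are all hypothetical.
   The sketch also assumes that the SI is decremented after each
   service function, as is the case with the NSH.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SffInstruction:
    """Hypothetical central control instruction for one SFF entry."""
    spi: int       # Service Path Identifier: identifies the SFP
    si: int        # Service Index: location within the SFP
    next_hop: str  # where the SFF sends a packet carrying (spi, si)


class Sff:
    """Toy Service Function Forwarder with an (SPI, SI) lookup table."""

    def __init__(self, name):
        self.name = name
        self.table = {}

    def install(self, instruction):
        key = (instruction.spi, instruction.si)
        self.table[key] = instruction.next_hop

    def lookup(self, spi, si):
        # Returns None when no entry was downloaded for (spi, si).
        return self.table.get((spi, si))


def download(sffs, instructions):
    """Stand-in for the controller pushing instructions to each SFF."""
    for sff_name, instruction in instructions:
        sffs[sff_name].install(instruction)


# Example SFP 10: firewall (attached to SFF1), then nat (attached to SFF2).
sffs = {"SFF1": Sff("SFF1"), "SFF2": Sff("SFF2")}
download(sffs, [
    ("SFF1", SffInstruction(spi=10, si=255, next_hop="firewall")),
    ("SFF1", SffInstruction(spi=10, si=254, next_hop="SFF2")),
    ("SFF2", SffInstruction(spi=10, si=254, next_hop="nat")),
    ("SFF2", SffInstruction(spi=10, si=253, next_hop="egress")),
])
```

   Keying the table on the (SPI, SI) pair mirrors the two identifiers
   listed above: the SPI selects the path, and the SI selects the
   position on that path.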

   It is also possible to support SFC with SR, with or without the NSH,
   as described in [I-D.ietf-spring-nsh-sr] and
   [I-D.ietf-spring-sr-service-programming].  The PCECC technique can
   also be used for service-function-related segments and SR service
   policies.

3.10.  Use Cases of PCECC for Native IP

   [RFC8735] describes the scenarios and suggestions for the "Centrally
   Control Dynamic Routing (CCDR)" architecture, which integrates the
   merits of traditional distributed protocols (IGP/BGP) and the power
   of centralized control technologies (PCE/SDN) to provide a feasible
   traffic engineering solution in various complex scenarios for the
   service provider.  [I-D.ietf-teas-pce-native-ip] defines the
   framework for CCDR traffic engineering within a native IP network,
   using a dual/multi-BGP session strategy and the CCDR architecture.
   PCEP can be used to transfer the key parameters between the PCE and
   the underlying network devices (PCCs) using the PCECC technique.
   The central control instructions from the PCECC identify which
   prefix should be advertised on which BGP session.

3.11.  Use Cases of PCECC for Local Protection (RSVP-TE)

   [I-D.cbrt-pce-stateful-local-protection] describes the need for the
   PCE to maintain and associate the local protection paths for the
   RSVP-TE LSP.  Local protection requires the setup of a bypass at the
   PLR.  This bypass can be PCC-initiated and delegated, or
   PCE-initiated.  In either case, the PLR MUST maintain a PCEP session
   to the PCE.  The bypass LSPs need to be mapped to the primary LSP.
   This could be done locally at the PLR based on a local policy, but
   there is a need for the PCE to do the mapping as well to exert
   greater control.

   This mapping can be done via PCECC procedures, where the PCE could
   instruct the PLR to perform the mapping and identify the primary LSP
   for which the bypass should be used.

3.12.  Use Cases of PCECC for BIER

   Bit Index Explicit Replication (BIER) [RFC8279] defines an
   architecture where all intended multicast receivers are encoded as a
   bitmask in the multicast packet header within different
   encapsulations.  A router that receives such a packet will forward
   the packet based on the bit position in the packet header towards
   the receiver(s) following a precomputed tree for each of the bits in
   the packet.  Each receiver is represented by a unique bit in the
   bitmask.

   BIER-TE [I-D.ietf-bier-te-arch] shares architecture and packet
   formats with BIER.  BIER-TE forwards and replicates packets based on
   a BitString in the packet header, but every BitPosition of the
   BitString of a BIER-TE packet indicates one or more adjacencies.
   A BIER-TE path can be derived from a PCE and used at the ingress as
   described in [I-D.chen-pce-bier].

   Further, the PCECC mechanism could be used for the allocation of
   bits for the BIER routers for BIER as well as for the adjacencies
   for BIER-TE.  A PCECC-based controller can use PCEP to instruct the
   BIER-capable routers on the meaning of the bits as well as other
   fields needed for BIER encapsulation.  The PCECC could be used to
   program the BIER router with various parameters used in the BIER
   encapsulation, such as the BIER subdomain-ID, BFR-ID, and BIER
   encapsulation, for both node and adjacency.

4.  IANA Considerations

   This document does not require any action from IANA.

5.  Security Considerations

   TODO: DHRUV - we plan to add this in the next revision after the
   draft submission blockade is lifted.

6.  Acknowledgments

   We would like to thank Adrian Farrel, Aijun Wang, Robert Tao,
   Changjiang Yan, Tieying Huang, Sergio Belotti, Dieter Beller, Andrey
   Elperin and Evgeniy Brodskiy for their useful comments and
   suggestions.

7.  References

7.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC5440]  Vasseur, JP., Ed. and JL. Le Roux, Ed., "Path Computation
              Element (PCE) Communication Protocol (PCEP)", RFC 5440,
              DOI 10.17487/RFC5440, March 2009.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017.

   [RFC8283]  Farrel, A., Ed., Zhao, Q., Ed., Li, Z., and C. Zhou, "An
              Architecture for Use of PCE and the PCE Communication
              Protocol (PCEP) in a Network with Central Control",
              RFC 8283, DOI 10.17487/RFC8283, December 2017.

7.2.  Informative References

   [RFC3985]  Bryant, S., Ed. and P. Pate, Ed., "Pseudo Wire Emulation
              Edge-to-Edge (PWE3) Architecture", RFC 3985,
              DOI 10.17487/RFC3985, March 2005.

   [RFC4206]  Kompella, K. and Y. Rekhter, "Label Switched Paths (LSP)
              Hierarchy with Generalized Multi-Protocol Label Switching
              (GMPLS) Traffic Engineering (TE)", RFC 4206,
              DOI 10.17487/RFC4206, October 2005.

   [RFC4364]  Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
              Networks (VPNs)", RFC 4364, DOI 10.17487/RFC4364,
              February 2006.

   [RFC5150]  Ayyangar, A., Kompella, K., Vasseur, JP., and A. Farrel,
              "Label Switched Path Stitching with Generalized
              Multiprotocol Label Switching Traffic Engineering (GMPLS
              TE)", RFC 5150, DOI 10.17487/RFC5150, February 2008.

   [RFC5151]  Farrel, A., Ed., Ayyangar, A., and JP. Vasseur,
              "Inter-Domain MPLS and GMPLS Traffic Engineering --
              Resource Reservation Protocol-Traffic Engineering
              (RSVP-TE) Extensions", RFC 5151, DOI 10.17487/RFC5151,
              February 2008.

   [RFC5541]  Le Roux, JL., Vasseur, JP., and Y.
              Lee, "Encoding of Objective Functions in the Path
              Computation Element Communication Protocol (PCEP)",
              RFC 5541, DOI 10.17487/RFC5541, June 2009.

   [RFC5376]  Bitar, N., Zhang, R., and K. Kumaki, "Inter-AS
              Requirements for the Path Computation Element
              Communication Protocol (PCECP)", RFC 5376,
              DOI 10.17487/RFC5376, November 2008.

   [RFC7432]  Sajassi, A., Ed., Aggarwal, R., Bitar, N., Isaac, A.,
              Uttaro, J., Drake, J., and W. Henderickx, "BGP MPLS-Based
              Ethernet VPN", RFC 7432, DOI 10.17487/RFC7432, February
              2015.

   [RFC7665]  Halpern, J., Ed. and C. Pignataro, Ed., "Service Function
              Chaining (SFC) Architecture", RFC 7665,
              DOI 10.17487/RFC7665, October 2015.

   [RFC8231]  Crabbe, E., Minei, I., Medved, J., and R. Varga, "Path
              Computation Element Communication Protocol (PCEP)
              Extensions for Stateful PCE", RFC 8231,
              DOI 10.17487/RFC8231, September 2017.

   [RFC8279]  Wijnands, IJ., Ed., Rosen, E., Ed., Dolganow, A.,
              Przygienda, T., and S. Aldrin, "Multicast Using Bit Index
              Explicit Replication (BIER)", RFC 8279,
              DOI 10.17487/RFC8279, November 2017.

   [RFC8281]  Crabbe, E., Minei, I., Sivabalan, S., and R. Varga, "Path
              Computation Element Communication Protocol (PCEP)
              Extensions for PCE-Initiated LSP Setup in a Stateful PCE
              Model", RFC 8281, DOI 10.17487/RFC8281, December 2017.

   [RFC8355]  Filsfils, C., Ed., Previdi, S., Ed., Decraene, B., and R.
              Shakir, "Resiliency Use Cases in Source Packet Routing in
              Networking (SPRING) Networks", RFC 8355,
              DOI 10.17487/RFC8355, March 2018.

   [RFC8402]  Filsfils, C., Ed., Previdi, S., Ed., Ginsberg, L.,
              Decraene, B., Litkowski, S., and R. Shakir, "Segment
              Routing Architecture", RFC 8402, DOI 10.17487/RFC8402,
              July 2018.

   [RFC8664]  Sivabalan, S., Filsfils, C., Tantsura, J., Henderickx, W.,
              and J. Hardwick, "Path Computation Element Communication
              Protocol (PCEP) Extensions for Segment Routing", RFC 8664,
              DOI 10.17487/RFC8664, December 2019.

   [RFC8735]  Wang, A., Huang, X., Kou, C., Li, Z., and P. Mi,
              "Scenarios and Simulation Results of PCE in a Native IP
              Network", RFC 8735, DOI 10.17487/RFC8735, February 2020.

   [RFC8751]  Dhody, D., Lee, Y., Ceccarelli, D., Shin, J., and D. King,
              "Hierarchical Stateful Path Computation Element (PCE)",
              RFC 8751, DOI 10.17487/RFC8751, March 2020.

   [RFC8754]  Filsfils, C., Ed., Dukes, D., Ed., Previdi, S., Leddy, J.,
              Matsushima, S., and D. Voyer, "IPv6 Segment Routing Header
              (SRH)", RFC 8754, DOI 10.17487/RFC8754, March 2020.

   [RFC8986]  Filsfils, C., Ed., Camarillo, P., Ed., Leddy, J., Voyer,
              D., Matsushima, S., and Z. Li, "Segment Routing over IPv6
              (SRv6) Network Programming", RFC 8986,
              DOI 10.17487/RFC8986, February 2021.

   [RFC9050]  Li, Z., Peng, S., Negi, M., Zhao, Q., and C. Zhou, "Path
              Computation Element Communication Protocol (PCEP)
              Procedures and Extensions for Using the PCE as a Central
              Controller (PCECC) of LSPs", RFC 9050,
              DOI 10.17487/RFC9050, July 2021.

   [I-D.ietf-pce-pcep-flowspec]
              Dhody, D., Farrel, A., and Z. Li, "PCEP Extension for Flow
              Specification", Work in Progress, Internet-Draft,
              draft-ietf-pce-pcep-flowspec-13, 14 October 2021.

   [I-D.ietf-pce-pcep-extension-pce-controller-sr]
              Li, Z., Peng, S., Negi, M. S., Zhao, Q., and C. Zhou,
              "PCEP Procedures and Protocol Extensions for Using PCE as
              a Central Controller (PCECC) for Segment Routing (SR) MPLS
              Segment Identifier (SID) Allocation and Distribution.",
              Work in Progress, Internet-Draft,
              draft-ietf-pce-pcep-extension-pce-controller-sr-03,
              30 September 2021.

   [I-D.li-pce-controlled-id-space]
              Li, C., Chen, M., Wang, A., Cheng, W., and C. Zhou, "PCE
              Controlled ID Space", Work in Progress, Internet-Draft,
              draft-li-pce-controlled-id-space-09, 22 August 2021.

   [I-D.ietf-pce-stateful-interdomain]
              Dugeon, O., Meuric, J., Lee, Y., and D. Ceccarelli, "PCEP
              Extension for Stateful Inter-Domain Tunnels", Work in
              Progress, Internet-Draft,
              draft-ietf-pce-stateful-interdomain-02, 12 July 2021.

   [I-D.cbrt-pce-stateful-local-protection]
              Barth, C. and R. Torvi, "PCEP Extensions for RSVP-TE
              Local-Protection with PCE-Stateful", Work in Progress,
              Internet-Draft,
              draft-cbrt-pce-stateful-local-protection-01, 29 June 2018.

   [I-D.ietf-pce-segment-routing-ipv6]
              Li, C., Negi, M., Sivabalan, S., Koldychev, M.,
              Kaladharan, P., and Y. Zhu, "PCEP Extensions for Segment
              Routing leveraging the IPv6 data plane", Work in Progress,
              Internet-Draft, draft-ietf-pce-segment-routing-ipv6-09,
              27 May 2021.

   [I-D.ietf-teas-pce-native-ip]
              Wang, A., Khasanov, B., Zhao, Q., and H. Chen, "PCE-Based
              Traffic Engineering (TE) in Native IP Networks", Work in
              Progress, Internet-Draft,
              draft-ietf-teas-pce-native-ip-17, 1 February 2021.

   [I-D.ietf-bier-te-arch]
              Eckert, T., Cauchie, G., and M. Menth, "Tree Engineering
              for Bit Index Explicit Replication (BIER-TE)", Work in
              Progress, Internet-Draft, draft-ietf-bier-te-arch-10,
              9 July 2021.

   [I-D.chen-pce-bier]
              Chen, R., Zhang, Z., Chen, H., Dhanaraj, S., Qin, F., and
              A. Wang, "PCEP Extensions for BIER-TE", Work in Progress,
              Internet-Draft, draft-chen-pce-bier-09, 12 July 2021.

   [I-D.ietf-spring-sr-service-programming]
              Clad, F., Xu, X., Filsfils, C., Bernier, D., Li, C.,
              Decraene, B., Ma, S., Yadlapalli, C., Henderickx, W., and
              S. Salsano, "Service Programming with Segment Routing",
              Work in Progress, Internet-Draft,
              draft-ietf-spring-sr-service-programming-05,
              10 September 2021.

   [I-D.ietf-spring-nsh-sr]
              Guichard, J. N. and J. Tantsura, "Integration of Network
              Service Header (NSH) and Segment Routing for Service
              Function Chaining (SFC)", Work in Progress,
              Internet-Draft, draft-ietf-spring-nsh-sr-09, 26 July 2021.

   [MAP-REDUCE]
              Lee, K., Choi, T., Ganguly, A., Wolinsky, D., Boykin, P.,
              and R. Figueiredo, "Parallel Processing Framework on a P2P
              System Using Map and Reduce Primitives", May 2011.

   [MPLS-DC]  Afanasiev, D. and D. Ginsburg, "MPLS in DC and inter-DC
              networks: the unified forwarding mechanism for network
              programmability at scale", March 2014.

Appendix A.  Using reliable P2MP TE based multicast delivery for
             distributed computations (MapReduce-Hadoop)

   The MapReduce model of distributed computation in computing clusters
   is widely deployed.  In the Hadoop (https://hadoop.apache.org/) 1.0
   architecture, MapReduce operations are performed on big data in the
   Hadoop Distributed File System (HDFS), where the NameNode has
   knowledge of the resources of the cluster and of where the actual
   data (chunks) for a particular task are located (i.e., on which
   DataNode).  Each chunk of data (64 MB or more) should have 3 saved
   copies on different DataNodes based on their proximity.

   The proximity level is currently allocated semi-manually, based on
   Rack IDs (the assumption is that closer data is better because of
   higher access speed and lower latency).

   The JobTracker node is responsible for computation tasks and for
   scheduling across DataNodes, and it is also rack-aware.  Currently,
   the transport protocols between the NameNode/JobTracker and the
   DataNodes are based on IP unicast.  This has the advantage of
   simplicity but has numerous drawbacks related to its flat approach.

   It is clear that we should go beyond one DC for Hadoop cluster
   creation and move towards distributed clusters.  In that case, we
   need to handle performance and latency issues.
   Latency depends on the speed of light in the fiber links and on the
   latency introduced by the intermediate devices in between.  The
   latter is closely correlated with network device architecture and
   performance.  The current performance of NPU-based routers should be
   sufficient for creating distributed Hadoop clusters with predictable
   latency.  The performance of software-based routers (mainly as
   VNFs), together with additional HW features such as DPDK, is
   promising but requires additional research and testing.

   The main question is: how can we create a simple but effective
   architecture for a distributed Hadoop cluster?

   There is research [MAP-REDUCE] that shows how the use of a multicast
   tree could improve the speed of resource or cluster member discovery
   inside the cluster as well as increase redundancy in communications
   between cluster nodes.

   Is traditional IP-based multicast enough for that?  We doubt it,
   because it requires an additional control plane (IGMP, PIM) and a
   lot of signaling, which is not suitable for high-performance
   computations that are very sensitive to latency.

   P2MP TE tunnels look much more suitable as a potential solution for
   creating multicast-based communications between the NameNode as the
   root and the DataNodes as leaves inside the cluster.  These P2MP
   tunnels should be dynamically created and torn down (with no manual
   intervention).  Here, the PCECC comes into play, with the main
   objective of creating an optimal topology for each particular
   MapReduce computation request and also creating P2MP tunnels with
   the needed parameters, such as bandwidth and delay.

   This solution would require the use of MPLS label-based forwarding
   inside the cluster.  The use of label-based forwarding inside the DC
   was proposed by Yandex [MPLS-DC].  Technically, it is already
   possible because MPLS on switches is already supported by some
   vendors, and MPLS also exists on Linux and OVS.

   The following framework can accomplish this task:

                         +--------+
                         |  APP   |
                         +--------+
                             | NBI (REST API,...)
                             |
                 PCEP   +----------+  REST API
      +---------+   +---|  PCECC   |----------+
      | Client  |---|---|          |          |
      +---------+   |   +----------+          |
              |     |       | |  |            |
              +-----|---+   |PCEP|            |
           +--------+   |   | |  |            |
           |            |   | |  |            |
           | REST API   |   | |  |            |
           |            |   | |  |            |
   +-------------+      |   | |  |       +----------+
   | Job Tracker |      |   | |  |       | NameNode |
   |             |      |   | |  |       |          |
   +-------------+      |   | |  |       +----------+
         +------------------+ |  +-----------+
         |              |     |              |
     |---+-----P2MP TE--+-----|-----------|  |
   +-----------+    +-----------+       +-----------+
   | DataNode1 |    | DataNode2 |       | DataNodeN |
   |TaskTracker|    |TaskTracker| ....  |TaskTracker|
   +-----------+    +-----------+       +-----------+

   Communication between the JobTracker, the NameNode, and the PCECC
   can be done via a REST API directly or via a cluster manager such as
   Mesos.

   Phase 1: Distributed cluster resource discovery.  During this phase,
   the JobTracker and the NameNode SHOULD identify and find available
   DataNodes according to the computing request from the application
   (APP).  The NameNode SHOULD query the PCECC about available
   DataNodes.  The NameNode MAY provide additional constraints to the
   PCECC, such as topological proximity and redundancy level.

   The PCECC SHOULD analyze the topology of the distributed cluster and
   perform a constraint-based path calculation from the client towards
   the most suitable DataNodes.  The PCECC SHOULD reply to the NameNode
   with the list of the most suitable DataNodes and their resource
   capabilities.  A topology discovery mechanism for the PCECC will be
   added to this framework later.

   Phase 2: The PCECC SHOULD create a P2MP LSP from the client towards
   those DataNodes by means of PCEP messages, following the previously
   calculated path.

   Phase 3: The NameNode SHOULD send this information to the client,
   and the PCECC informs the client about the optimal P2MP path towards
   the DataNodes via a PCEP message.

   Phase 4: The client sends data blocks to those DataNodes for writing
   via the created P2MP tunnel.

   When this task is finished, the P2MP tunnel can be torn down.

Authors' Addresses

   Zhenbin (Robin) Li
   Huawei Technologies
   Huawei Bld., No.156 Beiqing Rd.
   Beijing
   100095
   China

   Email: lizhenbin@huawei.com


   Dhruv Dhody
   Huawei Technologies
   Divyashree Techno Park, Whitefield
   Bangalore
   Karnataka 560066
   India

   Email: dhruv.ietf@gmail.com


   Quintin Zhao
   Etheric Networks
   1009 S CLAREMONT ST
   SAN MATEO, CA 94402
   United States of America

   Email: qzhao@ethericnetworks.com


   King Ke
   Tencent Holdings Ltd.
   Shenzhen
   China

   Email: kinghe@tencent.com


   Boris Khasanov
   Yandex LLC
   Ulitsa Lva Tolstogo 16
   Moscow

   Email: bhassanov@yahoo.com


   Luyuan Fang
   Expedia, Inc.
   United States of America

   Email: luyuanf@gmail.com


   Chao Zhou
   HPE

   Email: chaozhou_us@yahoo.com


   Boris Zhang
   Telus Communications

   Email: Boris.zhang@telus.com


   Artem Rachitskiy
   Mobile TeleSystems JLLC
   Nezavisimosti ave., 95
   220043, Minsk
   Belarus

   Email: arachitskiy@mts.by


   Anton Gulida
   LLC "Lifetech"
   Krasnoarmeyskaya str., 24
   220030, Minsk
   Belarus

   Email: anton.gulida@life.com.by