2 TEAS Working Group Q. Zhao 3 Internet-Draft Z. Li 4 Intended status: Informational B. Khasanov 5 Expires: January 5, 2020 D. Dhody 6 Huawei Technologies 7 K. Ke 8 Tencent Holdings Ltd. 9 L. Fang 10 Expedia, Inc. 11 C. Zhou 12 Cisco Systems 13 B. Zhang 14 Telus Communications 15 A. Rachitskiy 16 Mobile TeleSystems JLLC 17 A. Gulida 18 LLC "Lifetech" 19 July 4, 2019 21 The Use Cases for Path Computation Element (PCE) as a Central Controller 22 (PCECC). 23 draft-ietf-teas-pcecc-use-cases-04 25 Abstract 27 The Path Computation Element (PCE) is a core component of a Software- 28 Defined Networking (SDN) system. It can compute optimal paths for 29 traffic across a network and can also update the paths to reflect 30 changes in the network or traffic demands. PCE was developed to 31 derive paths for MPLS Label Switched Paths (LSPs), which are supplied 32 to the head end of the LSP using the Path Computation Element 33 Communication Protocol (PCEP).
35 SDN has a broader applicability than signaled MPLS traffic-engineered 36 (TE) networks, and the PCE may be used to determine paths in a range 37 of use cases including static LSPs, segment routing (SR), Service 38 Function Chaining (SFC), and most forms of a routed or switched 39 network. It is, therefore, reasonable to consider PCEP as a control 40 protocol for use in these environments to allow the PCE to be fully 41 enabled as a central controller. 43 This document describes general considerations for PCECC deployment 44 and examines its applicability and benefits, as well as its 45 challenges and limitations, through a number of use cases. PCEP 46 extensions required for stateful PCE usage are covered in separate 47 documents. 49 This is a living document to catalogue the use cases for PCECC. 50 There is currently no intention to publish this work as an RFC. 52 Requirements Language 54 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 55 "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and 56 "OPTIONAL" in this document are to be interpreted as described in BCP 57 14 [RFC2119] [RFC8174] when, and only when, they appear in all 58 capitals, as shown here. 60 Status of This Memo 62 This Internet-Draft is submitted in full conformance with the 63 provisions of BCP 78 and BCP 79. 65 Internet-Drafts are working documents of the Internet Engineering 66 Task Force (IETF). Note that other groups may also distribute 67 working documents as Internet-Drafts. The list of current Internet- 68 Drafts is at https://datatracker.ietf.org/drafts/current/. 70 Internet-Drafts are draft documents valid for a maximum of six months 71 and may be updated, replaced, or obsoleted by other documents at any 72 time. It is inappropriate to use Internet-Drafts as reference 73 material or to cite them other than as "work in progress." 75 This Internet-Draft will expire on January 5, 2020. 77 Copyright Notice 79 Copyright (c) 2019 IETF Trust and the persons identified as the 80 document authors. All rights reserved. 82 This document is subject to BCP 78 and the IETF Trust's Legal 83 Provisions Relating to IETF Documents 84 (https://trustee.ietf.org/license-info) in effect on the date of 85 publication of this document. Please review these documents 86 carefully, as they describe your rights and restrictions with respect 87 to this document. Code Components extracted from this document must 88 include Simplified BSD License text as described in Section 4.e of 89 the Trust Legal Provisions and are provided without warranty as 90 described in the Simplified BSD License. 92 Table of Contents 94 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 95 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 4 96 3. Application Scenarios . . . . . . . . . . . . . . . . . . . . 4 97 3.1. Use Cases of PCECC for Label Management . . . . . . . . . 4 98 3.2. Using PCECC for SR . . . . . . . . . . . . . . . . . . . 6 99 3.2.1. PCECC SID Allocation . . . . . . . . . . . . . . . . 7 100 3.2.2. Use Cases of PCECC for SR Best Effort (BE) Path . . . 8 101 3.2.3. Use Cases of PCECC for SR Traffic Engineering (TE) 102 Path . . . . . . . . . . . . . . . . . . . . . . . . 8 103 3.3. Use Cases of PCECC for TE LSP . . . . . . . . . . . . . . 9 104 3.3.1. PCECC Load Balancing (LB) Use Case . . . . . . . . . 11 105 3.3.2. PCECC and Inter-AS TE . . . . . . . . . . . . . . . . 13 106 3.4. Use Cases of PCECC for Multicast LSPs . . . . . . . . . . 16 107 3.4.1. 
Using PCECC for P2MP/MP2MP LSPs' Setup . . . . . . . 16 108 3.4.2. Use Cases of PCECC for the Resiliency of P2MP/MP2MP 109 LSPs . . . . . . . . . . . . . . . . . . . . . . . . 17 110 3.5. Use Cases of PCECC for LSP in the Network Migration . . . 19 111 3.6. Use Cases of PCECC for L3VPN and PWE3 . . . . . . . . . . 21 112 3.7. Using PCECC for Traffic Classification Information . . . 22 113 3.8. Use Cases of PCECC for SRv6 . . . . . . . . . . . . . . . 22 114 3.9. Use Cases of PCECC for SFC . . . . . . . . . . . . . . . 24 115 3.10. Use Cases of PCECC for Native IP . . . . . . . . . . . . 24 116 3.11. Use Cases of PCECC for Local Protection (RSVP-TE) . . . . 25 117 3.12. Use Cases of PCECC for BIER . . . . . . . . . . . . . . . 25 118 4. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 25 119 5. Security Considerations . . . . . . . . . . . . . . . . . . . 26 120 6. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 26 121 7. References . . . . . . . . . . . . . . . . . . . . . . . . . 26 122 7.1. Normative References . . . . . . . . . . . . . . . . . . 26 123 7.2. Informative References . . . . . . . . . . . . . . . . . 26 124 Appendix A. Using reliable P2MP TE based multicast delivery for 125 distributed computations (MapReduce-Hadoop) . . . . 30 126 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 33 128 1. Introduction 130 An Architecture for Use of PCE and PCEP [RFC5440] in a Network with 131 Central Control [RFC8283] describes SDN architecture where the Path 132 Computation Element (PCE) determines paths for variety of different 133 usecases, with PCEP as a general southbound communication protocol 134 with all the nodes along the path.. 136 [I-D.ietf-pce-pcep-extension-for-pce-controller] introduces the 137 procedures and extensions for PCEP to support the PCECC architecture 138 [RFC8283]. 140 This draft describes the various usecases for the PCECC architecture. 142 This is a living document to catalogue the use cases for PCECC. 143 There is currently no intention to publish this work as an RFC. 145 2. Terminology 147 The following terminology is used in this document. 149 IGP: Interior Gateway Protocol. Either of the two routing 150 protocols, Open Shortest Path First (OSPF) or Intermediate System 151 to Intermediate System (IS-IS). 153 PCC: Path Computation Client: any client application requesting a 154 path computation to be performed by a Path Computation Element. 156 PCE: Path Computation Element. An entity (component, application, 157 or network node) that is capable of computing a network path or 158 route based on a network graph and applying computational 159 constraints. 161 PCECC: PCE as a central controller. Extension of PCE to support SDN 162 functions as per [RFC8283]. 164 TE: Traffic Engineering. 166 3. Application Scenarios 168 In the following sections, several use cases are described, 169 showcasing scenarios that benefit from the deployment of PCECC. 171 3.1. Use Cases of PCECC for Label Management 173 As per [RFC8283], in some cases, the PCE-based controller can take 174 responsibility for managing some part of the MPLS label space for 175 each of the routers that it controls, and it may taker wider 176 responsibility for partitioning the label space for each router and 177 allocating different parts for different uses, communicating the 178 ranges to the router using PCEP. 
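As an informal illustration of the label management described above (it is not part of any PCEP specification), the short Python sketch below shows how a central controller might partition a router's label space into per-use ranges before communicating them to the router via PCEP; the class, function names, and range values are hypothetical.

   # Illustrative only: partition a router's MPLS label space into
   # per-use ranges that a PCECC could then communicate via PCEP.
   from dataclasses import dataclass

   @dataclass
   class LabelRange:
       use: str      # e.g. "static-lsp", "sr-global", "sr-local"
       start: int
       end: int      # inclusive

   def partition_label_space(first, last, shares):
       """Split [first, last] proportionally among the requested uses."""
       total = sum(shares.values())
       ranges, cursor = [], first
       for use, weight in shares.items():
           size = (last - first + 1) * weight // total
           ranges.append(LabelRange(use, cursor, cursor + size - 1))
           cursor += size
       return ranges

   # Example: labels 16000-23999 split between SR-global and static uses.
   for r in partition_label_space(16000, 23999,
                                  {"sr-global": 3, "static-lsp": 1}):
       print(r)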
180 [I-D.ietf-pce-pcep-extension-for-pce-controller] describe a mode 181 where LSPs are provisioned as explicit label instructions at each hop 182 on the end-to-end path. Each router along the path must be told what 183 label forwarding instructions to program and what resources to 184 reserve. The controller uses PCEP to communicate with each router 185 along the path of the end-to-end LSP. For this to work, the PCE- 186 based controller will take responsibility for managing some part of 187 the MPLS label space for each of the routers that it controls. An 188 extension to PCEP could be done to allow a PCC to inform the PCE of 189 such a label space to control. 191 [I-D.ietf-pce-segment-routing] specifies extensions to PCEP that 192 allow a stateful PCE to compute, update or initiate SR-TE paths. 193 [I-D.zhao-pce-pcep-extension-pce-controller-sr] describes the 194 mechanism for PCECC to allocate and provision the node/prefix/ 195 adjacency label (SID) via PCEP. To make such allocation PCE needs to 196 be aware of the label space from Segment Routing Global Block (SRGB) 197 or Segment Routing Local Block (SRLB) [RFC8402] of the node that it 198 controls. A mechanism for a PCC to inform the PCE of such a label 199 space to control is needed within PCEP. The full SRGB/SRLB of a node 200 could be learned via existing IGP or BGP-LS mechanism too. 202 [I-D.li-pce-controlled-id-space] defines a PCEP extension to support 203 advertisement of the MPLS label space to the PCE to control. 205 There have been various proposals for Global Labels, the PCECC 206 architecture could be used as means to learn the label space of 207 nodes, and could also be used to determine and provision the global 208 label range. 210 +------------------------------+ +------------------------------+ 211 | PCE DOMAIN 1 | | PCE DOMAIN 2 | 212 | +--------+ | | +--------+ | 213 | | | | | | | | 214 | | PCECC1 | ---------PCEP---------- | PCECC2 | | 215 | | | | | | | | 216 | | | | | | | | 217 | +--------+ | | +--------+ | 218 | ^ ^ | | ^ ^ | 219 | / \ PCEP | | PCEP / \ | 220 | V V | | V V | 221 | +--------+ +--------+ | | +--------+ +--------+ | 222 | |NODE 11 | | NODE 1n| | | |NODE 21 | | NODE 2n| | 223 | | | ...... | | | | | | ...... | | | 224 | | PCECC | | PCECC | | | | PCECC | |PCECC | | 225 | |Enabled | | Enabled| | |Enabled | |Enabled | | 226 | +--------+ +--------+ | | +--------+ +--------+ | 227 | | | | 228 +------------------------------+ +------------------------------+ 230 Figure 1: PCECC for Label Management 232 o PCC would advertise the PCECC capability to the PCE (central 233 controller-PCECC) 234 [I-D.ietf-pce-pcep-extension-for-pce-controller]. 236 o The PCECC could also learn the label range set aside by the PCC 237 ([I-D.li-pce-controlled-id-space]). 239 o Optionally, the PCECC could determine the shared MPLS global label 240 range for the network. 242 o In the case that the shared global label range need to be 243 negotiated across multiple domains, the central controllers of 244 these domains would also need to negotiate a common global 245 label range across domains. 247 o The PCECC would need to set the shared global label range to 248 all PCC nodes in the network. 250 3.2. Using PCECC for SR 252 Segment Routing (SR) leverages the source routing paradigm. Using 253 SR, a source node steers a packet through a path without relying on 254 hop-by-hop signaling protocols such as LDP or RSVP-TE. Each path is 255 specified as an ordered list of instructions called "segments". 
Each 256 segment is an instruction to route the packet to a specific place in 257 the network, or to perform a specific service on the packet. A 258 database of segments can be distributed through the network using a 259 routing protocol (such as IS-IS or OSPF) or by any other means. PCEP 260 (and PCECC) could be one such means.
262 [I-D.ietf-pce-segment-routing] specifies the SR-specific PCEP 263 extensions. The PCECC may further use PCEP for SR SID (Segment 264 Identifier) distribution to the SR nodes (PCCs) with some benefits. 265 Having the PCECC allocate and maintain the SIDs in the network for the 266 nodes and adjacencies, and further distribute them to the SR nodes 267 directly via the PCEP session, has some advantages over the 268 configuration of each SR node and flooding via the IGP, especially in an 269 SDN environment.
271 When the PCECC is used for the distribution of the node segment ID 272 and adjacency segment ID, the node segment ID is allocated from the 273 SRGB of the node. For the allocation of the adjacency segment ID, the 274 allocation is from the SRLB of the node as described in 275 [I-D.zhao-pce-pcep-extension-pce-controller-sr].
277 [RFC8355] identifies various protection and resiliency use cases for 278 SR. Path protection lets the ingress node be in charge of the 279 failure recovery (used for SR-TE). Protection can also be performed 280 by the node adjacent to the failed component, commonly referred to as 281 local protection techniques or fast-reroute (FRR) techniques. In the 282 case of PCECC, the protection paths can be pre-computed and set up by 283 the PCE.
285 The following example illustrates the use case where the node SID and 286 adjacency SID are allocated by the PCECC.
288 192.0.2.1/32 289 +----------+ 290 | R1(1001) | 291 +----------+ 292 | 293 +----------+ 294 | R2(1002) | 192.0.2.2/32 295 +----------+ 296 * | * * 297 * | * * 298 *link1| * * 299 192.0.2.4/32 * | *link2 * 192.0.2.5/32 300 +-----------+ 9001| * +-----------+ 301 | R4(1004) | | * | R5(1005) | 302 +-----------+ | * +-----------+ 303 * | *9003 * + 304 * | * * + 305 * | * * + 306 +-----------+ +-----------+ 307 192.0.2.3/32 | R3(1003) | |R6(1006) |192.0.2.6/32 308 +-----------+ +-----------+ 309 | 310 +-----------+ 311 | R8(1008) | 192.0.2.8/32 312 +-----------+
314 3.2.1. PCECC SID Allocation
316 Each node (PCC) is allocated a node-SID by the PCECC. The PCECC 317 needs to update the label map of each node to all the nodes in the 318 domain. On receiving the label map, each node (PCC) uses the local 319 routing information to determine the next-hop and downloads the label 320 forwarding instructions accordingly. The forwarding behavior and the 321 end result are the same as for an IGP-based Node-SID in SR. Thus, from anywhere 322 in the domain, it enforces the ECMP-aware shortest-path forwarding of 323 the packet towards the related node.
325 For each adjacency in the network, the PCECC can allocate an Adj-SID. 326 The PCECC sends a PCInitiate message to update the label map of each 327 adjacency to the corresponding nodes in the domain. Each node (PCC) 328 downloads the label forwarding instructions accordingly. The 329 forwarding behavior and the end result are similar to an IGP-based "Adj- 330 SID" in SR.
332 The various mechanisms are described in 333 [I-D.zhao-pce-pcep-extension-pce-controller-sr].
335 3.2.2. Use Cases of PCECC for SR Best Effort (BE) Path
337 In this mode of the solution, the PCECC just needs to allocate the 338 node segment ID and adjacency ID (without calculating the explicit 339 path for the SR path).
The ingress of the forwarding path just needs 340 to encapsulate the destination node segment ID on top of the packet. 341 All the intermediate nodes will forward the packet based on the 342 destination node SID. This is similar to an LDP LSP.
344 R1 may send a packet to R8 simply by pushing an SR header with 345 segment list {1008} (Node SID for R8). The path would be based 346 on the routing/next-hop calculation on the routers.
348 3.2.3. Use Cases of PCECC for SR Traffic Engineering (TE) Path
350 SR-TE paths may not follow an IGP SPT. Such paths may be chosen by a 351 PCECC and provisioned on the ingress node of the SR-TE path. The SR 352 header consists of a list of SIDs (or MPLS labels). The header has 353 all necessary information so that the packets can be guided from the 354 ingress node to the egress node of the path; hence, there is no need 355 for any signaling protocol. For the case where a strict traffic 356 engineering path is needed, all the adjacency SIDs are stacked; 357 otherwise a combination of node-SIDs and adj-SIDs can be used for the 358 SR-TE paths.
360 Note that bandwidth reservation is only guaranteed at the controller, 361 through the enforcement of bandwidth admission control. In the 362 RSVP-TE LSP case, by contrast, the control plane signaling also reserves the link 363 bandwidth at each hop of the path.
365 The SR traffic engineering path examples are explained below:
367 Note that the node SID for each node is allocated from the SRGB and 368 the adjacency SIDs for each link are allocated from the SRLB of each 369 node.
371 Example 1:
373 R1 may send a packet P1 to R8 simply by pushing an SR header with 374 segment list {1008}. Based on the best path, it could be: 375 R1-R2-R3-R8.
377 Example 2:
379 R1 may send a packet P2 to R8 by pushing an SR header with segment 380 list {1002, 9001, 1008}. The path should be: R1-R2-link1-R3-R8.
382 Example 3:
384 R1 may send a packet P3 to R8 via R4 by pushing an SR header with 385 segment list {1004, 1008}. The path could be: R1-R2-R4-R3-R8.
387 The local protection examples for the SR-TE path are explained below:
389 Example 4: local link protection:
391 o R1 may send a packet P4 to R8 by pushing an SR header with segment 392 list {1002, 9001, 1008}. The path should be: R1-R2-link1-R3-R8.
394 o When node R2 receives the packet from R1, which has the header of 395 link1-R3-R8, and also finds out that there is a link failure on link1, 396 then R2 can steer the traffic over the bypass and send out 397 the packet with the header of R3-R8 through link2.
399 Example 5: local node protection:
401 o R1 may send a packet P5 to R8 by pushing an SR header with segment 402 list {1004, 1008}. The path could be: R1-R2-R4-R3-R8.
404 o When node R2 receives the packet from R1, which has the header of 405 {1004, 1008}, and also finds out that there is a node failure of 406 R4, then it can steer the traffic over the bypass and send 407 out the packet with the header of {1005, 1008} to R5 instead of 408 R4.
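As an informal companion to Examples 1 through 5 above, the Python sketch below shows how the ingress segment lists could be assembled from node and adjacency SIDs already allocated by the PCECC; the dictionaries and helper function are purely hypothetical and are not defined by any PCEP document.

   # Illustrative only: assemble SR-TE segment lists like the ones in
   # Examples 1-3 above, assuming the PCECC already allocated the SIDs.
   NODE_SID = {"R1": 1001, "R2": 1002, "R3": 1003, "R4": 1004,
               "R5": 1005, "R6": 1006, "R8": 1008}
   # Adjacency SIDs of R2 as shown in the figure in Section 3.2.
   ADJ_SID = {("R2", "link1"): 9001, ("R2", "link2"): 9003}

   def sid_stack(hops):
       """Map node names / (node, link) adjacencies to a SID stack."""
       return [ADJ_SID[h] if isinstance(h, tuple) else NODE_SID[h]
               for h in hops]

   print(sid_stack(["R8"]))                         # Example 1: {1008}
   print(sid_stack(["R2", ("R2", "link1"), "R8"]))  # Example 2: {1002, 9001, 1008}
   print(sid_stack(["R4", "R8"]))                   # Example 3: {1004, 1008}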
410 3.3. Use Cases of PCECC for TE LSP
412 Section 3.2 discussed the case of SR paths set up via the PCECC. 413 Although those cases provide simplicity and scalability, there are 414 existing functionalities for traffic engineering paths, such as 415 bandwidth guarantees and monitoring, for which an SR-based solution is 416 complex. There are also cases where the depth of the label stack is 417 an issue for existing deployments and certain vendors.
419 To address these issues, the PCECC architecture also supports the TE 420 LSP functionalities. To achieve this, the existing PCEP can be used 421 to communicate between the PCECC and nodes along the path. This is 422 similar to static LSPs, where LSPs can be provisioned as explicit 423 label instructions at each hop on the end-to-end path. Each router 424 along the path must be told what label forwarding instructions to 425 program and what resources to reserve. The PCE-based controller 426 keeps a view of the network and determines the paths of the end-to- 427 end LSPs, and the controller uses PCEP to communicate with each 428 router along the path of the end-to-end LSP.
430 192.0.2.1/32 431 +----------+ 432 | R1 | 433 +----------+ 434 | | 435 |link1 | 436 | |link2 437 +----------+ 438 | R2 | 192.0.2.2/32 439 +----------+ 440 link3 * | * * link4 441 * | * * 442 *link5| * * 443 192.0.2.4/32 * | *link6 * 192.0.2.5/32 444 +-----------+ | * +-----------+ 445 | R4 | | * | R5 | 446 +-----------+ | * +-----------+ 447 * | * * + 448 link10 * | * *link7 + 449 * | * * + 450 +-----------+ +-----------+ 451 192.0.2.3/32 | R3 | |R6 |192.0.2.6/32 452 +-----------+ +-----------+ 453 | | 454 |link8 | 455 | |link9 456 +-----------+ 457 | R8 | 192.0.2.8/32 458 +-----------+
460 Figure 2: PCECC TE LSP Setup Example
462 o Based on a path computation request / delegation or PCE initiation, 463 the PCECC receives the PCECC request with constraints and 464 optimization criteria.
466 o The PCECC would calculate the optimal path according to the given 467 constraints (e.g. bandwidth).
469 o The PCECC would provision each node along the path and assign incoming 470 and outgoing labels from R1 to R8 with the path: {R1, link1, 471 1001}, {1001, R2, link3, 2003}, {2003, R4, link10, 4010}, {4010, 472 R3, link8, 3008}, {3008, R8}.
474 o For the end-to-end protection, the PCECC programs each node along the 475 path from R1 to R8 with the secondary path: {R1, link2, 1002}, 476 {1002, R2, link4, 2004}, {2004, R5, link7, 5007}, {5007, R3, 477 link9, 3009}, {3009, R8}.
479 o It is also possible to have a bypass path for local protection 480 set up by the PCECC. For example, with the primary path as above, 481 to protect the node R4 locally, the PCECC can program the bypass path 482 as follows: {R2, link5, 2005}, {2005, R3}. By doing this, the node 483 R4 is locally protected at R2.
485 3.3.1. PCECC Load Balancing (LB) Use Case
487 Very often service providers use TE tunnels for solving issues 488 with non-deterministic paths in their networks. One example of such 489 applications is the use of TE tunnels in the mobile backhaul (MBH). Consider 490 the following topology -
492 TE1 --------------> 493 +---------+ +--------+ +--------+ +--------+ +------+ +---+ 494 | Access |----| Access |----| AGG 1 |----| AGG N-1|----|Core 1|--|SR1| 495 | SubNode1| | Node 1 | +--------+ +--------+ +------+ +---+ 496 +---------+ +--------+ | | | ^ | 497 | Access | Access | AGG Ring 1 | | | 498 | SubRing 1 | Ring 1 | | | | | 499 +---------+ +--------+ +--------+ | | | 500 | Access | | Access | | AGG 2 | | | | 501 | SubNode2| | Node 2 | +--------+ | | | 502 +---------+ +--------+ | | | | | 503 | | | | | | | 504 | | | +----TE2----|-+ | 505 +---------+ +--------+ +--------+ +--------+ +------+ +---+ 506 | Access | | Access |----| AGG 3 |----| AGG N |----|Core N|--|SRn| 507 | SubNodeN|----| Node N | +--------+ +--------+ +------+ +---+ 508 +---------+ +--------+
510 This MBH architecture uses L2 access rings and sub-rings. L3 starts 511 at the aggregation layer.
For the sake of simplicity, the figure 512 shows only one access sub-ring, access ring and aggregation ring 513 (AGG1...AGGN), connected by Nx10GE interfaces. The aggregation domain 514 runs its own IGP. There are two egress routers (AGG N-1, AGG N) that 515 are connected to the Core domain via L2 interfaces. The Core also has 516 connections to service routers, and RSVP-TE tunnels are used for MPLS transport 517 inside the ring. There could be at least 2 tunnels (one way) from 518 each AGG router to the egress AGG routers. There are also many L2 access 519 rings connected to the AGG routers.
521 Service deployment is made by means of either L2VPNs (VPLS) or L3VPNs. 522 Those services use MPLS TE as transport towards the egress AGG routers. 523 TE tunnels could also be used as transport towards service routers in 524 case of a seamless MPLS based architecture in the future.
526 There is a need to solve the following tasks:
528 o Perform automatic load-balancing amongst TE tunnels according to 529 the current traffic load.
531 o TE bandwidth (BW) management: provide guaranteed BW for specific 532 services (HSI, IPTV, etc.) and provide time-based BW reservation (BoD) 533 for other services.
535 o Simplify the deployment of TE tunnels by automation without any 536 manual intervention.
538 o Provide flexibility for Service Router placement (anywhere in the 539 network, by creating transport LSPs to them).
541 Since other tasks are already considered by other PCECC use cases, in 542 this section, the focus is on the load balancing (LB) task. The LB task 543 could be solved by means of PCECC in the following way:
545 o An application, network service, or operator can ask the SDN 546 controller (PCECC) for LSP-based LB between AGG X and AGG N/AGG 547 N-1 (egress AGG routers which have connections to the core) via the North 548 Bound Interface (NBI). Each of these would have associated 549 constraints (i.e. Path Setup Type (PST), bandwidth, inclusion or 550 exclusion of specific links or nodes, number of paths, objective 551 function (OF), need for disjoint LSP paths etc.).
553 o The PCECC could calculate multiple (say N) LSPs according to the given 554 constraints; the calculation is based on the results of the Objective Function 555 (OF) [RFC5541], constraints, endpoints, same or different 556 bandwidth (BW), different links (in case of disjoint paths) and 557 other constraints.
559 o Depending on the given LSP Path Setup Type (PST), the PCECC would 560 download instructions to the PCC. At this stage it is assumed the 561 PCECC is aware of the label space it controls and, in the case of SR, 562 the SID allocation and distribution is already done.
564 o The PCECC would send a PCInitiate PCEP message [RFC8281] towards the ingress 565 AGG X router (PCC) for each of the N LSPs and receive a PCRpt PCEP 566 message [RFC8231] back from the PCCs. If the PST is PCECC-SR, the 567 PCECC would include the SID stack as per 568 [I-D.ietf-pce-segment-routing]. If the PST is PCECC (basic), then 569 the PCECC would assign labels along the calculated path and set 570 up the path by sending central controller instructions in PCEP 571 messages to each node along the path of the LSP as per 572 [I-D.ietf-pce-pcep-extension-for-pce-controller], and then send a 573 PCUpd message to the ingress AGG X router with information about the 574 new LSP, and AGG X (PCC) would respond with a PCRpt with the LSP status.
576 o AGG X as the ingress router now has N LSPs towards AGG N and AGG N-1 577 which are available for installation into the router's forwarding table and LB 578 of traffic between them.
Traffic distribution between those LSPs 579 depends on particular realization of hash-function on that router. 581 o Since PCECC is aware of TEDB (TE state) and LSP-DB, it can manage 582 and prevent possible over-subscriptions and limit number of 583 available LB states. Via PCECC mechanism the control can take 584 quick actions into the network by directly provisioning the 585 central control instructions. 587 3.3.2. PCECC and Inter-AS TE 589 There are various signaling options for establishing Inter-AS TE LSP: 590 contiguous TE LSP [RFC5151], stitched TE LSP [RFC5150], nested TE LSP 591 [RFC4206]. 593 Requirements for PCE-based Inter-AS setup [RFC5376] describe the 594 approach and PCEP functionality that are needed for establishing 595 Inter-AS TE LSPs. 597 [RFC5376] also gives Inter- and Intra-AS PCE Reference Model that is 598 provided below in shorten form for the sake of simplicity. 600 Inter-AS Inter-AS 601 PCC <-->PCE1<--------->PCE2 602 :: :: :: 603 :: :: :: 604 R1----ASBR1====ASBR3---R3---ASBR5 605 | AS1 | | PCC | 606 | | | AS2 | 607 R2----ASBR2====ASBR4---R4---ASBR6 608 :: :: 609 :: :: 610 Intra-AS Intra-AS 611 PCE3 PCE4 613 Figure 3: Shorten form of Inter- and Intra-AS PCE Reference Model 614 [RFC5376] 616 The PCECC belonging to different domain can co-operate to setup 617 inter-AS TE LSP. The stateful H-PCE [I-D.ietf-pce-stateful-hpce] 618 mechanism could also be used to first establish a per-domain PCECC 619 LSP. These could be stitched together to form inter-AS TE LSP as 620 described in [I-D.dugeon-pce-stateful-interdomain]. 622 For the sake of simplicity, here after the focus is on a simplified 623 Inter-AS case when both AS1 and AS2 belong to the same service 624 provider administration. In that case Inter and Intra-AS PCEs could 625 be combined in one single PCE if such combined PCE performance is 626 enough for handling all path computation request and setup. There is 627 a potential to use a single PCE for both ASes if the scalability and 628 performance are enough. The PCE would require interfaces (PCEP and 629 BGP-LS) to both domains. PCECC redundancy mechanisms are described 630 in [RFC8283]. Thus routers in AS1 and AS2 (PCCs) can send PCEP 631 messages towards same PCECC. 633 +----BGP-LS------+ +------BGP-LS-----+ 634 | | | | 635 +-PCEP-|----++-+-------PCECC-----PCEP--++-+-|-------+ 636 +-:------|----::-:-+ +--::-:-|-------:---+ 637 | : | :: : | | :: : | : | 638 | : RR1 :: : | | :: : RR2 : | 639 | v v: : | LSP1 | :: v v | 640 | R1---------ASBR1=======================ASBR3--------R3 | 641 | | v : | | :v | | 642 | +----------ASBR2=======================ASBR4---------+ | 643 | | Region 1 : | | : Region 1 | | 644 |----------------:-| |--:-------------|--| 645 | | v | LSP2 | v | | 646 | +----------ASBR5=======================ASBR6---------+ | 647 | Region 2 | | Region 2 | 648 +------------------+ <--------------> +-------------------+ 649 MPLS Domain 1 Inter-AS MPLS Domain 2 650 <=======AS1=======> <========AS2=======> 652 Figure 4: Particular case of Inter-AS PCE 654 In a case of PCECC Inter-AS TE scenario where service provider 655 controls both domains (AS1 and AS2), each of them have own IGP and 656 MPLS transport. There is a need is to setup Inter-AS LSPs for 657 transporting different services on top of them (Voice, L3VPN etc.). 658 Inter-AS links with different capacity exist in several regions. 
The 659 task is not only to provision those Inter-AS LSPs with given 660 constrains but also calculate the path and pre-setup the backup 661 Inter-AS LSPs that will be used if primary LSP fails. 663 As per the Figure 4, LSP1 from R1 to R3 goes via ASBR1 and ASBR3, and 664 it is the primary Inter-AS LSP. R1-R3 LSP2 that go via ASBR5 and 665 ASBR6 is the backup one. In addition there could also be a bypass 666 LSP setup to protect against ASBR or inter-AS link failure. 668 After the addition of PCECC functionality to PCE (SDN controller), 669 PCECC based Inter-AS TE model SHOULD follow as PCECC usecase for TE 670 LSP as requirements of [RFC5376] with the following details: 672 o Since PCECC needs to know the topology of both domains AS1 and 673 AS2, PCECC could use BGP-LS peering with routers (or RRs) in both 674 domains. 676 o PCECC needs to PCEP connectivity towards all routers in both 677 domains (see also section 4 in [RFC5376]) in a similar manner as a 678 SDN controller. 680 o After operator's application or service orchestrator will create 681 request for tunnel creation of specific service, PCECC should 682 receive that request via NBI (NBI type is implementation 683 dependent, could be NETCONF/Yang, REST etc.). Then PCECC would 684 calculate the optimal path based on Objective Function (OF) and 685 given constraints (i.e. path setup type, bandwidth etc.), 686 including those from [RFC5376]: priority, AS sequence, preferred 687 ASBR, disjoint paths, protection. On this step we would have two 688 paths: R1-ASBR1-ASBR3-R3, R1-ASBR5-ASBR6-R3 690 o Depending on given LSP PST (PCECC or PCECC-SR), PCECC would use 691 central control download instructions to the PCC. At this stage 692 it is assumed the PCECC is aware of the label space it controls 693 and in case of SR the SID allocation and distribution is already 694 done. 696 o PCECC would send PCInitiate PCEP message [RFC8281] towards ingress 697 router R1 (PCC) in AS1 and receives PCRpt PCEP message [RFC8231] 698 back from PCC. If the PST is PCECC-SR, the PCECC would include 699 the SID stack as per [I-D.ietf-pce-segment-routing]. It may also 700 include binding SID based on AS boundary. The backup SID stack 701 could also be installed at ingress but more importantly each node 702 along the SR path could also do local protection just based on the 703 top segment. If the PST is PCECC (basic), then the PCECC would 704 assigns labels along the calculated paths (R1-ASBR1-ASBR3-R3, 705 R1-ASBR5-ASBR6-R3); and set up the path by sending central 706 controller instructions in PCEP message to each node along the 707 path of the LSPs as per 708 [I-D.ietf-pce-pcep-extension-for-pce-controller] and then send 709 PCUpd message to the ingress R1 router with information about new 710 LSPs and R1 would respond with PCRpt with LSP(s) status. 712 o After that step R1 now have primary and backup TEs (LSP1 and LSP2) 713 towards R3. It is up to router implementation how to make 714 switchover to backup LSP2 if LSP1 fails. 716 3.4. Use Cases of PCECC for Multicast LSPs 718 The current multicast LSPs are setup either using the RSVP-TE P2MP or 719 mLDP protocols. The setup of these LSPs may require manual 720 configurations and complex signaling when the protection is 721 considered. By using the PCECC solution, the multicast LSP can be 722 computed and setup through centralized controller which has the full 723 picture of the topology and bandwidth usage for each link. 
It not 724 only reduces the complex configuration compared to the distributed 725 RSVP-TE P2MP or mLDP signaling, but it can also compute the disjoint 726 primary and secondary P2MP paths efficiently.
728 3.4.1. Using PCECC for P2MP/MP2MP LSPs' Setup
730 It is assumed the PCECC is aware of the label space it controls for 731 all nodes and makes allocations accordingly.
733 +----------+ 734 | R1 | Root node of the multicast LSP 735 +----------+ 736 |6000 737 +----------+ 738 Transit Node | R2 | 739 branch +----------+ 740 * | * * 741 9001* | * *9002 742 * | * * 743 +-----------+ | * +-----------+ 744 | R4 | | * | R5 | Transit Nodes 745 +-----------+ | * +-----------+ 746 * | * * + 747 9003* | * * +9004 748 * | * * + 749 +-----------+ +-----------+ 750 | R3 | | R6 | Leaf Node 751 +-----------+ +-----------+ 752 9005| 753 +-----------+ 754 | R8 | Leaf Node 755 +-----------+
757 The P2MP examples are explained here, where R1 is the root and R8 and R6 758 are the leaves.
760 o Based on the P2MP path computation request / delegation or PCE 761 initiation, the PCECC receives the PCECC request with constraints 762 and optimization criteria.
764 o The PCECC would calculate the optimal P2MP path according to the given 765 constraints (e.g. bandwidth).
767 o The PCECC would provision each node along the path and assign incoming 768 and outgoing labels from R1 to {R6, R8} with the path: {R1, 6000}, 769 {6000, R2, {9001,9002}}, {9001, R4, 9003}, {9002, R5, 9004}, {9003, 770 R3, 9005}, {9004, R6}, {9005, R8}. The main difference is in the 771 branch node instruction at R2 where two copies of the packet are sent 772 towards R4 and R5 with labels 9001 and 9002 respectively.
774 The packet forwarding involves -
776 Step1: R1 may send a packet P1 to R2 simply by pushing a label of 777 6000 onto the packet.
779 Step2: After R2 receives the packet with label 6000, it will 780 forward a copy to R4 by swapping the label to 9001 and a copy 781 to R5 by swapping the label to 9002.
783 Step3: After R4 receives the packet with label 9001, it will 784 forward it to R3 by swapping the label to 9003. After R5 receives the 785 packet with label 9002, it will forward it to R6 by swapping the label to 786 9004.
788 Step4: After R3 receives the packet with label 9003, it will 789 forward it to R8 by swapping the label to 9005.
792 Step5: The packet is received at R8 and label 9005 is popped; the packet is received 793 at R6 and label 9004 is popped.
795 3.4.2. Use Cases of PCECC for the Resiliency of P2MP/MP2MP LSPs
797 3.4.2.1. PCECC for the End-to-End Protection of the P2MP/MP2MP LSPs
799 In this section we describe the end-to-end managed path protection 800 service, as well as the local protection with the operation management 801 in the PCECC network, for the P2MP/MP2MP LSP.
803 An end-to-end protection principle can be applied for computing 804 backup P2MP or MP2MP LSPs. During computation of the primary 805 multicast trees, the PCECC server may also take the computation of a 806 secondary tree into consideration. A PCE may compute the primary and 807 backup P2MP (or MP2MP) LSP together or sequentially.
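As a rough illustration of the kind of check a controller implementation might perform when it computes a backup tree alongside the primary (the data structures and helper names below are hypothetical and not defined by PCEP), a candidate backup P2MP tree can be validated as link-disjoint from the primary as follows:

   # Illustrative only: verify that a candidate backup P2MP tree shares
   # no links with the primary tree.
   def tree_links(tree):
       """tree: dict mapping a node to its list of downstream nodes."""
       return {(u, v) for u, children in tree.items() for v in children}

   def is_link_disjoint(primary, backup):
       return not (tree_links(primary) & tree_links(backup))

   primary = {"R1": ["R2"], "R2": ["R4", "R5"]}                 # R1->R2->{R4,R5}
   backup  = {"R1": ["R11"], "R11": ["R3"], "R3": ["R4", "R5"]}  # R1->R11->R3->{R4,R5}
   print(is_link_disjoint(primary, backup))                      # True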
809 +----+ +----+ 810 Root node of LSP | R1 |--| R11| 811 +----+ +----+ 812 / + 813 10/ +20 814 / + 815 +----------+ +-----------+ 816 Transit Node | R2 | | R3 | 817 +----------+ +-----------+ 818 | \ + + 819 | \ + + 820 10| 10\ +20 20+ 821 | \ + + 822 | \ + 823 | + \ + 824 +-----------+ +-----------+ Leaf Nodes 825 | R4 | | R5 | (Downstream LSR) 826 +-----------+ +-----------+ 828 In the example above, when the PCECC setup the primary multicast tree 829 from the root node R1 to the leaves, which is R1->R2->{R4, R5}, at 830 same time, it can setup the backup tree, which is R1->R11->R3->{R4, 831 R5}. Both the these two primary forwarding tree and secondary 832 forwarding tree will be downloaded to each routers along the primary 833 path and the secondary path. The traffic will be forwarded through 834 the R1->R2->{R4, R5} path normally, and when there is a node in the 835 primary tree fails (say R2), then the root node R1 will switch the 836 flow to the backup tree, which is R1->R11->R3->{R4, R5}. By using 837 the PCECC, the path computation and forwarding path downloading can 838 all be done without the complex signaling used in the P2MP RSVP-TE or 839 mLDP. 841 3.4.2.2. PCECC for the Local Protection of the P2MP/MP2MP LSPs 843 In this section we describe the local protection service in the PCECC 844 network for the P2MP/MP2MP LSP. 846 While the PCECC sets up the primary multicast tree, it can also build 847 the back LSP among PLR, the protected node, and MPs (the downstream 848 nodes of the protected node). In the cases where the amount of 849 downstream nodes are huge, this mechanism can avoid unnecessary 850 packet duplication on PLR and protect the network from traffic 851 congestion risk. 853 +------------+ 854 | R1 | Root Node 855 +------------+ 856 . 857 . 858 . 859 +------------+ Point of Local Repair/ 860 | R10 | Switchover Point 861 +------------+ (Upstream LSR) 862 / + 863 10/ +20 864 / + 865 +----------+ +-----------+ 866 Protected Node | R20 | | R30 | 867 +----------+ +-----------+ 868 | \ + + 869 | \ + + 870 10| 10\ +20 20+ 871 | \ + + 872 | \ + 873 | + \ + 874 +-----------+ +-----------+ Merge Point 875 | R40 | | R50 | (Downstream LSR) 876 +-----------+ +-----------+ 877 . . 878 . . 880 In the example above, when the PCECC setup the primary multicast path 881 around the PLR node R10 to protect node R20, which is R10->R20->{R40, 882 R50}, at same time, it can setup the backup path R10->R30->{R40, 883 R50}. Both the these two primary forwarding path and secondary 884 bypass forwarding path will be downloaded to each routers along the 885 primary path and the secondary bypass path. The traffic will be 886 forwarded through the R10->R20->{R40, R50} path normally, and when 887 there is a node failure for node R20, then the PLR node R10 will 888 switch the flow to the backup path, which is R10->R30->{R40, R50}. 889 By using the PCECC, the path computation and forwarding path 890 downloading can all be done without the complex signaling used in the 891 P2MP RSVP-TE or mLDP. 893 3.5. Use Cases of PCECC for LSP in the Network Migration 895 One of the main advantages for PCECC solution is that it has backward 896 compatibility naturally since the PCE server itself can function as a 897 proxy node of MPLS network for all the new nodes which may no longer 898 support the signaling protocols. 900 As it is illustrated in the following example, the current network 901 could migrate to a total PCECC controlled network gradually by 902 replacing the legacy nodes. 
During the migration, the legacy nodes 903 still need to signal using the existing MPLS protocols such as LDP and 904 RSVP-TE, and the new nodes set up their portion of the forwarding path 905 through the PCECC directly. With the PCECC functioning as the proxy for 906 these new nodes, MPLS signaling can propagate through the network as 907 normal.
909 The example described in this section is based on the network configuration 910 illustrated in the following figure:
912 +------------------------------------------------------------------+ 913 | PCE DOMAIN | 914 | +-----------------------------------------------------+ | 915 | | PCECC | | 916 | +-----------------------------------------------------+ | 917 | ^ ^ ^ ^ | 918 | | PCEP | | PCEP | | 919 | V V V V | 920 | +--------+ +--------+ +--------+ +--------+ +--------+ | 921 | | NODE 1 | | NODE 2 | | NODE 3 | | NODE 4 | | NODE 5 | | 922 | | |...| |...| |...| |...| | | 923 | | Legacy |if1| Legacy |if2|Legacy |if3| PCECC |if4| PCECC | | 924 | | Node | | Node | |Enabled | |Enabled | | Enabled| | 925 | +--------+ +--------+ +--------+ +--------+ +--------+ | 926 | | 927 +------------------------------------------------------------------+
929 Example: PCECC Initiated LSP Setup In the Network Migration
931 In this example, there are five nodes for the TE LSP from the head end 932 (Node1) to the tail end (Node5), where Node4 and Node5 are 933 centrally controlled and the other nodes are legacy nodes.
935 o Node1 sends a path request message for the setup of the LSP 936 destined to Node5.
938 o The PCECC sends Node1 a reply message for LSP setup with the path: 939 (Node1, if1), (Node2, if2), (Node3, if3), (Node4, if4), Node5.
941 o Node1, Node2, and Node3 will set up the LSP to Node5 using the local 942 labels as usual. Node3, with the help of the PCECC, could proxy the 943 signaling.
945 o Then the PCECC will program the out-segment of Node3, the in- 946 segment/out-segment of Node4, and the in-segment for Node5.
948 3.6. Use Cases of PCECC for L3VPN and PWE3
950 As described in [RFC8283], various network services may be offered 951 over a network. These include protection services; Virtual 952 Private Network (VPN) services (such as Layer 3 VPNs [RFC4364] or 953 Ethernet VPNs [RFC7432]); or Pseudowires [RFC3985]. Delivering 954 services over a network in an optimal way requires coordination in 955 the way that network resources are allocated to support the services. 956 A PCE-based central controller can consider the whole network and all 957 components of a service at once when planning how to deliver the 958 service. It can then use PCEP to manage the network resources and to 959 install the necessary associations between those resources.
961 In the case of L3VPN, VPN labels can be assigned and distributed 962 through the PCECC PCEP among the PE routers instead of using the BGP 963 protocol.
965 The example described in this section is based on the network configuration 966 illustrated in the following figure:
968 +-------------------------------------------+ 969 | PCE DOMAIN | 970 | +-----------------------------------+ | 971 | | PCECC | | 972 | +-----------------------------------+ | 973 | ^ ^ ^ | 974 |PWE3/L3VPN | PCEP PCEP|LSP PWE3/L3VPN|PCEP | 975 | V V V | 976 +--------+ | +--------+ +--------+ +--------+ | +--------+ 977 | CE | | | PE1 | | NODE x | | PE2 | | | CE | 978 | |......| |...| |...| |.....| | 979 | Legacy | |if1 | PCECC |if2|PCECC |if3| PCECC |if4 | Legacy | 980 | Node | | | Enabled| |Enabled | |Enabled | | | Node | 981 +--------+ | +--------+ +--------+ +--------+ | +--------+ 982 | | 983 +-------------------------------------------+
985 Example: Using PCECC for L3VPN and PWE3
987 In the case of PWE3, instead of using the LDP signaling protocol, the 988 label and port pairs assigned to each pseudowire can be assigned 989 through the PCECC among the PE routers, and the corresponding forwarding 990 entries will be distributed to each PE router through the extended 991 PCEP protocol and the PCECC mechanism.
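Purely as an illustration of the bookkeeping involved (none of the structures or names below are defined by PCEP), a controller that hands out per-VPN or per-pseudowire labels to PE routers might track its allocations along these lines before pushing them via PCEP:

   # Illustrative only: track service labels a PCECC might assign to PEs
   # for L3VPN routes or PWE3 pseudowires.
   import itertools

   class ServiceLabelAllocator:
       def __init__(self, start=100000):
           self._next = itertools.count(start)   # assumed controller-owned range
           self.allocations = {}                 # (pe, service_id) -> label

       def assign(self, pe, service_id):
           label = next(self._next)
           self.allocations[(pe, service_id)] = label
           return label

   alloc = ServiceLabelAllocator()
   print(alloc.assign("PE1", "l3vpn:cust-A"))    # e.g. 100000
   print(alloc.assign("PE2", "pw:pw-42"))        # e.g. 100001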
993 3.7. Using PCECC for Traffic Classification Information
995 As described in [RFC8283], traffic classification is an important 996 part of traffic engineering. It is the process of looking at a 997 packet to determine how it should be treated as it is forwarded 998 through the network. It applies in many scenarios including MPLS 999 traffic engineering (where it determines what traffic is forwarded 1000 onto which LSPs); segment routing (where it is used to select which 1001 set of forwarding instructions to add to a packet); and SFC (where it 1002 indicates along which service function path a packet should be 1003 forwarded). In conjunction with traffic engineering, traffic 1004 classification is an important enabler for load balancing. Traffic 1005 classification is closely linked to the computational elements of 1006 planning for the network functions just listed because it determines 1007 how traffic load is balanced and distributed through the network. 1008 Therefore, selecting what traffic classification should be performed 1009 by a router is an important part of the work done by a PCECC.
1011 Instructions can be passed from the controller to the routers using 1012 PCEP. These instructions tell the routers how to map traffic to 1013 paths or connections. Refer to [I-D.ietf-pce-pcep-flowspec].
1015 Along with traffic classification, there are a few more questions that 1016 need to be considered once the path is set up -
1018 o How to use it
1020 o Whether it is a virtual link
1022 o Whether to advertise it in the IGP as a virtual link
1024 o What bits of this information to signal to the tail end
1026 These are out of scope of this document.
1028 3.8. Use Cases of PCECC for SRv6
1030 As per [RFC8402], with Segment Routing (SR), a node steers a packet 1031 through an ordered list of instructions, called segments. Segment 1032 Routing can be applied to the IPv6 architecture with the Segment 1033 Routing Header (SRH) [I-D.ietf-6man-segment-routing-header]. A 1034 segment is encoded as an IPv6 address. An ordered list of segments 1035 is encoded as an ordered list of IPv6 addresses in the routing 1036 header. The active segment is indicated by the Destination Address 1037 of the packet. Upon completion of a segment, a pointer in the new 1038 routing header is incremented and indicates the next segment.
1040 As per [I-D.ietf-6man-segment-routing-header], an SRv6 Segment is a 1041 128-bit value. "SRv6 SID" or simply "SID" are often used as a 1042 shorter reference for "SRv6 Segment". An 1043 illustration is provided in 1044 [I-D.filsfils-spring-srv6-network-programming], where an SRv6 SID is 1045 represented as LOC:FUNCT.
1047 [I-D.ietf-pce-segment-routing-ipv6] extends 1048 [I-D.ietf-pce-segment-routing] to support SR for the IPv6 data plane. 1049 Further, a PCECC could be extended to support SRv6 SID allocation and 1050 distribution.
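As a purely illustrative sketch of the LOC:FUNCT structure mentioned above (the locator, function values, and helper name are made up for the example and are not defined by any PCEP document), a controller might form SRv6 SIDs and an ordered SID list as follows:

   # Illustrative only: an SRv6 SID can be viewed as LOC:FUNCT packed into
   # one 128-bit IPv6 value, and an SRv6 path is an ordered list of SIDs.
   import ipaddress

   def make_sid(locator, funct):
       """Combine a locator address with a function value (hypothetical)."""
       return ipaddress.IPv6Address(int(ipaddress.IPv6Address(locator)) + funct)

   # Reuse the node addresses of the figure below as if they were End SIDs.
   r5 = make_sid("2001:db8::5", 0)
   r8 = make_sid("2001:db8::8", 0)

   # Path towards R8 via R5, as in the second example following the figure.
   sid_list = [r5, r8]
   print([str(s) for s in sid_list])   # ['2001:db8::5', '2001:db8::8']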
1052 2001:db8::1 1053 +----------+ 1054 | R1 | 1055 +----------+ 1056 | 1057 +----------+ 1058 | R2 | 2001:db8::2 1059 +----------+ 1060 * | * * 1061 * | * * 1062 *link1| * * 1063 2001:db8::4 * | *link2 * 2001:db8::5 1064 +-----------+ | * +-----------+ 1065 | R4 | | * | R5 | 1066 +-----------+ | * +-----------+ 1067 * | * * + 1068 * | * * + 1069 * | * * + 1070 +-----------+ +-----------+ 1071 2001:db8::3 | R3 | |R6 |2001:db8::6 1072 +-----------+ +-----------+ 1073 | 1074 +-----------+ 1075 | R8 | 2001:db8::8 1076 +-----------+ 1078 In this case, PCECC could assign the SRv6 SID (in form of a IPv6 1079 address) to be used for node and adjacency. Later SRv6 path in form 1080 of list of SRv6 SID could be used at the ingress. Some examples - 1082 o SRv6 SID-List={2001:db8::8} - The best path towards R8 1084 o SRv6 SID-List={2001:db8::5, 2001:db8::8} - The path towards R8 via 1085 R5 1087 3.9. Use Cases of PCECC for SFC 1089 Service Function Chaining (SFC) is described in [RFC7665]. It is the 1090 process of directing traffic in a network such that it passes through 1091 specific hardware devices or virtual machines (known as service 1092 function nodes) that can perform particular desired functions on the 1093 traffic. The set of functions to be performed and the order in which 1094 they are to be performed is known as a service function chain. The 1095 chain is enhanced with the locations at which the service functions 1096 are to be performed to derive a Service Function Path (SFP). Each 1097 packet is marked as belonging to a specific SFP, and that marking 1098 lets each successive service function node know which functions to 1099 perform and to which service function node to send the packet next. 1100 To operate an SFC network, the service function nodes must be 1101 configured to understand the packet markings, and the edge nodes must 1102 be told how to mark packets entering the network. Additionally, it 1103 may be necessary to establish tunnels between service function nodes 1104 to carry the traffic. Planning an SFC network requires load 1105 balancing between service function nodes and traffic engineering 1106 across the network that connects them. As per [RFC8283], these are 1107 operations that can be performed by a PCE-based controller, and that 1108 controller can use PCEP to program the network and install the 1109 service function chains and any required tunnels. 1111 PCECC can play the role for setting the traffic classification rules 1112 at the classifier as well as downloading the forwarding instructions 1113 to the SFFs so that they could process the NSH and forward 1114 accordingly. 1116 [Editor's Note - more details to be added] 1118 3.10. Use Cases of PCECC for Native IP 1120 [I-D.ietf-teas-native-ip-scenarios] describes the scenarios, and 1121 suggestions for the "Centrally Control Dynamic Routing (CCDR)" 1122 architecture, which integrates the merit of traditional distributed 1123 protocols (IGP/BGP), and the power of centrally control technologies 1124 (PCE/SDN) to provide one feasible traffic engineering solution in 1125 various complex scenarios for the service provider. 1126 [I-D.ietf-teas-pce-native-ip] defines the framework for CCDR traffic 1127 engineering within Native IP network, using Dual/Multi-BGP session 1128 strategy and CCDR architecture. PCEP protocol can be used to 1129 transfer the key parameters between PCE and the underlying network 1130 devices (PCC) using PCECC technique. 
The central control 1131 instructions from the PCECC identify which prefix should be advertised 1132 on which BGP session.
1134 3.11. Use Cases of PCECC for Local Protection (RSVP-TE)
1136 [I-D.cbrt-pce-stateful-local-protection] describes the need for the 1137 PCE to maintain and associate the local protection paths for the 1138 RSVP-TE LSP. Local protection requires the setup of a bypass at the 1139 PLR. This bypass can be PCC-initiated and delegated, or PCE- 1140 initiated. In either case, the PLR MUST maintain a PCEP session to 1141 the PCE. The bypass LSPs need to be mapped to the primary LSP. This 1142 could be done locally at the PLR based on a local policy, but there is 1143 a need for a PCE to do the mapping as well to exert greater control.
1145 This mapping can be done via PCECC procedures where the PCE could 1146 instruct the PLR about the mapping and identify the primary LSP for 1147 which the bypass should be used.
1149 3.12. Use Cases of PCECC for BIER
1151 Bit Index Explicit Replication (BIER) [RFC8279] defines an 1152 architecture where all intended multicast receivers are encoded as a 1153 bitmask in the multicast packet header within different 1154 encapsulations. A router that receives such a packet will forward 1155 the packet based on the bit position in the packet header towards the 1156 receiver(s) following a precomputed tree for each of the bits in the 1157 packet. Each receiver is represented by a unique bit in the bitmask.
1159 BIER-TE [I-D.ietf-bier-te-arch] shares architecture and packet 1160 formats with BIER. BIER-TE forwards and replicates packets based on 1161 a BitString in the packet header, but every BitPosition of the 1162 BitString of a BIER-TE packet indicates one or more adjacencies. 1163 A BIER-TE path can be derived from a PCE and used at the ingress as 1164 described in [I-D.chen-pce-bier].
1166 Further, PCECC mechanisms could be used for the allocation of bits to 1167 the BIER routers for BIER, as well as for the adjacencies for BIER-TE. 1168 A PCECC-based controller can use PCEP to instruct the BIER-capable 1169 routers about the meaning of the bits, as well as other fields needed for the 1170 BIER encapsulation.
1172 [Editor's Note - more details to be added]
1174 4. IANA Considerations
1176 This document does not require any action from IANA.
1178 5. Security Considerations
1180 TBD.
1182 6. Acknowledgments
1184 We would like to thank Adrian Farrel, Aijun Wang, Robert Tao, 1185 Changjiang Yan, Tieying Huang, Sergio Belotti, Dieter Beller, Andrey 1186 Elperin and Evgeniy Brodskiy for their useful comments and 1187 suggestions.
1189 7. References
1191 7.1. Normative References
1193 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1194 Requirement Levels", BCP 14, RFC 2119, 1195 DOI 10.17487/RFC2119, March 1997, 1196 . 1198 [RFC5440] Vasseur, JP., Ed. and JL. Le Roux, Ed., "Path Computation 1199 Element (PCE) Communication Protocol (PCEP)", RFC 5440, 1200 DOI 10.17487/RFC5440, March 2009, 1201 . 1203 [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 1204 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, 1205 May 2017, . 1207 [RFC8283] Farrel, A., Ed., Zhao, Q., Ed., Li, Z., and C. Zhou, "An 1208 Architecture for Use of PCE and the PCE Communication 1209 Protocol (PCEP) in a Network with Central Control", 1210 RFC 8283, DOI 10.17487/RFC8283, December 2017, 1211 .
1213 7.2. Informative References
1215 [RFC3985] Bryant, S., Ed. and P.
Pate, Ed., "Pseudo Wire Emulation 1216 Edge-to-Edge (PWE3) Architecture", RFC 3985, 1217 DOI 10.17487/RFC3985, March 2005, 1218 . 1220 [RFC4206] Kompella, K. and Y. Rekhter, "Label Switched Paths (LSP) 1221 Hierarchy with Generalized Multi-Protocol Label Switching 1222 (GMPLS) Traffic Engineering (TE)", RFC 4206, 1223 DOI 10.17487/RFC4206, October 2005, 1224 . 1226 [RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private 1227 Networks (VPNs)", RFC 4364, DOI 10.17487/RFC4364, February 1228 2006, . 1230 [RFC5150] Ayyangar, A., Kompella, K., Vasseur, JP., and A. Farrel, 1231 "Label Switched Path Stitching with Generalized 1232 Multiprotocol Label Switching Traffic Engineering (GMPLS 1233 TE)", RFC 5150, DOI 10.17487/RFC5150, February 2008, 1234 . 1236 [RFC5151] Farrel, A., Ed., Ayyangar, A., and JP. Vasseur, "Inter- 1237 Domain MPLS and GMPLS Traffic Engineering -- Resource 1238 Reservation Protocol-Traffic Engineering (RSVP-TE) 1239 Extensions", RFC 5151, DOI 10.17487/RFC5151, February 1240 2008, . 1242 [RFC5541] Le Roux, JL., Vasseur, JP., and Y. Lee, "Encoding of 1243 Objective Functions in the Path Computation Element 1244 Communication Protocol (PCEP)", RFC 5541, 1245 DOI 10.17487/RFC5541, June 2009, 1246 . 1248 [RFC5376] Bitar, N., Zhang, R., and K. Kumaki, "Inter-AS 1249 Requirements for the Path Computation Element 1250 Communication Protocol (PCECP)", RFC 5376, 1251 DOI 10.17487/RFC5376, November 2008, 1252 . 1254 [RFC7432] Sajassi, A., Ed., Aggarwal, R., Bitar, N., Isaac, A., 1255 Uttaro, J., Drake, J., and W. Henderickx, "BGP MPLS-Based 1256 Ethernet VPN", RFC 7432, DOI 10.17487/RFC7432, February 1257 2015, . 1259 [RFC7665] Halpern, J., Ed. and C. Pignataro, Ed., "Service Function 1260 Chaining (SFC) Architecture", RFC 7665, 1261 DOI 10.17487/RFC7665, October 2015, 1262 . 1264 [RFC8231] Crabbe, E., Minei, I., Medved, J., and R. Varga, "Path 1265 Computation Element Communication Protocol (PCEP) 1266 Extensions for Stateful PCE", RFC 8231, 1267 DOI 10.17487/RFC8231, September 2017, 1268 . 1270 [RFC8279] Wijnands, IJ., Ed., Rosen, E., Ed., Dolganow, A., 1271 Przygienda, T., and S. Aldrin, "Multicast Using Bit Index 1272 Explicit Replication (BIER)", RFC 8279, 1273 DOI 10.17487/RFC8279, November 2017, 1274 . 1276 [RFC8281] Crabbe, E., Minei, I., Sivabalan, S., and R. Varga, "Path 1277 Computation Element Communication Protocol (PCEP) 1278 Extensions for PCE-Initiated LSP Setup in a Stateful PCE 1279 Model", RFC 8281, DOI 10.17487/RFC8281, December 2017, 1280 . 1282 [RFC8355] Filsfils, C., Ed., Previdi, S., Ed., Decraene, B., and R. 1283 Shakir, "Resiliency Use Cases in Source Packet Routing in 1284 Networking (SPRING) Networks", RFC 8355, 1285 DOI 10.17487/RFC8355, March 2018, 1286 . 1288 [RFC8402] Filsfils, C., Ed., Previdi, S., Ed., Ginsberg, L., 1289 Decraene, B., Litkowski, S., and R. Shakir, "Segment 1290 Routing Architecture", RFC 8402, DOI 10.17487/RFC8402, 1291 July 2018, . 1293 [I-D.ietf-pce-segment-routing] 1294 Sivabalan, S., Filsfils, C., Tantsura, J., Henderickx, W., 1295 and J. Hardwick, "PCEP Extensions for Segment Routing", 1296 draft-ietf-pce-segment-routing-16 (work in progress), 1297 March 2019. 1299 [I-D.ietf-pce-stateful-hpce] 1300 Dhody, D., Lee, Y., Ceccarelli, D., Shin, J., and D. King, 1301 "Hierarchical Stateful Path Computation Element (PCE).", 1302 draft-ietf-pce-stateful-hpce-10 (work in progress), June 1303 2019. 1305 [I-D.ietf-pce-pcep-flowspec] 1306 Dhody, D., Farrel, A., and Z. 
4. IANA Considerations

This document does not require any action from IANA.

5. Security Considerations

TBD.

6. Acknowledgments

We would like to thank Adrian Farrel, Aijun Wang, Robert Tao,
Changjiang Yan, Tieying Huang, Sergio Belotti, Dieter Beller, Andrey
Elperin and Evgeniy Brodskiy for their useful comments and
suggestions.

7. References

7.1. Normative References

[RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
           Requirement Levels", BCP 14, RFC 2119,
           DOI 10.17487/RFC2119, March 1997,
           <https://www.rfc-editor.org/info/rfc2119>.

[RFC5440]  Vasseur, JP., Ed. and JL. Le Roux, Ed., "Path Computation
           Element (PCE) Communication Protocol (PCEP)", RFC 5440,
           DOI 10.17487/RFC5440, March 2009,
           <https://www.rfc-editor.org/info/rfc5440>.

[RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
           2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
           May 2017, <https://www.rfc-editor.org/info/rfc8174>.

[RFC8283]  Farrel, A., Ed., Zhao, Q., Ed., Li, Z., and C. Zhou, "An
           Architecture for Use of PCE and the PCE Communication
           Protocol (PCEP) in a Network with Central Control",
           RFC 8283, DOI 10.17487/RFC8283, December 2017,
           <https://www.rfc-editor.org/info/rfc8283>.

7.2. Informative References

[RFC3985]  Bryant, S., Ed. and P. Pate, Ed., "Pseudo Wire Emulation
           Edge-to-Edge (PWE3) Architecture", RFC 3985,
           DOI 10.17487/RFC3985, March 2005,
           <https://www.rfc-editor.org/info/rfc3985>.

[RFC4206]  Kompella, K. and Y. Rekhter, "Label Switched Paths (LSP)
           Hierarchy with Generalized Multi-Protocol Label Switching
           (GMPLS) Traffic Engineering (TE)", RFC 4206,
           DOI 10.17487/RFC4206, October 2005,
           <https://www.rfc-editor.org/info/rfc4206>.

[RFC4364]  Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
           Networks (VPNs)", RFC 4364, DOI 10.17487/RFC4364,
           February 2006, <https://www.rfc-editor.org/info/rfc4364>.

[RFC5150]  Ayyangar, A., Kompella, K., Vasseur, JP., and A. Farrel,
           "Label Switched Path Stitching with Generalized
           Multiprotocol Label Switching Traffic Engineering (GMPLS
           TE)", RFC 5150, DOI 10.17487/RFC5150, February 2008,
           <https://www.rfc-editor.org/info/rfc5150>.

[RFC5151]  Farrel, A., Ed., Ayyangar, A., and JP. Vasseur, "Inter-
           Domain MPLS and GMPLS Traffic Engineering -- Resource
           Reservation Protocol-Traffic Engineering (RSVP-TE)
           Extensions", RFC 5151, DOI 10.17487/RFC5151,
           February 2008, <https://www.rfc-editor.org/info/rfc5151>.

[RFC5541]  Le Roux, JL., Vasseur, JP., and Y. Lee, "Encoding of
           Objective Functions in the Path Computation Element
           Communication Protocol (PCEP)", RFC 5541,
           DOI 10.17487/RFC5541, June 2009,
           <https://www.rfc-editor.org/info/rfc5541>.

[RFC5376]  Bitar, N., Zhang, R., and K. Kumaki, "Inter-AS
           Requirements for the Path Computation Element
           Communication Protocol (PCECP)", RFC 5376,
           DOI 10.17487/RFC5376, November 2008,
           <https://www.rfc-editor.org/info/rfc5376>.

[RFC7432]  Sajassi, A., Ed., Aggarwal, R., Bitar, N., Isaac, A.,
           Uttaro, J., Drake, J., and W. Henderickx, "BGP MPLS-Based
           Ethernet VPN", RFC 7432, DOI 10.17487/RFC7432,
           February 2015, <https://www.rfc-editor.org/info/rfc7432>.

[RFC7665]  Halpern, J., Ed. and C. Pignataro, Ed., "Service Function
           Chaining (SFC) Architecture", RFC 7665,
           DOI 10.17487/RFC7665, October 2015,
           <https://www.rfc-editor.org/info/rfc7665>.

[RFC8231]  Crabbe, E., Minei, I., Medved, J., and R. Varga, "Path
           Computation Element Communication Protocol (PCEP)
           Extensions for Stateful PCE", RFC 8231,
           DOI 10.17487/RFC8231, September 2017,
           <https://www.rfc-editor.org/info/rfc8231>.

[RFC8279]  Wijnands, IJ., Ed., Rosen, E., Ed., Dolganow, A.,
           Przygienda, T., and S. Aldrin, "Multicast Using Bit Index
           Explicit Replication (BIER)", RFC 8279,
           DOI 10.17487/RFC8279, November 2017,
           <https://www.rfc-editor.org/info/rfc8279>.

[RFC8281]  Crabbe, E., Minei, I., Sivabalan, S., and R. Varga, "Path
           Computation Element Communication Protocol (PCEP)
           Extensions for PCE-Initiated LSP Setup in a Stateful PCE
           Model", RFC 8281, DOI 10.17487/RFC8281, December 2017,
           <https://www.rfc-editor.org/info/rfc8281>.

[RFC8355]  Filsfils, C., Ed., Previdi, S., Ed., Decraene, B., and R.
           Shakir, "Resiliency Use Cases in Source Packet Routing in
           Networking (SPRING) Networks", RFC 8355,
           DOI 10.17487/RFC8355, March 2018,
           <https://www.rfc-editor.org/info/rfc8355>.

[RFC8402]  Filsfils, C., Ed., Previdi, S., Ed., Ginsberg, L.,
           Decraene, B., Litkowski, S., and R. Shakir, "Segment
           Routing Architecture", RFC 8402, DOI 10.17487/RFC8402,
           July 2018, <https://www.rfc-editor.org/info/rfc8402>.

[I-D.ietf-pce-segment-routing]
           Sivabalan, S., Filsfils, C., Tantsura, J., Henderickx, W.,
           and J. Hardwick, "PCEP Extensions for Segment Routing",
           draft-ietf-pce-segment-routing-16 (work in progress),
           March 2019.

[I-D.ietf-pce-stateful-hpce]
           Dhody, D., Lee, Y., Ceccarelli, D., Shin, J., and D. King,
           "Hierarchical Stateful Path Computation Element (PCE)",
           draft-ietf-pce-stateful-hpce-10 (work in progress),
           June 2019.

[I-D.ietf-pce-pcep-flowspec]
           Dhody, D., Farrel, A., and Z. Li, "PCEP Extension for Flow
           Specification", draft-ietf-pce-pcep-flowspec-03 (work in
           progress), February 2019.

[I-D.ietf-pce-pcep-extension-for-pce-controller]
           Zhao, Q., Li, Z., Negi, M., and C. Zhou, "PCEP Procedures
           and Protocol Extensions for Using PCE as a Central
           Controller (PCECC) of LSPs",
           draft-ietf-pce-pcep-extension-for-pce-controller-01 (work
           in progress), February 2019.

[I-D.zhao-pce-pcep-extension-pce-controller-sr]
           Zhao, Q., Li, Z., Negi, M., and C. Zhou, "PCEP Procedures
           and Protocol Extensions for Using PCE as a Central
           Controller (PCECC) of SR-LSPs",
           draft-zhao-pce-pcep-extension-pce-controller-sr-04 (work
           in progress), February 2019.

[I-D.li-pce-controlled-id-space]
           Li, C., Chen, M., Dong, J., Li, Z., Wang, A., Cheng, W.,
           and C. Zhou, "PCE Controlled ID Space",
           draft-li-pce-controlled-id-space-03 (work in progress),
           June 2019.

[I-D.dugeon-pce-stateful-interdomain]
           Dugeon, O., Meuric, J., Lee, Y., and D. Ceccarelli, "PCEP
           Extension for Stateful Inter-Domain Tunnels",
           draft-dugeon-pce-stateful-interdomain-02 (work in
           progress), March 2019.

[I-D.cbrt-pce-stateful-local-protection]
           Barth, C. and R. Torvi, "PCEP Extensions for RSVP-TE
           Local-Protection with PCE-Stateful",
           draft-cbrt-pce-stateful-local-protection-01 (work in
           progress), June 2018.

[I-D.filsfils-spring-srv6-network-programming]
           Filsfils, C., Camarillo, P., Leddy, J.,
           daniel.voyer@bell.ca, d., Matsushima, S., and Z. Li, "SRv6
           Network Programming",
           draft-filsfils-spring-srv6-network-programming-07 (work in
           progress), February 2019.

[I-D.ietf-pce-segment-routing-ipv6]
           Negi, M., Li, C., Sivabalan, S., Kaladharan, P., and Y.
           Zhu, "PCEP Extensions for Segment Routing leveraging the
           IPv6 data plane", draft-ietf-pce-segment-routing-ipv6-02
           (work in progress), April 2019.

[I-D.ietf-6man-segment-routing-header]
           Filsfils, C., Dukes, D., Previdi, S., Leddy, J.,
           Matsushima, S., and d. daniel.voyer@bell.ca, "IPv6 Segment
           Routing Header (SRH)",
           draft-ietf-6man-segment-routing-header-21 (work in
           progress), June 2019.

[I-D.ietf-teas-pce-native-ip]
           Wang, A., Zhao, Q., Khasanov, B., Chen, H., and R. Mallya,
           "PCE in Native IP Network",
           draft-ietf-teas-pce-native-ip-03 (work in progress),
           April 2019.

[I-D.ietf-teas-native-ip-scenarios]
           Wang, A., Huang, X., Qou, C., Li, Z., and P. Mi,
           "Scenarios and Simulation Results of PCE in Native IP
           Network", draft-ietf-teas-native-ip-scenarios-06 (work in
           progress), June 2019.

[I-D.ietf-bier-te-arch]
           Eckert, T., Cauchie, G., Braun, W., and M. Menth, "Traffic
           Engineering for Bit Index Explicit Replication (BIER-TE)",
           draft-ietf-bier-te-arch-02 (work in progress), May 2019.

[I-D.chen-pce-bier]
           Chen, R. and Z. Zhang, "PCEP Extensions for BIER",
           draft-chen-pce-bier-05 (work in progress), March 2019.

[MAP-REDUCE]
           Lee, K., Choi, T., Ganguly, A., Wolinsky, D., Boykin, P.,
           and R. Figueiredo, "Parallel Processing Framework on a P2P
           System Using Map and Reduce Primitives", May 2011.

[MPLS-DC]  Afanasiev, D. and D. Ginsburg, "MPLS in DC and inter-DC
           networks: the unified forwarding mechanism for network
           programmability at scale", March 2014.

7.3. URIs

[1] https://hadoop.apache.org/
Appendix A.  Using reliable P2MP TE based multicast delivery for
             distributed computations (MapReduce-Hadoop)

The MapReduce model of distributed computation in computing clusters
is widely deployed.  In the Hadoop [1] 1.0 architecture, MapReduce
operations on big data are performed by means of a Master-Slave
architecture in the Hadoop Distributed File System (HDFS), where the
NameNode has knowledge about the resources of the cluster and about
where the actual data (chunks) for a particular task are located
(i.e., on which DataNode).  Each chunk of data (64 MB or more) should
have three saved copies on different DataNodes, chosen based on their
proximity.

The proximity level is currently allocated semi-manually and is based
on Rack IDs (the assumption being that closer data is better because
of higher access speed and smaller latency).

The JobTracker node is responsible for computation tasks and for
scheduling across DataNodes, and it is also rack-aware.  The
transport between the NameNode/JobTracker and the DataNodes is
currently based on IP unicast.  This has the advantage of simplicity,
but it has numerous drawbacks related to its flat approach.

It is clear that Hadoop clusters should go beyond a single data
center (DC) and move towards distributed clusters.  In that case,
performance and latency issues need to be handled.  Latency depends
on the speed of light in the fiber links and on the latency
introduced by the intermediate devices along the way; the latter is
closely correlated with the network device architecture and
performance.  The current performance of NPU-based routers should be
sufficient for creating distributed Hadoop clusters with predictable
latency.  The performance of software-based routers (mainly as VNFs),
together with additional hardware features such as DPDK, is promising
but requires additional research and testing.

The main question is: how can a simple but effective architecture for
a distributed Hadoop cluster be created?

Research such as [MAP-REDUCE] shows how the use of a multicast tree
can improve the speed of resource and cluster member discovery inside
the cluster, as well as increase the redundancy of communications
between cluster nodes.

Is traditional IP-based multicast sufficient for that?  We doubt it,
because it requires an additional control plane (IGMP, PIM) and a lot
of signaling, which is not suitable for high-performance computations
that are very sensitive to latency.

P2MP TE tunnels look much more suitable as a potential solution for
creating multicast-based communication between the Master and Slave
nodes inside the cluster.  Obviously, these P2MP tunnels should be
dynamically created and torn down, with no manual intervention.
Here, the PCECC comes into play, with the main objective of creating
the optimal topology for each particular MapReduce computation
request and of creating P2MP tunnels with the needed parameters, such
as bandwidth and delay.

This solution requires the use of MPLS label-based forwarding inside
the cluster.  The use of label-based forwarding inside the DC was
proposed by Yandex [MPLS-DC].  Technically, it is already possible:
MPLS on switches is already supported by some vendors, and MPLS also
exists on Linux and OVS.

The following framework can accomplish this task:

                              +--------+
                              |  APP   |
                              +--------+
                                   |
                                   | NBI (REST API,...)
                                   |
     +----------+    PCEP    +----------+    REST API    +----------+
     |  Client  |------------|  PCECC   |----------------| NameNode |
     +----------+            +----------+                +----------+
          |                     |    |
          |                PCEP |    | REST API
          |                     |    |
          |                     |    |           +-------------+
          |                     |    +-----------| JobTracker  |
          |                     |                +-------------+
          |                     |
          |      P2MP TE        |
          +---------------------+--------------------------+
          |                     |                          |
    +-----------+         +-----------+              +-----------+
    | DataNode1 |         | DataNode2 |              | DataNodeN |
    |TaskTracker|         |TaskTracker|    ....      |TaskTracker|
    +-----------+         +-----------+              +-----------+

Communication between the Master nodes (JobTracker and NameNode) and
the PCECC via the REST API MAY be done either directly or via a
cluster manager such as Mesos.

Phase 1: Distributed cluster resource discovery.  During this phase,
the Master nodes SHOULD identify and find the available Slave nodes
according to the computing request from the application (APP).  The
NameNode SHOULD query the PCECC about the available DataNodes; the
NameNode MAY provide additional constraints to the PCECC, such as
topological proximity and redundancy level.

The PCECC SHOULD analyze the topology of the distributed cluster and
perform constraint-based path calculation from the client towards the
most suitable DataNodes.  The PCECC SHOULD reply to the NameNode with
the list of the most suitable DataNodes and their resource
capabilities.  A topology discovery mechanism for the PCECC will be
added to this framework later.

Phase 2: The PCECC SHOULD create a P2MP LSP from the client towards
those DataNodes by means of PCEP messages, following the previously
calculated path.

Phase 3: The NameNode SHOULD send this information to the client, and
the PCECC informs the client about the optimal P2MP path towards the
DataNodes via a PCEP message.

Phase 4: The client sends the data blocks to those DataNodes for
writing via the created P2MP tunnel.

When the task is finished, the P2MP tunnel can be torn down.
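The following non-normative Python sketch summarizes the four phases
above as a simple orchestration flow.  The class and method names
(for example, query_available_datanodes and create_p2mp_lsp) are
illustrative assumptions standing in for the REST API and PCEP
exchanges; they are not defined interfaces.

# Non-normative sketch of the four-phase workflow described above.
# Class and method names stand in for the REST API and PCEP exchanges
# between NameNode, PCECC, client, and DataNodes; they are
# illustrative assumptions, not defined interfaces.

from typing import Dict, List


class PceccStub:
    """Placeholder for the PCECC's REST/PCEP-facing behaviour."""

    def __init__(self, topology: Dict[str, Dict[str, float]]) -> None:
        self.topology = topology  # node -> {attribute: value}

    # Phase 1: constrained selection of DataNodes for the request.
    def query_available_datanodes(self, constraints: Dict[str, float]) -> List[str]:
        max_delay = constraints.get("max_delay_ms", 10.0)
        return [node for node, attrs in self.topology.items()
                if attrs.get("delay_ms", 0.0) <= max_delay]

    # Phase 2: program a P2MP LSP from the client to the selected DataNodes.
    def create_p2mp_lsp(self, client: str, datanodes: List[str],
                        bandwidth_mbps: int) -> str:
        return f"p2mp-lsp({client}->{','.join(datanodes)},bw={bandwidth_mbps})"


def mapreduce_write(pcecc: PceccStub, client: str, blocks: List[bytes]) -> None:
    datanodes = pcecc.query_available_datanodes({"max_delay_ms": 5.0})   # Phase 1
    lsp = pcecc.create_p2mp_lsp(client, datanodes, bandwidth_mbps=1000)  # Phase 2
    print(f"client informed of {lsp}")                                   # Phase 3
    for block in blocks:                                                 # Phase 4
        pass  # placeholder: the client writes each block via the P2MP tunnel
    # When the task is finished, the P2MP tunnel could be torn down.


mapreduce_write(PceccStub({"DataNode1": {"delay_ms": 2.0},
                           "DataNode2": {"delay_ms": 3.5},
                           "DataNodeN": {"delay_ms": 9.0}}),
                client="Client", blocks=[b"chunk-0", b"chunk-1"])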
Authors' Addresses

Quintin Zhao
Huawei Technologies
125 Nagog Technology Park
Acton, MA  01719
US

Email: quintinzhao@gmail.com

Zhenbin (Robin) Li
Huawei Technologies
Huawei Bld., No.156 Beiqing Rd.
Beijing  100095
China

Email: lizhenbin@huawei.com

Boris Khasanov
Huawei Technologies
Moskovskiy Prospekt 97A
St.Petersburg  196084
Russia

Email: khasanov.boris@huawei.com

Dhruv Dhody
Huawei Technologies
Divyashree Techno Park, Whitefield
Bangalore, Karnataka  560066
India

Email: dhruv.ietf@gmail.com

King Ke
Tencent Holdings Ltd.
Shenzhen
China

Email: kinghe@tencent.com

Luyuan Fang
Expedia, Inc.
USA

Email: luyuanf@gmail.com

Chao Zhou
Cisco Systems

Email: chao.zhou@cisco.com

Boris Zhang
Telus Communications

Email: Boris.zhang@telus.com

Artem Rachitskiy
Mobile TeleSystems JLLC
Nezavisimosti ave., 95
Minsk  220043
Belarus

Email: arachitskiy@mts.by

Anton Gulida
LLC "Lifetech"
Krasnoarmeyskaya str., 24
Minsk  220030
Belarus

Email: anton.gulida@life.com.by