Internet Engineering Task Force                               Yimin Shen
Internet-Draft                                             Zhaohui Zhang
Intended status: Standards Track                        Juniper Networks
Expires: August 6, 2020                                 February 3, 2020


Point-to-Multipoint Transport Using Chain Replication in Segment Routing
               draft-shen-spring-p2mp-transport-chain-00

Abstract

   This document specifies a point-to-multipoint (P2MP) transport
   mechanism based on chain replication.
   It can be used in segment routing to achieve traffic optimization.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on August 6, 2020.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Specification of Requirements
   3.  Applicability
   4.  P2MP Transport Using Chain Replication
     4.1.  Bud Segment
     4.2.  P2MP Chain
     4.3.  Example
   5.  Path Computation for P2MP Chains
   6.  IGP and BGP-LS Extensions for Bud Segment
   7.  IANA Considerations
   8.  Security Considerations
   9.  Acknowledgements
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Authors' Addresses

1.  Introduction

   The Segment Routing Architecture [RFC8402] describes segment routing
   (SR) and its instantiation in two data planes, i.e. MPLS and IPv6.
   In SR, point-to-multipoint (P2MP) transport is currently achieved by
   using ingress replication, where a point-to-point (P2P) SR tunnel is
   constructed from a root node to each leaf node, and every ingress
   packet is replicated and sent via a bundle of such P2P SR tunnels to
   all the leaf nodes. Although this approach provides P2MP
   reachability, it does not consider traffic optimization across the
   tunnels, as the path of each tunnel is computed or decided
   independently.

   An alternative approach would be to use P2MP-tree based transport.
   Such an approach can achieve maximum traffic optimization, but it
   relies on a controller or path computation element (PCE) to
   dynamically provision and manage "replication segments" on branch
   nodes. The replication segments are essentially per-P2MP-tree (i.e.
   per-tunnel) state on transit routers. Therefore, this approach is
   not fully aligned with SR's principles of single-point (i.e. ingress
   router) provisioning and stateless core.

   This document introduces a new solution for P2MP transport in SR,
   based on "chain replication".
   In this solution, P2MP transport is achieved by constructing a set
   of "P2MP chain tunnels" (or simply "P2MP chains") from a root node
   to leaf nodes. Each P2MP chain is a tunnel with a leaf node at the
   tail end and some transit leaf nodes along the path, resembling a
   chain. A transit leaf node replicates a packet only once for local
   processing off the chain, and forwards the original packet down the
   chain. The root node replicates and sends packets via the set of
   P2MP chains to all the leaf nodes.

   As a P2MP chain can reach multiple leaf nodes, it is more efficient
   than the multiple P2P tunnels which would be needed in ingress
   replication to reach these leaf nodes. Compared with ingress
   replication and the P2MP-tree based approach, this solution provides
   a middle ground by achieving a certain level of traffic optimization,
   while aligning with the fundamental principles of SR, including
   single-point provisioning and stateless core. The solution can be
   used to improve P2MP transport efficiency in general, and to achieve
   maximum traffic optimization in certain types of topologies.

2.  Specification of Requirements

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119] and
   [RFC8174].

3.  Applicability

   The P2MP transport mechanism in this document is generally applicable
   to all networks. However, it benefits certain types of topologies
   more than others. These include ring topologies, linear topologies,
   and topologies with leaf nodes concentrated in geographical sites
   which can be modeled as leaf groups.

   The mechanism is transparent to all transit routers. Leaf nodes
   intended to take advantage of the mechanism will need to support the
   new forwarding behavior specified in this document.
   For other leaf nodes, the mechanism is backward compatible: they can
   still be reached by P2P tunnels using ingress replication. Path
   computation and P2MP chain construction will need to be supported by
   a controller or root nodes, depending on where they are performed.

   The mechanism is applicable to both SR-MPLS [RFC8660] and SRv6
   [SRv6-SRH], [SRv6-Programming].

4.  P2MP Transport Using Chain Replication

   In this document, a P2MP transport scheme associated with a root node
   and a set of leaf nodes is denoted as {root node, leaf nodes}. It is
   achieved by using a bundle of P2MP chains covering all the leaf
   nodes. Each P2MP chain is a tunnel starting from the root node and
   reaching one or multiple leaf nodes along the path. The tail-end
   node of the P2MP chain is a leaf node, called a "tail-end" leaf node.

   Every other leaf node traversed by the P2MP chain is called a
   "transit" leaf node. As a special case, a P2MP chain may have no
   transit leaf node, but only a tail-end leaf node, essentially
   becoming a P2P tunnel of ingress replication.

      R ------ R1 ------ R2 ------ L1 ------ R3 ------ L2 ------ L3

      R  : root node
      Li : leaf node
      Ri : transit router

                                 Figure 1

   A tail-end leaf node and a transit leaf node behave differently when
   processing a received packet. In particular, a tail-end leaf node
   processes the packet as a normal receiver. A transit leaf node not
   only processes the packet as a receiver, but also forwards it
   downstream along the P2MP chain, hence acting as a "bud node". To
   achieve this, the transit leaf node needs to replicate the packet,
   producing two packets, one for forwarding and the other for local
   processing. Such packet replication happens on every transit leaf
   node along a P2MP chain. Therefore, it is called "chain
   replication".
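
   The roles described above can be illustrated with a small sketch.
   It is not part of the mechanism specified in this document: the
   names Node and walk_chain are hypothetical, and the chain is assumed
   to be given as an ordered list of nodes ending at a leaf node. The
   sketch walks the chain of Figure 1, classifies each leaf node as
   transit or tail-end, and counts one replication per transit leaf
   node.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Node:
    """Hypothetical model of a node on a P2MP chain."""
    name: str
    is_leaf: bool  # True for Li; False for the root R and transit routers Ri


def walk_chain(chain: List[Node]) -> Tuple[List[Tuple[str, str]], int]:
    """Return ([(leaf name, role)], number of packet replications).

    A transit leaf node replicates the packet once (one copy for local
    processing, the original forwarded downstream). The tail-end leaf
    node only processes the packet locally.
    """
    if not chain or not chain[-1].is_leaf:
        raise ValueError("a P2MP chain must end at a tail-end leaf node")

    roles = []
    replications = 0
    last = len(chain) - 1
    for position, node in enumerate(chain):
        if not node.is_leaf:
            continue  # root or transit router: plain forwarding, no local copy
        if position == last:
            roles.append((node.name, "tail-end"))
        else:
            roles.append((node.name, "transit"))
            replications += 1
    return roles, replications


# The chain of Figure 1: R, R1, R2, L1, R3, L2, L3.
figure_1 = [
    Node("R", False), Node("R1", False), Node("R2", False),
    Node("L1", True), Node("R3", False), Node("L2", True),
    Node("L3", True),
]

roles, replications = walk_chain(figure_1)
print(roles)         # [('L1', 'transit'), ('L2', 'transit'), ('L3', 'tail-end')]
print(replications)  # 2
```

   A chain with a single leaf node yields no replication in this
   sketch, which corresponds to the special case of a P2P tunnel of
   ingress replication.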

   This document introduces a new type of segment, called a "bud
   segment", to facilitate the above packet processing on leaf nodes.
   The segment ID (SID) of a bud segment is a "bud-SID".

4.1.  Bud Segment

   On a leaf node, a bud segment represents the following instructions
   for forwarding hardware to execute on a received packet P. They
   apply when the active SID of the packet P is the bud-SID of this bud
   segment.

      [1] Detect whether this leaf node is a transit or tail-end leaf
      node, based on whether the bud-SID is the last SID of a P2MP
      chain.

      [2] If this is a transit leaf node, replicate the packet to
      produce a copy P1.

         [2.1] For P, perform a NEXT operation on the bud-SID, make the
         next SID active, and forward the packet based on that SID.

         [2.2] For P1, perform a sequence of NEXT operations on the
         bud-SID and all the subsequent SIDs of the P2MP chain, and
         process the packet locally.

      [3] If this is a tail-end leaf node, perform a NEXT operation on
      the bud-SID for P, and process the packet locally.

   In [2.2], when the transit leaf node processes P1 locally, none of
   the SIDs of the P2MP chain is useful. Hence, they are removed before
   the processing.

   Bud segments are global segments of leaf nodes. They are routable
   via topological shortest paths. Only one bud segment is needed per
   leaf node for each data plane (SR-MPLS or SRv6). Bud-SIDs are
   allocated from the SRGB (SR Global Block).

   In SR-MPLS, bud-SIDs are labels. In SRv6, bud-SIDs are IPv6
   addresses explicitly associated with bud segments. Therefore, the
   above instructions [1] to [3] are achieved in different ways in
   SR-MPLS and SRv6:

   (a) In SR-MPLS, there are two cases:

      (a.1) The packet has no service label, only P2MP chain labels in
      the MPLS header.
      In [1], the bud segment SHOULD detect whether the leaf node is a
      transit or tail-end leaf node based on the S-bit (bottom of
      stack) of the bud-SID label. If the S-bit is 0, the leaf node is
      a transit leaf node. If the S-bit is 1, it is a tail-end leaf
      node. In [2.2], the bud segment SHOULD simply pop the entire
      MPLS header.

      (a.2) The packet may have service label(s) after the P2MP chain
      labels in the MPLS header, e.g. a bridge domain label, a source
      Ethernet segment label, etc. In this case, the bud segment MUST
      have a way to identify the position of the last P2MP chain label.
      This document introduces an "end-of-chain" (EoC) label to
      facilitate the process. An EoC label is a label which is known
      to all root nodes and leaf nodes in a network. It MUST have a
      globally common value, set via configuration on these nodes.
      When a root node constructs an MPLS header for a packet, the EoC
      label MUST be pushed immediately before the P2MP chain labels are
      pushed, so that it is the next label after the last P2MP chain
      label. Thus, in [1], the bud segment SHOULD detect whether the
      leaf node is a transit or tail-end leaf node based on whether the
      next label in the current MPLS header is the EoC label. If so,
      the leaf node is a tail-end leaf node. Otherwise, it is a
      transit leaf node. In [2.2], the bud segment SHOULD pop labels
      until the EoC label is popped. In [3], the bud segment SHOULD
      pop the bud-SID label and the next label, which is the EoC label.

   (b) In SRv6, the packet is encapsulated with an outer IPv6 header
   corresponding to the P2MP chain, optionally followed by a segment
   routing header (SRH) containing the SIDs of the P2MP chain, and
   followed by an inner header (of IPv4, IPv6, MPLS, layer-2, etc.)
   associated with a service. In [1], the bud segment SHOULD detect
   whether it is the last P2MP chain SID based on the SRH.
   If the SRH does not exist or the Segments Left field in the SRH is
   0, the leaf node is a tail-end leaf node. Otherwise, it is a
   transit leaf node. In [2.2] and [3], the bud segment SHOULD simply
   remove the outer IPv6 header and the SRH (if any), and leave the
   packet with the inner header to local processing.

   Bud segments are shared by all P2MP transport schemes, i.e. all
   combinations of {root node, leaf nodes}. A leaf node SHOULD advertise
   a bud segment for SR-MPLS, if its forwarding hardware supports the
   above SR-MPLS processing. Likewise, it SHOULD advertise a bud
   segment for SRv6, if its forwarding hardware supports the above SRv6
   processing. The advertisement may be via IGP (ISIS, OSPF) or BGP-LS.
   The advertisement allows the leaf node to be considered on a P2MP
   chain. If a leaf node does not advertise a bud segment, it MUST be
   reached via a P2P tunnel using ingress replication.

   Bud segments are general-purpose segments. They may also be used in
   cases other than P2MP transport, such as traffic monitoring. These
   use cases are out of the scope of this document.

4.2.  P2MP Chain

   Construction of P2MP chains for a P2MP transport scheme is performed
   by a controller or a root node based on path computation (Section 5).
   The path of a P2MP chain is a single path traversing one or multiple
   transit leaf nodes and terminating at a tail-end leaf node. Between
   the root node and the first transit leaf node, and between two
   consecutive leaf nodes, there may be zero, one, or multiple transit
   routers.

   The path is then translated to a SID list to be programmed on the
   root node. In the SID list, each transit leaf node has its bud-SID
   in a corresponding position.
   Given a P2MP chain to a set of leaf nodes in the order of L1, L2,
   ..., Ln, the SID list may be represented as:

      <sub-path to L1>, bud-SID of L1, ..., <sub-path to Li>,
      bud-SID of Li, ..., <sub-path to Ln>, <bud-SID of Ln>

   Where:

   o  <sub-path to L1> is the sub-path from the root node to L1.

   o  <sub-path to Li> is the sub-path from Li-1 to Li.

   o  Ln's bud-SID is the last SID of the list, if the sub-path from
      Ln-1 to Ln is partial or empty, or if an EoC label is needed in
      SR-MPLS. It is optional in other cases.

   The above sub-paths are regular point-to-point paths. The SIDs in
   the sub-paths are regular SIDs, such as adjacency-SIDs, node-SIDs,
   binding-SIDs, etc. There is no SID specific to the given P2MP chain.
   A sub-path from Li-1 to Li may have an empty SID list, if the
   sub-path takes the shortest path indicated by the bud-SID of Li.

   The root node then uses the SID list in packet encapsulation. Note
   that in the SR-MPLS case where an EoC label is needed, the EoC label
   SHOULD be pushed onto the MPLS header before the SID list is pushed.

4.3.  Example

   In the following example, P2MP transport is needed from the root node
   R to leaf nodes L1, L2, L3 and L4.

      R ------ R1 -------------------- R2 ------- L1
               |                       |        /
               |                       |      /
               |                       |    /
               R3 -------------------- R4 ------- L2
               |                       |
               |                       |
               |                       |
               R5 -------------------- R6 ------- L3
               |                       |        /
               |                       |      /
               |                       |    /
               R7 -------------------- R8 ------- L4

                                 Figure 2

   Path computation results in two P2MP chains:

   P2MP chain 1:

      Path: R -> R1 -> R2 -> L1 -> R4 -> L2, where L1 is a transit
      leaf node, and L2 is the tail-end leaf node.

      Assuming that the sub-path L1 -> R4 -> L2 matches the shortest
      path from L1 to L2, the bud-SID of L2 is used to represent this
      sub-path.
      The segment list applied to packets on R is:

         adj-SID 100 - link from R to R1

         adj-SID 200 - link from R1 to R2

         adj-SID 300 - link from R2 to L1

         bud-SID 1000 - L1

         bud-SID 2000 - L2

   P2MP chain 2:

      Path: R -> R1 -> R3 -> R5 -> R6 -> L3 -> R8 -> L4, where L3 is
      a transit leaf node, and L4 is the tail-end leaf node.

      Assuming that the sub-path R -> R1 -> R3 -> R5 -> R6 -> L3
      matches the shortest path from R to L3, the bud-SID of L3 is
      used to represent this sub-path. The segment list applied to
      packets on R is:

         bud-SID 3000 - L3

         adj-SID 600 - link from L3 to R8

         adj-SID 700 - link from R8 to L4

         bud-SID 4000 - L4

5.  Path Computation for P2MP Chains

   Path computation for the P2MP chains of a P2MP transport scheme {root
   node, leaf nodes} is the responsibility of a controller or the root
   node. This document does not mandate a particular computation
   algorithm. In fact, any P2P path computation algorithm may be
   extended to serve the purpose.

   The path computation may consider general metrics for shortest paths,
   or traffic engineering (TE) constraints for TE paths. This document
   recommends that the following constraints be considered as well:

   -  The maximum hop count of a path. This SHOULD be based on the
      maximum delay that a packet is allowed to accumulate before
      reaching a tail-end leaf node.

   -  The maximum length of a SID list. This SHOULD be based on the
      maximum header size which a root node may apply to a packet. This
      is typically a limit of forwarding hardware.

   Note that a SID list is translated from a computed path. Hence, the
   length of the SID list and the hop count of the path are typically
   not the same.

   The path computation may achieve more predictable results by dividing
   leaf nodes into groups based on their geographical or administrative
   location.
   Thus, paths MAY be computed such that each P2MP chain is used to
   reach only a given group, while the number of P2MP chains needed to
   reach all the leaf nodes of the group is minimized.

6.  IGP and BGP-LS Extensions for Bud Segment

   The protocol extensions of IGP (ISIS and OSPF) and BGP-LS for bud
   segment advertisement will be specified in the next version of this
   document.

7.  IANA Considerations

   This document requires IANA registration and allocation for the ISIS,
   OSPF and BGP-LS extensions for bud segment advertisement. The
   details will be provided in the next version of this document.

8.  Security Considerations

   This document introduces bud segments for leaf nodes to act as both
   packet receivers and transit routers. An attacker may target a leaf
   node by constructing malicious packets with the node's bud-SID. Such
   attacks can be defeated by restricting bud segment distribution and
   P2MP chain construction to the scope of a controller and a given
   network.

9.  Acknowledgements

   This document leverages work done by Alexander Arseniev and Ron
   Bonica.

10.  References

10.1.  Normative References

   [RFC8402]  Filsfils, C., Ed., Previdi, S., Ed., Ginsberg, L.,
              Decraene, B., Litkowski, S., and R. Shakir, "Segment
              Routing Architecture", RFC 8402, DOI 10.17487/RFC8402,
              July 2018, <https://www.rfc-editor.org/info/rfc8402>.

   [RFC8660]  Bashandy, A., Ed., Filsfils, C., Ed., Previdi, S.,
              Decraene, B., Litkowski, S., and R. Shakir, "Segment
              Routing with the MPLS Data Plane", RFC 8660,
              DOI 10.17487/RFC8660, December 2019,
              <https://www.rfc-editor.org/info/rfc8660>.

   [SRv6-SRH]
              Filsfils, C., Dukes, D., Previdi, S., Leddy, J.,
              Matsushima, S., and D. Voyer, "IPv6 Segment Routing
              Header", draft-ietf-6man-segment-routing-header (work in
              progress), 2019.

   [SRv6-Programming]
              Filsfils, C., Garvia, P., Leddy, J., Voyer, D.,
              Matsushima, S., and Z.
              Li, "SRv6 Network Programming",
              draft-ietf-spring-srv6-network-programming (work in
              progress), 2019.

10.2.  Informative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017, <https://www.rfc-editor.org/info/rfc8174>.

Authors' Addresses

   Yimin Shen
   Juniper Networks
   10 Technology Park Drive
   Westford, MA 01886
   USA

   Email: yshen@juniper.net


   Zhaohui Zhang
   Juniper Networks
   10 Technology Park Drive
   Westford, MA 01886
   USA

   Email: zzhang@juniper.net