idnits 2.17.1 draft-allan-spring-mpls-multicast-framework-03.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document date (April 2017) is 2565 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'MCAST-OPSF' is mentioned on line 538, but not defined == Unused Reference: 'MCAST-OSPF' is defined on line 591, but no explicit reference was found in the text == Unused Reference: 'RFC6514' is defined on line 595, but no explicit reference was found in the text == Unused Reference: 'RFC7385' is defined on line 598, but no explicit reference was found in the text Summary: 0 errors (**), 0 flaws (~~), 5 warnings (==), 1 comment (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 1 SPRING Working Group Dave Allan 2 Internet Draft Ericsson 3 Intended status: Standards Track Jeff Tantsura 4 Expires: October 2017 5 April 2017 7 A Framework for Computed Multicast applied to MPLS based Segment 8 Routing 9 draft-allan-spring-mpls-multicast-framework-03 11 Abstract 13 This document describes a multicast solution for Segment Routing with 14 MPLS data plane. It is consistent with the Segment Routing 15 architecture in that an IGP is augmented to distribute information in 16 addition to the link state. In this solution it is multicast group 17 membership information sufficient to synchronize state in a given 18 network domain. Computation is employed to determine the topology of 19 any loosely specified multicast distribution tree. 21 Status of this Memo 23 This Internet-Draft is submitted to IETF in full conformance 24 with the provisions of BCP 78 and BCP 79. 26 Internet-Drafts are working documents of the Internet 27 Engineering Task Force (IETF), its areas, and its working 28 groups. Note that other groups may also distribute working 29 documents as Internet-Drafts. 31 Internet-Drafts are draft documents valid for a maximum of six 32 months and may be updated, replaced, or obsoleted by other 33 documents at any time. It is inappropriate to use Internet- 34 Drafts as reference material or to cite them other than as "work 35 in progress". 37 The list of current Internet-Drafts can be accessed at 38 http://www.ietf.org/ietf/1id-abstracts.txt. 40 The list of Internet-Draft Shadow Directories can be accessed at 41 http://www.ietf.org/shadow.html. 43 This Internet-Draft will expire in October 2017. 45 Copyright and License Notice 46 Copyright (c) 2017 IETF Trust and the persons identified as the 47 document authors. All rights reserved. 49 This document is subject to BCP 78 and the IETF Trust's Legal 50 Provisions Relating to IETF Documents 51 (http://trustee.ietf.org/license-info) in effect on the date of 52 publication of this document. Please review these documents 53 carefully, as they describe your rights and restrictions with 54 respect to this document. Code Components extracted from this 55 document must include Simplified BSD License text as described 56 in Section 4.e of the Trust Legal Provisions and are provided 57 without warranty as described in the Simplified BSD License. 59 Table of Contents 61 1. Introduction...................................................3 62 1.1. Authors......................................................3 63 1.2. Requirements Language........................................3 64 2. Conventions used in this document..............................3 65 2.1. Terminology..................................................3 66 3. Solution Overview..............................................4 67 3.1. Mapping source specific trees onto the segment routing 68 architecture......................................................5 69 3.2. Role of the Routing System...................................6 70 3.3. MDT Construction Requirements................................6 71 3.4. Pruning - theory of operation................................6 72 4. Elements of Procedure..........................................7 73 4.1. Triggers for Computation.....................................7 74 4.2. FIB Determination............................................7 75 4.2.1. Information in the IGP.....................................7 76 4.2.2. Computation of individual segments.........................8 77 4.3. FIB Generation..............................................11 78 4.4. FIB installation............................................11 79 5. Related work..................................................12 80 5.1. IGP Extensions..............................................12 81 5.2. BGP Extensions..............................................12 82 6. Observations..................................................12 83 7. Acknowledgements..............................................13 84 8. Security Considerations.......................................13 85 9. IANA Considerations...........................................13 86 10. References...................................................13 87 10.1. Normative References.......................................13 88 10.2. Informative References.....................................13 89 11. Authors' Addresses...........................................14 91 1. Introduction 93 This memo describes a solution for multicast for Segment Routing with 94 MPLS data plane in which source specific multicast distribution trees 95 (MDTs) are computed from information distributed via an IGP. 96 Computation can use information in the IGP to determine if a given 97 node in the network has a role as a root, leaf or replication point 98 in a given MDT. Unicast tunnels are employed to interconnect the 99 nodes determined to have a role. Therefore state only need be 100 installed in nodes that have one of these three roles to fully 101 instantiate an MDT. 102 Although this approach is computationally intensive, a significant 103 amount of computation can be avoided when the computing agent 104 determines that the node it is computing for has no role in a given 105 MDT. This permits a computed approach to multicast convergence to be 106 computationally tractable. 107 1.1. Authors 109 David Allan, Jeff Tantsura 111 1.2. Requirements Language 113 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 114 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 115 document are to be interpreted as described in RFC2119 [RFC2119]. 117 2. Changes from the last version 119 Clarifications in the pruning and simplification algorithm 121 3. Conventions used in this document 123 3.1. Terminology 125 Candidate replication point (CRP) - is a node that potentially needs 126 to install state to replicate multicast traffic as determined at an 127 intermediate step in multicast segment computation. It will either 128 resolve to having no role or a role as a replication point once 129 multicast has converged. 131 Candidate role - refers to any potential combination of roles on a 132 given multicast segment as determined at some intermediate step in 133 MDT computation. For example, a node with a candidate role may be a 134 leaf and may be a candidate replication point. 136 Downstream - refers to the direction along the shortest path to one 137 or more leaves for a given multicast distribution tree 139 Multicast convergence - is when all computation and state 140 installation to ensure the FIB reflects the multicast information in 141 the IGP is complete. 143 MDT - multicast distribution tree. Is a tree composed of one or more 144 multicast segments. 146 Multicast segment - is a portion of the multicast tree where only the 147 root and the leaves have been specified, and computation based upon 148 the current state of the IGP database is employed to determine and 149 install the required state to implement the segment. For MPLS a 150 multicast segment is implemented as a p2mp LSP. A multicast segment 151 is identified by a multicast SID. 153 Multicast SID - Is the data plane identifier that is used to 154 implement a multicast segment. As per a unicast MPLS segment, the 155 rightmost 20 bits of a multicast SID is encoded as a label. It is 156 drawn from an SRGB that is global to the SR domain. 158 Pinned path - Is a unique shortest path extending from a leaf 159 upstream towards the root for a given multicast segment. Therefore is 160 a component of the multicast segment that it has been determined must 161 be there. It will not necessarily extend from the leaf all the way to 162 the root during intermediate computation steps. A pinned path can 163 result from pruning operations. 165 Role - refers specifically to a node that is either a root, a leaf, a 166 replication node, or a pinned waypoint for a given MDT. 168 Unicast convergence - is when all computation and state installation 169 to ensure the FIB reflects the unicast information in the IGP is 170 complete. 172 Upstream - refers to the direction along the shortest path to the 173 root of a given MDT. 175 4. Solution Overview 177 This memo describes a multicast architecture in which multicast state 178 is only installed in those nodes that have roles as a root, leaves, 179 and replication points for a given multicast segment. The a-priori 180 established segment routing unicast tunnels are used as interconnect 181 between the nodes that have a role in a given multicast SID. 183 A loosely specified MDT is composed of a single multicast segment and 184 the routing of the MDT is delegated entirely to computation driven by 185 information in the IGP database. 187 Explicitly routed MDTs are expressed as a tree of concatenated 188 multicast segments where both the leaves of each segment and the 189 waypoints coupling a given segment to the upstream and/or downstream 190 segment(s) is specified in information flooded in the IGP by the 191 overall root of the MDT. The segments themselves will be computed as 192 per a loosely specified MDT. 194 A PE acting as an overall root for a given tree is expected to be 195 configured by the operator as to where to source multicast traffic 196 from, be it an attachment circuit, interworking function for client 197 technology or other. Similarly a leaf for a given tree is expected to 198 be configured by the operator as to the disposition of received 199 multicast traffic. 201 A computed segment is guaranteed to be loop free in a stable system. 202 A concatenation of segments to construct an MDT will similarly be 203 loop free as any collision of segments can be disambiguated in the 204 data plane via the SIDs. 206 This architecture significantly reduces the amount of state that 207 needs to be installed in the data plane to support multicast. This 208 also means that the impact of many failures in the network on 209 multicast traffic distribution will be recovered by unicast local 210 repair or unicast convergence with subsequent multicast convergence 211 acting in the role of network re-optimization (as opposed to 212 restoration). 214 4.1. Mapping source specific trees onto the segment routing architecture 216 A computed source specific tree for a given multicast group 217 corresponds to one or more multicast segments in the SR architecture. 218 Each multicast segment is assigned a SID, typically by management 219 configuration of the node that will be the overall root for the 220 source specific tree. The root node then uses the IGP to advertise 221 this information to all nodes in the IGP area/domain. 223 A multicast group is implemented as the set of source specific trees 224 from all nodes that have registered transmit interest to all nodes 225 that have registered receive interest in a multicast group. 227 4.2. Role of the Routing System 229 The role of the IGP is to communicate topology information, multicast 230 capability and associated algorithm, multicast registrations, unicast 231 to SID bindings, multicast to SID bindings and waypoints in multi- 232 segment MDTs. No changes to topology or unicast to SID binding 233 advertisements are proposed by this memo. 235 The multicast registrations/bindings will be in the form of source, 236 group, transmit/receive interest and the SID to use for the source 237 specific multicast tree. Registrations are originated by any node 238 that has send or receive interest in a given multicast group. Nodes 239 will use the combination of topology and multicast registrations to 240 determine the nodes that have a role in each source specific tree and 241 the SID information to then derive the required FIB state. 243 4.3. MDT Construction Requirements 245 A multicast segment in an MDT is constructed such that between any 246 pair of nodes that have a role in the segment and are connected by a 247 unicast tunnel, there is not another node on the shortest path 248 between the two with a role in that segment. This ensures that copies 249 of a packet forwarded by an multicast segment will traverse a link 250 only once in a stable system. 252 Note that this can be satisfied by a minimum cost shortest path tree, 253 but is not an absolute requirement. The pruning rules specified in 254 this memo will meet this requirement without necessarily producing 255 absolutely minimum cost multicast segment (or incurring the 256 associated computational cost). 258 4.4. Pruning - theory of operation 260 The role of nodes in a given multicast segment is determined by first 261 producing an inclusive shortest path tree with all possible paths 262 between the root and leaves, and then applying a set of pruning rules 263 repeatedly until an acyclic tree is produced or no further prunes are 264 possible. 266 For the majority of multicast segments these rules will 267 authoritatively produce a minimum cost tree. For those segments that 268 have not yet been authoritatively resolved, there is a set of pruning 269 operations applied that are not guaranteed to produce a tree that 270 meets the requirements of 3.3, therefore these trees require auditing 271 and potential correction according to a further set of agreed rules. 272 This avoids the necessity of an exhaustive search of the solution 273 space. 275 A node during computation of a segment may conclude that it will 276 absolutely not have a role at any of numerous points in the 277 computation process and abandon computation of that segment. 279 5. Elements of Procedure 281 5.1. Triggers for Computation 283 MDT computation is triggered by changes to the IGP database. These 284 are in the form of either changes in registered multicast group 285 interest, addition or removal of a multi-segment MDT descriptor, or 286 topology changes. 288 A change in registered interest for a group will require re- 289 computation of all MDTs that implement the multicast group. 291 A topology change will require the computation of some number of 292 multicast segments, the actual number will depend on the 293 implementation of tree computation but at a minimum will be all trees 294 for which there is not an optimal shortest path solution as a result 295 of the topology change. 297 5.2. FIB Determination 299 5.2.1. Information in the IGP 301 Group membership information for a multicast segment is obtained from 302 the IGP. This is true for single segment MDTs as well as multi- 303 segment MDTs. Included in the multi-segment MDT specification is the 304 waypoint nodes in MDT and the upstream and downstream SIDs. The 305 specified node is expected to cross connect the SIDs to join the 306 segments together acting in the role of leaf for the upstream segment 307 and root for the downstream segment. 309 When a waypoint in an MDT descriptor does not exist in the IGP, the 310 assumption is that the node identified by the waypoint SID has 311 failed. The response of the other nodes in the system in FIB 312 determination is to add the leaves of the downstream segment to the 313 upstream segment. 315 An example of this would be consider a node "x", and another node 316 "y". At some point in time, "x" advertises a tree that identifies "y" 317 as a waypoint that cross connects upstream SID "a" to downstream SID 318 "b". At some later point node "y" fails. The other nodes in the 319 network will compute segment "a" as if it included all leaves and 320 waypoints in segment "b". All apriori state installed for segment "b" 321 would be removed as the failure of "y" has required "b" to be 322 subsumed by "a". 324 5.2.2. Computation of individual segments 326 FIB generation for a multicast segment is the result of computation, 327 ultimately as applied to all source specific trees in the network. 328 All computing nodes implement a common algorithm for tree generation, 329 as all MUST agree on the solution. 331 One algorithm is as follows: 333 All possible shortest paths to the set of leaves for the MDT is 334 determined. Then pruning rules are repeatedly applied until no 335 further prunes are possible. 337 The philosophy of the application of these rules could be expressed 338 as "simplify as much as possible, and prune that which cannot be". 339 The rules are: 341 1) Eliminate any links and nodes not on a potential shortest path 342 from the root to the leaves for the MDT under consideration. 344 2) Simplify via the replacement of any nodes that do not have a 345 potential role in the MDT with links. 347 This will be nodes that are not a leaf, a root or a candidate 348 replication point. For example: 350 Root---------A----------B 352 B is a leaf. A is not but is in a potential shortest path from root 353 to B. However A will have no role in the MDT that serves B as it 354 provides simple transit therefore is replaced with a direct 355 connection between the root and B. 357 Root--------------------B 359 Note that such pruning also needs to avoid the creation of 360 duplicate parallel links. For example: 362 /----------A----------\ 364 Root B 366 \----------C----------/ 368 Where A and C have no role and the cost root-A-B = cost root-C-B, 369 they can be replaced with a single link from Root to B. 371 3) Simplify via the elimination of fewer hop paths 373 When for a given set of leaves, a node has multiple downstream 374 links that converge on a common downstream point, and that set of 375 leaves is only a subset of the leaves reachable on one or more of 376 the links, any link that only serves that subset of leaves can be 377 pruned. 379 For example: 381 --A---------------------------B 383 \ / 385 -----------C----------- 387 \ 389 ----D 391 Link AB is cost 2, link AC and CB are cost 1 (cost of link CD does 392 not affect the example). 394 B and D are leaves of a root upstream of A. From A, link AB can 395 reach leaf B. Path AC can reach leaf B and D. In this case path A-B 396 can be pruned from consideration. The set of leaves reachable via 397 link A-B is a subset of that reachable by A-C, and the paths from A 398 that serves that subset converges at B. 400 4) Prune upstream links. 402 The normal procedure is to determine the closest upstream leaf or 403 pinned path and then compare all upstream adjacencies with that 404 metric 406 a. If the upstream adjacency extends closer to the root than 407 the closest leaf or pinned path, then that adjacency can be 408 pruned. 410 b. If the upstream adjacency extends the same distance towards 411 the root then 413 i. If it is to a non-leaf or pinned path candidate 414 replication point, it can be pruned 416 ii. If it is to a pinned path, where there are equal upstream 417 adjacencies that terminate on leaves, it can be pruned 418 (considered inferior). 420 iii. If there is more than one "equal" upstream adjacency, 421 that is all terminate on nodes that are on pinned paths, 422 or all terminate on nodes that are leaves, than one is 423 selected. This is via the lowest node ID. 425 c. If the upstream adjacency is a candidate replication point 426 closer than the closest leaf, and upstream from it is a node 427 that is a leaf or pinned path equidistant with the closest 428 leaf, then all adjacencies that extend to leaves ranked lower 429 than the leaf or pinned path behind the CRP may be pruned. 430 Note that an upstream adjacency that has a CRP closer than 431 the closest leaf or pinned path cannot be pruned. 433 d. When for a given node all possible upstream adjacencies that 434 can be pruned have been identified, each is removed, and any 435 simplifications that can be performed as a result of the 436 prune are performed. This is the equivalent of a localized 437 check for 2 and 3 above and is then performed iteratively in 438 response to changes to the graph as a result of pruning. 440 The procedure is to implement 1, 2 and 3 above, then loop on 4 until 441 such time as the MDT is fully resolved, or no further prunes are 442 possible. Step 4 is performed in a specific order. The nodes are 443 processed according to a ranking from closest to the root to the 444 farthest, and from lowest node ID to the highest within a given 445 distance from the root. 447 If, at the end of pruning and simplification, all leaves in a 448 multicast segment have a unique shortest path to the root, the tree 449 is considered resolved, and the computation can progress directly to 450 the FIB generation step. 452 If not all leaves have a unique shortest path, additional pruning 453 steps are applied. These steps are NOT guaranteed to produce a lowest 454 cost tree, and therefore require an additional audit and possible 455 modification to ensure when forwarding a maximum of one copy of a 456 packet will traverse an interface. 458 For segments not authoritatively resolved by the above rules, a prune 459 that will not authoritatively result in a minimum cost tree is 460 applied. For the purpose of interoperability, the following rule is 461 proposed: A computing node will select the closest node to the root 462 with a candidate role that does not have a unique shortest path to 463 the root. Where more than one such node exists, the one with the 464 lowest unicast SID is selected. For that node, the best upstream link 465 is selected and all other upstream links pruned. The best upstream 466 link is defined as the link with the closest node with a candidate 467 role that potentially serves the highest number of leaves. Where 468 there is a tie, once again the node with the lowest SID is selected. 470 Once the links have been pruned, rules 2 through 4 are repeatedly 471 applied until either the tree is fully resolved, or again no further 472 prunes are possible, in which case the next closest remaining 473 unresolved node has the same prune applied. 475 For all segments not resolved by the initial prune rules, they are 476 audited to ensure all nodes that have a role in the tree do not have 477 a node with a role between them and their upstream node on the tree. 478 If they do, the old upstream adjacency is removed, and the superior 479 one added. 481 5.3. FIB Generation 483 The topology components that remain at the end of the pruning 484 operation will reflect all nodes that have a role in a given 485 multicast segment plus the necessary tunnels (as all intervening 486 multi-path scenarios will have been simplified away). From this the 487 FIB can be generated: 489 All nodes that have a role in a given multicast segment and have 490 nodes upstream in the segment will need to accept the SID for the MDT 491 from at minimum, all upstream interfaces. 493 All nodes that have a role in a given segment and have nodes 494 immediately downstream in the segment will need to replicate packets 495 simply labelled with the multicast SID onto those interfaces. 497 All nodes that have a role in a given segment and have nodes 498 reachable via a tunnel downstream set the FIB to push the tunnel 499 unicast SID for the downstream node onto any replicated copies of a 500 received packet, and identify the set of interfaces on the shortest 501 path for the tunnel SID. 503 5.4. FIB installation 505 FIB installation needs to acknowledge two aspects of the hybrid 506 tunnel and role model of multicast tree construction. The first is 507 that because of the sparse state model simple tree adds, moves, and 508 changes may require the installation of state where it did not 509 previously exist, and such changes may impact existing services. The 510 second is that it is possible to retain the knowledge to prioritize 511 computation of those trees impacted the failure of a node with a 512 role. 514 To address this, there are three stages of state installation for 515 multicast convergence: 517 1) Immediate: 519 a. Installation of state for multicast segments impacted by the 520 failure of a node in the network, and installation of state 521 for segments in nodes that have not previously had a role in 522 the given segment. 524 b. Installation of state for waypoints in multi-segment MDTs. 526 2) After T1: Update state for nodes that both had and have a role in 527 a given multicast segment. 529 3) After T2: Removal of state for nodes that transition from having a 530 role to not having a role for a given multicast segment. 532 T1 and T2 are network wide configurable values. 534 6. Related work 536 6.1. IGP Extensions 538 The required IGP changes are documented in [MCAST-ISIS] and [MCAST- 539 OPSF]. 541 6.2. BGP Extensions 543 This memo will require the specification of a new PMSI Tunnel 544 Attribute (SPRING P2MP tunnel, tentatively 0x09) to order to 545 integrate into the multicast framework documented in RFC 6514 547 7. Observations 549 This technique is not confined to segment routing, and with the 550 provision of a global label space (to be employed as per a multicast 551 SID), an MPLS-LDP network would also provide the requisite mesh of 552 unicast tunnels and be capable of implementing this approach to 553 multicast. 555 This memo focuses on an implementation based upon nodes that are IGP 556 speakers and converge independently so is written in a form that 557 assumes a node, computing node and IGP speaker are one in the same. 558 It should be observed that the relative frugality of data plane state 559 would suggest that separation of computation from nodes in the data 560 plane combined with management or "software defined networking" based 561 population of the multicast FIB entries may also be useful modes of 562 network operation. 564 8. Acknowledgements 566 Thanks to Uma Chunduri for his detailed review and suggestions. 568 9. Security Considerations 570 For a future version of this document. 572 10. IANA Considerations 574 This document requires the allocation of a PMSI tunnel type to 575 identify a SPRING P2MP tunnel type from the P-Multicast Service 576 Interface Tunnel (PMSI Tunnel) Tunnel Types registry. 578 11. References 580 11.1. Normative References 582 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 583 Requirement Levels", BCP 14, RFC 2119, March 1997. 585 11.2. Informative References 587 [MCAST-ISIS] Allan et.al., "IS-IS extensions for Computed Multicast 588 applied to MPLS based Segment Routing", IETF work in progress, 589 draft-allan-isis-spring-multicast-00, July 2016 591 [MCAST-OSPF] Allan et.al., "OSPF extensions for Computed Multicast 592 applied to MPLS based Segment Routing", IETF work in progress, 593 draft-allan-ospf-spring-multicast-00, July 2016 595 [RFC6514] Aggarwal et.al., "BGP Encodings and Procedures for Multicast 596 in MPLS/BGP IP VPNs", IETF RFC 6514, February 2012 598 [RFC7385] Andersson & Swallow "IANA Registry for P-Multicast Service 599 Interface (PMSI) Tunnel Type Code Points", IETF RFC 7385, 600 October 2014 602 12. Authors' Addresses 604 Dave Allan (editor) 605 Ericsson 606 300 Holger Way 607 San Jose, CA 95134 608 USA 609 Email: david.i.allan@ericsson.com 611 Jeff Tantsura 612 Email: jefftant.ietf@gmail.com