idnits 2.17.1 draft-allan-spring-mpls-multicast-framework-02.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document date (September 2016) is 2751 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'MCAST-OPSF' is mentioned on line 539, but not defined == Unused Reference: 'MCAST-OSPF' is defined on line 592, but no explicit reference was found in the text == Unused Reference: 'RFC6514' is defined on line 596, but no explicit reference was found in the text == Unused Reference: 'RFC7385' is defined on line 599, but no explicit reference was found in the text Summary: 0 errors (**), 0 flaws (~~), 5 warnings (==), 1 comment (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 1 SPRING Working Group Dave Allan 2 Internet Draft Ericsson 3 Intended status: Standards Track Jeff Tantsura 4 Expires: March 2017 6 September 2016 8 A Framework for Computed Multicast applied to MPLS based Segment 9 Routing 10 draft-allan-spring-mpls-multicast-framework-02 12 Abstract 14 This document describes a multicast solution for Segment Routing with 15 MPLS data plane. It is consistent with the Segment Routing 16 architecture in that an IGP is augmented to distribute information in 17 addition to the link state. In this solution it is multicast group 18 membership information sufficient to synchronize state in a given 19 network domain. Computation is employed to determine the topology of 20 any loosely specified multicast distribution tree. 22 Status of this Memo 24 This Internet-Draft is submitted to IETF in full conformance 25 with the provisions of BCP 78 and BCP 79. 27 Internet-Drafts are working documents of the Internet 28 Engineering Task Force (IETF), its areas, and its working 29 groups. Note that other groups may also distribute working 30 documents as Internet-Drafts. 32 Internet-Drafts are draft documents valid for a maximum of six 33 months and may be updated, replaced, or obsoleted by other 34 documents at any time. It is inappropriate to use Internet- 35 Drafts as reference material or to cite them other than as "work 36 in progress". 38 The list of current Internet-Drafts can be accessed at 39 http://www.ietf.org/ietf/1id-abstracts.txt. 41 The list of Internet-Draft Shadow Directories can be accessed at 42 http://www.ietf.org/shadow.html. 44 This Internet-Draft will expire on March 2017. 46 Copyright and License Notice 47 Copyright (c) 2016 IETF Trust and the persons identified as the 48 document authors. All rights reserved. 50 This document is subject to BCP 78 and the IETF Trust's Legal 51 Provisions Relating to IETF Documents 52 (http://trustee.ietf.org/license-info) in effect on the date of 53 publication of this document. Please review these documents 54 carefully, as they describe your rights and restrictions with 55 respect to this document. Code Components extracted from this 56 document must include Simplified BSD License text as described 57 in Section 4.e of the Trust Legal Provisions and are provided 58 without warranty as described in the Simplified BSD License. 60 Table of Contents 62 1. Introduction...................................................3 63 1.1. Authors......................................................3 64 1.2. Requirements Language........................................3 65 2. Conventions used in this document..............................3 66 2.1. Terminology..................................................3 67 3. Solution Overview..............................................4 68 3.1. Mapping source specific trees onto the segment routing 69 architecture......................................................5 70 3.2. Role of the Routing System...................................6 71 3.3. MDT Construction Requirements................................6 72 3.4. Pruning - theory of operation................................6 73 4. Elements of Procedure..........................................7 74 4.1. Triggers for Computation.....................................7 75 4.2. FIB Determination............................................7 76 4.2.1. Information in the IGP.....................................7 77 4.2.2. Computation of individual segments.........................8 78 4.3. FIB Generation..............................................11 79 4.4. FIB installation............................................11 80 5. Related work..................................................12 81 5.1. IGP Extensions..............................................12 82 5.2. BGP Extensions..............................................12 83 6. Observations..................................................12 84 7. Acknowledgements..............................................13 85 8. Security Considerations.......................................13 86 9. IANA Considerations...........................................13 87 10. References...................................................13 88 10.1. Normative References.......................................13 89 10.2. Informative References.....................................13 90 11. Authors' Addresses...........................................14 92 1. Introduction 94 This memo describes a solution for multicast for Segment Routing with 95 MPLS data plane in which source specific multicast distribution trees 96 (MDTs) are computed from information distributed via an IGP. 97 Computation can use information in the IGP to determine if a given 98 node in the network has a role as a root, leaf or replication point 99 in a given MDT. Unicast tunnels are employed to interconnect the 100 nodes determined to have a role. Therefore state only need be 101 installed in nodes that have one of these three roles to fully 102 instantiate an MDT. 103 Although this approach is computationally intensive, a significant 104 amount of computation can be avoided when the computing agent 105 determines that the node it is computing for has no role in a given 106 MDT. This permits a computed approach to multicast convergence to be 107 computationally tractable. 108 1.1. Authors 110 David Allan, Jeff Tantsura 112 1.2. Requirements Language 114 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 115 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 116 document are to be interpreted as described in RFC2119 [RFC2119]. 118 2. Changes from the last version 120 Clarifications in the pruning and simplification algorithm 122 3. Conventions used in this document 124 3.1. Terminology 126 Candidate replication point (CRP) - is a node that potentially needs 127 to install state to replicate multicast traffic as determined at an 128 intermediate step in multicast segment computation. It will either 129 resolve to having no role or a role as a replication point once 130 multicast has converged. 132 Candidate role - refers to any potential combination of roles on a 133 given multicast segment as determined at some intermediate step in 134 MDT computation. For example, a node with a candidate role may be a 135 leaf and may be a candidate replication point. 137 Downstream - refers to the direction along the shortest path to one 138 or more leaves for a given multicast distribution tree 140 Multicast convergence - is when all computation and state 141 installation to ensure the FIB reflects the multicast information in 142 the IGP is complete. 144 MDT - multicast distribution tree. Is a tree composed of one or more 145 multicast segments. 147 Multicast segment - is a portion of the multicast tree where only the 148 root and the leaves have been specified, and computation based upon 149 the current state of the IGP database is employed to determine and 150 install the required state to implement the segment. For MPLS a 151 multicast segment is implemented as a p2mp LSP. A multicast segment 152 is identified by a multicast SID. 154 Multicast SID - Is the data plane identifier that is used to 155 implement a multicast segment. As per a unicast MPLS segment, the 156 rightmost 20 bits of a multicast SID is encoded as a label. It is 157 drawn from an SRGB that is global to the SR domain. 159 Pinned path - Is a unique shortest path extending from a leaf 160 upstream towards the root for a given multicast segment. Therefore is 161 a component of the multicast segment that it has been determined must 162 be there. It will not necessarily extend from the leaf all the way to 163 the root during intermediate computation steps. A pinned path can 164 result from pruning operations. 166 Role - refers specifically to a node that is either a root, a leaf, a 167 replication node, or a pinned waypoint for a given MDT. 169 Unicast convergence - is when all computation and state installation 170 to ensure the FIB reflects the unicast information in the IGP is 171 complete. 173 Upstream - refers to the direction along the shortest path to the 174 root of a given MDT. 176 4. Solution Overview 178 This memo describes a multicast architecture in which multicast state 179 is only installed in those nodes that have roles as a root, leaves, 180 and replication points for a given multicast segment. The a-priori 181 established segment routing unicast tunnels are used as interconnect 182 between the nodes that have a role in a given multicast SID. 184 A loosely specified MDT is composed of a single multicast segment and 185 the routing of the MDT is delegated entirely to computation driven by 186 information in the IGP database. 188 Explicitly routed MDTs are expressed as a tree of concatenated 189 multicast segments where both the leaves of each segment and the 190 waypoints coupling a given segment to the upstream and/or downstream 191 segment(s) is specified in information flooded in the IGP by the 192 overall root of the MDT. The segments themselves will be computed as 193 per a loosely specified MDT. 195 A PE acting as an overall root for a given tree is expected to be 196 configured by the operator as to where to source multicast traffic 197 from, be it an attachment circuit, interworking function for client 198 technology or other. Similarly a leaf for a given tree is expected to 199 be configured by the operator as to the disposition of received 200 multicast traffic. 202 A computed segment is guaranteed to be loop free in a stable system. 203 A concatenation of segments to construct an MDT will similarly be 204 loop free as any collision of segments can be disambiguated in the 205 data plane via the SIDs. 207 This architecture significantly reduces the amount of state that 208 needs to be installed in the data plane to support multicast. This 209 also means that the impact of many failures in the network on 210 multicast traffic distribution will be recovered by unicast local 211 repair or unicast convergence with subsequent multicast convergence 212 acting in the role of network re-optimization (as opposed to 213 restoration). 215 4.1. Mapping source specific trees onto the segment routing architecture 217 A computed source specific tree for a given multicast group 218 corresponds to one or more multicast segments in the SR architecture. 219 Each multicast segment is assigned a SID, typically by management 220 configuration of the node that will be the overall root for the 221 source specific tree. The root node then uses the IGP to advertise 222 this information to all nodes in the IGP area/domain. 224 A multicast group is implemented as the set of source specific trees 225 from all nodes that have registered transmit interest to all nodes 226 that have registered receive interest in a multicast group. 228 4.2. Role of the Routing System 230 The role of the IGP is to communicate topology information, multicast 231 capability and associated algorithm, multicast registrations, unicast 232 to SID bindings, multicast to SID bindings and waypoints in multi- 233 segment MDTs. No changes to topology or unicast to SID binding 234 advertisements are proposed by this memo. 236 The multicast registrations/bindings will be in the form of source, 237 group, transmit/receive interest and the SID to use for the source 238 specific multicast tree. Registrations are originated by any node 239 that has send or receive interest in a given multicast group. Nodes 240 will use the combination of topology and multicast registrations to 241 determine the nodes that have a role in each source specific tree and 242 the SID information to then derive the required FIB state. 244 4.3. MDT Construction Requirements 246 A multicast segment in an MDT is constructed such that between any 247 pair of nodes that have a role in the segment and are connected by a 248 unicast tunnel, there is not another node on the shortest path 249 between the two with a role in that segment. This ensures that copies 250 of a packet forwarded by an multicast segment will traverse a link 251 only once in a stable system. 253 Note that this can be satisfied by a minimum cost shortest path tree, 254 but is not an absolute requirement. The pruning rules specified in 255 this memo will meet this requirement without necessarily producing 256 absolutely minimum cost multicast segment (or incurring the 257 associated computational cost). 259 4.4. Pruning - theory of operation 261 The role of nodes in a given multicast segment is determined by first 262 producing an inclusive shortest path tree with all possible paths 263 between the root and leaves, and then applying a set of pruning rules 264 repeatedly until an acyclic tree is produced or no further prunes are 265 possible. 267 For the majority of multicast segments these rules will 268 authoritatively produce a minimum cost tree. For those segments that 269 have not yet been authoritatively resolved, there is a set of pruning 270 operations applied that are not guaranteed to produce a tree that 271 meets the requirements of 3.3, therefore these trees require auditing 272 and potential correction according to a further set of agreed rules. 273 This avoids the necessity of an exhaustive search of the solution 274 space. 276 A node during computation of a segment may conclude that it will 277 absolutely not have a role at any of numerous points in the 278 computation process and abandon computation of that segment. 280 5. Elements of Procedure 282 5.1. Triggers for Computation 284 MDT computation is triggered by changes to the IGP database. These 285 are in the form of either changes in registered multicast group 286 interest, addition or removal of a multi-segment MDT descriptor, or 287 topology changes. 289 A change in registered interest for a group will require re- 290 computation of all MDTs that implement the multicast group. 292 A topology change will require the computation of some number of 293 multicast segments, the actual number will depend on the 294 implementation of tree computation but at a minimum will be all trees 295 for which there is not an optimal shortest path solution as a result 296 of the topology change. 298 5.2. FIB Determination 300 5.2.1. Information in the IGP 302 Group membership information for a multicast segment is obtained from 303 the IGP. This is true for single segment MDTs as well as multi- 304 segment MDTs. Included in the multi-segment MDT specification is the 305 waypoint nodes in MDT and the upstream and downstream SIDs. The 306 specified node is expected to cross connect the SIDs to join the 307 segments together acting in the role of leaf for the upstream segment 308 and root for the downstream segment. 310 When a waypoint in an MDT descriptor does not exist in the IGP, the 311 assumption is that the node identified by the waypoint SID has 312 failed. The response of the other nodes in the system in FIB 313 determination is to add the leaves of the downstream segment to the 314 upstream segment. 316 An example of this would be consider a node "x", and another node 317 "y". At some point in time, "x" advertises a tree that identifies "y" 318 as a waypoint that cross connects upstream SID "a" to downstream SID 319 "b". At some later point node "y" fails. The other nodes in the 320 network will compute segment "a" as if it included all leaves and 321 waypoints in segment "b". All apriori state installed for segment "b" 322 would be removed as the failure of "y" has required "b" to be 323 subsumed by "a". 325 5.2.2. Computation of individual segments 327 FIB generation for a multicast segment is the result of computation, 328 ultimately as applied to all source specific trees in the network. 329 All computing nodes implement a common algorithm for tree generation, 330 as all MUST agree on the solution. 332 One algorithm is as follows: 334 All possible shortest paths to the set of leaves for the MDT is 335 determined. Then pruning rules are repeatedly applied until no 336 further prunes are possible. 338 The philosophy of the application of these rules could be expressed 339 as "simplify as much as possible, and prune that which cannot be". 340 The rules are: 342 1) Eliminate any links and nodes not on a potential shortest path 343 from the root to the leaves for the MDT under consideration. 345 2) Simplify via the replacement of any nodes that do not have a 346 potential role in the MDT with links. 348 This will be nodes that are not a leaf, a root or a candidate 349 replication point. For example: 351 Root---------A----------B 353 B is a leaf. A is not but is in a potential shortest path from root 354 to B. However A will have no role in the MDT that serves B as it 355 provides simple transit therefore is replaced with a direct 356 connection between the root and B. 358 Root--------------------B 360 Note that such pruning also needs to avoid the creation of 361 duplicate parallel links. For example: 363 /----------A----------\ 365 Root B 367 \----------C----------/ 369 Where A and C have no role and the cost root-A-B = cost root-C-B, 370 they can be replaced with a single link from Root to B. 372 3) Simplify via the elimination of fewer hop paths 374 When for a given set of leaves, a node has multiple downstream 375 links that converge on a common downstream point, and that set of 376 leaves is only a subset of the leaves reachable on one or more of 377 the links, any link that only serves that subset of leaves can be 378 pruned. 380 For example: 382 --A---------------------------B 384 \ / 386 -----------C----------- 388 \ 390 ----D 392 Link AB is cost 2, link AC and CB are cost 1 (cost of link CD does 393 not affect the example). 395 B and D are leaves of a root upstream of A. From A, link AB can 396 reach leaf B. Path AC can reach leaf B and D. In this case path A-B 397 can be pruned from consideration. The set of leaves reachable via 398 link A-B is a subset of that reachable by A-C, and the paths from A 399 that serves that subset converges at B. 401 4) Prune upstream links. 403 The normal procedure is to determine the closest upstream leaf or 404 pinned path and then compare all upstream adjacencies with that 405 metric 407 a. If the upstream adjacency extends closer to the root than 408 the closest leaf or pinned path, then that adjacency can be 409 pruned. 411 b. If the upstream adjacency extends the same distance towards 412 the root then 414 i. If it is to a non-leaf or pinned path candidate 415 replication point, it can be pruned 417 ii. If it is to a pinned path, where there are equal upstream 418 adjacencies that terminate on leaves, it can be pruned 419 (considered inferior). 421 iii. If there is more than one "equal" upstream adjacency, 422 that is all terminate on nodes that are on pinned paths, 423 or all terminate on nodes that are leaves, than one is 424 selected. This is via the lowest node ID. 426 c. If the upstream adjacency is a candidate replication point 427 closer than the closest leaf, and upstream from it is a node 428 that is a leaf or pinned path equidistant with the closest 429 leaf, then all adjacencies that extend to leaves ranked lower 430 than the leaf or pinned path behind the CRP may be pruned. 431 Note that an upstream adjacency that has a CRP closer than 432 the closest leaf or pinned path cannot be pruned. 434 d. When for a given node all possible upstream adjacencies that 435 can be pruned have been identified, each is removed, and any 436 simplifications that can be performed as a result of the 437 prune are performed. This is the equivalent of a localized 438 check for 2 and 3 above and is then performed iteratively in 439 response to changes to the graph as a result of pruning. 441 The procedure is to implement 1, 2 and 3 above, then loop on 4 until 442 such time as the MDT is fully resolved, or no further prunes are 443 possible. Step 4 is performed in a specific order. The nodes are 444 processed according to a ranking from closest to the root to the 445 farthest, and from lowest node ID to the highest within a given 446 distance from the root. 448 If, at the end of pruning and simplification, all leaves in a 449 multicast segment have a unique shortest path to the root, the tree 450 is considered resolved, and the computation can progress directly to 451 the FIB generation step. 453 If not all leaves have a unique shortest path, additional pruning 454 steps are applied. These steps are NOT guaranteed to produce a lowest 455 cost tree, and therefore require an additional audit and possible 456 modification to ensure when forwarding a maximum of one copy of a 457 packet will traverse an interface. 459 For segments not authoritatively resolved by the above rules, a prune 460 that will not authoritatively result in a minimum cost tree is 461 applied. For the purpose of interoperability, the following rule is 462 proposed: A computing node will select the closest node to the root 463 with a candidate role that does not have a unique shortest path to 464 the root. Where more than one such node exists, the one with the 465 lowest unicast SID is selected. For that node, the best upstream link 466 is selected and all other upstream links pruned. The best upstream 467 link is defined as the link with the closest node with a candidate 468 role that potentially serves the highest number of leaves. Where 469 there is a tie, once again the node with the lowest SID is selected. 471 Once the links have been pruned, rules 2 through 4 are repeatedly 472 applied until either the tree is fully resolved, or again no further 473 prunes are possible, in which case the next closest remaining 474 unresolved node has the same prune applied. 476 For all segments not resolved by the initial prune rules, they are 477 audited to ensure all nodes that have a role in the tree do not have 478 a node with a role between them and their upstream node on the tree. 479 If they do, the old upstream adjacency is removed, and the superior 480 one added. 482 5.3. FIB Generation 484 The topology components that remain at the end of the pruning 485 operation will reflect all nodes that have a role in a given 486 multicast segment plus the necessary tunnels (as all intervening 487 multi-path scenarios will have been simplified away). From this the 488 FIB can be generated: 490 All nodes that have a role in a given multicast segment and have 491 nodes upstream in the segment will need to accept the SID for the MDT 492 from at minimum, all upstream interfaces. 494 All nodes that have a role in a given segment and have nodes 495 immediately downstream in the segment will need to replicate packets 496 simply labelled with the multicast SID onto those interfaces. 498 All nodes that have a role in a given segment and have nodes 499 reachable via a tunnel downstream set the FIB to push the tunnel 500 unicast SID for the downstream node onto any replicated copies of a 501 received packet, and identify the set of interfaces on the shortest 502 path for the tunnel SID. 504 5.4. FIB installation 506 FIB installation needs to acknowledge two aspects of the hybrid 507 tunnel and role model of multicast tree construction. The first is 508 that because of the sparse state model simple tree adds, moves, and 509 changes may require the installation of state where it did not 510 previously exist, and such changes may impact existing services. The 511 second is that it is possible to retain the knowledge to prioritize 512 computation of those trees impacted the failure of a node with a 513 role. 515 To address this, there are three stages of state installation for 516 multicast convergence: 518 1) Immediate: 520 a. Installation of state for multicast segments impacted by the 521 failure of a node in the network, and installation of state 522 for segments in nodes that have not previously had a role in 523 the given segment. 525 b. Installation of state for waypoints in multi-segment MDTs. 527 2) After T1: Update state for nodes that both had and have a role in 528 a given multicast segment. 530 3) After T2: Removal of state for nodes that transition from having a 531 role to not having a role for a given multicast segment. 533 T1 and T2 are network wide configurable values. 535 6. Related work 537 6.1. IGP Extensions 539 The required IGP changes are documented in [MCAST-ISIS] and [MCAST- 540 OPSF]. 542 6.2. BGP Extensions 544 This memo will require the specification of a new PMSI Tunnel 545 Attribute (SPRING P2MP tunnel, tentatively 0x09) to order to 546 integrate into the multicast framework documented in RFC 6514 548 7. Observations 550 This technique is not confined to segment routing, and with the 551 provision of a global label space (to be employed as per a multicast 552 SID), an MPLS-LDP network would also provide the requisite mesh of 553 unicast tunnels and be capable of implementing this approach to 554 multicast. 556 This memo focuses on an implementation based upon nodes that are IGP 557 speakers and converge independently so is written in a form that 558 assumes a node, computing node and IGP speaker are one in the same. 559 It should be observed that the relative frugality of data plane state 560 would suggest that separation of computation from nodes in the data 561 plane combined with management or "software defined networking" based 562 population of the multicast FIB entries may also be useful modes of 563 network operation. 565 8. Acknowledgements 567 Thanks to Uma Chunduri for his detailed review and suggestions. 569 9. Security Considerations 571 For a future version of this document. 573 10. IANA Considerations 575 This document requires the allocation of a PMSI tunnel type to 576 identify a SPRING P2MP tunnel type from the P-Multicast Service 577 Interface Tunnel (PMSI Tunnel) Tunnel Types registry. 579 11. References 581 11.1. Normative References 583 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 584 Requirement Levels", BCP 14, RFC 2119, March 1997. 586 11.2. Informative References 588 [MCAST-ISIS] Allan et.al., "IS-IS extensions for Computed Multicast 589 applied to MPLS based Segment Routing", IETF work in progress, 590 draft-allan-isis-spring-multicast-00, July 2016 592 [MCAST-OSPF] Allan et.al., "OSPF extensions for Computed Multicast 593 applied to MPLS based Segment Routing", IETF work in progress, 594 draft-allan-ospf-spring-multicast-00, July 2016 596 [RFC6514] Aggarwal et.al., "BGP Encodings and Procedures for Multicast 597 in MPLS/BGP IP VPNs", IETF RFC 6514, February 2012 599 [RFC7385] Andersson & Swallow "IANA Registry for P-Multicast Service 600 Interface (PMSI) Tunnel Type Code Points", IETF RFC 7385, 601 October 2014 603 12. Authors' Addresses 605 Dave Allan (editor) 606 Ericsson 607 300 Holger Way 608 San Jose, CA 95134 609 USA 610 Email: david.i.allan@ericsson.com 612 Jeff Tantsura 613 Email: jefftant.ietf@gmail.com