idnits 2.17.1

draft-ietf-rtgwg-mofrr-04.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (May 14, 2014) is 3634 days in the past.  Is this
     intentional?

  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  -- Obsolete informational reference (is this intentional?): RFC 4601
     (Obsoleted by RFC 7761)

     Summary: 0 errors (**), 0 flaws (~~), 1 warning (==), 2 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

Network Working Group                                           A. Karan
Internet-Draft                                               C. Filsfils
Intended status: Informational                       Cisco Systems, Inc.
Expires: November 15, 2014                                  D. Farinacci
                                                             lispers.net
                                                       IJ. Wijnands, Ed.
                                                     Cisco Systems, Inc.
                                                             B. Decraene
                                                                  Orange
                                                               U. Joorde
                                                        Deutsche Telekom
                                                           W. Henderickx
                                                          Alcatel-Lucent
                                                            May 14, 2014


                      Multicast only Fast Re-Route
                       draft-ietf-rtgwg-mofrr-04

Abstract

   As IPTV deployments grow in number and size, service providers are
   looking for solutions that minimize the service disruption due to
   faults in the IP network carrying the packets for these services.
   This draft describes a mechanism for minimizing packet loss in a
   network when node or link failures occur.  Multicast only Fast
   Re-Route (MoFRR) works by making simple enhancements to multicast
   routing protocols such as PIM and mLDP.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on November 15, 2014.

Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Conventions used in this document
     1.2.  Terminology
   2.  Basic Overview
   3.  Determination of the secondary UMH
     3.1.  ECMP-mode MoFRR
     3.2.  Non-ECMP-mode MoFRR
   4.  Upstream Multicast Hop Selection
     4.1.  PIM
     4.2.  mLDP
   5.  Detecting Failures
   6.  MoFRR applicability
     6.1.  Dual-Plane Topology
     6.2.  Capacity Planning for MoFRR
     6.3.  PE nodes
     6.4.  Other Applications
   7.  IANA Considerations
   8.  Security Considerations
   9.  Acknowledgments
   10. Contributor Addresses
   11. References
     11.1.  Normative References
     11.2.  Informative References
   Authors' Addresses

1.  Introduction

   Different solutions have been developed and deployed to improve
   service guarantees, both for multicast video traffic and Video on
   Demand traffic.  Most of these solutions are geared towards finding
   an alternate path around one or more failed network elements (link,
   node, path failures).

   This draft describes a mechanism for minimizing packet loss in a
   network when node or link failures occur.  Multicast only Fast
   Re-Route (MoFRR) works by making simple changes to the way selected
   routers use multicast protocols such as PIM and mLDP.  No changes to
   the protocols themselves are required.
   With MoFRR, in many cases, multicast routing protocols do not have
   to depend on or wait for unicast routing protocols to detect network
   failures; see Section 5.

   On a Merge Point, MoFRR logic determines a primary Upstream Multicast
   Hop (UMH) and a secondary UMH and joins the tree via both
   simultaneously.  Data packets are received over the primary and
   secondary paths.  Only the packets from the primary UMH are accepted
   and forwarded down the tree; the packets from the secondary UMH are
   discarded.  The UMH determination is different for PIM and mLDP and
   is explained in Section 4.  When a failure is detected on the path to
   the primary UMH, the repair occurs by changing the secondary UMH into
   the primary and the primary into the secondary.  Since the repair is
   local, it is fast, greatly improving convergence times in the event
   of node or link failures on the path to the primary UMH.

1.1.  Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

1.2.  Terminology

   MoFRR:  Multicast only Fast Re-Route.

   ECMP:  Equal Cost Multi-Path.

   mLDP:  Multi-point Label Distribution Protocol.

   PIM:  Protocol Independent Multicast.

   UMH:  Upstream Multicast Hop, a candidate next-hop that can be used
      to reach the root of the tree.

   tree:  Either a PIM (S,G)/(*,G) tree or an mLDP P2MP or MP2MP LSP.

   OIF:  Outgoing InterFace, an interface used to forward multicast
      packets down the tree towards the receivers.

   LFA:  Loop Free Alternate as defined in [RFC5286].  In unicast Fast
      ReRoute, this is an alternate next-hop which can be used to reach
      a unicast destination without using the protected link or node.

   Merge Point:  A router that joins a multicast stream via two
      divergent upstream paths.

   RPF:  Reverse Path Forwarding.

   RP:  Rendezvous Point.

   LSR:  Label Switched Router.

   BFD:  Bidirectional Forwarding Detection.

   IGP:  Interior Gateway Protocol.

   MVPN:  Multicast Virtual Private Networks.

2.  Basic Overview

   The basic idea of MoFRR is for a Merge Point router to join a
   multicast tree via two divergent upstream paths in order to get
   maximum redundancy.  The determination of this alternate upstream is
   defined in Section 3.

   In order to maximize robustness against any failure, the two paths
   should be as diverse as possible.  Ideally, they should not merge
   upstream.  Sometimes the topology guarantees maximal redundancy;
   other times, additional configuration or techniques are needed to
   enforce it.  See Section 6 for more discussion on the applicability
   of MoFRR depending on the network topology.

   A Merge Point router should only accept and forward on one of the
   upstream paths at a time in order to avoid duplicate packet
   forwarding.  The selection of the primary and secondary UMH is done
   by the MoFRR logic and is normally based on unicast routing to find
   loop free candidates.  This is described in Section 4.

   Note that the impact of the additional data on the network is
   mitigated when tree membership is densely populated.  When a part of
   the network has redundant data flowing, join latency for newly
   joining members is reduced because it is likely that a tree Merge
   Point is not far away.

3.  Determination of the secondary UMH

   The secondary UMH is a Loop Free Alternate (LFA) as per [RFC5286].

3.1.  ECMP-mode MoFRR

   If the IGP installs two ECMP paths to the source, then as per
   [RFC5286] the LFA is a primary Next-hop.  If the Multicast tree is
   enabled for ECMP-Mode MoFRR, the router installs them as primary and
   secondary UMH.
   Before the failure, only packets received from the primary UMH path
   are processed while packets received from the secondary UMH are
   dropped.

   The selected primary UMH SHOULD be the same as if the MoFRR extension
   was not enabled.

   If more than two ECMP paths exist, one is selected as primary and
   another one as secondary UMH.  The selection of the primary and
   secondary is a local decision.  Information from the IGP link-state
   topology could be leveraged to optimize this selection such that the
   primary and secondary paths are maximally divergent and don't lead to
   the same upstream node.  Note that MoFRR does not restrict the number
   of UMH paths that are joined.  Implementations may use as many paths
   as are configured.

3.2.  Non-ECMP-mode MoFRR

   A router configured for non-ECMP-mode MoFRR for a Multicast tree
   joins a primary path to its primary UMH and a secondary path to its
   LFA UMH.  In order to prevent control-plane loops, a router MUST stop
   joining the secondary UMH if this UMH is the only member in the OIF
   list.

   To illustrate the reason for this rule, let's consider the example in
   FIG3.  If PE1 and PE2 have received an IGMP request for a Multicast
   tree, they will both join the primary path on their plane and a
   secondary path to the neighbor PE.  If their receivers were to leave
   at the same time, it could be possible for the Multicast tree on PE1
   and PE2 to never get deleted, as each PE refreshes the other via the
   secondary path joins (remember that a secondary path join is not
   distinguishable from a primary join).

4.  Upstream Multicast Hop Selection

   An Upstream Multicast Hop (UMH) is a candidate next-hop that can be
   used to reach the root of the tree.  This is normally based on
   unicast routing to find loop free candidate(s).  With MoFRR
   procedures we select a primary and a backup UMH.  The procedures for
   determining the UMH are different for PIM and mLDP.
   See below.

4.1.  PIM

   The UMH selection in PIM is also known as the Reverse Path Forwarding
   (RPF) procedure.  Based on a unicast route lookup on either the
   Source address or Rendezvous Point (RP) [RFC4601], an upstream
   interface is selected for sending the PIM Joins/Prunes and accepting
   the multicast packets.  The interface the packets are received on is
   used to pass or fail the RPF check.  If packets are received on an
   interface that was not selected by the RPF procedure, or that is not
   the primary, the packets are discarded.

4.2.  mLDP

   The UMH selection in mLDP also depends on unicast routing, but the
   difference from PIM is that the acceptance of multicast packets is
   based on MPLS labels and independent of the interface the packet is
   received on.  Using the procedures as defined in [RFC6388] an
   upstream Label Switched Router (LSR) is elected.  The upstream LSR
   that was elected for a Label Switched Path (LSP) gets a unique local
   MPLS Label allocated.  Multicast packets are only forwarded if the
   MPLS label matches the MPLS label that was allocated for that LSP's
   (primary) upstream LSR.

5.  Detecting Failures

   Once the two paths are established, the next step is detecting a
   failure on the primary path to know when to switch to the backup
   path.  This is a local matter, but this section explores some
   possibilities.

   The first (and simplest) option is to detect the failure of the local
   interface as is done for unicast Fast ReRoute.  Detection can be
   performed using the loss of signal or the loss of probing packets
   (e.g., BFD).  This option can be used in combination with the other
   options as documented below.  Just like for unicast fast reroute,
   50 msec switch-over is possible.

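   As an illustration only, the accept/discard rule and the local
   repair described in Section 1 could be sketched as follows in
   Python.  The class and method names (MoFRRTreeState, accept,
   on_failure) and the UMH labels are hypothetical and are not defined
   by this document; how the failure is detected is deliberately left
   outside the sketch.

```python
from dataclasses import dataclass


@dataclass
class MoFRRTreeState:
    """Hypothetical per-tree state kept by a Merge Point."""

    primary_umh: str
    secondary_umh: str

    def accept(self, from_umh: str) -> bool:
        # Only packets arriving from the primary UMH are accepted and
        # forwarded; packets from the secondary UMH are discarded.
        return from_umh == self.primary_umh

    def on_failure(self, failed_umh: str) -> bool:
        # Local repair: if the failed path is the primary one, the
        # secondary UMH becomes the primary and vice versa.
        if failed_umh != self.primary_umh:
            return False
        self.primary_umh, self.secondary_umh = (
            self.secondary_umh,
            self.primary_umh,
        )
        return True


state = MoFRRTreeState(primary_umh="A1", secondary_umh="A2")
before = (state.accept("A1"), state.accept("A2"))
swapped = state.on_failure("A1")
after = (state.accept("A1"), state.accept("A2"))
```

   Because both joins are already in place, the repair in this sketch
   is a single role swap and no new tree has to be built.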
   A second option consists of comparing the packets received on the
   primary and secondary streams but only forwarding one of them -- the
   first one received, no matter which interface it is received on.
   Zero packet loss is possible for RTP-based streams.

   A third option assumes a minimum known packet rate for a given data
   stream.  If a packet is not received on the primary RPF interface
   within the corresponding time frame, the router assumes primary path
   failure and switches to the secondary RPF interface.  50 msec
   switch-over may be possible for high-rate streams (e.g., IPTV where
   SD video has a continuous inter-packet gap of ~3 msec), but in
   general the delay is dependent on the rate of the multicast stream.

   A fourth option leverages the significant improvements of the IGP
   convergence speed.  When the primary path to the source is withdrawn
   by the IGP, the MoFRR-enabled router switches over to the backup
   path; that is, the UMH is changed to the secondary UMH.  Since the
   secondary path is already in place, and assuming it is disjoint from
   the primary path, convergence times would not include the time
   required to build a new tree and hence are smaller.  Sub-second to
   sub-200 msec switch-over should be possible.

6.  MoFRR applicability

   MoFRR applicability is topology dependent.  The applicability is the
   same as that of LFA FRR, which is discussed in [RFC6571].

   The following section will discuss MoFRR applicability to dual-plane
   network topologies.

6.1.  Dual-Plane Topology

   MoFRR works best in dual-plane topologies as illustrated in the
   figures below.  MoFRR may be enabled on any router in the network.
   In the figures below, MoFRR is shown enabled on the Provider Edge
   (PE) routers to illustrate one way in which the technology may be
   deployed.

                          S
                  P      / \ P
                        /   \
                  ^   G1     R1  ^
                  P   /       \  P
                     /         \
                   G2----------R2   ^
                   | \         | \  P
                ^  |  \        |  \
                P  |    G3----------R3
                   |    |      |     |
                   |    |      |     | ^
                   G4---|------R4    | P
                ^    \  |        \   |
                P     \ |         \  |
                        G5----------R5
                     ^  |            |  ^
                     P  |            |  P
                        |            |
                        Gi          Ri
                        \ \__    ^  /|
                         \   \   S1/ |  ^
                       ^  \  ^\   /  |P2
                       P1  \ S2\_/__ |
                            \   /   \|
                             PE1     PE2
   P = Primary path
   S = Secondary path

                    FIG1.  Two-Plane Network Design

   The topology has two planes, a primary plane and a secondary plane
   that are fully disjoint from each other all the way into the POPs.
   This two-plane design is common in service provider networks as it
   eliminates single points of failure in their core network.  The
   links marked P indicate the normal path along which the PIM joins
   flow from the POPs towards the source.  Multicast streams,
   especially for the densely watched channels, typically flow along
   both the planes in the network anyway.

   The only change MoFRR adds to this is on the links marked S where the
   PE routers join a secondary path to their secondary ECMP UMH.  As a
   result of this, each PE router receives two copies of the same
   stream, one from the primary plane and the other from the secondary
   plane.  As a result of normal UMH behavior, the multicast stream
   received over the primary path is accepted and forwarded to the
   downstream receivers.  The copy of the stream received from the
   secondary UMH is discarded.

   When a router detects a routing failure on the path to its primary
   UMH, it will switch to the secondary UMH and accept packets for that
   stream.  If the failure is repaired, the router may switch back.  The
   primary and secondary UMHs have only local context and not end-to-end
   context.

   As one can see, MoFRR achieves faster convergence by pre-building
   the secondary multicast tree and receiving the traffic on that
   secondary path.
   The example discussed above is a simple case where there are two
   ECMP paths from each PE device towards the source, one along the
   primary plane and one along the secondary.  In cases where the
   topology is asymmetric or is a ring, this ECMP nature does not
   hold, and additional rules have to be taken into account to choose
   when and where to join the secondary path.

   MoFRR is appealing in such topologies for the following reasons:

   1.  Ease of deployment and simplicity: the functionality is only
       required on the PE devices, although it may be configured on all
       routers in the topology.  Furthermore, each PE device can be
       enabled separately; there is no need for network-wide
       coordination in order to deploy MoFRR.  Interoperability testing
       is not required as there are no PIM or mLDP protocol changes.

   2.  End-to-end failure detection and recovery: any failure along the
       path from the source to the PE can be detected and repaired with
       the secondary disjoint stream (see Section 5, options 2, 3, and
       4).

   3.  Capacity Efficiency: as illustrated in the previous example, the
       Multicast trees corresponding to IPTV channels cover the backbone
       and distribution topology in a very dense manner.  As a
       consequence, the secondary paths graft into the normal Multicast
       trees (i.e., trees signaled by PIM or mLDP without the MoFRR
       extension) at the aggregation level and hence do not demand any
       extra capacity either on the distribution links or in the
       backbone.  They simply use the capacity that is normally used,
       without any duplication.  This is different from conventional
       multicast FRR mechanisms, which often double the capacity
       requirement when the backup path crosses links or nodes that
       already carry the primary/normal tree.

   4.  Loop free: the secondary path join is sent on an ECMP disjoint
       path.
       By definition, the neighbor receiving this request is closer to
       the source and hence will not cause a loop.

   The topology we just analyzed is very frequent and can be modelled as
   per Fig 2.  The PE has two ECMP disjoint paths to the source.  Each
   ECMP path uses a disjoint plane of the network.

                         Source
                        /      \
                     Plane1  Plane2
                        |      |
                        A1     A2
                         \    /
                           PE

             FIG2.  PE is dual-homed to Dual-Plane Backbone

   Another frequent topology is described in Fig 3.  PEs are grouped by
   pairs.  In each pair, each PE is connected to a different plane.
   Each PE has one single shortest-path to a source (via its connected
   plane).  There is no ECMP as in Fig 2.  However, there is clearly a
   way to provide MoFRR benefits, as each PE can offer a disjoint
   secondary path to the other plane PE (via the disjoint path).

   The MoFRR secondary neighbor selection process needs to be extended
   in this case, as one cannot simply rely on using an ECMP path as
   secondary neighbor.  This extension is referred to as the non-ECMP
   extension and is described in Section 3.2.

                         Source
                        /      \
                     Plane1  Plane2
                        |      |
                        A1     A2
                        |      |
                       PE1----PE2

        FIG3.  PEs are connected in pairs to Dual-Plane Backbone

6.2.  Capacity Planning for MoFRR

   The previous section has described two very frequent designs (Fig 2
   and Fig 3) which provide maximum MoFRR benefits.

   Designers with topologies different from Fig 2 and Fig 3 can still
   benefit from MoFRR thanks to the use of capacity planning tools.

   Such tools are able to simulate the ability of each PE to build two
   disjoint branches of the same tree, and can do so for hundreds of PEs
   and hundreds of sources.

   This makes it possible to assess the MoFRR protection coverage of a
   given network, for a set of sources.

   If the protection coverage is deemed insufficient, the designer can
   use such a tool to optimize the topology (add links, change IGP
   metrics).

6.3.  PE nodes

   Many Service Providers devise their topology such that PEs have
   disjoint paths to the multicast sources.  MoFRR leverages the
   existence of these disjoint paths without any PIM or mLDP protocol
   modification.  Interoperability testing is thus not required.  In
   such topologies, MoFRR only needs to be deployed on the PE devices.
   Each PE device can be enabled one by one.

6.4.  Other Applications

   While all the examples in this document show the MoFRR applicability
   on PE devices, it is clear that MoFRR could be enabled on aggregation
   or core routers.

   MoFRR can be popular in Data Center network configurations.  With the
   advent of lower-cost Ethernet and increasing port density in routers,
   there is more meshed connectivity than ever before.  When using three
   layers (access, distribution, and core) in a Data Center, there is a
   lot of inexpensive bandwidth connecting the layers.  This will lend
   itself to more opportunities for ECMP paths at multiple layers.  This
   allows for multiple layers of redundancy protecting against link and
   node failures at each layer with minimal redundancy cost.

   Redundancy costs are reduced because only one packet is forwarded at
   every link along the primary and secondary data paths, so there is no
   duplication of data on any link, thereby providing make-before-break
   protection at a very small cost.

   The MoFRR principle may be applied to MVPNs.

7.  IANA Considerations

   This document makes no request of IANA.

8.  Security Considerations

   There are no security considerations for this design other than what
   is already in the main PIM specification [RFC4601] and mLDP
   specification [RFC6388].

9.  Acknowledgments

   The authors would like to thank John Zwiebel, Greg Shepherd, Dave
   Oran and Alvaro Retana for their review of the draft.

10.  Contributor Addresses

   Below is a list of other contributing authors in alphabetical order:

   Nicolai Leymann
   Deutsche Telekom
   Winterfeldtstrasse 21
   Berlin 10781
   DE
   Email: N.Leymann@telekom.de

   Jeff Tantsura
   Ericsson
   300 Holger Way
   San Jose CA 95134
   USA

11.  References

11.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC5286]  Atlas, A. and A. Zinin, "Basic Specification for IP Fast
              Reroute: Loop-Free Alternates", RFC 5286, September 2008.

11.2.  Informative References

   [RFC4601]  Fenner, B., Handley, M., Holbrook, H., and I. Kouvelas,
              "Protocol Independent Multicast - Sparse Mode (PIM-SM):
              Protocol Specification (Revised)", RFC 4601, August 2006.

   [RFC6388]  Wijnands, IJ., Minei, I., Kompella, K., and B. Thomas,
              "Label Distribution Protocol Extensions for Point-to-
              Multipoint and Multipoint-to-Multipoint Label Switched
              Paths", RFC 6388, November 2011.

   [RFC6571]  Filsfils, C., Francois, P., Shand, M., Decraene, B.,
              Uttaro, J., Leymann, N., and M. Horneffer, "Loop-Free
              Alternate (LFA) Applicability in Service Provider (SP)
              Networks", RFC 6571, June 2012.

Authors' Addresses

   Apoorva Karan
   Cisco Systems, Inc.
   3750 Cisco Way
   San Jose CA, 95134
   USA

   Email: apoorva@cisco.com

   Clarence Filsfils
   Cisco Systems, Inc.
   De kleetlaan 6a
   Diegem BRABANT 1831
   Belgium

   Email: cfilsfil@cisco.com

   Dino Farinacci
   lispers.net
   USA

   Email: farinacci@gmail.com

   IJsbrand Wijnands (editor)
   Cisco Systems, Inc.
   De Kleetlaan 6a
   Diegem 1831
   BE

   Email: ice@cisco.com

   Bruno Decraene
   Orange
   38-40 rue du General Leclerc
   Issy Moulineaux Cedex 9, 92794
   FR

   Email: bruno.decraene@orange.com

   Uwe Joorde
   Deutsche Telekom
   Hammer Str. 216-226
   Muenster D-48153
   DE

   Email: Uwe.Joorde@telekom.de

   Wim Henderickx
   Alcatel-Lucent
   Copernicuslaan 50
   Antwerp 2018
   Belgium

   Email: wim.henderickx@alcatel-lucent.com