Network Working Group                                           A. Karan
Internet-Draft                                               C. Filsfils
Intended status: Informational                         IJ. Wijnands, Ed.
Expires: November 13, 2015                           Cisco Systems, Inc.
                                                             B. Decraene
                                                                  Orange
                                                            May 12, 2015


                      Multicast only Fast Re-Route
                       draft-ietf-rtgwg-mofrr-07

Abstract

   As IPTV deployments grow in number and size, service providers are
   looking for solutions that minimize the service disruption due to
   faults in the IP network carrying the packets for these services.
   This document describes a mechanism for minimizing packet loss in a
   network when node or link failures occur.
   Multicast only Fast Re-Route (MoFRR) works by making simple
   enhancements to multicast routing protocols such as PIM and mLDP.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on November 13, 2015.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Conventions used in this document
     1.2.  Terminology
   2.  Basic Overview
   3.  Determination of the secondary UMH
     3.1.  ECMP-mode MoFRR
     3.2.  Non-ECMP-mode MoFRR
   4.  Upstream Multicast Hop Selection
     4.1.  PIM
     4.2.  mLDP
   5.  Detecting Failures
   6.  MoFRR applicability
     6.1.  Dual-Plane Topology
     6.2.  Other Topologies
     6.3.  Capacity Planning for MoFRR
     6.4.  PE nodes
     6.5.  Other Applications
   7.  IANA Considerations
   8.  Security Considerations
   9.  Acknowledgments
   10. Contributor Addresses
   11. References
     11.1.  Normative References
     11.2.  Informative References
   Authors' Addresses

1.  Introduction

   Different solutions have been developed and deployed to improve
   service guarantees, both for multicast video traffic and Video on
   Demand traffic.  Most of these solutions are geared towards finding
   an alternate path around one or more failed network elements (link,
   node, path failures).

   This document describes a mechanism for minimizing packet loss in a
   network when node or link failures occur.  Multicast only Fast
   Re-Route (MoFRR) works by making simple changes to the way selected
   routers use multicast protocols such as PIM and mLDP.  No changes to
   the protocols themselves are required.
   With MoFRR, in many cases, multicast routing protocols do not have
   to depend on, or wait for, unicast routing protocols to detect
   network failures; see Section 5.

   On a Merge Point, MoFRR logic determines a primary Upstream Multicast
   Hop (UMH) and a secondary UMH and joins the tree via both
   simultaneously.  Data packets are received over the primary and
   secondary paths.  Only the packets from the primary UMH are accepted
   and forwarded down the tree; the packets from the secondary UMH are
   discarded.  The UMH determination is different for PIM and mLDP and
   is explained in Section 4.  When a failure is detected on the path to
   the primary UMH, the repair occurs by changing the secondary UMH into
   the primary and the primary into the secondary.  Since the repair is
   local, it is fast, greatly improving convergence times in the event
   of node or link failures on the path to the primary UMH.

1.1.  Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

1.2.  Terminology

   MoFRR:  Multicast only Fast Re-Route.

   ECMP:  Equal Cost Multi-Path.

   mLDP:  Multi-point Label Distribution Protocol.

   PIM:  Protocol Independent Multicast.

   UMH:  Upstream Multicast Hop, a candidate next-hop that can be used
      to reach the root of the tree.

   tree:  Either a PIM (S,G)/(*,G) tree or an mLDP P2MP or MP2MP LSP.

   OIF:  Outgoing InterFace, an interface used to forward multicast
      packets down the tree towards the receivers.

   LFA:  Loop Free Alternate as defined in [RFC5286].  In unicast Fast
      ReRoute, this is an alternate next-hop which can be used to reach
      a unicast destination without using the protected link or node.

   Merge Point:  A router that joins a multicast stream via two
      divergent upstream paths.

   RPF:  Reverse Path Forwarding.

   RP:  Rendezvous Point.

   LSR:  Label Switched Router.

   BFD:  Bidirectional Forwarding Detection.

   IGP:  Interior Gateway Protocol.

   MVPN:  Multicast Virtual Private Networks.

2.  Basic Overview

   The basic idea of MoFRR is for a Merge Point router to join a
   multicast tree via two divergent upstream paths in order to get
   maximum redundancy.  The determination of this alternate upstream is
   defined in Section 3.

   In order to maximize robustness against any failure, the two paths
   should be as diverse as possible.  Ideally, they should not merge
   upstream.  Sometimes the topology guarantees maximal redundancy;
   other times additional configuration or techniques are needed to
   enforce it.  See Section 6 for more discussion on the applicability
   of MoFRR depending on the network topology.

   A Merge Point router should only accept and forward on one of the
   upstream paths at a time in order to avoid duplicate packet
   forwarding.  The selection of the primary and secondary UMH is done
   by the MoFRR logic and is normally based on unicast routing to find
   loop-free candidates.  This is described in Section 4.

   Note that the impact of the additional data on the network is
   mitigated when tree membership is densely populated.  When part of
   the network has redundant data flowing, join latency for newly
   joining members is reduced because it is likely that a tree Merge
   Point is not far away.

3.  Determination of the secondary UMH

   The secondary UMH is a Loop Free Alternate (LFA) as per [RFC5286].

3.1.  ECMP-mode MoFRR

   If the IGP installs two ECMP paths to the source, then as per
   [RFC5286] the LFA is a primary next-hop.  If the Multicast tree is
   enabled for ECMP-mode MoFRR, the router installs them as primary and
   secondary UMH.
   Before the failure, only packets received from the primary UMH path
   are processed while packets received from the secondary UMH are
   dropped.

   The selected primary UMH SHOULD be the same as if the MoFRR extension
   were not enabled.

   If more than two ECMP paths exist, one is selected as primary and
   another as secondary UMH.  The selection of the primary and secondary
   is a local decision.  Information from the IGP link-state topology
   could be leveraged to optimize this selection such that the primary
   and secondary paths are maximally divergent and do not lead to the
   same upstream node.  Note that MoFRR does not restrict the number of
   UMH paths that are joined.  Implementations may use as many paths as
   are configured.

3.2.  Non-ECMP-mode MoFRR

   A router X configured for non-ECMP-mode MoFRR for a Multicast tree
   joins a primary path to its primary UMH and a secondary path to its
   LFA UMH.  In order to prevent control-plane loops, a router MUST stop
   joining the secondary UMH if this UMH is the only member in the OIF
   list.

   To illustrate the reason for this rule, consider the example in
   FIG3.  If PE1 and PE2 have received an IGMP request for a Multicast
   tree, they will both join the primary path on their plane and a
   secondary path to the neighbor PE.  If their receivers were to leave
   at the same time, the Multicast tree on PE1 and PE2 might never get
   deleted, as each PE refreshes the other via the secondary path joins
   (remember that a secondary path join is not distinguishable from a
   primary join).

4.  Upstream Multicast Hop Selection

   An Upstream Multicast Hop (UMH) is a candidate next-hop that can be
   used to reach the root of the tree.  This is normally based on
   unicast routing to find loop-free candidate(s).  With MoFRR
   procedures we select a primary and a backup UMH.  The procedures for
   determining the UMH are different for PIM and mLDP.
   See below.

4.1.  PIM

   The UMH selection in PIM is also known as the Reverse Path Forwarding
   (RPF) procedure.  Based on a unicast route lookup on either the
   Source address or Rendezvous Point (RP) [RFC4601], an upstream
   interface is selected for sending the PIM Joins/Prunes and for
   accepting the multicast packets.  The interface the packets are
   received on is used to pass or fail the RPF check.  If packets are
   received on an interface that was not selected by the RPF procedure,
   or that is not the primary, the packets are discarded.

4.2.  mLDP

   The UMH selection in mLDP also depends on unicast routing, but the
   difference with PIM is that the acceptance of multicast packets is
   based on MPLS labels and is independent of the interface the packet
   is received on.  Using the procedures defined in [RFC6388], an
   upstream Label Switched Router (LSR) is elected.  The upstream LSR
   that was elected for a Label Switched Path (LSP) gets a unique local
   MPLS label allocated.  Multicast packets are only forwarded if the
   MPLS label matches the MPLS label that was allocated for that LSP's
   (primary) upstream LSR.

5.  Detecting Failures

   Once the two paths are established, the next step is detecting a
   failure on the primary path to know when to switch to the backup
   path.  This is a local matter, but this section explores some
   possibilities.

   The first (and simplest) option is to detect the failure of the local
   interface, as is done for unicast Fast ReRoute.  Detection can be
   performed using the loss of signal or the loss of probing packets
   (e.g., BFD).  This option can be used in combination with the other
   options documented below.  Just like for unicast Fast ReRoute, 50msec
   switch-over is possible.
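
   As a non-normative illustration, the accept/discard rule of Section 4
   and the local switch-over of the first option can be modelled with a
   few lines of Python.  This is only a sketch under stated assumptions:
   the class name MofrrTree, its fields, and the interface names are
   hypothetical and are not defined by this document or by PIM or mLDP.
   The "key" is the incoming interface for PIM or the local MPLS label
   for mLDP.

```python
from dataclasses import dataclass


@dataclass
class MofrrTree:
    """Hypothetical per-tree MoFRR state on a Merge Point."""
    primary: str    # interface (PIM) or local label (mLDP) of primary UMH
    secondary: str  # same, for the secondary UMH

    def accept(self, key: str) -> bool:
        # Only packets arriving via the primary UMH are forwarded;
        # packets arriving via the secondary UMH are discarded.
        return key == self.primary

    def on_failure(self, key: str) -> None:
        # Local repair: if the failed path is the primary one, the
        # secondary becomes primary and the primary becomes secondary.
        # A failure on the secondary path leaves the roles unchanged.
        if key == self.primary:
            self.primary, self.secondary = self.secondary, self.primary


tree = MofrrTree(primary="eth0", secondary="eth1")
assert tree.accept("eth0") and not tree.accept("eth1")
tree.on_failure("eth0")  # e.g. loss of signal or BFD down on eth0
assert tree.accept("eth1") and not tree.accept("eth0")
```

   Because both joins are already in place, the repair in this model is
   a single swap of local state; no new tree has to be built.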

   A second option consists of comparing the packets received on the
   primary and secondary streams but only forwarding one of them: the
   first one received, no matter which interface it is received on.
   Zero packet loss is possible for RTP-based streams.

   A third option assumes a minimum known packet rate for a given data
   stream.  If a packet is not received on the primary RPF within this
   time frame, the router assumes primary path failure and switches to
   the secondary RPF interface.  50msec switch-over may be possible for
   high-rate streams (e.g., IPTV, where SD video has a continuous
   inter-packet gap of ~3msec), but in general the delay is dependent
   on the rate of the multicast stream.

   A fourth option leverages the significant improvements in IGP
   convergence speed.  When the primary path to the source is withdrawn
   by the IGP, the MoFRR-enabled router switches over to the backup
   path; that is, the UMH is changed to the secondary UMH.  Since the
   secondary path is already in place, and assuming it is disjoint from
   the primary path, convergence times do not include the time required
   to build a new tree and hence are smaller.  Sub-second to
   sub-200msec switch-over should be possible.

6.  MoFRR applicability

   MoFRR applicability is topology dependent.  The applicability is the
   same as that of LFA FRR, which is discussed in [RFC6571].

   The following section discusses MoFRR applicability to dual-plane
   network topologies.

6.1.  Dual-Plane Topology

   MoFRR works best in dual-plane topologies as illustrated in the
   figures below.  MoFRR may be enabled on any router in the network.
   In the figures below, MoFRR is shown enabled on the Provider Edge
   (PE) routers to illustrate one way in which the technology may be
   deployed.

                          S
                   P    /   \ P
                       /     \
                 ^    G1     R1  ^
                 P   /         \ P
                    /           \
                   G2----------R2   ^
                   | \         | \  P
                 ^ |  \        |  \
                 P |   G3----------R3
                   |    |      |    |
                   |    |      |    | ^
                   G4---|------R4   | P
                 ^   \  |        \  |
                 P    \ |         \ |
                       G5----------R5
                     ^  |           |  ^
                     P  |           |  P
                        |           |
                       Gi           Ri
                        \ \__    ^  /|
                         \   \   S1/ |  ^
                       ^  \  ^\   /  |P2
                       P1  \ S2\_/__ |
                            \   /   \|
                            PE1      PE2
               P = Primary path
               S = Secondary path

                   FIG1.  Two-Plane Network Design

   The topology has two planes, a primary plane and a secondary plane,
   that are fully disjoint from each other all the way into the POPs.
   This two-plane design is common in service provider networks as it
   eliminates single points of failure in their core network.  The
   links marked P indicate the normal (Primary) path along which the PIM
   joins flow from the POPs towards the source of the network.
   Multicast streams, especially for the densely watched channels,
   typically flow along both planes in the network anyway.

   The only change MoFRR adds to this is on the links marked S, where
   the PE routers join a secondary path to their secondary ECMP UMH.  As
   a result, each PE router receives two copies of the same stream, one
   from the primary plane and the other from the secondary plane.  As a
   result of normal UMH behavior, the multicast stream received over the
   primary path is accepted and forwarded to the downstream receivers.
   The copy of the stream received from the secondary UMH is discarded.

   When a router detects a routing failure on the path to its primary
   UMH, it will switch to the secondary UMH and accept packets for that
   stream.  If the failure is repaired, the router may switch back.  The
   primary and secondary UMHs have only local context and not end-to-end
   context.

   As one can see, MoFRR achieves faster convergence by pre-building
   the secondary multicast tree and receiving the traffic on that
   secondary path.
   The example discussed above is a simple case where there are two ECMP
   paths from each PE device towards the source, one along the primary
   plane and one along the secondary.  In cases where the topology is
   asymmetric or is a ring, this ECMP nature does not hold, and
   additional rules have to be taken into account to choose when and
   where to join the secondary path.

   MoFRR is appealing in such topologies for the following reasons:

   1.  Ease of deployment and simplicity: the functionality is only
       required on the PE devices, although it may be configured on all
       routers in the topology.  Furthermore, each PE device can be
       enabled separately; there is no need for network-wide
       coordination in order to deploy MoFRR.  Interoperability testing
       is not required as there are no PIM or mLDP protocol changes.

   2.  End-to-end failure detection and recovery: any failure along the
       path from the source to the PE can be detected and repaired with
       the secondary disjoint stream (see Section 5, options 2, 3,
       and 4).

   3.  Capacity efficiency: as illustrated in the previous example, the
       Multicast trees corresponding to IPTV channels cover the backbone
       and distribution topology in a very dense manner.  As a
       consequence, the secondary path grafts into the normal Multicast
       trees (i.e., trees signaled by PIM or mLDP without the MoFRR
       extension) at the aggregation level and hence does not demand any
       extra capacity either on the distribution links or in the
       backbone.  It simply uses the capacity that is normally used,
       without any duplication.  This is different from conventional FRR
       mechanisms, which often double the capacity requirements when the
       backup path crosses links/nodes that already carry the
       primary/normal tree.

   4.  Loop free: the secondary path join is sent on an ECMP disjoint
       path.
       By definition, the neighbor receiving this request is closer to
       the source and hence will not cause a loop.

   The topology we just analyzed is very frequent and can be modelled as
   per FIG2.  The PE has two ECMP disjoint paths to the source.  Each
   ECMP path uses a disjoint plane of the network.

                Source
                /     \
            Plane1   Plane2
               |       |
               A1      A2
                 \    /
                   PE

         FIG2.  PE is dual-homed to Dual-Plane Backbone

   Another frequent topology is described in FIG3.  PEs are grouped by
   pairs.  In each pair, each PE is connected to a different plane.
   Each PE has one single shortest path to a source (via its connected
   plane).  There is no ECMP like in FIG2.  However, there is clearly a
   way to provide MoFRR benefits, as each PE can offer a disjoint
   secondary path to the other plane PE (via the disjoint path).

   The MoFRR secondary neighbor selection process needs to be extended
   in this case, as one cannot simply rely on using an ECMP path as the
   secondary neighbor.  This extension is referred to as the non-ECMP
   extension and is described in Section 3.2.

                Source
                /     \
            Plane1   Plane2
               |       |
               A1      A2
               |       |
              PE1----PE2

      FIG3.  PEs are connected in pairs to Dual-Plane Backbone

6.2.  Other Topologies

   As mentioned in Section 6.1, MoFRR works best in dual-plane
   topologies.  If MoFRR is applied to non-dual-plane networks, it is
   possible that the secondary path is affected by the same failure that
   affected the primary path.  In that case, there is no guarantee that
   the backup path will provide an uninterrupted flow of packets
   without loss or duplication.

6.3.  Capacity Planning for MoFRR

   Section 6.1 described two very frequent designs (FIG2 and FIG3)
   which provide maximum MoFRR benefits.

   Designers with topologies different from FIG2 and FIG3 can still
   benefit from MoFRR thanks to the use of capacity planning tools.

   Such tools are able to simulate the ability of each PE to build two
   disjoint branches of the same tree.  They can do this for hundreds of
   PEs and hundreds of sources.

   This makes it possible to assess the MoFRR protection coverage of a
   given network for a set of sources.

   If the protection coverage is deemed insufficient, the designer can
   use such a tool to optimize the topology (add links, change IGP
   metrics).

6.4.  PE nodes

   Many Service Providers devise their topology such that PEs have
   disjoint paths to the multicast sources.  MoFRR leverages the
   existence of these disjoint paths without any PIM or mLDP protocol
   modification.  Interoperability testing is thus not required.  In
   such topologies, MoFRR only needs to be deployed on the PE devices.
   Each PE device can be enabled one by one.

6.5.  Other Applications

   While all the examples in this document show the MoFRR applicability
   on PE devices, it is clear that MoFRR could also be enabled on
   aggregation or core routers.

   MoFRR may also be attractive in Data Center network configurations.
   With the advent of lower-cost Ethernet and increasing port density in
   routers, there is more meshed connectivity than ever before.  In a
   Data Center with three layers (access, distribution, and core), there
   is a lot of inexpensive bandwidth connecting the layers.  This lends
   itself to more opportunities for ECMP paths at multiple layers, which
   allows for multiple layers of redundancy protecting against link and
   node failure at each layer with minimal redundancy cost.

   Redundancy costs are reduced because only one copy of each packet is
   forwarded on every link along the primary and secondary data paths.
   There is no duplication of data on any link, which provides
   make-before-break protection at a very small cost.

   A MoFRR router only accepts packets from the primary path and
   discards packets from the secondary path.
   For that reason, management applications (like ping and mtrace) will
   not work when verifying the secondary path.

   The MoFRR principle may be applied to MVPNs.

7.  IANA Considerations

   This document makes no request of IANA.

8.  Security Considerations

   There are no security considerations for this design other than what
   is already in the main PIM specification [RFC4601] and mLDP
   specification [RFC6388].

9.  Acknowledgments

   Thanks to Dave Oran and Alvaro Retana for their review and comments
   on this document.

   The authors would like to especially acknowledge the contribution
   from Dino Farinacci, John Zwiebel and Greg Shepherd for the genesis
   of the MoFRR concept.

10.  Contributor Addresses

   Below is a list of other contributing authors in alphabetical order:

   Dino Farinacci
   Email: farinacci@gmail.com

   Wim Henderickx
   Alcatel-Lucent
   Copernicuslaan 50
   Antwerp  2018
   Belgium
   Email: wim.henderickx@alcatel-lucent.com

   Uwe Joorde
   Deutsche Telekom
   Dahlweg 100
   D-48153 Muenster
   Germany
   Email: Uwe.Joorde@telekom.de

   Nicolai Leymann
   Deutsche Telekom
   Winterfeldtstrasse 21
   Berlin  10781
   DE
   Email: N.Leymann@telekom.de

   Jeff Tantsura
   Ericsson
   300 Holger Way
   San Jose CA  95134
   USA
   Email: jeff.tantsura@ericsson.com

11.  References

11.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC5286]  Atlas, A. and A. Zinin, "Basic Specification for IP Fast
              Reroute: Loop-Free Alternates", RFC 5286, September 2008.

11.2.  Informative References

   [RFC4601]  Fenner, B., Handley, M., Holbrook, H., and I. Kouvelas,
              "Protocol Independent Multicast - Sparse Mode (PIM-SM):
              Protocol Specification (Revised)", RFC 4601, August 2006.

   [RFC6388]  Wijnands, IJ., Minei, I., Kompella, K., and B. Thomas,
              "Label Distribution Protocol Extensions for Point-to-
              Multipoint and Multipoint-to-Multipoint Label Switched
              Paths", RFC 6388, November 2011.

   [RFC6571]  Filsfils, C., Francois, P., Shand, M., Decraene, B.,
              Uttaro, J., Leymann, N., and M. Horneffer, "Loop-Free
              Alternate (LFA) Applicability in Service Provider (SP)
              Networks", RFC 6571, June 2012.

Authors' Addresses

   Apoorva Karan
   Cisco Systems, Inc.
   3750 Cisco Way
   San Jose CA, 95134
   USA

   Email: apoorva@cisco.com


   Clarence Filsfils
   Cisco Systems, Inc.
   De kleetlaan 6a
   Diegem BRABANT 1831
   Belgium

   Email: cfilsfil@cisco.com


   IJsbrand Wijnands (editor)
   Cisco Systems, Inc.
   De Kleetlaan 6a
   Diegem  1831
   BE

   Email: ice@cisco.com


   Bruno Decraene
   Orange
   38-40 rue du General Leclerc
   Issy Moulineaux Cedex 9, 92794
   FR

   Email: bruno.decraene@orange.com