Network Working Group                                           A. Karan
Internet-Draft                                               C. Filsfils
Intended status: Informational                              D. Farinacci
Expires: September 14, 2011                          Cisco Systems, Inc.
                                                             B. Decraene
                                                          France Telecom
                                                              N. Leymann
                                                               U. Joorde
                                                        Deutsche Telekom
                                                              T. Telkamp
                                              Cariden Technologies, Inc.
                                                          March 13, 2011


                      Multicast only Fast Re-Route
                          draft-karan-mofrr-01

Abstract

   As IPTV deployments grow in number and size, service providers are
   looking for solutions that minimize the service disruption due to
   faults in the IP network carrying the packets for these services.
   This draft describes a mechanism for minimizing packet loss in a
   network when node or link failures occur.  Multicast only Fast
   Re-Route (MoFRR) works by making simple enhancements to multicast
   routing protocols such as PIM.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 14, 2011.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.
   Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Conventions used in this document
     1.2.  Terminology
   2.  Basic Overview
   3.  Topologies for MoFRR
     3.1.  Dual-Plane Topology
   4.  Detecting Failures
   5.  ECMP-mode MoFRR
   6.  Non-ECMP-mode MoFRR
     6.1.  Variation
   7.  Keep It Simple Principle
   8.  Capacity Planning for MoFRR
   9.  Other Applications
   10. Security Considerations
   11. Acknowledgments
   12. References
     12.1.  Normative References
     12.2.  Informative References
   Authors' Addresses

1.  Introduction

   Multiple techniques have been developed and deployed to improve
   service guarantees, both for multicast video traffic and Video on
   Demand traffic.  Most existing solutions are geared towards finding
   an alternate path around one or more failed network elements (link,
   node, path failures).

   This draft describes a mechanism for minimizing packet loss in a
   network when node or link failures occur.  Multicast only Fast
   Re-Route (MoFRR) works by making simple changes to the way selected
   routers use multicast protocols such as PIM.  No changes to the
   protocols themselves are required.  With MoFRR, in many cases,
   multicast routing protocols do not have to depend on, or wait for,
   unicast routing protocols to detect network failures.

   MoFRR involves transmitting a multicast join message from a receiver
   towards a source on a primary path and transmitting a secondary
   multicast join message from the receiver towards the source on a
   backup path.  Data packets are received from the primary and
   secondary paths.  The redundant packets are discarded at topology
   merge points using RPF checks.  When a failure is detected on the
   primary path, the repair occurs by changing the interface on which
   packets are accepted to the secondary interface.  Since the repair is
   local, it is fast - greatly improving convergence times in the event
   of node or link failures on the primary path.

1.1.  Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

1.2.  Terminology

   MoFRR :  Multicast only Fast Re-Route.

   ECMP :  Equal Cost Multi-Path.

   Primary Join :  Multicast join message sent from receiver towards the
      source on the primary path.

   Secondary Join :  Multicast join message sent from receiver towards
      the source on the secondary path.

2.  Basic Overview

   MoFRR uses standard PIM JOIN/PRUNE messages to set up a primary and a
   secondary multicast forwarding path by establishing a primary and a
   secondary RPF interface on each router that receives a PIM join.
   The outgoing interface list remains the same.

   Data packets are received from the primary and backup paths.
   Redundant packets received on the secondary RPF interface are
   discarded because of an RPF failure.  When the router detects a
   forwarding failure in the primary path, it changes RPF to the
   secondary path and immediately has packets available to forward out
   each outgoing interface.

   The primary and secondary MoFRR forwarding paths should not use the
   same nodes or links.  This may be configured or determined by
   computations described in this document.

   Note that the impact of the additional data on the network is
   mitigated when group membership is densely populated.  When a part of
   the network has redundant data flowing, join latency for new joining
   members is reduced because joins don't have to propagate far to get
   to on-tree routers.

3.  Topologies for MoFRR

   MoFRR works best in the topologies illustrated in the figures below.
   MoFRR may be enabled on any router in the network.  In the figures
   below, MoFRR is shown enabled on the Provider Edge (PE) routers to
   illustrate one way in which the technology may be deployed.

3.1.  Dual-Plane Topology

                        S
                 PJ    / \    PJ
                      /   \
                 ^  G1     R1  ^
                 PJ /       \  PJ
                   /         \
                  G2----------R2   ^
                  | \         | \  PJ
              ^   |  \        |  \
              PJ  |   G3----------R3
                  |    |      |    |
                  |    |      |    | ^
                  G4---|------R4   | PJ
                ^   \  |        \  |
                PJ   \ |         \ |
                      G5----------R5
                  ^   |            |  ^
                  PJ  |            |  PJ
                      |            |
                      Gi           Ri
                      \ \__    ^  /|
                       \   \ SJ1 / |  ^
                      ^ \  ^\   /  |PJ2
                    PJ1  \SJ2\_/__ |
                          \   /   \|
                          PE1      PE2

              PJ = Primary Join
              SJ = Secondary Join

                  FIG1.  Two-Plane Network Design

   The topology has two planes, a primary plane and a secondary plane
   that are fully disjoint from each other all the way into the POPs.
   This two-plane design is common in service provider networks as it
   eliminates single points of failure in their core network.
   The links marked PJ indicate the normal path of how the PIM joins
   flow from the POPs towards the source of the network.  Multicast
   streams, especially for the densely watched channels, typically flow
   along both the planes in the network anyway.

   The only change MoFRR adds to this is on the links marked SJ where
   the PE routers send a secondary PIM join to their ECMP neighbor
   towards the source.  As a result of this, each PE router receives two
   copies of the same stream, one from the primary plane and the other
   from the secondary plane.  As a result of normal multicast RPF checks
   the multicast stream received over the primary path is accepted and
   forwarded to the downstream links.  The copy of the stream received
   on the secondary path is discarded.

   When a router detects a routing failure on its primary RPF interface,
   it will switch to the secondary RPF interface and accept packets on
   that stream.  If the failure is repaired the router may switch back.
   The primary and secondary paths have only local context and not
   end-to-end context.

   As one can see, MoFRR achieves faster convergence by pre-building
   the secondary multicast tree and receiving the traffic on that
   secondary path.  The example discussed above is a simple case where
   there are two ECMP paths from each PE device towards the source, one
   along the primary plane and one along the secondary.  In cases where
   the topology is asymmetric or is a ring, this ECMP nature does not
   hold, and additional rules have to be taken into account to choose
   when and where to send the secondary PIM joins.

   MoFRR is appealing in such topologies for the following reasons:

   1.  Ease of deployment and simplicity: the functionality is only
       required on the PE devices although it may be configured on all
       routers in the topology.  Furthermore, each PE device can be
       enabled separately.
       PEs not enabled for MoFRR do not see any change or
       degradation.  Interoperability testing is not required as there
       is no PIM protocol change.

   2.  End-to-end failure detection and recovery: any failure along the
       path from the source to the PE can be detected and repaired with
       the secondary disjoint stream.

   3.  Capacity Efficiency: as illustrated in the previous example, the
       PIM trees corresponding to IPTV channels cover the backbone and
       distribution topology in a very dense manner.  As a consequence,
       the secondary joins graft into the normal PIM trees (i.e., trees
       signaled by PIM without the MoFRR extension) at the aggregation
       level and hence do not demand any extra capacity either on the
       distribution links or in the backbone.  They simply use the
       capacity that is normally used, without any duplication.  This is
       different from conventional FRR mechanisms which often duplicate
       the capacity requirements (the backup path crosses links/nodes
       which already carry the primary/normal tree and hence twice as
       much capacity is required).

   4.  Loop free: the secondary PIM join is sent on an ECMP disjoint
       path.  By definition, the neighbor receiving this secondary PIM
       join is closer to the source and hence will not send a PIM join
       back.

   The topology we just analyzed is very frequent and can be modelled as
   per Fig 2.  The PE has two ECMP disjoint paths to the source.  Each
   ECMP path uses a disjoint plane of the network.

           Source
           /    \
      Plane1  Plane2
         |      |
         A1     A2
          \    /
            PE

           FIG2.  PE is dual-homed to Dual-Plane Backbone

   Another frequent topology is described in Fig 3.  PEs are grouped in
   pairs.  In each pair, each PE is connected to a different plane.
   Each PE has one single shortest path to a source (via its connected
   plane).  There is no ECMP as in Fig 2.
   However, there is clearly a way to provide MoFRR benefits as each PE
   can offer a disjoint secondary path to the other plane PE (via the
   disjoint path).

   The MoFRR secondary neighbor selection process needs to be extended
   in this case as one cannot simply rely on using an ECMP path as
   secondary neighbor.  This extension is referred to as the non-ECMP
   extension and is described later in the document.

           Source
           /    \
      Plane1  Plane2
         |      |
         A1     A2
         |      |
        PE1----PE2

        FIG3.  PEs are connected in pairs to Dual-Plane Backbone

4.  Detecting Failures

   Once the two paths are established, the next step is detecting a
   failure on the primary path to know when to switch to the backup
   path.

   A first option consists of comparing the packets received on the
   primary and secondary streams but only forwarding one of them -- the
   first one received, no matter which interface it is received on.
   Zero packet loss is possible for RTP-based streams.

   A second option assumes a minimum known packet rate for a given data
   stream.  If a packet is not received on the primary RPF within this
   time frame, the router assumes primary path failure and switches to
   the secondary RPF interface.  50 msec switchover is possible.

   A third option leverages the significant improvements of the IGP
   convergence speed.  When the primary path to the source is withdrawn
   by the IGP, the MoFRR-enabled router switches over to the backup
   path: the RPF interface is changed to the secondary RPF interface.
   Since the secondary path is already in place, and assuming it is
   disjoint from the primary path, convergence times would not include
   the time required to build a new tree and hence are smaller.
   Realistic availability requirements (sub-second to sub-200 msec)
   should be achievable.

   A fourth option consists of leveraging connected link failure.
   This option makes sense when MoFRR is deployed across the network
   (not only at PE).

5.  ECMP-mode MoFRR

   If the IGP installs two ECMP paths to the source and if the (S, G)
   PIM state is enabled for ECMP-Mode MoFRR, the router installs them as
   primary RPF and secondary RPF.  It sends a PIM join to both RPF
   entries.  Only packets received from the primary RPF entry are
   processed.  Packets received from the secondary RPF are dropped
   (equivalent to an RPF failure).

   The selected primary RPF interface should be the same as if the MoFRR
   extension were not enabled.

   If more than two ECMP paths exist, two are selected as primary and
   secondary RPF interfaces.  Information from the IGP link-state
   topology could be leveraged to optimize this selection.

   Note, MoFRR does not restrict the number of paths on which joins are
   sent.  Implementations may use as many paths as are configured.

6.  Non-ECMP-mode MoFRR

            SourceS
             /   \
            /     \
            Backbone
           |        |
           |        |
           |        |
           X--------N

                     Fig5.  Non-ECMP-Mode MoFRR

   X is configured for MoFRR for state (S, G)
   R(X) is X's RPF to S
   N is a neighbor of X
   R(N) is N's RPF to S
   xs represents the IGP metric from X to S
   ns represents the IGP metric from N to S
   xn represents the IGP metric from X to N
   nx represents the IGP metric from N to X

   A router X configured for non-ECMP-mode MoFRR for (S, G) sends a
   primary PIM join to its primary RPF R(X) and a secondary PIM Join to
   a neighbor N if the following three conditions are met.

   C1: xs < xn + ns
   C2: ns < nx + xs
   C3: X cannot send a secondary join to N if N is the only member of
       the OIF list

   The first condition ensures that N is not on the primary branch from
   X to S.

   The second condition ensures that X is not on the primary branch from
   N to S.

   These two conditions ensure that at least locally the two paths are
   disjoint.
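
   The three conditions listed above can be sketched in a few lines of
   Python.  This is only an illustration under stated assumptions, not
   part of the mechanism itself: the Candidate type, the function name
   and the representation of the OIF list as a set of names are all
   hypothetical, and nx is taken to be the IGP metric from N to X.

```python
from dataclasses import dataclass
from typing import AbstractSet


@dataclass(frozen=True)
class Candidate:
    """A neighbor N considered for the secondary join (hypothetical type)."""
    name: str
    xn: int  # IGP metric from X to N
    nx: int  # IGP metric from N to X (assumed meaning of "nx")
    ns: int  # IGP metric from N to S


def may_send_secondary_join(xs: int, n: Candidate,
                            oif_list: AbstractSet[str]) -> bool:
    """Return True if X may send a secondary join for (S, G) to N.

    xs is the IGP metric from X to S; oif_list holds the names of the
    members of X's outgoing interface list for (S, G).
    """
    c1 = xs < n.xn + n.ns        # C1: N is not on X's primary branch to S
    c2 = n.ns < n.nx + xs        # C2: X is not on N's primary branch to S
    c3 = oif_list != {n.name}    # C3: N is not the only member of the OIF list
    return c1 and c2 and c3


if __name__ == "__main__":
    # Paired PEs as in Fig 3, with made-up metrics: X = PE1, N = PE2.
    pe2 = Candidate(name="PE2", xn=10, nx=10, ns=20)
    print(may_send_secondary_join(20, pe2, {"receiver-lan"}))  # True
    print(may_send_secondary_join(20, pe2, {"PE2"}))           # False
```

   In the second call only PE2 remains in the OIF list, so the third
   condition suppresses the secondary join.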

   The third condition is required to break control-plane loops which
   could occur in some scenarios.

   For example in FIG3, if PE1 and PE2 have received an IGMP request for
   (S, G), they will both send a primary PIM join on their plane and a
   secondary PIM join to the neighbor PE.  If their receivers leave at
   the same time, the (S, G) states on PE1 and PE2 might never get
   deleted, as each PE refreshes the other via the secondary PIM joins
   (remember that a secondary PIM join is not distinguishable from a
   primary PIM join; MoFRR does not require any PIM protocol
   modification).

   A control-plane loop occurs when two nodes keep a state forever due
   to the secondary joins they send to each other.  This condition is
   not acceptable as no real receiver is connected to the nodes
   (directly via IGMP or indirectly via PIM).  Condition C3 prevents
   this case: it prevents the mutual refresh of secondary joins, and it
   applies only in the specific case where there is no real receiver
   connected.

6.1.  Variation

   Condition C3 can be removed if condition C2 is restricted as follows:

   C2p: ns < xs

   This ensures that X only sends a secondary join to a neighbor N that
   is strictly closer to the source than X is.  By reciprocity, N will
   thus never be able to send a secondary join to the same source via
   X.  The strict inequality is key here.

   Note that this non-ECMP-mode MoFRR variation does not support the
   square topology and hence is less preferred.

7.  Keep It Simple Principle

   Many Service Providers devise their topology such that PEs have
   disjoint paths to the multicast sources.  MoFRR leverages the
   existence of these disjoint paths without any PIM protocol
   modification.  Interoperability testing is thus not required.  In
   such topologies, MoFRR only needs to be deployed on the PE devices.
   Each PE device can be enabled one by one.
   PEs not enabled for MoFRR do not see any change or degradation.

   Multicast streams with tight SLA requirements are often characterized
   by a continuous high packet rate (SD video has a continuous
   interpacket gap of ~3 msec).  MoFRR simply leverages this stream
   characteristic to detect any failure along the primary branch and
   switch over to the secondary branch in a few tens of msec.

8.  Capacity Planning for MoFRR

   As for LFA FRR (draft-ietf-rtgwg-lfa-applicability-00), MoFRR
   applicability is topology dependent.

   In this document, we have described two very frequent designs (Fig 2
   and Fig 3) which provide maximum MoFRR benefits.

   Designers with topologies different from Fig 2 and Fig 3 can still
   obtain MoFRR benefits thanks to the use of capacity planning tools.

   Such tools are able to simulate the ability of each PE to build two
   disjoint branches of the same tree, for hundreds of PEs and hundreds
   of sources.

   This makes it possible to assess the MoFRR protection coverage of a
   given network, for a set of sources.

   If the protection coverage is deemed insufficient, the designer can
   use such a tool to optimize the topology (add links, change IGP
   metrics).

9.  Other Applications

   While all the examples in this document show the MoFRR applicability
   on PE devices, it is clear that MoFRR could be enabled on aggregation
   or core routers.

   MoFRR can be popular in Data Center network configurations.  With the
   advent of lower cost Ethernet and increasing port density in routers,
   there is more meshed connectivity than ever before.  When using a
   3-level design with access, distribution, and core layers in a Data
   Center, there is a lot of inexpensive bandwidth connecting the
   layers.  This will lend itself to more opportunities for ECMP paths
   at multiple layers.  This allows for multiple layers of redundancy
   protecting against link and node failure at each layer with minimal
   redundancy cost.

   Redundancy costs are reduced because only one packet is forwarded at
   every link along the primary and secondary data paths so there is no
   duplication of data on any link thereby providing make-before-break
   protection at a very small cost.

   The MoFRR behavior described for PIM is immediately applicable to
   mLDP.  Alternate methods to detect failures such as MPLS-OAM or BFD
   may be considered.

   The MoFRR principle may be applied to MVPNs.

   The MoFRR principle may be applied to mLDP [I-D.ietf-mpls-ldp-p2mp].
   The reader may simply replace the term secondary-PIM-Join with
   secondary-Label-Map message.

10.  Security Considerations

   There are no security considerations for this design other than what
   is already in the main PIM specification [RFC4601].

11.  Acknowledgments

   The authors would like to thank John Zwiebel, Greg Shepherd and Dave
   Oran for their review of the draft.

12.  References

12.1.  Normative References

   [RFC5036]  Andersson, L., Minei, I., and B. Thomas, "LDP
              Specification", RFC 5036, October 2007.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

12.2.  Informative References

   [RFC4601]  Fenner, B., Handley, M., Holbrook, H., and I. Kouvelas,
              "Protocol Independent Multicast - Sparse Mode (PIM-SM):
              Protocol Specification (Revised)", RFC 4601, August 2006.

   [I-D.ietf-mpls-ldp-p2mp]
              Minei, I., Wijnands, I., Kompella, K., and B. Thomas,
              "Label Distribution Protocol Extensions for
              Point-to-Multipoint and Multipoint-to-Multipoint Label
              Switched Paths", draft-ietf-mpls-ldp-p2mp-12 (work in
              progress), February 2011.

Authors' Addresses

   Apoorva Karan
   Cisco Systems, Inc.
   3750 Cisco Way
   San Jose CA, 95134
   USA

   Email: apoorva@cisco.com


   Clarence Filsfils
   Cisco Systems, Inc.
   De kleetlaan 6a
   Diegem BRABANT 1831
   Belgium

   Email: cfilsfil@cisco.com


   Dino Farinacci
   Cisco Systems, Inc.
   425 East Tasman Drive
   San Jose CA, 95134
   USA

   Email: dino@cisco.com


   Bruno Decraene
   France Telecom
   38-40 rue du General Leclerc
   Issy Moulineaux cedex 9, 92794
   FR

   Email: bruno.decraene@orange-ftgroup.com


   Nicolai Leymann
   Deutsche Telekom
   Winterfeldtstrasse 21
   Berlin 10781
   DE

   Email: N.Leymann@telekom.de


   Uwe Joorde
   Deutsche Telekom
   Winterfeldtstrasse 21
   Berlin 10781
   DE

   Email: N.Leymann@telekom.de


   Thomas Telkamp
   Cariden Technologies, Inc.
   888 Villa Street, Suite 500
   Mountain View CA, 94041
   USA

   Email: telkamp@cariden.com