Network Working Group                                           A. Karan
Internet-Draft                                               C. Filsfils
Intended status: Informational                              D. Farinacci
Expires: September 3, 2009                           Cisco Systems, Inc.
                                                           March 2, 2009


                      Multicast only Fast Re-Route
                          draft-karan-mofrr-00

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on September 3, 2009.

Copyright Notice

   Copyright (c) 2009 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents in effect on the date of
   publication of this document (http://trustee.ietf.org/license-info).
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.

Abstract

   As IPTV deployments grow in number and size, service providers are
   looking for solutions that minimize the service disruption due to
   faults in the IP network carrying the packets for these services.
   This draft describes a mechanism for minimizing packet loss in a
   network when node or link failures occur.  Multicast only Fast
   Re-Route (MoFRR) works by making simple enhancements to multicast
   routing protocols such as PIM.

Table of Contents

   1.  Introduction
     1.1.  Conventions used in this document
     1.2.  Terminology
   2.  Basic Overview
   3.  Topologies for MoFRR
     3.1.  MoFRR ECMP Topology
   4.  Detecting Failures
   5.  ECMP-mode MoFRR
   6.  Non-ECMP-mode MoFRR
     6.1.  Variation
   7.  Ring Topologies
   8.  Keep It Simple Principle
   9.  Other Applications
   10. Security Considerations
   11. Acknowledgments
   12. References
     12.1.  Normative References
     12.2.  Informative References
   Authors' Addresses

1.  Introduction

   Multiple techniques have been developed and deployed to improve
   service guarantees, both for multicast video traffic and Video on
   Demand traffic.  Most existing solutions are geared towards finding
   an alternate path around one or more failed network elements (link,
   node, or path failures).

   This draft describes a mechanism for minimizing packet loss in a
   network when node or link failures occur.  Multicast only Fast
   Re-Route (MoFRR) works by making simple changes to the way selected
   routers use multicast protocols such as PIM.  No changes to the
   protocols themselves are required.  With MoFRR, multicast routing
   protocols do not necessarily have to depend on, or wait for, unicast
   routing protocols to detect network failures.

   MoFRR involves transmitting a multicast join message from a receiver
   towards a source on a primary path and transmitting a secondary
   multicast join message from the receiver towards the source on a
   backup path.  Data packets are received from the primary and
   secondary paths.  The redundant packets are discarded at topology
   merge points using RPF checks.  When a failure is detected on the
   primary path, the repair occurs by changing the interface on which
   packets are accepted to the secondary interface.  Since the repair is
   local, it is fast, greatly improving convergence times in the event
   of node or link failures on the primary path.

1.1.  Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

1.2.  Terminology

   MoFRR :  Multicast only Fast Re-Route.

   ECMP :  Equal Cost Multi-Path.

   Primary Join :  Multicast join message sent from the receiver towards
      the source on the primary path.

   Secondary Join :  Multicast join message sent from the receiver
      towards the source on the secondary path.

2.  Basic Overview

   MoFRR uses standard PIM JOIN/PRUNE messages to set up a primary and a
   secondary multicast forwarding path by establishing a primary and a
   secondary RPF interface on each router that receives a PIM join.  The
   outgoing interface list remains the same.

   Data packets are received from the primary and backup paths.
   Redundant packets received on the secondary RPF interface are
   discarded because of an RPF failure.  When the router detects a
   forwarding failure in the primary path, it changes RPF to the
   secondary path and immediately has packets available to forward out
   each outgoing interface.

   The primary and secondary MoFRR forwarding paths should not use the
   same nodes or links.  This may be configured or determined by
   computations described in this document.

   Note that the impact of the additional data on the network is
   mitigated when group membership is densely populated.  When a part of
   the network has redundant data flowing, join latency for newly
   joining members is reduced because joins do not have to propagate far
   to reach on-tree routers.

3.  Topologies for MoFRR

   MoFRR works best in the topologies illustrated in the figures below.
   MoFRR may be enabled on any router in the network.  In the figures
   below, MoFRR is shown enabled on the Provider Edge (PE) routers to
   illustrate one way in which the technology may be deployed.

3.1.  MoFRR ECMP Topology

                    S
               PJ  / \  PJ
                  /   \
           ^    G1     R1    ^
          PJ    /        \   PJ
               /          \
              G2----------R2       ^
              | \         | \      PJ
          ^   |  \        |  \
          PJ  |   G3----------R3
              |    |      |    |
              |    |      |    |   ^
              G4---|------R4   |   PJ
          ^     \  |        \  |
          PJ     \ |         \ |
                  G5----------R5
              ^    |           |   ^
              PJ   |           |   PJ
                   |           |
                  Gi          Ri
                  \ \__    ^  /|
                   \   \  SJ1/ | ^
                ^   \  ^\   /  |PJ2
               PJ1   \SJ2\_/__ |
                      \   /   \|
                       PE1     PE2

          PJ = Primary Join
          SJ = Secondary Join

                    FIG1. Two-Plane Network Design

   The topology has two planes, a primary plane and a secondary plane,
   that are fully disjoint from each other all the way into the POPs.
   This two-plane design is common in service provider networks as it
   eliminates single points of failure in their core network.  The links
   marked PJ indicate the normal path along which the PIM joins flow
   from the POPs towards the source of the network.  Multicast streams,
   especially for the densely watched channels, typically flow along
   both planes in the network anyway.

   The only change MoFRR adds to this is on the links marked SJ, where
   the PE routers send a secondary PIM join to their ECMP neighbor
   towards the source.  As a result, each PE router receives two copies
   of the same stream, one from the primary plane and the other from the
   secondary plane.  As a result of normal multicast RPF checks, the
   multicast stream received over the primary path is accepted and
   forwarded to the downstream links.  The copy of the stream received
   on the secondary path is discarded.

   When a router detects a routing failure on its primary RPF interface,
   it will switch to the secondary RPF interface and accept packets on
   that stream.  If the failure is repaired, the router may switch back.
   The primary and secondary paths have only local context and not
   end-to-end context.
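
   The accept, discard, and switch behavior described above can be
   sketched as follows.  This is only an illustration under stated
   assumptions, not an implementation of any router: the class
   MofrrState, its fields, and the interface names are hypothetical and
   do not come from PIM or from this draft.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class MofrrState:
    """Hypothetical per-(S, G) state holding two RPF interfaces."""
    primary_rpf: str
    secondary_rpf: str
    oif_list: List[str]
    active_rpf: str = field(init=False)

    def __post_init__(self) -> None:
        # Packets are initially accepted on the primary RPF interface.
        self.active_rpf = self.primary_rpf

    def receive(self, in_iface: str, packet: bytes) -> List[Tuple[str, bytes]]:
        """Return (outgoing interface, packet) pairs to forward.

        A packet arriving on any interface other than the active RPF
        interface fails the RPF check and is discarded.
        """
        if in_iface != self.active_rpf:
            return []
        return [(oif, packet) for oif in self.oif_list]

    def primary_failed(self) -> None:
        # Local repair: accept packets on the secondary RPF interface.
        self.active_rpf = self.secondary_rpf

    def primary_restored(self) -> None:
        # The router may switch back once the failure is repaired.
        self.active_rpf = self.primary_rpf


state = MofrrState("to_plane1", "to_plane2", ["to_receivers"])
accepted = state.receive("to_plane1", b"pkt")   # forwarded
discarded = state.receive("to_plane2", b"pkt")  # RPF failure, dropped
state.primary_failed()
after_switch = state.receive("to_plane2", b"pkt")  # now forwarded
```

   The outgoing interface list is never touched by the switch; only the
   interface on which packets are accepted changes, which is why the
   repair is local.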

   As one can see, MoFRR achieves faster convergence by pre-building the
   secondary multicast tree and receiving the traffic on that secondary
   path.  The example discussed above is a simple case where there are
   two ECMP paths from each PE device towards the source, one along the
   primary plane and one along the secondary.  In cases where the
   topology is asymmetric or is a ring, this ECMP nature does not hold,
   and additional rules have to be taken into account to choose when and
   where to send the secondary PIM joins.

   MoFRR is appealing in such topologies for the following reasons:

   1.  Ease of deployment and simplicity: the functionality is only
       required on the PE devices, although it may be configured on all
       routers in the topology.  Furthermore, each PE device can be
       enabled separately.  PEs not enabled for MoFRR do not see any
       change or degradation.  Interoperability testing is not required
       as there is no PIM protocol change.

   2.  End-to-end failure detection and recovery: any failure along the
       path from the source to the PE can be detected and repaired with
       the secondary disjoint stream.

   3.  Capacity efficiency: as illustrated in the previous example, the
       PIM trees corresponding to IPTV channels cover the backbone and
       distribution topology in a very dense manner.  As a consequence,
       the secondary joins graft into the normal PIM trees (i.e., trees
       signaled by PIM without the MoFRR extension) at the aggregation
       level and hence do not demand any extra capacity either on the
       distribution links or in the backbone.  They simply use the
       capacity that is normally used, without any duplication.  This is
       different from conventional FRR mechanisms, which often double
       the capacity requirements (the backup path crosses links/nodes
       which already carry the primary/normal tree and hence twice as
       much capacity is required).

   These properties are highly dependent on the topology: they require
   the existence of at least two disjoint paths from the source to the
   PE implementing the MoFRR behavior.

   We consider hereafter three types of PE connectivity:

   1.  The PE has two ECMP disjoint paths to the source.  This is a
       common case when the PE and the source are dual-homed to two
       aggregation routers, each belonging to a different plane of a
       two-plane backbone design.  One could picture this case as the
       triangle.  ECMP-mode MoFRR enables a PE to receive the same PIM
       tree from both interfaces.

                          Source
                          /    \
                     Plane1    Plane2
                        |        |
                        A1      A2
                          \    /
                            PE

                    FIG2. ECMP-mode MoFRR

   2.  The PE has one single path to the source.  A neighbor has a
       disjoint path to that source.  This is common when the source is
       dual-homed to the two planes while the PE is only connected to
       one plane directly, and to the other plane via another PE.  One
       could picture this case as the square.  Non-ECMP-mode MoFRR
       enables a PE to receive the same PIM tree from both interfaces.

                          Source
                          /    \
                     Plane1    Plane2
                        |        |
                        A1      A2
                        |        |
                       PE1------PE2

                    FIG3. Non-ECMP-mode MoFRR

   3.  The PE is part of a ring of PEs.  This ring is attached to the
       two planes via two aggregation routers, each one in a different
       plane.  PEs on the left side of the ring have a clockwise path to
       the source while PEs on the right side of the ring use an
       anti-clockwise path.  Ring-mode MoFRR enables a PE to receive the
       same PIM tree from both directions.

                          Source
                          /    \
                     Plane1    Plane2
                        |        |
                        A1      A2
                        |        |
                       PE1------PE2
                       /          \
                     PE3          PE4
                       \          /
                        \        /
                         \      /
                           PE5

                    FIG4. Ring-mode MoFRR

4.  Detecting Failures

   Once the two paths are established, the next step is detecting a
   failure on the primary path to know when to switch to the backup
   path.

   A first option consists of comparing the packets received on the
   primary and secondary streams but forwarding only one of them: the
   first one received, no matter which interface it is received on.
   Zero packet loss is possible for RTP-based streams.

   A second option assumes a minimum packet rate for a given data
   stream.  If a packet is not received on the primary RPF interface
   within the corresponding time frame, the router assumes a primary
   path failure and switches to the secondary RPF interface.  A 50 msec
   switchover is possible.

   A third option leverages the significant improvements in IGP
   convergence speed.  When the primary path to the source is withdrawn
   by the IGP, the MoFRR-enabled router switches over to the backup
   path: the RPF interface is changed to the secondary RPF interface.
   Since the secondary path is already in place, convergence times do
   not include the time required to build a new tree and hence are
   smaller.  Realistic availability requirements (sub-second to sub-200
   msec) should be achievable.

5.  ECMP-mode MoFRR

   If the IGP installs two ECMP paths to the source and if the (S, G)
   PIM state is enabled for ECMP-mode MoFRR, the router installs them as
   primary RPF and secondary RPF.  It sends a PIM join to both RPF
   entries.  Only packets received from the primary RPF entry are
   processed.  Packets received from the secondary RPF are dropped
   (equivalent to an RPF failure).

   The selected primary RPF interface should be the same as if the MoFRR
   extension were not enabled.

   If more than two ECMP paths exist, two are selected as primary and
   secondary RPF interfaces.  Information from the IGP link-state
   topology could be leveraged to optimize this selection.

   Note that MoFRR does not restrict the number of paths on which joins
   are sent.  Implementations may use as many paths as are configured.

6.  Non-ECMP-mode MoFRR

                         Source S
                          /    \
                         /      \
                         Backbone
                        |        |
                        |        |
                        |        |
                        X--------N

                    FIG5. Non-ECMP-mode MoFRR

   X is configured for MoFRR for state (S, G)
   R(X) is X's RPF to S
   N is a neighbor of X
   R(N) is N's RPF to S
   xs represents the IGP metric from X to S
   ns represents the IGP metric from N to S
   xn represents the IGP metric from X to N
   nx represents the IGP metric from N to X

   A router X configured for non-ECMP-mode MoFRR for (S, G) sends a
   primary PIM join to its primary RPF R(X) and a secondary PIM join to
   a neighbor N if the following three conditions are met.

   C1: xs < xn + ns
   C2: ns < nx + xs
   C3: X cannot send a secondary join to N if N is the only member of
       the OIF list

   The first condition ensures that N is not on the primary branch from
   X to S.

   The second condition ensures that X is not on the primary branch from
   N to S.

   These two conditions ensure that, at least locally, the two paths are
   disjoint.

   The third condition is required to break control-plane loops which
   could occur in some scenarios.

   For example, in FIG3, if PE1 and PE2 have each received an IGMP
   request for (S, G), they will both send a primary PIM join on their
   plane and a secondary PIM join to the neighbor PE.  If their
   receivers leave at the same time, it is possible for the (S, G)
   states on PE1 and PE2 never to be deleted, as each PE refreshes the
   other's state via the secondary PIM joins.  (Remember that a
   secondary PIM join is not distinguishable from a primary PIM join;
   MoFRR does not require any PIM protocol modification.)

   A control-plane loop occurs when two nodes keep a state forever due
   to the secondary joins they send to each other.  This forever
   condition is not acceptable as no real receiver is connected to the
   nodes (directly via IGMP or indirectly via PIM).
   Condition C3 prevents this case: it stops the mutual refresh of
   secondary joins, and it applies only in the specific case where no
   real receiver is connected.

6.1.  Variation

   Condition C3 can be removed if condition C2 is restricted as follows:

   C2p: ns < xs

   This ensures that X only sends a secondary join to a neighbor N that
   is strictly closer to the source than X is.  By reciprocity, N will
   thus never be able to send a secondary join to the same source via X.
   The strict inequality is key here.

   Note that this non-ECMP-mode MoFRR variation does not support the
   square topology and hence is less preferred.

7.  Ring Topologies

   This will be documented in a future version of the draft.

8.  Keep It Simple Principle

   Many service providers devise their topology such that PEs have
   disjoint paths to the multicast sources.  MoFRR leverages the
   existence of these disjoint paths without any PIM protocol
   modification.  Interoperability testing is thus not required.  In
   such topologies, MoFRR only needs to be deployed on the PE devices.
   Each PE device can be enabled one by one.  PEs not enabled for MoFRR
   do not see any change or degradation.

   Multicast streams with tight SLA requirements are often characterized
   by a continuous high packet rate (SD video has a continuous
   interpacket gap of approximately 3 msec).  MoFRR simply leverages
   this stream characteristic to detect any failure along the primary
   branch and switch over to the secondary branch within a few tens of
   msec.

9.  Other Applications

   While all the examples in this document show the applicability of
   MoFRR on PE devices, MoFRR could equally be enabled on aggregation or
   core routers.

   MoFRR may also prove popular in Data Center network configurations.
   With the advent of lower-cost Ethernet and increasing port density in
   routers, there is more meshed connectivity than ever before.
   When a Data Center uses three layers (access, distribution, and
   core), there is a lot of inexpensive bandwidth connecting the layers.
   This lends itself to more opportunities for ECMP paths at multiple
   layers.  This allows for multiple layers of redundancy, protecting
   against link and node failure at each layer with minimal redundancy
   cost.

   Redundancy costs are reduced because only one packet is forwarded on
   every link along the primary and secondary data paths, so there is no
   duplication of data on any link, thereby providing make-before-break
   protection at a very small cost.

   The MoFRR behavior described for PIM is immediately applicable to
   MLDP.  Alternate methods to detect failures, such as MPLS-OAM or BFD,
   may be considered.

   The MoFRR principle may be applied to MVPNs.

10.  Security Considerations

   There are no security considerations for this design other than what
   is already in the main PIM specification [RFC4601].

11.  Acknowledgments

   The authors would like to thank John Zwiebel, Greg Shepherd and Dave
   Oran for their review of the draft.

12.  References

12.1.  Normative References

   [RFC5036]  Andersson, L., Minei, I., and B. Thomas, "LDP
              Specification", RFC 5036, October 2007.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

12.2.  Informative References

   [RFC4601]  Fenner, B., Handley, M., Holbrook, H., and I. Kouvelas,
              "Protocol Independent Multicast - Sparse Mode (PIM-SM):
              Protocol Specification (Revised)", RFC 4601, August 2006.

Authors' Addresses

   Apoorva Karan
   Cisco Systems, Inc.
   3750 Cisco Way
   San Jose CA, 95134
   USA

   Email: apoorva@cisco.com


   Clarence Filsfils
   Cisco Systems, Inc.
   De kleetlaan 6a
   Diegem BRABANT 1831
   Belgium

   Email: cfilsfil@cisco.com


   Dino Farinacci
   Cisco Systems, Inc.
   425 East Tasman Drive
   San Jose CA, 95134
   USA

   Email: dino@cisco.com