MBONED Working Group                                        Dorian Kim
Internet Draft                                                   Verio
                                                           David Meyer
                                                         Cisco Systems
                                                          Henry Kilmer
                                                        Dino Farinacci
Category Informational
                                                         October, 1999

                 Anycast RP mechanism using PIM and MSDP

1. Status of this Memo

   This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC 2026.

   Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

2. Abstract

   This document describes a mechanism to allow for an arbitrary number of RPs per group in a single shared-tree PIM-SM domain.

   This memo is a product of the MBONE Deployment Working Group (MBONED) in the Operations and Management Area of the Internet Engineering Task Force. Submit comments to or the authors.

3. Copyright Notice

   Copyright (C) The Internet Society (1999). All Rights Reserved.

4. Introduction

   PIM-SM as currently defined allows for only a single active RP per group, and as such, deciding on optimal RP placement can become problematic for a multi-regional network deploying PIM-SM.
   The single active RP, or flat RP space, design of PIM-SM has several implications, including traffic concentration, lack of scalable load balancing and redundancy between RPs, sub-optimal forwarding of multicast packets, and distant RP dependencies. These properties of PIM-SM have been demonstrated in recent native continental and intercontinental scale multicast deployments. As a result, it became clear that ISP backbones require a mechanism that allows the definition of multiple active RPs per group in a single PIM-SM domain. Further, any such mechanism should also address the issues described above.

   The mechanism described here is intended to address the need for redundancy and load sharing among RPs in a domain. It is primarily intended for networks that use the MBGP, MSDP, and PIM-SM protocols for native multicast deployment, although it is not limited to those protocols. In particular, Anycast RP is applicable in any PIM-SM network that also supports MSDP (MSDP is required so that the various RPs in the domain maintain a consistent view of the sources that are active). Note, however, that a domain deploying Anycast RP is not required to run MBGP.

5. Problem Definition

   Anycast RP provides a solution for both redundancy and load balancing among any number of active RPs in a domain.

5.1. Traffic Concentration and Load Balancing Between RPs

   While PIM-SM allows multiple RPs to be defined for a given group, only one group-to-RP mapping can be active at a given time. A traditional deployment mechanism for load balancing between multiple RPs covering the multicast group space is to split up the 224.0.0.0/4 space between multiple defined RPs.
   This is an acceptable solution as long as multicast traffic remains low, but it has problems as multicast traffic increases, especially because the network operator defining the group space split between RPs does not always have a priori knowledge of the traffic distribution between groups. This can be overcome via periodic reconfiguration, but operational considerations cause this type of solution to scale poorly. The other alternative to periodic reconfiguration is to split the 224.0.0.0/4 space more finely between more RPs, but this solution can have the disadvantage of creating more complex RP configurations, along with the attendant operational problems when RPs are configured [CLUSTERS].

5.2. Sub-optimal Forwarding of Multicast Packets

   When a single RP serves a given multicast group, all joins to that group will be sent to that RP regardless of the topological distance between the RP and the sources and receivers. Initial data will also be sent towards the RP until the configured shortest-path tree switchover threshold is reached, or the data will always be sent towards the RP if the network is configured to always use the RP-rooted shared tree. This holds true even if all the sources and receivers are in a single region and the RP is topologically distant from them. This is an artifact of the dynamic nature of multicast group membership, and of the fact that operators may not always have a priori knowledge of the topological placement of the group members.

   Taken together, these effects can mean that (for example) although all the sources and receivers of a given group are in Europe, they join towards an RP in the USA, and the data traverses relatively expensive pipe(s) twice: once to reach the RP, and again back down the RP-rooted tree. This makes inefficient use of expensive resources.

5.3. Distant RP Dependencies

   As outlined above, a single active RP per group may cause local sources and receivers to become dependent on a topologically distant RP. Where backup RPs are configured, distant RP dependence can also be created by the failure of the primary RP, which is topologically closer. The dependence may be exacerbated by switching to a backup RP that is even more distant topologically, which, depending on network conditions at the time, may lead to inferior performance, if not outright loss of connectivity to an RP serving the group.

6. Solution

   Given the problem set outlined above, a good solution would allow an operator to define multiple RPs per group and to distribute those RPs in a topologically significant manner relative to the sources and receivers.

6.1. Mechanisms

   All the RPs serving a given group or set of groups are configured with an identical unicast address, using a numbered interface on the RPs (frequently a logical interface such as a loopback is used). The RPs then advertise group-to-RP mappings using this interface address. This causes group members (senders) to join (register) towards the topologically closest RP. The RPs MSDP peer with each other using their unique (non-shared) addresses. Note that if the router implementation chooses the shared address as the BGP router ID, then BGP peerings will not be established, since every RP would then present the same router ID. As a result, care should be taken to avoid ambiguity between the BGP router ID and the RP address (for example, when the shared address is the highest IP address configured on the router and the router implementation automatically chooses the highest interface address as its router ID). Finally, the solution described here can be implemented without any modification to existing protocols or their implementations.

6.2. Interaction with MSDP Peer-RPF check

   Each MSDP peer receives SA messages and forwards them away from the originating RP address in a "peer-RPF flooding" fashion. The notion of peer-RPF flooding is with respect to forwarding SA messages: the BGP or MBGP routing tables are examined to determine which peer is the next hop towards the originating RP of the SA message. Such a peer is called an "RPF peer". A few simple rules govern how MSDP performs peer-RPF checks; these rules should be kept in mind when configuring Anycast RP:

6.2.1. Singly Homed MSDP Speaker

   A singly homed MSDP speaker always accepts SA messages from its peer.

6.2.2. RP in SA is an MSDP Peer

   An MSDP speaker always accepts SAs for which the RP in the SA message is a peer.

6.2.3. Router is itself the RP in the SA message

   An MSDP speaker always rejects an SA from any peer if it is itself the RP in the SA message.

6.2.4. Router has default peers

   If an MSDP speaker has one or more default peers configured, then it will accept an SA message if it comes from the default peer for the RP in the SA message.

6.2.5. Complex MSDP Scenario

   Consider routers R1, R2, and R3, which form an Anycast RP mesh for an AS. C1 and C2 are customer routers (and the BGP sessions are not multi-hop); RP is C1's RP. P1 is a peer router. The picture is as follows:

             MSDP/MBGP       Anycast RPs
        C1 ============= R1----------------R2
        |  SA(S,G,RP)      \               /
        |                   \             /
        | MBGP/MSDP     MSDP \           / MSDP
        |                     \         /
        |                      \       /
        C2 SA(S',G',RP')        \     /
                                 \   /
                                  R3 ============== P1
                                       SA(S,G,RP)
                                       -------->

6.2.6. Internal MSDP Peering

   When R1 receives SA(S,G,RP) from C1, it finds that the next hop towards the prefix covering RP is through C1, so R1 accepts the SA. R1 will forward SA(S,G,RP) to R2 and R3, which will only accept SA(S,G,RP) if R1 is announcing (and is) the next hop towards the originating RP of each SA message.
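   The peer-RPF rules of Sections 6.2.1 through 6.2.4, together with the internal-peering behaviour just described for R1, R2, and R3, can be summarized as a small decision procedure. The sketch below is a hypothetical Python model, not taken from any MSDP implementation. It assumes that the BGP/MBGP table can be reduced to a prefix-to-advertising-peer dictionary, that default peers are not filtered per RP, and that the reject rule of 6.2.3 takes precedence over the others. All addresses are made-up documentation or private addresses.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from ipaddress import ip_address, ip_network


@dataclass
class MsdpSpeaker:
    """Hypothetical model of the state one MSDP speaker uses for peer-RPF checks."""

    own_addresses: frozenset[str]  # local addresses, including a shared anycast RP address
    peers: frozenset[str]          # MSDP peer (connect-source) addresses
    # Stand-in for the BGP/MBGP table: prefix -> MSDP peer advertising the
    # best route (next hop) towards that prefix.
    rpf_routes: dict[str, str] = field(default_factory=dict)
    default_peers: frozenset[str] = frozenset()

    def rpf_peer(self, rp: str) -> str | None:
        """Longest-prefix match of the originating RP against rpf_routes."""
        addr = ip_address(rp)
        best_len, best_peer = -1, None
        for prefix, peer in self.rpf_routes.items():
            net = ip_network(prefix)
            if addr in net and net.prefixlen > best_len:
                best_len, best_peer = net.prefixlen, peer
        return best_peer

    def accept_sa(self, from_peer: str, rp: str) -> bool:
        """True if an SA originated by `rp`, received from `from_peer`, passes peer-RPF."""
        if rp in self.own_addresses:         # 6.2.3 (assumed to take precedence)
            return False
        if len(self.peers) == 1:             # 6.2.1: singly homed speaker
            return True
        if rp in self.peers:                 # 6.2.2: RP in the SA is a peer
            return True
        if from_peer in self.default_peers:  # 6.2.4, without per-RP filtering
            return True
        # Otherwise the sender must be the RPF peer towards the originating RP.
        return self.rpf_peer(rp) == from_peer


# Hypothetical addresses for the R1/R2/R3 scenario of Section 6.2.5.
ANYCAST_RP = "10.0.0.1"                          # shared by R1, R2, R3
R1, R2, R3 = "10.1.1.1", "10.1.1.2", "10.1.1.3"  # unique connect-sources
C1, C2 = "192.0.2.1", "192.0.2.2"
RP, RP2 = "198.51.100.1", "203.0.113.10"         # RP (C1's RP) and RP' (C2's RP)

r1 = MsdpSpeaker(
    own_addresses=frozenset({ANYCAST_RP, R1}),
    peers=frozenset({C1, C2, R2, R3}),           # C2 peering directly with R1
    rpf_routes={"198.51.100.0/24": C1, "203.0.113.0/24": C1},
)
r2 = MsdpSpeaker(
    own_addresses=frozenset({ANYCAST_RP, R2}),
    peers=frozenset({R1, R3}),
    rpf_routes={"198.51.100.0/24": R1},          # R1 announces itself as next hop
)

print(r1.accept_sa(C1, RP))          # True:  C1 is the RPF peer towards RP
print(r1.accept_sa(C2, RP2))         # False: R1 expects RP' via C1
print(r2.accept_sa(R1, RP))          # True:  R1 advertises the route towards RP
print(r2.accept_sa(R3, RP))          # False: R3 is not the RPF peer
print(r2.accept_sa(R1, ANYCAST_RP))  # False: R2 is itself the RP
```

   In this model R2 rejects the copy of SA(S,G,RP) that R3 forwards, because R3 is not the advertiser of the route towards RP, which matches the internal-peering behaviour described for R1, R2, and R3.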
   Operationally, for R2 and R3 to accept the SA, the next-hop address that R1 announces needs to be the same as its MSDP connect-source address.

   This implies that if you want to pass on an SA internally, you have to be announcing the next hop towards the AS that originates the prefix covering the originating RP. Note that the MSDP connect-source has to be the interface that is configured with the next-hop address.

   Now, if C2 tries to MSDP peer with R1 directly (cutting out transit provider C1), then C2's SA fails the RPF check at R1, because R1 expects the SA message to come from an MSDP peer in the next AS in the AS-PATH towards the originating RP, which in this case would be C1.

6.2.6.1. RULE

   An internal MSDP peer will accept an SA message from another internal peer iff that peer is the advertiser of the route towards the prefix covering the RP which originated the SA.

6.2.7. External MSDP Peering

   External peer P1 will accept an SA from R3 iff R3 is in the next AS in the path towards the originating RP. This breaks, for example, if P1 peers with C1.

6.2.7.1. RULE

   An external MSDP peer will accept an SA message from another peer iff the peer is in the next AS in the path towards the AS originating the prefix covering the RP in the SA message.

6.3. Further Applications of Anycast RP mechanism

   The solution described above can also be applied to external MSDP peers that are used to join two PIM-SM domains together. This can provide redundancy for the MSDP peering session, reduce operational complexity, and simplify configuration management. A side effect to be aware of with this design is that which of the configured MSDP sessions comes up is determined by the unicast topology between the two providers, and can be somewhat unpredictable. If any of the backup peering sessions resets, the active session will also reset.

7. Multicast State Scaling

   Let k = m + r, where

      r = number of sources registering to an RP
      m = number of internal sources learned through MSDP
      p = number of anycast (internal) MSDP peers

   For p = 1, m = 0:

      0 receivers          ==> 1 (*,G) + 0 SAs
      1 or more receivers  ==> k (S,G) + 0 SAs

   For p > 1, m != 0:

      0 receivers          ==> 1 (*,G) + m SAs
      1 or more receivers  ==> k (S,G) + m SAs

   Importantly, the multicast state growth is O(k), where k is not a function of p, the number of anycast RP peers.

8. Security considerations

   Since the solution described here makes heavy use of anycast addressing, care must be taken to avoid spoofing. In particular, unicast routing and PIM RPs must be protected.

8.1. Unicast Routing

   Both internal and external unicast routing can be weakly protected with keyed MD5 [RFC1828], as implemented in an interior routing protocol such as OSPF [RFC2328] or in BGP [RFC2385]. More generally, IPsec [RFC1825] could be used to provide protocol integrity for the unicast routing system.

8.2. Multicast Protocol Integrity

   The mechanisms described in [PIMAUTH] should be used to provide protocol message integrity protection and group-wise message origin authentication.

8.3. MSDP Peer Integrity

   As is the case for BGP, MSDP peers can be protected using keyed MD5 [RFC1828].

9. Acknowledgments

   John Meylor, Dave Thaler and Tom Pusateri provided insightful comments on earlier versions of this idea.

10. References

   [CLUSTERS] Farinacci, D., et al., "Use of Anycast Clusters for Inter-Domain Multicast Routing", draft-ietf-farinacci-anycast-clusters-01.txt, March 1998. ftp://ftpeng.cisco.com/ipmulticast/internet-drafts

   [MSDP]     Farinacci, D., et al., "Multicast Source Discovery Protocol (MSDP)", draft-farinacci-msdp-00.txt, June 1998.

   [PIMAUTH]  Wei, L., et al., "Authenticating PIM version 2 messages", draft-ietf-pim-v2-auth-00.txt, November 1998.

   [RFC1825]  Atkinson, R., "IP Security Architecture", RFC 1825, August 1995.

   [RFC1828]  Metzger, P. and W. Simpson, "IP Authentication using Keyed MD5", RFC 1828, August 1995.

   [RFC2328]  Moy, J., "OSPF Version 2", RFC 2328, April 1998.

   [RFC2362]  Estrin, D., et al., "Protocol Independent Multicast-Sparse Mode (PIM-SM): Protocol Specification", RFC 2362, June 1998.

   [RFC2385]  Heffernan, A., "Protection of BGP Sessions via the TCP MD5 Signature Option", RFC 2385, August 1998.

   [RFC2403]  Madson, C. and R. Glenn, "The Use of HMAC-MD5-96 within ESP and AH", RFC 2403, November 1998.

11. Authors' Addresses

   Dorian Kim
   Verio, Inc.
   2361 Lancashire Dr. #2A
   Ann Arbor, MI 48015
   Email: dorian@blackrose.org

   Hank Kilmer
   Email: hank@rem.com

   Dino Farinacci
   Email: dino@dinof.net

   David Meyer
   Cisco Systems, Inc.
   170 Tasman Drive
   San Jose, CA, 95134
   Email: dmm@cisco.com