Internet Engineering Task Force                                   PIM WG
INTERNET-DRAFT                                         Mark Handley/ICIR
draft-ietf-pim-bidir-04.txt                        Isidor Kouvelas/Cisco
                                                     Tony Speakman/Cisco
                                                  Lorenzo Vicisano/Cisco
                                                            26 June 2002
Expires: December 2002

       Bi-directional Protocol Independent Multicast (BIDIR-PIM)

Status of this Document

This document is an Internet-Draft and is in full conformance with all
provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups. Note that other groups
may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

This document is a product of the IETF PIM WG. Comments should be
addressed to the authors, or the WG's mailing list at
pim@catarina.usc.edu.

Abstract

     This document discusses Bi-directional PIM, a variant of PIM
     Sparse-Mode [9] that builds bi-directional shared trees
     connecting multicast sources and receivers. Bi-directional
     trees are built using a fail-safe Designated Forwarder (DF)
     election mechanism operating on each link of a multicast
     topology.
     With the assistance of the DF, multicast data is
     natively forwarded from sources to the Rendezvous-Point and
     hence along the shared tree to receivers without requiring
     source-specific state. The DF election takes place at RP
     discovery time and provides a default route to the RP thus
     eliminating the requirement for data-driven protocol events.

Note on BIDIR-PIM status

The differences between this version of the BIDIR-PIM specification and
draft-ietf-pim-bidir-new-00.txt are mostly in the format of the
information presented. As BIDIR-PIM has many similarities in operation
to Sparse-Mode PIM, the earlier version of this spec relied heavily on
the now obsolete PIM-SM [11] specification. This revision removes this
dependency and instead references the new Sparse-Mode documentation [9]
where necessary. In addition, the way in which the protocol
specification is presented has been updated to follow the format of [9].

Table of Contents

1. Introduction
2. Terminology
2.1. Definitions
2.2. Pseudocode Notation
3. Protocol Specification
3.1. BIDIR-PIM Protocol State
3.1.1. General Purpose State
3.1.2. RP State
3.1.3. Group State
3.1.4. State Summarization Macros
3.2. PIM Neighbor Discovery
3.3. Data Packet Forwarding Rules
3.3.1. Source-Only Branches
3.4. PIM Join/Prune Messages
3.4.1. Receiving (*,G) Join/Prune Messages
3.4.2. Sending Join/Prune Messages
3.5. Designated Forwarder (DF) Election
3.5.1. DF Requirements
3.5.2. DF Election description
3.5.2.1. Bootstrap Election
3.5.2.2. Loser Metric Changes
3.5.2.3. Winner Metric Changes
3.5.2.4. Winner Loses Path
3.5.2.5. Late Router Starting Up
3.5.2.6. Winner Dies
3.5.3. Election Protocol Specification
3.5.3.1. Election State
3.5.3.2. Election Messages
3.5.3.3. Election Events
3.5.3.4. Election Notation
3.5.3.5. Election State Transitions
3.6. Timers and Constants
3.7. BIDIR PIM Packet Formats
3.7.1. DF Election Packet Formats
3.7.2. Backoff Message
3.7.3. Pass Message
3.7.4. Bidir Capable PIM-Hello Option
4. RP Discovery
5. Security Considerations
5.1. Appendix A: Election Reliability Enhancements
5.1.1. A.1 Missing Pass
5.1.2. A.2 Periodic Winner Announcement
5.2. Appendix B: Interoperability with legacy code
5.3. Appendix C: Comparison with PIM-SM
6. Todo list...
7. Authors' Addresses
8. Acknowledgments
9. References
10. Index

1. Introduction

This document specifies Bi-directional PIM, a variant of PIM Sparse-Mode
(PIM-SM) [9] that builds bi-directional shared trees connecting
multicast sources and receivers.

PIM-SM constructs uni-directional shared trees that are used to forward
data from senders to receivers of a multicast group. PIM-SM also allows
the construction of source specific trees, but this capability is not
related to the protocol described in this document.

The shared tree for each multicast group is rooted at a multicast router
called the Rendezvous Point (RP). Different multicast group ranges can
use separate RPs within a PIM domain.

In unidirectional PIM-SM, there are two possible methods for
distributing data packets on the shared tree. These differ in the way
packets are forwarded from a source to the RP:

o Initially when a source starts transmitting, its first hop router
  encapsulates data packets in special control messages (Registers)
  which are unicast to the RP. After reaching the RP the packets are
  decapsulated and distributed on the shared tree.

o A transition from the above distribution mode can be made at a later
  stage.
  This is achieved by building source specific state on all
  routers along the path between the source and the RP. This state is
  then used to natively forward packets from that source.

Both these mechanisms suffer from problems. Encapsulation results in
significant processing, bandwidth and delay overheads. Forwarding using
source specific state has additional protocol and memory requirements.

Bi-directional PIM dispenses with both encapsulation and source state by
allowing packets to be natively forwarded from a source to the RP using
shared tree state. For a complete discussion of the pros and cons of
Bi-directional PIM consult appendix C.

2. Terminology

In this document, the key words "MUST", "MUST NOT", "REQUIRED", "SHALL",
"SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
"OPTIONAL" are to be interpreted as described in RFC 2119 and indicate
requirement levels for compliant BIDIR-PIM implementations.

2.1. Definitions

This specification uses a number of terms to refer to the roles of
routers participating in BIDIR-PIM. The following terms have special
significance for BIDIR-PIM:

MRIB  Multicast Routing Information Base. This is the multicast
      topology table, which is typically derived from the unicast
      routing table, or routing protocols such as MBGP that carry
      multicast-specific topology information. It is used by PIM for
      establishing the RPF interface (used in the forwarding rules). In
      PIM-SM the MRIB is also used to make decisions regarding where to
      forward Join/Prune messages whereas in BIDIR-PIM it is used as a
      source for routing metrics for the DF election process.

Rendezvous Point (RP):
      An RP is a router that has been configured to be used as the root
      of the distribution tree for a range of multicast groups. Join
      messages from receivers for a group are sent towards the RP.

Upstream
      Towards the root (Rendezvous-Point) of the tree. The direction
      used by packets traveling from sources to the RP.

Downstream
      Away from the root of the tree. The direction in which packets
      travel from the RP to receivers.

Designated Forwarder (DF):
      The protocol presented in this document is largely based on the
      concept of a Designated Forwarder (DF). A single DF exists for
      each RP on every link within a BIDIR-PIM domain (this includes
      both multi-access and point-to-point links). The DF is the router
      on the link with the best unicast route to the RP. A DF for a
      given RP is in charge of forwarding downstream traffic onto the
      link, and forwarding upstream traffic from the link towards the
      RP. It does this for all the bi-directional groups served by the
      RP. The DF on a link is also responsible for interpreting IGMP
      information from local receivers and processing Join messages from
      other routers on the link.

RPF Interface
      RPF stands for "Reverse Path Forwarding". The RPF Interface of a
      router with respect to an address is the interface that the MRIB
      indicates should be used to forward packets to that address. In
      the case of a BIDIR-PIM multicast group, the RPF interface is the
      interface that would be used to send packets to the RP for the
      group.

RPF Neighbor
      The RPF Neighbor of a router with respect to an address is the
      neighbor that the MRIB indicates should be used to forward packets
      to that address. Note that in BIDIR-PIM, the RPF neighbor for a
      group is not necessarily the router on the RPF interface that Join
      messages for that group would be directed to (Join messages are
      directed to the DF on the RPF interface for the group).

TIB   Tree Information Base.
      This is the collection of state at a PIM
      router that has been created by receiving PIM Join/Prune messages,
      PIM DF election messages and IGMP information from local hosts.
      It essentially stores the state of all multicast distribution
      trees at that router.

MFIB  Multicast Forwarding Information Base. The TIB holds all the
      state that is necessary to forward multicast packets at a router.
      However, although this specification defines forwarding in terms
      of the TIB, to actually forward packets using the TIB is very
      inefficient. Instead a real router implementation will normally
      build an efficient MFIB from the TIB state to perform forwarding.
      How this is done is implementation-specific, and is not discussed
      in this document.

2.2. Pseudocode Notation

We use set notation in several places in this specification.

A (+) B
      is the union of two sets A and B.

A (-) B
      is the elements of set A that are not in set B.

NULL
      is the empty set or list.

In addition we use C-like syntax:

o  =  denotes assignment of a variable.

o  == denotes a comparison for equality.

o  != denotes a comparison for inequality.

o  Braces { and } are used for grouping.

3. Protocol Specification

The specification of BIDIR-PIM is broken into several parts:

o Section 3.1 details the protocol state stored.

o Section 3.3 specifies the data packet forwarding rules.

o Section 3.4 specifies the BIDIR-PIM Join/Prune generation and
  processing rules.

o Designated Forwarder (DF) election is specified in Section 3.5.

o PIM packet formats are specified in Section 3.7.

o A summary of BIDIR-PIM timers and their default values is given in
  Section 3.6.

3.1. BIDIR-PIM Protocol State

This section specifies all the protocol state that a BIDIR-PIM
implementation should maintain in order to function correctly.
We term this state the Tree Information Base or TIB, as it holds the
state of all the multicast distribution trees at this router. In this
specification we define PIM mechanisms in terms of the TIB. However,
only a very simple implementation would actually implement packet
forwarding operations in terms of this state. Most implementations will
use this state to build a multicast forwarding table, which would then
be updated when the relevant state in the TIB changes.

Although we specify precisely the state to be kept, this does not mean
that an implementation of BIDIR-PIM needs to hold the state in this
form. This is actually an abstract state definition, which is needed in
order to specify the router's behavior. A BIDIR-PIM implementation is
free to hold whatever internal state it requires, and will still be
conformant with this specification so long as it results in the same
externally visible protocol behavior as an abstract router that holds
the following state.

We divide TIB state into two sections:

RP state
   State that maintains the DF election information for each RP.

Group state
   State that maintains a group-specific tree for groups that map to a
   given RP.

The state that should be kept is described below. Of course,
implementations will only maintain state when it is relevant to
forwarding operations - for example, the "NoInfo" state might be assumed
from the lack of other state information, rather than being held
explicitly.

3.1.1. General Purpose State

A router holds the following state that is not specific to an RP or
group:

Neighbor State:

   For each neighbor:

      o Information from neighbor's Hello

      o Neighbor's Gen ID.

      o Neighbor liveness timer (NLT)

3.1.2. RP State

A router maintains a multicast-group to RP mapping which is built
through static configuration or by using an automatic RP discovery
mechanism like BSR or AUTO-RP (see section 4). For each BIDIR-PIM RP a
router holds the following state:

o RP address

Designated Forwarder (DF) State:

   For each router interface:

      Acting DF information:

         o DF IP Address

         o DF metric

      Election information:

         o Election State

         o DF Election-Timer (DFT)

         o Offer-Count (OC)

         Current best offer:

            o IP address of best offering router

            o Best offering router metric

Designated Forwarder state is described in section 3.5.

3.1.3. Group State

For every group G a router keeps the following state:

Group state:

   For each interface:

      Local Membership:

         o State: One of {"NoInfo", "Include"}

      PIM Join/Prune State:

         o State: One of {"NoInfo" (NI), "Join" (J),
           "PrunePending" (PP)}

         o Prune Pending Timer (PPT)

         o Join/Prune Expiry Timer (ET)

   Not interface specific:

      o Upstream Join/Prune Timer (JT)

      o Last RP Used

Local membership is the result of the local membership mechanism (such
as IGMP) running on that interface. This information is used by the
pim_include(*,G) macro described in section 3.1.4.

PIM Join/Prune state is the result of receiving PIM (*,G) Join/Prune
messages on this interface, and is specified in section 3.4.1. The state
is used by the macros that calculate the outgoing interface list in
section 3.1.4, and in the JoinDesired(G) macro (defined in section
3.4.2) that is used in deciding whether a Join(*,G) should be sent
upstream.

The upstream Join/Prune timer is used to send out periodic Join(*,G)
messages, and to override Prune(*,G) messages from peers on an upstream
LAN interface.

The last RP used must be stored because if the RP Set changes [9] then
state must be torn down and rebuilt for groups whose RP changes.
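
The group state listed above can be pictured as a small data structure.
The following Python sketch is only an illustration of the abstract
state, not a required layout: the class and field names are
hypothetical, timers are modeled as optional deadlines in seconds, and
"NoInfo" is implied by the absence of an entry, as the text permits.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class LocalMembership(Enum):
    NO_INFO = "NoInfo"
    INCLUDE = "Include"


class JoinPruneState(Enum):
    NO_INFO = "NoInfo"
    JOIN = "Join"
    PRUNE_PENDING = "PrunePending"


@dataclass
class GroupInterfaceState:
    """Per-interface state kept for one group (hypothetical layout)."""
    local_membership: LocalMembership = LocalMembership.NO_INFO
    jp_state: JoinPruneState = JoinPruneState.NO_INFO
    prune_pending_timer: Optional[float] = None  # PPT deadline; None = not running
    expiry_timer: Optional[float] = None         # ET deadline; None = not running


@dataclass
class GroupState:
    """State kept for one group G, including the parts that are not per interface."""
    group: str
    last_rp_used: Optional[str] = None
    upstream_join_timer: Optional[float] = None  # JT deadline; None = not running
    interfaces: Dict[str, GroupInterfaceState] = field(default_factory=dict)

    def interface(self, name: str) -> GroupInterfaceState:
        # An interface with no entry is treated as NoInfo and created lazily.
        return self.interfaces.setdefault(name, GroupInterfaceState())

    def rp_changed(self, current_rp: str) -> bool:
        # True when the RP for this group differs from the one last used,
        # which is the case in which state must be torn down and rebuilt.
        return self.last_rp_used is not None and self.last_rp_used != current_rp
```

For example, `GroupState(group="239.1.1.1", last_rp_used="10.0.0.1")`
reports `rp_changed("10.0.0.2")` as true; the addresses are made up for
the illustration.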

3.1.4. State Summarization Macros

Using this state, we define the following "macro" definitions which we
will use in the descriptions of the state machines and pseudocode in the
following sections.

   olist(G) =
       RPF_interface(RP(G)) (+) joins(G) (+) pim_include(G)

RPF_interface(RP) is the interface the MRIB indicates would be used to
route packets to RP. The olist(G) is the list of interfaces on which
packets to group G must be forwarded.

The macro pim_include(G) indicates the interfaces to which traffic might
be forwarded because of hosts that are local members on that interface.

   pim_include(G) =
       { all interfaces I such that:
         I_am_DF(RP(G),I) AND local_receiver_include(G,I) }

The clause "I_am_DF(RP,I)" is TRUE if the router is in the Win or
Backoff states in the DF election state machine for interface I
(described in section 3.5). Otherwise it is FALSE.

The clause "local_receiver_include(G,I)" is TRUE if the IGMP module or
other local membership mechanism has determined that there are local
members on interface I that desire to receive traffic sent to group G.

The set "joins(G)" is the set of all interfaces on which the router has
received (*,G) Joins:

   joins(G) =
       { all interfaces I such that
         I_am_DF(RP(G),I) AND
         DownstreamJPState(G,I) is either Join or PrunePending }

DownstreamJPState(G,I) is the state of the finite state machine in
section 3.4.1.

RPF_DF(RP) is the neighbor that Join messages must be sent to in order
to reach the RP. This is the Designated-Forwarder on the
RPF_interface(RP).

3.2. PIM Neighbor Discovery

PIM routers exchange PIM-Hello messages with their neighboring PIM
routers. These messages are used to update the Neighbor State described
in section 3.1.
The procedures for generating and processing received
Hello messages as well as maintaining Neighbor State are specified in
the PIM-SM [9] documentation.

Bidir PIM introduces the Bidir_Capable PIM-Hello option that MUST be
included in all Hello messages from a Bidir-PIM capable router. The
Bidir_Capable option advertises the router's ability to participate in
the Bidir-PIM protocol. The format of the Bidir_Capable option is
described in section 3.7.

3.3. Data Packet Forwarding Rules

For groups mapping to a given RP, the following responsibilities are
uniquely assigned to the DF for that RP on each link:

o The DF is the only router that forwards packets traveling downstream
  onto the link.

o The DF is the only router that picks up upstream-traveling packets
  off the link to forward towards the RP.

Non-DF routers on a link that use that link as their RPF interface to
reach the RP may perform the following forwarding actions for
bidirectional groups:

o Forward packets from the link towards downstream receivers.

o Forward packets from downstream sources onto the link (provided they
  are the DF for the downstream link from which the packet was
  picked up).

The BIDIR-PIM packet forwarding rules are defined below in pseudocode.

   iif is the incoming interface of the packet.
   G is the destination address of the packet (group address).
   RP is the address of the Rendezvous Point for this group.

First we check to see whether the packet should be accepted based on TIB
state and the interface that the packet arrived on. A packet is accepted
if it arrives on the RPF_interface to reach the RP (downstream traveling
packet) or if the router is the DF on the interface on which the packet
arrives (upstream traveling packet).

If the packet should be forwarded we build an outgoing interface list
for the packet.

Finally we remove the incoming interface from the outgoing interface
list we've created, and if the resulting outgoing interface list is not
empty, we forward the packet out of those interfaces.

On receipt of data to G on interface iif:

 if( iif == RPF_interface(RP) || I_am_DF(RP,iif) ) {
    oiflist = olist(G) (-) iif
    forward packet on all interfaces in oiflist
 }

Note: A major advantage of using a Designated Forwarder in BIDIR-PIM
compared to PIM-SM is that special treatment is no longer required for
sources that are directly connected to a router. Data from such sources
does not need to be differentiated from other multicast traffic and will
automatically be picked up by the DF. This removes the need for
performing a directly-connected-source check for data to groups that do
not have existing state.

3.3.1. Source-Only Branches

Source-only branches of the distribution tree for a group G are branches
which do not lead to any receivers, but which are used to forward
packets traveling upstream from sources towards the RP. Routers along
source-only branches only have the RPF_interface to the RP in their
olist for G and hence do not need to maintain any group specific state.
Upstream forwarding can be performed using RP state. An implementation
may decide to maintain group state for source-only branches for
accounting or performance reasons.

3.4. PIM Join/Prune Messages

A BIDIR-PIM Join/Prune message consists of a list of Joined and Pruned
Groups. When processing a received Join/Prune message, each Joined or
Pruned Group is effectively considered individually by applying the
following state machines. When considering a Join/Prune message whose
PIM Destination field addresses this router, (*,G) Joins and Prunes can
affect the downstream state machine.
When considering a Join/Prune message whose PIM Destination field
addresses another router, most Join or Prune entries could affect the
upstream state machine.

3.4.1. Receiving (*,G) Join/Prune Messages

When a router receives a Join(*,G) or Prune(*,G) it must first check to
see whether the RP in the message matches RP(G) (the router's idea of
who the RP is). If the RP in the message does not match RP(G) the Join
or Prune MUST be silently dropped. In addition a router MUST NOT process
Join(*,G) messages targeted to itself if it is not the DF for RP(G) on
the interface on which the message was received.

The per-interface state-machine for receiving (*,G) Join/Prune Messages
is given below. There are three states:

   NoInfo (NI)
         The interface has no (*,G) Join state and no timers running.

   Join (J)
         The interface has (*,G) Join state which will cause us to
         forward packets destined for G from this interface.

   PrunePending (PP)
         The router has received a Prune(*,G) on this interface from a
         downstream neighbor and is waiting to see whether the prune
         will be overridden by another downstream router. For
         forwarding purposes, the PrunePending state functions exactly
         like the Join state.

In addition the state-machine uses two timers:

   ExpiryTimer (ET)
         This timer is restarted when a valid Join(*,G) is received.
         Expiry of the ExpiryTimer causes the interface state to revert
         to NoInfo for this group.

   PrunePendingTimer (PPT)
         This timer is set when a valid Prune(*,G) is received. Expiry
         of the PrunePendingTimer causes the interface state to revert
         to NoInfo for this group.
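
The three states and two timers described above can be sketched in
Python as follows. This is a minimal illustration under stated
assumptions, not the normative state machine: the class and method
names are hypothetical, timers are modeled as absolute deadlines that
the caller polls, a Prune received in NoInfo is assumed to leave the
state unchanged, and actions such as sending a PruneEcho or reacting to
the loss of DF status are left out.

```python
from enum import Enum
from typing import Optional


class State(Enum):
    NO_INFO = "NI"
    JOIN = "J"
    PRUNE_PENDING = "PP"


class DownstreamInterface:
    """Per-interface (*,G) downstream state (hypothetical sketch)."""

    def __init__(self) -> None:
        self.state = State.NO_INFO
        self.expiry_deadline: Optional[float] = None         # ExpiryTimer (ET)
        self.prune_pending_deadline: Optional[float] = None  # PrunePendingTimer (PPT)

    def receive_join(self, now: float, holdtime: float) -> None:
        # A valid Join(*,G) moves to Join, (re)starts ET and cancels PPT.
        self.state = State.JOIN
        self.expiry_deadline = now + holdtime
        self.prune_pending_deadline = None

    def receive_prune(self, now: float, override_interval: float) -> None:
        # A valid Prune(*,G) in Join starts PPT and waits for an override.
        # Assumption: a Prune in NoInfo or PrunePending changes nothing here.
        if self.state is State.JOIN:
            self.state = State.PRUNE_PENDING
            self.prune_pending_deadline = now + override_interval

    def poll(self, now: float) -> None:
        # Expiry of either timer reverts the interface to NoInfo.
        ppt, et = self.prune_pending_deadline, self.expiry_deadline
        if (ppt is not None and now >= ppt) or (et is not None and now >= et):
            self.state = State.NO_INFO
            self.expiry_deadline = None
            self.prune_pending_deadline = None

    def forwards(self) -> bool:
        # PrunePending behaves exactly like Join for forwarding purposes.
        return self.state in (State.JOIN, State.PRUNE_PENDING)
```

With illustrative values, a Join with a 210 second HoldTime followed by
a Prune keeps `forwards()` true until the override interval passes
without another Join.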

                  +-----------------------------------+
                  | Figures omitted from text version |
                  +-----------------------------------+

         Figure 1: Downstream group per-interface state-machine

In tabular form, the group per-interface state-machine is:

+----------+------------------------------------------------------------+
|          |                           Event                            |
|          +----------+------------+-----------+------------+-----------+
|Prev State|Receive   |Receive     |Prune      |Expiry      |Stop Being |
|          |Join(*,G) |Prune(*,G)  |Pending    |Timer       |DF on I    |
|          |          |            |Timer      |Expires     |           |
|          |          |            |Expires    |            |           |
+----------+----------+------------+-----------+------------+-----------+
|          |-> J state|-> NI state |-          |-           |-          |
|NoInfo    |start     |            |           |            |           |
|(NI)      |Expiry    |            |           |            |           |
|          |Timer     |            |           |            |           |
+----------+----------+------------+-----------+------------+-----------+
|          |-> J state|-> PP state |-          |-> NI state |-> NI state|
|Join (J)  |restart   |start Prune |           |            |           |
|          |Expiry    |Pending     |           |            |           |
|          |Timer     |Timer       |           |            |           |
+----------+----------+------------+-----------+------------+-----------+
|          |-> J state|-> PP state |-> NI state|-> NI state |-> NI state|
|          |restart   |            |Send Prune-|            |           |
|Prune     |Expiry    |            |Echo(*,G)  |            |           |
|Pending   |Timer;    |            |           |            |           |
|(PP)      |stop Prune|            |           |            |           |
|          |Pending   |            |           |            |           |
|          |Timer     |            |           |            |           |
+----------+----------+------------+-----------+------------+-----------+

The transition events "Receive Join(*,G)" and "Receive Prune(*,G)" imply
receiving a Join or Prune targeted to this router's address on the
received interface. If the destination address is not correct, these
state transitions in this state machine must not occur, although seeing
such a packet may cause state transitions in other state machines.

On unnumbered interfaces on point-to-point links, the router's address
should be the same as the source address it chose for the hello packet
it sent over that interface.
However, on point-to-point links we also recommend that PIM messages
with a 0.0.0.0 destination address be accepted.

The transition event "Stop being DF" implies a DF re-election taking
place on this router interface and the router changing status from being
the active DF to being a non-DF router (the value of the I_am_DF macro
changing to FALSE).

When ExpiryTimer is started or restarted, it is set to the HoldTime from
the triggering Join/Prune message.

When PrunePendingTimer is started, it is set to the
J/P_Override_Interval if the router has more than one neighbor on that
interface; otherwise it is set to zero causing it to expire immediately.

The action "Send PruneEcho(*,G)" is triggered when the router stops
forwarding on an interface as a result of a prune. A PruneEcho(*,G) is
simply a Prune(*,G) message sent by the upstream router to itself on a
LAN. Its purpose is to add additional reliability so that if a Prune
that should have been overridden by another router is lost locally on
the LAN, then the PruneEcho may be received and cause the override to
happen. A PruneEcho(*,G) need not be sent on a point-to-point
interface.

3.4.2. Sending Join/Prune Messages

The downstream per-interface state-machines described above hold join
state from downstream PIM routers. This state then determines whether a
router needs to propagate a Join(*,G) upstream towards the RP. Such
Join(*,G) messages are sent on the RPF_interface towards the RP and are
targeted at the DF on that interface.

If a router wishes to propagate a Join(*,G) upstream, it must also watch
for messages on its upstream interface from other routers on that
subnet, and these may modify its behavior. If it sees a Join(*,G) to
the correct upstream neighbor, it should suppress its own Join(*,G).
If 653 it sees a Prune(*,G) to the correct upstream neighbor, it should be 654 prepared to override that prune by sending a Join(*,G) almost 655 immediately. Finally, if it sees the Generation ID (see PIM-SM 656 specification [9]) of the correct upstream neighbor change, it knows 657 that the upstream neighbor has lost state, and it should be prepared to 658 refresh the state by sending a Join(*,G) almost immediately. 660 In addition, changes in the next hop towards the RP trigger a prune off 661 from the old next hop and a join towards the new next hop. Such a change 662 can be caused by either of the following: 664 o The MRIB indicates that the RPF_interface towards the RP has 665 changed. 667 o There is a DF re-election on the RPF_interface and a new router 668 emerges as the DF. 670 The upstream (*,G) state-machine contains only two states: 672 Not Joined 673 The downstream state-machines indicate that the router does not 674 need to join the RP tree for this group. 676 Joined 677 The downstream state-machines indicate that the router would like 678 to join the RP tree for this group. 680 In addition, one timer JT(G) is kept which is used to trigger the 681 sending of a Join(*,G) to the upstream next hop towards the RP (the DF 682 on the RPF_interface for RP(G)).
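As an informal illustration of how JT(G) reacts to the upstream events described in this section, the following Python sketch models the timer while the router is in the Joined state. It is not part of the specification. The class and method names are hypothetical, the constants are the defaults listed in section 3.6, and reading "suppress" as "raise the timer to at least t_suppressed" and "override" as "lower the timer to at most t_override" is an assumption about the intended behaviour.

```python
import random

T_PERIODIC = 60.0           # default period between Join/Prune messages (section 3.6)
JP_OVERRIDE_INTERVAL = 3.0  # default J/P Override Interval (section 3.6)


class UpstreamJoinTimer:
    """Hypothetical model of JT(G) in the Joined state (seconds remaining)."""

    def __init__(self, rng=None):
        # An injectable random source keeps the sketch testable.
        self.rng = rng if rng is not None else random.Random()
        self.remaining = T_PERIODIC

    def saw_join_to_df(self):
        # Another router already joined via the DF: suppress our own Join.
        t_suppressed = self.rng.uniform(1.1 * T_PERIODIC, 1.4 * T_PERIODIC)
        # Assumption: the timer is only ever raised here, never lowered.
        self.remaining = max(self.remaining, t_suppressed)

    def saw_prune_to_df(self):
        # Another router pruned via the DF: get ready to override quickly.
        t_override = self.rng.uniform(0.0, 0.9 * JP_OVERRIDE_INTERVAL)
        # Assumption: the timer is only ever lowered here, never raised.
        self.remaining = min(self.remaining, t_override)

    def df_genid_changed(self):
        # The DF lost its state: refresh it almost immediately, using the
        # same randomized short delay as a prune override.
        self.saw_prune_to_df()
```

The random delay before an override spreads out the responses of several downstream routers that all see the same Prune, so that in most cases only one of them needs to send the overriding Join.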
684 +-----------------------------------+ 685 | Figures omitted from text version | 686 +-----------------------------------+ 688 Figure 2: Upstream group state-machine 690 In tabular form, the state machine is: 692 +----------------------+------------------------------------------------+ 693 | | Event | 694 | Prev State +------------------------+-----------------------+ 695 | | JoinDesired(G) | JoinDesired(G) | 696 | | ->True | ->False | 697 +----------------------+------------------------+-----------------------+ 698 | | -> J state | - | 699 | NotJoined (NJ) | Send Join(*,G); | | 700 | | Set Timer to | | 701 | | t_periodic | | 702 +----------------------+------------------------+-----------------------+ 703 | Joined (J) | - | -> NJ state | 704 | | | Send Prune(*,G) | 705 +----------------------+------------------------+-----------------------+ 707 In addition, we have the following transitions which occur within the 708 Joined state: 710 +-----------------------------------------------------------------------+ 711 | In Joined (J) State | 712 +-----------------+-----------------+-----------------+-----------------+ 713 |Timer Expires | See Join(*,G) | See Prune(*,G) | RPF_DF(RP(G)) | 714 | | to | to | changes | 715 | | RPF_DF(RP(G)) | RPF_DF(RP(G)) | | 716 +-----------------+-----------------+-----------------+-----------------+ 717 |Send | Increase Timer | Decrease Timer | Decrease Timer | 718 |Join(*,G); Set | to | to t_override | to t_override | 719 |Timer to | t_suppressed | | | 720 |t_periodic | | | | 721 +-----------------+-----------------+-----------------+-----------------+ 723 +-----------------------------------------------------------------------+ 724 | In Joined (J) State | 725 +-------------------------------------+---------------------------------+ 726 | Change of RPF_DF(RP(G)) | RPF_DF(RP(G)) GenID | 727 | | changes | 728 +-------------------------------------+---------------------------------+ 729 | Send Join(*,G) to new | Decrease Timer to | 730 
| DF; Send Prune(*,G) to | t_override | 731 | old DF; set Timer to | | 732 | t_periodic | | 733 +-------------------------------------+---------------------------------+ 734 This state machine uses the following macro: 736 bool JoinDesired(G) { 737 if (olist(G) (-) RPF_interface(RP(G))) != NULL 738 return TRUE 739 else 740 return FALSE 741 } 743 3.5. Designated Forwarder (DF) Election 745 This section presents a fail-safe mechanism for electing a per-RP 746 designated router on each link in a BIDIR-PIM domain. We call this 747 router the Designated Forwarder (DF). 749 3.5.1. DF Requirements 751 The DF election chooses the best router on a link to assume the 752 responsibility of forwarding traffic between the RP and the link for the 753 range of multicast groups served by the RP. Different multicast groups 754 that share a common RP must use the same bi-directional tree for data 755 forwarding. Hence, the election of an upstream forwarder on each link 756 does not have to be a group specific decision but instead can be RP- 757 specific. As the number of RPs is typically small, the number of 758 elections that have to be performed is significantly reduced by this 759 observation. 761 To optimise tree creation, it is desirable that the winner of the 762 election process should be the router on the link with the "best" 763 unicast routing metric to the RP (as reported by the MRIB). When 764 comparing metrics from different unicast routing protocols, we use the 765 same comparison rules used by the PIM-SM assert process [9]. 767 The election process needs to take place when information on a new RP 768 initially becomes available, and can be re-used as new bidir groups for 769 the same RP are encountered. There are however some conditions where an 770 update to the election is required: 772 o There is a change in unicast metric to reach the RP for any of 773 the routers on the link. 
775 o The interface on which the RP is reachable changes to an 776 interface for which the router was previously the DF. 778 o A new PIM neighbor starts up on a link. 780 o The elected DF dies. 782 The election process has to be robust enough to ensure with very high 783 probability that all routers on the link have a consistent view of the 784 DF. This is because, with the forwarding rules described in section 3.3, 785 if multiple routers end up thinking that they should be responsible for 786 forwarding, loops may result. To reduce the possibility of this 787 occurrence to a minimum, the election algorithm has been biased towards 788 discarding DF information and suspending forwarding during periods of 789 ambiguity. 791 3.5.2. DF Election description 793 This section does not provide the definitive specification for the DF 794 election process. If any discrepancy exists between section 3.5.3 and 795 this section, the specification in section 3.5.3 is to be assumed 796 correct. 798 To perform the election of the DF for a particular RP, routers on a link 799 need to exchange their unicast routing metric information (as reported 800 by the MRIB) for reaching the RP. 802 In the election protocol described below, many message exchanges are 803 repeated Election_Robustness times for reliability. In all those cases 804 the message retransmissions are spaced in time by a small random 805 interval. 807 3.5.2.1. Bootstrap Election 809 Initially, when no DF has been elected, routers finding out about a new 810 RP start participating in the election by sending Offer messages. Offer 811 messages include the router's metric to reach the RP. Offers are 812 periodically retransmitted with a period of Offer_Interval. 814 If a router hears a better offer than its own from a neighbor, it stops 815 participating in the election for a period of Election_Robustness * 816 Offer_Interval.
If during this period no winner is elected, then the 817 router restarts the election from the beginning. If a router receives an 818 offer with worse metrics than its own, then it restarts the election 819 from the beginning. 821 The result should be that all routers except the best candidate stop 822 advertising their offers. 824 A router assumes the role of the DF after having advertised its metrics 825 Election_Robustness times without receiving any offer from any other 826 neighbor. At that point it transmits a Winner message which declares to 827 every other router on the link the identity of the winner and the 828 metrics it is using. 830 Routers hearing a winner message stop participating in the election and 831 record the identity and metrics of the winner. If the local metrics are 832 better than those of the winner then the router records the identity of 833 the winner but reinitiates the election. 835 3.5.2.2. Loser Metric Changes 837 Whenever the unicast metric to a RP changes for a non-DF router to a 838 value that is better than that previously advertised by the acting DF, 839 the router with the new metric should take action to eventually assume 840 forwarding responsibility. After the metric change is detected, the non- 841 DF router with the now better metric restarts the DF election process by 842 sending Offer messages with this new metric. If no response is received 843 after Election_Robustness retransmissions, the router assumes the role 844 of the DF following the usual Winner announcement procedure. 846 Upon receipt of an offer that is worse than its current metric, the DF 847 will respond with a Winner message declaring its status and advertising 848 its metric. Upon receiving this message, the originator of the Offer 849 records the identity of the DF and aborts the election. 
851 Upon receipt of an offer that is better than its current metric, the DF 852 records the identity and metrics of the offering router and responds 853 with a Backoff message. This instructs the offering router to hold off 854 for a short period of time while the unicast routing stabilises. The 855 Backoff message includes the offering router's new metric and address. 856 All routers on the link who have pending offers with metrics worse than 857 those in the backoff message (including the original offering router) 858 will hold further offers for a period of time defined in the Backoff 859 message. 861 If during the Backoff_Period, a third router sends a new better offer, 862 the Backoff message is repeated for the new offer and the Backoff_Period 863 restarted. 865 Before the Backoff_Period expires, the acting DF nominates the router 866 having made the best offer as the new DF using a Pass message. This 867 message includes the IDs and metrics of both the old and new DFs. The 868 old DF stops performing its tasks as soon as the transmission is made. 869 The new DF assumes the role of the DF as soon as it receives the Pass 870 message. All other routers on the link take note of the new DF and its 871 metric. 873 3.5.2.3. Winner Metric Changes 875 If the DF's routing metric to reach the RP changes to a worse value, it 876 sends a set of Election_Robustness randomly spaced Winner messages on 877 the link, advertising the new metric. Routers who receive this 878 announcement but have a better metric may respond with an Offer message 879 which results in the same handoff procedure described above. All 880 routers assume the DF has not changed until they see a Pass or Winner 881 message indicating the change. 883 There is no pressure to make this handoff quickly if the acting DF still 884 has a path to the RP. The old path may now be suboptimal but it will 885 still work while the re-election is in progress. 
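The way an acting DF chooses between a Winner and a Backoff reply when it receives an Offer (section 3.5.2.2) can be sketched as below. This is an illustration only. The Metric class and both function names are hypothetical. The ordering used (lower metric preference first, then lower metric, then higher address as the tie-breaker) is an assumption based on the PIM-SM assert comparison that this document refers to but does not spell out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    """Hypothetical container for the values compared during an election."""
    preference: int  # preference of the unicast routing protocol
    metric: int      # unicast routing metric to the RP
    address: int     # router address as an integer, used only as tie-breaker


def is_better(a: Metric, b: Metric) -> bool:
    # Assumed PIM-SM assert ordering: lower preference wins, then lower
    # metric, then the higher address.
    return (a.preference, a.metric, -a.address) < (b.preference, b.metric, -b.address)


def df_response_to_offer(df: Metric, offer: Metric) -> str:
    # A better offer is acknowledged with Backoff, after which the DF later
    # hands over with a Pass message. A worse offer is answered with Winner.
    return "Backoff" if is_better(offer, df) else "Winner"
```

For example, a DF holding Metric(110, 20, 10) would answer an Offer of Metric(110, 30, 11) with a Winner message and an Offer of Metric(110, 10, 11) with a Backoff message.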
887 If the routing metric at the DF changes to a better value, a single 888 Winner message is sent advertising the new metric. 890 3.5.2.4. Winner Loses Path 892 If a router's RPF_interface to the RP switches to be on a link for which 893 it is acting as the DF, then it can no longer provide forwarding 894 services for that link. It therefore immediately stops being the DF and 895 restarts the election. As its path to the RP is through the link, an 896 infinite metric is used in the Offer message it sends. 898 Note: At this stage the old DF will have a new RPF neighbor on the link 899 (indicated by unicast routing), which it could use in a Pass message, but 900 this adds unnecessary complication to the election process. 902 3.5.2.5. Late Router Starting Up 904 A late router starting up after the DF election process has completed 905 will have no immediate knowledge of the election outcome. As a result, 906 it will start advertising its metric in Offer messages. As soon as this 907 happens, the currently elected DF will respond with a Winner message if 908 its metric is better than the metric in the Offer message, or with a 909 Backoff message if its metric is worse than the metric in the Offer 910 message. 912 3.5.2.6. Winner Dies 914 Whenever the DF dies, a new DF has to be elected. The speed at which 915 this can be achieved depends on whether there are any downstream routers 916 on the link. 918 If there are downstream routers, typically their RPF_neighbor as 919 reported by the MRIB before the DF dies will be the DF itself. They will 920 therefore notice either a change in the metric for the route to the RP 921 or a change in RPF_neighbor away from the DF and will restart the 922 election by transmitting Offer messages. If according to the MRIB the 923 RP is now reachable through the same link via another upstream router, 924 an infinite metric will be used in the Offer.
926 If no downstream routers are present, the only way for other upstream 927 routers to detect a DF failure is by the timeout of the PIM neighbor 928 information, which will take significantly longer. 930 3.5.3. Election Protocol Specification 932 This section provides the definitive specification for the DF election 933 process. If any discrepancy exists between section 3.5.2 and this 934 section, the specification in this section is to be assumed correct. 936 3.5.3.1. Election State 938 The DF election state is maintained per RP for each multicast-enabled 939 interface on the router as introduced in section 3.1. 941 The state machine has the following four states: 943 Offer 944 Initial election state. When in the Offer state a router 945 thinks it can eventually become the winner and periodically 946 generates Offer messages. 948 Lose In this state the router knows that either there is a 949 different election winner or no router on the link has a 950 path to the RP. 952 Win 953 The router is the acting DF without any contest. 955 Backoff 956 The router is the acting DF but another router has made a bid 957 to take over. 959 In the state machine a router is considered to be an acting DF if it is 960 in the Win or Backoff states. 962 The operation of the election protocol makes use of the variables and 963 timers described below: 965 Acting DF information 966 Used to store the election winner who is the currently acting 967 DF. 969 Election-Timer (DFT) 970 Used to schedule transmission of Offer, Winner and Pass 971 messages. 973 Offer-Count (OC) 974 Used to maintain the number of times an Offer or Winner 975 message has been transmitted. 977 Best-Offer 978 Used by the DF to record who has made the last offer for 979 sending the Pass message. 981 3.5.3.2.
Election Messages 983 The election process uses the following PIM control messages, the packet 984 formats of which are described in section 3.7: 986 Offer (OfferingID, Metric) 987 Sent by routers that believe they have a better metric to the 988 RP than the metric that has been on offer so far. 990 Winner (DF-ID, DF-Metric) 991 Sent by a router when assuming the role of the DF or when 992 re-asserting in response to worse offers. 994 Backoff (DF-ID, DF-Metric, OfferingID, OfferMetric, 995 BackoffInterval) 996 Used by the DF to acknowledge better offers. It instructs 997 other routers with equal or worse offers to wait until the DF 998 passes responsibility to the sender of the offer. 1000 Pass (Old-DF-ID, Old-DF-Metric, New-DF-ID, New-DF-Metric) 1001 Used by the old DF to pass forwarding responsibility to a 1002 router that has previously made an offer. The Old-DF-Metric 1003 is the current metric of the DF at the time the pass is sent. 1005 3.5.3.3. Election Events 1007 During protocol operation, in addition to the expiration of the 1008 Election-Timer and the reception of the four control messages, the 1009 following events can take place: 1011 o Discovery of new RP 1013 o Metric reported by the MRIB to reach the RP changes 1015 o DF loses path to RP 1017 o Detection of DF failure 1019 3.5.3.4. Election Notation 1021 The DF election state machine description uses the following notation in 1022 addition to the pseudocode notation described earlier in this specification. 1024 ?= denotes the operation of lowering a timer to a new value. If 1025 the timer is not running then it is started using the new 1026 value. If the timer is running with an expiration lower than 1027 the new value, then the timer is not altered.
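The "?=" operation defined above can be sketched in Python as follows. This is a minimal illustration, not part of the specification: the CountdownTimer class, its remaining attribute and the lower_to method are hypothetical names, and time is modelled as a plain number of seconds remaining rather than as a real clock.

```python
from typing import Optional


class CountdownTimer:
    """Hypothetical countdown timer; remaining is None while it is stopped."""

    def __init__(self) -> None:
        self.remaining: Optional[float] = None

    def lower_to(self, value: float) -> None:
        """Model of "timer ?= value"."""
        if self.remaining is None:
            # Timer not running: start it using the new value.
            self.remaining = value
        elif self.remaining > value:
            # Timer running with a later expiry: lower it to the new value.
            self.remaining = value
        # Otherwise the timer already expires sooner and is left unaltered.
```

With this model, an action such as "DFT ?= OPlow" in the state tables corresponds to calling lower_to with the OPlow value, which can never postpone an expiry that is already scheduled to happen sooner.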
1029 When a control message is received and actions are specified on a 1030 condition that metrics are Better or Worse the comparison must be 1031 performed as follows: 1033 o On receipt of an Offer or Winner message compare our current 1034 metrics for the DF with the metrics advertised for the sender of 1035 the message. 1037 o On receipt of a Backoff or Pass message compare our current 1038 metrics for the DF with the metrics advertised for the target of 1039 the message. 1041 When an action of "set DF to Sender or Target" is encountered during 1042 receipt of a Winner, Pass or Backoff message it means the following: 1044 o On receipt of a Winner message set the DF to be the originator of 1045 the message and record its metrics. 1047 o On receipt of a Pass message set the DF to be the target of the 1048 message and record its metrics. 1050 o On receipt of a Backoff message set the DF to be the originator 1051 of the message and record its metrics. 1053 3.5.3.5. Election State Transitions 1055 +-----------------------------------+ 1056 | Figures omitted from text version | 1057 +-----------------------------------+ 1059 Figure 3: Designated Forwarder election state-machine 1061 In tabular form, the state machine is: 1063 +-------------++--------------------------------------------------------+ 1064 | || Event | 1065 | Prev State ++------------------+------------------+------------------+ 1066 | || Recv better | Recv better | Recv better | 1067 | || Pass / Win | Backoff | Offer | 1068 +-------------++------------------+------------------+------------------+ 1069 | || -> Lose | - | - | 1070 | Offer || DF = Sender or | DFT = BOperiod | DFT = OPhigh; | 1071 | || Target; Stop | + OPlow; OC = | OC = 0 | 1072 | || DFT | 0 | | 1073 +-------------++------------------+------------------+------------------+ 1074 | || - | - | -> Offer | 1075 | Lose || DF = Sender or | DF = Sender | DFT = OPhigh; | 1076 | || Target | | OC = 0 | 1077 
+-------------++------------------+------------------+------------------+ 1078 | || -> Lose | -> Lose | -> Backoff | 1079 | || DF = Sender or | DF = Sender; | Set Best to | 1080 | Win || Target; Stop | Stop DFT | Sender; Send | 1081 | || DFT | | Backoff; DFT = | 1082 | || | | BOperiod | 1083 +-------------++------------------+------------------+------------------+ 1084 | || -> Lose | -> Lose | - | 1085 | || DF = Sender or | DF = Sender; | Set Best to | 1086 | Backoff || Target; Stop | Stop DFT | Sender; Send | 1087 | || DFT | | Backoff; DFT = | 1088 | || | | BOperiod | 1089 +-------------++------------------+------------------+------------------+ 1090 +-----------++----------------------------------------------------------+ 1091 | || Event | 1092 | ++-------------+--------------+--------------+--------------+ 1093 |Prev State ||Recv Backoff | Recv Pass | Recv Worse | Recv worse | 1094 | ||for us | for us | Pass / Win / | Offer | 1095 | || | | Backoff | | 1096 +-----------++-------------+--------------+--------------+--------------+ 1097 | ||- | -> Win | - | - | 1098 | ||DFT = | Stop DFT | Set DF to | DFT ?= | 1099 |Offer ||BOperiod + | | Sender or | OPlow; OC = | 1100 | ||OPlow; OC = | | Target; DFT | 0 | 1101 | ||0 | | ?= OPlow; OC | | 1102 | || | | = 0 | | 1103 +-----------++-------------+--------------+--------------+--------------+ 1104 | ||-> Offer | -> Offer | -> Offer | -> Offer | 1105 | ||DF = Sender; | DF = Sender; | DF = Sender | DFT = OPlow; | 1106 |Lose ||DFT = OPlow; | DFT = OPlow; | or Target; | OC = 0 | 1107 | ||OC = 0 | OC = 0 | DFT = OPlow; | | 1108 | || | | OC = 0 | | 1109 +-----------++-------------+--------------+--------------+--------------+ 1110 | ||-> Offer | -> Offer | -> Offer | - | 1111 | ||DF = Sender; | DF = Sender; | DF = Sender | Send Winner | 1112 |Win ||DFT = OPlow; | DFT = OPlow; | or Target; | | 1113 | ||OC = 0 | OC = 0 | DFT = OPlow; | | 1114 | || | | OC = 0 | | 1115 
+-----------++-------------+--------------+--------------+--------------+ 1116 | ||-> Offer | -> Offer | -> Offer | -> Win | 1117 | ||DF = Sender; | DF = Sender; | DF = Sender | Send Winner; | 1118 |Backoff ||DFT = OPlow; | DFT = OPlow; | or Target; | Stop DFT | 1119 | ||OC = 0 | OC = 0 | DFT = OPlow; | | 1120 | || | | OC = 0 | | 1121 +-----------++-------------+--------------+--------------+--------------+ 1123 +-----------------------------------------------------------------------+ 1124 | In Offer State | 1125 +-----------------------+-----------------------+-----------------------+ 1126 | DFT Expires and OC | DFT Expires and OC | DFT Expires and OC | 1127 | is less than | is equal to | is equal to | 1128 | Robustness | Robustness and we | Robustness and | 1129 | | have path to RP | there is no path | 1130 | | | to RP | 1131 +-----------------------+-----------------------+-----------------------+ 1132 | - | -> Win | -> Lose | 1133 | Send Offer; DFT = | Send Winner | Set DF to None | 1134 | OPlow; OC = OC + 1 | | | 1135 +-----------------------+-----------------------+-----------------------+ 1136 +-----------------------------------------------------------------------+ 1137 | In Lose State | 1138 +--------------------------------+--------------------------------------+ 1139 | Detect DF Failure | Metric changes and now | 1140 | | is better than DF | 1141 +--------------------------------+--------------------------------------+ 1142 | -> Offer | -> Offer | 1143 | DF = None; DFT = | DFT = OPlow_int; OC = 0 | 1144 | OPlow_int; OC = 0 | | 1145 +--------------------------------+--------------------------------------+ 1147 +-----------------------------------------------------------------------+ 1148 | In Win State | 1149 +-----------------------+------------------------+----------------------+ 1150 | Metric changes and | Timer Expires and | No path to RP | 1151 | is now worse | Count is less than | | 1152 | | Robustness | | 1153 
+-----------------------+------------------------+----------------------+ 1154 | - | - | -> Offer | 1155 | DFT = OPlow; OC = | Send Winner; DFT = | Set DF to None; | 1156 | 0 | OPlow; OC = OC + 1 | DFT = OPlow; OC = | 1157 | | | 0 | 1158 +-----------------------+------------------------+----------------------+ 1160 +-----------------------------------------------------------------------+ 1161 | In Backoff State | 1162 +-----------------------------------+-----------------------------------+ 1163 | Metric changes and is | Timer Expires | 1164 | now better than Best | | 1165 +-----------------------------------+-----------------------------------+ 1166 | -> Win | -> Lose | 1167 | Stop Timer | Send Pass; Set DF to | 1168 | | stored Best | 1169 +-----------------------------------+-----------------------------------+ 1171 3.6. Timers and Constants 1173 BIDIR-PIM maintains the following timers, as discussed in section 3.1. 1174 All timers are countdown timers - they are set to a value and count down 1175 to zero, at which point they typically trigger an action. Of course 1176 they can just as easily be implemented as count-up timers, where the 1177 absolute expiry time is stored and compared against a real-time clock, 1178 but the language in this specification assumes that they count downwards 1179 to zero. 1181 Per Rendezvous-Point (RP): 1183 Per interface (I): 1185 DF Election Timer: DFT(RP,I) 1187 Per Group (G): 1189 Upstream Join Timer: JT(G) 1191 Per interface (I): 1193 Join Expiry Timer: ET(G,I) 1195 PrunePending Timer: PPT(G,I) 1197 When timers are started or restarted, they are set to default values. 1198 This section summarizes those default values. 
1200 Timer Name: DF Election Timer (DFT) 1202 +--------------------+-------------------------+------------------------+ 1203 | Value Name | Value | Explanation | 1204 +--------------------+-------------------------+------------------------+ 1205 | Offer_Period | 100 ms | Interval to wait | 1206 | | | between repeated | 1207 | | | Offer and Winner | 1208 | | | messages. | 1209 +--------------------+-------------------------+------------------------+ 1210 | Backoff_Period | 1 sec | Period that acting | 1211 | | | DF waits between | 1212 | | | receiving a better | 1213 | | | Offer and sending | 1214 | | | the Pass message | 1215 | | | to transfer DF | 1216 | | | responsibility. | 1217 +--------------------+-------------------------+------------------------+ 1218 | OPLow | rand(0.5, 1) * | Range of actual | 1219 | | Offer_Period | randomised value | 1220 | | | used between | 1221 | | | repeated messages. | 1222 +--------------------+-------------------------+------------------------+ 1223 | OPHigh | Election_Robustness | Interval to wait | 1224 | | * Offer_Period | in order to give a | 1225 | | | chance to a router | 1226 | | | with a better | 1227 | | | Offer to become | 1228 | | | the DF. 
| 1229 +--------------------+-------------------------+------------------------+ 1231 Timer Names: Join Expiry Timer (ET(G,I)) 1233 +----------------+----------------+-------------------------------------+ 1234 | Value Name | Value | Explanation | 1235 +----------------+----------------+-------------------------------------+ 1236 | J/P HoldTime | from message | Hold Time from Join/Prune Message | 1237 +----------------+----------------+-------------------------------------+ 1238 Timer Names: Prune Pending Timer (PPT(G,I)) 1240 +--------------------------+--------------------+-----------------------+ 1241 | Value Name | Value | Explanation | 1242 +--------------------------+--------------------+-----------------------+ 1243 | J/P Override Interval | Default: 3 secs | Short period after | 1244 | | | a join or prune to | 1245 | | | allow other | 1246 | | | routers on the LAN | 1247 | | | to override the | 1248 | | | join or prune | 1249 +--------------------------+--------------------+-----------------------+ 1251 Timer Names: Upstream Join Timer (JT(G)) 1253 +-------------+--------------------+------------------------------------+ 1254 |Value Name |Value |Explanation | 1255 +-------------+--------------------+------------------------------------+ 1256 |t_periodic |Default: 60 secs |Period between Join/Prune Messages | 1257 +-------------+--------------------+------------------------------------+ 1258 |t_suppressed |rand(1.1 * |Suppression period when someone | 1259 | |t_periodic, 1.4 * |else sends a J/P message so we | 1260 | |t_periodic) |don't need to do so. | 1261 +-------------+--------------------+------------------------------------+ 1262 |t_override |rand(0, 0.9 * J/P |Randomized delay to prevent | 1263 | |Override Interval) |response implosion when sending a | 1264 | | |join message to override someone | 1265 | | |else's prune message. 
| 1266 +-------------+--------------------+------------------------------------+ 1268 For more information about these values refer to the PIM-SM [9] 1269 documentation. 1271 Constant Name: DF Election Robustness 1273 +--------------------------+-------------------+------------------------+ 1274 | Constant Name | Value | Explanation | 1275 +--------------------------+-------------------+------------------------+ 1276 | Election_Robustness | Default: 3 | Minimum number of | 1277 | | | election messages | 1278 | | | that must be lost | 1279 | | | in order for | 1280 | | | election to fail. | 1281 +--------------------------+-------------------+------------------------+ 1282 3.7. BIDIR PIM Packet Formats 1284 This section describes the details of the packet formats for BIDIR-PIM 1285 control messages. BIDIR-PIM shares a number of control messages in 1286 common with PIM-SM [9], as well as the format for the Encoded-Unicast 1287 address. For details on the format of these packets please refer to the 1288 PIM-SM documentation. Here we will only define the additional packets 1289 that are introduced by BIDIR-PIM. These are the packets used in the DF 1290 election process as well as the Bidir_Capable PIM-Hello option. 1292 3.7.1. DF Election Packet Formats 1294 All PIM control messages have IP protocol number 103. 1296 BIDIR-PIM messages are multicast with TTL 1 to the `ALL-PIM-ROUTERS' 1297 group `224.0.0.13'.
1299 All DF election BIDIR-PIM control messages share the common header 1300 below: 1302 0 1 2 3 1303 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1304 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1305 |PIM Ver| Type |Subtype| Rsvd | Checksum | 1306 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1307 | Encoded-Unicast-RP-Address | 1308 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1309 | Sender Metric Preference | 1310 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1311 | Sender Metric | 1312 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1314 PIM Ver 1315 PIM Version number is 2. 1317 Type All DF-Election PIM control messages share the PIM message Type of 1318 10. 1320 Subtype 1321 Subtypes for DF election messages are: 1323 1 = Offer 1324 2 = Winner 1325 3 = Backoff 1326 4 = Pass 1328 Rsvd Set to zero on transmission. Ignored upon receipt. 1330 Checksum 1331 The checksum is standard IP checksum, i.e. the 16-bit one's 1332 complement of the one's complement sum of the entire PIM message. 1333 For computing the checksum, the checksum field is zeroed. 1335 RP-Address 1336 The address of the bidir RP for which the election is taking place 1337 (note that the length of this field is more than 32 bits). 1339 Sender Metric Preference 1340 Preference value assigned to the unicast routing protocol that the 1341 message sender used to obtain the route to the RP-address. 1343 Sender Metric 1344 The unicast routing table metric used by the message sender to 1345 reach the RP. The metric is in units applicable to the unicast 1346 routing protocol used. 1348 In addition to the fields defined above the Backoff and Pass messages 1349 have the extra fields described below. 1351 3.7.2. Backoff Message 1353 The Backoff message uses the following fields in addition to the common 1354 election message format described above. 
1356 0 1 2 3 1357 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1358 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1359 | Encoded-Unicast-Offering-Address | 1360 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1361 | Offering Metric Preference | 1362 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1363 | Offering Metric | 1364 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1365 | Interval | 1366 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1368 Offering Address 1369 The address of the router that made the last (best) Offer (note 1370 that the length of this field is more than 32 bits). 1372 Offering Metric Preference 1373 Preference value assigned to the unicast routing protocol that the 1374 offering router used to obtain the route to RP-address. 1376 Offering Metric 1377 The unicast routing table metric used by the offering router to 1378 reach the RP. The metric is in units applicable to the unicast 1379 routing protocol used. 1381 Interval 1382 The backoff interval in milliseconds to be used by routers with 1383 worse metrics than the offering router. 1385 3.7.3. Pass Message 1387 The Pass message uses the following fields in addition to the common 1388 election fields described above. 1390 0 1 2 3 1391 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1392 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1393 | Encoded-Unicast-New-Winner-Address | 1394 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1395 | New Winner Metric Preference | 1396 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1397 | New Winner Metric | 1398 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1400 New Winner Address 1401 The address of the router that made the last (best) Offer (note 1402 that the length of this field is more than 32 bits). 

1404  New Winner Metric Preference
1405       Preference value assigned to the unicast routing protocol that the
1406       offering router used to obtain the route to RP-address.

1408  New Winner Metric
1409       The unicast routing table metric used by the offering router to
1410       reach the RP. The metric is in units applicable to the unicast
1411       routing protocol used.

1413  3.7.4. Bidir Capable PIM-Hello Option

1415  BIDIR-PIM introduces one new PIM-Hello option.

1417  o OptionType 22: Bidir Capable
1418   0                   1                   2                   3
1419   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1420  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1421  |           Type = 22           |          Length = 0           |
1422  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1424  4. RP Discovery

1426  Routers discover that a range of multicast group addresses operates in
1427  bi-directional mode and the address of the Rendezvous-Point serving the
1428  group range either through static configuration or using an automatic RP
1429  discovery mechanism such as the PIM Bootstrap mechanism (BSR) [12].

1431  By default the BSR protocol advertises RPs that operate the PIM-SM
1432  protocol. In order to identify an RP as operating in BIDIR mode, the
1433  Encoded-Group Address field in Bootstrap and Candidate-RP Advertisement
1434  messages has been extended by adding the BIDIR bit (B-bit) as specified
1435  below:

1437   0                   1                   2                   3
1438   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1439  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1440  |  Addr Family  | Encoding Type |B|  Reserved   |   Mask Len    |
1441  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1442  |                    Group Multicast Address                    |
1443  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1445  B-bit
1446       When the Bidir-bit is set, all BIDIR capable PIM routers will
1447       operate the protocol described in this document for the specified
1448       group range.

1450  5. Security Considerations

1452  All PIM control messages MAY use IPsec [6] to address security concerns.
1453  The authentication methods are addressed in a companion document [7].
1454  Keys may be distributed as described in [8].

1456  5.1. Appendix A: Election Reliability Enhancements

1458  For the correct operation of bi-directional PIM it is very important to
1459  avoid situations where two routers consider themselves to be Designated
1460  Forwarders for the same link. The two precautions below are not required
1461  for correct operation but can help diagnose anomalies and correct them.

1463  5.1.1. A.1 Missing Pass

1465  After a DF has been elected, a router whose metrics change to become
1466  better than the DF will attempt to take over. If during the re-election
1467  the acting DF has a condition that causes it to lose all of the election
1468  messages (like a CPU overload), the new candidate will transmit three
1469  offers and assume the role of the forwarder, resulting in two DFs on the
1470  link. This situation is pathological and should be corrected by fixing
1471  the overloaded router. It is desirable that such an event can be
1472  detected by a network administrator.

1474  When a router becomes the DF for a link without receiving a Pass message
1475  from the known old DF, the PIM neighbor information for the old DF can
1476  be marked to this effect. Upon receiving the next PIM Hello message from
1477  the old DF, the router can retransmit Winner messages for all the RPs
1478  for which it is acting as the DF. The anomaly may also be logged by the
1479  router to alert the operator.

1481  5.1.2. A.2 Periodic Winner Announcement

1483  An additional degree of safety can be achieved by having the DF for each
1484  RP periodically announce its status in a Winner message. Transmission
1485  of the periodic Winner message can be restricted to occur only for RPs
1486  which have active groups, thus avoiding the periodic control traffic in
1487  areas of the network without senders or receivers for a particular RP.

1489  5.2. Appendix B: Interoperability with legacy code

1491  The rules provided in [10] for interoperating between legacy PIM-SM
1492  routers and new bi-directional capable routers change only slightly to
1493  support this new proposal. The only difference is in the definition of a
1494  boundary between a bi-directional capable area and a legacy area of the
1495  network. In [10], a bidir capable router that is forwarding upstream
1496  register-encapsulates the data packet to the RP if its RPF neighbor is
1497  not bidir capable.

1499  In our proposal, since all the routers on a link need to co-operate to
1500  elect the Designated Forwarder, if even one of the routers on the link
1501  is a legacy router, the election cannot take place. As a result, register
1502  encapsulation is necessary if one or more routers on the RPF interface
1503  are not bi-directional capable.

1505  As in [10], a Hello option must be used to differentiate between bi-
1506  directional capable and legacy routers, and (S,G) state must be created
1507  on the router doing the register encapsulation to prevent loops.

1509  5.3. Appendix C: Comparison with PIM-SM

1511  This section describes the main differences between Bidir PIM and
1512  sparse-mode PIM:

1514     o Bidir PIM uses a single shared tree for distributing the data for
1515       all the sources of a multicast group. The use of a single tree
1516       significantly reduces state requirements on a router. The
1517       drawback is that it may produce suboptimal paths from sources to
1518       receivers, possibly resulting in higher network latency and less
1519       efficient bandwidth utilisation.

1521     o In Bidir PIM, packets traveling from a source to the RP are
1522       natively forwarded on the shared tree. In contrast, sparse-mode
1523       PIM uses unicast encapsulation or source-specific state.

1525     o In Bidir PIM, sender-only branches do not need to keep group
1526       state. Data from the source can be natively forwarded towards the
1527       RP using RP-specific forwarding state.

1529     o The Bidir Designated Forwarder (DF) assumes all the
1530       responsibilities of the sparse-mode DR. In a multi-access link,
1531       the DF responds to IGMP notifications. Downstream routers on the
1532       link use the DF as their upstream neighbor and direct all
1533       Join/Prune messages towards it.

1535     o To enforce a single forwarder on multi-access links, sparse-mode
1536       PIM uses the Assert mechanism, which requires data packets to
1537       trigger protocol events. In Bidir PIM, data-driven events are
1538       completely eliminated as a correct route is always available at
1539       packet forwarding time.

1541       The DF election problem is easier than the assert problem because
1542       there is a small number of RPs and the per RP DF election can be
1543       done in advance. With the assert mechanism, in addition to each RP,
1544       a forwarder has to be elected for each possible source to a group.
1545       This cannot be done before data is available.

1547     o With sparse-mode PIM, when forwarding packets using shared-tree
1548       (*,G) state, a directly-connected-source check has to be made on
1549       every packet. This is done to determine if the packet was
1550       originated by a source which is directly connected to the router.
1551       For a connected source, source-specific state has to be created
1552       to register packets to the RP and prune the source off the shared
1553       tree.

1555       With Bidir PIM, directly connected sources do not need any special
1556       handling. The DF for the RP of the group the source is sending to
1557       seamlessly picks up and forwards upstream traveling packets.

1559  6. Todo list...

1561     o Update legacy interoperability section to remove dependency on [10].

1563     o Incorporate BSR mods into BSR spec.

1565     o In the state machine, perhaps add an arc from NI to NI, labelled
1566       "(*,G) Join but not DF" just to make it really clear.

1568  7. Authors' Addresses

1570       Mark Handley
1571       ICIR/ICSI
1572       1947 Center St, Suite 600
1573       Berkeley, CA 94708
1574       mjh@icir.org

1576       Isidor Kouvelas
1577       Cisco Systems
1578       kouvelas@cisco.com

1580       Tony Speakman
1581       Cisco Systems
1582       speakman@cisco.com

1584       Lorenzo Vicisano
1585       Cisco Systems
1586       lorenzo@cisco.com

1588  8. Acknowledgments

1590  The bidir proposal in this draft is heavily based on the ideas and text
1591  presented by Estrin and Farinacci in [10]. The main difference between
1592  the two proposals is in the method chosen for upstream forwarding.

1594  We would also like to thank Deborah Estrin at ISI/USC as well as Nidhi
1595  Bhaskar, Yiqun Cai, Apoorva Karan, Rajitha Sumanasekera and Beau
1596  Williamson at Cisco for their contributions to and comments on this draft.

1598  9. References

1600  [1]  T. Bates, R. Chandra, D. Katz, Y. Rekhter, "Multiprotocol
1601       Extensions for BGP-4", RFC 2283.

1603  [2]  S.E. Deering, "Host extensions for IP multicasting", RFC 1112, Aug
1604       1989.

1606  [3]  W. Fenner, "Internet Group Management Protocol, Version 2", RFC
1607       2236.

1609  [4]  IANA, "Address Family Numbers", linked from
1610       http://www.iana.org/numbers.html

1612  [5]  T. Narten, H. Alvestrand, "Guidelines for Writing an IANA
1613       Considerations Section in RFCs", RFC 2434.

1615  [6]  S. Kent, R. Atkinson, "Security Architecture for the Internet
1616       Protocol", RFC 2401.

1618  [7]  L. Wei, "Authenticating PIM version 2 messages", draft-ietf-pim-
1619       v2-auth-01.txt, work in progress.

1621  [8]  T. Hardjono, B. Cain, "Simple Key Management Protocol for PIM",
1622       draft-ietf-pim-simplekmp-01.txt, work in progress.

1624  [9]  B. Fenner, M. Handley, H. Holbrook, I. Kouvelas, "Protocol
1625       Independent Multicast - Sparse Mode (PIM-SM): Protocol
1626       Specification (Revised)", Work In Progress, , 2002.

1629  [10] D. Estrin, D. Farinacci, "Bi-directional Shared Trees in PIM-SM",
1630       Work In Progress, , May 1999.

1632  [11] D. Estrin et al, "Protocol Independent Multicast-Sparse Mode (PIM-
1633       SM): Protocol Specification", RFC 2362, Nov 1999.

1635  [12] W. Fenner, M. Handley, R. Kermode and D. Thaler, "Bootstrap Router
1636       (BSR) Mechanism for PIM Sparse Mode", draft-ietf-pim-sm-bsr-00.txt,
1637       work in progress.

1639  10. Index
1640  DownstreamJPState(G,I) . . . . . . . . . . . . . . . . . . . . . . . 11
1641  ET(G,I). . . . . . . . . . . . . . . . . . . . . . . . . . . . .10,14,30
1642  ET(RP,I) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1643  I_am_DF(RP,I). . . . . . . . . . . . . . . . . . . . . . . . . .11,13,16
1644  J/P_HoldTime . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
1645  J/P_Override_Interval. . . . . . . . . . . . . . . . . . . . . . . 16,31
1646  JoinDesired(G) . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1647  joins(G) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1648  JT(*,G). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1649  JT(G). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10,31
1650  local_receiver_include(G,I). . . . . . . . . . . . . . . . . . . . . 11
1651  NLT(N,I) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1652  Offer_Period . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
1653  olist(G) . . . . . . . . . . . . . . . . . . . . . . . . . . . .11,13,18
1654  OT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
1655  pim_include(G) . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1656  PPT(G,I) . . . . . . . . . . . . . . . . . . . . . . . . . . . .10,14,31
1657  RPF_interface(RP). . . . . . . . . . . . . . . . . . . . . . . . . 11,13
1658  t_override . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18,31
1659  t_periodic . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18,31
1660  t_suppressed . . . . . . . . . . . . . . . . . . . . . . . . . . . 18,31
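
Illustrative sketch: the common DF election header and the checksum rule
given in the DF election message format can be exercised with a few lines
of Python. This is a minimal sketch under stated assumptions, not a
reference encoder. It assumes the Encoded-Unicast-RP-Address is laid out
as one octet of address family, one octet of encoding type, then the
address (the PIM-SM encoding, which this document does not restate), and
it handles IPv4 only. It builds just the common fields, so it applies
as-is to Offer and Winner messages. The names ip_checksum, pack_common
and parse_common are hypothetical.

```python
import socket
import struct

PIM_VERSION = 2
DF_ELECTION_TYPE = 10
SUBTYPES = {1: "Offer", 2: "Winner", 3: "Backoff", 4: "Pass"}


def ip_checksum(data: bytes) -> int:
    """Return the 16-bit one's complement of the one's complement sum."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


def pack_common(subtype: int, rp: str, preference: int, metric: int) -> bytes:
    """Build the common header; the checksum field is zeroed while summing."""
    ver_type = (PIM_VERSION << 4) | DF_ELECTION_TYPE
    sub_rsvd = subtype << 4  # Rsvd nibble is zero on transmission
    # Assumed encoding: family 1 (IPv4), encoding type 0, 4-octet address.
    body = struct.pack("!BB4sII", 1, 0, socket.inet_aton(rp), preference, metric)
    zeroed = struct.pack("!BBH", ver_type, sub_rsvd, 0) + body
    return struct.pack("!BBH", ver_type, sub_rsvd, ip_checksum(zeroed)) + body


def parse_common(msg: bytes) -> dict:
    """Decode the common fields of a message built with an IPv4 RP address."""
    ver_type, sub_rsvd, _checksum = struct.unpack("!BBH", msg[:4])
    _family, _encoding, addr, preference, metric = struct.unpack(
        "!BB4sII", msg[4:18]
    )
    return {
        "version": ver_type >> 4,
        "type": ver_type & 0x0F,
        "subtype": SUBTYPES.get(sub_rsvd >> 4, "unknown"),
        "rp": socket.inet_ntoa(addr),
        "metric_preference": preference,
        "metric": metric,
        # Summing a message that carries a correct checksum yields zero.
        "checksum_ok": ip_checksum(msg) == 0,
    }
```

For example, parse_common(pack_common(1, "192.0.2.1", 110, 20)) describes
an Offer for RP 192.0.2.1 with checksum_ok set to True. The Rsvd nibble is
ignored on receipt, which is why the parser only looks at the upper four
bits of the second octet.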