idnits 2.17.1

draft-ietf-pim-drlb-08.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  == There are 5 instances of lines with non-RFC6890-compliant IPv4 addresses
     in the document. If these are example addresses, they should be changed.

  == There are 1 instance of lines with non-RFC3849-compliant IPv6 addresses
     in the document. If these are example addresses, they should be changed.

  -- The document has examples using IPv4 documentation addresses according
     to RFC6890, but does not use any IPv6 documentation addresses. Maybe
     there should be IPv6 examples, too?

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (June 19, 2018) is 2109 days in the past. Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Unused Reference: 'HELLO-OPT' is defined on line 742, but no explicit
     reference was found in the text

  ** Obsolete normative reference: RFC 4601 (Obsoleted by RFC 7761)

     Summary: 1 error (**), 0 flaws (~~), 4 warnings (==), 2 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.
--------------------------------------------------------------------------------

Network Working Group                                         Yiqun. Cai
Internet-Draft                                                 Heidi. Ou
Intended status: Standards Track                           Alibaba Group
Expires: December 21, 2018                               Sri. Vallepalli
                                                       Mankamana. Mishra
                                                            Stig. Venaas
                                                           Cisco Systems
                                                             Andy. Green
                                                         British Telecom
                                                           June 19, 2018

                  PIM Designated Router Load Balancing
                         draft-ietf-pim-drlb-08

Abstract

   On a multi-access network, one of the PIM routers is elected as a
   Designated Router (DR). On the last hop LAN, the PIM DR is
   responsible for tracking local multicast listeners and forwarding
   traffic to these listeners if the group is operating in PIM-SM. In
   this document, we propose a modification to the PIM-SM protocol that
   allows more than one of these last hop routers to be selected so that
   the forwarding load can be distributed among these routers.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on December 21, 2018.

Copyright Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.
   Please review these documents carefully, as they describe your
   rights and restrictions with respect to this document. Code
   Components extracted from this document must include Simplified BSD
   License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the
   Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Applicability
   4.  Functional Overview
     4.1.  GDR Candidates
     4.2.  Hash Mask and Hash Algorithm
     4.3.  Modulo Hash Algorithm
     4.4.  PIM Hello Options
   5.  Hello Option Formats
     5.1.  PIM DR Load Balancing Capability (DRLBC) Hello Option
     5.2.  PIM DR Load Balancing GDR (DRLBGDR) Hello Option
   6.  Protocol Specification
     6.1.  PIM DR Operation
     6.2.  PIM GDR Candidate Operation
       6.2.1.  Router Receives New DRLBGDR
       6.2.2.  Router Receives Updated DRLBGDR
     6.3.  PIM Assert Modification
   7.  Compatibility
   8.  Manageability Considerations
   9.  IANA Considerations
   10. Security Considerations
   11. Acknowledgement
   12. References
     12.1.  Normative References
     12.2.  Informative References
   Authors' Addresses

1.  Introduction

   On a multi-access LAN such as an Ethernet, one of the PIM routers is
   elected as a DR. The PIM DR has two roles in the PIM-SM protocol.
   On the first hop network, the PIM DR is responsible for registering
   an active source with the Rendezvous Point (RP) if the group is
   operating in PIM-SM. On the last hop LAN, the PIM DR is responsible
   for tracking local multicast listeners and forwarding to these
   listeners if the group is operating in PIM-SM.

   Consider the following last hop LAN in Figure 1:

            ( core networks )
              |     |     |
              |     |     |
             R1    R2    R3
              |     |     |
           --(last hop LAN)--
                    |
                    |
            (many receivers)

                      Figure 1: Last Hop LAN

   Assume R1 is elected as the Designated Router. According to
   [RFC4601], R1 will be responsible for forwarding traffic to that LAN
   on behalf of any local members. In addition to keeping track of IGMP
   and MLD membership reports, R1 is also responsible for initiating the
   creation of source and/or shared trees towards the senders or the
   RPs.

   Forcing sole data plane forwarding responsibility on the PIM DR
   uncovers a limitation in the protocol. In comparison, even though an
   OSPF DR or an IS-IS DIS handles additional duties while running the
   OSPF or IS-IS protocols, it is not required to be solely
   responsible for forwarding packets for the network. On the other
   hand, on a last hop LAN, only the PIM DR is asked to forward packets
   while the other routers handle only control traffic (and perhaps drop
   packets due to RPF failures). Hence the forwarding load of a last
   hop LAN is concentrated on a single router.

   This leads to several issues.
   One of the issues is that the aggregated bandwidth will be limited
   to what R1 can handle towards this particular interface. It is very
   common for the last hop LAN to consist of switches that run IGMP/MLD
   or PIM snooping. This allows the forwarding of multicast packets to
   be restricted only to segments leading to receivers who have
   indicated their interest in multicast groups using either IGMP or
   MLD. The emergence of switched Ethernet allows the aggregated
   bandwidth to exceed, sometimes by a large margin, that of a single
   link. For example, let us modify Figure 1 and introduce an Ethernet
   switch in Figure 2.

            ( core networks )
              |     |     |
              |     |     |
             R1    R2    R3
              |     |     |
           +=gi0===gi1===gi2=+
           +                 +
           +      switch     +
           +                 +
           +=gi4===gi5===gi6=+
              |     |     |
             H1    H2    H3

          Figure 2: Last Hop Network with Ethernet Switch

   Let us assume that each individual link is a Gigabit Ethernet. Each
   router, R1, R2 and R3, and the switch have enough forwarding capacity
   to handle hundreds of Gigabits of data.

   Let us further assume that each of the hosts requests 500 Mbps of
   unique multicast data. This totals 1.5 Gbps of data, which is
   less than what the switch or the combined uplink bandwidth across
   the routers can handle, even under failure of a single router.

   On the other hand, the link between R1 and the switch, via port gi0,
   can only handle a throughput of 1 Gbps. If R1 is the only DR (the PIM
   DR elected using the procedure defined by [RFC4601]), at least 500
   Mbps worth of data will be lost because the only link that can be
   used to draw the traffic from the routers to the switch is via gi0.
   In other words, the entire network's throughput is limited by the
   single connection between the PIM DR and the switch (or the last hop
   LAN as in Figure 1).

   The problem may also manifest itself in a different way.
   For example, suppose R1 happens to forward 500 Mbps worth of unicast
   data to H1 and, at the same time, H2 and H3 each request 300 Mbps of
   different multicast data. R1 experiences packet drops once again
   (1.1 Gbps offered to the 1 Gbps link via gi0) while, in the
   meantime, there is sufficient forwarding capacity left on R2 and R3
   and unused link capacity between the switch and R2/R3.

   Another important issue is related to failover. If R1 is the only
   forwarder for a shared last hop LAN, then when R1 goes out of
   service, multicast forwarding for the entire LAN has to be rebuilt by
   the newly elected PIM DR. However, if there were a way to allow
   multiple routers to forward to the LAN for different groups, failure
   of one of the routers would only lead to disruption to a subset of
   the flows, therefore improving the overall resilience of the network.

   The hash algorithm used in this document has limitations, but this
   draft leaves room for different and more consistent hash algorithms
   to be defined in the future.

   In this document, we propose a modification to the PIM-SM protocol
   that allows more than one of these routers, called Group Designated
   Routers (GDRs), to be selected so that the forwarding load can be
   distributed among a number of routers.

2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

   With respect to PIM, this document follows the terminology that has
   been defined in [RFC4601].

   This document also introduces the following new acronyms:

   o  GDR: GDR stands for "Group Designated Router". For each multicast
      flow, either a (*,G) for ASM, or an (S,G) for SSM, a hash
      algorithm (described below) is used to select one of the routers
      as a GDR.
      The GDR is responsible for initiating the forwarding
      tree building process for the corresponding multicast flow.

   o  GDR Candidate: a last hop router that has the potential to become
      a GDR. A GDR Candidate must have the same DR priority and must
      run the same GDR election hash algorithm as the DR. It must send
      and process the new PIM Hello Options defined in this document.
      There might be more than one GDR Candidate on a LAN, but only one
      can become the GDR for a specific multicast flow.

3.  Applicability

   The proposed change described in this specification applies to PIM-SM
   last hop routers only.

   It does not alter the behavior of a PIM DR on the first hop network.
   This is because the source tree is built using the IP address of the
   sender, not the IP address of the PIM DR that sends the registers
   towards the RP. The load balancing between first hop routers can be
   achieved naturally if an IGP provides equal cost multiple paths
   (which it usually does in practice). Also, distributing the
   registering load does not justify the additional complexity required
   to support it.

4.  Functional Overview

   In the existing PIM DR election, when multiple last hop routers are
   connected to a multi-access LAN (for example, an Ethernet), one of
   them is selected to act as PIM DR. The PIM DR is responsible for
   sending local Join/Prune messages towards the RP or source. In order
   to elect the PIM DR, each PIM router on the LAN examines the received
   PIM Hello messages and compares its DR priority and IP address with
   those of its neighbors. The router with the highest DR priority is
   the PIM DR. If there are multiple such routers, their IP addresses
   are used as the tie-breaker, as described in [RFC4601].

   In order to share forwarding load among last hop routers, besides the
   normal PIM DR election, the GDR is also elected on the last hop
   multi-access LAN.
   There is only one PIM DR on the multi-access LAN,
   but there might be multiple GDR Candidates.

   For each multicast flow, that is, (*,G) for ASM and (S,G) for SSM, a
   hash algorithm is used to select one of the routers to be the GDR. A
   new DR Load Balancing Capability (DRLBC) PIM Hello Option, which
   contains the hash algorithm type, is announced by routers on
   interfaces where this specification is enabled. Last hop routers
   that advertise the new DRLBC Option in their Hellos, and that use the
   same GDR election hash algorithm and the same DR priority as the PIM
   DR, are considered GDR Candidates.

   Hash Masks are defined for Source, Group and RP separately, in order
   to handle PIM ASM/SSM. The masks, as well as a sorted list of GDR
   Candidates' Addresses, are announced by the DR in a new DR Load
   Balancing GDR (DRLBGDR) PIM Hello Option.

   A hash algorithm based on the announced Source, Group, or RP masks
   allows one GDR to be assigned to a corresponding multicast state.
   That GDR is responsible for initiating the creation of the
   multicast forwarding tree for multicast traffic.

4.1.  GDR Candidates

   GDR is the new concept introduced by this specification. GDR
   Candidates are routers eligible for GDR election on the LAN. To
   become a GDR Candidate, a router MUST support this specification,
   have the same DR priority and run the same GDR election hash
   algorithm as the DR on the LAN.

   For example, assume there are 4 routers on the LAN: R1, R2, R3 and
   R4, which all support this specification. R1, R2 and R3 have the
   same DR priority while R4's DR priority is less preferred. In this
   example, R4 will not be eligible for GDR election, because R4 will
   not become a PIM DR unless all of R1, R2 and R3 go out of service.

   Furthermore, assume router R1 wins the PIM DR election, R1 and R2 run
   the same hash algorithm for GDR election, while R3 runs a different
   one.
   In this case, only R1 and R2 will be eligible for GDR election,
   while R3 will not.

   As a DR, R1 will include its own Load Balancing Hash Masks and the
   identity of R1 and R2 (the GDR Candidates) in its DRLBGDR Hello
   Option.

4.2.  Hash Mask and Hash Algorithm

   A Hash Mask is used to extract a number of bits from the
   corresponding IP address field (32 for v4, 128 for v6) and calculate
   a hash value. A hash value is used to select a GDR from the GDR
   Candidates advertised by the PIM DR. For example, 0.0.255.0 defines
   a Hash Mask for an IPv4 address that masks out the first, the second,
   and the fourth octets, so that only the third octet contributes to
   the hash value.

   There are three Hash Masks defined:

   o  RP Hash Mask

   o  Source Hash Mask

   o  Group Hash Mask

   The Hash Masks need to be configured on the PIM routers that can
   potentially become a PIM DR, unless the implementation provides
   a default Hash Mask. An implementation SHOULD provide masks with
   default values 255.255.255.255 (IPv4) and
   FFFF:FFFF:FFFF:FFFF:FFFF:FFFF:FFFF:FFFF (IPv6).

   The hash value used to determine the GDR is chosen as follows:

   o  If the group is ASM and the RP Hash Mask announced by the PIM DR
      is not 0, calculate the value of hashvalue_RP [Section 4.3] to
      determine the GDR.

   o  If the group is ASM and the RP Hash Mask announced by the PIM DR
      is 0, obtain the value of hashvalue_Group [Section 4.3] to
      determine the GDR.

   o  If the group is SSM, use hashvalue_SG [Section 4.3] to determine
      the GDR.

   A simple Modulo hash algorithm is discussed in this document.
   However, to allow other hash algorithms to be used, a 4-byte "Hash
   Algorithm Type" field is included in the DRLBC Hello Option to
   specify the hash algorithm used by a last hop router.

   If different hash algorithm types are advertised among last hop
   routers, only last hop routers running the same hash algorithm as the
   DR (and having the same DR priority as the DR) are eligible for GDR
   election.

4.3.  Modulo Hash Algorithm

   The Modulo hash algorithm is described here in detail for
   hashvalue_RP. The same algorithm is described in brief for
   hashvalue_Group, which uses the group address instead of the RP
   address for an ASM group with RP_hashmask == 0, and for hashvalue_SG,
   which uses the source and group addresses of an (S,G) instead of the
   RP address.

   o  For ASM groups with a non-zero RP Hash Mask, the hash value is
      calculated as:

         hashvalue_RP = (((RP_address & RP_hashmask) >> N) & 0xFFFF) % M

         RP_address is the address of the RP defined for the group. N
         is the number of consecutive zero bits counted from the least
         significant bit of RP_hashmask. M is the number of GDR
         Candidates.

      For example, Router X with IPv4 address 203.0.113.1 receives a
      DRLBGDR Hello Option from the DR, which announces RP Hash Mask
      0.0.255.0 and a list of GDR Candidates, sorted by IP address
      from high to low: 203.0.113.3, 203.0.113.2 and 203.0.113.1.
      The ordinal numbers assigned to those addresses are:

         0 for 203.0.113.3; 1 for 203.0.113.2; 2 for 203.0.113.1
         (Router X)

      Assume there are 2 RPs: RP1 192.0.2.1 for Group1 and RP2
      198.51.100.2 for Group2. Following the modulo hash algorithm:

         N is 8 for 0.0.255.0, and M is 3 for the total number of GDR
         Candidates. The hashvalue_RP for RP1 192.0.2.1 is:

         (((192.0.2.1 & 0.0.255.0) >> 8) & 0xFFFF) % 3 = 2 % 3 = 2

         which matches the ordinal number assigned to Router X. Router
         X will be the GDR for Group1, which uses 192.0.2.1 as the RP.

         The hashvalue_RP for RP2 198.51.100.2 is:

         (((198.51.100.2 & 0.0.255.0) >> 8) & 0xFFFF) % 3 = 100 % 3 = 1

         which is different from Router X's ordinal number (2); hence,
         Router X will not be the GDR for Group2.
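
   The hashvalue_RP calculation and the worked example above can be
   checked with a minimal Python sketch. The names modulo_hash and
   trailing_zero_bits are illustrative only and are not defined by this
   specification; the sketch assumes masks and addresses are given as
   text in the same address family.

```python
import ipaddress


def trailing_zero_bits(mask: int) -> int:
    """Return N: the number of zero bits below the lowest set bit of mask."""
    return (mask & -mask).bit_length() - 1


def modulo_hash(address: str, hash_mask: str, num_candidates: int) -> int:
    """Compute (((address & mask) >> N) & 0xFFFF) % M as in Section 4.3."""
    addr = int(ipaddress.ip_address(address))
    mask = int(ipaddress.ip_address(hash_mask))
    if mask == 0:
        # An all-zero mask yields 0 regardless of N; handled separately
        # so that no shift count has to be derived from an empty mask.
        return 0
    n = trailing_zero_bits(mask)
    return (((addr & mask) >> n) & 0xFFFF) % num_candidates


# GDR Candidates as announced by the DR, sorted from high to low.
candidates = ["203.0.113.3", "203.0.113.2", "203.0.113.1"]
router_x_ordinal = candidates.index("203.0.113.1")  # ordinal 2

hash_rp1 = modulo_hash("192.0.2.1", "0.0.255.0", len(candidates))
hash_rp2 = modulo_hash("198.51.100.2", "0.0.255.0", len(candidates))

print(hash_rp1 == router_x_ordinal)  # True: Router X is the GDR for Group1
print(hash_rp2 == router_x_ordinal)  # False: Router X is not the GDR for Group2
```

   With the example values, the sketch selects Router X (ordinal 2) for
   RP1 and a different candidate (ordinal 1) for RP2, in line with the
   text above.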

   o  If RP_hashmask is 0, the hash value for an ASM group is calculated
      using the Group Hash Mask:

         hashvalue_Group = (((Group_address & Group_hashmask) >> N) &
         0xFFFF) % M

         Here N is counted on Group_hashmask in the same way as for
         RP_hashmask. Compare hashvalue_Group with the ordinal number
         assigned to Router X to decide whether Router X is the GDR.

   o  For SSM groups, the hash value is calculated using both the Source
      and Group Hash Masks:

         hashvalue_SG = ((((Source_address & Source_hashmask) >> N_S) &
         0xFFFF) ^ (((Group_address & Group_hashmask) >> N_G) & 0xFFFF))
         % M

         N_S and N_G are counted on Source_hashmask and Group_hashmask,
         respectively, in the same way as N.

4.4.  PIM Hello Options

   When a last hop PIM router sends a PIM Hello from an interface with
   this specification enabled, it includes a new option, called "DR Load
   Balancing Capability (DRLBC)".

   Besides this DRLBC Hello Option, the elected PIM DR also includes a
   new "DR Load Balancing GDR (DRLBGDR) Hello Option". The DRLBGDR
   Hello Option consists of the three Hash Masks defined above and the
   sorted list of all GDR Candidates' Addresses on the last hop LAN.

   The elected PIM DR uses the DRLBC Hello Options advertised by all
   routers on the last hop LAN to compose its DRLBGDR. The GDR
   Candidates use the DRLBGDR Hello Option advertised by the PIM DR to
   calculate the hash value.

5.  Hello Option Formats

5.1.  PIM DR Load Balancing Capability (DRLBC) Hello Option

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Type = TBD           |          Length = 4           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Hash Algorithm Type                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                   Figure 3: Capability Hello Option

   Type: TBD

   Length: 4 octets

   Hash Algorithm Type: 0 for Modulo hash algorithm

   This DRLBC Hello Option SHOULD be advertised by last hop routers from
   interfaces with this specification enabled.

5.2.  PIM DR Load Balancing GDR (DRLBGDR) Hello Option

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Type = TBD           |            Length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          Group Mask                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          Source Mask                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            RP Mask                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   GDR Candidate Address(es)                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                      Figure 4: GDR Hello Option

   Type: TBD

   Length: 3 x (4 bytes or 16 bytes) + n x (4 bytes or 16 bytes), where
      n is the number of GDR Candidates.

   Group Mask (32/128 bits): Mask

   Source Mask (32/128 bits): Mask

   RP Mask (32/128 bits): Mask

      All masks MUST be in the same address family as the Hello IP
      header.

   GDR Candidate Address (32/128 bits): Address(es) of GDR Candidate(s)

      All addresses must be in the same address family as the Hello
      IP header. The addresses are sorted in descending order. The
      position in this order is the ordinal number associated with each
      GDR Candidate in the hash value calculation. For example, if the
      addresses advertised are R3, R2, R1, the ordinal number
      assigned to R3 is 0, to R2 is 1 and to R1 is 2.

      If the "Interface ID" option, as described in [RFC6395], is
      present in a GDR Candidate's PIM Hello message, and the "Router
      ID" portion is non-zero:

      +  For IPv4, the "GDR Candidate Address" will be set directly
         to the "Router ID".

      +  For IPv6, the "GDR Candidate Address" will be set to the
         IPv4-IPv6 translated address of the "Router ID", as described
         in [RFC4291], that is, the "Router ID" is appended to a
         prefix of 96 zero bits.
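
      The ordinal assignment and the Router ID rule in this section can
      be sketched in Python as follows. The helper names (ordinals,
      gdr_candidate_address) and the convention of passing router_id=0
      when the "Interface ID" option is absent are assumptions of this
      sketch, not part of the specification. The fallback to the Hello
      source address follows the rule this section gives for a missing
      or zero "Router ID".

```python
import ipaddress


def gdr_candidate_address(hello_source: str, router_id: int = 0):
    """Pick the address a DR would list for one GDR Candidate.

    hello_source: source address of the candidate's PIM Hello.
    router_id:    32-bit Router ID from the Interface ID option,
                  or 0 if the option is absent (sketch convention).
    """
    source = ipaddress.ip_address(hello_source)
    if router_id == 0:
        return source
    if source.version == 4:
        # IPv4: the Router ID is used directly as the address.
        return ipaddress.IPv4Address(router_id)
    # IPv6: 96 zero bits followed by the 32-bit Router ID.
    return ipaddress.IPv6Address(router_id)


def ordinals(addresses):
    """Map each address to its ordinal: 0 for the highest address."""
    ordered = sorted(addresses, key=int, reverse=True)
    return {addr: position for position, addr in enumerate(ordered)}


r1 = gdr_candidate_address("203.0.113.1")
r2 = gdr_candidate_address("203.0.113.2")
r3 = gdr_candidate_address("203.0.113.3")
table = ordinals([r1, r3, r2])
print(table[r3], table[r2], table[r1])  # 0 1 2
```

      Sorting on the integer value of the address is one way to obtain
      the descending order; an implementation may compare addresses in
      any equivalent manner.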

      If the "Interface ID" option is not present in a GDR
      Candidate's PIM Hello message, or if the "Interface ID" option
      is present but the "Router ID" field is zero, the "GDR
      Candidate Address" will be the IPv4 or IPv6 source address of
      the PIM Hello message.

      This DRLBGDR Hello Option MUST only be advertised by the
      elected PIM DR.

6.  Protocol Specification

6.1.  PIM DR Operation

   The DR election process is still the same as defined in [RFC4601]. A
   DR that has this specification enabled on the interface advertises
   the new DRLBGDR Hello Option, which contains the mask values from
   user configuration, followed by a sorted list of all GDR Candidates'
   Addresses, from the highest value to the lowest value. Moreover,
   like non-DR routers, the DR also advertises the DRLBC Hello Option to
   indicate its capability of supporting this specification and the type
   of its GDR election hash algorithm.

   If a PIM DR receives a PIM Hello with a DRLBGDR Option, the PIM DR
   SHOULD ignore the TLV.

   If a PIM DR receives a neighbor's DRLBC Hello Option that contains
   the same hash algorithm type as the DR's, and the neighbor has the
   same DR priority as the DR, the PIM DR SHOULD consider the neighbor a
   GDR Candidate and insert the GDR Candidate's Address into the sorted
   list of the DRLBGDR Option.

6.2.  PIM GDR Candidate Operation

   When an IGMP/MLD join is received, without this specification, only
   the PIM DR will handle the join and potentially run into the issues
   described earlier. With this specification, a hash algorithm is
   run on the GDR Candidates to determine which router is going to be
   responsible for building forwarding trees on behalf of the host.

   If a router supports this specification, then on each of the
   interfaces where the multicast protocol is enabled, it MUST advertise
   the DRLBC Hello Option in its PIM Hello.
   However, advertising the DRLBC Option in a PIM Hello does not
   guarantee that the router will be considered a GDR Candidate. For
   example, the router may have a lower DR priority configured on the
   shared LAN compared to other PIM routers. Once DR election is done,
   a DRLBGDR Hello Option, which contains the list of GDR Candidates,
   is received from the current PIM DR on the link.

   A GDR Candidate may receive a DRLBGDR Hello Option from the PIM DR
   with Hash Masks different from those configured on it. The GDR
   Candidate must use the Hash Masks advertised by the PIM DR to
   calculate the hash value.

   A GDR Candidate may receive a DRLBGDR Hello Option from a PIM router
   which is not the DR. The GDR Candidate MUST ignore such a DRLBGDR
   Hello Option.

   A GDR Candidate may receive a Hello from an elected PIM DR that does
   not support this specification. In that case, the GDR election
   described by this specification will not take place; that is, only
   the PIM DR joins the multicast tree.

   A router only acts as GDR if it is included in the GDR Candidate list
   of the DRLBGDR Hello Option.

6.2.1.  Router Receives New DRLBGDR

   When a router receives a new DRLBGDR from the current PIM DR, it
   needs to process it and check whether the router is in the list of
   GDR Candidates.

   1.  If the router is not listed as a GDR Candidate in the DRLBGDR, no
       action is needed.

   2.  If the router is listed as a GDR Candidate in the DRLBGDR, then
       it needs to process each of the groups in the IGMP/MLD reports.
       The masks are announced by the DR in the DRLBGDR Hello Option.
       For each of the groups in the reports, the router needs to run
       the hash algorithm (described in Section 4.3) based on the
       announced Source, Group or RP masks to determine whether it is
       the GDR for the specified group. If the hash result selects it
       as the GDR for the multicast flow, it builds the multicast
       forwarding tree. If it is not the GDR for the multicast flow, no
       action is needed.

6.2.2.  Router Receives Updated DRLBGDR

   If a router (GDR or non-GDR) receives an unchanged DRLBGDR from the
   current PIM DR, no action is needed.

   If a router (GDR or non-GDR) receives a new or modified DRLBGDR from
   the current PIM DR, it requires processing as described below:

   1.  If it was a GDR Candidate and is still included in the current
       GDR Candidate list, it needs to process each of the groups and
       run the hash algorithm to check whether it is still the GDR for
       the given group.

          If it was the GDR for group G and the new hash result still
          chooses it as the GDR, then no processing is required.

          If it was the GDR for a group earlier and now it is no longer
          the GDR, then it sets its Assert metric for the multicast flow
          to (PIM_ASSERT_INFINITY - 1), as explained in Section 6.3.

          If it was not the GDR for a group earlier and the new hash
          result still does not select it as the GDR, no processing is
          required for that multicast group.

          If it was not the GDR for a group earlier and now becomes the
          GDR, it starts building the multicast forwarding tree for this
          flow.

   2.  If it was not a GDR Candidate and the updated DRLBGDR from the
       current PIM DR contains this router as one of the GDR Candidates,
       this router, being a new GDR Candidate, MUST run the hash
       algorithm for each of the groups (multicast flows). For a given
       group:

          If it is not the GDR, no processing is required.

          If it is hashed as the GDR, it needs to build the multicast
          forwarding tree.

6.3.  PIM Assert Modification

   It is possible that the identity of the GDR might change in the
   middle of an active flow. Examples of when this could happen include:

      When a new PIM router comes up

      When a GDR restarts

   When the GDR changes, existing traffic might be disrupted.
   Duplicates or packet losses might be observed. To illustrate the
   case, consider the following scenario where there are two streams G1
   and G2. R1 is the GDR for G1, and R2 is the GDR for G2.
   When R3 comes online, it is possible that R3 becomes the GDR for
   both G1 and G2, hence R3 starts to build the forwarding trees for G1
   and G2. If R1 and R2 stop forwarding before R3 completes the
   process, packet loss might occur. On the other hand, if R1 and R2
   continue forwarding while R3 is building the forwarding trees,
   duplicates might occur.

   This is not a typical deployment scenario but might still happen.
   Here we describe a mechanism to minimize the impact. We essentially
   want to minimize packet loss. Therefore, we allow a small
   amount of duplicates and depend on PIM Assert to minimize the
   duplication.

   When the role of GDR changes as above, instead of immediately
   stopping forwarding, R1 and R2 continue forwarding to G1 and G2
   respectively, while, at the same time, R3 builds forwarding trees for
   G1 and G2. This will lead to PIM Asserts.

   With the introduction of GDR, the following modification to the
   Assert packet MUST be done: if a router has this specification
   enabled on its downstream interface but is no longer the GDR (it was
   the GDR before the network event), it adjusts its Assert metric to
   (PIM_ASSERT_INFINITY - 1).

   Using the above example, for G1, assume R1 and R3 agree on the new
   GDR, which is R3. R1 will set its Assert metric to
   (PIM_ASSERT_INFINITY - 1). That will make R3, which has a normal
   metric in its Assert, the Assert winner.

   For G2, assume it takes a slightly longer time for R2 to find out
   that R3 is the new GDR, so R2 still considers itself the GDR while
   R3 has already assumed the role of GDR. Since both R2 and R3 think
   they are GDRs, they further compare the metric and IP address. If R3
   has the better routing metric, or the same metric but a better
   tie-breaker, the result will be consistent with the GDR selection.
   If, unfortunately, R2 has the better metric, or the same metric but a
   better tie-breaker, R2 will become the Assert winner and continue to
   forward traffic. This will continue until:

      The next PIM Hello option from the DR selects R3 as the GDR. R3
      will then build the forwarding tree and send an Assert.

   The process continues until R2 agrees to the selection of R3 as the
   GDR and sets its own Assert metric to (PIM_ASSERT_INFINITY - 1),
   which will make R3 the Assert winner. During the process, we will
   see intermittent duplication of traffic but packet loss will be
   minimized. In the unlikely case that R2 never relinquishes its role
   as GDR (while every other router thinks otherwise), the proposed
   mechanism also helps to keep the duplication to a minimum until
   manual intervention takes place to remedy the situation.

7.  Compatibility

   On a hybrid shared Ethernet LAN (where some PIM routers have the
   specification defined in this draft enabled and some do not):

   o  If a router that does not support the specification defined in
      this draft becomes the DR on the link, it MUST be the only DR on
      the link, as in [RFC4601], and no router acts as a GDR.

   o  If a router that does not support the specification defined in
      this draft is a non-DR on the link, then it should act as a non-DR
      as defined in [RFC4601].

8.  Manageability Considerations

   o  All of the routers on a LAN that support this specification MUST
      use an identical Hash Algorithm Type (described in Section 5.1).
      In the case of mixed Hash Algorithm Types, one MUST fall back to
      the DR election method defined in PIM-SM [RFC4601]. Migration
      between different algorithm types is out of the scope of this
      document.

9.  IANA Considerations

   IANA has temporarily assigned type 34 for the PIM DR Load Balancing
   Capability (DRLBC) Hello Option, and type 35 for the PIM DR Load
   Balancing GDR (DRLBGDR) Hello Option.
   IANA is requested to make these assignments permanent when this
   document is published as an RFC. The string TBD should be replaced
   by the assigned values accordingly.

10.  Security Considerations

   Security of the new DR Load Balancing PIM Hello Options is only
   guaranteed by the security of the PIM Hello message, so the security
   considerations for PIM Hello messages as described in PIM-SM
   [RFC4601] apply here.

11.  Acknowledgement

   The authors would like to thank Steve Simlo and Taki Millonis for
   helping with the original idea, Bill Atwood and Bharat Joshi for
   review comments, and Toerless Eckert and Rishabh Parekh for helpful
   conversation on the document.

   Special thanks to Anish Kachinthaya, Anvitha Kachinthaya and Jake
   Holland for reviewing the document and providing comments.

12.  References

12.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC4291]  Hinden, R. and S. Deering, "IP Version 6 Addressing
              Architecture", RFC 4291, DOI 10.17487/RFC4291, February
              2006.

   [RFC4601]  Fenner, B., Handley, M., Holbrook, H., and I. Kouvelas,
              "Protocol Independent Multicast - Sparse Mode (PIM-SM):
              Protocol Specification (Revised)", RFC 4601,
              DOI 10.17487/RFC4601, August 2006.

   [RFC6395]  Gulrajani, S. and S. Venaas, "An Interface Identifier (ID)
              Hello Option for PIM", RFC 6395, DOI 10.17487/RFC6395,
              October 2011.

12.2.  Informative References

   [HELLO-OPT]
              IANA, "PIM Hello Options", IANA PIM-HELLO-OPTIONS, March
              2007.
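
   As an illustration of the option layouts in Section 5 (Figures 3
   and 4), the following Python sketch encodes both Hello Options. It
   is not a definitive implementation: it assumes the temporarily
   assigned type values 34 and 35 from Section 9, 16-bit Type and
   Length fields in network byte order as drawn in the figures, and
   illustrative function names (encode_drlbc, encode_drlbgdr) that are
   not defined by this document.

```python
import ipaddress
import struct

DRLBC_TYPE = 34       # temporary assignment, see Section 9
DRLBGDR_TYPE = 35     # temporary assignment, see Section 9
HASH_ALG_MODULO = 0   # Hash Algorithm Type for the Modulo algorithm


def encode_drlbc(hash_alg: int = HASH_ALG_MODULO) -> bytes:
    """Type (16 bits), Length = 4 (16 bits), Hash Algorithm Type (32 bits)."""
    return struct.pack("!HHI", DRLBC_TYPE, 4, hash_alg)


def encode_drlbgdr(group_mask, source_mask, rp_mask, candidates) -> bytes:
    """Type, Length, three masks, then candidate addresses high to low."""
    fields = [ipaddress.ip_address(m)
              for m in (group_mask, source_mask, rp_mask)]
    fields += sorted((ipaddress.ip_address(c) for c in candidates),
                     key=int, reverse=True)
    if len({f.version for f in fields}) != 1:
        raise ValueError("masks and addresses must share one address family")
    value = b"".join(f.packed for f in fields)
    return struct.pack("!HH", DRLBGDR_TYPE, len(value)) + value


drlbc = encode_drlbc()
drlbgdr = encode_drlbgdr(
    "255.255.255.255", "255.255.255.255", "0.0.255.0",
    ["203.0.113.1", "203.0.113.3", "203.0.113.2"],
)
print(drlbc.hex())    # 0022000400000000
print(len(drlbgdr))   # 28: 4-byte header + 3 masks + 3 addresses
```

   For IPv4, the Length field evaluates to 3 x 4 + n x 4 bytes, which
   matches the Length definition in Section 5.2.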

Authors' Addresses

   Yiqun Cai
   Alibaba Group

   Email: yiqun.cai@alibaba-inc.com


   Heidi Ou
   Alibaba Group


   Sri Vallepalli
   Cisco Systems
   3625 Cisco Way,
   San Jose, CALIFORNIA 95134
   UNITED STATES

   Email: svallepa@cisco.com


   Mankamana Mishra
   Cisco Systems
   821 Alder Drive,
   MILPITAS, CALIFORNIA 95035
   UNITED STATES

   Email: mankamis@cisco.com


   Stig Venaas
   Cisco Systems
   821 Alder Drive,
   MILPITAS, CALIFORNIA 95035
   UNITED STATES

   Email: stig@cisco.com


   Andy Green
   British Telecom
   Adastral Park
   Ipswich IP5 2RE
   United Kingdom

   Email: andy.da.green@bt.com