Network Working Group                                             Y. Cai
Internet-Draft                                                     H. Ou
Intended status: Standards Track                           Alibaba Group
Expires: May 17, 2019                                      S. Vallepalli
                                                               M. Mishra
                                                               S. Venaas
                                                     Cisco Systems, Inc.
                                                                A. Green
                                                         British Telecom
                                                       November 13, 2018


                  PIM Designated Router Load Balancing
                         draft-ietf-pim-drlb-10

Abstract

   On a multi-access network, one of the PIM routers is elected as a
   Designated Router (DR).  On the last hop LAN, the PIM DR is
   responsible for tracking local multicast listeners and forwarding
   traffic to these listeners if the group is operating in PIM-SM.  This
   document specifies a modification to the PIM-SM protocol that allows
   more than one of these last hop routers to be selected, so that the
   forwarding load can be distributed among these routers.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on May 17, 2019.

Copyright Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.
   Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Applicability
   4.  Functional Overview
     4.1.  GDR Candidates
     4.2.  Hash Mask and Hash Algorithm
     4.3.  Modulo Hash Algorithm
       4.3.1.  Limitations
     4.4.  PIM Hello Options
   5.  Hello Option Formats
     5.1.  PIM DR Load Balancing Capability (DRLBC) Hello Option
     5.2.  PIM DR Load Balancing GDR (DRLBGDR) Hello Option
   6.  Protocol Specification
     6.1.  PIM DR Operation
     6.2.  PIM GDR Candidate Operation
       6.2.1.  Router Receives New DRLBGDR
       6.2.2.  Router Receives Updated DRLBGDR
     6.3.  PIM Assert Modification
   7.  Compatibility
   8.  Manageability Considerations
   9.  IANA Considerations
     9.1.  Initial registry
     9.2.  Assignment of new hash algorithms
   10. Security Considerations
   11. Acknowledgement
   12. References
     12.1.  Normative References
     12.2.  Informative References
   Authors' Addresses

1.  Introduction

   On a multi-access LAN such as an Ethernet, one of the PIM routers is
   elected as a DR.  The PIM DR has two roles in the PIM-SM protocol.
   On the first hop LAN, the PIM DR is responsible for registering an
   active source with the Rendezvous Point (RP) if the group is
   operating in PIM-SM.  On the last hop LAN, the PIM DR is responsible
   for tracking local multicast listeners and forwarding to these
   listeners if the group is operating in PIM-SM.

   Consider the following last hop LAN in Figure 1:

                            (core networks)
                              |   |   |
                              |   |   |
                             R1  R2  R3
                              |   |   |
                           --(last hop LAN)--
                                  |
                                  |
                           (many receivers)

                         Figure 1: Last Hop LAN

   Assume R1 is elected as the Designated Router.  According to
   [RFC7761], R1 will be responsible for forwarding traffic to that LAN
   on behalf of any local members.  In addition to keeping track of IGMP
   and MLD membership reports, R1 is also responsible for initiating the
   creation of source and/or shared trees towards the senders or the
   RPs.

   Forcing sole data plane forwarding responsibility on the PIM DR
   uncovers a limitation in the protocol.  In comparison, even though an
   OSPF DR or an IS-IS DIS handles additional duties while running the
   OSPF or IS-IS protocols, they are not required to be solely
   responsible for forwarding packets for the network.  On the other
   hand, on a last hop LAN, only the PIM DR is asked to forward packets
   while the other routers handle only control traffic (and perhaps drop
   packets due to RPF failures).  Hence the forwarding load of a last
   hop LAN is concentrated on a single router.

   This leads to several issues.
   One of the issues is that the aggregated bandwidth will be limited
   to what R1 can handle towards this particular interface.  It is very
   common that the last hop LAN consists of switches that run IGMP/MLD
   or PIM snooping.  This allows the forwarding of multicast packets to
   be restricted only to segments leading to receivers who have
   indicated their interest in multicast groups using either IGMP or
   MLD.  The emergence of switched Ethernet allows the aggregated
   bandwidth to exceed, sometimes by a large margin, that of a single
   link.  For example, let us modify Figure 1 and introduce an Ethernet
   switch in Figure 2.

                             (core networks)
                               |     |     |
                               |     |     |
                              R1    R2    R3
                               |     |     |
                            +=gi0===gi1===gi2=+
                            +                 +
                            +      switch     +
                            +                 +
                            +=gi4===gi5===gi6=+
                               |     |     |
                              H1    H2    H3

             Figure 2: Last Hop Network with Ethernet Switch

   Let us assume that each individual link is a Gigabit Ethernet.  Each
   router, R1, R2 and R3, and the switch have enough forwarding capacity
   to handle hundreds of Gigabits of data.

   Let us further assume that each of the hosts requests 500 Mbps of
   unique multicast data.  This totals 1.5 Gbps of data, which is less
   than what the switch or the combined uplink bandwidth across the
   routers can handle, even under failure of a single router.

   On the other hand, the link between R1 and the switch, via port gi0,
   can only handle a throughput of 1 Gbps.  If R1 is the only DR (the
   PIM DR elected using the procedure defined by [RFC7761]), at least
   500 Mbps worth of data will be lost, because the only link that can
   be used to draw the traffic from the routers to the switch is the one
   via gi0.  In other words, the entire network's throughput is limited
   by the single connection between the PIM DR and the switch (or the
   last hop LAN as in Figure 1).

   Another important issue is related to failover.
   If R1 is the only forwarder for a shared last hop LAN, then when R1
   goes out of service, multicast forwarding for the entire LAN has to
   be rebuilt by the newly elected PIM DR.  However, if there were a way
   to allow multiple routers to forward to the LAN for different groups,
   failure of one of the routers would only lead to disruption to a
   subset of the flows, therefore improving the overall resilience of
   the network.

   The hash algorithm defined in this document has a limitation (see
   Section 4.3.1), but this document provides the option to have
   different and more consistent hash algorithms in the future.

   This document specifies a modification to the PIM-SM protocol that
   allows more than one of these routers, called Group Designated
   Routers (GDRs), to be selected so that the forwarding load can be
   distributed among a number of routers.

2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

   With respect to PIM, this document follows the terminology that has
   been defined in [RFC7761].

   This document also introduces the following new acronyms:

   o  GDR: GDR stands for "Group Designated Router".  For each multicast
      flow, either a (*,G) for ASM, or an (S,G) for SSM, a hash
      algorithm (described below) is used to select one of the routers
      as a GDR.  The GDR is responsible for initiating the forwarding
      tree building process for the corresponding multicast flow.

   o  GDR Candidate: a last hop router that has the potential to become
      a GDR.  A GDR Candidate must have the same DR priority and must
      run the same GDR election hash algorithm as the DR.  It must send
      and process the new PIM Hello Options as defined in this document.
      There might be more than one GDR Candidate on a LAN,
      but only one can become GDR for a specific multicast flow.

3.  Applicability

   The extension specified in this document applies to PIM-SM last hop
   routers only.  It does not alter the behavior of a PIM DR on the
   first hop network.  This is because the source tree is built using
   the IP address of the sender, not the IP address of the PIM DR that
   sends the registers towards the RP.  The load balancing between first
   hop routers can be achieved naturally if an IGP provides equal cost
   multiple paths (which it usually does in practice).  Also,
   distributing the registration load does not justify the additional
   complexity required to support it.

4.  Functional Overview

   In the PIM DR election as defined in [RFC7761], when multiple last
   hop routers are connected to a multi-access LAN (for example, an
   Ethernet), one of them is elected to act as PIM DR.  The PIM DR is
   responsible for sending local Join/Prune messages towards the RP or
   source.  In order to elect the PIM DR, each PIM router on the LAN
   examines the received PIM Hello messages and compares its own DR
   priority and IP address with those of its neighbors.  The router with
   the highest DR priority is the PIM DR.  If there are multiple such
   routers, their IP addresses are used as the tie-breaker, as described
   in [RFC7761].

   In order to share forwarding load among last hop routers, besides the
   normal PIM DR election, the GDR is also elected on the last hop
   multi-access LAN.  There is only one PIM DR on the multi-access LAN,
   but there might be multiple GDR Candidates.

   For each multicast flow, that is, (*,G) for ASM and (S,G) for SSM, a
   hash algorithm is used to select one of the routers to be the GDR.
   A new DR Load Balancing Capability (DRLBC) PIM Hello Option, which
   contains the hash algorithm type, is announced by routers on
   interfaces where this specification is enabled.  Last hop routers
   that advertise the new DRLBC Option in their Hellos, and that use the
   same GDR election hash algorithm and the same DR priority as the PIM
   DR, are considered GDR Candidates.

   Hash Masks are defined for Source, Group and RP separately, in order
   to handle PIM ASM/SSM.  The masks, as well as a sorted list of GDR
   Candidate Addresses, are announced by the DR in a new DR Load
   Balancing GDR (DRLBGDR) PIM Hello Option.

   A hash algorithm based on the announced Source, Group, or RP masks
   allows one GDR to be assigned to a corresponding multicast state.
   That GDR is responsible for initiating the creation of the multicast
   forwarding tree for multicast traffic.

4.1.  GDR Candidates

   GDR is the new concept introduced by this specification.  GDR
   Candidates are routers eligible for GDR election on the LAN.  To
   become a GDR Candidate, a router MUST support this specification,
   have the same DR priority and run the same GDR election hash
   algorithm as the DR on the LAN.

   For example, assume there are 4 routers on the LAN: R1, R2, R3 and
   R4, which all support this specification.  R1, R2 and R3 have the
   same DR priority while R4's DR priority is less preferred.  In this
   example, R4 will not be eligible for GDR election, because R4 will
   not become a PIM DR unless all of R1, R2 and R3 go out of service.

   Furthermore, assume router R1 wins the PIM DR election, R1 and R2 run
   the same hash algorithm for GDR election, while R3 runs a different
   one.  In this case, only R1 and R2 will be eligible for GDR election,
   while R3 will not.

   As a DR, R1 will include its own Load Balancing Hash Masks and the
   identity of R1 and R2 (the GDR Candidates) in its DRLBGDR Hello
   Option.

4.2.  Hash Mask and Hash Algorithm

   A Hash Mask is used to extract a number of bits from the
   corresponding IP address field (32 for v4, 128 for v6) and calculate
   a hash value.  A hash value is used to select a GDR from the GDR
   Candidates advertised by the PIM DR.  For example, 0.0.255.0 defines
   a Hash Mask for an IPv4 address that masks out the first, the second,
   and the fourth octets, leaving the third octet as input to the hash.

   There are three Hash Masks defined:

   o  RP Hash Mask

   o  Source Hash Mask

   o  Group Hash Mask

   The hash masks need to be configured on the PIM routers that can
   potentially become a PIM DR, unless the implementation provides
   default Hash Mask values.  An implementation SHOULD provide masks
   with default values 255.255.255.255 (IPv4) and
   FFFF:FFFF:FFFF:FFFF:FFFF:FFFF:FFFF:FFFF (IPv6).

   o  If the group is in ASM mode and the RP Hash Mask announced by the
      PIM DR is not 0, calculate the value of hashvalue_RP [Section 4.3]
      to determine GDR.

   o  If the group is in ASM mode and the RP Hash Mask announced by the
      PIM DR is 0, obtain the value of hashvalue_Group [Section 4.3] to
      determine GDR.

   o  If the group is in SSM mode, use hashvalue_SG [Section 4.3] to
      determine GDR.

   A simple Modulo hash algorithm is defined in this document.  However,
   to allow other hash algorithms to be used, a 1-octet "Hash
   Algorithm" field is included in the DRLBC Hello Option to specify the
   hash algorithm used by a last hop router.

   If different hash algorithms are advertised among last hop routers,
   only last hop routers running the same hash algorithm as the DR (and
   having the same DR priority as the DR) are eligible for GDR election.

4.3.  Modulo Hash Algorithm

   The Modulo hash algorithm is discussed here with a detailed
   description of hashvalue_RP.
   The same algorithm is described in brief for hashvalue_Group, which
   uses the group address instead of the RP address for an ASM group
   with a zero RP_hashmask, and for hashvalue_SG, which uses the source
   and group addresses of an (S,G) instead of the RP address.

   o  For ASM groups, with a non-zero RP_hashmask, the hash value is
      calculated as:

         hashvalue_RP = (((RP_address & RP_hashmask) >> N) & 0xFFFF) % M

      RP_address is the address of the RP defined for the group.  N is
      the number of consecutive zero bits, counted from the least
      significant bit of the RP_hashmask.  M is the number of GDR
      Candidates.

      For example, Router X with IPv4 address 203.0.113.1 receives a
      DRLBGDR Hello Option from the DR, which announces RP Hash Mask
      0.0.255.0 and a list of GDR Candidates, sorted by IP addresses
      from high to low: 203.0.113.3, 203.0.113.2 and 203.0.113.1.  The
      ordinal numbers assigned to those addresses would be:

         0 for 203.0.113.3; 1 for 203.0.113.2; 2 for 203.0.113.1
         (Router X)

      Assume there are 2 RPs: RP1 192.0.2.1 for Group1 and RP2
      198.51.100.2 for Group2.  Following the modulo hash algorithm:

      N is 8 for 0.0.255.0, and M is 3 for the total number of GDR
      Candidates.  The hashvalue_RP for RP1 192.0.2.1 is:

         (((192.0.2.1 & 0.0.255.0) >> 8) & 0xFFFF) % 3 = 2 % 3 = 2

      which matches the ordinal number assigned to Router X.  Router X
      will be the GDR for Group1, which uses 192.0.2.1 as the RP.

      The hashvalue_RP for RP2 198.51.100.2 is:

         (((198.51.100.2 & 0.0.255.0) >> 8) & 0xFFFF) % 3 = 100 % 3 = 1

      which is different from Router X's ordinal number (2); hence,
      Router X will not be the GDR for Group2.

   o  If RP_hashmask is 0, a hash value for an ASM group is calculated
      using the Group Hash Mask:

         hashvalue_Group = (((Group_address & Group_hashmask) >> N) &
         0xFFFF) % M

      Compare hashvalue_Group with the ordinal number assigned to Router
      X to decide if Router X is the GDR.

   o  For SSM groups, a hash value is calculated using both the Source
      and Group Hash Masks:

         hashvalue_SG = ((((Source_address & Source_hashmask) >> N_S) &
         0xFFFF) ^ (((Group_address & Group_hashmask) >> N_G) & 0xFFFF))
         % M

4.3.1.  Limitations

   The Modulo Hash Algorithm has poor failover characteristics when a
   shared LAN has more than two GDRs.  In the case of more than two GDRs
   on a LAN, when one GDR fails, all of the groups may be reassigned to
   a new GDR, even if they were not assigned to the failed GDR.
   However, many deployments use only two routers on a shared LAN for
   redundancy purposes.  Future work may define new hash algorithms
   where only groups assigned to the failed GDR get reassigned.

4.4.  PIM Hello Options

   When a last hop PIM router sends a PIM Hello for an interface with
   this specification enabled, it includes a new option, called "DR Load
   Balancing Capability (DRLBC)".

   Besides this DRLBC Hello Option, the elected PIM DR also includes a
   new "DR Load Balancing GDR (DRLBGDR) Hello Option".  The DRLBGDR
   Hello Option consists of three Hash Masks as defined above and also a
   sorted list of GDR Candidate addresses on the last hop LAN.

   The elected PIM DR uses the DRLBC Hello Options advertised by all
   routers on the last hop LAN to compose the DRLBGDR Option.  The GDR
   Candidates use the DRLBGDR Hello Option advertised by the PIM DR to
   calculate the hash value.

5.  Hello Option Formats

5.1.  PIM DR Load Balancing Capability (DRLBC) Hello Option

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Type = TBD           |         Length = 4            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   Reserved                    |Hash Algorithm |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                    Figure 3: Capability Hello Option

   Type:  TBD

   Length:  4

   Hash Algorithm:  0 for Modulo

   This DRLBC Hello Option MUST be advertised by last hop routers on
   interfaces with this specification enabled.

5.2.  PIM DR Load Balancing GDR (DRLBGDR) Hello Option

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Type = TBD           |           Length              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          Group Mask                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          Source Mask                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            RP Mask                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   GDR Candidate Address(es)                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                       Figure 4: GDR Hello Option

   Type:  TBD

   Length:  (3 + n) x (4 or 16) where n is the number of GDR
      candidates.

   Group Mask (32/128 bits):  Mask

   Source Mask (32/128 bits):  Mask

   RP Mask (32/128 bits):  Mask

      All masks MUST be in the same address family as the Hello IP
      header.

   GDR Address (32/128 bits):  Address(es) of GDR Candidate(s)

      All addresses must be in the same address family as the Hello
      IP header.  The addresses are sorted in descending order.  The
      order is converted to the ordinal number associated with each
      GDR candidate in hash value calculation.
      For example, if the addresses advertised are R3, R2, R1, the
      ordinal number assigned to R3 is 0, to R2 is 1 and to R1 is 2.

      If the "Interface ID" option, as specified in [RFC6395], is
      present in a GDR Candidate's PIM Hello message, and the "Router
      ID" portion is non-zero:

      +  For IPv4, the "GDR Candidate Address" will be set directly
         to the "Router ID".

      +  For IPv6, the "GDR Candidate Address" will be set to the
         IPv4-IPv6 translated address of the "Router ID", as
         described in [RFC4291], that is, the "Router ID" is appended
         to a prefix of 96 zero bits.

      If the "Interface ID" option is not present in a GDR
      Candidate's PIM Hello message, or if the "Interface ID" option
      is present but the "Router ID" field is zero, the "GDR
      Candidate Address" will be the IPv4 or IPv6 source address of
      the PIM Hello message.

   This DRLBGDR Hello Option MUST only be advertised by the
   elected PIM DR.

6.  Protocol Specification

6.1.  PIM DR Operation

   The DR election process is still the same as defined in [RFC7761].  A
   DR that has this specification enabled on an interface advertises the
   new DRLBGDR Hello Option, which contains mask values from user
   configuration, followed by a sorted list of GDR Candidate Addresses,
   from the highest value to the lowest value.  Moreover, like non-DR
   routers, the DR also advertises the DRLBC Hello Option to indicate
   its capability of supporting this specification and the type of its
   GDR election hash algorithm.

   If a PIM DR receives a PIM Hello with the DRLBGDR Option, the PIM DR
   SHOULD ignore the TLV.

   If a PIM DR receives a DRLBC Hello Option from a neighbor that
   contains the same hash algorithm as the DR's, and the neighbor has
   the same DR priority as the DR, the PIM DR SHOULD consider the
   neighbor a GDR Candidate and insert the GDR Candidate's Address into
   the sorted list of the DRLBGDR Option.
   However, the DR MAY have policies limiting which GDR Candidates, or
   how many GDR Candidates, to include.

6.2.  PIM GDR Candidate Operation

   When an IGMP/MLD report is received, without this specification, only
   the PIM DR will handle the join and potentially run into the issues
   described earlier.  Using this specification, a hash algorithm is
   used by the GDR Candidates to determine which router is going to be
   responsible for building forwarding trees on behalf of the host.

   If this specification is enabled on an interface, the router MUST
   include the DRLBC Hello Option in its PIM Hello on the interface.
   Note that the presence of the DRLBC Option in a PIM Hello does not
   guarantee that this router will be considered a GDR Candidate.  Once
   the DR election is done, the router receives the DRLBGDR Hello Option
   from the current PIM DR on the link, which contains the list of GDR
   Candidates selected by the PIM DR.

   A router only acts as a GDR Candidate if it is included in the GDR
   list of the DRLBGDR Hello Option.

   A GDR Candidate may receive a DRLBGDR Hello Option from the PIM DR
   with different Hash Masks from those the candidate was configured
   with.  The GDR Candidate MUST use the Hash Masks advertised by the
   PIM DR to calculate the hash value.

   A GDR Candidate MUST ignore the DRLBGDR Hello Option if it is
   received from a PIM router which is not the DR.

   If the PIM DR does not support this specification, GDR election will
   not take place, and only the PIM DR joins the multicast tree.

6.2.1.  Router Receives New DRLBGDR

   The first time a router receives a DRLBGDR option from the PIM DR, it
   MUST process the option and check if it is in the GDR list.

   1.  If a router is not listed as a GDR candidate in DRLBGDR, no
       action is needed.

   2.  If a router is listed as a GDR candidate in DRLBGDR, then it MUST
       process each of the groups, or source and group pairs if SSM, in
       the IGMP/MLD reports.  The masks are announced in the PIM Hello
       by the DR in the DRLBGDR Hello Option.  For each group in the
       reports that is in ASM mode, and each source and group pair if
       the group is in SSM mode, the router needs to run the hash
       algorithm (described in Section 4.3) based on the announced
       Source, Group or RP masks to determine if it is the GDR for the
       specified group, or source and group pair.  If the hash result
       makes it the GDR for the multicast flow, it builds the multicast
       forwarding tree.  If it is not the GDR for the multicast flow, no
       action is needed.

6.2.2.  Router Receives Updated DRLBGDR

   If a router (GDR or non GDR) receives an unchanged DRLBGDR from the
   current PIM DR, no action is needed.

   If a router (GDR or non GDR) receives a new or modified DRLBGDR from
   the current PIM DR, it requires processing as described below:

   1.  If it was included in the previous GDR list, and still is
       included in the new GDR list: It needs to process each of the
       groups, or source and group pairs if the group is in SSM mode,
       and run the hash algorithm to check if it is still the GDR for
       the given group, or source and group pair if SSM.

          If it was the GDR for a group, or source and group pair if
          SSM, and the new hash result chose it as the GDR, then no
          processing is required.

          If it was the GDR for a group, or source and group pair if
          SSM, earlier and now it is no longer the GDR, then it sets its
          assert metric for the multicast flow to be
          (PIM_ASSERT_INFINITY - 1), as explained in Section 6.3.

          If it was not the GDR for a group, or source and group pair if
          SSM, earlier, and the new hash does not make it GDR, then no
          processing is required.

          If it was not the GDR for a group, or source and group pair if
          SSM, earlier, and now becomes the GDR, it starts building the
          multicast forwarding tree for this flow.

   2.  If it was included in the previous GDR list, but is not included
       in the new GDR list: It needs to process each of the groups, or
       source and group pairs if the group is in SSM mode.

          If it was the GDR for a group, or source and group pair if
          SSM, it sets its assert metric for the multicast flow to be
          (PIM_ASSERT_INFINITY - 1), as explained in Section 6.3.

          If it was not the GDR, then no processing is required.

   3.  If it was not included in the previous GDR list, but is included
       in the new GDR list, the router MUST run the hash algorithm for
       each of the groups, or source and group pairs if SSM.

          If it is not the GDR for a group, or source and group pair if
          SSM, no processing is required.

          If it is hashed as the GDR, it needs to build a multicast
          forwarding tree.

6.3.  PIM Assert Modification

   It is possible that the identity of the GDR might change in the
   middle of an active flow.  Examples when this could happen include:

      When a new PIM router comes up

      When a GDR restarts

   When the GDR changes, existing traffic might be disrupted.
   Duplicates or packet loss might be observed.  To illustrate the case,
   consider the following scenario where there are two flows G1 and G2.
   R1 is the GDR for G1, and R2 is the GDR for G2.  When R3 comes
   online, it is possible that R3 becomes GDR for both G1 and G2, hence
   R3 starts to build the forwarding tree for G1 and G2.  If R1 and R2
   stop forwarding before R3 completes the process, packet loss might
   occur.  On the other hand, if R1 and R2 continue forwarding while R3
   is building the forwarding trees, duplicates might occur.

   This is not a typical deployment scenario but might still happen.
   Here we describe a mechanism to minimize the impact.
   We essentially want to minimize packet loss.  Therefore, we allow a
   small amount of duplicates and depend on PIM Assert to minimize the
   duplication.

   When the role of GDR changes as above, instead of immediately
   stopping forwarding, R1 and R2 continue forwarding to G1 and G2
   respectively, while, at the same time, R3 builds forwarding trees for
   G1 and G2.  This will lead to PIM Asserts.

   With the introduction of GDR, the following modification to the
   Assert packet MUST be done: if a router enables this specification on
   its downstream interface, and it is no longer the GDR for a flow (but
   was the GDR before the network event), it adjusts its Assert metric
   to (PIM_ASSERT_INFINITY - 1).

   Using the above example, for G1, assume R1 and R3 agree on the new
   GDR, which is R3.  R1 will set its Assert metric to
   (PIM_ASSERT_INFINITY - 1).  That will make R3, which has a normal
   metric in its Assert, the Assert winner.

   For G2, assume it takes a slightly longer time for R2 to find out
   that R3 is the new GDR, so that R2 still considers itself the GDR
   while R3 has already assumed the role of GDR.  Since both R2 and R3
   think they are GDRs, they further compare their metrics and IP
   addresses.  If R3 has the better routing metric, or the same metric
   but a better tie-breaker, the result will be consistent with the GDR
   selection.  If, unfortunately, R2 has the better metric, or the same
   metric but a better tie-breaker, R2 will become the Assert winner and
   continue to forward traffic.  This will continue until:

      The next PIM Hello Option from the DR selects R3 as the GDR.  R3
      will then build the forwarding tree and send an Assert.

   The process continues until R2 agrees to the selection of R3 as the
   GDR, and sets its own Assert metric to (PIM_ASSERT_INFINITY - 1),
   which will make R3 the Assert winner.  During the process, we will
   see intermittent duplication of traffic but packet loss will be
   minimized.
   In the unlikely case that R2 never relinquishes its role as GDR
   (while every other router thinks otherwise), the proposed mechanism
   also helps to keep the duplication to a minimum until manual
   intervention takes place to remedy the situation.

7.  Compatibility

   In the case of a hybrid Ethernet shared LAN (where some PIM routers
   enable the specification defined in this document, and some do not):

   o  If a router which does not support this specification becomes the
      DR on the LAN, then it is the only router acting as a DR, and
      there will be no load-balancing.

   o  If a router which does not support this specification becomes a
      non-DR on the link, then it acts as a non-DR as defined in
      [RFC7761], and it will not take part in any load-balancing.

8.  Manageability Considerations

   Only routers announcing the same Hash Algorithm as the DR are
   considered GDR candidates.  Network administrators need to make sure
   that the desired set of routers announce the same algorithm.
   Migration between different algorithms is not considered in this
   document.

9.  IANA Considerations

   IANA has temporarily assigned type 34 for the PIM DR Load Balancing
   Capability (DRLBC) Hello Option, and type 35 for the PIM DR Load
   Balancing GDR (DRLBGDR) Hello Option in the PIM-Hello Options
   registry.  IANA is requested to make these assignments permanent when
   this document is published as an RFC.  The string TBD should be
   replaced by the assigned values accordingly.

   This document requests IANA to create a registry called "Designated
   Router Load Balancing Hash Algorithms" in the "Protocol Independent
   Multicast (PIM)" branch of the registry tree.  The registry lists
   hash algorithms for use by PIM Designated Router Load Balancing.

9.1.  Initial registry

   The initial content of the registry should be as follows.

   Type   Name                                     Reference
   ------ ---------------------------------------- --------------------
     0    Modulo                                   This document
   1-255  Unassigned

9.2.  Assignment of new hash algorithms

   Assignment of new hash algorithms is done according to the "IETF
   Review" model, see [RFC5226].

10.  Security Considerations

   Security of the new DR Load Balancing PIM Hello Options is only
   guaranteed by the security of PIM Hello messages, so the security
   considerations for PIM Hello messages as described in PIM-SM
   [RFC7761] apply here.

11.  Acknowledgement

   The authors would like to thank Steve Simlo and Taki Millonis for
   helping with the original idea, Bill Atwood and Bharat Joshi for
   review comments, and Toerless Eckert and Rishabh Parekh for helpful
   conversations on the document.

   Special thanks to Anish Kachinthaya, Anvitha Kachinthaya and Jake
   Holland for reviewing the document and providing comments.

12.  References

12.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC4291]  Hinden, R. and S. Deering, "IP Version 6 Addressing
              Architecture", RFC 4291, DOI 10.17487/RFC4291, February
              2006.

   [RFC6395]  Gulrajani, S. and S. Venaas, "An Interface Identifier (ID)
              Hello Option for PIM", RFC 6395, DOI 10.17487/RFC6395,
              October 2011.

   [RFC7761]  Fenner, B., Handley, M., Holbrook, H., Kouvelas, I.,
              Parekh, R., Zhang, Z., and L. Zheng, "Protocol Independent
              Multicast - Sparse Mode (PIM-SM): Protocol Specification
              (Revised)", STD 83, RFC 7761, DOI 10.17487/RFC7761, March
              2016.

12.2.  Informative References

   [RFC5226]  Narten, T. and H. Alvestrand, "Guidelines for Writing an
              IANA Considerations Section in RFCs", RFC 5226,
              DOI 10.17487/RFC5226, May 2008.

Authors' Addresses

   Yiqun Cai
   Alibaba Group

   Email: yiqun.cai@alibaba-inc.com


   Heidi Ou
   Alibaba Group


   Sri Vallepalli
   Cisco Systems, Inc.
   3625 Cisco Way
   San Jose CA 95134
   USA

   Email: svallepa@cisco.com


   Mankamana Mishra
   Cisco Systems, Inc.
   821 Alder Drive,
   Milpitas CA 95035
   USA

   Email: mankamis@cisco.com


   Stig Venaas
   Cisco Systems, Inc.
   Tasman Drive
   San Jose CA 95134
   USA

   Email: stig@cisco.com


   Andy Green
   British Telecom
   Adastral Park
   Ipswich IP5 2RE
   United Kingdom

   Email: andy.da.green@bt.com