idnits 2.17.1

draft-ietf-pim-drlb-13.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  == There are 4 instances of lines with non-RFC6890-compliant IPv4 addresses
     in the document. If these are example addresses, they should be changed.

  == There are 1 instance of lines with non-RFC3849-compliant IPv6 addresses
     in the document. If these are example addresses, they should be changed.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (October 23, 2019) is 1646 days in the past. Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

     No issues found here.

     Summary: 0 errors (**), 0 flaws (~~), 3 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

Network Working Group                                             Y. Cai
Internet-Draft                                                     H. Ou
Intended status: Standards Track                           Alibaba Group
Expires: April 25, 2020                                    S. Vallepalli
                                                               M. Mishra
                                                               S. Venaas
                                                     Cisco Systems, Inc.
                                                                A. Green
                                                         British Telecom
                                                        October 23, 2019

                  PIM Designated Router Load Balancing
                         draft-ietf-pim-drlb-13

Abstract

   On a multi-access network, one of the PIM-SM routers is elected as a
   Designated Router. One of the responsibilities of the Designated
   Router is to track local multicast listeners and forward data to
   these listeners if the group is operating in PIM-SM. This document
   specifies a modification to the PIM-SM protocol that allows more than
   one of the PIM-SM routers to take on this responsibility so that the
   forwarding load can be distributed among multiple routers.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 25, 2020.

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.
   Code Components extracted from this document must include Simplified
   BSD License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the
   Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Applicability
   4.  Functional Overview
     4.1.  GDR Candidates
   5.  Protocol Specification
     5.1.  Hash Mask and Hash Algorithm
     5.2.  Modulo Hash Algorithm
       5.2.1.  Modulo Hash Algorithm Examples
       5.2.2.  Limitations
     5.3.  PIM Hello Options
       5.3.1.  PIM DR Load Balancing Capability (DRLB-Cap) Hello Option
       5.3.2.  PIM DR Load Balancing List (DRLB-List) Hello Option
     5.4.  PIM DR Operation
     5.5.  PIM GDR Candidate Operation
     5.6.  DRLB-List Hello Option Processing
     5.7.  PIM Assert Modification
     5.8.  Backward Compatibility
   6.  Operational Considerations
   7.  IANA Considerations
     7.1.  Initial registry
     7.2.  Assignment of new Hash Algorithms
   8.  Security Considerations
   9.  Acknowledgement
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Authors' Addresses

1.  Introduction

   On a multi-access LAN, such as an Ethernet, with one or more PIM-SM
   [RFC7761] routers, one of the PIM-SM routers is elected as a
   Designated Router (DR). The PIM DR has two responsibilities in the
   PIM-SM protocol. For any active sources on a LAN, the PIM DR is
   responsible for registering with the Rendezvous Point (RP) if the
   group is operating in PIM-SM. Also, the PIM DR is responsible for
   tracking local multicast listeners and forwarding to these listeners
   if the group is operating in PIM-SM.

   Consider the following LAN in Figure 1:

                         (core networks)
                           |     |     |
                           |     |     |
                          R1    R2    R3
                           |     |     |
                           ----(LAN)----
                                 |
                                 |
                         (many receivers)

                      Figure 1: LAN with receivers

   Assume R1 is elected as the DR. According to the PIM-SM protocol, R1
   will be responsible for forwarding traffic to that LAN on behalf of
   any local members. In addition to keeping track of membership
   reports, R1 is also responsible for initiating the creation of source
   and/or shared trees towards the senders or the RPs. The membership
   reports would be IGMP or MLD messages. This applies to any version
   of the IGMP and MLD protocols. The most recent versions are IGMPv3
   [RFC3376] and MLDv2 [RFC3810].

   Having a single router acting as DR and being responsible for data
   plane forwarding leads to several issues. One of the issues is that
   the aggregated bandwidth will be limited to what R1 can handle with
   regard to the capacity of incoming links, the interface on the LAN,
   and total forwarding capacity. It is very common that a LAN consists
   of switches that run IGMP/MLD or PIM snooping [RFC4541].
   This allows the forwarding of multicast packets to be restricted only
   to segments leading to receivers who have indicated their interest in
   multicast groups using either IGMP or MLD. The emergence of the
   switched Ethernet allows the aggregated bandwidth to exceed, sometimes
   by a large number, that of a single link. For example, let us modify
   Figure 1 and introduce an Ethernet switch in Figure 2.

                         (core networks)
                           |     |     |
                           |     |     |
                          R1    R2    R3
                           |     |     |
                        +=gi0===gi1===gi2=+
                        +                 +
                        +      switch     +
                        +                 +
                        +=gi4===gi5===gi6=+
                           |     |     |
                          H1    H2    H3

                   Figure 2: LAN with Ethernet Switch

   Let us assume that each individual link is a Gigabit Ethernet. Each
   router, R1, R2 and R3, and the switch have enough forwarding capacity
   to handle hundreds of Gigabits of data.

   Let us further assume that each of the hosts requests 500 Mbps of
   unique multicast data. This totals to 1.5 Gbps of data, which is
   less than what the switch or the combined uplink bandwidth across
   the routers can handle, even under failure of a single router.

   On the other hand, the link between R1 and the switch, via port gi0,
   can only handle a throughput of 1 Gbps. If R1 is the only DR (the PIM
   DR elected using the procedure defined by [RFC7761]), at least 500
   Mbps worth of data will be lost because the only link that can be
   used to draw the traffic from the routers to the switch is via gi0.
   In other words, the entire network's throughput is limited by the
   single connection between the PIM DR and the switch (or LAN as in
   Figure 1).

   Another important issue is related to failover. If R1 is the only
   forwarder on a shared LAN, when R1 goes out of service, multicast
   forwarding for the entire LAN has to be rebuilt by the newly elected
   PIM DR.
   However, if there were a way that allowed multiple routers to
   forward to the LAN for different groups, failure of one of the
   routers would only lead to disruption to a subset of the flows,
   therefore improving the overall resilience of the network.

   This document specifies a modification to the PIM-SM protocol that
   allows more than one of these routers, called Group Designated
   Routers (GDRs), to be selected so that the forwarding load can be
   distributed among a number of routers.

2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

   With respect to PIM-SM, this document follows the terminology that
   has been defined in [RFC7761].

   This document also introduces the following new acronyms:

   o  GDR: Group Designated Router. For each multicast flow, either a
      (*,G) for Any-Source Multicast (ASM), or an (S,G) for Source-
      Specific Multicast (SSM) [RFC4607], a Hash Algorithm (described
      below) is used to select one of the routers as a GDR. The GDR is
      responsible for initiating the forwarding tree building process
      for the corresponding multicast flow.

   o  GDR Candidate: a router that has the potential to become a GDR.
      There might be multiple GDR Candidates on a LAN, but only one can
      become the GDR for a specific multicast flow.

3.  Applicability

   The extension specified in this document applies to PIM-SM routers
   when they act as last hop routers (that is, when there are directly
   connected receivers). It does not alter the behavior of a PIM DR, or
   any other routers, on the first hop network (directly connected
   sources).
   This is because the source tree is built using the IP address of the
   sender, not the IP address of the PIM DR that sends the registers
   towards the RP. The load balancing between first hop routers can be
   achieved naturally if an IGP provides equal cost multiple paths (which
   it usually does in practice). Also, distributing the load of
   registering does not justify the additional complexity required to
   support it.

4.  Functional Overview

   In the PIM DR election as defined in [RFC7761], when multiple routers
   are connected to a multi-access LAN (for example, an Ethernet), one
   of them is elected to act as PIM DR. The PIM DR is responsible for
   sending local Join/Prune messages towards the RP or source. In order
   to elect the PIM DR, each PIM router on the LAN examines the received
   PIM Hello messages and compares its own DR priority and IP address
   with those of its neighbors. The router with the highest DR priority
   is the PIM DR. If there are multiple such routers, their IP
   addresses are used as the tie-breaker, as described in [RFC7761].

   In order to share forwarding load among last hop routers, besides the
   normal PIM DR election, the GDR is also elected on the multi-access
   LAN. There is only one PIM DR on the multi-access LAN, but there
   might be multiple GDR Candidates.

   For each multicast flow, that is, (*,G) for ASM and (S,G) for SSM, a
   Hash Algorithm is used to select one of the routers to be the GDR.
   The new DR Load Balancing Capability (DRLB-Cap) PIM Hello Option is
   used to announce the Capability as well as the Hash Algorithm type.
   Routers with the new DRLB-Cap Option advertised in their PIM Hello,
   using the same GDR election Hash Algorithm and the same DR priority
   as the PIM DR, are considered as GDR Candidates.

   Hash Masks are defined for Source, Group and RP separately, in order
   to handle PIM ASM/SSM.
   The masks, as well as a sorted list of GDR Candidate Addresses, are
   announced by the DR in a new DR Load Balancing List (DRLB-List) PIM
   Hello Option.

   A Hash Algorithm based on the announced Source, Group, or RP masks
   allows one GDR to be assigned to a corresponding multicast state.
   That GDR is responsible for initiating the creation of the
   multicast forwarding tree for multicast traffic.

4.1.  GDR Candidates

   The GDR is the new concept introduced by this specification. GDR
   Candidates are routers eligible for GDR election on the LAN. To
   become a GDR Candidate, a router must have the same DR priority and
   run the same GDR election Hash Algorithm as the DR on the LAN.

   For example, assume there are 4 routers on the LAN: R1, R2, R3 and
   R4, each announcing a DRLB-Cap option. R1, R2 and R3 have the same
   DR priority while R4's DR priority is less preferred. In this
   example, R4 will not be eligible for GDR election, because R4 will
   not become a PIM DR unless all of R1, R2 and R3 go out of service.

   Furthermore, assume router R1 wins the PIM DR election, R1 and R2 run
   the same Hash Algorithm for GDR election, while R3 runs a different
   one. In this case, only R1 and R2 will be eligible for GDR election,
   while R3 will not.

   As a DR, R1 will include its own Load Balancing Hash Masks and the
   identity of R1 and R2 (the GDR Candidates) in its DRLB-List Hello
   Option.

5.  Protocol Specification

5.1.  Hash Mask and Hash Algorithm

   A Hash Mask is used to extract a number of bits from the
   corresponding IP address field (32 for IPv4, 128 for IPv6) and
   calculate a hash value. A hash value is used to select a GDR from
   GDR Candidates advertised by the PIM DR. Hash masks allow for
   certain flows to always be forwarded by the same GDR, by ignoring
   certain bits in the hash value calculation, so that the hash values
   are the same.
   For example, 0.0.255.0 defines a Hash Mask for an IPv4 address that
   masks the first, the second, and the fourth octets, which means that
   only the third octet will influence the hash value computed.

   In the text below, a hash mask is in some places said to be zero. A
   hash mask is zero if no bits are set. That is, 0.0.0.0 for IPv4 and
   :: for IPv6. Also, a hash mask is said to be an all-bits-set mask if
   it is 255.255.255.255 for IPv4 or
   FFFF:FFFF:FFFF:FFFF:FFFF:FFFF:FFFF:FFFF for IPv6.

   There are three Hash Masks defined:

   o  RP Hash Mask

   o  Source Hash Mask

   o  Group Hash Mask

   The hash masks need to be configured on the PIM routers that can
   potentially become a PIM DR, unless the implementation provides
   default hash mask values. An implementation SHOULD have default hash
   mask values as follows. The default RP Hash Mask SHOULD be zero (no
   bits set). The default Source and Group Hash Masks SHOULD both be
   all-bits-set masks. These default values are likely acceptable for
   most deployments, and simplify configuration.

   The DRLB-List Hello Option contains a list of GDR Candidates. The
   first one listed has ordinal number 0, the second listed ordinal
   number 1, and the last one has ordinal number N - 1 if there are N
   candidates listed. The hash value computed will be the ordinal
   number of the GDR Candidate that is acting as GDR.

   o  If the group is in ASM mode and the RP Hash Mask announced by the
      PIM DR is not zero (at least one bit is set), calculate the value
      of hashvalue_RP [Section 5.2] to determine the GDR.

   o  If the group is in ASM mode and the RP Hash Mask announced by the
      PIM DR is zero (no bits are set), obtain the value of
      hashvalue_Group [Section 5.2] to determine the GDR.

   o  If the group is in SSM mode, use hashvalue_SG [Section 5.2] to
      determine the GDR.

   A simple Modulo Hash Algorithm is defined in this document.
   However, to allow other Hash Algorithms to be used, a 1-octet "Hash
   Algorithm" field is included in the DRLB-Cap Hello Option to specify
   the Hash Algorithm used by the router.

   If different Hash Algorithms are advertised among the routers on a
   LAN, only the routers advertising the same Hash Algorithm as the DR
   (as well as having the same DR priority as the DR) are eligible for
   GDR election.

5.2.  Modulo Hash Algorithm

   As part of computing the hash, the notation LSZC(hash_mask) is used
   to denote the number of zeroes counted from the least significant bit
   of a Hash Mask hash_mask. As an example, LSZC(255.255.128.0) is 15
   and LSZC(FFFF:8000::) is 111. If all bits are set, LSZC will be 0.
   If the mask is zero, then LSZC will be 32 for IPv4, and 128 for IPv6.

   The number of GDR Candidates is denoted as GDRC.

   The idea behind the Modulo Hash Algorithm is, in simple terms, that
   the corresponding mask is applied to a value, then the result is
   shifted right LSZC(mask) bits so that the least significant bits that
   were masked out are not considered. Then this result is masked by
   0xFFFFFFFF, keeping only the last 32 bits of the result (this only
   makes a difference for IPv6). Finally, the hash value is this result
   modulo the number of GDR Candidates (GDRC).

   The Modulo Hash Algorithm for computing the values hashvalue_RP,
   hashvalue_Group and hashvalue_SG is defined as follows.

   hashvalue_RP is calculated as:

      (((RP_address & RP_mask) >> LSZC(RP_mask)) & 0xFFFFFFFF) % GDRC

      RP_address is the address of the RP defined for the group and
      RP_mask is the RP Hash Mask.

   hashvalue_Group is calculated as:

      (((Group_address & Group_mask) >> LSZC(Group_mask)) & 0xFFFFFFFF)
      % GDRC

      Group_address is the group address and Group_mask is the Group
      Hash Mask.
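
   As a non-normative illustration, LSZC and the single-address form of
   the computation (the form shared by hashvalue_RP and hashvalue_Group)
   might be sketched in Python as follows. The function names lszc and
   modulo_hash are illustrative assumptions and are not defined by this
   document; addresses and masks are passed as text and parsed with the
   standard ipaddress module.

```python
import ipaddress


def lszc(mask: int, width: int) -> int:
    """Count zero bits from the least significant end of a hash mask.

    `width` is 32 for IPv4 and 128 for IPv6; a zero mask yields `width`.
    """
    if mask == 0:
        return width
    count = 0
    while mask & 1 == 0:
        mask >>= 1
        count += 1
    return count


def modulo_hash(address: str, mask: str, gdrc: int) -> int:
    """(((address & mask) >> LSZC(mask)) & 0xFFFFFFFF) % GDRC."""
    addr = ipaddress.ip_address(address)
    hash_mask = ipaddress.ip_address(mask)
    if addr.version != hash_mask.version:
        raise ValueError("address and mask must be in the same family")
    if gdrc < 1:
        raise ValueError("there must be at least one GDR Candidate")
    shift = lszc(int(hash_mask), addr.max_prefixlen)
    shifted = (int(addr) & int(hash_mask)) >> shift
    # Keep only the low 32 bits; this only matters for IPv6.
    return (shifted & 0xFFFFFFFF) % gdrc


# An RP at 192.0.2.1 with RP Hash Mask 0.0.255.0 and three candidates:
print(modulo_hash("192.0.2.1", "0.0.255.0", 3))  # prints 2
```

   The returned value is the ordinal number of the GDR Candidate in the
   list announced by the DR.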

   hashvalue_SG is calculated as:

      ((((Source_address & Source_mask) >> LSZC(Source_mask)) &
      0xFFFFFFFF) ^ (((Group_address & Group_mask) >> LSZC(Group_mask))
      & 0xFFFFFFFF)) % GDRC

      Source_address is the source address and Source_mask is the
      Source Hash Mask. Group_address is the group address and
      Group_mask is the Group Hash Mask.

5.2.1.  Modulo Hash Algorithm Examples

   To help illustrate the algorithm, consider this example. Router X
   with IPv4 address 203.0.113.1 receives a DRLB-List Hello Option from
   the DR, which announces RP Hash Mask 0.0.255.0 and a list of GDR
   Candidates, sorted by IP addresses from high to low: 203.0.113.3,
   203.0.113.2 and 203.0.113.1. The ordinal number assigned to those
   addresses would be:

   0 for 203.0.113.3; 1 for 203.0.113.2; 2 for 203.0.113.1 (Router X).

   Assume there are 2 RPs: RP1 192.0.2.1 for Group1 and RP2 198.51.100.2
   for Group2. Following the modulo Hash Algorithm:

   LSZC(0.0.255.0) is 8 and GDRC is 3. The hashvalue_RP for Group1 with
   RP RP1 is:

      ((((192.0.2.1 & 0.0.255.0) >> 8) & 0xFFFFFFFF) % 3) = 2 % 3 = 2

   which matches the ordinal number assigned to Router X. Router X will
   be the GDR for Group1.

   The hashvalue_RP for Group2 with RP RP2 is:

      ((((198.51.100.2 & 0.0.255.0) >> 8) & 0xFFFFFFFF) % 3) = 100 % 3
      = 1

   which is different from the ordinal number of Router X (2). Hence,
   Router X will not be GDR for Group2.

   For IPv6 consider this example, similar to the above. Router X with
   IPv6 address FE80::1 receives a DRLB-List Hello Option from the DR,
   which announces RP Hash Mask ::FFFF:FFFF:FFFF:0 and a list of GDR
   Candidates, sorted by IP addresses from high to low: FE80::3, FE80::2
   and FE80::1. The ordinal number assigned to those addresses would
   be:

   0 for FE80::3; 1 for FE80::2; 2 for FE80::1 (Router X).

   Assume there are 2 RPs: RP1 2001:DB8::1:0:5678:1 for Group1 and RP2
   2001:DB8::1:0:1234:2 for Group2.
   Following the modulo Hash Algorithm:

   LSZC(::FFFF:FFFF:FFFF:0) is 16 and GDRC is 3. The hashvalue_RP for
   Group1 with RP RP1 is:

      ((((2001:DB8::1:0:5678:1 & ::FFFF:FFFF:FFFF:0) >> 16) &
      0xFFFFFFFF) % 3)
      = (((::1:0:5678:0 >> 16) & 0xFFFFFFFF) % 3)
      = ((::1:0:5678 & 0xFFFFFFFF) % 3)
      = ::5678 % 3 = 2

   which matches the ordinal number assigned to Router X. Router X will
   be the GDR for Group1.

   The hashvalue_RP for Group2 with RP RP2 is:

      ((((2001:DB8::1:0:1234:2 & ::FFFF:FFFF:FFFF:0) >> 16) &
      0xFFFFFFFF) % 3)
      = (((::1:0:1234:0 >> 16) & 0xFFFFFFFF) % 3)
      = ((::1:0:1234 & 0xFFFFFFFF) % 3)
      = ::1234 % 3 = 1

   which is different from the ordinal number of Router X (2). Hence,
   Router X will not be GDR for Group2.

5.2.2.  Limitations

   The Modulo Hash Algorithm has poor failover characteristics when a
   shared LAN has more than two GDRs. In the case of more than two GDRs
   on a LAN, when one GDR fails, all of the groups may be reassigned to
   a different GDR, even if they were not assigned to the failed GDR.
   However, many deployments use only two routers on a shared LAN for
   redundancy purposes. Future work may define new Hash Algorithms
   where only groups assigned to the failed GDR get reassigned.

5.3.  PIM Hello Options

   All PIM routers include a new option, called "Load Balancing
   Capability (DRLB-Cap)" in their PIM Hello messages.

   Besides this DRLB-Cap Hello Option, the elected PIM DR also includes
   a new "DR Load Balancing List (DRLB-List) Hello Option". The DRLB-
   List Hello Option consists of three Hash Masks as defined above and
   also a sorted list of GDR Candidate addresses on the LAN.

5.3.1.  PIM DR Load Balancing Capability (DRLB-Cap) Hello Option

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Type = 34           |          Length = 4           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   Reserved                    |Hash Algorithm |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

         Figure 3: PIM DR Load Balancing Capability Hello Option

   Type: 34

   Length: 4

   Reserved: Transmitted as zero, ignored on receipt.

   Hash Algorithm: Hash Algorithm type. 0 for the Modulo algorithm
      defined in this document.

   This DRLB-Cap Hello Option MUST be advertised by routers on all
   interfaces where DR Load Balancing is enabled.

5.3.2.  PIM DR Load Balancing List (DRLB-List) Hello Option

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Type = 35           |            Length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          Group Mask                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          Source Mask                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            RP Mask                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   GDR Candidate Address(es)                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

            Figure 4: PIM DR Load Balancing List Hello Option

   Type: 35

   Length: (3 + n) x (4 or 16) bytes, where n is the number of GDR
      candidates.

   Group Mask (32/128 bits): Mask applied to group addresses as part
      of hash computation.

   Source Mask (32/128 bits): Mask applied to source addresses as
      part of hash computation.

   RP Mask (32/128 bits): Mask applied to RP addresses as part of
      hash computation.

      All masks MUST have the same number of bits as the IP source
      address in the PIM Hello IP header.

   GDR Candidate Address(es) (32/128 bits): List of GDR Candidate(s)

      All addresses MUST be in the same address family as the PIM
      Hello IP header. It is RECOMMENDED that the addresses are
      sorted in descending order.

      If the "Interface ID" option, as specified in [RFC6395], is
      present in a GDR Candidate's PIM Hello message, and the "Router
      Identifier" portion is non-zero:

      +  For IPv4, the "GDR Candidate Address" will be set directly
         to the "Router Identifier".

      +  For IPv6, the "GDR Candidate Address" will be 96 bits of
         zeroes followed by the 32 bit Router Identifier.

      If the "Interface ID" option is not present in a GDR Candidate's
      PIM Hello message, or if the "Interface ID" option is present
      but the "Router Identifier" field is zero, the "GDR Candidate
      Address" will be the IPv4 or IPv6 source address of the PIM
      Hello message.

   This DRLB-List Hello Option MUST only be advertised by the
   elected PIM DR. It MUST be ignored if received from a non-DR.
   The option MUST also be ignored if the hash masks are not the
   correct number of bits, or GDR Candidate addresses are in the
   wrong address family.

5.4.  PIM DR Operation

   The DR election process is still the same as defined in [RFC7761].
   The DR advertises the new DRLB-List Hello Option, which contains mask
   values from user configuration (or default values), followed by a
   list of GDR Candidate Addresses. It is RECOMMENDED that the list be
   sorted, from the highest value to the lowest value. The reason for
   sorting the list is to make the behavior deterministic, regardless of
   the order in which the DR learns of new candidates. Note that, like
   non-DR routers, the DR also advertises the DRLB-Cap Hello Option to
   indicate its ability to support the new functionality and the type of
   GDR election Hash Algorithm.
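
   The following Python sketch is a non-normative illustration of how a
   DR might derive a GDR Candidate Address and encode the DRLB-List
   Hello Option described above. It assumes the 2-octet Type and 2-octet
   Length layout shown in Figure 4; the names candidate_address and
   build_drlb_list are hypothetical, and the policy for choosing which
   neighbors are candidates is outside the sketch.

```python
import ipaddress
import struct

DRLB_LIST_TYPE = 35  # Type value shown in Figure 4


def candidate_address(hello_source: str, router_id: int = 0):
    """Pick the GDR Candidate Address for one neighbor.

    A non-zero Router Identifier (from the Interface ID option) is used
    directly for IPv4, or as the low 32 bits after 96 zero bits for
    IPv6. Otherwise the Hello source address is used.
    """
    src = ipaddress.ip_address(hello_source)
    if router_id == 0:
        return src
    if src.version == 4:
        return ipaddress.IPv4Address(router_id)
    return ipaddress.IPv6Address(router_id)


def build_drlb_list(group_mask, source_mask, rp_mask, candidates) -> bytes:
    """Encode Type, Length, the three masks, then the sorted candidates."""
    masks = [ipaddress.ip_address(m) for m in (group_mask, source_mask, rp_mask)]
    addrs = [ipaddress.ip_address(c) for c in candidates]
    if len({a.version for a in masks + addrs}) != 1:
        raise ValueError("masks and candidates must share one address family")
    addrs.sort(reverse=True)  # highest to lowest, as RECOMMENDED
    value = b"".join(a.packed for a in masks + addrs)
    # Length is (3 + n) x (4 or 16) octets.
    return struct.pack("!HH", DRLB_LIST_TYPE, len(value)) + value


option = build_drlb_list(
    "255.255.255.255", "255.255.255.255", "0.0.0.0",
    ["203.0.113.1", "203.0.113.3", "203.0.113.2"],
)
print(option.hex())
```

   Sorting inside the encoder keeps the ordinal numbers independent of
   the order in which the DR learned of its neighbors.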

   If a PIM DR receives a neighbor DRLB-Cap Hello Option, which contains
   the same Hash Algorithm as the DR, and the neighbor has the same DR
   priority as the DR, the PIM DR SHOULD consider the neighbor as a GDR
   Candidate and insert the GDR Candidate's Address into the list of the
   DRLB-List Option. However, the DR may have policies limiting which
   GDR Candidates, or the number of GDR Candidates, to include.
   Likewise, the DR SHOULD include itself in the list of GDR Candidates,
   but it is permissible not to do so, if for instance there is some
   policy restricting the candidate set.

   If a PIM neighbor included in the list expires, stops announcing the
   DRLB-Cap Hello Option, changes DR priority, changes Hash Algorithm or
   otherwise becomes ineligible as a candidate, the DR SHOULD
   immediately send a triggered hello with a new list in the DRLB-List
   option, excluding the neighbor.

   If a new router becomes eligible as a candidate, there is no urgency
   in sending out an updated list. An updated list SHOULD be included
   in the next hello.

5.5.  PIM GDR Candidate Operation

   When an IGMP/MLD report is received, a Hash Algorithm is used by the
   GDR Candidates to determine which router is going to be responsible
   for building forwarding trees on behalf of the host.

   The router MUST include the DRLB-Cap Hello Option in all PIM Hello
   messages sent on the interface. Note that the presence of the DRLB-
   Cap Option in the PIM Hello does not guarantee that the router will
   be considered as a GDR Candidate. Once the DR election is done, the
   DRLB-List Hello Option is received from the current PIM DR containing
   a list of the selected GDR Candidates.

   A router only acts as a GDR Candidate if it is included in the GDR
   Candidate list of the DRLB-List Hello Option. See next section for
   details.

5.6.  DRLB-List Hello Option Processing

   This section discusses processing of the DRLB-List Hello Option,
   including the case where it was received in the previous hello, but
   not in the current hello. All routers MUST ignore the DRLB-List
   Hello Option if it is received from a PIM router which is not the DR.
   The option MUST only be processed by routers that are announcing the
   DRLB-Cap Option, and only if the Hash Algorithm announced by the DR
   is the same as the local announcement. All GDR Candidates MUST use
   the Hash Masks advertised in the Option, even if they differ from
   those the candidate was configured with. The DR MUST also process
   its own DRLB-List Hello Option.

   A router stores the latest option contents that were announced, if
   any, and deletes the previous contents. The router MUST also compare
   the new contents with any previous contents, and if there are any
   changes, continue processing as below. Note that if the option does
   not pass the above checks, the below processing MUST be done as if
   the option was not announced.

   If the contents of the DRLB-List Option, the masks or the candidate
   list, differ from the previously saved copy, if it is received for
   the first time, or if it is no longer being received or accepted, the
   option MUST be processed as below.

   1.  If the local router is included in the GDR Candidate Address(es)
       field, for each of the groups, or source and group pairs if the
       group is in SSM mode, with local receiver interest, the router
       MUST run the Hash Algorithm to determine which of them it is the
       GDR for.

          If there is no change in the GDR status, then no further
          action is required.

          If the router becomes the new GDR, then a multicast forwarding
          tree MUST be built [RFC7761].

          If the router is no longer the GDR, then it uses an Assert as
          explained in [Section 5.7].

   2.  If the local router is not included in the GDR Candidate
       Address(es) field, or if the DRLB-List Hello Option is no longer
       included in the DR's Hello, or if the DR's Neighbor Liveness
       Timer expires [RFC7761], for each of the groups, or source and
       group pairs if the group is in SSM mode, with local receiver
       interest, for which the router is the GDR, it uses an Assert as
       explained in [Section 5.7].

5.7.  PIM Assert Modification

   GDR changes may occur due to configuration changes, GDR Candidates
   going down, or new routers coming up and becoming GDR Candidates.
   This may occur while flows are being forwarded. If the GDR for an
   active flow changes, there is likely to be some disruption, such as
   packet loss or duplicates. By using asserts, packet loss is
   minimized, while allowing a small amount of duplicates.

   When a router stops acting as the GDR for a group, or source and
   group pair if SSM, it MUST set the Assert metric preference to
   maximum (0x7FFFFFFF) and the Assert metric to one less than maximum
   (0xFFFFFFFE). This was also mentioned in the previous section. That
   is, whenever it sends or receives an Assert for the group, it must
   use these values as the metric preference and metric rather than the
   values provided by the unicast routing protocol.

   The rest of this section is just for illustration purposes and not
   part of the protocol definition.

   To illustrate the behavior when there is a GDR change, consider the
   following scenario where there are two flows G1 and G2. R1 is the
   GDR for G1, and R2 is the GDR for G2. When R3 comes up, it is
   possible that R3 becomes GDR for both G1 and G2, hence R3 starts to
   build the forwarding tree for G1 and G2. If R1 and R2 stop
   forwarding before R3 completes the process, packet loss might occur.
   On the other hand, if R1 and R2 continue forwarding while R3 is
   building the forwarding trees, duplicates might occur.

   When the role of GDR changes as above, instead of immediately
   stopping forwarding, R1 and R2 continue forwarding to G1 and G2,
   respectively, while, at the same time, R3 builds forwarding trees for
   G1 and G2.  This will lead to PIM Asserts.

   For G1, using the functionality described in this document, R1 and R3
   determine the new GDR, which is R3.  With the modified Assert
   behavior, R1 sets its Assert metric to the near-maximum value
   discussed above.  That will make R3, which has a normal metric in its
   Assert, the Assert winner.

5.8.  Backward Compatibility

   In the case of a hybrid Ethernet shared LAN (where some PIM routers
   support the functionality defined in this document, and some do not):

   o  If the DR does not support the new functionality, then there will
      be no load-balancing.

   o  If non-DR routers do not support the new functionality, they will
      not be considered as Candidate GDRs and will not take part in any
      load-balancing.  Load-balancing may still happen on the link.

6.  Operational Considerations

   An administrator needs to consider what the total bandwidth
   requirements are and find a set of routers that together have enough
   total capacity, while making sure that each of the routers can handle
   its part, assuming that the traffic is distributed roughly equally
   among the routers.  Ideally, one should also have enough bandwidth to
   handle the case where at least one router fails.  All routers should
   have reachability to the sources, and RPs if applicable, that is not
   via the LAN.

   Care must be taken when choosing what hash masks to configure.  One
   would typically configure the same masks on all the routers, so that
   they are the same, regardless of which router is elected as DR.  The
   default masks are likely suitable for most deployments.
   The RP Hash Mask must be configured (the default is no bits set) if
   one wishes to hash based on the RP address rather than the group
   address for ASM.  The default masks will use the entire group
   addresses, and source addresses if SSM, as part of the hash.  An
   administrator may set other masks that mask out part of the
   addresses to ensure that certain flows always get hashed to the same
   router.  How this is achieved depends on how the group addresses are
   allocated.

   Only the routers announcing the same Hash Algorithm as the DR would
   be considered as GDR candidates.  Network administrators need to make
   sure that the desired set of routers announce the same algorithm.
   Migration between different algorithms is not considered in this
   document.

7.  IANA Considerations

   IANA has temporarily assigned type 34 for the PIM DR Load Balancing
   Capability (DRLB-Cap) Hello Option, and type 35 for the PIM DR Load
   Balancing List (DRLB-List) Hello Option in the PIM-Hello Options
   registry.  IANA is requested to make these assignments permanent when
   this document is published as an RFC.  Note that the option names
   have changed slightly since the temporary assignments were made.
   Also, the length of option 34 is always 4, whereas the registry
   currently says it is variable.

   This document requests IANA to create a registry called "Designated
   Router Load Balancing Hash Algorithms" in the "Protocol Independent
   Multicast (PIM)" branch of the registry tree.  The registry lists
   Hash Algorithms for use by PIM Designated Router Load Balancing.

7.1.  Initial registry

   The initial content of the registry should be as follows.

   Type   Name                                     Reference
   ------ ---------------------------------------- --------------------
     0    Modulo                                   This document
   1-255  Unassigned

7.2.  Assignment of new Hash Algorithms

   Assignment of new Hash Algorithms is done according to the "IETF
   Review" model, see [RFC8126].
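
   The following non-normative sketch illustrates the operational note
   in Section 6 that only routers announcing the same Hash Algorithm as
   the DR are considered GDR candidates, using the algorithm types from
   the initial registry above.  The Neighbor record, the function name,
   and the example addresses are hypothetical and are not defined by
   this document.  The sketch checks only the algorithm type and
   ignores every other condition the protocol places on candidates.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hash Algorithm types from the initial registry (types 1-255 are unassigned).
HASH_ALGORITHM_NAMES: Dict[int, str] = {0: "Modulo"}


@dataclass(frozen=True)
class Neighbor:
    """Hypothetical record of what a PIM neighbor announced in its Hello."""
    address: str
    # Algorithm type announced in the DRLB-Cap Option, or None if the
    # neighbor did not announce the option at all.
    hash_algorithm: Optional[int]


def eligible_gdr_candidates(dr_algorithm: int,
                            neighbors: List[Neighbor]) -> List[str]:
    """Return the addresses of neighbors announcing the DR's algorithm.

    Illustrative assumption: this filter looks at the algorithm type
    only; a real implementation applies further checks.
    """
    return [n.address for n in neighbors
            if n.hash_algorithm == dr_algorithm]


if __name__ == "__main__":
    neighbors = [
        Neighbor("192.0.2.1", 0),     # announces Modulo
        Neighbor("192.0.2.2", None),  # no DRLB-Cap Option
        Neighbor("192.0.2.3", 1),     # announces a different type
        Neighbor("192.0.2.4", 0),     # announces Modulo
    ]
    print(eligible_gdr_candidates(0, neighbors))
```

   With a DR announcing type 0, only the two routers that also announce
   type 0 remain in the result, which mirrors why administrators need
   to ensure that the desired set of routers announce the same
   algorithm.
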

8.  Security Considerations

   Security of the new DR Load Balancing PIM Hello Options is only
   guaranteed by the security of PIM Hello messages, so the security
   considerations for PIM Hello messages as described in PIM-SM
   [RFC7761] apply here.

   If the DR is subverted, it could omit or add certain GDRs or announce
   an unsupported algorithm.  If another router is subverted, it could
   be made DR and cause similar issues.  While these issues are specific
   to this specification, they are not that different from existing
   attacks such as subverting a DR and lowering the DR priority, causing
   a different router to become the DR.

   If, for any reason, the DR includes a GDR in the announced list that
   announces a different algorithm from what the DR announces, the GDR
   is required to ignore the announcement, and there will be no router
   acting as the DR for the flows that hash to that GDR.

   If a GDR is subverted, it could potentially be made to stop
   forwarding all the traffic it is expected to forward.  This is
   similar to what can happen today if a DR is subverted.

9.  Acknowledgement

   The authors would like to thank Steve Simlo and Taki Millonis for
   helping with the original idea; Alia Atlas, Bill Atwood, Jake
   Holland, Bharat Joshi, Anish Kachinthaya, Anvitha Kachinthaya and
   Alvaro Retana for reviews and comments; and Toerless Eckert and
   Rishabh Parekh for helpful conversation on the document.

10.  References

10.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC6395]  Gulrajani, S. and S. Venaas, "An Interface Identifier (ID)
              Hello Option for PIM", RFC 6395, DOI 10.17487/RFC6395,
              October 2011.

   [RFC7761]  Fenner, B., Handley, M., Holbrook, H., Kouvelas, I.,
              Parekh, R., Zhang, Z., and L.
              Zheng, "Protocol Independent Multicast - Sparse Mode
              (PIM-SM): Protocol Specification (Revised)", STD 83,
              RFC 7761, DOI 10.17487/RFC7761, March 2016.

   [RFC8126]  Cotton, M., Leiba, B., and T. Narten, "Guidelines for
              Writing an IANA Considerations Section in RFCs", BCP 26,
              RFC 8126, DOI 10.17487/RFC8126, June 2017.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017.

10.2.  Informative References

   [RFC3376]  Cain, B., Deering, S., Kouvelas, I., Fenner, B., and A.
              Thyagarajan, "Internet Group Management Protocol, Version
              3", RFC 3376, DOI 10.17487/RFC3376, October 2002.

   [RFC3810]  Vida, R., Ed. and L. Costa, Ed., "Multicast Listener
              Discovery Version 2 (MLDv2) for IPv6", RFC 3810,
              DOI 10.17487/RFC3810, June 2004.

   [RFC4541]  Christensen, M., Kimball, K., and F. Solensky,
              "Considerations for Internet Group Management Protocol
              (IGMP) and Multicast Listener Discovery (MLD) Snooping
              Switches", RFC 4541, DOI 10.17487/RFC4541, May 2006.

   [RFC4607]  Holbrook, H. and B. Cain, "Source-Specific Multicast for
              IP", RFC 4607, DOI 10.17487/RFC4607, August 2006.

Authors' Addresses

   Yiqun Cai
   Alibaba Group

   Email: yiqun.cai@alibaba-inc.com


   Heidi Ou
   Alibaba Group

   Email: heidi.ou@alibaba-inc.com


   Sri Vallepalli
   Cisco Systems, Inc.
   3625 Cisco Way
   San Jose CA 95134
   USA

   Email: svallepa@cisco.com


   Mankamana Mishra
   Cisco Systems, Inc.
   821 Alder Drive,
   Milpitas CA 95035
   USA

   Email: mankamis@cisco.com


   Stig Venaas
   Cisco Systems, Inc.
   Tasman Drive
   San Jose CA 95134
   USA

   Email: stig@cisco.com


   Andy Green
   British Telecom
   Adastral Park
   Ipswich IP5 2RE
   United Kingdom

   Email: andy.da.green@bt.com