idnits 2.17.1 draft-ietf-pim-drlb-07.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- == There are 5 instances of lines with non-RFC6890-compliant IPv4 addresses in the document. If these are example addresses, they should be changed. == There are 1 instance of lines with non-RFC3849-compliant IPv6 addresses in the document. If these are example addresses, they should be changed. -- The document has examples using IPv4 documentation addresses according to RFC6890, but does not use any IPv6 documentation addresses. Maybe there should be IPv6 examples, too? Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document date (January 16, 2018) is 2285 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) ** Obsolete normative reference: RFC 4601 (Obsoleted by RFC 7761) Summary: 1 error (**), 0 flaws (~~), 3 warnings (==), 2 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group Yiqun. Cai 3 Internet-Draft Heidi. 
Ou 4 Intended status: Standards Track Alibaba Group 5 Expires: July 20, 2018 Sri. Vallepalli 6 Mankamana. Mishra 7 Stig. Venaas 8 Cisco Systems 9 Andy. Green 10 British Telecom 11 January 16, 2018 13 PIM Designated Router Load Balancing 14 draft-ietf-pim-drlb-07 16 Abstract 18 On a multi-access network, one of the PIM routers is elected as a 19 Designated Router (DR). On the last hop LAN, the PIM DR is 20 responsible for tracking local multicast listeners and forwarding 21 traffic to these listeners if the group is operating in PIM-SM. In 22 this document, we propose a modification to the PIM-SM protocol that 23 allows more than one of these last hop routers to be selected so that 24 the forwarding load can be distributed among these routers. 26 Status of This Memo 28 This Internet-Draft is submitted in full conformance with the 29 provisions of BCP 78 and BCP 79. 31 Internet-Drafts are working documents of the Internet Engineering 32 Task Force (IETF). Note that other groups may also distribute 33 working documents as Internet-Drafts. The list of current Internet- 34 Drafts is at https://datatracker.ietf.org/drafts/current/. 36 Internet-Drafts are draft documents valid for a maximum of six months 37 and may be updated, replaced, or obsoleted by other documents at any 38 time. It is inappropriate to use Internet-Drafts as reference 39 material or to cite them other than as "work in progress." 41 This Internet-Draft will expire on July 20, 2018. 43 Copyright Notice 45 Copyright (c) 2018 IETF Trust and the persons identified as the 46 document authors. All rights reserved. 48 This document is subject to BCP 78 and the IETF Trust's Legal 49 Provisions Relating to IETF Documents 50 (https://trustee.ietf.org/license-info) in effect on the date of 51 publication of this document. Please review these documents 52 carefully, as they describe your rights and restrictions with respect 53 to this document. 
Code Components extracted from this document must 54 include Simplified BSD License text as described in Section 4.e of 55 the Trust Legal Provisions and are provided without warranty as 56 described in the Simplified BSD License. 58 Table of Contents 60 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 61 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 5 62 3. Applicability . . . . . . . . . . . . . . . . . . . . . . . . 5 63 4. Functional Overview . . . . . . . . . . . . . . . . . . . . . 6 64 4.1. GDR Candidates . . . . . . . . . . . . . . . . . . . . . 7 65 4.2. Hash Mask and Hash Algorithm . . . . . . . . . . . . . . 7 66 4.3. Modulo Hash Algorithm . . . . . . . . . . . . . . . . . . 8 67 4.4. PIM Hello Options . . . . . . . . . . . . . . . . . . . . 9 68 5. Hello Option Formats . . . . . . . . . . . . . . . . . . . . 10 69 5.1. PIM DR Load Balancing Capability (DRLBC) Hello Option . . 10 70 5.2. PIM DR Load Balancing GDR (DRLBGDR) Hello Option . . . . 10 71 6. Protocol Specification . . . . . . . . . . . . . . . . . . . 11 72 6.1. PIM DR Operation . . . . . . . . . . . . . . . . . . . . 12 73 6.2. PIM GDR Candidate Operation . . . . . . . . . . . . . . . 12 74 6.2.1. Router receives new DRLBGDR . . . . . . . . . . . . . 13 75 6.2.2. Router receives updated DRLBGDR . . . . . . . . . . . 13 76 6.3. PIM Assert Modification . . . . . . . . . . . . . . . . . 14 77 7. Compatibility . . . . . . . . . . . . . . . . . . . . . . . . 15 78 8. Manageability Considerations . . . . . . . . . . . . . . . . 15 79 9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 16 80 10. Security Considerations . . . . . . . . . . . . . . . . . . . 16 81 11. Acknowledgement . . . . . . . . . . . . . . . . . . . . . . . 16 82 12. References . . . . . . . . . . . . . . . . . . . . . . . . . 16 83 12.1. Normative References . . . . . . . . . . . . . . . . . . 16 84 12.2. Informative References . . . . . . . . . . . . . . . . . 
17 85 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 17 87 1. Introduction 89 On a multi-access LAN such as an Ethernet, one of the PIM routers is 90 elected as a DR. The PIM DR has two roles in the PIM-SM protocol. 91 On the first hop network, the PIM DR is responsible for registering 92 an active source with the Rendezvous Point (RP) if the group is 93 operating in PIM-SM. On the last hop LAN, the PIM DR is responsible 94 for tracking local multicast listeners and forwarding to these 95 listeners if the group is operating in PIM-SM. 97 Consider the following last hop LAN in Figure 1: 99 ( core networks ) 100 | | | 101 | | | 102 R1 R2 R3 103 | | | 104 --(last hop LAN)-- 105 | 106 | 107 (many receivers) 109 Figure 1: Last Hop LAN 111 Assume R1 is elected as the Designated Router. According to 112 [RFC4601], R1 will be responsible for forwarding traffic to that LAN 113 on behalf of any local members. In addition to keeping track of IGMP 114 and MLD membership reports, R1 is also responsible for initiating the 115 creation of source and/or shared trees towards the senders or the 116 RPs. 118 Forcing sole data plane forwarding responsibility on the PIM DR 119 proves a limitation in the protocol. In comparison, even though an 120 OSPF DR, or an IS-IS DIS, handles additional duties while running the 121 OSPF or IS-IS protocols, they are not required to be solely 122 responsible for forwarding packets for the network. On the other 123 hand, on a last hop LAN, only the PIM DR is asked to forward packets 124 while the other routers handle only control traffic (and perhaps drop 125 packets due to RPF failures). The forwarding load of a last hop LAN 126 is concentrated on a single router. 128 This leads to several issues. One of the issues is that the 129 aggregated bandwidth will be limited to what R1 can handle towards 130 this particular interface. 
These days, it is very common for the 131 last hop LAN to consist of switches that run IGMP/MLD or PIM 132 snooping. This allows the forwarding of multicast packets to be 133 restricted only to segments leading to receivers who have indicated 134 their interest in multicast groups using either IGMP or MLD. The 135 emergence of the switched Ethernet allows the aggregated bandwidth to 136 exceed, sometimes by a large margin, that of a single link. For 137 example, let us modify Figure 1 and introduce an Ethernet switch in 138 Figure 2. 140 ( core networks ) 141 | | | 142 | | | 143 R1 R2 R3 144 | | | 145 +=gi0===gi1===gi2=+ 146 + + 147 + switch + 148 + + 149 +=gi4===gi5===gi6=+ 150 | | | 151 H1 H2 H3 153 Figure 2: Last Hop Network with Ethernet Switch 155 Let us assume that each individual link is a Gigabit Ethernet. Each 156 router, R1, R2 and R3, and the switch have enough forwarding capacity 157 to handle hundreds of Gigabits of data. 159 Let us further assume that each of the hosts requests 500 Mbps of 160 data and different traffic is requested by each host. This 161 represents a total of 1.5 Gbps of data, which is under what each switch 162 or the combined uplink bandwidth across the routers can handle, even 163 under failure of a single router. 165 On the other hand, the link between R1 and the switch, via port gi0, can 166 only handle a throughput of 1 Gbps. If R1, the 167 PIM DR elected using the procedure defined by [RFC4601], is the only forwarding router, at least 500 168 Mbps worth of data will be lost because the only link that can be 169 used to draw the traffic from the routers to the switch is via gi0. 170 In other words, the entire network's throughput is limited by the 171 single connection between the PIM DR and the switch (or the last hop 172 LAN as in Figure 1). 174 The problem may also manifest itself in a different way.
For 175 example, suppose R1 happens to forward 500 Mbps worth of unicast data to H1, 176 and at the same time, H2 and H3 each request 300 Mbps of different 177 multicast data. Once again packets are dropped on R1 while, in the 178 meantime, there is sufficient forwarding capacity left on R2 and R3 179 and link capacity between the switch and R2/R3. 181 Another important issue is related to failover. If R1 is the only 182 forwarder on the shared last hop LAN, in the event of a 183 failure when R1 goes out of service, multicast forwarding for the 184 entire LAN has to be rebuilt by the newly elected PIM DR. However, 185 if there were a way that allowed multiple routers to forward to the 186 LAN for different groups, failure of one of the routers would only 187 lead to disruption to a subset of the flows, therefore improving the 188 overall resilience of the network. 190 In this document, we propose a modification to the PIM-SM protocol 191 that allows more than one of these routers, called Group Designated 192 Routers (GDRs), to be selected so that the forwarding load can be 193 distributed among a number of routers. 195 2. Terminology 197 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 198 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 199 document are to be interpreted as described in [RFC2119]. 201 With respect to PIM, this document follows the terminology that has 202 been defined in [RFC4601]. 204 This document also introduces the following new acronyms: 206 o GDR: GDR stands for "Group Designated Router". For each multicast 207 flow, either a (*,G) for ASM, or an (S,G) for SSM, a hash 208 algorithm (described below) is used to select one of the routers 209 as a GDR. The GDR is responsible for initiating the forwarding 210 tree building for the corresponding multicast flow. 212 o GDR Candidate: a last hop router that has the potential to become a 213 GDR.
A GDR Candidate must have the same DR priority and must run 214 the same GDR election hash algorithm as the DR. It must 215 send and process new PIM Hello Options as defined in this 216 document. There might be more than one GDR Candidate on a LAN, 217 but only one can become GDR for a specific multicast flow. 219 3. Applicability 221 The proposed change described in this specification applies to PIM-SM 222 last hop routers only. 224 It does not alter the behavior of a PIM DR on the first hop network. 225 This is because the source tree is built using the IP address of the 226 sender, not the IP address of the PIM DR that sends the registers 227 towards the RP. The load balancing between first hop routers can be 228 achieved naturally if an IGP provides equal cost multiple paths 229 (which it usually does in practice). Moreover, distributing the 230 registering load does not justify the additional complexity required to 231 support it. 233 4. Functional Overview 235 In the existing PIM DR election, when multiple last hop routers are 236 connected to a multi-access LAN (for example, an Ethernet), one of 237 them is selected to act as PIM DR. The PIM DR is responsible for 238 sending local Join/Prune messages towards the RP or source. To elect 239 the PIM DR, each PIM router on the LAN examines the received PIM 240 Hello messages and compares its DR priority and IP address with those 241 of its neighbors. The router with the highest DR priority is the PIM 242 DR. If there are multiple such routers, their IP addresses are used 243 as the tie-breaker, as described in [RFC4601]. 245 In order to share forwarding load among last hop routers, besides the 246 normal PIM DR election, the GDR is also elected on the last hop 247 multi-access LAN. There is only one PIM DR on the multi-access LAN, 248 but there might be multiple GDR Candidates.
250 For each multicast flow, that is (*,G) for ASM and (S,G) for SSM, a 251 hash algorithm is used to select one of the routers to be the GDR. A 252 new DR Load Balancing Capability (DRLBC) PIM Hello Option, which 253 contains the hash algorithm type, is announced by routers on interfaces 254 where this specification is enabled. Last hop routers that advertise the new 255 DRLBC Option in their Hellos, and that use the same GDR election 256 hash algorithm and the same DR priority as the PIM DR, are considered 257 GDR Candidates. 259 Hash Masks are defined for Source, Group and RP separately, in order 260 to handle PIM ASM/SSM. The masks, as well as a sorted list of GDR 261 Candidates' Addresses, are announced by the DR in a new DR Load Balancing 262 GDR (DRLBGDR) PIM Hello Option. 274 A hash algorithm based on the announced Source, Group or RP masks 275 allows one GDR to be assigned to a corresponding multicast state. 276 That GDR is responsible for initiating the creation of the 277 multicast forwarding tree for multicast traffic. 279 4.1. GDR Candidates 281 GDR is the new concept introduced by this specification. GDR 282 Candidates are routers eligible for GDR election on the LAN. To 283 become a GDR Candidate, a router MUST support this specification, 284 have the same DR priority and run the same GDR election hash 285 algorithm as the DR on the LAN.
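The eligibility rule above can be sketched in Python as follows. This is only an illustration under stated assumptions, not part of the protocol: the `Neighbor` record and the function `gdr_candidates` are hypothetical names, the priorities and algorithm numbers are made up, and the addresses are documentation addresses. The sketch also sorts the eligible addresses from high to low, which is the order in which the DR announces them.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address
from typing import List, Optional


@dataclass
class Neighbor:
    """Hypothetical per-router record built from received PIM Hellos."""
    address: IPv4Address
    dr_priority: int
    hash_algorithm: Optional[int]  # None: no DRLBC Hello Option advertised


def gdr_candidates(dr: Neighbor, routers: List[Neighbor]) -> List[IPv4Address]:
    """Return GDR Candidate addresses, sorted from high to low."""
    eligible = [
        r.address
        for r in routers
        if r.hash_algorithm is not None            # supports this specification
        and r.hash_algorithm == dr.hash_algorithm  # same hash algorithm as the DR
        and r.dr_priority == dr.dr_priority        # same DR priority as the DR
    ]
    return sorted(eligible, reverse=True)


# Illustrative values only; r1 plays the role of the elected DR.
r1 = Neighbor(IPv4Address("203.0.113.3"), 10, 0)
r2 = Neighbor(IPv4Address("203.0.113.2"), 10, 0)
r3 = Neighbor(IPv4Address("203.0.113.1"), 10, 1)  # different hash algorithm
r4 = Neighbor(IPv4Address("203.0.113.4"), 5, 0)   # less preferred DR priority

candidates = gdr_candidates(r1, [r1, r2, r3, r4])
```

With these assumed inputs, only the DR itself and the neighbor that shares both its DR priority and its hash algorithm remain in the list.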
287 For example, assume there are 4 routers on the LAN: R1, R2, R3 and 288 R4, which all support this specification on the LAN. R1, R2 and R3 289 have the same DR priority while R4's DR priority is less preferred. 290 In this example, R4 will not be eligible for GDR election, because R4 291 will not become a PIM DR unless all of R1, R2 and R3 go out of 292 service. 294 Further assume router R1 wins the PIM DR election, and R1, R2 run the 295 same hash algorithm for GDR election, while R3 runs a different one. 296 Then only R1 and R2 will be eligible for GDR election; R3 will not. 298 As a DR, R1 will include its own Load Balancing Hash Masks, and also 299 the identity of R1 and R2 (the GDR Candidates) in its DRLBGDR Hello 300 Option. 302 4.2. Hash Mask and Hash Algorithm 304 A Hash Mask is used to extract a number of bits from the 305 corresponding IP address field (32 for v4, 128 for v6), and calculate 306 a hash value. A hash value is used to select a GDR from GDR 307 Candidates advertised by PIM DR. For example, 0.0.255.0 defines a 308 Hash Mask for an IPv4 address that masks out the first, the second and 309 the fourth octets, keeping only the third. 311 There are three Hash Masks defined, 313 o RP Hash Mask 315 o Source Hash Mask 317 o Group Hash Mask 319 The hash masks need to be configured on the PIM routers that can 320 potentially become a PIM DR, unless the implementation provides 321 default hash masks. An implementation SHOULD provide masks with 322 default values 255.255.255.255 (IPv4) and 323 FFFF:FFFF:FFFF:FFFF:FFFF:FFFF:FFFF:FFFF (IPv6). The hash value used for a flow is selected as follows: 325 o If the group is ASM, and if the RP Hash Mask announced by the PIM 326 DR is not 0, calculate the value of hashvalue_RP [Section 4.3] to 327 determine GDR. 329 o If the group is ASM and if the RP Hash Mask announced by the PIM 330 DR is 0, obtain the value of hashvalue_Group [Section 4.3] to 331 determine GDR. 333 o If the group is SSM, use hashvalue_SG [Section 4.3] to determine 334 GDR.
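The three rules above amount to a small dispatch on the group type and the announced RP Hash Mask. A minimal Python sketch, in which the function name `hash_kind` and its string return values are illustrative only and do not appear in this specification:

```python
from ipaddress import ip_address


def hash_kind(is_ssm: bool, rp_hash_mask: str) -> str:
    """Name the hash value a router would compute for a flow.

    is_ssm: True for an SSM (S,G) flow, False for an ASM (*,G) flow.
    rp_hash_mask: RP Hash Mask announced by the PIM DR, in text form.
    """
    if is_ssm:
        return "hashvalue_SG"
    if int(ip_address(rp_hash_mask)) != 0:
        return "hashvalue_RP"
    return "hashvalue_Group"
```

For an ASM group the RP Hash Mask alone decides between hashvalue_RP and hashvalue_Group; for SSM the mask is not consulted.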
336 A simple Modulo hash algorithm is described in this document. 337 However, to allow other hash algorithms to be used, a 4-byte "Hash 338 Algorithm Type" field is included in the DRLBC Hello Option to specify 339 the hash algorithm used by a last hop router. 341 If different hash algorithm types are advertised among last hop 342 routers, only last hop routers running the same hash algorithm as the 343 DR (and having the same DR priority as the DR) are eligible for GDR 344 election. 346 4.3. Modulo Hash Algorithm 348 The Modulo hash algorithm is discussed here as an example, with a detailed 349 description of hashvalue_RP. 351 o For ASM groups, with a non-zero RP_hashmask, the hash value is 352 calculated as: 354 hashvalue_RP = (((RP_address & RP_hashmask) >> N) & 0xFFFF) % M 356 RP_address is the address of the RP defined for the group. N 357 is the number of consecutive zero bits, counted from the least significant bit 358 of the RP_hashmask. M is the number of GDR Candidates. 360 For example, Router X with IPv4 address 203.0.113.1 receives a 361 DRLBGDR Hello Option from the DR, which announces RP Hash Mask 362 0.0.255.0, and a list of GDR Candidates, sorted by IP addresses 363 from high to low, 203.0.113.3, 203.0.113.2 and 203.0.113.1. 364 The ordinal number assigned to those addresses would be: 366 0 for 203.0.113.3; 1 for 203.0.113.2; 2 for 203.0.113.1 (Router 367 X) 369 Assume there are 2 RPs: RP1 192.0.2.1 for Group1 and RP2 370 198.51.100.2 for Group2. Following the modulo hash algorithm: 372 N is 8 for 0.0.255.0, and M is 3 for the total number of GDR 373 Candidates. The hashvalue_RP for RP1 192.0.2.1 is: 375 (((192.0.2.1 & 0.0.255.0) >> 8) & 0xFFFF) % 3 = 2 % 3 = 2 377 which matches the ordinal number assigned to Router X. Router X will 378 be the GDR for Group1, which uses 192.0.2.1 as the RP.
380 The hashvalue_RP for RP2 198.51.100.2 is: 382 (((198.51.100.2 & 0.0.255.0) >> 8) & 0xFFFF) % 3 = 100 % 3 = 1 384 which is different from Router X's ordinal number 2; hence, 385 Router X will not be GDR for Group2. 387 o If RP_hashmask is 0, a hash value for an ASM group is calculated 388 using the group Hash Mask: 390 hashvalue_Group = (((Group_address & Group_hashmask) >> N) & 391 0xFFFF) % M 393 Compare hashvalue_Group with the ordinal number assigned to Router 394 X, to decide if Router X is the GDR. 396 o For SSM groups, a hash value is calculated using both the source 397 and group Hash Masks: 399 hashvalue_SG = ((((Source_address & Source_hashmask) >> N_S) & 400 0xFFFF) ^ (((Group_address & Group_hashmask) >> N_G) & 0xFFFF)) 401 % M N_S and N_G are the values of N for Source_hashmask and Group_hashmask, respectively. 403 4.4. PIM Hello Options 405 When a last hop PIM router sends a PIM Hello from an interface with 406 this specification enabled, it includes a new option, called "DR Load 407 Balancing Capability (DRLBC)". 409 Besides this DRLBC Hello Option, the elected PIM DR also includes a 410 new "DR Load Balancing GDR (DRLBGDR) Hello Option". The DRLBGDR 411 Hello Option consists of three Hash Masks as defined above and also 412 the sorted list of all GDR Candidates' Addresses on the last hop LAN. 414 The elected PIM DR uses the DRLBC Hello Options advertised by all routers 415 on the last hop LAN to compose its DRLBGDR Hello Option. The GDR Candidates use 416 the DRLBGDR Hello Option advertised by the PIM DR to calculate the hash value. 418 5. Hello Option Formats 420 5.1. PIM DR Load Balancing Capability (DRLBC) Hello Option 422 0 1 2 3 423 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 424 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 425 | Type = TBD | Length = 4 | 426 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 427 | Hash Algorithm Type | 428 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 430 Figure 3: Capability Hello Option 432 Type: TBD.
434 Length: 4 octets 436 Hash Algorithm Type: 0 for Modulo hash algorithm 438 This DRLBC Hello Option SHOULD be advertised by last hop routers from 439 interfaces with this specification enabled. 441 5.2. PIM DR Load Balancing GDR (DRLBGDR) Hello Option 443 0 1 2 3 444 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 445 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 446 | Type = TBD | Length | 447 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 448 | Group Mask | 449 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 450 | Source Mask | 451 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 452 | RP Mask | 453 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 454 | GDR Candidate Address(es) | 455 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 457 Figure 4: GDR Hello Option 459 Type: TBD 461 Length: 3 x (4 bytes or 16 bytes) + n x (4 bytes or 16 bytes), where n 462 is the number of GDR Candidates 463 Group Mask (32/128 bits): Mask 465 Source Mask (32/128 bits): Mask 467 RP Mask (32/128 bits): Mask 469 All masks MUST be in the same address family as the Hello IP 470 header. 472 GDR Address (32/128 bits): Address(es) of GDR Candidate(s) 474 All addresses must be in the same address family as the Hello 475 IP header. The addresses are sorted from high to low. The 476 order is converted to the ordinal number associated with each 477 GDR Candidate in hash value calculation. For example, if the 478 addresses advertised are R3, R2, R1, the ordinal number 479 assigned to R3 is 0, to R2 is 1 and to R1 is 2. 481 If the "Interface ID" option, as described in [RFC6395], is present 482 in a GDR Candidate's PIM Hello message, and the "Router ID" 483 portion is non-zero, 485 + For IPv4, the "GDR Candidate Address" will be set directly 486 to "Router ID".
488 + For IPv6, the "GDR Candidate Address" will be set to the 489 IPv4-IPv6 translated address of "Router ID", as described in 490 [RFC4291], that is, the "Router ID" is appended to a 491 prefix of 96 zero bits. 493 If the "Interface ID" option is not present in a GDR 494 Candidate's PIM Hello message, or if the "Interface ID" option 495 is present but the "Router ID" field is zero, the "GDR Candidate 496 Address" will be the IPv4 or IPv6 source address from the PIM Hello 497 message. 499 This DRLBGDR Hello Option SHOULD only be advertised by the 500 elected PIM DR. 502 6. Protocol Specification 503 6.1. PIM DR Operation 505 The DR election process is still the same as defined in [RFC4601]. A 506 DR that has this specification enabled on the interface advertises 507 the new DRLBGDR Hello Option, which contains the mask values from user 508 configuration, followed by a sorted list of all GDR Candidates' 509 Addresses, from high to low. Moreover, like non-DR routers, the DR 510 also advertises the DRLBC Hello Option to indicate its capability of 511 supporting this specification and the type of its GDR election hash 512 algorithm. 514 If a PIM DR receives a PIM Hello with a DRLBGDR Option, the PIM DR 515 SHOULD ignore the TLV. 517 If a PIM DR receives a neighbor DRLBC Hello Option, which contains 518 the same hash algorithm type as the DR, and the neighbor has the same 519 DR priority as the DR, the PIM DR SHOULD consider the neighbor as a GDR 520 Candidate and insert the GDR Candidate's Address into the sorted list 521 of the DRLBGDR Option. 523 6.2. PIM GDR Candidate Operation 525 When an IGMP/MLD join is received, without this proposal, only the PIM DR 526 will handle the join and potentially run into the issues described 527 earlier. Using this proposal, a hash algorithm is used on each GDR 528 Candidate to determine which router is going to be responsible for 529 building forwarding trees on behalf of the host.
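The selection performed on a GDR Candidate can be sketched in Python using the Modulo hash algorithm of Section 4.3. This is a minimal sketch under stated assumptions rather than a reference implementation: the function names `trailing_zeros`, `masked_hash` and `is_gdr` are hypothetical, only the single-address form (hashvalue_RP or hashvalue_Group) is covered, and the value returned for an all-zero mask is an assumption, since the text does not define N in that case.

```python
from ipaddress import ip_address
from typing import List


def trailing_zeros(mask: int) -> int:
    """N: number of zero bits counted from the least significant bit."""
    if mask == 0:
        return 0  # assumption: N is not defined for an all-zero mask
    return (mask & -mask).bit_length() - 1


def masked_hash(address: str, mask: str, m: int) -> int:
    """Compute (((address & mask) >> N) & 0xFFFF) % M."""
    a, k = int(ip_address(address)), int(ip_address(mask))
    return (((a & k) >> trailing_zeros(k)) & 0xFFFF) % m


def is_gdr(my_address: str, candidates: List[str], address: str, mask: str) -> bool:
    """candidates: the list announced by the DR, sorted from high to low."""
    if my_address not in candidates:
        return False  # not a GDR Candidate, so never the GDR
    ordinal = candidates.index(my_address)
    return masked_hash(address, mask, len(candidates)) == ordinal


# Values taken from the worked example in Section 4.3.
candidates = ["203.0.113.3", "203.0.113.2", "203.0.113.1"]
rp1 = masked_hash("192.0.2.1", "0.0.255.0", 3)     # third octet is 2
rp2 = masked_hash("198.51.100.2", "0.0.255.0", 3)  # third octet is 100
```

Each router runs the same computation on the same announced list and masks, so exactly one candidate's ordinal matches the hash value for a given RP or group address.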
531 A router that supports this specification advertises the DRLBC Hello Option in its PIM Hello on each interface where this 532 protocol is enabled, 533 even if the router may not be considered a GDR Candidate, for 534 example, due to low DR priority. Once DR election is done, the DRLBGDR 535 Hello Option will be received from the current PIM DR on the link. 537 A GDR Candidate may receive a DRLBGDR Hello Option from the PIM DR with 538 different Hash Masks from those configured on it. The GDR Candidate 539 must use the Hash Masks advertised by the PIM DR to calculate the 540 hash value. 542 A GDR Candidate may receive a DRLBGDR Hello Option from a PIM router 543 which is not the DR. The GDR Candidate MUST ignore such a DRLBGDR Hello 544 Option. 546 A GDR Candidate may receive a Hello from the elected PIM DR when the 547 PIM DR does not support this specification. In that case, the GDR election 548 described by this specification will not take place, that is, only the 549 PIM DR joins the multicast tree. 551 A router only acts as GDR if it is included in the GDR list of the DRLBGDR 552 Hello Option. 554 6.2.1. Router receives new DRLBGDR 556 When a router receives a new DRLBGDR from the current PIM DR, it needs 557 to process it and check whether the router is in the list of GDR Candidates. 559 1. If the router is not listed as a GDR Candidate in the DRLBGDR, no action 560 is needed. 562 2. If the router is listed as a GDR Candidate in the DRLBGDR, then it needs 563 to process each of the groups in the IGMP/MLD reports. The masks 564 are announced in the PIM Hello by the DR as the DRLBGDR Hello Option. 565 For each of the groups in the reports, it needs to run the hash algorithm 566 (described in Section 4.3) based on the announced Source, Group 567 or RP masks to determine whether it is the GDR for the specified group. If 568 the hash result makes it the GDR for the multicast flow, it builds the 569 multicast forwarding tree. If it is not the GDR for the flow, no action 570 is needed. 572 6.2.2.
Router receives updated DRLBGDR 574 If a router (GDR or non-GDR) receives an unchanged DRLBGDR from the 575 current PIM DR, no action is needed. 577 If a router (GDR or non-GDR) receives a new or modified DRLBGDR from 578 the current PIM DR, it requires processing as described below: 580 1. If it was a GDR and is still included in the current GDR list, it needs to 581 process each of the groups and run the hash algorithm to check whether it is 582 still the GDR for the given group. 584 If it was the GDR for the group earlier and the new hash still chooses it 585 as GDR, no processing is required. 587 If it was the GDR for the group earlier and is no longer the GDR, 588 then it sets its assert metric for this flow to be 589 (PIM_ASSERT_INFINITY - 1), as explained in Section 6.3. 591 If it was not the GDR for the group earlier, and the new hash does 592 not make it the GDR, no processing is required. 594 If it was not the GDR earlier and now becomes the GDR, it starts 595 building the multicast forwarding tree for this flow. 597 2. If it was a non-GDR, and the updated DRLBGDR from the current PIM DR 598 contains this router as one of the GDR Candidates, then this router, 599 being a new GDR Candidate, MUST run the hash algorithm for each of the 600 groups (multicast flows) and, for a given group: 602 If it is not the GDR, no processing is required. 604 If it is hashed as the GDR, it needs to build the multicast forwarding 605 tree. 607 3. If a router receives an IGMP/MLD report for a flow for which the 608 router has been the GDR AND the DRLBGDR has changed since the last 609 report for this flow, then the router MUST determine if it is 610 still the GDR. If it is, no action is needed. If it is not, then the 611 router sets its assert metric for this flow to be 612 (PIM_ASSERT_INFINITY - 1) as explained in Section 6.3. 614 6.3. PIM Assert Modification 616 It is possible that the identity of the GDR might change in the 617 middle of an active flow. Examples of when this could happen include: 619 When a new PIM router comes up 621 When a GDR restarts 623 When the GDR changes, existing traffic might be disrupted.
624 Duplicates or packet loss might be observed. To illustrate the case, 625 consider the following scenario: there are two streams G1 and G2. R1 626 is the GDR for G1, and R2 is the GDR for G2. When R3 comes 627 online, it is possible that R3 becomes GDR for both G1 and G2, hence 628 R3 starts to build the forwarding trees for G1 and G2. If R1 and R2 629 stop forwarding before R3 completes the process, packet loss might 630 occur. On the other hand, if R1 and R2 continue forwarding while R3 631 is building the forwarding trees, duplicates might occur. 633 This is not a typical deployment scenario but it still might happen. 634 Here we describe a mechanism to minimize the impact. The motivation 635 is that we want to minimize packet loss. Therefore, we 636 allow a small number of duplicates and depend on PIM Assert to 637 minimize the duplication. 639 When the role of GDR changes as above, instead of immediately 640 stopping forwarding, R1 and R2 continue forwarding to G1 and G2 641 respectively, while at the same time, R3 builds forwarding trees for 642 G1 and G2. This will lead to PIM Asserts. 644 With the introduction of GDR, the following modification to the Assert 645 packet MUST be made: if a router has this specification enabled on its 646 downstream interface, but it is no longer the GDR (before the network event it 647 was the GDR), it adjusts its Assert metric to (PIM_ASSERT_INFINITY - 648 1). 650 Using the above example, for G1, assume R1 and R3 agree on the new 651 GDR, which is R3. R1 will set its Assert metric to 652 (PIM_ASSERT_INFINITY - 1). That will make R3, which has a normal 653 metric in its Assert, the Assert winner. 655 For G2, assume it takes a little longer for R2 to find out 656 that R3 is the new GDR, so R2 still considers itself the GDR while R3 657 has already assumed the role of GDR. Since both R2 and R3 think they 658 are GDRs, they further compare the metric and IP address.
If R3 has 659 the better routing metric, or the same metric but a better tie-breaker, the 660 result will be consistent with GDR selection. If, unfortunately, R2 661 has the better metric, or the same metric but a better tie-breaker, R2 will 662 become the Assert winner and continue to forward traffic. This will 663 continue until: 665 The next PIM Hello option from the DR is seen that selects R3 as the GDR. 666 R3 will then build the forwarding tree and send an Assert. 668 The process continues until R2 agrees to the selection of R3 as 669 the GDR and sets its own Assert metric to (PIM_ASSERT_INFINITY - 1), 670 which will make R3 the Assert winner. During the process, we will 671 see intermittent duplication of traffic but packet loss will be 672 minimized. In the unlikely case that R2 never relinquishes its role 673 as GDR (while every other router thinks otherwise), the proposed 674 mechanism also helps to keep the duplication to a minimum until 675 manual intervention takes place to remedy the situation. 677 7. Compatibility 679 In the case of a hybrid shared Ethernet LAN (where some PIM routers enable 680 the specification defined in this draft and some do not): 682 o If a router which does not support the specification defined in this 683 draft becomes the DR on the link, it MUST be the only DR on the link, as in [RFC4601], 684 and there will be no router acting as GDR. 686 o If a router which does not support the specification defined in this 687 draft is a non-DR on the link, then it should act as a non-DR as defined 688 in [RFC4601]. 690 8. Manageability Considerations 692 o All of the routers on the LAN that support this specification 693 MUST use an identical Hash Algorithm Type (described in Section 5.1). 694 If differing Hash Algorithm Types are in use, routers must fall back to 695 the DR election method defined in PIM-SM [RFC4601]. Migration 696 between different algorithm types is out of scope of this 697 document. 699 9.
IANA Considerations 701 Two new PIM Hello Option Types are needed for the DR Load 702 Balancing messages. From the registry in [HELLO-OPT], this document recommends 34 (0x22) 703 as the new "PIM DR Load Balancing Capability Hello Option", and 704 35 (0x23) as the new "PIM DR Load Balancing GDR Hello Option". 706 10. Security Considerations 708 Security of the new DR Load Balancing PIM Hello Options relies solely 709 on the security of the PIM Hello message, so the security 710 considerations for PIM Hello messages as described in PIM-SM 711 [RFC4601] apply here. 713 11. Acknowledgement 715 The authors would like to thank Steve Simlo and Taki Millonis for 716 helping with the original idea, Bill Atwood and Bharat Joshi for review 717 comments, and Toerless Eckert and Rishabh Parekh for helpful conversations 718 on the document. 720 12. References 722 12.1. Normative References 724 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 725 Requirement Levels", BCP 14, RFC 2119, 726 DOI 10.17487/RFC2119, March 1997. 729 [RFC4291] Hinden, R. and S. Deering, "IP Version 6 Addressing 730 Architecture", RFC 4291, DOI 10.17487/RFC4291, February 731 2006. 733 [RFC4601] Fenner, B., Handley, M., Holbrook, H., and I. Kouvelas, 734 "Protocol Independent Multicast - Sparse Mode (PIM-SM): 735 Protocol Specification (Revised)", RFC 4601, 736 DOI 10.17487/RFC4601, August 2006. 739 [RFC6395] Gulrajani, S. and S. Venaas, "An Interface Identifier (ID) 740 Hello Option for PIM", RFC 6395, DOI 10.17487/RFC6395, 741 October 2011. 743 12.2. Informative References 745 [HELLO-OPT] 746 IANA, "PIM Hello Options", IANA PIM-HELLO-OPTIONS, March 747 2007.
749 Authors' Addresses 751 Yiqun Cai 752 Alibaba Group 754 Email: yiqun.cai@alibaba-inc.com 756 Heidi Ou 757 Alibaba Group 759 Sri Vallepalli 760 Cisco Systems 761 3625 Cisco Way, 762 Sanjose, CALIFORNIA 95134 763 UNITED STATES 765 Email: svallepa@cisco.com 767 Mankamana Prasad Mishra 768 Cisco Systems 769 821 Alder Drive, 770 MILPITAS, CALIFORNIA 95035 771 UNITED STATES 773 Email: mankamis@cisco.com 775 Stig Venaas 776 Cisco Systems 777 821 Alder Drive, 778 MILPITAS, CALIFORNIA 95035 779 UNITED STATES 781 Email: stig@cisco.com 782 Andy Green 783 British Telecom 784 Adastral Park 785 Ipswich IP5 2RE 786 United Kingdom 788 Email: andy.da.green@bt.com