idnits 2.17.1 draft-ietf-pim-drlb-03.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- == There are 7 instances of lines with non-RFC6890-compliant IPv4 addresses in the document. If these are example addresses, they should be changed. == There are 4 instances of lines with private range IPv4 addresses in the document. If these are generic example addresses, they should be changed to use any of the ranges defined in RFC 6890 (or successor): 192.0.2.x, 198.51.100.x or 203.0.113.x. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document date (February 15, 2014) is 3723 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Unused Reference: 'RFC3973' is defined on line 589, but no explicit reference was found in the text == Unused Reference: 'RFC5015' is defined on line 593, but no explicit reference was found in the text == Unused Reference: 'RFC6395' is defined on line 597, but no explicit reference was found in the text ** Obsolete normative reference: RFC 4601 (Obsoleted by RFC 7761) -- Duplicate reference: RFC4601, mentioned in 'HELLO-OPT', was also mentioned in 'RFC4601'. 
-- Obsolete informational reference (is this intentional?): RFC 4601 (ref. 'HELLO-OPT') (Obsoleted by RFC 7761) Summary: 1 error (**), 0 flaws (~~), 6 warnings (==), 3 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group Yiqun Cai 3 Internet-Draft Microsoft 4 Intended status: Standards Track Sri Vallepalli 5 Expires: August 19, 2014 Heidi Ou 6 Cisco Systems, Inc. 7 Andy Green 8 British Telecom 9 February 15, 2014 11 PIM Designated Router Load Balancing 12 draft-ietf-pim-drlb-03.txt 14 Abstract 16 On a multi-access network, one of the PIM routers is elected as a 17 Designated Router (DR). On the last hop network, the PIM DR is 18 responsible for tracking local multicast listeners and forwarding 19 traffic to these listeners if the group is operated in PIM SM. In 20 this document, we propose a modification to the PIM SM protocol that 21 allows more than one of these last hop routers to be selected so that 22 the forwarding load can be distributed to and handled among these 23 routers. 25 Status of this Memo 27 This Internet-Draft is submitted in full conformance with the 28 provisions of BCP 78 and BCP 79. 30 Internet-Drafts are working documents of the Internet Engineering 31 Task Force (IETF). Note that other groups may also distribute 32 working documents as Internet-Drafts. The list of current Internet- 33 Drafts is at http://datatracker.ietf.org/drafts/current/. 35 Internet-Drafts are draft documents valid for a maximum of six months 36 and may be updated, replaced, or obsoleted by other documents at any 37 time. It is inappropriate to use Internet-Drafts as reference 38 material or to cite them other than as "work in progress." 40 This Internet-Draft will expire on August 19, 2014. 42 Copyright Notice 44 Copyright (c) 2014 IETF Trust and the persons identified as the 45 document authors. 
All rights reserved. 47 This document is subject to BCP 78 and the IETF Trust's Legal 48 Provisions Relating to IETF Documents 49 (http://trustee.ietf.org/license-info) in effect on the date of 50 publication of this document. Please review these documents 51 carefully, as they describe your rights and restrictions with respect 52 to this document. Code Components extracted from this document must 53 include Simplified BSD License text as described in Section 4.e of 54 the Trust Legal Provisions and are provided without warranty as 55 described in the Simplified BSD License. 57 Table of Contents 59 1. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 3 60 2. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 61 3. Applicability . . . . . . . . . . . . . . . . . . . . . . . . 6 62 4. Functional Overview . . . . . . . . . . . . . . . . . . . . . 6 63 4.1. GDR Candidates . . . . . . . . . . . . . . . . . . . . . . 7 64 4.2. Hash Mask . . . . . . . . . . . . . . . . . . . . . . . . 7 65 4.3. PIM Hello Options . . . . . . . . . . . . . . . . . . . . 8 66 5. Hello Option Formats . . . . . . . . . . . . . . . . . . . . . 9 67 5.1. PIM DR Load Balancing Capability (DRLBC) Hello Option . . 9 68 5.2. PIM DR Load Balancing GDR (DRLBGDR) Hello Option . . . . . 10 69 6. Protocol Specification . . . . . . . . . . . . . . . . . . . . 11 70 6.1. PIM DR Operation . . . . . . . . . . . . . . . . . . . . . 11 71 6.2. PIM GDR Candidate Operation . . . . . . . . . . . . . . . 11 72 6.3. PIM Assert Modification . . . . . . . . . . . . . . . . . 12 73 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 14 74 8. Security Considerations . . . . . . . . . . . . . . . . . . . 14 75 9. Acknowledgement . . . . . . . . . . . . . . . . . . . . . . . 14 76 10. References . . . . . . . . . . . . . . . . . . . . . . . . . . 14 77 10.1. Normative Reference . . . . . . . . . . . . . . . . . . . 14 78 10.2. Informative References . . . . . . . . . . . . . . . . . 
. 14 79 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 15 81 1. Terminology 83 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 84 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 85 document are to be interpreted as described in [RFC2119]. 87 With respect to PIM, this document follows the terminology that has 88 been defined in [RFC4601]. 90 This document also introduces the following new acronyms: 92 o GDR: GDR stands for "Group Designated Router". For each multicast 93 group, a hash algorithm (described below) is used to select one of 94 the routers as a GDR. The GDR is responsible for initiating the 95 forwarding tree building for the corresponding group. 97 o GDR Candidate: a last hop router that has potential to become a 98 GDR. A GDR Candidate must have the same DR priority and must run 99 the same GDR election hash algorithm as the DR router. It must 100 send and process received new PIM Hello Options as defined in this 101 document. There might be more than one GDR Candidate on a LAN. 102 But only one can become GDR for a specific multicast group. 104 2. Introduction 106 On a multi-access network such as an Ethernet, one of the PIM routers 107 is elected as a DR. The PIM DR has two roles in the PIM protocol. 108 On the first hop network, the PIM DR is responsible for registering 109 an active source with the Rendezvous Point (RP) if the group is 110 operated in PIM SM. On the last hop network, the PIM DR is 111 responsible for tracking local multicast listeners and forwarding to 112 these listeners if the group is operated in PIM SM. 114 Consider the following last hop network in Figure 1: 116 ( core networks ) 117 | | | 118 | | | 119 R1 R2 R3 120 | | | 121 --(last hop LAN)-- 122 | 123 | 124 (many receivers) 126 Figure 1: Last Hop Network 128 Assume R1 is elected as the Designated Router. According to 129 [RFC4601], R1 will be responsible for forwarding to the last hop LAN. 
130 In addition to keeping track of IGMP and MLD membership reports, R1 131 is also responsible for initiating the creation of source and/or 132 shared trees towards the senders or the RPs. 134 Forcing sole data plane forwarding responsibility on the PIM DR 135 proves a limitation in the protocol. In comparison, even though an 136 OSPF DR, or an IS-IS DIS, handles additional duties while running the 137 OSPF or IS-IS protocols, they are not required to be solely 138 responsible for forwarding packets for the network. On the other 139 hand, on a last hop LAN, only the PIM DR is asked to forward packets 140 while the other routers handle only control traffic (and perhaps drop 141 packets due to RPF failures). The forwarding load of a last hop LAN 142 is concentrated on a single router. 144 This leads to several issues. One of the issues is that the 145 aggregated bandwidth will be limited to what R1 can handle towards 146 this particular interface. These days, it is very common that the 147 last hop LAN usually consists of switches that run IGMP/MLD or PIM 148 snooping. This allows the forwarding of multicast packets to be 149 restricted only to segments leading to receivers who have indicated 150 their interest in multicast groups using either IGMP or MLD. The 151 emergence of the switched Ethernet allows the aggregated bandwidth to 152 exceed, some times by a large number, that of a single link. For 153 example, let us modify Figure 1 and introduce an Ethernet switch in 154 Figure 2. 156 ( core networks ) 157 | | | 158 | | | 159 R1 R2 R3 160 | | | 161 +=gi0===gi1===gi2=+ 162 + + 163 + switch + 164 + + 165 +=gi4===gi5===gi6=+ 166 | | | 167 H1 H2 H3 169 Figure 2: Last Hop Network with Ethernet Switch 171 Let us assume that each individual link is a Gigabit Ethernet. Each 172 router, R1, R2 and R3, and the switch have enough forwarding capacity 173 to handle hundreds of Gigabits of data. 
175 Let us further assume that each of the hosts requests 500 mbps of 176 data and different traffic is requested by each host. This 177 represents a total 1.5 gbps of data, which is under what each switch 178 or the combined uplink bandwidth across the routers can handle, even 179 under failure of a single router. 181 On the other hand, the link between R1 and switch, via port gi0, can 182 only handle a throughput of 1gbps. And if R1 is the only router, the 183 PIM DR elected using the procedure defined by RFC 4601, at least 500 184 mbps worth of data will be lost because the only link that can be 185 used to draw the traffic from the routers to the switch is via gi0. 186 In other words, the entire network's throughput is limited by the 187 single connection between the PIM DR and the switch (or the last hop 188 LAN as in Figure 1). 190 The problem may also manifest itself in a different way. For 191 example, R1 happens to forward 500 mbps worth of unicast data to H1, 192 and at the same time, H2 and H3 each requests 300 mbps of different 193 multicast data. Once again packet drop happens on R1 while in the 194 mean time, there is sufficient forwarding capacity left on R2 and R3 195 and link capacity between the switch and R2/R3. 197 Another important issue is related to failover. If R1 is the only 198 forwarder on the last hop network, in the event of a failure when R1 199 goes out of service, multicast forwarding for the entire network has 200 to be rebuilt by the newly elected PIM DR. However, if there was a 201 way that allowed multiple routers to forward to the network for 202 different groups, failure of one of the routers would only lead to 203 disruption to a subset of the flows, therefore improving the overall 204 resilience of the network. 
206 In this document, we propose a modification to the PIM protocol that 207 allows more than one of these routers, called Group Designated Routers 208 (GDRs), to be selected so that the forwarding load can be distributed 209 to and handled by a number of routers. 211 3. Applicability 213 The proposed change described in this specification applies to PIM SM 214 last hop routers only. 216 It does not alter the behavior of a PIM DR on the first hop network. 217 This is because the source tree is built using the IP address of the 218 sender, not the IP address of the PIM DR that sends the registers 219 towards the RP. The load balancing between first hop routers can be 220 achieved naturally if an IGP provides equal cost multiple paths 221 (which it usually does in practice). Moreover, distributing the 222 register load does not justify the additional complexity required to 223 support it. 225 4. Functional Overview 227 In the existing PIM DR election, when multiple last hop routers are 228 connected to a multi-access network (for example, an Ethernet), one 229 of them is selected to act as PIM DR. The PIM DR is responsible for 230 sending Join/Prune messages towards the RP or source. To elect the 231 PIM DR, each PIM router on the network examines the received PIM 232 Hello messages and compares its DR priority and IP address with those 233 of its neighbors. The router with the highest DR priority is the PIM 234 DR. If there are multiple such routers, their IP addresses are used 235 as the tie-breaker, as described in [RFC4601]. 237 In order to share forwarding load among last hop routers, besides the 238 normal PIM DR election, the GDR is also elected on the last hop 239 multi-access network. There is only one PIM DR on the multi-access 240 network, but there might be multiple GDR Candidates. 242 For each multicast group, a hash algorithm is used to select one of 243 the routers to be the GDR. 
Hash Masks are defined for Source, Group 244 and RP separately, in order to handle PIM ASM/SSM. The masks are 245 announced in PIM Hello by DR as a DR Load Balancing GDR (DRLBGDR) 246 Hello Option. Besides that, a DR Load Balancing Capability (DRLBC) 247 Hello Option, which contains hash algorithm type, is also announced 248 by router interfaces which have this specification supported. Last 249 hop routers who are with the new DRLBC Option, and with the same GDR 250 election hash algorithm and the same DR priority as the PIM DR are 251 GDR Candidates. 253 A hash algorithm based on the announced Source, Group or RP masks 254 allows one GDR to be assigned to a corresponding multicast group, and 255 that GDR is responsible for initiating the creation of the multicast 256 forwarding tree for the group. 258 4.1. GDR Candidates 260 GDR is the new concept introduced by this specification. GDR 261 Candidates are routers eligible for GDR election on the LAN. To 262 become a GDR Candidate, a router MUST support this specification, 263 have the same DR priority and run the same GDR election hash 264 algorithm as the DR on the LAN. 266 For example, assume there are 4 routers on the LAN: R1, R2, R3 and 267 R4, which all support this specification on the LAN. R1, R2 and R3 268 have the same DR priority while R4's DR priority is less preferred. 269 In this example, R4 will not be eligible for GDR election, because R4 270 will not become a PIM DR unless all of R1, R2 and R3 go out of 271 service. 273 Further assume router R1 wins the PIM DR election, and R1, R2 run the 274 same hash algorithm for GDR election, while R3 runs a different one. 275 Then only R1 and R2 will be eligible for GDR election, R3 will not. 277 As a DR, R1 will include its own Load Balancing Hash Masks, and also 278 the identity of R1 and R2 (the GDR Candidates) in its DRLBGDR Hello 279 Option. 281 4.2. 
Hash Mask 283 A Hash Mask is used to extract a number of bits from the 284 corresponding IP address field (32 bits for IPv4, 128 bits for IPv6), and calculate 285 a hash value. A hash value is used to select a GDR from the GDR 286 Candidates advertised by the PIM DR. For example, 0.255.0.0 defines a 287 Hash Mask for an IPv4 address that masks out the first, the third and the 288 fourth octets. 290 Three Hash Masks are defined: 292 o RP Hash Mask 293 o Source Hash Mask 294 o Group Hash Mask 296 The Hash Masks MUST be configured on the PIM routers that can 297 potentially become a PIM DR. 299 A simple Modulo hash algorithm is discussed in this document. 300 However, to allow other hash algorithms to be used, a 4-byte "Hash 301 Algorithm Type" field is included in the DRLBC Hello Option to specify 302 the hash algorithm used by a last hop router. 304 If different hash algorithm types are advertised among last hop 305 routers, only last hop routers running the same hash algorithm as the 306 DR (and having the same DR priority as the DR) are eligible for GDR 307 election. 309 For ASM groups, a hash value is calculated using the following Modulo 310 algorithm: 312 o hashvalue_RP = (((RP_address & RP_hashmask) >> N) & 0xFFFF) % M 314 RP_address is the address of the RP defined for the group. N is the 315 number of consecutive zero bits, counted from the least significant bit of the 316 RP_hashmask. For example, for a given IPv4 RP_hashmask 0.255.0.0, N 317 will be 16. M is the number of GDR Candidates as described above. 319 If RP_hashmask is 0, a hash value is instead calculated using the group 320 Hash Mask in a similar fashion. 322 o hashvalue_Group = (((Group_address & Group_hashmask) >> N) & 323 0xFFFF) % M 325 For SSM groups, a hash value is calculated using both the source and 326 group Hash Masks: 328 o hashvalue_SG = ((((Source_address & Source_hashmask) >> N_S) & 329 0xFFFF) ^ (((Group_address & Group_hashmask) >> N_G) & 0xFFFF)) % 330 M 332 4.3. 
PIM Hello Options 334 When a last hop PIM router sends a PIM Hello from an interface that 335 supports this specification, it includes a new option, called the "DR Load 336 Balancing Capability (DRLBC) Hello Option". 338 Besides this DRLBC Hello Option, the elected PIM DR also includes a 339 new "DR Load Balancing GDR (DRLBGDR) Hello Option". The DRLBGDR 340 Hello Option consists of the three Hash Masks defined above and 341 the sorted addresses of all GDR Candidates on the last hop network. 343 The elected PIM DR uses the DRLBC Hello Options advertised by all routers 344 on the last hop network to compose its DRLBGDR Hello Option. The GDR Candidates 345 use the DRLBGDR Hello Option advertised by the PIM DR to calculate the hash 346 value. 348 5. Hello Option Formats 350 5.1. PIM DR Load Balancing Capability (DRLBC) Hello Option 352 0 1 2 3 353 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 354 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 355 | Type = TBD | Length = 4 | 356 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 357 | Hash Algorithm Type | 358 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 360 Figure 3: Capability Hello Option 362 Type: TBD. 363 Length: 4 octets 364 Hash Algorithm Type: 0 for Modulo hash algorithm 366 This DRLBC Hello Option SHOULD be advertised by last hop routers from 367 interfaces which support this specification. 369 5.2. 
PIM DR Load Balancing GDR (DRLBGDR) Hello Option 371 0 1 2 3 372 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 373 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 374 | Type = TBD | Length | 375 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 376 | Group Mask | 377 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 378 | Source Mask | 379 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 380 | RP Mask | 381 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 382 | GDR Candidate Address(es) | 383 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 385 Figure 4: GDR Hello Option 387 Type: TBD 388 Length: 389 Group Mask (32/128 bits): Mask 390 Source Mask (32/128 bits): Mask 391 RP Mask (32/128 bits): Mask 392 All masks MUST be in the same address family, with the same 393 length. 394 GDR Candidate Address (32/128 bits): Address(es) of GDR Candidate(s) 395 All addresses must be in the same address family. The addresses 396 are sorted from high to low. The position in this order is the 397 ordinal number associated with each GDR Candidate in the hash value 398 calculation. For example, if the addresses advertised are R3, R2, R1, 399 the ordinal number assigned to R3 is 0, to R2 is 1 and to R1 is 2. 400 If the "Interface ID" option (type 31) is present in a GDR Candidate's 401 PIM Hello message, and the "Router ID" portion is non-zero, 402 * For IPv4, the "GDR Candidate Address" will be set directly to 403 "Router ID". 404 * For IPv6, the "GDR Candidate Address" will be set to the IPv4- 405 IPv6 translated address of "Router ID", as described in 406 [RFC4291], that is, the "Router ID" is appended to a prefix of 407 96 zero bits. 
409 If the "Interface ID" option is not present in a GDR Candidate's 410 PIM Hello message, or if the "Interface ID" option is present 411 but the "Router ID" field is zero, the "GDR Candidate Address" will be 412 the IPv4 or IPv6 source address from the PIM Hello message. 414 This DRLBGDR Hello Option SHOULD only be advertised by the elected 415 PIM DR. 417 6. Protocol Specification 419 6.1. PIM DR Operation 421 The DR election process is still the same as defined in [RFC4601]. A 422 DR that has this specification enabled on the interface advertises 423 the new DRLBGDR Hello Option, which contains the mask values from user 424 configuration, followed by a sorted list of addresses of all GDR 425 Candidates. Moreover, like non-DR routers, the DR also advertises the 426 DRLBC Hello Option to indicate its capability of supporting this 427 specification and the type of its GDR election hash algorithm. 429 If a PIM DR receives a neighbor Hello with a DRLBGDR Option, the PIM DR 430 SHOULD ignore the TLV. 432 If a PIM DR receives a neighbor DRLBC Hello Option, which contains 433 the same hash algorithm type as the DR, and the neighbor has the same 434 DR priority as the DR, the PIM DR SHOULD consider the neighbor as a GDR 435 Candidate and insert the neighbor's address into the sorted list of the 436 DRLBGDR Option. 438 6.2. PIM GDR Candidate Operation 440 When an IGMP join is received, without this proposal, router R1 (the 441 PIM DR) will handle the join and potentially run into the issues 442 described earlier. Using this proposal, a hash algorithm is used to 443 determine which router is going to be responsible for building 444 forwarding trees on behalf of the host. 446 The algorithm works as follows, assuming the router in question is X, 447 which is a GDR Candidate, and its ordinal number assigned implicitly 448 by the PIM DR in the DRLBGDR Hello Option is Ox: 450 o If the group is ASM, and the RP Hash Mask announced by the PIM DR 451 is not zero, calculate the value of hashvalue_RP. 
If hashvalue_RP 452 is equal to Ox, X becomes the GDR. 454 For example, X, with IPv4 address 10.1.1.3, receives a DRLBGDR Hello 455 Option from the DR, which announces RP Hash Mask 0.255.0.0, and a 456 list of GDR Candidates, sorted by IP addresses from high to low, 457 10.1.1.3, 10.1.1.2 and 10.1.1.1. The ordinal number assigned to 458 those addresses would be 0 for 10.1.1.3 (X), 1 for 10.1.1.2, and 2 459 for 10.1.1.1. Assume there are 2 RPs: RP1 172.3.10.10 for Group1 and 460 RP2 172.2.10.10 for Group2. Following the modulo hash algorithm 462 hashvalue_RP = (((RP_address & RP_hashmask) >> N) & 0xFFFF) % M 464 Here N is 16 for 0.255.0.0, and M is 3 for the total number of GDR 465 Candidates. The hashvalue_RP for RP1 172.3.10.10 is 0, which matches the 466 ordinal number assigned to X. X will be the GDR for Group1, which 467 uses 172.3.10.10 as the RP. The hashvalue_RP for RP2 172.2.10.10 is 468 2, which is different from X's ordinal number; hence, X will not be the 469 GDR for Group2. 471 o If the group is ASM, and the RP Hash Mask announced by the PIM DR 472 is zero, obtain the value of hashvalue_Group. Compare 473 hashvalue_Group with Ox, to decide if X is the GDR. 474 o If the group is SSM, then use hashvalue_SG to determine if X is 475 the GDR. 477 If X is the GDR for the group, X will be responsible for building the 478 forwarding tree. 480 A router interface where this protocol is enabled advertises the DRLBC 481 Hello Option in its PIM Hello, even if the router may not be a GDR 482 Candidate. 484 A GDR Candidate may receive a DRLBGDR Hello Option from the PIM DR, with 485 different Hash Masks from those configured on it. The GDR Candidate 486 must use the Hash Masks advertised by the PIM DR to calculate the 487 hash value. 489 A GDR Candidate may receive a DRLBGDR Hello Option from a non-DR PIM 490 router. The GDR Candidate must ignore such a DRLBGDR Hello Option. 
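The ordinal assignment and the three Modulo hash values used in this section can be sketched as follows. This is only an illustrative sketch, not part of the specification: the function names (trailing_zero_bits, masked_value, ordinals and the hashvalue_* helpers) are hypothetical, and treating N as 0 when a mask is all zeros is an assumption, since the text does not define N for that case.

```python
import ipaddress


def trailing_zero_bits(mask: int) -> int:
    """N in the text: zero bits counted from the least significant bit."""
    if mask == 0:
        return 0  # assumption: N is undefined for a zero mask; the masked value is 0 anyway
    return (mask & -mask).bit_length() - 1


def masked_value(address: str, mask: str) -> int:
    """((address & mask) >> N) & 0xFFFF for one address/mask pair."""
    addr = int(ipaddress.ip_address(address))
    m = int(ipaddress.ip_address(mask))
    return ((addr & m) >> trailing_zero_bits(m)) & 0xFFFF


def hashvalue_rp(rp: str, rp_mask: str, m: int) -> int:
    return masked_value(rp, rp_mask) % m


def hashvalue_group(group: str, group_mask: str, m: int) -> int:
    return masked_value(group, group_mask) % m


def hashvalue_sg(source: str, source_mask: str,
                 group: str, group_mask: str, m: int) -> int:
    return (masked_value(source, source_mask)
            ^ masked_value(group, group_mask)) % m


def ordinals(candidates):
    """Sort GDR Candidate addresses from high to low and number them from 0."""
    ordered = sorted(candidates,
                     key=lambda a: int(ipaddress.ip_address(a)),
                     reverse=True)
    return {addr: i for i, addr in enumerate(ordered)}


# Worked example from the text: X is 10.1.1.3, RP Hash Mask is 0.255.0.0.
ords = ordinals(["10.1.1.1", "10.1.1.3", "10.1.1.2"])
m = len(ords)
x_is_gdr_for_group1 = hashvalue_rp("172.3.10.10", "0.255.0.0", m) == ords["10.1.1.3"]
x_is_gdr_for_group2 = hashvalue_rp("172.2.10.10", "0.255.0.0", m) == ords["10.1.1.3"]
```

With the addresses from the example, X (10.1.1.3) has ordinal 0 and is selected for Group1 but not for Group2, whose hash value 2 corresponds to 10.1.1.1.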
492 A GDR Candidate may receive a Hello from the elected PIM DR, and the 493 PIM DR does not support this specification. The GDR election 494 described by this specification will not take place; that is, only the 495 PIM DR joins the multicast tree. 497 6.3. PIM Assert Modification 499 It is possible that the identity of the GDR might change in the 500 middle of an active flow. Examples of when this could happen include: 501 1. When a new PIM router comes up 502 2. When a GDR restarts 503 When the GDR changes, existing traffic might be disrupted. 504 Duplicates or packet loss might be observed. To illustrate the case, 505 consider the following scenario: there are two streams G1 and G2. R1 506 is the GDR for G1, and R2 is the GDR for G2. When R3 comes up 507 online, it is possible that R3 becomes GDR for both G1 and G2, hence 508 R3 starts to build the forwarding tree for G1 and G2. If R1 and R2 509 stop forwarding before R3 completes the process, packet loss might 510 occur. On the other hand, if R1 and R2 continue forwarding while R3 511 is building the forwarding trees, duplicates might occur. 513 This is not a typical deployment scenario but it still might happen. 514 Here we describe a mechanism to minimize the impact. The motivation 515 is that we want to minimize packet loss. Therefore, we 516 allow a small amount of duplicates and depend on PIM Assert to 517 minimize the duplication. 519 When the role of GDR changes as above, instead of immediately 520 stopping forwarding, R1 and R2 continue forwarding to G1 and G2 521 respectively, while at the same time, R3 builds forwarding trees for 522 G1 and G2. This will lead to PIM Asserts. 524 Due to the introduction of GDR, this document suggests the following 525 modification to the Assert packet: if a router enables this 526 specification on its downstream interface, but it is not a GDR, it 527 would adjust its Assert metric to (PIM_ASSERT_INFINITY - 1). 
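The metric adjustment just described can be sketched as follows. This is a simplified illustration under stated assumptions, not the normative Assert procedure: it assumes PIM_ASSERT_INFINITY is the all-ones value of a 32-bit metric field, it compares only a single metric plus the IP address tie-breaker (the comparison in [RFC4601] involves further fields), and the function names are hypothetical.

```python
import ipaddress

# Assumption: a 32-bit metric field whose all-ones value means "infinity".
PIM_ASSERT_INFINITY = 0xFFFFFFFF


def advertised_assert_metric(route_metric: int,
                             drlb_enabled: bool,
                             is_gdr: bool) -> int:
    """Metric a router places in its Assert for a group on the LAN."""
    if drlb_enabled and not is_gdr:
        # A non-GDR router that supports this specification backs off.
        return PIM_ASSERT_INFINITY - 1
    return route_metric


def assert_winner(candidates):
    """candidates: iterable of (name, metric, address).

    Simplified comparison: the lower metric wins, and the higher
    IP address breaks a tie.
    """
    best = min(candidates,
               key=lambda c: (c[1], -int(ipaddress.ip_address(c[2]))))
    return best[0]


# R1 was the GDR for G1 but has learned that R3 is the new GDR.
r1 = ("R1", advertised_assert_metric(10, True, False), "10.1.1.1")
r3 = ("R3", advertised_assert_metric(20, True, True), "10.1.1.3")
winner = assert_winner([r1, r3])
```

Even though R1 has the better route metric in this hypothetical case, its adjusted Assert metric loses to R3's normal metric, so the Assert result agrees with the GDR selection.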
529 Using the above example, assume R1 and R3 agree on the new GDR, which 530 is R3. R1 will set its Assert metric to (PIM_ASSERT_INFINITY - 1). 531 That will make R3, which has a normal metric in its Assert, the 532 Assert winner. 534 For G2, assume it takes a little longer for R2 to find out 535 that R3 is the new GDR, so R2 still considers itself the GDR while R3 536 has already assumed the role of GDR. Since both R2 and R3 think they 537 are GDRs, they further compare the metric and IP address. If R3 has 538 the better routing metric, or the same metric but a better tie-breaker, the 539 result will be consistent with the GDR selection. If, unfortunately, R2 540 has the better metric, or the same metric but a better tie-breaker, R2 will 541 become the Assert winner and continue to forward traffic. This will 542 continue until: 543 1. The next PIM Hello option from the DR is seen that selects R3 as the 544 GDR. 545 2. R3 builds the forwarding tree and sends an Assert. 546 The process continues until R2 agrees to the selection of R3 as 547 the GDR, and sets its own Assert metric to (PIM_ASSERT_INFINITY - 1), 548 which will make R3 the Assert winner. During the process, we will 549 see intermittent duplication of traffic but packet loss will be 550 minimized. In the unlikely case that R2 never relinquishes its role 551 as GDR (while every other router thinks otherwise), the proposed 552 mechanism also helps to keep the duplication to a minimum until 553 manual intervention takes place to remedy the situation. 555 7. IANA Considerations 557 Two new PIM Hello Option Types are required to be assigned to the DR 558 Load Balancing messages. Based on [HELLO-OPT], this document recommends 559 34 (0x22) as the new "PIM DR Load Balancing Capability Hello Option", 560 and 35 (0x23) as the new "PIM DR Load Balancing GDR Hello Option". 562 8. 
Security Considerations 564 Security of the PIM DR Load Balancing Hello message is only 565 guaranteed by the security of the PIM Hello message, so the security 566 considerations for PIM Hello messages as described in PIM-SM 567 [RFC4601] apply here. 569 9. Acknowledgement 571 The authors would like to thank Steve Simlo and Taki Millonis for 572 helping with the original idea, Bill Atwood for review comments, and Stig 573 Venaas, Toerless Eckert and Rishabh Parekh for helpful conversations 574 on the document. 576 10. References 578 10.1. Normative Reference 580 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 581 Requirement Levels", BCP 14, RFC 2119, March 1997. 583 [RFC4601] Fenner, B., Handley, M., Holbrook, H., and I. Kouvelas, 584 "Protocol Independent Multicast - Sparse Mode (PIM-SM): 585 Protocol Specification (Revised)", RFC 4601, August 2006. 587 10.2. Informative References 589 [RFC3973] Adams, A., Nicholas, J., and W. Siadak, "Protocol 590 Independent Multicast - Dense Mode (PIM-DM): Protocol 591 Specification (Revised)", RFC 3973, January 2005. 593 [RFC5015] Handley, M., Kouvelas, I., Speakman, T., and L. Vicisano, 594 "Bidirectional Protocol Independent Multicast (BIDIR- 595 PIM)", RFC 5015, October 2007. 597 [RFC6395] Gulrajani, S. and S. Venaas, "An Interface Identifier (ID) 598 Hello Option for PIM", RFC 6395, October 2011. 600 [RFC4291] Hinden, R. and S. Deering, "IP Version 6 Addressing 601 Architecture", RFC 4291, February 2006. 603 [HELLO-OPT] 604 IANA, "PIM Hello Options", PIM-HELLO-OPTIONS per 605 RFC4601 http://www.iana.org/assignments/pim-hello-options, 606 March 2007. 608 Authors' Addresses 610 Yiqun Cai 611 Microsoft 612 La Avenida 613 Mountain View, CA 94043 614 USA 616 Email: yiqunc@microsoft.com 618 Sri Vallepalli 619 Cisco Systems, Inc. 620 Tasman Drive 621 San Jose, CA 95134 622 USA 624 Email: svallepa@cisco.com 626 Heidi Ou 627 Cisco Systems, Inc. 
628 Tasman Drive 629 San Jose, CA 95134 630 USA 632 Email: hou@cisco.com 633 Andy Green 634 British Telecom 635 Adastral Park 636 Ipswich IP5 2RE 637 United Kingdom 639 Email: andy.da.green@bt.com