Network Working Group                                     Dino Farinacci
Internet Draft                                    Procket Networks, Inc.
Expiration Date: July 2000
                                                           Yakov Rekhter
                                                           Eric C. Rosen
                                                     Cisco Systems, Inc.

                                                            January 2000


        Using PIM to Distribute MPLS Labels for Multicast Routes

                 draft-farinacci-mpls-multicast-01.txt


Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.
   It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.


Abstract

   This document specifies a method of distributing MPLS labels for
   multicast routes. The labels are distributed in the same PIM
   messages that are used to create the corresponding routes. The
   method is media-type independent, and therefore works for
   multi-access/multicast capable LANs, point-to-point links, and NBMA
   networks.


Table of Contents

    1      Overview
    2      Proposal
    2.1    Piggybacking
    2.2    Labels for LANs with Multiple Downstream Nodes
    2.3    Labels for Point-to-Point Links
    2.4    Labels for NBMA Networks
    2.5    Corner cases
    2.6    When NOT to Send a Labelled Multicast Packet
    2.7    No Conflict between Unicast and Multicast Labels
    3      Modifications to PIMv2
    4      Label Distribution for dense-mode groups
    5      Security Considerations
    6      Acknowledgments
    7      References


1. Overview

   PIM [2] is used to combine MPLS label distribution with the
   distribution of (*,G) join state, (S,G) join state, or (S,G)RPT-bit
   prune state. Labels and multicast routes are sent together in one
   message.

   The design of this method has been motivated by the following goals:

     o If an interface attaches to a network with data-link broadcast
       capability, an LSR should never have to send more than one copy
       of a given multicast data packet out that interface. However, it
       is NOT a goal for that LSR to be able to send the same packet,
       with the same label, out multiple interfaces.

     o When an interface supports data link multicasting, it must be
       possible to have a single Label Information Base (LIB) for that
       interface. That is, the receiver of a labeled packet should be
       able to interpret the label without knowing who the transmitter
       is.

     o When a LAN contains multiple label distribution peers, it should
       be possible to use data link multicast to distribute the label
       distribution control packets themselves. Other aspects of label
       distribution methodology should remain as consistent with unicast
       label distribution as possible. Multicast label distribution
       procedures should not depend on the media type.

     o Once the label for a particular multicast tree on a given LAN has
       been assigned, unicast routing changes should not cause
       redistribution or reassignment of the label for that group on
       that LAN.

     o When a multicast routing table change requires a label
       distribution change, the latency between the two should be
       minimized, both to improve performance and to minimize the
       possibility of race conditions.

     o The procedures should work with either dense-mode or sparse-mode
       operation.


2. Proposal

2.1. Piggybacking

   An LSR that supports multicast sends PIM Join/Prune messages on
   behalf of hosts that join groups. It sends Join/Prune messages to
   upstream neighboring LSRs toward the RP for the shared-tree (*,G) or
   toward a source for a source-tree (S,G). Labels are distributed by
   being associated with addresses in the join list or the prune list.
   In particular:

     1.
        If an LSR, Rd, joins the shared tree for a group, the
        Join/Prune message it sends upstream will contain the group
        address followed by a join-list. The join-list will contain an
        element which contains the address of the RP. This element
        will also contain a label, and this label can be used by the
        upstream LSR, Ru, when it sends multicast data down the shared
        tree. Intuitively, this label represents the route downstream
        from the current node along the shared tree.

     2. If an LSR, Rd, joins a source tree for a group, the Join/Prune
        message it sends upstream will contain the group address
        followed by a join-list. The join-list will contain an element
        which contains the address of the source. This element will
        also contain a label, and this label can be used by the
        upstream LSR, Ru, when it sends multicast data down the source
        tree. Intuitively, this label represents the route downstream
        from the current node along the specified source tree.

     3. Suppose an LSR, Rd, has (S,G)RPT-bit state with a null output
        interface list. This indicates that all of its downstream
        neighbors on the shared tree for G have pruned source S from
        the shared tree. Rd sends a Join/Prune message upstream (on
        the shared tree), containing the group address followed by a
        prune-list. The prune-list contains an element which contains
        the address of the source. In this case, no label is included
        in the element.

     4. Suppose an LSR, Rd, as the result of receiving, from a
        downstream neighbor on the shared tree, a Join/Prune message
        such as described in 3, creates (S,G)RPT-bit state with a
        non-null output interface list. In this case, it may send a
        Join/Prune message upstream on the shared tree, containing the
        group address followed by a prune-list. An element of the
        prune-list will contain the address S and a corresponding
        label.
        However, a special bit (the "don't prune" bit) in the
        element will be set, indicating to the upstream LSR that the
        source S is not really to be pruned from the shared tree. The
        result is that the upstream LSR, Ru, will still send packets
        from S to G to Rd, and will label those packets as specified.
        When Rd receives such packets, it forwards them according to
        the output interface list of the (S,G)RPT-bit entry.
        Intuitively, this label represents a route along the shared
        tree, but only for packets from the specified source.

     5. An LSR which receives a Join/Prune message as described in 4
        may send a corresponding Join/Prune message (with the "don't
        prune" bit set) to its upstream LSR on the shared tree. Again,
        this label represents a route along the shared tree, but only
        for packets from the specified source.

   Rules 3-5 above ensure that if a source is pruned off the shared tree
   at some point, any packets from that source which are sent down the
   shared tree will have a label that implicitly identifies the source.
   Thus if those packets encounter a node with (S,G)RPT-bit state, they
   will be sent according to the output interface list of the
   (S,G)RPT-bit entry, NOT according to the output interface list of
   the (*,G) entry.

2.2. Labels for LANs with Multiple Downstream Nodes

   Since PIM Join/Prune messages are multicast on a LAN, other
   downstream LSRs that are interested in the group will hear the
   message. They must cache the binding of multicast routing table
   state and label state together. Since the upstream LSR is going to
   forward data packets using the advertised label, they must be ready
   to accept the data packet with that advertised label.

   The first downstream LSR that joins a group is the label assigner on
   that LAN for that multicast route.
   All other downstream LSRs that send PIM Join/Prune messages will use
   the same label that the assigner selected. An LSR that sends a PIM
   Join/Prune message with a label of 0 indicates that it does not know
   the label for the associated multicast routing table entry. When
   this occurs, the assigner can trigger a PIM Join/Prune message
   making the label known.

2.3. Labels for Point-to-Point Links

   The procedure of section 2.2 works on point-to-point links because
   there is only one downstream LSR on the link, which always becomes
   the label assigner.

2.4. Labels for NBMA Networks

   On NBMA networks, all PIM routers are known to each other through
   pseudo-broadcast mechanisms provided by the data-link layer. However,
   PIM Join messages are unicast to the upstream LSR. Therefore, other
   downstream LSRs will not hear the label assigner's advertisement.
   Consequently, we treat an NBMA network with one upstream and n
   downstream LSRs as n point-to-point links, from the upstream LSR to
   each of the downstream LSRs. Each downstream LSR then assigns its
   own label, and the upstream LSR must replicate the multicast data
   packets. Therefore, the procedure of section 2.2 applies.

   Note that this is not incompatible with the use of native
   point-to-multipoint capabilities at the data link layer.

2.5. Corner cases

   Multiple downstream LSRs cannot assign the same label value for any
   multicast route because they partition the label space into
   non-overlapping ranges according to [4]. When an LSR is enabled on
   an interface, it obtains a unique label range for the LAN.

   When the label assigner leaves the group, the label that it assigned
   still remains active. The next highest IP addressed downstream LSR
   becomes the owner of that label and may change it if it sees fit.
   However, it is not required to change it. All downstream LSRs can
   continue to use the assignment in their Join messages.

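   As an illustration of the hand-over just described, the following is
   a minimal Python sketch and not part of any PIM or MPLS
   specification. The class LanRouteState and its method names are
   hypothetical. The sketch assumes that "next highest IP addressed"
   means the highest address among the downstream LSRs still joined,
   and that a label of 0 (section 2.2) marks an unknown label. What
   happens when no downstream LSR remains is not stated in this
   document; clearing the state is an assumption.

```python
from dataclasses import dataclass, field
from ipaddress import IPv4Address
from typing import Optional, Set

UNKNOWN_LABEL = 0  # section 2.2: a label of 0 means "label not known"


@dataclass
class LanRouteState:
    """Label state for one multicast route on one LAN (hypothetical model)."""
    label: int = UNKNOWN_LABEL
    owner: Optional[IPv4Address] = None
    joined: Set[IPv4Address] = field(default_factory=set)

    def on_join(self, lsr: IPv4Address, label: int) -> None:
        """Record a Join/Prune heard on the LAN from downstream LSR `lsr`."""
        self.joined.add(lsr)
        if self.owner is None and label != UNKNOWN_LABEL:
            # The first downstream LSR to advertise a label is the assigner.
            self.owner, self.label = lsr, label

    def on_leave(self, lsr: IPv4Address) -> None:
        """`lsr` stops joining; the assigned label itself stays active."""
        self.joined.discard(lsr)
        if lsr != self.owner:
            return
        if self.joined:
            # Assumption: ownership passes to the highest remaining address.
            self.owner = max(self.joined)
        else:
            # Assumption: with nobody left, the state is simply cleared.
            self.owner, self.label = None, UNKNOWN_LABEL
```

   Note that on_leave never changes the label while some downstream LSR
   remains. The new owner may pick a different label later, but it is
   not required to, so the other LSRs can keep using the existing
   assignment in their Join messages.
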
   If two systems both join for the first time (they do not have state)
   at the same time and each chooses a different label value, the
   highest IP addressed downstream LSR's label will be used by the
   upstream LSR. The lower addressed LSR will hear the higher addressed
   LSR's Join too and will also use its label.

   If the label assigner crashes, the highest IP addressed downstream
   LSR assigns a new label to the multicast routes which were assigned
   by the crashed LSR, and triggers a Join message so that all other
   LSRs on the LAN use the new label.

   When a LAN partitions due to a layer-2 switch failure, the same
   logic applies as for the case where an LSR stops joining for a
   group. When the partition heals, there may be an RPF neighbor change
   in one of the partitions. When there is an RPF neighbor change and
   the downstream routers trigger joins to their new RPF neighbor with
   a different label assignment than the other partition is using, one
   of two resolutions occurs:

     1) The LSR which is the allocator in the partition of the new RPF
        neighbor will trigger a join if it has a higher IP address than
        the allocator in the other region. The downstream routers in
        the other partition use the new label assignment immediately.

     2) If the LSR which is the allocator in the partition of the new
        RPF neighbor has a lower IP address, all downstream routers and
        the new RPF neighbor will switch to the label assigned by the
        allocator in the other partition.

   If an RPF change occurs (the topology changed so the upstream LSR is
   different), the PIM protocol spec indicates that a PIM Join may be
   triggered to get on the new distribution tree as soon as possible. In
   this case, if the label assigner becomes the upstream LSR, then the
   new highest IP addressed downstream LSR may become the label
   assigner. It may change the label if it sees fit. Otherwise, the same
   label is used.

2.6.
When NOT to Send a Labelled Multicast Packet

   PIM Hello messages, sent periodically by all PIM-capable routers,
   will indicate if the router is MPLS-capable. An upstream router on a
   LAN will therefore know if all routers on the same LAN are LSRs or
   not. If there are ANY MPLS-incapable routers which are interested in
   a particular group, the upstream router will transmit to the LAN only
   unlabelled multicast data packets for that group.

   If there are any group members on a LAN, only unlabelled multicast
   data for that group will be transmitted onto that LAN.

   Routers that support non-PIM multicast are assumed, for the purposes
   of this procedure, to be MPLS-incapable.

2.7. No Conflict between Unicast and Multicast Labels

   MPLS uses different data-link layer code-points [5] to distinguish
   multicast labeled packets from unicast labeled packets. Therefore,
   the assignment of labels for unicast routes is completely independent
   from the assignment of labels for multicast routes. For example, the
   same label value could be allocated for a unicast route and for a
   multicast route, without any possibility of ambiguity.


3. Modifications to PIMv2

   PIMv2 has a packet format for each address type it may support when
   encoding both multicast and unicast addresses. We will define a new
   address type called "Label Address" for unicast address encoding.
   The label will accompany the source address in the Encoded Source
   Address format as specified in [2]. The label value will be in a
   32-bit quantity following the source address. We also take one bit
   from the PIMv2 reserved field to be the "don't prune" bit (shown
   below as the "D" bit).
   So, for example, an IPv4 Label Address format would look like:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
                                   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                                   | Rsrvd |D|S|W|R|   Mask Len    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Source Address                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             Label                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 Current Multicast Route Timer                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Label
      If the high-order bit is clear, the low-order 20 bits are a label
      value (as described in [5]) assigned by the LSR sending the
      Join/Prune message. All other bits should be set to 0 by the
      sender and should be ignored by the receiver.

      If the high-order bit is set, the low-order 28 bits are a label
      value in the VPI/VCI format (as described in [7]) assigned by
      the LSR sending the Join/Prune message. All other bits should be
      set to 0 by the sender and should be ignored by the receiver.

   Current Multicast Route Timer
      The sender of a Join/Prune message inserts the current time left
      before expiration for the multicast route table entry described by
      the Source Address (either the (S,G) or (*,G) entry). This is
      needed so all routers on a common multi-access subnet can time out
      the entry at close to the same time, without recreating each
      other's state when the source goes inactive.

   Refer to [2] for other field descriptions not specified here.


4. Label Distribution for dense-mode groups

   In dense-mode PIM, there is no downstream Join message traveling
   upstream to perform the binding of multicast routes with labels.
   However, since we don't want a separate algorithm for dense-mode
   groups, we extend this basic design for dense-mode PIM.

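   Since the same message format is used in both modes, the Label field
   of section 3 is interpreted identically in each. As a minimal Python
   sketch of that interpretation, offered under stated assumptions and
   not as a normative encoding: the names DecodedLabel,
   decode_label_field and encode_label_field are hypothetical, and the
   field is assumed to have already been extracted from the message as
   a 32-bit unsigned integer.

```python
from typing import NamedTuple

HIGH_BIT = 0x80000000      # high-order bit of the 32-bit Label field
LABEL_MASK = 0x000FFFFF    # low-order 20 bits: label value, as in [5]
VPI_VCI_MASK = 0x0FFFFFFF  # low-order 28 bits: VPI/VCI value, as in [7]


class DecodedLabel(NamedTuple):
    is_vpi_vci: bool  # True when the high-order bit of the field is set
    value: int        # 20-bit label value, or 28-bit VPI/VCI value


def decode_label_field(raw: int) -> DecodedLabel:
    """Interpret a 32-bit Label field; bits outside the value are ignored."""
    if not 0 <= raw <= 0xFFFFFFFF:
        raise ValueError("Label field must fit in 32 bits")
    if raw & HIGH_BIT:
        return DecodedLabel(True, raw & VPI_VCI_MASK)
    return DecodedLabel(False, raw & LABEL_MASK)


def encode_label_field(value: int, is_vpi_vci: bool = False) -> int:
    """Build a Label field with every unused bit set to 0, as a sender should."""
    limit = 1 << (28 if is_vpi_vci else 20)
    if not 0 <= value < limit:
        raise ValueError("label value out of range for this format")
    return (HIGH_BIT | value) if is_vpi_vci else value
```

   The decoder masks rather than rejects fields with stray bits set,
   which follows the rule that bits outside the label value should be
   ignored by the receiver.
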
   When a downstream LSR creates (S,G) state from the receipt of 1)
   data, or 2) Join/Prune or Graft messages, it will start a periodic
   timer to send Join messages with label assignment information
   present. The messages look no different and are treated on receipt no
   differently than in the sparse-mode case.

   The periodic Join message will be multicast on the LAN with an
   upstream target address of 0.0.0.0. All multicast LSRs on the LAN
   must know the group operates in dense-mode. This is accomplished
   using standard PIM mechanisms.


5. Security Considerations

   Security considerations are not discussed in this memo.


6. Acknowledgments

   The authors would like to thank Fred Baker for his comments. We also
   thank the authors of [6] for their critique of an earlier version.


9.0 Authors' Addresses

   Dino Farinacci
   Procket Networks, Inc.
   3850 North First Street
   San Jose, CA 95134
   Email: dino@procket.com

   Yakov Rekhter
   Cisco Systems, Inc.
   170 Tasman Drive
   San Jose, CA, 95134
   Email: yakov@cisco.com

   Eric C. Rosen
   Cisco Systems, Inc.
   250 Apollo Drive
   Chelmsford, MA, 01824
   Email: erosen@cisco.com


7. References

   [1] "Multiprotocol Label Switching Architecture",
   draft-ietf-mpls-arch-06.txt, Rosen, Viswanathan, Callon, August 1999.

   [2] "Protocol Independent Multicast-Sparse Mode (PIM-SM): Protocol
   Specification", RFC 2362, Estrin, Farinacci, Helmy, Thaler, Deering,
   Handley, Jacobson, Liu, Sharma, Wei, June 1998.

   [3] "LDP Specification", , Andersson,
   Doolan, Feldman, Fredette, Thomas, October 1999.

   [4] "Partitioning Label Space among Multicast Routers on a Common
   Subnet", , Farinacci,
   August 1999.

   [5] "MPLS Label Stack Encoding", , Rosen, Rekhter, Farinacci, Tappan,
   Fedorkow, Li, Conta, September 1999.

   [6] "Framework for IP Multicast in MPLS", , Ooms, Livens, Sales,
   Ramalho, Acharya, Griffoul, Ansari, May 1999.

   [7] "MPLS using LDP and ATM VC Switching", , Davie, Lawrence,
   McCloghrie, Rekhter, Rosen, Swallow, Doolan, April 1999.