idnits 2.17.1 draft-ietf-bier-architecture-06.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document date (April 24, 2017) is 2552 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) -- Looks like a reference, but probably isn't: '0' on line 356 -- Looks like a reference, but probably isn't: '255' on line 356 -- Looks like a reference, but probably isn't: '1' on line 278 -- Looks like a reference, but probably isn't: '65535' on line 261 -- Looks like a reference, but probably isn't: '256' on line 276 -- Looks like a reference, but probably isn't: '512' on line 278 -- Looks like a reference, but probably isn't: '15' on line 355 Summary: 0 errors (**), 0 flaws (~~), 1 warning (==), 8 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Internet Engineering Task Force IJ. Wijnands, Ed. 3 Internet-Draft Cisco Systems, Inc. 4 Intended status: Standards Track E. Rosen, Ed. 5 Expires: October 26, 2017 Juniper Networks, Inc. 6 A. Dolganow 7 Nokia 8 T. Przygienda 9 Juniper Networks, Inc. 10 S. Aldrin 11 Google, Inc. 12 April 24, 2017 14 Multicast using Bit Index Explicit Replication 15 draft-ietf-bier-architecture-06 17 Abstract 19 This document specifies a new architecture for the forwarding of 20 multicast data packets. It provides optimal forwarding of multicast 21 packets through a "multicast domain". However, it does not require a 22 protocol for explicitly building multicast distribution trees, nor 23 does it require intermediate nodes to maintain any per-flow state. 24 This architecture is known as "Bit Index Explicit Replication" 25 (BIER). When a multicast data packet enters the domain, the ingress 26 router determines the set of egress routers to which the packet needs 27 to be sent. The ingress router then encapsulates the packet in a 28 BIER header. The BIER header contains a bitstring in which each bit 29 represents exactly one egress router in the domain; to forward the 30 packet to a given set of egress routers, the bits corresponding to 31 those routers are set in the BIER header. Elimination of the per- 32 flow state and the explicit tree-building protocols results in a 33 considerable simplification. 35 Status of This Memo 37 This Internet-Draft is submitted in full conformance with the 38 provisions of BCP 78 and BCP 79. 40 Internet-Drafts are working documents of the Internet Engineering 41 Task Force (IETF). Note that other groups may also distribute 42 working documents as Internet-Drafts. The list of current Internet- 43 Drafts is at http://datatracker.ietf.org/drafts/current/. 45 Internet-Drafts are draft documents valid for a maximum of six months 46 and may be updated, replaced, or obsoleted by other documents at any 47 time. It is inappropriate to use Internet-Drafts as reference 48 material or to cite them other than as "work in progress." 49 This Internet-Draft will expire on October 26, 2017. 51 Copyright Notice 53 Copyright (c) 2017 IETF Trust and the persons identified as the 54 document authors. All rights reserved. 56 This document is subject to BCP 78 and the IETF Trust's Legal 57 Provisions Relating to IETF Documents 58 (http://trustee.ietf.org/license-info) in effect on the date of 59 publication of this document. Please review these documents 60 carefully, as they describe your rights and restrictions with respect 61 to this document. Code Components extracted from this document must 62 include Simplified BSD License text as described in Section 4.e of 63 the Trust Legal Provisions and are provided without warranty as 64 described in the Simplified BSD License. 66 Table of Contents 68 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 69 2. The BFR Identifier and BFR-Prefix . . . . . . . . . . . . . . 6 70 3. Encoding BFR Identifiers in BitStrings . . . . . . . . . . . 7 71 4. Layering . . . . . . . . . . . . . . . . . . . . . . . . . . 9 72 4.1. The Routing Underlay . . . . . . . . . . . . . . . . . . 9 73 4.2. The BIER Layer . . . . . . . . . . . . . . . . . . . . . 10 74 4.3. The Multicast Flow Overlay . . . . . . . . . . . . . . . 11 75 5. Advertising BFR-ids and BFR-Prefixes . . . . . . . . . . . . 12 76 6. BIER Intra-Domain Forwarding Procedures . . . . . . . . . . . 13 77 6.1. Overview . . . . . . . . . . . . . . . . . . . . . . . . 13 78 6.2. BFR Neighbors . . . . . . . . . . . . . . . . . . . . . . 15 79 6.3. The Bit Index Routing Table . . . . . . . . . . . . . . . 15 80 6.4. The Bit Index Forwarding Table . . . . . . . . . . . . . 16 81 6.5. The BIER Forwarding Procedure . . . . . . . . . . . . . . 17 82 6.6. Examples of BIER Forwarding . . . . . . . . . . . . . . . 19 83 6.6.1. Example 1 . . . . . . . . . . . . . . . . . . . . . . 20 84 6.6.2. Example 2 . . . . . . . . . . . . . . . . . . . . . . 21 85 6.7. Equal Cost Multi-path Forwarding . . . . . . . . . . . . 23 86 6.7.1. Non-deterministic ECMP . . . . . . . . . . . . . . . 23 87 6.7.2. Deterministic ECMP . . . . . . . . . . . . . . . . . 24 88 6.8. Prevention of Loops and Duplicates . . . . . . . . . . . 26 89 6.9. When Some Nodes do not Support BIER . . . . . . . . . . . 27 90 6.10. Use of Different BitStringLengths within a Domain . . . . 29 91 6.10.1. BitStringLength Compatibility Check . . . . . . . . 30 92 6.10.2. Handling BitStringLength Mismatches . . . . . . . . 32 93 6.10.3. Transitioning from One BitStringLength to Another . 32 94 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 33 95 8. Security Considerations . . . . . . . . . . . . . . . . . . . 33 96 9. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 33 97 10. Contributor Addresses . . . . . . . . . . . . . . . . . . . . 33 98 11. References . . . . . . . . . . . . . . . . . . . . . . . . . 35 99 11.1. Normative References . . . . . . . . . . . . . . . . . . 35 100 11.2. Informative References . . . . . . . . . . . . . . . . . 35 101 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 36 103 1. Introduction 105 This document specifies a new architecture for the forwarding of 106 multicast data packets. It provides optimal forwarding of multicast 107 data packets through a "multicast domain". However, it does not 108 require the use of a protocol for explicitly building multicast 109 distribution trees, and it does not require intermediate nodes to 110 maintain any per-flow state. This architecture is known as "Bit 111 Index Explicit Replication" (BIER). 113 A router that supports BIER is known as a "Bit-Forwarding Router" 114 (BFR). The BIER control plane protocols (see Section 4.2) run within 115 a "BIER domain", allowing the BFRs within that domain to exchange the 116 information needed for them to forward packets to each other using 117 BIER. 119 A multicast data packet enters a BIER domain at a "Bit-Forwarding 120 Ingress Router" (BFIR), and leaves the BIER domain at one or more 121 "Bit-Forwarding Egress Routers" (BFERs). A BFR that receives a 122 multicast data packet from another BFR in the same BIER domain, and 123 forwards the packet to another BFR in the same BIER domain, will be 124 known as a "transit BFR" for that packet. A single BFR may be a BFIR 125 for some multicast traffic while also being a BFER for some multicast 126 traffic and a transit BFR for some multicast traffic. In fact, a BFR 127 may be the BFIR for a given packet and may also be (one of) the 128 BFER(s), for that packet; it may also forward that packet to one or 129 more additional BFRs. 131 A BIER domain may contain one or more sub-domains. Each BIER domain 132 MUST contain at least one sub-domain, the "default sub-domain" (also 133 denoted "sub-domain zero"). If a BIER domain contains more than one 134 sub-domain, each BFR in the domain MUST be provisioned to know the 135 set of sub-domains to which it belongs. Each sub-domain is 136 identified by a sub-domain-id in the range [0,255]. 138 For each sub-domain to which a given BFR belongs, if the BFR is 139 capable of acting as a BFIR or a BFER, it MUST be provisioned with a 140 "BFR-id" that is unique within the sub-domain. A BFR-id is a small 141 unstructured positive integer. For instance, if a particular BIER 142 sub-domain contains 1,374 BFRs, each one could be given a BFR-id in 143 the range 1-1374. 145 If a given BFR belongs to more than one sub-domain, it may (though it 146 need not) have a different BFR-id for each sub-domain. 148 When a multicast packet arrives from outside the domain at a BFIR, 149 the BFIR determines the set of BFERs to which the packet will be 150 sent. The BFIR also determines the sub-domain in which the packet 151 will be sent. Determining the sub-domain in which a given packet 152 will be sent is known as "assigning the packet to a sub-domain". 153 Procedures for choosing the sub-domain to which a particular packet 154 is assigned are outside the scope of this document. However, once a 155 particular packet has been assigned to a particular sub-domain, it 156 remains assigned to that sub-domain until it leaves the BIER domain. 157 That is, the sub-domain to which a packet is assigned MUST NOT be 158 changed while the packet is in flight through the BIER domain. 160 Once the BFIR determines sub-domain and the set of BFERs for a given 161 packet, the BFIR encapsulates the packet in a "BIER header". The 162 BIER header contains a bit string in which each bit represents a 163 single BFR-id. To indicate that a particular BFER is to receive a 164 given packet, the BFIR sets the bit corresponding to that BFER's 165 BFR-id in the sub-domain to which the packet has been assigned. We 166 will use term "BitString" to refer to the bit string field in the 167 BIER header. We will use the term "payload" to refer to the packet 168 that has been encapsulated. Thus a "BIER-encapsulated" packet 169 consists of a "BIER header" followed by a "payload". 171 The number of BFERs to which a given packet can be forwarded is 172 limited only by the length of the BitString in the BIER header. 173 Different deployments can use different BitString lengths. We will 174 use the term "BitStringLength" to refer to the number of bits in the 175 BitString. It is possible that some deployment will have more BFERs 176 in a given sub-domain than there are bits in the BitString. To 177 accommodate this case, the BIER encapsulation includes both the 178 BitString and a "Set Identifier" (SI). It is the BitString and the 179 SI together that determine the set of BFERs to which a given packet 180 will be delivered: 182 o by convention, the least significant (rightmost) bit in the 183 BitString is "bit 1", and the most significant (leftmost) bit is 184 "bit BitStringLength". 186 o if a BIER-encapsulated packet has an SI of n, and a BitString with 187 bit k set, then the packet must be delivered to the BFER whose 188 BFR-id (in the sub-domain to which the packet has been assigned) 189 is n*BitStringLength+k. 191 For example, suppose the BIER encapsulation uses a BitStringLength of 192 256 bits. By convention, the least significant (rightmost) bit is 193 "bit 1", and the most significant (leftmost) bit is "bit 256". 194 Suppose that a given packet has been assigned to sub-domain 0, and 195 needs to be delivered to three BFERs, where those BFERs have BFR-ids 196 in sub-domain 0 of 13, 126, and 235 respectively. The BFIR would 197 create a BIER encapsulation with the SI set to zero, and with bits 198 13, 126, and 235 of the BitString set. (All other bits of the 199 BitString would be clear.) If the packet also needs to be sent to a 200 BFER whose BFR-id is 257, the BFIR would have to create a second copy 201 of the packet, and the BIER encapsulation would specify an SI of 1, 202 and a BitString with bit 1 set and all the other bits clear. 204 Note that it is generally advantageous to assign the BFR-ids of a 205 given sub-domain so that as many BFERs as possible can be represented 206 in a single bit string. 208 Suppose a BFR, call it BFR-A, receives a packet whose BIER 209 encapsulation specifies an SI of 0, and a BitString with bits 13, 26, 210 and 235 set. Suppose BFR-A has two BFR neighbors, BFR-B and BFR-C, 211 such that the best path to BFERs 13 and 26 is via BFR-B, but the best 212 path to BFER 235 is via BFR-C. Then BFR-A will replicate the packet, 213 sending one copy to BFR-B and one copy to BFR-C. However, BFR-A will 214 clear bit 235 in the BitString of the packet copy it sends to BFR-B, 215 and will clear bits 13 and 26 in the BitString of the packet copy it 216 sends to BFR-C. As a result, BFR-B will forward the packet only 217 towards BFERs 13 and 26, and BFR-C will forward the packet only 218 towards BFER 235. This ensures that each BFER receives only one copy 219 of the packet. 221 With this forwarding procedure, a multicast data packet can follow an 222 optimal path from its BFIR to each of its BFERs. Further, since the 223 set of BFERs for a given packet is explicitly encoded into the BIER 224 header, the packet is not sent to any BFER that does not need to 225 receive it. This allows for optimal forwarding of multicast traffic. 226 This optimal forwarding is achieved without any need for transit BFRs 227 to maintain per-flow state, or to run a multicast tree-building 228 protocol. 230 The idea of encoding the set of egress nodes into the header of a 231 multicast packet is not new. For example, [Boivie_Feldman] proposes 232 to encode the set of egress nodes as a set of IP addresses, and 233 proposes mechanisms and procedures that are in some ways similar to 234 those described in the current document. However, since BIER encodes 235 each BFR-id as a single bit in a bit string, it can represent up to 236 128 BFERs in the same number of bits that it would take to carry the 237 IPv6 address of a single BFER. Thus BIER scales to a much larger 238 number of egress nodes per packet. 240 BIER does not require that each transit BFR look up the best path to 241 each BFER that is identified in the BIER header; the number of 242 lookups required in the forwarding path for a single packet can be 243 limited to the number of neighboring BFRs; this can be much smaller 244 than the number of BFERs. See Section 6 (especially Section 6.4) for 245 details. 247 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 248 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 249 document are to be interpreted as described in RFC 2119 [RFC2119]. 251 2. The BFR Identifier and BFR-Prefix 253 Each BFR MUST be assigned a "BFR-Prefix". A BFR's BFR-Prefix MUST be 254 an IP address (either IPv4 or IPv6) of the BFR, and MUST be unique 255 and routable within the BIER domain. It is RECOMMENDED that the 256 BFR-prefix be a loopback address of the BFR. Two BFRs in the same 257 BIER domain MUST NOT be assigned the same BFR-Prefix. Note that a 258 BFR in a given BIER domain has the same BFR-prefix in all the sub- 259 domains of that BIER domain. 261 A "BFR Identifier" (BFR-id) is a number in the range [1,65535]. In 262 general, each BFR in a given BIER sub-domain must be assigned a 263 unique number from this range (i.e., two BFRs in the same BIER sub- 264 domain MUST NOT have the same BFR-id in that sub-domain). However, 265 if it is known that a given BFR will never need to function as a BFER 266 or BFIR in a given sub-domain, then it is not necessary to assign a 267 BFR-id for that sub-domain to that BFR. 269 Note that the value 0 is not a legal BFR-id. 271 The procedure for assigning a particular BFR-id to a particular BFR 272 is outside the scope of this document. However, it is RECOMMENDED 273 that the BFR-ids for each sub-domain be assigned "densely" from the 274 numbering space, as this will result in a more efficient encoding 275 (see Section 3). That is, if there are 256 or fewer BFERs, it is 276 RECOMMENDED to assign all the BFR-ids from the range [1,256]. If 277 there are more than 256 BFERs, but less than 512, it is RECOMMENDED 278 to assign all the BFR-ids from the range [1,512], with as few "holes" 279 as possible in the earlier range. However, in some deployments, it 280 may be advantageous to depart from this recommendation; this is 281 discussed further in Section 3. 283 In some deployments, it may not be possible to support (in a given 284 sub-domain) the full range of 65535 BFR-ids. For example, if the 285 BFRs in a given sub-domain only support 16 SIs and if they only 286 support BitStringLengths of 256 or less, then only 16*256=4096 BFR- 287 ids can be supported in that sub-domain. 289 3. Encoding BFR Identifiers in BitStrings 291 To encode a BFR-id in a BIER data packet, one must convert the BFR-id 292 to an SI and a BitString. This conversion depends upon the parameter 293 we are calling "BitStringLength". The conversion is done as follows. 294 If the BFR-id is N, then 296 o SI is the integer part of the quotient (N-1)/BitStringLength 298 o The BitString has one bit position set. If the low-order bit is 299 bit 1, and the high-order bit is bit BitStringLength, the bit 300 position that represents BFR-id N is 301 ((N-1) modulo BitStringLength)+1. 303 If several different BFR-ids all resolve to the same SI, then all 304 those BFR-ids can be represented in a single BitString. The 305 BitStrings for all of those BFR-ids are combined using a bitwise 306 logical OR operation. 308 Within a given BIER domain (or even within a given BIER sub-domain), 309 different values of BitStringLength may be used. Each BFR MUST be 310 provisioned to know the following: 312 o the BitStringLength ("Imposition BitStringLength") and sub-domain 313 ("Imposition sub-domain") to use when it imposes (as a BFIR) a 314 BIER encapsulation on a particular set of packets, and 316 o the BitStringLengths ("Disposition BitStringLengths") that it will 317 process when (as a BFR or BFER) it receives packets from a 318 particular sub-domain. 320 It is not required that a BFIR use the same Imposition 321 BitStringLength or the same Imposition sub-domain for all packets on 322 which it imposes the BIER encapsulation. However, if a particular 323 BFIR is provisioned to use a particular Imposition BitStringLength 324 and a particular Imposition sub-domain when imposing the 325 encapsulation on a given set of packets, all other BFRs with BFR-ids 326 in that sub-domain SHOULD be provisioned to process received BIER 327 packets with that BitStringLength (i.e., all other BFRs with BFR-ids 328 in that sub-domain SHOULD be provisioned with that BitStringLength as 329 a Disposition BitStringLength for that sub-domain. Exceptions to 330 this rule MAY be made under certain conditions; this is discussed in 331 Section 6.10. 333 Every BFIR MUST be capable of being provisioned with an Imposition 334 BitStringLength of 256. Every BFR and BFER MUST be capable of being 335 provisioned with a Disposition BitStringLength of 256. 337 Particular BIER encapsulation types MAY allow other BitStringLengths 338 to be OPTIONALLY supported. For example, when using the 339 encapsulation specified in [MPLS_BIER_ENCAPS], a BFR may be capable 340 of being provisioned to use any or all of the following 341 BitStringLengths as Imposition BitStringLengths and as Disposition 342 BitStringLengths: 64, 128, 256, 512, 1024, 2048, and 4096. 344 If a BFR is capable of being provisioned with a given value of 345 BitStringLength as an Imposition BitStringLength, it MUST also be 346 capable of being provisioned with that same value as one of its 347 Disposition BitStringLengths. It SHOULD be capable of being 348 provisioned with all legal smaller values of BitStringLength as both 349 Imposition and Disposition BitStringLength. 351 In order to support transition from one BitStringLength to another, 352 every BFR MUST be capable of being provisioned to simultaneously use 353 two different Disposition BitStringLengths. 355 A BFR MUST support SI values in the range [0,15], and MAY support SI 356 values in the range [0,255]. ("Supporting the values in a given 357 range" means, in this context, that any value in the given range is 358 legal, and will be properly interpreted.) Note that for a given 359 BitStringLength, the total number of BFR-ids that can be represented 360 is the product of the BitStringLength and the number of supported 361 SIs. For example, if a deployment uses (in a given sub-domain) a 362 BitStringLength of 64 and supports 256 SIs, that deployment can only 363 support 16384 BFR-ids in that sub-domain. Even a deployment that 364 supports 256 SIs will not be able to support 65535 BFR-ids unless it 365 uses a BitStringLength of at least 256. 367 When a BFIR determines that a multicast data packet, assigned to a 368 given sub-domain, needs to be forwarded to a particular set of 369 destination BFERs, the BFIR partitions that set of BFERs into 370 subsets, where each subset contains the target BFERs whose BFR-ids in 371 the given sub-domain all resolve to the same SI. Call these the 372 "SI-subsets" for the packet. Each SI-subset can be represented by a 373 single BitString. The BFIR creates a copy of the packet for each 374 SI-subset. The BIER encapsulation is then applied to each packet. 375 The encapsulation specifies a single SI for each packet, and contains 376 the BitString that represents all the BFR-ids in the corresponding 377 SI-subset. Of course, in order to properly interpret the BitString, 378 it must be possible to infer the sub-domain-id from the encapsulation 379 as well. 381 Suppose, for example, that a BFIR determines that a given packet 382 needs to be forwarded to three BFERs, whose BFR-ids (in the 383 appropriate sub-domain) are 27, 235, and 497. The BFIR will have to 384 forward two copies of the packet. One copy, associated with SI=0, 385 will have a BitString with bits 27 and 235 set. The other copy, 386 associated with SI=1, will have a BitString with bit 241 set. 388 In order to minimize the number of copies that must be made of a 389 given multicast packet, it is RECOMMENDED that the BFR-ids be 390 assigned "densely" (see Section 2) from the numbering space. This 391 will minimize the number of SIs that have to be used in the domain. 392 However, depending upon the details of a particular deployment, other 393 assignment methods may be more advantageous. Suppose, for example, 394 that in a certain deployment, every multicast flow is either intended 395 for the "east coast" or for the "west coast". In such a deployment, 396 it would be advantageous to assign BFR-ids so that all the "west 397 coast" BFR-ids fall into the same SI-subset, and so that all the 398 "east coast" BFR-ids fall into the same SI-subset. 400 When a BFR receives a BIER data packet, it will infer the SI from the 401 encapsulation. The set of BFERs to which the packet needs to be 402 forwarded can then be inferred from the SI and the BitString. 404 In some of the examples given later in this document, we will use a 405 BitStringLength of 4, and will represent a BFR-id in the form 406 "SI:xyzw", where SI is the Set Identifier of the BFR-id (assuming a 407 BitStringLength of 4), and xyzw is a string of 4 bits. A 408 BitStringLength of 4 is used only in the examples; we would not 409 expect actual deployments to have such a small BitStringLength. 411 It is possible that several different forms of BIER encapsulation 412 will be developed. If so, the particular encapsulation that is used 413 in a given deployment will depend on the type of network 414 infrastructure that is used to realize the BIER domain. Details of 415 the BIER encapsulation(s) will be given in companion documents. An 416 encapsulation for use in MPLS networks is described in 417 [MPLS_BIER_ENCAPS] 419 4. Layering 421 It is helpful to think of the BIER architecture as consisting of 422 three layers: the "routing underlay", the "BIER layer", and the 423 "multicast flow overlay". 425 4.1. The Routing Underlay 427 The "routing underlay" establishes "adjacencies" between pairs of 428 BFRs, and determines one or more "best paths" from a given BFR to a 429 given set of BFRs. Each such path is a sequence of BFRs such that BFR(k+j) is "adjacent" to 431 BFR(k+j+1) (for 0<=j and BitStringLength can be done using the 549 advertisement capabilities of the IGP. For example, if a BIER domain 550 is also an OSPF domain, these advertisements can be done using the 551 OSPF "Opaque Link State Advertisement" (Opaque LSA) mechanism. 552 Details of the necessary extensions to OSPF and IS-IS will be 553 provided in companion documents. (See [OSPF_BIER_EXTENSIONS] and 554 [ISIS_BIER_EXTENSIONS].) 556 These advertisements enable each BFR to associate a given with a given BFR-prefix. As will be seen in 558 subsequent sections of this document, knowledge of this association 559 is an important part of the forwarding process. 561 Since each BFR needs to have a unique (in each sub-domain) BFR-id, 562 two different BFRs will not advertise ownership of the same unless there has been a provisioning error. 565 o If BFR-A determines that BFR-B and BFR-C have both advertised the 566 same BFR-id for the same sub-domain, BFR-A MUST log an error. 567 Suppose that the duplicate BFR-id is "N". When BFR-A is 568 functioning as a BFIR, it MUST NOT encode the BFR-id value N in 569 the BIER encapsulation of any packet that has been assigned to the 570 given sub-domain, even if it has determined that the packet needs 571 to be received by BFR-B and/or BFR-C. 573 This will mean that BFR-B and BFR-C cannot receive multicast 574 traffic at all in the given sub-domain until the provisioning 575 error is fixed. However, that is preferable to having them 576 receive each other's traffic. 578 o If BFR-A has been provisioned with BFR-id N for a particular sub- 579 domain, has not yet advertised its ownership of BFR-id N for that 580 sub-domain, but has received an advertisement from a different BFR 581 (say BFR-B) that is advertising ownership of BFR-id N for the same 582 sub-domain, then BFR-A SHOULD log an error, and MUST NOT advertise 583 its own ownership of BFR-id N for that sub-domain as long as the 584 advertisement from BFR-B is extant. 586 This procedure may prevent the accidental misconfiguration of a 587 new BFR from impacting an existing BFR. 589 If a BFR advertises that it has a BFR-id of 0 in a particular sub- 590 domain, other BFRs receiving the advertisement MUST interpret that 591 advertisement as meaning that the advertising BFR does not have a 592 BFR-id in that sub-domain. 594 6. BIER Intra-Domain Forwarding Procedures 596 This section specifies the rules for forwarding a BIER-encapsulated 597 data packet within a BIER domain. These rules are not intended to 598 specify an implementation strategy; to conform to this specification, 599 an implementation need only produce the same results that these rules 600 produce. 602 6.1. Overview 604 This section provides a brief overview of the BIER forwarding 605 procedures. Subsequent sub-sections specify the procedures in more 606 detail. 608 To forward a BIER-encapsulated packet: 610 1. Determine the packet's sub-domain. 612 2. Determine the packet's BitStringLength and BitString. 614 3. Determine the packet's SI. 616 4. From the sub-domain, the SI and the BitString, determine the set 617 of destination BFERs for the packet. 619 5. Using information provided by the routing underlay associated 620 with the packet's sub-domain, determine the next hop adjacency 621 for each of the destination BFERs. 623 6. It is possible that the packet's BitString will have one or more 624 bits that correspond to BFR-ids that are not in use. It is also 625 possible that the packet's BitString will have one or more bits 626 that correspond to BFERs that are unreachable, i.e., that have no 627 next hop adjacency. In the following, we will consider the "next 628 hop adjacency" for all such bit positions to be the "null" next 629 hop. 631 7. Partition the set of destination BFERs such that all the BFERs in 632 a single partition have the same next hop. We will say that each 633 partition is associated with a next hop. 635 8. For each partition: 637 a. Make a copy of the packet. 639 b. Clear any bit in the packet's BitString that identifies a 640 BFER that is not in the partition. 642 c. Transmit the packet to the associated next hop. (If the next 643 hop is the null next hop, the packet is discarded.) 645 If a BFR receives a BIER-encapsulated packet whose sub-domain, SI and 646 BitString identify that BFR itself, then the BFR is also a BFER for 647 that packet. As a BFER, it must pass the payload to the multicast 648 flow overlay. If the BitString has bits set for other BFRs, the 649 packet also needs to be forwarded further within the BIER domain. If 650 the BF(E)R also forwards one or more copies of the packet within the 651 BIER domain, the bit representing the BFR's own BFR-id MUST be clear 652 in all the copies. 654 When BIER on a BFER passes a packet to the multicast flow overlay, it 655 may need to provide contextual information obtained from the BIER 656 encapsulation. The information that needs to pass between the BIER 657 layer and the multicast flow overlay is specific to the multicast 658 flow overlay. Specification of the interaction between the BIER 659 layer and the multicast flow overlay is outside the scope of this 660 specification. 662 When BIER on a BFER passes a packet to the multicast flow overlay, 663 the overlay will determine how to further dispatch the packet. If 664 the packet needs to be forwarded into another BIER domain, then the 665 BFR will act as a BFER in one BIER domain and as a BFIR in another. 666 A BIER-encapsulated packet cannot pass directly from one BIER domain 667 to another; at the boundary between BIER domains, the packet must be 668 decapsulated and passed to the multicast flow overlay. 670 Note that when a BFR transmits multiple copies of a packet within a 671 BIER domain, only one copy will be destined to any given BFER. 672 Therefore it is not possible for any BIER-encapsulated packet to be 673 delivered more than once to any BFER. 675 6.2. BFR Neighbors 677 The "BFR Neighbors" (BFR-NBRs) of a given BFR, say BFR-A, are those 678 BFRs that, according to the routing underlay, are adjacencies of 679 BFR-A. Each BFR-NBR will have a BFR-prefix. 681 Suppose a BIER-encapsulated packet arrives at BFR-A. From the 682 packet's encapsulation, BFR-A learns the sub-domain of the packet, 683 and the BFR-ids (in that sub-domain) of the BFERs to which the packet 684 is destined. Then using the information advertised per Section 5, 685 BFR-A can find the BFR-prefix of each destination BFER. Given the 686 BFR-prefix of a particular destination BFER, say BFER-D, BFR-A learns 687 from the routing underlay (associated with the packet's sub-domain) 688 an IP address of the BFR that is the next hop on the path from BFR-A 689 to BFER-D. Let's call this next hop BFR-B. BFR-A must then 690 determine the BFR-prefix of BFR-B. (This determination can be made 691 from the information advertised per Section 5.) This BFR-prefix is 692 the BFR-NBR of BFR-A on the path from BFR-A to BFER-D. 694 Note that if the routing underlay provides multiple equal cost paths 695 from BFR-A to BFER-D, BFR-A may have multiple BFR-NBRs for BFER-D. 697 Under certain circumstances, a BFR may have adjacencies (in a 698 particular routing underlay) that are not BFRs. Please see 699 Section 6.9 for a discussion of how to handle those circumstances. 701 6.3. The Bit Index Routing Table 703 The Bit Index Routing Table (BIRT) is a table that maps from the 704 BFR-id (in a particular sub-domain) of a BFER to the BFR-prefix of 705 that BFER, and to the BFR-NBR on the path to that BFER. 707 ( A ) ------------ ( B ) ------------ ( C ) ------------ ( D ) 708 4 (0:1000) \ \ 1 (0:0001) 709 \ \ 710 ( E ) ( F ) 711 3 (0:0100) 2 (0:0010) 713 Figure 1: BIER Topology 1 715 As an example, consider the topology shown in Figure 1. In this 716 diagram, we represent the BFR-id of each BFR in the SI:xyzw form 717 discussed in Section 3. This topology will result in the BIRT of 718 Figure 2 at BFR-B. The first column shows the BFR-id as a number and 719 also (in parentheses) in the SI:BitString format that corresponds to 720 a BitStringLength of 4. (The actual minimum BitStringLength is 64, 721 but we use 4 in the examples.) 722 Note that a BIRT is specific to a particular BIER sub-domain. 724 -------------------------------------------- 725 | BFR-id | BFR-Prefix | BFR-NBR | 726 | (SI:BitString) | of Dest BFER | | 727 ============================================ 728 | 4 (0:1000) | A | A | 729 -------------------------------------------- 730 | 1 (0:0001) | D | C | 731 -------------------------------------------- 732 | 3 (0:0100) | E | E | 733 -------------------------------------------- 734 | 2 (0:0010) | F | C | 735 -------------------------------------------- 737 Figure 2: Bit Index Routing Table at BFR-B 739 6.4. The Bit Index Forwarding Table 741 The "Bit Index Forwarding Table" (BIFT) is derived from the BIRT as 742 follows. (Note that a BIFT is specific to a particular sub-domain.) 744 Suppose that several rows in the BIRT have the same SI and the same 745 BFR-NBR. By taking the logical OR of the BitStrings of those rows, 746 we obtain a bit mask that corresponds to that combination of SI and 747 BFR-NBR. We will refer to this bit mask as the "Forwarding Bit Mask" 748 (F-BM) for that combination. 750 For example, in Figure 2, we see that two of the rows have the same 751 SI (0) and same BFR-NBR (C). The Bit Mask that corresponds to is 0011 ("0001" OR'd with "0010"). 754 The BIFT is used to map from the BFR-id of a BFER to the 755 corresponding F-BM and BFR-NBR. For example, Figure 3 shows the BIFT 756 that is derived from the BIRT of Figure 2. Note that BFR-ids 1 and 2 757 have the same SI and the same BFR-NBR, hence they have the same F-BM. 759 ------------------------------------- 760 | BFR-id | F-BM | BFR-NBR | 761 | (SI:Bitstring) | | | 762 ===================================== 763 | 1 (0:0001) | 0011 | C | 764 ------------------------------------- 765 | 2 (0:0010) | 0011 | C | 766 ------------------------------------- 767 | 3 (0:0100) | 0100 | E | 768 ------------------------------------- 769 | 4 (0:1000) | 1000 | A | 770 ------------------------------------- 772 Figure 3: Bit Index Forwarding Table 774 This Bit Index Forwarding Table (BIFT) is programmed into the data- 775 plane and used to forward packets, applying the rules specified below 776 in Section 6.5. 778 6.5. The BIER Forwarding Procedure 780 Below is the procedure that a BFR uses for forwarding a BIER- 781 encapsulated packet. 783 1. Determine the packet's SI, BitStringLength, and sub-domain. 785 2. If the BitString consists entirely of zeroes, discard the packet; 786 the forwarding process has been completed. Otherwise proceed to 787 step 3. 789 3. Find the position, call it "k", of the least significant (i.e., 790 of the rightmost) bit that is set in the packet's BitString. 791 (Remember, bits are numbered from 1, starting with the least 792 significant bit.) 794 4. If bit k identifies the BFR itself, copy the packet, and send the 795 copy to the multicast flow overlay. Then clear bit k in the 796 original packet, and go to step 2. Otherwise, proceed to step 5. 798 5. Use the value k, together with the SI, sub-domain, and 799 BitStringLength, as the 'index' into the BIFT. 801 6. Extract from the BIFT the F-BM and the BFR-NBR. 803 7. Copy the packet. Update the copy's BitString by AND'ing it with 804 the F-BM (i.e., PacketCopy->BitString &= F-BM). Then forward the 805 copy to the BFR-NBR. (If the BFR-NBR is null, the copy is just 806 discarded.) Note that when a packet is forwarded to a particular 807 BFR-NBR, its BitString identifies only those BFERs that are to be 808 reached via that BFR-NBR. 810 8. Now update the original packet's BitString by AND'ing it with the 811 INVERSE of the F-BM (i.e., Packet->Bitstring &= ~F-BM). (This 812 clears the bits that identify the BFERs to which a copy of the 813 packet has just been forwarded.) Go to step 2. 815 Note that this procedure causes the packet to be forwarded to a 816 particular BFR-NBR only once. The number of lookups in the BIFT is 817 the same as the number of BFR-NBRs to which the packet must be 818 forwarded; it is not necessary to do a separate lookup for each 819 destination BFER. 821 Suppose it has been decided (by the above rules) to send a packet to 822 a particular BFR-NBR. If that BFR-NBR is connected via multiple 823 parallel interfaces, it may be desirable to apply some form of load 824 balancing. Load balancing algorithms are outside the scope of this 825 document. However, if the packet's encapsulation contains an 826 "entropy" field, the entropy field SHOULD be respected; two packets 827 with the same value of the entropy field SHOULD be sent on the same 828 interface (if possible). 830 In some cases, the routing underlay may provide multiple equal cost 831 paths (through different BFR-NBRs) to a given BFER. This is known as 832 "Equal Cost Multiple Paths" (ECMP). The procedures described in this 833 section must be augmented in order to support load balancing over 834 ECMP. The necessary augmentations can be found in Section 6.7. 836 In the event that unicast traffic to the BFR-NBR is being sent via a 837 "bypass tunnel" of some sort, the BIER-encapsulated multicast traffic 838 send to the BFR-NBR SHOULD also be sent via that tunnel. This allows 839 any existing "Fast Reroute" schemes to be applied to multicast 840 traffic as well as to unicast traffic. 842 Some examples of these forwarding procedures can be found in 843 Section 6.6. 845 The rules given in this section can be represented by the following 846 pseudocode: 848 void ForwardBitMaskPacket (Packet) 849 { 850 SI=GetPacketSI(Packet); 851 Offset=SI*BitStringLength; 852 for (Index = GetFirstBitPosition(Packet->BitString); Index ; 853 Index = GetNextBitPosition(Packet->BitString, Index)) { 854 F-BM = BIFT[Index+Offset]->F-BM; 855 if (!F-BM) continue; 856 BFR-NBR = BIFT[Index+Offset]->BFR-NBR; 857 PacketCopy = Copy(Packet); 858 PacketCopy->BitString &= F-BM; 859 PacketSend(PacketCopy, BFR-NBR); 860 Packet->BitString &= ~F-BM; 861 } 862 } 864 Figure 4: Pseudocode 866 This pseudocode assumes that at a given BFER, the BFR-NBR entry 867 corresponding to the BFER's own BFR-id will be the BFER's own 868 BFR-prefix. It also assumes that the corresponding F-BM has only one 869 bit set, the bit representing the BFER itself. In this case, the 870 "PacketSend" function sends the packet to the multicast flow overlay. 872 This pseudocode also assumes that the F-BM for the null next hop 873 contains a 1 in a given bit position if and only if that bit position 874 corresponds either to an unused BFR-id or to an unreachable BFER. 875 When the BFR-NBR is null, the "PacketSend" function discards the 876 packet. 878 6.6. Examples of BIER Forwarding 880 In this section, we give two examples of BIER forwarding, based on 881 the topology in Figure 1. In these examples, all packets have been 882 assigned to the default sub-domain, all packets have SI=0, and the 883 BitStringLength is 4. Figure 5 shows the BIFT entries for SI=0 only. 884 For compactness, we show the first column of the BIFT, the BFR-id, 885 only as an integer. 887 BFR-A BIFT BFR-B BIFT BFR-C BIFT 888 ------------------- ------------------- ------------------- 889 | Id | F-BM | NBR | | Id | F-BM | NBR | | Id | F-BM | NBR | 890 =================== =================== =================== 891 | 1 | 0111 | B | | 1 | 0011 | C | | 1 | 0001 | D | 892 ------------------- ------------------- ------------------- 893 | 2 | 0111 | B | | 2 | 0011 | C | | 2 | 0010 | F | 894 ------------------- ------------------- ------------------- 895 | 3 | 0111 | B | | 3 | 0100 | E | | 3 | 1100 | B | 896 ------------------- ------------------- ------------------- 897 | 4 | 1000 | A | | 4 | 1000 | A | | 4 | 1100 | B | 898 ------------------- ------------------- ------------------- 900 Figure 5: BIFTs for Forwarding Examples 902 6.6.1. Example 1 904 BFR-D, BFR-E and BFR-F are BFER's. BFR-A is the BFIR. Suppose that 905 BFIR-A has learned from the multicast flow overlay that BFER-D is 906 interested in a given multicast flow. If BFIR-A receives a packet of 907 that flow from outside the BIER domain, BFIR-A applies the BIER 908 encapsulation to the packet. The encapsulation must be such that the 909 SI is zero. The encapsulation also includes a BitString, with just 910 bit 1 set and with all other bits clear (i.e., 0001). This indicates 911 that BFER-D is the only BFER that needs to receive the packet. Then 912 BFIR-A follows the procedures of Section 6.5: 914 o Since the packet's BitString is 0001, BFIR-A finds that the first 915 bit in the string is bit 1. Looking at entry 1 in its BIFT, BFR-A 916 determines that the bit mask F-BM is 0111 and the BFR-NBR is 917 BFR-B. 919 o BFR-A then makes a copy of the packet, and applies F-BM to the 920 copy: Copy->BitString &= 0111. The copy's Bitstring is now 0001 921 (0001 & 0111). 923 o The copy is now sent to BFR-B. 925 o BFR-A then updates the packet's BitString by applying the inverse 926 of the F-BM: Packet->Bitstring &= ~F-BM. As a result, the 927 packet's BitString is now 0000 (0001 & 1000). 929 o As the packet's BitString is now zero, the forwarding procedure is 930 complete. 932 When BFR-B receives the multicast packet from BFR-A, it follows the 933 same procedure. The result is that a copy of the packet, with a 934 BitString of 0001, is sent to BFR-C. BFR-C applies the same 935 procedures, and as a result sends a copy of the packet, with a 936 BitString of 0001, to BFR-D. 938 At BFER-D, the BIFT entry (not pictured) for BFR-id 1 will specify an 939 F-BM of 0000 and a BFR-NBR of BFR-D itself. This will cause a copy 940 of the packet to be delivered to the multicast flow overlay at BFR-D. 941 The packet's BitString will be set to 0000, and the packet will not 942 be forwarded any further. 944 6.6.2. Example 2 946 This example is similar to Example 1, except that BFIR-A has learned 947 from the multicast flow overlay that both BFER-D and BFER-E are 948 interested in a given multicast flow. If BFIR-A receives a packet of 949 that flow from outside the BIER domain, BFIR-A applies the BIER 950 encapsulation to the packet. The encapsulation must be such that the 951 SI is zero. The encapsulation also includes a BitString with two 952 bits set: bit 1 is set (as in example 1) to indicate that BFR-D is a 953 BFER for this packet, and bit 3 is set to indicate that BFR-E is a 954 BFER for this packet. I.e., the BitString (assuming again a 955 BitStringLength of 4) is 0101. To forward the packet, BFIR-A follows 956 the procedures of Section 6.5: 958 o Since the packet's BitString is 0101, BFIR-A finds that the first 959 bit in the string is bit 1. Looking at entry 1 in its BIFT, BFR-A 960 determines that the bit mask F-BM is 0111 and the BFR-NBR is 961 BFR-B. 963 o BFR-A then makes a copy of the packet, and applies the F-BM to the 964 copy: Copy->BitString &= 0111. The copy's Bitstring is now 0101 965 (0101 & 0111). 967 o The copy is now sent to BFR-B. 969 o BFR-A then updates the packet's BitString by applying the inverse 970 of the F-BM: Packet->Bitstring &= ~F-BM. As a result, the 971 packet's BitString is now 0000 (0101 & 1000). 973 o As the packet's BitString is now zero, the forwarding procedure is 974 complete. 976 When BFR-B receives the multicast packet from BFR-A, it follows the 977 procedure of Section 6.5, as follows: 979 o Since the packet's BitString is 0101, BFR-B finds that the first 980 bit in the string is bit 1. Looking at entry 1 in its BIFT, BFR-B 981 determines that the bit mask F-BM is 0011 and the BFR-NBR is 982 BFR-C. 984 o BFR-B then makes a copy of the packet, and applies the F-BM to the 985 copy: Copy->BitString &= 0011. The copy's Bitstring is now 0001 986 (0101 & 0011). 988 o The copy is now sent to BFR-C. 990 o BFR-B then updates the packet's BitString by applying the inverse 991 of the F-BM: Packet->Bitstring &= F-BM. As a result, the 992 packet's BitString is now 0100 (0101 & 1100). 994 o Now BFR-B finds the next bit in the packet's (modified) BitString. 995 This is bit 3. Looking at entry 3 in its BIFT, BFR-B determines 996 that the F-BM is 0100 and the BFR-NBR is BFR-E. 998 o BFR-B then makes a copy of the packet, and applies the F-BM to the 999 copy: Copy->BitString &= 0100. The copy's Bitstring is now 0100 1000 (0100 & 0100). 1002 o The copy is now sent to BFR-E. 1004 o BFR-B then updates the packet's BitString by applying the inverse 1005 of the F-BM: Packet->Bitstring &= ~F-BM. As a result, the 1006 packet's BitString is now 0000 (0100 & 1011). 1008 o As the packet's BitString is now zero, the forwarding procedure is 1009 complete. 1011 Thus BFR-B forwards two copies of the packet. One copy of the 1012 packet, with BitString 0001, has now been sent from BFR-B to BFR-C. 1013 Following the same procedures, BFR-C will forward the packet to 1014 BFER-D. 1016 At BFER-D, the BIFT entry (not pictured) for BFR-id 1 will specify an 1017 F-BM of 0000 and a BFR-NBR of BFR-D itself. This will cause a copy 1018 of the packet to be delivered to the multicast flow overlay at BFR-D. 1019 The packet's BitString will be set to 0000, and the packet will not 1020 be forwarded any further. 1022 The other copy of the packet has been sent from BFR-B to BFER-E, with 1023 BitString 0100. 1025 At BFER-E, the BIFT entry (not pictured) for BFR-id 3 will specify an 1026 F-BM of 0000 and a BFR-NBR of BFR-E itself. This will cause a copy 1027 of the packet to be delivered to the multicast flow overlay at BFR-E. 1028 The packet's BitString will be set to 0000, and the packet will not 1029 be forwarded any further. 1031 6.7. Equal Cost Multi-path Forwarding 1033 In many networks, the routing underlay will provide multiple equal 1034 cost paths from a given BFR to a given BFER. When forwarding 1035 multicast packets through the network, it can be beneficial to take 1036 advantage of this by load balancing among those paths. This feature 1037 is known as "equal cost multiple path forwarding", or "ECMP". 1039 BIER supports ECMP, but the procedures of Section 6.5 must be 1040 modified slightly. Two ECMP procedures are defined. In the first 1041 (described in Section 6.7.1), the choice among equal-cost paths taken 1042 by a given packet from a given BFR to a given BFER depends on (a) the 1043 packet's entropy, and (b) the other BFERs to which that packet is 1044 destined. In the second (described in Section 6.7.2), the choice 1045 depends only upon the packet's entropy. 1047 There are tradeoffs between the two forwarding procedures described 1048 here. In the procedure of Section 6.7.1, the number of packet 1049 replications is minimized. The procedure in Section 6.7.1 also uses 1050 less memory in the BFR. In the procedure of Section 6.7.2, the path 1051 traveled by a given packet from a given BFR to a given BFER is 1052 independent of the other BFERs to which the packet is destined. 1053 While the procedures of Section 6.7.2 may cause more replications, 1054 they provide a more predictable behavior. 1056 The two procedures described here operate on identical packet formats 1057 and will interoperate correctly. However, if deterministic behavior 1058 is desired, then all BFRs would need to use the procedure from 1059 Section 6.7.2. 1061 6.7.1. Non-deterministic ECMP 1063 Figure 6 shows the operation of non-deterministic ECMP in BIER. 1065 BFR-A BIFT BFR-B BIFT BFR-C BIFT 1066 ------------------- ------------------- ------------------- 1067 | Id | F-BM | NBR | | Id | F-BM | NBR | | Id | F-BM | NBR | 1068 =================== =================== =================== 1069 | 1 | 0111 | B | | 1 | 0011 | C | | 1 | 0001 | D | 1070 ------------------- ------------------- ------------------- 1071 | 2 | 0111 | B | | 2 | 0011 | C | | 2 | 0010 | F | 1072 ------------------- | | 0110 | E | ------------------- 1073 | 3 | 0111 | B | ------------------- | 3 | 1100 | B | 1074 ------------------- | 3 | 0110 | E | ------------------- 1075 | 4 | 1000 | A | ------------------| | 4 | 1100 | B | 1076 ------------------- | 4 | 1000 | A | ------------------- 1077 ------------------- 1079 ( A ) ------------ ( B ) ------------ ( C ) ------------ ( D ) 1080 4 (0:1000) \ \ 1 (0:0001) 1081 \ \ 1082 ( E ) ------------ ( F ) 1083 3 (0:0100) 2 (0:0010) 1085 Figure 6: Example of ECMP 1087 In this example, BFR-B has two equal cost paths to reach BFER-F, one 1088 via BFR-C and one via BFR-E. Since the BFR-id of BFER-F is 2, this 1089 is reflected in entry 2 of BFR-B's BIFT. Entry 2 shows that BFR-B 1090 has a choice of two BFR-NBRs for BFER-B, and that a different F-BM is 1091 associated with each choice. When BFR-B looks up entry 2 in the 1092 BIFT, it can choose either BFR-NBR. However, when following the 1093 procedures of Section 6.5, it MUST use the F-BM corresponding to the 1094 BFR-NBR that it chooses. 1096 How the choice is made is an implementation matter. However, the 1097 usual rules for ECMP apply: packets of a given flow SHOULD NOT be 1098 split among two paths, and any "entropy" field in the packet's 1099 encapsulation SHOULD be respected. 1101 Note however that by the rules of Section 6.5, any packet destined 1102 for both BFER-D and BFER-F will be sent via BFR-C. 1104 6.7.2. Deterministic ECMP 1106 With the procedures of Section 6.7.1, where ECMP paths exist, the 1107 path a packet takes to reach any particular BFER depends not only on 1108 routing and on the packet's entropy, but also on the set of other 1109 BFERs to which the packet is destined. 1111 For example consider the following scenario in the network of 1112 Figure 6. 1114 o There is a sequence of packets being transmitted by BFR-A, some of 1115 which are destined for both D and F, and some of which are 1116 destined only for F. 1118 o All the packets in this sequence have the same entropy value, call 1119 it "Q". 1121 o At BFR-B, when a packet with entropy value Q is forwarded via 1122 entry 2 in the BIFT, the packet is sent to E. 1124 Using the forwarding procedure of Section 6.7.1, packets of this 1125 sequence that are destined for both D and F are forwarded according 1126 to entry 1 in the BIFT, and thus will reach F via the path A-B-C-F. 1127 However, packets of this sequence that are destined only for F are 1128 forwarded according to entry 2 in the BIFT, and thus will reach F via 1129 the path A-B-E-F. 1131 That procedure minimizes the number of packets transmitted by BFR B. 1132 However, consider the following scenario: 1134 o Beginning at time t0, the multicast flow in question needs to be 1135 received ONLY by BFER-F; 1137 o Beginning at a later time, t1, the flow needs to be received by 1138 both BFER-D and BFER-F. 1140 o Beginning at a later time, t2, the no longer needs to be received 1141 by D, but still needs to be received by F. 1143 Then from t0 until t1, the flow will travel to F via the path 1144 A-B-E-F. From t1 until t2, the flow will travel to F via the path 1145 A-B-C-F. And from t2, the flow will again travel to F via the path 1146 A-B-E-F. 1148 The problem is that if D repeatedly joins and leaves the flow, the 1149 flow's path from B to F will keep switching. This could cause F to 1150 receive packets out of order. It also makes troubleshooting 1151 difficult. For example, if there is some problem on the E-F link, 1152 receivers at F will get good service when the flow is also going to D 1153 (avoiding the E-F link), but bad service when the flow is not going 1154 to D. Since it is hard to know which path is being used at any given 1155 time, this may be hard to troubleshoot. Also, it is very difficult 1156 to perform a traceroute that is known to follow the path taken by the 1157 flow at any given time. 1159 The source of this difficulty is that, in the procedures of 1160 Section 6.7.1, the path taken by a particular flow to a particular 1161 BFER depends upon whether there are lower numbered BFERs that are 1162 also receiving the flow. Thus the choice among the ECMP paths is 1163 fundamentally non-deterministic. 1165 Deterministic forwarding can be achieved by using multiple BIFTs, 1166 such that each row in a BIFT has only one path to each destination, 1167 but the multiple ECMP paths to any particular destination are spread 1168 across the multiple tables. When a BIER-encapsulated packet arrives 1169 to be forwarded, the BFR uses a hash of the BIER Entropy field to 1170 determine which BIFT to use, and then the normal BIER forwarding 1171 algorithm (as described in Sections 6.5 and 6.6) is used with the 1172 selected BIFT. 1174 As an example, suppose there are two paths to destination X (call 1175 them X1 and X2), and four paths to destination Y (call them Y1, Y2, 1176 Y3, and Y4). If there are, say, four BIFTs, one BIFT would have 1177 paths X1 and Y1, one would have X1 and Y2, one would have X2 and Y3, 1178 and one would have X2 and Y4. If traffic to X is split evenly among 1179 these four BIFTs, the traffic will be split evenly between the two 1180 paths to X; if traffic to Y is split evenly among these four BIFTs, 1181 the traffic will be split evenly between the four paths to Y. 1183 Note that if there are three paths to one destination and four paths 1184 to another, 12 BIFTs would be required in order to get even splitting 1185 of the load to each of those two destinations. Of course, each BIFT 1186 uses some memory, and one might be willing to have less optimal 1187 splitting in order to have fewer BIFTs. How that tradeoff is made is 1188 an implementation or deployment decision. 1190 6.8. Prevention of Loops and Duplicates 1192 The BitString in a BIER-encapsulated packet specifies the set of 1193 BFERs to which that packet is to be forwarded. When a BIER- 1194 encapsulated packet is replicated, no two copies of the packet will 1195 ever have a BFER in common. If one of the packet's BFERs forwards 1196 the packet further, that will first clear the bit that identifies 1197 itself. As a result, duplicate delivery of packets is not possible 1198 with BIER. 1200 As long as the routing underlay provides a loop free path between 1201 each pair of BFRs, BIER-encapsulated packets will not loop. Since 1202 the BIER layer does not create any paths of its own, there is no need 1203 for any BIER-specific loop prevention techniques beyond the 1204 forwarding procedures specified in Section 6.5. 1206 If, at some time, the routing underlay is not providing a loop free 1207 path between BFIR-A and BFER-B, then BIER encapsulated packets may 1208 loop while traveling from BFIR-A to BFER-B. However, such loops will 1209 never result in delivery of duplicate packets to BFER-B. 1211 These properties of BIER eliminate the need for the "reverse path 1212 forwarding" (RPF) check that is used in conventional IP multicast 1213 forwarding. 1215 6.9. When Some Nodes do not Support BIER 1217 The procedures of section Section 6.2 presuppose that, within a given 1218 BIER domain, all the nodes adjacent to a given BFR in a given routing 1219 underlay are also BFRs. However, it is possible to use BIER even 1220 when this is not the case, as long as the ingress and egress nodes 1221 are BFRs. In this section, we assume that the routing underlay is an 1222 SPF-based IGP that computes a shortest path tree from each node to 1223 all other nodes in the domain. 1225 At a given BFR, say BFR B, start with a copy of the IGP-computed 1226 shortest path tree from BFR B to each router in the domain. (This 1227 tree is computed by the SPF algorithm of the IGP.) Let's call this 1228 copy the "BIER-SPF tree rooted at BFR B." BFR B then modifies this 1229 BIER-SPF tree as follows. 1231 1. BFR B looks in turn at each of B's child nodes on the BIER-SPF 1232 tree. 1234 2. If one of the child nodes does not support BIER, BFR B removes 1235 that node from the tree. The child nodes of the node that has 1236 just been removed are then re-parented on the tree, so that BFR B 1237 now becomes their parent. 1239 3. BFR B then continues to look at each of its child nodes, 1240 including any nodes that have been re-parented to B as a result 1241 of the previous step. 1243 When all of the child nodes (the original child nodes plus any new 1244 ones) have been examined, B's children on the BIER-SPF tree will all 1245 be BFRs. 1247 When the BIFT is constructed, B's child nodes on the BIER-SPF tree 1248 are considered to be the BFR-NBRs. The F-BMs and outgoing BIER-MPLS 1249 labels must be computed appropriately, based on the BFR-NBRs. 1251 B may now have BFR-NBRs that are not "directly connected" to B via 1252 layer 2. To send a packet to one of these BFR-NBRs, B will have to 1253 send the packet through a unicast tunnel. This may be as simple as 1254 finding the IGP unicast next hop to the child node, and pushing on 1255 (above the BIER-MPLS label advertised by the child) the MPLS label 1256 that the IGP next hop has bound to an address of the child node. (If 1257 for some reason the unicast tunnel cannot be an MPLS tunnel, any 1258 other kind of tunnel can be used, as long as it is possible to 1259 encapsulate MPLS within that kind of tunnel.) 1261 Of course, the above is not meant as an implementation technique, 1262 just as a functional description. 1264 While the above description assumes that the routing underlay 1265 provides an SPF tree, it may also be applicable to other types of 1266 routing underlay. 1268 Note that the technique above can also be used to provide "node 1269 protection" (i.e., to provide fast reroute around nodes that are 1270 believed to have failed). If BFR B has a failed BFR-NBR, B can 1271 remove the failed BFR-NBR from the BIER-SPF tree, and can then re- 1272 parent the child BFR-NBRs of the failed BFR-NBR so that they appear 1273 to be B's own child nodes on the tree (i.e., so that they appear to 1274 be B's BFR-NBRs). Then the usual BIER forwarding procedures apply. 1275 However, getting the packet from B to the child nodes of the failed 1276 BFR-NBR is a bit more complicated, as it may require using a unicast 1277 bypass tunnel to get around the failed node. 1279 A simpler variant of step 2 above would be the following: 1281 If one of the child nodes does not support BIER, BFR B removes 1282 that node from the tree. All BFERs that are reached through that 1283 child node are then re-parented on the tree, so that BFR B now 1284 becomes their parent. 1286 This variant is simpler because the set of BFERs that are reached 1287 through a particular child node of B can be determined from the F-BM 1288 in the BIFT. However, if this variant is used, the results are less 1289 optimal, because packets will be unicast directly from B to the BFERs 1290 that are reachable through the non-BIER child node. 1292 When using a unicast MPLS tunnel to get a packet to a BFR-NBR: 1294 o the TTL of the MPLS label entry representing the "tunnel" SHOULD 1295 be set to a large value, rather than being copied from the TTL 1296 value from the BIER-MPLS label, and 1298 o when the tunnel labels are popped off, the TTL from the tunnel 1299 labels SHOULD NOT be copied to the BIER-MPLS label. 1301 In other words, the TTL processing for the tunnel SHOULD be as 1302 specified in [RFC3443] for "Pipe Model" and "Short Pipe Model" LSPs. 1303 That way, the TTL of the BIER-MPLS label constrains only the number 1304 of BFRs that the packet may traverse, not the total number of hops. 1306 The material in this section presupposes that a given node is either 1307 a BFR or not, and that a BFR supports BIER on all its interfaces. It 1308 is however possible that a router will have some line cards that 1309 support BIER and some that do not. In such a case, one can think of 1310 the router as a "partial-BFR", that supports BIER only on some of its 1311 interfaces. If it is desired to deploy such partial-BFRs, one can 1312 use the multi-topology features of the IGP to set up a BIER-specific 1313 topology. This topology would exclude all the non-BIER-capable 1314 interfaces that attach to BFRs. BIER would then have to be run in a 1315 sub-domain that is bound to this topology. If unicast tunnels are 1316 used to bypass non-BFRs, either the tunnels have to be restricted to 1317 this topology, or the tunnel endpoints have to be BFRs that do not 1318 have any non-BIER-capable interfaces. 1320 6.10. Use of Different BitStringLengths within a Domain 1322 It is possible for different BFRs within a BIER domain to be using 1323 different Imposition and/or Disposition BitStringLengths. As stated 1324 in Section 3: 1326 "if a particular BFIR is provisioned to use a particular 1327 Imposition BitStringLength and a particular Imposition sub-domain 1328 when imposing the encapsulation on a given set of packets, all 1329 other BFRs with BFR-ids in that sub-domain SHOULD be provisioned 1330 to process received BIER packets with that BitStringLength (i.e., 1331 all other BFRs with BFR-ids in that sub-domain SHOULD be 1332 provisioned with that BitStringLength as a Disposition 1333 BitStringLength for that sub-domain)." 1335 Note that mis-provisioning can result in "black holes". If a BFIR 1336 creates a BIER packet with a particular BitStringLength, and if that 1337 packet needs to travel through a BFR that cannot process received 1338 BIER packets with that BitStringLength, then it may be impossible to 1339 forward the packet to all of the BFERs identified in its BIER header. 1340 Section 6.10.1 defines a procedure, the "BitStringLength 1341 Compatibility Check", that can be used to detect the possibility of 1342 such black holes. 1344 However, failure of the BitStringLength Compatibility Check does not 1345 necessarily result in the creation of black holes; Section 6.10.2 1346 specifies OPTIONAL procedures that allow BIER forwarding to proceed 1347 without black holes, even if the BitStringLength Compatibility Check 1348 fails. 1350 If the procedures of Section 6.10.2 are not deployed, but the 1351 BitStringLength Compatibility Check fails at some BFIR, the BFIR has 1352 two choices: 1354 o Create BIER packets with the provisioned Imposition 1355 BitStringLength, even though the packets may not be able to reach 1356 all the BFERs identified in their BitStrings 1358 o Use an Imposition BitStringLength that passes the Compatibility 1359 Check (assuming that there is one), even if this is not the 1360 provisioned Imposition BitStringLength. 1362 Section 6.10.1 discusses the implications of making one or the other 1363 of these choices. 1365 There will be times when an operator wishes to change the 1366 BitStringLengths used in a particular BIER domain. Section 6.10.3 1367 specifies a simple procedure that can be used to transition a BIER 1368 domain from one BitStringLength to another. 1370 6.10.1. BitStringLength Compatibility Check 1372 When a BFIR needs to encapsulate a packet, the BFIR first assigns the 1373 packet to a sub-domain. Then the BFIR chooses an Imposition 1374 BitStringLength L for the packet. The choice of Imposition 1375 BitStringLength is by provisioning. However, the BFIR should also 1376 perform the BitStringLength Compatibility Check defined below. 1378 The combination of Sub-Domain S and Imposition BitStringLength L 1379 passes the BitStringLength Compatibility Check if and only if the 1380 following condition holds: 1382 Every BFR that has advertised its membership in sub-domain S has 1383 also advertised that it is using Disposition BitStringLength L 1384 (and possibly other BitStringLengths as well) in that Sub-Domain. 1385 (If the MPLS encapsulation [MPLS_BIER_ENCAPS] is being used, this 1386 means that every BFR that is advertising a label for Sub-Domain S 1387 is advertising a label for the combination of Sub-Domain S and 1388 Disposition BitStringLength L.) 1390 If a BFIR has been provisioned to use a particular Imposition 1391 BitStringLength and a particular sub-domain for some set of packets, 1392 and if that combination of Imposition BitStringLength and sub-domain 1393 does not pass the BitStringLength Compatibility Check, the BFIR 1394 SHOULD log this fact as an error. It then has the following choice 1395 about what to do with the packets: 1397 1. The BFIR MAY use the provisioned Imposition BitStringLength 1398 anyway. If the procedure Paragraph 2 or Paragraph 3 of 1399 Section 6.10.2 are deployed, this will not cause black holes, and 1400 may actually be the optimal result. It should be understood 1401 though that the BFIR cannot determine by signaling whether those 1402 procedures have been deployed. 1404 2. If the BFIR is capable of using an Imposition BitStringlength 1405 that does pass the BitStringLength Compatibility Check for the 1406 particular sub-domain, the BFIR MAY use that Imposition 1407 BitStringLength instead. 1409 Which of these two choices to make is itself determined by 1410 provisioning. 1412 Note that discarding the packets is not one of the allowable choices. 1413 Suppose, for example, that all the BFIRs are provisioned to use 1414 Imposition BitStringLength L for a particular sub-domain S, but one 1415 BFR has not been provisioned to use Disposition BitStringLength L for 1416 sub-domain S. This will cause the BitStringLength Compatibility 1417 Check to fail. If the BFIR sends packets with BitStringLength L and 1418 sub-domain S, the mis-provisioned BFR will not be able to forward 1419 those packets, and thus the packets may only be able to reach a 1420 subset of the BFERs to which they are destined. However, this is 1421 still better than having the BFIRs drop the packets; if the BFIRs 1422 discard the packets, the packets won't reach any of the BFERs to 1423 which they are destined at all. 1425 If the procedures of Section 6.10.2 have not been deployed, choice 2 1426 might seem like a better option. However, there might not be any 1427 Imposition BitStringLength that a given BFIR can use that also passes 1428 the BitStringLength Compatibility Check. If it is desired to use 1429 choice 2 in a particular deployment, then there should be a "Fallback 1430 Disposition BitStringLength", call it F, such that: 1432 o Every BFR advertises that it uses BitStringLength F as a 1433 Disposition BitStringLength for every sub-domain, and 1435 o If a BFIR is provisioned to use Imposition BitStringLength X and 1436 Imposition sub-domain S for a certain class of packets, but the 1437 BitStringLength Compatibility check fails for the combination of 1438 BitStringLength X and sub-domain S, then the BFIR will fall back 1439 to using BitStringLength F as the Imposition BitStringLength 1440 whenever the Imposition sub-domain is S. 1442 This fallback procedure will work best if the value of F is 1443 established by the architecture, rather than by provisioning. 1445 6.10.2. Handling BitStringLength Mismatches 1447 Suppose a packet has been BIER-encapsulated with a BitStringLength 1448 value of X, and that the packet has arrived at BFR-A. Now suppose 1449 that according to the routing underlay, the next hop is BFR-B, but 1450 BFR-B is not using X as one of its Disposition BitStringLengths. 1451 What should BFR-A do with the packet? BFR-A has three options. It 1452 MUST do one of the three, but the choice of which procedure to follow 1453 is a local matter. The three options are: 1455 1. BFR-A MAY discard the packet. 1457 2. BFR-A MAY re-encapsulate the packet, using a BIER header whose 1458 BitStringLength value is supported by BFR-B. 1460 Note that if BFR-B only uses Disposition BitStringLength values 1461 that are smaller than the BitStringLength value of the packet, 1462 this may require creating additional copies of the packet. 1463 Whether additional copies actually have to be created depends 1464 upon the bits that are actually set in the original packet's 1465 BitString. 1467 3. BFR-A MAY treat BFR-B as if BFR-B did not support BIER at all, 1468 and apply the rules of Section 6.9. 1470 Note that there is no signaling that enables a BFR to advertise which 1471 of the three options it will use. 1473 Option 2 can be useful if there is a region of the BIER domain where 1474 the BFRs are capable of using a long BitStringLength, and a region 1475 where the BFRs are only capable of using a shorter BitStringLength. 1477 6.10.3. Transitioning from One BitStringLength to Another 1479 Suppose one wants to migrate the BitStringLength used in a particular 1480 BIER domain from one value (X) to another value (Y). The following 1481 migration procedure can be used. This procedure allows the BFRs to 1482 be reprovisioned one at a time, and does not require a "flag day". 1484 First, upgrade all the BFRs in the domain so that they use both value 1485 X and value Y as their Disposition BitStringLengths. Once this is 1486 done, reprovision the BFIRs so that they use BitStringLength value Y 1487 as the Imposition BitStringLength. Once that is done, one may 1488 optionally reprovision all the BFRs so that they no longer use 1489 Dispostion BitStringLength X. 1491 7. IANA Considerations 1493 This document contains no actions for IANA. 1495 8. Security Considerations 1497 When BIER is paired with a particular multicast flow overlay, it 1498 inherits the security considerations of that layer. Similarly, when 1499 BIER is paired with a particular routing underlay, it inherits the 1500 security considerations of that layer. 1502 If the BIER encapsulation of a particular packet specifies an SI or a 1503 BitString other than the one intended by the BFIR, the packet is 1504 likely to be misdelivered. If the BIER encapsulation of a packet is 1505 modified (through error or malfeasance) in a way other than that 1506 specified in this document, the packet may be misdelivered. 1508 If the procedures used for advertising BFR-ids and BFR-prefixes are 1509 not secure, an attack on those procedures may result in incorrect 1510 delivery of BIER-encapsulated packets. 1512 Every BFR must be provisioned to know which of its interfaces lead to 1513 a BIER domain and which do not. If two interfaces lead to different 1514 BIER domains, the BFR must be provisioned to know that those two 1515 interfaces lead to different BIER domains. If the provisioning is 1516 not correct, BIER-encapsulated packets from one BIER domain may 1517 "leak" into another; this is likely to result in misdelivery of 1518 packets. 1520 9. Acknowledgements 1522 The authors wish to thank Rajiv Asati, John Bettink, Ross Callon (who 1523 contributed much of the text on deterministic ECMP), Nagendra Kumar, 1524 Christian Martin, Neale Ranns, Greg Shepherd, Albert Tian, Ramji 1525 Vaithianathan, Xiaohu Xu and Jeffrey Zhang for their ideas and 1526 contributions to this work. 1528 10. Contributor Addresses 1530 Below is a list of other contributing authors in alphabetical order: 1532 Gregory Cauchie 1533 Bouygues Telecom 1535 Email: gcauchie@bouyguestelecom.fr 1537 Mach (Guoyi) Chen 1538 Huawei 1539 Email: mach.chen@huawei.com 1541 Arkadiy Gulko 1542 Thomson Reuters 1543 195 Broadway 1544 New York NY 10007 1545 United States 1547 Email: arkadiy.gulko@thomsonreuters.com 1549 Wim Henderickx 1550 Nokia 1551 Copernicuslaan 50 1552 Antwerp 2018 1553 Belgium 1555 Email: wim.henderickx@nokia.com 1557 Martin Horneffer 1558 Deutsche Telekom 1559 Hammer Str. 216-226 1560 Muenster 48153 1561 Germany 1563 Email: Martin.Horneffer@telekom.de 1565 Uwe Joorde 1566 Deutsche Telekom 1567 Hammer Str. 216-226 1568 Muenster D-48153 1569 Germany 1571 Email: Uwe.Joorde@telekom.de 1573 Luay Jalil 1574 Verizon 1575 1201 E Arapaho Rd. 1576 Richardson, TX 75081 1577 United States 1579 Email: luay.jalil@verizon.com 1581 Jeff Tantsura 1583 Email: jefftant.ietf@gmail.com 1585 11. References 1587 11.1. Normative References 1589 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1590 Requirement Levels", BCP 14, RFC 2119, 1591 DOI 10.17487/RFC2119, March 1997, 1592 . 1594 [RFC3443] Agarwal, P. and B. Akyol, "Time To Live (TTL) Processing 1595 in Multi-Protocol Label Switching (MPLS) Networks", 1596 RFC 3443, DOI 10.17487/RFC3443, January 2003, 1597 . 1599 11.2. Informative References 1601 [Boivie_Feldman] 1602 Boivie, R. and N. Feldman, "Small Group Multicast", 1603 (expired) draft-boivie-sgm-02.txt, February 2001. 1605 [ISIS_BIER_EXTENSIONS] 1606 Ginsberg, L., Przygienda, T., Aldrin, S., and J. Zhang, 1607 "BIER Support via ISIS", internet-draft draft-ietf-bier- 1608 isis-extensions-04.txt, March 2017. 1610 [MPLS_BIER_ENCAPS] 1611 Wijnands, IJ., Rosen, E., Dolganow, A., Tantsura, J., 1612 Aldrin, S., and I. Meilik, "Encapsulation for Bit Index 1613 Explicit Replication in MPLS Networks", internet-draft 1614 draft-ietf-bier-mpls-encapsulation-06.txt, December 2016. 1616 [OSPF_BIER_EXTENSIONS] 1617 Psenak, P., Kumar, N., Wijnands, IJ., Dolganow, A., 1618 Przygienda, T., Zhang, J., and S. Aldrin, "OSPF Extensions 1619 for Bit Index Explicit Replication", internet-draft draft- 1620 ietf-bier-ospf-bier-extensions-05.txt, March 2017. 1622 [RFC6513] Rosen, E., Ed. and R. Aggarwal, Ed., "Multicast in MPLS/ 1623 BGP IP VPNs", RFC 6513, DOI 10.17487/RFC6513, February 1624 2012, . 1626 [RFC6514] Aggarwal, R., Rosen, E., Morin, T., and Y. Rekhter, "BGP 1627 Encodings and Procedures for Multicast in MPLS/BGP IP 1628 VPNs", RFC 6514, DOI 10.17487/RFC6514, February 2012, 1629 . 1631 Authors' Addresses 1633 IJsbrand Wijnands (editor) 1634 Cisco Systems, Inc. 1635 De Kleetlaan 6a 1636 Diegem 1831 1637 Belgium 1639 Email: ice@cisco.com 1641 Eric C. Rosen (editor) 1642 Juniper Networks, Inc. 1643 10 Technology Park Drive 1644 Westford, Massachusetts 01886 1645 United States 1647 Email: erosen@juniper.net 1649 Andrew Dolganow 1650 Nokia 1651 600 March Rd. 1652 Ottawa, Ontario K2K 2E6 1653 Canada 1655 Email: andrew.dolganow@nokia.com 1657 Tony Przygienda 1658 Juniper Networks, Inc. 1659 1194 N. Mathilda Ave. 1660 Sunnyvale, California 94089 1661 United States 1663 Email: prz@juniper.net 1665 Sam K Aldrin 1666 Google, Inc. 1667 1600 Amphitheatre Parkway 1668 Mountain View, California 1669 United States 1671 Email: aldrin.ietf@gmail.com