Internet Engineering Task Force                        IJ. Wijnands, Ed.
Internet-Draft                                       Cisco Systems, Inc.
Intended status: Standards Track                           E. Rosen, Ed.
Expires: January 30, 2016                         Juniper Networks, Inc.
                                                             A. Dolganow
                                                          Alcatel-Lucent
                                                           T. Przygienda
                                                                Ericsson
                                                               S. Aldrin
                                                            Google, Inc.
                                                           July 29, 2015


             Multicast using Bit Index Explicit Replication
                    draft-ietf-bier-architecture-02

Abstract

   This document specifies a new architecture for the forwarding of multicast data packets. It provides optimal forwarding of multicast packets through a "multicast domain". However, it does not require a protocol for explicitly building multicast distribution trees, nor does it require intermediate nodes to maintain any per-flow state. This architecture is known as "Bit Index Explicit Replication" (BIER). When a multicast data packet enters the domain, the ingress router determines the set of egress routers to which the packet needs to be sent. The ingress router then encapsulates the packet in a BIER header. The BIER header contains a bitstring in which each bit represents exactly one egress router in the domain; to forward the packet to a given set of egress routers, the bits corresponding to those routers are set in the BIER header. Elimination of the per-flow state and the explicit tree-building protocols results in a considerable simplification.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 30, 2016.
Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  The BFR Identifier and BFR-Prefix
   3.  Encoding BFR Identifiers in BitStrings
   4.  Layering
     4.1.  The Routing Underlay
     4.2.  The BIER Layer
     4.3.  The Multicast Flow Overlay
   5.  Advertising BFR-ids and BFR-Prefixes
   6.  BIER Intra-Domain Forwarding Procedures
     6.1.  Overview
     6.2.  BFR Neighbors
     6.3.  The Bit Index Routing Table
     6.4.  The Bit Index Forwarding Table
     6.5.  The BIER Forwarding Procedure
     6.6.  Examples of BIER Forwarding
       6.6.1.  Example 1
       6.6.2.  Example 2
     6.7.  Equal Cost Multi-path Forwarding
       6.7.1.  Non-deterministic ECMP
       6.7.2.  Deterministic ECMP
     6.8.  Prevention of Loops and Duplicates
     6.9.  When Some Nodes do not Support BIER
     6.10. Use of Different BitStringLengths within a Domain
       6.10.1.  BitStringLength Compatibility Check
       6.10.2.  Handling BitStringLength Mismatches
       6.10.3.  Transitioning from One BitStringLength to Another
   7.  IANA Considerations
   8.  Security Considerations
   9.  Acknowledgements
   10. Contributor Addresses
   11. References
     11.1.  Normative References
     11.2.  Informative References
   Authors' Addresses

1.  Introduction

   This document specifies a new architecture for the forwarding of multicast data packets. It provides optimal forwarding of multicast data packets through a "multicast domain". However, it does not require the use of a protocol for explicitly building multicast distribution trees, and it does not require intermediate nodes to maintain any per-flow state. This architecture is known as "Bit Index Explicit Replication" (BIER).

   A router that supports BIER is known as a "Bit-Forwarding Router" (BFR). The BIER control plane protocols (see Section 4.2) run within a "BIER domain", allowing the BFRs within that domain to exchange the information needed for them to forward packets to each other using BIER.
   A multicast data packet enters a BIER domain at a "Bit-Forwarding Ingress Router" (BFIR), and leaves the BIER domain at one or more "Bit-Forwarding Egress Routers" (BFERs). A BFR that receives a multicast data packet from another BFR in the same BIER domain, and forwards the packet to another BFR in the same BIER domain, will be known as a "transit BFR" for that packet. A single BFR may be a BFIR for some multicast traffic while also being a BFER for some multicast traffic and a transit BFR for some multicast traffic. In fact, a BFR may be the BFIR for a given packet and may also be (one of) the BFER(s) for that packet; it may also forward that packet to one or more additional BFRs.

   A BIER domain may contain one or more sub-domains. Each BIER domain MUST contain at least one sub-domain, the "default sub-domain" (also denoted "sub-domain zero"). If a BIER domain contains more than one sub-domain, each BFR in the domain MUST be provisioned to know the set of sub-domains to which it belongs. Each sub-domain is identified by a sub-domain-id in the range [0,255].

   For each sub-domain to which a given BFR belongs, if the BFR is capable of acting as a BFIR or a BFER, it MUST be provisioned with a "BFR-id" that is unique within the sub-domain. A BFR-id is a small unstructured positive integer. For instance, if a particular BIER sub-domain contains 1,374 BFRs, each one could be given a BFR-id in the range 1-1374.

   If a given BFR belongs to more than one sub-domain, it may (though it need not) have a different BFR-id for each sub-domain.

   When a multicast packet arrives from outside the domain at a BFIR, the BFIR determines the set of BFERs to which the packet will be sent. The BFIR also determines the sub-domain in which the packet will be sent. Determining the sub-domain in which a given packet will be sent is known as "assigning the packet to a sub-domain". Procedures for choosing the sub-domain to which a particular packet is assigned are outside the scope of this document. However, once a particular packet has been assigned to a particular sub-domain, it remains assigned to that sub-domain until it leaves the BIER domain. That is, the sub-domain to which a packet is assigned MUST NOT be changed while the packet is in flight through the BIER domain.

   Once the BFIR determines the sub-domain and the set of BFERs for a given packet, the BFIR encapsulates the packet in a "BIER header". The BIER header contains a bit string in which each bit represents a single BFR-id. To indicate that a particular BFER is to receive a given packet, the BFIR sets the bit corresponding to that BFER's BFR-id in the sub-domain to which the packet has been assigned. We will use the term "BitString" to refer to the bit string field in the BIER header. We will use the term "payload" to refer to the packet that has been encapsulated. Thus a "BIER-encapsulated" packet consists of a "BIER header" followed by a "payload".

   The number of BFERs to which a single BIER-encapsulated copy of a packet can be forwarded is limited only by the length of the BitString in the BIER header. Different deployments can use different BitString lengths. We will use the term "BitStringLength" to refer to the number of bits in the BitString. It is possible that a deployment will have more BFERs in a given sub-domain than there are bits in the BitString. To accommodate this case, the BIER encapsulation includes both the BitString and a "Set Identifier" (SI). It is the BitString and the SI together that determine the set of BFERs to which a given packet will be delivered:

   o  by convention, the least significant (rightmost) bit in the BitString is "bit 1", and the most significant (leftmost) bit is "bit BitStringLength".

   o  if a BIER-encapsulated packet has an SI of n, and a BitString with bit k set, then the packet must be delivered to the BFER whose BFR-id (in the sub-domain to which the packet has been assigned) is n*BitStringLength+k.

   For example, suppose the BIER encapsulation uses a BitStringLength of 256 bits. By convention, the least significant (rightmost) bit is "bit 1", and the most significant (leftmost) bit is "bit 256". Suppose that a given packet has been assigned to sub-domain 0, and needs to be delivered to three BFERs, where those BFERs have BFR-ids in sub-domain 0 of 13, 126, and 235 respectively. The BFIR would create a BIER encapsulation with the SI set to zero, and with bits 13, 126, and 235 of the BitString set. (All other bits of the BitString would be clear.) If the packet also needs to be sent to a BFER whose BFR-id is 257, the BFIR would have to create a second copy of the packet, and the BIER encapsulation would specify an SI of 1, and a BitString with bit 1 set and all the other bits clear.

   Note that it is generally advantageous to assign the BFR-ids of a given sub-domain so that as many BFERs as possible can be represented in a single bit string.

   Suppose a BFR, call it BFR-A, receives a packet whose BIER encapsulation specifies an SI of 0, and a BitString with bits 13, 26, and 235 set. Suppose BFR-A has two BFR neighbors, BFR-B and BFR-C, such that the best path to BFERs 13 and 26 is via BFR-B, but the best path to BFER 235 is via BFR-C. Then BFR-A will replicate the packet, sending one copy to BFR-B and one copy to BFR-C. However, BFR-A will clear bit 235 in the BitString of the packet copy it sends to BFR-B, and will clear bits 13 and 26 in the BitString of the packet copy it sends to BFR-C. As a result, BFR-B will forward the packet only towards BFERs 13 and 26, and BFR-C will forward the packet only towards BFER 235. This ensures that each BFER receives only one copy of the packet.

   With this forwarding procedure, a multicast data packet can follow an optimal path from its BFIR to each of its BFERs. Further, since the set of BFERs for a given packet is explicitly encoded into the BIER header, the packet is not sent to any BFER that does not need to receive it. This allows for optimal forwarding of multicast traffic. This optimal forwarding is achieved without any need for transit BFRs to maintain per-flow state, or to run a multicast tree-building protocol.

   The idea of encoding the set of egress nodes into the header of a multicast packet is not new. For example, [Boivie_Feldman] proposes to encode the set of egress nodes as a set of IP addresses, and proposes mechanisms and procedures that are in some ways similar to those described in the current document. However, since BIER encodes each BFR-id as a single bit in a bit string, it can represent up to 128 BFERs in the same number of bits that it would take to carry the IPv6 address of a single BFER. Thus BIER scales to a much larger number of egress nodes per packet.

   BIER does not require that each transit BFR look up the best path to each BFER that is identified in the BIER header. The number of lookups required in the forwarding path for a single packet can be limited to the number of neighboring BFRs, which can be much smaller than the number of BFERs. See Section 6 (especially Sections 6.4 and 6.5) for details.
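   The replication step in the BFR-A example above can be illustrated with a minimal Python sketch. It is not part of this specification. It assumes that a BitString is held as an integer whose least significant bit is "bit 1". The per-neighbor sets of BFR-ids (the hypothetical `via` argument) simply restate the best paths assumed in the example.

```python
def bit(k: int) -> int:
    """Return a BitString with only bit k set (bit 1 is the rightmost bit)."""
    return 1 << (k - 1)


def replicate(bitstring: int, via: dict) -> dict:
    """Make one copy per neighbor, keeping only the bits reached via it.

    `via` (hypothetical) maps a neighbor name to the BFR-ids whose best path
    goes through that neighbor; it stands in for the routing underlay.
    """
    copies = {}
    for nbr, bfr_ids in via.items():
        mask = 0
        for k in bfr_ids:
            mask |= bit(k)
        kept = bitstring & mask  # clear every bit not reached via this neighbor
        if kept:                 # send nothing to a neighbor with no bits left
            copies[nbr] = kept
    return copies


# SI 0, bits 13, 26 and 235 set, as in the example above.
incoming = bit(13) | bit(26) | bit(235)
copies = replicate(incoming, {"BFR-B": {13, 26}, "BFR-C": {235}})
```

   The masks for BFR-B and BFR-C share no bits. As a result, the two copies identify disjoint sets of BFERs, which is what ensures that each BFER receives only one copy.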
   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

2.  The BFR Identifier and BFR-Prefix

   Each BFR MUST be assigned a "BFR-Prefix". A BFR's BFR-Prefix MUST be an IP address (either IPv4 or IPv6) of the BFR, and MUST be unique and routable within the BIER domain. It is RECOMMENDED that the BFR-prefix be a loopback address of the BFR. Two BFRs in the same BIER domain MUST NOT be assigned the same BFR-Prefix. Note that a BFR in a given BIER domain has the same BFR-prefix in all the sub-domains of that BIER domain.

   A "BFR Identifier" (BFR-id) is a number in the range [1,65535]. In general, each BFR in a given BIER sub-domain must be assigned a unique number from this range (i.e., two BFRs in the same BIER sub-domain MUST NOT have the same BFR-id in that sub-domain). However, if it is known that a given BFR will never need to function as a BFER in a given sub-domain, then it is not necessary to assign a BFR-id for that sub-domain to that BFR.

   Note that the value 0 is not a legal BFR-id.

   The procedure for assigning a particular BFR-id to a particular BFR is outside the scope of this document. However, it is RECOMMENDED that the BFR-ids for each sub-domain be assigned "densely" from the numbering space, as this will result in a more efficient encoding (see Section 3). That is, if there are 256 or fewer BFERs, it is RECOMMENDED to assign all the BFR-ids from the range [1,256]. If there are more than 256 BFERs but 512 or fewer, it is RECOMMENDED to assign all the BFR-ids from the range [1,512], with as few "holes" as possible in the earlier range. However, in some deployments, it may be advantageous to depart from this recommendation; this is discussed further in Section 3.

3.  Encoding BFR Identifiers in BitStrings

   To encode a BFR-id in a BIER data packet, one must convert the BFR-id to an SI and a BitString. This conversion depends upon the parameter we are calling "BitStringLength". The conversion is done as follows. If the BFR-id is N, then

   o  SI is the integer part of the quotient (N-1)/BitStringLength

   o  The BitString has one bit position set. If the low-order bit is bit 1, and the high-order bit is bit BitStringLength, the bit position that represents BFR-id N is ((N-1) modulo BitStringLength)+1.

   If several different BFR-ids all resolve to the same SI, then all those BFR-ids can be represented in a single BitString. The BitStrings for all of those BFR-ids are combined using a bitwise logical OR operation.

   Different BIER domains may use different values of BitStringLength. Each BFR in a given BIER domain MUST be provisioned to know the following:

   o  the BitStringLength ("Imposition BitStringLength") and sub-domain ("Imposition sub-domain") to use when it imposes (as a BFIR) a BIER encapsulation on a particular set of packets, and

   o  the BitStringLengths ("Disposition BitStringLengths") that it will process when (as a BFR or BFER) it receives packets from a particular sub-domain.

   It is not required that a BFIR use the same Imposition BitStringLength or the same Imposition sub-domain for all packets on which it imposes the BIER encapsulation. However, if a particular BFIR is provisioned to use a particular Imposition BitStringLength and a particular Imposition sub-domain when imposing the encapsulation on a given set of packets, all other BFRs with BFR-ids in that sub-domain SHOULD be provisioned to process received BIER packets with that BitStringLength (i.e., all other BFRs with BFR-ids in that sub-domain SHOULD be provisioned with that BitStringLength as a Disposition BitStringLength for that sub-domain). Exceptions to this rule MAY be made under certain conditions; this is discussed in Section 6.10.

   Every BFIR MUST be capable of being provisioned with an Imposition BitStringLength of 256. Every BFR and BFER MUST be capable of being provisioned with a Disposition BitStringLength of 256.

   Particular BIER encapsulation types MAY allow other BitStringLengths to be OPTIONALLY supported. For example, when using the encapsulation specified in [MPLS_BIER_ENCAPS], a BFR may be capable of being provisioned to use any or all of the following BitStringLengths as Imposition BitStringLengths and as Disposition BitStringLengths: 64, 128, 256, 512, 1024, 2048, and 4096.

   If a BFR is capable of being provisioned with a given value of BitStringLength as an Imposition BitStringLength, it MUST also be capable of being provisioned with that same value as one of its Disposition BitStringLengths. It SHOULD be capable of being provisioned with all legal smaller values of BitStringLength as both Imposition and Disposition BitStringLength.

   In order to support transition from one BitStringLength to another, every BFR MUST be capable of being provisioned to simultaneously use two different Disposition BitStringLengths.

   A BFR MUST support SI values in the range [0,15], and MAY support SI values in the range [0,255]. ("Supporting the values in a given range" means, in this context, that any value in the given range is legal, and will be properly interpreted.)

   When a BFIR determines that a multicast data packet, assigned to a given sub-domain, needs to be forwarded to a particular set of destination BFERs, the BFIR partitions that set of BFERs into subsets, where each subset contains the target BFERs whose BFR-ids in the given sub-domain all resolve to the same SI. Call these the "SI-subsets" for the packet. Each SI-subset can be represented by a single BitString. The BFIR creates a copy of the packet for each SI-subset. The BIER encapsulation is then applied to each packet. The encapsulation specifies a single SI for each packet, and contains the BitString that represents all the BFR-ids in the corresponding SI-subset. Of course, in order to properly interpret the BitString, it must be possible to infer the sub-domain-id from the encapsulation as well.

   Suppose, for example, that a BFIR determines that a given packet needs to be forwarded to three BFERs, whose BFR-ids (in the appropriate sub-domain) are 27, 235, and 497. With a BitStringLength of 256, the BFIR will have to forward two copies of the packet. One copy, associated with SI=0, will have a BitString with bits 27 and 235 set. The other copy, associated with SI=1, will have a BitString with bit 241 set.

   In order to minimize the number of copies that must be made of a given multicast packet, it is RECOMMENDED that the BFR-ids be assigned "densely" (see Section 2) from the numbering space. This will minimize the number of SIs that have to be used in the domain. However, depending upon the details of a particular deployment, other assignment methods may be more advantageous. Suppose, for example, that in a certain deployment, every multicast flow is intended either for the "east coast" or for the "west coast". In such a deployment, it would be advantageous to assign BFR-ids so that all the "west coast" BFR-ids fall into one SI-subset and all the "east coast" BFR-ids fall into another, so that a packet for either coast can be sent as a single copy.

   When a BFR receives a BIER data packet, it will infer the SI from the encapsulation. The set of BFERs to which the packet needs to be forwarded can then be inferred from the SI and the BitString.
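   The conversion and partitioning described in this section can be sketched in Python as follows. This is an illustration under stated assumptions, not a normative algorithm. The function names are hypothetical, and BitStrings are modelled as integers whose least significant bit is "bit 1". Each key of the dictionary returned by `si_subsets` corresponds to one copy that the BFIR would have to send. The last two lines contrast dense and sparse BFR-id assignment in terms of that copy count.

```python
def encode(n: int, bitstring_length: int) -> tuple:
    """Map BFR-id n to (SI, bit position), where bit 1 is the low-order bit."""
    if not 1 <= n <= 65535:  # legal BFR-ids are [1,65535]; 0 is not legal
        raise ValueError(f"illegal BFR-id {n}")
    si = (n - 1) // bitstring_length      # integer part of (N-1)/BitStringLength
    k = (n - 1) % bitstring_length + 1    # ((N-1) modulo BitStringLength)+1
    return si, k


def si_subsets(bfr_ids, bitstring_length: int) -> dict:
    """Partition target BFR-ids by SI, OR-ing each subset into one BitString."""
    subsets = {}
    for n in bfr_ids:
        si, k = encode(n, bitstring_length)
        subsets[si] = subsets.get(si, 0) | (1 << (k - 1))
    return subsets


def decode(si: int, bitstring: int, bitstring_length: int) -> list:
    """Recover the BFR-ids identified by an (SI, BitString) pair."""
    return [si * bitstring_length + k
            for k in range(1, bitstring_length + 1)
            if (bitstring >> (k - 1)) & 1]


# The example above: BFR-ids 27, 235 and 497 with a BitStringLength of 256.
copies = si_subsets([27, 235, 497], 256)

# The same 300 BFERs numbered densely (1..300) or in steps of 200.
dense_copies = len(si_subsets(range(1, 301), 256))
sparse_copies = len(si_subsets(range(200, 60001, 200), 256))
```

   For the example above, `si_subsets` returns two entries: SI 0 with bits 27 and 235 set, and SI 1 with bit 241 set. From these, `decode` recovers BFR-ids 27, 235, and 497. With dense numbering, a packet sent to all 300 BFERs needs 2 copies. With the sparse numbering, it needs 235 copies.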
   In some of the examples given later in this document, we will use a BitStringLength of 4, and will represent a BFR-id in the form "SI:xyzw", where SI is the Set Identifier of the BFR-id (assuming a BitStringLength of 4), and xyzw is a string of 4 bits. A BitStringLength of 4 is used only in the examples; we would not expect actual deployments to have such a small BitStringLength.

   It is possible that several different forms of BIER encapsulation will be developed. If so, the particular encapsulation that is used in a given deployment will depend on the type of network infrastructure that is used to realize the BIER domain. Details of the BIER encapsulation(s) will be given in companion documents. An encapsulation for use in MPLS networks is described in [MPLS_BIER_ENCAPS].

4.  Layering

   It is helpful to think of the BIER architecture as consisting of three layers: the "routing underlay", the "BIER layer", and the "multicast flow overlay".

4.1.  The Routing Underlay

   The "routing underlay" establishes "adjacencies" between pairs of BFRs, and determines one or more "best paths" from a given BFR to a given set of BFRs. Each such path is a sequence of BFRs such that BFR(k+j) is "adjacent" to BFR(k+j+1) (for 0<=j and BitStringLength can be done using the advertisement capabilities of the IGP. For example, if a BIER domain is also an OSPF domain, these advertisements can be done using the OSPF "Opaque Link State Advertisement" (Opaque LSA) mechanism. Details of the necessary extensions to OSPF and IS-IS will be provided in companion documents. (See [OSPF_BIER_EXTENSIONS] and [ISIS_BIER_EXTENSIONS].)

   These advertisements enable each BFR to associate a given <sub-domain-id, BFR-id> with a given BFR-prefix. As will be seen in subsequent sections of this document, knowledge of this association is an important part of the forwarding process.
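   A BFR might hold this association as a table keyed by <sub-domain-id, BFR-id>. The Python sketch below shows one hypothetical way to build it. It assumes that advertisements arrive as (sub-domain-id, BFR-id, BFR-prefix) tuples, which is not the IGP encoding. It applies the receiving-side rules given next in this section:

   o  A BFR-id of 0 means "no BFR-id in this sub-domain".

   o  A BFR-id claimed by two different BFR-prefixes is logged and left out of the table, so that a BFIR using the table never encodes it.

   The rule about a BFR withholding its own advertisement is not modelled.

```python
import logging


def build_association(adverts):
    """Build a <sub-domain-id, BFR-id> -> BFR-prefix map (hypothetical helper).

    `adverts` is an iterable of (sub_domain_id, bfr_id, bfr_prefix) tuples,
    an assumed stand-in for what the IGP extensions would carry.
    """
    owners = {}
    for sub_domain, bfr_id, prefix in adverts:
        if bfr_id == 0:
            continue  # BFR-id 0: the advertiser has no BFR-id in this sub-domain
        owners.setdefault((sub_domain, bfr_id), set()).add(prefix)

    table = {}
    for (sub_domain, bfr_id), prefixes in owners.items():
        if len(prefixes) > 1:
            # Duplicate BFR-id: log an error and omit it, so that it is never
            # encoded in the BIER header of a packet for this sub-domain.
            logging.error("BFR-id %d claimed by %s in sub-domain %d",
                          bfr_id, sorted(prefixes), sub_domain)
            continue
        table[(sub_domain, bfr_id)] = prefixes.pop()
    return table


table = build_association([
    (0, 1, "192.0.2.4"), (0, 2, "192.0.2.6"),
    (0, 0, "192.0.2.2"),                        # no BFR-id in sub-domain 0
    (0, 7, "192.0.2.8"), (0, 7, "192.0.2.9"),   # provisioning error
    (1, 7, "192.0.2.8"),                        # same BFR-id, other sub-domain
])
```

   In this example, BFR-id 7 is usable in sub-domain 1 but not in sub-domain 0, because two different BFRs claim it there.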
   Since each BFR needs to have a unique (in each sub-domain) BFR-id, two different BFRs will not advertise ownership of the same <sub-domain-id, BFR-id> unless there has been a provisioning error.

   o  If BFR-A determines that BFR-B and BFR-C have both advertised the same BFR-id for the same sub-domain, BFR-A MUST log an error. Suppose that the duplicate BFR-id is "N". When BFR-A is functioning as a BFIR, it MUST NOT encode the BFR-id value N in the BIER encapsulation of any packet that has been assigned to the given sub-domain, even if it has determined that the packet needs to be received by BFR-B and/or BFR-C.

      This will mean that BFR-B and BFR-C cannot receive multicast traffic at all in the given sub-domain until the provisioning error is fixed. However, that is preferable to having them receive each other's traffic.

   o  If BFR-A has been provisioned with BFR-id N for a particular sub-domain, has not yet advertised its ownership of BFR-id N for that sub-domain, but has received an advertisement from a different BFR (say BFR-B) that is advertising ownership of BFR-id N for the same sub-domain, then BFR-A SHOULD log an error, and MUST NOT advertise its own ownership of BFR-id N for that sub-domain as long as the advertisement from BFR-B is extant.

      This procedure may prevent the accidental misconfiguration of a new BFR from impacting an existing BFR.

   If a BFR advertises that it has a BFR-id of 0 in a particular sub-domain, other BFRs receiving the advertisement MUST interpret that advertisement as meaning that the advertising BFR does not have a BFR-id in that sub-domain.

6.  BIER Intra-Domain Forwarding Procedures

   This section specifies the rules for forwarding a BIER-encapsulated data packet within a BIER domain.

6.1.  Overview

   This section provides a brief overview of the BIER forwarding procedures. Subsequent sub-sections specify the procedures in more detail.
   To forward a BIER-encapsulated packet:

   1.  Determine the packet's sub-domain.

   2.  Determine the packet's BitStringLength and BitString.

   3.  Determine the packet's SI.

   4.  From the sub-domain, the SI and the BitString, determine the set of destination BFERs for the packet.

   5.  Using information provided by the routing underlay associated with the packet's sub-domain, determine the next hop adjacency for each of the destination BFERs.

   6.  Partition the set of destination BFERs such that all the BFERs in a single partition have the same next hop. We will say that each partition is associated with a next hop.

   7.  For each partition:

       a.  Make a copy of the packet.

       b.  Clear any bit in the copy's BitString that identifies a BFER that is not in the partition.

       c.  Transmit the copy to the associated next hop.

   If a BFR receives a BIER-encapsulated packet whose sub-domain, SI and BitString identify that BFR itself, then the BFR is also a BFER for that packet. As a BFER, it must pass the payload to the multicast flow overlay. If the BitString has bits set for other BFRs, the packet also needs to be forwarded further within the BIER domain. If the BF(E)R also forwards one or more copies of the packet within the BIER domain, the bit representing the BFR's own BFR-id MUST be clear in all the copies.

   When BIER on a BFER passes a packet to the multicast flow overlay, it may need to provide contextual information obtained from the BIER encapsulation. The information that needs to pass between the BIER layer and the multicast flow overlay is specific to the multicast flow overlay. Specification of the interaction between the BIER layer and the multicast flow overlay is outside the scope of this specification.

   When BIER on a BFER passes a packet to the multicast flow overlay, the overlay will determine how to further dispatch the packet. If the packet needs to be forwarded into another BIER domain, then the BFR will act as a BFER in one BIER domain and as a BFIR in another. A BIER-encapsulated packet cannot pass directly from one BIER domain to another; at the boundary between BIER domains, the packet must be decapsulated and passed to the multicast flow overlay.

   Note that when a BFR transmits multiple copies of a packet within a BIER domain, only one copy will be destined to any given BFER. Therefore it is not possible for any BIER-encapsulated packet to be delivered more than once to any BFER.

6.2.  BFR Neighbors

   The "BFR Neighbors" (BFR-NBRs) of a given BFR, say BFR-A, are those BFRs that, according to the routing underlay, are adjacent to BFR-A. Each BFR-NBR will have a BFR-prefix.

   Suppose a BIER-encapsulated packet arrives at BFR-A. From the packet's encapsulation, BFR-A learns the sub-domain of the packet, and the BFR-ids (in that sub-domain) of the BFERs to which the packet is destined. Then, using the information advertised per Section 5, BFR-A can find the BFR-prefix of each destination BFER. Given the BFR-prefix of a particular destination BFER, say BFER-D, BFR-A learns from the routing underlay (associated with the packet's sub-domain) an IP address of the BFR that is the next hop on the path from BFR-A to BFER-D. Let's call this next hop BFR-B. BFR-A must then determine the BFR-prefix of BFR-B. (This determination can be made from the information advertised per Section 5.) BFR-B, identified by this BFR-prefix, is the BFR-NBR of BFR-A on the path from BFR-A to BFER-D.

   Note that if the routing underlay provides multiple equal cost paths from BFR-A to BFER-D, BFR-A may have multiple BFR-NBRs for BFER-D.

   Under certain circumstances, a BFR may have adjacencies (in a particular routing underlay) that are not BFRs. Please see Section 6.9 for a discussion of how to handle those circumstances.
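   The lookup chain just described can be sketched in Python. The three dictionaries are hypothetical stand-ins:

   o  One for the information advertised per Section 5.

   o  One for the routing underlay's next-hop address toward each BFR-prefix.

   o  One mapping a next-hop address back to the neighbor's BFR-prefix.

   Router names stand in for BFR-prefixes, and the 192.0.2.x addresses are invented. The inputs describe BFR-B in the topology of Figure 1 (Section 6.3), and ECMP is not handled.

```python
def find_bfr_nbr(bfr_id, id_to_prefix, next_hop_addr, addr_to_prefix):
    """Return (BFR-prefix of the BFER, BFR-prefix of the BFR-NBR toward it)."""
    bfer_prefix = id_to_prefix[bfr_id]          # from Section 5 advertisements
    nh_addr = next_hop_addr[bfer_prefix]        # from the routing underlay
    return bfer_prefix, addr_to_prefix[nh_addr]  # the next hop's own BFR-prefix


# Hypothetical inputs at BFR-B in Figure 1; addresses are illustrative only.
id_to_prefix = {1: "D", 2: "F", 3: "E", 4: "A"}
next_hop_addr = {"A": "192.0.2.1", "D": "192.0.2.3",
                 "E": "192.0.2.5", "F": "192.0.2.3"}
addr_to_prefix = {"192.0.2.1": "A", "192.0.2.3": "C", "192.0.2.5": "E"}

birt = {n: find_bfr_nbr(n, id_to_prefix, next_hop_addr, addr_to_prefix)
        for n in id_to_prefix}
```

   For these inputs, BFR-ids 1 to 4 map to (D, C), (F, C), (E, E), and (A, A). These are the same rows as the BIRT shown in Figure 2.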
6.3.  The Bit Index Routing Table

   The Bit Index Routing Table (BIRT) is a table that maps from the BFR-id (in a particular sub-domain) of a BFER to the BFR-prefix of that BFER, and to the BFR-NBR on the path to that BFER.

      ( A ) ------------ ( B ) ------------ ( C ) ------------ ( D )
    4 (0:1000)               \                  \           1 (0:0001)
                              \                  \
                               ( E )              ( F )
                            3 (0:0100)         2 (0:0010)

                       Figure 1: BIER Topology 1

   As an example, consider the topology shown in Figure 1. In this diagram, we represent the BFR-id of each BFR in the SI:xyzw form discussed in Section 3. This topology will result in the BIRT of Figure 2 at BFR-B. The first column shows the BFR-id as a number and also (in parentheses) in the SI:BitString format that corresponds to a BitStringLength of 4. (The actual minimum BitStringLength is 64, but we use 4 in the examples.)

   Note that a BIRT is specific to a particular BIER sub-domain.

      ---------------------------------------------
      |      BFR-id      |  BFR-Prefix  | BFR-NBR |
      |  (SI:BitString)  | of Dest BFER |         |
      =============================================
      |    4 (0:1000)    |      A       |    A    |
      ---------------------------------------------
      |    1 (0:0001)    |      D       |    C    |
      ---------------------------------------------
      |    3 (0:0100)    |      E       |    E    |
      ---------------------------------------------
      |    2 (0:0010)    |      F       |    C    |
      ---------------------------------------------

               Figure 2: Bit Index Routing Table at BFR-B

6.4.  The Bit Index Forwarding Table

   The "Bit Index Forwarding Table" (BIFT) is derived from the BIRT as follows. (Note that a BIFT is specific to a particular sub-domain.)

   Suppose that several rows in the BIRT have the same SI and the same BFR-NBR. By taking the logical OR of the BitStrings of those rows, we obtain a bit mask that corresponds to that combination of SI and BFR-NBR. We will refer to this bit mask as the "Forwarding Bit Mask" (F-BM) for that combination.
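   The derivation of the F-BMs can be sketched in Python, using the BIRT of Figure 2 and the example BitStringLength of 4. This is an illustration only. The data layout, a dictionary from BFR-id to (BFR-prefix, BFR-NBR), is an assumption and not a data-plane format.

```python
BITSTRING_LENGTH = 4  # the example value used in the figures

# BIRT of Figure 2 at BFR-B: BFR-id -> (BFR-prefix of dest BFER, BFR-NBR).
BIRT = {4: ("A", "A"), 1: ("D", "C"), 3: ("E", "E"), 2: ("F", "C")}


def build_bift(birt: dict, bitstring_length: int) -> dict:
    """Derive a BIFT (BFR-id -> (F-BM, BFR-NBR)) from a BIRT (hypothetical helper)."""
    f_bm = {}
    for n, (_, nbr) in birt.items():
        si = (n - 1) // bitstring_length
        k = (n - 1) % bitstring_length + 1
        # OR together the bits of all rows that share this <SI, BFR-NBR>.
        f_bm[(si, nbr)] = f_bm.get((si, nbr), 0) | (1 << (k - 1))
    return {n: (f_bm[((n - 1) // bitstring_length, nbr)], nbr)
            for n, (_, nbr) in birt.items()}


bift = build_bift(BIRT, BITSTRING_LENGTH)
```

   The result maps BFR-ids 1 and 2 to (0011, C), BFR-id 3 to (0100, E), and BFR-id 4 to (1000, A). This is the BIFT shown in Figure 3.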

   For example, in Figure 2, we see that two of the rows have the same
   SI (0) and the same BFR-NBR (C).  The bit mask that corresponds to
   that combination of SI and BFR-NBR is 0011 ("0001" OR'd with
   "0010").

   The BIFT is used to map from the BFR-id of a BFER to the
   corresponding F-BM and BFR-NBR.  For example, Figure 3 shows the BIFT
   that is derived from the BIRT of Figure 2.  Note that BFR-ids 1 and 2
   have the same SI and the same BFR-NBR; hence they have the same F-BM.

             -------------------------------------
             |     BFR-id     |  F-BM  | BFR-NBR |
             | (SI:BitString) |        |         |
             =====================================
             |   1 (0:0001)   |  0011  |    C    |
             -------------------------------------
             |   2 (0:0010)   |  0011  |    C    |
             -------------------------------------
             |   3 (0:0100)   |  0100  |    E    |
             -------------------------------------
             |   4 (0:1000)   |  1000  |    A    |
             -------------------------------------

                 Figure 3: Bit Index Forwarding Table

   This Bit Index Forwarding Table (BIFT) is programmed into the data
   plane and used to forward packets, applying the rules specified below
   in Section 6.5.

6.5.  The BIER Forwarding Procedure

   Below is the procedure that a BFR uses for forwarding a
   BIER-encapsulated packet.

   1.  Determine the packet's SI, BitStringLength, and sub-domain.

   2.  If the BitString consists entirely of zeroes, discard the packet;
       the forwarding process has been completed.  Otherwise, proceed to
       step 3.

   3.  Find the position, call it "k", of the least significant (i.e.,
       the rightmost) bit that is set in the packet's BitString.
       (Remember, bits are numbered from 1, starting with the least
       significant bit.)

   4.  If bit k identifies the BFR itself, copy the packet, and send the
       copy to the multicast flow overlay.  Then clear bit k in the
       original packet, and go to step 2.  Otherwise, proceed to step 5.

   5.  Use the value k, together with the SI, sub-domain, and
       BitStringLength, as the 'index' into the BIFT.

   6.  Extract from the BIFT the F-BM and the BFR-NBR.

   7.  Copy the packet.  Update the copy's BitString by AND'ing it with
       the F-BM (i.e., PacketCopy->BitString &= F-BM).  Then forward the
       copy to the BFR-NBR.  Note that when a packet is forwarded to a
       particular BFR-NBR, its BitString identifies only those BFERs
       that are to be reached via that BFR-NBR.

   8.  Now update the original packet's BitString by AND'ing it with the
       INVERSE of the F-BM (i.e., Packet->BitString &= ~F-BM).  (This
       clears the bits that identify the BFERs to which a copy of the
       packet has just been forwarded.)  Go to step 2.

   Note that this procedure causes the packet to be forwarded to a
   particular BFR-NBR only once.  The number of lookups in the BIFT is
   the same as the number of BFR-NBRs to which the packet must be
   forwarded; it is not necessary to do a separate lookup for each
   destination BFER.

   Suppose it has been decided (by the above rules) to send a packet to
   a particular BFR-NBR.  If that BFR-NBR is connected via multiple
   parallel interfaces, it may be desirable to apply some form of load
   balancing.  Load-balancing algorithms are outside the scope of this
   document.  However, if the packet's encapsulation contains an
   "entropy" field, the entropy field SHOULD be respected; two packets
   with the same value of the entropy field SHOULD be sent on the same
   interface (if possible).

   In some cases, the routing underlay may provide multiple equal cost
   paths (through different BFR-NBRs) to a given BFER.  This is known as
   "Equal Cost Multiple Paths" (ECMP).  The procedures described in this
   section must be augmented in order to support load balancing over
   ECMP.  The necessary augmentations can be found in Section 6.7.
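
   The numbered steps above can be sketched in C as follows.  This is an
   illustrative, non-normative rendering under simplifying assumptions:
   a single sub-domain, SI 0, a BitStringLength of 4, BitStrings held in
   a uint64_t, single-character BFR-NBR names, and transmissions
   recorded in an array rather than actually sent.  The names
   bier_forward(), struct bift_entry, and struct sent_copy are
   hypothetical.  Unlike the pseudocode of Figure 4, the sketch handles
   the BFR's own bit explicitly, as in step 4.

```c
#include <assert.h>
#include <stdint.h>

#define BSL 4   /* BitStringLength used in the examples */

/* One BIFT entry, indexed by bit position k (1..BSL) for SI 0. */
struct bift_entry { uint64_t fbm; char nbr; };

/* Stands in for "send a copy"; nbr '*' means the multicast flow overlay. */
struct sent_copy { char nbr; uint64_t bitstring; };

/* Applies steps 2-8 of Section 6.5 to one BitString.  self_bit is this
 * BFR's own bit position, or 0 if it is not a BFER.  Copies are recorded
 * in out (room for BSL entries); the number recorded is returned. */
int bier_forward(uint64_t bitstring, const struct bift_entry bift[BSL + 1],
                 int self_bit, struct sent_copy *out)
{
    int n = 0;
    while (bitstring != 0) {                     /* step 2 */
        int k = 1;                               /* step 3: lowest set bit */
        while ((bitstring & ((uint64_t)1 << (k - 1))) == 0)
            k++;
        assert(k <= BSL);
        uint64_t bit_k = (uint64_t)1 << (k - 1);

        if (k == self_bit) {                     /* step 4 */
            out[n].nbr = '*';
            out[n].bitstring = bitstring;
            n++;
            bitstring &= ~bit_k;
            continue;
        }

        uint64_t fbm = bift[k].fbm;              /* steps 5-6 */
        if ((fbm & bit_k) == 0) {
            /* No usable entry: skip this BFER (an assumption; compare
             * the "if (!F-BM) continue" of Figure 4). */
            bitstring &= ~bit_k;
            continue;
        }
        out[n].nbr = bift[k].nbr;                /* step 7 */
        out[n].bitstring = bitstring & fbm;
        n++;
        bitstring &= ~fbm;                       /* step 8 */
    }
    return n;
}
```

   With the BFR-B BIFT of Figure 3 and the BitString 0101 used in
   Section 6.6.2, the sketch records two copies: BitString 0001 toward
   BFR-C, then BitString 0100 toward BFR-E.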

   In the event that unicast traffic to the BFR-NBR is being sent via a
   "bypass tunnel" of some sort, the BIER-encapsulated multicast traffic
   sent to the BFR-NBR SHOULD also be sent via that tunnel.  This allows
   any existing "Fast Reroute" schemes to be applied to multicast
   traffic as well as to unicast traffic.

   Some examples of these forwarding procedures can be found in
   Section 6.6.

   The rules given in this section can be represented by the following
   pseudocode:

       void ForwardBitMaskPacket (Packet)
       {
           SI=GetPacketSI(Packet);
           Offset=SI*BitStringLength;
           for (Index = GetFirstBitPosition(Packet->BitString); Index ;
                Index = GetNextBitPosition(Packet->BitString, Index)) {
               F-BM = BIFT[Index+Offset]->F-BM;
               if (!F-BM) continue;
               BFR-NBR = BIFT[Index+Offset]->BFR-NBR;
               PacketCopy = Copy(Packet);
               PacketCopy->BitString &= F-BM;
               PacketSend(PacketCopy, BFR-NBR);
               Packet->BitString &= ~F-BM;
           }
       }

                         Figure 4: Pseudocode

   This pseudocode assumes that at a given BFER, the BFR-NBR entry
   corresponding to the BFER's own BFR-id will be the BFER's own
   BFR-prefix.  It also assumes that the corresponding F-BM has only one
   bit set, the bit representing the BFER itself.  In this case, the
   "PacketSend" function sends the packet to the multicast flow overlay.

6.6.  Examples of BIER Forwarding

   In this section, we give two examples of BIER forwarding, based on
   the topology in Figure 1.  In these examples, all packets have been
   assigned to the default sub-domain, all packets have SI=0, and the
   BitStringLength is 4.  Figure 5 shows the BIFT entries for SI=0 only.
   For compactness, we show the first column of the BIFT, the BFR-id,
   only as an integer.

       BFR-A BIFT           BFR-B BIFT           BFR-C BIFT
   -------------------  -------------------  -------------------
   | Id | F-BM | NBR |  | Id | F-BM | NBR |  | Id | F-BM | NBR |
   ===================  ===================  ===================
   | 1  | 0111 |  B  |  | 1  | 0011 |  C  |  | 1  | 0001 |  D  |
   -------------------  -------------------  -------------------
   | 2  | 0111 |  B  |  | 2  | 0011 |  C  |  | 2  | 0010 |  F  |
   -------------------  -------------------  -------------------
   | 3  | 0111 |  B  |  | 3  | 0100 |  E  |  | 3  | 1100 |  B  |
   -------------------  -------------------  -------------------
   | 4  | 1000 |  A  |  | 4  | 1000 |  A  |  | 4  | 1100 |  B  |
   -------------------  -------------------  -------------------

              Figure 5: BIFTs for Forwarding Examples

6.6.1.  Example 1

   BFR-D, BFR-E, and BFR-F are BFERs.  BFR-A is the BFIR.  Suppose that
   BFIR-A has learned from the multicast flow overlay that BFER-D is
   interested in a given multicast flow.  If BFIR-A receives a packet of
   that flow from outside the BIER domain, BFIR-A applies the BIER
   encapsulation to the packet.  The encapsulation must be such that the
   SI is zero.  The encapsulation also includes a BitString, with just
   bit 1 set and with all other bits clear (i.e., 0001).  This indicates
   that BFER-D is the only BFER that needs to receive the packet.  Then
   BFIR-A follows the procedures of Section 6.5:

   o  Since the packet's BitString is 0001, BFIR-A finds that the first
      bit in the string is bit 1.  Looking at entry 1 in its BIFT, BFR-A
      determines that the bit mask F-BM is 0111 and the BFR-NBR is
      BFR-B.

   o  BFR-A then makes a copy of the packet, and applies the F-BM to the
      copy: Copy->BitString &= 0111.  The copy's BitString is now 0001
      (0001 & 0111).

   o  The copy is now sent to BFR-B.

   o  BFR-A then updates the packet's BitString by applying the inverse
      of the F-BM: Packet->BitString &= ~F-BM.  As a result, the
      packet's BitString is now 0000 (0001 & 1000).

   o  As the packet's BitString is now zero, the forwarding procedure is
      complete.

   When BFR-B receives the multicast packet from BFR-A, it follows the
   same procedure.  The result is that a copy of the packet, with a
   BitString of 0001, is sent to BFR-C.  BFR-C applies the same
   procedures, and as a result sends a copy of the packet, with a
   BitString of 0001, to BFR-D.

   At BFER-D, the BIFT entry (not pictured) for BFR-id 1 will specify an
   F-BM of 0001 and a BFR-NBR of BFR-D itself.  This will cause a copy
   of the packet to be delivered to the multicast flow overlay at BFR-D.
   The packet's BitString will be set to 0000, and the packet will not
   be forwarded any further.

6.6.2.  Example 2

   This example is similar to Example 1, except that BFIR-A has learned
   from the multicast flow overlay that both BFER-D and BFER-E are
   interested in a given multicast flow.  If BFIR-A receives a packet of
   that flow from outside the BIER domain, BFIR-A applies the BIER
   encapsulation to the packet.  The encapsulation must be such that the
   SI is zero.  The encapsulation also includes a BitString with two
   bits set: bit 1 is set (as in Example 1) to indicate that BFR-D is a
   BFER for this packet, and bit 3 is set to indicate that BFR-E is a
   BFER for this packet.  That is, the BitString (assuming again a
   BitStringLength of 4) is 0101.  To forward the packet, BFIR-A follows
   the procedures of Section 6.5:

   o  Since the packet's BitString is 0101, BFIR-A finds that the first
      bit in the string is bit 1.  Looking at entry 1 in its BIFT, BFR-A
      determines that the bit mask F-BM is 0111 and the BFR-NBR is
      BFR-B.

   o  BFR-A then makes a copy of the packet, and applies the F-BM to the
      copy: Copy->BitString &= 0111.  The copy's BitString is now 0101
      (0101 & 0111).

   o  The copy is now sent to BFR-B.

   o  BFR-A then updates the packet's BitString by applying the inverse
      of the F-BM: Packet->BitString &= ~F-BM.  As a result, the
      packet's BitString is now 0000 (0101 & 1000).

   o  As the packet's BitString is now zero, the forwarding procedure is
      complete.

   When BFR-B receives the multicast packet from BFR-A, it follows the
   procedure of Section 6.5, as follows:

   o  Since the packet's BitString is 0101, BFR-B finds that the first
      bit in the string is bit 1.  Looking at entry 1 in its BIFT, BFR-B
      determines that the bit mask F-BM is 0011 and the BFR-NBR is
      BFR-C.

   o  BFR-B then makes a copy of the packet, and applies the F-BM to the
      copy: Copy->BitString &= 0011.  The copy's BitString is now 0001
      (0101 & 0011).

   o  The copy is now sent to BFR-C.

   o  BFR-B then updates the packet's BitString by applying the inverse
      of the F-BM: Packet->BitString &= ~F-BM.  As a result, the
      packet's BitString is now 0100 (0101 & 1100).

   o  Now BFR-B finds the next bit in the packet's (modified) BitString.
      This is bit 3.  Looking at entry 3 in its BIFT, BFR-B determines
      that the F-BM is 0100 and the BFR-NBR is BFR-E.

   o  BFR-B then makes a copy of the packet, and applies the F-BM to the
      copy: Copy->BitString &= 0100.  The copy's BitString is now 0100
      (0100 & 0100).

   o  The copy is now sent to BFR-E.

   o  BFR-B then updates the packet's BitString by applying the inverse
      of the F-BM: Packet->BitString &= ~F-BM.  As a result, the
      packet's BitString is now 0000 (0100 & 1011).

   o  As the packet's BitString is now zero, the forwarding procedure is
      complete.

   Thus BFR-B forwards two copies of the packet.  One copy of the
   packet, with BitString 0001, has now been sent from BFR-B to BFR-C.
   Following the same procedures, BFR-C will forward the packet to
   BFER-D.

   At BFER-D, the BIFT entry (not pictured) for BFR-id 1 will specify an
   F-BM of 0001 and a BFR-NBR of BFR-D itself.
   This will cause a copy of the packet to be delivered to the
   multicast flow overlay at BFR-D.  The packet's BitString will be set
   to 0000, and the packet will not be forwarded any further.

   The other copy of the packet has been sent from BFR-B to BFER-E, with
   BitString 0100.

   At BFER-E, the BIFT entry (not pictured) for BFR-id 3 will specify an
   F-BM of 0100 and a BFR-NBR of BFR-E itself.  This will cause a copy
   of the packet to be delivered to the multicast flow overlay at BFR-E.
   The packet's BitString will be set to 0000, and the packet will not
   be forwarded any further.

6.7.  Equal Cost Multi-path Forwarding

   In many networks, the routing underlay will provide multiple equal
   cost paths from a given BFR to a given BFER.  When forwarding
   multicast packets through the network, it can be beneficial to take
   advantage of this by load balancing among those paths.  This feature
   is known as "equal cost multiple path forwarding", or "ECMP".

   BIER supports ECMP, but the procedures of Section 6.5 must be
   modified slightly.  Two ECMP procedures are defined.  In the first
   (described in Section 6.7.1), the choice among equal-cost paths taken
   by a given packet from a given BFR to a given BFER depends on (a) the
   packet's entropy, and (b) the other BFERs to which that packet is
   destined.  In the second (described in Section 6.7.2), the choice
   depends only upon the packet's entropy.

   There are tradeoffs between the two forwarding procedures described
   here.  In the procedure of Section 6.7.1, the number of packet
   replications is minimized.  The procedure in Section 6.7.1 also uses
   less memory in the BFR.  In the procedure of Section 6.7.2, the path
   traveled by a given packet from a given BFR to a given BFER is
   independent of the other BFERs to which the packet is destined.
   While the procedures of Section 6.7.2 may cause more replications,
   they provide more predictable behavior.

   The two procedures described here operate on identical packet formats
   and will interoperate correctly.  However, if deterministic behavior
   is desired, then all BFRs would need to use the procedure from
   Section 6.7.2.

6.7.1.  Non-deterministic ECMP

   Figure 6 shows the operation of non-deterministic ECMP in BIER.

       BFR-A BIFT           BFR-B BIFT           BFR-C BIFT
   -------------------  -------------------  -------------------
   | Id | F-BM | NBR |  | Id | F-BM | NBR |  | Id | F-BM | NBR |
   ===================  ===================  ===================
   | 1  | 0111 |  B  |  | 1  | 0011 |  C  |  | 1  | 0001 |  D  |
   -------------------  -------------------  -------------------
   | 2  | 0111 |  B  |  | 2  | 0011 |  C  |  | 2  | 0010 |  F  |
   -------------------  |    | 0110 |  E  |  -------------------
   | 3  | 0111 |  B  |  -------------------  | 3  | 1100 |  B  |
   -------------------  | 3  | 0110 |  E  |  -------------------
   | 4  | 1000 |  A  |  -------------------  | 4  | 1100 |  B  |
   -------------------  | 4  | 1000 |  A  |  -------------------
                        -------------------

      ( A ) ------------ ( B ) ------------ ( C ) ------------ ( D )
    4 (0:1000)              \                   \           1 (0:0001)
                             \                   \
                            ( E ) ------------ ( F )
                         3 (0:0100)          2 (0:0010)

                       Figure 6: Example of ECMP

   In this example, BFR-B has two equal cost paths to reach BFER-F, one
   via BFR-C and one via BFR-E.  Since the BFR-id of BFER-F is 2, this
   is reflected in entry 2 of BFR-B's BIFT.  Entry 2 shows that BFR-B
   has a choice of two BFR-NBRs for BFER-F, and that a different F-BM is
   associated with each choice.  When BFR-B looks up entry 2 in the
   BIFT, it can choose either BFR-NBR.  However, when following the
   procedures of Section 6.5, it MUST use the F-BM corresponding to the
   BFR-NBR that it chooses.
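
   Because the choice among BFR-NBRs is an implementation matter, the
   following C sketch shows only one possible approach, not a
   requirement: it selects a candidate by reducing the packet's entropy
   value modulo the number of candidates, so packets with the same
   entropy always take the same BFR-NBR, and it always pairs the chosen
   BFR-NBR with that BFR-NBR's own F-BM.  The type and function names
   and the modulo rule are assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_ECMP 4

/* One equal-cost alternative: a BFR-NBR and the F-BM that goes with it. */
struct ecmp_choice { uint64_t fbm; char nbr; };

/* A BIFT entry listing one or more equal-cost alternatives. */
struct ecmp_entry { int count; struct ecmp_choice choice[MAX_ECMP]; };

/* Chooses one alternative.  The modulo rule is an assumption; any rule
 * that maps equal entropy values to the same alternative would serve. */
struct ecmp_choice ecmp_select(const struct ecmp_entry *e, uint32_t entropy)
{
    assert(e->count >= 1 && e->count <= MAX_ECMP);
    return e->choice[entropy % (uint32_t)e->count];
}

/* Steps 7-8 of Section 6.5 using the chosen alternative's own F-BM:
 * returns the copy's BitString and clears those bits in *bitstring. */
uint64_t ecmp_forward(uint64_t *bitstring, struct ecmp_choice c)
{
    uint64_t copy = *bitstring & c.fbm;
    *bitstring &= ~c.fbm;
    return copy;
}
```

   Using entry 2 of BFR-B's BIFT in Figure 6 ({0011, C} and {0110, E}),
   an entropy value of 10 selects BFR-C and 7 selects BFR-E; forwarding
   a packet whose BitString is 0010 with entropy 7 produces a copy with
   BitString 0010 toward BFR-E and leaves the original BitString at
   0000.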
However, the 1066 usual rules for ECMP apply: packets of a given flow SHOULD NOT be 1067 split among two paths, and any "entropy" field in the packet's 1068 encapsulation SHOULD be respected. 1070 Note however that by the rules of Section 6.5, any packet destined 1071 for both BFER-D and BFER-F will be sent via BFR-C. 1073 6.7.2. Deterministic ECMP 1075 With the procedures of Section 6.7.1, where ECMP paths exist, the 1076 path a packet takes to reach any particular BFER depends not only on 1077 routing and on the packet's entropy, but also on the set of other 1078 BFERs to which the packet is destined. 1080 For example consider the following scenario in the network of 1081 Figure 6. 1083 o There is a sequence of packets being transmitted by BFR-A, some of 1084 which are destined for both D and F, and some of which are 1085 destined only for F. 1087 o All the packets in this sequence have the same entropy value, call 1088 it "Q". 1090 o At BFR-B, when a packet with entropy value Q is forwarded via 1091 entry 2 in the BIFT, the packet is sent to E. 1093 Using the forwarding procedure of Section 6.7.1, packets of this 1094 sequence that are destined for both D and F are forwarded according 1095 to entry 1 in the BIFT, and thus will reach F via the path A-B-C-F. 1096 However, packets of this sequence that are destined only for F are 1097 forwarded according to entry 2 in the BIFT, and thus will reach F via 1098 the path A-B-E-F. 1100 That procedure minimizes the number of packets transmitted by BFR B. 1101 However, consider the following scenario: 1103 o Beginning at time t0, the multicast flow in question needs to be 1104 received ONLY by BFER-F; 1106 o Beginning at a later time, t1, the flow needs to be received by 1107 both BFER-D and BFER-F. 1109 o Beginning at a later time, t2, the no longer needs to be received 1110 by D, but still needs to be received by F. 1112 Then from t0 until t1, the flow will travel to F via the path 1113 A-B-E-F. 
   From t1 until t2, the flow will travel to F via the path A-B-C-F.
   And from t2 on, the flow will again travel to F via the path A-B-E-F.

   The problem is that if D repeatedly joins and leaves the flow, the
   flow's path from B to F will keep switching.  This could cause F to
   receive packets out of order.  It also makes troubleshooting
   difficult.  For example, if there is some problem on the E-F link,
   receivers at F will get good service when the flow is also going to D
   (avoiding the E-F link), but bad service when the flow is not going
   to D.  Since it is hard to know which path is being used at any given
   time, this may be hard to troubleshoot.  Also, it is very difficult
   to perform a traceroute that is known to follow the path taken by the
   flow at any given time.

   The source of this difficulty is that, in the procedures of
   Section 6.7.1, the path taken by a particular flow to a particular
   BFER depends upon whether there are lower numbered BFERs that are
   also receiving the flow.  Thus the choice among the ECMP paths is
   fundamentally non-deterministic.

   Deterministic forwarding can be achieved by using multiple BIFTs,
   such that each row in a BIFT has only one path to each destination,
   but the multiple ECMP paths to any particular destination are spread
   across the multiple tables.  When a BIER-encapsulated packet arrives
   to be forwarded, the BFR uses a hash of the BIER Entropy field to
   determine which BIFT to use, and then the normal BIER forwarding
   algorithm (as described in Sections 6.5 and 6.6) is used with the
   selected BIFT.

   As an example, suppose there are two paths to destination X (call
   them X1 and X2), and four paths to destination Y (call them Y1, Y2,
   Y3, and Y4).  If there are, say, four BIFTs, one BIFT would have
   paths X1 and Y1, one would have X1 and Y2, one would have X2 and Y3,
   and one would have X2 and Y4.
   If traffic to X is split evenly among these four BIFTs, the traffic
   will be split evenly between the two paths to X; if traffic to Y is
   split evenly among these four BIFTs, the traffic will be split evenly
   between the four paths to Y.

   Note that if there are three paths to one destination and four paths
   to another, 12 BIFTs (the least common multiple of 3 and 4) would be
   required in order to get even splitting of the load to each of those
   two destinations.  Of course, each BIFT uses some memory, and one
   might be willing to have less optimal splitting in order to have
   fewer BIFTs.  How that tradeoff is made is an implementation or
   deployment decision.

6.8.  Prevention of Loops and Duplicates

   The BitString in a BIER-encapsulated packet specifies the set of
   BFERs to which that packet is to be forwarded.  When a
   BIER-encapsulated packet is replicated, no two copies of the packet
   will ever have a BFER in common.  If one of the packet's BFERs
   forwards the packet further, that BFER will first clear the bit that
   identifies itself.  As a result, duplicate delivery of packets is not
   possible with BIER.

   As long as the routing underlay provides a loop-free path between
   each pair of BFRs, BIER-encapsulated packets will not loop.  Since
   the BIER layer does not create any paths of its own, there is no need
   for any BIER-specific loop prevention techniques beyond the
   forwarding procedures specified in Section 6.5.

   If, at some time, the routing underlay is not providing a loop-free
   path between BFIR-A and BFER-B, then BIER-encapsulated packets may
   loop while traveling from BFIR-A to BFER-B.  However, such loops will
   never result in delivery of duplicate packets to BFER-B.

   These properties of BIER eliminate the need for the "reverse path
   forwarding" (RPF) check that is used in conventional IP multicast
   forwarding.

6.9.  When Some Nodes do not Support BIER

   The procedures of Section 6.2 presuppose that, within a given BIER
   domain, all the nodes adjacent to a given BFR in a given routing
   underlay are also BFRs.  However, it is possible to use BIER even
   when this is not the case, as long as the ingress and egress nodes
   are BFRs.  In this section, we assume that the routing underlay is an
   SPF-based IGP that computes a shortest path tree from each node to
   all other nodes in the domain.

   At a given BFR, say BFR B, start with a copy of the IGP-computed
   shortest path tree from BFR B to each router in the domain.  (This
   tree is computed by the SPF algorithm of the IGP.)  Let's call this
   copy the "BIER-SPF tree rooted at BFR B."  BFR B then modifies this
   BIER-SPF tree as follows.

   1.  BFR B looks in turn at each of B's child nodes on the BIER-SPF
       tree.

   2.  If one of the child nodes does not support BIER, BFR B removes
       that node from the tree.  The child nodes of the node that has
       just been removed are then re-parented on the tree, so that BFR B
       now becomes their parent.

   3.  BFR B then continues to look at each of its child nodes,
       including any nodes that have been re-parented to B as a result
       of the previous step.

   When all of the child nodes (the original child nodes plus any new
   ones) have been examined, B's children on the BIER-SPF tree will all
   be BFRs.

   When the BIFT is constructed, B's child nodes on the BIER-SPF tree
   are considered to be the BFR-NBRs.  The F-BMs and outgoing BIER-MPLS
   labels must be computed appropriately, based on the BFR-NBRs.

   B may now have BFR-NBRs that are not "directly connected" to B via
   layer 2.  To send a packet to one of these BFR-NBRs, B will have to
   send the packet through a unicast tunnel.
   This may be as simple as finding the IGP unicast next hop to the
   child node, and pushing on (above the BIER-MPLS label advertised by
   the child) the MPLS label that the IGP next hop has bound to an
   address of the child node.  (If for some reason the unicast tunnel
   cannot be an MPLS tunnel, any other kind of tunnel can be used, as
   long as it is possible to encapsulate MPLS within that kind of
   tunnel.)

   Of course, the above is not meant as an implementation technique,
   just as a functional description.

   While the above description assumes that the routing underlay
   provides an SPF tree, it may also be applicable to other types of
   routing underlay.

   Note that the technique above can also be used to provide "node
   protection" (i.e., to provide fast reroute around nodes that are
   believed to have failed).  If BFR B has a failed BFR-NBR, B can
   remove the failed BFR-NBR from the BIER-SPF tree, and can then
   re-parent the child BFR-NBRs of the failed BFR-NBR so that they
   appear to be B's own child nodes on the tree (i.e., so that they
   appear to be B's BFR-NBRs).  Then the usual BIER forwarding
   procedures apply.  However, getting the packet from B to the child
   nodes of the failed BFR-NBR is a bit more complicated, as it may
   require using a unicast bypass tunnel to get around the failed node.

   A simpler variant of step 2 above would be the following:

      If one of the child nodes does not support BIER, BFR B removes
      that node from the tree.  All BFERs that are reached through that
      child node are then re-parented on the tree, so that BFR B now
      becomes their parent.

   This variant is simpler because the set of BFERs that are reached
   through a particular child node of B can be determined from the F-BM
   in the BIFT.
   However, if this variant is used, the results are less optimal,
   because packets will be unicast directly from B to the BFERs that are
   reachable through the non-BIER child node.

   When using a unicast MPLS tunnel to get a packet to a BFR-NBR:

   o  the TTL of the MPLS label entry representing the "tunnel" SHOULD
      be set to a large value, rather than being copied from the TTL
      value of the BIER-MPLS label, and

   o  when the tunnel labels are popped off, the TTL from the tunnel
      labels SHOULD NOT be copied to the BIER-MPLS label.

   In other words, the TTL processing for the tunnel SHOULD be as
   specified in [RFC3443] for "Pipe Model" and "Short Pipe Model" LSPs.
   That way, the TTL of the BIER-MPLS label constrains only the number
   of BFRs that the packet may traverse, not the total number of hops.

   The material in this section presupposes that a given node is either
   a BFR or not, and that a BFR supports BIER on all its interfaces.  It
   is, however, possible that a router will have some line cards that
   support BIER and some that do not.  In such a case, one can think of
   the router as a "partial-BFR" that supports BIER only on some of its
   interfaces.  If it is desired to deploy such partial-BFRs, one can
   use the multi-topology features of the IGP to set up a BIER-specific
   topology.  This topology would exclude all the non-BIER-capable
   interfaces that attach to BFRs.  BIER would then have to be run in a
   sub-domain that is bound to this topology.  If unicast tunnels are
   used to bypass non-BFRs, either the tunnels have to be restricted to
   this topology, or the tunnel endpoints have to be BFRs that do not
   have any non-BIER-capable interfaces.

6.10.  Use of Different BitStringLengths within a Domain

   It is possible for different BFRs within a BIER domain to be using
   different Imposition and/or Disposition BitStringLengths.
   As stated in Section 3:

      "if a particular BFIR is provisioned to use a particular
      Imposition BitStringLength and a particular Imposition sub-domain
      when imposing the encapsulation on a given set of packets, all
      other BFRs with BFR-ids in that sub-domain SHOULD be provisioned
      to process received BIER packets with that BitStringLength (i.e.,
      all other BFRs with BFR-ids in that sub-domain SHOULD be
      provisioned with that BitStringLength as a Disposition
      BitStringLength for that sub-domain)."

   Note that mis-provisioning can result in "black holes".  If a BFIR
   creates a BIER packet with a particular BitStringLength, and if that
   packet needs to travel through a BFR that cannot process received
   BIER packets with that BitStringLength, then it may be impossible to
   forward the packet to all of the BFERs identified in its BIER header.
   Section 6.10.1 defines a procedure, the "BitStringLength
   Compatibility Check", that can be used to detect the possibility of
   such black holes.

   However, failure of the BitStringLength Compatibility Check does not
   necessarily result in the creation of black holes; Section 6.10.2
   specifies OPTIONAL procedures that allow BIER forwarding to proceed
   without black holes, even if the BitStringLength Compatibility Check
   fails.

   If the procedures of Section 6.10.2 are not deployed, but the
   BitStringLength Compatibility Check fails at some BFIR, the BFIR has
   two choices:

   o  Create BIER packets with the provisioned Imposition
      BitStringLength, even though the packets may not be able to reach
      all the BFERs identified in their BitStrings.

   o  Use an Imposition BitStringLength that passes the Compatibility
      Check (assuming that there is one), even if this is not the
      provisioned Imposition BitStringLength.

   Section 6.10.1 discusses the implications of making one or the other
   of these choices.

   There will be times when an operator wishes to change the
   BitStringLengths used in a particular BIER domain.  Section 6.10.3
   specifies a simple procedure that can be used to transition a BIER
   domain from one BitStringLength to another.

6.10.1.  BitStringLength Compatibility Check

   When a BFIR needs to encapsulate a packet, the BFIR first assigns the
   packet to a sub-domain.  Then the BFIR chooses an Imposition
   BitStringLength L for the packet.  The choice of Imposition
   BitStringLength is by provisioning.  However, the BFIR should also
   perform the BitStringLength Compatibility Check defined below.

   The combination of sub-domain S and Imposition BitStringLength L
   passes the BitStringLength Compatibility Check if and only if the
   following condition holds:

      Every BFR that has advertised its membership in sub-domain S has
      also advertised that it is using Disposition BitStringLength L
      (and possibly other BitStringLengths as well) in that sub-domain.
      (If the MPLS encapsulation [MPLS_BIER_ENCAPS] is being used, this
      means that every BFR that is advertising a label for sub-domain S
      is advertising a label for the combination of sub-domain S and
      Disposition BitStringLength L.)

   If a BFIR has been provisioned to use a particular Imposition
   BitStringLength and a particular sub-domain for some set of packets,
   and if that combination of Imposition BitStringLength and sub-domain
   does not pass the BitStringLength Compatibility Check, the BFIR
   SHOULD log this fact as an error.  It then has the following choice
   about what to do with the packets:

   1.  The BFIR MAY use the provisioned Imposition BitStringLength
       anyway.  If the procedure of item 2 or item 3 of Section 6.10.2
       is deployed, this will not cause black holes, and may actually be
       the optimal result.
       It should be understood, though, that the BFIR cannot determine
       by signaling whether those procedures have been deployed.

   2.  If the BFIR is capable of using an Imposition BitStringLength
       that does pass the BitStringLength Compatibility Check for the
       particular sub-domain, the BFIR MAY use that Imposition
       BitStringLength instead.

   Which of these two choices to make is itself determined by
   provisioning.

   Note that discarding the packets is not one of the allowable choices.
   Suppose, for example, that all the BFIRs are provisioned to use
   Imposition BitStringLength L for a particular sub-domain S, but one
   BFR has not been provisioned to use Disposition BitStringLength L for
   sub-domain S.  This will cause the BitStringLength Compatibility
   Check to fail.  If the BFIR sends packets with BitStringLength L and
   sub-domain S, the mis-provisioned BFR will not be able to forward
   those packets, and thus the packets may only be able to reach a
   subset of the BFERs to which they are destined.  However, this is
   still better than having the BFIRs drop the packets; if the BFIRs
   discard the packets, the packets won't reach any of the BFERs to
   which they are destined at all.

   If the procedures of Section 6.10.2 have not been deployed, choice 2
   might seem like a better option.  However, there might not be any
   Imposition BitStringLength that a given BFIR can use that also passes
   the BitStringLength Compatibility Check.
   If it is desired to use choice 2 in a particular deployment, then
   there should be a "Fallback Disposition BitStringLength", call it F,
   such that:

   o  Every BFR advertises that it uses BitStringLength F as a
      Disposition BitStringLength for every sub-domain, and

   o  If a BFIR is provisioned to use Imposition BitStringLength X and
      Imposition sub-domain S for a certain class of packets, but the
      BitStringLength Compatibility Check fails for the combination of
      BitStringLength X and sub-domain S, then the BFIR will fall back
      to using BitStringLength F as the Imposition BitStringLength
      whenever the Imposition sub-domain is S.

   This fallback procedure will work best if the value of F is
   established by the architecture, rather than by provisioning.

6.10.2.  Handling BitStringLength Mismatches

   Suppose a packet has been BIER-encapsulated with a BitStringLength
   value of X, and that the packet has arrived at BFR-A.  Now suppose
   that, according to the routing underlay, the next hop is BFR-B, but
   BFR-B is not using X as one of its Disposition BitStringLengths.
   What should BFR-A do with the packet?  BFR-A has three options.  It
   MUST do one of the three, but the choice of which procedure to follow
   is a local matter.  The three options are:

   1.  BFR-A MAY discard the packet.

   2.  BFR-A MAY re-encapsulate the packet, using a BIER header whose
       BitStringLength value is supported by BFR-B.

       Note that if BFR-B only uses Disposition BitStringLength values
       that are smaller than the BitStringLength value of the packet,
       this may require creating additional copies of the packet.
       Whether additional copies actually have to be created depends
       upon the bits that are actually set in the original packet's
       BitString.

   3.  BFR-A MAY treat BFR-B as if BFR-B did not support BIER at all,
       and apply the rules of Section 6.9.
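   The re-encapsulation of option 2, for the case where BFR-B uses only
   a shorter BitStringLength Y than the packet's X, can be sketched as
   follows.  This sketch is not part of the specification.  It assumes
   the usual mapping in which BFR-id k corresponds to SI (k-1)/L
   (integer division) and bit position ((k-1) mod L) + 1 for
   BitStringLength L, and it assumes power-of-two lengths so that Y
   divides X.  The function names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def encode(bfr_ids: Iterable[int], length: int) -> Dict[int, int]:
    """Hypothetical helper: map BFR-ids to {SI: BitString}, assuming
    SI = (id - 1) // length and bit position ((id - 1) % length) + 1,
    with bit position 1 held in the least significant bit."""
    out: Dict[int, int] = {}
    for bfr_id in bfr_ids:
        si, offset = divmod(bfr_id - 1, length)
        out[si] = out.get(si, 0) | (1 << offset)
    return out


def reencapsulate(si: int, bitstring: int, x: int,
                  y: int) -> List[Tuple[int, int]]:
    """Option 2 with y < x: rewrite one (SI, BitString) of length x as
    the (SI, BitString) pairs of length y that address the same BFR-ids.
    Only non-empty BitStrings are returned, so len(result) is the number
    of copies BFR-A must send toward BFR-B."""
    if y > x or x % y != 0:
        raise ValueError("sketch assumes y <= x and y divides x")
    if bitstring < 0 or bitstring >> x:
        raise ValueError("BitString does not fit in x bits")
    ratio = x // y
    mask = (1 << y) - 1
    copies = []
    for j in range(ratio):
        chunk = (bitstring >> (j * y)) & mask
        if chunk:  # no bits set in this slice, so no copy is needed
            copies.append((si * ratio + j, chunk))
    return copies


if __name__ == "__main__":
    # BFERs 1 and 2 share one 64-bit slice: a single copy suffices.
    p = encode([1, 2], 256)
    print(reencapsulate(0, p[0], 256, 64))   # [(0, 3)]
    # BFERs 1 and 200 fall into different 64-bit slices: two copies.
    p = encode([1, 200], 256)
    print(reencapsulate(0, p[0], 256, 64))   # [(0, 1), (3, 128)]
```

   Under these assumptions a packet with BitStringLength X yields at
   most X/Y copies, and exactly as many as there are non-empty Y-bit
   slices of its BitString, which is why the number of additional
   copies depends on which bits are set.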
   Note that there is no signaling that enables a BFR to advertise which
   of the three options it will use.

   Option 2 can be useful if there is a region of the BIER domain where
   the BFRs are capable of using a long BitStringLength, and a region
   where the BFRs are only capable of using a shorter BitStringLength.

6.10.3.  Transitioning from One BitStringLength to Another

   Suppose one wants to migrate the BitStringLength used in a particular
   BIER domain from one value (X) to another value (Y).  The following
   migration procedure can be used.  This procedure allows the BFRs to
   be reprovisioned one at a time, and does not require a "flag day".

   First, upgrade all the BFRs in the domain so that they use both value
   X and value Y as their Disposition BitStringLengths.  Once this is
   done, reprovision the BFIRs so that they use BitStringLength value Y
   as the Imposition BitStringLength.  Once that is done, one may
   optionally reprovision all the BFRs so that they no longer use
   Disposition BitStringLength X.

7.  IANA Considerations

   This document contains no actions for IANA.

8.  Security Considerations

   When BIER is paired with a particular multicast flow overlay, it
   inherits the security considerations of that layer.  Similarly, when
   BIER is paired with a particular routing underlay, it inherits the
   security considerations of that layer.

   If the BIER encapsulation of a particular packet specifies an SI or a
   BitString other than the one intended by the BFIR, the packet is
   likely to be misdelivered.  If the BIER encapsulation of a packet is
   modified (through error or malfeasance) in a way other than that
   specified in this document, the packet may be misdelivered.

   If the procedures used for advertising BFR-ids and BFR-prefixes are
   not secure, an attack on those procedures may result in incorrect
   delivery of BIER-encapsulated packets.
   Every BFR must be provisioned to know which of its interfaces lead to
   a BIER domain and which do not.  If two interfaces lead to different
   BIER domains, the BFR must be provisioned to know that those two
   interfaces lead to different BIER domains.  If the provisioning is
   not correct, BIER-encapsulated packets from one BIER domain may
   "leak" into another; this is likely to result in misdelivery of
   packets.

9.  Acknowledgements

   The authors wish to thank Rajiv Asati, John Bettink, Ross Callon (who
   contributed much of the text on deterministic ECMP), Nagendra Kumar,
   Christian Martin, Neale Ranns, Greg Shepherd, Albert Tian, Ramji
   Vaithianathan, Xiaohu Xu and Jeffrey Zhang for their ideas and
   contributions to this work.

10.  Contributor Addresses

   Below is a list of other contributing authors in alphabetical order:

   Gregory Cauchie
   Bouygues Telecom

   Email: gcauchie@bouyguestelecom.fr

   Mach (Guoyi) Chen
   Huawei

   Email: mach.chen@huawei.com

   Arkadiy Gulko
   Thomson Reuters
   195 Broadway
   New York NY 10007
   United States

   Email: arkadiy.gulko@thomsonreuters.com

   Wim Henderickx
   Alcatel-Lucent
   Copernicuslaan 50
   Antwerp 2018
   Belgium

   Email: wim.henderickx@alcatel-lucent.com

   Martin Horneffer
   Deutsche Telekom
   Hammer Str. 216-226
   Muenster 48153
   Germany

   Email: Martin.Horneffer@telekom.de

   Luay Jalil
   Verizon
   1201 E Arapaho Rd.
   Richardson, TX 75081
   United States

   Email: luay.jalil@verizon.com

   Uwe Joorde
   Deutsche Telekom
   Hammer Str. 216-226
   Muenster D-48153
   Germany

   Email: Uwe.Joorde@telekom.de

   Jeff Tantsura
   Ericsson
   300 Holger Way
   San Jose, CA 95134
   United States

   Email: jeff.tantsura@ericsson.com

11.  References

11.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC3443]  Agarwal, P. and B. Akyol, "Time To Live (TTL) Processing
              in Multi-Protocol Label Switching (MPLS) Networks",
              RFC 3443, DOI 10.17487/RFC3443, January 2003.

11.2.  Informative References

   [Boivie_Feldman]
              Boivie, R. and N. Feldman, "Small Group Multicast",
              (expired) draft-boivie-sgm-02.txt, February 2001.

   [ISIS_BIER_EXTENSIONS]
              Ginsberg, L., Przygienda, T., Aldrin, S., and J. Zhang,
              "BIER Support via ISIS", internet-draft
              draft-ietf-bier-isis-extensions-00.txt, April 2015.

   [MPLS_BIER_ENCAPS]
              Wijnands, IJ., Rosen, E., Dolganow, A., Tantsura, J., and
              S. Aldrin, "Encapsulation for Bit Index Explicit
              Replication in MPLS Networks", internet-draft
              draft-ietf-bier-mpls-encapsulation-01.txt, June 2015.

   [OSPF_BIER_EXTENSIONS]
              Psenak, P., Kumar, N., Wijnands, IJ., Dolganow, A.,
              Przygienda, T., and J. Zhang, "OSPF Extensions for Bit
              Index Explicit Replication", internet-draft
              draft-ietf-ospf-bier-extensions-00.txt, April 2015.

   [RFC6513]  Rosen, E., Ed. and R. Aggarwal, Ed., "Multicast in
              MPLS/BGP IP VPNs", RFC 6513, DOI 10.17487/RFC6513,
              February 2012.

   [RFC6514]  Aggarwal, R., Rosen, E., Morin, T., and Y. Rekhter, "BGP
              Encodings and Procedures for Multicast in MPLS/BGP IP
              VPNs", RFC 6514, DOI 10.17487/RFC6514, February 2012.

Authors' Addresses

   IJsbrand Wijnands (editor)
   Cisco Systems, Inc.
   De Kleetlaan 6a
   Diegem 1831
   Belgium

   Email: ice@cisco.com

   Eric C. Rosen (editor)
   Juniper Networks, Inc.
   10 Technology Park Drive
   Westford, Massachusetts 01886
   United States

   Email: erosen@juniper.net

   Andrew Dolganow
   Alcatel-Lucent
   600 March Rd.
   Ottawa, Ontario K2K 2E6
   Canada

   Email: andrew.dolganow@alcatel-lucent.com

   Tony Przygienda
   Ericsson
   300 Holger Way
   San Jose, California 95134
   United States

   Email: antoni.przygienda@ericsson.com

   Sam K Aldrin
   Google, Inc.
   1600 Amphitheatre Parkway
   Mountain View, California
   United States

   Email: aldrin.ietf@gmail.com