Network Working Group                                     T. Eckert, Ed.
Internet-Draft                                                    Huawei
Intended status: Standards Track                              G. Cauchie
Expires: January 9, 2020                                Bouygues Telecom
                                                                M. Menth
                                                 University of Tuebingen
                                                            July 8, 2019


    Traffic Engineering for Bit Index Explicit Replication (BIER-TE)
                       draft-ietf-bier-te-arch-03

Abstract

   This memo introduces per-packet stateless strict and loose path
   engineered replication and forwarding for Bit Index Explicit
   Replication packets ([RFC8279]).  This is called BIER-TE.

   BIER-TE leverages the BIER architecture ([RFC8279]) and extends it
   with a new semantic for bits in the bitstring.  BIER-TE can leverage
   BIER forwarding engines with little or no change.

   In BIER, the BitPositions (BP) of the packet's bitstring indicate
   BIER Forwarding Egress Routers (BFER), and hop-by-hop forwarding uses
   a Routing Underlay such as an IGP.

   In BIER-TE, BitPositions indicate adjacencies.  The BIFT of each BFR
   is only populated with BPs that are adjacent to the BFR in the
   BIER-TE topology.  The BIER-TE topology can consist of layer 2 or
   remote (routed) adjacencies.  The BFR then replicates and forwards
   BIER packets to those adjacencies.  This results in the
   aforementioned strict and loose path forwarding.

   BIER-TE can co-exist with BIER forwarding in the same domain, for
   example by using separate sub-domains.  In the absence of routed
   adjacencies, BIER-TE does not require a BIER routing underlay, and
   can then be operated without requiring an IGP routing protocol.

   BIER-TE operates without explicit in-network tree-building and
   carries the multicast distribution tree in the packet header.  It can
   therefore be a good fit to support multicast path steering in Segment
   Routing (SR) networks.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.
   The list of current Internet-Drafts is at
   https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 9, 2020.

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Basic Examples
     1.2.  BIER-TE Topology and adjacencies
     1.3.  Comparison with BIER
     1.4.  Requirements Language
   2.  Layering
     2.1.  The Multicast Flow Overlay
     2.2.  The BIER-TE Controller Host
       2.2.1.  Assignment of BitPositions to adjacencies of the
               network topology
       2.2.2.  Changes in the network topology
       2.2.3.  Set up per-multicast flow BIER-TE state
       2.2.4.  Link/Node Failures and Recovery
     2.3.  The BIER-TE Forwarding Layer
     2.4.  The Routing Underlay
   3.  BIER-TE Forwarding
     3.1.  The Bit Index Forwarding Table (BIFT)
     3.2.  Adjacency Types
       3.2.1.  Forward Connected
       3.2.2.  Forward Routed
       3.2.3.  ECMP
       3.2.4.  Local Decap
     3.3.  Encapsulation considerations
     3.4.  Basic BIER-TE Forwarding Example
     3.5.  Forwarding comparison with BIER
     3.6.  Requirements
   4.  BIER-TE Controller Host BitPosition Assignments
     4.1.  P2P Links
     4.2.  BFER
     4.3.  Leaf BFERs
     4.4.  LANs
     4.5.  Hub and Spoke
     4.6.  Rings
     4.7.  Equal Cost MultiPath (ECMP)
     4.8.  Routed adjacencies
       4.8.1.  Reducing BitPositions
       4.8.2.  Supporting nodes without BIER-TE
   5.  Avoiding loops and duplicates
     5.1.  Loops
     5.2.  Duplicates
   6.  BIER-TE Forwarding Pseudocode
   7.  Managing SI, subdomains and BFR-ids
     7.1.  Why SI and sub-domains
     7.2.  Bit assignment comparison BIER and BIER-TE
     7.3.  Using BFR-id with BIER-TE
     7.4.  Assigning BFR-ids for BIER-TE
     7.5.  Example bit allocations
       7.5.1.  With BIER
       7.5.2.  With BIER-TE
     7.6.  Summary
   8.  BIER-TE and Segment Routing (SR)
   9.  Security Considerations
   10. IANA Considerations
   11. Acknowledgements
   12. Change log [RFC Editor: Please remove]
   13. References
     13.1.  Normative References
     13.2.  Informative References
   Authors' Addresses

1.  Introduction

   BIER-TE shares architecture, terminology and packet formats with BIER
   as described in [RFC8279] and [RFC8296].  This document describes
   BIER-TE in the expectation that the reader is familiar with these two
   documents.

   In BIER-TE, BitPositions (BP) indicate adjacencies.  The BIFT of each
   BFR is only populated with BPs that are adjacent to the BFR in the
   BIER-TE Topology.  Other BPs are left without adjacency.  The BFR
   replicates and forwards BIER packets to adjacent BPs that are set in
   the packet.  BPs are normally also reset upon forwarding to avoid
   duplicates and loops.  This is detailed further below.

1.1.  Basic Examples

   BIER-TE forwarding is best introduced with simple examples.

   BIER-TE Topology:

      Diagram:

                       p5    p6
                     --- BFR3 ---
                  p3/    p13     \p7
      BFR1 ---- BFR2              BFR5 ----- BFR6
         p1   p2  p4\    p14     /p10 p11   p12
                     --- BFR4 ---
                       p8    p9

   (simplified) BIER-TE Bit Index Forwarding Tables (BIFT):

      BFR1:   p1  -> local_decap
              p2  -> forward_connected to BFR2

      BFR2:   p1  -> forward_connected to BFR1
              p5  -> forward_connected to BFR3
              p8  -> forward_connected to BFR4

      BFR3:   p3  -> forward_connected to BFR2
              p7  -> forward_connected to BFR5
              p13 -> local_decap

      BFR4:   p4  -> forward_connected to BFR2
              p10 -> forward_connected to BFR5
              p14 -> local_decap

      BFR5:   p6  -> forward_connected to BFR3
              p9  -> forward_connected to BFR4
              p12 -> forward_connected to BFR6

      BFR6:   p11 -> forward_connected to BFR5
              p12 -> local_decap

                    Figure 1: BIER-TE basic example

   Consider the simple network shown in Figure 1 with 6 BFRs.  p1...p14
   are the BitPositions (BP) used.  All BFRs can act as ingress BFRs
   (BFIR); BFR1, BFR3, BFR4 and BFR6 can also be egress BFRs (BFER).
   Forward_connected is the name for adjacencies that represent subnet
   adjacencies of the network.  Local_decap is the name of the adjacency
   that decapsulates BIER-TE packets and passes their payload to higher
   layer processing.

   Assume a packet from BFR1 should be sent via BFR4 to BFR6.  This
   requires a bitstring (p2,p8,p10,p12).  When this packet is examined
   by BIER-TE on BFR1, the only BitPosition from the bitstring that is
   also set in the BIFT is p2.  This will cause BFR1 to send the only
   copy of the packet to BFR2.  Similarly, BFR2 will forward to BFR4
   because of p8, BFR4 to BFR5 because of p10 and BFR5 to BFR6 because
   of p12.  p12 also makes BFR6 receive and decapsulate the packet.

   To also send a copy to BFR3, in addition to the copy sent to BFR6 via
   BFR4, the bitstring needs to be (p2,p5,p8,p10,p12,p13).
   When this packet is examined by BFR2, p5 causes one copy to be sent
   to BFR3 and p8 one copy to BFR4.  When BFR3 receives the packet, p13
   will cause it to receive and decapsulate the packet.

   If instead the bitstring was (p2,p6,p8,p10,p12,p13), the packet would
   be copied by BFR5 towards BFR3 because of p6, instead of by BFR2
   towards BFR3 because of p5 as in the prior case.  This shows the
   ability of the BIER-TE Topology in Figure 1 to make the traffic pass
   across any possible path and be replicated where desired.

   BIER-TE has various options to minimize BP assignments, many of which
   are based on assumptions about the required multicast traffic paths
   and bandwidth consumption in the network.

   The following picture shows a modified example, in which Rtr2 and
   Rtr5 are assumed not to support BIER-TE, so traffic has to be unicast
   encapsulated across them.  Unicast tunneling of BIER-TE packets can
   leverage any feasible mechanism such as MPLS or IP; these
   encapsulations are out of scope of this document.  To emphasize non-
   native forwarding of BIER-TE packets, these adjacencies are called
   "forward_routed", but otherwise there is no difference in their
   processing compared to the aforementioned "forward_connected"
   adjacencies.

   In addition, bits are saved in the following example by assuming that
   BFR1 only needs to be BFIR but not BFER or transit BFR.

   BIER-TE Topology:

      Diagram:

                    p1  p3  p7
                 ....> BFR3 <....       p5
             ........            ........>
       BFR1          (Rtr2)   (Rtr5)      BFR6
             ........            ........>
                 ....> BFR4 <....       p6
                    p2  p4  p8

   (simplified) BIER-TE Bit Index Forwarding Tables (BIFT):

      BFR1:   p1  -> forward_routed to BFR3
              p2  -> forward_routed to BFR4

      BFR3:   p3  -> local_decap
              p5  -> forward_routed to BFR6

      BFR4:   p4  -> local_decap
              p6  -> forward_routed to BFR6

      BFR6:   p5  -> local_decap
              p6  -> local_decap
              p7  -> forward_routed to BFR3
              p8  -> forward_routed to BFR4

                Figure 2: BIER-TE basic overlay example

   To send a BIER-TE packet from BFR1 via BFR3 to BFR6, the bitstring is
   (p1,p5).  From BFR1 via BFR4 to BFR6 it is (p2,p6).  A packet from
   BFR1 to BFR3, BFR4 and BFR6 can use (p1,p2,p3,p4,p5) or
   (p1,p2,p3,p4,p6), or via BFR6 (p2,p3,p4,p6,p7) or (p1,p3,p4,p5,p8).

1.2.  BIER-TE Topology and adjacencies

   The key new component in BIER-TE to control where replication can or
   should happen and how to minimize the required BPs for segments is -
   as shown in these two examples - the BIER-TE topology.

   The BIER-TE Topology effectively consists of the BIFTs of all the
   BFRs and can also be expressed in a diagram as a graph where the
   edges are the adjacencies between the BFRs.  Adjacencies are
   naturally unidirectional.  A BP can be reused across multiple
   adjacencies as long as this does not lead to undesired duplicates or
   loops as explained further down in the text.

   If the BIER-TE topology represents the underlying (layer 2) topology
   of the network, this is called "native" BIER-TE as shown in the first
   example.  This can be freely mixed with "overlay" BIER-TE, in which
   "forward_routed" adjacencies are used.

1.3.  Comparison with BIER

   The key differences over BIER are:

   o  BIER-TE replaces in-network autonomous path calculation by
      explicit paths calculated off-path by the BIER-TE controller host.

   o  In BIER-TE every BitPosition of the BitString of a BIER-TE packet
      indicates one or more adjacencies - instead of a BFER as in BIER.

   o  BIER-TE in each BFR has no routing table but only a Bit Index
      Forwarding Table (BIFT) indexed by SI:BitPosition and populated
      with only those adjacencies to which the BFR should replicate
      packets.

   BIER-TE headers use the same format as BIER headers.

   BIER-TE forwarding does not require/use the BFIR-ID.  The BFIR-ID can
   still be useful though for coordinated BFIR/BFER functions, such as
   the context for upstream assigned labels for MPLS payloads in MVPN
   over BIER-TE.

   If the BIER-TE domain is also running BIER, then the BFIR-ID in BIER-
   TE packets can be set to the same BFIR-ID as used with BIER packets.

   If the BIER-TE domain is not running full BIER or wants to reduce the
   need to allocate bits in BIER bitstrings for BFIR-ID values, then the
   allocation of BFIR-ID values in BIER-TE packets can be done through
   other mechanisms outside the scope of this document, as long as this
   is appropriately agreed upon between all BFIR/BFER.

1.4.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

2.  Layering

   End to end BIER-TE operations consist of four layers: The "Multicast
   Flow Overlay", the "BIER-TE Controller Host", the "Routing Underlay"
   and the "BIER-TE forwarding layer".  The BIER-TE Controller Host is
   the new architectural element in BIER-TE compared to BIER.

   Picture 2: Layers of BIER-TE

                   <------BGP/PIM----->
      |<-IGMP/PIM->  multicast flow   <-PIM/IGMP->|
                        overlay

       [BIER-TE Controller Host] <=> [BIER-TE Topology]
                   ^      ^     ^
                  /       |      \   BIER-TE control protocol
                  |       |       |  e.g.: NETCONF/RESTCONF/YANG
                  v       v       v
    Src -> Rtr1 -> BFIR-----BFR-----BFER -> Rtr2 -> Rcvr

                   |--------------------->|
                   BIER-TE forwarding layer

                   |<- BIER-TE domain-->|

                   |<--------------------->|
                       Routing underlay

                    Figure 3: BIER-TE architecture

2.1.  The Multicast Flow Overlay

   The Multicast Flow Overlay operates as in BIER.  See [RFC8279].
   Instead of interacting with the BIER forwarding layer (as in BIER),
   it interacts with the BIER-TE Controller Host.

2.2.  The BIER-TE Controller Host

   The BIER-TE controller host represents the control plane of BIER-TE.
   It communicates two sets of information with BFRs:

   During bring-up or modifications of the network topology, the
   controller discovers the network topology and creates the BIER-TE
   topology from it: it determines which adjacencies are required or
   desired and assigns BitPositions to them.  Then it signals the
   resulting BitPositions and their adjacencies to each BFR to set up
   their BIER-TE BIFTs.

   During day-to-day operations of the network, the controller signals
   to BFIRs what multicast flows are mapped to what BitStrings.

   Communication between the BIER-TE controller host and BFRs is ideally
   via standardized protocols and data models such as NETCONF/RESTCONF/
   YANG.  This is currently outside the scope of this document.  Vendor-
   specific CLI on the BFRs is also a possible stopgap option (as in
   many other SDN solutions lacking definition of a standardized data
   model).

   For simplicity, the procedures of the BIER-TE controller host are
   described in this document as if it is a single, centralized
   automated entity, such as an SDN controller.
   It could equally be an operator setting up CLI on the BFRs.
   Distribution of the functions of the BIER-TE controller host is
   currently outside the scope of this document.

2.2.1.  Assignment of BitPositions to adjacencies of the network
        topology

   The BIER-TE controller host tracks the BFR topology of the BIER-TE
   domain.  It determines what adjacencies require BitPositions so that
   BIER-TE explicit paths can be built through them as desired by
   operator policy.

   The controller then pushes the BitPositions/adjacencies to the BIFT
   of the BFRs, populating into the BIFT of each BFR only those
   SI:BitPositions to which that BFR should be able to send packets -
   adjacencies connecting to this BFR.

2.2.2.  Changes in the network topology

   If the network topology changes (not failure based) so that
   adjacencies that are assigned to BitPositions are no longer needed,
   the controller can re-use those BitPositions for new adjacencies.
   First, these BitPositions need to be removed from any BFIR flow state
   and BFR BIFT state, then they can be repopulated, first into the
   BIFTs and then into the BFIRs.

2.2.3.  Set up per-multicast flow BIER-TE state

   The BIER-TE controller host tracks the multicast flow overlay to
   determine what multicast flow needs to be sent by a BFIR to which set
   of BFERs.  It calculates the desired distribution tree across the
   BIER-TE domain based on algorithms outside the scope of this document
   (e.g.: CSPF, Steiner Tree,...).  It then pushes the calculated
   BitString into the BFIR.

2.2.4.  Link/Node Failures and Recovery

   When links or nodes fail or recover in the topology, BIER-TE can
   quickly respond with the optional FRR procedures described in
   [I-D.eckert-bier-te-frr].  It can also more slowly react by
   recalculating the BitStrings of affected multicast flows.
   This reaction is slower than the FRR procedure because the controller
   needs to receive link/node up/down indications, recalculate the
   desired BitStrings and push them down into the BFIRs.  With FRR, this
   is all performed locally on a BFR receiving the adjacency up/down
   notification.

2.3.  The BIER-TE Forwarding Layer

   When the BIER-TE Forwarding Layer receives a packet, it simply looks
   up the BitPositions that are set in the BitString of the packet in
   the Bit Index Forwarding Table (BIFT) that was populated by the BIER-
   TE controller host.  For every BP that is set in the BitString, and
   that has one or more adjacencies in the BIFT, a copy is made
   according to the type of adjacencies for that BP in the BIFT.  Before
   sending any copy, the BFR resets all BitPositions in the BitString of
   the packet to which it can create a copy.  This is done to prevent
   packets from looping.

2.4.  The Routing Underlay

   BIER-TE sends BIER packets to directly connected BIER-TE neighbors as
   L2 (unicasted) BIER packets without requiring a routing underlay.
   BIER-TE forwarding uses the Routing underlay for forward_routed
   adjacencies which copy BIER-TE packets to not-directly-connected BFRs
   (see below for adjacency definitions).

   If the BFR intends to support FRR for BIER-TE, then the BIER-TE
   forwarding plane needs to receive fast adjacency up/down
   notifications: Link up/down or neighbor up/down, e.g.: from BFD.
   Providing these notifications is considered to be part of the routing
   underlay in this document.

3.  BIER-TE Forwarding

3.1.  The Bit Index Forwarding Table (BIFT)

   The Bit Index Forwarding Table (BIFT) exists in every BFR.  For every
   subdomain in use, it is a table indexed by SI:BitPosition and is
   populated by the BIER-TE control plane.  Each index can be empty or
   contain a list of one or more adjacencies.

   BIER-TE can support multiple subdomains like BIER.
   Each one has a separate BIFT.

   In the BIER architecture, indices into the BIFT are explained to be
   both BFR-id and SI:BitString (BitPosition).  This is because there is
   a 1:1 relationship between BFR-id and SI:BitString - every bit in
   every SI is/can be assigned to a BFIR/BFER.  In BIER-TE there are
   more bits used in each BitString than there are BFIR/BFER assigned to
   the bitstring.  This is because of the bits required to express the
   (traffic engineered) path through the topology.  The BIER-TE
   forwarding definitions therefore do not use the term BFR-id at all.
   Instead, BFR-ids are only used as required by the routing underlay,
   the flow overlay or BIER headers.  Please refer to Section 7 for
   explanations of how to deal with SI, subdomains and BFR-id in
   BIER-TE.

     ------------------------------------------------------------------
     | Index:          |  Adjacencies:                                |
     | SI:BitPosition  |  <empty> or one or more per entry            |
     ==================================================================
     | 0:1             |  forward_connected(interface,neighbor,DNR)   |
     ------------------------------------------------------------------
     | 0:2             |  forward_connected(interface,neighbor,DNR)   |
     |                 |  forward_connected(interface,neighbor,DNR)   |
     ------------------------------------------------------------------
     | 0:3             |  local_decap({VRF})                          |
     ------------------------------------------------------------------
     | 0:4             |  forward_routed({VRF,}l3-neighbor)           |
     ------------------------------------------------------------------
     | 0:5             |  <empty>                                     |
     ------------------------------------------------------------------
     | 0:6             |  ECMP({adjacency1,...adjacencyN}, seed)      |
     ------------------------------------------------------------------
     ...
     | BitStringLength |  ...                                         |
     ------------------------------------------------------------------
                        Bit Index Forwarding Table

                       Figure 4: BIFT adjacencies

   The BIFT is programmed into the data plane of BFRs by the BIER-TE
   controller host and used to forward packets, according to the rules
   specified in the BIER-TE Forwarding Procedures.

   A BP that is populated in more than one BFR by the controller does
   not need to have the same adjacencies on each of those BFRs.  This is
   up to the controller.  BPs for p2p links are one case (see below).

3.2.  Adjacency Types

3.2.1.  Forward Connected

   A "forward_connected" adjacency is towards a directly connected BFR
   neighbor using an interface address of that BFR on the connecting
   interface.  A forward_connected adjacency does not route packets but
   only L2 forwards them to the neighbor.

   Packets sent to an adjacency with "DoNotReset" (DNR) set in the BIFT
   will not have the BitPosition for that adjacency reset when the BFR
   creates a copy for it.  The BitPosition will still be reset for
   copies of the packet made towards other adjacencies.  This can be
   used for example in ring topologies as explained below.

3.2.2.  Forward Routed

   A "forward_routed" adjacency is an adjacency towards a BFR that is
   not a forward_connected adjacency: towards a loopback address of a
   BFR or towards an interface address that is non-directly connected.
   Forward_routed packets are forwarded via the Routing Underlay.

   If the Routing Underlay has multiple paths for a forward_routed
   adjacency, it will perform ECMP independent of BIER-TE for packets
   forwarded across a forward_routed adjacency.

   If the Routing Underlay has FRR, it will perform FRR independent of
   BIER-TE for packets forwarded across a forward_routed adjacency.

3.2.3.  ECMP

   The ECMP mechanisms in BIER are tied to the BIER BIFT and are
   therefore not directly usable with BIER-TE.
   The following procedures describe ECMP for BIER-TE that we consider
   to be lightweight but also well manageable.  It leverages the
   existing entropy parameter in the BIER header to keep packets of the
   flows on the same path and it introduces a "seed" parameter to allow
   engineering traffic to be polarized or randomized across multiple
   hops.

   An "Equal Cost Multipath" (ECMP) adjacency has a list of two or more
   adjacencies included in it.  It copies the BIER-TE packet to one of
   those adjacencies based on the ECMP hash calculation.  The BIER-TE
   ECMP hash algorithm must select the same adjacency from that list for
   all packets with the same "entropy" value in the BIER-TE header if
   the same number of adjacencies and same seed are given as parameters.
   Further use of the seed parameter is explained below.

3.2.4.  Local Decap

   A "local_decap" adjacency passes a copy of the payload of the BIER-TE
   packet to the packet's NextProto within the BFR (IPv4/IPv6,
   Ethernet,...).  A local_decap adjacency turns the BFR into a BFER for
   matching packets.  Local_decap adjacencies require the BFER to
   support routing or switching for NextProto to determine how to
   further process the packet.

3.3.  Encapsulation considerations

   Specifications for BIER-TE encapsulation are outside the scope of
   this document.  This section gives explanations and guidelines.

   Because a BFR needs to interpret the BitString of a BIER-TE packet
   differently from a BIER packet, it is necessary to distinguish BIER
   from BIER-TE packets.  This is subject to definitions in BIER
   encapsulation specifications.

   MPLS encapsulation [RFC8296] for example assigns one label by which
   BFRs recognize BIER packets for every (SI,subdomain) combination.
   If it is desirable that every subdomain can forward only BIER or
   BIER-TE packets, then the label allocation could stay the same, and
   only the forwarding model (BIER/BIER-TE) would have to be defined per
   subdomain.  If it is desirable to support both BIER and BIER-TE
   forwarding in the same subdomain, then additional labels would need
   to be assigned for BIER-TE forwarding.

   "forward_routed" requires an encapsulation that permits unicasting
   BIER-TE packets to a specific interface address on a target BFR.
   With MPLS encapsulation, this can simply be done via a label stack
   with that address's label as the top label - followed by the label
   assigned to (SI,subdomain) - and if necessary (see above) BIER-TE.
   With non-MPLS encapsulation, some form of IP tunneling (IP in IP,
   LISP, GRE) would be required.

   The encapsulation used for "forward_routed" adjacencies can equally
   support existing advanced adjacency information such as "loose source
   routes" via e.g.: MPLS label stacks or appropriate header extensions
   (e.g.: for IPv6).

3.4.  Basic BIER-TE Forwarding Example

   This section gives a step by step example of basic BIER-TE
   forwarding.  It does not use ECMP or forward_routed adjacencies, nor
   does it try to minimize the number of required BitPositions for the
   topology.

                  [Bier-Te Controller Host]
                       /   |    \
                      v    v     v

       | p13   p1 |
       +- BFIR2 --+          |
       |          | p2   p6  |           LAN2
       |          +-- BFR3 --+           |
       |          |          |  p7  p11  |
   Src -+                    +-- BFER1 --+
       |          | p3   p8  |           |
       |          +-- BFR4 --+           +-- Rcv1
       |          |          |           |
       |          |
       | p14   p4 |
       +- BFIR1 --+          |
       |          +-- BFR5 --+ p10  p12  |
     LAN1         | p5   p9  +-- BFER2 --+
                  |                      +-- Rcv2
                  |
                 LAN3

       IP  |..... BIER-TE network......| IP

                 Figure 5: BIER-TE Forwarding Example

   pXX indicates the BitPosition number assigned by the BIER-TE
   controller host to adjacencies in the BIER-TE topology.  For example,
   p9 is the adjacency towards BFR5 on the LAN connecting to BFER2.
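
   The pXX notation maps directly onto bits of the BitString.  The
   following Python sketch is not part of this specification.  It
   assumes the bit numbering of [RFC8279], in which BitPosition 1 is
   the least significant bit, and all helper names are hypothetical.
   It shows how a list of BitPositions could be encoded as a BitString
   and how a BFR could reset BitPositions before sending copies.

```python
# Illustrative sketch only; helper names are hypothetical.
# Assumption: BitPosition 1 is the least significant bit of the BitString.

def bitstring(bps):
    """Encode an iterable of BitPosition numbers as an integer BitString."""
    value = 0
    for bp in bps:
        value |= 1 << (bp - 1)
    return value


def bit_positions(value):
    """Decode an integer BitString into a sorted list of BitPositions."""
    return [i + 1 for i in range(value.bit_length()) if (value >> i) & 1]


def reset(value, bps):
    """Clear the given BitPositions, as a BFR does before forwarding."""
    return value & ~bitstring(bps)
```

   For instance, under these assumptions bit_positions(reset(
   bitstring([2, 5, 7]), [2])) gives the list [5, 7].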
643 BIFT BFIR2: 644 p13: local_decap() 645 p2: forward_connected(BFR3) 647 BIFT BFR3: 648 p1: forward_connected(BFIR2) 649 p7: forward_connected(BFER1) 650 p8: forward_connected(BFR4) 652 BIFT BFER1: 653 p11: local_decap() 654 p6: forward_connected(BFR3) 655 p8: forward_connected(BFR4) 657 Figure 6: BIER-TE Forwarding Example Adjacencies 659 ...and so on. 661 Traffic needs to flow from BFIR2 towards Rcv1, Rcv2. The controller 662 determines it wants it to pass across the following paths: 664 -> BFER1 ---------------> Rcv1 665 BFIR2 -> BFR3 666 -> BFR4 -> BFR5 -> BFER2 -> Rcv2 668 Figure 7: BIER-TE Forwarding Example Paths 670 These paths equal to the following BitString: p2, p5, p7, p8, p10, 671 p11, p12. 673 This BitString is set up in BFIR2. Multicast packets arriving at 674 BFIR2 from Src are assigned this BitString. 676 BFIR2 forwards based on that BitString. It has p2 and p13 populated. 677 Only p13 is in BitString which has an adjacency towards BFR3. BFIR2 678 resets p2 in BitString and sends a copy towards BFR2. 680 BFR3 sees a BitString of p5,p7,p8,p10,p11,p12. It is only interested 681 in p1,p7,p8. It creates a copy of the packet to BFER1 (due to p7) 682 and one to BFR4 (due to p8). It resets p7, p8 before sending. 684 BFER1 sees a BitString of p5,p10,p11,p12. It is only interested in 685 p6,p7,p8,p11 and therefore considers only p11. p11 is a "local_decap" 686 adjacency installed by the BIER-TE controller host because BFER1 687 should pass packets to IP multicast. The local_decap adjacency 688 instructs BFER1 to create a copy, decapsulate it from the BIER header 689 and pass it on to the NextProtocol, in this example IP multicast. IP 690 multicast will then forward the packet out to LAN2 because it did 691 receive PIM or IGMP joins on LAN2 for the traffic. 693 Further processing of the packet in BFR4, BFR5 and BFER2 accordingly. 695 3.5. 
Forwarding comparison with BIER

697 Forwarding of BIER-TE is designed to allow common forwarding hardware
698 with BIER. In fact, one of the main goals of this document is to
699 encourage the building of forwarding hardware that can not only
700 support BIER, but also BIER-TE - to allow experimentation with BIER-
701 TE and support building of BIER-TE control plane code.

703 The pseudocode in Section 6 shows how existing BIER/BIFT forwarding
704 can be amended to support basic BIER-TE forwarding, by using BIER
705 BIFT's F-BM. Only the masking of bits to avoid duplicates must
706 be skipped when forwarding is for BIER-TE.

708 Whether to use BIER or BIER-TE forwarding can simply be a configured
709 choice per subdomain and accordingly be set up by a BIER-TE
710 controller host. The BIER packet encapsulation [RFC8296] too can be
711 reused without changes except that the currently defined BIER-TE ECMP
712 adjacency does not leverage the entropy field so that field would be
713 unused when BIER-TE forwarding is used.

715 3.6. Requirements

717 Basic BIER-TE forwarding MUST support configuring subdomains to use
718 basic BIER-TE forwarding rules (instead of BIER). With basic BIER-TE
719 forwarding, every bit MUST support having zero or one adjacency. It
720 MUST support the adjacency types forward_connected without DNR flag,
721 forward_routed and local_decap. All other BIER-TE forwarding
722 features are optional. These basic BIER-TE requirements make BIER-TE
723 forwarding exactly the same as BIER forwarding with the exception of
724 skipping the aforementioned F-BM masking on egress.

726 BIER-TE forwarding SHOULD support the DNR flag, as this is highly
727 useful to save bits in rings (see Section 4.6).

729 BIER-TE forwarding MAY support more than one adjacency on a bit and
730 ECMP adjacencies.
The importance of ECMP adjacencies is unclear when
731 traffic engineering is used because it may be more desirable to
732 explicitly steer traffic across non-ECMP paths to make per-path
733 traffic calculation easier for controllers. Having more than one
734 adjacency for a bit allows further savings of bits in hub&spoke
735 scenarios, but unlike rings it is less "natural" to flood traffic
736 across multiple links unconditionally. Both ECMP and multiple
737 adjacencies are forwarding plane features that should be possible to
738 support later when needed as they do not impact the basic BIER-TE
739 replication loop. This is true because there is no inter-copy
740 dependency through resetting of F-BM as in BIER.

742 4. BIER-TE Controller Host BitPosition Assignments

744 This section describes how the BIER-TE controller host can use the
745 different BIER-TE adjacency types to define the BitPositions of a
746 BIER-TE domain.

748 Because the size of the BitString limits the size of the BIER-TE
749 domain, many of the options described exist to support larger
750 topologies with fewer BitPositions (4.1, 4.3, 4.4, 4.5, 4.6, 4.7,
751 4.8).

753 4.1. P2P Links

755 Each p2p link in the BIER-TE domain is assigned one unique
756 BitPosition with a forward_connected adjacency pointing to the
757 neighbor on the p2p link.

759 4.2. BFER

761 Every BFER is given a unique BitPosition with a local_decap
762 adjacency.

764 4.3. Leaf BFERs

766 Leaf BFERs are BFERs where incoming BIER-TE packets never need to be
767 forwarded to another BFR but are only sent to the BFER to exit the
768 BIER-TE domain. For example, in networks where PEs are spokes
769 connected to P routers, those PEs are Leaf BFERs unless there is a
770 U-turn between two PEs.

772 All leaf BFERs in a BIER-TE domain can share a single BitPosition.
773 This is possible because the BitPosition for the adjacency to reach
774 the BFER can be used to distinguish whether or not packets should
775 reach the BFER.
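The bit savings from sharing one BitPosition among leaf BFERs can be estimated with simple counting. The following Python sketch applies the rules of sections 4.1 to 4.3 to a hypothetical topology: 4 P routers in a full mesh, and 20 PEs that are each attached to one P router by a single p2p link and act as leaf BFERs. The function name, its parameters and the topology are assumptions made for illustration only; they are not defined by this document.

```python
def bitpositions_needed(p2p_links, non_leaf_bfers, leaf_bfers,
                        share_leaf_bit=True):
    """Count BitPositions under the assignment rules of 4.1 to 4.3."""
    link_bits = p2p_links                   # 4.1: one BP per p2p link
    bfer_bits = non_leaf_bfers              # 4.2: one local_decap BP per BFER
    if share_leaf_bit:
        leaf_bits = 1 if leaf_bfers else 0  # 4.3: all leaf BFERs share one BP
    else:
        leaf_bits = leaf_bfers              # 4.2 applied to every leaf BFER
    return link_bits + bfer_bits + leaf_bits


# Hypothetical topology: 6 core links (full mesh of 4 P) plus 20 PE uplinks.
LINKS = 6 + 20
without_sharing = bitpositions_needed(LINKS, 0, 20, share_leaf_bit=False)
with_sharing = bitpositions_needed(LINKS, 0, 20, share_leaf_bit=True)
print(without_sharing, with_sharing)
```

Under these assumptions the count drops from 46 to 27 BitPositions, because the 20 per-BFER local_decap bits collapse into one shared bit while the link bits stay unchanged.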
777 This optimization will not work if an upstream interface of the BFER
778 is using a BitPosition optimized as described in the following two
779 sections (LAN, Hub and Spoke).

781 4.4. LANs

783 In a LAN, the adjacency to each neighboring BFR on the LAN is given a
784 unique BitPosition. The adjacency of this BitPosition is a
785 forward_connected adjacency towards the BFR and this BitPosition is
786 populated into the BIFT of all the other BFRs on that LAN.

788           BFR1
789            |p1
790     LAN1-+-+---+-----+
791        p3|   p4|   p2|
792         BFR3  BFR4  BFR7

794 Figure 8: LAN Example

796 If bandwidth on the LAN is not an issue and most BIER-TE traffic
797 should be copied to all neighbors on a LAN, then BitPositions can be
798 saved by assigning just a single BitPosition to the LAN and
799 populating the BitPosition of the BIFTs of each BFR on the LAN with
800 a list of forward_connected adjacencies to all other neighbors on the
801 LAN.

803 This optimization does not work in the face of BFRs redundantly
804 connected to more than one LAN with this optimization because these
805 BFRs would receive duplicates and forward those duplicates into the
806 opposite LANs. Adjacencies of such BFRs into their LANs still need a
807 separate BitPosition.

809 4.5. Hub and Spoke

811 In a setup with a hub and multiple spokes connected via separate p2p
812 links to the hub, all p2p links can share the same BitPosition. The
813 BitPosition on the hub's BIFT is set up with a list of
814 forward_connected adjacencies, one for each spoke.

816 This option is similar to the BitPosition optimization in LANs:
817 Redundantly connected spokes need their own BitPositions.

819 4.6. Rings

821 In L3 rings, instead of assigning a single BitPosition for every p2p
822 link in the ring, it is possible to save BitPositions by setting the
823 "Do Not Reset" (DNR) flag on forward_connected adjacencies.
825 For the rings shown in the following picture, a single BitPosition
826 will suffice to forward traffic entering the ring at BFRa or BFRb all
827 the way up to BFR1:

829 On BFRa, BFRb, BFR30,... BFR3, the BitPosition is populated with a
830 forward_connected adjacency pointing to the clockwise neighbor on the
831 ring and with DNR set. On BFR2, the adjacency also points to the
832 clockwise neighbor BFR1, but without DNR set.

834 Handling DNR this way ensures that copies forwarded from any BFR in
835 the ring to a BFR outside the ring will not have the ring BitPosition
836 set, therefore minimizing the chance to create loops.

838                v         v
839                |         |
840         L1     |   L2    |           L3
841     /-------- BFRa ---- BFRb --------------------\
842     |                                            |
843     \- BFR1 - BFR2 - BFR3 - ... - BFR29 - BFR30 -/
844         |             | L4         |       |
845      p33|                       p15|
846        BFRd                       BFRc

848 Figure 9: Ring Example

850 Note that this example only permits packets to enter the ring at
851 BFRa and BFRb, and that packets will always travel clockwise. If
852 packets should be allowed to enter the ring at any ring BFR, then one
853 would have to use two ring BitPositions: one for clockwise, one for
854 counterclockwise.

856 Both would be set up to stop rotating on the same link, e.g. L1. When
857 the ingress ring BFR creates the clockwise copy, it will reset the
858 counterclockwise BitPosition because the DNR bit only applies to the
859 bit for which the replication is done. Likewise for the clockwise
860 BitPosition for the counterclockwise copy. As a result, the ring
861 ingress BFR will send a copy in both directions, serving BFRs on
862 either side of the ring up to L1.

864 4.7. Equal Cost MultiPath (ECMP)

866 The ECMP adjacency allows the use of just one BP per link bundle
867 between two BFRs instead of one BP for each p2p member link of that
868 link bundle.
In the following picture, one BP is used across L1,L2,L3 and
869 BFR1/BFR2 each have an ECMP adjacency for that BP:
870             --L1-----
871        BFR1 --L2----- BFR2
872             --L3-----

874 BIFT entry in BFR1:
875 ------------------------------------------------------------------
876 | Index | Adjacencies                                            |
877 ==================================================================
878 | 0:6   | ECMP({L1-to-BFR2,L2-to-BFR2,L3-to-BFR2}, seed)         |
879 ------------------------------------------------------------------

881 BIFT entry in BFR2:
882 ------------------------------------------------------------------
883 | Index | Adjacencies                                            |
884 ==================================================================
885 | 0:6   | ECMP({L1-to-BFR1,L2-to-BFR1,L3-to-BFR1}, seed)         |
886 ------------------------------------------------------------------

888 Figure 10: ECMP Example

890 In the following example, all traffic from BFR1 towards BFR10 is
891 intended to be ECMP load split equally across the topology. This
892 example is not meant as a likely setup, but to illustrate that ECMP
893 can be used to share BPs not only across link bundles, and it
894 explains the use of the seed parameter.
896                   BFR1
897                  /    \
898                 /L11   \L12
899              BFR2       BFR3
900             /   \      /   \
901            /L21  \L22 /L31  \L32
902         BFR4   BFR5  BFR6   BFR7
903           \    /       \    /
904            \  /         \  /
905            BFR8         BFR9
906               \          /
907                \        /
908                  BFR10

910 BIFT entry in BFR1:
911 ------------------------------------------------------------------
912 | 0:6   | ECMP({L11-to-BFR2,L12-to-BFR3}, seed)                  |
913 ------------------------------------------------------------------

915 BIFT entry in BFR2:
916 ------------------------------------------------------------------
917 | 0:6   | ECMP({L21-to-BFR4,L22-to-BFR5}, seed)                  |
918 ------------------------------------------------------------------

920 BIFT entry in BFR3:
921 ------------------------------------------------------------------
922 | 0:6   | ECMP({L31-to-BFR6,L32-to-BFR7}, seed)                  |
923 ------------------------------------------------------------------

925 Figure 11: Polarization Example

927 With the setup of ECMP in the above topology, traffic would not be
928 equally load-split. Instead, links L22 and L31 would see no traffic
929 at all: BFR2 will only see traffic from BFR1 for which the ECMP hash
930 in BFR1 selected the first adjacency in a list of 2 adjacencies: link
931 L11-to-BFR2. When forwarding in BFR2 again performs an ECMP with two
932 adjacencies on that subset of traffic, then it will again select the
933 first of its two adjacencies: L21-to-BFR4. Therefore L22 and BFR5
934 see no traffic.

936 To resolve this issue, the ECMP adjacency on BFR1 simply needs to be
937 set up with a different seed than the ECMP adjacencies on BFR2/BFR3.

939 This issue is called polarization. It depends on the ECMP hash. It
940 is possible to build ECMP that does not have polarization, for
941 example by taking entropy from the actual adjacency members into
942 account, but that can make it harder to achieve evenly balanced load-
943 splitting on all BFRs without making the ECMP hash algorithm
944 potentially too complex for fast forwarding in the BFRs.

946 4.8. Routed adjacencies

948 4.8.1.
Reducing BitPositions

950 Routed adjacencies can reduce the number of BitPositions required
951 when the traffic engineering requirement is not hop-by-hop explicit
952 path selection, but loose-hop selection.

954            ...............               ...............
955    BFR1--... Redundant ...--L1-- BFR2... Redundant ...---
956       \--... Network   ...--L2--/    ... Network   ...---
957    BFR4--... Segment 1 ...--L3-- BFR3... Segment 2 ...---
958            ...............               ...............

960 Figure 12: Routed Adjacencies Example

962 Assume the requirement in the above network is to explicitly engineer
963 paths such that specific traffic flows are passed from segment 1 to
964 segment 2 via link L1 (or via L2 or via L3).

966 To achieve this, BFR1 and BFR4 are set up with a forward_routed
967 adjacency BitPosition towards an address of BFR2 on link L1 (or on
968 link L2, or of BFR3 on link L3).

970 For paths to be engineered through a specific node BFR2 (or BFR3),
971 BFR1 and BFR4 are set up with a forward_routed adjacency
972 BitPosition towards a loopback address of BFR2 (or BFR3).

974 4.8.2. Supporting nodes without BIER-TE

976 Routed adjacencies also enable incremental deployment of BIER-TE.
977 Only the nodes through which BIER-TE traffic needs to be steered -
978 with or without replication - need to support BIER-TE. Where they
979 are not directly connected to each other, forward_routed adjacencies
980 are used to pass over non BIER-TE enabled nodes.

982 5. Avoiding loops and duplicates

984 5.1. Loops

986 Whenever BIER-TE creates a copy of a packet, the BitString of that
987 copy will have all BitPositions cleared that are associated with
988 adjacencies in the BFR. This inhibits looping of packets. The only
989 exceptions are adjacencies with DNR set.

991 With DNR set, looping can happen. Consider in the ring picture that
992 link L4 from BFR3 is plugged into the L1 interface of BFRa.
This
993 creates a loop where the ring's clockwise BitPosition is never reset
994 for copies of the packets traveling clockwise around the ring.

996 To inhibit looping in the face of such physical misconfiguration,
997 only forward_connected adjacencies are permitted to have DNR set, and
998 the link layer destination address of the adjacency (e.g. MAC
999 address) protects against closing the loop. Link layers without port
1000 unique link layer addresses should not be used with the DNR flag set.

1002 5.2. Duplicates

1004 Duplicates happen when the topology of the BitString is not a tree
1005 but redundantly connects BFRs with each other. The controller must
1006 therefore ensure that it only creates BitStrings that are trees in
1007 the topology.

1009 When links are incorrectly physically re-connected before the
1010 controller updates BitStrings in BFIRs, duplicates can happen. Like
1011 loops, these can be inhibited by link layer addressing in
1012 forward_connected adjacencies.

1014 If interface or loopback addresses used in forward_routed adjacencies
1015 are moved from one BFR to another, duplicates can equally happen.
1016 Such re-addressing operations must be coordinated with the
1017 controller.

1019 6. BIER-TE Forwarding Pseudocode

1021 The following simplified pseudocode for BIER-TE forwarding uses the
1022 BIER forwarding pseudocode of [RFC8279], section 6.5 with the one
1023 modification necessary to support basic BIER-TE forwarding. Like the
1024 BIER forwarding pseudocode, for simplicity it hides the details
1025 of the adjacency processing inside PacketSend(), which can be
1026 forward_connected, forward_routed or local_decap.
1028 void ForwardBitMaskPacket_withTE (Packet)
1029 {
1030     SI=GetPacketSI(Packet);
1031     Offset=SI*BitStringLength;
1032     for (Index = GetFirstBitPosition(Packet->BitString); Index ;
1033          Index = GetNextBitPosition(Packet->BitString, Index)) {
1034         F-BM = BIFT[Index+Offset]->F-BM;
1035         if (!F-BM) continue;
1036         BFR-NBR = BIFT[Index+Offset]->BFR-NBR;
1037         PacketCopy = Copy(Packet);
1038         PacketCopy->BitString &= F-BM;                  [2]
1039         PacketSend(PacketCopy, BFR-NBR);
1040         // The following must not be done for BIER-TE:
1041         // Packet->BitString &= ~F-BM;                  [1]
1042     }
1043 }

1045 Figure 13: Simplified BIER-TE Forwarding Pseudocode

1047 The difference is that in BIER-TE, step [1] must not be performed.

1049 In BIER, this step is necessary to avoid duplicates when two or more
1050 BFER are reachable via the same neighbor. The F-BM of all those BFER
1051 bits will indicate each other's bits, and step [1] will reset all
1052 these bits on the first copy made for the first of those BFER bits
1053 set in the BitString, hence skipping any further copies to that
1054 neighbor.

1056 Whereas in BIER, the F-BM of bits toward a specific neighbor contains
1057 only the bits of those BFER destined to be forwarded across this
1058 neighbor, in BIER-TE the F-BM for a neighbor needs to have all bits
1059 set except all those bits that are actual (non-empty) adjacencies of
1060 this BFR. Step [2] will reset those adjacency bits to avoid loops,
1061 but all the other bits that are not adjacencies of this BFR need to
1062 stay untouched by [2] so that they can be processed by further BFR
1063 along the path. If [1] was performed as in BIER, then those non-
1064 adjacency bits would erroneously get reset during replication.
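The F-BM rule described above can be tried out with a small Python sketch. It models a BitString as an integer (p1 being the least significant bit), builds the BIER-TE F-BM of a BFR as "all bits except this BFR's adjacency bits", and applies only step [2] per copy. The helper names (bp, build_bift, forward_te) and the BitString length of 16 are assumptions for illustration; DNR is not covered. The data is BFR3 from the forwarding example in section 3.4.

```python
def bp(*positions):
    """Build an integer BitString with the given 1-based BitPositions set."""
    bits = 0
    for p in positions:
        bits |= 1 << (p - 1)
    return bits


def build_bift(adjacencies, bitstring_length):
    """Return {BitPosition: (F-BM, neighbor)} for one BFR.

    adjacencies maps BitPosition -> neighbor name. The F-BM has every bit
    set except the bits that are adjacencies on this BFR.
    """
    all_bits = (1 << bitstring_length) - 1
    fbm = all_bits & ~bp(*adjacencies)
    return {p: (fbm, nbr) for p, nbr in adjacencies.items()}


def forward_te(bitstring, bift):
    """Apply step [2] to every copy and skip step [1].

    Returns a list of (neighbor, BitString of the copy).
    """
    copies = []
    for p in sorted(bift):
        if bitstring & bp(p):
            fbm, nbr = bift[p]
            copies.append((nbr, bitstring & fbm))
    return copies


# BFR3 from the example: p1 -> BFIR2, p7 -> BFER1, p8 -> BFR4.
bfr3 = build_bift({1: "BFIR2", 7: "BFER1", 8: "BFR4"}, bitstring_length=16)
copies = forward_te(bp(5, 7, 8, 10, 11, 12), bfr3)
for neighbor, bits in copies:
    print(neighbor, bin(bits))
```

Both copies (to BFER1 and to BFR4) carry p5, p10, p11 and p12, which matches the BitString that BFER1 sees in the example of section 3.4. Because the packet's own BitString is never modified, each bit is processed independently of the others.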
1066 To support the DNR (Do Not Reset) flag of forward_connected()
1067 adjacencies, the bit of such an adjacency must also be set in the
1068 F-BM of that same adjacency, so that for the packet copy made for this
1069 adjacency the bit stays on, whereas it will not be set in the F-BM of
1070 other bits so that it will be reset for any other packet copy made.

1072 Eliminating the need to perform [1] also makes processing of bits in
1073 the BIER-TE bitstring independent of processing other bits, which may
1074 also simplify forwarding plane implementations.

1076 The following pseudocode is comprehensive:

1078 o This pseudocode eliminates per-bit F-BM, therefore reducing state
1079   by BitStringLength^2*SI and eliminating the need for per-packet-
1080   copy masking operation except for adjacencies with DNR flag set:

1082   * AdjacentBits[SI] are bits with a non-empty list of adjacencies.
1083     This can be computed whenever the BIER-TE controller host
1084     updates the adjacencies.

1086   * Only the AdjacentBits need to be examined in the loop for
1087     packet copies.

1089   * The AdjacentBits are cleared in the packet's BitString on
1090     ingress, before any copy is made, to avoid packet looping.

1092 o The code loops over the adjacencies because there may be more than
1093   one adjacency for a bit.

1095 o When an adjacency has the DNR bit, the bit is set in the packet
1096   copy (to save bits in rings for example).

1098 o The ECMP adjacency is shown. Its parameters are a
1099   ListOfAdjacencies from which one is picked.

1101 o The forward_connected, forward_routed, local_decap adjacencies are
1102   shown with their parameters.
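As a runnable companion to the points above, the following Python sketch models the same processing: the adjacent bits are computed from the BIFT, cleared once from the BitString, and each adjacency of each set bit yields one copy, with the bit restored only for a DNR adjacency. All names (Adjacency, Ecmp, bit, ecmp_pick, forward) are hypothetical, and CRC32 is only a stand-in for an ECMP hash; this document does not define one. Sending is replaced by returning a list of copies.

```python
import zlib
from collections import namedtuple

# kind: "forward_connected", "forward_routed" or "local_decap"
Adjacency = namedtuple("Adjacency", "kind target dnr", defaults=("", False))
Ecmp = namedtuple("Ecmp", "members seed", defaults=(0,))


def bit(p):
    """Mask for the 1-based BitPosition p."""
    return 1 << (p - 1)


def ecmp_pick(ecmp, entropy):
    """Pick one member; CRC32 is a placeholder hash, not a specified one."""
    h = zlib.crc32(f"{entropy}:{ecmp.seed}".encode())
    return ecmp.members[h % len(ecmp.members)]


def forward(bitstring, entropy, bift):
    """Return [(kind, target, BitString of copy)] for one SI of one BFR.

    bift maps a 1-based BitPosition to a list of Adjacency or Ecmp entries.
    """
    adjacent_bits = 0
    for p, adjs in bift.items():
        if adjs:                               # non-empty adjacency list
            adjacent_bits |= bit(p)
    to_process = bitstring & adjacent_bits     # only these bits are examined
    remaining = bitstring & ~adjacent_bits     # cleared once to avoid loops
    copies = []
    for p in sorted(bift):
        if not to_process & bit(p):
            continue
        for adj in bift[p]:                    # a bit may have several entries
            if isinstance(adj, Ecmp):
                adj = ecmp_pick(adj, entropy)
            copy = remaining
            if adj.kind == "forward_connected" and adj.dnr:
                copy |= bit(p)                 # DNR: this bit stays set
            copies.append((adj.kind, adj.target, copy))
    return copies


# Hypothetical ring BFR: p1 is the ring bit (DNR), p15 leaves the ring.
ring_bfr = {
    1: [Adjacency("forward_connected", "ring-neighbor", dnr=True)],
    15: [Adjacency("forward_connected", "BFRc")],
}
result = forward(bit(1) | bit(15) | bit(20), entropy=7, bift=ring_bfr)
for kind, target, bits in result:
    print(kind, target, bin(bits))
```

In this example the copy that continues around the ring keeps p1 set (together with the untouched p20), while the copy towards BFRc carries only p20, so the ring bit does not leak out of the ring.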
1104 void ForwardBitMaskPacket_withTE (Packet)
1105 {
1106     SI=GetPacketSI(Packet);
1107     Offset=SI*BitStringLength;
1108     AdjacentBitstring = Packet->BitString & AdjacentBits[SI];
1109     Packet->BitString &= ~AdjacentBits[SI];  // clear to avoid loops
1110     for (Index = GetFirstBitPosition(AdjacentBitstring); Index ;
1111          Index = GetNextBitPosition(AdjacentBitstring, Index)) {
1112         foreach adjacency BIFT[Index+Offset] {
1113             if(adjacency == ECMP(ListOfAdjacencies, seed) ) {
1114                 I = ECMP_hash(sizeof(ListOfAdjacencies),
1115                               Packet->Entropy, seed);
1116                 adjacency = ListOfAdjacencies[I];
1117             }
1118             PacketCopy = Copy(Packet);
1119             switch(adjacency) {
1120                 case forward_connected(interface,neighbor,DNR):
1121                     if(DNR)  // keep this bit set; Index is 1-based
1122                         PacketCopy->BitString |= 1<<(Index-1);
1123                     SendToL2Unicast(PacketCopy,interface,neighbor);
1124                     break;
1125                 case forward_routed({VRF},neighbor):
1126                     SendToL3(PacketCopy,{VRF,}neighbor);
1127                     break;
1128                 case local_decap({VRF},neighbor):
1129                     DecapBierHeader(PacketCopy);
1130                     PassTo(PacketCopy,{VRF,}Packet->NextProto);
1131             }
1132         }
1133     }
1134 }

1136 Figure 14: BIER-TE Forwarding Pseudocode

1138 7. Managing SI, subdomains and BFR-ids

1140 When the number of bits required to represent the necessary hops in
1141 the topology and BFER exceeds the supported bitstring length,
1142 multiple SI and/or subdomains must be used. This section discusses
1143 how.

1145 BIER-TE forwarding does not require the concept of BFR-id, but
1146 routing underlay, flow overlay and BIER headers may. This section
1147 also discusses how BFR-id can be assigned to BFIR/BFER for BIER-TE.

1149 7.1. Why SI and sub-domains

1151 For BIER and BIER-TE forwarding, the most important result of using
1152 multiple SI and/or subdomains is the same: Packets that need to be
1153 sent to BFER in different SI or subdomains require different BIER
1154 packets, each one with a bitstring for a different (SI,subdomain)
1155 combination. Each such bitstring uses one bitstring length sized SI
1156 block in the BIFT of the subdomain.
We call this a BIFT:SI (block).

1158 For BIER and BIER-TE forwarding itself there is also no difference
1159 whether different SI and/or sub-domains are chosen, but SI and
1160 subdomain have different purposes in the BIER architecture shared by
1161 BIER-TE. This impacts how operators manage them and how
1162 especially flow overlays will likely use them.

1164 By default, every possible BFIR/BFER in a BIER network would likely
1165 be given a BFR-id in subdomain 0 (unless there are > 64k BFIR/BFER).

1167 If there are different flow services (or service instances) requiring
1168 replication to different subsets of BFER, then it will likely not be
1169 possible to achieve the best replication efficiency for all of these
1170 service instances via subdomain 0. Ideal replication efficiency for
1171 N BFER exists in a subdomain if they are split over not more than
1172 ceiling(N/bitstring-length) SI.

1174 If service instances justify additional BIFT:SI state in the network,
1175 additional subdomains will be used: BFIR/BFER are assigned BFR-id in
1176 those subdomains and each service instance is configured to use the
1177 most appropriate subdomain. This results in improved replication
1178 efficiency for different services.

1180 Even if creation of subdomains and assignment of BFR-id to BFIR/BFER
1181 in those subdomains is automated, it is not expected that individual
1182 service instances can deal with BFER in different subdomains. A
1183 service instance may only support configuration of a single subdomain
1184 it should rely on.

1186 To be able to easily reuse (and modify as little as possible)
1187 existing BIER procedures including flow-overlay and routing underlay,
1188 when BIER-TE forwarding is added, we therefore reuse SI and subdomain
1189 logically in the same way as they are used in BIER: All necessary
1190 BFIR/BFER for a service use a single BIER-TE BIFT and are split
1191 across as many SI as necessary (see below).
Different services may
1192 use different subdomains that primarily exist to provide more
1193 efficient replication (and for BIER-TE desirable traffic engineering)
1194 for different subsets of BFIR/BFER.

1196 7.2. Bit assignment comparison BIER and BIER-TE

1198 In BIER, bitstrings only need to carry bits for BFER, which leads to
1199 the model that BFR-ids map 1:1 to each bit in a bitstring.

1201 In BIER-TE, bitstrings need to carry bits to indicate not only the
1202 receiving BFER but also the intermediate hops/links across which the
1203 packet must be sent. The maximum number of BFER that can be
1204 supported in a single bitstring or BIFT:SI depends on the number of
1205 bits necessary to represent the desired topology between them.

1207 "Desired" topology because it depends on the physical topology, and
1208 on the desire of the operator to allow for explicit traffic
1209 engineering across every single hop (which requires more bits), or
1210 reducing the number of required bits by exploiting optimizations such
1211 as unicast (forward_routed), ECMP or flood (DNR) over "uninteresting"
1212 sub-parts of the topology - e.g. parts where different trees do not
1213 need to take different paths due to traffic-engineering reasons.

1215 The share of bits needed to describe the topology in a BIFT:SI can
1216 therefore easily be as low as 20% or as high as 80%. The higher the
1217 percentage, the higher the likelihood that those topology bits are
1218 not just BIER-TE overhead without additional benefit, but instead
1219 allow the desired traffic-engineering alternatives to be
1220 expressed.

1222 7.3. Using BFR-id with BIER-TE

1224 Because there is no 1:1 mapping between bits in the bitstring and
1225 BFER, BIER-TE cannot simply rely on the BIER 1:1 mapping between
1226 bits in a bitstring and BFR-id.

1228 In BIER, automatic schemes could assign all possible BFR-ids
1229 sequentially to BFERs. This will not work in BIER-TE.
In BIER-TE,
1230 the operator or BIER-TE controller host has to determine a BFR-id for
1231 each BFER in each required subdomain. The BFR-id may or may not have
1232 a relationship with a bit in the bitstring. Suggestions are detailed
1233 below. Once determined, the BFR-id can then be configured on the
1234 BFER and used by flow overlay, routing underlay and the BIER header
1235 almost the same as the BFR-id in BIER.

1237 The one exception is application/flow-overlays that automatically
1238 calculate the bitstring(s) of BIER packets by converting BFR-id to
1239 bits. In BIER-TE, this operation can be done in two ways:

1241 "Independent branches": For a given application or (set of) trees,
1242 the branches from a BFIR to every BFER are independent of the
1243 branches to any other BFER. For example, shortest path trees have
1244 independent branches.

1246 "Interdependent branches": When a BFER is added or deleted from a
1247 particular distribution tree, branches to other BFER still in the
1248 tree may need to change. Steiner trees are examples of
1249 interdependent branch trees.

1251 If "independent branches" are sufficient, the BIER-TE controller host
1252 can provide to such applications for every BFR-id a SI:bitstring with
1253 the BIER-TE bits for the branch towards that BFER. The application
1254 can then independently calculate the SI:bitstring for all desired
1255 BFER by OR'ing their bitstrings.

1257 If "interdependent branches" are required, the application could call
1258 a BIER-TE controller host API with the list of required BFR-ids and
1259 get the required bitstring back. Whenever the set of BFR-ids
1260 changes, this is repeated.

1262 Note that in either case (unlike in BIER), the bits in BIER-TE may
1263 need to change upon link/node failure/recovery, network expansion and
1264 network load by other traffic (as part of traffic engineering goals).
1265 Interactions between such BFIR applications and the BIER-TE
1266 controller host do therefore need to support dynamic updates to the
1267 bitstrings.

1269 7.4. Assigning BFR-ids for BIER-TE

1271 For non-leaf BFER, there is usually a single bit k for that BFER with
1272 a local_decap() adjacency on the BFER. The BFR-id for such a BFER is
1273 therefore most easily the one it would have in BIER: SI * bitstring-
1274 length + k.

1276 As explained earlier in the document, leaf BFER do not need such a
1277 separate bit because the fact alone that the BIER-TE packet is
1278 forwarded to the leaf BFER indicates that the BFER should decapsulate
1279 it. Such a BFER will have one or more bits for the links leading
1280 only to it. The BFR-id could therefore most easily be the BFR-id
1281 derived from the lowest bit for those links.

1283 These two rules are only recommendations for the operator or BIER-TE
1284 controller assigning the BFR-ids. Any allocation scheme can be used;
1285 the BFR-ids just need to be unique across BFRs in each subdomain.

1287 It is not currently determined if a single subdomain could or should
1288 be allowed to forward both BIER and BIER-TE packets. If this should
1289 be supported, there are two options:

1291 A. BIER and BIER-TE have different BFR-id in the same subdomain.
1292 This allows higher replication efficiency for BIER because their BFR-
1293 id can be assigned sequentially, while the bitstrings for BIER-TE
1294 will also have the additional bits for the topology. There is no
1295 relationship between a BFR's BIER BFR-id and its BIER-TE BFR-id.

1297 B. BIER and BIER-TE share the same BFR-id. The BFR-id are assigned
1298 as explained above for BIER-TE and simply reused for BIER. The
1299 replication efficiency for BIER will be as low as that for BIER-TE in
1300 this approach. Depending on topology, only the same 20%..80% of bits
1301 as possible for BIER-TE can be used for BIER.

1303 7.5. Example bit allocations

1305 7.5.1.
With BIER

1307 Consider a network setup with a bitstring length of 256 for a network
1308 topology as shown in the picture below. The network has 6 areas,
1309 each with ca. 180 BFR, connecting via a core with some larger (core)
1310 BFR. To address all BFER with BIER, 4 SI are required. To send a
1311 BIER packet to all BFER in the network, 4 copies need to be sent by
1312 the BFIR. On the BFIR it does not make a difference how the BFR-id
1313 are allocated to BFER in the network, but for efficiency further down
1314 in the network it does make a difference.

1316       area1        area2        area3
1317   BFR1a BFR1b  BFR2a BFR2b  BFR3a BFR3b
1318     |     \      /     \      /     |
1319     .................................
1320     .             Core              .
1321     .................................
1322     |     /      \     /      \     |
1323   BFR4a BFR4b  BFR5a BFR5b  BFR6a BFR6b
1324       area4        area5        area6

1326 Figure 15: Scaling BIER-TE bits by reuse

1328 With random allocation of BFR-id to BFER, each receiving area would
1329 (most likely) have to receive all 4 copies of the BIER packet because
1330 there would be BFR-id for each of the 4 SI in each of the areas.
1331 Only further towards each BFER would this duplication subside - when
1332 each of the 4 trees runs out of branches.

1334 If BFR-id are allocated intelligently, then all the BFER in an area
1335 would be given BFR-id with as few different SI as possible. Each
1336 area would only have to forward one or two packets instead of 4.

1338 Given how networks can grow over time, replication efficiency in an
1339 area will also easily go down over time when BFR-id are allocated
1340 sequentially network-wide. An area that initially only has
1341 BFR-id in one SI might end up with many SI over a longer period of
1342 growth. Allocating SIs to areas with initially sufficiently many
1343 spare bits for growth can help to alleviate this issue, as can
1344 renumbering BFR-id after network expansion. In this example one may
1345 consider using 6 SI and assigning one to each area.
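The effect of the allocation strategy can be checked with a small Python sketch that counts, for each area, in how many SIs that area has at least one BFER. It assumes a bitstring length of 256, BFR-id = SI * bitstring-length + bit with bits numbered from 1, and 170 BFER per area (an assumption: 170 of the ca. 180 BFR, so that the 1020 BFER fit into 4 SI). The "scattered" allocation is a deterministic stand-in for random allocation. All names are hypothetical.

```python
BSL = 256             # bitstring length used in the example
AREAS = 6
BFER_PER_AREA = 170   # assumption: 170 of the ca. 180 BFR per area are BFER


def si_of(bfr_id):
    """SI of a 1-based BFR-id, assuming BFR-id = SI * BSL + bit (1..BSL)."""
    return (bfr_id - 1) // BSL


def sis_per_area(area_of):
    """Map each area to the set of SIs that hold at least one of its BFER."""
    result = {}
    for bfr_id, area in area_of.items():
        result.setdefault(area, set()).add(si_of(bfr_id))
    return result


def si_counts(area_of):
    """Number of SIs per area, listed in area order."""
    return [len(s) for _, s in sorted(sis_per_area(area_of).items())]


total = AREAS * BFER_PER_AREA   # 1020 BFER

# Scattered: consecutive BFR-ids rotate through the areas.
scattered = {i: (i - 1) % AREAS for i in range(1, total + 1)}
# Area by area: BFR-ids are handed out sequentially, one area after another.
by_area = {i: (i - 1) // BFER_PER_AREA for i in range(1, total + 1)}
# One SI per area: area a uses bits 1..170 of SI a (6 SI in total).
si_per_area = {a * BSL + k: a
               for a in range(AREAS) for k in range(1, BFER_PER_AREA + 1)}

for name, alloc in (("scattered", scattered), ("by_area", by_area),
                    ("si_per_area", si_per_area)):
    print(name, si_counts(alloc))
```

Under these assumptions every area needs all 4 copies in the scattered case, one or two copies when BFR-ids are assigned area by area, and exactly one copy when each area has its own SI, at the price of using 6 SI instead of 4.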
1347 This example shows that intelligent BFR-id allocation within at least
1348 subdomain 0 can be helpful or even necessary in BIER.

1350 7.5.2. With BIER-TE

1352 In BIER-TE one needs to determine a subset of the physical topology
1353 and attached BFER so that the "desired" representation of this
1354 topology and the BFER fit into a single bitstring. This process
1355 needs to be repeated until the whole topology is covered.

1357 Once bits/SIs are assigned to topology and BFER, BFR-id is just a
1358 derived set of identifiers from the operator/BIER-TE controller as
1359 explained above.

1361 Every time that different sub-topologies have overlap, bits need to
1362 be repeated across the bitstrings, increasing the overall amount of
1363 bits required across all bitstring/SIs. In the worst case, random
1364 subsets of BFER are assigned to different SI. This is much worse
1365 than in BIER because it not only reduces replication efficiency with
1366 the same number of overall bits, but even further - because more bits
1367 are required due to duplication of bits for topology across multiple
1368 SI. Intelligent BFER to SI assignment and selecting specific
1369 "desired" subtopologies can minimize this problem.

1371 To set up BIER-TE efficiently for the above topology, the following
1372 bit allocation method can be used. This method can easily be expanded
1373 to other, similarly structured larger topologies.

1375 Each area is allocated one or more SI depending on the number of
1376 future expected BFER and number of bits required for the topology in
1377 the area. In this example, 6 SI, one per area.

1379 In addition, we use 4 bits in each SI: bia, bib, bea, beb: bit
1380 ingress a, bit ingress b, bit egress a, bit egress b. These bits
1381 will be used to pass BIER packets from any BFIR via any combination
1382 of ingress area a/b BFR and egress area a/b BFR into a specific
1383 target area.
These bits are then set up with the right
1384 forward_routed adjacencies on the BFIR and area edge BFR:

1386 On all BFIR in an area j, bia in each BIFT:SI is populated with the
1387 same forward_routed(BFRja), and bib with forward_routed(BFRjb). On
1388 all area edge BFR, bea in BIFT:SI=k is populated with
1389 forward_routed(BFRka) and beb in BIFT:SI=k with
1390 forward_routed(BFRkb).

1392 For BIER-TE forwarding of a packet to some subset of BFER across all
1393 areas, a BFIR would create at most 6 copies, with SI=1...SI=6. In
1394 each packet, the bits indicate bits for topology and BFER in that
1395 topology plus the four bits to indicate whether to pass this packet
1396 via the ingress area a or b border BFR and the egress area a or b
1397 border BFR, therefore allowing path engineering for those two
1398 "unicast" legs: 1) BFIR to ingress area edge and 2) core to egress
1399 area edge. Replication only happens inside the egress areas. For
1400 BFER in the same area as the BFIR, these four bits are not used.

1402 7.6. Summary

1404 BIER-TE can, like BIER, support multiple SI within a sub-domain to
1405 allow re-using the concept of BFR-id and therefore minimize BIER-TE
1406 specific functions in underlay routing, flow overlay methods and BIER
1407 headers.

1409 The number of BFIR/BFER possible in a subdomain is smaller than in
1410 BIER because BIER-TE uses additional bits for topology.

1412 Subdomains can in BIER-TE be used like in BIER to create more
1413 efficient replication to known subsets of BFER.

1415 Assigning bits for BFER intelligently into the right SI is more
1416 important in BIER-TE than in BIER because of replication efficiency
1417 and overall amount of bits required.

1419 8. BIER-TE and Segment Routing (SR)

1421 Segment Routing (SR, [RFC8402]) aims to enable lightweight path
1422 engineering via loose source routing.
Compared to its more heavy- 1423 weight predecessor RSVP-TE ([RFC3209]), SR does not, for example, 1424 require per-path signaling to each hop of the path. 1426 BIER-TE supports the same design philosophy for multicast. As in 1427 SR, it relies on source-routing - via the definition of a BitString. 1428 Like SR, it only requires considering the "hops" on which either 1429 replication has to happen, or across which the traffic should be 1430 steered (even without replication). Any other hops can be skipped 1431 via the use of routed adjacencies. 1433 BIER-TE BitPositions (BP) can be understood as the BIER-TE equivalent 1434 of "forwarding segments" in SR, but they have a different scope than 1435 SR forwarding segments. Whereas forwarding segments in SR are global 1436 or local, BPs in BIER-TE have a scope that is the group of BFR(s) 1437 that have adjacencies for this BP in their BIFT. These can be called 1438 "adjacency" scoped forwarding segments. 1440 Adjacency scope could be global, but then every BFR would need an 1441 adjacency for this BP, for example a forward_routed adjacency with 1442 encapsulation to the global SR SID of the destination. Such a BP 1443 would always result in ingress replication though. The first BFR 1444 encountering this BP would directly replicate to it. Only by using 1445 non-global adjacency scope for BPs can traffic be steered and 1446 replicated on non-ingress BFR. 1448 SR can naturally be combined with BIER-TE and help to optimize it. 1449 For example, instead of defining BitPositions for non-replicating 1450 hops, it is equally possible to use segment routing encapsulations 1451 (e.g.: MPLS label stacks) for the encapsulation of "forward_routed" 1452 adjacencies. 1454 Note that BIER itself can also be seen to be similar to SR.
BIER BPs 1455 act as global destination Node-SIDs and the BIER bitstring is simply 1456 a highly optimized mechanism to indicate multiple such SIDs and let 1457 the network take care of effectively replicating the packet hop-by- 1458 hop to each destination Node-SID. What BIER does not allow is to 1459 indicate intermediate hops, or, in terms of SR, the ability to indicate a 1460 sequence of SIDs to reach the destination. This is what BIER-TE and 1461 its adjacency scoped BPs enable. 1463 9. Security Considerations 1465 The security considerations are the same as for BIER with the 1466 following differences: 1468 BFR-ids and BFR-prefixes are not used in BIER-TE, nor are procedures 1469 for their distribution, so these are not attack vectors against BIER- 1470 TE. 1472 10. IANA Considerations 1474 This document requests no action by IANA. 1476 11. Acknowledgements 1478 The authors would like to thank Greg Shepherd, Ijsbrand Wijnands and 1479 Neale Ranns for their extensive review and suggestions. 1481 12. Change log [RFC Editor: Please remove] 1483 draft-ietf-bier-te-arch: 1485 03: Last call textual changes by authors to improve readability: 1487 removed Wolfgang Braun as co-author (as requested). 1489 Improved abstract to be more explanatory. Removed mentioning of 1490 FRR (not concluded on so far). 1492 Added new text into Introduction section because the text was too 1493 difficult to jump into (too many forward pointers). This 1494 primarily consists of examples and the early introduction of the 1495 BIER-TE Topology concept enabled by these examples. 1497 Amended comparison to SR. 1499 Changed syntax from [VRF] to {VRF} to indicate it is optional and to 1500 make idnits happy. 1502 Split references into normative / informative, added references. 1504 02: Refresh after IETF104 discussion: changed intended status back 1505 to standard.
Reasoning: 1507 Tighter review of standards document == ensures arch will be 1508 better prepared for possible adoption by other WGs (e.g.: DetNet) 1509 or std. bodies. 1511 Requirement against the degree of existing implementations is self 1512 defined by the WG. BIER WG seems to think it is not necessary to 1513 apply multiple interoperating implementations against an 1514 architecture level document at this time to make it qualify to go 1515 to standards track. Also, the levels of support introduced in -01 1516 rev. should allow all BIER forwarding engines to also be able to 1517 support the base level BIER-TE forwarding. 1519 01: Added note comparing BIER and SR to also hopefully clarify 1520 BIER-TE vs. BIER comparison re. SR. 1522 - added requirements section mandating only most basic BIER-TE 1523 forwarding features as MUST. 1525 - reworked comparison with BIER forwarding section to only 1526 summarize and point to pseudocode section. 1528 - reworked pseudocode section to have one pseudocode that mirrors 1529 the BIER forwarding pseudocode to make comparison easier and a 1530 second pseudocode that shows the complete set of BIER-TE 1531 forwarding options and simplification/optimization possible vs. 1532 BIER forwarding. 1534 - Added captions to pictures. 1536 00: Changed target state to experimental (WG conclusion), updated 1537 references, mod auth association. 1539 - Source now on http://www.github.com/toerless/bier-te-arch 1541 - Please open issues on the github for change/improvement requests 1542 to the document - in addition to posting them on the list 1543 (bier@ietf.). Thanks! 1545 draft-eckert-bier-te-arch: 1547 06: Added overview of forwarding differences between BIER, BIER- 1548 TE. 1550 05: Author affiliation change only. 1552 04: Added comparison to Live-Live and BFIR to FRR section 1553 (Eckert). 1555 04: Removed FRR content into the new FRR draft [I-D.eckert-bier- 1556 te-frr] (Braun).
1558 - Linked FRR information to new draft in Overview/Introduction 1560 - Removed BTAFT/FRR from "Changes in the network topology" 1562 - Linked new draft in "Link/Node Failures and Recovery" 1564 - Removed FRR from "The BIER-TE Forwarding Layer" 1566 - Moved FRR section to new draft 1568 - Moved FRR parts of Pseudocode into new draft 1570 - Left only non FRR parts 1572 - removed FrrUpDown(..) and //FRR operations in 1573 ForwardBierTePacket(..) 1574 - New draft contains FrrUpDown(..) and ForwardBierTePacket(Packet) 1575 from bier-arch-03 1577 - Moved "BIER-TE and existing FRR" to new draft 1579 - Moved "BIER-TE and Segment Routing" section one level up 1581 - Thus, removed "Further considerations" that only contained this 1582 section 1584 - Added Changes for version 04 1586 03: Updated the FRR section. Added examples for FRR key concepts. 1587 Added BIER-in-BIER tunneling as option for tunnels in backup 1588 paths. BIFT structure is expanded and contains an additional 1589 match field to support full node protection with BIER-TE FRR. 1591 03: Updated FRR section. Explanation of how BIER-in-BIER 1592 encapsulation provides P2MP protection for node failures even 1593 though the routing underlay does not provide P2MP. 1595 02: Changed the definition of BIFT to be more in line with BIER. 1596 In revs. up to -01, the idea was that a BIFT has only entries for 1597 a single bitstring, and every SI and subdomain would be a separate 1598 BIFT. In BIER, each BIFT covers all SI. This is now also how we 1599 define it in BIER-TE. 1601 02: Added Section 7 to explain the use of SI, subdomains and BFR- 1602 id in BIER-TE and to give an example of how to efficiently assign 1603 bits for a large topology requiring multiple SI. 1605 02: Added further details for rings - how to support input from 1606 all ring nodes. 1608 01: Fixed BFIR -> BFER for section 4.3.
1610 01: Added explanation of SI, difference to BIER ECMP, 1611 consideration for Segment Routing, unicast FRR, considerations for 1612 encapsulation, explanations of BIER-TE controller host and CLI. 1614 00: Initial version. 1616 13. References 1618 13.1. Normative References 1620 [RFC8279] Wijnands, IJ., Ed., Rosen, E., Ed., Dolganow, A., 1621 Przygienda, T., and S. Aldrin, "Multicast Using Bit Index 1622 Explicit Replication (BIER)", RFC 8279, 1623 DOI 10.17487/RFC8279, November 2017, 1624 . 1626 [RFC8296] Wijnands, IJ., Ed., Rosen, E., Ed., Dolganow, A., 1627 Tantsura, J., Aldrin, S., and I. Meilik, "Encapsulation 1628 for Bit Index Explicit Replication (BIER) in MPLS and Non- 1629 MPLS Networks", RFC 8296, DOI 10.17487/RFC8296, January 1630 2018, . 1632 13.2. Informative References 1634 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1635 Requirement Levels", BCP 14, RFC 2119, 1636 DOI 10.17487/RFC2119, March 1997, 1637 . 1639 [RFC3209] Awduche, D., Berger, L., Gan, D., Li, T., Srinivasan, V., 1640 and G. Swallow, "RSVP-TE: Extensions to RSVP for LSP 1641 Tunnels", RFC 3209, DOI 10.17487/RFC3209, December 2001, 1642 . 1644 [RFC8402] Filsfils, C., Ed., Previdi, S., Ed., Ginsberg, L., 1645 Decraene, B., Litkowski, S., and R. Shakir, "Segment 1646 Routing Architecture", RFC 8402, DOI 10.17487/RFC8402, 1647 July 2018, . 1649 Authors' Addresses 1651 Toerless Eckert (editor) 1652 Huawei USA - Futurewei Technologies Inc. 1653 2330 Central Expy 1654 Santa Clara 95050 1655 USA 1657 Email: tte+ietf@cs.fau.de 1659 Gregory Cauchie 1660 Bouygues Telecom 1662 Email: GCAUCHIE@bouyguestelecom.fr 1663 Michael Menth 1664 University of Tuebingen 1666 Email: menth@uni-tuebingen.de