idnits 2.17.1 draft-ietf-bier-te-arch-02.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references. ** The abstract seems to contain references ([RFC8279]), which it shouldn't. Please replace those with straight textual mentions of the documents in question. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document date (May 14, 2019) is 1802 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'I-D.eckert-bier-te-frr' is mentioned on line 136, but not defined == Missing Reference: 'VRF' is mentioned on line 973, but not defined -- Looks like a reference, but probably isn't: '2' on line 907 -- Looks like a reference, but probably isn't: '1' on line 917 == Missing Reference: 'SI' is mentioned on line 954, but not defined == Missing Reference: 'I' is mentioned on line 961, but not defined Summary: 2 errors (**), 0 flaws (~~), 5 warnings (==), 3 comments (--). 
Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group T. Eckert, Ed. 3 Internet-Draft Huawei 4 Intended status: Standards Track G. Cauchie 5 Expires: November 15, 2019 Bouygues Telecom 6 W. Braun 7 M. Menth 8 University of Tuebingen 9 May 14, 2019 11 Traffic Engineering for Bit Index Explicit Replication (BIER-TE) 12 draft-ietf-bier-te-arch-02 14 Abstract 16 This document proposes an architecture for BIER-TE: Traffic 17 Engineering for Bit Index Explicit Replication (BIER). 19 BIER-TE shares part of its architecture with BIER as described in 20 [RFC8279]. It also proposes to share the packet format with BIER. 22 BIER-TE forwards and replicates packets like BIER based on a 23 BitString in the packet header but it does not require an IGP. It 24 does support traffic engineering by explicit hop-by-hop forwarding 25 and loose hop forwarding of packets. It does support Fast ReRoute 26 (FRR) for link and node protection and incremental deployment. 27 Because BIER-TE like BIER operates without explicit in-network tree- 28 building but also supports traffic engineering, it is more similar to 29 SR than RSVP-TE. 31 Status of This Memo 33 This Internet-Draft is submitted in full conformance with the 34 provisions of BCP 78 and BCP 79. 36 Internet-Drafts are working documents of the Internet Engineering 37 Task Force (IETF). Note that other groups may also distribute 38 working documents as Internet-Drafts. The list of current Internet- 39 Drafts is at https://datatracker.ietf.org/drafts/current/. 41 Internet-Drafts are draft documents valid for a maximum of six months 42 and may be updated, replaced, or obsoleted by other documents at any 43 time. It is inappropriate to use Internet-Drafts as reference 44 material or to cite them other than as "work in progress." 46 This Internet-Draft will expire on November 15, 2019. 
48 Copyright Notice 50 Copyright (c) 2019 IETF Trust and the persons identified as the 51 document authors. All rights reserved. 53 This document is subject to BCP 78 and the IETF Trust's Legal 54 Provisions Relating to IETF Documents 55 (https://trustee.ietf.org/license-info) in effect on the date of 56 publication of this document. Please review these documents 57 carefully, as they describe your rights and restrictions with respect 58 to this document. Code Components extracted from this document must 59 include Simplified BSD License text as described in Section 4.e of 60 the Trust Legal Provisions and are provided without warranty as 61 described in the Simplified BSD License. 63 Table of Contents 65 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 66 1.1. Overview . . . . . . . . . . . . . . . . . . . . . . . . 3 67 1.2. Requirements Language . . . . . . . . . . . . . . . . . . 4 68 2. Layering . . . . . . . . . . . . . . . . . . . . . . . . . . 4 69 2.1. The Multicast Flow Overlay . . . . . . . . . . . . . . . 5 70 2.2. The BIER-TE Controller Host . . . . . . . . . . . . . . . 5 71 2.2.1. Assignment of BitPositions to adjacencies of the 72 network topology . . . . . . . . . . . . . . . . . . 6 73 2.2.2. Changes in the network topology . . . . . . . . . . . 6 74 2.2.3. Set up per-multicast flow BIER-TE state . . . . . . . 6 75 2.2.4. Link/Node Failures and Recovery . . . . . . . . . . . 6 76 2.3. The BIER-TE Forwarding Layer . . . . . . . . . . . . . . 7 77 2.4. The Routing Underlay . . . . . . . . . . . . . . . . . . 7 78 3. BIER-TE Forwarding . . . . . . . . . . . . . . . . . . . . . 7 79 3.1. The Bit Index Forwarding Table (BIFT) . . . . . . . . . . 7 80 3.2. Adjacency Types . . . . . . . . . . . . . . . . . . . . . 8 81 3.2.1. Forward Connected . . . . . . . . . . . . . . . . . . 8 82 3.2.2. Forward Routed . . . . . . . . . . . . . . . . . . . 9 83 3.2.3. ECMP . . . . . . . . . . . . . . . . . . . . . . . . 9 84 3.2.4. Local Decap . . . . 
. . . . . . . . . . . . . . . . . 9 85 3.3. Encapsulation considerations . . . . . . . . . . . . . . 10 86 3.4. Basic BIER-TE Forwarding Example . . . . . . . . . . . . 10 87 3.5. Forwarding comparison with BIER . . . . . . . . . . . . . 12 88 3.6. Requirements . . . . . . . . . . . . . . . . . . . . . . 13 89 4. BIER-TE Controller Host BitPosition Assignments . . . . . . . 13 90 4.1. P2P Links . . . . . . . . . . . . . . . . . . . . . . . . 14 91 4.2. BFER . . . . . . . . . . . . . . . . . . . . . . . . . . 14 92 4.3. Leaf BFERs . . . . . . . . . . . . . . . . . . . . . . . 14 93 4.4. LANs . . . . . . . . . . . . . . . . . . . . . . . . . . 14 94 4.5. Hub and Spoke . . . . . . . . . . . . . . . . . . . . . . 15 95 4.6. Rings . . . . . . . . . . . . . . . . . . . . . . . . . . 15 96 4.7. Equal Cost MultiPath (ECMP) . . . . . . . . . . . . . . . 16 97 4.8. Routed adjacencies . . . . . . . . . . . . . . . . . . . 19 98 4.8.1. Reducing BitPositions . . . . . . . . . . . . . . . . 19 99 4.8.2. Supporting nodes without BIER-TE . . . . . . . . . . 19 100 5. Avoiding loops and duplicates . . . . . . . . . . . . . . . . 19 101 5.1. Loops . . . . . . . . . . . . . . . . . . . . . . . . . . 19 102 5.2. Duplicates . . . . . . . . . . . . . . . . . . . . . . . 20 103 6. BIER-TE Forwarding Pseudocode . . . . . . . . . . . . . . . . 20 104 7. Managing SI, subdomains and BFR-ids . . . . . . . . . . . . . 23 105 7.1. Why SI and sub-domains . . . . . . . . . . . . . . . . . 24 106 7.2. Bit assignment comparison BIER and BIER-TE . . . . . . . 25 107 7.3. Using BFR-id with BIER-TE . . . . . . . . . . . . . . . . 25 108 7.4. Assigning BFR-ids for BIER-TE . . . . . . . . . . . . . . 26 109 7.5. Example bit allocations . . . . . . . . . . . . . . . . . 27 110 7.5.1. With BIER . . . . . . . . . . . . . . . . . . . . . . 27 111 7.5.2. With BIER-TE . . . . . . . . . . . . . . . . . . . . 28 112 7.6. Summary . . . . . . . . . . . . . . . . . . . . . . . . . 29 113 8. 
BIER-TE and Segment Routing . . . . . . . . . . . . . . . . . 29 114 9. Security Considerations . . . . . . . . . . . . . . . . . . . 30 115 10. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 30 116 11. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 30 117 12. Change log [RFC Editor: Please remove] . . . . . . . . . . . 30 118 13. References . . . . . . . . . . . . . . . . . . . . . . . . . 33 119 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 33 121 1. Introduction 123 1.1. Overview 125 This document specifies the architecture for BIER-TE: traffic 126 engineering for Bit Index Explicit Replication (BIER). 128 BIER-TE shares architecture and packet formats with BIER as described 129 in [RFC8279]. 131 BIER-TE forwards and replicates packets like BIER based on a 132 BitString in the packet header but it does not require an IGP. It 133 supports traffic engineering by explicit hop-by-hop forwarding 134 and loose hop forwarding of packets. It supports incremental 135 deployment, and a Fast ReRoute (FRR) extension for link and node 136 protection is given in [I-D.eckert-bier-te-frr]. Because BIER-TE 137 like BIER operates without explicit in-network tree-building but also 138 supports traffic engineering, it is more similar to Segment Routing 139 (SR) than RSVP-TE. 141 The key differences from BIER are: 143 o BIER-TE replaces in-network autonomous path calculation by 144 explicit paths calculated offpath by the BIER-TE controller host. 146 o In BIER-TE every BitPosition of the BitString of a BIER-TE packet 147 indicates one or more adjacencies - instead of a BFER as in BIER. 149 o BIER-TE in each BFR has no routing table but only a BIER-TE 150 Forwarding Table (BIFT) indexed by SI:BitPosition and populated 151 with only those adjacencies to which the BFR should replicate 152 packets. 154 BIER-TE headers use the same format as BIER headers. 156 BIER-TE forwarding does not require/use the BFIR-ID. 
The BFIR-ID can 157 still be useful though for coordinated BFIR/BFER functions, such as 158 the context for upstream assigned labels for MPLS payloads in MVPN 159 over BIER-TE. 161 If the BIER-TE domain is also running BIER, then the BFIR-ID in BIER- 162 TE packets can be set to the same BFIR-ID as used with BIER packets. 164 If the BIER-TE domain is not running full BIER or wants to 165 reduce the need to allocate bits in BIER bitstrings for BFIR-ID 166 values, then the allocation of BFIR-ID values in BIER-TE packets can 167 be done through other mechanisms outside the scope of this document, 168 as long as this is appropriately agreed upon between all BFIR/BFER. 170 1.2. Requirements Language 172 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 173 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 174 document are to be interpreted as described in RFC 2119 [RFC2119]. 176 2. Layering 178 End to end BIER-TE operation consists of four components: The 179 "Multicast Flow Overlay", the "BIER-TE Controller Host", the "Routing 180 Underlay" and the "BIER-TE forwarding layer". 182 Picture 2: Layers of BIER-TE 184 <------BGP/PIM-----> 185 |<-IGMP/PIM-> multicast flow <-PIM/IGMP->| 186 overlay 188 [Bier-TE Controller Host] 189 ^ ^ ^ 190 / | \ BIER-TE control protocol 191 | | | eg.: Netconf/Restconf/Yang 192 v v v 193 Src -> Rtr1 -> BFIR-----BFR-----BFER -> Rtr2 -> Rcvr 195 |--------------------->| 196 BIER-TE forwarding layer 198 |<- BIER-TE domain-->| 200 |<--------------------->| 201 Routing underlay 203 Figure 1: BIER-TE architecture 205 2.1. The Multicast Flow Overlay 207 The Multicast Flow Overlay operates as in BIER. See [RFC8279]. 208 Instead of interacting with the BIER layer, it interacts with the 209 BIER-TE Controller Host. 211 2.2. The BIER-TE Controller Host 213 The BIER-TE controller host represents the control plane of 214 BIER-TE. 
It communicates two sets of information with BFRs: 216 During bring-up or modifications of the network topology, the 217 controller discovers the network topology, assigns BitPositions to 218 adjacencies and signals the resulting mapping of BitPositions to 219 adjacencies to each BFR connecting to the adjacency. 221 During day-to-day operations of the network, the controller signals 222 to BFIRs what multicast flows are mapped to what BitStrings. 224 Communication between the BIER-TE controller host and BFRs is ideally 225 via standardized protocols and data-models such as Netconf/Restconf/ 226 Yang. This is currently outside the scope of this document. Vendor- 227 specific CLI on the BFRs is also a possible stopgap option (as in 228 many other SDN solutions lacking a standardized data 229 model). 231 For simplicity, the procedures of the BIER-TE controller host are 232 described in this document as if it is a single, centralized 233 automated entity, such as an SDN controller. It could equally be an 234 operator setting up CLI on the BFRs. Distribution of the functions 235 of the BIER-TE controller host is currently outside the scope of this 236 document. 238 2.2.1. Assignment of BitPositions to adjacencies of the network 239 topology 241 The BIER-TE controller host tracks the BFR topology of the BIER-TE 242 domain. It determines what adjacencies require BitPositions so that 243 BIER-TE explicit paths can be built through them as desired by 244 operator policy. 246 The controller then pushes the BitPositions/adjacencies to the BIFT 247 of the BFRs, populating only those SI:BitPositions to the BIFT of 248 each BFR to which that BFR should be able to send packets - 249 adjacencies connecting to this BFR. 251 2.2.2. Changes in the network topology 253 If the network topology changes (not failure based) so that 254 adjacencies that are assigned to BitPositions are no longer needed, 255 the controller can re-use those BitPositions for new adjacencies. 
256 First, these BitPositions need to be removed from any BFIR flow state 257 and BFR BIFT state, then they can be repopulated, first into BIFT and 258 then into the BFIR. 260 2.2.3. Set up per-multicast flow BIER-TE state 262 The BIER-TE controller host tracks the multicast flow overlay to 263 determine what multicast flow needs to be sent by a BFIR to which set 264 of BFERs. It calculates the desired distribution tree across the 265 BIER-TE domain based on algorithms outside the scope of this document 266 (eg.: CSPF, Steiner Tree,...). It then pushes the calculated 267 BitString into the BFIR. 269 2.2.4. Link/Node Failures and Recovery 271 When links or nodes fail or recover in the topology, BIER-TE can 272 quickly respond with the optional FRR procedures described in 273 [I-D.eckert-bier-te-frr]. It can also more slowly react by 274 recalculating the BitStrings of affected multicast flows. This 275 reaction is slower than the FRR procedure because the controller 276 needs to receive link/node up/down indications, recalculate the 277 desired BitStrings and push them down into the BFIRs. With FRR, this 278 is all performed locally on a BFR receiving the adjacency up/down 279 notification. 281 2.3. The BIER-TE Forwarding Layer 283 When the BIER-TE Forwarding Layer receives a packet, it simply looks 284 up the BitPositions that are set in the BitString of the packet in 285 the Bit Index Forwarding Table (BIFT) that was populated by the BIER- 286 TE controller host. For every BP that is set in the BitString, and 287 that has one or more adjacencies in the BIFT, a copy is made 288 according to the type of adjacencies for that BP in the BIFT. Before 289 sending any copy, the BFR resets all BitPositions in the BitString of 290 the packet to which it can create a copy. This is done to prevent 291 packets from looping. 293 2.4. 
The Routing Underlay 295 BIER-TE sends BIER packets to directly connected BIER-TE 296 neighbors as L2 (unicasted) BIER packets without requiring a routing 297 underlay. BIER-TE forwarding uses the Routing underlay for 298 forward_routed adjacencies which copy BIER-TE packets to not- 299 directly-connected BFRs (see below for adjacency definitions). 301 If the BFR intends to support FRR for BIER-TE, then the BIER-TE 302 forwarding plane needs to receive fast adjacency up/down 303 notifications: Link up/down or neighbor up/down, eg.: from BFD. 304 Providing these notifications is considered to be part of the routing 305 underlay in this document. 307 3. BIER-TE Forwarding 309 3.1. The Bit Index Forwarding Table (BIFT) 311 The Bit Index Forwarding Table (BIFT) exists in every BFR. For every 312 subdomain in use, it is a table indexed by SI:BitPosition and is 313 populated by the BIER-TE control plane. Each index can be empty or 314 contain a list of one or more adjacencies. 316 BIER-TE can support multiple subdomains like BIER, each one with a 317 separate BIFT. 319 In the BIER architecture, indices into the BIFT are explained to be 320 both BFR-id and SI:BitString (BitPosition). This is because there is 321 a 1:1 relationship between BFR-id and SI:BitString - every bit in 322 every SI is/can be assigned to a BFIR/BFER. In BIER-TE there are 323 more bits used in each BitString than there are BFIR/BFER assigned to 324 the bitstring. This is because of the bits required to express the 325 (traffic engineered) path through the topology. The BIER-TE 326 forwarding definitions do therefore not use the term BFR-id at all. 327 Instead, BFR-ids are only used as required by routing underlay, flow 328 overlay or BIER headers. Please refer to Section 7 for an explanation 329 of how to deal with SI, subdomains and BFR-id in BIER-TE. 
331 ------------------------------------------------------------------ 332 | Index: | Adjacencies: | 333 | SI:BitPosition | or one or more per entry | 334 ================================================================== 335 | 0:1 | forward_connected(interface,neighbor,DNR) | 336 ------------------------------------------------------------------ 337 | 0:2 | forward_connected(interface,neighbor,DNR) | 338 | | forward_connected(interface,neighbor,DNR) | 339 ------------------------------------------------------------------ 340 | 0:3 | local_decap([VRF]) | 341 ------------------------------------------------------------------ 342 | 0:4 | forward_routed([VRF,]l3-neighbor) | 343 ------------------------------------------------------------------ 344 | 0:5 | | 345 ------------------------------------------------------------------ 346 | 0:6 | ECMP({adjacency1,...adjacencyN}, seed) | 347 ------------------------------------------------------------------ 348 ... 349 | BitStringLength | ... | 350 ------------------------------------------------------------------ 351 Bit Index Forwarding Table 353 Figure 2: BIFT adjacencies 355 The BIFT is programmed into the data plane of BFRs by the BIER-TE 356 controller host and used to forward packets, according to the rules 357 specified in the BIER-TE Forwarding Procedures. 359 Adjacencies for the same BP when populated in more than one BFR by 360 the controller do not have to have the same adjacencies. This is up 361 to the controller. BPs for p2p links are one case (see below). 363 3.2. Adjacency Types 365 3.2.1. Forward Connected 367 A "forward_connected" adjacency is towards a directly connected BFR 368 neighbor using an interface address of that BFR on the connecting 369 interface. A forward_connected adjacency does not route packets but 370 only L2 forwards them to the neighbor. 
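As an illustration of the table in Figure 2, the following is a minimal sketch of a BIFT as an in-memory data structure. It is not part of the draft: the class names `Adjacency` and `Bift`, the field names, and the neighbor names are hypothetical, and the draft does not prescribe any particular representation.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Adjacency:
    # Hypothetical record type; kinds follow the names used in Figure 2.
    kind: str          # "forward_connected", "forward_routed" or "local_decap"
    target: str = ""   # neighbor name (assumed); empty for local_decap
    dnr: bool = False  # DoNotReset flag, only meaningful for forward_connected

@dataclass
class Bift:
    # Maps (SI, BitPosition) to a list of adjacencies.
    # A missing key models an empty entry such as 0:5 in Figure 2.
    entries: dict = field(default_factory=dict)

    def add(self, si, bp, adj):
        self.entries.setdefault((si, bp), []).append(adj)

    def lookup(self, si, bp):
        return self.entries.get((si, bp), [])

# Populate a table shaped like Figure 2 (neighbor names are made up).
bift = Bift()
bift.add(0, 1, Adjacency("forward_connected", "neighborA"))
bift.add(0, 2, Adjacency("forward_connected", "neighborB"))
bift.add(0, 2, Adjacency("forward_connected", "neighborC"))
bift.add(0, 3, Adjacency("local_decap"))
bift.add(0, 4, Adjacency("forward_routed", "l3-neighbor"))
```

With this layout, `bift.lookup(0, 2)` returns two adjacencies and `bift.lookup(0, 5)` returns an empty list, mirroring the two-adjacency and empty rows of the figure.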
372 Packets sent to an adjacency with "DoNotReset" (DNR) set in the BIFT 373 will not have the BitPosition for that adjacency reset when the BFR 374 creates a copy for it. The BitPosition will still be reset for 375 copies of the packet made towards other adjacencies. This can be used 376 for example in ring topologies as explained below. 378 3.2.2. Forward Routed 380 A "forward_routed" adjacency is an adjacency towards a BFR that is 381 not a forward_connected adjacency: towards a loopback address of a 382 BFR or towards an interface address that is non-directly connected. 383 Forward_routed packets are forwarded via the Routing Underlay. 385 If the Routing Underlay has multiple paths for a forward_routed 386 adjacency, it will perform ECMP independent of BIER-TE for packets 387 forwarded across a forward_routed adjacency. 389 If the Routing Underlay has FRR, it will perform FRR independent of 390 BIER-TE for packets forwarded across a forward_routed adjacency. 392 3.2.3. ECMP 394 The ECMP mechanisms in BIER are tied to the BIER BIFT and are 395 therefore not directly useable with BIER-TE. The following 396 procedures describe ECMP for BIER-TE that we consider to be 397 lightweight but also well manageable. It leverages the existing 398 entropy parameter in the BIER header to keep packets of the flows on 399 the same path and it introduces a "seed" parameter to allow 400 engineering traffic to be polarized or randomized across multiple 401 hops. 403 An "Equal Cost Multipath" (ECMP) adjacency has a list of two or more 404 adjacencies included in it. It copies the BIER-TE packet to one of those 405 adjacencies based on the ECMP hash calculation. The BIER-TE ECMP 406 hash algorithm must select the same adjacency from that list for all 407 packets with the same "entropy" value in the BIER-TE header if the 408 same number of adjacencies and same seed are given as parameters. 409 Further use of the seed parameter is explained below. 411 3.2.4. 
Local Decap 413 A "local_decap" adjacency passes a copy of the payload of the BIER-TE 414 packet to the packet's NextProto within the BFR (IPv4/IPv6, 415 Ethernet,...). A local_decap adjacency turns the BFR into a BFER for 416 matching packets. Local_decap adjacencies require the BFER to 417 support routing or switching for NextProto to determine how to 418 further process the packet. 420 3.3. Encapsulation considerations 422 Specifications for BIER-TE encapsulation are outside the scope of 423 this document. This section gives explanations and guidelines. 425 Because a BFR needs to interpret the BitString of a BIER-TE packet 426 differently from a BIER packet, it is necessary to distinguish BIER 427 from BIER-TE packets. This is subject to definitions in BIER 428 encapsulation specifications. 430 MPLS encapsulation [RFC8296] for example assigns one label by which 431 BFRs recognize BIER packets for every (SI,subdomain) combination. 432 If it is desirable that every subdomain can forward only BIER or 433 BIER-TE packets, then the label allocation could stay the same, and 434 only the forwarding model (BIER/BIER-TE) would have to be defined per 435 subdomain. If it is desirable to support both BIER and BIER-TE 436 forwarding in the same subdomain, then additional labels would need 437 to be assigned for BIER-TE forwarding. 439 "forward_routed" requires an encapsulation that permits unicasting 440 BIER-TE packets to a specific interface address on a target BFR. 441 With MPLS encapsulation, this can simply be done via a label stack 442 with that address's label as the top label - followed by the label 443 assigned to (SI,subdomain) - and if necessary (see above) BIER-TE. 444 With non-MPLS encapsulation, some form of IP tunneling (IP in IP, 445 LISP, GRE) would be required. 
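The selection requirement for ECMP adjacencies in Section 3.2.3 (same entropy, same number of adjacencies and same seed must yield the same member) can be sketched as follows. This is only an illustration under stated assumptions: the draft does not specify a hash function, so the use of CRC-32 here, the function name `ecmp_select`, and the 4-byte encoding of entropy and seed are all hypothetical choices.

```python
import zlib

def ecmp_select(adjacencies, entropy, seed):
    """Pick one member of an ECMP adjacency list.

    The chosen index depends only on the entropy value, the seed and
    the number of adjacencies, as Section 3.2.3 requires. CRC-32 is an
    assumed stand-in for whatever hash an implementation would use.
    """
    key = entropy.to_bytes(4, "big") + seed.to_bytes(4, "big")
    return adjacencies[zlib.crc32(key) % len(adjacencies)]

# Example: a bundle of three member links towards the same neighbor.
links = ["L1-to-BFR2", "L2-to-BFR2", "L3-to-BFR2"]
chosen = ecmp_select(links, entropy=7, seed=1)
```

Because the index is derived from entropy, seed and list length only, two BFRs configured with the same seed and the same number of members pick the same list position for a given flow, which is the behavior the seed parameter is meant to let the controller steer.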
447 The encapsulation used for "forward_routed" adjacencies can equally 448 support existing advanced adjacency information such as "loose source 449 routes" via eg: MPLS label stacks or appropriate header extensions 450 (eg: for IPv6). 452 3.4. Basic BIER-TE Forwarding Example 454 Step by step example of basic BIER-TE forwarding. This does not use 455 ECMP or forward_routed adjacencies nor does it try to minimize the 456 number of required BitPositions for the topology. 458 [Bier-Te Controller Host] 459 / | \ 460 v v v 462 | p13 p1 | 463 +- BFIR2 --+ | 464 | | p2 p6 | LAN2 465 | +-- BFR3 --+ | 466 | | | p7 p11 | 467 Src -+ +-- BFER1 --+ 468 | | p3 p8 | | 469 | +-- BFR4 --+ +-- Rcv1 470 | | | | 471 | | 472 | p14 p4 | 473 +- BFIR1 --+ | 474 | +-- BFR5 --+ p10 p12 | 475 LAN1 | p5 p9 +-- BFER2 --+ 476 | +-- Rcv2 477 | 478 LAN3 480 IP |..... BIER-TE network......| IP 482 Figure 3: BIER-TE Forwarding Example 484 pXX indicates the BitPosition number assigned by the BIER-TE 485 controller host to adjacencies in the BIER-TE topology. For example, 486 p9 is the adjacency towards BFR9 on the LAN connecting to BFER2. 488 BIFT BFIR2: 489 p13: local_decap() 490 p2: forward_connected(BFR3) 492 BIFT BFR3: 493 p1: forward_connected(BFIR2) 494 p7: forward_connected(BFER1) 495 p8: forward_connected(BFR4) 497 BIFT BFER1: 498 p11: local_decap() 499 p6: forward_connected(BFR3) 500 p8: forward_connected(BFR4) 502 Figure 4: BIER-TE Forwarding Example Adjacencies 504 ...and so on. 506 Traffic needs to flow from BFIR2 towards Rcv1, Rcv2. The controller 507 determines it wants it to pass across the following paths: 509 -> BFER1 ---------------> Rcv1 510 BFIR2 -> BFR3 511 -> BFR4 -> BFR5 -> BFER2 -> Rcv2 513 Figure 5: BIER-TE Forwarding Example Paths 515 These paths correspond to the following BitString: p2, p5, p7, p8, p10, 516 p11, p12. 518 This BitString is set up in BFIR2. Multicast packets arriving at 519 BFIR2 from Src are assigned this BitString. 521 BFIR2 forwards based on that BitString. 
It has p2 and p13 populated. 522 Only p2 is in the BitString; it has an adjacency towards BFR3. BFIR2 523 resets p2 in the BitString and sends a copy towards BFR3. 525 BFR3 sees a BitString of p5,p7,p8,p10,p11,p12. It is only interested 526 in p1,p7,p8. It creates a copy of the packet to BFER1 (due to p7) 527 and one to BFR4 (due to p8). It resets p7, p8 before sending. 529 BFER1 sees a BitString of p5,p10,p11,p12. It is only interested in 530 p6,p8,p11 and therefore considers only p11. p11 is a "local_decap" 531 adjacency installed by the BIER-TE controller host because BFER1 532 should pass packets to IP multicast. The local_decap adjacency 533 instructs BFER1 to create a copy, decapsulate it from the BIER header 534 and pass it on to the NextProtocol, in this example IP multicast. IP 535 multicast will then forward the packet out to LAN2 because it did 536 receive PIM or IGMP joins on LAN2 for the traffic. 538 Further processing of the packet in BFR4, BFR5 and BFER2 proceeds accordingly. 540 3.5. Forwarding comparison with BIER 542 Forwarding of BIER-TE is designed to allow common forwarding hardware 543 with BIER. In fact, one of the main goals of this document is to 544 encourage the building of forwarding hardware that can not only 545 support BIER, but also BIER-TE - to allow experimentation with BIER- 546 TE and support building of BIER-TE control plane code. 548 The pseudocode in Section 6 shows how existing BIER/BIFT forwarding 549 can be amended to support basic BIER-TE forwarding, by using BIER 550 BIFT's F-BM. Only the masking of bits to avoid duplicates must 551 be skipped when forwarding is for BIER-TE. 553 Whether to use BIER or BIER-TE forwarding can simply be a configured 554 choice per subdomain and accordingly be set up by a BIER-TE 555 controller host. 
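The step-by-step example of Section 3.4 can be replayed with a small sketch of the basic forwarding rule from Section 2.3 (reset every BitPosition the BFR has an adjacency for, then copy). This is an illustration only, not the pseudocode of Section 6: the function name `forward`, the dictionary layout, and the use of plain integers for BitPositions are assumptions, and it handles only one adjacency per BitPosition, without DNR or ECMP. The BIFT contents are taken from Figure 4.

```python
# BIFTs from Figure 4; BitPosition pN is written as the integer N.
BIFTS = {
    "BFIR2": {13: ("local_decap", None),
              2: ("forward_connected", "BFR3")},
    "BFR3": {1: ("forward_connected", "BFIR2"),
             7: ("forward_connected", "BFER1"),
             8: ("forward_connected", "BFR4")},
    "BFER1": {11: ("local_decap", None),
              6: ("forward_connected", "BFR3"),
              8: ("forward_connected", "BFR4")},
}

def forward(bfr, bitstring):
    """Process one packet at `bfr`; return (copies, decap).

    copies is a list of (neighbor, outgoing BitString) pairs and decap
    tells whether a local_decap adjacency matched.
    """
    bift = BIFTS[bfr]
    # BitPositions that are set in the packet and populated in this BIFT.
    hits = sorted(bp for bp in bitstring if bp in bift)
    # Reset all BitPositions for which this BFR has an adjacency.
    out = frozenset(bitstring) - set(bift)
    copies = [(bift[bp][1], out) for bp in hits
              if bift[bp][0] == "forward_connected"]
    decap = any(bift[bp][0] == "local_decap" for bp in hits)
    return copies, decap

# BitString set up in BFIR2: p2, p5, p7, p8, p10, p11, p12.
copies, decap = forward("BFIR2", {2, 5, 7, 8, 10, 11, 12})
```

Feeding the outgoing BitString of each copy into `forward` for the next hop reproduces the walk-through: BFIR2 copies once towards BFR3, BFR3 copies towards BFER1 and BFR4 with p7 and p8 reset, and BFER1 only decapsulates.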
The BIER packet encapsulation [RFC8296] too can be 556 reused without changes except that the currently defined BIER-TE ECMP 557 adjacency does not leverage the entropy field so that field would be 558 unused when BIER-TE forwarding is used. 560 3.6. Requirements 562 Basic BIER-TE forwarding MUST support configuring subdomains to use 563 basic BIER-TE forwarding rules (instead of BIER). With basic BIER-TE 564 forwarding, every bit MUST support having zero or one adjacency. It 565 MUST support the adjacency types forward_connected without DNR flag, 566 forward_routed and local_decap. All other BIER-TE forwarding 567 features are optional. These basic BIER-TE requirements make BIER-TE 568 forwarding exactly the same as BIER forwarding with the exception of 569 skipping the aforementioned F-BM masking on egress. 571 BIER-TE forwarding SHOULD support the DNR flag, as this is highly 572 useful to save bits in rings (see Section 4.6). 574 BIER-TE forwarding MAY support more than one adjacency on a bit and 575 ECMP adjacencies. The importance of ECMP adjacencies is unclear when 576 traffic engineering is used because it may be more desirable to 577 explicitly steer traffic across non-ECMP paths to make per-path 578 traffic calculation easier for controllers. Having more than one 579 adjacency for a bit allows further savings of bits in hub&spoke 580 scenarios, but unlike rings it is less "natural" to flood traffic 581 across multiple links unconditionally. Both ECMP and multiple 582 adjacencies are forwarding plane features that should be possible to 583 support later when needed as they do not impact the basic BIER-TE 584 replication loop. This is true because there is no inter-copy 585 dependency through resetting of F-BM as in BIER. 587 4. BIER-TE Controller Host BitPosition Assignments 589 This section describes how the BIER-TE controller host can use the 590 different BIER-TE adjacency types to define the BitPositions of a 591 BIER-TE domain. 
593 Because the size of the BitString is limiting the size of the BIER-TE 594 domain, many of the options described exist to support larger 595 topologies with fewer BitPositions (4.1, 4.3, 4.4, 4.5, 4.6, 4.7, 596 4.8). 598 4.1. P2P Links 600 Each p2p link in the BIER-TE domain is assigned one unique 601 BitPosition with a forward_connected adjacency pointing to the 602 neighbor on the p2p link. 604 4.2. BFER 606 Every BFER is given a unique BitPosition with a local_decap 607 adjacency. 609 4.3. Leaf BFERs 611 Leaf BFERs are BFERs where incoming BIER-TE packets never need to be 612 forwarded to another BFR but are only sent to the BFER to exit the 613 BIER-TE domain. For example, in networks where PEs are spokes 614 connected to P routers, those PEs are Leaf BFERs unless there is a 615 U-turn between two PEs. 617 All leaf BFERs in a BIER-TE domain can share a single BitPosition. 618 This is possible because the BitPosition for the adjacency to reach 619 the BFER can be used to distinguish whether or not packets should 620 reach the BFER. 622 This optimization will not work if an upstream interface of the BFER 623 is using a BitPosition optimized as described in the following two 624 sections (LAN, Hub and Spoke). 626 4.4. LANs 628 In a LAN, the adjacency to each neighboring BFR on the LAN is given a 629 unique BitPosition. The adjacency of this BitPosition is a 630 forward_connected adjacency towards the BFR and this BitPosition is 631 populated into the BIFT of all the other BFRs on that LAN. 633 BFR1 634 |p1 635 LAN1-+-+---+-----+ 636 p3| p4| p2| 637 BFR3 BFR4 BFR7 639 Figure 6: LAN Example 641 If bandwidth on the LAN is not an issue and most BIER-TE traffic 642 should be copied to all neighbors on a LAN, then BitPositions can be 643 saved by assigning just a single BitPosition to the LAN and 644 populating the BitPosition of the BIFTs of each BFR on the LAN with 645 a list of forward_connected adjacencies to all other neighbors on the 646 LAN. 
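The two LAN assignment options of Section 4.4 can be contrasted with a short sketch that builds the per-BFR BIFT entries for the LAN of Figure 6. It is illustrative only: the function names, the dictionary representation, and the choice of BitPosition 1 for the shared case are assumptions, and each adjacency is reduced to the name of the neighbor it points to.

```python
def lan_bifts_per_neighbor(bp_of):
    """One BitPosition per BFR on the LAN.

    bp_of maps each BFR to its own BitPosition. Every BFR gets the
    BitPositions of all *other* BFRs, each with a single
    forward_connected adjacency towards that BFR.
    """
    return {bfr: {bp: [nbr] for nbr, bp in bp_of.items() if nbr != bfr}
            for bfr in bp_of}

def lan_bifts_shared(bfrs, lan_bp):
    """One BitPosition for the whole LAN.

    Every BFR gets the same BitPosition, populated with a list of
    forward_connected adjacencies to all other BFRs on the LAN.
    """
    return {bfr: {lan_bp: [nbr for nbr in bfrs if nbr != bfr]}
            for bfr in bfrs}

# BitPositions as drawn in Figure 6.
figure6 = {"BFR1": 1, "BFR3": 3, "BFR4": 4, "BFR7": 2}
per_neighbor = lan_bifts_per_neighbor(figure6)
# lan_bp=1 is an arbitrary value chosen for this sketch.
shared = lan_bifts_shared(list(figure6), lan_bp=1)
```

In the per-neighbor form the LAN consumes four BitPositions and BFR1 can address BFR3, BFR4 and BFR7 individually; in the shared form it consumes one BitPosition and any packet with that bit set is copied to all three.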
648 This optimization does not work in the face of BFRs redundantly 649 connected to more than one LAN with this optimization because these 650 BFRs would receive duplicates and forward those duplicates into the 651 opposite LANs. Adjacencies of such BFRs into their LANs still need a 652 separate BitPosition. 654 4.5. Hub and Spoke 656 In a setup with a hub and multiple spokes connected via separate p2p 657 links to the hub, all p2p links can share the same BitPosition. The 658 BitPosition on the hub's BIFT is set up with a list of 659 forward_connected adjacencies, one for each Spoke. 661 This option is similar to the BitPosition optimization in LANs: 662 Redundantly connected spokes need their own BitPositions. 664 4.6. Rings 666 In L3 rings, instead of assigning a single BitPosition for every p2p 667 link in the ring, it is possible to save BitPositions by setting the 668 "Do Not Reset" (DNR) flag on forward_connected adjacencies. 670 For the ring shown in the following picture, a single BitPosition 671 will suffice to forward traffic entering the ring at BFRa or BFRb all 672 the way up to BFR1: 674 On BFRa, BFRb, BFR30,... BFR3, the BitPosition is populated with a 675 forward_connected adjacency pointing to the clockwise neighbor on the 676 ring and with DNR set. On BFR2, the adjacency also points to the 677 clockwise neighbor BFR1, but without DNR set. 679 Handling DNR this way ensures that copies forwarded from any BFR in 680 the ring to a BFR outside the ring will not have the ring BitPosition 681 set, therefore minimizing the chance of creating loops. 683 v v 684 | | 685 L1 | L2 | L3 686 /-------- BFRa ---- BFRb --------------------\ 687 | | 688 \- BFR1 - BFR2 - BFR3 - ... - BFR29 - BFR30 -/ 689 | | L4 | | 690 p33| p15| 691 BFRd BFRc 693 Figure 7: Ring Example 695 Note that this example only permits packets to enter the ring at 696 BFRa and BFRb, and that packets will always travel clockwise. 
   If packets should be allowed to enter the ring at any ring BFR, then
   one would have to use two ring BitPositions: one for clockwise, one
   for counterclockwise.

   Both would be set up to stop rotating on the same link, e.g., L1.
   When the ingress ring BFR creates the clockwise copy, it will reset
   the counterclockwise BitPosition because the DNR bit only applies to
   the bit for which the replication is done.  Likewise for the
   clockwise BitPosition for the counterclockwise copy.  As a result,
   the ring ingress BFR will send a copy in both directions, serving
   BFRs on either side of the ring up to L1.

4.7.  Equal Cost MultiPath (ECMP)

   The ECMP adjacency allows the use of just one BP per link bundle
   between two BFRs instead of one BP for each p2p member link of that
   link bundle.  In the following picture, one BP is used across
   L1,L2,L3, and BFR1 and BFR2 have the BIFT entries shown for that BP.

                  --L1-----
             BFR1 --L2----- BFR2
                  --L3-----

   BIFT entry in BFR1:
   ------------------------------------------------------------------
   | Index |  Adjacencies                                           |
   ==================================================================
   | 0:6   |  ECMP({L1-to-BFR2,L2-to-BFR2,L3-to-BFR2}, seed)        |
   ------------------------------------------------------------------

   BIFT entry in BFR2:
   ------------------------------------------------------------------
   | Index |  Adjacencies                                           |
   ==================================================================
   | 0:6   |  ECMP({L1-to-BFR1,L2-to-BFR1,L3-to-BFR1}, seed)        |
   ------------------------------------------------------------------

                         Figure 8: ECMP Example

   In the following example, all traffic from BFR1 towards BFR10 is
   intended to be ECMP load split equally across the topology.  This
   example is not meant as a likely setup, but to illustrate that ECMP
   can be used to share BPs not only across link bundles, and it
   explains the use of the seed parameter.
                    BFR1
                   /    \
                  /L11   \L12
               BFR2      BFR3
               /  \      /  \
              /L21 \L22 /L31 \L32
           BFR4  BFR5  BFR6  BFR7
              \  /        \  /
               \/          \/
              BFR8        BFR9
                 \        /
                  \      /
                   BFR10

   BIFT entry in BFR1:
   ------------------------------------------------------------------
   | 0:6   |  ECMP({L11-to-BFR2,L12-to-BFR3}, seed)                 |
   ------------------------------------------------------------------

   BIFT entry in BFR2:
   ------------------------------------------------------------------
   | 0:6   |  ECMP({L21-to-BFR4,L22-to-BFR5}, seed)                 |
   ------------------------------------------------------------------

   BIFT entry in BFR3:
   ------------------------------------------------------------------
   | 0:6   |  ECMP({L31-to-BFR6,L32-to-BFR7}, seed)                 |
   ------------------------------------------------------------------

                     Figure 9: Polarization Example

   With the setup of ECMP in the above topology, traffic would not be
   equally load-split.  Instead, links L22 and L31 would see no traffic
   at all: BFR2 will only see traffic from BFR1 for which the ECMP hash
   in BFR1 selected the first adjacency in a list of 2 adjacencies: link
   L11-to-BFR2.  When forwarding in BFR2 again performs an ECMP with two
   adjacencies on that subset of traffic, it will again select the
   first of its two adjacencies: L21-to-BFR4.  Therefore, L22 and BFR5
   see no traffic.  The same reasoning applies to BFR3, which only
   receives traffic for which the second adjacency was selected and
   therefore always selects L32-to-BFR7.

   To resolve this issue, the ECMP adjacency on BFR1 simply needs to be
   set up with a different seed than the ECMP adjacencies on BFR2/BFR3.

   This issue is called polarization.  It depends on the ECMP hash.  It
   is possible to build ECMP that does not have polarization, for
   example by taking entropy from the actual adjacency members into
   account, but that can make it harder to achieve evenly balanced
   load-splitting on all BFR without making the ECMP hash algorithm
   potentially too complex for fast forwarding in the BFRs.

4.8.  Routed adjacencies

4.8.1.
Reducing BitPositions

   Routed adjacencies can reduce the number of BitPositions required
   when the traffic engineering requirement is not hop-by-hop explicit
   path selection, but loose-hop selection.

                  ...............              ...............
           BFR1--... Redundant ...--L1-- BFR2... Redundant ...---
               \--... Network  ...--L2--/    ... Network  ...---
           BFR4--... Segment 1 ...--L3-- BFR3... Segment 2 ...---
                  ...............              ...............

                 Figure 10: Routed Adjacencies Example

   Assume the requirement in the above network is to explicitly
   engineer paths such that specific traffic flows are passed from
   segment 1 to segment 2 via link L1 (or via L2 or via L3).

   To achieve this, BFR1 and BFR4 are set up with a forward_routed
   adjacency BitPosition towards an address of BFR2 on link L1 (or on
   link L2, or of BFR3 on link L3).

   For paths to be engineered through a specific node BFR2 (or BFR3),
   BFR1 and BFR4 are set up with a forward_routed adjacency
   BitPosition towards a loopback address of BFR2 (or BFR3).

4.8.2.  Supporting nodes without BIER-TE

   Routed adjacencies also enable incremental deployment of BIER-TE.
   Only the nodes through which BIER-TE traffic needs to be steered -
   with or without replication - need to support BIER-TE.  Where they
   are not directly connected to each other, forward_routed adjacencies
   are used to pass over nodes that are not BIER-TE enabled.

5.  Avoiding loops and duplicates

5.1.  Loops

   Whenever BIER-TE creates a copy of a packet, the BitString of that
   copy will have all BitPositions cleared that are associated with
   adjacencies in the BFR.  This inhibits looping of packets.  The only
   exceptions are adjacencies with DNR set.

   With DNR set, looping can happen.  Consider in the ring picture that
   link L4 from BFR3 is plugged into the L1 interface of BFRa.
   This creates a loop where the ring's clockwise BitPosition is never
   reset for copies of the packets traveling clockwise around the ring.

   To inhibit looping in the face of such physical misconfiguration,
   only forward_connected adjacencies are permitted to have DNR set, and
   the link layer destination address of the adjacency (e.g., MAC
   address) protects against closing the loop.  Link layers without
   port-unique link layer addresses should not be used with the DNR
   flag set.

5.2.  Duplicates

   Duplicates happen when the topology of the BitString is not a tree
   but redundantly connects BFRs with each other.  The controller must
   therefore ensure that it only creates BitStrings that are trees in
   the topology.

   When links are incorrectly physically re-connected before the
   controller updates BitStrings in BFIRs, duplicates can happen.  Like
   loops, these can be inhibited by link layer addressing in
   forward_connected adjacencies.

   If interface or loopback addresses used in forward_routed adjacencies
   are moved from one BFR to another, duplicates can equally happen.
   Such re-addressing operations must be coordinated with the
   controller.

6.  BIER-TE Forwarding Pseudocode

   The following simplified pseudocode for BIER-TE forwarding uses the
   BIER forwarding pseudocode of [RFC8279], Section 6.5, with the one
   modification necessary to support basic BIER-TE forwarding.  Like the
   BIER forwarding pseudocode, for simplicity it hides the details of
   the adjacency processing inside PacketSend(), which can be
   forward_connected, forward_routed or local_decap.
   void ForwardBitMaskPacket_withTE (Packet)
   {
       SI=GetPacketSI(Packet);
       Offset=SI*BitStringLength;
       for (Index = GetFirstBitPosition(Packet->BitString); Index ;
            Index = GetNextBitPosition(Packet->BitString, Index)) {
           F-BM = BIFT[Index+Offset]->F-BM;
           if (!F-BM) continue;
           BFR-NBR = BIFT[Index+Offset]->BFR-NBR;
           PacketCopy = Copy(Packet);
           PacketCopy->BitString &= F-BM;                  [2]
           PacketSend(PacketCopy, BFR-NBR);
           // The following must not be done for BIER-TE:
           // Packet->BitString &= ~F-BM;                  [1]
       }
   }

           Figure 11: Simplified BIER-TE Forwarding Pseudocode

   The difference is that in BIER-TE, step [1] must not be performed.

   In BIER, this step is necessary to avoid duplicates when two or more
   BFER are reachable via the same neighbor.  The F-BM of all those BFER
   bits will indicate each other's bits, and step [1] will reset all
   these bits on the first copy made for the first of those BFER bits
   set in the BitString, hence skipping any further copies to that
   neighbor.

   Whereas in BIER, the F-BM of bits toward a specific neighbor contains
   only the bits of those BFER destined to be forwarded across this
   neighbor, in BIER-TE the F-BM for a neighbor needs to have all bits
   set except those bits that are actual (non-empty) adjacencies of
   this BFR.  Step [2] will reset those adjacency bits to avoid loops,
   but all the other bits that are not adjacencies of this BFR need to
   stay untouched by [2] so that they can be processed by further BFR
   along the path.  If [1] were performed as in BIER, then those
   non-adjacency bits would erroneously get reset during replication.
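   The effect of steps [1] and [2] can be reproduced with a small
   sketch.  It is not part of the pseudocode above: the bitstring
   length of 8, the neighbor names and the helpers bit(), te_fbm() and
   forward() are assumptions made for illustration, BitStrings are
   modeled as integers, and DNR is not modeled.

```python
from typing import Dict, List, Tuple

BSL = 8  # hypothetical, small BitStringLength to keep the numbers readable


def bit(index: int) -> int:
    """Mask with only BitPosition `index` (1-based) set."""
    return 1 << (index - 1)


def te_fbm(adjacency_bits: List[int], bsl: int = BSL) -> int:
    """BIER-TE F-BM: all bits set except this BFR's own adjacency bits."""
    fbm = (1 << bsl) - 1
    for index in adjacency_bits:
        fbm &= ~bit(index)
    return fbm


def forward(bitstring: int, bift: Dict[int, str],
            bier_step_1: bool = False) -> List[Tuple[str, int]]:
    """Return (neighbor, BitString of copy) for each copy that is sent."""
    fbm = te_fbm(list(bift))
    copies = []
    for index in range(1, BSL + 1):
        # Skip bits not set in the packet and bits without an adjacency.
        if not bitstring & bit(index) or index not in bift:
            continue
        copies.append((bift[index], bitstring & fbm))   # step [2]
        if bier_step_1:                                 # step [1], BIER only
            bitstring &= ~fbm
    return copies


# Bits 1 and 2 are adjacencies of this BFR; bit 5 is for a BFR further on.
bift = {1: "A", 2: "B"}
print(forward(0b10011, bift))                    # BIER-TE behavior
print(forward(0b10011, bift, bier_step_1=True))  # with BIER step [1]
```

   Without step [1], both copies still carry bit 5.  With step [1]
   enabled, the copy to "B" arrives with an empty BitString, which is
   the erroneous reset described above.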
   To support the DNR (Do Not Reset) flag of forward_connected
   adjacencies, the F-BM of such an adjacency must also have the
   adjacency's own bit set, so that the bit stays on in the packet copy
   made for this adjacency.  The bit is not set in the F-BM of other
   bits, so it is reset in any other packet copy made.

   Eliminating the need to perform [1] also makes processing of bits in
   the BIER-TE bitstring independent of processing other bits, which may
   also simplify forwarding plane implementations.

   The following pseudocode is comprehensive:

   o  This pseudocode eliminates per-bit F-BM, therefore reducing state
      by BitStringLength^2*SI and eliminating the need for a
      per-packet-copy masking operation except for adjacencies with the
      DNR flag set:

      *  AdjacentBits[SI] are bits with a non-empty list of adjacencies.
         This can be computed whenever the BIER-TE controller host
         updates the adjacencies.

      *  Only the AdjacentBits need to be examined in the loop for
         packet copies.

      *  The adjacent bits are cleared from the packet's BitString on
         ingress to avoid packet looping.

   o  The code loops over the adjacencies because there may be more than
      one adjacency for a bit.

   o  When an adjacency has the DNR bit, the bit is set in the packet
      copy (to save bits in rings for example).

   o  The ECMP adjacency is shown.  Its parameters are a
      ListOfAdjacencies from which one is picked.

   o  The forward_connected, forward_routed, local_decap adjacencies are
      shown with their parameters.
   void ForwardBitMaskPacket_withTE (Packet)
   {
       SI=GetPacketSI(Packet);
       Offset=SI*BitStringLength;
       // Bits of the packet that are adjacencies of this BFR
       AdjacentBitstring = Packet->BitString & AdjacentBits[SI];
       // Clear those bits in the packet to avoid loops
       Packet->BitString &= ~AdjacentBits[SI];
       for (Index = GetFirstBitPosition(AdjacentBitstring); Index ;
            Index = GetNextBitPosition(AdjacentBitstring, Index)) {
           foreach adjacency BIFT[Index+Offset] {
               if(adjacency == ECMP(ListOfAdjacencies, seed) ) {
                   I = ECMP_hash(sizeof(ListOfAdjacencies),
                                 Packet->Entropy, seed);
                   adjacency = ListOfAdjacencies[I];
               }
               PacketCopy = Copy(Packet);
               switch(adjacency) {
                   case forward_connected(interface,neighbor,DNR):
                       if(DNR)  // keep this bit set in the copy
                           PacketCopy->BitString |= 1<<(Index-1);
                       SendToL2Unicast(PacketCopy,interface,neighbor);
                       break;

                   case forward_routed([VRF],l3-neighbor):
                       SendToL3(PacketCopy,[VRF,]l3-neighbor);
                       break;

                   case local_decap([VRF],neighbor):
                       DecapBierHeader(PacketCopy);
                       PassTo(PacketCopy,[VRF,]Packet->NextProto);
                       break;
               }
           }
       }
   }

                Figure 12: BIER-TE Forwarding Pseudocode

7.  Managing SI, subdomains and BFR-ids

   When the number of bits required to represent the necessary hops in
   the topology and BFER exceeds the supported bitstring length,
   multiple SI and/or subdomains must be used.  This section discusses
   how.

   BIER-TE forwarding does not require the concept of BFR-id, but
   routing underlay, flow overlay and BIER headers may.  This section
   also discusses how BFR-id can be assigned to BFIR/BFER for BIER-TE.

7.1.  Why SI and sub-domains

   For BIER and BIER-TE forwarding, the most important result of using
   multiple SI and/or subdomains is the same: Packets that need to be
   sent to BFER in different SI or subdomains require different BIER
   packets, each one with a bitstring for a different (SI,subdomain)
   combination.  Each such bitstring uses one bitstring length sized SI
   block in the BIFT of the subdomain.  We call this a BIFT:SI (block).
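   As an illustration of why BFER in different SI require different
   packets, the following sketch splits a set of BFR-ids into one
   bitstring per SI.  It assumes BIER-style numbering, in which bit k
   (counted from 1) of set SI corresponds to BFR-id SI * bitstring-
   length + k; the function names and the sample BFR-ids are
   hypothetical, and bitstrings are modeled as integers.

```python
from typing import Dict, Iterable, Tuple


def to_bfr_id(si: int, k: int, bsl: int) -> int:
    """BFR-id of bit k (1-based) in set SI: SI * bitstring-length + k."""
    return si * bsl + k


def to_si_and_bit(bfr_id: int, bsl: int) -> Tuple[int, int]:
    """Inverse of to_bfr_id(): (SI, k) for a BFR-id >= 1."""
    si, k_minus_1 = divmod(bfr_id - 1, bsl)
    return si, k_minus_1 + 1


def bitstrings_per_si(bfr_ids: Iterable[int], bsl: int) -> Dict[int, int]:
    """One BitString (as an int) per SI; each needs its own packet."""
    result: Dict[int, int] = {}
    for bfr_id in bfr_ids:
        si, k = to_si_and_bit(bfr_id, bsl)
        result[si] = result.get(si, 0) | (1 << (k - 1))
    return result


packets = bitstrings_per_si([1, 256, 257, 600], bsl=256)
print(sorted(packets))  # the SIs for which a packet has to be sent
```

   With a bitstring length of 256, the four sample BFR-ids fall into
   three different SI, so three packets have to be sent.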
   For BIER and BIER-TE forwarding itself there is also no difference
   whether different SI and/or sub-domains are chosen, but SI and
   subdomain have different purposes in the BIER architecture shared by
   BIER-TE.  This impacts how operators manage them and especially how
   flow overlays will likely use them.

   By default, every possible BFIR/BFER in a BIER network would likely
   be given a BFR-id in subdomain 0 (unless there are > 64k BFIR/BFER).

   If there are different flow services (or service instances) requiring
   replication to different subsets of BFER, then it will likely not be
   possible to achieve the best replication efficiency for all of these
   service instances via subdomain 0.  Ideal replication efficiency for
   N BFER exists in a subdomain if they are split over not more than
   ceiling(N/bitstring-length) SI.

   If service instances justify additional BIER:SI state in the network,
   additional subdomains will be used: BFIR/BFER are assigned BFR-id in
   those subdomains and each service instance is configured to use the
   most appropriate subdomain.  This results in improved replication
   efficiency for different services.

   Even if creation of subdomains and assignment of BFR-id to BFIR/BFER
   in those subdomains is automated, it is not expected that individual
   service instances can deal with BFER in different subdomains.  A
   service instance may only support configuration of a single subdomain
   it should rely on.

   To be able to easily reuse (and modify as little as possible)
   existing BIER procedures including flow-overlay and routing underlay
   when BIER-TE forwarding is added, we therefore reuse SI and subdomain
   logically in the same way as they are used in BIER: All necessary
   BFIR/BFER for a service use a single BIER-TE BIFT and are split
   across as many SI as necessary (see below).
   Different services may use different subdomains that primarily exist
   to provide more efficient replication (and for BIER-TE desirable
   traffic engineering) for different subsets of BFIR/BFER.

7.2.  Bit assignment comparison BIER and BIER-TE

   In BIER, bitstrings only need to carry bits for BFER, which leads to
   the model that BFR-ids map 1:1 to each bit in a bitstring.

   In BIER-TE, bitstrings need to carry bits to indicate not only the
   receiving BFER but also the intermediate hops/links across which the
   packet must be sent.  The maximum number of BFER that can be
   supported in a single bitstring or BIFT:SI depends on the number of
   bits necessary to represent the desired topology between them.

   "Desired" topology because it depends on the physical topology, and
   on the desire of the operator to allow for explicit traffic
   engineering across every single hop (which requires more bits), or
   reducing the number of required bits by exploiting optimizations such
   as unicast (forward_routed), ECMP or flood (DNR) over "uninteresting"
   sub-parts of the topology - e.g., parts where different trees do not
   need to take different paths due to traffic-engineering reasons.

   The fraction of bits in a BIFT:SI that is used to describe the
   topology can therefore easily be as low as 20% or as high as 80%.
   The higher the percentage, the higher the likelihood that those
   topology bits are not just BIER-TE overhead without additional
   benefit, but instead allow expressing the desired
   traffic-engineering alternatives.

7.3.  Using BFR-id with BIER-TE

   Because there is no 1:1 mapping between bits in the bitstring and
   BFER, BIER-TE cannot simply rely on the BIER 1:1 mapping between
   bits in a bitstring and BFR-id.

   In BIER, automatic schemes could assign all possible BFR-ids
   sequentially to BFERs.  This will not work in BIER-TE.
   In BIER-TE, the operator or BIER-TE controller host has to determine
   a BFR-id for each BFER in each required subdomain.  The BFR-id may or
   may not have a relationship with a bit in the bitstring.  Suggestions
   are detailed below.  Once determined, the BFR-id can then be
   configured on the BFER and used by flow overlay, routing underlay and
   the BIER header almost the same as the BFR-id in BIER.

   The one exception is application/flow-overlays that automatically
   calculate the bitstring(s) of BIER packets by converting BFR-id to
   bits.  In BIER-TE, this operation can be done in two ways:

   "Independent branches": For a given application or (set of) trees,
   the branches from a BFIR to every BFER are independent of the
   branches to any other BFER.  For example, shortest path trees have
   independent branches.

   "Interdependent branches": When a BFER is added or deleted from a
   particular distribution tree, branches to other BFER still in the
   tree may need to change.  Steiner trees are examples of trees with
   interdependent branches.

   If "independent branches" are sufficient, the BIER-TE controller host
   can provide to such applications for every BFR-id a SI:bitstring with
   the BIER-TE bits for the branch towards that BFER.  The application
   can then independently calculate the SI:bitstring for all desired
   BFER by OR'ing their bitstrings.

   If "interdependent branches" are required, the application could call
   a BIER-TE controller host API with the list of required BFR-ids and
   get the required bitstring back.  Whenever the set of BFR-ids
   changes, this is repeated.

   Note that in either case (unlike in BIER), the bits in BIER-TE may
   need to change upon link/node failure/recovery, network expansion and
   network load by other traffic (as part of traffic engineering goals).
   Interactions between such BFIR applications and the BIER-TE
   controller host therefore need to support dynamic updates to the
   bitstrings.

7.4.  Assigning BFR-ids for BIER-TE

   For non-leaf BFER, there is usually a single bit k for that BFER with
   a local_decap() adjacency on the BFER.  The BFR-id for such a BFER is
   therefore most easily the one it would have in BIER:
   SI * bitstring-length + k.

   As explained earlier in the document, leaf BFER do not need such a
   separate bit because the fact alone that the BIER-TE packet is
   forwarded to the leaf BFER indicates that the BFER should decapsulate
   it.  Such a BFER will have one or more bits for the links leading
   only to it.  The BFR-id could therefore most easily be the BFR-id
   derived from the lowest bit for those links.

   These two rules are only recommendations for the operator or BIER-TE
   controller assigning the BFR-ids.  Any allocation scheme can be used;
   the BFR-ids just need to be unique across BFRs in each subdomain.

   It is not currently determined if a single subdomain could or should
   be allowed to forward both BIER and BIER-TE packets.  If this should
   be supported, there are two options:

   A.  BIER and BIER-TE have different BFR-id in the same subdomain.
   This allows higher replication efficiency for BIER because its
   BFR-ids can be assigned sequentially, while the bitstrings for
   BIER-TE will also have the additional bits for the topology.  There
   is no relationship between a BFR's BIER BFR-id and its BIER-TE
   BFR-id.

   B.  BIER and BIER-TE share the same BFR-id.  The BFR-id are assigned
   as explained above for BIER-TE and simply reused for BIER.  The
   replication efficiency for BIER will be as low as that for BIER-TE in
   this approach.  Depending on topology, only the same 20%..80% of bits
   as possible for BIER-TE can be used for BIER.

7.5.  Example bit allocations

7.5.1.
With BIER

   Consider a network setup with a bitstring length of 256 for a network
   topology as shown in the picture below.  The network has 6 areas,
   each with ca. 180 BFR, connecting via a core with some larger (core)
   BFR.  To address all BFER with BIER, 4 SI are required.  To send a
   BIER packet to all BFER in the network, 4 copies need to be sent by
   the BFIR.  On the BFIR it does not make a difference how the BFR-id
   are allocated to BFER in the network, but for efficiency further down
   in the network it does make a difference.

             area1           area2        area3
            BFR1a BFR1b  BFR2a BFR2b   BFR3a BFR3b
              |  \         /    \        /  |
              ................................
              .                Core          .
              ................................
              |    /       \    /        \  |
            BFR4a BFR4b  BFR5a BFR5b   BFR6a BFR6b
              area4          area5        area6

                Figure 13: Scaling BIER-TE bits by reuse

   With random allocation of BFR-id to BFER, each receiving area would
   (most likely) have to receive all 4 copies of the BIER packet because
   there would be BFR-id for each of the 4 SI in each of the areas.
   Only further towards each BFER would this duplication subside - when
   each of the 4 trees runs out of branches.

   If BFR-id are allocated intelligently, then all the BFER in an area
   would be given BFR-id with as few different SI as possible.  Each
   area would only have to forward one or two packets instead of 4.

   Given how networks can grow over time, replication efficiency in an
   area will also easily go down over time when BFR-id are allocated
   sequentially network wide.  An area that initially only has BFR-id
   in one SI might end up with many SI over a longer period of growth.
   Allocating SIs to areas with initially sufficiently many spare bits
   for growth can help to alleviate this issue.  Alternatively, BFR-id
   can be renumbered after network expansion.  In this example one may
   consider using 6 SI and assigning one to each area.
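   The difference between the allocation strategies can be made
   concrete with a short sketch.  The figures here are assumptions for
   illustration only: 170 BFER per area (so that the 6 areas fit into
   4 SI of 256 bits), BIER-style numbering in which BFR-id n lies in
   SI (n-1)/256, and a deterministic interleaved allocation standing in
   for the random allocation described above.

```python
from typing import Callable, Dict, Set

BSL = 256            # bitstring length from the example
AREAS = 6            # number of areas from the example
BFER_PER_AREA = 170  # assumption: BFER per area, chosen so that 4 SI suffice
TOTAL = AREAS * BFER_PER_AREA


def si_of(bfr_id: int) -> int:
    """SI that holds the bit of a BFR-id (BFR-ids start at 1)."""
    return (bfr_id - 1) // BSL


def sis_per_area(area_of: Callable[[int], int]) -> Dict[int, Set[int]]:
    """Set of SIs that each area (1..AREAS) has BFR-ids in."""
    result: Dict[int, Set[int]] = {a: set() for a in range(1, AREAS + 1)}
    for bfr_id in range(1, TOTAL + 1):
        result[area_of(bfr_id)].add(si_of(bfr_id))
    return result


def interleaved(bfr_id: int) -> int:
    """BFR-ids handed out round-robin across areas (stand-in for random)."""
    return (bfr_id - 1) % AREAS + 1


def contiguous(bfr_id: int) -> int:
    """Each area gets one contiguous block of BFR-ids."""
    return (bfr_id - 1) // BFER_PER_AREA + 1


for name, scheme in (("interleaved", interleaved),
                     ("contiguous", contiguous)):
    counts = [len(s) for s in sis_per_area(scheme).values()]
    print(name, counts)  # packets each area has to receive
```

   Under these assumptions the interleaved allocation places every
   area in all 4 SI, while contiguous blocks leave each area with one
   or two SI.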
   This example shows that intelligent BFR-id allocation within at least
   subdomain 0 can be helpful or even necessary even in BIER.

7.5.2.  With BIER-TE

   In BIER-TE one needs to determine a subset of the physical topology
   and attached BFER so that the "desired" representation of this
   topology and the BFER fit into a single bitstring.  This process
   needs to be repeated until the whole topology is covered.

   Once bits/SIs are assigned to topology and BFER, BFR-id is just a
   derived set of identifiers from the operator/BIER-TE controller as
   explained above.

   Every time that different sub-topologies have overlap, bits need to
   be repeated across the bitstrings, increasing the overall number of
   bits required across all bitstring/SIs.  In the worst case, random
   subsets of BFER are assigned to different SI.  This is much worse
   than in BIER because it not only reduces replication efficiency with
   the same number of overall bits, but goes further: more bits are
   required due to duplication of bits for topology across multiple
   SI.  Intelligent BFER to SI assignment and selecting specific
   "desired" subtopologies can minimize this problem.

   To set up BIER-TE efficiently for the above topology, the following
   bit allocation method can be used.  This method can easily be
   expanded to other, similarly structured larger topologies.

   Each area is allocated one or more SI depending on the number of
   future expected BFER and number of bits required for the topology in
   the area.  In this example, 6 SI, one per area.

   In addition, we use 4 bits in each SI: bia, bib, bea, beb: bit
   ingress a, bit ingress b, bit egress a, bit egress b.  These bits
   will be used to pass BIER packets from any BFIR via any combination
   of ingress area a/b BFR and egress area a/b BFR into a specific
   target area.
   These bits are then set up with the right forward_routed adjacencies
   on the BFIR and area edge BFR:

   On all BFIR in an area j, bia in each BIFT:SI is populated with the
   same forward_routed(BFRja), and bib with forward_routed(BFRjb).  On
   all area edge BFR, bea in BIFT:SI=k is populated with
   forward_routed(BFRka) and beb in BIFT:SI=k with
   forward_routed(BFRkb).

   For BIER-TE forwarding of a packet to some subset of BFER across all
   areas, a BFIR would create at most 6 copies, with SI=1...SI=6.  In
   each packet, the bits indicate bits for topology and BFER in that
   topology plus the four bits to indicate whether to pass this packet
   via the ingress area a or b border BFR and the egress area a or b
   border BFR, therefore allowing path engineering for those two
   "unicast" legs: 1) BFIR to ingress area edge and 2) core to egress
   area edge.  Replication only happens inside the egress areas.  For
   BFER in the same area as the BFIR, these four bits are not used.

7.6.  Summary

   Like BIER, BIER-TE can support multiple SI within a sub-domain to
   allow re-using the concept of BFR-id and therefore minimize BIER-TE
   specific functions in underlay routing, flow overlay methods and BIER
   headers.

   The number of BFIR/BFER possible in a subdomain is smaller than in
   BIER because BIER-TE uses additional bits for topology.

   Subdomains can be used in BIER-TE like in BIER to create more
   efficient replication to known subsets of BFER.

   Assigning bits for BFER intelligently into the right SI is more
   important in BIER-TE than in BIER because of replication efficiency
   and the overall number of bits required.

8.  BIER-TE and Segment Routing

   Segment Routing aims to achieve lightweight path engineering via
   loose source routing.  Compared for example to RSVP-TE, it does not
   require per-path signaling to each of these hops.
   BIER-TE supports the same design philosophy for multicast.  Like
   SR, it relies on source-routing - via the definition of a
   BitString.  Like SR, it only requires consideration of the "hops" on
   which either replication has to happen, or across which the traffic
   should be steered (even without replication).  Any other hops can be
   skipped via the use of routed adjacencies.

   Instead of defining BitPositions for non-replicating hops, it is
   equally possible to use segment routing encapsulations (e.g., MPLS
   label stacks) for "forward_routed" adjacencies.

   Note that BIER itself is also similar to SR - it achieves the same as
   "Shortest Path SID" where the label stack uses only one SID to
   indicate the egress node of the SR domain.  Instead of routing such
   an SR packet hop-by-hop based on that SID, BIER routes the packet
   hop-by-hop based on the BFR-id bits of the egress nodes of the BIER
   domain.  What BIER does not allow is to indicate intermediate hops,
   or, in terms of SR, label stacks with more than one SID in the stack
   (for the same SR domain).  This is what BIER-TE provides.

9.  Security Considerations

   The security considerations are the same as for BIER with the
   following differences:

   BFR-ids and BFR-prefixes are not used in BIER-TE, nor are procedures
   for their distribution, so these are not attack vectors against
   BIER-TE.

10.  IANA Considerations

   This document requests no action by IANA.

11.  Acknowledgements

   The authors would like to thank Greg Shepherd, Ijsbrand Wijnands and
   Neale Ranns for their extensive review and suggestions.

12.  Change log [RFC Editor: Please remove]

   draft-ietf-bier-te-arch:

      02: Refresh after IETF104 discussion: changed intended status back
      to standard.
      Reasoning:

         Tighter review of standards document == ensures arch will be
         better prepared for possible adoption by other WGs (e.g.,
         DetNet) or std. bodies.

         Requirement against the degree of existing implementations is
         self defined by the WG.  BIER WG seems to think it is not
         necessary to apply multiple interoperating implementations
         against an architecture level document at this time to make it
         qualify to go to standards track.  Also, the levels of support
         introduced in -01 rev. should allow all BIER forwarding engines
         to also be able to support the base level BIER-TE forwarding.

      01: Added note comparing BIER and SR to also hopefully clarify
      BIER-TE vs. BIER comparison re. SR.

      - added requirements section mandating only most basic BIER-TE
      forwarding features as MUST.

      - reworked comparison with BIER forwarding section to only
      summarize and point to pseudocode section.

      - reworked pseudocode section to have one pseudocode that mirrors
      the BIER forwarding pseudocode to make comparison easier and a
      second pseudocode that shows the complete set of BIER-TE
      forwarding options and simplification/optimization possible vs.
      BIER forwarding.

      - Added captions to pictures.

      00: Changed target state to experimental (WG conclusion), updated
      references, mod auth association.

      - Source now on http://www.github.com/toerless/bier-te-arch

      - Please open issues on the github for change/improvement requests
      to the document - in addition to posting them on the list
      (bier@ietf.).  Thanks!.

   draft-eckert-bier-te-arch:

      06: Added overview of forwarding differences between BIER,
      BIER-TE.

      05: Author affiliation change only.

      04: Added comparison to Live-Live and BFIR to FRR section
      (Eckert).

      04: Removed FRR content into the new FRR draft
      [I-D.eckert-bier-te-frr] (Braun).
      - Linked FRR information to new draft in Overview/Introduction

      - Removed BTAFT/FRR from "Changes in the network topology"

      - Linked new draft in "Link/Node Failures and Recovery"

      - Removed FRR from "The BIER-TE Forwarding Layer"

      - Moved FRR section to new draft

      - Moved FRR parts of Pseudocode into new draft

      - Left only non FRR parts

      - removed FrrUpDown(..) and //FRR operations in
      ForwardBierTePacket(..)

      - New draft contains FrrUpDown(..) and ForwardBierTePacket(Packet)
      from bier-arch-03

      - Moved "BIER-TE and existing FRR" to new draft

      - Moved "BIER-TE and Segment Routing" section one level up

      - Thus, removed "Further considerations" that only contained this
      section

      - Added Changes for version 04

      03: Updated the FRR section.  Added examples for FRR key concepts.
      Added BIER-in-BIER tunneling as option for tunnels in backup
      paths.  BIFT structure is expanded and contains an additional
      match field to support full node protection with BIER-TE FRR.

      03: Updated FRR section.  Explanation how BIER-in-BIER
      encapsulation provides P2MP protection for node failures even
      though the routing underlay does not provide P2MP.

      02: Changed the definition of BIFT to be more inline with BIER.
      In revs. up to -01, the idea was that a BIFT has only entries for
      a single bitstring, and every SI and subdomain would be a separate
      BIFT.  In BIER, each BIFT covers all SI.  This is now also how we
      define it in BIER-TE.

      02: Added Section 7 to explain the use of SI, subdomains and
      BFR-id in BIER-TE and to give an example how to efficiently assign
      bits for a large topology requiring multiple SI.

      02: Added further detailed for rings - how to support input from
      all ring nodes.

      01: Fixed BFIR -> BFER for section 4.3.
      01: Added explanation of SI, difference to BIER ECMP,
      consideration for Segment Routing, unicast FRR, considerations for
      encapsulation, explanations of BIER-TE controller host and CLI.

      00: Initial version.

13.  References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC8279]  Wijnands, IJ., Ed., Rosen, E., Ed., Dolganow, A.,
              Przygienda, T., and S. Aldrin, "Multicast Using Bit Index
              Explicit Replication (BIER)", RFC 8279,
              DOI 10.17487/RFC8279, November 2017.

   [RFC8296]  Wijnands, IJ., Ed., Rosen, E., Ed., Dolganow, A.,
              Tantsura, J., Aldrin, S., and I. Meilik, "Encapsulation
              for Bit Index Explicit Replication (BIER) in MPLS and
              Non-MPLS Networks", RFC 8296, DOI 10.17487/RFC8296,
              January 2018.

Authors' Addresses

   Toerless Eckert (editor)
   Huawei USA - Futurewei Technologies Inc.
   2330 Central Expy
   Santa Clara 95050
   USA

   Email: tte+ietf@cs.fau.de

   Gregory Cauchie
   Bouygues Telecom

   Email: GCAUCHIE@bouyguestelecom.fr

   Wolfgang Braun
   University of Tuebingen

   Email: wolfgang.braun@uni-tuebingen.de

   Michael Menth
   University of Tuebingen

   Email: menth@uni-tuebingen.de