Network Working Group                                          T. Eckert
Internet-Draft                                       Cisco Systems, Inc.
Intended status: Standards Track                              G. Cauchie
Expires: April 20, 2016                                 Bouygues Telecom
                                                        October 18, 2015


     Traffic Engineering for Bit Index Explicit Replication BIER-TE
                      draft-eckert-bier-te-arch-02

Abstract

   This document proposes an architecture for BIER-TE: Traffic
   Engineering for Bit Index Explicit Replication (BIER).

   BIER-TE shares part of its architecture with BIER as described in
   [I-D.ietf-bier-architecture].  It also proposes to share the packet
   format with BIER.

   BIER-TE forwards and replicates packets like BIER based on a
   BitString in the packet header, but it does not require an IGP.  It
   supports traffic engineering by explicit hop-by-hop forwarding and
   loose hop forwarding of packets.  It supports Fast ReRoute (FRR) for
   link and node protection and incremental deployment.  Because
   BIER-TE, like BIER, operates without explicit in-network
   tree-building but also supports traffic engineering, it is more
   similar to SR than to RSVP-TE.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 20, 2016.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Overview
     1.2.  Requirements Language
   2.  Layering
     2.1.  The Multicast Flow Overlay
     2.2.  The BIER-TE Controller Host
       2.2.1.  Assignment of BitPositions to adjacencies of the
               network topology
       2.2.2.  Changes in the network topology
       2.2.3.  Set up per-multicast flow BIER-TE state
       2.2.4.  Link/Node Failures and Recovery
     2.3.  The BIER-TE Forwarding Layer
     2.4.  The Routing Underlay
   3.  BIER-TE Forwarding
     3.1.  The Bit Index Forwarding Table (BIFT)
     3.2.  Adjacency Types
       3.2.1.  Forward Connected
       3.2.2.  Forward Routed
       3.2.3.  ECMP
       3.2.4.  Local Decap
     3.3.  Encapsulation considerations
     3.4.  Basic BIER-TE Forwarding Example
   4.  BIER-TE Controller Host BitPosition Assignments
     4.1.  P2P Links
     4.2.  BFER
     4.3.  Leaf BFERs
     4.4.  LANs
     4.5.  Hub and Spoke
     4.6.  Rings
     4.7.  Equal Cost MultiPath (ECMP)
     4.8.  Routed adjacencies
       4.8.1.  Reducing BitPositions
       4.8.2.  Supporting nodes without BIER-TE
   5.  Avoiding loops and duplicates
     5.1.  Loops
     5.2.  Duplicates
   6.  BIER-TE FRR
     6.1.  The BIER-TE Adjacency FRR Table (BTAFT)
     6.2.  FRR in BIER-TE forwarding
     6.3.  FRR in the BIER-TE Controller Host
     6.4.  BIER-TE FRR Benefits
   7.  BIER-TE Forwarding Pseudocode
   8.  Managing SI, subdomains and BFR-ids
     8.1.  Why SI and sub-domains
     8.2.  Bit assignment comparison BIER and BIER-TE
     8.3.  Using BFR-id with BIER-TE
     8.4.  Assigning BFR-ids for BIER-TE
     8.5.  Example bit allocations
       8.5.1.  With BIER
       8.5.2.  With BIER-TE
     8.6.  Summary
   9.  Further considerations
     9.1.  BIER-TE and existing FRR
     9.2.  BIER-TE and Segment Routing
   10. Security Considerations
   11. IANA Considerations
   12. Acknowledgements
   13. Change log [RFC Editor: Please remove]
   14. References
   Authors' Addresses

1.  Introduction

1.1.  Overview

   This document specifies the architecture for BIER-TE: traffic
   engineering for Bit Index Explicit Replication (BIER).

   BIER-TE shares architecture and packet formats with BIER as described
   in [I-D.ietf-bier-architecture].

   BIER-TE forwards and replicates packets like BIER based on a
   BitString in the packet header, but it does not require an IGP.  It
   supports traffic engineering by explicit hop-by-hop forwarding and
   loose hop forwarding of packets.  It supports Fast ReRoute (FRR) for
   link and node protection and incremental deployment.  Because
   BIER-TE, like BIER, operates without explicit in-network
   tree-building but also supports traffic engineering, it is more
   similar to SR than to RSVP-TE.

   The key differences from BIER are:

   o  BIER-TE replaces in-network autonomous path calculation by
      explicit paths calculated offpath by the BIER-TE controller host.

   o  In BIER-TE every BitPosition of the BitString of a BIER-TE packet
      indicates one or more adjacencies - instead of a BFER as in BIER.

   o  BIER-TE in each BFR has no routing table but only a Bit Index
      Forwarding Table (BIFT) indexed by SI:BitPosition and populated
      with only those adjacencies to which the BFR should replicate
      packets.

   BIER-TE headers use the same format as BIER headers.

   BIER-TE forwarding does not require or use the BFIR-ID.  The BFIR-ID
   can still be useful for coordinated BFIR/BFER functions, such as the
   context for upstream assigned labels for MPLS payloads in MVPN over
   BIER-TE.

   If the BIER-TE domain is also running BIER, then the BFIR-ID in
   BIER-TE packets can be set to the same BFIR-ID as used with BIER
   packets.

   If the BIER-TE domain is not running full BIER, or wants to reduce
   the need to allocate bits in BIER BitStrings for BFIR-ID values,
   then the allocation of BFIR-ID values in BIER-TE packets can be done
   through other mechanisms outside the scope of this document, as long
   as this is appropriately agreed upon between all BFIRs/BFERs.

   Currently, this specification has no considerations for BIER
   sub-domains.

1.2.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

2.  Layering

   End-to-end BIER-TE operation consists of four components: the
   "Multicast Flow Overlay", the "BIER-TE Controller Host", the "Routing
   Underlay" and the "BIER-TE forwarding layer".

   Picture 2: Layers of BIER-TE

                   <------BGP/PIM----->
      |<-IGMP/PIM->  multicast flow   <-PIM/IGMP->|
                        overlay

                   [Bier-TE Controller Host]
                       ^      ^      ^
                      /       |       \   BIER-TE control protocol
                     |        |        |  eg.: Netconf/Restconf/Yang
                     v        v        v
     Src -> Rtr1 -> BFIR-----BFR-----BFER -> Rtr2 -> Rcvr

                     |--------------------->|
                     BIER-TE forwarding layer

                    |<- BIER-TE domain-->|

                    |<--------------------->|
                        Routing underlay

2.1.  The Multicast Flow Overlay

   The Multicast Flow Overlay operates as in BIER.  See
   [I-D.ietf-bier-architecture].  Instead of interacting with the BIER
   layer, it interacts with the BIER-TE Controller Host.

2.2.  The BIER-TE Controller Host

   The BIER-TE controller host represents the control plane of BIER-TE.
   It communicates two sets of information with BFRs:

   During bring-up or modifications of the network topology, the
   controller discovers the network topology, assigns BitPositions to
   adjacencies and signals the resulting mapping of BitPositions to
   adjacencies to each BFR connecting to the adjacency.

   During day-to-day operations of the network, the controller signals
   to BFIRs what multicast flows are mapped to what BitStrings.

   Communication between the BIER-TE controller host and BFRs is
   ideally via standardized protocols and data-models such as
   Netconf/Restconf/Yang.  This is currently outside the scope of this
   document.  Vendor-specific CLI on the BFRs is also a possible stopgap
   option (as in many other SDN solutions lacking definition of a
   standardized data model).

   For simplicity, the procedures of the BIER-TE controller host are
   described in this document as if it is a single, centralized
   automated entity, such as an SDN controller.  It could equally be an
   operator setting up CLI on the BFRs.  Distribution of the functions
   of the BIER-TE controller host is currently outside the scope of this
   document.

2.2.1.  Assignment of BitPositions to adjacencies of the network
        topology

   The BIER-TE controller host tracks the BFR topology of the BIER-TE
   domain.  It determines what adjacencies require BitPositions so that
   BIER-TE explicit paths can be built through them as desired by
   operator policy.

   The controller then pushes the BitPositions/adjacencies to the BIFT
   of the BFRs, populating into the BIFT of each BFR only those
   SI:BitPositions to which that BFR should be able to send packets,
   that is, the adjacencies connecting to this BFR.

2.2.2.  Changes in the network topology

   If the network topology changes (not failure based) so that
   adjacencies that are assigned to BitPositions are no longer needed,
   the controller can re-use those BitPositions for new adjacencies.
   First, these BitPositions need to be removed from any BFIR flow state
   and BFR BIFT state (and BTAFT if FRR is supported, see below).  Then
   they can be repopulated, first into the BIFT (and, if FRR is
   supported, the BTAFT), then into the BFIR.

2.2.3.  Set up per-multicast flow BIER-TE state

   The BIER-TE controller host tracks the multicast flow overlay to
   determine what multicast flow needs to be sent by a BFIR to which set
   of BFERs.  It calculates the desired distribution tree across the
   BIER-TE domain based on algorithms outside the scope of this document
   (eg.: CSPF, Steiner Tree,...).  It then pushes the calculated
   BitString into the BFIR.

2.2.4.  Link/Node Failures and Recovery

   When links or nodes fail or recover in the topology, BIER-TE can
   quickly respond with the optional FRR procedures described below.  It
   can also react more slowly by recalculating the BitStrings of
   affected multicast flows.  This reaction is slower than the FRR
   procedure because the controller needs to receive link/node up/down
   indications, recalculate the desired BitStrings and push them down
   into the BFIRs.  With FRR, this is all performed locally on a BFR
   receiving the adjacency up/down notification.

2.3.  The BIER-TE Forwarding Layer

   When the BIER-TE Forwarding Layer receives a packet, it simply looks
   up the BitPositions that are set in the BitString of the packet in
   the Bit Index Forwarding Table (BIFT) that was populated by the
   BIER-TE controller host.  For every BP that is set in the BitString
   and that has one or more adjacencies in the BIFT, a copy is made
   according to the type of adjacencies for that BP in the BIFT.  Before
   sending any copy, the BFR resets all BitPositions in the BitString of
   the packet for which it can create a copy.  This prevents packets
   from looping.

   If the BFR supports BIER-TE FRR operations, then the BIER-TE
   forwarding layer receives fast adjacency up/down notifications and
   uses the BIER-TE Adjacency FRR Table (BTAFT) to modify the BitString
   of the packet before it performs BIER-TE forwarding.  This is
   detailed in the FRR section.

2.4.  The Routing Underlay

   BIER-TE sends BIER packets to directly connected BIER-TE neighbors as
   L2 (unicasted) BIER packets without requiring a routing underlay.
   BIER-TE forwarding uses the Routing Underlay for forward_routed
   adjacencies, which copy BIER-TE packets to not-directly-connected
   BFRs (see below for adjacency definitions).

   If the BFR intends to support FRR for BIER-TE, then the BIER-TE
   forwarding plane needs to receive fast adjacency up/down
   notifications: link up/down or neighbor up/down, eg.: from BFD.
   Providing these notifications is considered to be part of the routing
   underlay in this document.

3.  BIER-TE Forwarding

3.1.  The Bit Index Forwarding Table (BIFT)

   The Bit Index Forwarding Table (BIFT) exists in every BFR.  For every
   subdomain in use, it is a table indexed by SI:BitPosition and is
   populated by the BIER-TE control plane.  Each index can be empty or
   contain a list of one or more adjacencies.

   BIER-TE can support multiple subdomains like BIER, each one with a
   separate BIFT.

   In the BIER architecture, indices into the BIFT are explained to be
   both BFR-id and SI:BitString (BitPosition).  This is because there is
   a 1:1 relationship between BFR-id and SI:BitString - every bit in
   every SI is/can be assigned to a BFIR/BFER.  In BIER-TE there are
   more bits used in each BitString than there are BFIRs/BFERs assigned
   to the BitString.  This is because of the bits required to express
   the (traffic engineered) path through the topology.  The BIER-TE
   forwarding definitions therefore do not use the term BFR-id at all.
   Instead, BFR-ids are only used as required by the routing underlay,
   flow overlay or BIER headers.  Please refer to Section 8 for an
   explanation of how to deal with SI, subdomains and BFR-id in BIER-TE.
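
   As a rough illustration of the table structure described in this
   section, the sketch below models one BIFT per subdomain as a mapping
   from (SI, BitPosition) to a list of adjacencies.  This is not part
   of the specification: the names "bifts" and "lookup" and the
   placeholder adjacency strings are hypothetical, and adjacencies are
   kept as opaque strings.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical in-memory model: key is (SI, BitPosition), value is the
# (possibly empty) list of adjacencies for that index.
Bift = Dict[Tuple[int, int], List[str]]

# One BIFT per subdomain; the adjacency names are placeholders.
bifts: Dict[int, Bift] = {
    0: {
        (0, 1): ["adjacency-A"],
        (0, 2): ["adjacency-B", "adjacency-C"],
        (0, 5): [],  # an index may be empty
    }
}


def lookup(bift: Bift, si: int, set_bits: Set[int]) -> Dict[int, List[str]]:
    """Return the non-empty adjacency lists for the bits set in a packet."""
    return {
        bp: bift[(si, bp)]
        for bp in sorted(set_bits)
        if bift.get((si, bp))  # skips missing and empty entries
    }


# Bits 5 (empty entry) and 9 (not populated) yield no adjacencies.
print(lookup(bifts[0], 0, {1, 2, 5, 9}))
```

   Bits that are set in the packet but have no populated entry on this
   BFR are simply ignored by the lookup, which matches the description
   of the forwarding layer in Section 2.3.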

   ------------------------------------------------------------------
   | Index:         |  Adjacencies:                                 |
   | SI:BitPosition |  <empty> or one or more per entry             |
   ==================================================================
   | 0:1            |  forward_connected(interface,neighbor,DNR)    |
   ------------------------------------------------------------------
   | 0:2            |  forward_connected(interface,neighbor,DNR)    |
   |                |  forward_connected(interface,neighbor,DNR)    |
   ------------------------------------------------------------------
   | 0:3            |  local_decap([VRF])                           |
   ------------------------------------------------------------------
   | 0:4            |  forward_routed([VRF,]l3-neighbor)            |
   ------------------------------------------------------------------
   | 0:5            |  <empty>                                      |
   ------------------------------------------------------------------
   | 0:6            |  ECMP({adjacency1,...adjacencyN}, seed)       |
   ------------------------------------------------------------------
   ...
   | BitStringLength |  ...                                         |
   ------------------------------------------------------------------
                      Bit Index Forwarding Table

   The BIFT is programmed into the data plane of BFRs by the BIER-TE
   controller host and used to forward packets, according to the rules
   specified in the BIER-TE Forwarding Procedures.

   A BP that the controller populates in more than one BFR does not
   have to have the same adjacencies in each of them.  This is up to
   the controller.  BPs for p2p links are one case (see below).

3.2.  Adjacency Types

3.2.1.  Forward Connected

   A "forward_connected" adjacency is towards a directly connected BFR
   neighbor using an interface address of that BFR on the connecting
   interface.  A forward_connected adjacency does not route packets but
   only L2 forwards them to the neighbor.

   Packets sent to an adjacency with "DoNotReset" (DNR) set in the BIFT
   will not have the BitPosition for that adjacency reset when the BFR
   creates a copy for it.  The BitPosition will still be reset for
   copies of the packet made towards other adjacencies.  This can be
   used, for example, in ring topologies as explained below.

3.2.2.  Forward Routed

   A "forward_routed" adjacency is an adjacency towards a BFR that is
   not a forward_connected adjacency: towards a loopback address of a
   BFR or towards an interface address that is not directly connected.
   Forward_routed packets are forwarded via the Routing Underlay.

   If the Routing Underlay has multiple paths for a forward_routed
   adjacency, it will perform ECMP independent of BIER-TE for packets
   forwarded across a forward_routed adjacency.

   If the Routing Underlay has FRR, it will perform FRR independent of
   BIER-TE for packets forwarded across a forward_routed adjacency.

3.2.3.  ECMP

   The ECMP mechanisms in BIER are tied to the BIER BIFT and are
   therefore not directly usable with BIER-TE.  The following procedures
   describe ECMP for BIER-TE that we consider to be lightweight but also
   well manageable.  It leverages the existing entropy parameter in the
   BIER header to keep packets of the same flow on the same path, and it
   introduces a "seed" parameter to allow engineering traffic to be
   polarized or randomized across multiple hops.

   An "Equal Cost Multipath" (ECMP) adjacency has a list of two or more
   adjacencies included in it.  It copies the BIER-TE packet to one of
   those adjacencies based on the ECMP hash calculation.  The BIER-TE
   ECMP hash algorithm must select the same adjacency from that list for
   all packets with the same "entropy" value in the BIER-TE header if
   the same number of adjacencies and the same seed are given as
   parameters.  Further use of the seed parameter is explained below.

3.2.4.  Local Decap

   A "local_decap" adjacency passes a copy of the payload of the BIER-TE
   packet to the packet's NextProto within the BFR (IPv4/IPv6,
   Ethernet,...).  A local_decap adjacency turns the BFR into a BFER for
   matching packets.  Local_decap adjacencies require the BFER to
   support routing or switching for NextProto to determine how to
   further process the packet.

3.3.  Encapsulation considerations

   Specifications for BIER-TE encapsulation are outside the scope of
   this document.  This section gives explanations and guidelines.

   Because a BFR needs to interpret the BitString of a BIER-TE packet
   differently from a BIER packet, it is necessary to distinguish BIER
   from BIER-TE packets.  This is subject to definitions in BIER
   encapsulation specifications.

   MPLS encapsulation, for example, assigns one label by which BFRs
   recognize BIER packets for every (SI,subdomain) combination.  If it
   is desirable that every subdomain can forward only BIER or BIER-TE
   packets, then the label allocation could stay the same, and only the
   forwarding model (BIER/BIER-TE) would have to be defined per
   subdomain.  If it is desirable to support both BIER and BIER-TE
   forwarding in the same subdomain, then an additional label would need
   to be assigned for BIER-TE forwarding.

   "forward_routed" requires an encapsulation that permits unicasting
   BIER-TE packets to a specific interface address on a target BFR.
   With MPLS encapsulation, this can simply be done via a label stack
   with that address's label as the top label - followed by the label
   assigned to (SI,subdomain) - and if necessary (see above) BIER-TE.
   With non-MPLS encapsulation, some form of IP tunneling (IP in IP,
   LISP, GRE) would be required.

   The encapsulation used for "forward_routed" adjacencies can equally
   support existing advanced adjacency information such as "loose source
   routes" via eg: MPLS label stacks or appropriate header extensions
   (eg: for IPv6).

3.4.  Basic BIER-TE Forwarding Example

   This section gives a step-by-step example of basic BIER-TE
   forwarding.  It does not use ECMP or forward_routed adjacencies, nor
   does it try to minimize the number of required BitPositions for the
   topology.

   Picture 1: Forwarding Example

                [Bier-Te Controller Host]
                         / | \
                        v  v  v

        | p13   p1 |
        +- BFIR2 --+          |
        |          |   p2  p6 |           LAN2
        |          +-- BFR3 --+           |
        |          |          |  p7  p11  |
    Src -+                    +-- BFER1 --+
        |          |   p3  p8 |           |
        |          +-- BFR4 --+           +-- Rcv1
        |          |          |           |
        |          |
        | p14   p4 |
        +- BFIR1 --+          |
        |          +-- BFR5 --+  p10 p12  |
       LAN1        |   p5  p9 +-- BFER2 --+
                              |           +-- Rcv2
                                          |
                                          LAN3

        IP  |..... BIER-TE network......| IP

   pXX indicates the BitPosition number assigned by the BIER-TE
   controller host to adjacencies in the BIER-TE topology.  For example,
   p9 is the adjacency towards BFR5 on the LAN connecting to BFER2.

   BIFT BFIR2:
     p13: local_decap()
      p2: forward_connected(BFR3)

   BIFT BFR3:
      p1: forward_connected(BFIR2)
      p7: forward_connected(BFER1)
      p8: forward_connected(BFR4)

   BIFT BFER1:
     p11: local_decap()
      p6: forward_connected(BFR3)
      p8: forward_connected(BFR4)

   ...and so on.

   Traffic needs to flow from BFIR2 towards Rcv1, Rcv2.  The controller
   determines it wants it to pass across the following paths:

                    -> BFER1 ---------------> Rcv1
      BFIR2 -> BFR3
                    -> BFR4 -> BFR5 -> BFER2 -> Rcv2

   These paths correspond to the following BitString: p2, p5, p7, p8,
   p10, p11, p12

   This BitString is set up in BFIR2.  Multicast packets arriving at
   BFIR2 from Src are assigned this BitString.

   BFIR2 forwards based on that BitString.  It has p2 and p13 populated.
   Only p2 is in the BitString, and it has an adjacency towards BFR3.
   BFIR2 resets p2 in the BitString and sends a copy towards BFR3.

   BFR3 sees a BitString of p5,p7,p8,p10,p11,p12.  It is only interested
   in p1,p7,p8.  It creates a copy of the packet to BFER1 (due to p7)
   and one to BFR4 (due to p8).  It resets p7, p8 before sending.

   BFER1 sees a BitString of p5,p10,p11,p12.  It is only interested in
   p6,p8,p11 and therefore considers only p11.  p11 is a "local_decap"
   adjacency installed by the BIER-TE controller host because BFER1
   should pass packets to IP multicast.  The local_decap adjacency
   instructs BFER1 to create a copy, decapsulate it from the BIER header
   and pass it on to the NextProtocol, in this example IP multicast.  IP
   multicast will then forward the packet out to LAN2 because it did
   receive PIM or IGMP joins on LAN2 for the traffic.

   The packet is processed accordingly in BFR4, BFR5 and BFER2.

4.  BIER-TE Controller Host BitPosition Assignments

   This section describes how the BIER-TE controller host can use the
   different BIER-TE adjacency types to define the BitPositions of a
   BIER-TE domain.

   Because the size of the BitString limits the size of the BIER-TE
   domain, many of the options described exist to support larger
   topologies with fewer BitPositions (4.1, 4.3, 4.4, 4.5, 4.6, 4.7,
   4.8).

4.1.  P2P Links

   Each p2p link in the BIER-TE domain is assigned one unique
   BitPosition with a forward_connected adjacency pointing to the
   neighbor on the p2p link.

4.2.  BFER

   Every BFER is given a unique BitPosition with a local_decap
   adjacency.

4.3.  Leaf BFERs

   Leaf BFERs are BFERs where incoming BIER-TE packets never need to be
   forwarded to another BFR but are only sent to the BFER to exit the
   BIER-TE domain.  For example, in networks where PEs are spokes
   connected to P routers, those PEs are Leaf BFERs unless there is a
   U-turn between two PEs.

   All leaf BFERs in a BIER-TE domain can share a single BitPosition.
   This is possible because the BitPosition for the adjacency to reach
   the BFER can be used to distinguish whether or not packets should
   reach the BFER.

   This optimization will not work if an upstream interface of the BFER
   is using a BitPosition optimized as described in the following two
   sections (LAN, Hub and Spoke).

4.4.  LANs

   In a LAN, the adjacency to each neighboring BFR on the LAN is given a
   unique BitPosition.  The adjacency of this BitPosition is a
   forward_connected adjacency towards the BFR, and this BitPosition is
   populated into the BIFT of all the other BFRs on that LAN.

              BFR1
               |p1
       LAN1-+-+---+-----+
           p3|  p4|   p2|
           BFR3  BFR4  BFR7

   If bandwidth on the LAN is not an issue and most BIER-TE traffic
   should be copied to all neighbors on a LAN, then BitPositions can be
   saved by assigning just a single BitPosition to the LAN and
   populating that BitPosition in the BIFT of each BFR on the LAN with
   a list of forward_connected adjacencies to all other neighbors on the
   LAN.

   This optimization does not work for BFRs that are redundantly
   connected to more than one LAN using this optimization, because
   these BFRs would receive duplicates and forward those duplicates into
   the opposite LANs.  Adjacencies of such BFRs into their LANs still
   need a separate BitPosition.

4.5.  Hub and Spoke

   In a setup with a hub and multiple spokes connected via separate p2p
   links to the hub, all p2p links can share the same BitPosition.  The
   BitPosition in the hub's BIFT is set up with a list of
   forward_connected adjacencies, one for each spoke.

   This option is similar to the BitPosition optimization in LANs:
   Redundantly connected spokes need their own BitPositions.

4.6.  Rings

   In L3 rings, instead of assigning a single BitPosition for every p2p
   link in the ring, it is possible to save BitPositions by setting the
   "Do Not Reset" (DNR) flag on forward_connected adjacencies.

   For the ring shown in the following picture, a single BitPosition
   will suffice to forward traffic entering the ring at BFRa or BFRb all
   the way up to BFR1:

   On BFRa, BFRb, BFR30,... BFR3, the BitPosition is populated with a
   forward_connected adjacency pointing to the clockwise neighbor on the
   ring and with DNR set.  On BFR2, the adjacency also points to the
   clockwise neighbor BFR1, but without DNR set.

   Handling DNR this way ensures that copies forwarded from any BFR in
   the ring to a BFR outside the ring will not have the ring BitPosition
   set, therefore minimizing the chance to create loops.

                    v        v
                    |        |
             L1     |   L2   |   L3
         /-------- BFRa ---- BFRb --------------------\
         |                                            |
         \- BFR1 - BFR2 - BFR3 - ... - BFR29 - BFR30 -/
             |      |    L4               |      |
          p33|                         p15|
             BFRd                       BFRc

   Note that this example only permits packets to enter the ring at
   BFRa and BFRb, and that packets will always travel clockwise.  If
   packets should be allowed to enter the ring at any ring BFR, then one
   would have to use two ring BitPositions: one for clockwise, one for
   counterclockwise.

   Both would be set up to stop rotating on the same link, eg: L1.  When
   the ingress ring BFR creates the clockwise copy, it will reset the
   counterclockwise BitPosition because the DNR bit only applies to the
   bit for which the replication is done.  Likewise for the clockwise
   BitPosition for the counterclockwise copy.  As a result, the ring
   ingress BFR will send a copy in both directions, serving BFRs on
   either side of the ring up to L1.

4.7.  Equal Cost MultiPath (ECMP)

   The ECMP adjacency allows the use of just one BP per link bundle
   between two BFRs instead of one BP for each p2p member link of that
   link bundle.  In the following picture, one BP is used across
   L1,L2,L3, and BFR1 and BFR2 each have an ECMP adjacency for that BP,
   as the BIFT entries below the picture show.

                  --L1-----
             BFR1 --L2----- BFR2
                  --L3-----

   BIFT entry in BFR1:
   ------------------------------------------------------------------
   | Index |  Adjacencies                                           |
   ==================================================================
   | 0:6   |  ECMP({L1-to-BFR2,L2-to-BFR2,L3-to-BFR2}, seed)        |
   ------------------------------------------------------------------

   BIFT entry in BFR2:
   ------------------------------------------------------------------
   | Index |  Adjacencies                                           |
   ==================================================================
   | 0:6   |  ECMP({L1-to-BFR1,L2-to-BFR1,L3-to-BFR1}, seed)        |
   ------------------------------------------------------------------

   In the following example, all traffic from BFR1 towards BFR10 is
   intended to be ECMP load split equally across the topology.  This
   example is not meant as a likely setup, but to illustrate that ECMP
   can be used to share BPs not only across link bundles, and it
   explains the use of the seed parameter.
691                  BFR1
692                /      \
693             /L11        \L12
694         BFR2              BFR3
695        /    \            /    \
696      /L21    \L22     /L31     \L32
697   BFR4       BFR5   BFR6       BFR7
698      \       /         \       /
699       \     /           \     /
700        BFR8              BFR9
701            \            /
702              \        /
703                BFR10

705    BIFT entry in BFR1:
706    ------------------------------------------------------------------
707    | 0:6   |  ECMP({L11-to-BFR2,L12-to-BFR3}, seed)                 |
708    ------------------------------------------------------------------

710    BIFT entry in BFR2:
711    ------------------------------------------------------------------
712    | 0:6   |  ECMP({L21-to-BFR4,L22-to-BFR5}, seed)                 |
713    ------------------------------------------------------------------

715    BIFT entry in BFR3:
716    ------------------------------------------------------------------
717    | 0:6   |  ECMP({L31-to-BFR6,L32-to-BFR7}, seed)                 |
718    ------------------------------------------------------------------

720    With the above ECMP setup and the same seed on all three BFRs,
721    traffic would not be equally load-split.  Instead, links L22 and L31
722    would see no traffic at all: BFR2 will only see traffic from BFR1 for
723    which the ECMP hash in BFR1 selected the first adjacency in a list of
724    two adjacencies: link L11-to-BFR2.  When forwarding in BFR2 again
725    performs ECMP with two adjacencies on that subset of traffic, it will
726    again select the first of its two adjacencies: L21-to-BFR4.
727    Therefore L22 and BFR5 see no traffic.

729    To resolve this issue, the ECMP adjacency on BFR1 simply needs to be
730    set up with a different seed than the ECMP adjacencies on BFR2/BFR3.

732    This issue is called polarization.  It depends on the ECMP hash.  It
733    is possible to build ECMP that does not have polarization, for
734    example by taking entropy from the actual adjacency members into
735    account, but that can make it harder to achieve evenly balanced load-
736    splitting on all BFRs without making the ECMP hash algorithm
737    potentially too complex for fast forwarding in the BFRs.

739 4.8.  Routed adjacencies

741 4.8.1.  Reducing BitPositions

743    Routed adjacencies can reduce the number of BitPositions required
744    when the traffic engineering requirement is not hop-by-hop explicit
745    path selection, but loose-hop selection.

747           ...............             ...............
748    BFR1--... Redundant ...--L1-- BFR2... Redundant ...---
749       \--... Network   ...--L2--/    ... Network   ...---
750    BFR4--... Segment 1 ...--L3-- BFR3... Segment 2 ...---
751           ...............             ...............

753    Assume the requirement in the above network is to explicitly engineer
754    paths such that specific traffic flows are passed from segment 1 to
755    segment 2 via link L1 (or via L2 or via L3).

757    To achieve this, BFR1 and BFR4 are set up with a forward_routed
758    adjacency BitPosition towards an address of BFR2 on link L1 (or on
759    link L2, or towards an address of BFR3 on link L3).

761    For paths to be engineered through a specific node BFR2 (or BFR3),
762    BFR1 and BFR4 are set up with a forward_routed adjacency
763    BitPosition towards a loopback address of BFR2 (or BFR3).

765 4.8.2.  Supporting nodes without BIER-TE

767    Routed adjacencies also enable incremental deployment of BIER-TE.
768    Only the nodes through which BIER-TE traffic needs to be steered -
769    with or without replication - need to support BIER-TE.  Where they
770    are not directly connected to each other, forward_routed adjacencies
771    are used to pass over non BIER-TE enabled nodes.

773 5.  Avoiding loops and duplicates

775 5.1.  Loops

777    Whenever BIER-TE creates a copy of a packet, the BitString of that
778    copy will have all BitPositions cleared that are associated with
779    adjacencies in the BFR.  This inhibits looping of packets.  The only
780    exceptions are adjacencies with DNR set.

782    With DNR set, looping can happen.  Consider in the ring picture that
783    link L4 from BFR3 is plugged into the L1 interface of BFRa.  This
784    creates a loop where the ring's clockwise BitPosition is never reset
785    for copies of the packets traveling clockwise around the ring.
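The bit-clearing rule and the DNR exception described above can be sketched as follows. This is a minimal illustration, not part of the architecture: it assumes a BitString held as an integer with BitPosition 1 as the least significant bit, and a BIFT modelled as a mapping from BitPosition to a list of (adjacency name, DNR flag) pairs. The names replicate and bp_mask are hypothetical.

```python
def bp_mask(bp):
    """Return the mask for a 1-based BitPosition (an assumption of this sketch)."""
    return 1 << (bp - 1)


def replicate(bitstring, bift):
    """Return (adjacency, BitString) pairs for the copies this BFR would send.

    bift maps a BitPosition to a list of (adjacency_name, dnr) tuples.
    """
    # Collect every BitPosition for which this BFR has adjacencies.
    local_bits = 0
    for bp in bift:
        local_bits |= bp_mask(bp)

    # All local BitPositions are cleared in every copy.
    cleared = bitstring & ~local_bits

    copies = []
    for bp in sorted(bift):
        if bitstring & bp_mask(bp):
            for adjacency, dnr in bift[bp]:
                # DNR ("do not reset") keeps only the bit being replicated on.
                out = (cleared | bp_mask(bp)) if dnr else cleared
                copies.append((adjacency, out))
    return copies


# A ring BFR: BP 1 is the clockwise ring bit (DNR set), BP 2 leaves the ring.
ring_bfr = {1: [("ring-clockwise", True)], 2: [("to-BFRd", False)]}
for adjacency, out in replicate(0b100000011, ring_bfr):
    print(adjacency, bin(out))
```

In this sketch the copy sent around the ring keeps the ring bit, while the copy leaving the ring carries neither local bit, which is the property that limits looping.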
787    To inhibit looping in the face of such physical misconfiguration,
788    only forward_connected adjacencies are permitted to have DNR set, and
789    the link layer destination address of the adjacency (e.g. MAC
790    address) protects against closing the loop.  Link layers without port
791    unique link layer addresses should not be used with the DNR flag set.

793 5.2.  Duplicates

795    Duplicates happen when the topology of the BitString is not a tree
796    but redundantly connects BFRs with each other.  The controller must
797    therefore ensure that it only creates BitStrings that are trees in
798    the topology.

800    When links are incorrectly physically re-connected before the
801    controller updates BitStrings in BFIRs, duplicates can happen.  Like
802    loops, these can be inhibited by link layer addressing in
803    forward_connected adjacencies.

805    If interface or loopback addresses used in forward_routed adjacencies
806    are moved from one BFR to another, duplicates can equally happen.
807    Such re-addressing operations must be coordinated with the
808    controller.

810 6.  BIER-TE FRR

812    FRR is an optional procedure.  To leverage it, the BIER-TE controller
813    host and BFRs need to support it.  It does not have to be supported
814    on all BFRs, but only on those that are attached to a link/adjacency
815    for which FRR support is required.

817    If BIER-TE FRR is supported by the BIER-TE controller host, then it
818    needs to calculate the desired backup paths for link and/or node
819    failures in the BIER-TE domain and download this information into the
820    BIER-TE Adjacency FRR Table (BTAFT) of the BFRs.  The BTAFT then
821    drives FRR operations in the BIER-TE forwarding plane of that BFR.

823 6.1.  The BIER-TE Adjacency FRR Table (BTAFT)

825    The BIER-TE Adjacency FRR Table exists in every BFR that supports
826    BIER-TE FRR procedures.  It is indexed by FRR Adjacency Index.
827    Associated with each FRR Adjacency Index is a ResetBitmask, an
828    AddBitmask and a BitPosition.
830    -----------------------------------------------------------
831    | FRR Adjacency | BitPosition | ResetBitmask | AddBitmask |
832    | Index         |             |              |            |
833    ===========================================================
834    | 0:1           | 5           | ..0010000    | ..11000000 |
835    -----------------------------------------------------------
836     ...

838    An FRR Adjacency is an adjacency that is used in the BIFT of the BFR.
839    The BFR has to be able to determine whether the adjacency is up or
840    down in less than 50 msec.  An FRR adjacency can be a
841    forward_connected adjacency with fast L2 link Up/Down state
842    notifications or a forward_connected or forward_routed adjacency with
843    a fast aliveness mechanism such as BFD.  Details of those mechanisms
844    are outside the scope of this architecture.

846    The FRR Adjacency Index is the index that would be indicated on the
847    fast Up/Down notifications to the BIER-TE forwarding plane.

849    The BitPosition is the BP in the BIFT in which the FRR Adjacency is
850    used.

852 6.2.  FRR in BIER-TE forwarding

854    The BIER-TE forwarding plane receives fast Up/Down notifications with
855    the FRR Adjacency Index.  From the BitPosition in the BTAFT entry, it
856    remembers which BPs are currently affected (have a down adjacency).

858    When a packet is received, BIER-TE forwarding checks if it has
859    affected BPs to which it would forward.  If it does, it will remove
860    the ResetBitmask bits from the packet's BitString and add the
861    AddBitmask bits to the packet's BitString.

863    Afterwards, normal BIER-TE forwarding occurs, taking the modified
864    BitString into account.

866 6.3.  FRR in the BIER-TE Controller Host

868    The basic rules by which the BIER-TE controller host would calculate
869    ResetBitmask and AddBitmask are as follows:

871    1.  The BIER-TE controller host has to determine whether a failure of
872        the adjacency should be taken to indicate link or node failure.
873        This is a policy decision.

875    2.  The ResetBitmask has the BitPosition of the failed adjacency.

877    3.  In the case of link protection, the AddBitmask are the segments
878        forming a path from the BFR over to the BFR on the other end of
879        the failed link.

881    4.  In the case of node protection, the AddBitmask are the segments
882        forming a tree from the BFR over to all necessary BFRs downstream
883        of the (assumed to be failed) BFR across the failed adjacency.

885    5.  The ResetBitmask is extended with those segments that could lead
886        to duplicate packets if the AddBitmask is added to possible
887        BitStrings of packets using the failing BitPosition.

889 6.4.  BIER-TE FRR Benefits

891    Compared to other FRR solutions, such as RSVP-TE/P2MP FRR, BIER-TE
892    FRR has two key distinctions:

894    o  It maintains the goal of BIER-TE not to establish in-network per
895       multicast traffic flow state.  For that reason, the backup path/
896       trees are only tied to the topology but not to individual
897       distribution trees.

899    o  For the case of node failure, it allows building a path engineered
900       backup tree (rule 4 in Section 6.3) as opposed to only a set of
       p2p backup tunnels.

902 7.  BIER-TE Forwarding Pseudocode

904    The following sections of pseudocode are meant to illustrate the
905    BIER-TE forwarding plane.  This code is not meant to be normative but
906    to serve both as a potentially easier to read and more precise
907    representation of the forwarding functionality and to illustrate how
908    simple BIER-TE forwarding is and that it can be implemented
909    efficiently.
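As a runnable complement to the pseudocode, the FRR rewrite of Section 6.2 can be sketched as below. This is an illustration under stated assumptions, not the normative procedure: the BitString is an integer with BitPosition 1 as the least significant bit, the BTAFT is a dictionary keyed by FRR Adjacency Index, and the names BtaftEntry, bp_mask and apply_frr are hypothetical. The single table entry reuses the values from the BTAFT example in Section 6.1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BtaftEntry:
    bit_position: int    # BP in the BIFT that uses this FRR adjacency (1-based)
    reset_bitmask: int   # bits removed from the BitString on failure
    add_bitmask: int     # bits added to steer the packet over the backup path


def bp_mask(bp):
    """Return the mask for a 1-based BitPosition (an assumption of this sketch)."""
    return 1 << (bp - 1)


def apply_frr(bitstring, btaft, adjacencies_down):
    """Rewrite a BitString for every down adjacency whose BP the packet uses."""
    reset = 0
    add = 0
    for index in adjacencies_down:
        entry = btaft[index]
        # Only packets that would be forwarded on the affected BP are rewritten.
        if bitstring & bp_mask(entry.bit_position):
            reset |= entry.reset_bitmask
            add |= entry.add_bitmask
    return (bitstring & ~reset) | add


# The entry from the BTAFT example: BP 5, ResetBitmask ..0010000,
# AddBitmask ..11000000.
BTAFT = {1: BtaftEntry(bit_position=5, reset_bitmask=0b0010000,
                       add_bitmask=0b11000000)}

print(bin(apply_frr(0b0010011, BTAFT, adjacencies_down={1})))
```

The masks are collected against the original BitString and applied once, so the result does not depend on the order in which down adjacencies are visited. Normal forwarding would then run on the returned BitString.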
911    The following procedure is executed on a BFR whenever the BIFT is
912    changed by the BIER-TE controller host:

914    global MyBitsOfInterest

916    void BIFTChanged()
917    {
918        MyBitsOfInterest = 0
919        for (Index = 1; Index <= BitStringLength; Index++)
920            if (BIFT[Index] != <empty>)
921                MyBitsOfInterest |= 1<<(Index-1)
922    }

924    The following procedure is executed whenever an adjacency used for
925    BIER-TE FRR changes state:

927    global ResetBitMaskByBT[BitStringLength]
928    global AddBitMaskByBT[BitStringLength]
929    global FRRaffectedBP

931    void FrrUpDown(FrrAdjacencyIndex, UpDown)
932    {
933        global FRRAdjacenciesDown
934        local Idx = FrrAdjacencyIndex

936        if (UpDown == Up)
937            FRRAdjacenciesDown &= ~(1<<(Idx-1))
938        else
939            FRRAdjacenciesDown |= 1<<(Idx-1)
940        FRRaffectedBP = 0; zero(ResetBitMaskByBT); zero(AddBitMaskByBT)
941        for (Index = GetFirstBitPosition(FRRAdjacenciesDown); Index ;
942             Index = GetNextBitPosition(FRRAdjacenciesDown, Index))

944            local BP = BTAFT[Index].BitPosition
945            FRRaffectedBP |= 1<<(BP-1)
946            ResetBitMaskByBT[BP] |= BTAFT[Index].ResetBitMask
947            AddBitMaskByBT[BP] |= BTAFT[Index].AddBitMask
948    }

950    The following procedure is executed whenever a BIER-TE packet is to
951    be forwarded:

953    void ForwardBierTePacket (Packet)
954    {
955        // We calculate in BitMask the subset of BPs of the BitString
956        // for which we have adjacencies.  This is purely an
957        // optimization to avoid replicating for every BP
958        // set in BitString only to discover that for most of them,
959        // the BIFT has no adjacency.
961        local BitMask = Packet->BitString
962        Packet->BitString &= ~MyBitsOfInterest
963        BitMask &= MyBitsOfInterest

965        // FRR Operations
966        // Note: this algorithm is not optimal yet for ECMP cases
967        // it performs FRR replacement for all candidate ECMP paths

969        local MyFRRBP = BitMask & FRRaffectedBP
970        for (BP = GetFirstBitPosition(MyFRRBP); BP ;
971             BP = GetNextBitPosition(MyFRRBP, BP))
972            BitMask &= ~ResetBitMaskByBT[BP]
973            BitMask |= AddBitMaskByBT[BP]

975        // Replication
976        for (Index = GetFirstBitPosition(BitMask); Index ;
977             Index = GetNextBitPosition(BitMask, Index))
978            foreach adjacency BIFT[Index]

980                if (adjacency == ECMP(ListOfAdjacencies, seed))
981                    I = ECMP_hash(sizeof(ListOfAdjacencies),
982                                  Packet->Entropy, seed)
983                    adjacency = ListOfAdjacencies[I]

985                PacketCopy = Copy(Packet)

987                switch (adjacency)
988                    case forward_connected(interface,neighbor,DNR):
989                        if (DNR)
990                            PacketCopy->BitString |= 1<<(Index-1)
991                        SendToL2Unicast(PacketCopy,interface,neighbor)

993                    case forward_routed([VRF],neighbor):
994                        SendToL3(PacketCopy,[VRF,]neighbor)

996                    case local_decap([VRF],neighbor):
997                        DecapBierHeader(PacketCopy)
998                        PassTo(PacketCopy,[VRF,]Packet->NextProto)
999    }

1001 8.  Managing SI, subdomains and BFR-ids

1003    When the number of bits required to represent the necessary hops in
1004    the topology and BFER exceeds the supported bitstring length,
1005    multiple SI and/or subdomains must be used.  This section discusses
1006    how.

1008    BIER-TE forwarding does not require the concept of BFR-id, but
1009    routing underlay, flow overlay and BIER headers may.  This section
1010    also discusses how BFR-id can be assigned to BFIR/BFER for BIER-TE.

1012 8.1.  Why SI and sub-domains

1014    For BIER and BIER-TE forwarding, the most important result of using
1015    multiple SI and/or subdomains is the same: Packets that need to be
1016    sent to BFER in different SI or subdomains require different BIER
1017    packets: each one with a bitstring for a different (SI,subdomain)
1018    combination.  Each such bitstring uses one bitstring length sized SI
1019    block in the BIFT of the subdomain.  We call this a BIFT:SI (block).

1021    For BIER and BIER-TE forwarding itself there is also no difference
1022    whether different SI and/or sub-domains are chosen, but SI and
1023    subdomain have different purposes in the BIER architecture shared by
1024    BIER-TE.  This impacts how operators manage them and especially
1025    how flow overlays will likely use them.

1027    By default, every possible BFIR/BFER in a BIER network would likely be
1028    given a BFR-id in subdomain 0 (unless there are > 64k BFIR/BFER).

1030    If there are different flow services (or service instances) requiring
1031    replication to different subsets of BFER, then it will likely not be
1032    possible to achieve the best replication efficiency for all of these
1033    service instances via subdomain 0.  Ideal replication efficiency for
1034    N BFER exists in a subdomain if they are split over not more than
1035    ceiling(N/bitstring-length) SI.

1037    If service instances justify additional BIFT:SI state in the network,
1038    additional subdomains will be used: BFIR/BFER are assigned BFR-ids in
1039    those subdomains and each service instance is configured to use the
1040    most appropriate subdomain.  This results in improved replication
1041    efficiency for different services.

1043    Even if creation of subdomains and assignment of BFR-id to BFIR/BFER
1044    in those subdomains is automated, it is not expected that individual
1045    service instances can deal with BFER in different subdomains.  A
1046    service instance may only support configuration of a single subdomain
1047    it should rely on.
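The replication-efficiency bound above, ceiling(N/bitstring-length) SI for N BFER, can be illustrated with a short calculation. This is only a sketch: it assumes 1-based BFR-ids mapped to an SI as (BFR-id - 1) / bitstring-length, rounded down, as in the BIER architecture, and the function names are hypothetical.

```python
def min_si_count(n_bfer, bitstring_length):
    """Smallest number of SI that can hold n_bfer BFER: ceiling(N / length)."""
    return -(-n_bfer // bitstring_length)


def si_of(bfr_id, bitstring_length):
    """SI that a 1-based BFR-id falls into (assumed BIER-style mapping)."""
    return (bfr_id - 1) // bitstring_length


def copies_needed(receiver_bfr_ids, bitstring_length):
    """Number of packets a BFIR must send: one per distinct SI in use."""
    return len({si_of(bfr_id, bitstring_length) for bfr_id in receiver_bfr_ids})


# Four receivers of one service instance, bitstring length 256.
scattered = [3, 300, 700, 701]   # BFR-ids spread over three SI
compact = [1, 2, 3, 4]           # BFR-ids packed into a single SI

print(min_si_count(len(scattered), 256))   # ideal number of SI for 4 BFER
print(copies_needed(scattered, 256))       # copies with scattered BFR-ids
print(copies_needed(compact, 256))         # copies with compact BFR-ids
```

With scattered BFR-ids the BFIR sends more copies than the ideal, which is the inefficiency that a dedicated subdomain with its own BFR-id assignment is meant to remove.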
1049    To be able to easily reuse (and modify as little as possible)
1050    existing BIER procedures including flow-overlay and routing underlay,
1051    when BIER-TE forwarding is added, we therefore reuse SI and subdomain
1052    logically in the same way as they are used in BIER: All necessary
1053    BFIR/BFER for a service use a single BIER-TE BIFT and are split
1054    across as many SI as necessary (see below).  Different services may
1055    use different subdomains that primarily exist to provide more
1056    efficient replication (and for BIER-TE desirable traffic engineering)
1057    for different subsets of BFIR/BFER.

1059 8.2.  Bit assignment comparison BIER and BIER-TE

1061    In BIER, bitstrings only need to carry bits for BFER, which leads to
1062    the model that BFR-ids map 1:1 to each bit in a bitstring.

1064    In BIER-TE, bitstrings need to carry bits to indicate not only the
1065    receiving BFER but also the intermediate hops/links across which the
1066    packet must be sent.  The maximum number of BFER that can be
1067    supported in a single bitstring or BIFT:SI depends on the number of
1068    bits necessary to represent the desired topology between them.

1070    "Desired" topology because it depends on the physical topology, and
1071    on the desire of the operator to allow for explicit traffic
1072    engineering across every single hop (which requires more bits), or
1073    reducing the number of required bits by exploiting optimizations such
1074    as unicast (forward_routed), ECMP or flood (DNR) over "uninteresting"
1075    sub-parts of the topology - e.g. parts where different trees do not
1076    need to take different paths due to traffic-engineering reasons.

1078    The share of bits needed to describe the topology in a BIFT:SI can
1079    therefore easily be as low as 20% or as high as 80%.  The higher the
1080    percentage, the higher the likelihood that those topology bits are
1081    not just BIER-TE overhead without additional benefit, but instead
1082    allow expressing the desired traffic-engineering
1083    alternatives.

1085 8.3.  Using BFR-id with BIER-TE

1087    Because there is no 1:1 mapping between bits in the bitstring and
1088    BFER, BIER-TE cannot simply rely on the BIER 1:1 mapping between
1089    bits in a bitstring and BFR-id.

1091    In BIER, automatic schemes could assign all possible BFR-ids
1092    sequentially to BFERs.  This will not work in BIER-TE.  In BIER-TE,
1093    the operator or BIER-TE controller host has to determine a BFR-id for
1094    each BFER in each required subdomain.  The BFR-id may or may not have
1095    a relationship with a bit in the bitstring.  Suggestions are
1096    detailed below.  Once determined, the BFR-id can then be configured
1097    on the BFER and used by flow overlay, routing underlay and the BIER
1098    header almost the same as the BFR-id in BIER.

1100    The one exception is application/flow-overlays that automatically
1101    calculate the bitstring(s) of BIER packets by converting BFR-id to
1102    bits.  In BIER-TE, this operation can be done in two ways:

1104    "Independent branches": For a given application or (set of) trees,
1105    the branches from a BFIR to every BFER are independent of the
1106    branches to any other BFER.  For example, shortest path trees have
1107    independent branches.

1109    "Interdependent branches": When a BFER is added or deleted from a
1110    particular distribution tree, branches to other BFER still in the
1111    tree may need to change.  Steiner trees are examples of
1112    interdependent branch trees.

1114    If "independent branches" are sufficient, the BIER-TE controller host
1115    can provide to such applications for every BFR-id a SI:bitstring with
1116    the BIER-TE bits for the branch towards that BFER.  The application
1117    can then independently calculate the SI:bitstring for all desired
1118    BFER by OR'ing their bitstrings.

1120    If "interdependent branches" are required, the application could call
1121    a BIER-TE controller host API with the list of required BFR-ids and
1122    get the required bitstring back.  Whenever the set of BFR-ids
1123    changes, this is repeated.

1125    Note that in either case (unlike in BIER), the bits in BIER-TE may
1126    need to change upon link/node failure/recovery, network expansion and
1127    network load by other traffic (as part of traffic engineering goals).
1128    Interactions between such BFIR applications and the BIER-TE
1129    controller host therefore need to support dynamic updates to the
1130    bitstrings.

1132 8.4.  Assigning BFR-ids for BIER-TE

1134    For non-leaf BFER, there is usually a single bit k for that BFER with
1135    a local_decap() adjacency on the BFER.  The BFR-id for such a BFER is
1136    therefore most easily the one it would have in BIER: SI * bitstring-
1137    length + k.

1139    As explained earlier in the document, leaf BFER do not need such a
1140    separate bit because the fact alone that the BIER-TE packet is
1141    forwarded to the leaf BFER indicates that the BFER should decapsulate
1142    it.  Such a BFER will have one or more bits for the links leading
1143    only to it.  The BFR-id could therefore most easily be the BFR-id
1144    derived from the lowest bit for those links.

1146    These two rules are only recommendations for the operator or BIER-TE
1147    controller assigning the BFR-ids.  Any allocation scheme can be used;
1148    the BFR-ids just need to be unique across BFRs in each subdomain.

1150    It is not currently determined if a single subdomain could or should
1151    be allowed to forward both BIER and BIER-TE packets.  If this should
1152    be supported, there are two options:

1154    A.  BIER and BIER-TE have different BFR-id in the same subdomain.
1155    This allows higher replication efficiency for BIER because its BFR-
1156    ids can be assigned sequentially, while the bitstrings for BIER-TE
1157    will also have the additional bits for the topology.  There is no
1158    relationship between a BFR's BIER BFR-id and its BIER-TE BFR-id.

1160    B.  BIER and BIER-TE share the same BFR-id.  The BFR-ids are assigned
1161    as explained above for BIER-TE and simply reused for BIER.  The
1162    replication efficiency for BIER will be as low as that for BIER-TE in
1163    this approach.  Depending on topology, only the same 20%..80% of bits
1164    as possible for BIER-TE can be used for BIER.

1166 8.5.  Example bit allocations

1168 8.5.1.  With BIER

1170    Consider a network setup with a bitstring length of 256 for a network
1171    topology as shown in the picture below.  The network has 6 areas,
1172    each with ca. 180 BFR, connecting via a core with some larger (core)
1173    BFR.  To address all BFER with BIER, 4 SI are required.  To send a
1174    BIER packet to all BFER in the network, 4 copies need to be sent by
1175    the BFIR.  On the BFIR it does not make a difference how the BFR-ids
1176    are allocated to BFER in the network, but for efficiency further down
1177    in the network it does make a difference.

1179          area1        area2        area3
1180       BFR1a BFR1b  BFR2a BFR2b  BFR3a BFR3b
1181         |     \      /     \      /    |
1182         ................................
1183         .             Core             .
1184         ................................
1185         |     /      \     /      \    |
1186       BFR4a BFR4b  BFR5a BFR5b  BFR6a BFR6b
1187          area4        area5        area6

1189    With random allocation of BFR-id to BFER, each receiving area would
1190    (most likely) have to receive all 4 copies of the BIER packet because
1191    there would be BFR-ids for each of the 4 SI in each of the areas.
1192    Only further towards each BFER would this duplication subside - when
1193    each of the 4 trees runs out of branches.

1195    If BFR-ids are allocated intelligently, then all the BFER in an area
1196    would be given BFR-ids with as few as possible different SI.  Each
1197    area would only have to forward one or two packets instead of 4.

1199    Given how networks can grow over time, replication efficiency in an
1200    area will also easily go down over time when BFR-ids are allocated
1201    sequentially network wide.  An area that initially only has
1202    BFR-ids in one SI might end up with many SI over a longer period of
1203    growth.  Allocating SIs to areas with initially sufficiently many
1204    spare bits for growth can help to alleviate this issue, as can
1205    renumbering BFR-ids after network expansion.  In this example one
1206    may consider using 6 SI and assigning one to each area.

1208    This example shows that intelligent BFR-id allocation within at least
1209    subdomain 0 can be helpful or even necessary in BIER.

1211 8.5.2.  With BIER-TE

1213    In BIER-TE one needs to determine a subset of the physical topology
1214    and attached BFER so that the "desired" representation of this
1215    topology and the BFER fit into a single bitstring.  This process
1216    needs to be repeated until the whole topology is covered.

1218    Once bits/SIs are assigned to topology and BFER, BFR-id is just a
1219    derived set of identifiers from the operator/BIER-TE controller as
1220    explained above.

1222    Every time that different sub-topologies overlap, bits need to
1223    be repeated across the bitstrings, increasing the overall number of
1224    bits required across all bitstrings/SIs.  In the worst case, random
1225    subsets of BFER are assigned to different SI.  This is much worse
1226    than in BIER because it not only reduces replication efficiency with
1227    the same number of overall bits, but even further - because more bits
1228    are required due to duplication of bits for topology across multiple
1229    SI.  Intelligent BFER to SI assignment and selecting specific
1230    "desired" subtopologies can minimize this problem.

1232    To set up BIER-TE efficiently for the above topology, the following
1233    bit allocation method can be used.  This method can easily be expanded
1234    to other, similarly structured larger topologies.

1236    Each area is allocated one or more SI depending on the number of
1237    future expected BFER and number of bits required for the topology in
1238    the area.  In this example, 6 SI, one per area.

1240    In addition, we use 4 bits in each SI: bia, bib, bea, beb: bit ingress
1241    a, bit ingress b, bit egress a, bit egress b.  These bits will be used
1242    to pass BIER packets from any BFIR via any combination of ingress area
1243    a/b BFR and egress area a/b BFR into a specific target area.  These
1244    bits are then set up with the right forward_routed adjacencies on the
1245    BFIR and area edge BFR:

1247    On all BFIR in an area j, bia in each BIFT:SI is populated with the
1248    same forward_routed(BFRja), and bib with forward_routed(BFRjb).  On
1249    all area edge BFR, bea in BIFT:SI=k is populated with
1250    forward_routed(BFRka) and beb in BIFT:SI=k with
1251    forward_routed(BFRkb).

1253    For BIER-TE forwarding of a packet to some subset of BFER across all
1254    areas, a BFIR would create at most 6 copies, with SI=1...SI=6.  In
1255    each packet, the bits indicate bits for topology and BFER in that
1256    topology plus the four bits to indicate whether to pass this packet
1257    via the ingress area a or b border BFR and the egress area a or b
1258    border BFR, therefore allowing path engineering for those two
1259    "unicast" legs: 1) BFIR to ingress area edge and 2) core to egress
1260    area edge.  Replication only happens inside the egress areas.  For
1261    BFER in the same area as the BFIR, these four bits are not used.

1263 8.6.  Summary

1265    BIER-TE can, like BIER, support multiple SI within a sub-domain to
1266    allow re-using the concept of BFR-id and therefore minimize BIER-TE
1267    specific functions in underlay routing, flow overlay methods and BIER
1268    headers.

1270    The number of BFIR/BFER possible in a subdomain is smaller than in
1271    BIER because BIER-TE uses additional bits for topology.
1273    Subdomains can be used in BIER-TE like in BIER to create more
1274    efficient replication to known subsets of BFER.

1276    Assigning bits for BFER intelligently into the right SI is more
1277    important in BIER-TE than in BIER because of replication efficiency
1278    and the overall number of bits required.

1280 9.  Further considerations

1282 9.1.  BIER-TE and existing FRR

1284    BIER-TE FRR as described above is an advanced method for
1285    node-protection where the replication in a failed node is replaced
1286    on the fly by another replication tree through bit operations on
       the BitString.

1288    If BIER-TE FRR is not feasible or necessary, it is also possible for
1289    BIER-TE to leverage any existing form of "link" protection.  For
1290    example: instead of directly setting up a forward_connected adjacency
1291    to a next-hop neighbor, this can be a "protected" adjacency that is
1292    maintained by RSVP-TE (or another FRR mechanism) and passes via a
1293    backup path if the link fails.

1295 9.2.  BIER-TE and Segment Routing

1297    Segment Routing aims to achieve lightweight path engineering via
1298    loose source routing.  Compared for example to RSVP-TE, it does not
1299    require per-path signaling to each of these hops.

1301    BIER-TE supports the same design philosophy for multicast.  Like
1302    SR, it relies on source-routing - via the definition of a
1303    BitString.  Like SR, it only requires considering the "hops" on which
1304    either replication has to happen, or across which the traffic should
1305    be steered (even without replication).  Any other hops can be skipped
1306    via the use of routed adjacencies.

1308    Instead of defining BitPositions for non-replicating hops, it is
1309    equally possible to use segment routing encapsulations (e.g. MPLS
1310    label stacks) for "forward_routed" adjacencies.

1312 10.  Security Considerations

1314    The security considerations are the same as for BIER with the
1315    following differences:

1317    BFR-ids and BFR-prefixes are not used in BIER-TE, nor are procedures
1318    for their distribution, so these are not attack vectors against BIER-
1319    TE.

1321 11.  IANA Considerations

1323    This document requests no action by IANA.

1325 12.  Acknowledgements

1327    The authors would like to thank Greg Shepherd, Ijsbrand Wijnands and
1328    Neale Ranns for their extensive review and suggestions.

1330 13.  Change log [RFC Editor: Please remove]

1332       02: Changed the definition of BIFT to be more in line with BIER.
1333       In revs. up to -01, the idea was that a BIFT has only entries for
1334       a single bitstring, and every SI and subdomain would be a separate
1335       BIFT.  In BIER, each BIFT covers all SI.  This is now also how we
1336       define it in BIER-TE.

1338       02: Added Section 8 to explain the use of SI, subdomains and BFR-
1339       id in BIER-TE and to give an example how to efficiently assign
1340       bits for a large topology requiring multiple SI.

1342       02: Added further details for rings - how to support input from
1343       all ring nodes.

1345       01: Fixed BFIR -> BFER for section 4.3.

1347       01: Added explanation of SI, difference to BIER ECMP,
1348       consideration for Segment Routing, unicast FRR, considerations for
1349       encapsulation, explanations of BIER-TE controller host and CLI.

1351       00: Initial version.

1353 14.  References

1355    [I-D.ietf-bier-architecture]
1356               Wijnands, I., Rosen, E., Dolganow, A., Przygienda, T., and
1357               S. Aldrin, "Multicast using Bit Index Explicit
1358               Replication", draft-ietf-bier-architecture-02 (work in
1359               progress), July 2015.

1361    [I-D.ietf-bier-mpls-encapsulation]
1362               Wijnands, I., Rosen, E., Dolganow, A., Tantsura, J., and
1363               S. Aldrin, "Encapsulation for Bit Index Explicit
1364               Replication in MPLS Networks", draft-ietf-bier-mpls-
1365               encapsulation-02 (work in progress), August 2015.
1367 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1368 Requirement Levels", BCP 14, RFC 2119, 1369 DOI 10.17487/RFC2119, March 1997, 1370 . 1372 Authors' Addresses 1374 Toerless Eckert 1375 Cisco Systems, Inc. 1377 Email: eckert@cisco.com 1379 Gregory Cauchie 1380 Bouygues Telecom 1382 Email: GCAUCHIE@bouyguestelecom.fr