Network Working Group                                          T. Eckert
Internet-Draft                                       Cisco Systems, Inc.
Intended status: Standards Track                              G. Cauchie
Expires: January 6, 2016                                Bouygues Telecom
                                                            July 5, 2015


     Traffic Engineering for Bit Index Explicit Replication BIER-TE
                      draft-eckert-bier-te-arch-01

Abstract

   This document proposes an architecture for BIER-TE: Traffic
   Engineering for Bit Index Explicit Replication (BIER).

   BIER-TE shares part of its architecture with BIER as described in
   [I-D.wijnands-bier-architecture].  It also proposes to share the
   packet format with BIER.

   BIER-TE forwards and replicates packets like BIER based on a
   BitString in the packet header, but it does not require an IGP.  It
   supports traffic engineering by explicit hop-by-hop forwarding and
   loose hop forwarding of packets.  It supports Fast ReRoute (FRR) for
   link and node protection, and it supports incremental deployment.
   Because BIER-TE, like BIER, operates without explicit in-network
   tree-building but also supports traffic engineering, it is more
   similar to Segment Routing (SR) than to RSVP-TE.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 6, 2016.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Overview
     1.2.  Requirements Language
   2.  Layering
     2.1.  The Multicast Flow Overlay
     2.2.  The BIER-TE Controller Host
       2.2.1.  Assignment of BitPositions to adjacencies of the
               network topology
       2.2.2.  Changes in the network topology
       2.2.3.  Set up per-multicast flow BIER-TE state
       2.2.4.  Link/Node Failures and Recovery
     2.3.  The BIER-TE Forwarding Layer
     2.4.  The Routing Underlay
   3.  BIER-TE Forwarding
     3.1.  The Bit Index Forwarding Table (BIFT)
     3.2.  Adjacency Types
       3.2.1.  Forward Connected
       3.2.2.  Forward Routed
       3.2.3.  ECMP
       3.2.4.  Local Decap
     3.3.  Encapsulation considerations
     3.4.  Basic BIER-TE Forwarding Example
   4.  BIER-TE Controller Host BitPosition Assignments
     4.1.  P2P Links
     4.2.  BFER
     4.3.  Leaf BFIRs
     4.4.  LANs
     4.5.  Hub and Spoke
     4.6.  Rings
     4.7.  Equal Cost MultiPath (ECMP)
     4.8.  Routed adjacencies
       4.8.1.  Reducing BitPositions
       4.8.2.  Supporting nodes without BIER-TE
     4.9.  Using multiple BIFTs
   5.  Avoiding loops and duplicates
     5.1.  Loops
     5.2.  Duplicates
   6.  BIER-TE FRR
     6.1.  The BIER-TE Adjacency FRR Table (BTAFT)
     6.2.  FRR in BIER-TE forwarding
     6.3.  FRR in the BIER-TE Controller Host
     6.4.  BIER-TE FRR Benefits
   7.  BIER-TE Forwarding Pseudocode
   8.  Further considerations
     8.1.  BIER-TE and existing FRR
     8.2.  BIER-TE and Segment Routing
   9.  Security Considerations
   10. IANA Considerations
   11. Acknowledgements
   12. Change log [RFC Editor: Please remove]
   13. References
   Authors' Addresses

1.  Introduction

1.1.  Overview

   This document specifies the architecture for BIER-TE: traffic
   engineering for Bit Index Explicit Replication (BIER).

   BIER-TE shares architecture and packet formats with BIER as described
   in [I-D.wijnands-bier-architecture].

   BIER-TE forwards and replicates packets like BIER based on a
   BitString in the packet header, but it does not require an IGP.  It
   supports traffic engineering by explicit hop-by-hop forwarding and
   loose hop forwarding of packets.  It supports Fast ReRoute (FRR) for
   link and node protection, and it supports incremental deployment.
   Because BIER-TE, like BIER, operates without explicit in-network
   tree-building but also supports traffic engineering, it is more
   similar to Segment Routing (SR) than to RSVP-TE.

   The key differences over BIER are:

   o  BIER-TE replaces in-network autonomous path calculation by
      explicit paths calculated off-path by the BIER-TE controller host.

   o  In BIER-TE, every BitPosition of the BitString of a BIER-TE packet
      indicates one or more adjacencies, instead of a BFER as in BIER.

   o  BIER-TE in each BFR has no routing table but only a Bit Index
      Forwarding Table (BIFT) indexed by BitPosition and populated with
      only those adjacencies to which the BFR should replicate packets.

   Currently, BIER-TE does not support BIER sub-domains and it does not
   use BFR-id.  BIER-TE headers use the same format as BIER headers
   (BFR-id is set to 0).

1.2.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

2.  Layering

   End-to-end BIER-TE operation consists of four components: the
   "Multicast Flow Overlay", the "BIER-TE Controller Host", the "Routing
   Underlay" and the "BIER-TE forwarding layer".

   Picture 2: Layers of BIER-TE

                    <------BGP/PIM----->
     |<-IGMP/PIM->    multicast flow    <-PIM/IGMP->|
                          overlay

                 [BIER-TE Controller Host]
                     ^      ^      ^
                    /       |       \   BIER-TE control protocol
                   |        |        |  eg.: Netconf/Restconf/Yang
                   v        v        v
   Src -> Rtr1 -> BFIR-----BFR-----BFER -> Rtr2 -> Rcvr

                 |--------------------->|
                 BIER-TE forwarding layer

                 |<- BIER-TE domain-->|

                |<--------------------->|
                     Routing underlay

2.1.  The Multicast Flow Overlay

   The Multicast Flow Overlay operates as in BIER.  See
   [I-D.wijnands-bier-architecture].  Instead of interacting with the
   BIER layer, it interacts with the BIER-TE Controller Host.

2.2.  The BIER-TE Controller Host

   The BIER-TE controller host represents the control plane of BIER-TE.
   It communicates two sets of information to the BFRs:

   During bring-up or modifications of the network topology, the
   controller discovers the network topology, assigns BitPositions to
   adjacencies and signals the resulting mapping of BitPositions to
   adjacencies to each BFR connecting to the adjacency.

   During day-to-day operations of the network, the controller signals
   to BFIRs what multicast flows are mapped to what BitStrings.

   Communication between the BIER-TE controller host and the BFRs is
   ideally via standardized protocols and data models such as
   Netconf/Restconf/Yang.  This is currently outside the scope of this
   document.  Vendor-specific CLI on the BFRs is also a possible stopgap
   option (as in many other SDN solutions lacking definition of a
   standardized data model).

   For simplicity, the procedures of the BIER-TE controller host are
   described in this document as if it is a single, centralized
   automated entity, such as an SDN controller.  It could equally be an
   operator setting up CLI on the BFRs.  Distribution of the functions
   of the BIER-TE controller host is currently outside the scope of this
   document.

2.2.1.  Assignment of BitPositions to adjacencies of the network
        topology

   The BIER-TE controller host tracks the BFR topology of the BIER-TE
   domain.  It determines what adjacencies require BitPositions so that
   BIER-TE explicit paths can be built through them as desired by
   operator policy.

   The controller then pushes the BitPositions/adjacencies to the BIFT
   of the BFRs.  It populates the BIFT of each BFR with only those
   BitPositions to which that BFR should be able to send packets, that
   is, the adjacencies connecting to this BFR.

2.2.2.  Changes in the network topology

   If the network topology changes (not failure based) so that
   adjacencies that are assigned to BitPositions are no longer needed,
   the controller can re-use those BitPositions for new adjacencies.
   First, these BitPositions need to be removed from any BFIR flow state
   and BFR BIFT state (and BTAFT if FRR is supported, see below).  Then
   they can be repopulated, first into the BIFT (and, if FRR is
   supported, the BTAFT), then into the BFIR.

2.2.3.  Set up per-multicast flow BIER-TE state

   The BIER-TE controller host tracks the multicast flow overlay to
   determine what multicast flow needs to be sent by a BFIR to which set
   of BFERs.  It calculates the desired distribution tree across the
   BIER-TE domain based on algorithms outside the scope of this document
   (eg.: CSPF, Steiner Tree,...).  It then pushes the calculated
   BitString into the BFIR.

2.2.4.  Link/Node Failures and Recovery

   When links or nodes fail or recover in the topology, BIER-TE can
   quickly respond with the optional FRR procedures described below.  It
   can also react more slowly by recalculating the BitStrings of
   affected multicast flows.  This reaction is slower than the FRR
   procedure because the controller needs to receive link/node up/down
   indications, recalculate the desired BitStrings and push them down
   into the BFIRs.  With FRR, the reaction is performed locally on the
   BFR receiving the adjacency up/down notification.

2.3.  The BIER-TE Forwarding Layer

   When the BIER-TE Forwarding Layer receives a packet, it simply looks
   up the BitPositions that are set in the BitString of the packet in
   the Bit Index Forwarding Table (BIFT) that was populated by the BIER-
   TE controller host.  For every BP that is set in the BitString, and
   that has one or more adjacencies in the BIFT, a copy is made
   according to the type of adjacencies for that BP in the BIFT.  Before
   sending any copy, the BFR resets all BitPositions in the BitString of
   the packet for which it can create a copy.  This is done to prevent
   packets from looping.

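   The forwarding step described above can be sketched as follows.  This
   is a minimal illustration in Python and not part of the BIER-TE
   specification: the names Adjacency and forward, the representation of
   a BitString as a set of BitPosition numbers, and the omission of ECMP
   and FRR handling are assumptions made for this sketch.  It includes
   the DoNotReset (DNR) behavior defined for forward_connected
   adjacencies, and it uses the BIFT of BFR3 from the forwarding example
   in Section 3.4 as input.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List, Tuple


@dataclass(frozen=True)
class Adjacency:
    """Hypothetical BIFT adjacency record (names are illustrative)."""
    kind: str          # "forward_connected", "forward_routed" or "local_decap"
    target: str = ""   # neighbor BFR; unused for local_decap
    dnr: bool = False  # DoNotReset flag, forward_connected only


def forward(bitstring: Iterable[int],
            bift: Dict[int, List[Adjacency]]
            ) -> List[Tuple[Adjacency, FrozenSet[int]]]:
    """Return one (adjacency, outgoing BitString) pair per packet copy."""
    received = frozenset(bitstring)
    # BitPositions with at least one adjacency in this BFR's BIFT.
    local_bps = {bp for bp, adjs in bift.items() if adjs}
    # Reset every locally populated BitPosition before sending any copy.
    cleared = received - local_bps
    copies = []
    for bp in sorted(received & local_bps):
        for adj in bift[bp]:
            # With DNR, only the copy for this adjacency keeps its own BP.
            out = cleared | {bp} if adj.dnr else cleared
            copies.append((adj, frozenset(out)))
    return copies


# BIFT of BFR3 and the BitString it receives in the Section 3.4 example.
bfr3_bift = {
    1: [Adjacency("forward_connected", "BFIR2")],
    7: [Adjacency("forward_connected", "BFER1")],
    8: [Adjacency("forward_connected", "BFR4")],
}
for adj, out in forward({5, 7, 8, 10, 11, 12}, bfr3_bift):
    print(adj.target, sorted(out))
# BFER1 [5, 10, 11, 12]
# BFR4 [5, 10, 11, 12]
```

   In this sketch every locally populated BitPosition is cleared before
   any copy is produced, so a copy that returned to the same BFR would
   no longer match those BIFT entries.  Copies made for an adjacency
   with DNR set are the deliberate exception.
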
   If the BFR supports BIER-TE FRR operations, then the BIER-TE
   forwarding layer receives fast adjacency up/down notifications and
   uses the BIER-TE Adjacency FRR Table (BTAFT) to modify the BitString
   of the packet before it performs BIER-TE forwarding.  This is
   detailed in the FRR section.

2.4.  The Routing Underlay

   BIER-TE sends BIER packets to directly connected BIER-TE neighbors as
   L2 (unicast) BIER packets without requiring a routing underlay.
   BIER-TE forwarding uses the Routing Underlay for forward_routed
   adjacencies, which copy BIER-TE packets to BFRs that are not directly
   connected (see below for adjacency definitions).

   If the BFR intends to support FRR for BIER-TE, then the BIER-TE
   forwarding plane needs to receive fast adjacency up/down
   notifications: link up/down or neighbor up/down, eg.: from BFD.
   Providing these notifications is considered to be part of the routing
   underlay in this document.

3.  BIER-TE Forwarding

3.1.  The Bit Index Forwarding Table (BIFT)

   The Bit Index Forwarding Table (BIFT) exists in every BFR.  It is a
   table indexed by BitPosition and is populated by the BIER-TE control
   plane.  Each index can be empty or contain a list of one or more
   adjacencies.

   If the network is so large that the number of BitPositions in a
   single BIFT does not suffice to identify the necessary adjacencies,
   multiple BIFTs need to be used, each identified via a separate SI
   (Set Identifier) value.

   ------------------------------------------------------------------
   | Index           | Adjacencies                                  |
   ==================================================================
   | 1               | forward_connected(interface,neighbor,DNR)    |
   ------------------------------------------------------------------
   | 2               | forward_connected(interface,neighbor,DNR)    |
   |                 | forward_connected(interface,neighbor,DNR)    |
   ------------------------------------------------------------------
   | 3               | local_decap([VRF])                           |
   ------------------------------------------------------------------
   | 4               | forward_routed([VRF,]l3-neighbor)            |
   ------------------------------------------------------------------
   | 5               |                                              |
   ------------------------------------------------------------------
   | 6               | ECMP({adjacency1,...adjacencyN}, seed)       |
   ------------------------------------------------------------------
   ...
   | BitStringLength | ...                                          |
   ------------------------------------------------------------------
                      Bit Index Forwarding Table

   The BIFT is programmed into the data plane of BFRs by the BIER-TE
   controller host and used to forward packets, according to the rules
   specified in the BIER-TE Forwarding Procedures.

   When the controller populates the same BP in more than one BFR, the
   BP does not have to have the same adjacencies in each of those BFRs.
   This is up to the controller.  BPs for p2p links are one case (see
   below).

3.2.  Adjacency Types

3.2.1.  Forward Connected

   A "forward_connected" adjacency is towards a directly connected BFR
   neighbor using an interface address of that BFR on the connecting
   interface.  A forward_connected adjacency does not route packets but
   only L2 forwards them to the neighbor.

   Packets sent to an adjacency with "DoNotReset" (DNR) set in the BIFT
   will not have the BitPosition for that adjacency reset when the BFR
   creates a copy for it.  The BitPosition will still be reset for
   copies of the packet made towards other adjacencies.  This can be
   used for example in ring topologies as explained below.

3.2.2.  Forward Routed

   A "forward_routed" adjacency is an adjacency towards a BFR that is
   not a forward_connected adjacency: towards a loopback address of a
   BFR or towards an interface address that is not directly connected.
   Forward_routed packets are forwarded via the Routing Underlay.

   If the Routing Underlay has multiple paths for a forward_routed
   adjacency, it will perform ECMP independent of BIER-TE for packets
   forwarded across a forward_routed adjacency.

   If the Routing Underlay has FRR, it will perform FRR independent of
   BIER-TE for packets forwarded across a forward_routed adjacency.

3.2.3.  ECMP

   The ECMP mechanisms in BIER are tied to the BIER BIFT and are
   therefore not directly usable with BIER-TE.  The following procedures
   describe ECMP for BIER-TE that we consider to be lightweight but also
   well manageable.  It leverages the existing entropy parameter in the
   BIER header to keep packets of the same flow on the same path, and it
   introduces a "seed" parameter to allow engineering traffic to be
   polarized or randomized across multiple hops.

   An "Equal Cost Multipath" (ECMP) adjacency has a list of two or more
   adjacencies included in it.  It copies the BIER-TE packet to one of
   those adjacencies based on the ECMP hash calculation.  The BIER-TE
   ECMP hash algorithm must select the same adjacency from that list for
   all packets with the same "entropy" value in the BIER-TE header if
   the same number of adjacencies and the same seed are given as
   parameters.  Further use of the seed parameter is explained below.

3.2.4.  Local Decap

   A "local_decap" adjacency passes a copy of the payload of the BIER-TE
   packet to the packet's NextProto within the BFR (IPv4/IPv6,
   Ethernet,...).  A local_decap adjacency turns the BFR into a BFER for
   matching packets.  Local_decap adjacencies require the BFER to
   support routing or switching for NextProto to determine how to
   further process the packet.

3.3.  Encapsulation considerations

   Specifications for BIER-TE encapsulation are outside the scope of
   this document.  This section gives explanations and guidelines.

   Because a BFR needs to interpret the BitString of a BIER-TE packet
   differently from a BIER packet, it is necessary to distinguish BIER
   from BIER-TE packets.  BIER MPLS encapsulation, for example, assigns
   one label by which BFRs recognize BIER packets.  BIER-TE packets
   should be recognized via a second, equally assigned label.  If an
   encapsulation does not permit such differentiation, then
   modifications in the BIER header may be necessary to support
   simultaneous BIER and BIER-TE forwarding.

   "forward_routed" requires an encapsulation that permits unicasting
   BIER-TE packets to a specific interface address on a target BFR.
   With MPLS encapsulation, this can simply be done via a label stack
   with that address's label as the top label, followed by the label
   identifying BIER-TE packets.  With a non-MPLS encapsulation, some
   form of IP tunneling (IP in IP, LISP, GRE) would be required.

   The encapsulation used for "forward_routed" adjacencies can equally
   support existing advanced adjacency information such as "loose source
   routes" via eg: MPLS label stacks or appropriate header extensions
   (eg: for IPv6).

3.4.  Basic BIER-TE Forwarding Example

   This section gives a step-by-step example of basic BIER-TE
   forwarding.  The example does not use ECMP or forward_routed
   adjacencies, nor does it try to minimize the number of required
   BitPositions for the topology.

   Picture 1: Forwarding Example

             [BIER-TE Controller Host]
                    /   |    \
                   v    v     v

      | p13   p1 |
      +- BFIR2 --+          |
      |          | p2   p6  |           LAN2
      |          +-- BFR3 --+           |
      |          |          |  p7  p11  |
 Src -+                     +-- BFER1 --+
      |          | p3   p8  |           |
      |          +-- BFR4 --+           +-- Rcv1
      |          |          |           |
      |                     |
      | p14   p4 |
      +- BFIR1 --+          |
      |          +-- BFR5 --+ p10  p12  |
     LAN1        | p5   p9  +-- BFER2 --+
                            |           +-- Rcv2
                            |
                            LAN3

      IP         |..... BIER-TE network......| IP

   pXX indicates the BitPosition number assigned by the BIER-TE
   controller host to adjacencies in the BIER-TE topology.  For example,
   p9 is the adjacency towards BFR5 on the LAN connecting to BFER2.

   BIFT BFIR2:
     p13: local_decap()
      p2: forward_connected(BFR3)

   BIFT BFR3:
      p1: forward_connected(BFIR2)
      p7: forward_connected(BFER1)
      p8: forward_connected(BFR4)

   BIFT BFER1:
     p11: local_decap()
      p6: forward_connected(BFR3)
      p8: forward_connected(BFR4)

   ...and so on.

   Traffic needs to flow from BFIR2 towards Rcv1, Rcv2.  The controller
   determines it wants it to pass across the following paths:

                    -> BFER1 ---------------> Rcv1
      BFIR2 -> BFR3
                    -> BFR4 -> BFR5 -> BFER2 -> Rcv2

   These paths correspond to the following BitString: p2, p5, p7, p8,
   p10, p11, p12

   This BitString is set up in BFIR2.  Multicast packets arriving at
   BFIR2 from Src are assigned this BitString.

   BFIR2 forwards based on that BitString.  It has p2 and p13 populated.
   Only p2 is in the BitString; it has an adjacency towards BFR3.  BFIR2
   resets p2 in the BitString and sends a copy towards BFR3.

   BFR3 sees a BitString of p5,p7,p8,p10,p11,p12.  It is only interested
   in p1,p7,p8.  It creates a copy of the packet to BFER1 (due to p7)
   and one to BFR4 (due to p8).  It resets p7, p8 before sending.

   BFER1 sees a BitString of p5,p10,p11,p12.  It is only interested in
   p6,p8,p11 and therefore considers only p11.
   p11 is a "local_decap" adjacency installed by the BIER-TE controller
   host because BFER1 should pass packets to IP multicast.  The
   local_decap adjacency instructs BFER1 to create a copy, decapsulate
   it from the BIER header and pass it on to the NextProtocol, in this
   example IP multicast.  IP multicast will then forward the packet out
   to LAN2 because it did receive PIM or IGMP joins on LAN2 for the
   traffic.

   Further processing of the packet in BFR4, BFR5 and BFER2 proceeds
   accordingly.

4.  BIER-TE Controller Host BitPosition Assignments

   This section describes how the BIER-TE controller host can use the
   different BIER-TE adjacency types to define the BitPositions of a
   BIER-TE domain.

   Because the size of the BitString limits the size of the BIER-TE
   domain, many of the options described exist to support larger
   topologies with fewer BitPositions (4.1, 4.3, 4.4, 4.5, 4.6, 4.7,
   4.8).

4.1.  P2P Links

   Each p2p link in the BIER-TE domain is assigned one unique
   BitPosition with a forward_connected adjacency pointing to the
   neighbor on the p2p link.

4.2.  BFER

   Every BFER is given a unique BitPosition with a local_decap
   adjacency.

4.3.  Leaf BFIRs

   Leaf BFIRs are BFIRs where incoming BIER-TE packets never need to be
   forwarded to another BFR but are only sent to the BFIR to exit the
   BIER-TE domain.  For example, in networks where PEs are spokes
   connected to P routers, those PEs are Leaf BFIRs unless there is a
   U-turn between two PEs.

   All leaf BFIRs in a BIER-TE domain can share a single BitPosition.
   This is possible because the BitPosition for the adjacency to reach
   the BFIR can be used to distinguish whether or not packets should
   reach the BFIR.

   This optimization will not work if an upstream interface of the BFIR
   is using a BitPosition optimized as described in the following two
   sections (LAN, Hub and Spoke).

4.4.  LANs

   In a LAN, the adjacency to each neighboring BFR on the LAN is given a
   unique BitPosition.  The adjacency of this BitPosition is a
   forward_connected adjacency towards the BFR and this BitPosition is
   populated into the BIFT of all the other BFRs on that LAN.

            BFR1
             |p1
      LAN1-+-+---+-----+
         p3|   p4|   p2|
         BFR3  BFR4  BFR7

   If bandwidth on the LAN is not an issue and most BIER-TE traffic
   should be copied to all neighbors on a LAN, then BitPositions can be
   saved by assigning just a single BitPosition to the LAN and
   populating that BitPosition in the BIFT of each BFR on the LAN with a
   list of forward_connected adjacencies to all other neighbors on the
   LAN.

   This optimization does not work for BFRs that are redundantly
   connected to more than one LAN using this optimization, because these
   BFRs would receive duplicates and forward those duplicates into the
   opposite LANs.  Adjacencies of such BFRs into their LANs still need a
   separate BitPosition.

4.5.  Hub and Spoke

   In a setup with a hub and multiple spokes connected via separate p2p
   links to the hub, all p2p links can share the same BitPosition.  The
   BitPosition in the hub's BIFT is set up with a list of
   forward_connected adjacencies, one for each spoke.

   This option is similar to the BitPosition optimization in LANs:
   Redundantly connected spokes need their own BitPositions.

4.6.  Rings

   In L3 rings, instead of assigning a single BitPosition for every p2p
   link in the ring, it is possible to save BitPositions by setting the
   "Do Not Reset" (DNR) flag on forward_connected adjacencies.

   For the rings shown in the following picture, a single BitPosition
   will suffice to forward traffic entering the ring at BFRa or BFRb all
   the way up to BFR1:

   On BFRa, BFRb, BFR30,... BFR3, the BitPosition is populated with a
   forward_connected adjacency pointing to the clockwise neighbor on the
   ring and with DNR set.  On BFR2, the adjacency also points to the
   clockwise neighbor BFR1, but without DNR set.  Handling DNR this way
   ensures that copies forwarded from any BFR in the ring to a BFR
   outside the ring will not have this BitPosition set, therefore
   minimizing the chance to create loops.

                 v         v
                 |         |
          L1     |    L2   |          L3
      /-------- BFRa ---- BFRb --------------------\
      |                                            |
      \- BFR1 - BFR2 - BFR3 - ... - BFR29 - BFR30 -/
          |      |  L4                |       |
       p33|                        p15|
         BFRd                       BFRc

4.7.  Equal Cost MultiPath (ECMP)

   The ECMP adjacency allows the use of just one BP per link bundle
   between two BFRs instead of one BP for each p2p member link of that
   link bundle.  In the following picture, one BP is used across
   L1,L2,L3.  The BIFT entries that BFR1 and BFR2 have for this BP
   follow the picture.

                 --L1-----
            BFR1 --L2----- BFR2
                 --L3-----

   BIFT entry in BFR1:
   ------------------------------------------------------------------
   | Index | Adjacencies                                            |
   ==================================================================
   | 6     | ECMP({L1-to-BFR2,L2-to-BFR2,L3-to-BFR2}, seed)         |
   ------------------------------------------------------------------

   BIFT entry in BFR2:
   ------------------------------------------------------------------
   | Index | Adjacencies                                            |
   ==================================================================
   | 6     | ECMP({L1-to-BFR1,L2-to-BFR1,L3-to-BFR1}, seed)         |
   ------------------------------------------------------------------

   In the following example, all traffic from BFR1 towards BFR10 is
   intended to be ECMP load split equally across the topology.  This
   example is not meant as a likely setup, but to illustrate that ECMP
   can be used to share BPs not only across link bundles.  It also
   explains the use of the seed parameter.

                  BFR1
                 /    \
               /L11     \L12
          BFR2            BFR3
         /    \          /    \
        /L21   \L22     /L31   \L32
     BFR4       BFR5 BFR6       BFR7
        \       /       \       /
          \   /           \   /
          BFR8            BFR9
              \          /
                \      /
                 BFR10

   BIFT entry in BFR1:
   ------------------------------------------------------------------
   | 6     | ECMP({L11-to-BFR2,L12-to-BFR3}, seed)                  |
   ------------------------------------------------------------------

   BIFT entry in BFR2:
   ------------------------------------------------------------------
   | 6     | ECMP({L21-to-BFR4,L22-to-BFR5}, seed)                  |
   ------------------------------------------------------------------

   BIFT entry in BFR3:
   ------------------------------------------------------------------
   | 6     | ECMP({L31-to-BFR6,L32-to-BFR7}, seed)                  |
   ------------------------------------------------------------------

   With the setup of ECMP in the above topology, traffic would not be
   equally load-split.  Instead, links L22 and L31 would see no traffic
   at all: BFR2 will only see traffic from BFR1 for which the ECMP hash
   in BFR1 selected the first adjacency in a list of 2 adjacencies: link
   L11-to-BFR2.  When forwarding in BFR2 again performs ECMP with two
   adjacencies on that subset of traffic, it will again select the first
   of its two adjacencies: L21-to-BFR4.  Therefore L22 and BFR5 see no
   traffic.  In the same way, BFR3 always selects the second of its two
   adjacencies, so L31 and BFR6 see no traffic.

   To resolve this issue, the ECMP adjacency on BFR1 simply needs to be
   set up with a different seed than the ECMP adjacencies on BFR2/BFR3.

   This issue is called polarization.  It depends on the ECMP hash.  It
   is possible to build ECMP that does not have polarization, for
   example by taking entropy from the actual adjacency members into
   account, but that can make it harder to achieve evenly balanced load-
   splitting on all BFRs without making the ECMP hash algorithm
   potentially too complex for fast forwarding in the BFRs.

4.8.  Routed adjacencies

4.8.1.  Reducing BitPositions

   Routed adjacencies can reduce the number of BitPositions required
   when the traffic engineering requirement is not hop-by-hop explicit
   path selection, but loose-hop selection.

              ...............               ...............
       BFR1--... Redundant ...--L1-- BFR2... Redundant ...---
          \--... Network   ...--L2--/    ... Network   ...---
       BFR4--... Segment 1 ...--L3-- BFR3... Segment 2 ...---
              ...............               ...............

   Assume the requirement in the above network is to explicitly engineer
   paths such that specific traffic flows are passed from segment 1 to
   segment 2 via link L1 (or via L2 or via L3).

   To achieve this, BFR1 and BFR4 are set up with a forward_routed
   adjacency BitPosition towards an address of BFR2 on link L1 (or on
   link L2, or towards an address of BFR3 on link L3).

   For paths to be engineered through a specific node BFR2 (or BFR3),
   BFR1 and BFR4 are set up with a forward_routed adjacency BitPosition
   towards a loopback address of BFR2 (or BFR3).

4.8.2.  Supporting nodes without BIER-TE

   Routed adjacencies also enable incremental deployment of BIER-TE.
   Only the nodes through which BIER-TE traffic needs to be steered,
   with or without replication, need to support BIER-TE.  Where they are
   not directly connected to each other, forward_routed adjacencies are
   used to pass over nodes that are not BIER-TE enabled.

4.9.  Using multiple BIFTs

   In a large network, multiple BIFTs may be necessary, each one
   identified by a different SI value in the BIER header.  Transit
   adjacencies may need to be given BitPositions in multiple BIFTs to
   achieve the desired path engineering for packets replicated with
   different SIs/BIFTs.

5.  Avoiding loops and duplicates

5.1.  Loops

   Whenever BIER-TE creates a copy of a packet, the BitString of that
   copy will have all BitPositions cleared that are associated with
   adjacencies in the BFR.  This inhibits looping of packets.
   The only exception is adjacencies with DNR set.

   With DNR set, looping can happen.  Consider in the ring picture that
   link L4 from BFR3 is plugged into the L1 interface of BFRa.  This
   creates a loop where the ring's clockwise BitPosition is never reset
   for copies of the packets traveling clockwise around the ring.

   To inhibit looping in the face of such physical misconfiguration,
   only forward_connected adjacencies are permitted to have DNR set, and
   the link layer destination address of the adjacency (e.g., MAC
   address) protects against closing the loop.  Link layers without
   port-unique link layer addresses should not be used with the DNR
   flag set.

5.2.  Duplicates

   Duplicates happen when the topology described by the BitString is
   not a tree but connects BFRs with each other redundantly.  The
   controller must therefore ensure that it only creates BitStrings
   that form trees in the topology.

   When links are incorrectly physically re-connected before the
   controller updates BitStrings in BFIRs, duplicates can happen.  Like
   loops, these can be inhibited by link layer addressing in
   forward_connected adjacencies.

   If interface or loopback addresses used in forward_routed adjacencies
   are moved from one BFR to another, duplicates can equally happen.
   Such re-addressing operations must be coordinated with the
   controller.

6.  BIER-TE FRR

   FRR is an optional procedure.  To leverage it, the BIER-TE controller
   host and BFRs need to support it.  It does not have to be supported
   on all BFRs, but only on those that are attached to a link/adjacency
   for which FRR support is required.

   If BIER-TE FRR is supported by the BIER-TE controller host, then it
   needs to calculate the desired backup paths for link and/or node
   failures in the BIER-TE domain and download this information into the
   BIER-TE Adjacency FRR Table (BTAFT) of the BFRs.
   The BTAFT then drives FRR operations in the BIER-TE forwarding plane
   of that BFR.

6.1.  The BIER-TE Adjacency FRR Table (BTAFT)

   The BTAFT exists in every BFR that supports BIER-TE FRR procedures.
   It is indexed by FRR Adjacency Index.  Associated with each FRR
   Adjacency Index are a BitPosition, a ResetBitmask and an AddBitmask.

   -----------------------------------------------------------
   | FRR Adjacency | BitPosition | ResetBitmask | AddBitmask |
   |     Index     |             |              |            |
   ===========================================================
   |             1 |           5 |    ..0010000 | ..11000000 |
   -----------------------------------------------------------
    ...

   An FRR Adjacency is an adjacency that is used in the BIFT of the BFR.
   The BFR has to be able to determine whether the adjacency is up or
   down in less than 50 msec.  An FRR adjacency can be a
   forward_connected adjacency with fast L2 link state Up/Down
   notifications, or a forward_connected or forward_routed adjacency
   with a fast aliveness mechanism such as BFD.  Details of those
   mechanisms are outside the scope of this architecture.

   The FRR Adjacency Index is the index that is indicated in the fast
   Up/Down notifications to the BIER-TE forwarding plane.

   The BitPosition is the BP in the BIFT in which the FRR Adjacency is
   used.

6.2.  FRR in BIER-TE forwarding

   The BIER-TE forwarding plane receives fast Up/Down notifications with
   the FRR Adjacency Index.  From the BitPosition in the BTAFT entry, it
   remembers which BPs are currently affected (have a down adjacency).

   When a packet is received, BIER-TE forwarding checks if it has
   affected BPs to which it would forward.  If it does, it will remove
   the ResetBitmask bits from the packet's BitString and add the
   AddBitmask bits to the packet's BitString.

   Afterwards, normal BIER-TE forwarding occurs, taking the modified
   BitString into account.

6.3.  FRR in the BIER-TE Controller Host

   The basic rules by which the BIER-TE controller host would calculate
   ResetBitmask and AddBitmask are as follows:

   1.  The BIER-TE controller host has to determine whether a failure of
       the adjacency should be taken to indicate link or node failure.
       This is a policy decision.

   2.  The ResetBitmask has the BitPosition of the failed adjacency.

   3.  In the case of link protection, the AddBitmask contains the
       segments forming a path from the BFR over to the BFR on the
       other end of the failed link.

   4.  In the case of node protection, the AddBitmask contains the
       segments forming a tree from the BFR over to all necessary BFRs
       downstream of the (assumed to be failed) BFR across the failed
       adjacency.

   5.  The ResetBitmask is extended with those segments that could lead
       to duplicate packets if the AddBitmask is added to possible
       BitStrings of packets using the failing BitPosition.

6.4.  BIER-TE FRR Benefits

   Compared to other FRR solutions, such as RSVP-TE/P2MP FRR, BIER-TE
   FRR has two key distinctions:

   o  It maintains the goal of BIER-TE not to establish in-network per
      multicast traffic flow state.  For that reason, the backup paths/
      trees are only tied to the topology but not to individual
      distribution trees.

   o  For the case of node failure, it allows building a path engineered
      backup tree (rule 4 in Section 6.3) as opposed to only a set of
      p2p backup tunnels.

7.  BIER-TE Forwarding Pseudocode

   The following sections of pseudocode are meant to illustrate the
   BIER-TE forwarding plane.  This code is not meant to be normative but
   to serve both as a potentially easier to read and more precise
   representation of the forwarding functionality and to illustrate how
   simple BIER-TE forwarding is and that it can be implemented
   efficiently.
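
   As a runnable complement to the pseudocode in this section, the
   following Python sketch illustrates the BitString rewrite described
   in Section 6.2, using the example BTAFT row from Section 6.1.  It is
   a non-normative illustration under stated assumptions: the names
   bp_mask, BtaftEntry and frr_rewrite are hypothetical, BitPositions
   are assumed to be numbered from 1 with BP 1 as the least significant
   bit, and the BitString is modeled as a plain integer.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


def bp_mask(bp: int) -> int:
    """Mask for a 1-based BitPosition (assumption: BP 1 is the lowest bit)."""
    return 1 << (bp - 1)


@dataclass(frozen=True)
class BtaftEntry:
    """One BTAFT row (hypothetical in-memory representation)."""
    bit_position: int   # BP in the BIFT that uses this FRR adjacency
    reset_bitmask: int  # bits removed from the BitString on failure
    add_bitmask: int    # bits added to the BitString on failure


# Example row from Section 6.1: FRR Adjacency Index 1 protects BP 5.
BTAFT: Dict[int, BtaftEntry] = {
    1: BtaftEntry(bit_position=5,
                  reset_bitmask=0b0010000,
                  add_bitmask=0b11000000),
}


def frr_rewrite(bitstring: int,
                adjacencies_down: Iterable[int],
                btaft: Dict[int, BtaftEntry]) -> int:
    """Apply the Reset/Add step of Section 6.2 to a packet BitString.

    Only entries whose protected BP is set in the received BitString
    take effect; all other packets pass through unchanged.
    """
    received = bitstring  # the affected-BP check uses the original bits
    for index in sorted(adjacencies_down):
        entry = btaft[index]
        if received & bp_mask(entry.bit_position):
            bitstring &= ~entry.reset_bitmask
            bitstring |= entry.add_bitmask
    return bitstring


if __name__ == "__main__":
    # Packet with BPs 2 and 5 set while FRR adjacency 1 is down.
    print(bin(frr_rewrite(0b0010010, {1}, BTAFT)))  # prints 0b11000010
```

   For a packet with BPs 2 and 5 set, a failure of FRR adjacency 1
   clears BP 5 and adds BPs 7 and 8.  Packets that do not use BP 5 are
   left unchanged, which reflects that the backup information is tied
   to the topology and not to individual flows.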

   The following procedure is executed on a BFR whenever the BIFT is
   changed by the BIER-TE controller host:

   global MyBitsOfInterest

   void BIFTChanged()
   {
       // Rebuild the mask of BPs that have at least one adjacency
       // in the BIFT.  BPs are numbered from 1; BP 1 is the least
       // significant bit of the BitString.
       MyBitsOfInterest = 0
       for (Index = 1; Index <= BitStringLength; Index++)
           if (BIFT[Index] is not empty)
               MyBitsOfInterest |= 1<<(Index-1)
   }

   The following procedure is executed whenever an adjacency used for
   BIER-TE FRR changes state:

   global ResetBitMaskByBT[BitStringLength]
   global AddBitMaskByBT[BitStringLength]
   global FRRaffectedBP

   void FrrUpDown(FrrAdjacencyIndex, UpDown)
   {
       global FRRAdjacenciesDown

       if (UpDown == Up)
           FRRAdjacenciesDown &= ~(1<<(FrrAdjacencyIndex-1))
       else
           FRRAdjacenciesDown |= 1<<(FrrAdjacencyIndex-1)

       // Recalculate the affected BPs and their per-BP masks from
       // the FRR adjacencies that are currently down.
       FRRaffectedBP = 0
       clear all entries of ResetBitMaskByBT and AddBitMaskByBT
       for (Index = GetFirstBitPosition(FRRAdjacenciesDown); Index ;
            Index = GetNextBitPosition(FRRAdjacenciesDown, Index)) {
           local BP = BTAFT[Index].BitPosition
           FRRaffectedBP |= 1<<(BP-1)
           ResetBitMaskByBT[BP] |= BTAFT[Index].ResetBitMask
           AddBitMaskByBT[BP] |= BTAFT[Index].AddBitMask
       }
   }

   The following procedure is executed whenever a BIER-TE packet is to
   be forwarded:

   void ForwardBierTePacket (Packet)
   {
       // We calculate in BitMask the subset of BPs of the BitString
       // for which we have adjacencies.  This is purely an
       // optimization to avoid replicating for every BP set in
       // the BitString only to discover that for most of them,
       // the BIFT has no adjacency.

       local BitMask = Packet->BitString
       Packet->BitString &= ~MyBitsOfInterest
       BitMask &= MyBitsOfInterest

       // FRR Operations
       // Note: this algorithm is not optimal yet for ECMP cases
       // it performs FRR replacement for all candidate ECMP paths

       local MyFRRBP = BitMask & FRRaffectedBP
       for (BP = GetFirstBitPosition(MyFRRBP); BP ;
            BP = GetNextBitPosition(MyFRRBP, BP)) {
           BitMask &= ~ResetBitMaskByBT[BP]
           BitMask |= AddBitMaskByBT[BP]
       }

       // Replication
       for (Index = GetFirstBitPosition(BitMask); Index ;
            Index = GetNextBitPosition(BitMask, Index))
           foreach adjacency in BIFT[Index] {

               // Resolve an ECMP adjacency to one of its members
               if (adjacency == ECMP(ListOfAdjacencies, seed)) {
                   I = ECMP_hash(sizeof(ListOfAdjacencies),
                                 Packet->Entropy, seed)
                   adjacency = ListOfAdjacencies[I]
               }

               PacketCopy = Copy(Packet)

               switch (adjacency) {
                   case forward_connected(interface,neighbor,DNR):
                       // DNR: keep this BP set in the copy
                       if (DNR)
                           PacketCopy->BitString |= 1<<(Index-1)
                       SendToL2Unicast(PacketCopy,interface,neighbor)

                   case forward_routed([VRF],neighbor):
                       SendToL3(PacketCopy,[VRF,]neighbor)

                   case local_decap([VRF],neighbor):
                       DecapBierHeader(PacketCopy)
                       PassTo(PacketCopy,[VRF,]Packet->NextProto)
               }
           }
   }

8.  Further considerations

8.1.  BIER-TE and existing FRR

   BIER-TE FRR as described above is an advanced method for node
   protection where the replication in a failed node is replaced on the
   fly by another replication tree through bit operations on the
   BitString.

   If BIER-TE FRR is not feasible or necessary, it is also possible for
   BIER-TE to leverage any existing form of "link" protection.  For
   example, instead of directly setting up a forward_connected adjacency
   to a next-hop neighbor, this can be a "protected" adjacency that is
   maintained by RSVP-TE (or another FRR mechanism) and passes via a
   backup path if the link fails.

8.2.  BIER-TE and Segment Routing

   Segment Routing aims to achieve lightweight path engineering via
   loose source routing.
   Compared for example to RSVP-TE, it does not require per-path
   signaling to each hop on the path.

   BIER-TE supports the same design philosophy for multicast.  Like SR,
   it relies on source routing, via the definition of a BitString.
   Like SR, it only needs to consider the "hops" on which either
   replication has to happen or across which the traffic should be
   steered (even without replication).  Any other hops can be skipped
   via the use of routed adjacencies.

   Instead of defining BitPositions for non-replicating hops, it is
   equally possible to use segment routing encapsulations (e.g., MPLS
   label stacks) for "forward_routed" adjacencies.

9.  Security Considerations

   The security considerations are the same as for BIER with the
   following differences:

   BFR-ids and BFR-prefixes are not used in BIER-TE, nor are procedures
   for their distribution, so these are not attack vectors against
   BIER-TE.

10.  IANA Considerations

   This document requests no action by IANA.

11.  Acknowledgements

   The authors would like to thank Greg Shepherd, Ijsbrand Wijnands and
   Neale Ranns for their extensive review and suggestions.

12.  Change log [RFC Editor: Please remove]

      01: Added explanation of SI, difference to BIER ECMP,
      consideration for Segment Routing, unicast FRR, considerations for
      encapsulation, explanations of BIER-TE controller host and CLI.

      00: Initial version.

13.  References

   [I-D.wijnands-bier-architecture]
              Wijnands, I., Rosen, E., Dolganow, A., Przygienda, T., and
              S. Aldrin, "Multicast using Bit Index Explicit
              Replication", draft-wijnands-bier-architecture-05 (work in
              progress), March 2015.

   [I-D.wijnands-mpls-bier-encapsulation]
              Wijnands, I., Rosen, E., Dolganow, A., Tantsura, J., and
              S. Aldrin, "Encapsulation for Bit Index Explicit
              Replication in MPLS Networks",
              draft-wijnands-mpls-bier-encapsulation-02 (work in
              progress), December 2014.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

Authors' Addresses

   Toerless Eckert
   Cisco Systems, Inc.

   Email: eckert@cisco.com

   Gregory Cauchie
   Bouygues Telecom

   Email: GCAUCHIE@bouyguestelecom.fr