Network Working Group                                          T. Eckert
Internet-Draft                                                     Cisco
Intended status: Standards Track                           March 5, 2015
Expires: September 6, 2015


    Traffic Engineering for Bit Index Explicit Replication (BIER-TE)
                      draft-eckert-bier-te-arch-00

Abstract

   This document proposes an architecture for BIER-TE: Traffic
   Engineering for Bit Index Explicit Replication (BIER).

   BIER-TE shares part of its architecture with BIER as described in
   the BIER architecture document (draft-wijnands-bier-architecture).
   It also proposes to share the packet format with BIER.

   BIER-TE forwards and replicates packets like BIER based on a
   BitString in the packet header, but it does not require an IGP.  It
   supports traffic engineering through explicit hop-by-hop forwarding
   and loose-hop forwarding of packets.  It also supports Fast ReRoute
   (FRR) for link and node protection, and incremental deployment.
   Because BIER-TE, like BIER, operates without explicit in-network
   tree-building but also supports traffic engineering, it is more
   similar to Segment Routing (SR) than to RSVP-TE.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 6, 2015.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Overview
     1.2.  Requirements Language
   2.  Layering
     2.1.  The Multicast Flow Overlay
     2.2.  The BIER-TE Controller Host
       2.2.1.  Assignment of BitPositions to adjacencies of the network
               topology
       2.2.2.  Changes in the network topology
       2.2.3.  Set up per-multicast flow BIER-TE state
       2.2.4.  Link/Node Failures and Recovery
     2.3.  The BIER-TE Forwarding Layer
     2.4.  The Routing Underlay
   3.  BIER-TE Forwarding
     3.1.  The Bit Index Forwarding Table (BIFT)
     3.2.  Adjacency Types
       3.2.1.  Forward Connected
       3.2.2.  Forward Routed
       3.2.3.  ECMP
       3.2.4.  Local Decap
     3.3.  Basic BIER-TE Forwarding Example
   4.  BIER-TE Controller Host BitPosition Assignments
     4.1.  P2P Links
     4.2.  BFER
     4.3.  Leaf BFIRs
     4.4.  LANs
     4.5.  Hub and Spoke
     4.6.  Rings
     4.7.  Equal Cost MultiPath (ECMP)
     4.8.  Routed adjacencies
       4.8.1.  Supporting nodes without BIER-TE
   5.  Avoiding loops and duplicates
     5.1.  Loops
     5.2.  Duplicates
   6.  FRR
     6.1.  The BIER-TE Adjacency FRR Table (BTAFT)
     6.2.  FRR in BIER-TE forwarding
     6.3.  FRR in the BIER-TE Controller Host
     6.4.  BIER-TE FRR Benefits
   7.  BIER-TE Forwarding Pseudocode
   8.  Security Considerations
   9.  IANA Considerations
   10. Acknowledgements
   11. Change log [RFC Editor: Please remove]
   12. References
   Author's Address

1.  Introduction

1.1.  Overview

   This document specifies the architecture for BIER-TE: Traffic
   Engineering for Bit Index Explicit Replication (BIER).

   BIER-TE shares architecture and packet formats with BIER as described
   in [I-D.wijnands-bier-architecture].

   BIER-TE forwards and replicates packets like BIER based on a
   BitString in the packet header, but it does not require an IGP.  It
   supports traffic engineering through explicit hop-by-hop forwarding
   and loose-hop forwarding of packets.  It also supports Fast ReRoute
   (FRR) for link and node protection, and incremental deployment.
   Because BIER-TE, like BIER, operates without explicit in-network
   tree-building but also supports traffic engineering, it is more
   similar to Segment Routing (SR) than to RSVP-TE.

   The key differences compared to BIER are:

   o  BIER-TE replaces in-network autonomous path calculation by
      explicit paths calculated off-path by the BIER-TE controller host.

   o  In BIER-TE, every BitPosition of the BitString of a BIER-TE packet
      indicates one or more adjacencies, instead of a BFER as in BIER.

   o  In BIER-TE, a BFR has no routing table, only a Bit Index
      Forwarding Table (BIFT) that is indexed by BitPosition and
      populated only with those adjacencies to which the BFR should
      replicate packets.

   Currently, BIER-TE does not support BIER sub-domains, and it does not
   use BFR-ids or "Set Identifiers" (SI) in BIER-TE headers (which share
   the same format as BIER headers).

1.2.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

2.  Layering

   End-to-end BIER-TE operation consists of four components: the
   "Multicast Flow Overlay", the "BIER-TE Controller Host", the "Routing
   Underlay" and the "BIER-TE forwarding layer".

                     Picture 2: Layers of BIER-TE

                     <------BGP/PIM----->
      |<-IGMP/PIM->    multicast flow    <-PIM/IGMP->|
                           overlay

                   [Bier-TE Controller Host]
                      ^        ^       ^
                     /         |        \   BIER-TE control protocol
                      |        |       |    e.g., Netconf/Restconf/Yang
                      v        v       v
      Src -> Rtr1 -> BFIR-----BFR-----BFER -> Rtr2 -> Rcvr

                     |--------------------->|
                     BIER-TE forwarding layer

                     |<- BIER-TE domain-->|

              |<--------------------->|
                   Routing underlay

2.1.  The Multicast Flow Overlay

   The Multicast Flow Overlay operates as in BIER; see
   [I-D.wijnands-bier-architecture].  Instead of interacting with the
   BIER layer, it interacts with the BIER-TE Controller Host.

2.2.  The BIER-TE Controller Host

   The BIER-TE controller host is an off-path central host.  It
   communicates with BFRs via protocols such as Netconf/Restconf/Yang.
   The protocols used between BFRs and the controller are outside the
   scope of this document.
   This document is only concerned with the logic by which a controller
   assigns BitPositions to the topology and BitStrings to BIER-TE
   packets.

   During bring-up or modification of the network topology, the
   controller needs to talk to all BFRs to assign BitPositions to
   adjacencies of the network topology.  During day-to-day operation of
   the network, it only needs to talk to BFIRs to install BitStrings for
   multicast flows.

   These two tasks consist of the following steps:

2.2.1.  Assignment of BitPositions to adjacencies of the network
        topology

   The BIER-TE controller host tracks the BFR topology of the BIER-TE
   domain.  It determines which adjacencies require BitPositions so that
   BIER-TE explicit paths can be built through them as desired by
   operator policy.

   The controller then pushes the BitPositions and their adjacencies to
   the BIFTs of the BFRs.  The BIFT of each BFR is populated only with
   those BitPositions whose adjacencies connect to that BFR, i.e., the
   adjacencies over which that BFR should be able to send packets.

2.2.2.  Changes in the network topology

   If the network topology changes (other than through failures) so
   that adjacencies that are assigned to BitPositions are no longer
   needed, the controller can re-use those BitPositions for new
   adjacencies.  First, these BitPositions need to be removed from any
   BFIR flow state and BFR BIFT state (and from the BTAFT if FRR is
   supported; see Section 6).  Then they can be repopulated, first into
   the BIFTs (and, if FRR is supported, the BTAFTs), then into the
   BFIRs.

2.2.3.  Set up per-multicast flow BIER-TE state

   The BIER-TE controller host tracks the multicast flow overlay to
   determine which multicast flows need to be sent by which BFIR to
   which set of BFERs.  It calculates the desired distribution tree
   across the BIER-TE domain based on algorithms outside the scope of
   this document (e.g., CSPF, Steiner tree).  It then pushes the
   calculated BitString into the BFIR.

2.2.4.  Link/Node Failures and Recovery

   When links or nodes fail or recover in the topology, BIER-TE can
   quickly respond with the optional FRR procedures described in
   Section 6.  It can also react more slowly by recalculating the
   BitStrings of affected multicast flows.  This reaction is slower than
   the FRR procedure because the controller needs to receive link/node
   up/down indications, recalculate the desired BitStrings and push them
   down into the BFIRs.  With FRR, this is all performed locally on a
   BFR receiving the adjacency up/down notification.

2.3.  The BIER-TE Forwarding Layer

   When the BIER-TE Forwarding Layer receives a packet, it simply looks
   up the BitPositions that are set in the BitString of the packet in
   the Bit Index Forwarding Table (BIFT) that was populated by the
   BIER-TE controller host.  For every BP that is set in the BitString
   and that has one or more adjacencies in the BIFT, a copy is made
   according to the type of adjacencies for that BP in the BIFT.  Before
   sending any copy, the BFR resets all BitPositions in the BitString of
   the packet for which it can create a copy, i.e., all BitPositions
   that have adjacencies in its BIFT.  This prevents packets from
   looping.

   If the BFR supports BIER-TE FRR operations, then the BIER-TE
   forwarding layer receives fast adjacency up/down notifications and
   uses the BIER-TE Adjacency FRR Table (BTAFT) to modify the BitString
   of the packet before it performs BIER-TE forwarding.  This is
   detailed in Section 6.

2.4.  The Routing Underlay

   BIER-TE sends BIER packets to directly connected BIER-TE neighbors as
   L2 (unicast) BIER packets without requiring a routing underlay.
   BIER-TE forwarding uses the Routing Underlay only for forward_routed
   adjacencies, which copy BIER-TE packets to BFRs that are not directly
   connected (see Section 3.2 for adjacency definitions).

   If the BFR intends to support FRR for BIER-TE, then the BIER-TE
   forwarding plane needs to receive fast adjacency up/down
   notifications: link up/down or neighbor up/down, e.g., from BFD.
   Providing these notifications is considered to be part of the
   Routing Underlay in this document.

3.  BIER-TE Forwarding

3.1.  The Bit Index Forwarding Table (BIFT)

   The Bit Index Forwarding Table (BIFT) exists in every BFR.  It is a
   table indexed by BitPosition and is populated by the BIER-TE control
   plane.  Each index can be empty or contain a list of one or more
   adjacencies.

   ------------------------------------------------------------------
   | Index           | Adjacencies                                  |
   ==================================================================
   | 1               | forward_connected(interface,neighbor,DNR)    |
   ------------------------------------------------------------------
   | 2               | forward_connected(interface,neighbor,DNR)    |
   |                 | forward_connected(interface,neighbor,DNR)    |
   ------------------------------------------------------------------
   | 3               | local_decap([VRF])                           |
   ------------------------------------------------------------------
   | 4               | forward_routed([VRF,]l3-neighbor)            |
   ------------------------------------------------------------------
   | 5               |                                              |
   ------------------------------------------------------------------
   | 6               | ECMP({adjacency1,...adjacencyN}, seed)       |
   ------------------------------------------------------------------
                                  ...
   | BitStringLength | ...                                          |
   ------------------------------------------------------------------
                       Bit Index Forwarding Table

   The BIFT is programmed into the data plane of BFRs by the BIER-TE
   controller host and used to forward packets according to the rules
   specified in the BIER-TE forwarding procedures.

   When the controller populates the same BP in more than one BFR, the
   adjacencies for that BP do not have to be the same in each BFR.
   This is up to the controller.
   BitPositions for p2p links are one such case (see Section 4.1).

3.2.  Adjacency Types

3.2.1.  Forward Connected

   A "forward_connected" adjacency is towards a directly connected BFR
   neighbor, using an interface address of that BFR on the connecting
   interface.  A forward_connected adjacency does not route packets but
   only L2 forwards them to the neighbor.

   Packets sent to an adjacency with "DoNotReset" (DNR) set in the BIFT
   will not have the BitPosition for that adjacency reset when the BFR
   creates a copy for it.  The BitPosition will still be reset for
   copies of the packet made towards other adjacencies.  This can be
   used, for example, in ring topologies, as explained in Section 4.6.

3.2.2.  Forward Routed

   A "forward_routed" adjacency is an adjacency towards a BFR that is
   not a forward_connected adjacency: towards a loopback address of a
   BFR or towards an interface address that is not directly connected.
   Forward_routed packets are forwarded via the Routing Underlay.

   If the Routing Underlay has multiple paths for a forward_routed
   adjacency, it will perform ECMP independently of BIER-TE for packets
   forwarded across that adjacency.

   If the Routing Underlay has FRR, it will perform FRR independently of
   BIER-TE for packets forwarded across a forward_routed adjacency.

3.2.3.  ECMP

   An "Equal Cost Multipath" (ECMP) adjacency has a list of two or more
   adjacencies included in it.  It copies the BIER-TE packet to one of
   those adjacencies based on an ECMP hash calculation.  The BIER-TE
   ECMP hash algorithm must select the same adjacency from that list for
   all packets with the same "entropy" value in the BIER-TE header if
   the same number of adjacencies and the same seed are given as
   parameters.  Further use of the seed parameter is explained in
   Section 4.7.

3.2.4.  Local Decap

   A "local_decap" adjacency passes a copy of the payload of the BIER-TE
   packet to the packet's NextProto within the BFR (IPv4/IPv6,
   Ethernet, ...).  A local_decap adjacency turns the BFR into a BFER
   for matching packets.  Local_decap adjacencies require the BFER to
   support routing or switching for NextProto to determine how to
   further process the packet.

3.3.  Basic BIER-TE Forwarding Example

   This is a step-by-step example of basic BIER-TE forwarding.  It does
   not use ECMP or forward_routed adjacencies, nor does it try to
   minimize the number of required BitPositions for the topology.

                    Picture 1: Forwarding Example

                 [Bier-Te Controller Host]
                    /         |         \
                   v          v          v

         |  p13  p1                       |
         +- BFIR2 --+                     |
         |          | p2     p6           |  LAN2
         |          +-- BFR3 --+          |
         |          |          | p7  p11  |
    Src -+          +-- BFER1 ------------+
         |          | p3    p8 |          |
         |          +-- BFR4 --+          +-- Rcv1
         |          |          |          |
         |          |
         |  p14  p4 |
         +- BFIR1 --+          |
         |          +-- BFR5 --+ p10  p12  |
       LAN1         | p5    p9 +-- BFER2 --+
                    |                      +-- Rcv2
                    |
                  LAN3

      IP |........ BIER-TE network .......| IP

   pXX indicates the BitPosition number assigned by the BIER-TE
   controller host to an adjacency in the BIER-TE topology.  For
   example, p9 is the adjacency towards BFR9 on the LAN connecting to
   BFER2.

   BIFT BFIR2:
       p13: local_decap()
       p2:  forward_connected(BFR3)

   BIFT BFR3:
       p1:  forward_connected(BFIR2)
       p7:  forward_connected(BFER1)
       p8:  forward_connected(BFR4)

   BIFT BFER1:
       p11: local_decap()
       p6:  forward_connected(BFR3)
       p8:  forward_connected(BFR4)

   ...and so on.

   Traffic needs to flow from BFIR2 towards Rcv1 and Rcv2.  The
   controller determines that it should pass across the following
   paths:

                         -> BFER1 ---------------> Rcv1
      BFIR2 -> BFR3
                         -> BFR4 -> BFR5 -> BFER2 -> Rcv2

   These paths correspond to the following BitString: p2, p5, p7, p8,
   p10, p11, p12.

   This BitString is set up in BFIR2.  Multicast packets arriving at
   BFIR2 from Src are assigned this BitString.
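   The BitString above and the first forwarding steps can be replayed
   with a small Python sketch.  It assumes that BitPosition n is bit
   n-1 of an integer, and the helper names (bit, bitstring, bps,
   forward) are hypothetical.  Only the BIFTs listed for BFIR2, BFR3 and
   BFER1 are modeled, because the others are elided in the example.

```python
def bit(bp):
    """BitPosition bp (1-based) as an integer mask: BP n is bit n-1."""
    return 1 << (bp - 1)

def bitstring(*bps):
    """Build a BitString from a list of BitPositions."""
    s = 0
    for bp in bps:
        s |= bit(bp)
    return s

def bps(s):
    """List the BitPositions that are set in BitString s."""
    return [i + 1 for i in range(s.bit_length()) if (s >> i) & 1]

# BIFTs of the first three BFRs of the example: BP -> adjacency.
BIFT = {
    "BFIR2": {13: ("local_decap", None), 2: ("forward_connected", "BFR3")},
    "BFR3": {1: ("forward_connected", "BFIR2"),
             7: ("forward_connected", "BFER1"),
             8: ("forward_connected", "BFR4")},
    "BFER1": {11: ("local_decap", None),
              6: ("forward_connected", "BFR3"),
              8: ("forward_connected", "BFR4")},
}

def forward(bfr, s):
    """Return the (adjacency, BitString) copies that bfr creates for s."""
    bift = BIFT[bfr]
    # Every copy has all BPs of this BFR's BIFT reset (no DNR here).
    out = s & ~bitstring(*bift)
    return [(bift[bp], out) for bp in sorted(bift) if s & bit(bp)]

s = bitstring(2, 5, 7, 8, 10, 11, 12)
[(adj, s3)] = forward("BFIR2", s)          # single copy, towards BFR3
print(adj, bps(s3))        # ('forward_connected', 'BFR3') [5, 7, 8, 10, 11, 12]
copies = forward("BFR3", s3)
print([a[1] for a, _ in copies], bps(copies[0][1]))  # ['BFER1', 'BFR4'] [5, 10, 11, 12]
[(adj1, s5)] = forward("BFER1", copies[0][1])
print(adj1, bps(s5))       # ('local_decap', None) [5, 10, 12]
```

   Each step matches the walk-through that follows: a BFR only acts on
   set BitPositions that it has in its BIFT and clears all of its own
   BitPositions in every copy it sends.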

   BFIR2 forwards based on that BitString.  It has p2 and p13
   populated.  Only p2 is in the BitString, and it has an adjacency
   towards BFR3.  BFIR2 resets p2 in the BitString and sends a copy
   towards BFR3.

   BFR3 sees a BitString of p5, p7, p8, p10, p11, p12.  It is only
   interested in p1, p7, p8.  It creates a copy of the packet to BFER1
   (due to p7) and one to BFR4 (due to p8).  It resets p7 and p8 before
   sending.

   BFER1 sees a BitString of p5, p10, p11, p12.  It is only interested
   in p6, p7, p8, p11 and therefore considers only p11.  p11 is a
   "local_decap" adjacency installed by the BIER-TE controller host
   because BFER1 should pass packets to IP multicast.  The local_decap
   adjacency instructs BFER1 to create a copy, decapsulate it from the
   BIER header and pass it on to the NextProtocol, in this example IP
   multicast.  IP multicast will then forward the packet out to LAN2
   because it has received PIM or IGMP joins on LAN2 for the traffic.

   Processing of the packet in BFR4, BFR5 and BFER2 proceeds
   accordingly.

4.  BIER-TE Controller Host BitPosition Assignments

   This section describes how the BIER-TE controller host can use the
   different BIER-TE adjacency types to define the BitPositions of a
   BIER-TE domain.

   Because the size of the BitString limits the size of the BIER-TE
   domain, many of the options described exist to support larger
   topologies with fewer BitPositions (Sections 4.1, 4.3, 4.4, 4.5,
   4.6, 4.7 and 4.8).

4.1.  P2P Links

   Each p2p link in the BIER-TE domain is assigned one unique
   BitPosition with a forward_connected adjacency pointing to the
   neighbor on the p2p link.

4.2.  BFER

   Every BFER is given a unique BitPosition with a local_decap
   adjacency.

4.3.  Leaf BFIRs

   Leaf BFIRs are BFIRs where incoming BIER-TE packets never need to be
   forwarded to another BFR but are only sent to the BFIR to exit the
   BIER-TE domain.
   For example, in networks where PEs are spokes connected to P routers,
   those PEs are leaf BFIRs unless there is a U-turn between two PEs.

   All leaf BFIRs in a BIER-TE domain can share a single BitPosition.
   This is possible because the BitPosition of the adjacency used to
   reach the BFIR already distinguishes whether or not packets should
   reach that BFIR.

   This optimization will not work if an upstream interface of the BFIR
   is using a BitPosition optimized as described in the following two
   sections (LANs, Hub and Spoke).

4.4.  LANs

   In a LAN, the adjacency to each neighboring BFR on the LAN is given a
   unique BitPosition.  The adjacency of this BitPosition is a
   forward_connected adjacency towards that BFR, and this BitPosition is
   populated into the BIFTs of all the other BFRs on that LAN.

                    BFR1
                     |p1
        LAN1-+----+--+------+
           p3|  p4|       p2|
           BFR3 BFR4      BFR7

   If bandwidth on the LAN is not an issue and most BIER-TE traffic
   should be copied to all neighbors on a LAN, then BitPositions can be
   saved by assigning just a single BitPosition to the LAN and
   populating that BitPosition in the BIFT of each BFR on the LAN with a
   list of forward_connected adjacencies to all other neighbors on the
   LAN.

   This optimization does not work in the presence of BFRs that are
   redundantly connected to more than one LAN using this optimization,
   because these BFRs would receive duplicates and forward those
   duplicates into the other LANs.  Adjacencies of such BFRs into their
   LANs still need a separate BitPosition.

4.5.  Hub and Spoke

   In a setup with a hub and multiple spokes connected via separate p2p
   links to the hub, all p2p links can share the same BitPosition.  The
   BitPosition in the hub's BIFT is set up with a list of
   forward_connected adjacencies, one for each spoke.
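   The effect of a shared BitPosition can be sketched in Python: a
   single BIFT entry holding a list of adjacencies yields one copy per
   spoke.  The BitPosition numbers (1 towards the core, 4 for all spokes,
   9 as a hypothetical local_decap BitPosition of Spoke2) and the helper
   names are assumptions for illustration only.

```python
def bit(bp):
    """BitPosition bp (1-based) as an integer mask."""
    return 1 << (bp - 1)

# Hypothetical hub BIFT: BP 4 is shared by the p2p links to all spokes,
# BP 1 points to the hub's upstream neighbor.
HUB_BIFT = {
    1: ["forward_connected(Core)"],
    4: ["forward_connected(Spoke1)",
        "forward_connected(Spoke2)",
        "forward_connected(Spoke3)"],
}

def replicate(bift, bitstring):
    """One (adjacency, BitString) copy per adjacency of every BP set."""
    local = 0
    for bp in bift:
        local |= bit(bp)
    out = bitstring & ~local  # all BPs of this BFR are reset in every copy
    return [(adj, out)
            for bp in sorted(bift) if bitstring & bit(bp)
            for adj in bift[bp]]

for adj, bs in replicate(HUB_BIFT, bit(4) | bit(9)):
    print(adj, bs == bit(9))  # every spoke gets a copy still carrying BP 9
```

   Only the spoke whose own BIFT contains BP 9 acts on its copy; the
   other spokes find none of their BitPositions set and create no
   further copies.  That wasted link bandwidth is the price paid for
   saving BitPositions.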

   This option is similar to the BitPosition optimization in LANs:
   redundantly connected spokes need their own BitPositions.

4.6.  Rings

   In L3 rings, instead of assigning a single BitPosition for every p2p
   link in the ring, it is possible to save BitPositions by setting the
   "Do Not Reset" (DNR) flag on forward_connected adjacencies.

   For the ring shown in the following picture, a single BitPosition
   will suffice to forward traffic entering the ring at BFRa or BFRb all
   the way up to BFR1:

   On BFRa, BFRb, BFR30, ..., BFR3, the BitPosition is populated with a
   forward_connected adjacency pointing to the clockwise neighbor on the
   ring and with DNR set.  On BFR2, the adjacency also points to the
   clockwise neighbor BFR1, but without DNR set.  Handling DNR this way
   ensures that copies forwarded from any BFR in the ring to a BFR
   outside the ring will not have this BitPosition, therefore minimizing
   the chance of creating loops.

                  v         v
                  |         |
           L1     |   L2    |           L3
       /-------- BFRa ---- BFRb --------------------\
       |                                            |
       \- BFR1 - BFR2 - BFR3 - ... - BFR29 - BFR30 -/
           |             |  L4          |       |
        p33|                               p15|
         BFRd                              BFRc

4.7.  Equal Cost MultiPath (ECMP)

   The ECMP adjacency allows the use of just one BP per link bundle
   between two BFRs instead of one BP for each p2p member link of that
   link bundle.
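   The selection rule of Section 3.2.3 (same entropy, same number of
   adjacencies and same seed always select the same adjacency) can be
   sketched in Python.  SHA-256 over "seed:entropy" is only an assumed
   stand-in for the unspecified BIER-TE ECMP hash, and the function
   names and seed values are hypothetical.  The link names follow the
   second example below (BFR1 splitting over L11/L12, BFR2 over
   L21/L22), so the sketch also previews the polarization effect
   described there.

```python
import hashlib

def ecmp_select(adjacencies, entropy, seed):
    """Deterministically pick one adjacency from (entropy, seed, count).

    Assumption: SHA-256 over "seed:entropy" stands in for the real hash.
    """
    digest = hashlib.sha256(f"{seed}:{entropy}".encode()).digest()
    return adjacencies[int.from_bytes(digest[:4], "big") % len(adjacencies)]

def link_load(seed_bfr1, seed_bfr2, flows=1000):
    """Count flows per link for BFR1 -> {L11, L12}, then BFR2 -> {L21, L22}."""
    load = {"L11": 0, "L12": 0, "L21": 0, "L22": 0}
    for entropy in range(flows):
        first = ecmp_select(["L11", "L12"], entropy, seed_bfr1)
        load[first] += 1
        if first == "L11":  # only these flows reach BFR2
            load[ecmp_select(["L21", "L22"], entropy, seed_bfr2)] += 1
    return load

print(link_load(seed_bfr1=7, seed_bfr2=7))  # same seed: L22 stays at 0
print(link_load(seed_bfr1=7, seed_bfr2=8))  # different seeds: L21 and L22 both used
```

   With identical seeds, every flow that BFR1 hashed onto its first
   adjacency is hashed onto the first adjacency again in BFR2, so L22
   carries nothing; changing the seed on one of the two BFRs breaks that
   correlation.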

   In the following picture, one BP is used across L1, L2 and L3, and
   BFR1 and BFR2 have the following BIFT entries for that BP:

                  --L1-----
            BFR1  --L2-----  BFR2
                  --L3-----

   BIFT entry in BFR1:
   ------------------------------------------------------------------
   | Index | Adjacencies                                            |
   ==================================================================
   | 6     | ECMP({L1-to-BFR2,L2-to-BFR2,L3-to-BFR2}, seed)         |
   ------------------------------------------------------------------

   BIFT entry in BFR2:
   ------------------------------------------------------------------
   | Index | Adjacencies                                            |
   ==================================================================
   | 6     | ECMP({L1-to-BFR1,L2-to-BFR1,L3-to-BFR1}, seed)         |
   ------------------------------------------------------------------

   In the following example, all traffic from BFR1 towards BFR10 is
   intended to be ECMP load-split equally across the topology.  This
   example is not meant as a likely setup, but illustrates that ECMP
   can be used to share BPs not only across link bundles; it also
   explains the use of the seed parameter.

                     BFR1
                    /    \
                   /L11   \L12
                BFR2       BFR3
               /   \       /   \
              /L21  \L22  /L31  \L32
           BFR4   BFR5  BFR6   BFR7
              \   /        \   /
               \ /          \ /
              BFR8          BFR9
                  \         /
                   \       /
                     BFR10

   BIFT entry in BFR1:
   ------------------------------------------------------------------
   | 6     | ECMP({L11-to-BFR2,L12-to-BFR3}, seed)                  |
   ------------------------------------------------------------------

   BIFT entry in BFR2:
   ------------------------------------------------------------------
   | 6     | ECMP({L21-to-BFR4,L22-to-BFR5}, seed)                  |
   ------------------------------------------------------------------

   BIFT entry in BFR3:
   ------------------------------------------------------------------
   | 6     | ECMP({L31-to-BFR6,L32-to-BFR7}, seed)                  |
   ------------------------------------------------------------------

   With this setup of ECMP in the above topology, and the same seed on
   all three BFRs, traffic would not be load-split equally.  Instead,
   links L22 and L31 would see no traffic at all: BFR2 only sees traffic
   from BFR1 for which the ECMP hash in BFR1 selected the first
   adjacency in a list of two adjacencies: link L11-to-BFR2.  When BFR2
   again performs ECMP with two adjacencies on that subset of traffic,
   it will again select the first of its two adjacencies: L21-to-BFR4.
   Therefore, L22 and BFR5 see no traffic.

   To resolve this issue, the ECMP adjacency on BFR1 simply needs to be
   set up with a different seed than the ECMP adjacencies on BFR2 and
   BFR3.

   This issue is called polarization.  It depends on the ECMP hash.  It
   is possible to build ECMP that does not have polarization, for
   example by taking entropy from the actual adjacency members into
   account, but that can make it harder to achieve evenly balanced
   load-splitting on all BFRs without making the ECMP hash algorithm
   potentially too complex for fast forwarding in the BFRs.

4.8.  Routed adjacencies

   Routed adjacencies can reduce the number of BitPositions required
   when the traffic engineering requirement is not hop-by-hop explicit
   path selection, but loose-hop selection.

         ...............             ...............
   BFR1--... Redundant ...--L1-- BFR2... Redundant ...---
      \--... Network   ...--L2--/    ... Network   ...---
   BFR4--... Segment 1 ...--L3-- BFR3... Segment 2 ...---
         ...............             ...............

   Assume the requirement in the above network is to explicitly
   engineer paths such that specific traffic flows are passed from
   segment 1 to segment 2 via link L1 (or via L2 or via L3).

   To achieve this, BFR1 and BFR4 are set up with a forward_routed
   adjacency BitPosition towards an address of BFR2 on link L1 (or
   towards an address of BFR2 on link L2, or of BFR3 on link L3).

   For paths to be engineered through a specific node BFR2 (or BFR3),
   BFR1 and BFR4 are set up with a forward_routed adjacency BitPosition
   towards a loopback address of BFR2 (or BFR3).

4.8.1.  Supporting nodes without BIER-TE

   Routed adjacencies also enable incremental deployment of BIER-TE.
   Only the nodes through which BIER-TE traffic needs to be steered
   (with or without replication) need to support BIER-TE.  Where they
   are not directly connected to each other, forward_routed adjacencies
   are used to pass over nodes that are not BIER-TE enabled.

5.  Avoiding loops and duplicates

5.1.  Loops

   Whenever BIER-TE creates a copy of a packet, the BitString of that
   copy will have all BitPositions cleared that are associated with
   adjacencies in the BFR.  This prevents packets from looping.  The
   only exceptions are adjacencies with DNR set.

   With DNR set, looping can happen.  Consider the ring picture in
   Section 4.6, and assume that link L4 from BFR3 is plugged into the L1
   interface of BFRa.  This creates a loop in which the ring's clockwise
   BitPosition is never reset for copies of the packets traveling
   clockwise around the ring.
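   The DNR behavior and the loop it can cause can be sketched in
   Python.  The sketch reduces the ring of Section 4.6 to BFRa, BFRb,
   BFR30, BFR3, BFR2 and BFR1; the ring BitPosition 20 and all helper
   names are hypothetical, p15 and p33 are taken from the ring picture
   as exits towards BFRc and BFRd, and the link layer address checks
   that the next paragraph relies on are deliberately not modeled.

```python
from dataclasses import dataclass

RING_BP = 20  # hypothetical BitPosition shared by the whole ring

def bit(bp):
    """BitPosition bp (1-based) as an integer mask."""
    return 1 << (bp - 1)

@dataclass(frozen=True)
class Adj:
    neighbor: str
    dnr: bool = False  # "Do Not Reset" flag of a forward_connected adjacency

def forward(bift, bitstring):
    """Return (neighbor, BitString) for every copy a BFR creates."""
    local = 0
    for bp in bift:
        local |= bit(bp)
    copies = []
    for bp in sorted(bift):
        if not bitstring & bit(bp):
            continue
        for adj in bift[bp]:
            out = bitstring & ~local  # reset all BPs of this BFR ...
            if adj.dnr:
                out |= bit(bp)        # ... except this BP on a DNR copy
            copies.append((adj.neighbor, out))
    return copies

def ring_bifts(bfr3_next="BFR2"):
    """Reduced ring BFRa -> BFRb -> BFR30 -> BFR3 -> BFR2 -> BFR1."""
    return {
        "BFRa": {RING_BP: [Adj("BFRb", dnr=True)]},
        "BFRb": {RING_BP: [Adj("BFR30", dnr=True)]},
        "BFR30": {RING_BP: [Adj("BFR3", dnr=True)], 15: [Adj("BFRc")]},
        "BFR3": {RING_BP: [Adj(bfr3_next, dnr=True)]},
        "BFR2": {RING_BP: [Adj("BFR1")]},  # last ring hop: DNR not set
        "BFR1": {33: [Adj("BFRd")]},
    }

def walk(bifts, start, bitstring, max_hops=16):
    """Follow all copies hop by hop; None if still forwarding after max_hops."""
    reached, frontier = [], [(start, bitstring)]
    for _ in range(max_hops):
        if not frontier:
            return reached
        next_frontier = []
        for bfr, bs in frontier:
            reached.append(bfr)
            next_frontier.extend(forward(bifts.get(bfr, {}), bs))
        frontier = next_frontier
    return None

print(walk(ring_bifts(), "BFRa", bit(RING_BP) | bit(15) | bit(33)))
print(walk(ring_bifts(bfr3_next="BFRa"), "BFRa", bit(RING_BP)))  # None: loop
```

   With correct cabling, the copy towards BFRc leaves the ring without
   the ring BitPosition and BFR2 clears it before BFR1.  If BFR3's
   clockwise adjacency instead reaches BFRa, the DNR copies never clear
   the ring BitPosition and the walk does not terminate.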

   To prevent looping in the face of such physical misconfiguration,
   only forward_connected adjacencies are permitted to have DNR set, and
   the link layer destination address of the adjacency (e.g., the MAC
   address) protects against closing the loop.  Link layers without
   port-unique link layer addresses should not be used with the DNR
   flag set.

5.2.  Duplicates

   Duplicates happen when the topology of the BitString is not a tree
   but redundantly connects BFRs with each other.  The controller must
   therefore ensure that it only creates BitStrings that are trees in
   the topology.

   When links are incorrectly physically re-connected before the
   controller updates BitStrings in BFIRs, duplicates can happen.  Like
   loops, these can be prevented by link layer addressing in
   forward_connected adjacencies.

   If interface or loopback addresses used in forward_routed adjacencies
   are moved from one BFR to another, duplicates can equally happen.
   Such re-addressing operations must be coordinated with the
   controller.

6.  FRR

   FRR is an optional procedure.  To leverage it, the BIER-TE controller
   host and BFRs need to support it.  It does not have to be supported
   on all BFRs, but only on those that are attached to a link/adjacency
   for which FRR support is required.

   If BIER-TE FRR is supported by the BIER-TE controller host, then it
   needs to calculate the desired backup paths for link and/or node
   failures in the BIER-TE domain and download this information into
   the BIER-TE Adjacency FRR Table (BTAFT) of the BFRs.  The BTAFT then
   drives FRR operations in the BIER-TE forwarding plane of that BFR.

6.1.  The BIER-TE Adjacency FRR Table (BTAFT)

   The BTAFT exists in every BFR that supports BIER-TE FRR procedures.
   It is indexed by FRR Adjacency Index.  Associated with each FRR
   Adjacency Index are a ResetBitmask, an AddBitmask and a BitPosition.
   -----------------------------------------------------------
   | FRR Adjacency | BitPosition | ResetBitmask | AddBitmask |
   |     Index     |             |              |            |
   ===========================================================
   |       1       |      5      |  ..0010000   | ..11000000 |
   -----------------------------------------------------------
                                ...

   An FRR Adjacency is an adjacency that is used in the BIFT of the BFR.
   The BFR has to be able to determine whether the adjacency is up or
   down in less than 50 msec.  An FRR adjacency can be a
   forward_connected adjacency with fast L2 link Up/Down state
   notifications, or a forward_connected or forward_routed adjacency
   with a fast aliveness mechanism such as BFD.  Details of those
   mechanisms are outside the scope of this architecture.

   The FRR Adjacency Index is the index that is carried in the fast
   Up/Down notifications to the BIER-TE forwarding plane.

   The BitPosition is the BP in the BIFT in which the FRR Adjacency is
   used.

6.2.  FRR in BIER-TE forwarding

   The BIER-TE forwarding plane receives fast Up/Down notifications with
   the FRR Adjacency Index.  From the BitPosition in the BTAFT entry, it
   remembers which BPs are currently affected (have a down adjacency).

   When a packet is received, BIER-TE forwarding checks whether any of
   the BPs to which it would forward the packet are affected.  For each
   affected BP, it removes the ResetBitmask bits from the packet's
   BitString and adds the AddBitmask bits to the packet's BitString.

   Afterwards, normal BIER-TE forwarding occurs, taking the modified
   BitString into account.

6.3.  FRR in the BIER-TE Controller Host

   The basic rules for how the BIER-TE controller host calculates the
   ResetBitmask and AddBitmask are as follows:

   1.  The BIER-TE controller host has to determine whether a failure of
       the adjacency should be taken to indicate link or node failure.
       This is a policy decision.

   2.  The ResetBitmask contains the BitPosition of the failed
       adjacency.

   3.  In the case of link protection, the AddBitmask consists of the
       segments forming a path from the BFR to the BFR on the other end
       of the failed link.

   4.  In the case of node protection, the AddBitmask consists of the
       segments forming a tree from the BFR to all necessary BFRs
       downstream of the (assumed to be failed) BFR across the failed
       adjacency.

   5.  The ResetBitmask is extended with those segments that could lead
       to duplicate packets if the AddBitmask is added to possible
       BitStrings of packets using the failing BitPosition.

6.4.  BIER-TE FRR Benefits

   Compared to other FRR solutions, such as RSVP-TE/P2MP FRR, BIER-TE
   FRR has two key distinctions:

   o  It maintains the goal of BIER-TE not to establish in-network
      per-multicast-flow state.  For that reason, the backup paths/trees
      are only tied to the topology but not to individual distribution
      trees.

   o  For the case of node failure, it allows building a path-engineered
      backup tree (rule 4 in Section 6.3) as opposed to only a set of
      p2p backup tunnels.

7.  BIER-TE Forwarding Pseudocode

   The following sections of pseudocode are meant to illustrate the
   BIER-TE forwarding plane.  This code is not normative.  It is meant
   to serve both as a potentially easier to read and more precise
   representation of the forwarding functionality, and to illustrate
   how simple BIER-TE forwarding is and that it can be implemented
   efficiently.
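   The core bit operations of BIER-TE forwarding can also be exercised
   in Python.  The following is a minimal sketch, not a reference
   implementation: it assumes that BitPosition n maps to bit 1 << (n-1),
   models the BIFT as a dictionary from BitPosition to a list of
   hypothetical (kind, target, dnr) tuples, and leaves out ECMP and FRR.
   The names bits_of_interest, set_bit_positions and forward are
   illustrative counterparts of BIFTChanged(), GetFirstBitPosition() /
   GetNextBitPosition() and ForwardBierTePacket(), not names defined by
   this document.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical adjacency model: (kind, target, dnr),
# e.g. ("forward_connected", "BFR2", False).
Adjacency = Tuple[str, str, bool]
Bift = Dict[int, List[Adjacency]]  # BitPosition -> adjacencies


def bits_of_interest(bift: Bift) -> int:
    """Like BIFTChanged(): one bit per BitPosition with a non-empty BIFT entry."""
    mask = 0
    for bp, adjacencies in bift.items():
        if adjacencies:
            mask |= 1 << (bp - 1)
    return mask


def set_bit_positions(mask: int) -> Iterator[int]:
    """Like GetFirstBitPosition()/GetNextBitPosition(): set BPs, lowest first."""
    bp = 1
    while mask:
        if mask & 1:
            yield bp
        mask >>= 1
        bp += 1


def forward(bitstring: int, bift: Bift) -> List[Tuple[str, str, int]]:
    """Return (kind, target, BitString of the copy) for every copy sent."""
    my_bits = bits_of_interest(bift)
    local = bitstring & my_bits       # BPs this BFR replicates on
    outgoing = bitstring & ~my_bits   # copies never carry this BFR's BPs...
    copies = []
    for bp in set_bit_positions(local):
        for kind, target, dnr in bift[bp]:
            # ...except the BP of a DNR adjacency, which is set again.
            copy_bits = (outgoing | (1 << (bp - 1))) if dnr else outgoing
            copies.append((kind, target, copy_bits))
    return copies


bift: Bift = {
    1: [("forward_connected", "BFR2", False)],
    2: [("local_decap", "", False)],
    4: [("forward_connected", "BFR3", True)],
}
for entry in forward(0b11011, bift):  # BPs 1, 2, 4 and 5 set
    print(entry)
# ('forward_connected', 'BFR2', 16)
# ('local_decap', '', 16)
# ('forward_connected', 'BFR3', 24)
```

   As in the pseudocode, the BitString of the copies is computed once by
   clearing all of this BFR's BitPositions, instead of per copy; BP 5,
   for which this BFR has no adjacency, is simply passed on downstream.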
   The following procedure is executed on a BFR whenever the BIFT is
   changed by the BIER-TE controller host:

   global MyBitsOfInterest

   void BIFTChanged()
   {
      // One bit per BitPosition (BP) with a non-empty BIFT entry.
      // BPs are numbered from 1; BP n is bit (1 << (n-1)).
      MyBitsOfInterest = 0
      for (Index = 1; Index <= BitStringLength; Index++)
         if (BIFT[Index] != empty)
            MyBitsOfInterest |= 1 << (Index-1)
   }

   The following procedure is executed whenever an adjacency used for
   BIER-TE FRR changes state:

   // Indexed by BP (1 .. BitStringLength)
   global ResetBitMaskByBT[BitStringLength+1]
   global AddBitMaskByBT[BitStringLength+1]
   global FRRaffectedBP

   void FrrUpDown(FrrAdjacencyIndex, UpDown)
   {
      global FRRAdjacenciesDown

      if (UpDown == Up)
         FRRAdjacenciesDown &= ~(1 << (FrrAdjacencyIndex-1))
      else
         FRRAdjacenciesDown |= 1 << (FrrAdjacencyIndex-1)

      // Recompute the FRR state from scratch so that adjacencies
      // that came back up no longer affect forwarding.
      FRRaffectedBP = 0
      for (BP = 1; BP <= BitStringLength; BP++) {
         ResetBitMaskByBT[BP] = 0
         AddBitMaskByBT[BP] = 0
      }

      for (Index = GetFirstBitPosition(FRRAdjacenciesDown); Index ;
           Index = GetNextBitPosition(FRRAdjacenciesDown, Index)) {
         local BP = BTAFT[Index].BitPosition
         FRRaffectedBP |= 1 << (BP-1)
         ResetBitMaskByBT[BP] |= BTAFT[Index].ResetBitMask
         AddBitMaskByBT[BP] |= BTAFT[Index].AddBitMask
      }
   }

   The following procedure is executed whenever a BIER-TE packet is to
   be forwarded:

   void ForwardBierTePacket (Packet)
   {
      // FRR operations (Section 6.2): for every affected BP that is
      // set in the packet, remove the ResetBitMask bits from, and add
      // the AddBitMask bits to, the packet's BitString.  This is done
      // before BitMask is calculated so that added BPs of downstream
      // BFRs are carried in the copies and added BPs of this BFR are
      // replicated on.
      // Note: this algorithm is not optimal yet for ECMP cases;
      // it performs FRR replacement for all candidate ECMP paths.

      local MyFRRBP = Packet->BitString & FRRaffectedBP
      for (BP = GetFirstBitPosition(MyFRRBP); BP ;
           BP = GetNextBitPosition(MyFRRBP, BP)) {
         Packet->BitString &= ~ResetBitMaskByBT[BP]
         Packet->BitString |= AddBitMaskByBT[BP]
      }

      // We calculate in BitMask the subset of BPs of the BitString
      // for which we have adjacencies.  This is purely an
      // optimization to avoid replicating for every BP set in the
      // BitString only to discover that, for most of them, the
      // BIFT has no adjacency.  The BPs of this BFR are cleared in
      // the BitString that the copies carry.

      local BitMask = Packet->BitString & MyBitsOfInterest
      Packet->BitString &= ~MyBitsOfInterest

      // Replication
      for (Index = GetFirstBitPosition(BitMask); Index ;
           Index = GetNextBitPosition(BitMask, Index)) {
         foreach adjacency BIFT[Index] {

            if (adjacency == ECMP(ListOfAdjacencies, seed)) {
               I = ECMP_hash(sizeof(ListOfAdjacencies),
                             Packet->Entropy, seed)
               adjacency = ListOfAdjacencies[I]
            }

            PacketCopy = Copy(Packet)

            switch (adjacency) {
               case forward_connected(interface, neighbor, DNR):
                  if (DNR)
                     PacketCopy->BitString |= 1 << (Index-1)
                  SendToL2Unicast(PacketCopy, interface, neighbor)
                  break

               case forward_routed([VRF], neighbor):
                  SendToL3(PacketCopy, [VRF,] neighbor)
                  break

               case local_decap([VRF]):
                  DecapBierHeader(PacketCopy)
                  PassTo(PacketCopy, [VRF,] Packet->NextProto)
                  break
            }
         }
      }
   }

8.  Security Considerations

   The security considerations are the same as for BIER with the
   following difference:

   BFR-ids and BFR-prefixes are not used in BIER-TE, nor are procedures
   for their distribution, so these are not attack vectors against
   BIER-TE.

9.  IANA Considerations

   This document requests no action by IANA.

10.  Acknowledgements

   The author would like to thank Ijsbrand Wijnands and Neale Ranns for
   their extensive review and suggestions.

11.  Change log [RFC Editor: Please remove]

   00: Initial version.

12.  References

   [I-D.wijnands-bier-architecture]
              Wijnands, I., Rosen, E., Dolganow, A., Przygienda, T., and
              S. Aldrin, "Multicast using Bit Index Explicit
              Replication", draft-wijnands-bier-architecture-04 (work in
              progress), February 2015.

   [I-D.wijnands-mpls-bier-encapsulation]
              Wijnands, I., Rosen, E., Dolganow, A., Tantsura, J., and
              S. Aldrin, "Encapsulation for Bit Index Explicit
              Replication in MPLS Networks",
              draft-wijnands-mpls-bier-encapsulation-02 (work in
              progress), December 2014.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

Author's Address

   Toerless Eckert
   Cisco

   Email: eckert@cisco.com