idnits 2.17.1 

draft-ietf-bier-te-arch-05.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The abstract seems to contain references ([RFC8279]), which it
     shouldn't.  Please replace those with straight textual mentions of the
     documents in question.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (November 1, 2019) is 1638 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Looks like a reference, but probably isn't: '2' on line 1260

  -- Looks like a reference, but probably isn't: '1' on line 1274

  == Missing Reference: 'SI' is mentioned on line 1314, but not defined

  == Missing Reference: 'I' is mentioned on line 1321, but not defined

  == Missing Reference: 'VRF' is mentioned on line 1788, but not defined

  == Outdated reference: A later version (-06) exists of
     draft-ietf-bier-multicast-http-response-01


     Summary: 1 error (**), 0 flaws (~~), 5 warnings (==), 3 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                     T. Eckert, Ed.
3	Internet-Draft                                                 Futurewei
4	Intended status: Standards Track                              G. Cauchie
5	Expires: May 4, 2020                                    Bouygues Telecom
6	                                                                M. Menth
7	                                                 University of Tuebingen
8	                                                        November 1, 2019

10	    Traffic Engineering for Bit Index Explicit Replication (BIER-TE)
11	                       draft-ietf-bier-te-arch-05

13	Abstract

15	   This memo introduces per-packet stateless strict and loose path
16	   engineered replication and forwarding for Bit Index Explicit
17	   Replication packets ([RFC8279]).  This is called BIER-TE.

19	   BIER-TE leverages the BIER architecture ([RFC8279]) and extends it
20	   with a new semantic for bits in the bitstring.  BIER-TE can leverage
21	   BIER forwarding engines with little or no changes.

23	   In BIER, the BitPositions (BPs) of the packet's bitstring indicate
24	   Bit-Forwarding Egress Routers (BFERs), and hop-by-hop forwarding uses
25	   a Routing Underlay such as an IGP.

27	   In BIER-TE, BitPositions indicate adjacencies.  The BIFT of each BFR
28	   is only populated with BPs that are adjacent to the BFR in the BIER-
29	   TE topology.  The BIER-TE topology can consist of layer 2 or remote
30	   (routed) adjacencies.  The BFR then replicates and forwards BIER
31	   packets to those adjacencies.  This results in the aforementioned
32	   strict and loose path forwarding.

34	   BIER-TE can co-exist with BIER forwarding in the same domain, for
35	   example by using separate sub-domains.  In the absence of routed
36	   adjacencies, BIER-TE does not require a BIER routing underlay, and
37	   can then be operated without requiring an IGP routing protocol.

39	   BIER-TE operates without explicit in-network tree-building and
40	   carries the multicast distribution tree in the packet header.  It can
41	   therefore be a good fit to support multicast path steering in Segment
42	   Routing (SR) networks.

44	Status of This Memo

46	   This Internet-Draft is submitted in full conformance with the
47	   provisions of BCP 78 and BCP 79.

49	   Internet-Drafts are working documents of the Internet Engineering
50	   Task Force (IETF).  Note that other groups may also distribute
51	   working documents as Internet-Drafts.  The list of current Internet-
52	   Drafts is at https://datatracker.ietf.org/drafts/current/.

54	   Internet-Drafts are draft documents valid for a maximum of six months
55	   and may be updated, replaced, or obsoleted by other documents at any
56	   time.  It is inappropriate to use Internet-Drafts as reference
57	   material or to cite them other than as "work in progress."

59	   This Internet-Draft will expire on May 4, 2020.

61	Copyright Notice

63	   Copyright (c) 2019 IETF Trust and the persons identified as the
64	   document authors.  All rights reserved.

66	   This document is subject to BCP 78 and the IETF Trust's Legal
67	   Provisions Relating to IETF Documents
68	   (https://trustee.ietf.org/license-info) in effect on the date of
69	   publication of this document.  Please review these documents
70	   carefully, as they describe your rights and restrictions with respect
71	   to this document.  Code Components extracted from this document must
72	   include Simplified BSD License text as described in Section 4.e of
73	   the Trust Legal Provisions and are provided without warranty as
74	   described in the Simplified BSD License.

76	Table of Contents

78	   1.  Introduction
79	     1.1.  Basic Examples
80	     1.2.  BIER-TE Topology and adjacencies
81	     1.3.  Comparison with BIER
82	     1.4.  Requirements Language
83	   2.  Components
84	     2.1.  The Multicast Flow Overlay
85	     2.2.  The BIER-TE Controller Host
86	       2.2.1.  Assignment of BitPositions to adjacencies of the
87	               network topology
88	       2.2.2.  Changes in the network topology
89	       2.2.3.  Set up per-multicast flow BIER-TE state
90	       2.2.4.  Link/Node Failures and Recovery
91	     2.3.  The BIER-TE Forwarding Layer
92	     2.4.  The Routing Underlay
93	   3.  BIER-TE Forwarding
94	     3.1.  The Bit Index Forwarding Table (BIFT)
95	     3.2.  Adjacency Types
96	       3.2.1.  Forward Connected
97	       3.2.2.  Forward Routed
98	       3.2.3.  ECMP
99	       3.2.4.  Local Decap
100	     3.3.  Encapsulation considerations
101	     3.4.  Basic BIER-TE Forwarding Example
102	     3.5.  Forwarding comparison with BIER
103	     3.6.  Requirements
104	   4.  BIER-TE Controller Host BitPosition Assignments
105	     4.1.  P2P Links
106	     4.2.  BFER
107	     4.3.  Leaf BFERs
108	     4.4.  LANs
109	     4.5.  Hub and Spoke
110	     4.6.  Rings
111	     4.7.  Equal Cost MultiPath (ECMP)
112	     4.8.  Routed adjacencies
113	       4.8.1.  Reducing BitPositions
114	       4.8.2.  Supporting nodes without BIER-TE
115	     4.9.  Reuse of BitPositions (without DNR)
116	     4.10. Summary of BP optimizations
117	   5.  Avoiding loops and duplicates
118	     5.1.  Loops
119	     5.2.  Duplicates
120	   6.  BIER-TE Forwarding Pseudocode
121	   7.  Managing SI, subdomains and BFR-ids
122	     7.1.  Why SI and sub-domains
123	     7.2.  Bit assignment comparison BIER and BIER-TE
124	     7.3.  Using BFR-id with BIER-TE
125	     7.4.  Assigning BFR-ids for BIER-TE
126	     7.5.  Example bit allocations
127	       7.5.1.  With BIER
128	       7.5.2.  With BIER-TE
129	     7.6.  Summary
130	   8.  BIER-TE and Segment Routing (SR)
131	   9.  Security Considerations
132	   10. IANA Considerations
133	   11. Acknowledgements
134	   12. Change log [RFC Editor: Please remove]
135	   13. References
136	     13.1.  Normative References
137	     13.2.  Informative References
138	   Authors' Addresses

140	1.  Introduction

142	   BIER-TE shares architecture, terminology and packet formats with BIER
143	   as described in [RFC8279] and [RFC8296].  This document describes
144	   BIER-TE in the expectation that the reader is familiar with these two
145	   documents.

147	   In BIER-TE, BitPositions (BP) indicate adjacencies.  The BIFT of each
148	   BFR is only populated with BPs that are adjacent to the BFR in the
149	   BIER-TE Topology.  Other BPs are left without adjacency.  The BFR
150	   replicates and forwards BIER packets to adjacent BPs that are set in
151	   the packet.  BPs are normally also reset upon forwarding to avoid
152	   duplicates and loops.  This is detailed further below.

154	   Note that related work [I-D.ietf-roll-ccast] uses bloom filters to
155	   represent leaves or edges of the intended delivery tree.  Bloom
156	   filters in general can support larger trees/topologies with fewer
157	   addressing bits than explicit bitstrings, but they introduce the
158	   heuristic risk of false positives and cannot reset bits in the
159	   bitstring during forwarding to avoid loops.  For these reasons, BIER-
160	   TE uses explicit bitstrings like BIER.  The explicit bitstrings of
161	   BIER-TE can also be seen as a special type of bloom filter, and this
162	   is how related work [ICC] describes it.

164	1.1.  Basic Examples

166	   BIER-TE forwarding is best introduced with simple examples.

168	   BIER-TE Topology:

170	      Diagram:

172	                       p5    p6
173	                     --- BFR3 ---
174	                  p3/    p13     \p7
175	      BFR1 ---- BFR2              BFR5 ----- BFR6
176	         p1   p2  p4\    p14     /p10 p11   p12
177	                     --- BFR4 ---
178	                       p8    p9

180	      (simplified) BIER-TE Bit Index Forwarding Tables (BIFT):

182	      BFR1:   p1  -> local_decap
183	              p2  -> forward_connected to BFR2

185	      BFR2:   p1  -> forward_connected to BFR1
186	              p5  -> forward_connected to BFR3
187	              p8  -> forward_connected to BFR4

189	      BFR3:   p3  -> forward_connected to BFR2
190	              p7  -> forward_connected to BFR5
191	              p13 -> local_decap

193	      BFR4:   p4  -> forward_connected to BFR2
194	              p10 -> forward_connected to BFR5
195	              p14 -> local_decap

197	      BFR5:   p6  -> forward_connected to BFR3
198	              p9  -> forward_connected to BFR4
199	              p12 -> forward_connected to BFR6

201	      BFR6:   p11 -> forward_connected to BFR5
202	              p12 -> local_decap

204	                      Figure 1: BIER-TE basic example

206	   Consider the simple network with 6 BFRs shown in Figure 1.  p1...p14
207	   are the BitPositions (BP) used.  All BFRs can act as ingress BFRs
208	   (BFIR).  BFR1, BFR3, BFR4 and BFR6 can also be egress BFRs (BFER).
209	   Forward_connected is the name for adjacencies that represent subnet
210	   adjacencies of the network.  Local_decap is the name of the adjacency
211	   that decapsulates BIER-TE packets and passes their payload to higher
212	   layer processing.

214	   Assume a packet from BFR1 should be sent via BFR4 to BFR6.  This
215	   requires a bitstring (p2,p8,p10,p12).  When this packet is examined
216	   by BIER-TE on BFR1, the only BitPosition from the bitstring that is
217	   also set in the BIFT is p2.  This will cause BFR1 to send the only
218	   copy of the packet to BFR2.  Similarly, BFR2 will forward to BFR4
219	   because of p8, BFR4 to BFR5 because of p10 and BFR5 to BFR6 because
220	   of p12. p12 also makes BFR6 receive and decapsulate the packet.

222	   To also send a copy to BFR3, in addition to the copy sent via BFR4 to
223	   BFR6, the bitstring needs to be (p2,p5,p8,p10,p12,p13).  When this
224	   packet is examined by BFR2, p5 causes one copy to be sent to BFR3 and
225	   p8 one copy to BFR4.  When BFR3 receives the packet, p13 will cause
226	   it to receive and decapsulate the packet.

228	   If instead the bitstring was (p2,p6,p8,p10,p12,p13), the packet would
229	   be copied by BFR5 towards BFR3 because of p6, instead of by BFR2 to
230	   BFR3 because of p5 as in the prior case.  This shows the ability of
231	   the shown BIER-TE Topology to make the traffic pass across any
232	   possible path and be replicated where desired.
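
The three walk-throughs above can be reproduced with a minimal Python
sketch. The BIFT contents are copied from Figure 1; the dictionary
layout and the trace() helper are illustrative assumptions, not part
of BIER-TE. The sketch does not model the resetting of BPs, so a BP
such as p12 that appears in the BIFT of two BFRs is seen by both, as
in the text above.

```python
from collections import deque

# BIFTs copied from Figure 1: BFR -> {BitPosition: (adjacency type, neighbor)}.
BIFT = {
    "BFR1": {1: ("local_decap", None), 2: ("forward_connected", "BFR2")},
    "BFR2": {1: ("forward_connected", "BFR1"),
             5: ("forward_connected", "BFR3"),
             8: ("forward_connected", "BFR4")},
    "BFR3": {3: ("forward_connected", "BFR2"),
             7: ("forward_connected", "BFR5"),
             13: ("local_decap", None)},
    "BFR4": {4: ("forward_connected", "BFR2"),
             10: ("forward_connected", "BFR5"),
             14: ("local_decap", None)},
    "BFR5": {6: ("forward_connected", "BFR3"),
             9: ("forward_connected", "BFR4"),
             12: ("forward_connected", "BFR6")},
    "BFR6": {11: ("forward_connected", "BFR5"), 12: ("local_decap", None)},
}


def trace(bfir, bitstring, max_copies=32):
    """Walk a packet through the BIFTs.

    Returns (copies, receivers): the (sender, neighbor) copies that were
    made and the BFRs that decapsulated the packet.
    """
    copies, receivers = [], []
    pending = deque([bfir])
    while pending and len(copies) < max_copies:  # guard against loops
        bfr = pending.popleft()
        for bp in sorted(bitstring):
            if bp not in BIFT[bfr]:
                continue  # this BP has no adjacency on this BFR
            kind, neighbor = BIFT[bfr][bp]
            if kind == "local_decap":
                receivers.append(bfr)
            else:
                copies.append((bfr, neighbor))
                pending.append(neighbor)
    return copies, receivers


copies, receivers = trace("BFR1", {2, 8, 10, 12})
print(copies)     # the hops BFR1->BFR2, BFR2->BFR4, BFR4->BFR5, BFR5->BFR6
print(receivers)  # ['BFR6']
```

Calling trace("BFR1", {2, 5, 8, 10, 12, 13}) and
trace("BFR1", {2, 6, 8, 10, 12, 13}) shows the copy towards BFR3 being
made by BFR2 in the first case and by BFR5 in the second.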

234	   BIER-TE has various options to minimize BP assignments, many of which
235	   are based on assumptions about the required multicast traffic paths
236	   and bandwidth consumption in the network.

238	   The following picture shows a modified example, in which Rtr2 and
239	   Rtr5 are assumed not to support BIER-TE, so traffic has to be unicast
240	   encapsulated across them.  Unicast tunneling of BIER-TE packets can
241	   leverage any feasible mechanism such as MPLS or IP.  These
242	   encapsulations are out of scope of this document.  To emphasize non-
243	   native forwarding of BIER-TE packets, these adjacencies are called
244	   "forward_routed", but otherwise there is no difference in their
245	   processing over the aforementioned "forward_connected" adjacencies.

247	   In addition, bits are saved in the following example by assuming that
248	   BFR1 only needs to be BFIR but not BFER or transit BFR.

250	   BIER-TE Topology:

252	      Diagram:

254	                      p1  p3  p7
255	                   ....> BFR3 <....       p5
256	           ........                ........>
257	      BFR1       (Rtr2)          (Rtr5)      BFR6
258	           ........                ........>
259	                   ....> BFR4 <....       p6
260	                      p2  p4  p8

262	      (simplified) BIER-TE Bit Index Forwarding Tables (BIFT):

264	      BFR1:   p1  -> forward_routed to BFR3
265	              p2  -> forward_routed to BFR4

267	      BFR3:   p3  -> local_decap
268	              p5  -> forward_routed to BFR6

270	      BFR4:   p4  -> local_decap
271	              p6  -> forward_routed to BFR6

273	      BFR6:   p5  -> local_decap
274	              p6  -> local_decap
275	              p7  -> forward_routed to BFR3
276	              p8  -> forward_routed to BFR4

278	                  Figure 2: BIER-TE basic overlay example

280	   To send a BIER-TE packet from BFR1 via BFR3 to BFR6, the bitstring is
281	   (p1,p5).  From BFR1 via BFR4 to BFR6 it is (p2,p6).  A packet from
282	   BFR1 to BFR3, BFR4 and BFR6 can use (p1,p2,p3,p4,p5) or
283	   (p1,p2,p3,p4,p6), or via BFR6 (p2,p3,p4,p6,p7) or (p1,p3,p4,p5,p8).

285	1.2.  BIER-TE Topology and adjacencies

287	   The key new component in BIER-TE is the BIER-TE topology.  As shown
288	   in these two examples, it controls where replication can or should
289	   happen and how to minimize the required BPs for segments.

291	   The BIER-TE Topology effectively consists of the BIFTs of all the
292	   BFRs and can also be expressed in a diagram as a graph where the
293	   edges are the adjacencies between the BFRs.  Adjacencies are
294	   naturally unidirectional.  BPs can be reused across multiple
295	   adjacencies as long as this does not lead to undesired duplicates or
296	   loops as explained further down in the text.

298	   If the BIER-TE topology represents the underlying (layer 2) topology
299	   of the network, this is called "native" BIER-TE as shown in the first
300	   example.  This can be freely mixed with "overlay" BIER-TE, in which
301	   "forward_routed" adjacencies are used.

303	1.3.  Comparison with BIER

305	   The key differences over BIER are:

307	   o  BIER-TE replaces in-network autonomous path calculation by
308	      explicit paths calculated off-path by the BIER-TE controller host.

310	   o  In BIER-TE every BitPosition of the BitString of a BIER-TE packet
311	      indicates one or more adjacencies - instead of a BFER as in BIER.

313	   o  BIER-TE in each BFR has no routing table but only a BIER-TE Bit
314	      Index Forwarding Table (BIFT) indexed by SI:BitPosition and
315	      populated with only those adjacencies to which the BFR should
316	      replicate packets.

318	   BIER-TE headers use the same format as BIER headers.

320	   BIER-TE forwarding does not require/use the BFIR-ID.  The BFIR-ID can
321	   still be useful though for coordinated BFIR/BFER functions, such as
322	   the context for upstream assigned labels for MPLS payloads in MVPN
323	   over BIER-TE.

325	   If the BIER-TE domain is also running BIER, then the BFIR-ID in BIER-
326	   TE packets can be set to the same BFIR-ID as used with BIER packets.

328	   If the BIER-TE domain is not running full BIER or does not want to
329	   reduce the need to allocate bits in BIER bitstrings for BFIR-ID
330	   values, then the allocation of BFIR-ID values in BIER-TE packets can
331	   be done through other mechanisms outside the scope of this document,
332	   as long as this is appropriately agreed upon between all BFIR/BFER.

334	1.4.  Requirements Language

336	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
337	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
338	   document are to be interpreted as described in RFC 2119 [RFC2119].

340	2.  Components

342	   End to end BIER-TE operations consist of four major components: The
343	   "Multicast Flow Overlay", the "BIER-TE control plane" consisting of
344	   the "BIER-TE Controller Host" and its signaling channels to the BFR,
345	   the "Routing Underlay" and the "BIER-TE forwarding layer".  The BIER-
346	   TE Controller Host is the new architectural component in BIER-TE
347	   compared to BIER.

349	      Picture 2: Components of BIER-TE

351	                   <------BGP/PIM----->
352	      |<-IGMP/PIM->  multicast flow   <-PIM/IGMP->|
353	                        overlay

355	                  [BIER-TE Controller Host] <=> [BIER-TE Topology]
356	                   BIER-TE control plane
357	                      ^      ^     ^
358	                     /       |      \   BIER-TE control protocol
359	                    |        |       |  e.g. Netconf/Restconf/Yang
360	                    v        v       v
361	    Src -> Rtr1 -> BFIR-----BFR-----BFER -> Rtr2 -> Rcvr

363	                   |<----------------->|
364	                 BIER-TE forwarding layer

366	                   |<- BIER-TE domain->|

368	                 |<--------------------->|
369	                     Routing underlay

371	                      Figure 3: BIER-TE architecture

373	2.1.  The Multicast Flow Overlay

375	   The Multicast Flow Overlay operates as in BIER.  See [RFC8279].
376	   Instead of interacting with the BIER forwarding layer (as in BIER),
377	   it interacts with the BIER-TE Controller Host.

379	2.2.  The BIER-TE Controller Host

381	   The BIER-TE controller host represents the control plane of BIER-TE.
382	   It communicates two sets of information with BFRs:

384	   During initial provisioning or modifications of the network topology,
385	   the controller discovers the network topology and creates the BIER-TE
386	   topology from it: it determines which adjacencies are
387	   required/desired and assigns BitPositions to them.  Then it signals
388	   the resulting BitPositions and their adjacencies to each BFR to set
389	   up their BIER-TE BIFTs.

391	   During day-to-day operations of the network, the controller signals
392	   to BFIRs what multicast flows are mapped to what BitStrings.

394	   Communication between the BIER-TE controller host and the BFRs is
395	   ideally via standardized protocols and data-models such as
396	   Netconf/Restconf/Yang.  This is currently outside the scope of this
397	   document.  Vendor-specific CLI on the BFRs is also a possible stopgap
398	   option (as in many other SDN solutions lacking definition of a
399	   standardized data model).

401	   For simplicity, the procedures of the BIER-TE controller host are
402	   described in this document as if it is a single, centralized
403	   automated entity, such as an SDN controller.  It could equally be an
404	   operator setting up CLI on the BFRs.  Distribution of the functions
405	   of the BIER-TE controller host is currently outside the scope of this
406	   document.

408	2.2.1.  Assignment of BitPositions to adjacencies of the network
409	        topology

411	   The BIER-TE controller host tracks the BFR topology of the BIER-TE
412	   domain.  It determines what adjacencies require BitPositions so that
413	   BIER-TE explicit paths can be built through them as desired by
414	   operator policy.

416	   The controller then pushes the BitPositions/adjacencies to the BIFT
417	   of the BFRs, populating the BIFT of each BFR with only those
418	   SI:BitPositions to which that BFR should be able to send packets,
419	   i.e. the adjacencies connecting to this BFR.

421	2.2.2.  Changes in the network topology

423	   If the network topology changes (not failure based) so that
424	   adjacencies that are assigned to BitPositions are no longer needed,
425	   the controller can re-use those BitPositions for new adjacencies.
426	   First, these BitPositions need to be removed from any BFIR flow state
427	   and BFR BIFT state, then they can be repopulated, first into BIFT and
428	   then into the BFIR.

430	2.2.3.  Set up per-multicast flow BIER-TE state

432	   The BIER-TE controller host interacts with the multicast flow overlay
433	   to determine what multicast flow needs to be sent by a BFIR to which
434	   set of BFER.  It calculates the desired distribution tree across the
435	   BIER-TE domain based on algorithms outside the scope of this document
436	   (e.g.  CSPF, Steiner Tree, ...).  It then pushes the calculated
437	   BitString into the BFIR.

439	   See [I-D.ietf-bier-multicast-http-response] for a solution describing
440	   this interaction.
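
Once the controller has chosen a distribution tree, deriving the
BitString amounts to setting the BP of every adjacency on the tree.
The following Python sketch illustrates this for the topology of
Figure 1. The tree itself is given as input, since its calculation is
out of scope; the BP table, the integer representation (BP n stored in
bit n-1) and the helper names are assumptions made for illustration.

```python
# BitPositions from Figure 1 that this example needs.
# Keys: (from, to) for forward_connected, (bfr, None) for local_decap.
BP = {
    ("BFR1", "BFR2"): 2,
    ("BFR2", "BFR3"): 5,
    ("BFR2", "BFR4"): 8,
    ("BFR4", "BFR5"): 10,
    ("BFR5", "BFR6"): 12,
    ("BFR3", None): 13,
    ("BFR6", None): 12,
}


def bitstring_for_tree(edges, bfers):
    """Return the BitString (int, BP n = bit n-1) for a distribution tree."""
    bits = 0
    for adjacency in list(edges) + [(bfer, None) for bfer in bfers]:
        bits |= 1 << (BP[adjacency] - 1)
    return bits


def positions(bits):
    """List the BitPositions that are set in a BitString."""
    return [n + 1 for n in range(bits.bit_length()) if (bits >> n) & 1]


tree = [("BFR1", "BFR2"), ("BFR2", "BFR4"), ("BFR4", "BFR5"), ("BFR5", "BFR6")]
print(positions(bitstring_for_tree(tree, ["BFR6"])))  # [2, 8, 10, 12]
```

Adding the edge ("BFR2", "BFR3") and the BFER "BFR3" to the input
yields the BPs 2, 5, 8, 10, 12 and 13, matching the second example in
Section 1.1.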

442	2.2.4.  Link/Node Failures and Recovery

444	   When links or nodes fail or recover in the topology, BIER-TE can
445	   quickly respond with the optional FRR procedures described in [I-
446	   D.eckert-bier-te-frr].  It can also more slowly react by
447	   recalculating the BitStrings of affected multicast flows.  This
448	   reaction is slower than the FRR procedure because the controller
449	   needs to receive link/node up/down indications, recalculate the
450	   desired BitStrings and push them down into the BFIRs.  With FRR, this
451	   is all performed locally on a BFR receiving the adjacency up/down
452	   notification.

454	2.3.  The BIER-TE Forwarding Layer

456	   When the BIER-TE Forwarding Layer receives a packet, it simply looks
457	   up the BitPositions that are set in the BitString of the packet in
458	   the Bit Index Forwarding Table (BIFT) that was populated by the BIER-
459	   TE controller host.  For every BP that is set in the BitString, and
460	   that has one or more adjacencies in the BIFT, a copy is made
461	   according to the type of adjacencies for that BP in the BIFT.  Before
462	   sending any copy, the BFR resets all BPs in the BitString of the
463	   packet for which the BFR has one or more adjacencies in the BIFT,
464	   except when the adjacency indicates "DoNotReset" (DNR, see
465	   Section 3.2.1).  This is done to prevent packets from looping.
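
The lookup and reset behavior described above can be sketched in
Python as follows. This is only an illustration of the paragraph, not
the normative forwarding procedure: the Adjacency class, the
representation of the BitString as an integer (BP n stored in bit n-1)
and the function names are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Adjacency:
    name: str          # hypothetical label, e.g. "to-BFR3"
    dnr: bool = False  # DoNotReset flag of this adjacency


def bit(bp):
    """Mask for BitPosition bp (BP 1 is the lowest bit)."""
    return 1 << (bp - 1)


def bits(*bps):
    """BitString with the given BitPositions set."""
    result = 0
    for bp in bps:
        result |= bit(bp)
    return result


def forward(bitstring, bift):
    """Return [(adjacency, outgoing BitString)] for one received packet.

    bift maps BitPosition -> list of Adjacency (the list may be empty).
    """
    local = 0
    for bp, adjacencies in bift.items():
        if adjacencies:
            local |= bit(bp)
    cleared = bitstring & ~local  # every BP with a local adjacency is reset
    copies = []
    for bp in sorted(bift):
        if bitstring & bit(bp):
            for adj in bift[bp]:
                # A DNR adjacency keeps its own BP set in its copy only.
                out = (cleared | bit(bp)) if adj.dnr else cleared
                copies.append((adj, out))
    return copies


# BFR2 of Figure 1 receiving a packet with p5, p8, p10, p12 and p13 set.
bfr2 = {1: [Adjacency("to-BFR1")], 5: [Adjacency("to-BFR3")],
        8: [Adjacency("to-BFR4")]}
for adj, out in forward(bits(5, 8, 10, 12, 13), bfr2):
    print(adj.name, bin(out))
```

Both copies leave BFR2 with p5 and p8 cleared, so neither BFR3 nor
BFR4 can cause the packet to be replicated again by BFR2's BPs.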

467	2.4.  The Routing Underlay

469	   BIER-TE sends BIER packets to directly connected BIER-TE neighbors as
470	   L2 (unicasted) BIER packets without requiring a routing underlay.
471	   BIER-TE forwarding uses the Routing underlay for forward_routed
472	   adjacencies which copy BIER-TE packets to not-directly-connected BFRs
473	   (see below for adjacency definitions).

475	   If the BFR intends to support FRR for BIER-TE, then the BIER-TE
476	   forwarding plane needs to receive fast adjacency up/down
477	   notifications: Link up/down or neighbor up/down, e.g. from BFD.
478	   Providing these notifications is considered to be part of the routing
479	   underlay in this document.

481	3.  BIER-TE Forwarding

483	3.1.  The Bit Index Forwarding Table (BIFT)

485	   The Bit Index Forwarding Table (BIFT) exists in every BFR.  For every
486	   subdomain in use, it is a table indexed by SI:BitPosition and is
487	   populated by the BIER-TE control plane.  Each index can be empty or
488	   contain a list of one or more adjacencies.

490	   BIER-TE can support multiple subdomains like BIER, each one with a
491	   separate BIFT.

493	   In the BIER architecture, indices into the BIFT are explained to be
494	   both BFR-id and SI:BitString (BitPosition).  This is because there is
495	   a 1:1 relationship between BFR-id and SI:BitString - every bit in
496	   every SI is/can be assigned to a BFIR/BFER.  In BIER-TE there are
497	   more bits used in each BitString than there are BFIR/BFER assigned to
498	   the bitstring.  This is because of the bits required to express the
499	   (traffic engineered) path through the topology.  The BIER-TE
500	   forwarding definitions therefore do not use the term BFR-id at all.
501	   Instead, BFR-ids are only used as required by routing underlay, flow
502	   overlay or BIER headers.  Please refer to Section 7 for explanations
503	   of how to deal with SI, subdomains and BFR-id in BIER-TE.
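
For reference, the 1:1 relationship in BIER mentioned above can be
written down as a short Python sketch. It assumes the BFR-id encoding
of [RFC8279], in which the SI is the integer part of
(BFR-id - 1) / BitStringLength and the BitPosition is the remainder
plus one; the function names are illustrative. No such mapping exists
for the BPs of BIER-TE that represent transit adjacencies.

```python
def bfr_id_to_si_bp(bfr_id, bsl):
    """Map a BIER BFR-id to (SI, BitPosition) for BitStringLength bsl."""
    si, offset = divmod(bfr_id - 1, bsl)
    return si, offset + 1


def si_bp_to_bfr_id(si, bp, bsl):
    """Inverse mapping: (SI, BitPosition) back to the BFR-id."""
    return si * bsl + bp


print(bfr_id_to_si_bp(1, 256))    # (0, 1)
print(bfr_id_to_si_bp(257, 256))  # (1, 1)
```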

505	     ------------------------------------------------------------------
506	     | Index:          |  Adjacencies:                                |
507	     | SI:BitPosition  |  <empty> or one or more per entry            |
508	     ==================================================================
509	     | 0:1             |  forward_connected(interface,neighbor{,DNR}) |
510	     ------------------------------------------------------------------
511	     | 0:2             |  forward_connected(interface,neighbor{,DNR}) |
512	     |                 |  forward_connected(interface,neighbor{,DNR}) |
513	     ------------------------------------------------------------------
514	     | 0:3             |  local_decap({VRF})                          |
515	     ------------------------------------------------------------------
516	     | 0:4             |  forward_routed({VRF,}l3-neighbor)           |
517	     ------------------------------------------------------------------
518	     | 0:5             |  <empty>                                     |
519	     ------------------------------------------------------------------
520	     | 0:6             |  ECMP({adjacency1,...adjacencyN}, seed)      |
521	     ------------------------------------------------------------------
522	     ...
523	     | BitStringLength |  ...                                         |
524	     ------------------------------------------------------------------
525	                      Bit Index Forwarding Table

527	                        Figure 4: BIFT adjacencies

529	   The BIFT is programmed into the data plane of BFRs by the BIER-TE
530	   controller host and used to forward packets, according to the rules
531	   specified in the BIER-TE Forwarding Procedures.
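
As an illustration of Figure 4, the BIFT of one subdomain could be
modeled as a table keyed by (SI, BitPosition). The following Python
sketch is one possible in-memory representation; the class names, the
interface names, the neighbor names and the example address are
hypothetical and only mirror the parameters shown in the figure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ForwardConnected:
    interface: str
    neighbor: str
    dnr: bool = False  # optional DoNotReset flag


@dataclass(frozen=True)
class ForwardRouted:
    l3_neighbor: str
    vrf: Optional[str] = None


@dataclass(frozen=True)
class LocalDecap:
    vrf: Optional[str] = None


@dataclass(frozen=True)
class Ecmp:
    adjacencies: tuple  # two or more adjacencies
    seed: int


# BIFT of one subdomain: (SI, BitPosition) -> list of adjacencies.
# A missing key or an empty list both represent <empty>.
bift = {
    (0, 1): [ForwardConnected("if1", "BFR2")],
    (0, 2): [ForwardConnected("if2", "BFR3"),
             ForwardConnected("if3", "BFR4", dnr=True)],
    (0, 3): [LocalDecap()],
    (0, 4): [ForwardRouted("192.0.2.1")],
    (0, 5): [],
    (0, 6): [Ecmp((ForwardConnected("if4", "BFR5"),
                   ForwardConnected("if5", "BFR6")), seed=7)],
}


def adjacencies(si, bp):
    """Return the adjacencies for SI:BitPosition, or [] if none."""
    return bift.get((si, bp), [])


print(len(adjacencies(0, 2)))  # 2
print(adjacencies(0, 5))       # []
```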

533	   A BP that is populated in more than one BFR by the controller does
534	   not have to have the same adjacencies on each of them.  This is up to
535	   the controller.  BPs for p2p links are one case (see below).

537	3.2.  Adjacency Types

539	3.2.1.  Forward Connected

541	   A "forward_connected" adjacency is towards a directly connected BFR
542	   neighbor using an interface address of that BFR on the connecting
543	   interface.  A forward_connected adjacency does not route packets but
544	   only L2 forwards them to the neighbor.

546	   Packets sent to an adjacency with "DoNotReset" (DNR) set in the BIFT
547	   will not have the BitPosition for that adjacency reset when the BFR
548	   creates a copy for it.  The BitPosition will still be reset for
549	   copies of the packet made towards other adjacencies.  This can be
550	   used for example in ring topologies as explained below.

552	3.2.2.  Forward Routed

554	   A "forward_routed" adjacency is an adjacency towards a BFR that is
555	   not a forward_connected adjacency: towards a loopback address of a
556	   BFR or towards an interface address that is non-directly connected.
557	   Forward_routed packets are forwarded via the Routing Underlay.

559	   If the Routing Underlay has multiple paths for a forward_routed
560	   adjacency, it will perform ECMP independent of BIER-TE for packets
561	   forwarded across a forward_routed adjacency.  This is independent of
562	   BIER-TE ECMP described in Section 3.2.3.

564	   If the Routing Underlay has FRR, it will perform FRR independent of
565	   BIER-TE for packets forwarded across a forward_routed adjacency.

567	3.2.3.  ECMP

569	   The ECMP mechanisms in BIER are tied to the BIER BIFT and are
570	   therefore not directly useable with BIER-TE.  The following
571	   procedures describe ECMP for BIER-TE that we consider to be
572	   lightweight but also well manageable.  It leverages the existing
573	   entropy parameter in the BIER header to keep packets of the same flow
574	   on the same path and it introduces a "seed" parameter to allow
575	   engineering traffic to be polarized or randomized across multiple
576	   hops.

578	   An "Equal Cost Multipath" (ECMP) adjacency has a list of two or more
579	   adjacencies included in it.  It copies the BIER-TE packet to one of
580	   those adjacencies based on the ECMP hash calculation.  The BIER-TE
581	   ECMP hash algorithm must select the same adjacency from that list for
582	   all packets with the same "entropy" value in the BIER-TE header if
583	   the same number of adjacencies and same seed are given as parameters.
584	   Further use of the seed parameter is explained below.

586	3.2.4.  Local Decap

588	   A "local_decap" adjacency passes a copy of the payload of the BIER-TE
589	   packet to the packet's NextProto within the BFR (IPv4/IPv6,
590	   Ethernet,...).  A local_decap adjacency turns the BFR into a BFER for
591	   matching packets.  Local_decap adjacencies require the BFER to
592	   support routing or switching for NextProto to determine how to
593	   further process the packet.

595	3.3.  Encapsulation considerations

597	   Specifications for BIER-TE encapsulation are outside the scope of
598	   this document.  This section gives explanations and guidelines.

600	   Because a BFR needs to interpret the BitString of a BIER-TE packet
601	   differently from a BIER packet, it is necessary to distinguish BIER
602	   from BIER-TE packets.  This is subject to definitions in BIER
603	   encapsulation specifications.

605	   MPLS encapsulation [RFC8296] for example assigns one label by which
606	   BFRs recognize BIER packets for every (SI,subdomain) combination.
607	   If it is desirable that every subdomain can forward only BIER or
608	   BIER-TE packets, then the label allocation could stay the same, and
609	   only the forwarding model (BIER/BIER-TE) would have to be defined per
610	   subdomain.  If it is desirable to support both BIER and BIER-TE
611	   forwarding in the same subdomain, then additional labels would need
612	   to be assigned for BIER-TE forwarding.

614	   "forward_routed" requires an encapsulation that permits unicasting
615	   BIER-TE packets to a specific interface address on a target BFR.
616	   With MPLS encapsulation, this can simply be done via a label stack
617	   with that address's label as the top label, followed by the label
618	   assigned to (SI,subdomain) and, if necessary (see above), BIER-TE.
619	   With non-MPLS encapsulation, some form of IP tunneling (IP in IP,
620	   LISP, GRE) would be required.

622	   The encapsulation used for "forward_routed" adjacencies can equally
623	   support existing advanced adjacency information such as "loose source
624	   routes" via, e.g., MPLS label stacks or appropriate header extensions
625	   (e.g. for IPv6).

627	3.4.  Basic BIER-TE Forwarding Example

629	   [RFC Editor: remove this section.]

631	   THIS SECTION TO BE REMOVED IN RFC BECAUSE IT WAS SUPERSEDED BY
632	   SECTION 1.1 EXAMPLE - UNLESS REVIEWERS CHIME IN AND EXPRESS DESIRE TO
633	   KEEP THIS ADDITIONAL EXAMPLE SECTION.

635	   Step by step example of basic BIER-TE forwarding.  This does not use
636	   ECMP or forward_routed adjacencies nor does it try to minimize the
637	   number of required BitPositions for the topology.

639	               [Bier-Te Controller Host]
640	                       /   | \
641	                      v    v  v

643	           | p13   p1 |
644	           +- BFIR2 --+          |
645	           |          | p2   p6  |           LAN2
646	           |          +-- BFR3 --+           |
647	           |          |          |  p7  p11  |
648	      Src -+                     +-- BFER1 --+
649	           |          | p3   p8  |           |
650	           |          +-- BFR4 --+           +-- Rcv1
651	           |          |          |           |
652	           |          |
653	           | p14  p4  |
654	           +- BFIR1 --+          |
655	           |          +-- BFR5 --+ p10  p12  |
656	         LAN1         | p5   p9  +-- BFER2 --+
657	                                 |           +-- Rcv2
658	                                             |
659	                                             LAN3

661	          IP  |..... BIER-TE network......| IP

663	                   Figure 5: BIER-TE Forwarding Example

665	   pXX indicates the BitPosition number assigned by the BIER-TE
666	   controller host to adjacencies in the BIER-TE topology.  For example,
667	   p9 is the adjacency towards BFR5 on the LAN connecting to BFER2.

669	      BIFT BFIR2:
670	        p13: local_decap()
671	         p2: forward_connected(BFR3)

673	      BIFT BFR3:
674	         p1: forward_connected(BFIR2)
675	         p7: forward_connected(BFER1)
676	         p8: forward_connected(BFR4)

678	      BIFT BFER1:
679	        p11: local_decap()
680	         p6: forward_connected(BFR3)
681	         p8: forward_connected(BFR4)

683	             Figure 6: BIER-TE Forwarding Example Adjacencies

685	   ...and so on.

687	   For example, we assume that some multicast traffic seen on LAN1 needs
688	   to be sent via BIER-TE by BFIR2 towards Rcv1 and Rcv2.  The
689	   controller determines it wants it to pass this traffic across the
690	   following paths:

692	                 -> BFER1 ---------------> Rcv1
693	    BFIR2 -> BFR3
694	                 -> BFR4 -> BFR5 -> BFER2 -> Rcv2

696	                Figure 7: BIER-TE Forwarding Example Paths

698	   These paths correspond to the following BitString: p2, p5, p7, p8,
699	   p10, p11, p12.

701	   This BitString is assigned by BFIR2 to the example multicast traffic
702	   received from LAN1.

704	   Then BFIR2 forwards this multicast traffic with BIER-TE based on that
705	   BitString.  The BIFT of BFIR2 has only p2 and p13 populated.  Only p2
706	   is in the BitString and this is an adjacency towards BFR3.  BFIR2
707	   therefore resets p2 in the BitString and sends a copy towards BFR3.

709	   BFR3 sees a BitString of p5,p7,p8,p10,p11,p12.  It is only interested
710	   in p1,p7,p8.  It creates a copy of the packet to BFER1 (due to p7)
711	   and one to BFR4 (due to p8).  It resets p7, p8 before sending.

713	   BFER1 sees a BitString of p5,p10,p11,p12.  It is only interested in
714	   p6,p7,p8,p11 and therefore considers only p11. p11 is a "local_decap"
715	   adjacency installed by the BIER-TE controller host because BFER1
716	   should pass packets to IP multicast.  The local_decap adjacency
717	   instructs BFER1 to create a copy, decapsulate it from the BIER header
718	   and pass it on to the NextProtocol, in this example IP multicast.  IP
719	   multicast will then forward the packet out to LAN2 because it did
720	   receive PIM or IGMP joins on LAN2 for the traffic.

722	   The packet is processed accordingly in BFR4, BFR5 and BFER2.
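
   The walk-through above can be replayed with a minimal sketch.  It is
   an illustration only and not part of the forwarding specification:
   the dictionary layout of the BIFTs and the forward() helper are
   hypothetical, only the three BIFTs listed in Figure 6 are modeled,
   and neither DNR nor ECMP is considered.

```python
# Hypothetical model of Figure 6: BitPosition -> (adjacency type, neighbor).
BIFT = {
    "BFIR2": {13: ("local_decap", None), 2: ("forward_connected", "BFR3")},
    "BFR3": {1: ("forward_connected", "BFIR2"),
             7: ("forward_connected", "BFER1"),
             8: ("forward_connected", "BFR4")},
    "BFER1": {11: ("local_decap", None),
              6: ("forward_connected", "BFR3"),
              8: ("forward_connected", "BFR4")},
}


def forward(bfr, bitstring):
    """Process one packet on one BFR.

    Returns (copies, delivered): copies maps each neighbor to the
    BitString of the copy sent to it, delivered tells whether a
    local_decap adjacency matched.
    """
    local_bits = set(BIFT[bfr])
    # Every copy has all BitPositions with an adjacency on this BFR reset.
    remaining = frozenset(bitstring) - local_bits
    copies, delivered = {}, False
    for bp in sorted(set(bitstring) & local_bits):
        kind, neighbor = BIFT[bfr][bp]
        if kind == "local_decap":
            delivered = True
        else:
            copies[neighbor] = remaining
    return copies, delivered


bitstring = {2, 5, 7, 8, 10, 11, 12}
at_bfir2, _ = forward("BFIR2", bitstring)
at_bfr3, _ = forward("BFR3", at_bfir2["BFR3"])
at_bfer1, decap = forward("BFER1", at_bfr3["BFER1"])
```

   In this sketch, BFR3 receives p5,p7,p8,p10,p11,p12 and BFER1 receives
   p5,p10,p11,p12, matching the BitStrings given in the text above.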

724	3.5.  Forwarding comparison with BIER

726	   Forwarding of BIER-TE is designed to allow common forwarding hardware
727	   with BIER.  In fact, one of the main goals of this document is to
728	   encourage the building of forwarding hardware that can not only
729	   support BIER, but also BIER-TE - to allow experimentation with BIER-
730	   TE and support building of BIER-TE control plane code.

732	   The pseudocode in Section 6 shows how existing BIER/BIFT forwarding
733	   can be amended to support basic BIER-TE forwarding, by using BIER
734	   BIFT's F-BM.  Only the masking of bits that BIER performs to avoid
735	   duplicates must be skipped when forwarding is for BIER-TE.

737	   Whether to use BIER or BIER-TE forwarding can simply be a configured
738	   choice per subdomain and accordingly be set up by a BIER-TE
739	   controller host.  The BIER packet encapsulation [RFC8296] too can be
740	   reused without changes except that the currently defined BIER-TE ECMP
741	   adjacency does not leverage the entropy field so that field would be
742	   unused when BIER-TE forwarding is used.

744	3.6.  Requirements

746	   Basic BIER-TE forwarding MUST support configuring subdomains to use
747	   basic BIER-TE forwarding rules (instead of BIER).  With basic BIER-TE
748	   forwarding, every bit MUST support having zero or one adjacency.  It
749	   MUST support the adjacency types forward_connected without DNR flag,
750	   forward_routed and local_decap.  All other BIER-TE forwarding
751	   features are optional.  These basic BIER-TE requirements make BIER-TE
752	   forwarding exactly the same as BIER forwarding with the exception of
753	   skipping the aforementioned F-BM masking on egress.

755	   BIER-TE forwarding SHOULD support the DNR flag, as this is highly
756	   useful to save bits in rings (see Section 4.6).

758	   BIER-TE forwarding MAY support more than one adjacency on a bit and
759	   ECMP adjacencies.  The importance of ECMP adjacencies is unclear when
760	   traffic engineering is used because it may be more desirable to
761	   explicitly steer traffic across non-ECMP paths to make per-path
762	   traffic calculation easier for controllers.  Having more than one
763	   adjacency for a bit allows further savings of bits in hub-and-spoke
764	   scenarios, but unlike rings it is less "natural" to flood traffic
765	   across multiple links unconditionally.  Both ECMP and multiple
766	   adjacencies are forwarding plane features that should be possible to
767	   support later when needed as they do not impact the basic BIER-TE
768	   replication loop.  This is true because there is no inter-copy
769	   dependency through resetting of F-BM as in BIER.

771	4.  BIER-TE Controller Host BitPosition Assignments

773	   This section describes how the BIER-TE controller host can use the
774	   different BIER-TE adjacency types to define the BitPositions of a
775	   BIER-TE domain.

777	   Because the size of the BitString is limiting the size of the BIER-TE
778	   domain, many of the options described exist to support larger
779	   topologies with fewer BitPositions (4.1, 4.3, 4.4, 4.5, 4.6, 4.7,
780	   4.8).

782	4.1.  P2P Links

784	   Each p2p link in the BIER-TE domain is assigned one unique
785	   BitPosition with a forward_connected adjacency pointing to the
786	   neighbor on the p2p link.

788	4.2.  BFER

790	   Every non-Leaf BFER is given a unique BitPosition with a local_decap
791	   adjacency.

793	4.3.  Leaf BFERs

795	           BFR1(P) BFR2(P)             BFR1(P)  BFR2(P)
796	             |  \ /  |                    |       |
797	             |   X   |                    |       |
798	             |  / \  |                    |       |
799	        BFER1(PE)  BFER2(PE)        BFER1(PE)----BFER2(PE)

801	            Leaf BFER /               Non-Leaf BFER /
802	             PE-router                  PE-router

804	                 Figure 8: Leaf vs. non-Leaf BFER Example

806	   Leaf BFERs are BFERs where incoming BIER-TE packets never need to be
807	   forwarded to another BFR but are only sent to the BFER to exit the
808	   BIER-TE domain.  For example, in networks where PEs are spokes
809	   connected to P routers, those PEs are Leaf BFERs unless there is a
810	   U-turn between two PEs.  Consider how redundant disjoint traffic can
811	   reach BFER1/BFER2 in the above picture: When BFER1/BFER2 are Non-Leaf
812	   BFER as shown on the right hand side, one traffic copy would be
813	   forwarded to BFER1 from BFR1, but the other one could only reach
814	   BFER1 via BFER2, which makes BFER2 a non-Leaf BFER.  Likewise BFER1
815	   is a non-Leaf BFER when forwarding traffic to BFER2.

817	   Note that the BFERs in the left hand picture are only guaranteed to
818	   be leaf-BFER by fitting routing configuration that prohibits transit
819	   traffic to pass through the BFERs, which is commonly applied in these
820	   topologies.

822	   All leaf-BFER in a BIER-TE domain can share a single BitPosition.
823	   This is possible because the BitPosition for the adjacency to reach
824	   the BFER can be used to distinguish whether or not packets should
825	   reach the BFER.

827	   This optimization will not work if an upstream interface of the BFER
828	   is using a BitPosition optimized as described in the following two
829	   sections (LAN, Hub and Spoke).

831	4.4.  LANs

833	   In a LAN, the adjacency to each neighboring BFR on the LAN is given a
834	   unique BitPosition.  The adjacency of this BitPosition is a
835	   forward_connected adjacency towards the BFR and this BitPosition is
836	   populated into the BIFT of all the other BFRs on that LAN.

838	            BFR1
839	             |p1
840	      LAN1-+-+---+-----+
841	          p3|  p4|   p2|
842	          BFR3 BFR4  BFR7

844	                           Figure 9: LAN Example

846	   If bandwidth on the LAN is not an issue and most BIER-TE traffic
847	   should be copied to all neighbors on a LAN, then BitPositions can be
848	   saved by assigning just a single BitPosition to the LAN and
849	   populating the BitPosition of the BIFT of each BFR on the LAN with
850	   a list of forward_connected adjacencies to all other neighbors on the
851	   LAN.

853	   This optimization does not work in the case of BFRs redundantly
854	   connected to more than one LAN with this optimization because these
855	   BFRs would receive duplicates and forward those duplicates into the
856	   opposite LANs.  Adjacencies of such BFRs into their LANs still need a
857	   separate BitPosition.

859	4.5.  Hub and Spoke

861	   In a setup with a hub and multiple spokes connected via separate p2p
862	   links to the hub, all p2p links can share the same BitPosition.  The
863	   BitPosition on the hub's BIFT is set up with a list of
864	   forward_connected adjacencies, one for each Spoke.

866	   This option is similar to the BitPosition optimization in LANs:
867	   Redundantly connected spokes need their own BitPositions.

869	   This type of optimized BP could be used for example when all traffic
870	   is "broadcast" traffic (very dense receiver set) such as live-TV or
871	   situation-awareness (SA).  This BP optimization can then be used to
872	   explicitly steer different traffic flows across different ECMP paths
873	   in Data-Center or broadband-aggregation networks with minimal use of
874	   BPs.

876	4.6.  Rings

878	   In L3 rings, instead of assigning a single BitPosition for every p2p
879	   link in the ring, it is possible to save BitPositions by setting the
880	   "Do Not Reset" (DNR) flag on forward_connected adjacencies.

882	   For the rings shown in the following picture, a single BitPosition
883	   will suffice to forward traffic entering the ring at BFRa or BFRb all
884	   the way up to BFR1:

886	   On BFRa, BFRb, BFR30,... BFR3, the BitPosition is populated with a
887	   forward_connected adjacency pointing to the clockwise neighbor on the
888	   ring and with DNR set.  On BFR2, the adjacency also points to the
889	   clockwise neighbor BFR1, but without DNR set.

891	   Handling DNR this way ensures that copies forwarded from any BFR in
892	   the ring to a BFR outside the ring will not have the ring BitPosition
893	   set, therefore minimizing the chance to create loops.

895	                  v        v
896	                  |        |
897	           L1     |   L2   |   L3
898	       /-------- BFRa ---- BFRb --------------------\
899	       |                                            |
900	       \- BFR1 - BFR2 - BFR3 - ... - BFR29 - BFR30 -/
901	           |      |    L4               |      |
902	        p33|                         p15|
903	           BFRd                       BFRc

905	                          Figure 10: Ring Example

907	   Note that this example only permits packets to enter the ring at
908	   BFRa and BFRb, and that packets will always travel clockwise.  If
909	   packets should be allowed to enter the ring at any ring BFR, then one
910	   would have to use two ring BitPositions.  One for clockwise, one for
911	   counterclockwise.

913	   Both would be set up to stop rotating on the same link, e.g., L1.
914	   When the ingress ring BFR creates the clockwise copy, it will reset
915	   the counterclockwise BitPosition because the DNR bit only applies to
916	   the bit for which the replication is done.  Likewise for the
917	   clockwise BitPosition for the counterclockwise copy.  As a result,
918	   the ring ingress BFR will send a copy in both directions, serving
919	   BFRs on either side of the ring up to L1.
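
   The single-BitPosition clockwise case of Figure 10 can be sketched as
   follows.  The sketch assumes a hypothetical BitPosition number
   (RING_BP), a dictionary-based BIFT and a travel() helper, none of
   which are defined by this document.  It follows only the ring
   BitPosition and ignores copies towards BFRs outside the ring.

```python
RING_BP = 1  # hypothetical BitPosition number assigned to the ring

# Clockwise order from Figure 10: BFRa, BFRb, BFR30, ..., BFR2, BFR1.
ring = ["BFRa", "BFRb"] + [f"BFR{i}" for i in range(30, 0, -1)]

# Per BFR: BitPosition -> (clockwise neighbor, DNR flag).
# DNR is set everywhere except on BFR2; BFR1 has no ring adjacency.
bift = {}
for node, clockwise_neighbor in zip(ring, ring[1:]):
    bift[node] = {RING_BP: (clockwise_neighbor, node != "BFR2")}
bift["BFR1"] = {}


def travel(ingress, bitstring):
    """Follow the ring BitPosition; return (visited BFRs, final BitString)."""
    node, bits, visited = ingress, set(bitstring), [ingress]
    while RING_BP in bits and RING_BP in bift[node]:
        neighbor, dnr = bift[node][RING_BP]
        if not dnr:
            bits.discard(RING_BP)  # reset as for an adjacency without DNR
        node = neighbor
        visited.append(node)
    return visited, bits


visited, final_bits = travel("BFRa", {RING_BP})
```

   In this sketch, a packet entering at BFRa visits every ring BFR once
   and arrives at BFR1 with the ring BitPosition reset by BFR2.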

921	4.7.  Equal Cost MultiPath (ECMP)

923	   The ECMP adjacency allows using just one BP per link bundle between
924	   two BFRs instead of one BP for each p2p member link of that link
925	   bundle.  In the following picture, one BP is used across L1,L2,L3.

927	                --L1-----
928	           BFR1 --L2----- BFR2
929	                --L3-----

931	     BIFT entry in BFR1:
932	     ------------------------------------------------------------------
933	     | Index |  Adjacencies                                           |
934	     ==================================================================
935	     | 0:6   |  ECMP({forward_connected(L1, BFR2),                    |
936	     |       |        forward_connected(L2, BFR2),                    |
937	     |       |        forward_connected(L3, BFR2)}, seed)             |
938	     ------------------------------------------------------------------

940	     BIFT entry in BFR2:
941	     ------------------------------------------------------------------
942	     | Index |  Adjacencies                                           |
943	     ==================================================================
944	     | 0:6   |  ECMP({forward_connected(L1, BFR1),                    |
945	     |       |        forward_connected(L2, BFR1),                    |
946	     |       |        forward_connected(L3, BFR1)}, seed)             |
947	     ------------------------------------------------------------------

949	                          Figure 11: ECMP Example

951	   This document does not standardize any ECMP algorithm because it is
952	   sufficient for implementations to document their freely chosen ECMP
953	   algorithm.  This allows the BIER-TE controller host to calculate ECMP
954	   paths and seeds.  The following picture shows an example ECMP
955	   algorithm:

957	      forward(packet, ECMP(adj(0), adj(1),... adj(N-1), seed)):
958	         i = (packet(bier-header-entropy) XOR seed) % N
959	         forward packet to adj(i)

961	                     Figure 12: ECMP algorithm Example
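
   As a runnable illustration, the example algorithm of Figure 12 can be
   written as below.  The function name ecmp_select() and the use of
   link names as adjacencies are assumptions made for this sketch;
   implementations remain free to choose and document their own
   algorithm.

```python
def ecmp_select(entropy, adjacencies, seed):
    """Pick one adjacency as in Figure 12: (entropy XOR seed) modulo N."""
    if len(adjacencies) < 2:
        raise ValueError("an ECMP adjacency lists two or more adjacencies")
    index = (entropy ^ seed) % len(adjacencies)
    return adjacencies[index]


links = ["L1", "L2", "L3"]          # the link bundle of Figure 11
first = ecmp_select(5, links, 0)    # (5 XOR 0) % 3 = 2
second = ecmp_select(5, links, 6)   # (5 XOR 6) % 3 = 0
```

   The same entropy, seed and number of adjacencies always yield the
   same adjacency, which is the property required in Section 3.2.3.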

963	   In the following example, all traffic from BFR1 towards BFR10 is
964	   intended to be ECMP load split equally across the topology.  This
965	   example is not meant as a likely setup, but to illustrate that ECMP
966	   can be used to share BPs not only across link bundles, and it
967	   explains the use of the seed parameter.

969	                    BFR1         (BFIR)
970	                  /L11  \L12
971	                 /       \
972	             BFR2         BFR3
973	            /L21 \L22    /L31 \L32
974	           /      \     /      \
975	          BFR4  BFR5   BFR6  BFR7
976	           \      /     \      /
977	            \    /       \    /
978	             BFR8         BFR9
979	                 \       /
980	                  \     /
981	                   BFR10         (BFER)

983	     BIFT entry in BFR1:
984	     ------------------------------------------------------------------
985	     | 0:6   |  ECMP({forward_connected(L11, BFR2),                   |
986	     |       |        forward_connected(L12, BFR3)}, seed1)           |
987	     ------------------------------------------------------------------

989	     BIFT entry in BFR2:
990	     ------------------------------------------------------------------
991	     | 0:7   |  ECMP({forward_connected(L21, BFR4),                   |
992	     |       |        forward_connected(L22, BFR5)}, seed1)           |
993	     ------------------------------------------------------------------

995	     BIFT entry in BFR3:
996	     ------------------------------------------------------------------
997	     | 0:7   |  ECMP({forward_connected(L31, BFR6),                   |
998	     |       |        forward_connected(L32, BFR7)}, seed1)           |
999	     ------------------------------------------------------------------

1000	     BIFT entry in BFR4, BFR5:
1001	     ------------------------------------------------------------------
1002	     | 0:8   |  forward_connected(Lxx, BFR8)  |xx differs on BFR4/BFR5|
1003	     ------------------------------------------------------------------

1005	     BIFT entry in BFR6, BFR7:
1006	     ------------------------------------------------------------------
1007	     | 0:8   |  forward_connected(Lxx, BFR9)  |xx differs on BFR6/BFR7|
1008	     ------------------------------------------------------------------

1010	     BIFT entry in BFR8, BFR9:
1011	     ------------------------------------------------------------------
1012	     | 0:9   |  forward_connected(Lxx, BFR10) |xx differs on BFR8/BFR9|
1013	     ------------------------------------------------------------------

1015	                      Figure 13: Polarization Example

1017	   Note that for the following discussion of ECMP, only the BIFT ECMP
1018	   adjacencies on BFR1, BFR2, BFR3 are relevant.  The re-use of BP
1019	   across BFR in this example is further explained in Section 4.9 below.

1021	   With the ECMP setup in the above topology, traffic would not be
1022	   equally load-split.  Instead, links L22 and L31 would see no traffic
1023	   at all: BFR2 will only see traffic from BFR1 for which the ECMP hash
1024	   in BFR1 selected the first adjacency in the list of 2 adjacencies
1025	   given as parameters to the ECMP, which is link L11-to-BFR2.  BFR2
1026	   again performs ECMP with two adjacencies on that subset of traffic
1027	   using the same seed1, and will therefore again select the first of
1028	   its two adjacencies: L21-to-BFR4.  Therefore, L22 and BFR5 see no
1029	   traffic.  Likewise for L31 and BFR6.

1031	   This issue in BFR2/BFR3 is called polarization.  It results from the
1032	   re-use of the same hash function across multiple consecutive hops in
1033	   topologies like these.  To resolve this issue, the ECMP adjacency on
1034	   BFR1 can be set up with a different seed2 than the ECMP adjacencies
1035	   on BFR2/BFR3.  BFR2/BFR3 can use the same hash because packets will
1036	   not sequentially pass across both of them.  Therefore, they can also
1037	   use the same BP 0:7.
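
   The effect of the seeds can be illustrated with a small sketch of the
   first two ECMP stages of Figure 13.  It rests on assumptions: the
   seeded hash ecmp_index() (SHA-256 over seed and entropy) and the
   helper links_used() are hypothetical and not defined by this
   document.  The XOR/modulo function of Figure 12 is not used here
   because, with two adjacencies per hop, it only examines the lowest
   bit of entropy and seed, so a different seed would move the
   polarization to other links rather than remove it.

```python
import hashlib


def ecmp_index(entropy, seed, n):
    # Assumed seeded hash for illustration; not the Figure 12 algorithm.
    digest = hashlib.sha256(f"{seed}:{entropy}".encode()).digest()
    return int.from_bytes(digest[:4], "big") % n


def links_used(seed_bfr1, seed_bfr2_bfr3, entropies=range(256)):
    """Return the second-stage links that carry at least one flow."""
    used = set()
    for entropy in entropies:
        if ecmp_index(entropy, seed_bfr1, 2) == 0:
            second_stage = ("L21", "L22")  # BFR1 chose L11, packet at BFR2
        else:
            second_stage = ("L31", "L32")  # BFR1 chose L12, packet at BFR3
        used.add(second_stage[ecmp_index(entropy, seed_bfr2_bfr3, 2)])
    return used


polarized = links_used(seed_bfr1=1, seed_bfr2_bfr3=1)
balanced = links_used(seed_bfr1=2, seed_bfr2_bfr3=1)
```

   With the same seed on all three BFRs, only L21 and L32 can ever be
   selected in this sketch.  With a different seed on BFR1, all four
   links are expected to carry some of the flows.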

1039	   Note that ECMP solutions outside of BIER often hide the seed by
1040	   auto-selecting it from local entropy such as unique local or next-hop
1041	   identifiers.  The solution chosen for BIER-TE, allowing the
1042	   controller to explicitly set the seed, maximizes the ability of the
1043	   controller to choose the seed independently of seed sources that
1044	   the controller may not be able to control well, and even to calculate
1045	   optimized seeds for multi-hop cases.

1047	4.8.  Routed adjacencies

1049	4.8.1.  Reducing BitPositions

1051	   Routed adjacencies can reduce the number of BitPositions required
1052	   when the traffic engineering requirement is not hop-by-hop explicit
1053	   path selection, but loose-hop selection.  Routed adjacencies can also
1054	   allow BIER-TE to operate across intermediate hop routers that do not
1055	   support BIER-TE.

1057	                      ...............
1058	            ...BFR1--...           ...--L1-- BFR2...
1059	                     ... .Routers. ...--L2--/
1060	            ...BFR4--...           ...------ BFR3...
1061	                      ...............         |
1062	                                             LO
1063	                       Network Area 1

1065	                   Figure 14: Routed Adjacencies Example

1067	   Assume the requirement in the above picture is to explicitly steer
1068	   traffic flows that have arrived at BFR1 or BFR4 via a shortest path
1069	   in the routing underlay "Network Area 1" to one of the following
1070	   three next segments: (1) BFR2 via link L1, (2) BFR2 via link L2, (3)
1071	   via BFR3.

1073	   To enable this, both BFR1 and BFR4 are set up with a forward_routed
1074	   adjacency BitPosition towards an address of BFR2 on link L1, another
1075	   forward_routed BitPosition towards an address of BFR2 on link L2 and
1076	   a third forward_routed BitPosition towards a node address LO of BFR3.

1078	4.8.2.  Supporting nodes without BIER-TE

1080	   Routed adjacencies also enable incremental deployment of BIER-TE.
1081	   Only the nodes through which BIER-TE traffic needs to be steered -
1082	   with or without replication - need to support BIER-TE.  Where they
1083	   are not directly connected to each other, forward_routed adjacencies
1084	   are used to pass over non BIER-TE enabled nodes.

1086	4.9.  Reuse of BitPositions (without DNR)

1088	   BitPositions can be re-used across multiple BFR to minimize the
1089	   number of BP needed.  This happens when adjacencies on multiple BFR
1090	   use the DNR flag as described above, but it can also be done for non-
1091	   DNR adjacencies.  This section only discusses this non-DNR case.

1093	   Because BP are reset after passing a BFR with an adjacency for that
1094	   BP, reuse of BP across multiple BFR does not introduce any problems
1095	   with duplicates or loops that do not also exist when every adjacency
1096	   has a unique BP: Instead of setting one BP in a BitString that is
1097	   reused in N-adjacencies, one would get the same or worse results if
1098	   each of these adjacencies had a unique BP and all of them were set
1099	   in the BitString.  Instead, based on the case, BPs can be reused
1100	   without limitation, or they introduce fewer path engineering choices,
1101	   or they do not work.

1103	   BP cannot be reused across two BFR that would need to be passed
1104	   sequentially for some path: The first BFR will reset the BP, so those
1105	   paths cannot be built.  BP can be reused across BFR that would (A)
1106	   only occur across different paths or (B) across different branches of
1107	   the same tree.

1109	   An example of (A) was given in Figure 13, where BP 0:7, BP 0:8 and BP
1110	   0:9 are each reused across multiple BFR because a single packet/path
1111	   would never be able to reach more than one BFR sharing the same BP.

1113	   Assume the example was changed: BFR1 has no ECMP adjacency for BP
1114	   0:6, but instead BP 0:5 with forward_connected to BFR2 and BP 0:6
1115	   with forward_connected to BFR3.  Packets with both BP 0:5 and BP 0:6
1116	   would now be able to reach both BFR2 and BFR3 and the still existing
1117	   re-use of BP 0:7 between BFR2 and BFR3 is a case of (B) where reuse
1118	   of BP is perfect because it does not limit the set of useful path
1119	   choices:

1121	   If instead of reusing BP 0:7, BFR3 used a separate BP 0:10 for its
1122	   ECMP adjacency, no useful additional path engineering would be
1123	   enabled.  If duplicates at BFR10 were undesirable, they would be
1124	   avoided by not setting BP 0:5 and BP 0:6 for the same packet.  If the
1125	   duplicates were desirable (e.g., resilient transmission), the
1126	   additional BP 0:10 would also not add value.

1128	   Reuse may also save BPs in larger topologies.  Take the topology
1129	   shown in Figure 17, but consider only the following description: A
1130	   BFIR/sender (e.g., video headend) is attached to area 1, and areas
1131	   2...6 contain receivers/BFER.  Assume each area has a distribution
1132	   ring, each with two BPs to indicate the direction (as explained in
1133	   Section 4.6).  These two BPs could be reused across the 5 areas.
1134	   Packets would be replicated through other BPs to the desired subset
1135	   of areas, and once a packet copy reaches the ring of the area, the
1136	   two ring BPs come into play.  This reuse is a case of (B), but it
1137	   limits the topology choices: Packets can only flow around the same
1138	   direction in the rings of all areas.  This may or may not be
1139	   acceptable based on the desired traffic engineering: If resilient
1140	   transmission is the traffic engineering goal, then it is likely a
1141	   good optimization; if the bandwidth of each ring were to be
1142	   optimized separately, it would not be an acceptable limitation.

1144	4.10.  Summary of BP optimizations

1146	   This section reviewed a range of techniques by which a controller can
1147	   create a BIER-TE topology in a way that minimizes the number of
1148	   necessary BPs.

1150	   Without any optimization, a controller would attempt to map the
1151	   network subnet topology 1:1 into the BIER-TE topology and every
1152	   subnet adjacent neighbor requires a forward_connected BP and every
1153	   BFER requires a local_decap BP.

1155	   The optimizations described are then as follows:

1157	   o  P2p links require only one BP (Section 4.1).

1159	   o  All leaf-BFER can share a single local_decap BP (Section 4.3).

1161	   o  A LAN with N BFR needs at most N BP (one for each BFR).  It only
1162	      needs one BP for all those BFR that are not redundantly connected
1163	      to multiple LANs (Section 4.4).

1165	   o  A hub with p2p connections to multiple non-leaf-BFER spokes can
1166	      share one BP to all spokes if traffic can be flooded to all
1167	      spokes, e.g.: because of no bandwidth concerns or dense receiver
1168	      sets (Section 4.5).

1170	   o  Rings of BFR can be built with just two BP (one for each
1171	      direction) except for BFR with multiple ring connections - similar
1172	      to LANs (Section 4.6).

1174	   o  ECMP adjacencies to N neighbors can replace N BP with 1 BP.
1175	      Multihop ECMP can avoid polarization through different seeds of
1176	      the ECMP algorithm (Section 4.7).

1178	   o  Routed adjacencies allow "tunneling" across non-BIER-TE capable
1179	      routers and across BIER-TE capable routers where no traffic-
1180	      steering or replications are required (Section 4.8).

1182	   o  BP can generally be reused across nodes that do not need to be
1183	      consecutive in paths, but depending on scenario, this may limit
1184	      the feasible traffic engineering options (Section 4.9).

1186	   Note that the described list of optimizations is not exhaustive.
1187	   Especially when the set of required path engineering choices is
1188	   limited and the set of possible subsets of BFER that should be able
1189	   to receive traffic is limited, further optimizations of BP are
1190	   possible.  The hub & spoke optimization is a simple example of such
1191	   traffic pattern dependent optimizations.

1193	5.  Avoiding loops and duplicates

1195	5.1.  Loops

1197	   Whenever BIER-TE creates a copy of a packet, the BitString of that
1198	   copy will have all BitPositions cleared that are associated with
1199	   adjacencies on the BFR.  This inhibits looping of packets.  The only
1200	   exceptions are adjacencies with DNR set.

1202	   With DNR set, looping can happen.  Consider in the ring picture that
1203	   link L4 from BFR3 is plugged into the L1 interface of BFRa.  This
1204	   creates a loop where the ring's clockwise BitPosition is never reset
1205	   for copies of the packets traveling clockwise around the ring.

1207	   To inhibit looping in the face of such physical misconfiguration,
1208	   only forward_connected adjacencies are permitted to have DNR set, and
1209	   the link layer port unique unicast destination address of the
1210	   adjacency (e.g., MAC address) protects against closing the loop.
1211	   Link layers without port unique link layer addresses should not be
1212	   used with the DNR flag set.

1214	5.2.  Duplicates

1216	   Duplicates happen when the topology of the BitString is not a tree
1217	   but redundantly connects BFRs with each other.  The controller must
1218	   therefore ensure that it only creates BitStrings that are trees in
1219	   the topology.
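
   A controller-side check for this condition could look like the
   following sketch.  It is one possible approach under stated
   assumptions, not a procedure defined by this document: the helper
   names, the dictionary-based BIFT model and the four-node example
   topology are hypothetical, and only forward_connected adjacencies
   without DNR or ECMP are modeled.

```python
from collections import Counter


def arrivals(bift, bfir, bitstring):
    """Count the copies each BFR receives for a BitString sent by bfir."""
    counts = Counter()
    pending = [(bfir, frozenset(bitstring))]
    while pending:
        bfr, bits = pending.pop()
        local = bift.get(bfr, {})
        # Copies leave with all BitPositions of this BFR reset, so the
        # BitString shrinks on every hop and the walk terminates.
        remaining = bits - set(local)
        for bp in bits & set(local):
            neighbor = local[bp]
            counts[neighbor] += 1
            pending.append((neighbor, remaining))
    return counts


def duplicated_bfrs(bift, bfir, bitstring):
    """Return the BFRs that would receive more than one copy."""
    return {bfr for bfr, n in arrivals(bift, bfir, bitstring).items() if n > 1}


# Hypothetical topology: A reaches D via B and via C.
example_bift = {"A": {1: "B", 2: "C"}, "B": {3: "D"}, "C": {4: "D"}}
redundant = duplicated_bfrs(example_bift, "A", {1, 2, 3, 4})
tree = duplicated_bfrs(example_bift, "A", {1, 3})
```

   Here the BitString with all four BitPositions set is not a tree and
   delivers two copies to D, while the BitString with only BitPositions
   1 and 3 is a tree and creates no duplicates.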

1221	   When links are incorrectly physically re-connected before the
1222	   controller updates BitStrings in BFIRs, duplicates can happen.  Like
1223	   loops, these can be inhibited by link layer addressing in
1224	   forward_connected adjacencies.

1226	   If interface or loopback addresses used in forward_routed adjacencies
1227	   are moved from one BFR to another, duplicates can equally happen.
1228	   Such re-addressing operations must be coordinated with the
1229	   controller.

1231	6.  BIER-TE Forwarding Pseudocode

1233	   The following simplified pseudocode for BIER-TE forwarding uses the
1234	   BIER forwarding pseudocode of [RFC8279], section 6.5, with the one
1235	   modification necessary to support basic BIER-TE forwarding.  Like the
1236	   BIER forwarding pseudocode, for simplicity it hides the details of
1237	   the adjacency processing inside PacketSend(), which can be
1238	   forward_connected, forward_routed or local_decap.

1240	      void ForwardBitMaskPacket_withTE (Packet)
1241	      {
1242	          SI=GetPacketSI(Packet);
1243	          Offset=SI*BitStringLength;
1244	          for (Index = GetFirstBitPosition(Packet->BitString); Index ;
1245	               Index = GetNextBitPosition(Packet->BitString, Index)) {
1246	              F-BM = BIFT[Index+Offset]->F-BM;
1247	              if (!F-BM) continue;
1248	              BFR-NBR = BIFT[Index+Offset]->BFR-NBR;
1249	              PacketCopy = Copy(Packet);
1250	              PacketCopy->BitString &= F-BM;                  [2]
1251	              PacketSend(PacketCopy, BFR-NBR);
1252	              // The following must not be done for BIER-TE:
1253	              // Packet->BitString &= ~F-BM;                  [1]
1254	          }
1255	      }

1257	            Figure 15: Simplified BIER-TE Forwarding Pseudocode

1259	   The difference is that in BIER-TE, step [1] must not be performed,
1260	   but is replaced with [2] (when the forwarding plane algorithm is
1261	   implemented verbatim as shown above).

1263	   In BIER, the F-BM of a BP has all BP set that are meant to be
1264	   forwarded via the same neighbor.  It is used to reset those BP in the
1265	   packet after the first copy to this neighbor has been made to inhibit
1266	   multiple copies to the same neighbor.

1268	   In BIER-TE, the F-BM of a particular BP with an adjacency is the list
1269	   of all BPs with an adjacency on this BFR except the particular BP
1270	   itself if it has an adjacency with the DNR bit set.  The F-BM is used
1271	   to reset the F-BM BPs before creating copies.

1273	   In BIER, the order of BPs impacts the result of forwarding because of
1274	   [1].  In BIER-TE, forwarding is not impacted by the order of BPs.  It
1275	   is therefore possible to optimize forwarding further than in BIER.
1276	   For example, BIER-TE forwarding can be parallelized such that a
1277	   parallel instance (such as an egress linecard) can process any subset
1278	   of BPs without any considerations for the other BPs - and without any
1279	   prior, cross-BP shared processing.

1281	   The above simplified pseudocode is elaborated further as follows:

1283	   o  This pseudocode eliminates per-bit F-BM, therefore reducing state
1284	      by BitStringLength^2*SI and eliminating the need for per-packet-
1285	      copy masking operation except for adjacencies with DNR flag set:

1287	      *  AdjacentBits[SI] are bits with a non-empty list of adjacencies.
1288	         This can be computed whenever the BIER-TE controller host
1289	         updates the adjacencies.

1291	      *  Only the AdjacentBits need to be examined in the loop for
1292	         packet copies.

1294	      *  The AdjacentBits are cleared from the packet's BitString on
1295	         ingress to avoid packets looping.

1297	   o  The code loops over the adjacencies because there may be more than
1298	      one adjacency for a bit.

1300	   o  When an adjacency has the DNR bit, the bit is set in the packet
1301	      copy (to save bits in rings for example).

1303	   o  The ECMP adjacency is shown.  Its parameters are a
1304	      ListOfAdjacencies from which one is picked.

1306	   o  The forward_connected, forward_routed and local_decap adjacencies are
1307	      shown with their parameters.

1309	     void ForwardBitMaskPacket_withTE (Packet)
1310	     {
1311	         SI=GetPacketSI(Packet);
1312	         Offset=SI*BitStringLength;
1313	         AdjacentBitstring = Packet->BitString & AdjacentBits[SI];
1314	         Packet->BitString &= ~AdjacentBits[SI];
1315	         for (Index = GetFirstBitPosition(AdjacentBitstring); Index ;
1316	              Index = GetNextBitPosition(AdjacentBitstring, Index)) {
1317	             foreach adjacency BIFT[Index+Offset] {
1318	                 if(adjacency == ECMP(ListOfAdjacencies, seed) ) {
1319	                     I = ECMP_hash(sizeof(ListOfAdjacencies),
1320	                                   Packet->Entropy, seed);
1321	                     adjacency = ListOfAdjacencies[I];
1322	                 }
1323	                 PacketCopy = Copy(Packet);
1324	                 switch(adjacency) {
1325	                     case forward_connected(interface,neighbor,DNR):
1326	                         if(DNR)
1327	                             PacketCopy->BitString |= 1<<(Index-1);
1328	                         SendToL2Unicast(PacketCopy,interface,neighbor);

1330	                     case forward_routed({VRF},neighbor):
1331	                         SendToL3(PacketCopy,{VRF,}l3-neighbor);

1333	                     case local_decap({VRF},neighbor):
1334	                         DecapBierHeader(PacketCopy);
1335	                         PassTo(PacketCopy,{VRF,}Packet->NextProto);
1336	                 }
1337	             }
1338	         }
1339	     }

1341	                 Figure 16: BIER-TE Forwarding Pseudocode
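
	   For readers who want to experiment, the forwarding logic can be
	   approximated by the following runnable sketch.  It is not a
	   reference implementation: it handles a single SI only, represents
	   BitStrings as Python integers, and returns the copies instead of
	   sending them.  The Adjacency class, the function names and the
	   CRC32-based ECMP hash are assumptions made for this example; this
	   document does not standardize an ECMP algorithm.

```python
from dataclasses import dataclass
from zlib import crc32


@dataclass
class Adjacency:
    kind: str            # "forward_connected", "forward_routed", "local_decap" or "ecmp"
    target: str = ""     # neighbor name (hypothetical field)
    dnr: bool = False    # only meaningful for forward_connected
    members: tuple = ()  # ECMP only: candidate adjacencies
    seed: int = 0        # ECMP only


def adjacent_bits(bift):
    """Mask of all BitPositions (1-based keys) with a non-empty adjacency list."""
    mask = 0
    for bp, adjacencies in bift.items():
        if adjacencies:
            mask |= 1 << (bp - 1)
    return mask


def forward(bitstring, entropy, bift):
    """Return one (kind, target, bitstring_of_copy) tuple per packet copy."""
    mask = adjacent_bits(bift)
    of_interest = bitstring & mask   # bits this BFR has to act on
    remaining = bitstring & ~mask    # adjacent bits cleared to avoid loops
    copies = []
    for bp in sorted(bift):
        if not of_interest & (1 << (bp - 1)):
            continue
        for adjacency in bift[bp]:
            if adjacency.kind == "ecmp":
                # Arbitrary stand-in hash over entropy and seed.
                key = f"{entropy}:{adjacency.seed}".encode()
                adjacency = adjacency.members[crc32(key) % len(adjacency.members)]
            copy = remaining
            if adjacency.kind == "forward_connected" and adjacency.dnr:
                copy |= 1 << (bp - 1)  # DNR: keep this BitPosition set
            copies.append((adjacency.kind, adjacency.target, copy))
    return copies


bift = {
    1: [Adjacency("forward_connected", "BFR2")],
    2: [Adjacency("forward_connected", "BFR3", dnr=True)],
    3: [Adjacency("local_decap")],
    4: [],
}
copies = forward(0b110011, entropy=7, bift=bift)
```

	   In this example the packet has BitPositions 1, 2, 5 and 6 set, so
	   two copies are created; only the copy for BitPosition 2 keeps its
	   own bit because of DNR.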

1343	7.  Managing SI, subdomains and BFR-ids

1345	   When the number of bits required to represent the necessary hops in
1346	   the topology and BFER exceeds the supported bitstring length,
1347	   multiple SI and/or subdomains must be used.  This section discusses
1348	   how.

1350	   BIER-TE forwarding does not require the concept of BFR-id, but
1351	   routing underlay, flow overlay and BIER headers may.  This section
1352	   also discusses how BFR-ids can be assigned to BFIR/BFER for BIER-TE.

1354	7.1.  Why SI and sub-domains

1356	   For BIER and BIER-TE forwarding, the most important result of using
1357	   multiple SI and/or subdomains is the same: Packets that need to be
1358	   sent to BFER in different SI or subdomains require different BIER
1359	   packets: each one with a bitstring for a different (SI,subdomain)
1360	   combination.  Each such bitstring uses one bitstring length sized SI
1361	   block in the BIFT of the subdomain.  We call this a BIFT:SI (block).

1363	   For BIER and BIER-TE forwarding itself there is also no difference
1364	   whether different SI and/or sub-domains are chosen, but SI and
1365	   subdomain have different purposes in the BIER architecture shared by
1366	   BIER-TE.  This impacts how operators are managing them and how
1367	   especially flow overlays will likely use them.

1369	   By default, every possible BFIR/BFER in a BIER network would likely
1370	   be given a BFR-id in subdomain 0 (unless there are > 64k BFIR/BFER).

1372	   If there are different flow services (or service instances) requiring
1373	   replication to different subsets of BFER, then it will likely not be
1374	   possible to achieve the best replication efficiency for all of these
1375	   service instances via subdomain 0.  Ideal replication efficiency for
1376	   N BFER exists in a subdomain if they are split over not more than
1377	   ceiling(N/bitstring-length) SI.
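
	   As a small worked example of this bound, the sketch below computes
	   ceiling(N/bitstring-length) with integer arithmetic.  The function
	   name is an assumption made for this illustration.

```python
def min_si_count(num_bfer: int, bitstring_length: int) -> int:
    """Smallest number of SI over which `num_bfer` BFER can be split."""
    # Integer ceiling division: ceiling(N / bitstring-length).
    return -(-num_bfer // bitstring_length)


# 1020 BFER with a bitstring length of 256.
si_needed = min_si_count(1020, 256)
```

	   For 1020 BFER and a bitstring length of 256, this yields 4 SI.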

1379	   If service instances justify additional BIER:SI state in the network,
1380	   additional subdomains will be used: BFIR/BFER are assigned BFR-id in
1381	   those subdomains and each service instance is configured to use the
1382	   most appropriate subdomain.  This results in improved replication
1383	   efficiency for different services.

1385	   Even if creation of subdomains and assignment of BFR-id to BFIR/BFER
1386	   in those subdomains is automated, it is not expected that individual
1387	   service instances can deal with BFER in different subdomains.  A
1388	   service instance may only support configuration of a single subdomain
1389	   it should rely on.

1391	   To be able to easily reuse (and modify as little as possible)
1392	   existing BIER procedures including flow-overlay and routing underlay,
1393	   when BIER-TE forwarding is added, we therefore reuse SI and subdomain
1394	   logically in the same way as they are used in BIER: All necessary
1395	   BFIR/BFER for a service use a single BIER-TE BIFT and are split
1396	   across as many SI as necessary (see below).  Different services may
1397	   use different subdomains that primarily exist to provide more
1398	   efficient replication (and for BIER-TE desirable traffic engineering)
1399	   for different subsets of BFIR/BFER.

1401	7.2.  Bit assignment comparison BIER and BIER-TE

1403	   In BIER, bitstrings only need to carry bits for BFER, which leads to
1404	   the model that BFR-ids map 1:1 to each bit in a bitstring.

1406	   In BIER-TE, bitstrings need to carry bits to indicate not only the
1407	   receiving BFER but also the intermediate hops/links across which the
1408	   packet must be sent.  The maximum number of BFER that can be
1409	   supported in a single bitstring or BIFT:SI depends on the number of
1410	   bits necessary to represent the desired topology between them.

1412	   The topology is "desired" because it depends on the physical topology
1413	   and on the desire of the operator to either allow for explicit traffic
1414	   engineering across every single hop (which requires more bits), or to
1415	   reduce the number of required bits by exploiting optimizations such
1416	   as unicast (forward_routed), ECMP or flood (DNR) over "uninteresting"
1417	   sub-parts of the topology - e.g., parts where different trees do not
1418	   need to take different paths due to traffic-engineering reasons.

1420	   The ratio of bits that describe the topology to bits that describe
1421	   BFER in a BIFT:SI can range widely based on the size of the topology
1422	   and the number of alternative paths in it.  The higher the percentage
1423	   of topology bits, the higher the likelihood that those bits are not
1424	   just BIER-TE overhead without additional benefit, but instead allow
1425	   expressing desirable traffic-engineering path alternatives.

1427	7.3.  Using BFR-id with BIER-TE

1429	   Because there is no 1:1 mapping between bits in the bitstring and
1430	   BFER, BIER-TE cannot simply rely on the BIER 1:1 mapping between bits
1431	   in a bitstring and BFR-id.

1433	   In BIER, automatic schemes could assign all possible BFR-ids
1434	   sequentially to BFERs.  This will not work in BIER-TE.  In BIER-TE,
1435	   the operator or BIER-TE controller host has to determine a BFR-id for
1436	   each BFER in each required subdomain.  The BFR-id may or may not have
1437	   a relationship with a bit in the bitstring.  Suggestions are detailed
1438	   below.  Once determined, the BFR-id can then be configured on the
1439	   BFER and used by flow overlay, routing underlay and the BIER header
1440	   almost the same as the BFR-id in BIER.

1442	   The one exception is application/flow-overlays that automatically
1443	   calculate the bitstring(s) of BIER packets by converting BFR-id to
1444	   bits.  In BIER-TE, this operation can be done in two ways:

1446	   "Independent branches": For a given application or (set of) trees,
1447	   the branches from a BFIR to every BFER are independent of the
1448	   branches to any other BFER.  For example, shortest path trees have
1449	   independent branches.

1451	   "Interdependent branches": When a BFER is added or deleted from a
1452	   particular distribution tree, branches to other BFER still in the
1453	   tree may need to change.  Steiner trees are examples of trees with
1454	   interdependent branches.

1456	   If "independent branches" are sufficient, the BIER-TE controller host
1457	   can provide to such applications for every BFR-id a SI:bitstring with
1458	   the BIER-TE bits for the branch towards that BFER.  The application
1459	   can then independently calculate the SI:bitstring for all desired
1460	   BFER by OR'ing their bitstrings.
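
	   The OR'ing step for "independent branches" can be sketched as
	   follows.  The table layout (a mapping from BFR-id to an
	   SI:bitstring pair provided by the BIER-TE controller host) and all
	   names are assumptions made for this illustration.

```python
from collections import defaultdict


def combine_branches(branches, wanted_bfr_ids):
    """OR together the per-BFER branch bitstrings, separately for each SI.

    `branches` maps BFR-id -> (SI, bitstring) as provided by the
    controller (hypothetical representation); the result maps SI -> bitstring.
    """
    per_si = defaultdict(int)
    for bfr_id in wanted_bfr_ids:
        si, bits = branches[bfr_id]
        per_si[si] |= bits
    return dict(per_si)


branches = {
    10: (0, 0b0111),   # branch towards the BFER with BFR-id 10
    11: (0, 0b1011),   # shares some hops with the branch to BFR-id 10
    300: (1, 0b0101),  # BFER covered by a different SI
}
packets = combine_branches(branches, [10, 11, 300])
```

	   The result contains one bitstring per SI, so the application sends
	   one BIER-TE packet per entry.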

1462	   If "interdependent branches" are required, the application could call
1463	   a BIER-TE controller host API with the list of required BFR-ids and
1464	   get the required bitstring back.  Whenever the set of BFR-ids
1465	   changes, this is repeated.

1467	   Note that in either case (unlike in BIER), the bits in BIER-TE may
1468	   need to change upon link/node failure/recovery, network expansion and
1469	   network load by other traffic (as part of traffic engineering goals).
1470	   Interactions between such BFIR applications and the BIER-TE
1471	   controller host do therefore need to support dynamic updates to the
1472	   bitstrings.

1474	7.4.  Assigning BFR-ids for BIER-TE

1476	   For a non-leaf BFER, there is usually a single bit k for that BFER
1477	   with a local_decap() adjacency on the BFER.  The BFR-id for such a
1478	   BFER is therefore most easily the one it would have in BIER: SI *
1479	   bitstring-length + k.

1481	   As explained earlier in the document, leaf BFERs do not need such a
1482	   separate bit because the fact alone that the BIER-TE packet is
1483	   forwarded to the leaf BFER indicates that the BFER should decapsulate
1484	   it.  Such a BFER will have one or more bits for the links leading
1485	   only to it.  The BFR-id could therefore most easily be the BFR-id
1486	   derived from the lowest bit for those links.
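
	   The two derivations above can be written down as the following
	   sketch.  The function names are assumptions made for this
	   illustration; bit numbers are 1-based within an SI.

```python
def bfr_id_non_leaf(si: int, bitstring_length: int, k: int) -> int:
    """BFR-id derived from the local_decap bit k of a non-leaf BFER."""
    return si * bitstring_length + k


def bfr_id_leaf(si: int, bitstring_length: int, link_bits) -> int:
    """BFR-id derived from the lowest bit of the links leading only to a leaf BFER."""
    return si * bitstring_length + min(link_bits)


non_leaf_id = bfr_id_non_leaf(si=2, bitstring_length=256, k=5)
leaf_id = bfr_id_leaf(si=0, bitstring_length=256, link_bits=[9, 7, 12])
```

	   Here the non-leaf BFER receives BFR-id 517 and the leaf BFER
	   receives BFR-id 7.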

1488	   These two rules are only recommendations for the operator or BIER-TE
1489	   controller assigning the BFR-ids.  Any allocation scheme can be used;
1490	   the BFR-ids just need to be unique across BFRs in each subdomain.

1492	   It is not currently determined if a single subdomain could or should
1493	   be allowed to forward both BIER and BIER-TE packets.  If this should
1494	   be supported, there are two options:

1496	   A.  BIER and BIER-TE have different BFR-id in the same subdomain.
1497	   This allows higher replication efficiency for BIER because its BFR-
1498	   id can be assigned sequentially, while the bitstrings for BIER-TE
1499	   will also have the additional bits for the topology.  There is no
1500	   relationship between a BFR's BIER BFR-id and its BIER-TE BFR-id.

1502	   B.  BIER and BIER-TE share the same BFR-id.  The BFR-id are assigned
1503	   as explained above for BIER-TE and simply reused for BIER.  The
1504	   replication efficiency for BIER will be as low as that for BIER-TE in
1505	   this approach.  Depending on topology, only the same 20%..80% of bits
1506	   as possible for BIER-TE can be used for BIER.

1508	7.5.  Example bit allocations

1510	7.5.1.  With BIER

1512	   Consider a network setup with a bitstring length of 256 for a network
1513	   topology as shown in the picture below.  The network has 6 areas,
1514	   each with ca. 170 BFR, connecting via a core with some larger (core)
1515	   BFR.  To address all BFER with BIER, 4 SI are required.  To send a
1516	   BIER packet to all BFER in the network, 4 copies need to be sent by
1517	   the BFIR.  On the BFIR it does not make a difference how the BFR-id
1518	   are allocated to BFER in the network, but for efficiency further down
1519	   in the network it does make a difference.

1521	                area1           area2        area3
1522	               BFR1a BFR1b  BFR2a BFR2b   BFR3a BFR3b
1523	                 |  \         /    \        /  |
1524	                 ................................
1525	                 .                Core          .
1526	                 ................................
1527	                 |    /       \    /        \  |
1528	               BFR4a BFR4b  BFR5a BFR5b   BFR6a BFR6b
1529	                area4          area5        area6

1531	                 Figure 17: Scaling BIER-TE bits by reuse

1533	   With random allocation of BFR-id to BFER, each receiving area would
1534	   (most likely) have to receive all 4 copies of the BIER packet because
1535	   there would be BFR-id for each of the 4 SI in each of the areas.
1536	   Only further towards each BFER would this duplication subside - when
1537	   each of the 4 trees runs out of branches.

1539	   If BFR-id are allocated intelligently, then all the BFER in an area
1540	   would be given BFR-id with as few as possible different SI.  Each
1541	   area would only have to forward one or two packets instead of 4.
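
	   The effect of the allocation strategy can be illustrated with the
	   following sketch for the example network (6 areas, 170 BFER each,
	   bitstring length 256).  It compares per-area sequential allocation
	   with an interleaved allocation, which is used here as a
	   deterministic stand-in for random allocation.  All names are
	   assumptions made for this illustration.

```python
AREAS, PER_AREA, BSL = 6, 170, 256


def sis_per_area(bfr_ids_by_area, bitstring_length):
    """For each area, list the SI values that its BFR-ids fall into."""
    return {
        area: sorted({(bfr_id - 1) // bitstring_length for bfr_id in ids})
        for area, ids in bfr_ids_by_area.items()
    }


# Area by area: area 1 gets BFR-ids 1..170, area 2 gets 171..340, and so on.
sequential = {
    a + 1: range(a * PER_AREA + 1, (a + 1) * PER_AREA + 1) for a in range(AREAS)
}
# Interleaved: consecutive BFR-ids go to different areas.
interleaved = {
    a + 1: range(a + 1, AREAS * PER_AREA + 1, AREAS) for a in range(AREAS)
}

by_area = sis_per_area(sequential, BSL)
spread = sis_per_area(interleaved, BSL)
```

	   With per-area allocation, every area needs one or two SI; with the
	   interleaved allocation, every area has BFR-ids in all 4 SI.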

1543	   Given how networks can grow over time, replication efficiency in an
1544	   area will also easily go down over time when BFR-id are allocated
1545	   sequentially network-wide.  An area that initially only has
1546	   BFR-id in one SI might end up with many SI over a longer period of
1547	   growth.  Allocating SIs to areas with initially sufficiently many
1548	   spare bits for growth can help to alleviate this issue, as can
1549	   renumbering BFR-id after network expansion.  In this example one may
1550	   consider using 6 SI and assigning one to each area.

1552	   This example shows that intelligent BFR-id allocation within at least
1553	   subdomain 0 can be helpful or even necessary in BIER.

1555	7.5.2.  With BIER-TE

1557	   In BIER-TE one needs to determine a subset of the physical topology
1558	   and attached BFER so that the "desired" representation of this
1559	   topology and the BFER fit into a single bitstring.  This process
1560	   needs to be repeated until the whole topology is covered.

1562	   Once bits/SIs are assigned to topology and BFER, BFR-id is just a
1563	   derived set of identifiers from the operator/BIER-TE controller as
1564	   explained above.

1566	   Every time that different sub-topologies have overlap, bits need to
1567	   be repeated across the bitstrings, increasing the overall amount of
1568	   bits required across all bitstring/SIs.  In the worst case, random
1569	   subsets of BFER are assigned to different SI.  This is much worse
1570	   than in BIER because it not only reduces replication efficiency with
1571	   the same number of overall bits, but even further - because more bits
1572	   are required due to duplication of bits for topology across multiple
1573	   SI.  Intelligent BFER to SI assignment and selecting specific
1574	   "desired" subtopologies can minimize this problem.

1576	   To set up BIER-TE efficiently for the above topology, the following
1577	   bit allocation method can be used.  This method can easily be expanded
1578	   to other, similarly structured larger topologies.

1580	   Each area is allocated one or more SI depending on the number of
1581	   future expected BFER and number of bits required for the topology in
1582	   the area.  In this example, 6 SI, one per area.

1584	   In addition, we use 4 bits in each SI: bia, bib, bea, beb: bit
1585	   ingress a, bit ingress b, bit egress a, bit egress b.  These bits
1586	   will be used to pass BIER packets from any BFIR via any combination
1587	   of ingress area a/b BFR and egress area a/b BFR into a specific
1588	   target area.  These bits are then set up with the right
1589	   forward_routed adjacencies on the BFIR and area edge BFR:

1591	   On all BFIR in an area j, bia in each BIFT:SI is populated with the
1592	   same forward_routed(BFRja), and bib with forward_routed(BFRjb).  On
1593	   all area edge BFR, bea in BIFT:SI=k is populated with
1594	   forward_routed(BFRka) and beb in BIFT:SI=k with
1595	   forward_routed(BFRkb).

1597	   For BIER-TE forwarding of a packet to some subset of BFER across all
1598	   areas, a BFIR would create at most 6 copies, with SI=1...SI=6.  In
1599	   each packet, the bits indicate the topology and BFER in that
1600	   topology plus the four bits to indicate whether to pass this packet
1601	   via the ingress area a or b border BFR and the egress area a or b
1602	   border BFR, therefore allowing path engineering for those two
1603	   "unicast" legs: 1) BFIR to ingress area edge and 2) core to egress
1604	   area edge.  Replication only happens inside the egress areas.  For
1605	   BFER in the same area as the BFIR, these four bits are not used.
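
	   How a BFIR might assemble the per-area packets in this scheme is
	   sketched below.  The concrete BitPositions chosen for bia, bib, bea
	   and beb, the use of the area number as SI, and all function names
	   are assumptions made for this illustration only.

```python
# Hypothetical BitPositions (1-based) reserved in every SI.
BIA, BIB, BEA, BEB = 1, 2, 3, 4


def bit(bp: int) -> int:
    return 1 << (bp - 1)


def packets_for(bfir_area, targets):
    """Build one (SI, bitstring) pair per target area.

    `targets` maps area -> (tree_bits, ingress, egress), where tree_bits
    are the topology/BFER bits inside the target area and ingress/egress
    are "a" or "b" (hypothetical representation).
    """
    result = []
    for area, (tree_bits, ingress, egress) in sorted(targets.items()):
        bits = tree_bits
        if area != bfir_area:
            # Select the ingress and egress area border BFR for the two
            # "unicast" legs; not used for BFER in the BFIR's own area.
            bits |= bit(BIA if ingress == "a" else BIB)
            bits |= bit(BEA if egress == "a" else BEB)
        result.append((area, bits))
    return result


out = packets_for(1, {1: (0b110000, "a", "a"), 4: (0b1010000, "b", "a")})
```

	   The packet for the BFIR's own area carries only the tree bits, while
	   the packet for area 4 additionally carries bib and bea.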

1607	7.6.  Summary

1609	   Like BIER, BIER-TE can support multiple SI within a sub-domain to
1610	   allow re-using the concept of BFR-id and therefore minimize BIER-TE
1611	   specific functions in underlay routing, flow overlay methods and BIER
1612	   headers.

1614	   The number of BFIR/BFER possible in a subdomain is smaller than in
1615	   BIER because BIER-TE uses additional bits for topology.

1617	   In BIER-TE, subdomains can be used as in BIER to create more
1618	   efficient replication to known subsets of BFER.

1620	   Assigning bits for BFER intelligently into the right SI is more
1621	   important in BIER-TE than in BIER because of replication efficiency
1622	   and overall amount of bits required.

1624	8.  BIER-TE and Segment Routing (SR)

1626	   Segment Routing (SR, [RFC8402]) aims to enable lightweight path
1627	   engineering via loose source routing.  Compared to its more heavy-
1628	   weight predecessor RSVP-TE ([RFC3209]), SR does not, for example,
1629	   require per-path signaling to each hop along the path.

1631	   BIER-TE supports the same design philosophy for multicast.  As in
1632	   SR, it relies on source-routing - via the definition of a BitString.
1633	   Like SR, it only requires consideration of the "hops" on which either
1634	   replication has to happen, or across which the traffic should be
1635	   steered (even without replication).  Any other hops can be skipped
1636	   via the use of routed adjacencies.

1638	   BIER-TE BitPosition (BP) can be understood as the BIER-TE equivalent
1639	   of "forwarding segments" in SR, but they have a different scope than
1640	   SR forwarding segments.  Whereas forwarding segments in SR are global
1641	   or local, BPs in BIER-TE have a scope that is the group of BFR(s)
1642	   that have adjacencies for this BP in their BIFT.  This can be called
1643	   "adjacency" scoped forwarding segments.

1645	   Adjacency scope could be global, but then every BFR would need an
1646	   adjacency for this BP, for example a forward_routed adjacency with
1647	   encapsulation to the global SR SID of the destination.  Such a BP
1648	   would always result in ingress replication though.  The first BFR
1649	   encountering this BP would directly replicate to it.  Only by using
1650	   non-global adjacency scope for BPs can traffic be steered and
1651	   replicated on non-ingress BFR.

1653	   SR can naturally be combined with BIER-TE and help to optimize it.
1654	   For example, instead of defining BitPositions for non-replicating
1655	   hops, it is equally possible to use segment routing encapsulations
1656	   (e.g., MPLS label stacks) for the encapsulation of "forward_routed"
1657	   adjacencies.

1659	   Note that BIER itself can also be seen to be similar to SR.  BIER BPs
1660	   act as global destination Node-SIDs and the BIER bitstring is simply
1661	   a highly optimized mechanism to indicate multiple such SIDs and let
1662	   the network take care of effectively replicating the packet hop-by-
1663	   hop to each destination Node-SID.  What BIER does not allow is to
1664	   indicate intermediate hops or, in terms of SR, to indicate a
1665	   sequence of SIDs to reach the destination.  This is what BIER-TE and
1666	   its adjacency-scoped BPs enable.

1668	   Both BIER and BIER-TE allow BFIR to "opportunistically" copy packets
1669	   to a set of desired BFER on a packet-by-packet basis.  In BIER, this
1670	   is done by OR'ing the BP for the desired BFER.  In BIER-TE this can
1671	   be done by OR'ing for each desired BFER a bitstring using the
1672	   "independent branches" approach described in Section 7.3 and
1673	   therefore also indicating the engineered path towards each desired
1674	   BFER.  This is the approach that
1675	   [I-D.ietf-bier-multicast-http-response] relies on.

1677	9.  Security Considerations

1679	   The security considerations are the same as for BIER with the
1680	   following differences:

1682	   BFR-ids and BFR-prefixes are not used in BIER-TE, nor are procedures
1683	   for their distribution, so these are not attack vectors against BIER-
1684	   TE.

1686	10.  IANA Considerations

1688	   This document requests no action by IANA.

1690	11.  Acknowledgements

1692	   The authors would like to thank Greg Shepherd, Ijsbrand Wijnands,
1693	   Neale Ranns, Dirk Trossen, Sandy Zheng and Jeffrey Zhang for their
1694	   extensive review and suggestions.

1696	12.  Change log [RFC Editor: Please remove]

1698	   draft-ietf-bier-te-arch:

1700	      05: Review Jeffrey Zhang.

1702	      Part 2:

1704	      4.3 added note about leaf-BFER being also a property of routing
1705	      setup.

1707	      4.7 Added missing details from example to avoid confusion with
1708	      routed adjacencies, also compressed explanatory text and better
1709	      justification why seed is explicitly configured by controller.

1711	      4.9 added section discussing generic reuse of BP methods.

1713	      4.10 added section summarizing BP optimizations of section 4.

1715	      6.  Rewrote/compressed explanation of comparison BIER/BIER-TE
1716	      forwarding difference.  Explained benefit of BIER-TE per-BP
1717	      forwarding being independent of forwarding for other BPs.

1719	      Part 1:

1721	      Explicitly use forward_connected adjacency in ECMP adjacency
1722	      examples to avoid confusion.

1724	      4.3 Add picture as example for leaf vs. non-leaf BFR in topology.
1725	      Improved description.

1727	      4.5 Example for traffic that can be broadcast -> for single BP in
1728	      hub&spoke.

1730	      4.8.1 Simplified example picture for routed adjacency, explanatory
1731	      text.

1733	      Review from Dirk Trossen:

1735	      Fixed up explanation of ICC paper vs. bloom filter.

1737	      04: spell check run.

1739	      Added remaining fixes for Sandy's (Zhang Zheng) review:

1741	      4.7 Enhance ECMP explanations:

1743	      example ECMP algorithm, highlight that doc does not standardize
1744	      ECMP algorithm.

1746	      Review from Dirk Trossen:

1748	      1.  Added mentioning of prior work for traffic engineered paths
1749	      with bloom filters.

1751	      2.  Changed title from layers to components and added "BIER-TE
1752	      control plane" to "BIER-TE controller host" to make it clearer,
1753	      what it does.

1755	      2.2.3.  Added reference to I-D.ietf-bier-multicast-http-response
1756	      as an example solution.

1758	      2.3. clarified sentence about resetting BPs before sending copies
1759	      (also forgot to mention DNR here).

1761	      3.4.  Added text saying this section will be removed unless IESG
1762	      review finds enough redeeming value in this example given how -03
1763	      introduced section 1.1 with basic examples.

1765	      7.2.  Removed explicit numbers 20%/80% for number of topology bits
1766	      in BIER-TE, replaced with more vague (high/low) description,
1767	      because we do not have good reference material.  Added text saying
1768	      this section will be removed unless IESG review finds enough
1769	      redeeming value in this example given how -03 introduced section
1770	      1.1 with basic examples.

1772	      many typos fixed.  Thanks a lot.

1774	      03: Last call textual changes by authors to improve readability:

1776	      removed Wolfgang Braun as co-author (as requested).

1778	      Improved abstract to be more explanatory.  Removed mentioning of
1779	      FRR (not concluded on so far).

1781	      Added new text into Introduction section because the text was too
1782	      difficult to jump into (too many forward pointers).  This
1783	      primarily consists of examples and the early introduction of the
1784	      BIER-TE Topology concept enabled by these examples.

1786	      Amended comparison to SR.

1788	      Changed syntax from [VRF] to {VRF} to indicate it is optional and to
1789	      make idnits happy.

1791	      Split references into normative / informative, added references.

1793	      02: Refresh after IETF104 discussion: changed intended status back
1794	      to standard.  Reasoning:

1796	      Tighter review of standards document == ensures arch will be
1797	      better prepared for possible adoption by other WGs (e.g., DetNet)
1798	      or std. bodies.

1800	      Requirement against the degree of existing implementations is self
1801	      defined by the WG.  BIER WG seems to think it is not necessary to
1802	      apply multiple interoperating implementations against an
1803	      architecture level document at this time to make it qualify to go
1804	      to standards track.  Also, the levels of support introduced in -01
1805	      rev. should allow all BIER forwarding engines to also be able to
1806	      support the base level BIER-TE forwarding.

1808	      01: Added note comparing BIER and SR to also hopefully clarify
1809	      BIER-TE vs. BIER comparison re.  SR.

1811	      - added requirements section mandating only most basic BIER-TE
1812	      forwarding features as MUST.

1814	      - reworked comparison with BIER forwarding section to only
1815	      summarize and point to pseudocode section.

1817	      - reworked pseudocode section to have one pseudocode that mirrors
1818	      the BIER forwarding pseudocode to make comparison easier and a
1819	      second pseudocode that shows the complete set of BIER-TE
1820	      forwarding options and simplification/optimization possible vs.
1821	      BIER forwarding.  Removed MyBitsOfInterest (was pure
1822	      optimization).

1824	      - Added captions to pictures.

1826	      - Part of review feedback from Sandy (Zhang Zheng) integrated.

1828	      00: Changed target state to experimental (WG conclusion), updated
1829	      references, mod auth association.

1831	      - Source now on http://www.github.com/toerless/bier-te-arch

1833	      - Please open issues on the github for change/improvement requests
1834	      to the document - in addition to posting them on the list
1835	      (bier@ietf.).  Thanks!.

1837	   draft-eckert-bier-te-arch:

1839	      06: Added overview of forwarding differences between BIER, BIER-
1840	      TE.

1842	      05: Author affiliation change only.

1844	      04: Added comparison to Live-Live and BFIR to FRR section
1845	      (Eckert).

1847	      04: Removed FRR content into the new FRR draft [I-D.eckert-bier-
1848	      te-frr] (Braun).

1850	      - Linked FRR information to new draft in Overview/Introduction

1852	      - Removed BTAFT/FRR from "Changes in the network topology"

1854	      - Linked new draft in "Link/Node Failures and Recovery"

1856	      - Removed FRR from "The BIER-TE Forwarding Layer"

1858	      - Moved FRR section to new draft

1860	      - Moved FRR parts of Pseudocode into new draft

1862	      - Left only non FRR parts

1864	      - removed FrrUpDown(..) and //FRR operations in
1865	      ForwardBierTePacket(..)

1867	      - New draft contains FrrUpDown(..) and ForwardBierTePacket(Packet)
1868	      from bier-arch-03

1870	      - Moved "BIER-TE and existing FRR" to new draft

1872	      - Moved "BIER-TE and Segment Routing" section one level up

1874	      - Thus, removed "Further considerations" that only contained this
1875	      section

1877	      - Added Changes for version 04
1878	      03: Updated the FRR section.  Added examples for FRR key concepts.
1879	      Added BIER-in-BIER tunneling as option for tunnels in backup
1880	      paths.  BIFT structure is expanded and contains an additional
1881	      match field to support full node protection with BIER-TE FRR.

1883	      03: Updated FRR section.  Explanation how BIER-in-BIER
1884	      encapsulation provides P2MP protection for node failures even
1885	      though the routing underlay does not provide P2MP.

1887	      02: Changed the definition of BIFT to be more inline with BIER.
1888	      In revs. up to -01, the idea was that a BIFT has only entries for
1889	      a single bitstring, and every SI and subdomain would be a separate
1890	      BIFT.  In BIER, each BIFT covers all SI.  This is now also how we
1891	      define it in BIER-TE.

1893	      02: Added Section 7 to explain the use of SI, subdomains and BFR-
1894	      id in BIER-TE and to give an example of how to efficiently assign
1895	      bits for a large topology requiring multiple SI.

1897	      02: Added further details for rings - how to support input from
1898	      all ring nodes.

1900	      01: Fixed BFIR -> BFER for section 4.3.

1902	      01: Added explanation of SI, differences from BIER ECMP,
1903	      consideration for Segment Routing, unicast FRR, considerations for
1904	      encapsulation, explanations of BIER-TE controller host and CLI.

1906	      00: Initial version.

1908	13.  References

1910	13.1.  Normative References

1912	   [RFC8279]  Wijnands, IJ., Ed., Rosen, E., Ed., Dolganow, A.,
1913	              Przygienda, T., and S. Aldrin, "Multicast Using Bit Index
1914	              Explicit Replication (BIER)", RFC 8279,
1915	              DOI 10.17487/RFC8279, November 2017,
1916	              <https://www.rfc-editor.org/info/rfc8279>.

1918	   [RFC8296]  Wijnands, IJ., Ed., Rosen, E., Ed., Dolganow, A.,
1919	              Tantsura, J., Aldrin, S., and I. Meilik, "Encapsulation
1920	              for Bit Index Explicit Replication (BIER) in MPLS and Non-
1921	              MPLS Networks", RFC 8296, DOI 10.17487/RFC8296, January
1922	              2018, <https://www.rfc-editor.org/info/rfc8296>.

1924	13.2.  Informative References

1926	   [I-D.ietf-bier-multicast-http-response]
1927	              Trossen, D., Rahman, A., Wang, C., and T. Eckert,
1928	              "Applicability of BIER Multicast Overlay for Adaptive
1929	              Streaming Services", draft-ietf-bier-multicast-http-
1930	              response-01 (work in progress), June 2019.

1932	   [I-D.ietf-roll-ccast]
1933	              Bergmann, O., Bormann, C., Gerdes, S., and H. Chen,
1934	              "Constrained-Cast: Source-Routed Multicast for RPL",
1935	              draft-ietf-roll-ccast-01 (work in progress), October 2017.

1937	   [ICC]      Reed, M., Al-Naday, M., Thomos, N., Trossen, D.,
1938	              Petropoulos, G., and S. Spirou, "Stateless multicast
1939	              switching in software defined networks", IEEE
1940	              International Conference on Communications (ICC), Kuala
1941	              Lumpur, Malaysia, May 2016,
1942	              <https://ieeexplore.ieee.org/document/7511036>.

1944	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
1945	              Requirement Levels", BCP 14, RFC 2119,
1946	              DOI 10.17487/RFC2119, March 1997,
1947	              <https://www.rfc-editor.org/info/rfc2119>.

1949	   [RFC3209]  Awduche, D., Berger, L., Gan, D., Li, T., Srinivasan, V.,
1950	              and G. Swallow, "RSVP-TE: Extensions to RSVP for LSP
1951	              Tunnels", RFC 3209, DOI 10.17487/RFC3209, December 2001,
1952	              <https://www.rfc-editor.org/info/rfc3209>.

1954	   [RFC8402]  Filsfils, C., Ed., Previdi, S., Ed., Ginsberg, L.,
1955	              Decraene, B., Litkowski, S., and R. Shakir, "Segment
1956	              Routing Architecture", RFC 8402, DOI 10.17487/RFC8402,
1957	              July 2018, <https://www.rfc-editor.org/info/rfc8402>.

1959	Authors' Addresses

1961	   Toerless Eckert (editor)
1962	   Futurewei Technologies Inc.
1963	   2330 Central Expy
1964	   Santa Clara  95050
1965	   USA

1967	   Email: tte+ietf@cs.fau.de
1968	   Gregory Cauchie
1969	   Bouygues Telecom

1971	   Email: GCAUCHIE@bouyguestelecom.fr

1973	   Michael Menth
1974	   University of Tuebingen

1976	   Email: menth@uni-tuebingen.de