idnits 2.17.1 

draft-ietf-bier-te-arch-02.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.

  ** The abstract seems to contain references ([RFC8279]), which it
     shouldn't.  Please replace those with straight textual mentions of the
     documents in question.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (May 14, 2019) is 1802 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'I-D.eckert-bier-te-frr' is mentioned on line 136,
     but not defined

  == Missing Reference: 'VRF' is mentioned on line 973, but not defined

  -- Looks like a reference, but probably isn't: '2' on line 907

  -- Looks like a reference, but probably isn't: '1' on line 917

  == Missing Reference: 'SI' is mentioned on line 954, but not defined

  == Missing Reference: 'I' is mentioned on line 961, but not defined


     Summary: 2 errors (**), 0 flaws (~~), 5 warnings (==), 3 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                     T. Eckert, Ed.
3	Internet-Draft                                                    Huawei
4	Intended status: Standards Track                              G. Cauchie
5	Expires: November 15, 2019                              Bouygues Telecom
6	                                                                W. Braun
7	                                                                M. Menth
8	                                                 University of Tuebingen
9	                                                            May 14, 2019

11	    Traffic Engineering for Bit Index Explicit Replication (BIER-TE)
12	                       draft-ietf-bier-te-arch-02

14	Abstract

16	   This document proposes an architecture for BIER-TE: Traffic
17	   Engineering for Bit Index Explicit Replication (BIER).

19	   BIER-TE shares part of its architecture with BIER as described in
20	   [RFC8279].  It also proposes to share the packet format with BIER.

22	   BIER-TE forwards and replicates packets like BIER based on a
23	   BitString in the packet header but it does not require an IGP.  It
24	   does support traffic engineering by explicit hop-by-hop forwarding
25	   and loose hop forwarding of packets.  It does support Fast ReRoute
26	   (FRR) for link and node protection and incremental deployment.
27	   Because BIER-TE like BIER operates without explicit in-network tree-
28	   building but also supports traffic engineering, it is more similar to
29	   SR than RSVP-TE.

31	Status of This Memo

33	   This Internet-Draft is submitted in full conformance with the
34	   provisions of BCP 78 and BCP 79.

36	   Internet-Drafts are working documents of the Internet Engineering
37	   Task Force (IETF).  Note that other groups may also distribute
38	   working documents as Internet-Drafts.  The list of current Internet-
39	   Drafts is at https://datatracker.ietf.org/drafts/current/.

41	   Internet-Drafts are draft documents valid for a maximum of six months
42	   and may be updated, replaced, or obsoleted by other documents at any
43	   time.  It is inappropriate to use Internet-Drafts as reference
44	   material or to cite them other than as "work in progress."

46	   This Internet-Draft will expire on November 15, 2019.

48	Copyright Notice

50	   Copyright (c) 2019 IETF Trust and the persons identified as the
51	   document authors.  All rights reserved.

53	   This document is subject to BCP 78 and the IETF Trust's Legal
54	   Provisions Relating to IETF Documents
55	   (https://trustee.ietf.org/license-info) in effect on the date of
56	   publication of this document.  Please review these documents
57	   carefully, as they describe your rights and restrictions with respect
58	   to this document.  Code Components extracted from this document must
59	   include Simplified BSD License text as described in Section 4.e of
60	   the Trust Legal Provisions and are provided without warranty as
61	   described in the Simplified BSD License.

63	Table of Contents

65	   1.  Introduction
66	     1.1.  Overview
67	     1.2.  Requirements Language
68	   2.  Layering
69	     2.1.  The Multicast Flow Overlay
70	     2.2.  The BIER-TE Controller Host
71	       2.2.1.  Assignment of BitPositions to adjacencies of the
72	               network topology
73	       2.2.2.  Changes in the network topology
74	       2.2.3.  Set up per-multicast flow BIER-TE state
75	       2.2.4.  Link/Node Failures and Recovery
76	     2.3.  The BIER-TE Forwarding Layer
77	     2.4.  The Routing Underlay
78	   3.  BIER-TE Forwarding
79	     3.1.  The Bit Index Forwarding Table (BIFT)
80	     3.2.  Adjacency Types
81	       3.2.1.  Forward Connected
82	       3.2.2.  Forward Routed
83	       3.2.3.  ECMP
84	       3.2.4.  Local Decap
85	     3.3.  Encapsulation considerations
86	     3.4.  Basic BIER-TE Forwarding Example
87	     3.5.  Forwarding comparison with BIER
88	     3.6.  Requirements
89	   4.  BIER-TE Controller Host BitPosition Assignments
90	     4.1.  P2P Links
91	     4.2.  BFER
92	     4.3.  Leaf BFERs
93	     4.4.  LANs
94	     4.5.  Hub and Spoke
95	     4.6.  Rings
96	     4.7.  Equal Cost MultiPath (ECMP)
97	     4.8.  Routed adjacencies
98	       4.8.1.  Reducing BitPositions
99	       4.8.2.  Supporting nodes without BIER-TE
100	   5.  Avoiding loops and duplicates
101	     5.1.  Loops
102	     5.2.  Duplicates
103	   6.  BIER-TE Forwarding Pseudocode
104	   7.  Managing SI, subdomains and BFR-ids
105	     7.1.  Why SI and sub-domains
106	     7.2.  Bit assignment comparison BIER and BIER-TE
107	     7.3.  Using BFR-id with BIER-TE
108	     7.4.  Assigning BFR-ids for BIER-TE
109	     7.5.  Example bit allocations
110	       7.5.1.  With BIER
111	       7.5.2.  With BIER-TE
112	     7.6.  Summary
113	   8.  BIER-TE and Segment Routing
114	   9.  Security Considerations
115	   10. IANA Considerations
116	   11. Acknowledgements
117	   12. Change log [RFC Editor: Please remove]
118	   13. References
119	   Authors' Addresses

121	1.  Introduction

123	1.1.  Overview

125	   This document specifies the architecture for BIER-TE: traffic
126	   engineering for Bit Index Explicit Replication (BIER).

128	   BIER-TE shares architecture and packet formats with BIER as described
129	   in [RFC8279].

131	   BIER-TE forwards and replicates packets like BIER based on a
132	   BitString in the packet header but it does not require an IGP.  It
133	   does support traffic engineering by explicit hop-by-hop forwarding
134	   and loose hop forwarding of packets.  It does support incremental
135	   deployment and a Fast ReRoute (FRR) extension for link and node
136	   protection is given in [I-D.eckert-bier-te-frr].  Because BIER-TE
137	   like BIER operates without explicit in-network tree-building but also
138	   supports traffic engineering, it is more similar to Segment Routing
139	   (SR) than RSVP-TE.

141	   The key differences over BIER are:

143	   o  BIER-TE replaces in-network autonomous path calculation by
144	      explicit paths calculated offpath by the BIER-TE controller host.

146	   o  In BIER-TE every BitPosition of the BitString of a BIER-TE packet
147	      indicates one or more adjacencies - instead of a BFER as in BIER.

149	   o  Each BFR in BIER-TE has no routing table but only a Bit Index
150	      Forwarding Table (BIFT) indexed by SI:BitPosition and populated
151	      with only those adjacencies to which the BFR should replicate
152	      packets.

154	   BIER-TE headers use the same format as BIER headers.

156	   BIER-TE forwarding does not require/use the BFIR-ID.  The BFIR-ID can
157	   still be useful though for coordinated BFIR/BFER functions, such as
158	   the context for upstream assigned labels for MPLS payloads in MVPN
159	   over BIER-TE.

161	   If the BIER-TE domain is also running BIER, then the BFIR-ID in BIER-
162	   TE packets can be set to the same BFIR-ID as used with BIER packets.

164	   If the BIER-TE domain is not running full BIER or does not want to
165	   reduce the need to allocate bits in BIER bitstrings for BFIR-ID
166	   values, then the allocation of BFIR-ID values in BIER-TE packets can
167	   be done through other mechanisms outside the scope of this document,
168	   as long as this is appropriately agreed upon between all BFIR/BFER.

170	1.2.  Requirements Language

172	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
173	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
174	   document are to be interpreted as described in RFC 2119 [RFC2119].

176	2.  Layering

178	   End-to-end BIER-TE operation consists of four components: the
179	   "Multicast Flow Overlay", the "BIER-TE Controller Host", the "Routing
180	   Underlay" and the "BIER-TE forwarding layer".

182	      Picture 2: Layers of BIER-TE

184	                   <------BGP/PIM----->
185	      |<-IGMP/PIM->  multicast flow   <-PIM/IGMP->|
186	                        overlay

188	                   [Bier-TE Controller Host]
189	                      ^      ^     ^
190	                     /       |      \   BIER-TE control protocol
191	                    |        |       |  eg.: Netconf/Restconf/Yang
192	                    v        v       v
193	    Src -> Rtr1 -> BFIR-----BFR-----BFER -> Rtr2 -> Rcvr

195	                   |--------------------->|
196	                   BIER-TE forwarding layer

198	                   |<- BIER-TE domain-->|

200	                  |<--------------------->|
201	                      Routing underlay

203	                      Figure 1: BIER-TE architecture

205	2.1.  The Multicast Flow Overlay

207	   The Multicast Flow Overlay operates as in BIER.  See [RFC8279].
208	   Instead of interacting with the BIER layer, it interacts with the
209	   BIER-TE Controller Host.

211	2.2.  The BIER-TE Controller Host

213	   The BIER-TE controller host represents the control plane of
214	   BIER-TE.  It communicates two sets of information with BFRs:

216	   During bring-up or modifications of the network topology, the
217	   controller discovers the network topology, assigns BitPositions to
218	   adjacencies and signals the resulting mapping of BitPositions to
219	   adjacencies to each BFR connecting to the adjacency.

221	   During day-to-day operations of the network, the controller signals
222	   to BFIRs what multicast flows are mapped to what BitStrings.

224	   Communication between the BIER-TE controller host and BFRs is ideally
225	   via standardized protocols and data models such as Netconf/Restconf/
226	   Yang.  This is currently outside the scope of this document.  Vendor-
227	   specific CLI on the BFRs is also a possible stopgap option (as in
228	   many other SDN solutions that lack a standardized data
229	   model).

231	   For simplicity, the procedures of the BIER-TE controller host are
232	   described in this document as if it is a single, centralized
233	   automated entity, such as an SDN controller.  It could equally be an
234	   operator setting up CLI on the BFRs.  Distribution of the functions
235	   of the BIER-TE controller host is currently outside the scope of this
236	   document.

238	2.2.1.  Assignment of BitPositions to adjacencies of the network
239	        topology

241	   The BIER-TE controller host tracks the BFR topology of the BIER-TE
242	   domain.  It determines what adjacencies require BitPositions so that
243	   BIER-TE explicit paths can be built through them as desired by
244	   operator policy.

246	   The controller then pushes the BitPositions/adjacencies to the BIFT
247	   of the BFRs.  The BIFT of each BFR is populated with only those
248	   SI:BitPositions to which that BFR should be able to send packets:
249	   the adjacencies connecting to this BFR.

251	2.2.2.  Changes in the network topology

253	   If the network topology changes (not failure based) so that
254	   adjacencies that are assigned to BitPositions are no longer needed,
255	   the controller can re-use those BitPositions for new adjacencies.
256	   First, these BitPositions need to be removed from any BFIR flow state
257	   and BFR BIFT state, then they can be repopulated, first into BIFT and
258	   then into the BFIR.

260	2.2.3.  Set up per-multicast flow BIER-TE state

262	   The BIER-TE controller host tracks the multicast flow overlay to
263	   determine what multicast flow needs to be sent by a BFIR to which set
264	   of BFER.  It calculates the desired distribution tree across the
265	   BIER-TE domain based on algorithms outside the scope of this document
266	   (e.g., CSPF, Steiner Tree, ...).  It then pushes the calculated
267	   BitString into the BFIR.
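
   As a rough illustration of the last step, the sketch below derives a
   BitString from an already calculated tree.  It assumes the tree is
   given as a list of adjacencies and that the controller keeps a map
   from adjacency to BitPosition.  The names BP_OF and
   bitstring_for_tree are hypothetical; the BitPosition values are taken
   from the example in Section 3.4.  The tree calculation itself is not
   shown, as it is outside the scope of this document.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical controller state: (BFR, next hop or "local_decap") -> BitPosition.
Adjacency = Tuple[str, str]

BP_OF: Dict[Adjacency, int] = {
    ("BFIR2", "BFR3"): 2,
    ("BFR3", "BFER1"): 7,
    ("BFR3", "BFR4"): 8,
    ("BFER1", "local_decap"): 11,
}


def bitstring_for_tree(tree: Iterable[Adjacency]) -> Set[int]:
    """Return the set of BitPositions that must be set for the given tree."""
    return {BP_OF[adjacency] for adjacency in tree}


# Branch of the tree that reaches the receiver behind BFER1.
branch = [("BFIR2", "BFR3"), ("BFR3", "BFER1"), ("BFER1", "local_decap")]
bits = bitstring_for_tree(branch)
```

   The resulting set is what the controller would push to the BFIR for
   this flow.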

269	2.2.4.  Link/Node Failures and Recovery

271	   When links or nodes fail or recover in the topology, BIER-TE can
272	   quickly respond with the optional FRR procedures described in [I-
273	   D.eckert-bier-te-frr].  It can also more slowly react by
274	   recalculating the BitStrings of affected multicast flows.  This
275	   reaction is slower than the FRR procedure because the controller
276	   needs to receive link/node up/down indications, recalculate the
277	   desired BitStrings and push them down into the BFIRs.  With FRR, this
278	   is all performed locally on a BFR receiving the adjacency up/down
279	   notification.

281	2.3.  The BIER-TE Forwarding Layer

283	   When the BIER-TE Forwarding Layer receives a packet, it simply looks
284	   up the BitPositions that are set in the BitString of the packet in
285	   the Bit Index Forwarding Table (BIFT) that was populated by the BIER-
286	   TE controller host.  For every BP that is set in the BitString, and
287	   that has one or more adjacencies in the BIFT, a copy is made
288	   according to the type of adjacencies for that BP in the BIFT.  Before
289	   sending any copy, the BFR resets all BitPositions in the BitString of
290	   the packet for which it can create a copy.  This is done to prevent
291	   packets from looping.
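
   The rule above can be sketched as follows.  This is a minimal
   illustration, not the normative procedure (see Section 6 for the
   pseudocode).  It assumes the BitString is held as an integer with
   BitPosition 1 as the lowest bit, and that the BIFT is a map from
   BitPosition to a list of adjacency names.  The helper names to_mask
   and forward are hypothetical.

```python
from typing import Dict, List, Tuple


def to_mask(*bit_positions: int) -> int:
    """Build an integer BitString; BitPosition 1 is the lowest bit (assumed)."""
    mask = 0
    for bp in bit_positions:
        mask |= 1 << (bp - 1)
    return mask


def forward(bitstring: int, bift: Dict[int, List[str]]) -> List[Tuple[str, int]]:
    """Return (adjacency, outgoing BitString) for every copy this BFR makes."""
    # BitPositions for which this BFR has at least one adjacency.
    populated = to_mask(*(bp for bp, adjs in bift.items() if adjs))
    # Reset all of them before any copy is sent, to prevent loops.
    outgoing = bitstring & ~populated
    copies = []
    for bp in sorted(bift):
        if bift[bp] and bitstring & to_mask(bp):
            for adjacency in bift[bp]:
                copies.append((adjacency, outgoing))
    return copies
```

   Every copy carries the same outgoing BitString, because the reset is
   applied once for all BitPositions populated on the BFR.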

293	2.4.  The Routing Underlay

295	   BIER-TE sends BIER packets to directly connected BIER-TE
296	   neighbors as L2 (unicasted) BIER packets without requiring a routing
297	   underlay.  BIER-TE forwarding uses the Routing underlay for
298	   forward_routed adjacencies which copy BIER-TE packets to not-
299	   directly-connected BFRs (see below for adjacency definitions).

301	   If the BFR intends to support FRR for BIER-TE, then the BIER-TE
302	   forwarding plane needs to receive fast adjacency up/down
303	   notifications: Link up/down or neighbor up/down, e.g., from BFD.
304	   Providing these notifications is considered to be part of the routing
305	   underlay in this document.

307	3.  BIER-TE Forwarding

309	3.1.  The Bit Index Forwarding Table (BIFT)

311	   The Bit Index Forwarding Table (BIFT) exists in every BFR.  For every
312	   subdomain in use, it is a table indexed by SI:BitPosition and is
313	   populated by the BIER-TE control plane.  Each index can be empty or
314	   contain a list of one or more adjacencies.

316	   BIER-TE can support multiple subdomains like BIER, each one with a
317	   separate BIFT.

319	   In the BIER architecture, indices into the BIFT are explained to be
320	   both BFR-id and SI:BitString (BitPosition).  This is because there is
321	   a 1:1 relationship between BFR-id and SI:BitString - every bit in
322	   every SI is/can be assigned to a BFIR/BFER.  In BIER-TE there are
323	   more bits used in each BitString than there are BFIR/BFER assigned to
324	   the bitstring.  This is because of the bits required to express the
325	   (traffic engineered) path through the topology.  The BIER-TE
326	   forwarding definitions do therefore not use the term BFR-id at all.
327	   Instead, BFR-ids are only used as required by the routing underlay,
328	   flow overlay, or BIER headers.  Please refer to Section 7 for an
329	   explanation of how to deal with SI, subdomains and BFR-id in BIER-TE.

331	     ------------------------------------------------------------------
332	     | Index:          |  Adjacencies:                                |
333	     | SI:BitPosition  |  <empty> or one or more per entry            |
334	     ==================================================================
335	     | 0:1             |  forward_connected(interface,neighbor,DNR)   |
336	     ------------------------------------------------------------------
337	     | 0:2             |  forward_connected(interface,neighbor,DNR)   |
338	     |                 |  forward_connected(interface,neighbor,DNR)   |
339	     ------------------------------------------------------------------
340	     | 0:3             |  local_decap([VRF])                          |
341	     ------------------------------------------------------------------
342	     | 0:4             |  forward_routed([VRF,]l3-neighbor)           |
343	     ------------------------------------------------------------------
344	     | 0:5             |  <empty>                                     |
345	     ------------------------------------------------------------------
346	     | 0:6             |  ECMP({adjacency1,...adjacencyN}, seed)      |
347	     ------------------------------------------------------------------
348	     ...
349	     | BitStringLength |  ...                                         |
350	     ------------------------------------------------------------------
351	                      Bit Index Forwarding Table

353	                        Figure 2: BIFT adjacencies

355	   The BIFT is programmed into the data plane of BFRs by the BIER-TE
356	   controller host and used to forward packets, according to the rules
357	   specified in the BIER-TE Forwarding Procedures.

359	   When the controller populates the same BP in more than one BFR, the
360	   adjacencies for that BP do not have to be the same on each BFR.  This
361	   is up to the controller.  BPs for p2p links are one case (see below).
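
   One possible in-memory model of the table in Figure 2 is sketched
   below.  The class and field names are hypothetical and chosen only to
   mirror the adjacency parameters shown in the figure; the interface
   names, neighbor names and the 192.0.2.1 address are placeholders.
   The sketch also assumes one such table per subdomain.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple, Union


@dataclass
class ForwardConnected:
    interface: str
    neighbor: str
    dnr: bool = False  # DoNotReset flag


@dataclass
class ForwardRouted:
    l3_neighbor: str
    vrf: Optional[str] = None


@dataclass
class LocalDecap:
    vrf: Optional[str] = None


@dataclass
class Ecmp:
    adjacencies: List[object]  # member adjacencies (types not restricted here)
    seed: int = 0


Adjacency = Union[ForwardConnected, ForwardRouted, LocalDecap, Ecmp]
# Indexed by (SI, BitPosition); an empty list is an unpopulated entry.
Bift = Dict[Tuple[int, int], List[Adjacency]]

bift: Bift = {
    (0, 1): [ForwardConnected("eth0", "BFR-A")],
    (0, 2): [ForwardConnected("eth1", "BFR-B"),
             ForwardConnected("eth2", "BFR-C")],
    (0, 3): [LocalDecap()],
    (0, 4): [ForwardRouted("192.0.2.1")],
    (0, 5): [],
    (0, 6): [Ecmp([ForwardConnected("eth3", "BFR-D"),
                   ForwardConnected("eth4", "BFR-E")], seed=7)],
}
```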

363	3.2.  Adjacency Types

365	3.2.1.  Forward Connected

367	   A "forward_connected" adjacency is towards a directly connected BFR
368	   neighbor using an interface address of that BFR on the connecting
369	   interface.  A forward_connected adjacency does not route packets but
370	   only L2 forwards them to the neighbor.

372	   Packets sent to an adjacency with "DoNotReset" (DNR) set in the BIFT
373	   will not have the BitPosition for that adjacency reset when the BFR
374	   creates a copy for it.  The BitPosition will still be reset for
375	   copies of the packet made towards other adjacencies.  This can be used
376	   for example in ring topologies as explained below.
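
   The effect of the DNR flag on a single copy can be sketched as
   follows.  The sketch assumes the BitString and the set of
   BitPositions populated on the BFR are held as Python sets; the
   function name bitstring_for_copy is hypothetical.

```python
from typing import Set


def bitstring_for_copy(bitstring: Set[int], populated: Set[int],
                       bp: int, dnr: bool) -> Set[int]:
    """BitString carried by the copy made for the adjacency on BitPosition bp."""
    # Normal behaviour: reset every BitPosition populated on this BFR.
    outgoing = bitstring - populated
    if dnr and bp in bitstring:
        # DNR: this adjacency's own BitPosition stays set in its copy only.
        outgoing = outgoing | {bp}
    return outgoing
```

   For a packet with BitPositions {1, 2, 9} on a BFR that populates 1
   and 2, the copy for a DNR adjacency on BitPosition 1 carries {1, 9},
   while the copy for a non-DNR adjacency on BitPosition 2 carries {9}.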

378	3.2.2.  Forward Routed

380	   A "forward_routed" adjacency is an adjacency towards a BFR that is
381	   not a forward_connected adjacency: towards a loopback address of a
382	   BFR or towards an interface address that is non-directly connected.
383	   Forward_routed packets are forwarded via the Routing Underlay.

385	   If the Routing Underlay has multiple paths for a forward_routed
386	   adjacency, it will perform ECMP independent of BIER-TE for packets
387	   forwarded across a forward_routed adjacency.

389	   If the Routing Underlay has FRR, it will perform FRR independent of
390	   BIER-TE for packets forwarded across a forward_routed adjacency.

392	3.2.3.  ECMP

394	   The ECMP mechanisms in BIER are tied to the BIER BIFT and are
395	   therefore not directly useable with BIER-TE.  The following
396	   procedures describe ECMP for BIER-TE that we consider to be
397	   lightweight but also well manageable.  It leverages the existing
398	   entropy parameter in the BIER header to keep packets of the flows on
399	   the same path and it introduces a "seed" parameter to allow
400	   engineering traffic to be polarized or randomized across multiple
401	   hops.

403	   An "Equal Cost Multipath" (ECMP) adjacency has a list of two or more
404	   adjacencies included in it.  It copies the BIER-TE packet to one of
405	   those adjacencies based on the ECMP hash calculation.  The BIER-TE ECMP
406	   hash algorithm must select the same adjacency from that list for all
407	   packets with the same "entropy" value in the BIER-TE header if the
408	   same number of adjacencies and same seed are given as parameters.
409	   Further use of the seed parameter is explained below.
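
   A selection function with the required property could look like the
   sketch below.  The hash used here (entropy XOR seed, modulo the
   number of adjacencies) is an assumption made for illustration; this
   section only requires that the choice is deterministic for a given
   entropy, seed and number of adjacencies.  The name ecmp_select is
   hypothetical.

```python
from typing import Sequence


def ecmp_select(adjacencies: Sequence[str], entropy: int, seed: int) -> str:
    """Pick one adjacency; the same inputs always give the same choice."""
    if len(adjacencies) < 2:
        raise ValueError("an ECMP adjacency needs two or more adjacencies")
    # Assumed hash: XOR of entropy and seed, reduced to a list index.
    index = (entropy ^ seed) % len(adjacencies)
    return adjacencies[index]
```

   With three adjacencies, entropy 5 and seed 3 select index 0, while
   the same entropy with seed 0 selects index 2.  Changing the seed on a
   BFR therefore changes which member a given flow is mapped to.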

411	3.2.4.  Local Decap

413	   A "local_decap" adjacency passes a copy of the payload of the BIER-TE
414	   packet's NextProto within the BFR (IPv4/IPv6,
415	   Ethernet,...).  A local_decap adjacency turns the BFR into a BFER for
416	   matching packets.  Local_decap adjacencies require the BFER to
417	   support routing or switching for NextProto to determine how to
418	   further process the packet.

420	3.3.  Encapsulation considerations

422	   Specifications for BIER-TE encapsulation are outside the scope of
423	   this document.  This section gives explanations and guidelines.

425	   Because a BFR needs to interpret the BitString of a BIER-TE packet
426	   differently from a BIER packet, it is necessary to distinguish BIER
427	   from BIER-TE packets.  This is subject to definitions in BIER
428	   encapsulation specifications.

430	   MPLS encapsulation [RFC8296] for example assigns one label by which
431	   BFRs recognize BIER packets for every (SI,subdomain) combination.
432	   If it is desirable that every subdomain can forward only BIER or
433	   BIER-TE packets, then the label allocation could stay the same, and
434	   only the forwarding model (BIER/BIER-TE) would have to be defined per
435	   subdomain.  If it is desirable to support both BIER and BIER-TE
436	   forwarding in the same subdomain, then additional labels would need
437	   to be assigned for BIER-TE forwarding.

439	   "forward_routed" requires an encapsulation that permits unicasting
440	   BIER-TE packets to a specific interface address on a target BFR.
441	   With MPLS encapsulation, this can simply be done via a label stack
442	   with that address's label as the top label - followed by the label
443	   assigned to (SI,subdomain) - and if necessary (see above) BIER-TE.
444	   With non-MPLS encapsulation, some form of IP tunneling (IP in IP,
445	   LISP, GRE) would be required.

447	   The encapsulation used for "forward_routed" adjacencies can equally
448	   support existing advanced adjacency information such as "loose source
449	   routes" via e.g. MPLS label stacks or appropriate header extensions
450	   (e.g. for IPv6).

452	3.4.  Basic BIER-TE Forwarding Example

454	   This is a step-by-step example of basic BIER-TE forwarding.  It does
455	   not use ECMP or forward_routed adjacencies, nor does it try to
456	   minimize the number of required BitPositions for the topology.

458	               [Bier-Te Controller Host]
459	                       /   | \
460	                      v    v  v

462	           | p13   p1 |
463	           +- BFIR2 --+          |
464	           |          | p2   p6  |           LAN2
465	           |          +-- BFR3 --+           |
466	           |          |          |  p7  p11  |
467	      Src -+                     +-- BFER1 --+
468	           |          | p3   p8  |           |
469	           |          +-- BFR4 --+           +-- Rcv1
470	           |          |          |           |
471	           |          |
472	           | p14  p4  |
473	           +- BFIR1 --+          |
474	           |          +-- BFR5 --+ p10  p12  |
475	         LAN1         | p5   p9  +-- BFER2 --+
476	                                 |           +-- Rcv2
477	                                             |
478	                                             LAN3

480	          IP  |..... BIER-TE network......| IP

482	                   Figure 3: BIER-TE Forwarding Example

484	   pXX indicates the BitPosition number assigned by the BIER-TE
485	   controller host to adjacencies in the BIER-TE topology.  For example,
486	   p9 is the adjacency towards BFR5 on the LAN connecting to BFER2.

488	      BIFT BFIR2:
489	        p13: local_decap()
490	         p2: forward_connected(BFR3)

492	      BIFT BFR3:
493	         p1: forward_connected(BFIR2)
494	         p7: forward_connected(BFER1)
495	         p8: forward_connected(BFR4)

497	      BIFT BFER1:
498	        p11: local_decap()
499	         p6: forward_connected(BFR3)
500	         p8: forward_connected(BFR4)

502	             Figure 4: BIER-TE Forwarding Example Adjacencies

504	   ...and so on.

506	   Traffic needs to flow from BFIR2 towards Rcv1, Rcv2.  The controller
507	   determines it wants it to pass across the following paths:

509	                 -> BFER1 ---------------> Rcv1
510	    BFIR2 -> BFR3
511	                 -> BFR4 -> BFR5 -> BFER2 -> Rcv2

513	                Figure 5: BIER-TE Forwarding Example Paths

515	   These paths correspond to the following BitString: p2, p5, p7, p8, p10,
516	   p11, p12.

518	   This BitString is set up in BFIR2.  Multicast packets arriving at
519	   BFIR2 from Src are assigned this BitString.

521	   BFIR2 forwards based on that BitString.  It has p2 and p13 populated.
522	   Only p2 is in the BitString; it has an adjacency towards BFR3.  BFIR2
523	   resets p2 in the BitString and sends a copy towards BFR3.

525	   BFR3 sees a BitString of p5,p7,p8,p10,p11,p12.  It is only interested
526	   in p1,p7,p8.  It creates a copy of the packet to BFER1 (due to p7)
527	   and one to BFR4 (due to p8).  It resets p7, p8 before sending.

529	   BFER1 sees a BitString of p5,p10,p11,p12.  It is only interested in
530	   p6,p8,p11 and therefore considers only p11. p11 is a "local_decap"
531	   adjacency installed by the BIER-TE controller host because BFER1
532	   should pass packets to IP multicast.  The local_decap adjacency
533	   instructs BFER1 to create a copy, decapsulate it from the BIER header
534	   and pass it on to the NextProtocol, in this example IP multicast.  IP
535	   multicast will then forward the packet out to LAN2 because it did
536	   receive PIM or IGMP joins on LAN2 for the traffic.

538	   BFR4, BFR5 and BFER2 process the packet accordingly.
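
   The walk-through can be replayed with a small script.  This is only a
   sketch: it uses the three BIFTs listed in Figure 4, holds the
   BitString as a set of BitPosition numbers, and stops at any BFR whose
   BIFT is not listed in that figure.  The names BIFTS and walk are
   hypothetical.

```python
from collections import deque

# BIFTs exactly as listed in Figure 4: BitPosition -> (adjacency type, neighbor).
BIFTS = {
    "BFIR2": {13: ("local_decap", None), 2: ("forward_connected", "BFR3")},
    "BFR3": {1: ("forward_connected", "BFIR2"),
             7: ("forward_connected", "BFER1"),
             8: ("forward_connected", "BFR4")},
    "BFER1": {11: ("local_decap", None),
              6: ("forward_connected", "BFR3"),
              8: ("forward_connected", "BFR4")},
}


def walk(start, bitstring):
    """Return a list of (BFR, BitString seen, action) tuples."""
    trace = []
    queue = deque([(start, frozenset(bitstring))])
    while queue:
        node, bits = queue.popleft()
        bift = BIFTS.get(node)
        if bift is None:
            trace.append((node, sorted(bits), "BIFT not listed in Figure 4"))
            continue
        local = sorted(bp for bp in bits if bp in bift)
        outgoing = bits - set(local)  # reset before sending any copy
        for bp in local:
            kind, neighbor = bift[bp]
            if kind == "local_decap":
                trace.append((node, sorted(bits), "decap"))
            else:
                trace.append((node, sorted(bits), "copy to " + neighbor))
                queue.append((neighbor, outgoing))
    return trace


trace = walk("BFIR2", {2, 5, 7, 8, 10, 11, 12})
```

   The trace shows BFIR2 copying to BFR3, BFR3 copying to BFER1 and
   BFR4 with a BitString of p5,p10,p11,p12, and BFER1 decapsulating due
   to p11.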

540	3.5.  Forwarding comparison with BIER

542	   Forwarding of BIER-TE is designed to allow common forwarding hardware
543	   with BIER.  In fact, one of the main goals of this document is to
544	   encourage the building of forwarding hardware that can not only
545	   support BIER, but also BIER-TE - to allow experimentation with BIER-
546	   TE and support building of BIER-TE control plane code.

548	   The pseudocode in Section 6 shows how existing BIER/BIFT forwarding
549	   can be amended to support basic BIER-TE forwarding, by using BIER
550	   BIFT's F-BM.  Only the masking of bits that BIER performs to avoid
551	   duplicates must be skipped when forwarding is for BIER-TE.

553	   Whether to use BIER or BIER-TE forwarding can simply be a configured
554	   choice per subdomain and accordingly be set up by a BIER-TE
555	   controller host.  The BIER packet encapsulation [RFC8296] too can be
556	   reused without changes except that the currently defined BIER-TE ECMP
557	   adjacency does not leverage the entropy field so that field would be
558	   unused when BIER-TE forwarding is used.

560	3.6.  Requirements

562	   Basic BIER-TE forwarding MUST support configuring subdomains to use
563	   basic BIER-TE forwarding rules (instead of BIER).  With basic BIER-TE
564	   forwarding, every bit MUST support having zero or one adjacency.  It
565	   MUST support the adjacency types forward_connected without DNR flag,
566	   forward_routed and local_decap.  All other BIER-TE forwarding
567	   features are optional.  These basic BIER-TE requirements make BIER-TE
568	   forwarding exactly the same as BIER forwarding with the exception of
569	   skipping the aforementioned F-BM masking on egress.

571	   BIER-TE forwarding SHOULD support the DNR flag, as this is highly
572	   useful to save bits in rings (see Section 4.6).

574	   BIER-TE forwarding MAY support more than one adjacency on a bit and
575	   ECMP adjacencies.  The importance of ECMP adjacencies is unclear when
576	   traffic engineering is used because it may be more desirable to
577	   explicitly steer traffic across non-ECMP paths to make per-path
578	   traffic calculation easier for controllers.  Having more than one
579	   adjacency for a bit allows further savings of bits in hub&spoke
580	   scenarios, but unlike rings it is less "natural" to flood traffic
581	   across multiple links unconditionally.  Both ECMP and multiple
582	   adjacencies are forwarding plane features that should be possible to
583	   support later when needed as they do not impact the basic BIER-TE
584	   replication loop.  This is true because there is no inter-copy
585	   depency through resetting of F-BM as in BIER.

587	4.  BIER-TE Controller Host BitPosition Assignments

589	   This section describes how the BIER-TE controller host can use the
590	   different BIER-TE adjacency types to define the BitPositions of a
591	   BIER-TE domain.

593	   Because the size of the BitString limits the size of the BIER-TE
594	   domain, many of the options described exist to support larger
595	   topologies with fewer BitPositions (4.1, 4.3, 4.4, 4.5, 4.6, 4.7,
596	   4.8).

598	4.1.  P2P Links

600	   Each p2p link in the BIER-TE domain is assigned one unique
601	   BitPosition with a forward_connected adjacency pointing to the
602	   neighbor on the p2p link.

604	4.2.  BFER

606	   Every BFER is given a unique BitPosition with a local_decap
607	   adjacency.

609	4.3.  Leaf BFERs

611	   Leaf BFERs are BFERs where incoming BIER-TE packets never need to be
612	   forwarded to another BFR but are only sent to the BFER to exit the
613	   BIER-TE domain.  For example, in networks where PEs are spokes
614	   connected to P routers, those PEs are Leaf BFERs unless there is a
615	   U-turn between two PEs.

617	   All leaf BFERs in a BIER-TE domain can share a single BitPosition.
618	   This is possible because the BitPosition for the adjacency to reach
619	   the BFER can be used to distinguish whether or not packets should
620	   reach the BFER.

622	   This optimization will not work if an upstream interface of the BFER
623	   is using a BitPosition optimized as described in the following two
624	   sections (LAN, Hub and Spoke).

626	4.4.  LANs

628	   In a LAN, the adjacency to each neighboring BFR on the LAN is given a
629	   unique BitPosition.  The adjacency of this BitPosition is a
630	   forward_connected adjacency towards the BFR and this BitPosition is
631	   populated into the BIFT of all the other BFRs on that LAN.

633	            BFR1
634	             |p1
635	      LAN1-+-+---+-----+
636	          p3|  p4|   p2|
637	          BFR3 BFR4  BFR7

639	                           Figure 6: LAN Example

641	   If bandwidth on the LAN is not an issue and most BIER-TE traffic
642	   should be copied to all neighbors on a LAN, then BitPositions can be
643	   saved by assigning just a single BitPosition to the LAN and
644	   populating that BitPosition in the BIFT of each BFR on the LAN with
645	   a list of forward_connected adjacencies to all other neighbors on the
646	   LAN.

648	   This optimization does not work in the face of BFRs redundantly
649	   connected to more than one LAN with this optimization because these
650	   BFRs would receive duplicates and forward those duplicates into the
651	   opposite LANs.  Adjacencies of such BFRs into their LANs still need a
652	   separate BitPosition.

654	4.5.  Hub and Spoke

656	   In a setup with a hub and multiple spokes connected via separate p2p
657	   links to the hub, all p2p links can share the same BitPosition.  The
658	   BitPosition on the hub's BIFT is set up with a list of
659	   forward_connected adjacencies, one for each Spoke.

661	   This option is similar to the BitPosition optimization in LANs:
662	   Redundantly connected spokes need their own BitPositions.

664	4.6.  Rings

666	   In L3 rings, instead of assigning a single BitPosition for every p2p
667	   link in the ring, it is possible to save BitPositions by setting the
668	   "Do Not Reset" (DNR) flag on forward_connected adjacencies.

670	   For the ring shown in the following picture, a single BitPosition
671	   will suffice to forward traffic entering the ring at BFRa or BFRb all
672	   the way up to BFR1:

674	   On BFRa, BFRb, BFR30,... BFR3, the BitPosition is populated with a
675	   forward_connected adjacency pointing to the clockwise neighbor on the
676	   ring and with DNR set.  On BFR2, the adjacency also points to the
677	   clockwise neighbor BFR1, but without DNR set.

679	   Handling DNR this way ensures that copies forwarded from any BFR in
680	   the ring to a BFR outside the ring will not have the ring BitPosition
681	   set, therefore minimizing the chance to create loops.

683	                  v        v
684	                  |        |
685	           L1     |   L2   |   L3
686	       /-------- BFRa ---- BFRb --------------------\
687	       |                                            |
688	       \- BFR1 - BFR2 - BFR3 - ... - BFR29 - BFR30 -/
689	           |      |    L4               |      |
690	        p33|                         p15|
691	           BFRd                       BFRc

693	                          Figure 7: Ring Example

695	   Note that this example only permits packets to enter the ring at
696	   BFRa and BFRb, and that packets will always travel clockwise.  If
697	   packets should be allowed to enter the ring at any ring BFR, then one
698	   would have to use two ring BitPositions: one for clockwise, one for
699	   counterclockwise.

701	   Both would be set up to stop rotating on the same link, e.g., L1.
702	   When the ingress ring BFR creates the clockwise copy, it will reset
703	   the counterclockwise BitPosition because the DNR bit only applies to
704	   the bit for which the replication is done.  Likewise for the
705	   clockwise BitPosition for the counterclockwise copy.  As a result,
706	   the ring ingress BFR will send a copy in both directions, serving
707	   BFRs on either side of the ring up to L1.
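
	   The single-BitPosition ring described above can be sketched as a
	   small simulation.  This is only an illustration under stated
	   assumptions: the ring of Figure 7 is shortened to BFRa, BFRb, BFR30,
	   BFR29, BFR2 and BFR1, all BitPosition numbers are made up, and BFRc
	   and BFRd are modeled as leaf BFERs sharing one hypothetical
	   local_decap BitPosition.  The forward() helper is not taken from
	   this document.

```python
# Sketch of ring forwarding with one BitPosition and the DNR flag.
# Node names follow Figure 7 (shortened ring); bit numbers are hypothetical.
RING_BIT, P15, P33, DECAP = 1, 2, 3, 4   # 1-based BitPositions


def bit(pos):
    """Return the mask for a 1-based BitPosition."""
    return 1 << (pos - 1)


# bift[node][bit position] = list of (next_hop, dnr) adjacencies;
# next_hop None stands for a local_decap adjacency.
bift = {
    "BFRa":  {RING_BIT: [("BFRb", True)]},
    "BFRb":  {RING_BIT: [("BFR30", True)]},
    "BFR30": {RING_BIT: [("BFR29", True)]},
    "BFR29": {RING_BIT: [("BFR2", True)], P15: [("BFRc", False)]},
    "BFR2":  {RING_BIT: [("BFR1", False)]},   # last ring hop: DNR not set
    "BFR1":  {P33: [("BFRd", False)]},
    "BFRc":  {DECAP: [(None, False)]},
    "BFRd":  {DECAP: [(None, False)]},
}


def forward(node, bitstring, trace):
    """Replicate a packet at node; record (node, next_hop, bitstring)."""
    adjacent = 0
    for pos in bift[node]:
        adjacent |= bit(pos)
    to_process = bitstring & adjacent
    remaining = bitstring & ~adjacent          # adjacency bits are reset
    for pos, adjacencies in sorted(bift[node].items()):
        if not to_process & bit(pos):
            continue
        for next_hop, dnr in adjacencies:
            # With DNR, the bit being replicated on stays set in the copy.
            copy = remaining | (bit(pos) if dnr else 0)
            if next_hop is None:
                trace.append((node, "decap", copy))
            else:
                trace.append((node, next_hop, copy))
                forward(next_hop, copy, trace)


trace = []
forward("BFRa", bit(RING_BIT) | bit(P15) | bit(P33) | bit(DECAP), trace)
for hop in trace:
    print(hop)
```

	   In the printed trace, the ring BitPosition stays set on the hops
	   that use DNR, while the copy from BFR29 to BFRc and the copy from
	   BFR2 to BFR1 no longer carry it.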

709	4.7.  Equal Cost MultiPath (ECMP)

711	   The ECMP adjacency allows using just one BP per link bundle between
712	   two BFRs instead of one BP for each p2p member link of that link
713	   bundle.  In the following picture, one BP is used across L1, L2 and
714	   L3, and BFR1 and BFR2 each have one ECMP adjacency for that BP:
715	                --L1-----
716	           BFR1 --L2----- BFR2
717	                --L3-----

719	     BIFT entry in BFR1:
720	     ------------------------------------------------------------------
721	     | Index |  Adjacencies                                           |
722	     ==================================================================
723	     | 0:6   |  ECMP({L1-to-BFR2,L2-to-BFR2,L3-to-BFR2}, seed)        |
724	     ------------------------------------------------------------------

726	     BIFT entry in BFR2:
727	     ------------------------------------------------------------------
728	     | Index |  Adjacencies                                           |
729	     ==================================================================
730	     | 0:6   |  ECMP({L1-to-BFR1,L2-to-BFR1,L3-to-BFR1}, seed)        |
731	     ------------------------------------------------------------------

733	                          Figure 8: ECMP Example

735	   In the following example, all traffic from BFR1 towards BFR10 is
736	   intended to be ECMP load split equally across the topology.  This
737	   example is not meant as a likely setup, but to illustrate that ECMP
738	   can be used to share BPs not only across link bundles, and it
739	   explains the use of the seed parameter.

741	                    BFR1
742	                  /     \
743	                 /L11    \L12
744	             BFR2         BFR3
745	            /    \       /    \
746	           /L21   \L22  /L31   \L32
747	          BFR4  BFR5   BFR6  BFR7
748	           \      /     \      /
749	            \    /       \    /
750	             BFR8         BFR9
751	                 \       /
752	                  \     /
753	                   BFR10

755	     BIFT entry in BFR1:
756	     ------------------------------------------------------------------
757	     | 0:6   |  ECMP({L11-to-BFR2,L12-to-BFR3}, seed)                 |
758	     ------------------------------------------------------------------

760	     BIFT entry in BFR2:
761	     ------------------------------------------------------------------
762	     | 0:6   |  ECMP({L21-to-BFR4,L22-to-BFR5}, seed)                 |
763	     ------------------------------------------------------------------

765	     BIFT entry in BFR3:
766	     ------------------------------------------------------------------
767	     | 0:6   |  ECMP({L31-to-BFR6,L32-to-BFR7}, seed)                 |
768	     ------------------------------------------------------------------

770	                      Figure 9: Polarization Example

772	   With the setup of ECMP in the above topology and the same seed on
773	   all BFRs, traffic would not be equally load-split.  Instead, links
774	   L22 and L31 would see no traffic at all: BFR2 will only see traffic
775	   from BFR1 for which the ECMP hash in BFR1 selected the first
776	   adjacency in a list of 2 adjacencies: link L11-to-BFR2.  When
777	   forwarding in BFR2 again performs an ECMP with two adjacencies on
778	   that subset of traffic, it will again select the first of its two
779	   adjacencies: L21-to-BFR4.  Therefore L22 and BFR5 see no traffic.

781	   To resolve this issue, the ECMP adjacency on BFR1 simply needs to be
782	   set up with a different seed than the ECMP adjacencies on BFR2/BFR3.

784	   This issue is called polarization.  It depends on the ECMP hash.  It
785	   is possible to build ECMP that does not have polarization, for
786	   example by taking entropy from the actual adjacency members into
787	   account, but that can make it harder to achieve evenly balanced load-
788	   splitting on all BFR without making the ECMP hash algorithm
789	   potentially too complex for fast forwarding in the BFRs.
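
	   The polarization effect and the effect of the seed can be
	   reproduced with a toy model of the BFR1 to BFR2 branch of Figure 9.
	   This is a sketch under assumptions: ecmp_hash() below is a
	   hypothetical stand-in built on SHA-256 (this document does not
	   define the hash algorithm), flows are represented only by an
	   integer entropy value, and the seed values 7 and 8 are arbitrary.

```python
import hashlib


def ecmp_hash(n, entropy, seed):
    """Toy ECMP hash (an assumption, not an algorithm from the draft)."""
    digest = hashlib.sha256(f"{seed}:{entropy}".encode()).digest()
    return int.from_bytes(digest[:4], "big") % n


def link_load(seed_bfr1, seed_bfr2, flows=range(256)):
    """Count flows per link on the BFR1 -> BFR2 -> {BFR4, BFR5} branch."""
    load = {"L11": 0, "L12": 0, "L21": 0, "L22": 0}
    for entropy in flows:
        if ecmp_hash(2, entropy, seed_bfr1) == 0:
            # BFR1 picked its first adjacency, L11-to-BFR2.
            load["L11"] += 1
            # BFR2 now chooses between L21-to-BFR4 and L22-to-BFR5.
            second = ecmp_hash(2, entropy, seed_bfr2)
            load["L21" if second == 0 else "L22"] += 1
        else:
            load["L12"] += 1
    return load


same = link_load(seed_bfr1=7, seed_bfr2=7)
different = link_load(seed_bfr1=7, seed_bfr2=8)
print("same seed:     ", same)
print("different seed:", different)
```

	   With identical seeds, every flow that reaches BFR2 hashes to the
	   first adjacency again, so L22 stays empty.  With different seeds,
	   the second decision is decoupled from the first one.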

791	4.8.  Routed adjacencies

793	4.8.1.  Reducing BitPositions

795	   Routed adjacencies can reduce the number of BitPositions required
796	   when the traffic engineering requirement is not hop-by-hop explicit
797	   path selection, but loose-hop selection.

799	              ...............             ...............
800	       BFR1--... Redundant ...--L1-- BFR2... Redundant ...---
801	          \--... Network   ...--L2--/    ... Network   ...---
802	       BFR4--... Segment 1 ...--L3-- BFR3... Segment 2 ...---
803	              ...............             ...............

805	                   Figure 10: Routed Adjacencies Example

807	   Assume the requirement in above network is to explicitly engineer
808	   paths such that specific traffic flows are passed from segment 1 to
809	   segment 2 via link L1 (or via L2 or via L3).

811	   To achieve this, BFR1 and BFR4 are set up with a forward_routed
812	   adjacency BitPosition towards an address of BFR2 on link L1 (or link
813	   L2, or of BFR3 on link L3).

815	   For paths to be engineered through a specific node BFR2 (or BFR3),
816	   BFR1 and BFR4 are set up with a forward_routed adjacency
817	   BitPosition towards a loopback address of BFR2 (or BFR3).

819	4.8.2.  Supporting nodes without BIER-TE

821	   Routed adjacencies also enable incremental deployment of BIER-TE.
822	   Only the nodes through which BIER-TE traffic needs to be steered -
823	   with or without replication - need to support BIER-TE.  Where they
824	   are not directly connected to each other, forward_routed adjacencies
825	   are used to pass over non BIER-TE enabled nodes.

827	5.  Avoiding loops and duplicates

829	5.1.  Loops

831	   Whenever BIER-TE creates a copy of a packet, the BitString of that
832	   copy will have all BitPositions cleared that are associated with
833	   adjacencies in the BFR.  This inhibits looping of packets.  The only
834	   exceptions are adjacencies with DNR set.

836	   With DNR set, looping can happen.  Consider in the ring picture that
837	   link L4 from BFR3 is plugged into the L1 interface of BFRa.  This
838	   creates a loop where the ring's clockwise BitPosition is never reset
839	   for copies of the packets traveling clockwise around the ring.

841	   To inhibit looping in the face of such physical misconfiguration,
842	   only forward_connected adjacencies are permitted to have DNR set, and
843	   the link layer destination address of the adjacency (e.g., MAC
844	   address) protects against closing the loop.  Link layers without port
845	   unique link layer addresses should not be used with the DNR flag set.

847	5.2.  Duplicates

849	   Duplicates happen when the topology of the BitString is not a tree
850	   but redundantly connects BFRs with each other.  The controller must
851	   therefore ensure that it only creates BitStrings that are trees in
852	   the topology.
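
	   A controller-side check for the tree property could look like the
	   following sketch.  It assumes that the controller has already
	   translated the bits of a BitString into directed (from, to) hops;
	   that translation, the is_tree() name and the node names are
	   illustrative assumptions and not defined by this document.

```python
from collections import Counter


def is_tree(bfir, edges):
    """Return True if every BFR reachable from bfir over the directed
    edges receives exactly one copy, i.e. the selected hops form a tree."""
    children = {}
    for src, dst in edges:
        children.setdefault(src, []).append(dst)
    arrivals = Counter({bfir: 1})
    stack = [bfir]
    while stack:
        node = stack.pop()
        for nxt in children.get(node, []):
            arrivals[nxt] += 1
            if arrivals[nxt] > 1:
                # A second copy would arrive here: duplicate or loop.
                return False
            stack.append(nxt)
    return True


tree = [("BFR1", "BFR2"), ("BFR1", "BFR3"), ("BFR2", "BFR4")]
diamond = tree + [("BFR3", "BFR4")]
print(is_tree("BFR1", tree))
print(is_tree("BFR1", diamond))
```

	   The second call reports the redundant path via BFR3 that would
	   deliver a duplicate to BFR4.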

854	   When links are incorrectly physically re-connected before the
855	   controller updates BitStrings in BFIRs, duplicates can happen.  Like
856	   loops, these can be inhibited by link layer addressing in
857	   forward_connected adjacencies.

859	   If interface or loopback addresses used in forward_routed adjacencies
860	   are moved from one BFR to another, duplicates can equally happen.
861	   Such re-addressing operations must be coordinated with the
862	   controller.

864	6.  BIER-TE Forwarding Pseudocode

866	   The following simplified pseudocode for BIER-TE forwarding uses the
867	   BIER forwarding pseudocode of [RFC8279], section 6.5 with the one
868	   modification necessary to support basic BIER-TE forwarding.  Like the
869	   BIER forwarding pseudocode, for simplicity it hides the details
870	   of the adjacency processing inside PacketSend() which can be
871	   forward_connected, forward_routed or local_decap.

873	      void ForwardBitMaskPacket_withTE (Packet)
874	      {
875	          SI=GetPacketSI(Packet);
876	          Offset=SI*BitStringLength;
877	          for (Index = GetFirstBitPosition(Packet->BitString); Index ;
878	               Index = GetNextBitPosition(Packet->BitString, Index)) {
879	              F-BM = BIFT[Index+Offset]->F-BM;
880	              if (!F-BM) continue;
881	              BFR-NBR = BIFT[Index+Offset]->BFR-NBR;
882	              PacketCopy = Copy(Packet);
883	              PacketCopy->BitString &= F-BM;                  [2]
884	              PacketSend(PacketCopy, BFR-NBR);
885	              // The following must not be done for BIER-TE:
886	              // Packet->BitString &= ~F-BM;                  [1]
887	          }
888	      }

890	            Figure 11: Simplified BIER-TE Forwarding Pseudocode

892	   The difference is that in BIER-TE, step [1] must not be performed.

894	   In BIER, this step is necessary to avoid duplicates when two or more
895	   BFER are reachable via the same neighbor.  The F-BM of all those BFER
896	   bits will indicate each other's bits, and step [1] will reset all
897	   these bits on the first copy made for the first of those BFER bits
898	   set in the BitString, hence skipping any further copies to that
899	   neighbor.

901	   Whereas in BIER, the F-BM of bits toward a specific neighbor contain
902	   only the bits of those BFER destined to be forwarded across this
903	   neighbor, in BIER-TE the F-BM for a neighbor needs to have all bits
904	   set except all those bits that are actual (non-empty) adjacencies of
905	   this BFR.  Step [2] will reset those adjacency bits to avoid loops,
906	   but all the other bits that are not adjacencies of this BFR need to
907	   stay untouched by [2] so that they can be processed by further BFR
908	   along the path.  If [1] was performed as in BIER, then those non-
909	   adjacency bits would erroneously get reset during replication.

911	   To support the DNR (Do Not Reset) flag of forward_connected()
912	   adjacencies, the F-BM of such an adjacency must also have the
913	   adjacency's own bit set, so that for the packet copy made for this
914	   adjacency the bit stays on, whereas it will not be set in the F-BM of
915	   other bits so that it will be reset for any other packet copy made.
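
	   The F-BM rules above can be illustrated with a small sketch.  It
	   assumes an 8-bit BitString, 1-based BitPositions and a single
	   adjacency per bit; build_fbm(), forward() and the bier_step1 switch
	   are hypothetical names used only to contrast BIER-TE behavior with
	   what would happen if step [1] were applied.

```python
BITSTRING_LENGTH = 8
ALL_BITS = (1 << BITSTRING_LENGTH) - 1


def bit(pos):
    """Return the mask for a 1-based BitPosition."""
    return 1 << (pos - 1)


def build_fbm(adjacencies):
    """adjacencies: {bit position: (neighbor, dnr)} for one BFR.
    Returns {bit position: F-BM} following the rules in the text."""
    adjacent_bits = 0
    for pos in adjacencies:
        adjacent_bits |= bit(pos)
    fbm = {}
    for pos, (_neighbor, dnr) in adjacencies.items():
        mask = ALL_BITS & ~adjacent_bits   # all bits except adjacency bits
        if dnr:
            mask |= bit(pos)               # DNR: keep the adjacency's own bit
        fbm[pos] = mask
    return fbm


def forward(bitstring, adjacencies, bier_step1=False):
    """Return [(neighbor, bitstring of the copy)] as in Figure 11."""
    fbm = build_fbm(adjacencies)
    copies = []
    for pos in range(1, BITSTRING_LENGTH + 1):
        if bitstring & bit(pos) and pos in fbm:
            neighbor, _dnr = adjacencies[pos]
            copies.append((neighbor, bitstring & fbm[pos]))   # step [2]
            if bier_step1:
                bitstring &= ~fbm[pos]     # step [1], wrong for BIER-TE
    return copies


adjacencies = {1: ("BFR2", False), 2: ("BFR3", False), 3: ("BFR4", True)}
packet = 0b00110111                         # bits 1, 2, 3, 5 and 6 set
te_copies = forward(packet, adjacencies)
bier_like = forward(packet, adjacencies, bier_step1=True)
print(te_copies)
print(bier_like)
```

	   Without step [1], every copy keeps the non-adjacency bits 5 and 6.
	   With step [1], those bits are already gone when the second and
	   third copies are made.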

917	   Eliminating the need to perform [1] also makes processing of bits in
918	   the BIER-TE bitstring independent of processing other bits, which may
919	   also simplify forwarding plane implementations.

921	   The following pseudocode is more comprehensive:

923	   o  This pseudocode eliminates per-bit F-BM, therefore reducing state
924	      by BitStringLength^2*SI and eliminating the need for per-packet-
925	      copy masking operation except for adjacencies with DNR flag set:

927	      *  AdjacentBits[SI] are bits with a non-empty list of adjacencies.
928	         This can be computed whenever the BIER-TE controller host
929	         updates the adjacencies.

931	      *  Only the AdjacentBits need to be examined in the loop for
932	         packet copies.

934	      *  The packet's BitString is masked with those AdjacentBits on
935	         ingress to avoid packet looping.

937	   o  The code loops over the adjacencies because there may be more than
938	      one adjacency for a bit.

940	   o  When an adjacency has the DNR bit, the bit is set in the packet
941	      copy (to save bits in rings for example).

943	   o  The ECMP adjacency is shown.  Its parameters are a
944	      ListOfAdjacencies from which one is picked.

946	   o  The forward_connected, forward_routed, local_decap adjacencies are
947	      shown with their parameters.

949	     void ForwardBitMaskPacket_withTE (Packet)
950	     {
951	         SI=GetPacketSI(Packet);
952	         Offset=SI*BitStringLength;
953	         AdjacentBitstring = Packet->BitString & AdjacentBits[SI];
954	         Packet->BitString &= ~AdjacentBits[SI];
955	         for (Index = GetFirstBitPosition(AdjacentBitstring); Index ;
956	              Index = GetNextBitPosition(AdjacentBitstring, Index)) {
957	             foreach adjacency BIFT[Index+Offset] {
958	                 if(adjacency == ECMP(ListOfAdjacencies, seed) ) {
959	                     I = ECMP_hash(sizeof(ListOfAdjacencies),
960	                                   Packet->Entropy, seed);
961	                     adjacency = ListOfAdjacencies[I];
962	                 }
963	                 PacketCopy = Copy(Packet);
964	                 switch(adjacency) {
965	                     case forward_connected(interface,neighbor,DNR):
966	                         if(DNR)
967	                             PacketCopy->BitString |= 1<<(Index-1);
968	                         SendToL2Unicast(PacketCopy,interface,neighbor);

970	                     case forward_routed([VRF],neighbor):
971	                         SendToL3(PacketCopy,[VRF,]neighbor);

973	                     case local_decap([VRF],neighbor):
974	                         DecapBierHeader(PacketCopy);
975	                         PassTo(PacketCopy,[VRF,]Packet->NextProto);
976	                 }
977	             }
978	         }
979	     }

981	                 Figure 12: BIER-TE Forwarding Pseudocode
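
	   For readers who want to experiment, the forwarding steps of
	   Figure 12 can be approximated in Python as follows.  This is a
	   sketch under assumptions: the adjacency classes, the forward()
	   signature and the returned tuples are invented for illustration,
	   a single SI is modeled, actual packet transmission is replaced by
	   collecting the copies in a list, and ecmp_hash() is a placeholder
	   because this document does not define the hash.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ForwardConnected:
    interface: str
    neighbor: str
    dnr: bool = False


@dataclass(frozen=True)
class ForwardRouted:
    neighbor: str


@dataclass(frozen=True)
class LocalDecap:
    pass


@dataclass(frozen=True)
class Ecmp:
    members: tuple
    seed: int


def ecmp_hash(n, entropy, seed):
    """Placeholder hash; not an algorithm defined by the draft."""
    digest = hashlib.sha256(f"{seed}:{entropy}".encode()).digest()
    return int.from_bytes(digest[:4], "big") % n


def forward(bitstring, entropy, bift):
    """bift: {1-based bit position: [adjacency, ...]} for one SI.
    Returns a list of (action, neighbor, bitstring of the copy)."""
    adjacent_bits = 0
    for pos in bift:
        adjacent_bits |= 1 << (pos - 1)
    adjacent_in_packet = bitstring & adjacent_bits
    bitstring &= ~adjacent_bits            # reset adjacency bits on ingress
    out = []
    for pos in sorted(bift):
        if not adjacent_in_packet & (1 << (pos - 1)):
            continue
        for adjacency in bift[pos]:
            if isinstance(adjacency, Ecmp):
                index = ecmp_hash(len(adjacency.members), entropy,
                                  adjacency.seed)
                adjacency = adjacency.members[index]
            copy = bitstring
            if isinstance(adjacency, ForwardConnected):
                if adjacency.dnr:
                    copy |= 1 << (pos - 1)  # DNR: keep this bit in the copy
                out.append(("l2", adjacency.neighbor, copy))
            elif isinstance(adjacency, ForwardRouted):
                out.append(("l3", adjacency.neighbor, copy))
            elif isinstance(adjacency, LocalDecap):
                out.append(("decap", None, copy))
    return out


bift = {
    1: [ForwardConnected("if1", "BFR2", dnr=True)],
    2: [ForwardRouted("BFR9")],
    3: [LocalDecap()],
    4: [Ecmp((ForwardConnected("L1", "BFR5"),
              ForwardConnected("L2", "BFR5")), seed=1)],
}
result = forward(0b10111, entropy=42, bift=bift)
print(result)
```

	   In this run, bit 5 is not an adjacency of the BFR and therefore
	   survives in every copy, while bit 1 is set again only in the copy
	   made for the DNR adjacency.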

983	7.  Managing SI, subdomains and BFR-ids

985	   When the number of bits required to represent the necessary hops in
986	   the topology and BFER exceeds the supported bitstring length,
987	   multiple SI and/or subdomains must be used.  This section discusses
988	   how.

990	   BIER-TE forwarding does not require the concept of BFR-id, but
991	   routing underlay, flow overlay and BIER headers may.  This section
992	   also discusses how BFR-id can be assigned to BFIR/BFER for BIER-TE.

994	7.1.  Why SI and sub-domains

996	   For BIER and BIER-TE forwarding, the most important result of using
997	   multiple SI and/or subdomains is the same: Packets that need to be
998	   sent to BFER in different SI or subdomains require different BIER
999	   packets, each one with a bitstring for a different (SI,subdomain)
1000	   combination.  Each such bitstring uses one bitstring length sized SI
1001	   block in the BIFT of the subdomain.  We call this a BIFT:SI (block).

1003	   For BIER and BIER-TE forwarding itself there is also no difference
1004	   whether different SI and/or sub-domains are chosen, but SI and
1005	   subdomain have different purposes in the BIER architecture shared by
1006	   BIER-TE.  This impacts how operators are managing them and how
1007	   especially flow overlays will likely use them.

1009	   By default, every possible BFIR/BFER in a BIER network would likely
1010	   be given a BFR-id in subdomain 0 (unless there are > 64k BFIR/BFER).

1012	   If there are different flow services (or service instances) requiring
1013	   replication to different subsets of BFER, then it will likely not be
1014	   possible to achieve the best replication efficiency for all of these
1015	   service instances via subdomain 0.  Ideal replication efficiency for
1016	   N BFER exists in a subdomain if they are split over not more than
1017	   ceiling(N/bitstring-length) SI.
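
	   As a small worked example of the ceiling(N/bitstring-length) bound,
	   the following sketch computes the minimum number of SI for a given
	   number of BFER.  The function name and the sample values are
	   illustrative assumptions.

```python
def ideal_si_count(num_bfer, bitstring_length):
    """Smallest number of SI that can hold num_bfer BFER bits."""
    # Integer form of ceiling(num_bfer / bitstring_length).
    return -(-num_bfer // bitstring_length)


for n in (100, 256, 257, 1000):
    print(n, ideal_si_count(n, 256))
```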

1019	   If service instances justify additional BIER:SI state in the network,
1020	   additional subdomains will be used: BFIR/BFER are assigned BFR-id in
1021	   those subdomains and each service instance is configured to use the
1022	   most appropriate subdomain.  This results in improved replication
1023	   efficiency for different services.

1025	   Even if creation of subdomains and assignment of BFR-id to BFIR/BFER
1026	   in those subdomains is automated, it is not expected that individual
1027	   service instances can deal with BFER in different subdomains.  A
1028	   service instance may only support configuration of a single subdomain
1029	   it should rely on.

1031	   To be able to easily reuse (and modify as little as possible)
1032	   existing BIER procedures including flow-overlay and routing underlay,
1033	   when BIER-TE forwarding is added, we therefore reuse SI and subdomain
1034	   logically in the same way as they are used in BIER: All necessary
1035	   BFIR/BFER for a service use a single BIER-TE BIFT and are split
1036	   across as many SI as necessary (see below).  Different services may
1037	   use different subdomains that primarily exist to provide more
1038	   efficient replication (and for BIER-TE desirable traffic engineering)
1039	   for different subsets of BFIR/BFER.

1041	7.2.  Bit assignment comparison BIER and BIER-TE

1043	   In BIER, bitstrings only need to carry bits for BFER, which leads to
1044	   the model that BFR-ids map 1:1 to each bit in a bitstring.

1046	   In BIER-TE, bitstrings need to carry bits to indicate not only the
1047	   receiving BFER but also the intermediate hops/links across which the
1048	   packet must be sent.  The maximum number of BFER that can be
1049	   supported in a single bitstring or BIFT:SI depends on the number of
1050	   bits necessary to represent the desired topology between them.

1052	   The topology is "desired" because it depends on the physical topology
1053	   and on the desire of the operator to allow for explicit traffic
1054	   engineering across every single hop (which requires more bits), or
1055	   to reduce the number of required bits by exploiting optimizations
1056	   such as unicast (forward_routed), ECMP or flood (DNR) over
1057	   "uninteresting" sub-parts of the topology, e.g., parts where
1058	   different trees do not need different paths for traffic engineering.

1060	   The share of bits needed to describe the topology in a BIFT:SI can
1061	   therefore easily be as low as 20% or as high as 80%.  The higher the
1062	   percentage, the higher the likelihood that those topology bits are
1063	   not just BIER-TE overhead without additional benefit, but instead
1064	   allow expressing the desired traffic-engineering
1065	   alternatives.

1067	7.3.  Using BFR-id with BIER-TE

1069	   Because there is no 1:1 mapping between bits in the bitstring and
1070	   BFER, BIER-TE can not simply rely on the BIER 1:1 mapping between
1071	   bits in a bitstring and BFR-id.

1073	   In BIER, automatic schemes could assign all possible BFR-ids
1074	   sequentially to BFERs.  This will not work in BIER-TE.  In BIER-TE,
1075	   the operator or BIER-TE controller host has to determine a BFR-id for
1076	   each BFER in each required subdomain.  The BFR-id may or may not have
1077	   a relationship with a bit in the bitstring.  Suggestions are detailed
1078	   below.  Once determined, the BFR-id can then be configured on the
1079	   BFER and used by flow overlay, routing underlay and the BIER header
1080	   almost the same as the BFR-id in BIER.

1082	   The one exception is application/flow-overlays that automatically
1083	   calculate the bitstring(s) of BIER packets by converting BFR-id to
1084	   bits.  In BIER-TE, this operation can be done in two ways:

1086	   "Independent branches": For a given application or (set of) trees,
1087	   the branches from a BFIR to every BFER are independent of the
1088	   branches to any other BFER.  For example, shortest path trees have
1089	   independent branches.

1091	   "Interdependent branches": When a BFER is added or deleted from a
1092	   particular distribution tree, branches to other BFER still in the
1093	   tree may need to change.  Steiner trees are examples of trees with
1094	   interdependent branches.

1096	   If "independent branches" are sufficient, the BIER-TE controller host
1097	   can provide to such applications for every BFR-id a SI:bitstring with
1098	   the BIER-TE bits for the branch towards that BFER.  The application
1099	   can then independently calculate the SI:bitstring for all desired
1100	   BFER by OR'ing their bitstrings.
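
	   The OR'ing step for independent branches can be sketched as
	   follows.  The branch table is invented sample data standing in for
	   what a BIER-TE controller host might hand to an application; the
	   combine() name and the data layout are assumptions.

```python
def combine(branches, bfr_ids):
    """branches: {bfr_id: (si, bitstring)} as provided per BFER.
    Returns {si: bitstring}, OR'ing all branches that share an SI."""
    result = {}
    for bfr_id in bfr_ids:
        si, bitstring = branches[bfr_id]
        result[si] = result.get(si, 0) | bitstring
    return result


# Hypothetical per-BFER branches; overlapping bits are shared hops.
branches = {
    10: (0, 0b0000_0111),
    11: (0, 0b0001_0011),
    300: (1, 0b0100_0001),
}
packets = combine(branches, [10, 11, 300])
for si, bitstring in sorted(packets.items()):
    print(si, format(bitstring, "08b"))
```

	   One BIER-TE packet is needed per resulting SI, here one for SI 0
	   and one for SI 1.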

1102	   If "interdependent branches" are required, the application could call
1103	   a BIER-TE controller host API with the list of required BFR-ids and
1104	   get the required bitstring back.  Whenever the set of BFR-ids
1105	   changes, this is repeated.

1107	   Note that in either case (unlike in BIER), the bits in BIER-TE may
1108	   need to change upon link/node failure/recovery, network expansion and
1109	   network load by other traffic (as part of traffic engineering goals).
1110	   Interactions between such BFIR applications and the BIER-TE
1111	   controller host do therefore need to support dynamic updates to the
1112	   bitstrings.

1114	7.4.  Assigning BFR-ids for BIER-TE

1116	   For non-leaf BFER, there is usually a single bit k for that BFER with
1117	   a local_decap() adjacency on the BFER.  The BFR-id for such a BFER is
1118	   therefore most easily the one it would have in BIER: SI * bitstring-
1119	   length + k.
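
	   The mapping between a (SI, bit k) pair and the suggested BFR-id can
	   be written down directly.  The sketch assumes that k counts from 1
	   to bitstring-length and uses a default bitstring length of 256;
	   both function names are illustrative.

```python
def bfr_id(si, k, bitstring_length=256):
    """BFR-id suggested in the text: SI * bitstring-length + k."""
    return si * bitstring_length + k


def si_and_bit(bfr_id_value, bitstring_length=256):
    """Inverse mapping, assuming bit positions are numbered from 1."""
    si = (bfr_id_value - 1) // bitstring_length
    k = (bfr_id_value - 1) % bitstring_length + 1
    return si, k


print(bfr_id(0, 1), bfr_id(1, 256))
print(si_and_bit(512))
```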

1121	   As explained earlier in the document, leaf BFER do not need such a
1122	   separate bit because the fact alone that the BIER-TE packet is
1123	   forwarded to the leaf BFER indicates that the BFER should decapsulate
1124	   it.  Such a BFER will have one or more bits for the links leading
1125	   only to it.  The BFR-id could therefore most easily be the BFR-id
1126	   derived from the lowest bit for those links.

1128	   These two rules are only recommendations for the operator or BIER-TE
1129	   controller assigning the BFR-ids.  Any allocation scheme can be used,
1130	   the BFR-ids just need to be unique across BFRs in each subdomain.

1132	   It is not currently determined if a single subdomain could or should
1133	   be allowed to forward both BIER and BIER-TE packets.  If this should
1134	   be supported, there are two options:

1136	   A.  BIER and BIER-TE have different BFR-id in the same subdomain.
1137	   This allows higher replication efficiency for BIER because their BFR-
1138	   id can be assigned sequentially, while the bitstrings for BIER-TE
1139	   will have also the additional bits for the topology.  There is no
1140	   relationship between a BFR's BIER BFR-id and BIER-TE BFR-id.

1142	   B.  BIER and BIER-TE share the same BFR-id.  The BFR-id are assigned
1143	   as explained above for BIER-TE and simply reused for BIER.  The
1144	   replication efficiency for BIER will be as low as that for BIER-TE in
1145	   this approach.  Depending on topology, only the same 20%..80% of bits
1146	   as possible for BIER-TE can be used for BIER.

1148	7.5.  Example bit allocations

1150	7.5.1.  With BIER

1152	   Consider a network setup with a bitstring length of 256 for a network
1153	   topology as shown in the picture below.  The network has 6 areas,
1154	   each with ca. 180 BFR, connecting via a core with some larger (core)
1155	   BFR.  To address all BFER with BIER, 4 SI are required.  To send a
1156	   BIER packet to all BFER in the network, 4 copies need to be sent by
1157	   the BFIR.  On the BFIR it does not make a difference how the BFR-id
1158	   are allocated to BFER in the network, but for efficiency further down
1159	   in the network it does make a difference.

1161	                area1           area2        area3
1162	               BFR1a BFR1b  BFR2a BFR2b   BFR3a BFR3b
1163	                 |  \         /    \        /  |
1164	                 ................................
1165	                 .                Core          .
1166	                 ................................
1167	                 |    /       \    /        \  |
1168	               BFR4a BFR4b  BFR5a BFR5b   BFR6a BFR6b
1169	                area4          area5        area6

1171	                 Figure 13: Scaling BIER-TE bits by reuse

1173	   With random allocation of BFR-id to BFER, each receiving area would
1174	   (most likely) have to receive all 4 copies of the BIER packet because
1175	   there would be BFR-id for each of the 4 SI in each of the areas.
1176	   Only further towards each BFER would this duplication subside - when
1177	   each of the 4 trees runs out of branches.

1179	   If BFR-id are allocated intelligently, then all the BFER in an area
1180	   would be given BFR-id with as few as possible different SI.  Each
1181	   area would only have to forward one or two packets instead of 4.

1183	   Given how networks can grow over time, replication efficiency in an
1184	   area will also easily go down over time when BFR-id are allocated
1185	   sequentially network wide.  An area that initially only has
1186	   BFR-id in one SI might end up with many SI over a longer period of
1187	   growth.  Allocating SIs to areas with initially sufficiently many
1188	   spare bits for growth can help to alleviate this issue, as can
1189	   renumbering BFR-ids after network expansion.  In this example one may
1190	   consider using 6 SI and assigning one to each area.

1192	   This example shows that intelligent BFR-id allocation within at least
1193	   subdomain 0 can be helpful or even necessary in BIER.

1195	7.5.2.  With BIER-TE

1197	   In BIER-TE one needs to determine a subset of the physical topology
1198	   and attached BFER so that the "desired" representation of this
1199	   topology and the BFER fit into a single bitstring.  This process
1200	   needs to be repeated until the whole topology is covered.

1202	   Once bits/SIs are assigned to topology and BFER, BFR-id is just a
1203	   derived set of identifiers from the operator/BIER-TE controller as
1204	   explained above.

1206	   Every time that different sub-topologies have overlap, bits need to
1207	   be repeated across the bitstrings, increasing the overall amount of
1208	   bits required across all bitstring/SIs.  In the worst case, random
1209	   subsets of BFER are assigned to different SI.  This is much worse
1210	   than in BIER because it not only reduces replication efficiency with
1211	   the same number of overall bits, but even further - because more bits
1212	   are required due to duplication of bits for topology across multiple
1213	   SI.  Intelligent BFER to SI assignment and selecting specific
1214	   "desired" subtopologies can minimize this problem.

1216	   To set up BIER-TE efficiently for the above topology, the following
1217	   bit allocation method can be used.  This method can easily be
1218	   expanded to other, similarly structured larger topologies.

1220	   Each area is allocated one or more SI depending on the number of
1221	   future expected BFER and number of bits required for the topology in
1222	   the area.  In this example, 6 SI, one per area.

1224	   In addition, we use 4 bits in each SI: bia, bib, bea, beb: bit
1225	   ingress a, bit ingress b, bit egress a, bit egress b.  These bits
1226	   will be used to pass BIER packets from any BFIR via any combination
1227	   of ingress area a/b BFR and egress area a/b BFR into a specific
1228	   target area.  These bits are then set up with the right
1229	   forward_routed adjacencies on the BFIR and area edge BFR:

1231	   On all BFIR in an area j, bia in each BIFT:SI is populated with the
1232	   same forward_routed(BFRja), and bib with forward_routed(BFRjb).  On
1233	   all area edge BFR, bea in BIFT:SI=k is populated with
1234	   forward_routed(BFRka) and beb in BIFT:SI=k with
1235	   forward_routed(BFRkb).
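
   The population rule above can be sketched as follows.  This is only
   an illustration: the nested dictionary layout, the function names
   and the string form of the adjacencies are assumptions made for
   readability.  A real BIFT is indexed by numeric bit positions.

```python
# Six areas, one SI per area: SI=k addresses area k (as in the example).
AREAS = range(1, 7)


def bfir_bift(area_j: int) -> dict:
    """Ingress bits on a BFIR located in area j.

    In every SI, bia/bib point to the two edge BFR of the BFIR's own
    area j, independent of the target area.
    """
    return {
        si: {
            "bia": f"forward_routed(BFR{area_j}a)",
            "bib": f"forward_routed(BFR{area_j}b)",
        }
        for si in AREAS
    }


def area_edge_bift() -> dict:
    """Egress bits on any area edge BFR.

    In SI=k, bea/beb point to the two edge BFR of the target area k.
    """
    return {
        k: {
            "bea": f"forward_routed(BFR{k}a)",
            "beb": f"forward_routed(BFR{k}b)",
        }
        for k in AREAS
    }
```

   For example, a BFIR in area 3 would hold forward_routed(BFR3a) for
   bia in all six SI, while an area edge BFR would hold
   forward_routed(BFR5b) for beb in SI=5.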

1237	   For BIER-TE forwarding of a packet to some subset of BFER across all
1238	   areas, a BFIR would create at most 6 copies, with SI=1...SI=6.  In
1239	   each packet, the bitstring carries the bits for topology and BFER in
1240	   that topology plus the four bits that indicate whether to pass this
1241	   packet via the ingress area a or b border BFR and the egress area a
1242	   or b border BFR, therefore allowing path engineering for those two
1243	   "unicast" legs: 1) BFIR to ingress area edge and 2) core to egress
1244	   area edge.  Replication only happens inside the egress areas.  For
1245	   BFER in the same area as the BFIR, these four bits are not used.
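
   A minimal sketch of how a BFIR might assemble these per-SI copies is
   shown below.  The function build_copies, the symbolic bit names and
   the use of one ingress/egress choice for all target areas are
   simplifying assumptions; a real implementation works on numeric bit
   positions and could choose the a/b border BFR per copy.

```python
from typing import Dict, Iterable, Set, Tuple


def build_copies(
    bfir_area: int,
    targets: Iterable[Tuple[int, str]],
    ingress: str = "a",
    egress: str = "a",
) -> Dict[int, Set[str]]:
    """Return {SI: set of bit names}, one entry per packet copy.

    targets lists (area, bit) pairs, where bit names a BFER or a
    topology element inside that area.  SI=k is assumed to cover
    area k, as in the example topology.
    """
    copies: Dict[int, Set[str]] = {}
    for area, bit in targets:
        copies.setdefault(area, set()).add(bit)
    for si, bits in copies.items():
        # The ingress/egress bits are only needed to reach a remote area.
        if si != bfir_area:
            bits.add("bi" + ingress)
            bits.add("be" + egress)
    return copies
```

   With a BFIR in area 1 and targets in areas 1 and 2, this yields two
   copies: the SI=1 copy carries only the local bits, and the SI=2 copy
   additionally carries one ingress bit and one egress bit.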

1247	7.6.  Summary

1249	   BIER-TE can, like BIER, support multiple SI within a sub-domain to
1250	   allow re-using the concept of BFR-id and therefore minimize BIER-TE
1251	   specific functions in underlay routing, flow overlay methods and BIER
1252	   headers.

1254	   The number of BFIR/BFER possible in a subdomain is smaller than in
1255	   BIER because BIER-TE uses additional bits for topology.

1257	   In BIER-TE, subdomains can be used as in BIER to create more
1258	   efficient replication to known subsets of BFER.

1260	   Assigning bits for BFER intelligently into the right SI is more
1261	   important in BIER-TE than in BIER because it affects both
1262	   replication efficiency and the overall number of bits required.

1264	8.  BIER-TE and Segment Routing

1266	   Segment Routing aims to achieve lightweight path engineering via
1267	   loose source routing.  Compared, for example, to RSVP-TE, it does
1268	   not require per-path signaling to each hop along the path.

1270	   BIER-TE supports the same design philosophy for multicast.  Like
1271	   SR, it relies on source-routing - via the definition of a
1272	   BitString.  Like SR, it only requires considering the "hops" on
1273	   which either replication has to happen, or across which the traffic
1274	   should be steered (even without replication).  Any other hops can be
1275	   skipped via the use of routed adjacencies.

1277	   Instead of defining BitPositions for non-replicating hops, it is
1278	   equally possible to use segment routing encapsulations (e.g., MPLS
1279	   label stacks) for "forward_routed" adjacencies.

1281	   Note that BIER itself is also similar to SR - it achieves the same as
1282	   "Shortest Path SID" where the label stack uses only one SID to
1283	   indicate the egress node of the SR domain.  Instead of routing such
1284	   an SR packet hop-by-hop based on that SID, BIER routes the packet
1285	   hop-by-hop based on the BFR-id bits of the egress nodes of the BIER
1286	   domain.  What BIER does not allow is to indicate intermediate hops,
1287	   or, in terms of SR, label stacks with more than one SID in the stack
1288	   (for the same SR domain).  This is what BIER-TE provides.

1290	9.  Security Considerations

1292	   The security considerations are the same as for BIER with the
1293	   following differences:

1295	   BFR-ids and BFR-prefixes are not used in BIER-TE, nor are procedures
1296	   for their distribution, so these are not attack vectors against BIER-
1297	   TE.

1299	10.  IANA Considerations

1301	   This document requests no action by IANA.

1303	11.  Acknowledgements

1305	   The authors would like to thank Greg Shepherd, Ijsbrand Wijnands and
1306	   Neale Ranns for their extensive review and suggestions.

1308	12.  Change log [RFC Editor: Please remove]

1310	   draft-ietf-bier-te-arch:

1312	      02: Refresh after IETF104 discussion: changed intended status back
1313	      to standard.  Reasoning:

1315	      Tighter review of standards document == ensures arch will be
1316	      better prepared for possible adoption by other WGs (e.g.: DetNet)
1317	      or std. bodies.

1319	      Requirement against the degree of existing implementations is self
1320	      defined by the WG.  BIER WG seems to think it is not necessary to
1321	      apply multiple interoperating implementations against an
1322	      architecture level document at this time to make it qualify to go
1323	      to standards track.  Also, the levels of support introduced in -01
1324	      rev. should allow all BIER forwarding engines to also be able to
1325	      support the base level BIER-TE forwarding.

1327	      01: Added note comparing BIER and SR to also hopefully clarify
1328	      BIER-TE vs. BIER comparison re.  SR.

1330	      - added requirements section mandating only most basic BIER-TE
1331	      forwarding features as MUST.

1333	      - reworked comparison with BIER forwarding section to only
1334	      summarize and point to pseudocode section.

1336	      - reworked pseudocode section to have one pseudocode that mirrors
1337	      the BIER forwarding pseudocode to make comparison easier and a
1338	      second pseudocode that shows the complete set of BIER-TE
1339	      forwarding options and simplification/optimization possible vs.
1340	      BIER forwarding.

1342	      - Added captions to pictures.

1344	      00: Changed target state to experimental (WG conclusion), updated
1345	      references, mod auth association.

1347	      - Source now on http://www.github.com/toerless/bier-te-arch

1349	      - Please open issues on the github for change/improvement requests
1350	      to the document - in addition to posting them on the list
1351	      (bier@ietf.).  Thanks!

1353	   draft-eckert-bier-te-arch:

1355	      06: Added overview of forwarding differences between BIER, BIER-
1356	      TE.

1358	      05: Author affiliation change only.

1360	      04: Added comparison to Live-Live and BFIR to FRR section
1361	      (Eckert).

1363	      04: Removed FRR content into the new FRR draft [I-D.eckert-bier-
1364	      te-frr] (Braun).

1366	      - Linked FRR information to new draft in Overview/Introduction

1368	      - Removed BTAFT/FRR from "Changes in the network topology"

1370	      - Linked new draft in "Link/Node Failures and Recovery"

1371	      - Removed FRR from "The BIER-TE Forwarding Layer"

1373	      - Moved FRR section to new draft

1375	      - Moved FRR parts of Pseudocode into new draft

1377	      - Left only non FRR parts

1379	      - removed FrrUpDown(..) and //FRR operations in
1380	      ForwardBierTePacket(..)

1382	      - New draft contains FrrUpDown(..) and ForwardBierTePacket(Packet)
1383	      from bier-arch-03

1385	      - Moved "BIER-TE and existing FRR" to new draft

1387	      - Moved "BIER-TE and Segment Routing" section one level up

1389	      - Thus, removed "Further considerations" that only contained this
1390	      section

1392	      - Added Changes for version 04

1394	      03: Updated the FRR section.  Added examples for FRR key concepts.
1395	      Added BIER-in-BIER tunneling as option for tunnels in backup
1396	      paths.  BIFT structure is expanded and contains an additional
1397	      match field to support full node protection with BIER-TE FRR.

1399	      03: Updated FRR section.  Explanation how BIER-in-BIER
1400	      encapsulation provides P2MP protection for node failures even
1401	      though the routing underlay does not provide P2MP.

1403	      02: Changed the definition of BIFT to be more in line with BIER.
1404	      In revs. up to -01, the idea was that a BIFT has only entries for
1405	      a single bitstring, and every SI and subdomain would be a separate
1406	      BIFT.  In BIER, each BIFT covers all SI.  This is now also how we
1407	      define it in BIER-TE.

1409	      02: Added Section 7 to explain the use of SI, subdomains and BFR-
1410	      id in BIER-TE and to give an example how to efficiently assign
1411	      bits for a large topology requiring multiple SI.

1413	      02: Added further details for rings - how to support input from
1414	      all ring nodes.

1416	      01: Fixed BFIR -> BFER for section 4.3.

1418	      01: Added explanation of SI, difference to BIER ECMP,
1419	      consideration for Segment Routing, unicast FRR, considerations for
1420	      encapsulation, explanations of BIER-TE controller host and CLI.

1422	      00: Initial version.

1424	13.  References

1426	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
1427	              Requirement Levels", BCP 14, RFC 2119,
1428	              DOI 10.17487/RFC2119, March 1997,
1429	              <https://www.rfc-editor.org/info/rfc2119>.

1431	   [RFC8279]  Wijnands, IJ., Ed., Rosen, E., Ed., Dolganow, A.,
1432	              Przygienda, T., and S. Aldrin, "Multicast Using Bit Index
1433	              Explicit Replication (BIER)", RFC 8279,
1434	              DOI 10.17487/RFC8279, November 2017,
1435	              <https://www.rfc-editor.org/info/rfc8279>.

1437	   [RFC8296]  Wijnands, IJ., Ed., Rosen, E., Ed., Dolganow, A.,
1438	              Tantsura, J., Aldrin, S., and I. Meilik, "Encapsulation
1439	              for Bit Index Explicit Replication (BIER) in MPLS and Non-
1440	              MPLS Networks", RFC 8296, DOI 10.17487/RFC8296, January
1441	              2018, <https://www.rfc-editor.org/info/rfc8296>.

1443	Authors' Addresses

1445	   Toerless Eckert (editor)
1446	   Huawei USA - Futurewei Technologies Inc.
1447	   2330 Central Expy
1448	   Santa Clara  95050
1449	   USA

1451	   Email: tte+ietf@cs.fau.de

1453	   Gregory Cauchie
1454	   Bouygues Telecom

1456	   Email: GCAUCHIE@bouyguestelecom.fr

1458	   Wolfgang Braun
1459	   University of Tuebingen

1461	   Email: wolfgang.braun@uni-tuebingen.de

1462	   Michael Menth
1463	   University of Tuebingen

1465	   Email: menth@uni-tuebingen.de