idnits 2.17.1 

draft-eckert-bier-te-arch-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.

  ** The abstract seems to contain references
     ([I-D.wijnands-bier-architecture]), which it shouldn't.  Please replace
     those with straight textual mentions of the documents in question.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == The document doesn't use any RFC 2119 keywords, yet seems to have RFC
     2119 boilerplate text.

  -- The document date (March 5, 2015) is 3334 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'VRF' is mentioned on line 882, but not defined

  == Missing Reference: 'Index' is mentioned on line 864, but not defined

  == Missing Reference: 'BitStringLength' is mentioned on line 814, but not
     defined

  == Missing Reference: 'BP' is mentioned on line 858, but not defined

  == Missing Reference: 'BT' is mentioned on line 859, but not defined

  == Missing Reference: 'I' is mentioned on line 869, but not defined

  == Unused Reference: 'I-D.wijnands-mpls-bier-encapsulation' is defined on
     line 917, but no explicit reference was found in the text

  == Outdated reference: A later version (-05) exists of
     draft-wijnands-bier-architecture-04


     Summary: 2 errors (**), 0 flaws (~~), 10 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                          T. Eckert
3	Internet-Draft                                                     Cisco
4	Intended status: Standards Track                           March 5, 2015
5	Expires: September 6, 2015

7	     Traffic Engineering for Bit Index Explicit Replication BIER-TE
8	                      draft-eckert-bier-te-arch-00

10	Abstract

12	   This document proposes an architecture for BIER-TE: Traffic
13	   Engineering for Bit Index Explicit Replication (BIER).

15	   BIER-TE shares part of its architecture with BIER as described in
16	   the BIER architecture Internet-Draft.  It also proposes to share the
17	   packet format with BIER.

19	   Like BIER, BIER-TE forwards and replicates packets based on a
20	   BitString in the packet header, but it does not require an IGP.  It
21	   supports traffic engineering through explicit hop-by-hop forwarding
22	   and loose-hop forwarding of packets.  It also supports Fast ReRoute
23	   (FRR) for link and node protection, as well as incremental deployment.
24	   Because BIER-TE, like BIER, operates without explicit in-network tree-
25	   building but also supports traffic engineering, it is more similar to
26	   Segment Routing (SR) than to RSVP-TE.

28	Status of This Memo

30	   This Internet-Draft is submitted in full conformance with the
31	   provisions of BCP 78 and BCP 79.

33	   Internet-Drafts are working documents of the Internet Engineering
34	   Task Force (IETF).  Note that other groups may also distribute
35	   working documents as Internet-Drafts.  The list of current Internet-
36	   Drafts is at http://datatracker.ietf.org/drafts/current/.

38	   Internet-Drafts are draft documents valid for a maximum of six months
39	   and may be updated, replaced, or obsoleted by other documents at any
40	   time.  It is inappropriate to use Internet-Drafts as reference
41	   material or to cite them other than as "work in progress."

43	   This Internet-Draft will expire on September 6, 2015.

45	Copyright Notice

47	   Copyright (c) 2015 IETF Trust and the persons identified as the
48	   document authors.  All rights reserved.

50	   This document is subject to BCP 78 and the IETF Trust's Legal
51	   Provisions Relating to IETF Documents
52	   (http://trustee.ietf.org/license-info) in effect on the date of
53	   publication of this document.  Please review these documents
54	   carefully, as they describe your rights and restrictions with respect
55	   to this document.  Code Components extracted from this document must
56	   include Simplified BSD License text as described in Section 4.e of
57	   the Trust Legal Provisions and are provided without warranty as
58	   described in the Simplified BSD License.

60	Table of Contents

62	   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   3
63	     1.1.  Overview  . . . . . . . . . . . . . . . . . . . . . . . .   3
64	     1.2.  Requirements Language . . . . . . . . . . . . . . . . . .   4
65	   2.  Layering  . . . . . . . . . . . . . . . . . . . . . . . . . .   4
66	     2.1.  The Multicast Flow Overlay  . . . . . . . . . . . . . . .   4
67	     2.2.  The BIER-TE Controller Host . . . . . . . . . . . . . . .   4
68	       2.2.1.  Assignment of BitPositions to adjacencies of the
69	               network topology  . . . . . . . . . . . . . . . . . .   5
70	       2.2.2.  Changes in the network topology . . . . . . . . . . .   5
71	       2.2.3.  Set up per-multicast flow BIER-TE state . . . . . . .   5
72	       2.2.4.  Link/Node Failures and Recovery . . . . . . . . . . .   6
73	     2.3.  The BIER-TE Forwarding Layer  . . . . . . . . . . . . . .   6
74	     2.4.  The Routing Underlay  . . . . . . . . . . . . . . . . . .   6
75	   3.  BIER-TE Forwarding  . . . . . . . . . . . . . . . . . . . . .   6
76	     3.1.  The Bit Index Forwarding Table (BIFT) . . . . . . . . . .   7
77	     3.2.  Adjacency Types . . . . . . . . . . . . . . . . . . . . .   7
78	       3.2.1.  Forward Connected . . . . . . . . . . . . . . . . . .   7
79	       3.2.2.  Forward Routed  . . . . . . . . . . . . . . . . . . .   8
80	       3.2.3.  ECMP  . . . . . . . . . . . . . . . . . . . . . . . .   8
81	       3.2.4.  Local Decap . . . . . . . . . . . . . . . . . . . . .   8
82	     3.3.  Basic BIER-TE Forwarding Example  . . . . . . . . . . . .   8
83	   4.  BIER-TE Controller Host BitPosition Assignments . . . . . . .  10
84	     4.1.  P2P Links . . . . . . . . . . . . . . . . . . . . . . . .  10
85	     4.2.  BFER  . . . . . . . . . . . . . . . . . . . . . . . . . .  11
86	     4.3.  Leaf BFIRs  . . . . . . . . . . . . . . . . . . . . . . .  11
87	     4.4.  LANs  . . . . . . . . . . . . . . . . . . . . . . . . . .  11
88	     4.5.  Hub and Spoke . . . . . . . . . . . . . . . . . . . . . .  12
89	     4.6.  Rings . . . . . . . . . . . . . . . . . . . . . . . . . .  12
90	     4.7.  Equal Cost MultiPath (ECMP) . . . . . . . . . . . . . . .  12
91	     4.8.  Routed adjacencies  . . . . . . . . . . . . . . . . . . .  15
92	       4.8.1.  Supporting nodes without BIER-TE  . . . . . . . . . .  15

94	   5.  Avoiding loops and duplicates . . . . . . . . . . . . . . . .  15
95	     5.1.  Loops . . . . . . . . . . . . . . . . . . . . . . . . . .  15
96	     5.2.  Duplicates  . . . . . . . . . . . . . . . . . . . . . . .  16
97	   6.  FRR . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  16
98	     6.1.  The BIER-TE Adjacency FRR Table (BTAFT) . . . . . . . . .  16
99	     6.2.  FRR in BIER-TE forwarding . . . . . . . . . . . . . . . .  17
100	     6.3.  FRR in the BIER-TE Controller Host  . . . . . . . . . . .  17
101	     6.4.  BIER-TE FRR Benefits  . . . . . . . . . . . . . . . . . .  18
102	   7.  BIER-TE Forwarding Pseudocode . . . . . . . . . . . . . . . .  18
103	   8.  Security Considerations . . . . . . . . . . . . . . . . . . .  21
104	   9.  IANA Considerations . . . . . . . . . . . . . . . . . . . . .  21
105	   10. Acknowledgements  . . . . . . . . . . . . . . . . . . . . . .  21
106	   11. Change log [RFC Editor: Please remove]  . . . . . . . . . . .  21
107	   12. References  . . . . . . . . . . . . . . . . . . . . . . . . .  21
108	   Author's Address  . . . . . . . . . . . . . . . . . . . . . . . .  21

110	1.  Introduction

112	1.1.  Overview

114	   This document specifies the architecture for BIER-TE: Traffic
115	   Engineering for Bit Index Explicit Replication (BIER).

117	   BIER-TE shares architecture and packet formats with BIER as described
118	   in [I-D.wijnands-bier-architecture].

120	   Like BIER, BIER-TE forwards and replicates packets based on a
121	   BitString in the packet header, but it does not require an IGP.  It
122	   supports traffic engineering through explicit hop-by-hop forwarding
123	   and loose-hop forwarding of packets.  It also supports Fast ReRoute
124	   (FRR) for link and node protection, as well as incremental deployment.
125	   Because BIER-TE, like BIER, operates without explicit in-network tree-
126	   building but also supports traffic engineering, it is more similar to
127	   Segment Routing (SR) than to RSVP-TE.

129	   The key differences from BIER are:

131	   o  BIER-TE replaces in-network autonomous path calculation by
132	      explicit paths calculated offpath by the BIER-TE controller host.

134	   o  In BIER-TE, every BitPosition of the BitString of a BIER-TE packet
135	      indicates one or more adjacencies instead of a BFER as in BIER.

137	   o  In BIER-TE, a BFR has no routing table, only a Bit Index
138	      Forwarding Table (BIFT) indexed by BitPosition and populated with
139	      only those adjacencies to which the BFR should replicate
140	      packets.

142	   Currently, BIER-TE does not support BIER sub-domains, and it does not
143	   use BFR-ids or Set Identifiers (SI) in BIER-TE headers, although
144	   these headers share the same format as BIER headers.

146	1.2.  Requirements Language

148	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
149	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
150	   document are to be interpreted as described in RFC 2119 [RFC2119].

152	2.  Layering

154	   End-to-end BIER-TE operation consists of four components: the
155	   "Multicast Flow Overlay", the "BIER-TE Controller Host", the "Routing
156	   Underlay" and the "BIER-TE Forwarding Layer".

158	      Picture 2: Layers of BIER-TE

160	                   <------BGP/PIM----->
161	      |<-IGMP/PIM->  multicast flow   <-PIM/IGMP->|
162	                        overlay

164	                   [BIER-TE Controller Host]
165	                      ^      ^     ^
166	                     /       |      \   BIER-TE control protocol
167	                    |        |       |  e.g., Netconf/Restconf/YANG
168	                    v        v       v
169	    Src -> Rtr1 -> BFIR-----BFR-----BFER -> Rtr2 -> Rcvr

171	                   |--------------------->|
172	                   BIER-TE forwarding layer

174	                   |<- BIER-TE domain-->|

176	                  |<--------------------->|
177	                      Routing underlay

179	2.1.  The Multicast Flow Overlay

181	   The Multicast Flow Overlay operates as in BIER (see
182	   [I-D.wijnands-bier-architecture]).  Instead of interacting with the
183	   BIER layer, it interacts with the BIER-TE Controller Host.

185	2.2.  The BIER-TE Controller Host

187	   The BIER-TE controller host is an offpath central host.  It
188	   communicates with BFRs via protocols such as Netconf/Restconf/YANG.
189	   The protocols used between BFRs and the controller are outside the
190	   scope of this document.  This document is only concerned with the
191	   logic by which a controller assigns BitPositions to the topology and
192	   BitStrings to BIER-TE packets.

194	   During bring-up or modifications of the network topology, the
195	   controller needs to talk to all BFRs to assign BitPositions to
196	   adjacencies of the network topology.  During day-to-day operations of
197	   the network, it only needs to talk to BFIRs to install BitStrings for
198	   multicast flows.

200	   These two tasks have the following steps:

202	2.2.1.  Assignment of BitPositions to adjacencies of the network
203	        topology

205	   The BIER-TE controller host tracks the BFR topology of the BIER-TE
206	   domain.  It determines which adjacencies require BitPositions so that
207	   BIER-TE explicit paths can be built through them as desired by
208	   operator policy.

210	   The controller then pushes the BitPositions/adjacencies to the BIFTs
211	   of the BFRs.  The BIFT of each BFR is populated only with the
212	   BitPositions of adjacencies to which that BFR should be able to send
213	   packets, i.e., adjacencies connecting to this BFR.

215	2.2.2.  Changes in the network topology

217	   If the network topology changes for reasons other than failures, so
218	   that adjacencies that are assigned to BitPositions are no longer
219	   needed, the controller can reuse those BitPositions for new
220	   adjacencies.  First, these BitPositions need to be removed from any
221	   BFIR flow state and BFR BIFT state (and BTAFT state if FRR is
222	   supported, see below).  Then they can be repopulated, first into the
223	   BIFTs (and BTAFTs if FRR is supported), then into the BFIRs.

225	2.2.3.  Set up per-multicast flow BIER-TE state

227	   The BIER-TE controller host tracks the multicast flow overlay to
228	   determine which multicast flows need to be sent by a BFIR to which
229	   set of BFERs.  It calculates the desired distribution tree across the
230	   BIER-TE domain based on algorithms outside the scope of this document
231	   (e.g., CSPF, Steiner Tree, ...).  It then pushes the calculated
232	   BitString into the BFIR.

234	2.2.4.  Link/Node Failures and Recovery

236	   When links or nodes fail or recover in the topology, BIER-TE can
237	   quickly respond with the optional FRR procedures described below.  It
238	   can also react more slowly by recalculating the BitStrings of
239	   affected multicast flows.  This reaction is slower than the FRR
240	   procedures because the controller needs to receive link/node up/down
241	   indications, recalculate the desired BitStrings and push them down
242	   into the BFIRs.  With FRR, this is all performed locally on a BFR
243	   receiving the adjacency up/down notification.

245	2.3.  The BIER-TE Forwarding Layer

247	   When the BIER-TE Forwarding Layer receives a packet, it looks up the
248	   BitPositions (BPs) that are set in the BitString of the packet in
249	   the Bit Index Forwarding Table (BIFT) that was populated by the BIER-
250	   TE controller host.  For every BP that is set in the BitString and
251	   that has one or more adjacencies in the BIFT, a copy is made
252	   according to the type of adjacencies for that BP in the BIFT.  Before
253	   sending any copy, the BFR resets in the BitString of the packet all
254	   BitPositions for which it creates a copy.  This is done to prevent
255	   packets from looping.
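   The forwarding rule above, together with the DNR behavior of
   Section 3.2.1, can be sketched in Python as follows.  This is a
   minimal illustration under stated assumptions, not the pseudocode of
   this document: the BIFT is modeled as a hypothetical dict mapping each
   BitPosition to a list of (adjacency name, DNR flag) pairs, BitStrings
   are Python integers, and BitPosition 1 is assumed to be the least
   significant bit.

```python
from typing import Dict, List, Tuple

# Hypothetical BIFT model: BitPosition -> [(adjacency name, DNR flag), ...].
Bift = Dict[int, List[Tuple[str, bool]]]


def bp_mask(bp: int) -> int:
    """Bit mask for a BitPosition (assumes BP 1 is the least significant bit)."""
    return 1 << (bp - 1)


def bier_te_forward(bitstring: int, bift: Bift) -> List[Tuple[str, int]]:
    """Return one (adjacency, outgoing BitString) pair per copy to send."""
    # BPs that are set in the packet and have at least one adjacency here.
    matched = [bp for bp in sorted(bift) if bitstring & bp_mask(bp) and bift[bp]]

    # Reset all matched BPs before sending any copy, to prevent loops.
    reset_mask = 0
    for bp in matched:
        reset_mask |= bp_mask(bp)
    base = bitstring & ~reset_mask

    copies = []
    for bp in matched:
        for adjacency, dnr in bift[bp]:
            # With DNR, the copy to this adjacency keeps its own BP set;
            # copies to all other adjacencies still see it reset.
            out = (base | bp_mask(bp)) if dnr else base
            copies.append((adjacency, out))
    return copies


bift = {1: [("cw-neighbor", True)], 3: [("other", False)]}
packet = bp_mask(1) | bp_mask(3) | bp_mask(4)
for adjacency, out in bier_te_forward(packet, bift):
    print(adjacency, bin(out))  # cw-neighbor 0b1001, then other 0b1000
```

   BP 4 has no BIFT entry on this BFR, so it is simply carried along in
   every copy for downstream BFRs.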

257	   If the BFR supports BIER-TE FRR operations, then the BIER-TE
258	   forwarding layer receives fast adjacency up/down notifications and
259	   uses the BIER-TE Adjacency FRR Table (BTAFT) to modify the BitString
260	   of the packet before it performs BIER-TE forwarding.  This is
261	   detailed in the FRR section.

263	2.4.  The Routing Underlay

265	   BIER-TE sends BIER packets to directly connected BIER-TE neighbors as
266	   L2 (unicast) BIER packets without requiring a routing underlay.
267	   BIER-TE forwarding uses the Routing Underlay for forward_routed
268	   adjacencies, which copy BIER-TE packets to not-directly-connected
269	   BFRs (see below for adjacency definitions).

271	   If the BFR intends to support FRR for BIER-TE, then the BIER-TE
272	   forwarding plane needs to receive fast adjacency up/down
273	   notifications: link up/down or neighbor up/down, e.g., from BFD.
274	   Providing these notifications is considered to be part of the routing
275	   underlay in this document.

277	3.  BIER-TE Forwarding
278	3.1.  The Bit Index Forwarding Table (BIFT)

280	   The Bit Index Forwarding Table (BIFT) exists in every BFR.  It is a
281	   table indexed by BitPosition and is populated by the BIER-TE control
282	   plane.  Each index can be empty or contain a list of one or more
283	   adjacencies.

285	     ------------------------------------------------------------------
286	     | Index           |  Adjacencies                                 |
287	     ==================================================================
288	     | 1               |  forward_connected(interface,neighbor,DNR)   |
289	     ------------------------------------------------------------------
290	     | 2               |  forward_connected(interface,neighbor,DNR)   |
291	     |                 |  forward_connected(interface,neighbor,DNR)   |
292	     ------------------------------------------------------------------
293	     | 3               |  local_decap([VRF])                          |
294	     ------------------------------------------------------------------
295	     | 4               |  forward_routed([VRF,]l3-neighbor)           |
296	     ------------------------------------------------------------------
297	     | 5               |  <empty>                                     |
298	     ------------------------------------------------------------------
299	     | 6               |  ECMP({adjacency1,...adjacencyN}, seed)      |
300	     ------------------------------------------------------------------
301	     ...
302	     | BitStringLength |  ...                                         |
303	     ------------------------------------------------------------------
304	                      Bit Index Forwarding Table

306	   The BIFT is programmed into the data plane of BFRs by the BIER-TE
307	   controller host and used to forward packets, according to the rules
308	   specified in the BIER-TE Forwarding Procedures.

310	   When the controller populates the same BP in more than one BFR, the
311	   adjacencies for that BP do not have to be the same in each BFR; this
312	   is up to the controller.  BPs for p2p links are one such case (see below).
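   As an illustration only, the BIFT table above could be modeled with
   one small record type per adjacency type from Section 3.2.  The class
   names, field names, interface names and the documentation address
   192.0.2.1 are hypothetical.  An index that is absent from the dict
   plays the role of "<empty>".

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple, Union


@dataclass(frozen=True)
class ForwardConnected:
    interface: str
    neighbor: str
    dnr: bool = False  # "DoNotReset" flag, see Section 3.2.1


@dataclass(frozen=True)
class ForwardRouted:
    l3_neighbor: str
    vrf: Optional[str] = None


@dataclass(frozen=True)
class LocalDecap:
    vrf: Optional[str] = None


@dataclass(frozen=True)
class Ecmp:
    members: Tuple["Adjacency", ...]  # two or more adjacencies
    seed: int


Adjacency = Union[ForwardConnected, ForwardRouted, LocalDecap, Ecmp]
Bift = Dict[int, List[Adjacency]]


def lookup(bift: Bift, bp: int) -> List[Adjacency]:
    """Adjacencies for a BitPosition; an empty list means "<empty>"."""
    return bift.get(bp, [])


example_bift: Bift = {
    1: [ForwardConnected("eth0", "nbr-a")],
    2: [ForwardConnected("eth1", "nbr-b"), ForwardConnected("eth2", "nbr-c")],
    3: [LocalDecap()],
    4: [ForwardRouted("192.0.2.1")],
    # Index 5 is intentionally absent: "<empty>".
    6: [Ecmp((ForwardConnected("eth3", "nbr-d"),
              ForwardConnected("eth4", "nbr-d")), seed=1)],
}
```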

314	3.2.  Adjacency Types

316	3.2.1.  Forward Connected

318	   A "forward_connected" adjacency is towards a directly connected BFR
319	   neighbor, using an interface address of that BFR on the connecting
320	   interface.  A forward_connected adjacency does not route packets; it
321	   only forwards them at L2 to the neighbor.

323	   Packets sent to an adjacency with "DoNotReset" (DNR) set in the BIFT
324	   will not have the BitPosition for that adjacency reset when the BFR
325	   creates a copy for it.  The BitPosition will still be reset for
326	   copies of the packet made towards other adjacencies.  This can be
327	   used, for example, in ring topologies as explained below.

329	3.2.2.  Forward Routed

331	   A "forward_routed" adjacency is an adjacency towards a BFR that is
332	   not a forward_connected adjacency: towards a loopback address of a
333	   BFR or towards an interface address that is not directly connected.
334	   Forward_routed packets are forwarded via the Routing Underlay.

336	   If the Routing Underlay has multiple paths for a forward_routed
337	   adjacency, it will perform ECMP independent of BIER-TE for packets
338	   forwarded across a forward_routed adjacency.

340	   If the Routing Underlay has FRR, it will perform FRR independent of
341	   BIER-TE for packets forwarded across a forward_routed adjacency.

343	3.2.3.  ECMP

345	   An "Equal Cost Multipath" (ECMP) adjacency has a list of two or more
346	   adjacencies included in it.  It copies the BIER-TE packet to one of
347	   those adjacencies based on the ECMP hash calculation.  The BIER-TE
348	   ECMP hash algorithm must select the same adjacency from that list for
349	   all packets with the same "entropy" value in the BIER-TE header if
350	   the same number of adjacencies and the same seed are given as
351	   parameters.  Further use of the seed parameter is explained below.
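   The determinism requirement above can be met by any hash that depends
   only on the entropy value, the seed and the number of member
   adjacencies.  The sketch below uses SHA-256 from Python's hashlib
   purely as an example.  This document does not specify the hash, and
   the function name and the 4-byte encoding of entropy and seed are
   assumptions.

```python
import hashlib
from typing import Sequence, TypeVar

T = TypeVar("T")


def ecmp_select(members: Sequence[T], entropy: int, seed: int) -> T:
    """Deterministically pick one member adjacency of an ECMP adjacency.

    Assumed hash: SHA-256 over 4-byte big-endian entropy and seed values.
    Equal (entropy, seed, number of members) always gives the same member.
    """
    if len(members) < 2:
        raise ValueError("an ECMP adjacency needs two or more members")
    key = entropy.to_bytes(4, "big") + seed.to_bytes(4, "big")
    digest = hashlib.sha256(key).digest()
    return members[int.from_bytes(digest[:8], "big") % len(members)]


links = ["L1-to-BFR2", "L2-to-BFR2", "L3-to-BFR2"]
first = ecmp_select(links, entropy=0x12345, seed=7)
print(first == ecmp_select(links, entropy=0x12345, seed=7))  # True
```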

353	3.2.4.  Local Decap

355	   A "local_decap" adjacency passes a copy of the payload of the BIER-TE
356	   packet to the packet's NextProto within the BFR (IPv4/IPv6,
357	   Ethernet, ...).  A local_decap adjacency turns the BFR into a BFER
358	   for matching packets.  Local_decap adjacencies require the BFER to
359	   support routing or switching for NextProto to determine how to
360	   further process the packet.

362	3.3.  Basic BIER-TE Forwarding Example

364	   This section gives a step-by-step example of basic BIER-TE
365	   forwarding.  It does not use ECMP or forward_routed adjacencies, nor
366	   does it try to minimize the number of BitPositions for the topology.

368	     Picture 1: Forwarding Example

370	               [BIER-TE Controller Host]
371	                       /   | \
372	                      v    v  v

374	           | p13   p1 |
375	           +- BFIR2 --+          |
376	           |          | p2   p6  |           LAN2
377	           |          +-- BFR3 --+           |
378	           |          |          |  p7  p11  |
379	      Src -+                     +-- BFER1 --+
380	           |          | p3   p8  |           |
381	           |          +-- BFR4 --+           +-- Rcv1
382	           |          |          |           |
383	           |          |
384	           | p14  p4  |
385	           +- BFIR1 --+          |
386	           |          +-- BFR5 --+ p10  p12  |
387	         LAN1         | p5   p9  +-- BFER2 --+
388	                                 |           +-- Rcv2
389	                                             |
390	                                             LAN3

392	          IP  |..... BIER-TE network......| IP

394	   pXX indicates the BitPosition number assigned by the BIER-TE
395	   controller host to adjacencies in the BIER-TE topology.  For example,
396	   p9 is the adjacency towards BFR5 on the LAN connecting to BFER2.

398	      BIFT BFIR2:
399	        p13: local_decap()
400	         p2: forward_connected(BFR3)

402	      BIFT BFR3:
403	         p1: forward_connected(BFIR2)
404	         p7: forward_connected(BFER1)
405	         p8: forward_connected(BFR4)

407	      BIFT BFER1:
408	        p11: local_decap()
409	         p6: forward_connected(BFR3)
410	         p8: forward_connected(BFR4)

412	   ...and so on.

414	   Traffic needs to flow from BFIR2 towards Rcv1 and Rcv2.  The
415	   controller determines that it should pass along the following paths:

417	                 -> BFER1 ---------------> Rcv1
418	    BFIR2 -> BFR3
419	                 -> BFR4 -> BFR5 -> BFER2 -> Rcv2

421	   These paths correspond to the following BitString: p2, p5, p7, p8,
422	   p10, p11, p12
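   To make this BitString concrete, the sketch below encodes the
   BitPositions listed above as an integer.  It assumes that BitPosition
   1 is the least significant bit of the BitString and uses a
   BitStringLength of 256 only as an example value.  The function names
   are hypothetical.

```python
def bitstring_from_bps(bps, bitstring_length=256):
    """Encode BitPositions as an integer BitString (BP 1 = lowest bit)."""
    value = 0
    for bp in bps:
        if not 1 <= bp <= bitstring_length:
            raise ValueError(f"BitPosition {bp} outside 1..{bitstring_length}")
        value |= 1 << (bp - 1)
    return value


def bps_from_bitstring(value):
    """Decode an integer BitString into a sorted list of BitPositions."""
    return [i + 1 for i in range(value.bit_length()) if (value >> i) & 1]


paths = [2, 5, 7, 8, 10, 11, 12]  # BitPositions chosen by the controller
bs = bitstring_from_bps(paths)
print(hex(bs))                    # 0xed2
print(bps_from_bitstring(bs))     # [2, 5, 7, 8, 10, 11, 12]
```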

424	   This BitString is set up in BFIR2.  Multicast packets arriving at
425	   BFIR2 from Src are assigned this BitString.

427	   BFIR2 forwards based on that BitString.  It has p2 and p13 populated.
428	   Only p2 is set in the BitString, and it has an adjacency towards BFR3.
429	   BFIR2 resets p2 in the BitString and sends a copy towards BFR3.

431	   BFR3 sees a BitString of p5,p7,p8,p10,p11,p12.  It is only interested
432	   in p1,p7,p8.  It creates a copy of the packet to BFER1 (due to p7)
433	   and one to BFR4 (due to p8).  It resets p7, p8 before sending.

435	   BFER1 sees a BitString of p5,p10,p11,p12.  It is only interested in
436	   p6,p8,p11 and therefore considers only p11.  p11 is a "local_decap"
437	   adjacency installed by the BIER-TE controller host because BFER1
438	   should pass packets to IP multicast.  The local_decap adjacency
439	   instructs BFER1 to create a copy, decapsulate it from the BIER header
440	   and pass it on to the NextProtocol, in this example IP multicast.  IP
441	   multicast will then forward the packet out to LAN2 because it has
442	   received PIM or IGMP joins on LAN2 for the traffic.

444	   Processing of the packet in BFR4, BFR5 and BFER2 proceeds accordingly.
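   The first hops of this example can be traced with the short sketch
   below.  It only uses the three BIFTs listed above, because the BIFTs
   of BFR4, BFR5 and BFER2 are not given here.  It represents a BitString
   as a Python set of BitPosition numbers, reduces each adjacency to a
   label, and ignores DNR and ECMP, which this example does not use.

```python
# BIFTs from the example: BitPosition -> next hop label (or local_decap).
BIFTS = {
    "BFIR2": {13: "local_decap", 2: "BFR3"},
    "BFR3": {1: "BFIR2", 7: "BFER1", 8: "BFR4"},
    "BFER1": {11: "local_decap", 6: "BFR3", 8: "BFR4"},
}


def forward(node, bitstring):
    """Return {adjacency: BitString of the copy} for one BFR."""
    bift = BIFTS[node]
    matched = {bp for bp in bitstring if bp in bift}
    remaining = bitstring - matched  # matched BPs are reset in every copy
    return {bift[bp]: remaining for bp in sorted(matched)}


packet = {2, 5, 7, 8, 10, 11, 12}           # BitString set up in BFIR2
at_bfr3 = forward("BFIR2", packet)["BFR3"]
copies = forward("BFR3", at_bfr3)
print(sorted(at_bfr3))                      # [5, 7, 8, 10, 11, 12]
print(sorted(copies))                       # ['BFER1', 'BFR4']
print(sorted(copies["BFER1"]))              # [5, 10, 11, 12]
print(sorted(forward("BFER1", copies["BFER1"])["local_decap"]))  # [5, 10, 12]
```

   The decapsulated copy still carries p5, p10 and p12, which is harmless
   because the BIER header is removed before the payload is passed on.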

446	4.  BIER-TE Controller Host BitPosition Assignments

448	   This section describes how the BIER-TE controller host can use the
449	   different BIER-TE adjacency types to define the BitPositions of a
450	   BIER-TE domain.

452	   Because the length of the BitString limits the size of the BIER-TE
453	   domain, many of the options described here exist to support larger
454	   topologies with fewer BitPositions (4.1, 4.3, 4.4, 4.5, 4.6, 4.7,
455	   4.8).

457	4.1.  P2P Links

459	   Each p2p link in the BIER-TE domain is assigned one unique
460	   BitPosition with a forward_connected adjacency pointing to the
461	   neighbor on the p2p link.

463	4.2.  BFER

465	   Every BFER is given a unique BitPosition with a local_decap
466	   adjacency.
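   A controller following Sections 4.1 and 4.2 could allocate
   BitPositions as in the sketch below.  The function and data layout
   are hypothetical, and BitPositions are simply handed out in
   increasing order.  Because both ends of a p2p link use the same
   BitPosition and the sending BFR resets it before sending (Section
   2.3), the receiving neighbor does not send the packet back across the
   same link.

```python
from collections import defaultdict
from itertools import count


def assign_bitpositions(p2p_links, bfers):
    """Return {bfr: {bp: adjacency}} for p2p links (4.1) and BFERs (4.2).

    p2p_links is an iterable of (bfr_a, bfr_b) pairs; bfers is an iterable
    of BFR names that need a local_decap adjacency.
    """
    bifts = defaultdict(dict)
    next_bp = count(1)  # hand out BitPositions 1, 2, 3, ...
    for a, b in p2p_links:
        bp = next(next_bp)
        # One BP per link, populated on both ends, each pointing at the other.
        bifts[a][bp] = ("forward_connected", b)
        bifts[b][bp] = ("forward_connected", a)
    for bfer in bfers:
        bifts[bfer][next(next_bp)] = ("local_decap",)
    return dict(bifts)


bifts = assign_bitpositions([("A", "B"), ("B", "C")], bfers=["C"])
print(bifts["B"])  # {1: ('forward_connected', 'A'), 2: ('forward_connected', 'C')}
print(bifts["C"])  # {2: ('forward_connected', 'B'), 3: ('local_decap',)}
```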

468	4.3.  Leaf BFIRs

470	   Leaf BFIRs are BFIRs where incoming BIER-TE packets never need to be
471	   forwarded to another BFR but are only sent to the BFIR to exit the
472	   BIER-TE domain.  For example, in networks where PEs are spokes
473	   connected to P routers, those PEs are Leaf BFIRs unless there is a
474	   U-turn between two PEs.

476	   All leaf BFIRs in a BIER-TE domain can share a single BitPosition.
477	   This is possible because the BitPosition for the adjacency to reach
478	   the BFIR can be used to distinguish whether or not packets should
479	   reach the BFIR.

481	   This optimization will not work if an upstream interface of the BFIR
482	   is using a BitPosition optimized as described in the following two
483	   sections (LAN, Hub and Spoke).

485	4.4.  LANs

487	   In a LAN, the adjacency to each neighboring BFR on the LAN is given a
488	   unique BitPosition.  The adjacency of this BitPosition is a
489	   forward_connected adjacency towards the BFR and this BitPosition is
490	   populated into the BIFT of all the other BFRs on that LAN.

492	            BFR1
493	             |p1
494	      LAN1-+-+---+-----+
495	          p3|  p4|   p2|
496	          BFR3 BFR4  BFR7

498	   If bandwidth on the LAN is not an issue and most BIER-TE traffic
499	   should be copied to all neighbors on a LAN, then BitPositions can be
500	   saved by assigning just a single BitPosition to the LAN and
501	   populating that BitPosition in the BIFT of each BFR on the LAN with
502	   a list of forward_connected adjacencies to all other neighbors on the
503	   LAN.

505	   This optimization does not work in the face of BFRs that are
506	   redundantly connected to more than one LAN using this optimization,
507	   because these BFRs would receive duplicates and forward those
508	   duplicates into the opposite LANs.  Adjacencies of such BFRs into
509	   their LANs still need a separate BitPosition.
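   The two LAN options can be compared with the sketch below, which
   builds the BIFT entries for the LAN1 picture above.  The per-neighbor
   BitPositions are taken from that picture (p1, p3, p4, p2).  The single
   shared BitPosition number 5 and the function names are hypothetical,
   and adjacencies are reduced to neighbor names.

```python
def lan_bifts_per_neighbor(bp_of):
    """Each LAN member has its own BP; all other members point to it."""
    return {m: {bp_of[n]: [n] for n in bp_of if n != m} for m in bp_of}


def lan_bifts_shared(members, bp):
    """One BP for the whole LAN; each member replicates to all the others."""
    return {m: {bp: [n for n in members if n != m]} for m in members}


lan1 = {"BFR1": 1, "BFR3": 3, "BFR4": 4, "BFR7": 2}  # from the picture
per_neighbor = lan_bifts_per_neighbor(lan1)
shared = lan_bifts_shared(list(lan1), bp=5)
print(per_neighbor["BFR1"])  # {3: ['BFR3'], 4: ['BFR4'], 2: ['BFR7']}
print(shared["BFR1"])        # {5: ['BFR3', 'BFR4', 'BFR7']}

used = {bp for bift in per_neighbor.values() for bp in bift}
print(len(used))             # 4 BitPositions instead of 1
```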

511	4.5.  Hub and Spoke

513	   In a setup with a hub and multiple spokes connected via separate p2p
514	   links to the hub, all p2p links can share the same BitPosition.  The
515	   BitPosition in the hub's BIFT is set up with a list of
516	   forward_connected adjacencies, one for each spoke.

518	   This option is similar to the BitPosition optimization in LANs:
519	   Redundantly connected spokes need their own BitPositions.

521	4.6.  Rings

523	   In L3 rings, instead of assigning a single BitPosition for every p2p
524	   link in the ring, it is possible to save BitPositions by setting the
525	   "Do Not Reset" (DNR) flag on forward_connected adjacencies.

527	   For the ring shown in the following picture, a single BitPosition
528	   will suffice to forward traffic entering the ring at BFRa or BFRb all
529	   the way up to BFR1:

531	   On BFRa, BFRb, BFR30,... BFR3, the BitPosition is populated with a
532	   forward_connected adjacency pointing to the clockwise neighbor on the
533	   ring and with DNR set.  On BFR2, the adjacency also points to the
534	   clockwise neighbor BFR1, but without DNR set.  Handling DNR this way
535	   ensures that copies forwarded from any BFR in the ring to a BFR
536	   outside the ring will not have this BitPosition, therefore minimizing
537	   the chance to create loops.

539	                  v        v
540	                  |        |
541	           L1     |   L2   |   L3
542	       /-------- BFRa ---- BFRb --------------------\
543	       |                                            |
544	       \- BFR1 - BFR2 - BFR3 - ... - BFR29 - BFR30 -/
545	           |      |    L4               |      |
546	        p33|                         p15|
547	           BFRd                       BFRc
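   The effect of DNR in this ring can be checked with the simulation
   below.  It is a sketch under several assumptions: the BFRs elided as
   "..." in the picture are left out, the ring BitPosition number 20 is
   hypothetical, p15 is taken to be BFR29's adjacency towards BFRc and
   p33 BFR1's adjacency towards BFRd as the picture suggests, and a
   BitString is a Python set.

```python
RING_BP = 20  # hypothetical BitPosition shared by the whole ring

# Clockwise order from BFRa; the BFRs elided as "..." are left out.
CLOCKWISE = ["BFRa", "BFRb", "BFR30", "BFR29", "BFR3", "BFR2", "BFR1"]

# BIFTs: bp -> (next hop, DNR). DNR is set everywhere except on BFR2,
# and BFR1 has no ring entry, so ring traffic stops there.
BIFTS = {node: {RING_BP: (nxt, node != "BFR2")}
         for node, nxt in zip(CLOCKWISE, CLOCKWISE[1:])}
BIFTS["BFR29"][15] = ("BFRc", False)
BIFTS["BFR1"] = {33: ("BFRd", False)}


def forward(node, bitstring):
    """Return [(next hop, BitString of the copy)] for one BFR."""
    bift = BIFTS.get(node, {})
    matched = {bp for bp in bitstring if bp in bift}
    base = bitstring - matched
    return [(bift[bp][0], (base | {bp}) if bift[bp][1] else base)
            for bp in sorted(matched)]


def deliver(entry, bitstring):
    """Follow all copies from the entry BFR; return {BFR: BitString received}."""
    received, pending = {}, [(entry, frozenset(bitstring))]
    while pending:
        node, bs = pending.pop(0)
        for nxt, out in forward(node, bs):
            received[nxt] = out
            pending.append((nxt, out))
    return received


result = deliver("BFRa", {RING_BP, 15, 33})
print(sorted(result["BFR2"]))  # [20, 33]: DNR kept the ring BP so far
print(sorted(result["BFR1"]))  # [33]: BFR2 reset the ring BP
print(sorted(result["BFRc"]))  # [33]: the copy leaving the ring lacks it
```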

549	4.7.  Equal Cost MultiPath (ECMP)

551	   The ECMP adjacency allows the use of just one BP per link bundle
552	   between two BFRs instead of one BP for each p2p member link of that
553	   link bundle.  In the following picture, one BP is used across L1, L2
554	   and L3, and BFR1/BFR2 have the following BIFT entries for that BP:
555	                --L1-----
556	           BFR1 --L2----- BFR2
557	                --L3-----

559	     BIFT entry in BFR1:
560	     ------------------------------------------------------------------
561	     | Index |  Adjacencies                                           |
562	     ==================================================================
563	     | 6     |  ECMP({L1-to-BFR2,L2-to-BFR2,L3-to-BFR2}, seed)        |
564	     ------------------------------------------------------------------

566	     BIFT entry in BFR2:
567	     ------------------------------------------------------------------
568	     | Index |  Adjacencies                                           |
569	     ==================================================================
570	     | 6     |  ECMP({L1-to-BFR1,L2-to-BFR1,L3-to-BFR1}, seed)        |
571	     ------------------------------------------------------------------

573	   In the following example, all traffic from BFR1 towards BFR10 is
574	   intended to be ECMP load split equally across the topology.  This
575	   example is not meant as a likely setup, but to illustrate that ECMP
576	   can be used to share BPs not only across link bundles, and it
577	   explains the use of the seed parameter.

579	                    BFR1
580	                  /     \
581	                 /L11    \L12
582	             BFR2         BFR3
583	            /    \       /    \
584	           /L21   \L22  /L31   \L32
585	          BFR4  BFR5   BFR6  BFR7
586	           \      /     \      /
587	            \    /       \    /
588	             BFR8         BFR9
589	                 \       /
590	                  \     /
591	                   BFR10

593	     BIFT entry in BFR1:
594	     ------------------------------------------------------------------
595	     | 6     |  ECMP({L11-to-BFR2,L12-to-BFR3}, seed)                 |
596	     ------------------------------------------------------------------

598	     BIFT entry in BFR2:
599	     ------------------------------------------------------------------
600	     | 6     |  ECMP({L21-to-BFR4,L22-to-BFR5}, seed)                 |
601	     ------------------------------------------------------------------

603	     BIFT entry in BFR3:
604	     ------------------------------------------------------------------
605	     | 6     |  ECMP({L31-to-BFR6,L32-to-BFR7}, seed)                 |
606	     ------------------------------------------------------------------

608	   With the above ECMP setup, traffic would not be equally load-split if
609	   all BFRs use the same seed.  Instead, links L22 and L31 would see no
610	   traffic at all: BFR2 only sees the traffic from BFR1 for which the
611	   ECMP hash in BFR1 selected the first adjacency in its list of two
612	   adjacencies: link L11-to-BFR2.  When BFR2 again performs ECMP across
613	   two adjacencies on that subset of traffic, it again selects the first
614	   of its two adjacencies: L21-to-BFR4.  Therefore, L22 and BFR5 see no
615	   traffic.

   To resolve this issue, the ECMP adjacency on BFR1 simply needs to be
   set up with a different seed than the ECMP adjacencies on BFR2 and
   BFR3.

   This issue is called polarization.  Whether it occurs depends on the
   ECMP hash.  It is possible to build ECMP that does not suffer from
   polarization, for example by also taking entropy from the actual
   adjacency members into account, but that can make it harder to
   achieve evenly balanced load-splitting on all BFRs without making
   the ECMP hash algorithm potentially too complex for fast forwarding
   in the BFRs.
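   A small simulation can make polarization and both remedies concrete.
   It is only a sketch under stated assumptions: SHA-256 stands in for
   whatever ECMP hash a real BFR uses, the helpers ecmp_hash() and
   lower_link_load() and the entropy values 0..999 are hypothetical,
   and BFR2 and BFR3 share one seed.  With the same seed everywhere,
   L22 and L31 receive nothing; a different seed in BFR1, or mixing the
   adjacency member names into the hash, should spread flows across
   all four lower links.

```python
import hashlib
from collections import Counter

def ecmp_hash(n, entropy, seed, members=()):
    """Hypothetical ECMP hash: SHA-256 over the packet entropy, the
    seed and (optionally) the adjacency member names, reduced to an
    index in range(n)."""
    key = f"{entropy}/{seed}/{','.join(members)}".encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:4], "big") % n

ECMP = {  # ECMP member lists in BFR1, BFR2 and BFR3
    "BFR1": ["L11-to-BFR2", "L12-to-BFR3"],
    "BFR2": ["L21-to-BFR4", "L22-to-BFR5"],
    "BFR3": ["L31-to-BFR6", "L32-to-BFR7"],
}
NEXT = {"L11-to-BFR2": "BFR2", "L12-to-BFR3": "BFR3"}

def lower_link_load(seed1, seed23, use_members=False, flows=1000):
    """Count flows on the four links below BFR2 and BFR3."""
    load = Counter()
    for entropy in range(flows):
        m1 = ECMP["BFR1"]
        first = m1[ecmp_hash(2, entropy, seed1,
                             m1 if use_members else ())]
        m2 = ECMP[NEXT[first]]
        load[m2[ecmp_hash(2, entropy, seed23,
                          m2 if use_members else ())]] += 1
    return load

print(lower_link_load(7, 7)["L22-to-BFR5"])     # 0: polarized
print(lower_link_load(7, 8))                    # different seeds
print(lower_link_load(7, 7, use_members=True))  # member-aware hash
```

   The sketch only shows that the all-or-nothing pattern disappears;
   whether member-aware hashing also keeps the split evenly balanced
   on every BFR is the trade-off described above.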

4.8.  Routed adjacencies

   Routed adjacencies can reduce the number of BitPositions required
   when the traffic engineering requirement is not hop-by-hop explicit
   path selection, but loose-hop selection.

              ...............             ...............
       BFR1--... Redundant ...--L1-- BFR2... Redundant ...---
          \--... Network   ...--L2--/    ... Network   ...---
       BFR4--... Segment 1 ...--L3-- BFR3... Segment 2 ...---
              ...............             ...............

   Assume the requirement in the above network is to explicitly
   engineer paths such that specific traffic flows are passed from
   segment 1 to segment 2 via link L1 (or via L2 or via L3).

   To achieve this, BFR1 and BFR4 are set up with a forward_routed
   adjacency BitPosition towards an address of BFR2 on link L1 (or
   towards an address of BFR2 on link L2, or of BFR3 on link L3).

   For paths to be engineered through a specific node BFR2 (or BFR3),
   BFR1 and BFR4 are set up with a forward_routed adjacency
   BitPosition towards a loopback address of BFR2 (or BFR3).
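   As a minimal sketch, BFR1's BIFT for this example could be modelled
   as below.  The BP numbering, the addresses (taken from the RFC 5737
   documentation prefixes) and the helpers bitstring() and
   routed_destinations() are hypothetical.  One BP per link or per node
   is enough because the path through the redundant segment is left to
   unicast routing.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ForwardRouted:
    """forward_routed adjacency: copies are unicast to an L3 address."""
    address: str

# Hypothetical BP numbering and addresses; BFR4 would hold the same
# entries.
BIFT_BFR1 = {
    1: ForwardRouted("192.0.2.1"),     # BFR2's address on link L1
    2: ForwardRouted("192.0.2.2"),     # BFR2's address on link L2
    3: ForwardRouted("192.0.2.3"),     # BFR3's address on link L3
    4: ForwardRouted("198.51.100.2"),  # BFR2 loopback (node BFR2)
    5: ForwardRouted("198.51.100.3"),  # BFR3 loopback (node BFR3)
}

def bitstring(*bps):
    """BitString with the given BitPositions (1 = least significant)."""
    bits = 0
    for bp in bps:
        bits |= 1 << (bp - 1)
    return bits

def routed_destinations(bift, bits):
    """Unicast destinations of the copies a BFR sends for a BitString."""
    return [adj.address for bp, adj in sorted(bift.items())
            if bits & (1 << (bp - 1))]

print(routed_destinations(BIFT_BFR1, bitstring(3)))  # ['192.0.2.3']
print(routed_destinations(BIFT_BFR1, bitstring(4)))  # ['198.51.100.2']
```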

4.8.1.  Supporting nodes without BIER-TE

   Routed adjacencies also enable incremental deployment of BIER-TE.
   Only the nodes through which BIER-TE traffic needs to be steered,
   with or without replication, need to support BIER-TE.  Where they
   are not directly connected to each other, forward_routed adjacencies
   are used to pass over non-BIER-TE-enabled nodes.

5.  Avoiding loops and duplicates

5.1.  Loops

   Whenever BIER-TE creates a copy of a packet, the BitString of that
   copy will have all BitPositions cleared that are associated with
   adjacencies in the BFR.  This inhibits looping of packets.  The only
   exceptions are adjacencies with DNR set.

   With DNR set, looping can happen.  Consider in the ring picture that
   link L4 from BFR3 is plugged into the L1 interface of BFRa.  This
   creates a loop where the ring's clockwise BitPosition is never reset
   for copies of the packets traveling clockwise around the ring.
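   The clearing rule and its DNR exception can be simulated with a
   hypothetical three-BFR loop; A, B and C stand in for the BFRs of the
   ring picture, which is not reproduced here.  trace() is an
   illustrative helper that follows a single copy, and max_hops exists
   only to stop the simulation.

```python
def trace(bifts, start, bits, max_hops=20):
    """Follow one packet copy from BFR start. bifts maps each BFR to
    {bp: (next_bfr, dnr)}; each BFR here has a single adjacency, so
    there is exactly one copy to follow. max_hops only bounds the
    simulation so that a loop terminates."""
    path, node = [start], start
    for _ in range(max_hops):
        matching = [bp for bp in bifts[node] if bits & (1 << (bp - 1))]
        if not matching:
            break
        bp = min(matching)
        next_bfr, dnr = bifts[node][bp]
        # The copy has all BPs of this BFR's adjacencies cleared ...
        for own_bp in bifts[node]:
            bits &= ~(1 << (own_bp - 1))
        # ... except the BP of an adjacency with DNR set.
        if dnr:
            bits |= 1 << (bp - 1)
        node = next_bfr
        path.append(node)
    return path

# Three BFRs connected in a loop, one BP per adjacency, no DNR:
no_dnr = {"A": {1: ("B", False)}, "B": {2: ("C", False)},
          "C": {3: ("A", False)}}
print(trace(no_dnr, "A", 0b111))       # ['A', 'B', 'C', 'A']

# The same loop with one shared BP and DNR set everywhere, as when a
# ring link is plugged into the wrong interface:
dnr_loop = {"A": {1: ("B", True)}, "B": {1: ("C", True)},
            "C": {1: ("A", True)}}
print(len(trace(dnr_loop, "A", 0b1)))  # 21: only max_hops stops it
```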

   To inhibit looping in the face of such physical misconfiguration,
   only forward_connected adjacencies are permitted to have DNR set, and
   the link layer destination address of the adjacency (e.g., MAC
   address) protects against closing the loop.  Link layers without
   port-unique link layer addresses should not be used with the DNR
   flag set.
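   The receiver-side check that makes the link layer address protective
   can be sketched as follows.  The MAC addresses, the ForwardConnected
   record and accepted() are hypothetical; they assume an Ethernet-like
   link on which a port only accepts unicast frames addressed to its
   own MAC address.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ForwardConnected:
    """forward_connected adjacency: interface plus the L2 (MAC)
    destination address of the intended neighbor port."""
    interface: str
    neighbor_mac: str
    dnr: bool

def accepted(frame_dst_mac, port_mac):
    """Receiver check: a unicast frame is only accepted by the port
    whose own MAC address matches the frame's destination."""
    return frame_dst_mac.lower() == port_mac.lower()

# Hypothetical addresses. BFR3's DNR adjacency on L4 was configured
# for the intended neighbor port ...
adj = ForwardConnected("L4", "02:00:00:00:00:04", dnr=True)
# ... but the cable ends up in the L1 interface of BFRa instead.
bfra_l1_mac = "02:00:00:00:00:A1"
print(accepted(adj.neighbor_mac, bfra_l1_mac))  # False: loop not closed
```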

5.2.  Duplicates

   Duplicates happen when the topology described by the BitString is
   not a tree but redundantly connects BFRs with each other.  The
   controller must therefore ensure that it only creates BitStrings
   that are trees in the topology.
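   A controller could check this property with a sketch like the
   following.  It assumes that each BP maps to one directed adjacency
   (from_bfr, to_bfr) and ignores ECMP adjacencies; duplicate_free()
   and the BP assignment on part of the ECMP example topology are
   hypothetical.  If no BFR is the target of more than one selected
   adjacency and the BFIR is the target of none, the BFRs reachable
   from the BFIR form a tree.

```python
def duplicate_free(bitstring, adjacencies, bfir):
    """Return True if the adjacencies selected by the BitString form
    a tree rooted at the BFIR: no BFR is the target of more than one
    selected adjacency, and the BFIR is the target of none."""
    indegree = {}
    for bp, (_, to_bfr) in adjacencies.items():
        if bitstring & (1 << (bp - 1)):
            indegree[to_bfr] = indegree.get(to_bfr, 0) + 1
    return indegree.get(bfir, 0) == 0 and all(
        n <= 1 for n in indegree.values())

# Hypothetical BP assignment for part of the ECMP example topology.
ADJ = {1: ("BFR1", "BFR2"), 2: ("BFR1", "BFR3"),
       3: ("BFR2", "BFR4"), 4: ("BFR2", "BFR5"),
       5: ("BFR4", "BFR8"), 6: ("BFR5", "BFR8")}

print(duplicate_free(0b010111, ADJ, "BFR1"))  # True
print(duplicate_free(0b111111, ADJ, "BFR1"))  # False: BFR8 twice
```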

   When links are incorrectly physically re-connected before the
   controller updates BitStrings in BFIRs, duplicates can happen.  Like
   loops, these can be inhibited by link layer addressing in
   forward_connected adjacencies.

   If interface or loopback addresses used in forward_routed adjacencies
   are moved from one BFR to another, duplicates can equally happen.
   Such re-addressing operations must be coordinated with the
   controller.

6.  FRR

   FRR is an optional procedure.  To leverage it, the BIER-TE controller
   host and BFRs need to support it.  It does not have to be supported
   on all BFRs, but only on those that are attached to a link/adjacency
   for which FRR support is required.

   If BIER-TE FRR is supported by the BIER-TE controller host, then it
   needs to calculate the desired backup paths for link and/or node
   failures in the BIER-TE domain and download this information into the
   BIER-TE Adjacency FRR Table (BTAFT) of the BFRs.  The BTAFT then
   drives FRR operations in the BIER-TE forwarding plane of that BFR.

6.1.  The BIER-TE Adjacency FRR Table (BTAFT)

   The BTAFT exists in every BFR that supports BIER-TE FRR procedures.
   It is indexed by FRR Adjacency Index.  Associated with each FRR
   Adjacency Index are a BitPosition, a ResetBitmask and an AddBitmask.

     -----------------------------------------------------------
     | FRR Adjacency | BitPosition | ResetBitmask | AddBitmask |
     | Index         |             |              |            |
     ===========================================================
     | 1             |   5         |  ..0010000   | ..11000000 |
     -----------------------------------------------------------
     ...

   An FRR Adjacency is an adjacency that is used in the BIFT of the BFR.
   The BFR has to be able to determine whether the adjacency is up or
   down in less than 50 msec.  An FRR adjacency can be a
   forward_connected adjacency with fast L2 link Up/Down state
   notifications, or a forward_connected or forward_routed adjacency
   with a fast aliveness mechanism such as BFD.  Details of those
   mechanisms are outside the scope of this architecture.

   The FRR Adjacency Index is the index that is indicated in the fast
   Up/Down notifications to the BIER-TE forwarding plane.

   The BitPosition is the BP of the BIFT entry in which the FRR
   Adjacency is used.

6.2.  FRR in BIER-TE forwarding

   The BIER-TE forwarding plane receives fast Up/Down notifications with
   the FRR Adjacency Index.  From the BitPosition in the BTAFT entry, it
   remembers which BPs are currently affected (have a down adjacency).

   When a packet is received, BIER-TE forwarding checks whether any of
   the affected BPs are BPs to which it would forward the packet.  For
   each such BP, it removes the ResetBitmask bits from the packet's
   BitString and adds the AddBitmask bits to the packet's BitString.

   Afterwards, normal BIER-TE forwarding occurs, taking the modified
   BitString into account.
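   These two steps can be sketched in Python as below.  Everything here
   is hypothetical: the class and method names, and keeping a set of
   down FRR Adjacency Indices instead of per-BP bitmasks.  The example
   reuses the single BTAFT row shown in Section 6.1, with BitPosition 1
   as the least significant bit.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BtaftEntry:
    bit_position: int   # BP of the BIFT entry using the FRR adjacency
    reset_bitmask: int
    add_bitmask: int

# The example row of the BTAFT: FRR Adjacency Index 1.
BTAFT = {1: BtaftEntry(5, 0b0010000, 0b11000000)}

class FrrForwarding:
    def __init__(self, btaft):
        self.btaft = btaft
        self.down = set()  # FRR Adjacency Indices currently down

    def notify(self, frr_index, up):
        """Fast Up/Down notification carrying an FRR Adjacency Index."""
        if up:
            self.down.discard(frr_index)
        else:
            self.down.add(frr_index)

    def rewrite(self, bitstring):
        """Apply ResetBitmask/AddBitmask for every affected BP that is
        set in the packet; normal forwarding then uses the result."""
        for index in sorted(self.down):
            entry = self.btaft[index]
            if bitstring & (1 << (entry.bit_position - 1)):
                bitstring &= ~entry.reset_bitmask
                bitstring |= entry.add_bitmask
        return bitstring

frr = FrrForwarding(BTAFT)
print(bin(frr.rewrite(0b10001)))  # 0b10001: adjacency is up
frr.notify(1, up=False)
print(bin(frr.rewrite(0b10001)))  # 0b11000001: BP 5 -> BPs 7 and 8
```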

6.3.  FRR in the BIER-TE Controller Host

   The basic rules for how the BIER-TE controller host calculates the
   ResetBitmask and AddBitmask are as follows:

   1.  The BIER-TE controller host has to determine whether a failure of
       the adjacency should be taken to indicate link or node failure.
       This is a policy decision.

   2.  The ResetBitmask contains the BitPosition of the failed
       adjacency.

   3.  In the case of link protection, the AddBitmask contains the
       segments forming a path from the BFR to the BFR on the other end
       of the failed link.

   4.  In the case of node protection, the AddBitmask contains the
       segments forming a tree from the BFR to all necessary BFRs
       downstream of the (assumed to be failed) BFR across the failed
       adjacency.

   5.  The ResetBitmask is extended with those segments that could lead
       to duplicate packets if the AddBitmask is added to possible
       BitStrings of packets using the failing BitPosition.
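   Rules 2 and 3 can be sketched for link protection as follows.  This
   is a hypothetical helper, not an algorithm defined by this
   architecture: it assumes that each BP maps to one directed adjacency
   (from_bfr, to_bfr), picks the backup path with a breadth-first
   search (fewest adjacencies), and does not implement the policy
   decision of rule 1, node protection (rule 4) or the additional reset
   bits of rule 5.

```python
from collections import deque

def link_protection_masks(adjacencies, failed_bp):
    """Return (ResetBitmask, AddBitmask) for link protection of
    failed_bp. adjacencies maps BP -> (from_bfr, to_bfr)."""
    src, dst = adjacencies[failed_bp]
    parent = {src: None}            # BFR -> BP used to reach it
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for bp, (a, b) in sorted(adjacencies.items()):
            if a == node and bp != failed_bp and b not in parent:
                parent[b] = bp
                queue.append(b)
    if dst not in parent:
        raise ValueError("no backup path avoiding the failed link")
    add, node = 0, dst
    while parent[node] is not None:  # walk back from dst to src
        bp = parent[node]
        add |= 1 << (bp - 1)
        node = adjacencies[bp][0]
    return 1 << (failed_bp - 1), add

# Hypothetical triangle of BFRs with one BP per directed adjacency.
ADJ = {1: ("BFR1", "BFR2"), 2: ("BFR1", "BFR3"),
       3: ("BFR3", "BFR2"), 4: ("BFR2", "BFR3")}
reset, add = link_protection_masks(ADJ, failed_bp=1)
print(bin(reset), bin(add))  # 0b1 0b110: BFR1->BFR3->BFR2
```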

6.4.  BIER-TE FRR Benefits

   Compared to other FRR solutions, such as RSVP-TE/P2MP FRR, BIER-TE
   FRR has two key distinctions:

   o  It maintains the BIER-TE goal of not establishing in-network
      per-multicast-flow state.  For that reason, the backup
      paths/trees are only tied to the topology, not to individual
      distribution trees.

   o  For the case of node failure, it allows building a path-engineered
      backup tree (rule 4 in Section 6.3), as opposed to only a set of
      P2P backup tunnels.

7.  BIER-TE Forwarding Pseudocode

   The following sections of pseudocode are meant to illustrate the
   BIER-TE forwarding plane.  This code is not normative; it is meant
   both to serve as a potentially easier to read and more precise
   representation of the forwarding functionality and to illustrate
   how simple BIER-TE forwarding is and that it can be implemented
   efficiently.

   The following procedure is executed on a BFR whenever the BIFT is
   changed by the BIER-TE controller host:

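      global MyBitsOfInterest

      void BIFTChanged()
      {
          // Recalculate from scratch: one bit per non-empty BIFT
          // entry, BitPosition 1 being the least significant bit.
          MyBitsOfInterest = 0
          for (Index = 1; Index <= BitStringLength; Index++)
              if (BIFT[Index] != <empty>)
                  MyBitsOfInterest |= 1<<(Index-1)
      }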

   The following procedure is executed whenever an adjacency used for
   BIER-TE FRR changes state:

      global ResetBitMaskByBT[BitStringLength]
      global AddBitMaskByBT[BitStringLength]
      global FRRaffectedBP
      global FRRAdjacenciesDown

      void FrrUpDown(FrrAdjacencyIndex, UpDown)
      {
          if (UpDown == Up)
              FRRAdjacenciesDown &= ~(1<<(FrrAdjacencyIndex-1))
          else
              FRRAdjacenciesDown |=   1<<(FrrAdjacencyIndex-1)

          // Recalculate all per-BP state from scratch so that
          // adjacencies that came back up no longer contribute.
          FRRaffectedBP = 0
          for (BP = 1; BP <= BitStringLength; BP++)
              ResetBitMaskByBT[BP] = AddBitMaskByBT[BP] = 0

          for (Index = GetFirstBitPosition(FRRAdjacenciesDown); Index ;
               Index = GetNextBitPosition(FRRAdjacenciesDown, Index))
          {
              local BP = BTAFT[Index].BitPosition
              FRRaffectedBP        |= 1<<(BP-1)
              ResetBitMaskByBT[BP] |= BTAFT[Index].ResetBitMask
              AddBitMaskByBT[BP]   |= BTAFT[Index].AddBitMask
          }
      }

   The following procedure is executed whenever a BIER-TE packet is to
   be forwarded:

      void ForwardBierTePacket (Packet)
      {
          // We calculate in BitMask the subset of BPs of the BitString
          // for which we have adjacencies. This is purely an
          // optimization to avoid replicating for every BP set in
          // the BitString only to discover that, for most of them,
          // the BIFT has no adjacency.

          local BitMask = Packet->BitString
          Packet->BitString &= ~MyBitsOfInterest
          BitMask &= MyBitsOfInterest

          // FRR operations
          // Note: this algorithm is not optimal yet for ECMP cases;
          // it performs FRR replacement for all candidate ECMP paths.
          // Reset bits are removed from both BitMask and the BitString
          // of the copies; added bits go into BitMask if they are ours,
          // otherwise into the BitString of the copies.

          local MyFRRBP = BitMask & FRRaffectedBP
          for (BP = GetFirstBitPosition(MyFRRBP); BP ;
               BP = GetNextBitPosition(MyFRRBP, BP))
          {
              BitMask           &= ~ResetBitMaskByBT[BP]
              Packet->BitString &= ~ResetBitMaskByBT[BP]
              BitMask           |=  AddBitMaskByBT[BP] & MyBitsOfInterest
              Packet->BitString |=  AddBitMaskByBT[BP] & ~MyBitsOfInterest
          }

          // Replication
          for (Index = GetFirstBitPosition(BitMask); Index ;
               Index = GetNextBitPosition(BitMask, Index))
          {
              foreach adjacency BIFT[Index]
              {
                  if (adjacency == ECMP(ListOfAdjacencies, seed))
                  {
                      I = ECMP_hash(sizeof(ListOfAdjacencies),
                                    Packet->Entropy, seed)
                      adjacency = ListOfAdjacencies[I]
                  }

                  PacketCopy = Copy(Packet)

                  switch(adjacency)
                      case forward_connected(interface,neighbor,DNR):
                          if (DNR)
                              PacketCopy->BitString |= 1<<(Index-1)
                          SendToL2Unicast(PacketCopy,interface,neighbor)

                      case forward_routed([VRF,]neighbor):
                          SendToL3(PacketCopy,[VRF,]neighbor)

                      case local_decap([VRF]):
                          DecapBierHeader(PacketCopy)
                          PassTo(PacketCopy,[VRF,]Packet->NextProto)
              }
          }
      }
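   For readers who want to run the replication logic, the following
   Python sketch mirrors BIFTChanged() and the replication part of
   ForwardBierTePacket().  It is a hypothetical model, not part of this
   architecture: adjacencies are plain records, packets are reduced to
   their BitString, and the FRR and ECMP steps are omitted.  BitPosition
   1 is the least significant bit.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Adjacency:
    kind: str                       # "forward_connected",
                                    # "forward_routed" or "local_decap"
    neighbor: Optional[str] = None  # None for local_decap
    dnr: bool = False               # only used by forward_connected

def bits_of_interest(bift):
    """Like BIFTChanged(): one bit per non-empty BIFT entry."""
    mask = 0
    for bp, adjacencies in bift.items():
        if adjacencies:
            mask |= 1 << (bp - 1)
    return mask

def forward_copies(bift, bitstring):
    """List (adjacency, BitString of the copy) for every copy the
    replication loop of ForwardBierTePacket() would send."""
    mine = bits_of_interest(bift)
    local = bitstring & mine        # BPs this BFR replicates for
    carried = bitstring & ~mine     # own BPs cleared in all copies
    copies = []
    for bp in sorted(bift):
        if local & (1 << (bp - 1)):
            for adj in bift[bp]:
                bits = carried
                if adj.kind == "forward_connected" and adj.dnr:
                    bits |= 1 << (bp - 1)  # DNR keeps the BP set
                copies.append((adj, bits))
    return copies

# Hypothetical BIFT: BP 4 is unused, BP 6 belongs to another BFR.
BIFT = {
    1: [Adjacency("forward_connected", "BFR2")],
    2: [Adjacency("forward_connected", "BFR3", dnr=True)],
    3: [Adjacency("local_decap")],
    4: [],
}
for adj, bits in forward_copies(BIFT, 0b100111):
    print(adj.kind, adj.neighbor, bin(bits))
# forward_connected BFR2 0b100000
# forward_connected BFR3 0b100010
# local_decap None 0b100000
```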

8.  Security Considerations

   The security considerations are the same as for BIER, with the
   following differences:

   BFR-ids and BFR-prefixes are not used in BIER-TE, nor are procedures
   for their distribution, so these are not attack vectors against
   BIER-TE.

9.  IANA Considerations

   This document requests no action by IANA.

10.  Acknowledgements

   The author would like to thank Ijsbrand Wijnands and Neale Ranns for
   their extensive review and suggestions.

11.  Change log [RFC Editor: Please remove]

      00: Initial version.

12.  References

   [I-D.wijnands-bier-architecture]
              Wijnands, I., Rosen, E., Dolganow, A., Przygienda, T., and
              S. Aldrin, "Multicast using Bit Index Explicit
              Replication", draft-wijnands-bier-architecture-04 (work in
              progress), February 2015.

   [I-D.wijnands-mpls-bier-encapsulation]
              Wijnands, I., Rosen, E., Dolganow, A., Tantsura, J., and
              S. Aldrin, "Encapsulation for Bit Index Explicit
              Replication in MPLS Networks",
              draft-wijnands-mpls-bier-encapsulation-02 (work in
              progress), December 2014.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

Author's Address

   Toerless Eckert
   Cisco

   Email: eckert@cisco.com