| < draft-ietf-bier-te-arch-04.txt | draft-ietf-bier-te-arch-05.txt > | |||
|---|---|---|---|---|
| Network Working Group T. Eckert, Ed. | Network Working Group T. Eckert, Ed. | |||
| Internet-Draft Futurewei | Internet-Draft Futurewei | |||
| Intended status: Standards Track G. Cauchie | Intended status: Standards Track G. Cauchie | |||
| Expires: April 24, 2020 Bouygues Telecom | Expires: May 4, 2020 Bouygues Telecom | |||
| M. Menth | M. Menth | |||
| University of Tuebingen | University of Tuebingen | |||
| October 22, 2019 | November 1, 2019 | |||
| Traffic Engineering for Bit Index Explicit Replication (BIER-TE) | Traffic Engineering for Bit Index Explicit Replication (BIER-TE) | |||
| draft-ietf-bier-te-arch-04 | draft-ietf-bier-te-arch-05 | |||
| Abstract | Abstract | |||
| This memo introduces per-packet stateless strict and loose path | This memo introduces per-packet stateless strict and loose path | |||
| engineered replication and forwarding for Bit Index Explicit | engineered replication and forwarding for Bit Index Explicit | |||
| Replication packets ([RFC8279]). This is called BIER-TE. | Replication packets ([RFC8279]). This is called BIER-TE. | |||
| BIER-TE leverages the BIER architecture ([RFC8279]) and extends it | BIER-TE leverages the BIER architecture ([RFC8279]) and extends it | |||
| with a new semantic for bits in the bitstring. BIER-TE can leverage | with a new semantic for bits in the bitstring. BIER-TE can leverage | |||
| BIER forwarding engines with little or no changes. | BIER forwarding engines with little or no changes. | |||
| skipping to change at page 2, line 15 ¶ | skipping to change at page 2, line 15 ¶ | |||
| Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
| Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
| working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
| Drafts is at https://datatracker.ietf.org/drafts/current/. | Drafts is at https://datatracker.ietf.org/drafts/current/. | |||
| Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
| and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
| time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
| material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
| This Internet-Draft will expire on April 24, 2020. | This Internet-Draft will expire on May 4, 2020. | |||
| Copyright Notice | Copyright Notice | |||
| Copyright (c) 2019 IETF Trust and the persons identified as the | Copyright (c) 2019 IETF Trust and the persons identified as the | |||
| document authors. All rights reserved. | document authors. All rights reserved. | |||
| This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
| Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
| (https://trustee.ietf.org/license-info) in effect on the date of | (https://trustee.ietf.org/license-info) in effect on the date of | |||
| publication of this document. Please review these documents | publication of this document. Please review these documents | |||
| skipping to change at page 3, line 16 ¶ | skipping to change at page 3, line 16 ¶ | |||
| 3.2.4. Local Decap . . . . . . . . . . . . . . . . . . . . . 14 | 3.2.4. Local Decap . . . . . . . . . . . . . . . . . . . . . 14 | |||
| 3.3. Encapsulation considerations . . . . . . . . . . . . . . 14 | 3.3. Encapsulation considerations . . . . . . . . . . . . . . 14 | |||
| 3.4. Basic BIER-TE Forwarding Example . . . . . . . . . . . . 14 | 3.4. Basic BIER-TE Forwarding Example . . . . . . . . . . . . 14 | |||
| 3.5. Forwarding comparison with BIER . . . . . . . . . . . . . 17 | 3.5. Forwarding comparison with BIER . . . . . . . . . . . . . 17 | |||
| 3.6. Requirements . . . . . . . . . . . . . . . . . . . . . . 17 | 3.6. Requirements . . . . . . . . . . . . . . . . . . . . . . 17 | |||
| 4. BIER-TE Controller Host BitPosition Assignments . . . . . . . 18 | 4. BIER-TE Controller Host BitPosition Assignments . . . . . . . 18 | |||
| 4.1. P2P Links . . . . . . . . . . . . . . . . . . . . . . . . 18 | 4.1. P2P Links . . . . . . . . . . . . . . . . . . . . . . . . 18 | |||
| 4.2. BFER . . . . . . . . . . . . . . . . . . . . . . . . . . 18 | 4.2. BFER . . . . . . . . . . . . . . . . . . . . . . . . . . 18 | |||
| 4.3. Leaf BFERs . . . . . . . . . . . . . . . . . . . . . . . 18 | 4.3. Leaf BFERs . . . . . . . . . . . . . . . . . . . . . . . 18 | |||
| 4.4. LANs . . . . . . . . . . . . . . . . . . . . . . . . . . 19 | 4.4. LANs . . . . . . . . . . . . . . . . . . . . . . . . . . 19 | |||
| 4.5. Hub and Spoke . . . . . . . . . . . . . . . . . . . . . . 19 | 4.5. Hub and Spoke . . . . . . . . . . . . . . . . . . . . . . 20 | |||
| 4.6. Rings . . . . . . . . . . . . . . . . . . . . . . . . . . 19 | 4.6. Rings . . . . . . . . . . . . . . . . . . . . . . . . . . 20 | |||
| 4.7. Equal Cost MultiPath (ECMP) . . . . . . . . . . . . . . . 20 | 4.7. Equal Cost MultiPath (ECMP) . . . . . . . . . . . . . . . 21 | |||
| 4.8. Routed adjacencies . . . . . . . . . . . . . . . . . . . 23 | 4.8. Routed adjacencies . . . . . . . . . . . . . . . . . . . 24 | |||
| 4.8.1. Reducing BitPositions . . . . . . . . . . . . . . . . 23 | 4.8.1. Reducing BitPositions . . . . . . . . . . . . . . . . 24 | |||
| 4.8.2. Supporting nodes without BIER-TE . . . . . . . . . . 23 | 4.8.2. Supporting nodes without BIER-TE . . . . . . . . . . 24 | |||
| 5. Avoiding loops and duplicates . . . . . . . . . . . . . . . . 24 | 4.9. Reuse of BitPositions (without DNR) . . . . . . . . . . . 24 | |||
| 5.1. Loops . . . . . . . . . . . . . . . . . . . . . . . . . . 24 | 4.10. Summary of BP optimizations . . . . . . . . . . . . . . . 26 | |||
| 5.2. Duplicates . . . . . . . . . . . . . . . . . . . . . . . 24 | 5. Avoiding loops and duplicates . . . . . . . . . . . . . . . . 27 | |||
| 6. BIER-TE Forwarding Pseudocode . . . . . . . . . . . . . . . . 24 | 5.1. Loops . . . . . . . . . . . . . . . . . . . . . . . . . . 27 | |||
| 7. Managing SI, subdomains and BFR-ids . . . . . . . . . . . . . 27 | 5.2. Duplicates . . . . . . . . . . . . . . . . . . . . . . . 27 | |||
| 7.1. Why SI and sub-domains . . . . . . . . . . . . . . . . . 28 | 6. BIER-TE Forwarding Pseudocode . . . . . . . . . . . . . . . . 27 | |||
| 7.2. Bit assignment comparison BIER and BIER-TE . . . . . . . 29 | 7. Managing SI, subdomains and BFR-ids . . . . . . . . . . . . . 30 | |||
| 7.3. Using BFR-id with BIER-TE . . . . . . . . . . . . . . . . 29 | 7.1. Why SI and sub-domains . . . . . . . . . . . . . . . . . 31 | |||
| 7.4. Assigning BFR-ids for BIER-TE . . . . . . . . . . . . . . 30 | 7.2. Bit assignment comparison BIER and BIER-TE . . . . . . . 32 | |||
| 7.5. Example bit allocations . . . . . . . . . . . . . . . . . 31 | 7.3. Using BFR-id with BIER-TE . . . . . . . . . . . . . . . . 32 | |||
| 7.5.1. With BIER . . . . . . . . . . . . . . . . . . . . . . 31 | 7.4. Assigning BFR-ids for BIER-TE . . . . . . . . . . . . . . 33 | |||
| 7.5.2. With BIER-TE . . . . . . . . . . . . . . . . . . . . 32 | 7.5. Example bit allocations . . . . . . . . . . . . . . . . . 34 | |||
| 7.6. Summary . . . . . . . . . . . . . . . . . . . . . . . . . 33 | 7.5.1. With BIER . . . . . . . . . . . . . . . . . . . . . . 34 | |||
| 8. BIER-TE and Segment Routing (SR) . . . . . . . . . . . . . . 33 | 7.5.2. With BIER-TE . . . . . . . . . . . . . . . . . . . . 35 | |||
| 9. Security Considerations . . . . . . . . . . . . . . . . . . . 34 | 7.6. Summary . . . . . . . . . . . . . . . . . . . . . . . . . 36 | |||
| 10. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 35 | 8. BIER-TE and Segment Routing (SR) . . . . . . . . . . . . . . 36 | |||
| 11. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 35 | 9. Security Considerations . . . . . . . . . . . . . . . . . . . 37 | |||
| 12. Change log [RFC Editor: Please remove] . . . . . . . . . . . 35 | 10. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 38 | |||
| 13. References . . . . . . . . . . . . . . . . . . . . . . . . . 39 | 11. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 38 | |||
| 13.1. Normative References . . . . . . . . . . . . . . . . . . 39 | 12. Change log [RFC Editor: Please remove] . . . . . . . . . . . 38 | |||
| 13.2. Informative References . . . . . . . . . . . . . . . . . 39 | 13. References . . . . . . . . . . . . . . . . . . . . . . . . . 42 | |||
| Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 40 | 13.1. Normative References . . . . . . . . . . . . . . . . . . 42 | |||
| 13.2. Informative References . . . . . . . . . . . . . . . . . 43 | ||||
| Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 43 | ||||
| 1. Introduction | 1. Introduction | |||
| BIER-TE shares architecture, terminology and packet formats with BIER | BIER-TE shares architecture, terminology and packet formats with BIER | |||
| as described in [RFC8279] and [RFC8296]. This document describes | as described in [RFC8279] and [RFC8296]. This document describes | |||
| BIER-TE in the expectation that the reader is familiar with these two | BIER-TE in the expectation that the reader is familiar with these two | |||
| documents. | documents. | |||
| In BIER-TE, BitPositions (BP) indicate adjacencies. The BIFT of each | In BIER-TE, BitPositions (BP) indicate adjacencies. The BIFT of each | |||
| BFR is only populated with BP that are adjacent to the BFR in the | BFR is only populated with BP that are adjacent to the BFR in the | |||
| BIER-TE Topology. Other BPs are left without adjacency. The BFR | BIER-TE Topology. Other BPs are left without adjacency. The BFR | |||
| replicate and forwards BIER packets to adjacent BPs that are set in | replicate and forwards BIER packets to adjacent BPs that are set in | |||
| the packet. BPs are normally also reset upon forwarding to avoid | the packet. BPs are normally also reset upon forwarding to avoid | |||
| duplicates and loops. This is detailed further below. | duplicates and loops. This is detailed further below. | |||
| Note that related work [ICC], [I-D.ietf-roll-ccast] uses bloom | Note that related work, [I-D.ietf-roll-ccast] uses bloom filters to | |||
| filters to represent leaves or edges of the intended delivery tree. | represent leaves or edges of the intended delivery tree. Bloom | |||
| Bloom filters can support larger trees with fewer addressing bits, | filters in general can support larger trees/topologies with fewer | |||
| but they introduce the heuristic risk of false positives and cannot | addressing bits than explicit bitstrings, but they introduce the | |||
| reset bits in the bitstring during forwarding to avoid loops. For | heuristic risk of false positives and cannot reset bits in the | |||
| these reasons, BIER-TE does not use bloom filters, but explicit | bitstring during forwarding to avoid loops. For these reasons, BIER- | |||
| bitstrings like BIER. | TE uses explicit bitstrings like BIER. The explicit bitstrings of | |||
| BIER-TE can also be seen as a special type of bloom filter, and this | ||||
| is how related work [ICC] describes it. | ||||
| 1.1. Basic Examples | 1.1. Basic Examples | |||
| BIER-TE forwarding is best introduced with simple examples. | BIER-TE forwarding is best introduced with simple examples. | |||
| BIER-TE Topology: | BIER-TE Topology: | |||
| Diagram: | Diagram: | |||
| p5 p6 | p5 p6 | |||
| skipping to change at page 12, line 24 ¶ | skipping to change at page 12, line 24 ¶ | |||
| (traffic engineered) path through the topology. The BIER-TE | (traffic engineered) path through the topology. The BIER-TE | |||
| forwarding definitions do therefore not use the term BFR-id at all. | forwarding definitions do therefore not use the term BFR-id at all. | |||
| Instead, BFR-ids are only used as required by routing underlay, flow | Instead, BFR-ids are only used as required by routing underlay, flow | |||
| overlay of BIER headers. Please refer to Section 7 for explanations | overlay of BIER headers. Please refer to Section 7 for explanations | |||
| how to deal with SI, subdomains and BFR-id in BIER-TE. | how to deal with SI, subdomains and BFR-id in BIER-TE. | |||
| ------------------------------------------------------------------ | ------------------------------------------------------------------ | |||
| | Index: | Adjacencies: | | | Index: | Adjacencies: | | |||
| | SI:BitPosition | <empty> or one or more per entry | | | SI:BitPosition | <empty> or one or more per entry | | |||
| ================================================================== | ================================================================== | |||
| | 0:1 | forward_connected(interface,neighbor,DNR) | | | 0:1 | forward_connected(interface,neighbor{,DNR}) | | |||
| ------------------------------------------------------------------ | ------------------------------------------------------------------ | |||
| | 0:2 | forward_connected(interface,neighbor,DNR) | | | 0:2 | forward_connected(interface,neighbor{,DNR}) | | |||
| | | forward_connected(interface,neighbor,DNR) | | | | forward_connected(interface,neighbor{,DNR}) | | |||
| ------------------------------------------------------------------ | ------------------------------------------------------------------ | |||
| | 0:3 | local_decap({VRF}) | | | 0:3 | local_decap({VRF}) | | |||
| ------------------------------------------------------------------ | ------------------------------------------------------------------ | |||
| | 0:4 | forward_routed({VRF,}l3-neighbor) | | | 0:4 | forward_routed({VRF,}l3-neighbor) | | |||
| ------------------------------------------------------------------ | ------------------------------------------------------------------ | |||
| | 0:5 | <empty> | | | 0:5 | <empty> | | |||
| ------------------------------------------------------------------ | ------------------------------------------------------------------ | |||
| | 0:6 | ECMP({adjacency1,...adjacencyN}, seed) | | | 0:6 | ECMP({adjacency1,...adjacencyN}, seed) | | |||
| ------------------------------------------------------------------ | ------------------------------------------------------------------ | |||
| ... | ... | |||
| skipping to change at page 13, line 29 ¶ | skipping to change at page 13, line 29 ¶ | |||
| 3.2.2. Forward Routed | 3.2.2. Forward Routed | |||
| A "forward_routed" adjacency is an adjacency towards a BFR that is | A "forward_routed" adjacency is an adjacency towards a BFR that is | |||
| not a forward_connected adjacency: towards a loopback address of a | not a forward_connected adjacency: towards a loopback address of a | |||
| BFR or towards an interface address that is non-directly connected. | BFR or towards an interface address that is non-directly connected. | |||
| Forward_routed packets are forwarded via the Routing Underlay. | Forward_routed packets are forwarded via the Routing Underlay. | |||
| If the Routing Underlay has multiple paths for a forward_routed | If the Routing Underlay has multiple paths for a forward_routed | |||
| adjacency, it will perform ECMP independent of BIER-TE for packets | adjacency, it will perform ECMP independent of BIER-TE for packets | |||
| forwarded across a forward_routed adjacency. | forwarded across a forward_routed adjacency. This is independent of | |||
| BIER-TE ECMP described in Section 3.2.3. | ||||
| If the Routing Underlay has FRR, it will perform FRR independent of | If the Routing Underlay has FRR, it will perform FRR independent of | |||
| BIER-TE for packets forwarded across a forward_routed adjacency. | BIER-TE for packets forwarded across a forward_routed adjacency. | |||
| 3.2.3. ECMP | 3.2.3. ECMP | |||
| The ECMP mechanisms in BIER are tied to the BIER BIFT and are | The ECMP mechanisms in BIER are tied to the BIER BIFT and are | |||
| therefore not directly useable with BIER-TE. The following | therefore not directly useable with BIER-TE. The following | |||
| procedures describe ECMP for BIER-TE that we consider to be | procedures describe ECMP for BIER-TE that we consider to be | |||
| lightweight but also well manageable. It leverages the existing | lightweight but also well manageable. It leverages the existing | |||
| skipping to change at page 17, line 15 ¶ | skipping to change at page 17, line 15 ¶ | |||
| and pass it on to the NextProtocol, in this example IP multicast. IP | and pass it on to the NextProtocol, in this example IP multicast. IP | |||
| multicast will then forward the packet out to LAN2 because it did | multicast will then forward the packet out to LAN2 because it did | |||
| receive PIM or IGMP joins on LAN2 for the traffic. | receive PIM or IGMP joins on LAN2 for the traffic. | |||
| Further processing of the packet in BFR4, BFR5 and BFER2 accordingly. | Further processing of the packet in BFR4, BFR5 and BFER2 accordingly. | |||
| 3.5. Forwarding comparison with BIER | 3.5. Forwarding comparison with BIER | |||
| Forwarding of BIER-TE is designed to allow common forwarding hardware | Forwarding of BIER-TE is designed to allow common forwarding hardware | |||
| with BIER. In fact, one of the main goals of this document is to | with BIER. In fact, one of the main goals of this document is to | |||
| encourage the building of forwarding hardware that cannot only | encourage the building of forwarding hardware that can not only | |||
| support BIER, but also BIER-TE - to allow experimentation with BIER- | support BIER, but also BIER-TE - to allow experimentation with BIER- | |||
| TE and support building of BIER-TE control plane code. | TE and support building of BIER-TE control plane code. | |||
| The pseudocode in Section 6 shows how existing BIER/BIFT forwarding | The pseudocode in Section 6 shows how existing BIER/BIFT forwarding | |||
| can be amended to support basic BIER-TE forwarding, by using BIER | can be amended to support basic BIER-TE forwarding, by using BIER | |||
| BIFT's F-BM. Only the masking of bits due to avoid duplicates must | BIFT's F-BM. Only the masking of bits due to avoid duplicates must | |||
| be skipped when forwarding is for BIER-TE. | be skipped when forwarding is for BIER-TE. | |||
| Whether to use BIER or BIER-TE forwarding can simply be a configured | Whether to use BIER or BIER-TE forwarding can simply be a configured | |||
| choice per subdomain and accordingly be set up by a BIER-TE | choice per subdomain and accordingly be set up by a BIER-TE | |||
| skipping to change at page 18, line 29 ¶ | skipping to change at page 18, line 29 ¶ | |||
| 4.8). | 4.8). | |||
| 4.1. P2P Links | 4.1. P2P Links | |||
| Each P2p link in the BIER-TE domain is assigned one unique | Each P2p link in the BIER-TE domain is assigned one unique | |||
| BitPosition with a forward_connected adjacency pointing to the | BitPosition with a forward_connected adjacency pointing to the | |||
| neighbor on the p2p link. | neighbor on the p2p link. | |||
| 4.2. BFER | 4.2. BFER | |||
| Every BFER is given a unique BitPosition with a local_decap | Every non-Leaf BFER is given a unique BitPosition with a local_decap | |||
| adjacency. | adjacency. | |||
| 4.3. Leaf BFERs | 4.3. Leaf BFERs | |||
| BFR1(P) BFR2(P) BFR1(P) BFR2(P) | ||||
| | \ / | | | | ||||
| | X | | | | ||||
| | / \ | | | | ||||
| BFER1(PE) BFER2(PE) BFER1(PE)----BFER2(PE) | ||||
| Leaf BFER / Non-Leaf BFER / | ||||
| PE-router PE-router | ||||
| Figure 8: Leaf vs. non-Leaf BFER Example | ||||
| Leaf BFERs are BFERs where incoming BIER-TE packets never need to be | Leaf BFERs are BFERs where incoming BIER-TE packets never need to be | |||
| forwarded to another BFR but are only sent to the BFER to exit the | forwarded to another BFR but are only sent to the BFER to exit the | |||
| BIER-TE domain. For example, in networks where PEs are spokes | BIER-TE domain. For example, in networks where PEs are spokes | |||
| connected to P routers, those PEs are Leaf BFIRs unless there is a | connected to P routers, those PEs are Leaf BFERs unless there is a | |||
| U-turn between two PEs. | U-turn between two PEs. Consider how redundant disjoint traffic can | |||
| reach BFER1/BFER2 in above picture: When BFER1/BFER2 are Non-Leaf | ||||
| BFER as shown on the right hand side, one traffic copy would be | ||||
| forwarded to BFER1 from BFR1, but the other one could only reach | ||||
| BFER1 via BFER2, which makes BFER2 a non-Leaf BFER. Likewise BFER1 | ||||
| is a non-Leaf BFER when forwarding traffic to BFER2. | ||||
| Note that the BFERs in the left hand picture are only guaranteed to | ||||
| be leaf-BFER by fitting routing configuration that prohibits transit | ||||
| traffic to pass through the BFERs, which is commonly applied in these | ||||
| topologies. | ||||
| All leaf-BFER in a BIER-TE domain can share a single BitPosition. | All leaf-BFER in a BIER-TE domain can share a single BitPosition. | |||
| This is possible because the BitPosition for the adjacency to reach | This is possible because the BitPosition for the adjacency to reach | |||
| the BFER can be used to distinguish whether or not packets should | the BFER can be used to distinguish whether or not packets should | |||
| reach the BFER. | reach the BFER. | |||
| This optimization will not work if an upstream interface of the BFER | This optimization will not work if an upstream interface of the BFER | |||
| is using a BitPosition optimized as described in the following two | is using a BitPosition optimized as described in the following two | |||
| sections (LAN, Hub and Spoke). | sections (LAN, Hub and Spoke). | |||
| skipping to change at page 19, line 18 ¶ | skipping to change at page 19, line 35 ¶ | |||
| unique BitPosition. The adjacency of this BitPosition is a | unique BitPosition. The adjacency of this BitPosition is a | |||
| forward_connected adjacency towards the BFR and this BitPosition is | forward_connected adjacency towards the BFR and this BitPosition is | |||
| populated into the BIFT of all the other BFRs on that LAN. | populated into the BIFT of all the other BFRs on that LAN. | |||
| BFR1 | BFR1 | |||
| |p1 | |p1 | |||
| LAN1-+-+---+-----+ | LAN1-+-+---+-----+ | |||
| p3| p4| p2| | p3| p4| p2| | |||
| BFR3 BFR4 BFR7 | BFR3 BFR4 BFR7 | |||
| Figure 8: LAN Example | Figure 9: LAN Example | |||
| If Bandwidth on the LAN is not an issue and most BIER-TE traffic | If Bandwidth on the LAN is not an issue and most BIER-TE traffic | |||
| should be copied to all neighbors on a LAN, then BitPositions can be | should be copied to all neighbors on a LAN, then BitPositions can be | |||
| saved by assigning just a single BitPosition to the LAN and | saved by assigning just a single BitPosition to the LAN and | |||
| populating the BitPosition of the BIFTs of each BFRs on the LAN with | populating the BitPosition of the BIFTs of each BFRs on the LAN with | |||
| a list of forward_connected adjacencies to all other neighbors on the | a list of forward_connected adjacencies to all other neighbors on the | |||
| LAN. | LAN. | |||
| This optimization does not work in the face of BFRs redundantly | This optimization does not work in the case of BFRs redundantly | |||
| connected to more than one LANs with this optimization because these | connected to more than one LANs with this optimization because these | |||
| BFRs would receive duplicates and forward those duplicates into the | BFRs would receive duplicates and forward those duplicates into the | |||
| opposite LANs. Adjacencies of such BFRs into their LANs still need a | opposite LANs. Adjacencies of such BFRs into their LANs still need a | |||
| separate BitPosition. | separate BitPosition. | |||
| 4.5. Hub and Spoke | 4.5. Hub and Spoke | |||
| In a setup with a hub and multiple spokes connected via separate p2p | In a setup with a hub and multiple spokes connected via separate p2p | |||
| links to the hub, all p2p links can share the same BitPosition. The | links to the hub, all p2p links can share the same BitPosition. The | |||
| BitPosition on the hubs BIFT is set up with a list of | BitPosition on the hub's BIFT is set up with a list of | |||
| forward_connected adjacencies, one for each Spoke. | forward_connected adjacencies, one for each Spoke. | |||
| This option is similar to the BitPosition optimization in LANs: | This option is similar to the BitPosition optimization in LANs: | |||
| Redundantly connected spokes need their own BitPositions. | Redundantly connected spokes need their own BitPositions. | |||
| This type of optimized BP could be used for example when all traffic | ||||
| is "broadcast" traffic (very dense receiver set) such as live-TV or | ||||
| situation-awareness (SA). This BP optimization can then be used to | ||||
| explicitly steer different traffic flows across different ECMP paths | ||||
| in Data-Center or broadband-aggregation networks with minimal use of | ||||
| BPs. | ||||
| 4.6. Rings | 4.6. Rings | |||
| In L3 rings, instead of assigning a single BitPosition for every p2p | In L3 rings, instead of assigning a single BitPosition for every p2p | |||
| link in the ring, it is possible to save BitPositions by setting the | link in the ring, it is possible to save BitPositions by setting the | |||
| "Do Not Reset" (DNR) flag on forward_connected adjacencies. | "Do Not Reset" (DNR) flag on forward_connected adjacencies. | |||
| For the rings shown in the following picture, a single BitPosition | For the rings shown in the following picture, a single BitPosition | |||
| will suffice to forward traffic entering the ring at BFRa or BFRb all | will suffice to forward traffic entering the ring at BFRa or BFRb all | |||
| the way up to BFR1: | the way up to BFR1: | |||
| skipping to change at page 20, line 24 ¶ | skipping to change at page 20, line 51 ¶ | |||
| v v | v v | |||
| | | | | | | |||
| L1 | L2 | L3 | L1 | L2 | L3 | |||
| /-------- BFRa ---- BFRb --------------------\ | /-------- BFRa ---- BFRb --------------------\ | |||
| | | | | | | |||
| \- BFR1 - BFR2 - BFR3 - ... - BFR29 - BFR30 -/ | \- BFR1 - BFR2 - BFR3 - ... - BFR29 - BFR30 -/ | |||
| | | L4 | | | | | L4 | | | |||
| p33| p15| | p33| p15| | |||
| BFRd BFRc | BFRd BFRc | |||
| Figure 9: Ring Example | Figure 10: Ring Example | |||
| Note that this example only permits for packets to enter the ring at | Note that this example only permits for packets to enter the ring at | |||
| BFRa and BFRb, and that packets will always travel clockwise. If | BFRa and BFRb, and that packets will always travel clockwise. If | |||
| packets should be allowed to enter the ring at any ring BFR, then one | packets should be allowed to enter the ring at any ring BFR, then one | |||
| would have to use two ring BitPositions. One for clockwise, one for | would have to use two ring BitPositions. One for clockwise, one for | |||
| counterclockwise. | counterclockwise. | |||
| Both would be set up to stop rotating on the same link, e.g. L1. | Both would be set up to stop rotating on the same link, e.g. L1. | |||
| When the ingress ring BFR creates the clockwise copy, it will reset | When the ingress ring BFR creates the clockwise copy, it will reset | |||
| the counterclockwise BitPosition because the DNR bit only applies to | the counterclockwise BitPosition because the DNR bit only applies to | |||
| the bit for which the replication is done. Likewise for the | the bit for which the replication is done. Likewise for the | |||
| clockwise BitPosition for the counterclockwise copy. In result, the | clockwise BitPosition for the counterclockwise copy. In result, the | |||
| ring ingress BFR will send a copy in both directions, serving BFRs on | ring ingress BFR will send a copy in both directions, serving BFRs on | |||
| either side of the ring up to L1. | either side of the ring up to L1. | |||
| 4.7. Equal Cost MultiPath (ECMP) | 4.7. Equal Cost MultiPath (ECMP) | |||
| The ECMP adjacency allows to use just one BP per link bundle between | The ECMP adjacency allows to use just one BP per link bundle between | |||
| two BFRs instead of one BP for each p2p member link of that link | two BFRs instead of one BP for each p2p member link of that link | |||
| bundle. In the following picture, one BP is used across L1,L2,L3 and | bundle. In the following picture, one BP is used across L1,L2,L3. | |||
| BFR1/BFR2 have for the BP | ||||
| --L1----- | --L1----- | |||
| BFR1 --L2----- BFR2 | BFR1 --L2----- BFR2 | |||
| --L3----- | --L3----- | |||
| BIFT entry in BFR1: | BIFT entry in BFR1: | |||
| ------------------------------------------------------------------ | ------------------------------------------------------------------ | |||
| | Index | Adjacencies | | | Index | Adjacencies | | |||
| ================================================================== | ================================================================== | |||
| | 0:6 | ECMP({L1-to-BFR2,L2-to-BFR2,L3-to-BFR2}, seed) | | | 0:6 | ECMP({forward_connected(L1, BFR2), | | |||
| | | forward_connected(L2, BFR2), | | ||||
| | | forward_connected(L3, BFR2)}, seed) | | ||||
| ------------------------------------------------------------------ | ------------------------------------------------------------------ | |||
| BIFT entry in BFR2: | BIFT entry in BFR2: | |||
| ------------------------------------------------------------------ | ------------------------------------------------------------------ | |||
| | Index | Adjacencies | | | Index | Adjacencies | | |||
| ================================================================== | ================================================================== | |||
| | 0:6 | ECMP({L1-to-BFR1,L2-to-BFR1,L3-to-BFR1}, seed) | | | 0:6 | ECMP({forward_connected(L1, BFR1), | | |||
| | | forward_connected(L2, BFR1), | | ||||
| | | forward_connected(L3, BFR1)}, seed) | | ||||
| ------------------------------------------------------------------ | ------------------------------------------------------------------ | |||
| Figure 10: ECMP Example | Figure 11: ECMP Example | |||
| This document does not standardize any ECMP algorithm because it is | This document does not standardize any ECMP algorithm because it is | |||
| sufficient for implementations to document their freely chosen ECMP | sufficient for implementations to document their freely chosen ECMP | |||
| algorithm. This allows the BIER-TE controller host to calculate ECMP | algorithm. This allows the BIER-TE controller host to calculate ECMP | |||
| paths and seeds. The following picture shows an example ECMP | paths and seeds. The following picture shows an example ECMP | |||
| algorithm: | algorithm: | |||
| forward(packet, ECMP(adj(0), adj(1),... adj(N-1), seed)): | forward(packet, ECMP(adj(0), adj(1),... adj(N-1), seed)): | |||
| i = (packet(bier-header-entropy) XOR seed) % N | i = (packet(bier-header-entropy) XOR seed) % N | |||
| forward packet to adj(i) | forward packet to adj(i) | |||
| Figure 11: ECMP algorithm Example | Figure 12: ECMP algorithm Example | |||
| In the following example, all traffic from BFR1 towards BFR10 is | In the following example, all traffic from BFR1 towards BFR10 is | |||
| intended to be ECMP load split equally across the topology. This | intended to be ECMP load split equally across the topology. This | |||
| example is not meant as a likely setup, but to illustrate that ECMP | example is not meant as a likely setup, but to illustrate that ECMP | |||
| can be used to share BPs not only across link bundles, and it | can be used to share BPs not only across link bundles, and it | |||
| explains the use of the seed parameter. | explains the use of the seed parameter. | |||
| BFR1 | BFR1 (BFIR) | |||
| / \ | /L11 \L12 | |||
| /L11 \L12 | / \ | |||
| BFR2 BFR3 | BFR2 BFR3 | |||
| / \ / \ | /L21 \L22 /L31 \L32 | |||
| /L21 \L22 /L31 \L32 | / \ / \ | |||
| BFR4 BFR5 BFR6 BFR7 | BFR4 BFR5 BFR6 BFR7 | |||
| \ / \ / | \ / \ / | |||
| \ / \ / | \ / \ / | |||
| BFR8 BFR9 | BFR8 BFR9 | |||
| \ / | \ / | |||
| \ / | \ / | |||
| BFR10 | BFR10 (BFER) | |||
| BIFT entry in BFR1: | BIFT entry in BFR1: | |||
| ------------------------------------------------------------------ | ------------------------------------------------------------------ | |||
| | 0:6 | ECMP({L11-to-BFR2,L12-to-BFR3}, seed1) | | | 0:6 | ECMP({forward_connected(L11, BFR2), | | |||
| | | forward_connected(L12, BFR3)}, seed1) | | ||||
| ------------------------------------------------------------------ | ------------------------------------------------------------------ | |||
| BIFT entry in BFR2: | BIFT entry in BFR2: | |||
| ------------------------------------------------------------------ | ------------------------------------------------------------------ | |||
| | 0:6 | ECMP({L21-to-BFR4,L22-to-BFR5}, seed1) | | | 0:7 | ECMP({forward_connected(L21, BFR4), | | |||
| | | forward_connected(L22, BFR5)}, seed1) | | ||||
| ------------------------------------------------------------------ | ------------------------------------------------------------------ | |||
| BIFT entry in BFR3: | BIFT entry in BFR3: | |||
| ------------------------------------------------------------------ | ------------------------------------------------------------------ | |||
| | 0:6 | ECMP({L31-to-BFR6,L32-to-BFR7}, seed1) | | | 0:7 | ECMP({forward_connected(L31, BFR6), | | |||
| | | forward_connected(L32, BFR7)}, seed1) | | ||||
| ------------------------------------------------------------------ | ||||
| BIFT entry in BFR4, BFR5: | ||||
| ------------------------------------------------------------------ | ||||
| | 0:8 | forward_connected(Lxx, BFR8) |xx differs on BFR4/BFR5| | ||||
| ------------------------------------------------------------------ | ------------------------------------------------------------------ | |||
| Figure 12: Polarization Example | BIFT entry in BFR6, BFR7: | |||
| ------------------------------------------------------------------ | ||||
| | 0:8 | forward_connected(Lxx, BFR9) |xx differs on BFR6/BFR7| | ||||
| ------------------------------------------------------------------ | ||||
| BIFT entry in BFR8, BFR9: | ||||
| ------------------------------------------------------------------ | ||||
| | 0:9 | forward_connected(Lxx, BFR10) |xx differs on BFR8/BFR9| | ||||
| ------------------------------------------------------------------ | ||||
| Figure 13: Polarization Example | ||||
| Note that for the following discussion of ECMP, only the BIFT ECMP | ||||
| adjacencies on BFR1, BFR2, BFR3 are relevant. The re-use of BP | ||||
| across BFR in this example is further explained in Section 4.9 below. | ||||
| With the setup of ECMP in above topology, traffic would not be | With the setup of ECMP in above topology, traffic would not be | |||
| equally load-split. Instead, links L22 and L31 would see no traffic | equally load-split. Instead, links L22 and L31 would see no traffic | |||
| at all: BFR2 will only see traffic from BFR1 for which the ECMP hash | at all: BFR2 will only see traffic from BFR1 for which the ECMP hash | |||
| in BFR1 selected the first adjacency in the list of 2 adjacencies | in BFR1 selected the first adjacency in the list of 2 adjacencies | |||
| given as parameters to the ECMP. It is link L11-to-BFR2. BFR2 | given as parameters to the ECMP. It is link L11-to-BFR2. BFR2 | |||
| performs again ECMP with two adjacencies on that subset of traffic | performs again ECMP with two adjacencies on that subset of traffic | |||
| using the same seed1, and will therefore again select the first of | using the same seed1, and will therefore again select the first of | |||
| its two adjacencies: L21-to-BFR4. And therefore L22 and BFR5 sees no | its two adjacencies: L21-to-BFR4. And therefore L22 and BFR5 sees no | |||
| traffic. Likewise for L31 and BFR6. | traffic. Likewise for L31 and BFR6. | |||
| To resolve this issue, the ECMP adjacency on BFR1 simply needs to be | This issue in BFR2/BFR3 is called polarization. It results from the | |||
| set up with a different seed2 than the ECMP adjacencies on BFR2/BFR3. | re-use of the same hash function across multiple consecutive hops in | |||
| ECMP in BFR2 could use the same seed2 to avoid its issue. | topologies like these. To resolve this issue, the ECMP adjacency on | |||
| BFR1 can be set up with a different seed2 than the ECMP adjacencies | ||||
| on BFR2/BFR3. BFR2/BFR3 can use the same hash because packets will | ||||
| not sequentially pass across both of them. Therefore, they can also | ||||
| use the same BP 0:7. | ||||
| This issue in BFR2/BFR3 is called polarization. It depends on the | Note that ECMP solutions outside of BIER often hide the seed by auto- | |||
| ECMP hash. Instead of explicitly setting up different seeds in | selecting it from local entropy such as unique local or next-hop | |||
| consecutive BFR in a topology subject to polarization, it is possible | identifiers. The solutions choosen for BIER-TE to allow the | |||
| to build ECMP that does not have polarization, for example by taking | controller to explicitly set the seed maximizes the ability of the | |||
| entropy from the actual adjacency members into account such as the | controller to choose the seed, independent of such seed source that | |||
| next-hop identifiers like L11-to-BFR2 and, but that can make it | the controller may not be able to control well, and even calculate | |||
| harder to achieve evenly balanced load-splitting on all BFR without | optimized seeds for multi-hop cases. | |||
| making the ECMP hash algorithm potentially too complex for fast | ||||
| forwarding in the BFRs. In addition, these type of polarization free | ||||
| ECMP algorithms likely make it harder for a BIER-TE controller host | ||||
| to calculate entropy fields for BIER-TE headers that would flow on | ||||
| the same or different ECMP paths. With polarizing algorithms, this | ||||
| is typically easier. | ||||
| 4.8. Routed adjacencies | 4.8. Routed adjacencies | |||
| 4.8.1. Reducing BitPositions | 4.8.1. Reducing BitPositions | |||
| Routed adjacencies can reduce the number of BitPositions required | Routed adjacencies can reduce the number of BitPositions required | |||
| when the traffic engineering requirement is not hop-by-hop explicit | when the traffic engineering requirement is not hop-by-hop explicit | |||
| path selection, but loose-hop selection. | path selection, but loose-hop selection. Routed adjacencies can also | |||
| allow to operate BIER-TE across intermediate hop routers that do not | ||||
| ............... ............... | support BIER-TE. | |||
| BFR1--... Redundant ...--L1-- BFR2... Redundant ...--- | ||||
| \--... Network ...--L2--/ ... Network ...--- | ||||
| BFR4--... Segment 1 ...--L3-- BFR3... Segment 2 ...--- | ||||
| ............... ............... | ||||
| Figure 13: Routed Adjacencies Example | ............... | |||
| ...BFR1--... ...--L1-- BFR2... | ||||
| ... .Routers. ...--L2--/ | ||||
| ...BFR4--... ...------ BFR3... | ||||
| ............... | | ||||
| LO | ||||
| Network Area 1 | ||||
| Assume the requirement in above network is to explicitly engineer | Figure 14: Routed Adjacencies Example | |||
| paths such that specific traffic flows are passed from segment 1 to | ||||
| segment 2 via link L1 (or via L2 or via L3). | ||||
| To achieve this, BFR1 and BFR4 are set up with a forward_routed | Assume the requirement in the above picture is to explicitly steer | |||
| adjacency BitPosition towards an address of BFR2 on link L1 (or link | traffic flows that have arrived at BFR1 or BFR4 via a shortest path | |||
| L2 BFR3 via L3). | in the routing underlay "Network Area 1" to one of the following | |||
| three next segments: (1) BFR2 via link L1, (2) BFR2 via link L2, (3) | ||||
| via BFR3. | ||||
| For paths to be engineered through a specific node BFR2 (or BFR3), | To enable this, both BFR1 and BFR4 are set up with a forward_routed | |||
| BFR1 and BFR4 are set up with a forward_routed adjacency BitPosition | adjacency BitPosition towards an address of BFR2 on link L1, another | |||
| towards a loopback address of BFR2 (or BFR3). | forward_routed BitPosition towards an address of BFR2 on link L2 and | |||
| a third forward_routed Bitposition towards a node address LO of BFR3. | ||||
| 4.8.2. Supporting nodes without BIER-TE | 4.8.2. Supporting nodes without BIER-TE | |||
| Routed adjacencies also enable incremental deployment of BIER-TE. | Routed adjacencies also enable incremental deployment of BIER-TE. | |||
| Only the nodes through which BIER-TE traffic needs to be steered - | Only the nodes through which BIER-TE traffic needs to be steered - | |||
| with or without replication - need to support BIER-TE. Where they | with or without replication - need to support BIER-TE. Where they | |||
| are not directly connected to each other, forward_routed adjacencies | are not directly connected to each other, forward_routed adjacencies | |||
| are used to pass over non BIER-TE enabled nodes. | are used to pass over non BIER-TE enabled nodes. | |||
| 4.9. Reuse of BitPositions (without DNR) | ||||
| BitPositions can be re-used across multiple BFR to minimize the | ||||
| number of BP needed. This happens when adjacencies on multiple BFR | ||||
| use the DNR flag as described above, but it can also be done for non- | ||||
| DNR adjacencies. This section only discussses this non-DNR case. | ||||
| Because BP are reset after passing a BFR with an adjacency for that | ||||
| BP, reuse of BP across multiple BFR does not introduce any problems | ||||
| with duplicates or loops that do not also exist when every adjacency | ||||
| has a unique BP: Instead of setting one BP in a BitString that is | ||||
| reused in N-adjacencies, one would get the same or worse results if | ||||
| each of these adjacencies had a unique BP and all of them where set | ||||
| in the BitString. Instead, based on the case, BPs can be reused | ||||
| without limitation, or they introduce fewer path engineering choices, | ||||
| or they do not work. | ||||
| BP cannot be reused across two BFR that would need to be passed | ||||
| sequentially for some path: The first BFR will reset the BP, so those | ||||
| paths cannot be built. BP can be set across BFR that would (A) only | ||||
| occur across different paths or (B) across different branches of the | ||||
| same tree. | ||||
| An example of (A) was given in Figure 13, where BP 0:7, BP 0:8 and BP | ||||
| 0:9 are each reused across multiple BFR because a single packet/path | ||||
| would never be able to reach more than one BFR sharing the same BP. | ||||
| Assume the example was changed: BFR1 has no ECMP adjacency for BP | ||||
| 0:6, but instead BP 0:5 with forward_connected to BFR2 and BP 0:6 | ||||
| with forward_connected to BFR3. Packets with both BP 0:5 and BP 0:6 | ||||
| would now be able to reach both BFR2 and BFR3 and the still existing | ||||
| re-use of BP 0:7 between BFR2 and BFR3 is a case of (B) where reuse | ||||
| of BP is perfect because it does not limit the set of useful path | ||||
| choices: | ||||
| If instead of reusing BP 0:7, BFR3 used a separate BP 0:10 for its | ||||
| ECMP adjacency, no useful additional path engineering would be | ||||
| enabled. If duplicates at BFR10 where undesirable, this would be | ||||
| done by not setting BP 0:5 and BP 0:6 for the same packet. If the | ||||
| duplicates where desirable (e.g.: resilient transmission), the | ||||
| additional BP 0:10 would also not render additional value. | ||||
| Reuse may also save BPs in larger topologies. Consider the topology | ||||
| shown in Figure 17, but only the following explanations: A BFIR/ | ||||
| sender (e.g.: video headend) is attached to area 1, and area 2...6 | ||||
| contain receivers/BFER. Assume each area had a distribution ring, | ||||
| each with two BPs to indicate the direction (as explained in before). | ||||
| These two BPs could be reused across the 5 areas. Packets would be | ||||
| replicated through other BPs to the desired subset of areas, and once | ||||
| a packet copy reaches the ring of the area, the two ring BPs come | ||||
| into play. This reuse is a case of (B), but it limits the topology | ||||
| choices: Packets can only flow around the same direction in the rings | ||||
| of all areas. This may or may not be acceptable based on the desired | ||||
| traffic engineering: If resilient transmission is the traffic | ||||
| engineering goal, then it is likely a good optimization, if the | ||||
| bandwidth of each ring was to be optimized separately, it would not | ||||
| be a good limitation. | ||||
| 4.10. Summary of BP optimizations | ||||
| This section reviewed a range of techniques by which a controller can | ||||
| create a BIER-TE topology in a way that minimizes the number of | ||||
| necessary BPs. | ||||
| Without any optimization, a controller would attempt to map the | ||||
| network subnet topology 1:1 into the BIER-TE topology and every | ||||
| subnet adjacent neighbor requires a forward_connected BP and every | ||||
| BFER requires a local_decap BP. | ||||
| The optimizations described are then as follows: | ||||
| o P2p links require only one BP (Section 4.1). | ||||
| o All leaf-BFER can share a single local_decap BP (Section 4.3). | ||||
| o A LAN with N BFR needs at most N BP (one for each BFR). It only | ||||
| needs one BP for all those BFR tha are not redundanty connected to | ||||
| multiple LANs (Section 4.4). | ||||
| o A hub with p2p connections to multiple non-leaf-BFER spokes can | ||||
| share one BP to all spokes if traffic can be flooded to all | ||||
| spokes, e.g.: because of no bandwidth concerns or dense receiver | ||||
| sets (Section 4.5). | ||||
| o Rings of BFR can be built with just two BP (one for each | ||||
| direction) except for BFR with multiple ring connections - similar | ||||
| to LANs (Section 4.6). | ||||
| o ECMP adjacencies to N neighbors can replace N BP with 1 BP. | ||||
| Multihop ECMP can avoid polarization through different seeds of | ||||
| the ECMP algorithm (Section 4.7). | ||||
| o Routed adjacencies allow to "tunnel" across non-BIER-TE capable | ||||
| routers and across BIER-TE capable routers where no traffic- | ||||
| steering or replications are required (Section 4.8). | ||||
| o BP can generally be reused across nodes that do not need to be | ||||
| consecutive in paths, but depending on scenario, this may limit | ||||
| the feasible traffic engineering options (Section 4.9). | ||||
| Note that the described list of optimizations is not exhaustive. | ||||
| Especially when the set of required path engineering choices is | ||||
| limited and the set of possible subsets of BFER that should be able | ||||
| to receive traffic is limited, further optimizations of BP are | ||||
| possible. The hub & spoke optimization is a simple example of such | ||||
| traffic pattern dependent optimizations. | ||||
| 5. Avoiding loops and duplicates | 5. Avoiding loops and duplicates | |||
| 5.1. Loops | 5.1. Loops | |||
| Whenever BIER-TE creates a copy of a packet, the BitString of that | Whenever BIER-TE creates a copy of a packet, the BitString of that | |||
| copy will have all BitPositions cleared that are associated with | copy will have all BitPositions cleared that are associated with | |||
| adjacencies in the BFR. This inhibits looping of packets. The only | adjacencies on the BFR. This inhibits looping of packets. The only | |||
| exception are adjacencies with DNR set. | exception are adjacencies with DNR set. | |||
| With DNR set, looping can happen. Consider in the ring picture that | With DNR set, looping can happen. Consider in the ring picture that | |||
| link L4 from BFR3 is plugged into the L1 interface of BFRa. This | link L4 from BFR3 is plugged into the L1 interface of BFRa. This | |||
| creates a loop where the rings clockwise BitPosition is never reset | creates a loop where the rings clockwise BitPosition is never reset | |||
| for copies of the packets traveling clockwise around the ring. | for copies of the packets traveling clockwise around the ring. | |||
| To inhibit looping in the face of such physical misconfiguration, | To inhibit looping in the face of such physical misconfiguration, | |||
| only forward_connected adjacencies are permitted to have DNR set, and | only forward_connected adjacencies are permitted to have DNR set, and | |||
| the link layer destination address of the adjacency (e.g. MAC | the link layer port unique unicast destination address of the | |||
| address) protects against closing the loop. Link layers without port | adjacency (e.g. MAC address) protects against closing the loop. | |||
| unique link layer addresses should not be used with the DNR flag set. | Link layers without port unique link layer addresses should not be | |||
| used with the DNR flag set. | ||||
| 5.2. Duplicates | 5.2. Duplicates | |||
| Duplicates happen when the topology of the BitString is not a tree | Duplicates happen when the topology of the BitString is not a tree | |||
| but redundantly connecting BFRs with each other. The controller must | but redundantly connecting BFRs with each other. The controller must | |||
| therefore ensure to only create BitStrings that are trees in the | therefore ensure to only create BitStrings that are trees in the | |||
| topology. | topology. | |||
| When links are incorrectly physically re-connected before the | When links are incorrectly physically re-connected before the | |||
| controller updates BitStrings in BFIRs, duplicates can happen. Like | controller updates BitStrings in BFIRs, duplicates can happen. Like | |||
| skipping to change at page 25, line 22 ¶ | skipping to change at page 28, line 22 ¶ | |||
| if (!F-BM) continue; | if (!F-BM) continue; | |||
| BFR-NBR = BIFT[Index+Offset]->BFR-NBR; | BFR-NBR = BIFT[Index+Offset]->BFR-NBR; | |||
| PacketCopy = Copy(Packet); | PacketCopy = Copy(Packet); | |||
| PacketCopy->BitString &= F-BM; [2] | PacketCopy->BitString &= F-BM; [2] | |||
| PacketSend(PacketCopy, BFR-NBR); | PacketSend(PacketCopy, BFR-NBR); | |||
| // The following must not be done for BIER-TE: | // The following must not be done for BIER-TE: | |||
| // Packet->BitString &= ~F-BM; [1] | // Packet->BitString &= ~F-BM; [1] | |||
| } | } | |||
| } | } | |||
| Figure 14: Simplified BIER-TE Forwarding Pseudocode | Figure 15: Simplified BIER-TE Forwarding Pseudocode | |||
| The difference is that in BIER-TE, step [1] must not be performed. | ||||
| In BIER, this step is necessary to avoid duplicates when two or more | The difference is that in BIER-TE, step [1] must not be performed, | |||
| BFER are reachable via the same neighbor. The F-BM of all those BFER | but is replaced with [2] (when the forwarding plane algorithm is | |||
| bits will indicate each other's bits, and step [1] will reset all | implemented verbatim as shown above). | |||
| these bits on the first copy made for the first of those BFER bits | ||||
| set in the BitString, hence skipping any further copies to that | ||||
| neighbor. | ||||
| Whereas in BIER, the F-BM of bits toward a specific neighbor contain | In BIER, the F-BM of a BP has all BP set that are meant to be | |||
| only the bits of those BFER destined to be forwarded across this | forwarded via the same neighbor. It is used to reset those BP in the | |||
| neighbor, in BIER-TE the F-BM for a neighbor needs to have all bits | packet after the first copy to this neighbor has been made to inhibit | |||
| set except all those bits that are actual (non-empty) adjacencies of | multiple copies to the same neighbor. | |||
| this BFR. Step [2] will reset those adjacency bits to avoid loops, | ||||
| but all the other bits that are not adjacencies of this BFR need to | ||||
| stay untouched by [2] so that they can be processed by further BFR | ||||
| along the path. If [1] was performed as in BIER, then those non- | ||||
| adjacency bits would erroneously get reset during replication. | ||||
| To support the DNR (Do Not Reset) flag of forward_connected() | In BIER-TE, the F-BM of a particular BP with an adjacency is the list | |||
| adjacencies, the F-BM must also have its own bit set in the F-BM of | of all BPs with an adjacency on this BFR except the particular BP | |||
| such an adjacency , so that for the packet copy made for this | itself if it has an adjacency with the DNR bit set. The F-BM is used | |||
| adjacency the bit stays on, whereas it will not be set in the F-BM of | to reset the F-BM BPs before creating copies. | |||
| other bits so that it will be reset for any other packet copy made. | ||||
| Eliminating the need to perform [1] also makes processing of bits in | In BIER, the order of BPs impacts the result of forwarding because of | |||
| the BIER-TE bitstring independent of processing other bits, which may | [1]. In BIER-TE, forwarding is not impacted by the order of BPs. It | |||
| also simplify forwarding plane implementations. | is therefore possible to further optimize forwarding than in BIER. | |||
| For example, BIER-TE forwarding can be parallelized such that a | ||||
| parallel instance (such as an egres linecard) can process any subset | ||||
| of BPs without any considerations for the other BPs - and without any | ||||
| prior, cross-BP shared processing. | ||||
| The following pseudocode is comprehensive: | The above simplified pseudocode is elaborated further as follows: | |||
| o This pseudocode eliminates per-bit F-BM, therefore reducing state | o This pseudocode eliminates per-bit F-BM, therefore reducing state | |||
| by BitStringLength^2*SI and eliminating the need for per-packet- | by BitStringLength^2*SI and eliminating the need for per-packet- | |||
| copy masking operation except for adjacencies with DNR flag set: | copy masking operation except for adjacencies with DNR flag set: | |||
| * AdjacentBits[SI] are bits with a non-empty list of adjacencies. | * AdjacentBits[SI] are bits with a non-empty list of adjacencies. | |||
| This can be computed whenever the BIER-TE controller host | This can be computed whenever the BIER-TE controller host | |||
| updates the adjacencies. | updates the adjacencies. | |||
| * Only the AdjacentBits need to be examined in the loop for | * Only the AdjacentBits need to be examined in the loop for | |||
| skipping to change at page 27, line 37 ¶ | skipping to change at page 30, line 37 ¶ | |||
| SendToL3(PacketCopy,{VRF,}l3-neighbor); | SendToL3(PacketCopy,{VRF,}l3-neighbor); | |||
| case local_decap({VRF},neighbor): | case local_decap({VRF},neighbor): | |||
| DecapBierHeader(PacketCopy); | DecapBierHeader(PacketCopy); | |||
| PassTo(PacketCopy,{VRF,}Packet->NextProto); | PassTo(PacketCopy,{VRF,}Packet->NextProto); | |||
| } | } | |||
| } | } | |||
| } | } | |||
| } | } | |||
| Figure 15: BIER-TE Forwarding Pseudocode | Figure 16: BIER-TE Forwarding Pseudocode | |||
| 7. Managing SI, subdomains and BFR-ids | 7. Managing SI, subdomains and BFR-ids | |||
| When the number of bits required to represent the necessary hops in | When the number of bits required to represent the necessary hops in | |||
| the topology and BFER exceeds the supported bitstring length, | the topology and BFER exceeds the supported bitstring length, | |||
| multiple SI and/or subdomains must be used. This section discusses | multiple SI and/or subdomains must be used. This section discusses | |||
| how. | how. | |||
| BIER-TE forwarding does not require the concept of BFR-id, but | BIER-TE forwarding does not require the concept of BFR-id, but | |||
| routing underlay, flow overlay and BIER headers may. This section | routing underlay, flow overlay and BIER headers may. This section | |||
| also discusses how BFR-ids can be assigned to BFIR/BFER for BIER-TE. | also discusses how BFR-ids can be assigned to BFIR/BFER for BIER-TE. | |||
| 7.1. Why SI and sub-domains | 7.1. Why SI and sub-domains | |||
| For BIER and BIER-TE forwarding, the most important result of using | For BIER and BIER-TE forwarding, the most important result of using | |||
| multiple SI and/or subdomains is the same: Packets that need to be | multiple SI and/or subdomains is the same: Packets that need to be | |||
| sent to BFER in different SI or subdomains require different BIER | sent to BFER in different SI or subdomains require different BIER | |||
| packets: each one with a bitstring for a different (SI,subdomain) | packets: each one with a bitstring for a different (SI,subdomain) | |||
| bitstring. Each such bitstring uses one bitstring length sized SI | combination. Each such bitstring uses one bitstring length sized SI | |||
| block in the BIFT of the subdomain. We call this a BIFT:SI (block). | block in the BIFT of the subdomain. We call this a BIFT:SI (block). | |||
| For BIER and BIER-TE forwarding itself there is also no difference | For BIER and BIER-TE forwarding itself there is also no difference | |||
| whether different SI and/or sub-domains are chosen, but SI and | whether different SI and/or sub-domains are chosen, but SI and | |||
| subdomain have different purposes in the BIER architecture shared by | subdomain have different purposes in the BIER architecture shared by | |||
| BIER-TE. This impacts how operators are managing them and how | BIER-TE. This impacts how operators are managing them and how | |||
| especially flow overlays will likely use them. | especially flow overlays will likely use them. | |||
| By default, every possible BFIR/BFER in a BIER network would likely | By default, every possible BFIR/BFER in a BIER network would likely | |||
| be given a BFR-id in subdomain 0 (unless there are > 64k BFIR/BFER). | be given a BFR-id in subdomain 0 (unless there are > 64k BFIR/BFER). | |||
| skipping to change at page 31, line 40 ¶ | skipping to change at page 34, line 40 ¶ | |||
| area1 area2 area3 | area1 area2 area3 | |||
| BFR1a BFR1b BFR2a BFR2b BFR3a BFR3b | BFR1a BFR1b BFR2a BFR2b BFR3a BFR3b | |||
| | \ / \ / | | | \ / \ / | | |||
| ................................ | ................................ | |||
| . Core . | . Core . | |||
| ................................ | ................................ | |||
| | / \ / \ | | | / \ / \ | | |||
| BFR4a BFR4b BFR5a BFR5b BFR6a BFR6b | BFR4a BFR4b BFR5a BFR5b BFR6a BFR6b | |||
| area4 area5 area6 | area4 area5 area6 | |||
| Figure 16: Scaling BIER-TE bits by reuse | Figure 17: Scaling BIER-TE bits by reuse | |||
| With random allocation of BFR-id to BFER, each receiving area would | With random allocation of BFR-id to BFER, each receiving area would | |||
| (most likely) have to receive all 4 copies of the BIER packet because | (most likely) have to receive all 4 copies of the BIER packet because | |||
| there would be BFR-id for each of the 4 SI in each of the areas. | there would be BFR-id for each of the 4 SI in each of the areas. | |||
| Only further towards each BFER would this duplication subside - when | Only further towards each BFER would this duplication subside - when | |||
| each of the 4 trees runs out of branches. | each of the 4 trees runs out of branches. | |||
| If BFR-id are allocated intelligently, then all the BFER in an area | If BFR-id are allocated intelligently, then all the BFER in an area | |||
| would be given BFR-id with as few as possible different SI. Each | would be given BFR-id with as few as possible different SI. Each | |||
| area would only have to forward one or two packets instead of 4. | area would only have to forward one or two packets instead of 4. | |||
| skipping to change at page 35, line 11 ¶ | skipping to change at page 38, line 11 ¶ | |||
| BFR-ids and BFR-prefixes are not used in BIER-TE, nor are procedures | BFR-ids and BFR-prefixes are not used in BIER-TE, nor are procedures | |||
| for their distribution, so these are not attack vectors against BIER- | for their distribution, so these are not attack vectors against BIER- | |||
| TE. | TE. | |||
| 10. IANA Considerations | 10. IANA Considerations | |||
| This document requests no action by IANA. | This document requests no action by IANA. | |||
| 11. Acknowledgements | 11. Acknowledgements | |||
| The authors would like to thank Greg Shepherd, Ijsbrand Wijnands and | The authors would like to thank Greg Shepherd, Ijsbrand Wijnands, | |||
| Neale Ranns for their extensive review and suggestions. | Neale Ranns, Dirk Trossen, Sandy Zheng and Jeffrey Zhang for their | |||
| extensive review and suggestions. | ||||
| 12. Change log [RFC Editor: Please remove] | 12. Change log [RFC Editor: Please remove] | |||
| draft-ietf-bier-te-arch: | draft-ietf-bier-te-arch: | |||
| 05: Review Jeffrey Zhang. | ||||
| Part 2: | ||||
| 4.3 added note about leaf-BFER being also a propery of routing | ||||
| setup. | ||||
| 4.7 Added missing details from example to avoid confusion with | ||||
| routed adjacencies, also compressed explanatory text and better | ||||
| justification why seed is explicitly configured by controller. | ||||
| 4.9 added section discussing generic reuse of BP methods. | ||||
| 4.10 added section summarizing BP optimizations of section 4. | ||||
| 6. Rewrote/compressed explanation of comparison BIER/BIER-TE | ||||
| forwarding difference. Explained benefit of BIER-TE per-BP | ||||
| forwarding being independent of forwarding for other BPs. | ||||
| Part 1: | ||||
| Explicitly ue forwarded_connected adjcency in ECMP adjcency | ||||
| examples to avoid confusion. | ||||
| 4.3 Add picture as example for leav vs. non-leaf BFR in topology. | ||||
| Improved description. | ||||
| 4.5 Exampe for traffic that can be broadcast -> for single BP in | ||||
| hub&spoke. | ||||
| 4.8.1 Simplified example picture for routed adjacency, explanatory | ||||
| text. | ||||
| Review from Dirk Trossen: | ||||
| Fixed up explanation of ICC paper vs. bloom filter. | ||||
| 04: spell check run. | 04: spell check run. | |||
| Addded remaining fixes for Sandys (Zhang Zheng) review: | Addded remaining fixes for Sandys (Zhang Zheng) review: | |||
| 4.7 Enhance ECMP explanations: | 4.7 Enhance ECMP explanations: | |||
| example ECMP algorithm, highlight that doc does not standardize | example ECMP algorithm, highlight that doc does not standardize | |||
| ECMP algorithm. | ECMP algorithm. | |||
| Review from Dirk Trossen: | Review from Dirk Trossen: | |||
| End of changes. 51 change blocks. | ||||
| 136 lines changed or deleted | 334 lines changed or added | |||
This html diff was produced by rfcdiff 1.48. The latest version is available from http://tools.ietf.org/tools/rfcdiff/ | ||||