| < draft-ietf-bier-entropy-staged-dc-clos-00.txt | draft-ietf-bier-entropy-staged-dc-clos-01.txt > | |||
|---|---|---|---|---|
| Network Working Group J. Xie | Network Working Group J. Xie | |||
| Internet-Draft Huawei Technologies | Internet-Draft Huawei Technologies | |||
| Intended status: Informational X. Xu | Intended status: Informational X. Xu | |||
| Expires: April 25, 2019 Alibaba Inc. | Expires: November 9, 2019 Alibaba Inc. | |||
| G. Yan | G. Yan | |||
| M. McBride | M. McBride | |||
| Huawei Technologies | Huawei Technologies | |||
| October 22, 2018 | May 8, 2019 | |||
| Use of BIER Entropy for Data Center CLOS Networks | Use of BIER Entropy for Data Center Clos Networks | |||
| draft-ietf-bier-entropy-staged-dc-clos-00 | draft-ietf-bier-entropy-staged-dc-clos-01 | |||
| Abstract | Abstract | |||
| Bit Index Explicit Replication (BIER) introduces a new multicast- | Bit Index Explicit Replication (BIER) introduces a new multicast- | |||
| specific BIER Header. BIER can be applied to the Multiprotocol | specific BIER Header. BIER can be applied to the Multiprotocol | |||
| Label Switching (MPLS) data plane or Non-MPLS data plane. Entropy is | Label Switching (MPLS) data plane or Non-MPLS data plane. Entropy is | |||
| a technique used in BIER to support load-balancing. This document | a technique used in BIER to support load-balancing. This document | |||
| examines and describes how BIER Entropy is to be applied to Data | examines and describes how BIER Entropy is to be applied to Data | |||
| Center CLOS networks for path selection. | Center Clos networks for path selection. | |||
| Requirements Language | Requirements Language | |||
| The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
| "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | |||
| document are to be interpreted as described in [RFC2119]. | document are to be interpreted as described in [RFC2119]. | |||
| Status of This Memo | Status of This Memo | |||
| This Internet-Draft is submitted in full conformance with the | This Internet-Draft is submitted in full conformance with the | |||
| skipping to change at page 1, line 45 ¶ | skipping to change at page 1, line 45 ¶ | |||
| Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
| Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
| working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
| Drafts is at https://datatracker.ietf.org/drafts/current/. | Drafts is at https://datatracker.ietf.org/drafts/current/. | |||
| Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
| and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
| time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
| material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
| This Internet-Draft will expire on April 25, 2019. | This Internet-Draft will expire on November 9, 2019. | |||
| Copyright Notice | Copyright Notice | |||
| Copyright (c) 2018 IETF Trust and the persons identified as the | Copyright (c) 2019 IETF Trust and the persons identified as the | |||
| document authors. All rights reserved. | document authors. All rights reserved. | |||
| This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
| Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
| (https://trustee.ietf.org/license-info) in effect on the date of | (https://trustee.ietf.org/license-info) in effect on the date of | |||
| publication of this document. Please review these documents | publication of this document. Please review these documents | |||
| carefully, as they describe your rights and restrictions with respect | carefully, as they describe your rights and restrictions with respect | |||
| to this document. Code Components extracted from this document must | to this document. Code Components extracted from this document must | |||
| include Simplified BSD License text as described in Section 4.e of | include Simplified BSD License text as described in Section 4.e of | |||
| the Trust Legal Provisions and are provided without warranty as | the Trust Legal Provisions and are provided without warranty as | |||
| described in the Simplified BSD License. | described in the Simplified BSD License. | |||
| Table of Contents | Table of Contents | |||
| 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 | |||
| 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 3 | 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 3 | |||
| 3. Problem Statement and Considerations . . . . . . . . . . . . 3 | 3. Problem Statement and Considerations . . . . . . . . . . . . 3 | |||
| 3.1. Problem Statement . . . . . . . . . . . . . . . . . . . . 3 | 3.1. Problem Statement . . . . . . . . . . . . . . . . . . . . 3 | |||
| 3.2. Considerations . . . . . . . . . . . . . . . . . . . . . 4 | 3.2. Considerations . . . . . . . . . . . . . . . . . . . . . 4 | |||
| 4. Use of BIER Entropy for DC CLOS Network . . . . . . . . . . . 5 | 4. Use of BIER Entropy for DC Clos Network . . . . . . . . . . . 5 | |||
| 4.1. Use of BIER Entropy for DC CLOS Network . . . . . . . . . 5 | 4.1. Use of BIER Entropy for DC Clos Network . . . . . . . . . 5 | |||
| 4.2. Steering for elephant flows . . . . . . . . . . . . . . . 6 | 4.2. Steering for elephant flows . . . . . . . . . . . . . . . 6 | |||
| 4.3. Path Division for Tenant flows to different SIs . . . . . 6 | 4.3. Path Division for Tenant flows to different SIs . . . . . 6 | |||
| 4.4. Link Failure and Convergence . . . . . . . . . . . . . . 6 | 4.4. Link Failure and Convergence . . . . . . . . . . . . . . 6 | |||
| 5. Data-Plane Processing . . . . . . . . . . . . . . . . . . . . 7 | 5. Data-Plane Processing . . . . . . . . . . . . . . . . . . . . 7 | |||
| 6. Security Considerations . . . . . . . . . . . . . . . . . . . 7 | 6. Security Considerations . . . . . . . . . . . . . . . . . . . 7 | |||
| 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 7 | 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 7 | |||
| 8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 7 | 8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 7 | |||
| 9. References . . . . . . . . . . . . . . . . . . . . . . . . . 7 | 9. References . . . . . . . . . . . . . . . . . . . . . . . . . 7 | |||
| 9.1. Normative References . . . . . . . . . . . . . . . . . . 7 | 9.1. Normative References . . . . . . . . . . . . . . . . . . 7 | |||
| 9.2. Informative References . . . . . . . . . . . . . . . . . 8 | 9.2. Informative References . . . . . . . . . . . . . . . . . 8 | |||
| skipping to change at page 2, line 50 ¶ | skipping to change at page 2, line 50 ¶ | |||
| 1. Introduction | 1. Introduction | |||
| Bit Index Explicit Replication (BIER) [RFC8279] is an architecture | Bit Index Explicit Replication (BIER) [RFC8279] is an architecture | |||
| that provides optimal multicast forwarding without requiring | that provides optimal multicast forwarding without requiring | |||
| intermediate routers to maintain any per-flow state by using a | intermediate routers to maintain any per-flow state by using a | |||
| multicast-specific BIER header. [RFC8296] defines two types of BIER | multicast-specific BIER header. [RFC8296] defines two types of BIER | |||
| encapsulation formats: one is MPLS encapsulation, the other is non- | encapsulation formats: one is MPLS encapsulation, the other is non- | |||
| MPLS encapsulation. Entropy is a technique used in BIER to support | MPLS encapsulation. Entropy is a technique used in BIER to support | |||
| load-balancing. This document examines and describes how BIER | load-balancing. This document examines and describes how BIER | |||
| Entropy is to be applied to Data Center CLOS networks for path | Entropy is to be applied to Data Center Clos networks for path | |||
| selection. | selection. | |||
| 2. Terminology | 2. Terminology | |||
| Readers of this document are assumed to be familiar with the | Readers of this document are assumed to be familiar with the | |||
| terminology and concepts of the documents listed as Normative | terminology and concepts of the documents listed as Normative | |||
| References. | References. | |||
| 3. Problem Statement and Considerations | 3. Problem Statement and Considerations | |||
| 3.1. Problem Statement | 3.1. Problem Statement | |||
| A common choice for a horizontally scalable topology used in Data | A common choice for a horizontally scalable topology used in Data | |||
| Center is a CLOS topology. This topology features an odd number of | Center is a Clos topology. This topology features an odd number of | |||
| stages, for example, a 5-Stage CLOS Topology as an example in | stages, for example, a 5-Stage Clos Topology as an example in | |||
| [RFC7938]. | [RFC7938]. | |||
| ECMP is the fundamental load-sharing mechanism used by a CLOS | ECMP is the fundamental load-sharing mechanism used by a Clos | |||
| topology. Effectively, every lower-tier device will use all of its | topology. Effectively, every lower-tier device will use all of its | |||
| directly attached upper-tier devices to load-share traffic destined | directly attached upper-tier devices to load-share traffic destined | |||
| to the same IP prefix. The number of ECMP paths between any two Tier | to the same IP prefix. The number of ECMP paths between any two Tier | |||
| 3 devices in CLOS topology is equal to the number of the devices in | 3 devices in Clos topology is equal to the number of the devices in | |||
| the middle stage (Tier 1). For example, Figure 1 illustrates a | the middle stage (Tier 1). For example, Figure 1 illustrates a | |||
| topology where Tier 3 device L1 has four paths to reach servers X and | topology where Tier 3 device L1 has four paths to reach servers X and | |||
| Y, via Tier 2 devices S1 and S2 and then Tier 1 devices S11, S12, S21 | Y, via Tier 2 devices S1 and S2 and then Tier 1 devices S11, S12, S21 | |||
| and S22 respectively. | and S22 respectively. | |||
| Tier 1 | Tier 1 | |||
| +-----+ | +-----+ | |||
| Cluster |SUPER| | Cluster |SUPER| | |||
| +----------------------------+ +--| S11 |--+ | +----------------------------+ +--| S11 |--+ | |||
| | | | +-----+ | | | | | +-----+ | | |||
| skipping to change at page 4, line 30 ¶ | skipping to change at page 4, line 30 ¶ | |||
| | | | | | | | | | | | | | | | | | | | | | | | | | | |||
| | +-----+ +-----+ | | +-----+ | +-----+ +-----+ | | +-----+ +-----+ | | +-----+ | +-----+ +-----+ | |||
| | | LEAF| | LEAF| | +--|SUPER|--+ | LEAF| | LEAF| | | | LEAF| | LEAF| | +--|SUPER|--+ | LEAF| | LEAF| | |||
| | | L1 | | L2 | Tier 3 | | S22 | Tier 3 | L3 | | L4 | | | | L1 | | L2 | Tier 3 | | S22 | Tier 3 | L3 | | L4 | | |||
| | +-----+ +-----+ | +-----+ +-----+ +-----+ | | +-----+ +-----+ | +-----+ +-----+ +-----+ | |||
| | | | | | | | | | | | | | | | | | | | | | | |||
| | O O O O | X Y O O | | O O O O | X Y O O | |||
| | Servers | Servers | | Servers | Servers | |||
| +----------------------------+ | +----------------------------+ | |||
| Figure 1: 5-Stage CLOS Topology | Figure 1: 5-Stage Clos Topology | |||
| When BIER is deployed in a multi-tenant data center network | When BIER is deployed in a multi-tenant data center network | |||
| environment for efficient delivery of Broadcast, Unknown-unicast and | environment for efficient delivery of Broadcast, Unknown-unicast and | |||
| Multicast (BUM) traffic, a network operator may want a deterministic | Multicast (BUM) traffic, a network operator may want a deterministic | |||
| path for every packet. For example, when L1 needs to send a BUM | path for every packet. For example, when L1 needs to send a BUM | |||
| packet to L3 and L4, which are in different SIs, L1 has to send the | packet to L3 and L4, which are in different SIs, L1 has to send the | |||
| packet twice, and expects the packet along two deterministic paths of | packet twice, and expects the packet along two deterministic paths of | |||
| L1->S1->S11-->L3 and L1->S2->S21-->L4 separately. Another example of | L1->S1->S11-->L3 and L1->S2->S21-->L4 separately. Another example of | |||
| using a deterministic path in a DC is for per-flow steering of | using a deterministic path in a DC is for per-flow steering of | |||
| "elephant" flows defined in [I-D.ietf-spring-segment-routing-msdc]. | "elephant" flows defined in [I-D.ietf-spring-segment-routing-msdc]. | |||
| skipping to change at page 5, line 19 ¶ | skipping to change at page 5, line 19 ¶ | |||
| If one wants, however, to get a deterministic path from the equal | If one wants, however, to get a deterministic path from the equal | |||
| cost paths, one can use part of the 20-bit entropy field. For | cost paths, one can use part of the 20-bit entropy field. For | |||
| example, bit 0 to bit 2 of entropy label can represent a value of 0 | example, bit 0 to bit 2 of entropy label can represent a value of 0 | |||
| to 7, and thus can be used to select a deterministic path from 8 | to 7, and thus can be used to select a deterministic path from 8 | |||
| equal cost paths. And thus, a 20-bit entropy label can be used by | equal cost paths. And thus, a 20-bit entropy label can be used by | |||
| routers in different tiers to select a deterministic path | routers in different tiers to select a deterministic path | |||
| independently by using different parts of the 20-bit entropy label, | independently by using different parts of the 20-bit entropy label, | |||
| and form an end-to-end deterministic path. | and form an end-to-end deterministic path. | |||
| This is simple and applicable especially for DC CLOS networks, | This is simple and applicable especially for DC Clos networks, | |||
| because data delivery in DC CLOS networks for tenants is always | because data delivery in DC Clos networks for tenants is always | |||
| multi-staged, with the upstream direction stages having equal cost | multi-staged, with the upstream direction stages having equal cost | |||
| paths. | paths. | |||
| 4. Use of BIER Entropy for DC CLOS Network | 4. Use of BIER Entropy for DC Clos Network | |||
| 4.1. Use of BIER Entropy for DC CLOS Network | 4.1. Use of BIER Entropy for DC Clos Network | |||
| Take the 5-stage CLOS network in figure 1 as an example. | Take the 5-stage Clos network in figure 1 as an example. | |||
| Tier 2 in every cluster has N nodes, and the Tier 1 has M nodes. M | Tier 2 in every cluster has N nodes, and the Tier 1 has M nodes. M | |||
| is equal to N multiplied by P. | is equal to N multiplied by P. | |||
| Tier 3 switches, in upstream direction, act as stage 1 of data | Tier 3 switches, in upstream direction, act as stage 1 of data | |||
| delivery and have N equal cost paths to every BFER in other | delivery and have N equal cost paths to every BFER in other | |||
| clusters. Tier 2 switches, in upstream direction, act as stage 2 of | clusters. Tier 2 switches, in upstream direction, act as stage 2 of | |||
| data delivery and have P equal cost paths to every BFER in other | data delivery and have P equal cost paths to every BFER in other | |||
| clusters. | clusters. | |||
| skipping to change at page 7, line 50 ¶ | skipping to change at page 7, line 50 ¶ | |||
| [I-D.ietf-mpls-spring-entropy-label] | [I-D.ietf-mpls-spring-entropy-label] | |||
| Kini, S., Kompella, K., Sivabalan, S., Litkowski, S., | Kini, S., Kompella, K., Sivabalan, S., Litkowski, S., | |||
| Shakir, R., and J. Tantsura, "Entropy label for SPRING | Shakir, R., and J. Tantsura, "Entropy label for SPRING | |||
| tunnels", draft-ietf-mpls-spring-entropy-label-12 (work in | tunnels", draft-ietf-mpls-spring-entropy-label-12 (work in | |||
| progress), July 2018. | progress), July 2018. | |||
| [I-D.ietf-spring-segment-routing-msdc] | [I-D.ietf-spring-segment-routing-msdc] | |||
| Filsfils, C., Previdi, S., Dawra, G., Aries, E., and P. | Filsfils, C., Previdi, S., Dawra, G., Aries, E., and P. | |||
| Lapukhov, "BGP-Prefix Segment in large-scale data | Lapukhov, "BGP-Prefix Segment in large-scale data | |||
| centers", draft-ietf-spring-segment-routing-msdc-10 (work | centers", draft-ietf-spring-segment-routing-msdc-11 (work | |||
| in progress), October 2018. | in progress), November 2018. | |||
| [RFC7938] Lapukhov, P., Premji, A., and J. Mitchell, Ed., "Use of | [RFC7938] Lapukhov, P., Premji, A., and J. Mitchell, Ed., "Use of | |||
| BGP for Routing in Large-Scale Data Centers", RFC 7938, | BGP for Routing in Large-Scale Data Centers", RFC 7938, | |||
| DOI 10.17487/RFC7938, August 2016, | DOI 10.17487/RFC7938, August 2016, | |||
| <https://www.rfc-editor.org/info/rfc7938>. | <https://www.rfc-editor.org/info/rfc7938>. | |||
| [RFC8279] Wijnands, IJ., Ed., Rosen, E., Ed., Dolganow, A., | [RFC8279] Wijnands, IJ., Ed., Rosen, E., Ed., Dolganow, A., | |||
| Przygienda, T., and S. Aldrin, "Multicast Using Bit Index | Przygienda, T., and S. Aldrin, "Multicast Using Bit Index | |||
| Explicit Replication (BIER)", RFC 8279, | Explicit Replication (BIER)", RFC 8279, | |||
| DOI 10.17487/RFC8279, November 2017, | DOI 10.17487/RFC8279, November 2017, | |||
| End of changes. 17 change blocks. | ||||
| 22 lines changed or deleted | 22 lines changed or added | |||
This html diff was produced by rfcdiff 1.48. The latest version is available from http://tools.ietf.org/tools/rfcdiff/ | ||||
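
The drafts compared above describe using different parts of the 20-bit entropy field so that each tier picks one of its equal-cost paths independently (N paths at stage 1, P paths at stage 2). The following is a minimal Python sketch of that idea, not the drafts' normative procedure. The bit layout (stage 1 in the lowest bits, stage 2 directly above, bit 0 being the least significant bit) and the names `bits_needed`, `encode_entropy`, and `select_path` are assumptions made for illustration only.

```python
ENTROPY_BITS = 20  # width of the BIER entropy field, as stated in the text


def bits_needed(paths: int) -> int:
    """Number of bits required to index `paths` equal-cost paths."""
    return max(1, (paths - 1).bit_length())


def encode_entropy(stage1_path: int, stage2_path: int, n: int, p: int) -> int:
    """Pack two per-stage path indices into one entropy value.

    Assumed (hypothetical) layout: stage 1 (Tier 3 switch, N paths) uses the
    lowest bits; stage 2 (Tier 2 switch, P paths) uses the bits above them.
    """
    if not (0 <= stage1_path < n and 0 <= stage2_path < p):
        raise ValueError("path index out of range")
    w1, w2 = bits_needed(n), bits_needed(p)
    if w1 + w2 > ENTROPY_BITS:
        raise ValueError("path indices do not fit in the entropy field")
    return stage1_path | (stage2_path << w1)


def select_path(entropy: int, offset: int, paths: int) -> int:
    """Read the sub-field starting at bit `offset` and map it to a path index."""
    width = bits_needed(paths)
    return ((entropy >> offset) & ((1 << width) - 1)) % paths


# Example with N = 8 and P = 4: the ingress chooses path 5 at stage 1
# and path 3 at stage 2, and each tier recovers its own index.
N, P = 8, 4
entropy = encode_entropy(5, 3, N, P)
stage1_choice = select_path(entropy, 0, N)               # read by the Tier 3 switch
stage2_choice = select_path(entropy, bits_needed(N), P)  # read by the Tier 2 switch
```

With N = 8, the stage 1 sub-field is bits 0 to 2, which matches the drafts' example of three bits selecting among 8 equal-cost paths. The final modulo only matters when the number of paths is not a power of two.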