| < draft-mohanty-bess-ebgp-dmz-03.txt | draft-ietf-bess-ebgp-dmz-00.txt > | |||
|---|---|---|---|---|
| BESS WorkGroup S. Mohanty | BESS WorkGroup S R. Mohanty | |||
| Internet-Draft Cisco Systems | Internet-Draft Cisco Systems | |||
| Intended status: Informational A. Vayner | Intended status: Informational A. Vayner | |||
| Expires: September 16, 2021 Google | Expires: 28 August 2022 Google | |||
| A. Gattani | A. Gattani | |||
| A. Kini | A. Kini | |||
| Arista Networks | Arista Networks | |||
| March 15, 2021 | 24 February 2022 | |||
| Cumulative DMZ Link Bandwidth and load-balancing | Cumulative DMZ Link Bandwidth and load-balancing | |||
| draft-mohanty-bess-ebgp-dmz-03 | draft-ietf-bess-ebgp-dmz-00 | |||
| Abstract | Abstract | |||
| The DMZ Link Bandwidth draft provides a way to load-balance traffic | The DMZ Link Bandwidth draft provides a way to load-balance traffic | |||
| to a destination (which is in a different AS than the source) which | to a destination (which is in a different AS than the source) which | |||
| is reachable via more than one path. Typically, the link bandwidth | is reachable via more than one path. Typically, the link bandwidth | |||
| (either configured on the link of the EBGP egress interface or set | (either configured on the link of the EBGP egress interface or set | |||
| via a policy) is encoded in an extended community and then sent to | via a policy) is encoded in an extended community and then sent to | |||
| the IBGP peer which employs multi-path. The link-bandwidth value is | the IBGP peer which employs multi-path. The link-bandwidth value is | |||
| then extracted from the path extended community and is used as a | then extracted from the path extended community and is used as a | |||
| skipping to change at page 1, line 48 ¶ | skipping to change at page 1, line 48 ¶ | |||
| Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
| Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
| working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
| Drafts is at https://datatracker.ietf.org/drafts/current/. | Drafts is at https://datatracker.ietf.org/drafts/current/. | |||
| Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
| and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
| time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
| material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
| This Internet-Draft will expire on September 16, 2021. | This Internet-Draft will expire on 28 August 2022. | |||
| Copyright Notice | Copyright Notice | |||
| Copyright (c) 2021 IETF Trust and the persons identified as the | Copyright (c) 2022 IETF Trust and the persons identified as the | |||
| document authors. All rights reserved. | document authors. All rights reserved. | |||
| This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
| Provisions Relating to IETF Documents | Provisions Relating to IETF Documents (https://trustee.ietf.org/ | |||
| (https://trustee.ietf.org/license-info) in effect on the date of | license-info) in effect on the date of publication of this document. | |||
| publication of this document. Please review these documents | Please review these documents carefully, as they describe your rights | |||
| carefully, as they describe your rights and restrictions with respect | and restrictions with respect to this document. Code Components | |||
| to this document. Code Components extracted from this document must | extracted from this document must include Revised BSD License text as | |||
| include Simplified BSD License text as described in Section 4.e of | described in Section 4.e of the Trust Legal Provisions and are | |||
| the Trust Legal Provisions and are provided without warranty as | provided without warranty as described in the Revised BSD License. | |||
| described in the Simplified BSD License. | ||||
| Table of Contents | Table of Contents | |||
| 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 | |||
| 2. Requirements Language . . . . . . . . . . . . . . . . . . . . 3 | 2. Requirements Language . . . . . . . . . . . . . . . . . . . . 3 | |||
| 3. Problem Description . . . . . . . . . . . . . . . . . . . . . 3 | 3. Problem Description . . . . . . . . . . . . . . . . . . . . . 3 | |||
| 4. Large Scale Data Centers Use Case . . . . . . . . . . . . . . 6 | 4. Large Scale Data Centers Use Case . . . . . . . . . . . . . . 6 | |||
| 5. Non-Conforming BGP Topologies . . . . . . . . . . . . . . . . 8 | 5. Non-Conforming BGP Topologies . . . . . . . . . . . . . . . . 8 | |||
| 6. Protocol Considerations . . . . . . . . . . . . . . . . . . . 10 | 6. Protocol Considerations . . . . . . . . . . . . . . . . . . . 9 | |||
| 7. Operational Considerations . . . . . . . . . . . . . . . . . 10 | 7. Operational Considerations . . . . . . . . . . . . . . . . . 10 | |||
| 8. Security Considerations . . . . . . . . . . . . . . . . . . . 10 | 8. Security Considerations . . . . . . . . . . . . . . . . . . . 10 | |||
| 9. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 10 | 9. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 10 | |||
| 10. References . . . . . . . . . . . . . . . . . . . . . . . . . 10 | 10. References . . . . . . . . . . . . . . . . . . . . . . . . . 10 | |||
| 10.1. Normative References . . . . . . . . . . . . . . . . . . 10 | 10.1. Normative References . . . . . . . . . . . . . . . . . . 10 | |||
| 10.2. Informative References . . . . . . . . . . . . . . . . . 11 | 10.2. Informative References . . . . . . . . . . . . . . . . . 11 | |||
| Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 11 | Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 11 | |||
| 1. Introduction | 1. Introduction | |||
| skipping to change at page 3, line 31 ¶ | skipping to change at page 3, line 30 ¶ | |||
| R1- -20 - - | | R1- -20 - - | | |||
| R3- -100 - -| | R3- -100 - -| | |||
| R2- -10 - - | | | R2- -10 - - | | | |||
| | | | | |||
| R6- -40 - - | |- - R4 | R6- -40 - - | |- - R4 | |||
| | | | | | | |||
| R5- -100 - -| | R5- -100 - -| | |||
| R7- -30 - - | | R7- -30 - - | | |||
| EBGP Network with cumulative DMZ requirement | Figure 1 | |||
| Figure 1 | EBGP Network with cumulative DMZ requirement | |||
| 2. Requirements Language | 2. Requirements Language | |||
| The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
| "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | |||
| document are to be interpreted as described in [RFC2119]. | document are to be interpreted as described in [RFC2119]. | |||
| 3. Problem Description | 3. Problem Description | |||
| Figure 1 above represents an all-EBGP network. Router R3 is peering | Figure 1 above represents an all-EBGP network. Router R3 is peering | |||
| skipping to change at page 5, line 14 ¶ | skipping to change at page 4, line 46 ¶ | |||
| R1- -20 - - | | R1- -20 - - | | |||
| R3- -30 (100) - -| | R3- -30 (100) - -| | |||
| R2- -10 - - | | | R2- -10 - - | | | |||
| | | | | |||
| R6- -40 - - | |- - R4 | R6- -40 - - | |- - R4 | |||
| | | | | | | |||
| R5- -70 (100) - -| | R5- -70 (100) - -| | |||
| R7- -30 - - | | R7- -30 - - | | |||
| EBGP Network showing advertisement of cumulative link bandwidth | Figure 2 | |||
| Figure 2 | ||||
| EBGP Network showing advertisement of cumulative link bandwidth | ||||
| With the existing rules for the DMZ link bandwidth, this is not | With the existing rules for the DMZ link bandwidth, this is not | |||
| possible. First the LB extended community is not sent over EBGP. | possible. First the LB extended community is not sent over EBGP. | |||
| Secondly the DMZ does not have a notion of conveying the cumulative | Secondly the DMZ does not have a notion of conveying the cumulative | |||
| link bandwidth (of the directed tree rooted at a node) to an upstream | link bandwidth (of the directed tree rooted at a node) to an upstream | |||
| router. To enable the use case described above, the cumulative link | router. To enable the use case described above, the cumulative link | |||
| bandwidth of R1 and R2 has to be advertised by R3 to R4, and, | bandwidth of R1 and R2 has to be advertised by R3 to R4, and, | |||
| similarly, the cumulative bandwidth of R6 and R7 has to be advertised | similarly, the cumulative bandwidth of R6 and R7 has to be advertised | |||
| by R5 to R4. This will enable R4 to load-balance based on the | by R5 to R4. This will enable R4 to load-balance based on the | |||
| proportion of the cumulative link bandwidth that it receives from its | proportion of the cumulative link bandwidth that it receives from its | |||
| downstream routers R3 and R5. Figure 2 shows the cumulative link | downstream routers R3 and R5. Figure 2 shows the cumulative link | |||
| skipping to change at page 7, line 32 ¶ | skipping to change at page 7, line 28 ¶ | |||
| +----+ +----+ +----+ +----+ | +----+ +----+ +----+ +----+ | |||
| | | | | | | | | | | |||
| | | | | | | | | | | |||
| | +-----+ | | +-----+ | | | +-----+ | | +-----+ | | |||
| +-| AS4 |-+ +-| AS5 |-+ Tier 3 | +-| AS4 |-+ +-| AS5 |-+ Tier 3 | |||
| +-----+ +-----+ | +-----+ +-----+ | |||
| | | | | | | | | | | | | | | |||
| <- 3 Servers -> <- 10 Servers -> | <- 3 Servers -> <- 10 Servers -> | |||
| Typical Data Center Topology (RFC7938) | Figure 3 | |||
| Figure 3 | ||||
| In a regular ECMP environment, the tier 1 layer would see an ECMP | In a regular ECMP environment, the tier 1 layer would see an ECMP | |||
| path equally load-sharing across all 4 tier 2 paths. This would | path equally load-sharing across all 4 tier 2 paths. This would | |||
| cause the servers on the left part of the data center to be | cause the servers on the left part of the data center to be | |||
| potentially overloaded and the servers on the right to be | potentially overloaded and the servers on the right to be | |||
| underutilized. Using link bandwidth advertisements the servers could | underutilized. Using link bandwidth advertisements the servers could | |||
| add a link bandwidth extended community to the advertised service | add a link bandwidth extended community to the advertised service | |||
| prefix. Another option is to add the extended community on the tier | prefix. Another option is to add the extended community on the tier | |||
| 3 network devices as the routes are received from the servers or | 3 network devices as the routes are received from the servers or | |||
| generated locally on the network devices. If the link bandwidth | generated locally on the network devices. If the link bandwidth | |||
| value advertised for the service represents the server capacity for | value advertised for the service represents the server capacity for | |||
| that service, each data center tier would aggregate the values up | that service, each data center tier would aggregate the values up | |||
| when sending the update to the higher tier. The result would be a | when sending the update to the higher tier. The result would be a | |||
| set of weighted load-sharing metrics at each tier allowing the | set of weighted load-sharing metrics at each tier allowing the | |||
| network to distribute the flow load among the different servers in | network to distribute the flow load among the different servers in | |||
| the most optimal way. If a server is added to or removed from the service | the most optimal way. If a server is added to or removed from the service | |||
| prefix, it would add or remove its link bandwidth value and the | prefix, it would add or remove its link bandwidth value and the | |||
| network would adjust accordingly. | network would adjust accordingly. | |||
| Typical Data Center Topology (RFC7938) | ||||
| Figure 4 shows a more popular Spine Leaf architecture similar to | Figure 4 shows a more popular Spine Leaf architecture similar to | |||
| [RFC7938] section 3.2. Tor1, Tor2 and Tor3 are in the same tier, | [RFC7938] section 3.2. Tor1, Tor2 and Tor3 are in the same tier, | |||
| i.e. the leaf tier (The representation shown in Figure 3 here is the | i.e. the leaf tier (The representation shown in Figure 3 here is the | |||
| unfolded Clos). Using the same example above, it is clear that the | unfolded Clos). Using the same example above, it is clear that the | |||
| LB extended community value received by each of Spine1 and Spine2 | LB extended community value received by each of Spine1 and Spine2 | |||
| from Tor1 and Tor2 is in the ratio 3 to 10 respectively. The Spines | from Tor1 and Tor2 is in the ratio 3 to 10 respectively. The Spines | |||
| will then aggregate the bandwidth, regenerate and advertise the LB | will then aggregate the bandwidth, regenerate and advertise the LB | |||
| extended-community to Tor3. Tor3 will do equal cost sharing to both | extended-community to Tor3. Tor3 will do equal cost sharing to both | |||
| the spines which in turn will do the traffic-splitting in the ratio 3 | the spines which in turn will do the traffic-splitting in the ratio 3 | |||
| to 10 when forwarding the traffic to the Tor1 and Tor2 respectively. | to 10 when forwarding the traffic to the Tor1 and Tor2 respectively. | |||
| skipping to change at page 8, line 39 ¶ | skipping to change at page 8, line 32 ¶ | |||
| +----+--+ +-+----+ | +----+--+ +-+----+ | |||
| | \ / | | | \ / | | |||
| - + - - | - + - - | |||
| | / \ | | | / \ | | |||
| +-----+- + -+-----+ | +-----+- + -+-----+ | |||
| |Tor1 | |Tor2 | Tier 1 | |Tor1 | |Tor2 | Tier 1 | |||
| +-----+ +-----+ | +-----+ +-----+ | |||
| | | | | | | | | | | | | | | |||
| <- 3 Servers -> <- 10 Servers -> | <- 3 Servers -> <- 10 Servers -> | |||
| Two-tier Clos Data Center Topology | Figure 4 | |||
| Figure 4 | Two-tier Clos Data Center Topology | |||
| 5. Non-Conforming BGP Topologies | 5. Non-Conforming BGP Topologies | |||
| This use-case will not readily apply to all topologies. Figure 5 | This use-case will not readily apply to all topologies. Figure 5 | |||
| shows an all-EBGP topology: R1, R2, R3, R4, R5 and R6 are in AS1, AS2, | shows an all-EBGP topology: R1, R2, R3, R4, R5 and R6 are in AS1, AS2, | |||
| AS3, AS4, AS5 and AS6 respectively. A net p/m is being advertised | AS3, AS4, AS5 and AS6 respectively. A net p/m is being advertised | |||
| from a server S1 with LB extended-community value 10 to R1 and R5. | from a server S1 with LB extended-community value 10 to R1 and R5. | |||
| R1 advertises p/m to R2 and R3 and also regenerates the LB extended- | R1 advertises p/m to R2 and R3 and also regenerates the LB extended- | |||
| community with value 10. R4 receives the advertisements from R2, R3 | community with value 10. R4 receives the advertisements from R2, R3 | |||
| and R5 and computes the aggregate bandwidth to be 30. R4 advertises | and R5 and computes the aggregate bandwidth to be 30. R4 advertises | |||
| skipping to change at page 9, line 31 ¶ | skipping to change at page 9, line 20 ¶ | |||
| |- - R2 - 10 --| | |- - R2 - 10 --| | |||
| | | | | | | |||
| | | | | | | |||
| S1- - 10- R1 R4- - - --30 - -R6 | S1- - 10- R1 R4- - - --30 - -R6 | |||
| | | | | | | | | |||
| | | | | | | | | |||
| 10 |- - -R3- 10 - -| | 10 |- - -R3- 10 - -| | |||
| | | | | | | |||
| |- - - R5 - - -- - -- - - -| | |- - - R5 - - -- - -- - - -| | |||
| A non-conforming topology for the Cumulative DMZ | Figure 5 | |||
| Figure 5 | A non-conforming topology for the Cumulative DMZ | |||
| One way to make the topology in the figure above conforming would be | One way to make the topology in the figure above conforming would be | |||
| to regenerate a normalized value of the aggregate link bandwidth when | to regenerate a normalized value of the aggregate link bandwidth when | |||
| the aggregate link bandwidth is being advertised over more than one | the aggregate link bandwidth is being advertised over more than one | |||
| eBGP peer link. Such normalization can be achieved through outbound | eBGP peer link. Such normalization can be achieved through outbound | |||
| policy application on top of the aggregate link bandwidth value. A | policy application on top of the aggregate link bandwidth value. A | |||
| couple of options in this context are: | couple of options in this context are: | |||
| 1. divide the aggregate link bandwidth across the eBGP peers equally | 1. divide the aggregate link bandwidth across the eBGP peers equally | |||
| skipping to change at page 11, line 7 ¶ | skipping to change at page 10, line 41 ¶ | |||
| The authors would like to thank Shyam Sethuram, Sameer Gulrajani, | The authors would like to thank Shyam Sethuram, Sameer Gulrajani, | |||
| Nitin Kumar, Keyur Patel and Juan Alcaide for discussions related to | Nitin Kumar, Keyur Patel and Juan Alcaide for discussions related to | |||
| the draft. | the draft. | |||
| 10. References | 10. References | |||
| 10.1. Normative References | 10.1. Normative References | |||
| [I-D.ietf-idr-link-bandwidth] | [I-D.ietf-idr-link-bandwidth] | |||
| Mohapatra, P. and R. Fernando, "BGP Link Bandwidth | Mohapatra, P. and R. Fernando, "BGP Link Bandwidth | |||
| Extended Community", draft-ietf-idr-link-bandwidth-06 | Extended Community", Work in Progress, Internet-Draft, | |||
| (work in progress), January 2013. | draft-ietf-idr-link-bandwidth-06, 21 January 2013, | |||
| <http://www.ietf.org/internet-drafts/draft-ietf-idr-link- | ||||
| bandwidth-06.txt>. | ||||
| [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
| Requirement Levels", BCP 14, RFC 2119, | Requirement Levels", BCP 14, RFC 2119, | |||
| DOI 10.17487/RFC2119, March 1997, | DOI 10.17487/RFC2119, March 1997, | |||
| <https://www.rfc-editor.org/info/rfc2119>. | <https://www.rfc-editor.org/info/rfc2119>. | |||
| [RFC7938] Lapukhov, P., Premji, A., and J. Mitchell, Ed., "Use of | [RFC7938] Lapukhov, P., Premji, A., and J. Mitchell, Ed., "Use of | |||
| BGP for Routing in Large-Scale Data Centers", RFC 7938, | BGP for Routing in Large-Scale Data Centers", RFC 7938, | |||
| DOI 10.17487/RFC7938, August 2016, | DOI 10.17487/RFC7938, August 2016, | |||
| <https://www.rfc-editor.org/info/rfc7938>. | <https://www.rfc-editor.org/info/rfc7938>. | |||
| skipping to change at page 11, line 40 ¶ | skipping to change at page 11, line 30 ¶ | |||
| [RFC8277] Rosen, E., "Using BGP to Bind MPLS Labels to Address | [RFC8277] Rosen, E., "Using BGP to Bind MPLS Labels to Address | |||
| Prefixes", RFC 8277, DOI 10.17487/RFC8277, October 2017, | Prefixes", RFC 8277, DOI 10.17487/RFC8277, October 2017, | |||
| <https://www.rfc-editor.org/info/rfc8277>. | <https://www.rfc-editor.org/info/rfc8277>. | |||
| Authors' Addresses | Authors' Addresses | |||
| Satya Ranjan Mohanty | Satya Ranjan Mohanty | |||
| Cisco Systems | Cisco Systems | |||
| 170 W. Tasman Drive | 170 W. Tasman Drive | |||
| San Jose, CA 95134 | San Jose, CA 95134 | |||
| USA | United States of America | |||
| Email: satyamoh@cisco.com | Email: satyamoh@cisco.com | |||
| Arie Vayner | Arie Vayner | |||
| 1600 Amphitheatre Parkway | 1600 Amphitheatre Parkway | |||
| Mountain View, CA 94043 | Mountain View, CA 94043 | |||
| USA | United States of America | |||
| Email: avayner@google.com | Email: avayner@google.com | |||
| Akshay Gattani | Akshay Gattani | |||
| Arista Networks | Arista Networks | |||
| 5453 Great America Parkway | 5453 Great America Parkway | |||
| Santa Clara, CA 95054 | Santa Clara, CA 95054 | |||
| USA | United States of America | |||
| Email: akshay@arista.com | Email: akshay@arista.com | |||
| Ajay Kini | Ajay Kini | |||
| Arista Networks | Arista Networks | |||
| 5453 Great America Parkway | 5453 Great America Parkway | |||
| Santa Clara, CA 95054 | Santa Clara, CA 95054 | |||
| USA | United States of America | |||
| Email: ajkini@arista.com | Email: ajkini@arista.com | |||
| End of changes. 25 change blocks. | ||||
| 42 lines changed or deleted | 38 lines changed or added | |||
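
The arithmetic behind Figure 2 and the first normalization option for Figure 5 can be sketched in a few lines of Python. This is only an illustration of the numbers quoted in the draft text, not an implementation of the draft: the function names (`cumulative_bandwidth`, `load_share`, `normalize_equally`) are hypothetical, and the sketch assumes each router simply sums the link-bandwidth values it learns from its downstream peers.

```python
from fractions import Fraction


def cumulative_bandwidth(received):
    """Sum the link-bandwidth values learned from downstream peers (hypothetical helper)."""
    return sum(received.values())


def load_share(advertised):
    """Return each peer's share of traffic, proportional to its advertised bandwidth."""
    total = sum(advertised.values())
    return {peer: Fraction(bw, total) for peer, bw in advertised.items()}


def normalize_equally(aggregate, peer_count):
    """Option 1 in the draft: divide the aggregate equally across the eBGP peers."""
    return Fraction(aggregate, peer_count)


# Figure 2: R3 aggregates R1 and R2, R5 aggregates R6 and R7.
r3 = cumulative_bandwidth({"R1": 20, "R2": 10})
r5 = cumulative_bandwidth({"R6": 40, "R7": 30})

# R4 load-balances in proportion to what R3 and R5 advertise (30 : 70).
r4_shares = load_share({"R3": r3, "R5": r5})

# Figure 5 with option 1: R1 splits its value of 10 across its two eBGP peers,
# so R4 aggregates 5 + 5 + 10 instead of 10 + 10 + 10.
r1_per_peer = normalize_equally(10, 2)
r4_aggregate = cumulative_bandwidth({"R2": r1_per_peer, "R3": r1_per_peer, "R5": 10})
```

Under these assumptions R3 and R5 advertise 30 and 70, R4 sends 3/10 of the traffic towards R3 and 7/10 towards R5, and the normalized aggregate at R4 in Figure 5 is 20 rather than 30.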