| < draft-ietf-rift-applicability-05.txt | draft-ietf-rift-applicability-06.txt > | |||
|---|---|---|---|---|
| RIFT WG Yuehua. Wei, Ed. | RIFT WG Yuehua. Wei, Ed. | |||
| Internet-Draft Zheng. Zhang | Internet-Draft Zheng. Zhang | |||
| Intended status: Informational ZTE Corporation | Intended status: Informational ZTE Corporation | |||
| Expires: 28 October 2021 Dmitry. Afanasiev | Expires: 13 November 2021 Dmitry. Afanasiev | |||
| Yandex | Yandex | |||
| P. Thubert | P. Thubert | |||
| Cisco Systems | Cisco Systems | |||
| Tom. Verhaeg | Tom. Verhaeg | |||
| Juniper Networks | Juniper Networks | |||
| Jaroslaw. Kowalczyk | Jaroslaw. Kowalczyk | |||
| Orange Polska | Orange Polska | |||
| 26 April 2021 | 12 May 2021 | |||
| RIFT Applicability | RIFT Applicability | |||
| draft-ietf-rift-applicability-05 | draft-ietf-rift-applicability-06 | |||
| Abstract | Abstract | |||
| This document discusses the properties, applicability and operational | This document discusses the properties, applicability and operational | |||
| considerations of RIFT in different network scenarios. It intends to | considerations of RIFT in different network scenarios. It intends to | |||
| provide a rough guide to how RIFT can be deployed to simplify routing | provide a rough guide to how RIFT can be deployed to simplify routing | |||
| operations in Clos topologies and their variations. | operations in Clos topologies and their variations. | |||
| Status of This Memo | Status of This Memo | |||
| skipping to change at page 1, line 41 ¶ | skipping to change at page 1, line 41 ¶ | |||
| Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
| Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
| working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
| Drafts is at https://datatracker.ietf.org/drafts/current/. | Drafts is at https://datatracker.ietf.org/drafts/current/. | |||
| Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
| and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
| time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
| material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
| This Internet-Draft will expire on 28 October 2021. | This Internet-Draft will expire on 13 November 2021. | |||
| Copyright Notice | Copyright Notice | |||
| Copyright (c) 2021 IETF Trust and the persons identified as the | Copyright (c) 2021 IETF Trust and the persons identified as the | |||
| document authors. All rights reserved. | document authors. All rights reserved. | |||
| This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
| Provisions Relating to IETF Documents (https://trustee.ietf.org/ | Provisions Relating to IETF Documents (https://trustee.ietf.org/ | |||
| license-info) in effect on the date of publication of this document. | license-info) in effect on the date of publication of this document. | |||
| Please review these documents carefully, as they describe your rights | Please review these documents carefully, as they describe your rights | |||
| and restrictions with respect to this document. Code Components | and restrictions with respect to this document. Code Components | |||
| extracted from this document must include Simplified BSD License text | extracted from this document must include Simplified BSD License text | |||
| as described in Section 4.e of the Trust Legal Provisions and are | as described in Section 4.e of the Trust Legal Provisions and are | |||
| provided without warranty as described in the Simplified BSD License. | provided without warranty as described in the Simplified BSD License. | |||
| Table of Contents | Table of Contents | |||
| 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 | |||
| 2. Problem Statement of Routing in Modern IP Fabric Fat Tree | 2. Problem Statement of Routing in Modern IP Fabric Fat Tree | |||
| Networks . . . . . . . . . . . . . . . . . . . . . . . . 3 | Networks . . . . . . . . . . . . . . . . . . . . . . . . 3 | |||
| 3. Applicability of RIFT to Clos IP Fabrics . . . . . . . . . . 3 | 3. Applicability of RIFT to Clos IP Fabrics . . . . . . . . . . 4 | |||
| 3.1. Overview of RIFT . . . . . . . . . . . . . . . . . . . . 4 | 3.1. Overview of RIFT . . . . . . . . . . . . . . . . . . . . 4 | |||
| 3.2. Applicable Topologies . . . . . . . . . . . . . . . . . . 6 | 3.2. Applicable Topologies . . . . . . . . . . . . . . . . . . 6 | |||
| 3.2.1. Horizontal Links . . . . . . . . . . . . . . . . . . 6 | 3.2.1. Horizontal Links . . . . . . . . . . . . . . . . . . 7 | |||
| 3.2.2. Vertical Shortcuts . . . . . . . . . . . . . . . . . 7 | 3.2.2. Vertical Shortcuts . . . . . . . . . . . . . . . . . 7 | |||
| 3.2.3. Generalizing to any Directed Acyclic Graph . . . . . 7 | 3.2.3. Generalizing to any Directed Acyclic Graph . . . . . 7 | |||
| 3.3. Use Cases . . . . . . . . . . . . . . . . . . . . . . . . 8 | 3.2.4. Reachability of Internal Nodes in the Fabric . . . . 9 | |||
| 3.3.1. Data Center Fabrics . . . . . . . . . . . . . . . . . 8 | 3.3. Use Cases . . . . . . . . . . . . . . . . . . . . . . . . 9 | |||
| 3.3.2. Metro Fabrics . . . . . . . . . . . . . . . . . . . . 9 | 3.3.1. Data Center Topologies . . . . . . . . . . . . . . . 9 | |||
| 3.3.3. Building Cabling . . . . . . . . . . . . . . . . . . 9 | 3.3.2. Metro Fabrics . . . . . . . . . . . . . . . . . . . . 11 | |||
| 3.3.4. Internal Router Switching Fabrics . . . . . . . . . . 9 | 3.3.3. Building Cabling . . . . . . . . . . . . . . . . . . 11 | |||
| 3.3.5. CloudCO . . . . . . . . . . . . . . . . . . . . . . . 9 | 3.3.4. Internal Router Switching Fabrics . . . . . . . . . . 11 | |||
| 4. Operational Considerations . . . . . . . . . . . . . . . . . 11 | 3.3.5. CloudCO . . . . . . . . . . . . . . . . . . . . . . . 11 | |||
| 4.1. South Reflection . . . . . . . . . . . . . . . . . . . . 12 | 4. Operational Considerations . . . . . . . . . . . . . . . . . 13 | |||
| 4.2. Suboptimal Routing on Link Failures . . . . . . . . . . . 12 | 4.1. South Reflection . . . . . . . . . . . . . . . . . . . . 14 | |||
| 4.3. Black-Holing on Link Failures . . . . . . . . . . . . . . 14 | 4.2. Suboptimal Routing on Link Failures . . . . . . . . . . . 14 | |||
| 4.4. Zero Touch Provisioning (ZTP) . . . . . . . . . . . . . . 15 | 4.3. Black-Holing on Link Failures . . . . . . . . . . . . . . 16 | |||
| 4.5. Mis-cabling Examples . . . . . . . . . . . . . . . . . . 16 | 4.4. Zero Touch Provisioning (ZTP) . . . . . . . . . . . . . . 17 | |||
| 4.6. Positive vs. Negative Disaggregation . . . . . . . . . . 18 | 4.5. Mis-cabling Examples . . . . . . . . . . . . . . . . . . 18 | |||
| 4.7. Mobile Edge and Anycast . . . . . . . . . . . . . . . . . 20 | 4.6. Positive vs. Negative Disaggregation . . . . . . . . . . 20 | |||
| 4.8. IPv4 over IPv6 . . . . . . . . . . . . . . . . . . . . . 21 | 4.7. Mobile Edge and Anycast . . . . . . . . . . . . . . . . . 22 | |||
| 4.9. In-Band Reachability of Nodes . . . . . . . . . . . . . . 22 | 4.8. IPv4 over IPv6 . . . . . . . . . . . . . . . . . . . . . 24 | |||
| 4.10. Dual Homing Servers . . . . . . . . . . . . . . . . . . . 24 | 4.9. In-Band Reachability of Nodes . . . . . . . . . . . . . . 25 | |||
| 4.11. Fabric With A Controller . . . . . . . . . . . . . . . . 25 | 4.10. Dual Homing Servers . . . . . . . . . . . . . . . . . . . 26 | |||
| 4.11.1. Controller Attached to ToFs . . . . . . . . . . . . 25 | 4.11. Fabric With A Controller . . . . . . . . . . . . . . . . 27 | |||
| 4.11.2. Controller Attached to Leaf . . . . . . . . . . . . 25 | 4.11.1. Controller Attached to ToFs . . . . . . . . . . . . 27 | |||
| 4.12. Internet Connectivity With Underlay . . . . . . . . . . . 26 | 4.11.2. Controller Attached to Leaf . . . . . . . . . . . . 28 | |||
| 4.12.1. Internet Default on the Leaf . . . . . . . . . . . . 26 | 4.12. Internet Connectivity With Underlay . . . . . . . . . . . 28 | |||
| 4.12.2. Internet Default on the ToFs . . . . . . . . . . . . 26 | 4.12.1. Internet Default on the Leaf . . . . . . . . . . . . 28 | |||
| 4.13. Subnet Mismatch and Address Families . . . . . . . . . . 26 | 4.12.2. Internet Default on the ToFs . . . . . . . . . . . . 28 | |||
| 4.14. Anycast Considerations . . . . . . . . . . . . . . . . . 27 | 4.13. Subnet Mismatch and Address Families . . . . . . . . . . 28 | |||
| 4.15. IoT Applicability . . . . . . . . . . . . . . . . . . . . 28 | 4.14. Anycast Considerations . . . . . . . . . . . . . . . . . 29 | |||
| 5. Security Considerations . . . . . . . . . . . . . . . . . . . 28 | 4.15. IoT Applicability . . . . . . . . . . . . . . . . . . . . 30 | |||
| 6. Contributors . . . . . . . . . . . . . . . . . . . . . . . . 29 | 4.16. Key Management . . . . . . . . . . . . . . . . . . . . . 30 | |||
| 7. Normative References . . . . . . . . . . . . . . . . . . . . 29 | ||||
| 8. Informative References . . . . . . . . . . . . . . . . . . . 30 | 5. Security Considerations . . . . . . . . . . . . . . . . . . . 31 | |||
| Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 31 | 6. Contributors . . . . . . . . . . . . . . . . . . . . . . . . 31 | |||
| 7. Normative References . . . . . . . . . . . . . . . . . . . . 31 | ||||
| 8. Informative References . . . . . . . . . . . . . . . . . . . 33 | ||||
| Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 33 | ||||
| 1. Introduction | 1. Introduction | |||
| This document discusses the properties and applicability of "Routing | This document discusses the properties and applicability of "Routing | |||
| in Fat Trees" [RIFT] (RIFT) in different deployment scenarios and | in Fat Trees" [RIFT] (RIFT) in different deployment scenarios and | |||
| highlights the operational simplicity of the technology compared to | highlights the operational simplicity of the technology compared to | |||
| traditional routing solutions. It also documents special | traditional routing solutions. It also documents special | |||
| considerations when RIFT is used with or without overlays and/or | considerations when RIFT is used with or without overlays and/or | |||
| controllers, and how RIFT corrects topology mis-cablings and/or node | controllers, and how RIFT identifies topology mis-cablings and | |||
| and link failures. | reroutes around node and link failures. | |||
| 2. Problem Statement of Routing in Modern IP Fabric Fat Tree Networks | 2. Problem Statement of Routing in Modern IP Fabric Fat Tree Networks | |||
| Clos [CLOS] and fat tree [FATTREE] topologies have gained prominence | Clos [CLOS] and fat tree [FATTREE] topologies have gained prominence | |||
| in today's networking, primarily as a result of the paradigm shift | in today's networking, primarily as a result of the paradigm shift | |||
| towards a centralized data-center based architecture that delivers a | towards a centralized data-center based architecture that delivers a | |||
| majority of computation and storage services. | majority of computation and storage services. | |||
| Current routing protocols were geared towards a network with | Current routing protocols were geared towards a network with | |||
| an irregular topology with isotropic properties and a low degree of | an irregular topology with isotropic properties and a low degree of | |||
| connectivity. When applied to Fat Tree topologies: | connectivity. When applied to Fat Tree topologies: | |||
| * They tend to need extensive configuration or provisioning during | * They tend to need extensive configuration or provisioning during | |||
| bring up and re-dimensioning. | bring up and re-dimensioning. | |||
| * All nodes including spine and leaf nodes learn the entire network | * All nodes including spine and leaf nodes learn the entire network | |||
| topology and routing information, which is, in fact, not needed on | topology and routing information, which is, in fact, not needed on | |||
| the leaf nodes during normal operation. | the leaf nodes during normal operation. | |||
| * Significant link-state PDUs (LSPs) flooding duplication between | * They flood significant amounts of duplicate link state information | |||
| spine nodes and leaf nodes occurs during network bring up and | between spine and leaf nodes during topology updates and | |||
| topology updates. | convergence events, requiring that additional CPU and link | |||
| bandwidth be consumed. This may impact the stability and | ||||
| * This consumes both CPU and link bandwidth resources which prevents | scalability of the fabric, make the fabric less reactive to | |||
| the use of cheaper hardware at the lower levels (leaf and spine) | failures, and prevent the use of cheaper hardware at the lower | |||
| and reduces the scalability and reactivity.of the network. | levels (i.e. spine and leaf nodes). | |||
| 3. Applicability of RIFT to Clos IP Fabrics | 3. Applicability of RIFT to Clos IP Fabrics | |||
| Further content of this document assumes that the reader is familiar | Further content of this document assumes that the reader is familiar | |||
| with the terms and concepts used in OSPF [RFC2328] and IS-IS | with the terms and concepts used in OSPF [RFC2328] and IS-IS | |||
| [ISO10589-Second-Edition] link-state protocols. The sections of RIFT | [ISO10589-Second-Edition] link-state protocols. The sections of RIFT | |||
| [RIFT] outline the requirements of routing in IP fabrics and RIFT | [RIFT] outline the requirements of routing in IP fabrics and RIFT | |||
| protocol concepts. | protocol concepts. | |||
| 3.1. Overview of RIFT | 3.1. Overview of RIFT | |||
| RIFT is a dynamic routing protocol that is specifically tailored for | RIFT is a dynamic routing protocol that is tailored for use in Clos, | |||
| use in Clos and Fat Tree network topologies. A core property of RIFT | Fat-Tree, and other anisotropic topologies. A core property of RIFT | |||
| is that its operation is sensitive to the structure of the fabric - | is that its operation is sensitive to the structure of the fabric - | |||
| it is anisotropic. RIFT acts as a link-state protocol when "pointing | it is anisotropic. RIFT acts as a link-state protocol when "pointing | |||
| north" - advertising southwards routes to northwards peer routers | north" - advertising southwards routes to northwards peer routers | |||
| (parents) through flooding and database synchronization - but operates | (parents) through flooding and database synchronization - but operates | |||
| hop-by-hop like a distance-vector protocol when "pointing south" - | hop-by-hop like a distance-vector protocol when "pointing south" - | |||
| typically advertising a fabric default route directed towards the Top | typically advertising a fabric default route directed towards the Top | |||
| of Fabric (ToF, aka superspine) to southwards peer routers (children) | of Fabric (ToF, aka superspine) to southwards peer routers | |||
| -. | (children). | |||
| The fabric default is typically the default route, as described in | ||||
| Section 3.2.3.8 "Southbound Default Route Origination" of RIFT | ||||
| [RIFT]. The ToF nodes may alternatively originate more specific | ||||
| prefixes (P') southbound instead of the default route. In such a | ||||
| scenario, all addresses carried within the RIFT domain MUST be | ||||
| contained within P', and it is possible for a leaf that acts as | ||||
| gateway to the internet to advertise the default route instead. | ||||
| RIFT floods flat link-state information northbound only so that each | RIFT floods flat link-state information northbound only so that each | |||
| level obtains the full topology of levels south of it. That | level obtains the full topology of levels south of it. That | |||
| information is never flooded east-west or back south again. So a top | information is never flooded east-west or back south again. So a top | |||
| tier node has the full set of prefixes from the Shortest Path First (SPF) | tier node has the full set of prefixes from the Shortest Path First (SPF) | |||
| calculation. | calculation. | |||
| In the southbound direction, the protocol operates like a "fully | In the southbound direction, the protocol operates like a "fully | |||
| summarizing, unidirectional" path-vector protocol or rather a | summarizing, unidirectional" path-vector protocol or rather a | |||
| distance-vector with implicit split horizon. Routing information, | distance-vector with implicit split horizon. Routing information, | |||
| skipping to change at page 5, line 4 ¶ | skipping to change at page 5, line 27 ¶ | |||
| |SPINE | |SPINE | | SPINE | | SPINE | | LEVEL 1 | |SPINE | |SPINE | | SPINE | | SPINE | | LEVEL 1 | |||
| + ++----++ ++---+-+ +--+--+-+ ++----+-+ | | + ++----++ ++---+-+ +--+--+-+ ++----+-+ | | |||
| + | | | | | | | | | ^ N | + | | | | | | | | | ^ N | |||
| Distance | +-------+ | | +--------+ | | | E | Distance | +-------+ | | +--------+ | | | E | |||
| Vector | | | | | | | | | +------> | Vector | | | | | | | | | +------> | |||
| South | +-------+ | | | +-------+ | | | | | South | +-------+ | | | +-------+ | | | | | |||
| + | | | | | | | | | + | + | | | | | | | | | + | |||
| v ++--++ +-+-++ ++-+-+ +-+--++ + | v ++--++ +-+-++ ++-+-+ +-+--++ + | |||
| |LEAF| |LEAF| |LEAF| |LEAF | LEVEL 0 | |LEAF| |LEAF| |LEAF| |LEAF | LEVEL 0 | |||
| +----+ +----+ +----+ +-----+ | +----+ +----+ +----+ +-----+ | |||
| Figure 1: Rift overview | ||||
| Figure 1: RIFT overview | ||||
| A spine node has only information necessary for its level, which is | A spine node has only information necessary for its level, which is | |||
| all destinations south of the node based on SPF calculation, default | all destinations south of the node based on SPF calculation, default | |||
| route, and potential disaggregated routes. | route, and potential disaggregated routes. | |||
| RIFT combines the advantages of both link-state and distance-vector: | RIFT combines the advantages of both link-state and distance-vector: | |||
| * Fastest possible convergence | * Fastest possible convergence | |||
| * Automatic detection of topology | * Automatic detection of topology | |||
| skipping to change at page 8, line 33 ¶ | skipping to change at page 9, line 7 ¶ | |||
| only incoming vertices is a ToF node. | only incoming vertices is a ToF node. | |||
| There are a number of caveats though: | There are a number of caveats though: | |||
| * The DAG structure must exist before RIFT starts, so there is a | * The DAG structure must exist before RIFT starts, so there is a | |||
| need for a companion protocol to establish the logical DAG | need for a companion protocol to establish the logical DAG | |||
| structure. | structure. | |||
| * A generic DAG does not have a sense of east and west. The | * A generic DAG does not have a sense of east and west. The | |||
| operation specified for east-west links and the southbound | operation specified for east-west links and the southbound | |||
| reflection between nodes are not applicable. | reflection between nodes are not applicable. Also ZTP will derive | |||
| a sense of depth that will eliminate some links. Variations of | ||||
| ZTP could be derived to meet specific objectives, e.g., make it so | ||||
| that most routers have at least 2 parents to reach the ToF. | ||||
| * RIFT applies to any Destination-Oriented DAG (DODAG) where there's | ||||
| only one ToF node and the problem of disaggregation does not | ||||
| exist. In that case, RIFT operates very much like RPL [RFC6550], | ||||
| but using Link State for southbound routes (downwards in RPL's | ||||
| terms). For an arbitrary DAG with multiple destinations (ToFs) | ||||
| the way disaggregation happens has to be considered. | ||||
| * Positive disaggregation expects that most of the ToF nodes reach | ||||
| most of the leaves, so disaggregation is the exception as opposed | ||||
| to the rule. When this is no longer true, it makes sense to turn | ||||
| off disaggregation and route between the ToF nodes over a ring, a | ||||
| full mesh, transit network, or a form of area zero. There again, | ||||
| this operation is similar to RPL operating as a single DODAG with | ||||
| a virtual root. | ||||
| * In order to aggregate and disaggregate routes, RIFT requires that | * In order to aggregate and disaggregate routes, RIFT requires that | |||
| all the ToF nodes share the full knowledge of the prefixes in the | all the ToF nodes share the full knowledge of the prefixes in the | |||
| fabric. This can be achieved with a ring as suggested by the RIFT | fabric. | |||
| main specification, by some preconfiguration, or using a | ||||
| * This can be achieved with a ring as suggested by the RIFT main | ||||
| specification, by some preconfiguration, or using a | ||||
| synchronization with a common repository where all the active | synchronization with a common repository where all the active | |||
| prefixes are registered. | prefixes are registered. | |||
| 3.2.4. Reachability of Internal Nodes in the Fabric | ||||
| RIFT does not require that nodes have reachable addresses in the | ||||
| fabric, though it is clearly desirable for operational purposes. | ||||
| Under normal operating conditions this can be easily achieved by | ||||
| injecting the node's loopback address into North and South Prefix | ||||
| TIEs or other implementation specific mechanisms. | ||||
| Special considerations arise when a node loses all northbound | ||||
| adjacencies, but is not at the top of the fabric. These are outside | ||||
| the scope of this document and could be discussed in a separate | ||||
| document. | ||||
| 3.3. Use Cases | 3.3. Use Cases | |||
| 3.3.1. Data Center Fabrics | 3.3.1. Data Center Topologies | |||
| 3.3.1.1. Data Center Fabrics | ||||
| RIFT is suited for applying in the data center (DC) IP fabrics | RIFT is suited for applying in data center (DC) IP fabrics underlay | |||
| underlay routing, vast majority of which seem to be currently (and | routing, vast majority of which seem to be currently (and for the | |||
| for the foreseeable future) Clos architectures. It significantly | foreseeable future) Clos architectures. It significantly simplifies | |||
| simplifies operation and deployment of such fabrics as described in | operation and deployment of such fabrics as described in Section 4 | |||
| Section 4 for environments compared to extensive proprietary | for environments compared to extensive proprietary provisioning and | |||
| provisioning and operational solutions. | operational solutions. | |||
| 3.3.1.2. Adaptations to Other Proposed Data Center Topologies | ||||
| . +-----+ +-----+ | ||||
| . | | | | | ||||
| .+-+ S0 | | S1 | | ||||
| .| ++---++ ++---++ | ||||
| .| | | | | | ||||
| .| | +------------+ | | ||||
| .| | | +------------+ | | ||||
| .| | | | | | ||||
| .| ++-+--+ +--+-++ | ||||
| .| | | | | | ||||
| .| | A0 | | A1 | | ||||
| .| +-+--++ ++---++ | ||||
| .| | | | | | ||||
| .| | +------------+ | | ||||
| .| | +-----------+ | | | ||||
| .| | | | | | ||||
| .| +-+-+-+ +--+-++ | ||||
| .+-+ | | | | ||||
| . | L0 | | L1 | | ||||
| . +-----+ +-----+ | ||||
| Figure 2: Level Shortcut | ||||
| RIFT is not strictly limited to Clos topologies. The protocol only | ||||
| requires a sense of "compass rose directionality" either achieved | ||||
| through configuration or derivation of levels. So, conceptually, | ||||
| shortcuts between levels could be included. Figure 2 depicts an | ||||
| example of a shortcut between levels. In this example, sub-optimal | ||||
| routing will occur when traffic is sent from L0 to L1 via S0's | ||||
| default route and back down through A0 or A1. In order to ensure | ||||
| that only default routes from A0 or A1 are used, all leaves would be | ||||
| required to install each other's routes. | ||||
| While various technical and operational challenges may require the | ||||
| use of such modifications, discussion of those topics is outside the | ||||
| scope of this document. | ||||
| 3.3.2. Metro Fabrics | 3.3.2. Metro Fabrics | |||
| The demand for bandwidth is increasing steadily, driven primarily by | The demand for bandwidth is increasing steadily, driven primarily by | |||
| environments close to content producers (server farms connection via | environments close to content producers (server farms connection via | |||
| DC fabrics) but in proximity to content consumers as well. Consumers | DC fabrics) but in proximity to content consumers as well. Consumers | |||
| are often clustered in metro areas with their own network | are often clustered in metro areas with their own network | |||
| architectures that can benefit from simplified, regular Clos | architectures that can benefit from simplified, regular Clos | |||
| structures and hence from RIFT. | structures and hence from RIFT. | |||
| skipping to change at page 10, line 45 ¶ | skipping to change at page 12, line 45 ¶ | |||
| | |--------| |--------| |----------| |-------| | | | |--------| |--------| |----------| |-------| | | |||
| | |--------| |--------| |----------| |-------| | | | |--------| |--------| |----------| |-------| | | |||
| | || VAS7 || || VAS4 || || vIGMP || ||BAA || | | | || VAS7 || || VAS4 || || vIGMP || ||BAA || | | |||
| | |--------| |--------| |----------| |-------| | | | |--------| |--------| |----------| |-------| | | |||
| | +--------+ +--------+ +----------+ +-------+ | | | +--------+ +--------+ +----------+ +-------+ | | |||
| | | | | | | |||
| ++-----------+ +---------++ | ++-----------+ +---------++ | |||
| |Network I/O | |Access I/O| | |Network I/O | |Access I/O| | |||
| +------------+ +----------+ | +------------+ +----------+ | |||
| Figure 2: An example of CloudCO architecture | Figure 3: An example of CloudCO architecture | |||
| The Spine-Leaf architecture deployed inside CloudCO meets the network | The Spine-Leaf architecture deployed inside CloudCO meets the network | |||
| requirements of adaptability, agility, scalability, and dynamism. | requirements of adaptability, agility, scalability, and dynamism. | |||
| 4. Operational Considerations | 4. Operational Considerations | |||
| RIFT presents the opportunity for organizations building and | RIFT presents the opportunity for organizations building and | |||
| operating IP fabrics to simplify their operation and deployments | operating IP fabrics to simplify their operation and deployments | |||
| while achieving many desirable properties of dynamic routing on | while achieving many desirable properties of dynamic routing on | |||
| such a substrate: | such a substrate: | |||
| skipping to change at page 13, line 31 ¶ | skipping to change at page 15, line 31 ¶ | |||
| | +--------------+ | + ++XX+linkSL6+---+ + | | +--------------+ | + ++XX+linkSL6+---+ + | |||
| | | | | linkSL5 | | linkSL8 | | | | | linkSL5 | | linkSL8 | |||
| | +------------+ | | + +---+linkSL7+-+ | + | | +------------+ | | + +---+linkSL7+-+ | + | |||
| | | | | | | | | | | | | | | | | | | |||
| +-+---+-+ +--+--+-+ +-+---+-+ +--+-+--+ | +-+---+-+ +--+--+-+ +-+---+-+ +--+-+--+ | |||
| |Leaf111| |Leaf112| |Leaf121| |Leaf122| LEVEL 0 | |Leaf111| |Leaf112| |Leaf121| |Leaf122| LEVEL 0 | |||
| +-+-----+ ++------+ +-----+-+ +-+-----+ | +-+-----+ ++------+ +-----+-+ +-+-----+ | |||
| + + + + | + + + + | |||
| Prefix111 Prefix112 Prefix121 Prefix122 | Prefix111 Prefix112 Prefix121 Prefix122 | |||
| Figure 3: Suboptimal routing upon link failure use case | Figure 4: Suboptimal routing upon link failure use case | |||
| As shown in Figure 3, as the result of the south reflection between | As shown in Figure 4, as the result of the south reflection between | |||
| Spine121-Leaf121-Spine122 and Spine121-Leaf122-Spine122, Spine121 and | Spine121-Leaf121-Spine122 and Spine121-Leaf122-Spine122, Spine121 and | |||
| Spine122 know each other at level 1. | Spine122 know each other at level 1. | |||
| Without a disaggregation mechanism, when linkSL6 fails, the packet from | Without a disaggregation mechanism, when linkSL6 fails, the packet from | |||
| leaf121 to prefix122 will probably go up through linkSL5 to linkTS3 | leaf121 to prefix122 will probably go up through linkSL5 to linkTS3 | |||
| then go down through linkTS4 to linkSL8 to Leaf122 or go up through | then go down through linkTS4 to linkSL8 to Leaf122 or go up through | |||
| linkSL5 to linkTS6 then go down through linkTS4 and linkSL8 to | linkSL5 to linkTS6 then go down through linkTS4 and linkSL8 to | |||
| Leaf122 based on the pure default route. This is a case of suboptimal | Leaf122 based on the pure default route. This is a case of suboptimal | |||
| routing, or bow-tieing. | routing, or bow-tieing. | |||
| skipping to change at page 14, line 34 ¶ | skipping to change at page 16, line 34 ¶ | |||
| + +---------------+ | + +---+linkSL6+---+ + | + +---------------+ | + +---+linkSL6+---+ + | |||
| linkSL1 | | | linkSL5 | | linkSL8 | linkSL1 | | | linkSL5 | | linkSL8 | |||
| + +--+linkSL3+--+ | | + +---+linkSL7+-+ | + | + +--+linkSL3+--+ | | + +---+linkSL7+-+ | + | |||
| | | | | | | | | | | | | | | | | | | |||
| +-+---+-+ +--+--+-+ +-+---+-+ +--+-+--+ | +-+---+-+ +--+--+-+ +-+---+-+ +--+-+--+ | |||
| |Leaf111| |Leaf112| |Leaf121| |Leaf122| LEVEL 0 | |Leaf111| |Leaf112| |Leaf121| |Leaf122| LEVEL 0 | |||
| +-+-----+ ++------+ +-----+-+ +-+-----+ | +-+-----+ ++------+ +-----+-+ +-+-----+ | |||
| + + + + | + + + + | |||
| Prefix111 Prefix112 Prefix121 Prefix122 | Prefix111 Prefix112 Prefix121 Prefix122 | |||
| Figure 4: Black-holing upon link failure use case | Figure 5: Black-holing upon link failure use case | |||
| This scenario illustrates a case in which a double link failure occurs | This scenario illustrates a case in which a double link failure occurs | |||
| and black-holing can result. | and black-holing can result. | |||
| Without a disaggregation mechanism, when linkTS3 and linkTS4 both fail, | Without a disaggregation mechanism, when linkTS3 and linkTS4 both fail, | |||
| the packet from leaf111 to prefix122 would suffer 50% black-holing | the packet from leaf111 to prefix122 would suffer 50% black-holing | |||
| based on pure default route. The packet supposed to go up through | based on pure default route. The packet supposed to go up through | |||
| linkSL1 to linkTS1 then go down through linkTS3 or linkTS4 will be | linkSL1 to linkTS1 then go down through linkTS3 or linkTS4 will be | |||
| dropped. The packet supposed to go up through linkSL3 to linkTS2 | dropped. The packet supposed to go up through linkSL3 to linkTS2 | |||
| then go down through linkTS3 or linkTS4 will be dropped as well. | then go down through linkTS3 or linkTS4 will be dropped as well. | |||
| skipping to change at page 15, line 35 ¶ | skipping to change at page 17, line 35 ¶ | |||
| truly reflect the relative position of the nodes in the fabric. It | truly reflect the relative position of the nodes in the fabric. It | |||
| is recommended to let ZTP configure the network, and when not, it is | is recommended to let ZTP configure the network, and when not, it is | |||
| recommended to configure the level of all the nodes but those that | recommended to configure the level of all the nodes but those that | |||
| are forced as leaves to avoid an undesirable interaction between ZTP | are forced as leaves to avoid an undesirable interaction between ZTP | |||
| and the manual configuration. | and the manual configuration. | |||
| ZTP requires that the administrator points out the Top-of-Fabric | ZTP requires that the administrator points out the Top-of-Fabric | |||
| (ToF) nodes to set the baseline from which the fabric topology is | (ToF) nodes to set the baseline from which the fabric topology is | |||
| derived. The Top-of-Fabric nodes are configured with the TOP_OF_FABRIC | derived. The Top-of-Fabric nodes are configured with the TOP_OF_FABRIC | |||
| flag and act as the initial 'seeds' that other ZTP nodes need to derive | flag and act as the initial 'seeds' that other ZTP nodes need to derive | |||
| their level in the topology. The level of each node is then derived | their level in the topology. ZTP computes the level of each node | |||
| from the Link Information Elements (LIEs) received from its | based on the Highest Available Level (HAL) of the potential parent(s) | |||
| neighbors, whereby each node (with the possible exception of | nearest that baseline, which represents the superspine. In a | |||
| configured leaves) tries to attach at the highest possible point in | sense, RIFT can be seen as a distance-vector protocol that computes | |||
| the fabric. This guarantees that even if the diffusion front reaches | a set of feasible successors towards the superspine and | |||
| a node from "below" faster than from "above", it will greedily | auto-configures the rest of the topology. | |||
| abandon an already negotiated level derived from nodes topologically | ||||
| below it and properly peer with nodes above. | ||||
| The autoconfiguration mechanism computes a global maximum of levels | ||||
| by diffusion. The level of each node is then derived from the Link | ||||
| Information Elements (LIEs) received from its neighbors, whereby | ||||
| each node (with the possible exception of configured leaves) | ||||
| tries to attach at the highest possible point in the fabric. This | ||||
| guarantees that even if the diffusion front reaches a node from | ||||
| "below" faster than from "above", it will greedily abandon an already | ||||
| negotiated level derived from nodes topologically below it and | ||||
| properly peer with nodes above. | ||||
| The achieved equilibrium can be disturbed massively by all nodes with | ||||
| the highest level either leaving or entering the domain (with some finer | ||||
| distinctions not explained further). It is therefore recommended | ||||
| that each node is multi-homed towards nodes with respective HAL | ||||
| offerings. Fortunately, this is the natural state of things for the | ||||
| topology variants considered in RIFT. | ||||
| A RIFT node may also be configured to confine it to the leaf role | A RIFT node may also be configured to confine it to the leaf role | |||
| with the LEAF_ONLY flag. A leaf node can also be configured to | with the LEAF_ONLY flag. A leaf node can also be configured to | |||
| support leaf-2-leaf procedures with the LEAF_2_LEAF flag. In either | support leaf-2-leaf procedures with the LEAF_2_LEAF flag. In either | |||
| case the node cannot be TOP_OF_FABRIC and its level cannot be | case the node cannot be TOP_OF_FABRIC and its level cannot be | |||
| configured. RIFT will fully configure the node's level after it is | configured. RIFT will fully configure the node's level after it is | |||
| attached to the topology and ensure that the node is at the "bottom | attached to the topology and ensure that the node is at the "bottom | |||
| of the hierarchy" (southernmost). | of the hierarchy" (southernmost). | |||
| 4.5. Mis-cabling Examples | 4.5. Mis-cabling Examples | |||
| skipping to change at page 16, line 28 ¶ | skipping to change at page 19, line 4 ¶ | |||
| |Spine111| |Spine112| | |Spine121| |Spine122| LEVEL 1 | |Spine111| |Spine112| | |Spine121| |Spine122| LEVEL 1 | |||
| +-+---+--+ ++----+--+ | +--+---+-+ +-+----+-+ | +-+---+--+ ++----+--+ | +--+---+-+ +-+----+-+ | |||
| | | | | | | | | | | | | | | | | | | | | |||
| | +---------+ | link-M | +---------+ | | | +---------+ | link-M | +---------+ | | |||
| | | | | | | | | | | | | | | | | | | | | |||
| | +-------+ | | | | +-------+ | | | | +-------+ | | | | +-------+ | | | |||
| | | | | | | | | | | | | | | | | | | | | |||
| +-+---+-+ +--+--+-+ | +-+---+-+ +--+--+-+ | +-+---+-+ +--+--+-+ | +-+---+-+ +--+--+-+ | |||
| |Leaf111| |Leaf112+-----+ |Leaf121| |Leaf122| LEVEL 0 | |Leaf111| |Leaf112+-----+ |Leaf121| |Leaf122| LEVEL 0 | |||
| +-------+ +-------+ +-------+ +-------+ | +-------+ +-------+ +-------+ +-------+ | |||
| Figure 6: A single plane mis-cabling example | ||||
| Figure 5: A single plane mis-cabling example | Figure 6 shows a single plane mis-cabling example. It's a perfect | |||
| Figure 5 shows a single plane mis-cabling example. It's a perfect | ||||
| Fat Tree fabric except for link-M, which connects Leaf112 to ToF22. | Fat Tree fabric except for link-M, which connects Leaf112 to ToF22. | |||
| The RIFT control protocol can discover the physical links | The RIFT control protocol can discover the physical links | |||
| automatically and is able to detect cabling that violates Fat Tree | automatically and is able to detect cabling that violates Fat Tree | |||
| topology constraints. It reacts accordingly to such mis-cabling | topology constraints. It reacts accordingly to such mis-cabling | |||
| attempts, at a minimum preventing adjacencies between nodes from | attempts, at a minimum preventing adjacencies between nodes from | |||
| being formed and traffic from being forwarded on those mis-cabled | being formed and traffic from being forwarded on those mis-cabled | |||
| links. In such a scenario, Leaf112 will use link-M to derive its level | links. In such a scenario, Leaf112 will use link-M to derive its level | |||
| (unless it is a leaf) and can report links to Spine111 and Spine112 as | (unless it is a leaf) and can report links to Spine111 and Spine112 as | |||
| mis-cabled unless the implementation allows horizontal links. | mis-cabled unless the implementation allows horizontal links. | |||
| Figure 6 shows a multiple plane mis-cabling example. Since Leaf112 | Figure 7 shows a multiple plane mis-cabling example. Since Leaf112 | |||
| and Spine121 belong to two different PoDs, the adjacency between | and Spine121 belong to two different PoDs, the adjacency between | |||
| Leaf112 and Spine121 cannot be formed. link-W would be detected and | Leaf112 and Spine121 cannot be formed. link-W would be detected and | |||
| prevented. | prevented. | |||
| +-------+ +-------+ +-------+ +-------+ | +-------+ +-------+ +-------+ +-------+ | |||
| |ToF A1| |ToF A2| |ToF B1| |ToF B2| LEVEL 2 | |ToF A1| |ToF A2| |ToF B1| |ToF B2| LEVEL 2 | |||
| +-------+ +-------+ +-------+ +-------+ | +-------+ +-------+ +-------+ +-------+ | |||
| | | | | | | | | | | | | | | | | | | |||
| | | | +-----------------+ | | | | | | | +-----------------+ | | | | |||
| | +--------------------------+ | | | | | | +--------------------------+ | | | | | |||
| skipping to change at page 17, line 29 ¶ | skipping to change at page 19, line 47 ¶ | |||
| | | | | | | | | | | | | | | | | | | | | |||
| | +---------+ | | | +---------+ | | | +---------+ | | | +---------+ | | |||
| | | | | link-W | | | | | | | | | link-W | | | | | |||
| | +-------+ | | | | +-------+ | | | | +-------+ | | | | +-------+ | | | |||
| | | | | | | | | | | | | | | | | | | | | |||
| +-+---+-+ +--+--+-+ | +-+---+-+ +--+--+-+ | +-+---+-+ +--+--+-+ | +-+---+-+ +--+--+-+ | |||
| |Leaf111| |Leaf112+------+ |Leaf121| |Leaf122| LEVEL 0 | |Leaf111| |Leaf112+------+ |Leaf121| |Leaf122| LEVEL 0 | |||
| +-------+ +-------+ +-------+ +-------+ | +-------+ +-------+ +-------+ +-------+ | |||
| +--------PoD#1----------+ +---------PoD#2---------+ | +--------PoD#1----------+ +---------PoD#2---------+ | |||
| Figure 6: A multiple plane mis-cabling example | Figure 7: A multiple plane mis-cabling example | |||
| RIFT provides an optional level determination procedure in its Zero | RIFT provides an optional level determination procedure in its Zero | |||
| Touch Provisioning mode. Nodes in the fabric without their level | Touch Provisioning mode. Nodes in the fabric without their level | |||
| configured determine it automatically. This can, however, have | configured determine it automatically. This can, however, have | |||
| counter-intuitive consequences. One extreme failure scenario | counter-intuitive consequences. One extreme failure scenario | |||
| is depicted in Figure 7; it shows that if all northbound links of | is depicted in Figure 8; it shows that if all northbound links of | |||
| Spine11 fail at the same time, Spine11 negotiates a lower level than | Spine11 fail at the same time, Spine11 negotiates a lower level than | |||
| Leaf111 and Leaf112. | Leaf111 and Leaf112. | |||
| To prevent such a scenario, where leaves are expected to act as switches, | To prevent such a scenario, where leaves are expected to act as switches, | |||
| the LEAF_ONLY flag can be set for Leaf111 and Leaf112. Since level -1 is | the LEAF_ONLY flag can be set for Leaf111 and Leaf112. Since level -1 is | |||
| invalid, Spine11 would not derive a valid level from the topology in | invalid, Spine11 would not derive a valid level from the topology in | |||
| Figure 7. It will be isolated from the whole fabric and it would be | Figure 8. It will be isolated from the whole fabric and it would be | |||
| up to the leaves to declare the links towards such a spine as mis- | up to the leaves to declare the links towards such a spine as mis- | |||
| cabled. | cabled. | |||
| +-------+ +-------+ +-------+ +-------+ | +-------+ +-------+ +-------+ +-------+ | |||
| |ToF A1| |ToF A2| |ToF A1| |ToF A2| | |ToF A1| |ToF A2| |ToF A1| |ToF A2| | |||
| +-------+ +-------+ +-------+ +-------+ | +-------+ +-------+ +-------+ +-------+ | |||
| | | | | | | | | | | | | | | |||
| | +-------+ | | | | | +-------+ | | | | |||
| + + | | ====> | | | + + | | ====> | | | |||
| X X +------+ | +------+ | | X X +------+ | +------+ | | |||
| skipping to change at page 18, line 31 ¶ | skipping to change at page 20, line 42 ¶ | |||
| +-+---+-+ +--+--+-+ +-----+-+ +-----+-+ | +-+---+-+ +--+--+-+ +-----+-+ +-----+-+ | |||
| |Leaf111| |Leaf112| |Leaf111| |Leaf112| | |Leaf111| |Leaf112| |Leaf111| |Leaf112| | |||
| +-------+ +-------+ +-+-----+ +-+-----+ | +-------+ +-------+ +-+-----+ +-+-----+ | |||
| | | | | | | |||
| | +--------+ | | +--------+ | |||
| | | | | | | |||
| +-+---+-+ | +-+---+-+ | |||
| |Spine11| | |Spine11| | |||
| +-------+ | +-------+ | |||
| Figure 7: Fallen spine | Figure 8: Fallen spine | |||
| 4.6. Positive vs. Negative Disaggregation | 4.6. Positive vs. Negative Disaggregation | |||
| Disaggregation is the procedure whereby [RIFT] advertises a more | Disaggregation is the procedure whereby [RIFT] advertises a more | |||
| specific route southwards as an exception to the aggregated fabric- | specific route southwards as an exception to the aggregated fabric- | |||
| default north. Disaggregation is useful when a prefix within the | default north. Disaggregation is useful when a prefix within the | |||
| aggregation is reachable via some of the parents but not the others | aggregation is reachable via some of the parents but not the others | |||
| at the same level of the fabric. It is mandatory when the level is | at the same level of the fabric. It is mandatory when the level is | |||
| the ToF since a ToF node that cannot reach a prefix becomes a black | the ToF since a ToF node that cannot reach a prefix becomes a black | |||
| hole for that prefix. The hard problem is to know which prefixes are | hole for that prefix. The hard problem is to know which prefixes are | |||
| skipping to change at page 22, line 32 ¶ | skipping to change at page 24, line 48 ¶ | |||
| +---+---+ |Leaf | | Leaf| | +---+---+ |Leaf | | Leaf| | |||
| +--+--+ +--+--+ | +--+--+ +--+--+ | |||
| | | | | | | |||
| IPv4 prefixes| |IPv4 prefixes | IPv4 prefixes| |IPv4 prefixes | |||
| | | | | | | |||
| +---+----+ +---+----+ | +---+----+ +---+----+ | |||
| | V4 | | V4 | | | V4 | | V4 | | |||
| | subnet | | subnet | | | subnet | | subnet | | |||
| +--------+ +--------+ | +--------+ +--------+ | |||
| Figure 8: IPv4 over IPv6 | Figure 9: IPv4 over IPv6 | |||
| 4.9. In-Band Reachability of Nodes | 4.9. In-Band Reachability of Nodes | |||
| RIFT does not require that nodes of the fabric have reachable | RIFT does not require that nodes of the fabric have reachable | |||
| addresses, but there may be operational reasons to reach the internal | addresses, but there may be operational reasons to reach the internal | |||
| nodes. Figure 9 shows an example in which the network management | nodes. Figure 10 shows an example in which the network management | |||
| station (NMS) attaches to Leaf1. | station (NMS) attaches to Leaf1. | |||
| +-------+ +-------+ | +-------+ +-------+ | |||
| | ToF1 | | ToF2 | | | ToF1 | | ToF2 | | |||
| ++---- ++ ++-----++ | ++---- ++ ++-----++ | |||
| | | | | | | | | | | |||
| | +----------+ | | | +----------+ | | |||
| | +--------+ | | | | +--------+ | | | |||
| | | | | | | | | | | |||
| ++-----++ +--+---++ | ++-----++ +--+---++ | |||
| skipping to change at page 23, line 25 ¶ | skipping to change at page 25, line 32 ¶ | |||
| | | | | | | | | | | |||
| | +----------+ | | | +----------+ | | |||
| | +--------+ | | | | +--------+ | | | |||
| | | | | | | | | | | |||
| ++-----++ +--+---++ | ++-----++ +--+---++ | |||
| | Leaf1 | | Leaf2 | | | Leaf1 | | Leaf2 | | |||
| +---+---+ +-------+ | +---+---+ +-------+ | |||
| | | | | |||
| |NMS | |NMS | |||
| Figure 9: In-Band reachability of node | Figure 10: In-Band reachability of node | |||
| If the NMS wants to access Leaf2, it simply works, because the loopback | If the NMS wants to access Leaf2, it simply works, because the loopback | |||
| address of Leaf2 is flooded in its Prefix North TIE. | address of Leaf2 is flooded in its Prefix North TIE. | |||
| If the NMS wants to access Spine2, it simply works too, because a spine | If the NMS wants to access Spine2, it simply works too, because a spine | |||
| node always advertises its loopback address in the Prefix North TIE. | node always advertises its loopback address in the Prefix North TIE. | |||
| The NMS may reach Spine2 via Leaf1-Spine2 or Leaf1-Spine1-ToF1/ | The NMS may reach Spine2 via Leaf1-Spine2 or Leaf1-Spine1-ToF1/ | |||
| ToF2-Spine2. | ToF2-Spine2. | |||
| If NMS wants to access ToF2, ToF2's loopback address needs to be | If NMS wants to access ToF2, ToF2's loopback address needs to be | |||
| injected into its Prefix South TIE. This TIE must be seen by all | injected into its Prefix South TIE. This TIE must be seen by all | |||
| nodes at the level below - the spine nodes in Figure 9 - that must | nodes at the level below - the spine nodes in Figure 10 - that must | |||
| form a ceiling for all the traffic coming from below (south). | form a ceiling for all the traffic coming from below (south). | |||
| Otherwise, the traffic from NMS may follow the default route to the | Otherwise, the traffic from NMS may follow the default route to the | |||
| wrong ToF Node, e.g., ToF1. | wrong ToF Node, e.g., ToF1. | |||
| In a fully connected ToF, in case of failure between ToF2 and spine | In a fully connected ToF, in case of failure between ToF2 and spine | |||
| nodes, ToF2's loopback address must be disaggregated recursively all | nodes, ToF2's loopback address must be disaggregated recursively all | |||
| the way to the leaves. | the way to the leaves. | |||
| In a partitioned ToF, a ToF node is only reachable within its Plane, | In a partitioned ToF, a ToF node is only reachable within its Plane, | |||
| and the disaggregation to the leaves is also required. A possible | and the disaggregation to the leaves is also required. A possible | |||
| skipping to change at page 24, line 36 ¶ | skipping to change at page 26, line 43 ¶ | |||
| | +-----------------+ | | | | | +-----------------+ | | | | |||
| | | | +-------------+ | | | | | | +-------------+ | | | |||
| + | + | | |-----------------+ | | + | + | | |-----------------+ | | |||
| X | X | +--------x-----+ | X | | X | X | +--------x-----+ | X | | |||
| + | + | | | + | | + | + | | | + | | |||
| +---+ +---+ +---+ +---+ | +---+ +---+ +---+ +---+ | |||
| | | | | | | | | | | | | | | | | | | |||
| +---+ +---+ ...............+---+ +---+ | +---+ +---+ ...............+---+ +---+ | |||
| SV(1) SV(2) SV(n+1) SV(n) Leaf | SV(1) SV(2) SV(n+1) SV(n) Leaf | |||
| Figure 10: Dual-homing servers | Figure 11: Dual-homing servers | |||
| In the single plane, the worst condition is disaggregation of every | In the single plane, the worst condition is disaggregation of every | |||
| other server at the same level. Suppose the links from ToR1 (Top of | other server at the same level. Suppose the links from ToR1 (Top of | |||
| Rack) to all the leaves become unavailable. All the servers' | Rack) to all the leaves become unavailable. All the servers' | |||
| routes are disaggregated and the FIB of the servers will be expanded | routes are disaggregated and the FIB of the servers will be expanded | |||
| with n-1 more specific routes. | with n-1 more specific routes. | |||
| Sometimes, people may prefer to disaggregate from ToR to servers from | Sometimes, people may prefer to disaggregate from ToR to servers from | |||
| the start, i.e., the servers have a few tens of routes in the FIB from | the start, i.e., the servers have a few tens of routes in the FIB from | |||
| the start besides the default routes, to avoid breakages at rack level. | the start besides the default routes, to avoid breakages at rack level. | |||
| skipping to change at page 25, line 34 ¶ | skipping to change at page 27, line 40 ¶ | |||
| RIFT domain |Spine| |Spine| | RIFT domain |Spine| |Spine| | |||
| +--+--+ +-----+ | +--+--+ +-----+ | |||
| | | | | | | | | | | | | |||
| | | +-------------+ | | | | +-------------+ | | |||
| | | +--------+ | | | | | +--------+ | | | |||
| | | | | | | | | | | | | |||
| | +-----+ +-+---+ | | +-----+ +-+---+ | |||
| ------- |Leaf | | Leaf| | ------- |Leaf | | Leaf| | |||
| +-----+ +-----+ | +-----+ +-----+ | |||
| Figure 11: Fabric with a controller | Figure 12: Fabric with a controller | |||
| 4.11.1. Controller Attached to ToFs | 4.11.1. Controller Attached to ToFs | |||
| If a controller is attaching to the RIFT domain from ToF, it usually | If a controller is attaching to the RIFT domain from ToF, it usually | |||
| uses dual-homing connections. The loopback prefix of the controller | uses dual-homing connections. The loopback prefix of the controller | |||
| should be advertised down by the ToF and spine to leaves. If the | should be advertised down by the ToF and spine to leaves. If the | |||
| controller loses its link to a ToF, make sure the ToF withdraws the prefix | controller loses its link to a ToF, make sure the ToF withdraws the prefix | |||
| of the controller (use different mechanisms). | of the controller (use different mechanisms). | |||
| 4.11.2. Controller Attached to Leaf | 4.11.2. Controller Attached to Leaf | |||
| If the controller is attaching from a leaf to the fabric, no special | If the controller is attaching from a leaf to the fabric, no special | |||
| provisions are needed. | provisions are needed. | |||
| 4.12. Internet Connectivity With Underlay | 4.12. Internet Connectivity With Underlay | |||
| If global addressing is running without overlay, an external default | If global addressing is running without overlay, an external default | |||
| route needs to be advertised through rift fabric to achieve internet | route needs to be advertised through RIFT fabric to achieve internet | |||
| connectivity. For the purpose of forwarding of the entire rift | connectivity. For the purpose of forwarding of the entire RIFT | |||
| fabric, an internal fabric prefix needs to be advertised in the South | fabric, an internal fabric prefix needs to be advertised in the South | |||
| Prefix TIE by ToF and spine nodes. | Prefix TIE by ToF and spine nodes. | |||
| 4.12.1. Internet Default on the Leaf | 4.12.1. Internet Default on the Leaf | |||
| In case an internet access request comes from a leaf and the | In case an internet access request comes from a leaf and the | |||
| internet gateway is another leaf, the leaf node acting as the internet | internet gateway is another leaf, the leaf node acting as the internet | |||
| gateway needs to advertise a default route in its Prefix North TIE. | gateway needs to advertise a default route in its Prefix North TIE. | |||
| 4.12.2. Internet Default on the ToFs | 4.12.2. Internet Default on the ToFs | |||
| skipping to change at page 26, line 34 ¶ | skipping to change at page 28, line 39 ¶ | |||
| 4.13. Subnet Mismatch and Address Families | 4.13. Subnet Mismatch and Address Families | |||
| +--------+ +--------+ | +--------+ +--------+ | |||
| | | LIE LIE | | | | | LIE LIE | | | |||
| | A | +----> <----+ | B | | | A | +----> <----+ | B | | |||
| | +---------------------+ | | | +---------------------+ | | |||
| +--------+ +--------+ | +--------+ +--------+ | |||
| X/24 Y/24 | X/24 Y/24 | |||
| Figure 12: subnet mismatch | Figure 13: subnet mismatch | |||
| LIEs are exchanged over all links running RIFT to perform Link | LIEs are exchanged over all links running RIFT to perform Link | |||
| (Neighbor) Discovery. A node MUST NOT originate LIEs on an address | (Neighbor) Discovery. A node MUST NOT originate LIEs on an address | |||
| family if it does not process received LIEs on that family. LIEs on the | family if it does not process received LIEs on that family. LIEs on the | |||
| same link are considered part of the same negotiation independent of | same link are considered part of the same negotiation independent of | |||
| the address family they arrive on. An implementation MUST be ready | the address family they arrive on. An implementation MUST be ready | |||
| to accept TIEs on all addresses it used as source of LIE frames. | to accept TIEs on all addresses it used as source of LIE frames. | |||
| As shown in the above figure, without further checks an adjacency between | As shown in the above figure, without further checks an adjacency between | |||
| node A and B may form, but the forwarding between node A and node B | node A and B may form, but the forwarding between node A and node B | |||
| skipping to change at page 27, line 40 ¶ | skipping to change at page 29, line 40 ¶ | |||
| | | | | | | | | | | | | | | | | | | |||
| +-+---+-+ +--+--+-+ +-+---+-+ +--+--+-+ | +-+---+-+ +--+--+-+ +-+---+-+ +--+--+-+ | |||
| | | | | | | | | | | | | | | | | | | |||
| |Leaf111| |Leaf112| |Leaf121| |Leaf122| LEVEL 0 | |Leaf111| |Leaf112| |Leaf121| |Leaf122| LEVEL 0 | |||
| +-+-----+ ++------+ +-----+-+ +-----+-+ | +-+-----+ ++------+ +-----+-+ +-----+-+ | |||
| + + + ^ | | + + + ^ | | |||
| PrefixA PrefixB PrefixA | PrefixC | PrefixA PrefixB PrefixA | PrefixC | |||
| | | | | |||
| + traffic | + traffic | |||
| Figure 13: Anycast | Figure 14: Anycast | |||
| If the traffic comes from a ToF to Leaf111 or Leaf121, which have the | If the traffic comes from a ToF to Leaf111 or Leaf121, which have the | |||
| anycast prefix PrefixA, RIFT can deal with this case well. But if the | anycast prefix PrefixA, RIFT can deal with this case well. But if the | |||
| traffic comes from Leaf122, it arrives at Spine21 or Spine22 at level 1. | traffic comes from Leaf122, it arrives at Spine21 or Spine22 at level 1. | |||
| Spine21 and Spine22 do not know of the other PrefixA attached to | Spine21 and Spine22 do not know of the other PrefixA attached to | |||
| Leaf111, so the traffic will always get to Leaf121 and never to Leaf111. | Leaf111, so the traffic will always get to Leaf121 and never to Leaf111. | |||
| If the intention is that the traffic should be offloaded to | If the intention is that the traffic should be offloaded to | |||
| Leaf111, then use policy guided prefixes defined in "Routing in Fat | Leaf111, then use policy guided prefixes defined in "Routing in Fat | |||
| Trees" [RIFT]. | Trees" [RIFT]. | |||
| skipping to change at page 28, line 43 ¶ | skipping to change at page 30, line 43 ¶ | |||
| within bounded latency. | within bounded latency. | |||
| This could be alleviated with Packet Replication, Elimination and | This could be alleviated with Packet Replication, Elimination and | |||
| Reordering (PREOF) [RFC8655] leaf-2-leaf but PREOF is hard to provide | Reordering (PREOF) [RFC8655] leaf-2-leaf but PREOF is hard to provide | |||
| at the scale of all flows, and the replication may increase the | at the scale of all flows, and the replication may increase the | |||
| probability of the overload that it attempts to solve. | probability of the overload that it attempts to solve. | |||
| Note that load balancing is not RIFT's problem, but it is key to | Note that load balancing is not RIFT's problem, but it is key to | |||
| serving IoT adequately. | serving IoT adequately. | |||
| 4.16. Key Management | ||||
| As outlined in Section "Security Considerations" of [RIFT], either a | ||||
| private shared key or a public/private key pair is used to | ||||
| authenticate the adjacency. Both the key distribution and key | ||||
| synchronization methods are out of scope for this document. Both | ||||
| nodes in the adjacency must share the same keys, key type, and | ||||
| algorithm for a given key ID. Mismatched keys will not inter-operate | ||||
| as their security envelopes will be unverifiable. | ||||
| Key roll-over while the adjacency is active MAY be supported. The | ||||
| specific mechanism is well documented in [RFC6518]. | ||||
| 5. Security Considerations | 5. Security Considerations | |||
| This document presents applicability of RIFT. As such, it does not | This document presents applicability of RIFT. As such, it does not | |||
| introduce any security considerations. However, there are a number | introduce any security considerations. However, there are a number | |||
| of security concerns in [RIFT]. | of security concerns in [RIFT]. | |||
| 6. Contributors | 6. Contributors | |||
| The following people (listed in alphabetical order) contributed | The following people (listed in alphabetical order) contributed | |||
| significantly to the content of this document and should be | significantly to the content of this document and should be | |||
| skipping to change at page 30, line 16 ¶ | skipping to change at page 32, line 21 ¶ | |||
| Binderberger, M., Ed., and J. Haas, Ed., "Bidirectional | Binderberger, M., Ed., and J. Haas, Ed., "Bidirectional | |||
| Forwarding Detection (BFD) on Link Aggregation Group (LAG) | Forwarding Detection (BFD) on Link Aggregation Group (LAG) | |||
| Interfaces", RFC 7130, DOI 10.17487/RFC7130, February | Interfaces", RFC 7130, DOI 10.17487/RFC7130, February | |||
| 2014, <https://www.rfc-editor.org/info/rfc7130>. | 2014, <https://www.rfc-editor.org/info/rfc7130>. | |||
| [RFC5549] Le Faucheur, F. and E. Rosen, "Advertising IPv4 Network | [RFC5549] Le Faucheur, F. and E. Rosen, "Advertising IPv4 Network | |||
| Layer Reachability Information with an IPv6 Next Hop", | Layer Reachability Information with an IPv6 Next Hop", | |||
| RFC 5549, DOI 10.17487/RFC5549, May 2009, | RFC 5549, DOI 10.17487/RFC5549, May 2009, | |||
| <https://www.rfc-editor.org/info/rfc5549>. | <https://www.rfc-editor.org/info/rfc5549>. | |||
| [RFC6518] Lebovitz, G. and M. Bhatia, "Keying and Authentication for | ||||
| Routing Protocols (KARP) Design Guidelines", RFC 6518, | ||||
| DOI 10.17487/RFC6518, February 2012, | ||||
| <https://www.rfc-editor.org/info/rfc6518>. | ||||
| [RFC6550] Winter, T., Ed., Thubert, P., Ed., Brandt, A., Hui, J., | [RFC6550] Winter, T., Ed., Thubert, P., Ed., Brandt, A., Hui, J., | |||
| Kelsey, R., Levis, P., Pister, K., Struik, R., Vasseur, | Kelsey, R., Levis, P., Pister, K., Struik, R., Vasseur, | |||
| JP., and R. Alexander, "RPL: IPv6 Routing Protocol for | JP., and R. Alexander, "RPL: IPv6 Routing Protocol for | |||
| Low-Power and Lossy Networks", RFC 6550, | Low-Power and Lossy Networks", RFC 6550, | |||
| DOI 10.17487/RFC6550, March 2012, | DOI 10.17487/RFC6550, March 2012, | |||
| <https://www.rfc-editor.org/info/rfc6550>. | <https://www.rfc-editor.org/info/rfc6550>. | |||
| [RFC6775] Shelby, Z., Ed., Chakrabarti, S., Nordmark, E., and C. | [RFC6775] Shelby, Z., Ed., Chakrabarti, S., Nordmark, E., and C. | |||
| Bormann, "Neighbor Discovery Optimization for IPv6 over | Bormann, "Neighbor Discovery Optimization for IPv6 over | |||
| Low-Power Wireless Personal Area Networks (6LoWPANs)", | Low-Power Wireless Personal Area Networks (6LoWPANs)", | |||
| End of changes. 40 change blocks. | ||||
| 91 lines changed or deleted | 211 lines changed or added | |||