| < draft-wei-rift-applicability-01.txt | draft-wei-rift-applicability-02.txt > | |||
|---|---|---|---|---|
| RIFT WG Yuehua. Wei | RIFT WG Yuehua. Wei | |||
| Internet-Draft Zheng. Zhang | Internet-Draft Zheng. Zhang | |||
| Intended status: Standards Track ZTE Corporation | Intended status: Standards Track ZTE Corporation | |||
| Expires: December 21, 2019 Dmitry. Afanasiev | Expires: May 6, 2020 Dmitry. Afanasiev | |||
| Yandex | Yandex | |||
| Tom. Verhaeg | Tom. Verhaeg | |||
| Interconnect Services B.V. | Interconnect Services B.V. | |||
| Jaroslaw. Kowalczyk | Jaroslaw. Kowalczyk | |||
| Orange Polska | Orange Polska | |||
| June 19, 2019 | November 3, 2019 | |||
| RIFT Applicability | RIFT Applicability | |||
| draft-wei-rift-applicability-01 | draft-wei-rift-applicability-02 | |||
| Abstract | Abstract | |||
| This document discusses the properties and applicability of RIFT in | This document discusses the properties, applicability and operational | |||
| different network topologies. It intends to provide a rough guide | considerations of RIFT in different network scenarios. It intends to | |||
| how RIFT can be deployed to simplify routing operations in Clos | provide a rough guide how RIFT can be deployed to simplify routing | |||
| topologies and their variations. | operations in Clos topologies and their variations. | |||
| Status of This Memo | Status of This Memo | |||
| This Internet-Draft is submitted in full conformance with the | This Internet-Draft is submitted in full conformance with the | |||
| provisions of BCP 78 and BCP 79. | provisions of BCP 78 and BCP 79. | |||
| Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
| Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
| working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
| Drafts is at https://datatracker.ietf.org/drafts/current/. | Drafts is at https://datatracker.ietf.org/drafts/current/. | |||
| Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
| and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
| time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
| material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
| This Internet-Draft will expire on December 21, 2019. | This Internet-Draft will expire on May 6, 2020. | |||
| Copyright Notice | Copyright Notice | |||
| Copyright (c) 2019 IETF Trust and the persons identified as the | Copyright (c) 2019 IETF Trust and the persons identified as the | |||
| document authors. All rights reserved. | document authors. All rights reserved. | |||
| This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
| Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
| (https://trustee.ietf.org/license-info) in effect on the date of | (https://trustee.ietf.org/license-info) in effect on the date of | |||
| publication of this document. Please review these documents | publication of this document. Please review these documents | |||
| carefully, as they describe your rights and restrictions with respect | carefully, as they describe your rights and restrictions with respect | |||
| to this document. Code Components extracted from this document must | to this document. Code Components extracted from this document must | |||
| include Simplified BSD License text as described in Section 4.e of | include Simplified BSD License text as described in Section 4.e of | |||
| the Trust Legal Provisions and are provided without warranty as | the Trust Legal Provisions and are provided without warranty as | |||
| described in the Simplified BSD License. | described in the Simplified BSD License. | |||
| Table of Contents | Table of Contents | |||
| 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 | |||
| 2. Problem statement of a Fat Tree network in modern IP fabric . 2 | 2. Problem Statement of Routing in Modern IP Fabric Fat Tree | |||
| 3. Why ritf is chosen to address this use case . . . . . . . . . 3 | Networks . . . . . . . . . . . . . . . . . . . . . . . . . . 3 | |||
| | 3. Applicability of RIFT to Clos IP Fabrics . . . . . . . . . . 3 | |||
| 3.1. Overview of RIFT . . . . . . . . . . . . . . . . . . . . 3 | 3.1. Overview of RIFT . . . . . . . . . . . . . . . . . . . . 3 | |||
| 3.2. Applicable Topologies . . . . . . . . . . . . . . . . . . 5 | 3.2. Applicable Topologies . . . . . . . . . . . . . . . . . . 5 | |||
| 3.2.1. Horizontal Links . . . . . . . . . . . . . . . . . . 5 | 3.2.1. Horizontal Links . . . . . . . . . . . . . . . . . . 6 | |||
| 3.2.2. Vertical Shortcuts . . . . . . . . . . . . . . . . . 6 | 3.2.2. Vertical Shortcuts . . . . . . . . . . . . . . . . . 6 | |||
| 3.3. Use Cases . . . . . . . . . . . . . . . . . . . . . . . . 6 | 3.3. Use Cases . . . . . . . . . . . . . . . . . . . . . . . . 6 | |||
| 3.3.1. DC Fabrics . . . . . . . . . . . . . . . . . . . . . 6 | 3.3.1. DC Fabrics . . . . . . . . . . . . . . . . . . . . . 6 | |||
| 3.3.2. Metro Fabrics . . . . . . . . . . . . . . . . . . . . 6 | 3.3.2. Metro Fabrics . . . . . . . . . . . . . . . . . . . . 7 | |||
| 3.3.3. Building Cabling . . . . . . . . . . . . . . . . . . 6 | 3.3.3. Building Cabling . . . . . . . . . . . . . . . . . . 7 | |||
| 3.3.4. Internal Router Switching Fabrics . . . . . . . . . . 7 | 3.3.4. Internal Router Switching Fabrics . . . . . . . . . . 7 | |||
| 3.3.5. CloudCO . . . . . . . . . . . . . . . . . . . . . . . 7 | 3.3.5. CloudCO . . . . . . . . . . . . . . . . . . . . . . . 7 | |||
| 4. Operational Simplifications and Considerations . . . . . . . 9 | 4. Deployment Considerations . . . . . . . . . . . . . . . . . . 9 | |||
| 4.1. Automatic Disaggregation . . . . . . . . . . . . . . . . 10 | 4.1. South Reflection . . . . . . . . . . . . . . . . . . . . 10 | |||
| 4.1.1. South reflection . . . . . . . . . . . . . . . . . . 10 | 4.2. Suboptimal Routing on Link Failures . . . . . . . . . . . 10 | |||
| 4.1.2. Suboptimal routing upon link failure use case . . . . 10 | 4.3. Black-Holing on Link Failures . . . . . . . . . . . . . . 12 | |||
| 4.1.3. Black-holing upon link failure use case . . . . . . . 12 | 4.4. Zero Touch Provisioning (ZTP) . . . . . . . . . . . . . . 13 | |||
| 4.2. Usage of ZTP . . . . . . . . . . . . . . . . . . . . . . 13 | 4.5. Miscabling Examples . . . . . . . . . . . . . . . . . . . 13 | |||
| 5. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 13 | 4.6. IPv4 over IPv6 . . . . . . . . . . . . . . . . . . . . . 16 | |||
| 6. Contributors . . . . . . . . . . . . . . . . . . . . . . . . 13 | 4.7. In-Band Reachability of Nodes . . . . . . . . . . . . . . 17 | |||
| 7. Normative References . . . . . . . . . . . . . . . . . . . . 14 | 4.7.1. Reachability of Leafs . . . . . . . . . . . . . . . . 17 | |||
| Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 15 | 4.7.2. Reachability of Spines . . . . . . . . . . . . . . . 17 | |||
| | 4.8. Dual Homing Servers . . . . . . . . . . . . . . . . . . . 17 | |||
| | 4.9. Fabric With A Controller . . . . . . . . . . . . . . . . 18 | |||
| | 4.9.1. Controller Attached to ToFs . . . . . . . . . . . . . 19 | |||
| | 4.9.2. Controller Attached to Leaf . . . . . . . . . . . . . 19 | |||
| | 4.10. Internet Connectivity Without Underlay . . . . . . . . . 19 | |||
| | 4.10.1. Internet Default on the Leafs . . . . . . . . . . . 19 | |||
| | 4.10.2. Internet Default on the ToFs . . . . . . . . . . . . 20 | |||
| | 4.11. Subnet Mismatch and Address Families . . . . . . . . . . 20 | |||
| | 4.12. Anycast Considerations . . . . . . . . . . . . . . . . . 20 | |||
| | 5. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 21 | |||
| | 6. Contributors . . . . . . . . . . . . . . . . . . . . . . . . 21 | |||
| | 7. Normative References . . . . . . . . . . . . . . . . . . . . 22 | |||
| | Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 23 | |||
| 1. Introduction | 1. Introduction | |||
| This document intends to explain the properties and applicability of | This document intends to explain the properties and applicability of | |||
| RIFT [I-D.ietf-rift-rift] in different deployment scenarios and | RIFT [I-D.ietf-rift-rift] in different deployment scenarios and | |||
| highlight the operational simplicity of the technology compared to | highlight the operational simplicity of the technology compared to | |||
| traditional routing solutions. | traditional routing solutions. It also documents special | |||
| | considerations when RIFT is used with or without overlays, | |||
| | controllers and corrects topology miscablings and/or node and link | |||
| | failures. | |||
| 2. Problem statement of a Fat Tree network in modern IP fabric | 2. Problem Statement of Routing in Modern IP Fabric Fat Tree Networks | |||
| Clos and Fat-Tree topologies have gained prominence in today's | Clos and Fat-Tree topologies have gained prominence in today's | |||
| networking, primarily as result of the paradigm shift towards a | networking, primarily as result of the paradigm shift towards a | |||
| centralized data-center based architecture that is poised to deliver | centralized data-center based architecture that is poised to deliver | |||
| a majority of computation and storage services in the future. | a majority of computation and storage services in the future. | |||
| Today's current routing protocols were geared towards a network with | Today's current routing protocols were geared towards a network with | |||
| an irregular topology and low degree of connectivity originally. | an irregular topology and low degree of connectivity originally. | |||
| When they are applied to Fat-Tree topologies: | When they are applied to Fat-Tree topologies: | |||
| o There are always extensive configuration or provisioning during | o they tend to need extensive configuration or provisioning during | |||
| bring up and re-dimensioning. | bring up and re-dimensioning. | |||
| o Both the spine node and the leaf node have the entire network | o spine and leaf nodes have the entire network topology and routing | |||
| topology and routing information, but in fact, the leaf node does | information, which is in fact, not needed on the leaf nodes during | |||
| not need so much complete information. | normal operation. | |||
| o There is significant Link State PDUs (LSPs) flooding duplication | ||||
| between spine nodes and leaf nodes during network bring up and | ||||
| topology update. It consumes both spine and leaf nodes' CPU and | ||||
| link bandwidth resources. | ||||
| o When a spine node advertises a topology change, every leaf node | o significant Link State PDUs (LSPs) flooding duplication between | |||
| connected to it will flood the update to all the other spine | spine nodes and leaf nodes occurs during network bring up and | |||
| nodes, and those spine nodes will further flood them to all the | topology updates. It consumes both spine and leaf nodes' CPU and | |||
| leaf nodes, causing a O(n^2) flooding storm which is largely | link bandwidth resources and with that limits protocol | |||
| redundant. | scalability. | |||
| 3. Why ritf is chosen to address this use case | 3. Applicability of RIFT to Clos IP Fabrics | |||
| Further content of this document assumes that the reader is familiar | Further content of this document assumes that the reader is familiar | |||
| with the terms and concepts used in OSPF [RFC2328] and IS-IS | with the terms and concepts used in OSPF [RFC2328] and IS-IS | |||
| [ISO10589-Second-Edition] link-state protocols and at least the | [ISO10589-Second-Edition] link-state protocols and at least the | |||
| sections of RIFT [I-D.ietf-rift-rift] outlining the requirement of | sections of RIFT [I-D.ietf-rift-rift] outlining the requirement of | |||
| routing in IP fabrics and RIFT protocol concepts. | routing in IP fabrics and RIFT protocol concepts. | |||
| 3.1. Overview of RIFT | 3.1. Overview of RIFT | |||
| RIFT is a dynamic routing protocol for Clos and fat-tree network | RIFT is a dynamic routing protocol for Clos and fat-tree network | |||
| skipping to change at page 4, line 5 | skipping to change at page 4, line 16 | |||
| level obtains the full topology of levels south of it. That | level obtains the full topology of levels south of it. That | |||
| information is never flooded East-West or back South again. So a top | information is never flooded East-West or back South again. So a top | |||
| tier node has full set of prefixes from the SPF calculation. | tier node has full set of prefixes from the SPF calculation. | |||
| In the southbound direction the protocol operates like a "fully | In the southbound direction the protocol operates like a "fully | |||
| summarizing, unidirectional" path vector protocol or rather a | summarizing, unidirectional" path vector protocol or rather a | |||
| distance vector with implicit split horizon whereas the information | distance vector with implicit split horizon whereas the information | |||
| propagates one hop south and is 're-advertised' by nodes at next | propagates one hop south and is 're-advertised' by nodes at next | |||
| lower level, normally just the default route. | lower level, normally just the default route. | |||
| +-----------+ +-----------+ | +-----------+ +-----------+ | |||
| | ToF | | ToF | LEVEL 2 | | ToF | | ToF | LEVEL 2 | |||
| + +-----+--+--+ +-+--+------+ | + +-----+--+--+ +-+--+------+ | |||
| | | | | | | | | | ^ | | | | | | | | | | ^ | |||
| + | | | +-------------------------+ | | + | | | +-------------------------+ | | |||
| Distance | +-------------------+ | | | | | | Distance | +-------------------+ | | | | | | |||
| Vector | | | | | | | | + | Vector | | | | | | | | + | |||
| South | | | | +--------+ | | | Link+State | South | | | | +--------+ | | | Link+State | |||
| + | | | | | | | | Flooding | + | | | | | | | | Flooding | |||
| | | | +-------------+ | | | North | | | | +-------------+ | | | North | |||
| v | | | | | | | | + | v | | | | | | | | + | |||
| +-+--+-+ +------+ +-------+ +--+--+-+ | | +-+--+-+ +------+ +-------+ +--+--+-+ | | |||
| |SPINE | |SPINE | | SPINE | | SPINE | | LEVEL 1 | |SPINE | |SPINE | | SPINE | | SPINE | | LEVEL 1 | |||
| + ++----++ ++---+-+ +--+--+-+ ++----+-+ | | + ++----++ ++---+-+ +--+--+-+ ++----+-+ | | |||
| + | | | | | | | | | ^N | + | | | | | | | | | ^ N | |||
| Distance | +-------+ | | +--------+ | | | E | Distance | +-------+ | | +--------+ | | | E | |||
| Vector | | | | | | | | | +------> | Vector | | | | | | | | | +------> | |||
| South | +-------+ | | | +-------+ | | | | | South | +-------+ | | | +-------+ | | | | | |||
| + | | | | | | | | | + | + | | | | | | | | | + | |||
| v ++--++ +-+-++ ++-+-+ +-+--++ + | v ++--++ +-+-++ ++-+-+ +-+--++ + | |||
| |LEAF| |LEAF| |LEAF| |LEAF | LEVEL 0 | |LEAF| |LEAF| |LEAF| |LEAF | LEVEL 0 | |||
| +----+ +----+ +----+ +-----+ | +----+ +----+ +----+ +-----+ | |||
| Figure 1: Rift overview | Figure 1: Rift overview | |||
| A middle tier node has only information necessary for its level, | A middle tier node has only information necessary for its level, | |||
| which are all destinations south of the node based on SPF | which are all destinations south of the node based on SPF | |||
| calculation, default route and potential disaggregated routes. | calculation, default route and potential disaggregated routes. | |||
| RIFT combines the advantage of both Link-State and Distance Vector: | RIFT combines the advantage of both Link-State and Distance Vector: | |||
| o Fastest Possible Convergence | o Fastest Possible Convergence | |||
| skipping to change at page 5, line 51 | skipping to change at page 6, line 17 | |||
| allow the reconciliation of topology view of different planes as most | allow the reconciliation of topology view of different planes as most | |||
| desirable solution making proper disaggregation viable in case of | desirable solution making proper disaggregation viable in case of | |||
| failures. This observations hold not only in case of RIFT but in the | failures. This observations hold not only in case of RIFT but in the | |||
| generic case of dynamic routing on Clos variants with multiple planes | generic case of dynamic routing on Clos variants with multiple planes | |||
| and failures in bi-sectional bandwidth, especially on the leafs. | and failures in bi-sectional bandwidth, especially on the leafs. | |||
| 3.2.1. Horizontal Links | 3.2.1. Horizontal Links | |||
| RIFT is not limited to pure Clos divided into PoD and multi-planes | RIFT is not limited to pure Clos divided into PoD and multi-planes | |||
| but supports horizontal links below the top of fabric level. Those | but supports horizontal links below the top of fabric level. Those | |||
| links are used however only as routes of last resort when a spine | links are used however only as routes of last resort northbound when | |||
| loses all northbound links or cannot compute a default route through | a spine loses all northbound links or cannot compute a default route | |||
| them. | through them. | |||
| | A possible configuration is a "ring" of horizontal links at a level. | |||
| | In presence of such a "ring" in any level (except ToF level) neither | |||
| | N-SPF nor S-SPF will provide a "ring-based protection" scheme since | |||
| | such a computation would have to deal necessarily with breaking of | |||
| | "loops" in Dijkstra sense; an application for which RIFT is not | |||
| | intended. | |||
| | A full-mesh connectivity between nodes on the same level can be | |||
| | employed and that allows N-SPF to provide for any node loosing all | |||
| | its northbound adjacencies (as long as any of the other nodes in the | |||
| | level are northbound connected) to still participate in northbound | |||
| | forwarding. | |||
| 3.2.2. Vertical Shortcuts | 3.2.2. Vertical Shortcuts | |||
| Through relaxations of the specified adjacency forming rules RIFT | Through relaxations of the specified adjacency forming rules RIFT | |||
| implementations can be extended to support vertical "shortcuts" as | implementations can be extended to support vertical "shortcuts" as | |||
| proposed by e.g. [I-D.white-distoptflood]. The RIFT specification | proposed by e.g. [I-D.white-distoptflood]. The RIFT specification | |||
| itself does not provide the exact details since the resulting | itself does not provide the exact details since the resulting | |||
| solution suffers from either much larger blast radii with increased | solution suffers from either much larger blast radii with increased | |||
| flooding volumes or in case of maximum aggregation routing bow-tie | flooding volumes or in case of maximum aggregation routing bow-tie | |||
| problems. | problems. | |||
| skipping to change at page 8, line 21 | skipping to change at page 8, line 21 | |||
| | | | | +-------------------------+ | | | | | | | | +-------------------------+ | | | | |||
| | | | | | | | | | | | | | | | | | | | | | | | | | | |||
| | | +----------------------+ | | | | | | | | | | | +----------------------+ | | | | | | | | | |||
| | | | | | | | | | | | | | | | | | | | | | | | | | | |||
| | +---------------------------------+ | | | | | | | | | +---------------------------------+ | | | | | | | | |||
| | | | | | | | | | | | | | | | | | | | | | | | | | | |||
| | | | +-----------------------------+ | | | | | | | | | +-----------------------------+ | | | | | | |||
| | | | | | | | | | | | | | | | | | | | | | | | | | | |||
| | | | | | +--------------------+ | | | | | | | | | | +--------------------+ | | | | | |||
| | | | | | | | | | | | | | | | | | | | | | | | | | | |||
| | | | | | | | | | | | | | ||||
| +--+ +-+---+--+ +-+---+--+ +--+----+--+ +-+--+--+ +--+ | +--+ +-+---+--+ +-+---+--+ +--+----+--+ +-+--+--+ +--+ | |||
| |L | | Leaf | | Leaf | | Leaf | | Leaf | |L | | |L | | Leaf | | Leaf | | Leaf | | Leaf | |L | | |||
| |S | | Switch | | Switch | | Switch | | Switch| |S | | |S | | Switch | | Switch | | Switch | | Switch| |S | | |||
| ++-+ +-+-+-+--+ +-+-+-+--+ +--+-+--+--+ ++-+--+-+ +-++ | ++-+ +-+-+-+--+ +-+-+-+--+ +--+-+--+--+ ++-+--+-+ +-++ | |||
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |||
| | +-+-+-+--+ +-+-+-+--+ +--+-+--+--+ ++-+--+-+ | | | +-+-+-+--+ +-+-+-+--+ +--+-+--+--+ ++-+--+-+ | | |||
| | |Compute | |Compute | | Compute | |Compute| | | | |Compute | |Compute | | Compute | |Compute| | | |||
| | |Node | |Node | | Node | |Node | | | | |Node | |Node | | Node | |Node | | | |||
| | | | | | | | | | | | ||||
| | +--------+ +--------+ +----------+ +-------+ | | | +--------+ +--------+ +----------+ +-------+ | | |||
| | || VAS5 || || vDHCP|| || vRouter|| ||VAS1 || | | | || VAS5 || || vDHCP|| || vRouter|| ||VAS1 || | | |||
| | |--------| |--------| |----------| |-------| | | | |--------| |--------| |----------| |-------| | | |||
| | |--------| |--------| |----------| |-------| | | | |--------| |--------| |----------| |-------| | | |||
| | || VAS6 || || VAS3 || || v802.1x|| ||VAS2 || | | | || VAS6 || || VAS3 || || v802.1x|| ||VAS2 || | | |||
| | |--------| |--------| |----------| |-------| | | | |--------| |--------| |----------| |-------| | | |||
| | |--------| |--------| |----------| |-------| | | | |--------| |--------| |----------| |-------| | | |||
| | || VAS7 || || VAS4 || || vIGMP || ||BAA || | | | || VAS7 || || VAS4 || || vIGMP || ||BAA || | | |||
| | |--------| |--------| |----------| |-------| | | | |--------| |--------| |----------| |-------| | | |||
| | +--------+ +--------+ +----------+ +-------+ | | | +--------+ +--------+ +----------+ +-------+ | | |||
| | | | | | | |||
| ++-----------+ +---------++ | ++-----------+ +---------++ | |||
| |Network I/O | |Access I/O| | |Network I/O | |Access I/O| | |||
| +------------+ +----------+ | +------------+ +----------+ | |||
| Figure 2: An example of CloudCo architecture | Figure 2: An example of CloudCO architecture | |||
| The Spine-Leaf architectures deployed inside CloudCO meets the | The Spine-Leaf architectures deployed inside CloudCO meets the | |||
| network requirements of adaptable, agile, scalable and dynamic. | network requirements of adaptable, agile, scalable and dynamic. | |||
| 4. Operational Simplifications and Considerations | 4. Deployment Considerations | |||
| RIFT presents the opportunity for organizations building and | RIFT presents the opportunity for organizations building and | |||
| operating IP fabrics to simplify their operation and deployments | operating IP fabrics to simplify their operation and deployments | |||
| while achieving many desirable properties of a dynamic routing on | while achieving many desirable properties of a dynamic routing on | |||
| such a substrate: | such a substrate: | |||
| o RIFT design follows minimum blast radius and minimum necessary | o RIFT design follows minimum blast radius and minimum necessary | |||
| epistemological scope philosophy which leads to very good scaling | epistemological scope philosophy which leads to very good scaling | |||
| properties while delivering maximum reactiveness. | properties while delivering maximum reactiveness. | |||
| skipping to change at page 10, line 11 | skipping to change at page 10, line 11 | |||
| o RIFT is designed for minimum delay in case of prefix mobility on | o RIFT is designed for minimum delay in case of prefix mobility on | |||
| the fabric. | the fabric. | |||
| o Many further operational and design points collected over many | o Many further operational and design points collected over many | |||
| years of routing protocol deployments have been incorporated in | years of routing protocol deployments have been incorporated in | |||
| RIFT such as fast flooding rates, protection of information | RIFT such as fast flooding rates, protection of information | |||
| lifetimes and operationally easily recognizable remote ends of | lifetimes and operationally easily recognizable remote ends of | |||
| links and node names. | links and node names. | |||
| 4.1. Automatic Disaggregation | 4.1. South Reflection | |||
| 4.1.1. South reflection | ||||
| South reflection is a mechanism that South Node TIEs are "reflected" | South reflection is a mechanism that South Node TIEs are "reflected" | |||
| back up north to allow nodes in same level without E-W links to "see" | back up north to allow nodes in same level without E-W links to "see" | |||
| each other. | each other. | |||
| For example, Spine111\Spine112\Spine121\Spine122 reflects Node S-TIEs | For example, Spine111\Spine112\Spine121\Spine122 reflects Node S-TIEs | |||
| from ToF21 to ToF22 separately. Spine111\Spine112\Spine121\Spine122 | from ToF21 to ToF22 separately. Respectively, | |||
| reflects Node S-TIEs from ToF22 to ToF21 separately. So ToF22 and | Spine111\Spine112\Spine121\Spine122 reflects Node S-TIEs from ToF22 | |||
| ToF21 knows each other as level 2 node. | to ToF21 separately. So ToF22 and ToF21 see each other's node | |||
| | information as level 2 nodes. | |||
| As the result of the south reflection between | ||||
| Spine121-Leaf121-Spine122 and Spine121-Leaf122-Spine122, Spine121 and | ||||
| Spine 122 knows each other at level 1. | ||||
| This is a use case to explain the deployment of a Fat-Tree and the | In an equivalent fashion, as the result of the south reflection | |||
| algorithm to achieve automatic disaggregation. | between Spine121-Leaf121-Spine122 and Spine121-Leaf122-Spine122, | |||
| | Spine121 and Spine 122 knows each other at level 1. | |||
| 4.1.2. Suboptimal routing upon link failure use case | 4.2. Suboptimal Routing on Link Failures | |||
| +--------+ +--------+ | +--------+ +--------+ | |||
| | | | | | | ToF21 | | ToF22 | LEVEL 2 | |||
| | ToF21 | | ToF22 | LEVEL 2 | ++--+-+-++ ++-+--+-++ | |||
| ++-+--+-++ ++-+--+-++ | | | | | | | | + | |||
| | | | | | | | | | | | | | | | | linkTS8 | |||
| | | | | | | | linkTS8 | +-------------+ | +-+linkTS3+-+ | | | +--------------+ | |||
| | | | | | | | | | | | | | | | + | | |||
| | | | | | | | | | | +----------------------------+ | linkTS7 | | |||
| +--------------+ | +--linkTS3--+ | | | +--------------+ | | | | | + + + | | |||
| | | | | | | | | | | | | +-------+linkTS4+------------+ | | |||
| | +-----------------------------+ | linkTS7 | | | | | + + | | | | |||
| | | | | | | | | | | | | +------------+--+ | | | |||
| | | | +--------linkTS4-------------+ | | | | | | | linkTS6 | | | |||
| | | | | | | | | | +-+----++ ++-----++ ++------+ ++-----++ | |||
| | | +-+ +---------------+ | | | |Spin111| |Spin112| |Spin121| |Spin122| LEVEL 1 | |||
| | | | | | linkTS6 | | | +-+---+-+ ++----+-+ +-+---+-+ ++---+--+ | |||
| +-+----++ +-+-----+ ++----+-+ ++-----++ | | | | | | | | | | |||
| | | | | | | | | | | +--------------+ | + ++XX+linkSL6+---+ + | |||
| |Spin111| |Spin112| |Spin121| |Spin122| LEVEL 1 | | | | | linkSL5 | | linkSL8 | |||
| +-+---+-+ ++----+-+ +-+---+-+ ++---+--+ | | +------------+ | | + +---+linkSL7+-+ | + | |||
| | | | | | | | | | | | | | | | | | | |||
| | +---------------+ | | +-XX-linkSL6----+ | | +-+---+-+ +--+--+-+ +-+---+-+ +--+-+--+ | |||
| | | | | linkSL5 | | linkSL8 | |Leaf111| |Leaf112| |Leaf121| |Leaf122| LEVEL 0 | |||
| | +-------------+ | | | +----linkSL7--+ | | | +-+-----+ ++------+ +-----+-+ +-+-----+ | |||
| | | | | | | | | | + + + + | |||
| +-+---+-+ +--+--+-+ +-+---+-+ +--+-+--+ | Prefix111 Prefix112 Prefix121 Prefix122 | |||
| | | | | | | | | | ||||
| |Leaf111| |Leaf112| |Leaf121| |Leaf122| LEVEL 0 | ||||
| +-+-----+ ++------+ +-----+-+ +-+-----+ | ||||
| + + + + | ||||
| Prefix111 Prefix112 Prefix121 Prefix122 | ||||
| Figure 3: Suboptimal routing upon link failure use case | Figure 3: Suboptimal routing upon link failure use case | |||
| As shown in figure above, as the result of the south reflection | As shown in Figure 3, as the result of the south reflection between | |||
| between Spine121-Leaf121-Spine122 and Spine121-Leaf122-Spine122, | Spine121-Leaf121-Spine122 and Spine121-Leaf122-Spine122, Spine121 and | |||
| Spine121 and Spine 122 knows each other at level 1. | Spine 122 knows each other at level 1. | |||
| Without disaggregation mechanism, when linkSL6 fails, the packet from | Without disaggregation mechanism, when linkSL6 fails, the packet from | |||
| leaf121 to prefix122 will probably go up through linkSL5 to linkTS3 | leaf121 to prefix122 will probably go up through linkSL5 to linkTS3 | |||
| then go down through linkTS4 to linkSL8 to Leaf122 or go up through | then go down through linkTS4 to linkSL8 to Leaf122 or go up through | |||
| linkSL5 to linkTS6 then go down through linkTS4 and linkSL8 to | linkSL5 to linkTS6 then go down through linkTS4 and linkSL8 to | |||
| Leaf122 based on pure default route. It's the case of suboptimal | Leaf122 based on pure default route. It's the case of suboptimal | |||
| routing. | routing or bow-tieing. | |||
| With disaggregation mechanism, when linkSL6 fails, Spine122 will | With disaggregation mechanism, when linkSL6 fails, Spine122 will | |||
| detect the failure according to the reflected node S-TIE from | detect the failure according to the reflected node S-TIE from | |||
| Spine121. Based on the disaggregation algorithm provided by RITF, | Spine121. Based on the disaggregation algorithm provided by RIFT, | |||
| Spine122 will explicitly advertise prefix122 in Prefix S-TIE | Spine122 will explicitly advertise prefix122 in Disaggregated Prefix | |||
| SouthPrefixesElement(prefix122, cost 1). The packet from leaf121 to | S-TIE PrefixesElement(prefix122, cost 1). The packet from leaf121 to | |||
| prefix122 will only be sent to linkSL7 following a longest-prefix | prefix122 will only be sent to linkSL7 following a longest-prefix | |||
| match to prefix 122 directly then go down through linkSL8 to Leaf122 | match to prefix 122 directly then go down through linkSL8 to Leaf122 | |||
| . | . | |||
| 4.1.3. Black-holing upon link failure use case | 4.3. Black-Holing on Link Failures | |||
| +--------+ +--------+ | +--------+ +--------+ | |||
| | | | | | ||||
| | ToF 21 | | ToF 22 | LEVEL 2 | | ToF 21 | | ToF 22 | LEVEL 2 | |||
| ++-+--+-++ ++-+--+-++ | ++-+--+-++ ++-+--+-++ | |||
| | | | | | | | | | | | | | | | | | | |||
| | | | | | | | linkTS8 | | | | | | | | linkTS8 | |||
| | | | | | | | | | ||||
| | | | | | | | | | ||||
| +--------------+ | +--linkTS3-X+ | | | +--------------+ | +--------------+ | +--linkTS3-X+ | | | +--------------+ | |||
| linkTS1 | | | | | | | | linkTS1 | | | | | | | | |||
| | +-----------------------------+ | linkTS7 | | | +-----------------------------+ | linkTS7 | | |||
| | | | | | | | | | | | | | | | | | | |||
| | | linkTS2 +--------linkTS4-X-----------+ | | | | linkTS2 +--------linkTS4-X-----------+ | | |||
| | | | | | | | | | | | | | | | | | | |||
| | linkTS5 +-+ +---------------+ | | | | linkTS5 +-+ +---------------+ | | | |||
| | | | | | linkTS6 | | | | | | | | linkTS6 | | | |||
| +-+----++ +-+-----+ ++----+-+ ++-----++ | +-+----++ +-+-----+ ++----+-+ ++-----++ | |||
| | | | | | | | | | ||||
| |Spin111| |Spin112| |Spin121| |Spin122| LEVEL 1 | |Spin111| |Spin112| |Spin121| |Spin122| LEVEL 1 | |||
| +-+---+-+ ++----+-+ +-+---+-+ ++---+--+ | +-+---+-+ ++----+-+ +-+---+-+ ++---+--+ | |||
| | | | | | | | | | | | | | | | | | | |||
| | +---------------+ | | +----linkSL6----+ | | | +---------------+ | | +----linkSL6----+ | | |||
| linkSL1 | | | linkSL5 | | linkSL8 | linkSL1 | | | linkSL5 | | linkSL8 | |||
| | +---linkSL3---+ | | | +----linkSL7--+ | | | | +---linkSL3---+ | | | +----linkSL7--+ | | | |||
| | | | | | | | | | | | | | | | | | | |||
| +-+---+-+ +--+--+-+ +-+---+-+ +--+-+--+ | +-+---+-+ +--+--+-+ +-+---+-+ +--+-+--+ | |||
| | | | | | | | | | ||||
| |Leaf111| |Leaf112| |Leaf121| |Leaf122| LEVEL 0 | |Leaf111| |Leaf112| |Leaf121| |Leaf122| LEVEL 0 | |||
| +-+-----+ ++------+ +-----+-+ +-+-----+ | +-+-----+ ++------+ +-----+-+ +-+-----+ | |||
| + + + + | + + + + | |||
| Prefix111 Prefix112 Prefix121 Prefix122 | Prefix111 Prefix112 Prefix121 Prefix122 | |||
| Figure 4: Black-holing upon link failure use case | Figure 4: Black-holing upon link failure use case | |||
| This scenario illustrates a case where a double link failure occurs and | This scenario illustrates a case where a double link failure occurs and | |||
| black-holing happens. | black-holing can happen as a result. | |||
| Without a disaggregation mechanism, when linkTS3 and linkTS4 both fail, | Without a disaggregation mechanism, when linkTS3 and linkTS4 both fail, | |||
| packets from leaf111 to prefix122 would suffer 50% black-holing | packets from leaf111 to prefix122 would suffer 50% black-holing | |||
| based on the pure default route. A packet that goes up through | based on the pure default route. A packet that goes up through | |||
| linkSL1 to linkTS1 and would then go down through linkTS3 or linkTS4 will be | linkSL1 to linkTS1 and would then go down through linkTS3 or linkTS4 will be | |||
| dropped. A packet that goes up through linkSL3 to linkTS2 and would | dropped. A packet that goes up through linkSL3 to linkTS2 and would | |||
| then go down through linkTS3 or linkTS4 will be dropped as well. | then go down through linkTS3 or linkTS4 will be dropped as well. | |||
| This is the black-holing case. | This is the black-holing case. | |||
| With a disaggregation mechanism, when linkTS3 and linkTS4 both fail, | With a disaggregation mechanism, when linkTS3 and linkTS4 both fail, | |||
| ToF22 will detect the failure according to the reflected node S-TIE | ToF22 will detect the failure according to the reflected node S-TIE | |||
| of ToF21 from Spine111/Spine112/Spine121/Spine122. Based on the | of ToF21 from Spine111/Spine112/Spine121/Spine122. Based on the | |||
| disaggregation algorithm provided by RIFT, ToF22 will explicitly | disaggregation algorithm provided by RIFT, ToF22 will explicitly | |||
| originate an S-TIE with prefix121 and prefix122, which is flooded to | originate an S-TIE with prefix121 and prefix122, which is flooded to | |||
| spines 111, 112, 121 and 122. | spines 111, 112, 121 and 122. | |||
| The packet from leaf111 to prefix122 will not be routed to linkTS1 or | The packet from leaf111 to prefix122 will not be routed to linkTS1 or | |||
| linkTS2. The packet from leaf111 to prefix122 will only be routed to | linkTS2. The packet from leaf111 to prefix122 will only be routed to | |||
| linkTS5 or linkTS7 following a longest-prefix match to prefix122. | linkTS5 or linkTS7 following a longest-prefix match to prefix122. | |||
| 4.2. Usage of ZTP | 4.4. Zero Touch Provisioning (ZTP) | |||
| Each RIFT node may operate in zero touch provisioning (ZTP) mode. It | Each RIFT node may operate in zero touch provisioning (ZTP) mode. It | |||
| has no configuration (unless it is a Top-of-Fabric at the top of the | has no configuration (unless it is a Top-of-Fabric at the top of the | |||
| topology or it must operate in the topology as leaf and/or support | topology or it is desired to confine it to leaf role without leaf-2-leaf | |||
| leaf-2-leaf procedures) and it will fully configure itself after | procedures). In such a case RIFT will fully configure the node's level | |||
| being attached to the topology. | after it is attached to the topology. | |||
| The most important component for ZTP is the automatic level derivation | The most important component for ZTP is the automatic level derivation | |||
| procedure. All the Top-of-Fabric nodes are explicitly marked with the | procedure. All the Top-of-Fabric nodes are explicitly marked with the | |||
| TOP_OF_FABRIC flag and serve as the initial 'seeds' needed for other ZTP | TOP_OF_FABRIC flag and serve as the initial 'seeds' needed for other ZTP | |||
| nodes to derive their level in the topology. | nodes to derive their level in the topology. The derivation of the | |||
| level of each node then happens based on LIEs received from its | ||||
| The derivation of the level of each node happens based on LIEs | neighbors, whereby each node (with the possible exception of configured | |||
| received from its neighbors, whereby each node (with the possible | leafs) tries to attach at the highest possible point in the fabric. | |||
| exception of configured leafs) tries to attach at the highest | ||||
| possible point in the fabric. | ||||
| This guarantees that even if the diffusion front reaches a node from | This guarantees that even if the diffusion front reaches a node from | |||
| "below" faster than from "above", it will greedily abandon the already | "below" faster than from "above", it will greedily abandon the already | |||
| negotiated level derived from nodes topologically below it and | negotiated level derived from nodes topologically below it and | |||
| properly peers with nodes above. | properly peer with nodes above. | |||
| 4.5. Miscabling Examples | ||||
| +----------------+ +-----------------+ | ||||
| | ToF21 | +------+ ToF22 | LEVEL 2 | ||||
| +-------+----+---+ | +----+---+--------+ | ||||
| | | | | | | | | | | ||||
| | | | +----------------------------+ | | ||||
| | +---------------------------+ | | | | | ||||
| | | | | | | | | | | ||||
| | | | | +-----------------------+ | | | ||||
| | | +------------------------+ | | | | ||||
| | | | | | | | | | | ||||
| +-+---+-+ +-+---+-+ | +-+---+-+ +-+---+-+ | ||||
| |Spin111| |Spin112| | |Spin121| |Spin122| LEVEL 1 | ||||
| +-+---+-+ ++----+-+ | +-+---+-+ ++----+-+ | ||||
| | | | | | | | | | | ||||
| | +---------+ | link-M | +---------+ | | ||||
| | | | | | | | | | | ||||
| | +-------+ | | | | +-------+ | | | ||||
| | | | | | | | | | | ||||
| +-+---+-+ +--+--+-+ | +-+---+-+ +--+--+-+ | ||||
| |Leaf111| |Leaf112+-----+ |Leaf121| |Leaf122| LEVEL 0 | ||||
| +-------+ +-------+ +-------+ +-------+ | ||||
| Figure 5: A single plane miscabling example | ||||
| Figure 5 shows a single plane miscabling example. It's a | ||||
| perfect fat-tree fabric except for link-M connecting Leaf112 to ToF22. | ||||
| The RIFT control protocol can discover the physical links | ||||
| automatically and is able to detect cabling that violates fat-tree | ||||
| topology constraints. It reacts accordingly to such miscabling | ||||
| attempts, at a minimum preventing adjacencies between nodes from | ||||
| being formed and traffic from being forwarded on those miscabled | ||||
| links. Leaf112 will in such a scenario use link-M to derive its level | ||||
| (unless it is a leaf) and can report links to spines 111 and 112 as | ||||
| miscabled unless the implementation allows horizontal links. | ||||
| Figure 6 shows a multiple plane miscabling example. Since | ||||
| Leaf112 and Spine121 belong to two different PoDs, the adjacency | ||||
| between Leaf112 and Spine121 cannot be formed. The miscabled link-W | ||||
| would be detected and its adjacency prevented. | ||||
| +-------+ +-------+ +-------+ +-------+ | ||||
| |ToF A1| |ToF A2| |ToF B1| |ToF B2| LEVEL 2 | ||||
| +-------+ +-------+ +-------+ +-------+ | ||||
| | | | | | | | | | ||||
| | | | +-----------------+ | | | | ||||
| | +--------------------------+ | | | | | ||||
| | | | | | | | | | ||||
| | +------+ | | | +------+ | | ||||
| | | +-----------------+ | | | | | | ||||
| | | | +--------------------------+ | | | ||||
| | A | | B | | A | | B | | ||||
| +-----+-+ +-+---+-+ +-+---+-+ +-+-----+ | ||||
| |Spin111| |Spin112| +----+Spin121| |Spin122| LEVEL 1 | ||||
| +-+---+-+ ++----+-+ | +-+---+-+ ++----+-+ | ||||
| | | | | | | | | | | ||||
| | +---------+ | | | +---------+ | | ||||
| | | | | link-W | | | | | ||||
| | +-------+ | | | | +-------+ | | | ||||
| | | | | | | | | | | ||||
| +-+---+-+ +--+--+-+ | +-+---+-+ +--+--+-+ | ||||
| |Leaf111| |Leaf112+------+ |Leaf121| |Leaf122| LEVEL 0 | ||||
| +-------+ +-------+ +-------+ +-------+ | ||||
| +--------PoD#1----------+ +---------PoD#2---------+ | ||||
| Figure 6: A multiple plane miscabling example | ||||
| RIFT provides an optional level determination procedure in its Zero | ||||
| Touch Provisioning mode. Nodes in the fabric without their level | ||||
| configured determine it automatically. This can, however, have | ||||
| counter-intuitive consequences. One extreme failure scenario | ||||
| is depicted in Figure 7: if all northbound links of | ||||
| Spine11 fail at the same time, Spine11 negotiates a lower level than | ||||
| Leaf111 and Leaf112. | ||||
| To prevent such a scenario, in which leafs are expected to act as switches, | ||||
| the LEAF_ONLY flag can be set for Leaf111 and Leaf112. Since level -1 is | ||||
| invalid, Spine11 would not derive a valid level from the topology in | ||||
| Figure 7. It will be isolated from the whole fabric and it would be | ||||
| up to the leafs to declare the links towards such a spine as miscabled. | ||||
| +-------+ +-------+ +-------+ +-------+ | ||||
| |ToF A1| |ToF A2| |ToF A1| |ToF A2| | ||||
| +-------+ +-------+ +-------+ +-------+ | ||||
| | | | | | | | ||||
| | +-------+ | | | | ||||
| + + | | ====> | | | ||||
| X X +------+ | +------+ | | ||||
| + + | | | | | ||||
| +----+--+ +-+-----+ +-+-----+ | ||||
| |Spine11| |Spine12| |Spine12| | ||||
| +-+---+-+ ++----+-+ ++----+-+ | ||||
| | | | | | | | ||||
| | +---------+ | | | | ||||
| | | | | | | | ||||
| | +-------+ | | +-------+ | | ||||
| | | | | | | | ||||
| +-+---+-+ +--+--+-+ +-----+-+ +-----+-+ | ||||
| |Leaf111| |Leaf112| |Leaf111| |Leaf112| | ||||
| +-------+ +-------+ +-+-----+ +-+-----+ | ||||
| | | | ||||
| | +--------+ | ||||
| | | | ||||
| +-+---+-+ | ||||
| |Spine11| | ||||
| +-------+ | ||||
| Figure 7: Fallen spine | ||||
| 4.6. IPv4 over IPv6 | ||||
| RIFT allows advertising IPv4 prefixes over an IPv6 RIFT network. The IPv6 | ||||
| AF is configured via the usual ND mechanisms and then V4 can use V6 | ||||
| next hops analogous to RFC5549. It is expected that the whole fabric | ||||
| supports the same type of forwarding of address families on all the | ||||
| links. RIFT provides an indication whether a node is v4 forwarding | ||||
| capable and implementations are possible where different routing | ||||
| tables are computed per address family as long as the computation | ||||
| remains loop-free. | ||||
| +-----+ +-----+ | ||||
| +---+---+ | ToF | | ToF | | ||||
| ^ +--+--+ +-----+ | ||||
| | | | | | | ||||
| | | +-------------+ | | ||||
| | | +--------+ | | | ||||
| | | | | | | ||||
| V6 +-----+ +-+---+ | ||||
| Forwarding |SPINE| |SPINE| | ||||
| | +--+--+ +-----+ | ||||
| | | | | | | ||||
| | | +-------------+ | | ||||
| | | +--------+ | | | ||||
| | | | | | | ||||
| v +-----+ +-+---+ | ||||
| +---+---+ |LEAF | | LEAF| | ||||
| +--+--+ +--+--+ | ||||
| | | | ||||
| IPv4 prefixes| |IPv4 prefixes | ||||
| | | | ||||
| +---+----+ +---+----+ | ||||
| | V4 | | V4 | | ||||
| | subnet | | subnet | | ||||
| +--------+ +--------+ | ||||
| Figure 8: IPv4 over IPv6 | ||||
| 4.7. In-Band Reachability of Nodes | ||||
| 4.7.1. Reachability of Leafs | ||||
| TODO | ||||
| 4.7.2. Reachability of Spines | ||||
| TODO | ||||
| 4.8. Dual Homing Servers | ||||
| +---+ +---+ +---+ | ||||
| |ToF| |ToF| |ToF| | ||||
| +---+ +---+ +---+ | ||||
| | | | | | | | ||||
| | +----------------+ | | | ||||
| | | | | | | | ||||
| | +----------------+ | | ||||
| | | | | | | | ||||
| +----------+--+ +--+----------+ | ||||
| | Spine|ToR1 | | Spine|ToR2 | | ||||
| +--+------+---+ +--+-------+--+ | ||||
| +---+ | | | | | | +---+ | ||||
| | | | | | | | | | ||||
| | +-----------------+ | | | | ||||
| | | | +-------------+ | | | ||||
| + | + | | |-----------------+ | | ||||
| X | X | +--------x-----+ | X | | ||||
| + | + | | | + | | ||||
| +---+ +---+ +---+ +---+ | ||||
| | | | | | | | | | ||||
| +---+ +---+ ...............+---+ +---+ | ||||
| SV(1) SV(2) SV(n+1) SV(n) | ||||
| Figure 9: Dual-homing servers | ||||
| In a single plane, the worst condition is disaggregation of every | ||||
| other server at the same level. Suppose the links from ToR1 to all | ||||
| the leaves become unavailable. All the servers' routes are | ||||
| disaggregated and the FIB of the servers will be expanded with n-1 | ||||
| more specific routes. | ||||
| Sometimes, people may prefer to disaggregate from ToR to servers | ||||
| from the start, i.e. the servers have a couple of tens of routes in the FIB | ||||
| from the start besides default routes, to avoid breakages at rack level. | ||||
| Full disaggregation of the fabric could be achieved by configuration | ||||
| supported by RIFT. | ||||
| 4.9. Fabric With A Controller | ||||
| There are many different ways to deploy the controller. One | ||||
| possibility is attaching a controller to the RIFT domain from ToF and | ||||
| another possibility is attaching a controller from the leaf. | ||||
| +------------+ | ||||
| | Controller | | ||||
| ++----------++ | ||||
| | | | ||||
| | | | ||||
| +----++ ++----+ | ||||
| ---------- | ToF | | ToF | | ||||
| | +--+--+ +-----+ | ||||
| | | | | | | ||||
| | | +-------------+ | | ||||
| | | +--------+ | | | ||||
| | | | | | | ||||
| +-----+ +-+---+ | ||||
| RIFT domain |SPINE| |SPINE| | ||||
| +--+--+ +-----+ | ||||
| | | | | | | ||||
| | | +-------------+ | | ||||
| | | +--------+ | | | ||||
| | | | | | | ||||
| | +-----+ +-+---+ | ||||
| ---------- |LEAF | | LEAF| | ||||
| +-----+ +-----+ | ||||
| Figure 10: Fabric with a controller | ||||
| 4.9.1. Controller Attached to ToFs | ||||
| If a controller is attached to the RIFT domain at the ToF, it usually | ||||
| uses dual-homing connections. The loopback prefix of the controller | ||||
| should be advertised down by the ToF and spine to the leaves. If the | ||||
| controller loses its link to a ToF, make sure the ToF withdraws the prefix | ||||
| of the controller (different mechanisms can be used). | ||||
| 4.9.2. Controller Attached to Leaf | ||||
| If the controller is attached to the fabric at a leaf, no special | ||||
| provisions are needed. | ||||
| 4.10. Internet Connectivity Without Underlay | ||||
| 4.10.1. Internet Default on the Leafs | ||||
| TODO | ||||
| 4.10.2. Internet Default on the ToFs | ||||
| TODO | ||||
| 4.11. Subnet Mismatch and Address Families | ||||
| +--------+ +--------+ | ||||
| | | LIE LIE | | | ||||
| | A | +----> <----+ | B | | ||||
| | +---------------------+ | | ||||
| +--------+ +--------+ | ||||
| X/24 Y/24 | ||||
| Figure 11: subnet mismatch | ||||
| LIEs are exchanged over all links running RIFT to perform Link | ||||
| (Neighbor) Discovery. A node MUST NOT originate LIEs on an address | ||||
| family if it does not process received LIEs on that family. LIEs on the | ||||
| same link are considered part of the same negotiation independent of | ||||
| the address family they arrive on. An implementation MUST be ready | ||||
| to accept TIEs on all addresses it used as source of LIE frames. | ||||
| As shown in the above figure, without further checks adjacency of | ||||
| node A and B may form, but the forwarding between node A and node B | ||||
| may fail because subnet X mismatches with subnet Y. | ||||
| To prevent this, a RIFT implementation should check for subnet | ||||
| mismatch just like e.g. ISIS does. This can lead to scenarios where, | ||||
| despite the exchange of LIEs in both address families, an adjacency | ||||
| ends up being formed in a single AF only. This is a | ||||
| consideration especially in Section 4.6 scenarios. | ||||
| 4.12. Anycast Considerations | ||||
| + traffic | ||||
| | | ||||
| v | ||||
| +------+------+ | ||||
| | ToF | | ||||
| +---+-----+---+ | ||||
| | | | | | ||||
| +------------+ | | +------------+ | ||||
| | | | | | ||||
| +---+---+ +-------+ +-------+ +---+---+ | ||||
| | | | | | | | | | ||||
| |Spine11| |Spine12| |Spine21| |Spine22| LEVEL 1 | ||||
| +-+---+-+ ++----+-+ +-+---+-+ ++----+-+ | ||||
| | | | | | | | | | ||||
| | +---------+ | | +---------+ | | ||||
| | | | | | | | | | ||||
| | +-------+ | | | +-------+ | | | ||||
| | | | | | | | | | ||||
| +-+---+-+ +--+--+-+ +-+---+-+ +--+--+-+ | ||||
| | | | | | | | | | ||||
| |Leaf111| |Leaf112| |Leaf121| |Leaf122| LEVEL 0 | ||||
| +-+-----+ ++------+ +-----+-+ +-----+-+ | ||||
| + + + ^ | | ||||
| PrefixA PrefixB PrefixA | PrefixC | ||||
| | | ||||
| + traffic | ||||
| Figure 12: Anycast | ||||
| If the traffic comes from the ToF towards Leaf111 or Leaf121, which both | ||||
| have the anycast prefix PrefixA, RIFT can deal with this case well. But if the | ||||
| traffic comes from Leaf122, it will always get to Leaf121 and never | ||||
| get to Leaf111. If the intention is that the traffic should be | ||||
| offloaded to Leaf111, then use policy guided prefixes [PGP | ||||
| reference]. | ||||
| 5. Acknowledgements | 5. Acknowledgements | |||
| 6. Contributors | 6. Contributors | |||
| The following people (listed in alphabetical order) contributed | The following people (listed in alphabetical order) contributed | |||
| significantly to the content of this document and should be | significantly to the content of this document and should be | |||
| considered co-authors: | considered co-authors: | |||
| Tony Przygienda | Tony Przygienda | |||
| Juniper Networks | ||||
| Juniper Networks | ||||
| 1194 N. Mathilda Ave | 1194 N. Mathilda Ave | |||
| Sunnyvale, CA 94089 | Sunnyvale, CA 94089 | |||
| US | US | |||
| Email: prz@juniper.net | Email: prz@juniper.net | |||
| 7. Normative References | 7. Normative References | |||
| [I-D.ietf-rift-rift] | [I-D.ietf-rift-rift] | |||
| Team, T., "RIFT: Routing in Fat Trees", draft-ietf-rift- | Przygienda, T., Sharma, A., Thubert, P., and D. Afanasiev, | |||
| rift-05 (work in progress), April 2019. | "RIFT: Routing in Fat Trees", draft-ietf-rift-rift-08 | |||
| (work in progress), September 2019. | ||||
| [I-D.white-distoptflood] | [I-D.white-distoptflood] | |||
| White, R. and S. Zandi, "IS-IS Optimal Distributed | White, R., Hegde, S., and S. Zandi, "IS-IS Optimal | |||
| Flooding for Dense Topologies", draft-white- | Distributed Flooding for Dense Topologies", draft-white- | |||
| distoptflood-00 (work in progress), March 2019. | distoptflood-01 (work in progress), September 2019. | |||
| [ISO10589-Second-Edition] | [ISO10589-Second-Edition] | |||
| International Organization for Standardization, | International Organization for Standardization, | |||
| "Intermediate system to Intermediate system intra-domain | "Intermediate system to Intermediate system intra-domain | |||
| routeing information exchange protocol for use in | routeing information exchange protocol for use in | |||
| conjunction with the protocol for providing the | conjunction with the protocol for providing the | |||
| connectionless-mode Network Service (ISO 8473)", Nov 2002. | connectionless-mode Network Service (ISO 8473)", Nov 2002. | |||
| [RFC2328] Moy, J., "OSPF Version 2", STD 54, RFC 2328, | [RFC2328] Moy, J., "OSPF Version 2", STD 54, RFC 2328, | |||
| DOI 10.17487/RFC2328, April 1998, | DOI 10.17487/RFC2328, April 1998, | |||
| End of changes. 42 change blocks. | ||||
| 147 lines changed or deleted | 477 lines changed or added | |||
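
The black-holing case in the "Black-Holing on Link Failures" section can be illustrated with a small longest-prefix-match sketch. This is not taken from the draft or from any RIFT implementation: the address 10.1.22.0/24 standing in for Prefix122, the routing-table layout, and the helper names `lookup` and `blackhole_fraction` are all assumptions made for illustration. It models the view of a spine in the left PoD, which balances traffic over both ToFs on the default route until ToF22 disaggregates Prefix122.

```python
import ipaddress


def lookup(rib, dst):
    """Return the next hops of the longest prefix in rib that covers dst."""
    addr = ipaddress.ip_address(dst)
    best = max((net for net in rib if addr in net),
               key=lambda net: net.prefixlen)
    return rib[best]


def blackhole_fraction(next_hops, dead_next_hops):
    """Share of equal-cost next hops that cannot deliver the packet."""
    dropped = sum(1 for nh in next_hops if nh in dead_next_hops)
    return dropped / len(next_hops)


DEFAULT = ipaddress.ip_network("0.0.0.0/0")
PREFIX122 = ipaddress.ip_network("10.1.22.0/24")  # hypothetical address for Prefix122

# ToF21 lost linkTS3 and linkTS4, so it can no longer reach the right PoD.
dead = {"ToF21"}

# Without disaggregation: only the default route, balanced over both ToFs.
rib_without = {DEFAULT: ["ToF21", "ToF22"]}

# With disaggregation: ToF22 additionally advertises the more specific prefix.
rib_with = {DEFAULT: ["ToF21", "ToF22"], PREFIX122: ["ToF22"]}

loss_without = blackhole_fraction(lookup(rib_without, "10.1.22.5"), dead)
loss_with = blackhole_fraction(lookup(rib_with, "10.1.22.5"), dead)
print(loss_without, loss_with)
```

Under these assumptions the default-only table sends half of the traffic to the ToF that cannot deliver it, while the table holding the more specific prefix sends none there, which matches the 50% figure given in the text.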
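
The subnet check recommended in the "Subnet Mismatch and Address Families" section could look roughly like the following sketch. It is only an illustration under stated assumptions: the function names `same_subnet` and `matching_families` are hypothetical, the addresses are documentation and link-local examples chosen to stand for X/24 and Y/24, and the draft does not prescribe how an implementation performs the check.

```python
import ipaddress


def same_subnet(local_if, remote_if):
    """True if both interface addresses are in the same subnet of the same AF."""
    a = ipaddress.ip_interface(local_if)
    b = ipaddress.ip_interface(remote_if)
    return a.version == b.version and a.network == b.network


def matching_families(local_ifs, remote_ifs):
    """Return the IP versions for which the two ends share a subnet."""
    ok = set()
    for a in local_ifs:
        for b in remote_ifs:
            if same_subnet(a, b):
                ok.add(ipaddress.ip_interface(a).version)
    return sorted(ok)


# Node A uses X/24 and node B uses Y/24 on IPv4 (hypothetical addresses);
# both have IPv6 link-local addresses on the shared link.
node_a = ["192.0.2.1/24", "fe80::1/64"]
node_b = ["198.51.100.1/24", "fe80::2/64"]

families = matching_families(node_a, node_b)
print(families)
```

With these example addresses only IPv6 passes the check, which corresponds to the case described in the text where LIEs are exchanged in both address families but the adjacency ends up in a single AF.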