| < draft-ietf-rift-applicability-06.txt | draft-ietf-rift-applicability-07.txt > | |||
|---|---|---|---|---|
| RIFT WG Yuehua. Wei, Ed. | RIFT WG Yuehua. Wei, Ed. | |||
| Internet-Draft Zheng. Zhang | Internet-Draft Zheng. Zhang | |||
| Intended status: Informational ZTE Corporation | Intended status: Informational ZTE Corporation | |||
| Expires: 13 November 2021 Dmitry. Afanasiev | Expires: 21 March 2022 Dmitry. Afanasiev | |||
| Yandex | Yandex | |||
| P. Thubert | P. Thubert | |||
| Cisco Systems | Cisco Systems | |||
| Tom. Verhaeg | ||||
| Juniper Networks | ||||
| Jaroslaw. Kowalczyk | Jaroslaw. Kowalczyk | |||
| Orange Polska | Orange Polska | |||
| 12 May 2021 | 17 September 2021 | |||
| RIFT Applicability | RIFT Applicability | |||
| draft-ietf-rift-applicability-06 | draft-ietf-rift-applicability-07 | |||
| Abstract | Abstract | |||
| This document discusses the properties, applicability and operational | This document discusses the properties, applicability and operational | |||
| considerations of RIFT in different network scenarios. It intends to | considerations of RIFT in different network scenarios. It intends to | |||
| provide a rough guide to how RIFT can be deployed to simplify routing | provide a rough guide to how RIFT can be deployed to simplify routing | |||
| operations in Clos topologies and their variations. | operations in Clos topologies and their variations. | |||
| Status of This Memo | Status of This Memo | |||
| skipping to change at page 1, line 41 | skipping to change at page 1, line 39 | |||
| Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
| Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
| working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
| Drafts is at https://datatracker.ietf.org/drafts/current/. | Drafts is at https://datatracker.ietf.org/drafts/current/. | |||
| Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
| and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
| time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
| material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
| This Internet-Draft will expire on 13 November 2021. | This Internet-Draft will expire on 21 March 2022. | |||
| Copyright Notice | Copyright Notice | |||
| Copyright (c) 2021 IETF Trust and the persons identified as the | Copyright (c) 2021 IETF Trust and the persons identified as the | |||
| document authors. All rights reserved. | document authors. All rights reserved. | |||
| This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
| Provisions Relating to IETF Documents (https://trustee.ietf.org/ | Provisions Relating to IETF Documents (https://trustee.ietf.org/ | |||
| license-info) in effect on the date of publication of this document. | license-info) in effect on the date of publication of this document. | |||
| Please review these documents carefully, as they describe your rights | Please review these documents carefully, as they describe your rights | |||
| and restrictions with respect to this document. Code Components | and restrictions with respect to this document. Code Components | |||
| extracted from this document must include Simplified BSD License text | extracted from this document must include Simplified BSD License text | |||
| as described in Section 4.e of the Trust Legal Provisions and are | as described in Section 4.e of the Trust Legal Provisions and are | |||
| provided without warranty as described in the Simplified BSD License. | provided without warranty as described in the Simplified BSD License. | |||
| Table of Contents | Table of Contents | |||
| 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 | |||
| 2. Problem Statement of Routing in Modern IP Fabric Fat Tree | 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 3 | |||
| Networks . . . . . . . . . . . . . . . . . . . . . . . . 3 | 3. Problem Statement of Routing in Modern IP Fabric Fat Tree | |||
| 3. Applicability of RIFT to Clos IP Fabrics . . . . . . . . . . 4 | Networks . . . . . . . . . . . . . . . . . . . . . . . . 5 | |||
| 3.1. Overview of RIFT . . . . . . . . . . . . . . . . . . . . 4 | 4. Applicability of RIFT to Clos IP Fabrics . . . . . . . . . . 5 | |||
| 3.2. Applicable Topologies . . . . . . . . . . . . . . . . . . 6 | 4.1. Overview of RIFT . . . . . . . . . . . . . . . . . . . . 5 | |||
| 3.2.1. Horizontal Links . . . . . . . . . . . . . . . . . . 7 | 4.2. Applicable Topologies . . . . . . . . . . . . . . . . . . 8 | |||
| 3.2.2. Vertical Shortcuts . . . . . . . . . . . . . . . . . 7 | 4.2.1. Horizontal Links . . . . . . . . . . . . . . . . . . 8 | |||
| 3.2.3. Generalizing to any Directed Acyclic Graph . . . . . 7 | 4.2.2. Vertical Shortcuts . . . . . . . . . . . . . . . . . 9 | |||
| 3.2.4. Reachability of Internal Nodes in the Fabric . . . . 9 | 4.2.3. Generalizing to any Directed Acyclic Graph . . . . . 9 | |||
| 3.3. Use Cases . . . . . . . . . . . . . . . . . . . . . . . . 9 | 4.2.4. Reachability of Internal Nodes in the Fabric . . . . 11 | |||
| 3.3.1. Data Center Topologies . . . . . . . . . . . . . . . 9 | 4.3. Use Cases . . . . . . . . . . . . . . . . . . . . . . . . 11 | |||
| 3.3.2. Metro Fabrics . . . . . . . . . . . . . . . . . . . . 11 | 4.3.1. Data Center Topologies . . . . . . . . . . . . . . . 11 | |||
| 3.3.3. Building Cabling . . . . . . . . . . . . . . . . . . 11 | 4.3.2. Metro Fabrics . . . . . . . . . . . . . . . . . . . . 12 | |||
| 3.3.4. Internal Router Switching Fabrics . . . . . . . . . . 11 | 4.3.3. Building Cabling . . . . . . . . . . . . . . . . . . 13 | |||
| 3.3.5. CloudCO . . . . . . . . . . . . . . . . . . . . . . . 11 | 4.3.4. Internal Router Switching Fabrics . . . . . . . . . . 13 | |||
| 4. Operational Considerations . . . . . . . . . . . . . . . . . 13 | 4.3.5. CloudCO . . . . . . . . . . . . . . . . . . . . . . . 13 | |||
| 4.1. South Reflection . . . . . . . . . . . . . . . . . . . . 14 | 5. Operational Considerations . . . . . . . . . . . . . . . . . 15 | |||
| 4.2. Suboptimal Routing on Link Failures . . . . . . . . . . . 14 | 5.1. South Reflection . . . . . . . . . . . . . . . . . . . . 16 | |||
| 4.3. Black-Holing on Link Failures . . . . . . . . . . . . . . 16 | 5.2. Suboptimal Routing on Link Failures . . . . . . . . . . . 16 | |||
| 4.4. Zero Touch Provisioning (ZTP) . . . . . . . . . . . . . . 17 | 5.3. Black-Holing on Link Failures . . . . . . . . . . . . . . 18 | |||
| 4.5. Mis-cabling Examples . . . . . . . . . . . . . . . . . . 18 | 5.4. Zero Touch Provisioning (ZTP) . . . . . . . . . . . . . . 19 | |||
| 4.6. Positive vs. Negative Disaggregation . . . . . . . . . . 20 | 5.5. Mis-cabling Examples . . . . . . . . . . . . . . . . . . 20 | |||
| 4.7. Mobile Edge and Anycast . . . . . . . . . . . . . . . . . 22 | 5.6. Positive vs. Negative Disaggregation . . . . . . . . . . 22 | |||
| 4.8. IPv4 over IPv6 . . . . . . . . . . . . . . . . . . . . . 24 | 5.7. Mobile Edge and Anycast . . . . . . . . . . . . . . . . . 24 | |||
| 4.9. In-Band Reachability of Nodes . . . . . . . . . . . . . . 25 | 5.8. IPv4 over IPv6 . . . . . . . . . . . . . . . . . . . . . 25 | |||
| 4.10. Dual Homing Servers . . . . . . . . . . . . . . . . . . . 26 | 5.9. In-Band Reachability of Nodes . . . . . . . . . . . . . . 26 | |||
| 4.11. Fabric With A Controller . . . . . . . . . . . . . . . . 27 | 5.10. Dual Homing Servers . . . . . . . . . . . . . . . . . . . 28 | |||
| 4.11.1. Controller Attached to ToFs . . . . . . . . . . . . 27 | 5.11. Fabric With A Controller . . . . . . . . . . . . . . . . 29 | |||
| 4.11.2. Controller Attached to Leaf . . . . . . . . . . . . 28 | 5.11.1. Controller Attached to ToFs . . . . . . . . . . . . 29 | |||
| 4.12. Internet Connectivity With Underlay . . . . . . . . . . . 28 | 5.11.2. Controller Attached to Leaf . . . . . . . . . . . . 29 | |||
| 4.12.1. Internet Default on the Leaf . . . . . . . . . . . . 28 | 5.12. Internet Connectivity With Underlay . . . . . . . . . . . 30 | |||
| 4.12.2. Internet Default on the ToFs . . . . . . . . . . . . 28 | 5.12.1. Internet Default on the Leaf . . . . . . . . . . . . 30 | |||
| 4.13. Subnet Mismatch and Address Families . . . . . . . . . . 28 | 5.12.2. Internet Default on the ToFs . . . . . . . . . . . . 30 | |||
| 4.14. Anycast Considerations . . . . . . . . . . . . . . . . . 29 | 5.13. Subnet Mismatch and Address Families . . . . . . . . . . 30 | |||
| 4.15. IoT Applicability . . . . . . . . . . . . . . . . . . . . 30 | 5.14. Anycast Considerations . . . . . . . . . . . . . . . . . 31 | |||
| 4.16. Key Management . . . . . . . . . . . . . . . . . . . . . 30 | 5.15. IoT Applicability . . . . . . . . . . . . . . . . . . . . 32 | |||
| 5.16. Key Management . . . . . . . . . . . . . . . . . . . . . 32 | ||||
| 5. Security Considerations . . . . . . . . . . . . . . . . . . . 31 | 6. Security Considerations . . . . . . . . . . . . . . . . . . . 33 | |||
| 6. Contributors . . . . . . . . . . . . . . . . . . . . . . . . 31 | 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 33 | |||
| 7. Normative References . . . . . . . . . . . . . . . . . . . . 31 | 8. Contributors . . . . . . . . . . . . . . . . . . . . . . . . 33 | |||
| 8. Informative References . . . . . . . . . . . . . . . . . . . 33 | 9. Normative References . . . . . . . . . . . . . . . . . . . . 33 | |||
| Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 33 | 10. Informative References . . . . . . . . . . . . . . . . . . . 35 | |||
| Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 36 | ||||
| 1. Introduction | 1. Introduction | |||
| This document discusses the properties and applicability of "Routing | This document discusses the properties and applicability of "Routing | |||
| in Fat Trees" [RIFT] (RIFT) in different deployment scenarios and | in Fat Trees" [RIFT] in different deployment scenarios and highlights | |||
| highlights the operational simplicity of the technology compared to | the operational simplicity of the technology compared to traditional | |||
| traditional routing solutions. It also documents special | routing solutions. It also documents special considerations when | |||
| considerations when RIFT is used with or without overlays and/or | RIFT is used with or without overlays and/or controllers, and how | |||
| controllers, and how RIFT identifies topology mis-cablings and | RIFT identifies topology mis-cablings and reroutes around node and | |||
| reroutes around node and link failures. | link failures. | |||
| 2. Problem Statement of Routing in Modern IP Fabric Fat Tree Networks | 2. Terminology | |||
| Clos [CLOS] and fat tree [FATTREE] topologies have gained prominence | Clos/Fat Tree: | |||
| in today's networking, primarily as a result of the paradigm shift | ||||
| towards a centralized data-center based architecture that delivers a | This document uses the terms Clos and Fat Tree interchangeably; in | |||
| majority of computation and storage services. | both cases it refers to a folded spine-and-leaf topology with | |||
| possibly multiple Points of Delivery (PoDs) and one or multiple Top | ||||
| of Fabric (ToF) planes. | ||||
| Directed Acyclic Graph (DAG): | ||||
| A finite directed graph with no directed cycles (loops). If links in | ||||
| a Clos are considered as either being all directed towards the top or | ||||
| vice versa, each of such two graphs is a DAG. | ||||
| Disaggregation: | ||||
| Process in which a node decides to advertise more specific prefixes | ||||
| Southwards, either positively to attract the corresponding traffic, | ||||
| or negatively to repel it. Disaggregation is performed to prevent | ||||
| black-holing and suboptimal routing to the more specific prefixes. | ||||
| TIE: | ||||
| This is an acronym for a "Topology Information Element". TIEs are | ||||
| exchanged between RIFT nodes to describe parts of a network such as | ||||
| links and address prefixes. A TIE always has a direction and a type. | ||||
| North TIEs (sometimes abbreviated as N-TIEs) are used when dealing | ||||
| with TIEs in the northbound representation and South TIEs (sometimes | ||||
| abbreviated as S-TIEs) for the southbound equivalent. TIEs have | ||||
| different types such as node and prefix TIEs. | ||||
| Node TIE: | ||||
| This is an acronym for a "Node Topology Information Element", | ||||
| which contains all adjacencies the node discovered and information | ||||
| about the node itself. A Node TIE should NOT be confused with a North | ||||
| TIE since "node" defines the type of TIE rather than its direction. | ||||
| Consequently North Node TIEs and South Node TIEs exist. | ||||
| Prefix TIE: | ||||
| This is an acronym for a "Prefix Topology Information Element". In | ||||
| the case of a North TIE it contains all prefixes directly attached to | ||||
| this node; in the case of a South TIE it contains the necessary | ||||
| default routes the node advertises southbound. | ||||
| South Reflection: | ||||
| Often abbreviated just as "reflection", it defines a mechanism where | ||||
| South Node TIEs are "reflected" from the level south back up north to | ||||
| allow nodes in the same level without East-West links to "see" each | ||||
| other's node Topology Information Elements (TIEs). | ||||
| LIE: | ||||
| This is an acronym for a "Link Information Element" exchanged on all | ||||
| the system's links running RIFT to form ThreeWay adjacencies and | ||||
| carry information used to perform Zero Touch Provisioning (ZTP) of | ||||
| levels. | ||||
| Shortest-Path First (SPF): | ||||
| A well-known graph algorithm attributed to Dijkstra that establishes | ||||
| a tree of shortest paths from a source to destinations on the graph. | ||||
| The SPF acronym is used due to its familiarity as a general term for | ||||
| the node reachability calculations that RIFT can employ to ultimately | ||||
| calculate routes, of which Dijkstra's algorithm is one possibility. | ||||
| North SPF (N-SPF): | ||||
| A reachability calculation that progresses northbound, for example | ||||
| an SPF that uses South Node TIEs only. Normally it progresses a | ||||
| single hop only and installs default routes. | ||||
| South SPF (S-SPF): | ||||
| A reachability calculation that progresses southbound, for example | ||||
| an SPF that uses North Node TIEs only. | ||||
| 3. Problem Statement of Routing in Modern IP Fabric Fat Tree Networks | ||||
| Clos [CLOS] topologies (commonly called a fat tree/network in modern | ||||
| IP fabric considerations, as a homonym of the original definition of | ||||
| the term Fat Tree [FATTREE]) have gained prominence in today's | ||||
| networking, primarily as a result of the paradigm shift towards a | ||||
| centralized data-center based architecture that delivers a majority | ||||
| of computation and storage services. | ||||
| Today's routing protocols were geared towards a network with | Today's routing protocols were geared towards a network with | |||
| an irregular topology with isotropic properties, and a low degree of | an irregular topology with isotropic properties, and a low degree of | |||
| connectivity. When applied to Fat Tree topologies: | connectivity. When applied to Fat Tree topologies: | |||
| * They tend to need extensive configuration or provisioning during | * They tend to need extensive configuration or provisioning during | |||
| bring up and re-dimensioning. | bring up and re-dimensioning. | |||
| * All nodes including spine and leaf nodes learn the entire network | * All nodes including spine and leaf nodes learn the entire network | |||
| topology and routing information, which is, in fact, not needed on | topology and routing information, which is, in fact, not needed on | |||
| the leaf nodes during normal operation. | the leaf nodes during normal operation. | |||
| * They flood significant amounts of duplicate link state information | * They flood significant amounts of duplicate link state information | |||
| between spine and leaf nodes during topology updates and | between spine and leaf nodes during topology updates and | |||
| convergence events, requiring that additional CPU and link | convergence events, requiring that additional CPU and link | |||
| bandwidth be consumed. This may impact the stability and | bandwidth be consumed. This may impact the stability and | |||
| scalability of the fabric, make the fabric less reactive to | scalability of the fabric, make the fabric less reactive to | |||
| failures, and prevent the use of cheaper hardware at the lower | failures, and prevent the use of cheaper hardware at the lower | |||
| levels (i.e. spine and leaf nodes). | levels (i.e. spine and leaf nodes). | |||
| 3. Applicability of RIFT to Clos IP Fabrics | 4. Applicability of RIFT to Clos IP Fabrics | |||
| Further content of this document assumes that the reader is familiar | Further content of this document assumes that the reader is familiar | |||
| with the terms and concepts used in OSPF [RFC2328] and IS-IS | with the terms and concepts used in OSPF [RFC2328] and IS-IS | |||
| [ISO10589-Second-Edition] link-state protocols. The sections of RIFT | [ISO10589-Second-Edition] link-state protocols. The sections of RIFT | |||
| [RIFT] outline the requirements of routing in IP fabrics and RIFT | [RIFT] outline the requirements of routing in IP fabrics and RIFT | |||
| protocol concepts. | protocol concepts. | |||
| 3.1. Overview of RIFT | 4.1. Overview of RIFT | |||
| RIFT is a dynamic routing protocol that is tailored for use in Clos, | RIFT is a dynamic routing protocol that is tailored for use in Clos, | |||
| Fat-Tree, and other anisotropic topologies. A core property of RIFT | Fat-Tree, and other anisotropic topologies. A core property of RIFT | |||
| is that its operation is sensitive to the structure of the fabric - | is that its operation is sensitive to the structure of the fabric - | |||
| it is anisotropic. RIFT acts as a link-state protocol when "pointing | it is anisotropic. RIFT acts as a link-state protocol when "pointing | |||
| north" - advertising southwards routes to northwards peer routers | north" - advertising southwards routes to northwards peer routers | |||
| (parents) through flooding and database synchronization - but operates | (parents) through flooding and database synchronization - but operates | |||
| hop-by-hop like a distance-vector protocol when "pointing south" - | hop-by-hop like a distance-vector protocol when "pointing south" - | |||
| typically advertising a fabric default route directed towards the Top | typically advertising a fabric default route directed towards the Top | |||
| of Fabric (ToF, aka superspine) to southwards peer routers | of Fabric (ToF, aka superspine) to southwards peer routers | |||
| (children). | (children). | |||
| The fabric default is typically the default route, as described in | The fabric default is typically the default route, as described in | |||
| Section 3.2.3.8 "Southbound Default Route Origination" of RIFT | Section 4.2.3.8 "Southbound Default Route Origination" of RIFT | |||
| [RIFT]. The ToF nodes may alternatively originate more specific | [RIFT]. The ToF nodes may alternatively originate more specific | |||
| prefixes (P') southbound instead of the default route. In such a | prefixes (P') southbound instead of the default route. In such a | |||
| scenario, all addresses carried within the RIFT domain MUST be | scenario, all addresses carried within the RIFT domain must be | |||
| contained within P', and it is possible for a leaf that acts as | contained within P', and it is possible for a leaf that acts as | |||
| gateway to the internet to advertise the default route instead. | gateway to the internet to advertise the default route instead. | |||
| RIFT floods flat link-state information northbound only so that each | RIFT floods flat link-state information northbound only so that each | |||
| level obtains the full topology of levels south of it. That | level obtains the full topology of levels south of it. That | |||
| information is never flooded east-west or back south again. So a top | information is never flooded east-west or back south again. So a top | |||
| tier node has the full set of prefixes from the Shortest Path First (SPF) | tier node has the full set of prefixes from the Shortest Path First (SPF) | |||
| calculation. | calculation. | |||
| In the southbound direction, the protocol operates like a "fully | In the southbound direction, the protocol operates like a "fully | |||
| summarizing, unidirectional" path-vector protocol or rather a | summarizing, unidirectional" path-vector protocol or rather a | |||
| distance-vector with implicit split horizon. Routing information, | distance-vector with implicit split horizon. Routing information, | |||
| normally just the default route, propagates one hop south and is 're- | normally just the default route, propagates one hop south and is "re- | |||
| advertised' by nodes at the next lower level. | advertised" by nodes at the next lower level. | |||
| +-----------+ +-----------+ | +-----------+ +-----------+ | |||
| | ToF | | ToF | LEVEL 2 | | ToF | | ToF | LEVEL 2 | |||
| + +-----+--+--+ +-+--+------+ | + +-----+--+--+ +-+--+------+ | |||
| | | | | | | | | | ^ | | | | | | | | | | ^ | |||
| + | | | +-------------------------+ | | + | | | +-------------------------+ | | |||
| Distance | +-------------------+ | | | | | | Distance | +-------------------+ | | | | | | |||
| Vector | | | | | | | | + | Vector | | | | | | | | + | |||
| South | | | | +--------+ | | | Link-State | South | | | | +--------+ | | | Link-State | |||
| + | | | | | | | | Flooding | + | | | | | | | | Flooding | |||
| skipping to change at page 6, line 23 | skipping to change at page 7, line 43 | |||
| IS-IS but also MANET and IoT, to provide unique features: | IS-IS but also MANET and IoT, to provide unique features: | |||
| * Automatic (positive or negative) route disaggregation of | * Automatic (positive or negative) route disaggregation of | |||
| northwards routes upon fallen leaves | northwards routes upon fallen leaves | |||
| * Recursive operation in the case of negative route disaggregation | * Recursive operation in the case of negative route disaggregation | |||
| * Anisotropic routing that extends a principle seen in RPL [RFC6550] | * Anisotropic routing that extends a principle seen in RPL [RFC6550] | |||
| to wide superspines | to wide superspines | |||
| * Optimal Flooding Reduction that derives from the concept of a | * Optimal flooding reduction that derives from the concept of a | |||
| "multipoint relay" (MPR) found in OLSR [RFC3626] and balances the | "multipoint relay" (MPR) found in OLSR [RFC3626] and balances the | |||
| flooding load over northbound links and nodes. | flooding load over northbound links and nodes. | |||
| Additional advantages that are unique to RIFT are listed below, the | Additional advantages that are unique to RIFT are listed below, the | |||
| details of which can be found in RIFT [RIFT]. | details of which can be found in RIFT [RIFT]. | |||
| * True ZTP | * True ZTP (Zero Touch Provisioning) | |||
| * Minimal blast radius on failures | * Minimal blast radius on failures | |||
| * Can utilize all Paths through fabric without looping | * Can utilize all paths through fabric without looping | |||
| * Simple leaf implementation that can scale down to servers | * Simple leaf implementation that can scale down to servers | |||
| * Key-Value store | * Key-Value store | |||
| * Horizontal links used for protection only | * Horizontal links used for protection only | |||
| * Supports non-equal cost multipath (NECMP) and can replace multi- | * Supports non-equal cost multipath and can replace multi-chassis | |||
| chassis link aggregation group (MLAG or MC-LAG) | link aggregation group (MLAG or MC-LAG) | |||
| 3.2. Applicable Topologies | 4.2. Applicable Topologies | |||
| Although RIFT is specified primarily for "proper" Clos or Fat Tree | Although RIFT is specified primarily for "proper" Clos or Fat Tree | |||
| topologies, the protocol natively supports Points of Delivery (PoD) | topologies, the protocol natively supports Points of Delivery (PoD) | |||
| concepts, which, strictly speaking, are not found in the original | concepts, which, strictly speaking, are not found in the original | |||
| Clos concept. | Clos concept. | |||
| Further, the specification explains and supports operations of multi- | Further, the specification explains and supports operations of multi- | |||
| plane Clos variants where the protocol recommends the use of inter- | plane Clos variants where the protocol recommends the use of inter- | |||
| plane rings at the Top-of-Fabric level to allow the reconciliation of | plane rings at the Top-of-Fabric level to allow the reconciliation of | |||
| topology view of different planes to make the negative disaggregation | topology view of different planes to make the negative disaggregation | |||
| viable in case of failures within a plane. These observations hold | viable in case of failures within a plane. These observations hold | |||
| not only in case of RIFT but also in the generic case of dynamic | not only in case of RIFT but also in the generic case of dynamic | |||
| routing on Clos variants with multiple planes and failures in bi- | routing on Clos variants with multiple planes and failures in bi- | |||
| sectional bandwidth, especially on the leaves. | sectional bandwidth, especially on the leaves. | |||
| 3.2.1. Horizontal Links | 4.2.1. Horizontal Links | |||
| RIFT is not limited to pure Clos divided into PoD and multi-planes | RIFT is not limited to pure Clos divided into PoD and multi-planes | |||
| but supports horizontal (East-West) links below the top of fabric | but supports horizontal (East-West) links below the top of fabric | |||
| level. Those links are used only for last resort northbound routes | level. Those links are used only for last resort northbound routes | |||
| when a spine loses all its northbound links or cannot compute a | when a spine loses all its northbound links or cannot compute a | |||
| default route through them. | default route through them. | |||
| A possible configuration is a "ring" of horizontal links at a level. | A possible configuration is a "ring" of horizontal links at a level. | |||
| In the presence of such a "ring" in any level (except Top of Fabric (ToF) | In the presence of such a "ring" in any level (except Top of Fabric (ToF) | |||
| level) neither North SPF (N-SPF) nor South SPF (S-SPF) will provide a | level) neither North SPF (N-SPF) nor South SPF (S-SPF) will provide a | |||
| "ring-based protection" scheme since such a computation would have to | "ring-based protection" scheme since such a computation would have to | |||
| deal necessarily with breaking of "loops" in the Dijkstra sense; an | deal necessarily with breaking of "loops" in the Dijkstra sense; an | |||
| application for which RIFT is not intended. | application for which RIFT is not intended. | |||
| A full-mesh connectivity between nodes on the same level can be | A full-mesh connectivity between nodes on the same level can be | |||
| employed and that allows N-SPF to provide for any node losing all | employed and that allows N-SPF to provide for any node losing all | |||
| its northbound adjacencies (as long as any of the other nodes in the | its northbound adjacencies (as long as any of the other nodes in the | |||
| level are northbound connected) to still participate in northbound | level are northbound connected) to still participate in northbound | |||
| forwarding. | forwarding. | |||
| 3.2.2. Vertical Shortcuts | 4.2.2. Vertical Shortcuts | |||
| Through relaxations of the specified adjacency forming rules, RIFT | Through relaxations of the specified adjacency forming rules, RIFT | |||
| implementations can be extended to support vertical "shortcuts" as | implementations can be extended to support vertical "shortcuts". The | |||
| proposed by e.g. [I-D.white-distoptflood]. The RIFT specification | RIFT specification itself does not provide the exact details since | |||
| itself does not provide the exact details since the resulting | the resulting solution suffers from either much larger blast radius | |||
| solution suffers from either much larger blast radius with increased | with increased flooding volumes or in case of maximum aggregation | |||
| flooding volumes or in case of maximum aggregation routing, bow-tie | routing, bow-tie problems. | |||
| problems. | ||||
| 3.2.3. Generalizing to any Directed Acyclic Graph | 4.2.3. Generalizing to any Directed Acyclic Graph | |||
| RIFT is an anisotropic routing protocol, meaning that it has a sense | RIFT is an anisotropic routing protocol, meaning that it has a sense | |||
| of direction (northbound, southbound, east-west) and that it operates | of direction (northbound, southbound, east-west) and that it operates | |||
| differently depending on the direction. | differently depending on the direction. | |||
| * Northbound, RIFT operates as a link-state protocol, whereby the | * Northbound, RIFT operates as a link-state protocol, whereby the | |||
| control packets are reflooded first all the way north and only | control packets are reflooded first all the way north and only | |||
| interpreted later. All the individual fine grained routes are | interpreted later. All the individual fine grained routes are | |||
| advertised. | advertised. | |||
| skipping to change at page 9, line 7 | skipping to change at page 10, line 26 | |||
| only incoming vertices is a ToF node. | only incoming vertices is a ToF node. | |||
| There are a number of caveats though: | There are a number of caveats though: | |||
| * The DAG structure must exist before RIFT starts, so there is a | * The DAG structure must exist before RIFT starts, so there is a | |||
| need for a companion protocol to establish the logical DAG | need for a companion protocol to establish the logical DAG | |||
| structure. | structure. | |||
| * A generic DAG does not have a sense of east and west. The | * A generic DAG does not have a sense of east and west. The | |||
| operation specified for east-west links and the southbound | operation specified for east-west links and the southbound | |||
| reflection between nodes are not applicable. Also ZTP will derive | reflection between nodes are not applicable. Also ZTP (Zero Touch | |||
| a sense of depth that will eliminate some links. Variations of | Provisioning) will derive a sense of depth that will eliminate | |||
| ZTP could be derived to meet specific objectives, e.g., make it so | some links. Variations of ZTP (Zero Touch Provisioning) could be | |||
| that most routers have at least 2 parents to reach the ToF. | derived to meet specific objectives, e.g., make it so that most | |||
| routers have at least 2 parents to reach the ToF. | ||||
| * RIFT applies to any Destination-Oriented DAG (DODAG) where there's | * RIFT applies to any Destination-Oriented DAG (DODAG) where there's | |||
| only one ToF node and the problem of disaggregation does not | only one ToF node and the problem of disaggregation does not | |||
| exist. In that case, RIFT operates very much like RPL [RFC6550], | exist. In that case, RIFT operates very much like RPL [RFC6550], | |||
| but using Link State for southbound routes (downwards in RPL's | but using Link State for southbound routes (downwards in RPL's | |||
| terms). For an arbitrary DAG with multiple destinations (ToFs) | terms). For an arbitrary DAG with multiple destinations (ToFs) | |||
| the way disaggregation happens has to be considered. | the way disaggregation happens has to be considered. | |||
| * Positive disaggregation expects that most of the ToF nodes reach | * Positive disaggregation expects that most of the ToF nodes reach | |||
| most of the leaves, so disaggregation is the exception as opposed | most of the leaves, so disaggregation is the exception as opposed | |||
| to the rule. When this is no longer true, it makes sense to turn | to the rule. When this is no longer true, it makes sense to turn | |||
| off disaggregation and route between the ToF nodes over a ring, a | off disaggregation and route between the ToF nodes over a ring, a | |||
| full mesh, transit network, or a form of area zero. There again, | full mesh, transit network, or a form of area zero. There again, | |||
| this operation is similar to RPL operating as a single DODAG with | this operation is similar to RPL operating as a single DODAG with | |||
| a virtual root. | a virtual root. | |||
| * In order to aggregate and disaggregate routes, RIFT requires that | * In order to aggregate and disaggregate routes, RIFT requires that | |||
| all the ToF nodes share the full knowledge of the prefixes in the | all the ToF nodes share the full knowledge of the prefixes in the | |||
| fabric. | fabric. | |||
| * This can be achieved with a ring as suggested by the RIFT main | * This can be achieved with a ring as suggested by "RIFT" [RIFT], by | |||
| specification, by some preconfiguration, or using a | some preconfiguration, or using a synchronization with a common | |||
| synchronization with a common repository where all the active | repository where all the active prefixes are registered. | |||
| prefixes are registered. | ||||
| 3.2.4. Reachability of Internal Nodes in the Fabric | 4.2.4. Reachability of Internal Nodes in the Fabric | |||
| RIFT does not require that nodes have reachable addresses in the | RIFT does not require that nodes have reachable addresses in the | |||
| fabric, though it is clearly desirable for operational purposes. | fabric, though it is clearly desirable for operational purposes. | |||
| Under normal operating conditions this can be easily achieved by | Under normal operating conditions this can be easily achieved by | |||
| injecting the node's loopback address into North and South Prefix | injecting the node's loopback address into North and South Prefix | |||
| TIEs or other implementation specific mechanisms. | TIEs or other implementation specific mechanisms. | |||
| Special considerations arise when a node loses all northbound | Special considerations arise when a node loses all northbound | |||
| adjacencies, but is not at the top of the fabric. These are outside | adjacencies, but is not at the top of the fabric. These are outside | |||
| the scope of this document and could be discussed in a separate | the scope of this document and could be discussed in a separate | |||
| document. | document. | |||
| 3.3. Use Cases | 4.3. Use Cases | |||
| 3.3.1. Data Center Topologies | 4.3.1. Data Center Topologies | |||
| 3.3.1.1. Data Center Fabrics | ||||
| 4.3.1.1. Data Center Fabrics | ||||
| RIFT is suited for underlay routing in data center (DC) IP fabrics, | RIFT is suited for underlay routing in data center (DC) IP fabrics, | |||
| the vast majority of which seem to be currently (and for the | the vast majority of which seem to be currently (and for the | |||
| foreseeable future) Clos architectures. It significantly simplifies | foreseeable future) Clos architectures. It significantly simplifies | |||
| operation and deployment of such fabrics as described in Section 4 | operation and deployment of such fabrics as described in Section 5 | |||
| for environments compared to extensive proprietary provisioning and | for environments compared to extensive proprietary provisioning and | |||
| operational solutions. | operational solutions. | |||
| 3.3.1.2. Adaptations to Other Proposed Data Center Topologies | 4.3.1.2. Adaptations to Other Proposed Data Center Topologies | |||
| . +-----+ +-----+ | . +-----+ +-----+ | |||
| . | | | | | . | | | | | |||
| .+-+ S0 | | S1 | | .+-+ S0 | | S1 | | |||
| .| ++---++ ++---++ | .| ++---++ ++---++ | |||
| .| | | | | | .| | | | | | |||
| .| | +------------+ | | .| | +------------+ | | |||
| .| | | +------------+ | | .| | | +------------+ | | |||
| .| | | | | | .| | | | | | |||
| .| ++-+--+ +--+-++ | .| ++-+--+ +--+-++ | |||
| .| | | | | | .| | | | | | |||
| skipping to change at page 11, line 5 ¶ | skipping to change at page 12, line 41 ¶ | |||
| example of a shortcut between levels. In this example, sub-optimal | example of a shortcut between levels. In this example, sub-optimal | |||
| routing will occur when traffic is sent from L0 to L1 via S0's | routing will occur when traffic is sent from L0 to L1 via S0's | |||
| default route and back down through A0 or A1. In order to ensure | default route and back down through A0 or A1. In order to ensure | |||
| that only default routes from A0 or A1 are used, all leaves would be | that only default routes from A0 or A1 are used, all leaves would be | |||
| required to install each other's routes. | required to install each other's routes. | |||
| While various technical and operational challenges may require the | While various technical and operational challenges may require the | |||
| use of such modifications, discussion of those topics is outside the | use of such modifications, discussion of those topics is outside the | |||
| scope of this document. | scope of this document. | |||
| 3.3.2. Metro Fabrics | 4.3.2. Metro Fabrics | |||
| The demand for bandwidth is increasing steadily, driven primarily by | The demand for bandwidth is increasing steadily, driven primarily by | |||
| environments close to content producers (server farms connection via | environments close to content producers (server farms connection via | |||
| DC fabrics) but in proximity to content consumers as well. Consumers | DC fabrics) but in proximity to content consumers as well. Consumers | |||
| are often clustered in metro areas with their own network | are often clustered in metro areas with their own network | |||
| architectures that can benefit from simplified, regular Clos | architectures that can benefit from simplified, regular Clos | |||
| structures and hence from RIFT. | structures and hence from RIFT. | |||
| 3.3.3. Building Cabling | 4.3.3. Building Cabling | |||
| Commercial edifices are often cabled in topologies that are either | Commercial edifices are often cabled in topologies that are either | |||
| Clos or its isomorphic equivalents. The Clos can grow rather high | Clos or its isomorphic equivalents. The Clos can grow rather high | |||
| with many floors. That presents a challenge for traditional routing | with many floors. That presents a challenge for traditional routing | |||
| protocols (except BGP and by now largely phased-out PNNI) which do | protocols (except BGP and by now largely phased-out PNNI) which do | |||
| not support an arbitrary number of levels which RIFT does naturally. | not support an arbitrary number of levels which RIFT does naturally. | |||
| Moreover, due to the limited sizes of forwarding tables in network | Moreover, due to the limited sizes of forwarding tables in network | |||
| elements of building cabling, the minimum FIB size RIFT maintains | elements of building cabling, the minimum FIB size RIFT maintains | |||
| under normal conditions is cost-effective in terms of hardware and | under normal conditions is cost-effective in terms of hardware and | |||
| operational costs. | operational costs. | |||
| 3.3.4. Internal Router Switching Fabrics | 4.3.4. Internal Router Switching Fabrics | |||
| It is common in high-speed communications switching and routing | It is common in high-speed communications switching and routing | |||
| devices to use fabrics when a crossbar is not feasible due to cost, | devices to use fabrics when a crossbar is not feasible due to cost, | |||
| head-of-line blocking or size trade-offs. Normally such fabrics are | head-of-line blocking or size trade-offs. Normally such fabrics are | |||
| not self-healing or rely on 1:/+1 protection schemes but it is | not self-healing or rely on 1:/+1 protection schemes but it is | |||
| conceivable to use RIFT to operate Clos fabrics that can deal | conceivable to use RIFT to operate Clos fabrics that can deal | |||
| effectively with interconnections or subsystem failures in such | effectively with interconnections or subsystem failures in such | |||
| modules. RIFT is not IP specific, and hence any link addressing | modules. RIFT is not IP specific, and hence any link addressing | |||
| connecting internal device subnets is conceivable. | connecting internal device subnets is conceivable. | |||
| 3.3.5. CloudCO | 4.3.5. CloudCO | |||
| The Cloud Central Office (CloudCO) is a new stage of telecom Central | The Cloud Central Office (CloudCO) is a new stage of telecom Central | |||
| Office. It takes the advantage of Software Defined Networking (SDN) | Office. It takes the advantage of Software Defined Networking (SDN) | |||
| and Network Function Virtualization (NFV) in conjunction with general | and Network Function Virtualization (NFV) in conjunction with general | |||
| purpose hardware to optimize current networks. The following figure | purpose hardware to optimize current networks. The following figure | |||
| illustrates this architecture at a high level. It describes a single | illustrates this architecture at a high level. It describes a single | |||
| instance or macro-node of cloud CO that provides a number of Value | instance or macro-node of cloud CO that provides a number of Value | |||
| Added Services (VAS), a Broadband Access Abstraction (BAA), and | Added Services (VAS), a Broadband Access Abstraction (BAA), and | |||
| virtualized network services. An Access I/O module faces a Cloud CO | virtualized network services. An Access I/O module faces a Cloud CO | |||
| access node, and the Customer Premises Equipments (CPEs) behind it. | access node, and the Customer Premises Equipments (CPEs) behind it. | |||
| skipping to change at page 13, line 5 ¶ | skipping to change at page 15, line 5 ¶ | |||
| | | | | | | |||
| ++-----------+ +---------++ | ++-----------+ +---------++ | |||
| |Network I/O | |Access I/O| | |Network I/O | |Access I/O| | |||
| +------------+ +----------+ | +------------+ +----------+ | |||
| Figure 3: An example of CloudCO architecture | Figure 3: An example of CloudCO architecture | |||
| The Spine-Leaf architecture deployed inside CloudCO meets the network | The Spine-Leaf architecture deployed inside CloudCO meets the network | |||
| requirements of adaptability, agility, scalability and dynamism. | requirements of adaptability, agility, scalability and dynamism. | |||
| 4. Operational Considerations | 5. Operational Considerations | |||
| RIFT presents the opportunity for organizations building and | RIFT presents the opportunity for organizations building and | |||
| operating IP fabrics to simplify their operation and deployments | operating IP fabrics to simplify their operation and deployments | |||
| while achieving many desirable properties of a dynamic routing on | while achieving many desirable properties of a dynamic routing on | |||
| such a substrate: | such a substrate: | |||
| * RIFT only floods routing information to the devices that | * RIFT only floods routing information to the devices that | |||
| absolutely need it. RIFT design follows minimum blast radius and | absolutely need it. RIFT design follows minimum blast radius and | |||
| minimum necessary epistemological scope philosophy which leads to | minimum necessary epistemological scope philosophy which leads to | |||
| good scaling properties while delivering maximum reactiveness. | good scaling properties while delivering maximum reactiveness. | |||
| skipping to change at page 14, line 16 ¶ | skipping to change at page 16, line 16 ¶ | |||
| the fabric. In conjunction with [RFC8505], RIFT can differentiate | the fabric. In conjunction with [RFC8505], RIFT can differentiate | |||
| anycast advertisements from mobility events and retain only the | anycast advertisements from mobility events and retain only the | |||
| most recent advertisement in the latter case. | most recent advertisement in the latter case. | |||
| * Many further operational and design points collected over many | * Many further operational and design points collected over many | |||
| years of routing protocol deployments have been incorporated in | years of routing protocol deployments have been incorporated in | |||
| RIFT such as fast flooding rates, protection of information | RIFT such as fast flooding rates, protection of information | |||
| lifetimes and operationally easily recognizable remote ends of | lifetimes and operationally easily recognizable remote ends of | |||
| links and node names. | links and node names. | |||
| 4.1. South Reflection | 5.1. South Reflection | |||
| South reflection is a mechanism whereby South Node TIEs are "reflected" | South reflection is a mechanism whereby South Node TIEs are "reflected" | |||
| back up north to allow nodes in same level without East-west links to | back up north to allow nodes in same level without east-west links to | |||
| "see" each other. | "see" each other. | |||
| For example, Spine111\Spine112\Spine121\Spine122 reflects Node S-TIEs | For example, Spine111\Spine112\Spine121\Spine122 reflects Node S-TIEs | |||
| from ToF21 to ToF22 separately. Respectively, | from ToF21 to ToF22 separately. Respectively, | |||
| Spine111\Spine112\Spine121\Spine122 reflects Node S-TIEs from ToF22 | Spine111\Spine112\Spine121\Spine122 reflects Node S-TIEs from ToF22 | |||
| to ToF21 separately. So ToF22 and ToF21 see each other's node | to ToF21 separately. So ToF22 and ToF21 see each other's node | |||
| information as level 2 nodes. | information as level 2 nodes. | |||
| In an equivalent fashion, as the result of the south reflection | In an equivalent fashion, as the result of the south reflection | |||
| between Spine121-Leaf121-Spine122 and Spine121-Leaf122-Spine122, | between Spine121-Leaf121-Spine122 and Spine121-Leaf122-Spine122, | |||
| Spine121 and Spine122 know each other at level 1. | Spine121 and Spine122 know each other at level 1. | |||
| 4.2. Suboptimal Routing on Link Failures | 5.2. Suboptimal Routing on Link Failures | |||
| +--------+ +--------+ | +--------+ +--------+ | |||
| | ToF21 | | ToF22 | LEVEL 2 | | ToF21 | | ToF22 | LEVEL 2 | |||
| ++--+-+-++ ++-+--+-++ | ++--+-+-++ ++-+--+-++ | |||
| | | | | | | | + | | | | | | | | + | |||
| | | | | | | | linkTS8 | | | | | | | | linkTS8 | |||
| +-------------+ | +-+linkTS3+-+ | | | +-------------+ | +------------+ | +-+linkTS3+-+ | | | +-------------+ | |||
| | | | | | | + | | | | | | | | + | | |||
| | +----------------------------+ | linkTS7 | | | +---------------------------+ | linkTS7 | | |||
| | | | | + + + | | | | | | + + + | | |||
| | | | +-------+linkTS4+------------+ | | | | | +-------+linkTS4+------------+ | | |||
| | | | + + | | | | | | | + + | | | | |||
| | | | +------------+--+ | | | | | | +-------------+--+ | | | |||
| | | | | | linkTS6 | | | | | | | | linkTS6 | | | |||
| +-+----+-+ +-----+--+ ++--------+ +-+----+-+ | +-+----+-+ +-+----+-+ ++--------+ +-+----+-+ | |||
| |Spine111| |Spine112| |Spine121 | |Spine122| LEVEL 1 | |Spine111| |Spine112| |Spine121 | |Spine122| LEVEL 1 | |||
| +-+---+--+ +----+---+ +-+---+---+ +-+---+--+ | +-+---+--+ +-+----+-+ +-+---+---+ +-+----+-+ | |||
| | | | | | | | | | | | | | | | | | | |||
| | +--------------+ | + ++XX+linkSL6+---+ + | | +-------------+ | + ++XX+linkSL6+---+ + | |||
| | | | | linkSL5 | | linkSL8 | | | | | linkSL5 | | linkSL8 | |||
| | +------------+ | | + +---+linkSL7+-+ | + | | +-----------+ | | + +---+linkSL7+-+ | + | |||
| | | | | | | | | | | | | | | | | | | |||
| +-+---+-+ +--+--+-+ +-+---+-+ +--+-+--+ | +-+---+-+ +--+--+-+ +-+---+-+ +--+--+-+ | |||
| |Leaf111| |Leaf112| |Leaf121| |Leaf122| LEVEL 0 | |Leaf111| |Leaf112| |Leaf121| |Leaf122| LEVEL 0 | |||
| +-+-----+ ++------+ +-----+-+ +-+-----+ | +-+-----+ +-+-----+ +-----+-+ +-+-----+ | |||
| + + + + | + + + + | |||
| Prefix111 Prefix112 Prefix121 Prefix122 | Prefix111 Prefix112 Prefix121 Prefix122 | |||
| Figure 4: Suboptimal routing upon link failure use case | Figure 4: Suboptimal routing upon link failure use case | |||
| As shown in Figure 4, as the result of the south reflection between | As shown in Figure 4, as the result of the south reflection between | |||
| Spine121-Leaf121-Spine122 and Spine121-Leaf122-Spine122, Spine121 and | Spine121-Leaf121-Spine122 and Spine121-Leaf122-Spine122, Spine121 and | |||
| Spine122 know each other at level 1. | Spine122 know each other at level 1. | |||
| Without disaggregation mechanism, when linkSL6 fails, the packet from | Without disaggregation mechanism, when linkSL6 fails, the packet from | |||
| leaf121 to prefix122 will probably go up through linkSL5 to linkTS3 | leaf121 to prefix122 will probably go up through linkSL5 to linkTS3 | |||
| then go down through linkTS4 to linkSL8 to Leaf122 or go up through | then go down through linkTS4 to linkSL8 to Leaf122 or go up through | |||
| skipping to change at page 16, line 5 ¶ | skipping to change at page 18, line 5 ¶ | |||
| With disaggregation mechanism, when linkSL6 fails, Spine122 will | With disaggregation mechanism, when linkSL6 fails, Spine122 will | |||
| detect the failure according to the reflected node S-TIE from | detect the failure according to the reflected node S-TIE from | |||
| Spine121. Based on the disaggregation algorithm provided by RIFT, | Spine121. Based on the disaggregation algorithm provided by RIFT, | |||
| Spine122 will explicitly advertise prefix122 in Disaggregated Prefix | Spine122 will explicitly advertise prefix122 in Disaggregated Prefix | |||
| S-TIE PrefixesElement(prefix122, cost 1). The packet from leaf121 to | S-TIE PrefixesElement(prefix122, cost 1). The packet from leaf121 to | |||
| prefix122 will only be sent to linkSL7 following a longest-prefix | prefix122 will only be sent to linkSL7 following a longest-prefix | |||
| match to prefix 122 directly then go down through linkSL8 to Leaf122. | match to prefix 122 directly then go down through linkSL8 to Leaf122. | |||
| 4.3. Black-Holing on Link Failures | 5.3. Black-Holing on Link Failures | |||
| +--------+ +--------+ | +--------+ +--------+ | |||
| | ToF 21 | | ToF 22 | LEVEL 2 | | ToF 21 | | ToF 22 | LEVEL 2 | |||
| ++-+--+-++ ++-+--+-++ | ++-+--+-++ ++-+--+-++ | |||
| | | | | | | | + | | | | | | | | + | |||
| | | | | | | | linkTS8 | | | | | | | | linkTS8 | |||
| +--------------+ | +-+linkTS3+X+ | | | +--------------+ | +--------------+ | +-+linkTS3+X+ | | | +--------------+ | |||
| linkTS1 | | | | | + | | linkTS1 | | | | | + | | |||
| + +-----------------------------+ | linkTS7 | | + +-----------------------------+ | linkTS7 | | |||
| | | + | + + + | | | | + | + + + | | |||
| | | linkTS2 +-------+linkTS4+X+----------+ | | | | linkTS2 +-------+linkTS4+X+----------+ | | |||
| | + + + + | | | | | + + + + | | | | |||
| | linkTS5 +-+ +------------+--+ | | | | linkTS5 +-+ +------------+--+ | | | |||
| | + | | | linkTS6 | | | | + | | | linkTS6 | | | |||
| +-+----+-+ +-+----+-+ ++-------+ +-+-----++ | +-+----+-+ +-+----+-+ ++-------+ +-+-----++ | |||
| |Spine111| |Spine112| |Spine121| |Spine122| LEVEL 1 | |Spine111| |Spine112| |Spine121| |Spine122| LEVEL 1 | |||
| +-+---+--+ ++----+--+ +-+---+--+ +-+---+--+ | +-+---+--+ ++----+--+ +-+---+--+ +-+----+-+ | |||
| | | | | | | | | | | | | | | | | | | |||
| + +---------------+ | + +---+linkSL6+---+ + | + +---------------+ | + +---+linkSL6+---+ + | |||
| linkSL1 | | | linkSL5 | | linkSL8 | linkSL1 | | | linkSL5 | | linkSL8 | |||
| + +--+linkSL3+--+ | | + +---+linkSL7+-+ | + | + +--+linkSL3+--+ | | + +---+linkSL7+-+ | + | |||
| | | | | | | | | | | | | | | | | | | |||
| +-+---+-+ +--+--+-+ +-+---+-+ +--+-+--+ | +-+---+-+ +--+--+-+ +-+---+-+ +--+--+-+ | |||
| |Leaf111| |Leaf112| |Leaf121| |Leaf122| LEVEL 0 | |Leaf111| |Leaf112| |Leaf121| |Leaf122| LEVEL 0 | |||
| +-+-----+ ++------+ +-----+-+ +-+-----+ | +-+-----+ +-+-----+ +-----+-+ +-----+-+ | |||
| + + + + | + + + + | |||
| Prefix111 Prefix112 Prefix121 Prefix122 | Prefix111 Prefix112 Prefix121 Prefix122 | |||
| Figure 5: Black-holing upon link failure use case | Figure 5: Black-holing upon link failure use case | |||
| This scenario illustrates a case where a double link failure occurs and | This scenario illustrates a case where a double link failure occurs and | |||
| black-holing can result. | black-holing can result. | |||
| Without disaggregation mechanism, when linkTS3 and linkTS4 both fail, | Without disaggregation mechanism, when linkTS3 and linkTS4 both fail, | |||
| the packet from leaf111 to prefix122 would suffer 50% black-holing | the packet from leaf111 to prefix122 would suffer 50% black-holing | |||
| based on pure default route. The packet supposed to go up through | based on pure default route. The packet supposed to go up through | |||
| linkSL1 to linkTS1 then go down through linkTS3 or linkTS4 will be | linkSL1 to linkTS1 then go down through linkTS3 or linkTS4 will be | |||
| dropped. The packet supposed to go up through linkSL3 to linkTS2 | dropped. The packet supposed to go up through linkSL3 to linkTS2 | |||
| then go down through linkTS3 or linkTS4 will be dropped as well. | then go down through linkTS3 or linkTS4 will be dropped as well. | |||
| This is a case of black-holing. | This is a case of black-holing. | |||
| With disaggregation mechanism, when linkTS3 and linkTS4 both fail, | With disaggregation mechanism, when linkTS3 and linkTS4 both fail, | |||
| ToF22 will detect the failure according to the reflected node S-TIE | ToF22 will detect the failure according to the reflected node S-TIE | |||
| of ToF21 from Spine111\Spine112. Based on the disaggregation | of ToF21 from Spine111\Spine112. Based on the disaggregation | |||
| algorithm provided by RITF, ToF22 will explicitly originate an S-TIE | algorithm provided by RIFT, ToF22 will explicitly originate an S-TIE | |||
| with prefix 121 and prefix 122, that is flooded to spines 111, 112, | with prefix 121 and prefix 122, that is flooded to spines 111, 112, | |||
| 121 and 122. | 121 and 122. | |||
| The packet from leaf111 to prefix122 will not be routed to linkTS1 or | The packet from leaf111 to prefix122 will not be routed to linkTS1 or | |||
| linkTS2. The packet from leaf111 to prefix122 will only be routed to | linkTS2. The packet from leaf111 to prefix122 will only be routed to | |||
| linkTS5 or linkTS7 following a longest-prefix match to prefix122. | linkTS5 or linkTS7 following a longest-prefix match to prefix122. | |||
| 4.4. Zero Touch Provisioning (ZTP) | 5.4. Zero Touch Provisioning (ZTP) | |||
| RIFT is designed to require a very minimal configuration to simplify | RIFT is designed to require a very minimal configuration to simplify | |||
| its operation and avoid human errors; based on that minimal | its operation and avoid human errors; based on that minimal | |||
| information, Zero Touch Provisioning (ZTP) autoconfigures the key | information, Zero Touch Provisioning (ZTP) autoconfigures the key | |||
| operational parameters of all the RIFT nodes, that is, on the one | operational parameters of all the RIFT nodes, that is, on the one | |||
| hand, the SystemID of the node that must be unique in the RIFT | hand, the SystemID of the node that must be unique in the RIFT | |||
| network, and on the other hand the level of the node in the Fat Tree, | network, and on the other hand the level of the node in the Fat Tree, | |||
| which determines which peers are northwards "parents" and which are | which determines which peers are northwards "parents" and which are | |||
| southwards "children". | southwards "children". | |||
| skipping to change at page 17, line 40 ¶ | skipping to change at page 19, line 40 ¶ | |||
| ZTP requires that the administrator points out the Top-of-Fabric | ZTP requires that the administrator points out the Top-of-Fabric | |||
| (ToF) nodes to set the baseline from which the fabric topology is | (ToF) nodes to set the baseline from which the fabric topology is | |||
| derived. The Top-of-Fabric nodes are configured with TOP_OF_FABRIC | derived. The Top-of-Fabric nodes are configured with TOP_OF_FABRIC | |||
| flag which are initial 'seeds' needed for other ZTP nodes to derive | flag which are initial 'seeds' needed for other ZTP nodes to derive | |||
| their level in the topology. ZTP computes the level of each node | their level in the topology. ZTP computes the level of each node | |||
| based on the Highest Available Level (HAL) of the potential parent(s) | based on the Highest Available Level (HAL) of the potential parent(s) | |||
| nearest that baseline, which represents the superspine. In a | nearest that baseline, which represents the superspine. In a | |||
| fashion, RIFT can be seen as a distance-vector protocol that computes | fashion, RIFT can be seen as a distance-vector protocol that computes | |||
| a set of feasible successors towards the superspine and auto- | a set of feasible successors towards the superspine and auto- | |||
| configures the rest of the topology. In a fashion, RIFT can be seen | configures the rest of the topology. | |||
| as a distance-vector protocol that computes a set of feasible | ||||
| successors towards the superspine and auto-configures the rest of the | ||||
| topology. | ||||
| The autoconfiguration mechanism computes a global maximum of levels | The autoconfiguration mechanism computes a global maximum of levels | |||
| by diffusion. The derivation of the level of each node happens then | by diffusion. The derivation of the level of each node happens then | |||
| based on Link Information Elements (LIEs) received from its neighbors | based on Link Information Elements (LIEs) received from its neighbors | |||
| whereas each node (with the possible exception of configured leaves) | whereas each node (with the possible exception of configured leaves) | |||
| tries to attach at the highest possible point in the fabric. This | tries to attach at the highest possible point in the fabric. This | |||
| guarantees that even if the diffusion front reaches a node from | guarantees that even if the diffusion front reaches a node from | |||
| "below" faster than from "above", it will greedily abandon already | "below" faster than from "above", it will greedily abandon already | |||
| negotiated level derived from nodes topologically below it and | negotiated level derived from nodes topologically below it and | |||
| properly peer with nodes above. | properly peer with nodes above. | |||
| skipping to change at page 18, line 30 ¶ | skipping to change at page 20, line 20 ¶ | |||
| topology variants considered in RIFT. | topology variants considered in RIFT. | |||
| A RIFT node may also be configured to confine it to the leaf role | A RIFT node may also be configured to confine it to the leaf role | |||
| with the LEAF_ONLY flag. A leaf node can also be configured to | with the LEAF_ONLY flag. A leaf node can also be configured to | |||
| support leaf-2-leaf procedures with the LEAF_2_LEAF flag. In either | support leaf-2-leaf procedures with the LEAF_2_LEAF flag. In either | |||
| case the node cannot be TOP_OF_FABRIC and its level cannot be | case the node cannot be TOP_OF_FABRIC and its level cannot be | |||
| configured. RIFT will fully configure the node's level after it is | configured. RIFT will fully configure the node's level after it is | |||
| attached to the topology and ensure that the node is at the "bottom | attached to the topology and ensure that the node is at the "bottom | |||
| of the hierarchy" (southernmost). | of the hierarchy" (southernmost). | |||
| 4.5. Mis-cabling Examples | 5.5. Mis-cabling Examples | |||
| +----------------+ +-----------------+ | +----------------+ +-----------------+ | |||
| | ToF21 | +------+ ToF22 | LEVEL 2 | | ToF21 | +------+ ToF22 | LEVEL 2 | |||
| +-------+----+---+ | +----+---+--------+ | +-------+----+---+ | +----+---+--------+ | |||
| | | | | | | | | | | | | | | | | | | | | |||
| | | | +----------------------------+ | | | | | +----------------------------+ | | |||
| | +---------------------------+ | | | | | | +---------------------------+ | | | | | |||
| | | | | | | | | | | | | | | | | | | | | |||
| | | | | +-----------------------+ | | | | | | | +-----------------------+ | | | |||
| | | +------------------------+ | | | | | | +------------------------+ | | | | |||
| skipping to change at page 20, line 44 ¶ | skipping to change at page 22, line 40 ¶ | |||
| +-------+ +-------+ +-+-----+ +-+-----+ | +-------+ +-------+ +-+-----+ +-+-----+ | |||
| | | | | | | |||
| | +--------+ | | +--------+ | |||
| | | | | | | |||
| +-+---+-+ | +-+---+-+ | |||
| |Spine11| | |Spine11| | |||
| +-------+ | +-------+ | |||
| Figure 8: Fallen spine | Figure 8: Fallen spine | |||
| 4.6. Positive vs. Negative Disaggregation | 5.6. Positive vs. Negative Disaggregation | |||
| Disaggregation is the procedure whereby [RIFT] advertises a more | Disaggregation is the procedure whereby [RIFT] advertises a more | |||
| specific route southwards as an exception to the aggregated fabric- | specific route southwards as an exception to the aggregated fabric- | |||
| default north. Disaggregation is useful when a prefix within the | default north. Disaggregation is useful when a prefix within the | |||
| aggregation is reachable via some of the parents but not the others | aggregation is reachable via some of the parents but not the others | |||
| at the same level of the fabric. It is mandatory when the level is | at the same level of the fabric. It is mandatory when the level is | |||
| the ToF since a ToF node that cannot reach a prefix becomes a black | the ToF since a ToF node that cannot reach a prefix becomes a black | |||
| hole for that prefix. The hard problem is to know which prefixes are | hole for that prefix. The hard problem is to know which prefixes are | |||
| reachable by whom. | reachable by whom. | |||
| skipping to change at page 22, line 32 ¶ | skipping to change at page 24, line 24 ¶ | |||
| node(s) that announces a more-specific route attracts all the traffic | node(s) that announces a more-specific route attracts all the traffic | |||
| for that route and may suffer from a transient incast. A ToP node | for that route and may suffer from a transient incast. A ToP node | |||
| that defers injecting the longer prefix in the FIB, in order to | that defers injecting the longer prefix in the FIB, in order to | |||
| receive more advertisements and spread the packets better, also keeps | receive more advertisements and spread the packets better, also keeps | |||
| on sending a portion of the traffic to the black hole in the | on sending a portion of the traffic to the black hole in the | |||
| meantime. In the case of Negative Disaggregation, the last ToF | meantime. In the case of Negative Disaggregation, the last ToF | |||
| node(s) that injects the route may also incur an incast issue; this | node(s) that injects the route may also incur an incast issue; this | |||
| problem would occur if a prefix that becomes totally unreachable is | problem would occur if a prefix that becomes totally unreachable is | |||
| disaggregated, but doing so is mostly useless and is not recommended. | disaggregated, but doing so is mostly useless and is not recommended. | |||
| 4.7. Mobile Edge and Anycast | 5.7. Mobile Edge and Anycast | |||
| When a physical or a virtual node changes its point of attachment in | When a physical or a virtual node changes its point of attachment in | |||
| the fabric from a previous-leaf to a next-leaf, new routes must be | the fabric from a previous-leaf to a next-leaf, new routes must be | |||
| installed that supersede the old ones. Since the flooding flows | installed that supersede the old ones. Since the flooding flows | |||
| northwards, the nodes (if any) between the previous-leaf and the | northwards, the nodes (if any) between the previous-leaf and the | |||
| common parent are not immediately aware that the path via previous- | common parent are not immediately aware that the path via previous- | |||
| leaf is obsolete, and a stale route may exist for a while. The | leaf is obsolete, and a stale route may exist for a while. The | |||
| common parent needs to select the freshest route advertisement in | common parent needs to select the freshest route advertisement in | |||
| order to install the correct route via the next-leaf. This requires | order to install the correct route via the next-leaf. This requires | |||
| that the fabric determines the sequence of the movements of the | that the fabric determines the sequence of the movements of the | |||
| skipping to change at page 24, line 11 ¶ | skipping to change at page 25, line 43 ¶ | |||
| may be used to protect the ownership of an address [RFC8928]. When | may be used to protect the ownership of an address [RFC8928]. When | |||
| using [RFC8505], the parallel registration of an anycast address to | using [RFC8505], the parallel registration of an anycast address to | |||
| multiple leaves is done with the same sequence counter, whereas the | multiple leaves is done with the same sequence counter, whereas the | |||
| sequence counter is incremented when the point of attachment | sequence counter is incremented when the point of attachment | |||
| changes. This way, it is possible to differentiate a mobile node | changes. This way, it is possible to differentiate a mobile node | |||
| from a multihomed node, even when the mobility happens within the | from a multihomed node, even when the mobility happens within the | |||
| timing precision. It is also possible for a mobile node to be | timing precision. It is also possible for a mobile node to be | |||
| multihomed as well, e.g., to change only one of its points of | multihomed as well, e.g., to change only one of its points of | |||
| attachment. | attachment. | |||
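
The sequence-counter rule described above can be illustrated with a minimal sketch. It is not taken from either draft revision or from [RFC8505]: the `AttachmentState` class, the `register` function and the leaf names are hypothetical, and counters are compared as plain integers, so the wrap-around handling of a real sequence counter is left out.

```python
from dataclasses import dataclass, field


@dataclass
class AttachmentState:
    """Leaves currently believed to own one address (hypothetical structure)."""
    seq: int = -1
    leaves: set = field(default_factory=set)


def register(state: AttachmentState, leaf: str, seq: int) -> AttachmentState:
    """Apply one registration as seen by the common parent.

    Assumption: counters are compared as plain integers; wrap-around
    handling of a real sequence counter is deliberately left out.
    """
    if seq > state.seq:
        # Newer counter: the node moved, so older attachments are stale.
        return AttachmentState(seq=seq, leaves={leaf})
    if seq == state.seq:
        # Same counter: parallel registration (anycast or multihoming).
        return AttachmentState(seq=seq, leaves=state.leaves | {leaf})
    # Older counter: stale advertisement, ignore it.
    return state


state = AttachmentState()
state = register(state, "leaf-A", 5)
state = register(state, "leaf-B", 5)   # same counter: both leaves are kept
multihomed = sorted(state.leaves)
state = register(state, "leaf-C", 6)   # higher counter: node moved to leaf-C
state = register(state, "leaf-A", 5)   # late, stale advertisement is ignored
moved = sorted(state.leaves)
print(multihomed, moved)
```

Comparing counters rather than arrival times is what lets the sketch tell a move from a parallel registration even when both happen within the timing precision.
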
| 4.8. IPv4 over IPv6 | 5.8. IPv4 over IPv6 | |||
| RIFT allows advertising IPv4 prefixes over an IPv6 RIFT network. The IPv6 | RIFT allows advertising IPv4 prefixes over an IPv6 RIFT network. The IPv6 | |||
| Address Family (AF) is configured via the usual Neighbor Discovery (ND) | Address Family (AF) is configured via the usual Neighbor Discovery (ND) | |||
| mechanisms and then V4 can use V6 nexthops analogous to [RFC5549]. | mechanisms and then V4 can use V6 nexthops analogous to [RFC8950]. | |||
| It is expected that the whole fabric supports the same type of | It is expected that the whole fabric supports the same type of | |||
| forwarding of address families on all the links. RIFT provides an | forwarding of address families on all the links. RIFT provides an | |||
| indication whether a node is v4 forwarding capable and | indication whether a node is v4 forwarding capable and | |||
| implementations are possible where different routing tables are | implementations are possible where different routing tables are | |||
| computed per address family as long as the computation remains loop- | computed per address family as long as the computation remains loop- | |||
| free. | free. | |||
| +-----+ +-----+ | +-----+ +-----+ | |||
| +---+---+ | ToF | | ToF | | +---+---+ | ToF | | ToF | | |||
| ^ +--+--+ +-----+ | ^ +--+--+ +-----+ | |||
| skipping to change at page 25, line 5 ¶ | skipping to change at page 26, line 34 ¶ | |||
| | | | | | | |||
| IPv4 prefixes| |IPv4 prefixes | IPv4 prefixes| |IPv4 prefixes | |||
| | | | | | | |||
| +---+----+ +---+----+ | +---+----+ +---+----+ | |||
| | V4 | | V4 | | | V4 | | V4 | | |||
| | subnet | | subnet | | | subnet | | subnet | | |||
| +--------+ +--------+ | +--------+ +--------+ | |||
| Figure 9: IPv4 over IPv6 | Figure 9: IPv4 over IPv6 | |||
| 4.9. In-Band Reachability of Nodes | 5.9. In-Band Reachability of Nodes | |||
| RIFT doesn't require that nodes of the fabric have reachable | RIFT doesn't require that nodes of the fabric have reachable | |||
| addresses, but there may be operational reasons to reach the internal | addresses, but there may be operational reasons to reach the internal | |||
| nodes. Figure 10 shows an example in which the network management | nodes. Figure 10 shows an example in which the network management | |||
| station (NMS) attaches to leaf1. | station (NMS) attaches to leaf1. | |||
| +-------+ +-------+ | +-------+ +-------+ | |||
| | ToF1 | | ToF2 | | | ToF1 | | ToF2 | | |||
| ++---- ++ ++-----++ | ++---- ++ ++-----++ | |||
| | | | | | | | | | | |||
| skipping to change at page 26, line 12 ¶ | skipping to change at page 28, line 5 ¶ | |||
| nodes, ToF2's loopback address must be disaggregated recursively all | nodes, ToF2's loopback address must be disaggregated recursively all | |||
| the way to the leaves. | the way to the leaves. | |||
| In a partitioned ToF, a ToF node is only reachable within its Plane, | In a partitioned ToF, a ToF node is only reachable within its Plane, | |||
| and the disaggregation to the leaves is also required. A possible | and the disaggregation to the leaves is also required. A possible | |||
| alternative is to use the ring that interconnects the ToF nodes to | alternative is to use the ring that interconnects the ToF nodes to | |||
| transmit packets between them for their loopback addresses only. The | transmit packets between them for their loopback addresses only. The | |||
| idea is that this is mostly control traffic and should not alter the | idea is that this is mostly control traffic and should not alter the | |||
| load balancing properties of the fabric. | load balancing properties of the fabric. | |||
| 4.10. Dual Homing Servers | 5.10. Dual Homing Servers | |||
| Each RIFT node may operate in Zero Touch Provisioning (ZTP) mode. It | Each RIFT node may operate in Zero Touch Provisioning (ZTP) mode. It | |||
| has no configuration (unless it is a Top-of-Fabric at the top of the | has no configuration (unless it is a Top-of-Fabric at the top of the | |||
| topology or it must operate in the topology as leaf and/or support | topology or it must operate in the topology as leaf and/or support | |||
| leaf-2-leaf procedures) and it will fully configure itself after | leaf-2-leaf procedures) and it will fully configure itself after | |||
| being attached to the topology. | being attached to the topology. | |||
| +---+ +---+ +---+ | +---+ +---+ +---+ | |||
| |ToF| |ToF| |ToF| ToF | |ToF| |ToF| |ToF| ToF | |||
| +---+ +---+ +---+ | +---+ +---+ +---+ | |||
| skipping to change at page 27, line 11 ¶ | skipping to change at page 29, line 5 ¶ | |||
| Rack) to all the leaves become not available. All the servers' | Rack) to all the leaves become not available. All the servers' | |||
| routes are disaggregated and the FIB of the servers will be expanded | routes are disaggregated and the FIB of the servers will be expanded | |||
| with n-1 more specific routes. | with n-1 more specific routes. | |||
| Sometimes, operators may prefer to disaggregate from ToR to servers from | Sometimes, operators may prefer to disaggregate from ToR to servers from | |||
| the start, i.e., the servers hold a few tens of routes in the FIB from | the start, i.e., the servers hold a few tens of routes in the FIB from | |||
| the start, besides the default routes, to avoid breakages at rack level. | the start, besides the default routes, to avoid breakages at rack level. | |||
| Full disaggregation of the fabric could be achieved by configuration | Full disaggregation of the fabric could be achieved by configuration | |||
| supported by RIFT. | supported by RIFT. | |||
| 4.11. Fabric With A Controller | 5.11. Fabric With A Controller | |||
| There are many different ways to deploy the controller. One | There are many different ways to deploy the controller. One | |||
| possibility is attaching a controller to the RIFT domain from ToF and | possibility is attaching a controller to the RIFT domain from ToF and | |||
| another possibility is attaching a controller from the leaf. | another possibility is attaching a controller from the leaf. | |||
| +------------+ | +------------+ | |||
| | Controller | | | Controller | | |||
| ++----------++ | ++----------++ | |||
| | | | | | | |||
| | | | | | | |||
| skipping to change at page 27, line 42 ¶ | skipping to change at page 29, line 36 ¶ | |||
| | | | | | | | | | | | | |||
| | | +-------------+ | | | | +-------------+ | | |||
| | | +--------+ | | | | | +--------+ | | | |||
| | | | | | | | | | | | | |||
| | +-----+ +-+---+ | | +-----+ +-+---+ | |||
| ------- |Leaf | | Leaf| | ------- |Leaf | | Leaf| | |||
| +-----+ +-----+ | +-----+ +-----+ | |||
| Figure 12: Fabric with a controller | Figure 12: Fabric with a controller | |||
| 4.11.1. Controller Attached to ToFs | 5.11.1. Controller Attached to ToFs | |||
| If a controller is attaching to the RIFT domain from ToF, it usually | If a controller is attaching to the RIFT domain from ToF, it usually | |||
| uses dual-homing connections. The loopback prefix of the controller | uses dual-homing connections. The loopback prefix of the controller | |||
| should be advertised down by the ToF and spine to leaves. If the | should be advertised down by the ToF and spine to leaves. If the | |||
| controller loses its link to a ToF, make sure that ToF withdraws the | controller loses its link to a ToF, make sure that ToF withdraws the | |||
| prefix of the controller (different mechanisms can be used). | prefix of the controller (different mechanisms can be used). | |||
| 4.11.2. Controller Attached to Leaf | 5.11.2. Controller Attached to Leaf | |||
| If the controller is attaching from a leaf to the fabric, no special | If the controller is attaching from a leaf to the fabric, no special | |||
| provisions are needed. | provisions are needed. | |||
| 4.12. Internet Connectivity With Underlay | 5.12. Internet Connectivity With Underlay | |||
| If global addressing is running without overlay, an external default | If global addressing is running without overlay, an external default | |||
| route needs to be advertised through RIFT fabric to achieve internet | route needs to be advertised through RIFT fabric to achieve internet | |||
| connectivity. For the purpose of forwarding of the entire RIFT | connectivity. For the purpose of forwarding of the entire RIFT | |||
| fabric, an internal fabric prefix needs to be advertised in the South | fabric, an internal fabric prefix needs to be advertised in the South | |||
| Prefix TIE by ToF and spine nodes. | Prefix TIE by ToF and spine nodes. | |||
| 4.12.1. Internet Default on the Leaf | 5.12.1. Internet Default on the Leaf | |||
| In case that an internet access request comes from a leaf and the | In case that an internet access request comes from a leaf and the | |||
| internet gateway is another leaf, the leaf node as the internet | internet gateway is another leaf, the leaf node as the internet | |||
| gateway needs to advertise a default route in its Prefix North TIE. | gateway needs to advertise a default route in its Prefix North TIE. | |||
| 4.12.2. Internet Default on the ToFs | 5.12.2. Internet Default on the ToFs | |||
| In case that an internet access request comes from a leaf and the | In case that an internet access request comes from a leaf and the | |||
| internet gateway is a ToF, the ToF and spine nodes need to advertise | internet gateway is a ToF, the ToF and spine nodes need to advertise | |||
| a default route in the Prefix South TIE. | a default route in the Prefix South TIE. | |||
| 4.13. Subnet Mismatch and Address Families | 5.13. Subnet Mismatch and Address Families | |||
| +--------+ +--------+ | +--------+ +--------+ | |||
| | | LIE LIE | | | | | LIE LIE | | | |||
| | A | +----> <----+ | B | | | A | +----> <----+ | B | | |||
| | +---------------------+ | | | +---------------------+ | | |||
| +--------+ +--------+ | +--------+ +--------+ | |||
| X/24 Y/24 | X/24 Y/24 | |||
| Figure 13: subnet mismatch | Figure 13: subnet mismatch | |||
| LIEs are exchanged over all links running RIFT to perform Link | LIEs are exchanged over all links running RIFT to perform Link | |||
| (Neighbor) Discovery. A node MUST NOT originate LIEs on an address | (Neighbor) Discovery. A node must NOT originate LIEs on an address | |||
| family if it does not process received LIEs on that family. LIEs on the | family if it does not process received LIEs on that family. LIEs on the | |||
| same link are considered part of the same negotiation independent of | same link are considered part of the same negotiation independent of | |||
| the address family they arrive on. An implementation MUST be ready | the address family they arrive on. An implementation must be ready | |||
| to accept TIEs on all addresses it used as source of LIE frames. | to accept TIEs on all addresses it used as source of LIE frames. | |||
| As shown in the above figure, without further checks adjacency of | As shown in the above figure, without further checks adjacency of | |||
| node A and B may form, but the forwarding between node A and node B | node A and B may form, but the forwarding between node A and node B | |||
| may fail because subnet X mismatches with subnet Y. | may fail because subnet X mismatches with subnet Y. | |||
| To prevent this, a RIFT implementation should check for subnet | To prevent this, a RIFT implementation should check for subnet | |||
| mismatch just as, e.g., IS-IS does. This can lead to scenarios where | mismatch just as, e.g., IS-IS does. This can lead to scenarios where | |||
| an adjacency, despite the exchange of LIEs in both address families, may | an adjacency, despite the exchange of LIEs in both address families, may | |||
| end up established in a single AF only. This is a | end up established in a single AF only. This is a | |||
| consideration especially in Section 4.8 scenarios. | consideration especially in Section 5.8 scenarios. | |||
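
As an illustration of the subnet check discussed above, the following is a minimal sketch using Python's standard `ipaddress` module. It is not part of either draft revision or of any RIFT implementation: the function names and the documentation-range addresses are hypothetical, and a real implementation would take the addresses from received LIEs and local interface configuration.

```python
import ipaddress


def same_subnet(local_if: str, neighbor_if: str) -> bool:
    """True if both interface addresses sit in the same subnet of the same AF."""
    local = ipaddress.ip_interface(local_if)
    neighbor = ipaddress.ip_interface(neighbor_if)
    if local.version != neighbor.version:
        return False
    return local.network == neighbor.network


def adjacency_families(local_addrs, neighbor_addrs):
    """Return the IP versions (4, 6) for which some address pair matches."""
    families = set()
    for local in local_addrs:
        for neighbor in neighbor_addrs:
            if same_subnet(local, neighbor):
                families.add(ipaddress.ip_interface(local).version)
    return families


# Hypothetical addresses: the IPv4 subnets differ (X/24 vs Y/24),
# while the IPv6 subnet is shared.
node_a = ["192.0.2.1/24", "2001:db8:1::1/64"]
node_b = ["198.51.100.2/24", "2001:db8:1::2/64"]
result = adjacency_families(node_a, node_b)
print(result)
```

In this sketch the pair of nodes would end up with an adjacency in the IPv6 address family only, which is the single-AF outcome the text warns about.
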
| 4.14. Anycast Considerations | 5.14. Anycast Considerations | |||
| + traffic | + traffic | |||
| | | | | |||
| v | v | |||
| +------+------+ | +------+------+ | |||
| | ToF | | | ToF | | |||
| +---+-----+---+ | +---+-----+---+ | |||
| | | | | | | | | | | |||
| +------------+ | | +------------+ | +------------+ | | +------------+ | |||
| | | | | | | | | | | |||
| skipping to change at page 29, line 48 ¶ | skipping to change at page 31, line 48 ¶ | |||
| + traffic | + traffic | |||
| Figure 14: Anycast | Figure 14: Anycast | |||
| If the traffic comes from the ToF to Leaf111 or Leaf121, which have the | If the traffic comes from the ToF to Leaf111 or Leaf121, which have the | |||
| anycast prefix PrefixA, RIFT can deal with this case well. But if the | anycast prefix PrefixA, RIFT can deal with this case well. But if the | |||
| traffic comes from Leaf122, it arrives at Spine21 or Spine22 at level 1. | traffic comes from Leaf122, it arrives at Spine21 or Spine22 at level 1. | |||
| Spine21 and Spine22 don't know about the other PrefixA attached to | Spine21 and Spine22 don't know about the other PrefixA attached to | |||
| Leaf111, so the traffic will always get to Leaf121 and never to Leaf111. | Leaf111, so the traffic will always get to Leaf121 and never to Leaf111. | |||
| If the intention is that the traffic should be offloaded to | If the intention is that the traffic should be offloaded to | |||
| Leaf111, then use policy guided prefixes defined in "Routing in Fat | Leaf111, then use policy guided prefixes defined in RIFT [RIFT]. | |||
| Trees" [RIFT]. | ||||
| 4.15. IoT Applicability | 5.15. IoT Applicability | |||
| The design of RIFT inherits from RPL [RFC6550] the anisotropic design | The design of RIFT inherits from RPL [RFC6550] the anisotropic design | |||
| of a default route upwards (northwards); it also inherits the | of a default route upwards (northwards); it also inherits the | |||
| capability to inject external host routes at the Leaf level using | capability to inject external host routes at the Leaf level using | |||
| Wireless ND (WiND) [RFC8505][RFC8928] between a RIFT-agnostic host | Wireless ND (WiND) [RFC8505][RFC8928] between a RIFT-agnostic host | |||
| and a RIFT router. Both the RPL and the RIFT protocols are meant for | and a RIFT router. Both the RPL and the RIFT protocols are meant for | |||
| large scale, and WiND enables device mobility at the edge the same | large scale, and WiND enables device mobility at the edge the same | |||
| way in both cases. | way in both cases. | |||
| The main difference between RIFT and RPL is that with RPL, there's a | The main difference between RIFT and RPL is that with RPL, there's a | |||
| skipping to change at page 30, line 43 ¶ | skipping to change at page 32, line 43 ¶ | |||
| within bounded latency. | within bounded latency. | |||
| This could be alleviated with Packet Replication, Elimination and | This could be alleviated with Packet Replication, Elimination and | |||
| Reordering (PREOF) [RFC8655] leaf-2-leaf but PREOF is hard to provide | Reordering (PREOF) [RFC8655] leaf-2-leaf but PREOF is hard to provide | |||
| at the scale of all flows, and the replication may increase the | at the scale of all flows, and the replication may increase the | |||
| probability of the overload that it attempts to solve. | probability of the overload that it attempts to solve. | |||
| Note that the load balancing is not RIFT's problem, but it is key to | Note that the load balancing is not RIFT's problem, but it is key to | |||
| serve IoT adequately. | serve IoT adequately. | |||
| 4.16. Key Management | 5.16. Key Management | |||
| As outlined in Section "Security Considerations" of [RIFT], either a | As outlined in Section "Security Considerations" of [RIFT], either a | |||
| private shared key or a public/private key pair is used to | private shared key or a public/private key pair is used to | |||
| authenticate the adjacency. Both the key distribution and key | authenticate the adjacency. Both the key distribution and key | |||
| synchronization methods are out of scope for this document. Both | synchronization methods are out of scope for this document. Both | |||
| nodes in the adjacency must share the same keys, key type, and | nodes in the adjacency must share the same keys, key type, and | |||
| algorithm for a given key ID. Mismatched keys will not inter-operate | algorithm for a given key ID. Mismatched keys will not inter-operate | |||
| as their security envelopes will be unverifiable. | as their security envelopes will be unverifiable. | |||
| Key roll-over while the adjacency is active MAY be supported. The | Key roll-over while the adjacency is active may be supported. The | |||
| specific mechanism is well documented in [RFC6518]. | specific mechanism is well documented in [RFC6518]. | |||
| 5. Security Considerations | 6. Security Considerations | |||
| This document presents applicability of RIFT. As such, it does not | This document presents applicability of RIFT. As such, it does not | |||
| introduce any security considerations. However, a number of | introduce any security considerations. However, a number of | |||
| security concerns are discussed in [RIFT]. | security concerns are discussed in [RIFT]. | |||
| 6. Contributors | 7. IANA Considerations | |||
| This document has no IANA actions. | ||||
| 8. Contributors | ||||
| The following people (listed in alphabetical order) contributed | The following people (listed in alphabetical order) contributed | |||
| significantly to the content of this document and should be | significantly to the content of this document and should be | |||
| considered co-authors: | considered co-authors: | |||
| Tom Verhaeg | ||||
| Juniper Networks | ||||
| Email: tverhaeg@juniper.net | ||||
| Tony Przygienda | Tony Przygienda | |||
| Juniper Networks | Juniper Networks | |||
| 1194 N. Mathilda Ave | 1194 N. Mathilda Ave | |||
| Sunnyvale, CA 94089 | Sunnyvale, CA 94089 | |||
| US | US | |||
| Email: prz@juniper.net | Email: prz@juniper.net | |||
| 7. Normative References | 9. Normative References | |||
| [ISO10589-Second-Edition] | [ISO10589-Second-Edition] | |||
| International Organization for Standardization, | International Organization for Standardization, | |||
| "Intermediate system to Intermediate system intra-domain | "Intermediate system to Intermediate system intra-domain | |||
| routeing information exchange protocol for use in | routeing information exchange protocol for use in | |||
| conjunction with the protocol for providing the | conjunction with the protocol for providing the | |||
| connectionless-mode Network Service (ISO 8473)", November | connectionless-mode Network Service (ISO 8473)", November | |||
| 2002. | 2002. | |||
| [TR-384] Broadband Forum Technical Report, "TR-384 Cloud Central | [TR-384] Broadband Forum Technical Report, "TR-384 Cloud Central | |||
| skipping to change at page 32, line 16 ¶ | skipping to change at page 34, line 28 ¶ | |||
| Babiarz, "A Two-Way Active Measurement Protocol (TWAMP)", | Babiarz, "A Two-Way Active Measurement Protocol (TWAMP)", | |||
| RFC 5357, DOI 10.17487/RFC5357, October 2008, | RFC 5357, DOI 10.17487/RFC5357, October 2008, | |||
| <https://www.rfc-editor.org/info/rfc5357>. | <https://www.rfc-editor.org/info/rfc5357>. | |||
| [RFC7130] Bhatia, M., Ed., Chen, M., Ed., Boutros, S., Ed., | [RFC7130] Bhatia, M., Ed., Chen, M., Ed., Boutros, S., Ed., | |||
| Binderberger, M., Ed., and J. Haas, Ed., "Bidirectional | Binderberger, M., Ed., and J. Haas, Ed., "Bidirectional | |||
| Forwarding Detection (BFD) on Link Aggregation Group (LAG) | Forwarding Detection (BFD) on Link Aggregation Group (LAG) | |||
| Interfaces", RFC 7130, DOI 10.17487/RFC7130, February | Interfaces", RFC 7130, DOI 10.17487/RFC7130, February | |||
| 2014, <https://www.rfc-editor.org/info/rfc7130>. | 2014, <https://www.rfc-editor.org/info/rfc7130>. | |||
| [RFC5549] Le Faucheur, F. and E. Rosen, "Advertising IPv4 Network | [RFC8950] Litkowski, S., Agrawal, S., Ananthamurthy, K., and K. | |||
| Layer Reachability Information with an IPv6 Next Hop", | Patel, "Advertising IPv4 Network Layer Reachability | |||
| RFC 5549, DOI 10.17487/RFC5549, May 2009, | Information (NLRI) with an IPv6 Next Hop", RFC 8950, | |||
| <https://www.rfc-editor.org/info/rfc5549>. | DOI 10.17487/RFC8950, November 2020, | |||
| <https://www.rfc-editor.org/info/rfc8950>. | ||||
| [RFC6518] Lebovitz, G. and M. Bhatia, "Keying and Authentication for | [RFC6518] Lebovitz, G. and M. Bhatia, "Keying and Authentication for | |||
| Routing Protocols (KARP) Design Guidelines", RFC 6518, | Routing Protocols (KARP) Design Guidelines", RFC 6518, | |||
| DOI 10.17487/RFC6518, February 2012, | DOI 10.17487/RFC6518, February 2012, | |||
| <https://www.rfc-editor.org/info/rfc6518>. | <https://www.rfc-editor.org/info/rfc6518>. | |||
| [RFC6550] Winter, T., Ed., Thubert, P., Ed., Brandt, A., Hui, J., | [RFC6550] Winter, T., Ed., Thubert, P., Ed., Brandt, A., Hui, J., | |||
| Kelsey, R., Levis, P., Pister, K., Struik, R., Vasseur, | Kelsey, R., Levis, P., Pister, K., Struik, R., Vasseur, | |||
| JP., and R. Alexander, "RPL: IPv6 Routing Protocol for | JP., and R. Alexander, "RPL: IPv6 Routing Protocol for | |||
| Low-Power and Lossy Networks", RFC 6550, | Low-Power and Lossy Networks", RFC 6550, | |||
| skipping to change at page 32, line 44 ¶ | skipping to change at page 35, line 10 ¶ | |||
| Bormann, "Neighbor Discovery Optimization for IPv6 over | Bormann, "Neighbor Discovery Optimization for IPv6 over | |||
| Low-Power Wireless Personal Area Networks (6LoWPANs)", | Low-Power Wireless Personal Area Networks (6LoWPANs)", | |||
| RFC 6775, DOI 10.17487/RFC6775, November 2012, | RFC 6775, DOI 10.17487/RFC6775, November 2012, | |||
| <https://www.rfc-editor.org/info/rfc6775>. | <https://www.rfc-editor.org/info/rfc6775>. | |||
| [RFC8655] Finn, N., Thubert, P., Varga, B., and J. Farkas, | [RFC8655] Finn, N., Thubert, P., Varga, B., and J. Farkas, | |||
| "Deterministic Networking Architecture", RFC 8655, | "Deterministic Networking Architecture", RFC 8655, | |||
| DOI 10.17487/RFC8655, October 2019, | DOI 10.17487/RFC8655, October 2019, | |||
| <https://www.rfc-editor.org/info/rfc8655>. | <https://www.rfc-editor.org/info/rfc8655>. | |||
| [RIFT] Przygienda, T., Sharma, A., Thubert, P., Rijsman, B., and | [RIFT] Sharma, A., Thubert, P., Rijsman, B., and D. Afanasiev, | |||
| D. Afanasiev, "RIFT: Routing in Fat Trees", Work in | "RIFT: Routing in Fat Trees", Work in Progress, Internet- | |||
| Progress, Internet-Draft, draft-ietf-rift-rift-12, 26 May | Draft, draft-ietf-rift-rift-13, 12 July 2021, | |||
| 2020, | <https://datatracker.ietf.org/doc/html/draft-ietf-rift- | |||
| <https://tools.ietf.org/html/draft-ietf-rift-rift-12>. | rift-13>. | |||
| [I-D.white-distoptflood] | ||||
| White, R., Hegde, S., and S. Zandi, "IS-IS Optimal | ||||
| Distributed Flooding for Dense Topologies", Work in | ||||
| Progress, Internet-Draft, draft-white-distoptflood-04, 27 | ||||
| July 2020, | ||||
| <https://tools.ietf.org/html/draft-white-distoptflood-04>. | ||||
| 8. Informative References | 10. Informative References | |||
| [IEEEstd1588] | [IEEEstd1588] | |||
| IEEE standard for Information Technology, "IEEE Standard | IEEE standard for Information Technology, "IEEE Standard | |||
| for a Precision Clock Synchronization Protocol for | for a Precision Clock Synchronization Protocol for | |||
| Networked Measurement and Control Systems", | Networked Measurement and Control Systems", | |||
| <https://standards.ieee.org/standard/1588-2019.html>. | <https://standards.ieee.org/standard/1588-2019.html>. | |||
| [CLOS] Yuan, X., "On Nonblocking Folded-Clos Networks in Computer | [CLOS] Yuan, X., "On Nonblocking Folded-Clos Networks in Computer | |||
| Communication Environments", IEEE International Parallel & | Communication Environments", IEEE International Parallel & | |||
| Distributed Processing Symposium, 2011. | Distributed Processing Symposium, 2011. | |||
| skipping to change at page 34, line 37 ¶ | skipping to change at page 36, line 45 ¶ | |||
| Pascal Thubert | Pascal Thubert | |||
| Cisco Systems, Inc | Cisco Systems, Inc | |||
| Building D | Building D | |||
| 45 Allee des Ormes - BP1200 | 45 Allee des Ormes - BP1200 | |||
| 06254 MOUGINS - Sophia Antipolis | 06254 MOUGINS - Sophia Antipolis | |||
| France | France | |||
| Phone: +33 497 23 26 34 | Phone: +33 497 23 26 34 | |||
| Email: pthubert@cisco.com | Email: pthubert@cisco.com | |||
| Tom Verhaeg | ||||
| Juniper Networks | ||||
| Email: tverhaeg@juniper.net | ||||
| Jaroslaw Kowalczyk | Jaroslaw Kowalczyk | |||
| Orange Polska | Orange Polska | |||
| Email: jaroslaw.kowalczyk2@orange.com | Email: jaroslaw.kowalczyk2@orange.com | |||
| End of changes. 75 change blocks. | ||||
| 189 lines changed or deleted | 263 lines changed or added | |||