< draft-ietf-rift-applicability-04.txt   draft-ietf-rift-applicability-05.txt >
RIFT WG Yuehua. Wei, Ed.
Internet-Draft Zheng. Zhang
Intended status: Informational ZTE Corporation
Expires: 28 October 2021 Dmitry. Afanasiev
Yandex
P. Thubert
Cisco Systems
Tom. Verhaeg
Juniper Networks
Jaroslaw. Kowalczyk
Orange Polska
26 April 2021
RIFT Applicability
draft-ietf-rift-applicability-05
Abstract
This document discusses the properties, applicability and operational
considerations of RIFT in different network scenarios. It intends to
provide a rough guide to how RIFT can be deployed to simplify routing
operations in Clos topologies and their variations.
Status of This Memo
[...]
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current
Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on 28 October 2021.
Copyright Notice
Copyright (c) 2021 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of
publication of this document.
Please review these documents carefully, as they describe your rights
[...]
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3
2. Problem Statement of Routing in Modern IP Fabric Fat Tree
Networks . . . . . . . . . . . . . . . . . . . . . . . . 3
3. Applicability of RIFT to Clos IP Fabrics . . . . . . . . . . 3
3.1. Overview of RIFT . . . . . . . . . . . . . . . . . . . . 4
3.2. Applicable Topologies . . . . . . . . . . . . . . . . . . 6
3.2.1. Horizontal Links . . . . . . . . . . . . . . . . . . 6
3.2.2. Vertical Shortcuts . . . . . . . . . . . . . . . . . 7
3.2.3. Generalizing to any Directed Acyclic Graph . . . . . 7
3.3. Use Cases . . . . . . . . . . . . . . . . . . . . . . . . 8
3.3.1. Data Center Fabrics . . . . . . . . . . . . . . . . . 8
3.3.2. Metro Fabrics . . . . . . . . . . . . . . . . . . . . 9
3.3.3. Building Cabling . . . . . . . . . . . . . . . . . . 9
3.3.4. Internal Router Switching Fabrics . . . . . . . . . . 9
3.3.5. CloudCO . . . . . . . . . . . . . . . . . . . . . . . 9
4. Operational Considerations . . . . . . . . . . . . . . . . . 11
4.1. South Reflection . . . . . . . . . . . . . . . . . . . . 12
4.2. Suboptimal Routing on Link Failures . . . . . . . . . . . 12
4.3. Black-Holing on Link Failures . . . . . . . . . . . . . . 14
4.4. Zero Touch Provisioning (ZTP) . . . . . . . . . . . . . . 15
4.5. Mis-cabling Examples . . . . . . . . . . . . . . . . . . 16
4.6. Positive vs. Negative Disaggregation . . . . . . . . . . 18
4.7. Mobile Edge and Anycast . . . . . . . . . . . . . . . . . 20
4.8. IPv4 over IPv6 . . . . . . . . . . . . . . . . . . . . . 21
4.9. In-Band Reachability of Nodes . . . . . . . . . . . . . . 22
4.10. Dual Homing Servers . . . . . . . . . . . . . . . . . . . 24
4.11. Fabric With A Controller . . . . . . . . . . . . . . . . 25
4.11.1. Controller Attached to ToFs . . . . . . . . . . . . 25
4.11.2. Controller Attached to Leaf . . . . . . . . . . . . 25
4.12. Internet Connectivity With Underlay . . . . . . . . . . . 26
4.12.1. Internet Default on the Leaf . . . . . . . . . . . . 26
4.12.2. Internet Default on the ToFs . . . . . . . . . . . . 26
4.13. Subnet Mismatch and Address Families . . . . . . . . . . 26
4.14. Anycast Considerations . . . . . . . . . . . . . . . . . 27
4.15. IoT Applicability . . . . . . . . . . . . . . . . . . . . 28
5. Security Considerations . . . . . . . . . . . . . . . . . . . 28
6. Contributors . . . . . . . . . . . . . . . . . . . . . . . . 29
7. Normative References . . . . . . . . . . . . . . . . . . . . 29
8. Informative References . . . . . . . . . . . . . . . . . . . 30
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 31
1. Introduction
This document discusses the properties and applicability of "Routing
in Fat Trees" (RIFT) [RIFT] in different deployment scenarios and
highlights the operational simplicity of the technology compared to
traditional routing solutions. It also documents special
considerations when RIFT is used with or without overlays and/or
controllers, and how RIFT handles topology mis-cabling as well as
node and link failures.
2. Problem Statement of Routing in Modern IP Fabric Fat Tree Networks
Clos [CLOS] and fat tree [FATTREE] topologies have gained prominence
in today's networking, primarily as a result of the paradigm shift
towards a centralized data-center-based architecture that delivers
the majority of computation and storage services.
Current routing protocols were originally designed for networks with
an irregular topology, isotropic properties, and a low degree of
connectivity. When applied to Fat Tree topologies:
* They tend to require extensive configuration or provisioning
during bring-up and re-dimensioning.
* All nodes, including spine and leaf nodes, learn the entire
network topology and routing information, which is, in fact, not
needed on the leaf nodes during normal operation.
* Significant duplication of link-state PDU (LSP) flooding between
spine nodes and leaf nodes occurs during network bring-up and
topology updates.
* The resulting CPU and link bandwidth consumption prevents the use
of cheaper hardware at the lower levels (leaf and spine) and
reduces the scalability and reactivity of the network.
3. Applicability of RIFT to Clos IP Fabrics
The remainder of this document assumes that the reader is familiar
with the terms and concepts used in the OSPF [RFC2328] and IS-IS
[ISO10589-Second-Edition] link-state protocols. The RIFT
specification [RIFT] outlines the requirements of routing in IP
fabrics and the RIFT protocol concepts.
3.1. Overview of RIFT
RIFT is a dynamic routing protocol that is specifically tailored for
use in Clos and Fat Tree network topologies. A core property of RIFT
is that its operation is sensitive to the structure of the fabric:
it is anisotropic. RIFT acts as a link-state protocol when "pointing
north" (advertising routes to southern destinations to its northern
peer routers, the parents, through flooding and database
synchronization), but operates hop-by-hop like a distance-vector
protocol when "pointing south" (typically advertising a fabric
default route directed towards the Top of Fabric (ToF, aka
superspine) to its southern peer routers, the children).
RIFT floods flat link-state information northbound only, so that each
level obtains the full topology of the levels south of it. That
information is never flooded east-west or back south again. As a
result, a top-tier node obtains the full set of prefixes from its
Shortest Path First (SPF) calculation.
In the southbound direction, the protocol operates like a "fully
summarizing, unidirectional" path-vector protocol, or rather a
distance-vector protocol with implicit split horizon. Routing
information, normally just the default route, propagates one hop
south and is 're-advertised' by nodes at the next lower level.
[Figure: two ToF nodes (LEVEL 2) above four SPINE nodes (LEVEL 1)
above four LEAF nodes (LEVEL 0); between each pair of levels,
distance-vector advertisements flow south and link-state flooding
flows north]
Figure 1: RIFT overview
A spine node has only the information necessary for its level: all
destinations south of the node based on the SPF calculation, the
default route, and potentially disaggregated routes.
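The split between northbound link-state flooding and southbound
default-route propagation can be illustrated with a small Python
model. It is a sketch under simplifying assumptions, not the RIFT
algorithm: links have no costs (so the SPF reduces to reachability),
levels are fully meshed, there are no failures (hence no
disaggregation), and the names `Node`, `link`, `routing_table`,
`example_fabric` and the example prefixes are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field

DEFAULT = "0.0.0.0/0"  # stands in for the fabric default route


@dataclass
class Node:
    name: str
    prefixes: set[str] = field(default_factory=set)  # locally attached prefixes
    north: list[Node] = field(default_factory=list, repr=False, compare=False)
    south: list[Node] = field(default_factory=list, repr=False, compare=False)


def link(child: Node, parent: Node) -> None:
    """Create an adjacency: `parent` is north of `child`."""
    child.north.append(parent)
    parent.south.append(child)


def prefixes_at_or_below(node: Node) -> set[str]:
    """Prefixes a node learns through northbound (N-TIE) flooding.

    Plain recursion revisits shared children; acceptable for tiny examples.
    """
    learned = set(node.prefixes)
    for child in node.south:
        learned |= prefixes_at_or_below(child)
    return learned


def routing_table(node: Node) -> dict[str, list[str]]:
    """Specific routes towards children plus a single default towards parents."""
    rib: dict[str, list[str]] = {}
    for child in node.south:
        for prefix in sorted(prefixes_at_or_below(child)):
            rib.setdefault(prefix, []).append(child.name)
    if node.north:  # ToF nodes have no parents, hence no default from the north
        rib[DEFAULT] = [parent.name for parent in node.north]
    return rib


def example_fabric() -> dict[str, Node]:
    """Two leaves, two spines and two ToF nodes, fully meshed between levels."""
    nodes = {name: Node(name) for name in ("s1", "s2", "t1", "t2")}
    nodes["leaf0"] = Node("leaf0", {"10.0.0.0/24"})
    nodes["leaf1"] = Node("leaf1", {"10.0.1.0/24"})
    for leaf in ("leaf0", "leaf1"):
        for spine in ("s1", "s2"):
            link(nodes[leaf], nodes[spine])
    for spine in ("s1", "s2"):
        for tof in ("t1", "t2"):
            link(nodes[spine], nodes[tof])
    return nodes
```

In this model, `routing_table(example_fabric()["leaf0"])` contains only
the default route via `s1` and `s2`, while the ToF node `t1` holds both
leaf prefixes via `s1` and `s2` and no default route.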
RIFT combines the advantages of both link-state and distance-vector
protocols:
* Fastest possible convergence
* Automatic detection of topology
* Minimal routes/information on Top-of-Rack (ToR) switches, aka leaf
nodes
* High degree of ECMP
* Fast de-commissioning of nodes
* Maximum propagation speed with flexible prefixes in an update
There are thus two types of link-state database: the "north
representation", North Topology Information Elements (N-TIEs), and
the "south representation", South Topology Information Elements
(S-TIEs). The N-TIEs contain a link-state topology description of
lower levels, and the S-TIEs simply carry default routes for the
lower levels.
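The direction-dependent flooding scope of the two TIE types can be
summarized in a few lines. The following Python sketch uses
hypothetical names (`TieDirection`, `flood_targets`) and deliberately
ignores flooding reduction, south reflection (Section 4.1) and
east-west links, so it captures only the basic rule described above.

```python
from __future__ import annotations

from enum import Enum


class TieDirection(Enum):
    NORTH = "N-TIE"  # link-state description of the levels below
    SOUTH = "S-TIE"  # e.g. the default route for the levels below


def flood_targets(direction: TieDirection, self_originated: bool,
                  parents: list[str], children: list[str]) -> list[str]:
    """Neighbors a TIE is sent to under this simplified model.

    Assumption: flooding reduction, south reflection and east-west
    flooding are ignored; this is not the complete RIFT rule set.
    """
    if direction is TieDirection.NORTH:
        # N-TIEs keep travelling north; never east-west or back south.
        return list(parents)
    # S-TIEs travel exactly one hop: a node sends its own S-TIE to its
    # children, and a received S-TIE is consumed, not re-flooded south.
    return list(children) if self_originated else []
```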
RIFT also eliminates major disadvantages of link-state and
distance-vector protocols through:
* Reduced and balanced flooding
* Automatic neighbor detection
To achieve this, RIFT builds on the art of IGPs, drawing not only on
OSPF and IS-IS but also on MANET and IoT routing, to provide unique
features:
* Automatic (positive or negative) disaggregation of northwards
routes upon fallen leaves
* Recursive operation in the case of negative route disaggregation
* Anisotropic routing that extends a principle seen in RPL [RFC6550]
to wide superspines
* Optimal flooding reduction that derives from the concept of a
"multipoint relay" (MPR) found in OLSR [RFC3626] and balances the
flooding load over northbound links and nodes
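To give an intuition of MPR-style flooding reduction, the Python
sketch below selects, among a node's parents, a small set of "flood
repeaters" so that every grandparent is still reached. It uses the
classic OLSR-style heuristic (first take parents that are the only
path to some grandparent, then greedily add the parent covering the
most remaining grandparents). This is an assumption for illustration
only: the actual RIFT flooding reduction algorithm [RIFT] differs,
for instance in balancing the flooding load over northbound links and
nodes, and `select_flood_repeaters` is a hypothetical name.

```python
from __future__ import annotations


def select_flood_repeaters(grandparents_via: dict[str, set[str]]) -> list[str]:
    """Pick parents so that every grandparent is covered by at least one.

    grandparents_via maps each parent to the set of grandparents it reaches.
    """
    uncovered: set[str] = set().union(*grandparents_via.values())
    selected: list[str] = []

    # Step 1: a parent that is the only path to some grandparent is mandatory.
    for grandparent in sorted(uncovered):
        providers = [p for p, gps in grandparents_via.items() if grandparent in gps]
        if len(providers) == 1 and providers[0] not in selected:
            selected.append(providers[0])
    for parent in selected:
        uncovered -= grandparents_via[parent]

    # Step 2: greedily add the parent covering the most uncovered
    # grandparents (ties broken alphabetically for determinism).
    while uncovered:
        best = max(sorted(grandparents_via),
                   key=lambda p: len(grandparents_via[p] & uncovered))
        selected.append(best)
        uncovered -= grandparents_via[best]
    return selected
```

With parents `s1` (reaching `t1`, `t2`), `s2` (reaching `t2`, `t3`) and
`s3` (reaching `t3`), the sketch selects `s1` (the only path to `t1`)
and then `s2`. A real deployment would normally want more than this
minimal cover for redundancy.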
Additional advantages that are unique to RIFT are listed below; their
details can be found in the RIFT specification [RIFT].
* True Zero Touch Provisioning (ZTP)
* Minimal blast radius on failures
* Can utilize all paths through the fabric without looping
* Simple leaf implementation that can scale down to servers
* Key-Value store
* Horizontal links used for protection only
* Supports non-equal cost multipath (NECMP) and can replace
multi-chassis link aggregation group (MLAG or MC-LAG)
* Optimal flooding reduction and load-balancing
3.2. Applicable Topologies
Although RIFT is specified primarily for "proper" Clos or Fat Tree
topologies, the protocol natively supports Points of Delivery (PoD)
concepts, which, strictly speaking, are not found in the original
Clos concept.
Further, the specification explains and supports operations of
multi-plane Clos variants, where the protocol recommends the use of
inter-plane rings at the Top-of-Fabric level to reconcile the
topology views of different planes, making negative disaggregation
viable in case of failures within a plane. These observations hold
not only for RIFT but also in the generic case of dynamic routing on
Clos variants with multiple planes and failures in bisectional
bandwidth, especially on the leaves.
3.2.1. Horizontal Links
RIFT is not limited to pure Clos divided into PoD and multi-planes
but supports horizontal (East-West) links below the Top-of-Fabric
level. Those links are used only for last-resort northbound routes
when a spine loses all its northbound links or cannot compute a
default route through them.
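The last-resort role of horizontal links can be expressed as a simple
preference rule. In the Python sketch below, the `Adjacency` type and
the `default_route_next_hops` function are hypothetical names, not
part of the RIFT specification; a node uses East-West neighbors for
its default route only when no northbound neighbor offers one.

```python
from __future__ import annotations

from typing import NamedTuple


class Adjacency(NamedTuple):
    neighbor: str
    direction: str        # "north", "south" or "east-west"
    offers_default: bool  # whether the neighbor advertises a usable default


def default_route_next_hops(adjacencies: list[Adjacency]) -> list[str]:
    """Next hops for the default route; horizontal links only as a last resort."""
    north = sorted(a.neighbor for a in adjacencies
                   if a.direction == "north" and a.offers_default)
    if north:
        return north
    # No usable northbound path: fall back to siblings on horizontal links.
    return sorted(a.neighbor for a in adjacencies
                  if a.direction == "east-west" and a.offers_default)
```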
A possible configuration is a "ring" of horizontal links at a level.
In the presence of such a "ring" at any level (except the Top of
Fabric (ToF) level), neither North SPF (N-SPF) nor South SPF (S-SPF)
will provide a "ring-based protection" scheme, since such a
computation would necessarily have to deal with breaking "loops" in
the Dijkstra sense, an application for which RIFT is not intended.
Full-mesh connectivity between nodes on the same level can be
employed, which allows N-SPF to provide for any node losing all
[...]
level are northbound connected) to still participate in northbound
forwarding.
3.2.2. Vertical Shortcuts
Through relaxations of the specified adjacency forming rules, RIFT
implementations can be extended to support vertical "shortcuts" as
proposed by, e.g., [I-D.white-distoptflood]. The RIFT specification
itself does not provide the exact details, since the resulting
solution suffers either from a much larger blast radius with
increased flooding volumes or, in case of maximum aggregation
routing, from bow-tie problems.
3.2.3. Generalizing to any Directed Acyclic Graph
RIFT is an anisotropic routing protocol, meaning that it has a sense
of direction (northbound, southbound, east-west) and that it operates
differently depending on the direction.
* Northbound, RIFT operates as a link-state protocol, whereby the
control packets are reflooded first all the way north and only
interpreted later. All the individual fine-grained routes are
advertised.
* Southbound, RIFT operates as a distance-vector protocol, whereby
the control packets are flooded only one hop, interpreted, and the
consequence of that computation is what gets flooded one more hop
south. In the most common use cases, a ToF node can reach most of
the prefixes in the fabric. If that is the case, the ToF node
advertises the fabric default and disaggregates the prefixes that
it cannot reach. On the other hand, a ToF node that can reach
only a small subset of the prefixes in the fabric will preferably
advertise those prefixes and refrain from aggregating.
In the general case, what gets advertised south is, in more
detail:
[...]
reachable within the fabric, and that could be a default route
or a prefix that is dedicated to this particular fabric.
2. The loopback addresses of the northbound nodes, e.g., for
in-band management.
3. The disaggregated prefixes for the dynamic exceptions to the
fabric default, advertised to route around the black holes that
may form.
* East-West routing can optionally be used, with specific
restrictions. It is used when a sibling has access to the fabric
default but this node does not.
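The southbound behavior of a ToF node described above can be sketched
as a small decision function. In this Python example, `threshold` is
a hypothetical parameter standing in for "most of the prefixes"
versus "only a small subset"; the document defines no such value.
Disaggregating the unreachable prefixes is modeled here as negative
disaggregation, which is an assumption, and the function and field
names are illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SouthAdvertisement:
    fabric_default: bool  # item 1: fabric default (or fabric-wide prefix)
    positive: set[str]    # specific prefixes advertised instead of the default
    negative: set[str]    # item 3: exceptions to the default ("cannot reach")
    loopbacks: set[str]   # item 2: loopbacks, e.g. for in-band management


def southbound_advertisement(fabric_prefixes: set[str], reachable: set[str],
                             loopbacks: set[str],
                             threshold: float = 0.5) -> SouthAdvertisement:
    """Decide what a ToF node advertises south (simplified, hypothetical policy).

    `threshold` is an invented knob: the share of fabric prefixes a node
    must reach before it advertises the default and disaggregates the rest.
    """
    reachable = reachable & fabric_prefixes
    if len(reachable) >= threshold * len(fabric_prefixes):
        # Reaches most of the fabric: default plus negative disaggregation.
        return SouthAdvertisement(True, set(), fabric_prefixes - reachable,
                                  set(loopbacks))
    # Reaches only a small subset: advertise those prefixes, no aggregation.
    return SouthAdvertisement(False, set(reachable), set(), set(loopbacks))
```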
A Directed Acyclic Graph (DAG) provides a sense of north (the
direction of the DAG) and of south (the reverse), which can be used
to apply RIFT. For the purpose of RIFT, a vertex in the DAG that has
only incoming edges is a ToF node.
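Identifying the ToF nodes of such a DAG, and checking that the
northbound edges really are acyclic, is straightforward. The Python
sketch below is illustrative only: it assumes the northbound edges
are already known as (child, parent) pairs and uses Kahn's
topological sort; `find_tof_nodes` is a hypothetical name.

```python
from __future__ import annotations

from collections import defaultdict, deque
from collections.abc import Iterable


def find_tof_nodes(north_edges: Iterable[tuple[str, str]]) -> list[str]:
    """Return the ToF nodes of a DAG given as (child, parent) northbound edges.

    A ToF node is a vertex with only incoming edges, i.e. no parent of its own.
    Raises ValueError if the edges contain a cycle (the graph is not a DAG).
    """
    parents: dict[str, set[str]] = defaultdict(set)
    indegree: dict[str, int] = defaultdict(int)  # number of children
    nodes: set[str] = set()
    for child, parent in north_edges:
        nodes.update((child, parent))
        if parent not in parents[child]:
            parents[child].add(parent)
            indegree[parent] += 1

    # Kahn's algorithm: repeatedly remove vertices with no remaining children.
    queue = deque(n for n in nodes if indegree[n] == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for parent in parents[node]:
            indegree[parent] -= 1
            if indegree[parent] == 0:
                queue.append(parent)
    if visited != len(nodes):
        raise ValueError("northbound edges contain a cycle; RIFT needs a DAG")
    return sorted(n for n in nodes if not parents[n])
```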
There are a number of caveats, though:
* The DAG structure must exist before RIFT starts, so there is a
need for a companion protocol to establish the logical DAG
[...]
all the ToF nodes share the full knowledge of the prefixes in the
fabric. This can be achieved with a ring as suggested by the RIFT
main specification, by some preconfiguration, or by synchronizing
with a common repository where all the active prefixes are
registered.
3.3. Use Cases
3.3.1. Data Center Fabrics
RIFT is well suited for underlay routing in data center (DC) IP
fabrics, the vast majority of which seem to be currently (and for the
foreseeable future) Clos architectures. Compared to the extensive
proprietary provisioning and operational solutions used in such
environments, it significantly simplifies the operation and
deployment of these fabrics, as described in Section 4.
3.3.2. Metro Fabrics
The demand for bandwidth is increasing steadily, driven primarily by
environments close to content producers (server farms connected via
DC fabrics) but also by those in proximity to content consumers.
Consumers are often clustered in metro areas with their own network
architectures that can benefit from simplified, regular Clos
structures and hence from RIFT.
3.3.3. Building Cabling
Commercial edifices are often cabled in topologies that are either
Clos or one of its isomorphic equivalents. The Clos can grow rather
high with many floors. That presents a challenge for traditional
routing protocols (except BGP and the by now largely phased-out
PNNI), which do not support an arbitrary number of levels, something
RIFT supports naturally. Moreover, due to the limited sizes of
forwarding tables in the network elements of building cabling, the
minimum FIB size RIFT maintains under normal conditions is
cost-effective in terms of hardware and operational costs.
3.3.4. Internal Router Switching Fabrics 3.3.4. Internal Router Switching Fabrics
It is common in high-speed communications switching and routing It is common in high-speed communications switching and routing
devices to use fabrics when a crossbar is not feasible due to cost, devices to use fabrics when a crossbar is not feasible due to cost,
head-of-line blocking or size trade-offs. Normally such fabrics are head-of-line blocking or size trade-offs. Normally such fabrics are
not self-healing or rely on 1:/+1 protection schemes but it is not self-healing or rely on 1:/+1 protection schemes but it is
conceivable to use RIFT to operate Clos fabrics that can deal conceivable to use RIFT to operate Clos fabrics that can deal
effectively with interconnections or subsystem failures in such effectively with interconnections or subsystem failures in such
modules. RIFT is not IP specific and hence any link addressing modules. RIFT is not IP specific and hence any link addressing
connecting internal device subnets is conceivable. connecting internal device subnets is conceivable.
3.3.5. CloudCO 3.3.5. CloudCO
The Cloud Central Office (CloudCO) is a new stage of telecom Central The Cloud Central Office (CloudCO) is a new stage of telecom Central
Office. It takes advantage of Software Defined Networking (SDN) Office. It takes advantage of Software Defined Networking (SDN)
and Network Function Virtualization (NFV) in conjunction with general and Network Function Virtualization (NFV) in conjunction with general
purpose hardware to optimize current networks. The following figure purpose hardware to optimize current networks. The following figure
illustrates this architecture at a high level. It describes a single illustrates this architecture at a high level. It describes a single
instance or macro-node of cloud CO. An Access I/O module faces a instance or macro-node of cloud CO that provides a number of Value
Cloud CO access node, and the Customer Premises Equipments (CPEs) Added Services (VAS), a Broadband Access Abstraction (BAA), and
behind it. A Network I/O module is facing the core network. The two virtualized network services. An Access I/O module faces a Cloud CO
I/O modules are interconnected by a leaf and spine fabric. [TR-384] access node, and the Customer Premises Equipments (CPEs) behind it.
A Network I/O module is facing the core network. The two I/O modules
are interconnected by a leaf and spine fabric [TR-384].
+---------------------+ +----------------------+ +---------------------+ +----------------------+
| Spine | | Spine | | Spine | | Spine |
| Switch | | Switch | | Switch | | Switch |
+------+---+------+-+-+ +--+-+-+-+-----+-------+ +------+---+------+-+-+ +--+-+-+-+-----+-------+
| | | | | | | | | | | | | | | | | | | | | | | |
| | | | | +-------------------------------+ | | | | | | +-------------------------------+ |
| | | | | | | | | | | | | | | | | | | | | | | |
| | | | +-------------------------+ | | | | | | | +-------------------------+ | | |
| | | | | | | | | | | | | | | | | | | | | | | |
| | +----------------------+ | | | | | | | | | | +----------------------+ | | | | | | | |
skipping to change at page 11, line 5 skipping to change at page 11, line 5
| | | |
++-----------+ +---------++ ++-----------+ +---------++
|Network I/O | |Access I/O| |Network I/O | |Access I/O|
+------------+ +----------+ +------------+ +----------+
Figure 2: An example of CloudCO architecture Figure 2: An example of CloudCO architecture
The Spine-Leaf architecture deployed inside CloudCO meets the network The Spine-Leaf architecture deployed inside CloudCO meets the network
requirements of adaptability, agility, scalability and dynamism. requirements of adaptability, agility, scalability and dynamism.
4. Deployment Considerations 4. Operational Considerations
RIFT presents the opportunity for organizations building and RIFT presents the opportunity for organizations building and
operating IP fabrics to simplify their operation and deployments operating IP fabrics to simplify their operation and deployments
while achieving many desirable properties of dynamic routing on while achieving many desirable properties of dynamic routing on
such a substrate: such a substrate:
* RIFT only foods routing information to the devices that absolutely * RIFT only floods routing information to the devices that
need it. RIFT design follows minimum blast radius and minimum absolutely need it. RIFT design follows minimum blast radius and
necessary epistemological scope philosophy which leads to good minimum necessary epistemological scope philosophy which leads to
scaling properties while delivering maximum reactiveness. good scaling properties while delivering maximum reactiveness.
* RIFT allows for extensive Zero Touch Provisioning within the * RIFT allows for extensive Zero Touch Provisioning within the
protocol. In its most extreme version RIFT does not rely on any protocol. In its most extreme version RIFT does not rely on any
specific addressing and for IP fabric can operate using IPv6 ND specific addressing and for IP fabric can operate using IPv6 ND
[RFC4861] only. [RFC4861] only.
* RIFT has provisions to detect common IP fabric mis-cabling * RIFT has provisions to detect common IP fabric mis-cabling
scenarios. scenarios.
* RIFT automatically negotiates BFD per link allowing this way for * RIFT automatically negotiates BFD per link allowing this way for
skipping to change at page 12, line 6 skipping to change at page 12, line 6
This allows the use of any such valley-free path in bi-sectional This allows the use of any such valley-free path in bi-sectional
fabric bandwidth between two destinations irrespective of their fabric bandwidth between two destinations irrespective of their
metrics which can be used to balance load on the fabric in metrics which can be used to balance load on the fabric in
different ways. different ways.
* RIFT includes a key-value distribution mechanism which allows for * RIFT includes a key-value distribution mechanism which allows for
many future applications such as automatic provisioning of basic many future applications such as automatic provisioning of basic
overlay services or automatic key roll-overs over whole fabrics. overlay services or automatic key roll-overs over whole fabrics.
* RIFT is designed for minimum delay in case of prefix mobility on * RIFT is designed for minimum delay in case of prefix mobility on
the fabric. the fabric. In conjunction with [RFC8505], RIFT can differentiate
anycast advertisements from mobility events and retain only the
most recent advertisement in the latter case.
* Many further operational and design points collected over many * Many further operational and design points collected over many
years of routing protocol deployments have been incorporated in years of routing protocol deployments have been incorporated in
RIFT such as fast flooding rates, protection of information RIFT such as fast flooding rates, protection of information
lifetimes and operationally easily recognizable remote ends of lifetimes and operationally easily recognizable remote ends of
links and node names. links and node names.
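A valley-free path climbs north zero or more hops and then descends south zero or more hops, without ever turning north again. The following minimal Python sketch checks that property for a path given as the sequence of node levels, with leaves at level 0. The is_valley_free() helper is hypothetical. For simplicity it rejects east-west (same-level) hops, which the text above does not discuss.

```python
from typing import Sequence


def is_valley_free(levels: Sequence[int]) -> bool:
    """True if the path goes north zero or more hops, then south zero or
    more hops, and never turns north again after going south.

    `levels` lists the level of each node on the path (leaves are 0).
    Simplification: only hops between adjacent levels are accepted;
    east-west (same-level) hops are rejected.
    """
    descending = False
    for here, nxt in zip(levels, levels[1:]):
        if nxt == here + 1:        # northbound hop
            if descending:
                return False       # went south, then north: a "valley"
        elif nxt == here - 1:      # southbound hop
            descending = True
        else:
            return False           # not a single-level north/south hop
    return True


# Leaf -> Spine -> ToF -> Spine -> Leaf is valley-free.
print(is_valley_free([0, 1, 2, 1, 0]))  # True
# Leaf -> Spine -> Leaf -> Spine -> Leaf bounces through a leaf.
print(is_valley_free([0, 1, 0, 1, 0]))  # False
```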
4.1. South Reflection 4.1. South Reflection
South reflection is a mechanism by which South Node TIEs are "reflected" South reflection is a mechanism by which South Node TIEs are "reflected"
skipping to change at page 15, line 11 skipping to change at page 15, line 11
algorithm provided by RIFT, ToF22 will explicitly originate an S-TIE algorithm provided by RIFT, ToF22 will explicitly originate an S-TIE
with prefix 121 and prefix 122, that is flooded to spines 111, 112, with prefix 121 and prefix 122, that is flooded to spines 111, 112,
121 and 122. 121 and 122.
The packet from leaf111 to prefix122 will not be routed to linkTS1 or The packet from leaf111 to prefix122 will not be routed to linkTS1 or
linkTS2. The packet from leaf111 to prefix122 will only be routed to linkTS2. The packet from leaf111 to prefix122 will only be routed to
linkTS5 or linkTS7 following a longest-prefix match to prefix122. linkTS5 or linkTS7 following a longest-prefix match to prefix122.
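The following Python sketch illustrates the longest-prefix match described above. It is built entirely on assumptions:

* The forwarding table is hypothetical and merges the uplinks of Spine111 and Spine112.
* The documentation prefix 2001:db8:122::/48 stands in for prefix122.
* The IPv6 default route stands in for the southbound default.
* The split between linkTS1/linkTS2 and linkTS5/linkTS7 is chosen for illustration only.

```python
import ipaddress

# Hypothetical merged uplink table of Spine111/Spine112 (assumed link names
# and documentation prefixes; not taken from the figure).
FIB = {
    # Default route: every uplink is a candidate next hop.
    ipaddress.ip_network("::/0"): {"linkTS1", "linkTS2", "linkTS5", "linkTS7"},
    # Disaggregated prefix standing in for prefix122 (from ToF22's S-TIE).
    ipaddress.ip_network("2001:db8:122::/48"): {"linkTS5", "linkTS7"},
}


def lookup(fib, destination):
    """Return the next-hop set of the longest prefix covering destination."""
    addr = ipaddress.ip_address(destination)
    matches = [net for net in fib if addr in net]
    if not matches:
        return set()
    best = max(matches, key=lambda net: net.prefixlen)
    return fib[best]


print(sorted(lookup(FIB, "2001:db8:122::1")))  # ['linkTS5', 'linkTS7']
print(sorted(lookup(FIB, "2001:db8:999::1")))  # all four uplinks
```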
4.4. Zero Touch Provisioning (ZTP) 4.4. Zero Touch Provisioning (ZTP)
Each RIFT node may operate in zero touch provisioning (ZTP) mode. It RIFT is designed to require a very minimal configuration to simplify
has no configuration (unless it is a ToF at the top of the topology its operation and avoid human errors; based on that minimal
or it is desired to confine it to leaf role w/o leaf-2-leaf information, Zero Touch Provisioning (ZTP) autoconfigures the key
procedures). In such case RIFT will fully configure the node's level operational parameters of all the RIFT nodes, that is, on the one
after it is attached to the topology. hand, the SystemID of the node that must be unique in the RIFT
network, and on the other hand the level of the node in the Fat Tree,
which determines which peers are northwards "parents" and which are
southwards "children".
The most important component for ZTP is the automatic level ZTP is always on, but its decisions can be overridden when a network
derivation procedure. All the ToF nodes are explicitly marked with administrator prefers to impose its own configuration. In that case,
TOP_OF_FABRIC flag which are initial 'seeds' needed for other ZTP it is the responsibility of the administrator to ensure that the
nodes to derive their level in the topology. The derivation of the configured parameters are correct, in other words that the SystemID
level of each node happens then based on Link Information Elements of each node is unique, and that the administratively set levels
(LIEs) received from its neighbors whereas each node (with possibly truly reflect the relative position of the nodes in the fabric. It
exceptions of configured leafs) tries to attach at the highest is recommended to let ZTP configure the network, and when not, it is
possible point in the fabric. This guarantees that even if the recommended to configure the level of all the nodes but those that
diffusion front reaches a node from "below" faster than from "above", are forced as leaves to avoid an undesirable interaction between ZTP
it will greedily abandon already negotiated level derived from nodes and the manual configuration.
topologically below it and properly peer with nodes above.
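When ZTP decisions are overridden, a pre-deployment sanity check along the lines of the following Python sketch may help the administrator. It uses a hypothetical NodeConfig record and performs only three simplified checks:

* SystemIDs are unique.
* Two configured neighbors are at most one level apart.
* If levels are configured at all, every node other than a forced leaf has a configured level.

It is not a complete validation of level consistency.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class NodeConfig:
    name: str
    system_id: int
    level: Optional[int] = None  # None: level left to ZTP
    leaf_only: bool = False      # forced into the leaf role


def check_manual_config(nodes: List[NodeConfig],
                        links: Iterable[Tuple[str, str]]) -> List[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    # SystemIDs must be unique in the RIFT network.
    for sid, count in Counter(n.system_id for n in nodes).items():
        if count > 1:
            problems.append(f"SystemID {sid} is used by {count} nodes")
    # Simplified check: two configured neighbors at most one level apart.
    by_name = {n.name: n for n in nodes}
    for a, b in links:
        la, lb = by_name[a].level, by_name[b].level
        if la is not None and lb is not None and abs(la - lb) > 1:
            problems.append(f"link {a}-{b} spans levels {la} and {lb}")
    # Recommended: if levels are configured at all, configure them on every
    # node except the ones forced to be leaves.
    if any(n.level is not None for n in nodes):
        for n in nodes:
            if n.level is None and not n.leaf_only:
                problems.append(f"{n.name} has no configured level")
    return problems


nodes = [NodeConfig("tof1", 1, level=2), NodeConfig("spine1", 2, level=1),
         NodeConfig("leaf1", 3, leaf_only=True), NodeConfig("leaf2", 3)]
links = [("tof1", "spine1"), ("spine1", "leaf1"), ("spine1", "leaf2")]
for problem in check_manual_config(nodes, links):
    print(problem)
```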
ZTP requires that the administrator identify the Top-of-Fabric
(ToF) nodes to set the baseline from which the fabric topology is
derived. The Top-of-Fabric nodes are configured with the TOP_OF_FABRIC
flag; they are the initial 'seeds' needed for other ZTP nodes to derive
their level in the topology. The level of each node is then derived
from the Link Information Elements (LIEs) received from its neighbors,
whereby each node (with the possible exception of configured leaves)
tries to attach at the highest possible point in the fabric. This
guarantees that even if the diffusion front reaches a node from
"below" faster than from "above", the node will greedily abandon an
already negotiated level derived from nodes topologically
below it and properly peer with nodes above.
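The following Python sketch approximates the greedy level derivation. Starting from the TOP_OF_FABRIC seeds, it repeatedly lets each node take one level below the highest level offered by any neighbor. This is a simplification: it ignores LIE timing, leaf-2-leaf procedures and the further rules of the RIFT specification. The derive_levels() helper is hypothetical. Levels only increase and are bounded by the ToF level, so the loop terminates.

The usage example models the single-plane topology of Figure 5 in two assumed parts:

* a full mesh between the ToF nodes and the spines, and between the spines and the leaves of each PoD;
* the extra link-M between Leaf112 and ToF22.

```python
from collections import defaultdict
from typing import AbstractSet, Dict, Iterable, Optional, Tuple


def derive_levels(links: Iterable[Tuple[str, str]],
                  tof_nodes: AbstractSet[str],
                  top_level: int,
                  leaf_only: AbstractSet[str] = frozenset()) -> Dict[str, Optional[int]]:
    """Greedy, simplified level derivation: each node takes one level below
    the highest level offered by any neighbor (ToF nodes are the seeds)."""
    neighbors = defaultdict(set)
    for a, b in links:
        neighbors[a].add(b)
        neighbors[b].add(a)
    level: Dict[str, Optional[int]] = {n: None for n in neighbors}
    for n in tof_nodes:
        level[n] = top_level
    for n in leaf_only:
        level[n] = 0
    changed = True
    while changed:
        changed = False
        for node in neighbors:
            if node in tof_nodes or node in leaf_only:
                continue
            # Only neighbors above level 0 can offer a level to derive from.
            offers = [level[nbr] - 1 for nbr in neighbors[node]
                      if level[nbr] is not None and level[nbr] > 0]
            best = max(offers, default=None)
            # Greedily move up whenever a higher offer appears.
            if best is not None and (level[node] is None or best > level[node]):
                level[node] = best
                changed = True
    return level


spines = ("Spine111", "Spine112", "Spine121", "Spine122")
links = [(t, s) for t in ("ToF21", "ToF22") for s in spines]
links += [(s, l) for s in ("Spine111", "Spine112") for l in ("Leaf111", "Leaf112")]
links += [(s, l) for s in ("Spine121", "Spine122") for l in ("Leaf121", "Leaf122")]
links.append(("Leaf112", "ToF22"))  # link-M
levels = derive_levels(links, {"ToF21", "ToF22"}, top_level=2)
print(levels["Leaf111"], levels["Leaf112"])  # 0 1
```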
A RIFT node may also be confined to the leaf role with the
LEAF_ONLY flag. A leaf node can also be configured to
support leaf-2-leaf procedures with the LEAF_2_LEAF flag. In either
case the node cannot be TOP_OF_FABRIC and its level cannot be
configured. RIFT will fully configure the node's level after it is
attached to the topology and ensure that the node is at the "bottom
of the hierarchy" (southernmost).
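The flag constraints above can be expressed as a simple configuration check. The following Python sketch uses a hypothetical ZtpFlags record whose fields mirror the TOP_OF_FABRIC, LEAF_ONLY and LEAF_2_LEAF flags, plus an optional configured level. It illustrates the rule only and does not represent any implementation's configuration model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ZtpFlags:
    top_of_fabric: bool = False        # TOP_OF_FABRIC
    leaf_only: bool = False            # LEAF_ONLY
    leaf_2_leaf: bool = False          # LEAF_2_LEAF
    configured_level: Optional[int] = None


def validate_flags(f: ZtpFlags) -> List[str]:
    """Reject combinations that the text above rules out for leaf nodes."""
    errors = []
    if f.leaf_only or f.leaf_2_leaf:
        if f.top_of_fabric:
            errors.append("a leaf node cannot be TOP_OF_FABRIC")
        if f.configured_level is not None:
            errors.append("a leaf node cannot have a configured level")
    return errors


print(validate_flags(ZtpFlags(leaf_only=True)))  # []
# ['a leaf node cannot be TOP_OF_FABRIC']
print(validate_flags(ZtpFlags(leaf_2_leaf=True, top_of_fabric=True)))
```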
4.5. Mis-cabling Examples 4.5. Mis-cabling Examples
+----------------+ +-----------------+ +----------------+ +-----------------+
| ToF21 | +------+ ToF22 | LEVEL 2 | ToF21 | +------+ ToF22 | LEVEL 2
+-------+----+---+ | +----+---+--------+ +-------+----+---+ | +----+---+--------+
| | | | | | | | | | | | | | | | | |
| | | +----------------------------+ | | | | +----------------------------+ |
| +---------------------------+ | | | | | +---------------------------+ | | | |
| | | | | | | | | | | | | | | | | |
skipping to change at page 16, line 4 skipping to change at page 16, line 28
|Spine111| |Spine112| | |Spine121| |Spine122| LEVEL 1 |Spine111| |Spine112| | |Spine121| |Spine122| LEVEL 1
+-+---+--+ ++----+--+ | +--+---+-+ +-+----+-+ +-+---+--+ ++----+--+ | +--+---+-+ +-+----+-+
| | | | | | | | | | | | | | | | | |
| +---------+ | link-M | +---------+ | | +---------+ | link-M | +---------+ |
| | | | | | | | | | | | | | | | | |
| +-------+ | | | | +-------+ | | | +-------+ | | | | +-------+ | |
| | | | | | | | | | | | | | | | | |
+-+---+-+ +--+--+-+ | +-+---+-+ +--+--+-+ +-+---+-+ +--+--+-+ | +-+---+-+ +--+--+-+
|Leaf111| |Leaf112+-----+ |Leaf121| |Leaf122| LEVEL 0 |Leaf111| |Leaf112+-----+ |Leaf121| |Leaf122| LEVEL 0
+-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+
Figure 5: A single plane mis-cabling example Figure 5: A single plane mis-cabling example
Figure 5 shows a single plane mis-cabling example. It is a perfect Figure 5 shows a single plane mis-cabling example. It is a perfect
fat tree fabric except for link-M connecting Leaf112 to ToF22. Fat Tree fabric except for link-M connecting Leaf112 to ToF22.
The RIFT control protocol can discover the physical links The RIFT control protocol can discover the physical links
automatically and detect cabling that violates fat tree automatically and detect cabling that violates Fat Tree
topology constraints. It reacts accordingly to such mis-cabling topology constraints. It reacts accordingly to such mis-cabling
attempts, at a minimum preventing adjacencies between nodes from attempts, at a minimum preventing adjacencies between nodes from
being formed and traffic from being forwarded on those mis-cabled being formed and traffic from being forwarded on those mis-cabled
links. Leaf112 will in such a scenario use link-M to derive its level links. Leaf112 will in such a scenario use link-M to derive its level
(unless configured as a leaf) and can report links to Spine111 and Spine112 as (unless configured as a leaf) and can report links to Spine111 and Spine112 as
mis-cabled unless the implementation allows horizontal links. mis-cabled unless the implementation allows horizontal links.
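The following Python sketch classifies links once levels are known. It uses the levels that ZTP would derive in Figure 5, where link-M pulls Leaf112 up to level 1, and a subset of the figure's links. It assumes a simplified rule:

* A link between nodes exactly one level apart is an adjacency.
* A same-level link is accepted only if horizontal links are allowed.
* Any other link is mis-cabled.

It does not model the leaf-specific adjacency rules of the RIFT specification. The classify_links() helper is hypothetical.

```python
from typing import Dict, Iterable, Tuple

# Levels as ZTP would derive them in Figure 5: link-M pulls Leaf112 up to
# level 1 because ToF22 (level 2) is its highest neighbor.
LEVELS = {"ToF22": 2, "Spine111": 1, "Spine112": 1, "Leaf111": 0, "Leaf112": 1}


def classify_links(links: Iterable[Tuple[str, str]], level: Dict[str, int],
                   allow_horizontal: bool = False) -> Dict[Tuple[str, str], str]:
    """Simplified rule: exactly one level apart -> adjacency; same level ->
    horizontal if allowed, else mis-cabled; anything else -> mis-cabled."""
    result = {}
    for a, b in links:
        diff = abs(level[a] - level[b])
        if diff == 1:
            result[(a, b)] = "adjacency"
        elif diff == 0 and allow_horizontal:
            result[(a, b)] = "horizontal"
        else:
            result[(a, b)] = "mis-cabled"
    return result


links = [("Leaf112", "ToF22"), ("Leaf112", "Spine111"),
         ("Leaf112", "Spine112"), ("Leaf111", "Spine111")]
for link, verdict in classify_links(links, LEVELS).items():
    print(link, verdict)
```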
Figure 6 shows a multiple plane mis-cabling example. Since Leaf112 Figure 6 shows a multiple plane mis-cabling example. Since Leaf112
and Spine121 belong to two different PoDs, the adjacency between and Spine121 belong to two different PoDs, the adjacency between
Leaf112 and Spine121 cannot be formed. link-W would be detected and Leaf112 and Spine121 cannot be formed. link-W would be detected and
skipping to change at page 21, line 23 skipping to change at page 22, line 7
RIFT allows advertising IPv4 prefixes over an IPv6 RIFT network. IPv6 RIFT allows advertising IPv4 prefixes over an IPv6 RIFT network. IPv6
Address Family (AF) configures via the usual Neighbor Discovery (ND) Address Family (AF) configures via the usual Neighbor Discovery (ND)
mechanisms and then V4 can use V6 nexthops analogous to [RFC5549]. mechanisms and then V4 can use V6 nexthops analogous to [RFC5549].
It is expected that the whole fabric supports the same type of It is expected that the whole fabric supports the same type of
forwarding of address families on all the links. RIFT provides an forwarding of address families on all the links. RIFT provides an
indication whether a node is v4 forwarding capable and indication whether a node is v4 forwarding capable and
implementations are possible where different routing tables are implementations are possible where different routing tables are
computed per address family as long as the computation remains loop- computed per address family as long as the computation remains loop-
free. free.
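The following Python sketch resolves IPv4 prefixes over IPv6 next hops in the spirit of [RFC5549]. The IPv4 table is computed only over neighbors that indicate IPv4 forwarding capability. The sketch makes these assumptions:

* The Neighbor record, its fields and v4_routes() are hypothetical.
* Link-local next hops have already been learned via ND.
* All prefixes share one next-hop set, for simplicity.

```python
import ipaddress
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Neighbor:
    name: str
    link_local: ipaddress.IPv6Address  # assumed learned via IPv6 ND
    v4_forwarding: bool                # the node's IPv4 forwarding indication


def v4_routes(prefixes: Iterable[str], neighbors: Iterable[Neighbor]
              ) -> Dict[ipaddress.IPv4Network, Tuple[ipaddress.IPv6Address, ...]]:
    """Resolve IPv4 prefixes over IPv6 next hops, using only neighbors that
    indicate IPv4 forwarding capability (a separate per-AF computation)."""
    hops = tuple(sorted(n.link_local for n in neighbors if n.v4_forwarding))
    return {ipaddress.IPv4Network(p): hops for p in prefixes}


spines = [
    Neighbor("spine1", ipaddress.IPv6Address("fe80::1"), v4_forwarding=True),
    Neighbor("spine2", ipaddress.IPv6Address("fe80::2"), v4_forwarding=False),
]
print(v4_routes(["192.0.2.0/24"], spines))
```

Excluding the neighbor that is not v4 forwarding capable keeps IPv4 traffic off nodes that could not forward it. This is one way the per-address-family computation mentioned above could stay consistent.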
+-----+ +-----+ +-----+ +-----+
+---+---+ | ToF | | ToF | +---+---+ | ToF | | ToF |
^ +--+--+ +-----+ ^ +--+--+ +-----+
| | | | | | | | | |
| | +-------------+ | | | +-------------+ |
| | +--------+ | | | | +--------+ | |
+ | | | | + | | | |
V6 +-----+ +-+---+ V6 +-----+ +-+---+
Forwarding |Spine| |Spine| Forwarding |Spine| |Spine|
+ +--+--+ +-----+ + +--+--+ +-----+
| | | | | | | | | |
| | +-------------+ | | | +-------------+ |
| | +--------+ | | | | +--------+ | |
| | | | | | | | | |
v +-----+ +-+---+ v +-----+ +-+---+
+---+---+ |Leaf | | Leaf| +---+---+ |Leaf | | Leaf|
+--+--+ +--+--+ +--+--+ +--+--+
| | | |
IPv4 prefixes| |IPv4 prefixes IPv4 prefixes| |IPv4 prefixes
| | | |
+---+----+ +---+----+ +---+----+ +---+----+
| V4 | | V4 | | V4 | | V4 |
| subnet | | subnet | | subnet | | subnet |
+--------+ +--------+ +--------+ +--------+
Figure 8: IPv4 over IPv6 Figure 8: IPv4 over IPv6
4.9. In-Band Reachability of Nodes 4.9. In-Band Reachability of Nodes
RIFT does not require that nodes of the fabric have reachable RIFT does not require that nodes of the fabric have reachable
addresses, but operational reasons to reach the internal nodes addresses, but operational reasons to reach the internal nodes
may exist. Figure 9 shows an example where the network management may exist. Figure 9 shows an example where the network management
station (NMS) attaches to leaf1. station (NMS) attaches to leaf1.
skipping to change at page 30, line 18 skipping to change at page 31, line 18
Networked Measurement and Control Systems", Networked Measurement and Control Systems",
<https://standards.ieee.org/standard/1588-2019.html>. <https://standards.ieee.org/standard/1588-2019.html>.
[CLOS] Yuan, X., "On Nonblocking Folded-Clos Networks in Computer [CLOS] Yuan, X., "On Nonblocking Folded-Clos Networks in Computer
Communication Environments", IEEE International Parallel & Communication Environments", IEEE International Parallel &
Distributed Processing Symposium, 2011. Distributed Processing Symposium, 2011.
[FATTREE] Leiserson, C. E., "Fat-Trees: Universal Networks for [FATTREE] Leiserson, C. E., "Fat-Trees: Universal Networks for
Hardware-Efficient Supercomputing", 1985. Hardware-Efficient Supercomputing", 1985.
[RFC3626] Clausen, T., Ed. and P. Jacquet, Ed., "Optimized Link
State Routing Protocol (OLSR)", RFC 3626,
DOI 10.17487/RFC3626, October 2003,
<https://www.rfc-editor.org/info/rfc3626>.
[RFC5905] Mills, D., Martin, J., Ed., Burbank, J., and W. Kasch, [RFC5905] Mills, D., Martin, J., Ed., Burbank, J., and W. Kasch,
"Network Time Protocol Version 4: Protocol and Algorithms "Network Time Protocol Version 4: Protocol and Algorithms
Specification", RFC 5905, DOI 10.17487/RFC5905, June 2010, Specification", RFC 5905, DOI 10.17487/RFC5905, June 2010,
<https://www.rfc-editor.org/info/rfc5905>. <https://www.rfc-editor.org/info/rfc5905>.
[RFC8200] Deering, S. and R. Hinden, "Internet Protocol, Version 6 [RFC8200] Deering, S. and R. Hinden, "Internet Protocol, Version 6
(IPv6) Specification", STD 86, RFC 8200, (IPv6) Specification", STD 86, RFC 8200,
DOI 10.17487/RFC8200, July 2017, DOI 10.17487/RFC8200, July 2017,
<https://www.rfc-editor.org/info/rfc8200>. <https://www.rfc-editor.org/info/rfc8200>.
skipping to change at page 30, line 47 skipping to change at page 32, line 4
2020, <https://www.rfc-editor.org/info/rfc8928>. 2020, <https://www.rfc-editor.org/info/rfc8928>.
Authors' Addresses Authors' Addresses
Yuehua Wei (editor) Yuehua Wei (editor)
ZTE Corporation ZTE Corporation
No.50, Software Avenue No.50, Software Avenue
Nanjing Nanjing
210012 210012
China China
Email: wei.yuehua@zte.com.cn Email: wei.yuehua@zte.com.cn
Zheng Zhang Zheng Zhang
ZTE Corporation ZTE Corporation
No.50, Software Avenue No.50, Software Avenue
Nanjing Nanjing
210012 210012
China China
Email: zhang.zheng@zte.com.cn Email: zhang.zheng@zte.com.cn
Dmitry Afanasiev Dmitry Afanasiev
Yandex Yandex
Email: fl0w@yandex-team.ru Email: fl0w@yandex-team.ru
Tom Verhaeg
Juniper Networks
Email: tverhaeg@juniper.net
Jaroslaw Kowalczyk
Orange Polska
Email: jaroslaw.kowalczyk2@orange.com
Pascal Thubert Pascal Thubert
Cisco Systems, Inc Cisco Systems, Inc
Building D Building D
45 Allee des Ormes - BP1200 45 Allee des Ormes - BP1200
06254 MOUGINS - Sophia Antipolis 06254 MOUGINS - Sophia Antipolis
France France
Phone: +33 497 23 26 34 Phone: +33 497 23 26 34
Email: pthubert@cisco.com Email: pthubert@cisco.com
Tom Verhaeg
Juniper Networks
Email: tverhaeg@juniper.net
Jaroslaw Kowalczyk
Orange Polska
Email: jaroslaw.kowalczyk2@orange.com
End of changes. 53 change blocks.
164 lines changed or deleted 210 lines changed or added

This html diff was produced by rfcdiff 1.48. The latest version is available from http://tools.ietf.org/tools/rfcdiff/