RIFT Working Group                                      Y. Filyurin, Ed.
Internet-Draft                                              Bloomberg LP
Intended status: Informational                             June 13, 2018
Expires: December 15, 2018


RIFT -- Motivation, Additional Requirements and Use Cases in User Access
                                Networks
                 draft-filyurin-rift-access-networks-00

Abstract

   RIFT is a new specialized dynamic routing protocol originally
   designed for Clos and Fat Tree Data Center networks. It is designed
   to work on multilevel network topologies in which nodes in a given
   level connect only to nodes in one upper or lower level, with
   optional and non-contiguous intra-level connectivity.

   While the protocol was originally designed to meet the needs of
   Massively Scalable Data Centers, its ability to automatically prune
   the information distribution from higher levels to lower levels, as
   well as provide optimal routing for intra- and inter-level traffic,
   makes it a good match for user access networks, or any network that
   combines end user access with compute resources enabling various
   network services for these end users. Current directions in
   distributed computing seek to blur even that distinction. Large
   distributed networks can be created, where virtual compute units can
   be in all tiers, combining and crossing many requirements for DC or
   User Access design. This draft seeks to analyze these requirements.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts.
   The list of current Internet-Drafts is at
   https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on December 15, 2018.

Copyright Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Definitions of Terms Used in This Memo
   2.  Authors
   3.  Introduction
   4.  Additional Requirements for RIFT Access Networks
   5.  Network Slicing
     5.1.  Overall Network Slicing
     5.2.  Identification and Propagation of Slice Information
     5.3.  Network Instances and RIB and FIB Requirement
     5.4.  Network Instances and Control Plane
     5.5.  Network Instances and Forwarding
   6.  External Routing Information
   7.  RIFT and Endpoint Address Mobility
     7.1.  Mobility Use Cases
   8.  Border Nodes and Superspine East/West traffic
   9.  Border Nodes and Superspine East/West traffic
   10. Security Considerations
   11. Conclusions
   12. IANA Considerations
   13. Acknowledgments
   14. References
     14.1.  Normative References
     14.2.  Informative References
     14.3.  URIs
   Author's Address

1.  Definitions of Terms Used in This Memo

   MSDC - Massively Scalable Data Center

   IGP - Interior Gateway Protocol

   RIB - Routing Information Base

   FIB - Forwarding Information Base

   MT - Multi-Topology in the context of IS-IS

   MI - Multi-Instance in the context of IS-IS

   AD - Auto-discovery

   UDP - User Datagram Protocol

   IID - Instance ID, in the context of control and data plane slicing
   of network devices

   TIE - Topology Information Element, per original RIFT specification

   N-TIE - Northbound Topology Information Element, flooded in the
   Northbound direction, per original RIFT specification

   S-TIE - Southbound Topology Information Element, propagated in the
   Southbound direction, per original RIFT specification

   Node TIE - Node Topology Information Element, per original RIFT
   specification

   Prefix TIE - Prefix Topology Information Element, per original RIFT
   specification

   Key Value TIE or K/V TIE - A TIE (mainly Southbound) that is
   carrying a set of key value pairs, per original RIFT specification

   LIE - Link Information Element, per original RIFT specification

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] when, and only when, they appear in all capitals, as
   shown here.

2.  Authors

   The following authors substantially contributed to the current
   format of the document:

3.  Introduction

   Typical access networks are built in a hierarchical fashion using
   "Core", "Distribution" and "Access" layers designed to support
   collections of wiring distribution blocks that in turn connect to end
   user devices, server compute nodes and various forms of utility
   devices. This design is just a variation of the Fat Tree design, and
   RIFT presents an opportunity to significantly reduce the design
   limitations of traditional switched networks, bring seamless mobility
   to end systems within the entire access network domain and remove the
   operational overhead that comes with provisioning access networks.
   All this can be done without forcing lower level network devices to
   carry feature sets traditionally found in higher end aggregation
   devices.

   Decoupling network layer information from device reachability
   information allows any network layer information to be propagated,
   and thus allows the protocol to be expanded to support routing for
   any type of network layer addressing. Use of Policy Guided Prefixes
   allows specialized forwarding policies where packets are forwarded
   through specialized paths or redirected to specialized service nodes,
   such as packet shapers. Use of Key/Value N-TIEs and S-TIEs would
   allow propagation of configuration information to facilitate fully
   automated deployment and operations. Key/Value TIEs can also be used
   to propagate other information that can aid forwarding, such as
   interface queuing policies, access control policies or configuration
   of auxiliary services such as DHCP relay.
   The use of IPv6 Link Local addressing on all infrastructure for the
   exchange of LIEs removes a lot of operational overhead in bringing up
   and supporting a RIFT network.

4.  Additional Requirements for RIFT Access Networks

   The original RIFT specification was created for traditional Data
   Center environments. Access networks may call for additional
   capabilities. This desire for additional capabilities is due to the
   fact that many endpoints in these traditional access environments
   cannot themselves provide the traditional delineation between the
   network infrastructure domain and the individual workloads running on
   these devices, and must rely on the network edge to provide that
   delineation.

5.  Network Slicing

   Network slicing in this context is defined as creating individual
   separate virtual networks within our access networks connecting sets
   of edge devices. The slices are effectively their own virtual Fat
   Trees with separate Control Plane data structures holding prefix
   information. The protocol processes populate virtual RIBs, which
   program the FIB (assuming a common FIB in most platforms) to define
   instance specific packet identification and its per hop forwarding
   behavior.

   Network slices can also be called network instances. The two terms
   are often used interchangeably, but a network instance usually
   refers to a virtual network construct local to an individual device,
   whereas a network slice covers a virtual network carved out from a
   set of interconnected devices.

5.1.  Overall Network Slicing

   The original RIFT specification uses the concepts of Multi-Topology
   RFC5120 [1] and Multi-Instance RFC6822 [2] to create network-wide
   virtual routing domains.
   RIFT's capability to form a separate neighbor relationship for each
   instance makes the MI approach more appropriate for creating network
   slices, allowing multiple virtual Fat Trees to operate as "ships in
   the night" and creating completely separate RIFT flooding/propagation
   domains. As part of the initial LIE exchange, individual adjacencies
   per instance will be formed, as long as the nodes can agree on the
   instance ID. The standard discovery process can apply, and it could
   be argued that all auto-configuration information exchange can happen
   only at the global instance.

5.2.  Identification and Propagation of Slice Information

   The process is no different in principle than for many other forms of
   virtual private network services. The process starts with
   Auto-Discovery, where nodes hosting a particular instance can
   propagate this information to other nodes (using Key/Value (K/V)
   TIEs, for example) and individual neighbor relationships will form.
   Once instance adjacencies form, all other information can be
   exchanged and propagated.

   Since RIFT is fundamentally an underlay protocol, and relies on
   itself for next hop resolution, instance awareness is needed not just
   on the edge devices hosting the instance, but on all transit devices.
   In both MT and MI approaches, topologies and instances are explicitly
   configured. When provisioning RIFT networks, there must be some
   approach to facilitate instance activation on transit devices. Once
   a device becomes "instance aware", LIE exchange can take place to
   establish common parameters such as UDP ports, and the neighbor
   adjacency can be established using the standard process.

   Due to the K/V capabilities of RIFT, there should be no need to
   define special Instance ID TLVs or modify the Thrift models.
   Some external entity will configure instance parameters and access
   policies; once instance system IDs are established, K/V N-TIEs can
   be propagated to higher levels and neighbor adjacencies established.

5.3.  Network Instances and RIB and FIB Requirement

   While RIFT is an underlay protocol, as soon as individual virtual Fat
   Trees are created, packet forwarding on links that are used for
   multiple slices can no longer be programmed using standard network
   layer information. This is a typical example of using some unique
   identifier to determine unique per-hop behavior. The price of using
   unique identifiers, whether they take the form of shim headers,
   special packet metadata or even translation and encapsulation
   techniques, is the requirement of creating more advanced forwarding
   state on the transit network devices. First, there must be an
   association that maps a particular identifier to a particular
   instance, then another action that makes the forwarding decision
   identifying the next hop, and a final action of adding the right
   metadata to the packet, allowing the next hop to perform the same set
   of actions.

   The problem can be resolved using two standard approaches outside of
   deploying multi-operation forwarding devices. Either put the
   destination address based forwarding on the edges, which already have
   the policies to associate network layer information with instance
   information, or create more advanced FIB data structures that map to
   hardware operations that allow metadata/address lookup and forwarding
   to be done as simple atomic operations.

   The first approach to some degree defeats the purpose of using RIFT
   as a routing protocol, since access devices would need visibility of
   all destinations and the metadata associated with them. The second
   approach is more realistic, but these advanced capabilities may only
   be available on more advanced devices.
   These devices are less likely to be deployed closer to the edges of a
   RIFT network, and possibly get in the way of the requirement for a
   less expensive, less feature-rich access network.

   Within an MI RIFT domain, there would be three types of forwarding
   behavior. The first forwarding behavior is on the leaf devices
   connecting to endpoints, which apply policies associating end systems
   with instances, impose and dispose of the metadata and forward the
   packet to transit devices. The second forwarding behavior is found
   on transit devices. These devices forward exclusively based on
   metadata, or shim headers, effectively forwarding traffic to the
   highest level aggregation devices. The last type of device is the
   aggregation device, which maintains an advanced FIB that processes
   and forwards packets, imposing the metadata used to forward to the
   leaf devices.

5.4.  Network Instances and Control Plane

   Taking the example drawing from the original RIFT spec:

.                +--------+          +--------+
.                |        |          |        |          ^ N
.                |Spine 21|          |Spine 22|          |
.Level 2         ++-+--+-++          ++-+--+-++        <-*-> E/W
                  | |  | |            | |  | |           |
.             P111/2|  |P121          | |  | |         S v
.                 ^ ^  ^ ^            | |  | |
.                 | |  | |            | |  | |
.  +--------------+ |  +-----------+  | |  | +---------------+
.  |                |    |         |  | |  |                 |
. South +-----------------------------+ |  |                 ^
.  |    |           |    |         |    |  |              All TIEs
.  0/0  0/0        0/0   +-----------------------------+     |
.  v    v           v              |    |  |           |     |
.  |    |           +-+    +<-0/0----------+           |     |
(I1, I11, I-Odd, I-Even)|  |       |    |              |     |
.+-+----++ optional +-+----++     ++----+-+           ++-----++
.|       | E/W link |       |     |       |           |       |
.|Node111+----------+Node112|     |Node121|           |Node122|
.+-+---+-+          ++----+-+     +-+---+-+           ++---+--+
.  |   |             |   South      |   |              |   |
.  |   +---0/0--->-----+ 0/0        |   +----------------+ |
. (I1, I11, I-Odd)   | |  |         |                  | | |
.  |   +---<-0/0-----+ |  v         |   +--------------+ | |
.  v   |        (I1, I11, I-Even)   |   |                | |
.+-+---+-+          +--+--+-+     +-+---+-+          +---+-+-+
.|       |  (L2L)   |       |     |       |  Level 0 |       |
.|Leaf111~~~~~~~~~~~~Leaf112|     |Leaf121|          |Leaf122|
.+-+-----+          +-+---+-+     +--+--+-+          +-+-----+
.  +                  +    \        /   +              +
.  Prefix111   Prefix112    \      /   Prefix121    Prefix122
.                          multi-homed
.                            Prefix
.+---------- Pod 1 ---------+     +---------- Pod 2 ---------+

                  A two level spine-and-leaf topology

   Assume we take every "Leaf" device (111, 112, 121 and 122) and
   create instance I1 on each device, as well as instance policies. At
   the same time, Leafs 111 and 112 can host an instance I11 and Leafs
   121 and 122 can host instance I12. Leafs 111 and 121 are hosting
   I-Odd and Leafs 112 and 122 are hosting I-Even. Northbound K/V TIEs
   can be used to propagate instance information and set up instance RIB
   data structures on the transit devices. Leafs will have those data
   structures set up during instance creation, and transit devices will
   have them as soon as they receive the K/V TIEs. In this example the
   Spines will have the RIB data structures for all the instances
   created. Node 111 and Node 112 should only have the state for I1,
   I11, I-Odd and I-Even, and Nodes 121 and 122 should have identical
   state, except that I11 would be replaced by I12.

   The same approach would be applied to forming adjacencies. Once the
   initial LIE exchange completes, instance TIEs have been exchanged
   between the devices and parameter negotiation is complete, the
   instance specific neighbor adjacency can be established. The
   creation of all the data structures, TIE flooding and propagation
   start then.

   In the above setup, the leafs maintain the needed Control Plane state
   created as part of configuration and propagation of Prefix S-TIEs
   from transit nodes. Their 0/0 or ::/0 or any other relevant routing
   state within each instance is designed to route packets towards the
   spines.
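
   The instance state described for this example can be reproduced with
   a small sketch. It is only an illustration, not part of RIFT: the
   dictionaries and the function name are hypothetical, and the
   northbound propagation of K/V TIEs is approximated by a simple
   set union along the links shown in the figure.

```python
# Illustrative only: names mirror the figure; the data structures and the
# function are assumptions made for this sketch, not RIFT definitions.

# Instances configured on each leaf, as listed in the text.
LEAF_INSTANCES = {
    "Leaf111": {"I1", "I11", "I-Odd"},
    "Leaf112": {"I1", "I11", "I-Even"},
    "Leaf121": {"I1", "I12", "I-Odd"},
    "Leaf122": {"I1", "I12", "I-Even"},
}

# Northbound links of the two level spine-and-leaf figure: child -> parents.
NORTHBOUND = {
    "Leaf111": ["Node111", "Node112"],
    "Leaf112": ["Node111", "Node112"],
    "Leaf121": ["Node121", "Node122"],
    "Leaf122": ["Node121", "Node122"],
    "Node111": ["Spine21", "Spine22"],
    "Node112": ["Spine21", "Spine22"],
    "Node121": ["Spine21", "Spine22"],
    "Node122": ["Spine21", "Spine22"],
}


def derive_instance_state(leaf_instances, northbound):
    """Return {node: instances} after membership is propagated northbound.

    Each node ends up with the union of the instances hosted by the
    leafs below it, which stands in for receiving northbound K/V TIEs.
    """
    state = {leaf: set(insts) for leaf, insts in leaf_instances.items()}
    pending = list(leaf_instances)
    while pending:
        node = pending.pop()
        for parent in northbound.get(node, []):
            known = state.setdefault(parent, set())
            learned = state[node] - known
            if learned:
                known |= learned
                # The parent learned something new, so it re-advertises.
                pending.append(parent)
    return state


if __name__ == "__main__":
    result = derive_instance_state(LEAF_INSTANCES, NORTHBOUND)
    for name in sorted(result):
        print(name, sorted(result[name]))
```

   Under these assumptions Node 111 and Node 112 end up with I1, I11,
   I-Odd and I-Even, Nodes 121 and 122 with I12 in place of I11, and
   both spines with all five instances, matching the description above.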

   Transit nodes (Node 111, 112, 121 and 122) would have instance
   adjacencies with leafs based on which leaf hosts which instance. For
   example, all transit nodes will maintain an I1 adjacency with every
   leaf below them, but an I-Even adjacency only with Leafs 112 and 122
   and an I-Odd adjacency only with Leafs 111 and 121. Since per
   instance adjacencies are formed, this is even more flexible than
   MI-ISIS, and there is no need for an IID TLV mechanism. A direct
   association exists between instance RIB data structures and per
   instance adjacencies.

   Spines 21 and 22 would create RIB data structures for all the
   instances, as the spines are responsible for routing the traffic
   between leafs. In our example their adjacencies are still based on
   advertised K/V TIEs indicating instance memberships. Spines 21 and
   22 would have all the instance adjacencies with Nodes 111 and 112,
   except for instance I12, and with Nodes 121 and 122 for all the
   instances, except for I11.

   All the standard RIFT rules must apply for adjacency establishment on
   horizontal links between nodes of the same level. The same rules
   must apply for prefix disaggregation and treatment of Policy Guided
   Prefix (PGP) TIEs.

   A leaf is expected to connect to multiple nodes, and failure of
   instance synchronization on the horizontal link indicates either an
   outage or an error. It could be up to the implementation to define
   the default behavior and the correlation of K/V TIEs with flooded
   Node TIEs.

5.5.  Network Instances and Forwarding

   Leafs apply instance policies, dispose of the metadata and make
   forwarding decisions to forward packets to spines through various
   node transit devices.
   This is the primary difference between normal RIFT operation and
   per-instance RIFT, designed to address the forwarding limitations of
   transit devices, which would otherwise have to identify the topology,
   perform the forwarding action within the context of that topology and
   potentially put another identifier on the packet for the next device.
   Leafs therefore must not just forward the packets, but impose the
   right information on them, to allow transparent forwarding to the
   spines by transit devices. Spines in turn have the task of
   identifying the topology, determining the leaf device for the
   destination address (for any address schema) and properly marking the
   packet for topology identification as it is forwarded towards the
   destination leaf.

   Techniques for forwarding packets to the spines and then to the
   appropriate leafs can be up to implementations or may be hardware
   specific, where some set-ups are better off with encapsulation, some
   better off with shim headers and some with address manipulation. The
   forwarding tables of transit devices must have the information to
   forward packets to the spines, and multiple instances can share that
   information, as long as spines can uniquely identify the instance.

   This is a potential use case for various techniques, ranging from
   simple Label Switched Paths (LSPs) using label swapping and
   forwarding to more complex approaches for path set-up that use
   deeper label stacks to identify the devices, the instance and some
   other per-hop behavior. Doing this, with all traffic having to
   transit the spine, conflicts with Requirement #13 as outlined in the
   original RIFT draft.
   This may be very much acceptable in a traditional user access
   network, where most of the traffic is ultimately North/South or has
   to be North/South due to various security requirements. However, as
   traffic patterns change, various systems become more distributed and
   enterprise data processing starts resembling smaller scale MSDCs, it
   may not be a bad idea to have multiple levels of devices capable of
   executing advanced per-hop actions on the packets.

6.  External Routing Information

   In most environments RIFT will not be the only control plane
   protocol. Recent advances in compute virtualization create an
   opportunity for designs in which traditional compute hosts run
   multiple workloads and network virtualization is done at the network
   layer, as opposed to the traditional approach of transport layer
   virtualization. As such, individual virtual operating system
   instances or virtual processes present their own network layer
   address. These addresses exist on the network only for the duration
   of the workload and in some situations even move. This applies
   primarily to Data Center networks, and while such workloads can be
   found in access environments, the scale requirements are unlikely to
   be significant. In access environments, however, server compute
   nodes are replaced by numerous systems that in turn support mobile
   devices and special purpose mesh networks.

   Mobility will be discussed later, but various control plane protocols
   can be deployed on lower level nodes, especially leaf nodes, where
   these external protocols are used to create the routing information
   used to forward packets to these compute workloads. In addition to
   these protocols, Network Admission Control protocols as well as
   network discovery protocols can be used to populate device routing
   tables.

   All this routing information must be exchanged with RIFT as part of
   an export/import relationship between RIFT and the RIB manager, or
   through redistribution between RIFT and the databases of these
   protocols. These foreign prefixes are propagated Northbound as
   Prefix TIEs, with the ability to carry some information that
   identifies them as external and some additional information allowing
   non-leaf devices to treat the information in a special way. Prefix
   TIEs are able to carry an optional attribute set. As part of this
   optional set, Route Tags can be defined and used for external route
   identification. Aside from the optional attribute set, there would
   not even be a difference between "internal" and "external" prefixes,
   as the import process is nearly identical.

   External routes by default should not be propagated Southbound and
   would be subject to the same de-aggregation rules that apply to
   normal RIFT operation. External prefixes would only be propagated
   Southbound if the node in the southern direction could otherwise
   follow the default in a direction where there would be no visibility
   of that route. Implementations should offer the option to propagate
   external routes without any explicit configuration. Situations in
   which a RIFT domain is used to interconnect other routing domains
   can be a match for this requirement.

   RIFT is not meant to become an inter-domain routing protocol, but
   various forms of stub networks of many compute and transit entities
   using other specialized routing protocols could be interconnected
   using a RIFT domain, as well as connected to other external systems.

7.  RIFT and Endpoint Address Mobility

   Most endpoint addressing, including network addressing, belongs to
   fixed locations, as the network address is associated with a
   connecting interface.
   When service endpoints have their own addresses that exist
   independently of network addresses, this separation ultimately
   creates the need for address mobility. Endpoint address mobility is
   both the ability to move the association of any endpoint address to
   any network device interface and the ability to reuse any endpoint
   address anywhere in the mobility domain.

   Numerous traditional approaches exist, ranging from combining locator
   and endpoint in a single address and keeping all endpoints in a
   contiguous broadcast domain that relies on auto-discovery mechanisms,
   to various centralized and distributed Locator/Endpoint mapping
   systems that keep track of endpoint mobility. Much work went into
   making both approaches scalable utilizing various networking layers,
   but the problem has been reduced to one of distributed dynamic
   routing: the ability to re-advertise the address of the endpoint in
   order to reroute to it through a different locator.

7.1.  Mobility Use Cases

   Actual mobility use cases may include activation, deactivation and
   moves of virtual compute systems in both server and access
   environments. These can range from virtual servers serving clients
   to virtual nodes in various peer-to-peer applications. Other use
   cases may include activation, deactivation and association of
   wireless nodes to different point-to-point, point-to-multipoint and
   mesh wireless networks. Whether the locator is a physical compute
   node or a wireless access point, the locator serves as the boundary
   between the static locator network and the dynamic endpoint network.
   RIFT takes on dynamic routing in the first to support access to the
   second.

   RIFT support for mobility is defined in the Mobility section of the
   RIFT specification.
   The fundamental requirement for the mobile node management systems,
   whether centralized or distributed, is to notify RIFT, either through
   a redistribution/import mechanism or directly, when mobility events
   happen, as RIFT does not have a native purge mechanism. RIFT will
   ensure that the right network state to provide routing to the right
   locator is maintained using timestamp and sequence counter
   mechanisms. Both anycast and unicast routing can be supported.

   Address mobility should be supported in both single global instance
   and multi-instance configurations. For non-global instances RIFT
   operation should be no different, and each instance would maintain
   its own data structures keeping track of timestamps and sequence
   numbers. As stated in the original RIFT specification, mobility can
   be defined as a service and supported through a separate instance.
   If done so, the various transit nodes between leafs and super-spines
   are either forwarding encapsulated packets or programmed to process
   just shim headers and metadata. While this does not minimize the
   control plane effort needed to perform mobility at scale (as RIFT is
   an underlay protocol), it would reduce the FIB sizes and minimize the
   data plane requirements.

   As outlined in the original RIFT specification, some environments
   would already be designed to support mobility using other techniques
   for locator/endpoint separation such as LISP or ILA. While RIFT can
   assist these protocols by providing the needed configuration to the
   leaf nodes, such as instance mapping and resolver information, the
   two systems operate independently.

8.  Border Nodes and Superspine East/West traffic

   Border Nodes are special purpose leaf nodes connected directly to the
   top level of the hierarchy. They may run a foreign routing protocol
   and will often be used to interconnect to different networks.
   Most of the external routes, including the default routes, would be
   originated from those nodes. The first approach is to treat these
   devices as any other nodes in the hierarchy. They will assume a
   lower level and will flood N-TIEs and receive standard Node S-TIEs
   and all Prefix S-TIEs. They are just regular leafs, but because of
   their function, they are capable of propagating external routing
   information and also receive all Prefix TIEs, as opposed to just the
   originated default.

   The second approach is to have a mechanism to treat the super-spine,
   other interconnected super-spines and border nodes (which become
   super-spines at this point) as part of a single flood domain. This
   is similar to treating super-spines as a traditional backbone area in
   OSPF or a Level 2 domain in IS-IS. All N-TIEs are flooded on all
   links in the highest available level.

   This is a request to have the only allowed exception to the original
   specification, which explicitly states that neither N-SPF nor S-SPF
   can provide full loop prevention capability, as the entire Fat Tree
   design is not based on contiguous connectivity at any level. If the
   super-spine domain becomes its own Link State flooding domain, or
   East/West TIEs are introduced, then Prefix S-TIEs must be used to
   populate the RIB if they are available. In addition, East/West TIEs
   can never be used to propagate information as S-TIEs. Super-spines
   do not get to act as backbone areas; otherwise, various techniques
   used for things like route leaking would have to be employed.

9.  Border Nodes and Superspine East/West traffic

   Where super-spines represent the top of the hierarchy, bringing
   various design ideas and their caveats such as a contiguous
   super-spine domain, leafs also want to take advantage of certain
   topology optimizations.
In certain set-ups, especially in large campus and metro area
networks, leaf connectivity can be deployed in a "daisy chain"
fashion. In such a set-up, a set of leaf devices is interconnected,
and the "leftmost" and "rightmost" devices provide connectivity to
higher levels of the tree. Similar to the interconnected super-spine
concept, this violates some of the design principles of Fat Tree
topology, and some accommodations for it in the RIFT protocol may be
required.

RIFT is not designed to provide full ring protection, unless the ring
consists of 2-3 nodes (becoming either an interconnected single tier
or leaf/spine with a single leaf). A ring of more than 3 nodes
becomes a broken Fat Tree topology. Before considering a multilevel
RIFT environment whose bottom level is a daisy chain of leafs, we can
try a simple ring approach. Assuming two adjacent nodes on the ring
can be configured as SUPER_SPINE, it is theoretically possible that
all other nodes of the newly formed "half-ring" could have a level
assigned to them, and depending on the number of nodes in the ring,
one or two nodes would become level 0 leafs. If we want all the
nodes to become leafs, then either the nodes must be explicitly
configured to be LEAF_ONLY, or the links from the two "aggregation
nodes" to the leaf nodes must be configured to explicitly tell other
nodes that they are leafs, which those leaf nodes must continue
propagating. This may require the creation of another flag used in
adjacency formation.

Assume the correct adjacencies have been formed and we have two
nodes, Node1 and Node2, of level 1 and a set of leafs, Leaf1, Leaf2
and Leaf3, where Leaf1 connects to Node1 and Leaf2, Leaf2 connects to
Leaf1 and Leaf3, and Leaf3 connects to Node2. Node1 and Node2 can
either have a direct E/W link or just links to nodes of level 2.
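
The simple ring approach described earlier in this section, where two
adjacent nodes act as the top and the remaining nodes receive
descending levels, can be illustrated with a small sketch. This is
not RIFT's level derivation procedure; it is a simplified model that
assumes each hop away from the nearest top node lowers the level by
one and that the top level is chosen so the farthest nodes land on
level 0. The function name and the node ordering are hypothetical.

```python
def half_ring_levels(ring_size):
    """Relative levels for a ring whose nodes 0 and 1 are the top.

    Nodes are listed in ring order: node 0 and node 1 are the two
    adjacent top nodes, node 2 is attached to node 1, and the last
    node is attached to node 0. This ordering is an assumption made
    for the illustration only.
    """
    if ring_size < 3:
        raise ValueError("a ring needs at least 3 nodes")
    chain = ring_size - 2  # nodes between the two top nodes
    # Hop distance from each chain node to its nearest top node.
    dist = [min(i + 1, chain - i) for i in range(chain)]
    top = max(dist)
    # Each hop away from the top lowers the level by one.
    return [top, top] + [top - d for d in dist]


if __name__ == "__main__":
    for size in (3, 5, 6):
        print(size, half_ring_levels(size))
```

In this model a five-node ring ends up with a single level 0 node in
the middle of the half-ring, while a six-node ring ends up with two,
which matches the "one or two nodes" observation. It does not cover
the case where all chain nodes are forced to be leafs through
LEAF_ONLY configuration.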

The first design violation is breaking one of the Leaf-to-Leaf rules,
which states that only the N-TIEs that are originated by a particular
leaf are sent over an East/West Leaf-to-Leaf link. Since the leaf
devices in a daisy chain are part of the same level, this rule could
be relaxed, as N-TIEs from leafs in the chain can be propagated to
higher levels, where they are used to run N-SPF and deal with a
partitioned leaf network. The condition for this relaxation can be
that devices in the daisy chain ultimately rely on S-SPF only, based
on what is propagated with S-TIEs. S-TIEs in turn get propagated in
both directions of the chain without being sent Northbound.

Endpoints on devices in the half-ring rely on S-TIEs to reach other
endpoints in this sub-topology and S-TIEs to reach endpoints outside
the half-ring. N-TIEs are propagated to allow endpoints outside the
half-ring to reach endpoints in the half-ring. A partition within
the half-ring would have to trigger the reflooding of the N-TIEs, as
well as propagation of the S-TIEs. This may be the only possible
situation in which a purge-like Southbound mechanism is used, but
ultimately the direction is not Southbound, but East-West.

10. Security Considerations

Access environments are less trusted environments. RIFT is designed
to make it possible for a device to join the network without much
extra configuration. The protocol was designed to simplify
operations, but at the price of making it a lot easier for devices to
become part of the network. In many MSDC environments the devices
are deployed to come online with special interfaces that connect to a
dedicated Out-of-Band (OOB) management network. Not only does this
process preconfigure these devices, it also ensures that the initial
configuration, as well as the software and firmware, is of a version
that a particular enterprise considers secure.
Devices deployed in more remote locations, or just those without
out-of-band management network connectivity, may not go through the
initial configuration, and general physical security is lower.
Security procedures for neighbor authentication become a lot more
critical.

Implementing Secure Neighbor Discovery would make attachment to the
network more difficult, and implementing a protocol that supports
encryption could keep protocol communications secure.

A number of activities in the 6lo working group have developed ideas
that could create a more secure way for RIFT neighbors to
authenticate each other before forming an adjacency. Of note is the
work done in RFC6775 [3] as well as its extension that addresses
security.

11. Conclusions

RIFT started out as a Data Center protocol, and will evolve in that
direction, allowing greater scalability in building multi-tier
fabrics. As the requirements of MSDCs and larger access environments
start to look very similar, and as end user compute and server
compute start performing very similar functions, there will be more
similarities between end user mobility and workload mobility than
there are differences. RIFT and its enhancements, which combine many
aspects of control and management planes, can become the IGP these
environments have been waiting for.

12. IANA Considerations

At this point there is no need for any allocations.

13. Acknowledgments

The author would like to acknowledge Antoni Przygienda for the
original feedback and, hopefully, for steering this document in the
right direction.

14. References

14.1. Normative References

[I-D.ietf-rift-rift]
           Przygienda, T., Sharma, A., Thubert, P., Atlas, A., and J.
           Drake, "RIFT: Routing in Fat Trees", draft-ietf-rift-
           rift-01 (work in progress), April 2018.

[RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
           Requirement Levels", BCP 14, RFC 2119,
           DOI 10.17487/RFC2119, March 1997.

14.2. Informative References

[RFC3971]  Arkko, J., Ed., Kempf, J., Zill, B., and P. Nikander,
           "SEcure Neighbor Discovery (SEND)", RFC 3971,
           DOI 10.17487/RFC3971, March 2005.

[RFC4861]  Narten, T., Nordmark, E., Simpson, W., and H. Soliman,
           "Neighbor Discovery for IP version 6 (IPv6)", RFC 4861,
           DOI 10.17487/RFC4861, September 2007.

[RFC5120]  Przygienda, T., Shen, N., and N. Sheth, "M-ISIS: Multi
           Topology (MT) Routing in Intermediate System to
           Intermediate Systems (IS-ISs)", RFC 5120,
           DOI 10.17487/RFC5120, February 2008.

[RFC6550]  Winter, T., Ed., Thubert, P., Ed., Brandt, A., Hui, J.,
           Kelsey, R., Levis, P., Pister, K., Struik, R., Vasseur,
           JP., and R. Alexander, "RPL: IPv6 Routing Protocol for
           Low-Power and Lossy Networks", RFC 6550,
           DOI 10.17487/RFC6550, March 2012.

[RFC8202]  Ginsberg, L., Previdi, S., and W. Henderickx, "IS-IS
           Multi-Instance", RFC 8202, DOI 10.17487/RFC8202, June
           2017.

14.3. URIs

[1] https://tools.ietf.org/html/rfc5120

[2] https://tools.ietf.org/html/rfc6822

[3] https://tools.ietf.org/html/rfc6775

Author's Address

Yan Filyurin (editor)
Bloomberg LP
731 Lexington Ave.
New York, NY 10022
US

EMail: yfilyurin@bloomberg.net