LSVR                                                            K. Patel
Internet-Draft                                              Arrcus, Inc.
Intended status: Informational                                 A. Lindem
Expires: April 25, 2019                                    Cisco Systems
                                                                S. Zandi
                                                                G. Dawra
                                                                Linkedin
                                                        October 22, 2018


  Usage and Applicability of Link State Vector Routing in Data Centers
                  draft-ietf-lsvr-applicability-01.txt

Abstract

   This document discusses the usage and applicability of Link State
   Vector Routing (LSVR) extensions in the CLOS architecture of Data
   Center Networks.  The document is intended to provide a simplified
   guide for the deployment of LSVR extensions.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 25, 2019.

Copyright Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Requirements Language
   3.  Recommended Reading
   4.  Common Deployment Scenario
   5.  Justification for BGP SPF Extension
   6.  LSVR Applicability to CLOS Networks
     6.1.  Usage of BGP-LS SAFI
       6.1.1.  Relationship to Other BGP AFI/SAFI Tuples
     6.2.  Peering Models
       6.2.1.  Sparse Peering Model
       6.2.2.  Bi-Connected Graph Heuristic
     6.3.  BGP Peer Discovery
       6.3.1.  BGP Peer Discovery Requirements
       6.3.2.  BGP Peer Discovery Alternatives
       6.3.3.  Data Center Interconnect (DCI) Applicability
     6.4.  Non-CLOS/FAT Tree Topology Applicability
   7.  IANA Considerations
   8.  Security Considerations
   9.  Acknowledgements
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Authors' Addresses

1.  Introduction

   This document complements [I-D.ietf-lsvr-bgp-spf] by discussing the
   applicability of the technology in a simple and fairly common
   deployment scenario, which is described in Section 4.

   After describing the deployment scenario, Section 5 describes the
   reasons why BGP modifications are needed for such deployments.

   Once the control plane routing protocol requirements are described,
   Section 6 covers the LSVR protocol enhancements to BGP that meet
   these requirements, and their applicability to Data Center CLOS
   networks.

2.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

3.  Recommended Reading

   This document assumes knowledge of existing data center networks and
   data center network topologies [CLOS].  This document also assumes
   knowledge of data center routing protocols such as BGP [RFC4271],
   BGP-SPF [I-D.ietf-lsvr-bgp-spf], and OSPF [RFC2328], as well as data
   center OAM protocols such as LLDP [RFC4957] and BFD [RFC5580].

4.  Common Deployment Scenario

   Within a Data Center, a common network design for interconnecting
   servers is the CLOS topology [CLOS].  The CLOS topology is fully
   non-blocking, and traffic is spread across its parallel paths using
   Equal Cost Multipath (ECMP).  In a CLOS topology, the minimum number
   of parallel paths between two servers is determined by the width of
   the tier-1 stage, as shown in Figure 1.

   The following example illustrates a multistage CLOS topology.

                                Tier-1
                                +-----+
                                |NODE |
                             +->| 12  |--+
                             |  +-----+  |
                    Tier-2   |           |  Tier-2
                    +-----+  |  +-----+  |  +-----+
    +-------------->|NODE |--+->|NODE |--+--|NODE |---------------+
    |   +-----------| 9   |--+  | 10  |  +--| 11  |-----------+   |
    |   |           +-----+     +-----+     +-----+           |   |
    |   |                                                     |   |
    |   |           +-----+     +-----+     +-----+           |   |
    |   +---+------>|NODE |--+  |NODE |  +--|NODE |-------+---+   |
    |   |   |   +---| 6   |--+->| 7   |--+--| 8   |---+   |   |   |
    |   |   |   |   +-----+  |  +-----+  |  +-----+   |   |   |   |
    |   |   |   |            |           |            |   |   |   |
   +-----+ +-----+           |  +-----+  |           +-----+ +-----+
   |NODE | |NODE |  Tier-3   +->|NODE |--+  Tier-3   |NODE | |NODE |
   | 1   | | 2   |              | 3   |              | 4   | | 5   |
   +-----+ +-----+              +-----+              +-----+ +-----+
     | |     | |                                       | |     | |
     A O     B O             <- Servers ->             Z O     O O

            Figure 1: Illustration of the Basic CLOS Topology

5.  Justification for BGP SPF Extension

   In order to simplify layer-3 routing and operations [RFC7938], many
   data centers use BGP as a routing protocol to create both an overlay
   and an underlay network for their CLOS topologies.  However, BGP is a
   path-vector routing protocol.  Since it does not build a view of the
   fabric topology, it relies on hop-by-hop EBGP peering to provide
   hop-by-hop routing for the underlay network and to resolve any
   overlay next hops.  The hop-by-hop BGP peering paradigm imposes
   several restrictions within a CLOS.  It effectively precludes the
   deployment of Route Reflectors/Route Controllers, since the EBGP
   sessions are congruent with the data path.  In addition, the BGP best
   path algorithm is prefix-based: a prefix is not announced to other
   BGP speakers until the best path decision process has been performed
   for that prefix at each intermediate hop.  These restrictions
   significantly delay the overall convergence of the underlay network
   within a CLOS.

   The LSVR SPF modifications allow BGP to overcome these limitations.
   Furthermore, reusing the BGP-LS NLRI format [RFC7752] allows LSVR
   data for the nodes, links, and prefixes in the BGP routing domain to
   be advertised and used for SPF computations.

6.  LSVR Applicability to CLOS Networks

   With the BGP SPF extensions [I-D.ietf-lsvr-bgp-spf], the BGP best
   path computation and route computation are replaced with OSPF-like
   algorithms [RFC2328], both to determine whether a BGP-LS NLRI has
   changed and needs to be re-advertised and to compute the routing
   table.  These modifications will significantly improve convergence of
   the underlay while affording the operational benefits of a single
   routing protocol [RFC7938].

   Data center controllers typically require visibility into the BGP
   topology to compute traffic-engineered paths.
   These controllers learn the topology and other relevant information
   via the BGP-LS address family [RFC7752], which is completely
   independent of the underlay address families (usually IPv4/IPv6
   unicast).  Furthermore, in traditional BGP underlays, every BGP
   router needs to advertise its BGP-LS information separately from its
   underlay routes.  With the BGP SPF extensions, controllers can learn
   the topology from the same BGP advertisements that are used to
   compute the underlay routes, and they can also take advantage of the
   convergence improvements of the BGP SPF extensions.  Controllers can
   be placed either outside of or within the forwarding path.

   Alternatively, since every router in the BGP SPF domain has a
   complete view of the topology, the operator can also choose to
   configure BGP sessions using the hop-by-hop peering model described
   in [RFC7938], along with BFD [RFC5580].  Although the hop-by-hop
   peering model lacks the inherent benefits of the controller-based
   model, in neither model do BGP updates need to be serialized by the
   BGP best path algorithm.  This improves overall network convergence.

6.1.  Usage of BGP-LS SAFI

   The BGP SPF extensions [I-D.ietf-lsvr-bgp-spf] define a new BGP-LS
   SAFI for the announcement of BGP SPF link-state information.  The
   NLRI format and its associated attributes follow the BGP-LS format
   for node, link, and prefix announcements.  Whether the peering model
   within a CLOS follows the hop-by-hop peering described in [RFC7938]
   or uses controller-based or route-reflector peering, an operator can
   exchange BGP SPF SAFI routes over the BGP peering simply by
   configuring the BGP SPF SAFI between the relevant BGP speakers.

   The BGP-LS SPF SAFI can also coexist with the BGP IP Unicast SAFI,
   which may carry overlapping IP routes.
   Routes received in these SAFIs are evaluated, stored, and announced
   separately, in accordance with [RFC4760].  When both SAFIs provide a
   route to the same prefix, the choice of which route to install is a
   matter of the local policy and preferences of the network operator.

   Finally, since BGP SPF peering follows the procedures described in
   [RFC4271], all existing transport security mechanisms, including the
   TCP Authentication Option [RFC5925], are available for the BGP-LS SPF
   SAFI.

6.1.1.  Relationship to Other BGP AFI/SAFI Tuples

   Normally, the BGP-LS SPF AFI/SAFI is used solely to compute the
   underlay and is given preference over other AFI/SAFIs.  Other BGP
   SAFIs, e.g., IPv4/IPv6 Unicast VPN, would use the BGP SPF computed
   routes for next-hop resolution.  However, if BGP-LS NLRI is also
   being advertised for controller consumption, there is no need to
   replicate the Node, Link, and Prefix NLRI in the BGP-LS SAFI
   [RFC7752].  Rather, additional NLRI attributes can be advertised in
   the BGP-LS SPF AFI/SAFI as required.

6.2.  Peering Models

   As previously stated, BGP SPF can be deployed using the existing
   peering model, where there is a single-hop BGP session on every link
   in the data center fabric [RFC7938].  This provides both the
   advertisement of routes and the determination of link and
   neighboring switch availability.  With BGP SPF, the underlay will
   converge faster because the changes to the decision process allow
   NLRI changes to be advertised sooner after a change is detected.

6.2.1.  Sparse Peering Model

   Alternatively, BFD [RFC5580] can be used to swiftly determine link
   availability, and the BGP peering topology can then be significantly
   sparser than the data center fabric.  BGP SPF sessions need only be
   established with enough peers to provide a bi-connected graph.  If
   IBGP is used, the BGP routers at tier N-1 will act as route
   reflectors for the routers at tier N.
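
   The bi-connectivity condition above can be checked mechanically.  The
   following Python sketch is illustrative only and assumes that
   "bi-connected" means 2-vertex-connected.  Under that reading, the
   graph of BGP SPF sessions must be connected and must contain no
   articulation point, so that the failure of any single BGP speaker
   does not partition it.  The speaker names and helper functions are
   hypothetical.  The articulation point search (Tarjan's low-link
   method) is a standard graph technique, not a procedure defined in
   [I-D.ietf-lsvr-bgp-spf].

```python
from collections import defaultdict
from collections.abc import Hashable
from typing import Dict, Iterable, Set, Tuple

Graph = Dict[Hashable, Set[Hashable]]


def build_session_graph(sessions: Iterable[Tuple[Hashable, Hashable]]) -> Graph:
    """Undirected graph: vertices are BGP speakers, edges are BGP SPF sessions."""
    adj: Graph = defaultdict(set)
    for a, b in sessions:
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def is_connected(adj: Graph) -> bool:
    """True if every speaker is reachable from an arbitrary starting speaker."""
    if not adj:
        return False
    start = next(iter(adj))
    seen, stack = {start}, [start]
    while stack:
        for nbr in adj[stack.pop()]:
            if nbr not in seen:
                seen.add(nbr)
                stack.append(nbr)
    return len(seen) == len(adj)


def articulation_points(adj: Graph) -> Set[Hashable]:
    """Speakers whose failure would partition the graph (Tarjan, iterative DFS)."""
    disc: Dict[Hashable, int] = {}
    low: Dict[Hashable, int] = {}
    cut: Set[Hashable] = set()
    timer = 0
    for root in list(adj):
        if root in disc:
            continue
        disc[root] = low[root] = timer
        timer += 1
        root_children = 0
        stack = [(root, None, iter(adj[root]))]
        while stack:
            node, parent, neighbors = stack[-1]
            child = None
            for nbr in neighbors:
                if nbr == parent:
                    continue
                if nbr in disc:
                    low[node] = min(low[node], disc[nbr])  # back edge
                else:
                    child = nbr  # tree edge: descend into nbr
                    break
            if child is not None:
                disc[child] = low[child] = timer
                timer += 1
                if node == root:
                    root_children += 1
                stack.append((child, node, iter(adj[child])))
            else:
                stack.pop()  # node finished: propagate its low-link upward
                if parent is not None:
                    low[parent] = min(low[parent], low[node])
                    if parent != root and low[node] >= disc[parent]:
                        cut.add(parent)
        if root_children > 1:  # a DFS root is a cut vertex iff it has 2+ children
            cut.add(root)
    return cut


def is_biconnected(sessions: Iterable[Tuple[Hashable, Hashable]]) -> bool:
    """True if the session graph is connected and survives losing any one speaker."""
    adj = build_session_graph(sessions)
    if len(adj) <= 2:
        return is_connected(adj)
    return is_connected(adj) and not articulation_points(adj)


if __name__ == "__main__":
    spines, leaves = ["S1", "S2"], ["L1", "L2", "L3", "L4"]
    full = [(leaf, spine) for leaf in leaves for spine in spines]
    print(is_biconnected(full))  # True
    sparse = [s for s in full if s != ("L4", "S2")]
    print(is_biconnected(sparse))  # False: L4 now depends on S1 alone
```

   In the example, four leaves that each peer with both spines form a
   bi-connected session graph.  Removing the L4-S2 session makes S1 an
   articulation point, so that candidate set of sessions would not
   satisfy the condition.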

   The obvious use of sparse peering is to avoid parallel sessions
   between the same two BGP speakers in the data center fabric.
   However, this use case is of limited value, since parallel layer-3
   links between the same two BGP routers are rare in CLOS or Fat-Tree
   topologies.  Two more interesting scenarios are described below.

   In current Data Center topologies, there is often a very dense mesh
   of links between levels, e.g., leaf and spine, providing 32-way,
   64-way, or wider Equal-Cost Multi-Path (ECMP).  In these topologies,
   it is desirable not to have a BGP session on every link, and
   techniques such as the one described in Section 6.2.2 can be used to
   establish sessions on a subset of the northbound links.

   Alternatively, controller-based data center topologies are envisioned
   in which BGP speakers within the data center establish BGP sessions
   only with two or more controllers.  In these topologies, fabric nodes
   below the first tier (using the [RFC7938] hierarchy) will establish
   multi-hop BGP sessions with the controllers.  For these multi-hop
   sessions, the route to the controllers would have to be determined
   without depending on BGP, through other means that are beyond the
   scope of this document.  However, the BGP discovery mechanisms
   described in Section 6.3 would be one possibility.

6.2.2.  Bi-Connected Graph Heuristic

   This heuristic assumes that BGP peers are discovered (Section 6.3).
   Additionally, it assumes that the direction of each peering can be
   ascertained.  In the context of a data center fabric, the direction
   is either northbound (toward the spine), southbound (toward the
   Top-of-Rack (TOR) switches), or east-west (same level in the
   hierarchy).  The determination of the direction is beyond the scope
   of this document.  However, it would be reasonable to assume a
   technique where the TOR switches can be identified and the number of
   hops to the closest TOR is used to determine the direction.
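
   The following Python sketch shows one possible realization of the
   technique suggested above.  It is illustrative only and makes two
   assumptions: the ToR switches have been identified (e.g., by
   configuration), and the fabric adjacency is known (for example, from
   the BGP SPF link-state information).  A multi-source breadth-first
   search from all ToRs assigns each switch a level equal to its hop
   count to the closest ToR.  A peer at a higher level is northbound,
   one at a lower level is southbound, and one at the same level is
   east-west.  The hypothetical function plan_sessions applies the
   session rules of this heuristic that are described in the following
   paragraph.  Two choices are assumptions of this sketch, not
   requirements of the heuristic: the two northbound peers are picked
   in sorted order, and only as many east-west sessions are initiated
   as there are missing northbound sessions.

```python
from collections import deque
from enum import Enum
from typing import Dict, FrozenSet, Iterable, Set


class Direction(Enum):
    NORTHBOUND = "northbound"  # toward the spine
    SOUTHBOUND = "southbound"  # toward the ToR switches
    EAST_WEST = "east-west"    # same level in the hierarchy


def fabric_levels(adj: Dict[str, Set[str]], tors: Iterable[str]) -> Dict[str, int]:
    """Multi-source BFS: each switch's level is its hop count to the closest ToR."""
    level = {tor: 0 for tor in tors}
    queue = deque(level)  # every ToR is a BFS source at level 0
    while queue:
        node = queue.popleft()
        for nbr in adj.get(node, ()):
            if nbr not in level:  # first visit is along a shortest path
                level[nbr] = level[node] + 1
                queue.append(nbr)
    return level


def direction(level: Dict[str, int], local: str, peer: str) -> Direction:
    """Classify a peering by comparing the levels of the two switches."""
    if level[peer] > level[local]:
        return Direction.NORTHBOUND
    if level[peer] < level[local]:
        return Direction.SOUTHBOUND
    return Direction.EAST_WEST


def plan_sessions(level: Dict[str, int], local: str, peers: Iterable[str],
                  failed: FrozenSet[str] = frozenset()) -> Dict[str, str]:
    """Map each discovered peer to "active" (initiate) or "passive" (accept only).

    `failed` (hypothetical) holds northbound peers whose sessions could not
    be established.
    """
    peers = list(peers)
    plan = {peer: "passive" for peer in peers}  # default: never initiate
    north = sorted(p for p in peers
                   if direction(level, local, p) is Direction.NORTHBOUND
                   and p not in failed)
    for peer in north[:2]:  # try to keep two northbound sessions
        plan[peer] = "active"
    # East-west sessions are initiated only to make up a northbound shortfall.
    shortfall = 2 - len(north[:2])
    east_west = sorted(p for p in peers
                       if direction(level, local, p) is Direction.EAST_WEST)
    for peer in east_west[:shortfall]:
        plan[peer] = "active"
    return plan


if __name__ == "__main__":
    links = [("T1", "L1"), ("T1", "L2"), ("T2", "L1"), ("T2", "L2"),
             ("L1", "S1"), ("L1", "S2"), ("L2", "S1"), ("L2", "S2"),
             ("L1", "L2")]  # hypothetical fabric with one east-west link
    adj: Dict[str, Set[str]] = {}
    for a, b in links:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    level = fabric_levels(adj, tors=["T1", "T2"])
    print(level["S1"], direction(level, "L1", "S1").value)  # 2 northbound
    print(plan_sessions(level, "L1", sorted(adj["L1"]), failed=frozenset({"S2"})))
    # {'L2': 'active', 'S1': 'active', 'S2': 'passive', 'T1': 'passive', 'T2': 'passive'}
```

   In the example, L1 cannot establish its session with S2.  It
   therefore keeps a single northbound session to S1 and actively
   initiates one east-west session to L2 to make up the shortfall.  Its
   southbound peers T1 and T2 remain passive.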

   In this heuristic, BGP speakers passively accept southbound BGP
   sessions.  For northbound sessions, a BGP speaker attempts to
   maintain two BGP sessions with different switches (in data center
   fabrics, there is normally only a single layer-3 connection between
   two switches anyway).  For east-west sessions, passive BGP session
   establishment is allowed; however, a BGP speaker will never actively
   establish an east-west BGP session unless it cannot establish two
   northbound BGP sessions.

6.3.  BGP Peer Discovery

6.3.1.  BGP Peer Discovery Requirements

   The most basic requirement is to be able to discover the address of a
   single-hop peer without pre-configuration.  This is accomplished
   today using IPv6 Router Advertisements (RA) [RFC4861] and assuming
   that a BGP session is desired with any discovered peer.  Beyond this
   basic requirement, it is useful to have the following information
   relating to the BGP session:

   o  The Autonomous System (AS) number and BGP Identifier of a
      potential peer.  The latter can be used for debugging and to
      decrease the likelihood of BGP session establishment collisions.

   o  The security capabilities supported and, for cryptographic
      authentication, possibly the key chain [RFC8177] to be used.

   o  Session Policy Identifier - A group number or name used to
      associate common session parameters with the peer.  For example,
      in a data center, BGP sessions with a Top of Rack (ToR) device
      could have different parameters than BGP sessions between leaf
      and spine.

   In a data center fabric, it is often useful to know whether a peer is
   southbound (towards the servers) or northbound (towards the spine or
   super-spine) (see Section 6.2.2).  A further potential requirement is
   to determine this dynamically.
   One mechanism, without specifying all the details, might be for the
   ToRs to be identified when installed and for the other switches in
   the fabric to determine their level based on the distance from the
   closest ToR.

   If there are multiple links between BGP speakers, or if the links
   between BGP speakers are unnumbered, it is also useful to be able to
   establish multi-hop sessions using the loopback addresses.  This will
   often require the discovery protocol to install routes toward the
   potential peer's loopback addresses prior to BGP session
   establishment.

   Finally, a simple BGP discovery protocol could also be used to
   establish a multi-hop session with one or more controllers by
   advertising connectivity to those controllers.  However, once the
   multi-hop session actually traverses multiple nodes, the discovery
   protocol begins to resemble a distance-vector routing protocol, so
   this may not be a good requirement for the discovery protocol.

6.3.2.  BGP Peer Discovery Alternatives

   While BGP peer discovery is not part of [I-D.ietf-lsvr-bgp-spf],
   there are at least three proposals for BGP peer discovery.  At least
   one of these mechanisms will be adopted and will be applicable to
   deployments other than the data center.  It is strongly RECOMMENDED
   that the accepted mechanism be used in conjunction with BGP SPF in
   data centers.  The BGP discovery mechanism should discover both peer
   addresses and the endpoints for BFD sessions.  Additionally, it would
   be desirable to have a heuristic for determining whether the peer is
   at a tier above or below the discovering BGP speaker (refer to
   Section 6.2.2).

   The BGP discovery mechanisms under consideration are
   [I-D.acee-idr-lldp-peer-discovery],
   [I-D.xu-idr-neighbor-autodiscovery], and [I-D.ymbk-lsvr-lsoe].

6.3.3.  Data Center Interconnect (DCI) Applicability

   Since BGP SPF is intended for the routing underlay, and DCI gateways
   typically have direct or very simple connectivity, external BGP
   sessions would typically not include the BGP SPF SAFI.

6.4.  Non-CLOS/FAT Tree Topology Applicability

   The BGP SPF extensions [I-D.ietf-lsvr-bgp-spf] can also be used in
   other topologies to take advantage of their inherent convergence
   improvements.  Additionally, sparse peering techniques (Section 6.2)
   may be utilized.  However, determining whether or not to establish a
   BGP session is more complex, and the heuristic described in
   Section 6.2.2 cannot be used.  In such topologies, other techniques,
   such as those described in [I-D.li-lsr-dynamic-flooding], may be
   employed.  One potential deployment would be the underlay for a
   Service Provider (SP) backbone where the use of a single routing
   protocol, i.e., BGP, is desired.

7.  IANA Considerations

   No IANA updates are requested by this document.

8.  Security Considerations

   This document introduces no new security considerations above and
   beyond those already specified in [RFC4271] and
   [I-D.ietf-lsvr-bgp-spf].

9.  Acknowledgements

   The authors would like to thank Alvaro Retana and Yan Filyurin for
   their review and comments.

10.  References

10.1.  Normative References

   [I-D.ietf-lsvr-bgp-spf]
              Patel, K., Lindem, A., Zandi, S., and W. Henderickx,
              "Shortest Path Routing Extensions for BGP Protocol",
              draft-ietf-lsvr-bgp-spf-03 (work in progress), September
              2018.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017.

10.2.  Informative References

   [CLOS]     "A Study of Non-Blocking Switching Networks", The Bell
              System Technical Journal, Vol. 32(2), DOI
              10.1002/j.1538-7305.1953.tb01433.x, March 1953.

   [I-D.acee-idr-lldp-peer-discovery]
              Lindem, A., Patel, K., Zandi, S., Haas, J., and X. Xu,
              "BGP Logical Link Discovery Protocol (LLDP) Peer
              Discovery", draft-acee-idr-lldp-peer-discovery-03 (work in
              progress), June 2018.

   [I-D.li-lsr-dynamic-flooding]
              Li, T., Psenak, P., Ginsberg, L., Przygienda, T., and D.
              Cooper, "Dynamic Flooding on Dense Graphs", draft-li-lsr-
              dynamic-flooding-01 (work in progress), October 2018.

   [I-D.xu-idr-neighbor-autodiscovery]
              Xu, X., Talaulikar, K., Bi, K., Tantsura, J., and N.
              Triantafillis, "BGP Neighbor Discovery", draft-xu-idr-
              neighbor-autodiscovery-10 (work in progress), October
              2018.

   [I-D.ymbk-lsvr-lsoe]
              Bush, R. and K. Patel, "Link State Over Ethernet", draft-
              ymbk-lsvr-lsoe-01 (work in progress), July 2018.

   [RFC2328]  Moy, J., "OSPF Version 2", STD 54, RFC 2328,
              DOI 10.17487/RFC2328, April 1998.

   [RFC4271]  Rekhter, Y., Ed., Li, T., Ed., and S. Hares, Ed., "A
              Border Gateway Protocol 4 (BGP-4)", RFC 4271,
              DOI 10.17487/RFC4271, January 2006.

   [RFC4760]  Bates, T., Chandra, R., Katz, D., and Y. Rekhter,
              "Multiprotocol Extensions for BGP-4", RFC 4760,
              DOI 10.17487/RFC4760, January 2007.

   [RFC4861]  Narten, T., Nordmark, E., Simpson, W., and H. Soliman,
              "Neighbor Discovery for IP version 6 (IPv6)", RFC 4861,
              DOI 10.17487/RFC4861, September 2007.

   [RFC4957]  Krishnan, S., Ed., Montavont, N., Njedjou, E., Veerepalli,
              S., and A. Yegin, Ed., "Link-Layer Event Notifications for
              Detecting Network Attachments", RFC 4957,
              DOI 10.17487/RFC4957, August 2007.

   [RFC5580]  Tschofenig, H., Ed., Adrangi, F., Jones, M., Lior, A., and
              B. Aboba, "Carrying Location Objects in RADIUS and
              Diameter", RFC 5580, DOI 10.17487/RFC5580, August 2009.

   [RFC5925]  Touch, J., Mankin, A., and R. Bonica, "The TCP
              Authentication Option", RFC 5925, DOI 10.17487/RFC5925,
              June 2010.

   [RFC7752]  Gredler, H., Ed., Medved, J., Previdi, S., Farrel, A., and
              S. Ray, "North-Bound Distribution of Link-State and
              Traffic Engineering (TE) Information Using BGP", RFC 7752,
              DOI 10.17487/RFC7752, March 2016.

   [RFC7938]  Lapukhov, P., Premji, A., and J. Mitchell, Ed., "Use of
              BGP for Routing in Large-Scale Data Centers", RFC 7938,
              DOI 10.17487/RFC7938, August 2016.

   [RFC8177]  Lindem, A., Ed., Qu, Y., Yeung, D., Chen, I., and J.
              Zhang, "YANG Data Model for Key Chains", RFC 8177,
              DOI 10.17487/RFC8177, June 2017.

Authors' Addresses

   Keyur Patel
   Arrcus, Inc.
   2077 Gateway Pl
   San Jose, CA 95110
   USA

   Email: keyur@arrcus.com


   Acee Lindem
   Cisco Systems
   301 Midenhall Way
   Cary, NC 95110
   USA

   Email: acee@cisco.com


   Shawn Zandi
   Linkedin
   222 2nd Street
   San Francisco, CA 94105
   USA

   Email: szandi@linkedin.com


   Gaurav Dawra
   Linkedin
   222 2nd Street
   San Francisco, CA 94105
   USA

   Email: gdawra@linkedin.com