BESS Working Group                                             A. Farrel
Internet-Draft                                        Old Dog Consulting
Intended status: Standards Track                                J. Drake
Expires: December 6, 2020                                       E. Rosen
                                                        Juniper Networks
                                                                K. Patel
                                                            Arrcus, Inc.
                                                                L. Jalil
                                                                 Verizon
                                                            June 4, 2020


   Gateway Auto-Discovery and Route Advertisement for Segment Routing
                     Enabled Domain Interconnection
                 draft-ietf-bess-datacenter-gateway-07

Abstract

   Data centers are critical components of the infrastructure used by
   network operators to provide services to their customers.  Data
   centers are attached to the Internet or a backbone network by gateway
   routers.  One data center typically has more than one gateway for
   commercial, load balancing, and resiliency reasons.

   Segment Routing is a protocol mechanism that can be used within a
   data center, and also for steering traffic that flows between two
   data center sites.  In order that one data center site may load
   balance the traffic it sends to another data center site, it needs to
   know the complete set of gateway routers at the remote data center,
   the points of connection from those gateways to the backbone network,
   and the connectivity across the backbone network.

   Segment Routing may also be operated in other domains, such as access
   networks.  Those domains also need to be connected across backbone
   networks through gateways.

   This document defines a mechanism using the BGP Tunnel Encapsulation
   attribute to allow each gateway router to advertise the routes to the
   prefixes in the Segment Routing domains to which it provides access,
   and also to advertise on behalf of each other gateway to the same
   Segment Routing domain.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on December 6, 2020.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Requirements Language
   3.  SR Domain Gateway Auto-Discovery
   4.  Relationship to BGP Link State and Egress Peer Engineering
   5.  Advertising an SR Domain Route Externally
   6.  Encapsulation
   7.  IANA Considerations
     7.1.  Tunnel Encapsulation Tunnel Type
     7.2.  Tunnel Encapsulation Sub-TLVs
   8.  Security Considerations
   9.  Manageability Considerations
   10. Acknowledgements
   11. References
     11.1.  Normative References
     11.2.  Informative References
   Authors' Addresses

1.  Introduction

   Data centers (DCs) are critical components of the infrastructure used
   by network operators to provide services to their customers.  DCs are
   attached to the Internet or a backbone network by gateway routers
   (GWs).  One DC typically has more than one GW for various reasons,
   including commercial preferences, load balancing, and resiliency
   against connection or device failure.

   Segment Routing (SR) [RFC8402] is a protocol mechanism that can be
   used within a DC, and also for steering traffic that flows between
   two DC sites.  In order for a source (ingress) DC that uses SR to
   load balance the flows it sends to a destination (egress) DC, it
   needs to know the complete set of entry nodes (i.e., GWs) for that
   egress DC from the backbone network connecting the two DCs.  Note
   that it is assumed that the connected set of DCs and the backbone
   network connecting them are part of the same SR BGP Link State (LS)
   instance ([RFC7752] and [I-D.ietf-idr-bgpls-segment-routing-epe]) so
   that traffic engineering using SR may be used for these flows.

   SR may also be operated in other domains, such as access networks.
   Those domains also need to be connected across backbone networks
   through gateways.  For illustrative purposes, consider the Ingress
   and Egress SR Domains shown in Figure 1 as separate ASes.

   Suppose that there are two gateways, GW1 and GW2 as shown in
   Figure 1, for a given egress SR domain, and that they each advertise
   a route to prefix X, which is located within the egress SR domain,
   with each setting itself as next hop.
   One might think that the GWs for X could be inferred from the routes'
   next hop fields, but typically it is not the case that both routes
   get distributed across the backbone: rather, only the best route, as
   selected by BGP, is distributed.  This precludes load balancing flows
   across both GWs.

   -----------------                 ------------------------
   | Ingress       |                 | Egress      ------   |
   | SR Domain     |                 | SR Domain  |Prefix|  |
   |               |                 |            |  X   |  |
   |               |                 |             ------   |
   |      --       |                 |    ---          ---  |
   |     |GW|      |                 |   |GW1|        |GW2| |
   -------++--------                 ------+-----------+-+---
          | \                              |          /  |
          |  \                             |         /   |
          |  -+-------------      ---------+--------+--- |
          |  ||ASBR|   ----|      |----  |ASBR| |ASBR| | |
          |  | ----   |ASBR+------+ASBR|  ----   ----  | |
          |  |         ----|      |----                | |
          |  |             |      |                    | |
          |  |         ----|      |----                | |
          |  |  AS1   |ASBR+------+ASBR|    AS2        | |
          |  |         ----|      |----                | |
          |  ---------------      ---------------------- |
        --+----------------------------------------------+--
        | |ASBR|                                    |ASBR| |
        |  ----                 AS3                  ----  |
        |                                                  |
        ----------------------------------------------------

            Figure 1: Example Segment Routing Domain Interconnection

   The obvious solution to this problem is to use the BGP feature that
   allows the advertisement of multiple paths in BGP (known as
   Add-Paths) [RFC7911] to ensure that all routes to X get advertised by
   BGP.  However, even if this is done, the identity of the GWs will be
   lost as soon as the routes get distributed through an Autonomous
   System Border Router (ASBR) that will set itself to be the next hop.
   And if there are multiple Autonomous Systems (ASes) in the backbone,
   not only will the next hop change several times, but the Add-Paths
   technique will experience scaling issues.  This all means that the
   Add-Paths approach is limited to SR domains connected over a single
   AS.

   This document defines a solution that overcomes this limitation and
   works equally well with a backbone constructed from one or more ASes.
   The solution uses the Tunnel Encapsulation attribute
   [I-D.ietf-idr-tunnel-encaps] as follows:

      We define a new tunnel type, "SR Tunnel".  When the GWs to a given
      SR domain advertise a route to a prefix X within the SR domain,
      they will each include a Tunnel Encapsulation attribute with
      multiple tunnel instances, each of type "SR Tunnel" (value 17):
      one for each GW, and each containing a Remote Endpoint sub-TLV
      with that GW's address.

   In other words, each route advertised by a GW identifies all of the
   GWs to the same SR domain (see Section 3 for a discussion of how GWs
   discover each other).  Therefore, even if only one of the routes is
   distributed to other ASes, it does not matter how many times the
   next hop changes, as the Tunnel Encapsulation attribute (and its
   Remote Endpoint sub-TLVs) will remain unchanged.

   To put this in the context of Figure 1, GW1 and GW2 discover each
   other as gateways for the egress SR domain.  Both GW1 and GW2
   advertise themselves as having routes to prefix X.  Furthermore, GW1
   includes a Tunnel Encapsulation attribute with a tunnel instance of
   type "SR Tunnel" for itself and another for GW2.  Similarly, GW2
   includes a Tunnel Encapsulation attribute with a tunnel instance for
   itself and another for GW1.  The gateway in the ingress SR domain
   can now see all possible paths to the egress SR domain regardless of
   which route advertisement is propagated to it, and it can choose
   one, or balance traffic flows as it sees fit.

   The protocol extensions defined in this document are put into the
   broader context of SR domain interconnection by
   [I-D.farrel-spring-sr-domain-interconnect].  That document shows how
   other existing protocol elements may be combined with the extensions
   defined in this document to provide a full system.

2.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

3.  SR Domain Gateway Auto-Discovery

   To allow a given SR domain's GWs to auto-discover each other and to
   coordinate their operations, the following procedures are
   implemented:

   o  Each GW is configured with an identifier for the SR domain.  That
      identifier is common across all GWs to the domain (i.e., the same
      identifier is used by all GWs to the same SR domain), and unique
      across all SR domains that are connected (i.e., across all GWs to
      all SR domains that are interconnected).

   o  A route target ([RFC4360]) is attached to each GW's auto-discovery
      route and has its value set to the SR domain identifier.

   o  Each GW constructs an import filtering rule to import any route
      that carries a route target with the same SR domain identifier
      that the GW itself uses.  This means that only these GWs will
      import those routes, and that all GWs to the same SR domain will
      import each other's routes and will learn (auto-discover) the
      current set of active GWs for the SR domain.

   The auto-discovery route that each GW advertises consists of the
   following:

   o  An IPv4 or IPv6 NLRI containing one of the GW's loopback addresses
      (that is, with an AFI/SAFI pair that is one of 1/1, 2/1, 1/4, or
      2/4).

   o  A Tunnel Encapsulation attribute containing the GW's encapsulation
      information, which at a minimum consists of an SR Tunnel TLV (type
      TBD1 to be allocated by IANA) with a Remote Endpoint sub-TLV as
      specified in [I-D.ietf-idr-tunnel-encaps].
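
   The auto-discovery procedure above, and the way it lets every
   externally advertised route name all of the GWs, can be sketched in
   Python.  This is a simplified, hypothetical model rather than a BGP
   implementation: routes are plain data classes instead of wire-format
   UPDATE messages, a route target is modelled as the bare SR domain
   identifier string, only IPv4 /32 loopbacks are shown, the SR Tunnel
   type uses the value 17 given in Section 1, and all class names,
   function names, and addresses are illustrative assumptions.

```python
from dataclasses import dataclass, replace
from typing import FrozenSet, List, Tuple

# Hypothetical, simplified model; not a wire-format BGP implementation.
# "SR Tunnel" tunnel type (value 17, as stated in Sections 1 and 7.1).
SR_TUNNEL_TYPE = 17

# AFI/SAFI pairs permitted for the auto-discovery NLRI (Section 3).
ALLOWED_AFI_SAFI = {(1, 1), (2, 1), (1, 4), (2, 4)}


@dataclass(frozen=True)
class TunnelInstance:
    """One tunnel TLV of a Tunnel Encapsulation attribute (simplified)."""
    tunnel_type: int
    remote_endpoint: str  # value carried in the Remote Endpoint sub-TLV


@dataclass(frozen=True)
class Route:
    """Simplified BGP route (assumed model, not an encoding)."""
    prefix: str
    next_hop: str
    afi_safi: Tuple[int, int] = (1, 1)
    route_targets: FrozenSet[str] = frozenset()  # RT modelled as domain ID
    tunnel_encaps: Tuple[TunnelInstance, ...] = ()


@dataclass(frozen=True)
class Gateway:
    domain_id: str  # SR domain identifier, configured identically on each GW
    loopback: str   # loopback address advertised in the auto-discovery route


def auto_discovery_route(gw: Gateway) -> Route:
    """Auto-discovery route: loopback NLRI, RT = domain ID, one SR tunnel."""
    return Route(
        prefix=gw.loopback + "/32",  # IPv4 only in this sketch
        next_hop=gw.loopback,
        route_targets=frozenset({gw.domain_id}),
        tunnel_encaps=(TunnelInstance(SR_TUNNEL_TYPE, gw.loopback),),
    )


def imports(gw: Gateway, route: Route) -> bool:
    """Import filter: accept only routes whose RT matches our SR domain ID."""
    return (gw.domain_id in route.route_targets
            and route.afi_safi in ALLOWED_AFI_SAFI)


def active_gateways(gw: Gateway, received: List[Route]) -> List[str]:
    """Learn the addresses of all active GWs to our SR domain (including us)."""
    endpoints = {gw.loopback}
    for route in received:
        if imports(gw, route):
            endpoints.update(t.remote_endpoint for t in route.tunnel_encaps
                             if t.tunnel_type == SR_TUNNEL_TYPE)
    return sorted(endpoints)


def external_route(gw: Gateway, prefix: str, gws: List[str]) -> Route:
    """Route for a prefix in the SR domain: one SR tunnel instance per GW."""
    return Route(prefix=prefix, next_hop=gw.loopback,
                 tunnel_encaps=tuple(TunnelInstance(SR_TUNNEL_TYPE, a)
                                     for a in gws))


def asbr_next_hop_self(route: Route, asbr: str) -> Route:
    """An ASBR rewrites the next hop; the attribute passes through unchanged."""
    return replace(route, next_hop=asbr)


# Usage: GW1 and GW2 serve the egress SR domain; GW9 serves another domain.
gw1 = Gateway("egress-domain", "192.0.2.1")
gw2 = Gateway("egress-domain", "192.0.2.2")
gw9 = Gateway("other-domain", "198.51.100.9")
ad_routes = [auto_discovery_route(g) for g in (gw1, gw2, gw9)]

route_x = external_route(gw1, "203.0.113.0/24", active_gateways(gw1, ad_routes))
for asbr in ("10.0.0.1", "10.0.1.1"):  # two AS boundaries in the backbone
    route_x = asbr_next_hop_self(route_x, asbr)

print(route_x.next_hop)                                    # 10.0.1.1
print([t.remote_endpoint for t in route_x.tunnel_encaps])  # ['192.0.2.1', '192.0.2.2']
```

   Because the ASBRs in this model rewrite only the next hop, whichever
   single route reaches the ingress domain still carries the addresses
   of both GW1 and GW2, while GW9's route is filtered out by the route
   target check.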

   To avoid the side effect of applying the Tunnel Encapsulation
   attribute to any packet that is addressed to the GW itself, the GW
   SHOULD use a different loopback address in its auto-discovery route
   from the one used as the destination of packets addressed to the GW
   itself.

   As described in Section 1, each GW will include in its Tunnel
   Encapsulation attribute an SR tunnel instance for each GW that is
   active for the SR domain (including itself), and will include this
   attribute in every route it advertises externally to the SR domain.
   As the current set of active GWs changes (due to the addition of a
   new GW or the failure/removal of an existing GW), each externally
   advertised route will be re-advertised with the set of SR tunnel
   instances reflecting the current set of active GWs.

   If a gateway becomes disconnected from the backbone network, or if
   the SR domain operator decides to terminate the gateway's activity,
   it withdraws the advertisements described above.  This means that
   remote gateways at other sites will stop seeing advertisements from
   this gateway.  It also means that other local gateways at this site
   will "unlearn" the removed gateway and stop including an SR tunnel
   instance for the removed gateway in the Tunnel Encapsulation
   attribute of their advertisements.

4.  Relationship to BGP Link State and Egress Peer Engineering

   When a remote GW receives a route to a prefix X, it can use the SR
   tunnel instances within the contained Tunnel Encapsulation attribute
   to identify the GWs through which X can be reached.  It uses this
   information to compute SR Traffic Engineering (SR TE) paths across
   the backbone network, using the information advertised to it in SR
   BGP Link State (BGP-LS) [I-D.ietf-idr-bgp-ls-segment-routing-ext]
   and correlated using the SR domain identity.  SR Egress Peer
   Engineering (EPE) [I-D.ietf-idr-bgpls-segment-routing-epe] can be
   used to supplement the information advertised in BGP-LS.

5.  Advertising an SR Domain Route Externally

   When a packet destined for prefix X is sent on an SR TE path to a GW
   for the SR domain containing X, it needs to carry the receiving GW's
   label for X such that this label rises to the top of the stack before
   the GW completes its processing of the packet.  To achieve this, we
   place a Prefix SID sub-TLV [I-D.ietf-idr-tunnel-encaps] for X in each
   SR tunnel instance in the Tunnel Encapsulation attribute in the
   externally advertised route for X.

   Alternatively, if the GWs for a given SR domain are configured to
   allow remote GWs to perform SR TE through that SR domain for a prefix
   X, then each GW computes an SR TE path through that SR domain to X
   from each of the currently active GWs, and places each in an MPLS
   label stack sub-TLV [I-D.ietf-idr-tunnel-encaps] in the SR tunnel
   instance for that GW.

6.  Encapsulation

   If the GWs for a given SR domain are configured to allow remote GWs
   to send them a packet in that SR domain's native encapsulation, then
   each GW will also include multiple instances of a tunnel TLV for that
   native encapsulation in externally advertised routes: one for each
   GW, each containing a Remote Endpoint sub-TLV with that GW's address.
   A remote GW may then encapsulate a packet according to the rules
   defined via the sub-TLVs included in each of the tunnel TLV
   instances.

7.  IANA Considerations

7.1.  Tunnel Encapsulation Tunnel Type

   IANA maintains a registry called "Border Gateway Protocol (BGP)
   Parameters" with a sub-registry called "BGP Tunnel Encapsulation
   Attribute Tunnel Types."  The registration policy for this registry
   is First-Come First-Served [RFC8126].

   IANA has assigned the value 17 from this sub-registry for "SR
   Tunnel".

7.2.  Tunnel Encapsulation Sub-TLVs

   IANA maintains a registry called "Border Gateway Protocol (BGP)
   Parameters" with a sub-registry called "BGP Tunnel Encapsulation
   Attribute Sub-TLVs."  The registration policy for this registry is
   Standards Action [RFC8126].

   IANA is requested to assign a codepoint from this sub-registry for
   "SR Tunnel TLV" (TBD1).  The next available value may be used and
   reference should be made to this document.

8.  Security Considerations

   From a protocol point of view, the mechanisms described in this
   document can leverage the security mechanisms already defined for
   BGP.  Further discussion of security considerations for BGP may be
   found in the BGP specification itself [RFC4271] and in the security
   analysis for BGP [RFC4272].  The use of the TCP Authentication Option
   to protect BGP sessions is described in [RFC5925], while [RFC6952]
   includes an analysis of BGP keying and authentication issues.

   The mechanisms described in this document involve sharing routing or
   reachability information between domains: that may mean disclosing
   information that is normally contained within a domain.  So it needs
   to be understood that normal security paradigms based on the
   boundaries of domains are weakened.  Discussion of these issues with
   respect to VPNs can be found in [RFC4364], while [RFC7926] describes
   many of the issues associated with the exchange of topology or TE
   information between domains.

   Particular exposures resulting from this work include:

   o  Gateways to a domain will know about all other gateways to the
      same domain.  This feature applies within a domain and so is not a
      substantial exposure, but it does mean that if the BGP exchanges
      within a domain can be snooped or if a gateway can be subverted,
      then an attacker may learn the full set of gateways to a domain.
      This would facilitate more effective attacks on that domain.

   o  The existence of multiple gateways to a domain becomes more
      visible across the backbone and even into remote domains.  This
      means that an attacker is able to prepare a more comprehensive
      attack than exists when only the locally attached backbone network
      (e.g., the AS that hosts the domain) can see all of the gateways
      to a site.  For example, a Denial of Service attack on a single GW
      is mitigated by the existence of other GWs, but if the attacker
      knows about all the gateways, then the whole set can be attacked
      at once.

   o  A node in a domain that does not have external BGP peering (i.e.,
      is not really a domain gateway and cannot speak BGP into the
      backbone network) may be able to get itself advertised as a
      gateway by letting other genuine gateways discover it (by speaking
      BGP to them within the domain) and so may get those genuine
      gateways to advertise it as a gateway into the backbone network.
      This would allow the malicious node to attract traffic without
      having to have secure BGP peerings with out-of-domain nodes.

   o  If it is possible to modify a BGP message within the backbone, it
      may be possible to spoof the existence of a gateway.  This could
      cause traffic to be attracted to a specific node and might result
      in black-holing of traffic.

   All of the issues in the list above could cause disruption to domain
   interconnection, but they are not new protocol vulnerabilities so
   much as new exposures of information that SHOULD be protected
   against using existing protocol mechanisms.  Furthermore, it is a
   general observation that if these attacks are possible, then it is
   highly likely that far more significant attacks can be made on the
   routing system.  It should be noted that BGP peerings are not
   discovered, but always arise from explicit configuration.

9.  Manageability Considerations

   The principal configuration item added by this solution is the
   allocation of an SR domain identifier.  The same identifier MUST be
   assigned to every GW to the same domain, and each domain MUST have a
   different identifier.  This requires coordination, probably through
   a central management agent.

   It should be noted that BGP peerings are not discovered, but always
   arise from explicit configuration.  This is no different from any
   other BGP operation.

10.  Acknowledgements

   Thanks to Bruno Rijsman, Stephane Litkowski, and Boris Hassanov for
   review comments, and to Robert Raszuk for useful discussions.

11.  References

11.1.  Normative References

   [I-D.ietf-idr-bgpls-segment-routing-epe]
              Previdi, S., Talaulikar, K., Filsfils, C., Patel, K., Ray,
              S., and J. Dong, "BGP-LS extensions for Segment Routing
              BGP Egress Peer Engineering",
              draft-ietf-idr-bgpls-segment-routing-epe-19 (work in
              progress), May 2019.

   [I-D.ietf-idr-tunnel-encaps]
              Patel, K., Velde, G., and S. Ramachandra, "The BGP Tunnel
              Encapsulation Attribute", draft-ietf-idr-tunnel-encaps-15
              (work in progress), December 2019.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC4271]  Rekhter, Y., Ed., Li, T., Ed., and S. Hares, Ed., "A
              Border Gateway Protocol 4 (BGP-4)", RFC 4271,
              DOI 10.17487/RFC4271, January 2006.

   [RFC4360]  Sangli, S., Tappan, D., and Y. Rekhter, "BGP Extended
              Communities Attribute", RFC 4360, DOI 10.17487/RFC4360,
              February 2006.

   [RFC5925]  Touch, J., Mankin, A., and R. Bonica, "The TCP
              Authentication Option", RFC 5925, DOI 10.17487/RFC5925,
              June 2010.

   [RFC7752]  Gredler, H., Ed., Medved, J., Previdi, S., Farrel, A., and
              S. Ray, "North-Bound Distribution of Link-State and
              Traffic Engineering (TE) Information Using BGP", RFC 7752,
              DOI 10.17487/RFC7752, March 2016.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017.

11.2.  Informative References

   [I-D.farrel-spring-sr-domain-interconnect]
              Farrel, A. and J. Drake, "Interconnection of Segment
              Routing Domains - Problem Statement and Solution
              Landscape", draft-farrel-spring-sr-domain-interconnect-05
              (work in progress), October 2018.

   [I-D.ietf-idr-bgp-ls-segment-routing-ext]
              Previdi, S., Talaulikar, K., Filsfils, C., Gredler, H.,
              and M. Chen, "BGP Link-State extensions for Segment
              Routing", draft-ietf-idr-bgp-ls-segment-routing-ext-16
              (work in progress), June 2019.

   [RFC4272]  Murphy, S., "BGP Security Vulnerabilities Analysis",
              RFC 4272, DOI 10.17487/RFC4272, January 2006.

   [RFC4364]  Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
              Networks (VPNs)", RFC 4364, DOI 10.17487/RFC4364, February
              2006.

   [RFC6952]  Jethanandani, M., Patel, K., and L. Zheng, "Analysis of
              BGP, LDP, PCEP, and MSDP Issues According to the Keying
              and Authentication for Routing Protocols (KARP) Design
              Guide", RFC 6952, DOI 10.17487/RFC6952, May 2013.

   [RFC7911]  Walton, D., Retana, A., Chen, E., and J. Scudder,
              "Advertisement of Multiple Paths in BGP", RFC 7911,
              DOI 10.17487/RFC7911, July 2016.

   [RFC7926]  Farrel, A., Ed., Drake, J., Bitar, N., Swallow, G.,
              Ceccarelli, D., and X. Zhang, "Problem Statement and
              Architecture for Information Exchange between
              Interconnected Traffic-Engineered Networks", BCP 206,
              RFC 7926, DOI 10.17487/RFC7926, July 2016.

   [RFC8126]  Cotton, M., Leiba, B., and T. Narten, "Guidelines for
              Writing an IANA Considerations Section in RFCs", BCP 26,
              RFC 8126, DOI 10.17487/RFC8126, June 2017.

   [RFC8402]  Filsfils, C., Ed., Previdi, S., Ed., Ginsberg, L.,
              Decraene, B., Litkowski, S., and R. Shakir, "Segment
              Routing Architecture", RFC 8402, DOI 10.17487/RFC8402,
              July 2018.

Authors' Addresses

   Adrian Farrel
   Old Dog Consulting

   Email: adrian@olddog.co.uk


   John Drake
   Juniper Networks

   Email: jdrake@juniper.net


   Eric Rosen
   Juniper Networks

   Email: erosen52@gmail.com


   Keyur Patel
   Arrcus, Inc.

   Email: keyur@arrcus.com


   Luay Jalil
   Verizon

   Email: luay.jalil@verizon.com