BESS Working Group                                              J. Drake
Internet-Draft                                                 A. Farrel
Intended status: Informational                                  E. Rosen
Expires: April 19, 2017                                 Juniper Networks
                                                                K. Patel
                                                            Arrcus, Inc.
                                                                L. Jalil
                                                                 Verizon
                                                        October 16, 2016


   Gateway Auto-Discovery and Route Advertisement for Segment Routing
                  Enabled Data Center Interconnection
                 draft-drake-bess-datacenter-gateway-02

Abstract

   Data centers have become critical components of the infrastructure
   used by network operators to provide services to their customers.
   Data centers are attached to the Internet or a backbone network by
   gateway routers.  One data center typically has more than one gateway
   for commercial, load balancing, and resiliency reasons.

   Segment routing is a popular protocol mechanism for operating within
   a data center, and also for steering traffic that flows between two
   data center sites.  In order that one data center site may load
   balance the traffic it sends to another data center site, it needs to
   know the complete set of gateway routers at the remote data center,
   the points of connection from those gateways to the backbone network,
   and the connectivity across the backbone network.

   This document defines a mechanism using the BGP Tunnel Encapsulation
   attribute to allow each gateway router to advertise the routes to the
   prefixes in the data center site to which it provides access, and
   also to advertise on behalf of each other gateway to the same data
   center site.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 19, 2017.

Copyright Notice

   Copyright (c) 2016 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  DC Gateway Auto-Discovery
   3.  Relationship to BGP Link State and Egress Peer Engineering
   4.  Advertising a DC Route Externally
   5.  Encapsulation
   6.  IANA Considerations
   7.  Security Considerations
   8.  Manageability Considerations
   9.  Acknowledgements
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Authors' Addresses

1.  Introduction

   Data centers (DCs) have become critical components of the
   infrastructure used by network operators to provide services to their
   customers.  DCs are attached to the Internet or a backbone network by
   gateway routers (GWs).  One DC typically has more than one GW for
   various reasons including commercial preferences, load balancing, and
   resiliency against connection or device failure.

   Segment routing (SR) [I-D.ietf-spring-segment-routing] is a popular
   protocol mechanism for operating within a DC, and also for steering
   traffic that flows between two DC sites.  In order for an ingress DC
   that uses SR to load balance the flows it sends to an egress DC, it
   needs to know the complete set of entry nodes (i.e., GWs) for that
   egress DC from the backbone network connecting the two DCs.  Note
   that it is assumed that the connected set of DCs and the backbone
   network connecting them are part of the same SR BGP Link State (LS)
   instance ([RFC7752] and [I-D.ietf-idr-bgpls-segment-routing-epe]) so
   that traffic engineering using SR may be used for these flows.

   Suppose that there are two gateways, GW1 and GW2 as shown in
   Figure 1, for a given egress DC and that each advertises a route to
   prefix X, which is located within the egress DC, setting itself as
   the next hop.  One might think that the GWs for X could be inferred
   from the routes' next hop fields, but typically both routes are not
   distributed across the backbone: only the best route, as selected by
   BGP, is distributed.  This precludes load balancing flows across both
   GWs.

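   The effect described above can be sketched in a few lines of Python.
   This is only an illustrative model under stated assumptions, not BGP
   itself: the Route class, the best_path() and propagate() helpers, and
   the two-step tie-break (highest local preference, then lowest router
   ID) are hypothetical simplifications of the real decision process.
   The sketch shows that a speaker that forwards a single best path per
   prefix leaves the ingress site with only one next hop for X.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    """Hypothetical, minimal route model (not a real BGP UPDATE)."""
    prefix: str            # destination prefix, e.g. "X"
    next_hop: str          # each GW sets itself as next hop
    local_pref: int = 100  # assumed default preference
    router_id: int = 0     # assumed tie-breaker, lowest wins


def best_path(routes):
    # Simplified stand-in for BGP best-path selection:
    # highest local_pref first, then lowest router_id.
    return min(routes, key=lambda r: (-r.local_pref, r.router_id))


def propagate(routes):
    # A speaker without Add-Paths forwards one best path per prefix.
    by_prefix = {}
    for route in routes:
        by_prefix.setdefault(route.prefix, []).append(route)
    return [best_path(candidates) for candidates in by_prefix.values()]


# GW1 and GW2 both advertise prefix X with themselves as next hop.
advertised = [
    Route("X", "GW1", router_id=1),
    Route("X", "GW2", router_id=2),
]

# What the ingress site receives after crossing the backbone.
seen_by_ingress = propagate(advertised)

# Inferring the GW set from next hop fields finds only one gateway.
gateways = {route.next_hop for route in seen_by_ingress}
print(gateways)  # {'GW1'}
```

   In this model GW2 never appears at the ingress site even though it
   also provides access to X, so next hop fields alone cannot be used to
   learn the complete set of GWs.
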
    -----------------                    --------------------
   | Ingress         |                  | Egress    ------   |
   | DC Site         |                  | DC Site  |Prefix|  |
   |                 |                  |          |   X  |  |
   |                 |                  |           ------   |
   |       --        |                  |   ---         ---  |
   |      |GW|       |                  |  |GW1|       |GW2| |
    -------++---------                   ----+----------+-+--
           | \                               |         /  |
           |  \                              |        /   |
           |  -+-------------        --------+-------+--  |
           | ||PE|       ----|      |----  |PE|    |PE| | |
           | | --       |ASBR+------+ASBR|  --      --  | |
           | |           ----|      |----               | |
           | |               |      |                   | |
           | |           ----|      |----               | |
           | | AS1      |ASBR+------+ASBR|     AS2      | |
           | |           ----|      |----               | |
           |  ---------------        -------------------  |
         --+----------------------------------------------+--
        | |PE|                                          |PE| |
        |  --                    AS3                     --  |
        |                                                    |
         ----------------------------------------------------

             Figure 1: Example Data Center Interconnection

   The obvious solution to this problem is to use the BGP feature that
   allows the advertisement of multiple paths in BGP (known as
   Add-Paths) [RFC7911] to ensure that all routes to X get advertised by
   BGP.  However, even if this is done, the identity of the GWs will be
   lost as soon as the routes get distributed through an Autonomous
   System Border Router (ASBR) that will set itself to be the next hop.
   And if there are multiple Autonomous Systems (ASes) in the backbone,
   not only will the next hop change several times, but the Add-Paths
   technique will experience scaling issues.  This all means that this
   approach is limited to DC sites connected over a single AS.

   This document defines a solution that overcomes this limitation and
   works equally well with a backbone constructed from one or more ASes.
   This solution uses the Tunnel Encapsulation attribute
   [I-D.ietf-idr-tunnel-encaps] as follows:

      We define a new tunnel type, "SR tunnel".  When the GWs to a given
      DC advertise a route to a prefix X within the DC, they will each
      include a Tunnel Encapsulation attribute with multiple tunnel
      instances each of type "SR tunnel", one for each GW, and each
      containing a Remote Endpoint sub-TLV with that GW's address.

   In other words, each route advertised by any GW identifies all of the
   GWs to the same DC (see Section 2 for a discussion of how GWs
   discover each other).  Therefore, even if only one of the routes is
   distributed to other ASes, it will not matter how many times the next
   hop changes, as the Tunnel Encapsulation attribute (and its remote
   endpoint sub-TLVs) will remain unchanged.

   To put this in the context of Figure 1, GW1 and GW2 discover each
   other as gateways for the egress data center site.  Both GW1 and GW2
   advertise themselves as having routes to prefix X.  Furthermore, GW1
   includes a Tunnel Encapsulation attribute with a tunnel instance of
   type "SR tunnel" for itself and another for GW2.  Similarly, GW2
   includes a Tunnel Encapsulation attribute with a tunnel instance for
   itself and another for GW1.  The gateway in the ingress data center
   site can now see all possible paths to the egress data center site
   regardless of which route advertisement is propagated to it, and it
   can choose one or balance traffic flows as it sees fit.

2.  DC Gateway Auto-Discovery

   To allow a given DC's GWs to auto-discover each other and to
   coordinate their operations, the following procedures are
   implemented:

   o  Each GW is configured with an identifier for the DC that is common
      across all GWs to the DC (i.e., across all GWs to all DC sites
      that are interconnected) and unique across all DCs that are
      connected.

   o  A route target ([RFC4360]) is attached to each GW's auto-discovery
      route and has its value set to the DC identifier.

   o  Each GW constructs an import filtering rule to import any route
      that carries a route target with the same DC identifier that the
      GW itself uses.  This means that only these GWs will import those
      routes and that all GWs to the same DC will import each other's
      routes and will learn (auto-discover) the current set of active
      GWs for the DC.

   The auto-discovery route each GW advertises consists of the
   following:

   o  An IPv4 or IPv6 NLRI containing one of the GW's loopback addresses
      (that is, with AFI/SAFI that is one of 1/1, 2/1, 1/4, or 2/4)

   o  A Tunnel Encapsulation attribute containing the GW's encapsulation
      information, which at a minimum consists of an SR tunnel TLV (type
      to be allocated by IANA) with a Remote Endpoint sub-TLV as
      specified in [I-D.ietf-idr-tunnel-encaps].

   To avoid the side effect of applying the Tunnel Encapsulation
   attribute to any packet that is addressed to the GW itself, the GW
   SHOULD use a different loopback address for the two cases.

   As described in Section 1, each GW will include an SR tunnel instance
   in the Tunnel Encapsulation attribute for each GW that is active for
   the DC site (including itself), and will include these in every route
   that it advertises externally to the DC site.  As the current set of
   active GWs changes (due to the addition of a new GW or the
   failure/removal of an existing GW), each externally advertised route
   will be re-advertised with the set of SR tunnel instances reflecting
   the current set of active GWs.

   If a gateway becomes disconnected from the backbone network, or if
   the DC operator decides to terminate the gateway's activity, it
   withdraws the advertisements described above.  This means that remote
   gateways at other sites will stop seeing advertisements from this
   gateway.  It also means that other local gateways at this site will
   "unlearn" the removed gateway and stop including an SR tunnel
   instance for the removed gateway in their advertisements.

3.  Relationship to BGP Link State and Egress Peer Engineering

   When a remote GW receives a route to a prefix X it can use the SR
   tunnel instances within the contained Tunnel Encapsulation attribute
   to identify the GWs through which X can be reached.  It uses this
   information to compute SR TE paths across the backbone network,
   looking at the information advertised to it in SR BGP Link State
   (BGP-LS) [I-D.gredler-idr-bgp-ls-segment-routing-ext] and correlated
   using the DC identity.  SR Egress Peer Engineering (EPE)
   [I-D.ietf-idr-bgpls-segment-routing-epe] can be used to supplement
   the information advertised in the BGP-LS.

4.  Advertising a DC Route Externally

   When a packet destined for prefix X is sent on an SR TE path to a GW
   for the DC site containing X, it needs to carry the receiving GW's
   label for X such that this label rises to the top of the stack before
   the GW completes its processing of the packet.  To achieve this we
   place a prefix-SID sub-TLV for X in each SR tunnel instance in the
   Tunnel Encapsulation attribute in the externally advertised route for
   X.

   Alternatively, if the GWs for a given DC are configured to allow
   remote GWs to perform SR TE through that DC for a prefix X, then each
   GW computes an SR TE path through that DC to X from each of the
   currently active GWs, and places each in an MPLS label stack sub-TLV
   [I-D.ietf-idr-tunnel-encaps] in the SR tunnel instance for that GW.

5.  Encapsulation

   If the GWs for a given DC are configured to allow remote GWs to send
   them a packet in that DC's native encapsulation, then each GW will
   also include multiple instances of a tunnel TLV for that native
   encapsulation in externally advertised routes: one for each GW and
   each containing a remote endpoint sub-TLV with that GW's address.  A
   remote GW may then encapsulate a packet according to the rules
   defined via the sub-TLVs included in each of the tunnel TLV
   instances.

6.  IANA Considerations

   IANA maintains a registry called "BGP parameters" with a sub-registry
   called "BGP Tunnel Encapsulation Tunnel Types."  The registration
   policy for this registry is First-Come First-Served.

   IANA is requested to assign a codepoint from this sub-registry for
   "SR Tunnel".  The next available value may be used and reference
   should be made to this document.

   [[Note: This text is likely to be replaced with a specific code point
   value once FCFS allocation has been made.]]

7.  Security Considerations

   TBD

8.  Manageability Considerations

   TBD

9.  Acknowledgements

   Thanks to Bruno Rijsman for review comments, and to Robert Raszuk for
   useful discussions.

10.  References

10.1.  Normative References

   [I-D.ietf-idr-bgpls-segment-routing-epe]
              Previdi, S., Filsfils, C., Ray, S., Patel, K., Dong, J.,
              and M. Chen, "Segment Routing BGP Egress Peer Engineering
              BGP-LS Extensions",
              draft-ietf-idr-bgpls-segment-routing-epe-05 (work in
              progress), May 2016.

   [I-D.ietf-idr-tunnel-encaps]
              Rosen, E., Patel, K., and G. Velde, "The BGP Tunnel
              Encapsulation Attribute", draft-ietf-idr-tunnel-encaps-02
              (work in progress), May 2016.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC4360]  Sangli, S., Tappan, D., and Y. Rekhter, "BGP Extended
              Communities Attribute", RFC 4360, DOI 10.17487/RFC4360,
              February 2006.

   [RFC7752]  Gredler, H., Ed., Medved, J., Previdi, S., Farrel, A., and
              S. Ray, "North-Bound Distribution of Link-State and
              Traffic Engineering (TE) Information Using BGP", RFC 7752,
              DOI 10.17487/RFC7752, March 2016.

10.2.  Informative References

   [I-D.gredler-idr-bgp-ls-segment-routing-ext]
              Previdi, S., Psenak, P., Filsfils, C., Gredler, H., Chen,
              M., and J. Tantsura, "BGP Link-State extensions for
              Segment Routing",
              draft-gredler-idr-bgp-ls-segment-routing-ext-03 (work in
              progress), July 2016.

   [I-D.ietf-spring-segment-routing]
              Filsfils, C., Previdi, S., Decraene, B., Litkowski, S.,
              and R. Shakir, "Segment Routing Architecture",
              draft-ietf-spring-segment-routing-09 (work in progress),
              July 2016.

   [RFC7911]  Walton, D., Retana, A., Chen, E., and J. Scudder,
              "Advertisement of Multiple Paths in BGP", RFC 7911,
              DOI 10.17487/RFC7911, July 2016.

Authors' Addresses

   John Drake
   Juniper Networks

   Email: jdrake@juniper.net


   Adrian Farrel
   Juniper Networks

   Email: adrian@olddog.co.uk


   Eric Rosen
   Juniper Networks

   Email: erosen@juniper.net


   Keyur Patel
   Arrcus, Inc.

   Email: keyur@arrcus.com


   Luay Jalil
   Verizon

   Email: luay.jalil@verizon.com