BESS Working Group                                              J. Drake
Internet-Draft                                                 A. Farrel
Intended status: Informational                                  E. Rosen
Expires: December 18, 2016                              Juniper Networks
                                                           June 16, 2016


   Gateway Auto-Discovery and Route Advertisement for Segment Routing
                  Enabled Data Center Interconnection
                 draft-drake-bess-datacenter-gateway-01

Abstract

   Data centers have become critical components of the infrastructure
   used by network operators to provide services to their customers.
   Data centers are attached to the Internet or a backbone network by
   gateway routers and one data center typically has more than one
   gateway for commercial, load balancing, and resiliency reasons.

   Segment routing is a popular protocol mechanism for operating within
   a data center, but also for steering traffic that flows between two
   data center sites. In order that one data center site may load
   balance the traffic it sends to another data center site it needs to
   know the complete set of gateway routers at the remote data center,
   the points of connection from those gateways to the backbone network,
   and the connectivity across the backbone network.

   This document defines a mechanism using the BGP Tunnel Encapsulation
   attribute to allow each gateway router to advertise the routes to the
   prefixes in the data center site to which it provides access, and
   also to advertise on behalf of each other gateway to the same data
   center site.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on December 18, 2016.

Copyright Notice

   Copyright (c) 2016 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  DC Gateway Auto-Discovery
   3.  Relationship to BGP Link State and Egress Peer Engineering
   4.  Advertising a DC Route Externally
   5.  Encapsulation
   6.  IANA Considerations
   7.  Security Considerations
   8.  Manageability Considerations
   9.  Contributors
   10. Acknowledgements
   11. References
     11.1.  Normative References
     11.2.  Informative References
   Authors' Addresses

1.  Introduction

   Data centers (DCs) have become critical components of the
   infrastructure used by network operators to provide services to their
   customers. DCs are attached to the Internet or a backbone network by
   gateway routers (GWs) and one DC typically has more than one GW for
   various reasons including commercial preferences, load balancing, and
   resiliency against connection or device failure.

   Segment routing (SR) [I-D.ietf-spring-segment-routing] is a popular
   protocol mechanism for operating within a DC, but also for steering
   traffic that flows between two DC sites. In order for an ingress DC
   that uses SR to load balance the flows it sends to an egress DC, it
   needs to know the complete set of entry nodes (i.e., GWs) for that
   egress DC from the backbone network connecting the two DCs. Note
   that it is assumed that the connected set of DCs and the backbone
   network connecting them are part of the same SR BGP Link State (LS)
   instance ([RFC7752] and [I-D.ietf-idr-bgpls-segment-routing-epe]) so
   that traffic engineering using SR may be used for these flows.

   Suppose that there are two gateways, GW1 and GW2 as shown in
   Figure 1, for a given egress DC and they both advertise a route to
   prefix X which is located within that DC with each setting itself as
   next hop. One might think that the GWs for X could be inferred from
   the routes' next hop fields, but typically only the best route, as
   selected by BGP, is distributed across the backbone rather than both
   routes. This precludes load balancing flows across both GWs.

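   The effect described above can be illustrated with a small sketch.
   It is not a model of the real BGP decision process: the tie-break
   used here (highest local preference, then lowest next hop) is a
   deliberately simplified stand-in, and the addresses (documentation
   ranges), the Route type, and the best_route() helper are all
   assumptions made for illustration only.

```python
from typing import List, NamedTuple


class Route(NamedTuple):
    prefix: str
    next_hop: str      # the advertising GW sets itself as next hop
    local_pref: int


def best_route(candidates: List[Route]) -> Route:
    """Toy selection rule: highest local_pref, then lowest next hop.

    This is a simplified stand-in for BGP best-path selection.
    """
    return min(candidates, key=lambda r: (-r.local_pref, r.next_hop))


# GW1 (192.0.2.1) and GW2 (192.0.2.2) both advertise prefix X.
received = [
    Route("198.51.100.0/24", "192.0.2.1", 100),
    Route("198.51.100.0/24", "192.0.2.2", 100),
]

# Only the selected route is distributed across the backbone.
propagated = [best_route(received)]

# The ingress side can infer GWs only from what was propagated.
visible_gws = {r.next_hop for r in propagated}
print(visible_gws)
```

   In this sketch only one of the two GW addresses survives in
   visible_gws, so the ingress DC has nothing to load balance across.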
    -----------------                    --------------------
   | Ingress         |                  | Egress     ------  |
   | DC Site         |                  | DC Site   |Prefix| |
   |                 |                  |           |  X   | |
   |                 |                  |            ------  |
   |       --        |                  |   ---         ---  |
   |      |GW|       |                  |  |GW1|       |GW2| |
    -------++---------                   ----+----------+-+--
           | \                               |         /  |
           |  \                              |        /   |
           |  -+-------------        --------+-------+--  |
           | ||PE|       ----|      |----   |PE|   |PE| | |
           | | --       |ASBR+------+ASBR|   --     --  | |
           | |           ----|      |----               | |
           | |               |      |                   | |
           | |           ----|      |----               | |
           | | AS1      |ASBR+------+ASBR|       AS2    | |
           | |           ----|      |----               | |
           |  ---------------        -------------------  |
         --+----------------------------------------------+--
        | |PE|                                          |PE| |
        |  --                    AS3                     --  |
        |                                                    |
         ----------------------------------------------------

             Figure 1: Example Data Center Interconnection

   The obvious solution to this problem is to use add-paths
   [I-D.ietf-idr-add-paths] to ensure that all routes to X get
   advertised by BGP. However, even if this is done, the identity of
   the GWs will be lost as soon as the routes get distributed through an
   Autonomous System Border Router (ASBR) that will set itself to be the
   next hop. And if there are multiple Autonomous Systems (ASes) in the
   backbone, not only will the next hop change several times, but the
   add-paths technique will experience scaling issues. This all means
   that this approach is limited to DC sites connected over a single AS.

   This document defines a solution that overcomes this limitation and
   works equally well with a backbone constructed from one or more ASes.
   This solution uses the Tunnel Encapsulation attribute
   [I-D.ietf-idr-tunnel-encaps] as follows:

      We define a new tunnel type, "SR tunnel", and when the GWs to a
      given DC advertise a route to a prefix X within the DC, they will
      each include a Tunnel Encapsulation attribute with multiple tunnel
      instances each of type "SR tunnel", one for each GW and each
      containing a Remote Endpoint sub-TLV with that GW's address.

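   As a rough illustration of the idea, the sketch below models a
   route carrying a Tunnel Encapsulation attribute with one "SR
   tunnel" instance per GW, and shows that rewriting the next hop at
   each ASBR leaves the set of remote endpoints intact. It is a data
   model only, not a wire encoding: the class names, the advertise()
   and asbr_readvertise() helpers, and the addresses (documentation
   ranges) are hypothetical, and the tunnel type is kept as a string
   because no code point has been assigned.

```python
from dataclasses import dataclass, replace
from typing import Iterable, Tuple


@dataclass(frozen=True)
class TunnelInstance:
    tunnel_type: str       # "SR tunnel"; numeric code point not assigned
    remote_endpoint: str   # Remote Endpoint sub-TLV: a GW's address


@dataclass(frozen=True)
class Route:
    prefix: str
    next_hop: str
    tunnel_encap: Tuple[TunnelInstance, ...]


def advertise(prefix: str, advertising_gw: str,
              active_gws: Iterable[str]) -> Route:
    """Build a route with one SR tunnel instance per GW of the DC."""
    tunnels = tuple(TunnelInstance("SR tunnel", gw)
                    for gw in sorted(active_gws))
    return Route(prefix, advertising_gw, tunnels)


def asbr_readvertise(route: Route, asbr_address: str) -> Route:
    """An ASBR sets itself as next hop; the attribute is untouched."""
    return replace(route, next_hop=asbr_address)


gws = {"192.0.2.1", "192.0.2.2"}            # GW1 and GW2
from_gw1 = advertise("198.51.100.0/24", "192.0.2.1", gws)

# The route crosses two ASBRs on its way to the ingress DC.
at_ingress = asbr_readvertise(
    asbr_readvertise(from_gw1, "203.0.113.1"), "203.0.113.2")

endpoints = [t.remote_endpoint for t in at_ingress.tunnel_encap]
print(at_ingress.next_hop, endpoints)
```

   Even though the next hop seen at the ingress is the last ASBR, the
   list of remote endpoints still names both GWs.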
   In other words, each route advertised by any GW identifies all of the
   GWs to the same DC (see Section 2 for a discussion of how GWs
   discover each other). Therefore, even if only one of the routes is
   distributed to other ASes, it will not matter how many times the next
   hop changes, as the Tunnel Encapsulation attribute (and its Remote
   Endpoint sub-TLVs) will remain unchanged.

   To put this in the context of Figure 1, GW1 and GW2 discover each
   other as gateways for the egress data center site. Both GW1 and GW2
   advertise themselves as having routes to prefix X. Furthermore, GW1
   includes a Tunnel Encapsulation attribute with a tunnel instance of
   type "SR tunnel" for itself and another for GW2. Similarly, GW2
   includes a Tunnel Encapsulation attribute with a tunnel instance for
   itself and another for GW1. The gateway in the ingress data center
   site can now see the possible paths to the egress data center site
   and choose one or balance traffic flows as it sees fit.

2.  DC Gateway Auto-Discovery

   To allow a given DC's GWs to auto-discover each other and to
   coordinate their operations, the following procedures are
   implemented:

   o  Each GW is configured with an identifier for the DC that is common
      across all GWs to the DC (i.e., all GWs to all DC sites that are
      connected) and unique across all DCs that are connected.

   o  A route target ([RFC4360]) is attached to each GW's auto-discovery
      route and has its value set to the DC identifier.

   o  Each GW constructs an import filtering rule to import any route
      that carries a route target with the same DC identifier that the
      GW itself uses. This means that only these GWs will import those
      routes and that all GWs to the same DC will import each other's
      routes and will learn (auto-discover) the current set of active
      GWs for the DC.

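   The procedures above can be sketched as follows. This is a minimal
   model under stated assumptions, not an implementation: the Gateway
   and AutoDiscoveryRoute classes, the receive() method, the DC
   identifiers "dc-a" and "dc-b", and the loopback addresses are all
   hypothetical, and the route target is represented as a plain string
   rather than an encoded extended community.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass(frozen=True)
class AutoDiscoveryRoute:
    loopback: str       # NLRI: one of the advertising GW's loopbacks
    route_target: str   # value set to the DC identifier


@dataclass
class Gateway:
    loopback: str
    dc_id: str          # common to all GWs of this DC, unique per DC
    active_gws: Set[str] = field(default_factory=set)

    def __post_init__(self) -> None:
        # A GW always counts itself among the active GWs of its DC.
        self.active_gws.add(self.loopback)

    def auto_discovery_route(self) -> AutoDiscoveryRoute:
        return AutoDiscoveryRoute(self.loopback, self.dc_id)

    def receive(self, route: AutoDiscoveryRoute) -> bool:
        """Import filter: accept only routes tagged with our DC id."""
        if route.route_target != self.dc_id:
            return False
        self.active_gws.add(route.loopback)
        return True


gw1 = Gateway("192.0.2.1", "dc-a")
gw2 = Gateway("192.0.2.2", "dc-a")
gw3 = Gateway("192.0.2.3", "dc-b")   # a GW of a different DC
all_gws = [gw1, gw2, gw3]

# Every GW sees every auto-discovery route; the filter does the rest.
for sender in all_gws:
    for receiver in all_gws:
        if receiver is not sender:
            receiver.receive(sender.auto_discovery_route())

print(sorted(gw1.active_gws), sorted(gw3.active_gws))
```

   After the exchange, gw1 and gw2 each hold the same two-member set,
   while gw3 has learned only about itself.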
   The auto-discovery route each GW advertises consists of the
   following:

   o  An IPv4 or IPv6 NLRI containing one of the GW's loopback addresses
      (that is, with AFI/SAFI that is one of 1/1, 2/1, 1/4, 2/4).

   o  A Tunnel Encapsulation attribute containing the GW's encapsulation
      information, which at a minimum consists of an SR tunnel TLV (type
      to be allocated by IANA) with a Remote Endpoint sub-TLV as
      specified in [I-D.ietf-idr-tunnel-encaps].

   To avoid the side effect of applying the Tunnel Encapsulation
   attribute to any packet that is addressed to the GW itself, the GW
   SHOULD use a loopback address in the auto-discovery route that
   differs from the address to which such packets are sent.

   As described in Section 1, each GW will include a Tunnel
   Encapsulation attribute containing an SR tunnel instance for each GW
   that is active for the DC site (including itself), and will include
   this attribute in every route it advertises externally to the DC
   site. As the current set of active GWs changes (due to the addition
   of a new GW or the failure/removal of an existing GW) each externally
   advertised route will be re-advertised with the set of SR tunnel
   instances reflecting the current set of active GWs.

   If a gateway becomes disconnected from the backbone network, or if
   the DC operator decides to terminate the gateway's activity, it
   withdraws the advertisements described above. This means that remote
   gateways at other sites will stop seeing advertisements from this
   gateway. It also means that other local gateways at this site will
   "unlearn" the removed gateway and stop including an SR tunnel
   instance for the removed gateway in the Tunnel Encapsulation
   attributes of their advertisements.

3.  Relationship to BGP Link State and Egress Peer Engineering

   When a remote GW receives a route to a prefix X it can use the SR
   tunnel instances within the contained Tunnel Encapsulation attribute
   to identify the GWs through which X can be reached. It uses this
   information to compute SR TE paths across the backbone network,
   drawing on the information advertised to it in SR BGP Link State
   (BGP-LS) [I-D.gredler-idr-bgp-ls-segment-routing-ext] and correlating
   it using the DC identity. SR Egress Peer Engineering (EPE)
   [I-D.ietf-idr-bgpls-segment-routing-epe] can be used to supplement
   the information advertised in the BGP-LS.

4.  Advertising a DC Route Externally

   When a packet destined for prefix X is sent on an SR TE path to a GW
   for the DC site containing X, it needs to carry the receiving GW's
   label for X such that this label rises to the top of the stack before
   the GW completes its processing of the packet. To achieve this we
   place a prefix-SID sub-TLV for X in each SR tunnel instance in the
   Tunnel Encapsulation attribute in the externally advertised route for
   X.

   Alternatively, if the GWs for a given DC are configured to allow
   remote GWs to perform SR TE through that DC for a prefix X, then each
   GW computes an SR TE path through that DC to X from each of the
   current active GWs and places each in an MPLS label stack sub-TLV
   [I-D.ietf-idr-tunnel-encaps] in the SR tunnel instance for that GW.

5.  Encapsulation

   If the GWs for a given DC are configured to allow remote GWs to send
   them packets in that DC's native encapsulation, then each GW will
   also include multiple instances of a tunnel TLV for that native
   encapsulation, one for each GW and each containing a Remote Endpoint
   sub-TLV with that GW's address, in externally advertised routes. A
   remote GW may then encapsulate a packet according to the rules
   defined via the sub-TLVs included in each of the tunnel TLV
   instances.

6.  IANA Considerations

   IANA maintains a registry called "BGP parameters" with a sub-registry
   called "BGP Tunnel Encapsulation Tunnel Types." The registration
   policy for this registry is First-Come First-Served.

   IANA is requested to assign a codepoint from this sub-registry for
   "SR Tunnel". The next available value may be used and reference
   should be made to this document.

   [[Note: This text is likely to be replaced with a specific code point
   value once FCFS allocation has been made.]]

7.  Security Considerations

   TBD

8.  Manageability Considerations

   TBD

9.  Contributors

   The following people contributed to discussions that led to the
   development of this document:

   TBD
   name
   Email: email

10.  Acknowledgements

   Thanks to Bruno Rijsman for review comments, and to Robert Raszuk for
   useful discussions.

11.  References

11.1.  Normative References

   [I-D.ietf-idr-bgpls-segment-routing-epe]
              Previdi, S., Filsfils, C., Ray, S., Patel, K., Dong, J.,
              and M. Chen, "Segment Routing BGP Egress Peer Engineering
              BGP-LS Extensions", draft-ietf-idr-bgpls-segment-routing-
              epe-05 (work in progress), May 2016.

   [I-D.ietf-idr-tunnel-encaps]
              Rosen, E., Patel, K., and G. Velde, "The BGP Tunnel
              Encapsulation Attribute", draft-ietf-idr-tunnel-encaps-02
              (work in progress), May 2016.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC4360]  Sangli, S., Tappan, D., and Y. Rekhter, "BGP Extended
              Communities Attribute", RFC 4360, DOI 10.17487/RFC4360,
              February 2006.

   [RFC7752]  Gredler, H., Ed., Medved, J., Previdi, S., Farrel, A., and
              S. Ray, "North-Bound Distribution of Link-State and
              Traffic Engineering (TE) Information Using BGP", RFC 7752,
              DOI 10.17487/RFC7752, March 2016.

11.2.  Informative References

   [I-D.gredler-idr-bgp-ls-segment-routing-ext]
              Previdi, S., Psenak, P., Filsfils, C., Gredler, H., Chen,
              M., and J. Tantsura, "BGP Link-State extensions for
              Segment Routing", draft-gredler-idr-bgp-ls-segment-
              routing-ext-02 (work in progress), June 2016.

   [I-D.ietf-idr-add-paths]
              Walton, D., Retana, A., Chen, E., and J. Scudder,
              "Advertisement of Multiple Paths in BGP", draft-ietf-idr-
              add-paths-15 (work in progress), May 2016.

   [I-D.ietf-spring-segment-routing]
              Filsfils, C., Previdi, S., Decraene, B., Litkowski, S.,
              and R. Shakir, "Segment Routing Architecture", draft-ietf-
              spring-segment-routing-08 (work in progress), May 2016.

Authors' Addresses

   John Drake
   Juniper Networks

   Email: jdrake@juniper.net


   Adrian Farrel
   Juniper Networks

   Email: adrian@olddog.co.uk


   Eric Rosen
   Juniper Networks

   Email: erosen@juniper.net