idnits 2.17.1 draft-zzhang-bess-evpn-bum-procedure-updates-01.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** There are 16 instances of too long lines in the document, the longest one being 3 characters in excess of 72. -- The draft header indicates that this document updates RFC7432, but the abstract doesn't seem to mention this, which it should. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD', or 'RECOMMENDED' is not an accepted usage according to RFC 2119. Please use uppercase 'NOT' together with RFC 2119 keywords (if that is what you mean). Found 'MUST not' in this paragraph: A RNVE (Regular, or legacy NVE that does not support the procedures discussed in this section) replicate traffic directly to all NVEs/ RNVEs. RNVEs can be identified by the lack of indication as discussed in Section 5.3 in their I-PMSI A-D routes. In case of MPLS encapsulation, NVEs and RNVEs advertise a label in their I-PMSI A-D routes, and RBRs MUST not change that when re-advertise the routes. Note that, the label is advertised even though an NVE sets the LIR bit. 
-- The document date (December 18, 2015) is 3052 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'RFC 7524' is mentioned on line 177, but not defined == Unused Reference: 'I-D.ietf-bess-ir' is defined on line 832, but no explicit reference was found in the text == Unused Reference: 'RFC2119' is defined on line 837, but no explicit reference was found in the text == Unused Reference: 'RFC7117' is defined on line 842, but no explicit reference was found in the text == Unused Reference: 'RFC7432' is defined on line 847, but no explicit reference was found in the text == Unused Reference: 'RFC7524' is defined on line 852, but no explicit reference was found in the text == Unused Reference: 'I-D.ietf-bess-dci-evpn-overlay' is defined on line 860, but no explicit reference was found in the text == Unused Reference: 'I-D.ietf-bess-evpn-overlay' is defined on line 866, but no explicit reference was found in the text == Unused Reference: 'I-D.rabadan-bess-evpn-optimized-ir' is defined on line 872, but no explicit reference was found in the text == Unused Reference: 'I-D.wijnands-bier-architecture' is defined on line 878, but no explicit reference was found in the text == Unused Reference: 'RFC6513' is defined on line 884, but no explicit reference was found in the text == Unused Reference: 'RFC6514' is defined on line 888, but no explicit reference was found in the text == Outdated reference: A later version (-05) exists of draft-ietf-bess-ir-00 == Outdated reference: A later version (-10) exists of draft-ietf-bess-dci-evpn-overlay-00 == Outdated reference: A later version (-12) exists of draft-ietf-bess-evpn-overlay-01 == Outdated reference: A later version (-02) exists of draft-rabadan-bess-evpn-optimized-ir-00 
Summary: 2 errors (**), 0 flaws (~~), 18 warnings (==), 2 comments (--).

Run idnits with the --verbose option for more detailed information about
the items above.

--------------------------------------------------------------------------------

2  BESS                                                            Z. Zhang
3  Internet-Draft                                                    W. Lin
4  Updates: 7432 (if approved)                       Juniper Networks, Inc.
5  Intended status: Standards Track                              J. Rabadan
6  Expires: June 20, 2016                                    Alcatel-Lucent
7                                                                  K. Patel
8                                                             Cisco Systems
9                                                         December 18, 2015

11                       Updates on EVPN BUM Procedures
12              draft-zzhang-bess-evpn-bum-procedure-updates-01

14  Abstract

16     This document specifies procedure updates for broadcast, unknown
17     unicast, and multicast (BUM) traffic in Ethernet VPNs (EVPN),
18     including selective multicast, and provider tunnel segmentation.

20  Requirements Language

22     The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
23     "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
24     document are to be interpreted as described in RFC2119.

26  Status of This Memo

28     This Internet-Draft is submitted in full conformance with the
29     provisions of BCP 78 and BCP 79.

31     Internet-Drafts are working documents of the Internet Engineering
32     Task Force (IETF). Note that other groups may also distribute
33     working documents as Internet-Drafts. The list of current Internet-
34     Drafts is at http://datatracker.ietf.org/drafts/current/.

36     Internet-Drafts are draft documents valid for a maximum of six months
37     and may be updated, replaced, or obsoleted by other documents at any
38     time. It is inappropriate to use Internet-Drafts as reference
39     material or to cite them other than as "work in progress."

41     This Internet-Draft will expire on June 20, 2016.

43  Copyright Notice

45     Copyright (c) 2015 IETF Trust and the persons identified as the
46     document authors. All rights reserved.
48     This document is subject to BCP 78 and the IETF Trust's Legal
49     Provisions Relating to IETF Documents
50     (http://trustee.ietf.org/license-info) in effect on the date of
51     publication of this document. Please review these documents
52     carefully, as they describe your rights and restrictions with respect
53     to this document. Code Components extracted from this document must
54     include Simplified BSD License text as described in Section 4.e of
55     the Trust Legal Provisions and are provided without warranty as
56     described in the Simplified BSD License.

58  Table of Contents

60     1.  Terminology
61     2.  Introduction
62       2.1.  Reasons for Tunnel Segmentation
63     3.  Additional Route Types of EVPN NLRI
64       3.1.  Per-Region I-PMSI A-D route
65       3.2.  S-PMSI A-D route
66       3.3.  Leaf-AD route
67     4.  Selective Multicast
68     5.  Inter-AS Segmentation
69       5.1.  Changes to Section 7.2.2 of RFC 7117
70       5.2.  I-PMSI Leaf Tracking
71       5.3.  Backward Compatibility
72     6.  Inter-Region Segmentation
73       6.1.  Area vs. Region
74       6.2.  Per-region Aggregation
75       6.3.  Use of S-NH-EC
76       6.4.  Ingress PE's I-PMSI Leaf Tracking
77     7.  Intra-region Segmentation and Assisted Ingress Replication
78       7.1.  Reducing Leaf A-D Routes
79       7.2.  Mix of inter-region and intra-region segmentation
80     8.  Multi-homing Support
81     9.  EVPN DCI
82       9.1.  Non-GW Option
83       9.2.  GW option
84     10. Security Considerations
85     11. Acknowledgements
86     12. References
87       12.1.  Normative References
88       12.2.  Informative References
89     Authors' Addresses

91  1. Terminology

93     To be added

95  2. Introduction

97     RFC 7432 specifies procedures to handle broadcast, unknown unicast,
98     and multicast (BUM) traffic in Sections 11, 12, and 16, using the
99     Inclusive Multicast Ethernet Tag route. Many details are deferred to
100     RFC 7117 (VPLS Multicast). In particular, selective multicast is
101     mentioned only briefly for Ingress Replication and deferred to RFC 7117.

103     RFC 7117 specifies procedures for using both inclusive tunnels and
104     selective tunnels, similar to MVPN procedures specified in RFC 6513
105     and RFC 6514. A new SAFI "MCAST-VPLS" is introduced, with two types
106     of NLRIs that match MVPN's S-PMSI A-D routes and Leaf A-D routes.
107     The same procedures can be applied to EVPN selective multicast for
108     both Ingress Replication and other tunnel types, but new route types
109     need to be defined under the same EVPN SAFI.

111     MVPN uses the terms I-PMSI and S-PMSI A-D routes. For consistency and
112     convenience, this document will use the same I/S-PMSI terms for VPLS
113     and EVPN. In particular, EVPN's Inclusive Multicast Ethernet Tag
114     route and VPLS's VPLS A-D route carrying a PTA (PMSI Tunnel Attribute)
115     for BUM traffic purposes will all be referred to as I-PMSI A-D routes.
116     Depending on the context, they may be used interchangeably.
118     MVPN provider tunnels and EVPN/VPLS BUM provider tunnels, which are
119     referred to as MVPN/EVPN/VPLS provider tunnels in this document for
120     simplicity, can be segmented for technical or administrative reasons,
121     which are summarized in Section 2.1 of this document. RFC 6513/6514
122     cover MVPN inter-AS segmentation, RFC 7117 covers VPLS multicast
123     inter-AS segmentation, and RFC 7524 (Seamless MPLS Multicast) covers
124     inter-area segmentation for both MVPN and VPLS.

126     There is a difference between MVPN and VPLS multicast inter-AS
127     segmentation. For simplicity, EVPN uses the same procedures as in
128     MVPN. All ASBRs can re-advertise their choice of the best route.
129     Each can become the root of its intra-AS segment and inject traffic
130     it receives from its upstream, while each downstream PE/ASBR will
131     only pick one of the upstream ASBRs as its upstream. This is also
132     the behavior even for VPLS in case of inter-area segmentation.

134     For inter-area segmentation, RFC 7524 requires the use of the Inter-area
135     P2MP Segmented Next-Hop Extended Community (S-NH-EC), and the setting
136     of the "Leaf Information Required" (LIR) flag in the PTA in certain
137     situations. Either of these could be optional in the case of EVPN.
138     Removing these requirements would make the segmentation procedures
139     transparent to ingress and egress PEs.

141     RFC 7524 assumes that segmentation happens at area borders. However,
142     it could be at "regional" borders, where a region could be a sub-
143     area, or even an entire AS plus its external links (Section 6). That
144     would allow for more flexible deployment scenarios (e.g. for single-
145     area provider networks).

147     This document specifies, clarifies, or redefines certain EVPN BUM
148     procedures and adds new ones, with the goal of aligning them better
149     across MVPN, EVPN, and VPLS. For brevity, only changes/additions to
150     relevant RFC 7117 and RFC 7524 procedures are specified, instead of
151     repeating the entire procedures. Note that these apply to EVPN
152     only, even though they may sometimes read like updates to RFC
153     7117/7524.

155  2.1. Reasons for Tunnel Segmentation

157     Tunnel segmentation may be required and/or desired for
158     administrative and/or technical reasons.

160     For example, an MVPN/VPLS/EVPN network may span multiple providers
161     and Inter-AS Option-B has to be used, in which the end-to-end
162     provider tunnels have to be segmented at and stitched by the ASBRs.
163     Different providers may use different tunnel technologies (e.g.,
164     provider A uses Ingress Replication, provider B uses RSVP-TE P2MP
165     while provider C uses mLDP). Even if they use the same tunnel
166     technology like RSVP-TE P2MP, it may be impractical to set up the
167     tunnels across provider boundaries.

169     The same situations may apply between the ASes and/or areas of a
170     single provider. For example, the backbone area may use RSVP-TE P2MP
171     tunnels while non-backbone areas may use mLDP tunnels.

173     Segmentation can also be used to divide an AS/area into smaller
174     regions, so that control plane state and/or forwarding plane state/
175     burden can be limited to that of individual regions. For example,
176     instead of Ingress Replicating to 100 PEs in the entire AS, with
177     inter-area segmentation [RFC 7524] a PE only needs to replicate to
178     local PEs and ABRs. The ABRs will further replicate to their
179     downstream PEs and ABRs. This not only reduces the forwarding plane
180     burden, but also reduces the leaf tracking burden in the control
181     plane. This inter-region segmentation can be further extended to
182     intra-region as an alternative way to achieve Assisted Replication as
183     proposed in [draft-rabadan-bess-evpn-optimized-ir], and it works for
184     MPLS encapsulation.
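
As a rough illustration of the replication savings described above, the
following sketch compares the number of copies an ingress PE sends with
and without segmentation. The split into an area of 25 PEs served by
2 ABRs is an assumption made for this example only, as are the function
names; none of it is taken from RFC 7524 or from this document.

```python
def ingress_copies_flat(total_pes: int) -> int:
    """Plain Ingress Replication: one copy per remote PE in the whole AS."""
    return total_pes - 1


def ingress_copies_segmented(local_pes: int, local_abrs: int) -> int:
    """With segmentation: one copy per other local PE plus one per local ABR.

    The ABRs, not the ingress PE, replicate further downstream.
    """
    return (local_pes - 1) + local_abrs


# Assumed example: 100 PEs in the AS, 25 of them in the ingress PE's area,
# and 2 ABRs attached to that area.
flat = ingress_copies_flat(100)
segmented = ingress_copies_segmented(local_pes=25, local_abrs=2)
print(f"flat: {flat} copies, segmented: {segmented} copies")
```

Under these assumed numbers the ingress PE also tracks only the local
leaves, which is the control plane saving mentioned above.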
186     Smaller regions also have the benefit that, in case of tunnel
187     aggregation, it is easier to find congruence among the segments of
188     different constituent (service) tunnels and the resulting aggregation
189     (base) tunnel in a region. This leads to better bandwidth
190     efficiency, because the more congruent they are, the fewer leaves of
191     the base tunnel need to discard traffic when a service tunnel's
192     segment does not need to receive the traffic (yet it is receiving the
193     traffic due to aggregation).

195     Another advantage of smaller regions is smaller BIER sub-domains.
196     In BIER, a new multicast architecture, packets carry a BitString
197     in which the bits correspond to edge routers that need to receive
198     traffic. Smaller sub-domains mean that smaller BitStrings can be used
199     without having to send multiple copies of the same packet.

201     Finally, EVPN tunnel segmentation can be used for EVPN DCIs, as
202     discussed in Section 9. It follows the same concepts discussed
203     above.

205  3. Additional Route Types of EVPN NLRI

207     RFC 7432 defines the format of EVPN NLRI as follows:

209                  +-----------------------------------+
210                  |       Route Type (1 octet)        |
211                  +-----------------------------------+
212                  |         Length (1 octet)          |
213                  +-----------------------------------+
214                  |  Route Type specific (variable)   |
215                  +-----------------------------------+

217     So far five types have been defined:

219        + 1 - Ethernet Auto-Discovery (A-D) route
220        + 2 - MAC/IP Advertisement route
221        + 3 - Inclusive Multicast Ethernet Tag route
222        + 4 - Ethernet Segment route
223        + 5 - IP Prefix Route

225     This document defines three additional route types:

227        + 6 - Per-Region I-PMSI A-D route
228        + 7 - S-PMSI A-D route
229        + 8 - Leaf A-D route

231     The "Route Type specific" field of the type 6 and type 7 EVPN NLRIs
232     starts with a type 1 RD, whose Administrative sub-field MUST match
233     that of the RD in all the EVPN routes from the same advertising
234     router for a given EVI, except the Leaf A-D route (Section 3.3).

236  3.1. Per-Region I-PMSI A-D route

238     The Per-region I-PMSI A-D route has the following format. Its usage
239     is discussed in Section 6.2.

241                  +-----------------------------------+
242                  |           RD (8 octets)           |
243                  +-----------------------------------+
244                  |    Ethernet Tag ID (4 octets)     |
245                  +-----------------------------------+
246                  |   Extended Community (8 octets)   |
247                  +-----------------------------------+

249     After the Ethernet Tag ID, an Extended Community (EC) is used to
250     identify the region. Various types and sub-types of ECs provide maximum
251     flexibility. Note that this is not an EC Attribute, but an 8-octet
252     field embedded in the NLRI itself, following the EC encoding scheme.

254  3.2. S-PMSI A-D route

256     The S-PMSI A-D route has the following format:

258                  +-----------------------------------+
259                  |           RD (8 octets)           |
260                  +-----------------------------------+
261                  |    Ethernet Tag ID (4 octets)     |
262                  +-----------------------------------+
263                  | Multicast Source Length (1 octet) |
264                  +-----------------------------------+
265                  |    Multicast Source (Variable)    |
266                  +-----------------------------------+
267                  | Multicast Group Length (1 octet)  |
268                  +-----------------------------------+
269                  |    Multicast Group (Variable)     |
270                  +-----------------------------------+
271                  |   Originating Router's IP Addr    |
272                  +-----------------------------------+

274     Other than the addition of the Ethernet Tag ID, it is identical to the
275     S-PMSI A-D route as defined in RFC 7117. The procedures in RFC 7117
276     also apply (including wildcard functionality), except that the
277     granularity level is per Ethernet Tag.

279  3.3. Leaf-AD route

281     The Route Type specific field of a Leaf A-D route consists of the
282     following:

284                  +-----------------------------------+
285                  |       Route Key (variable)        |
286                  +-----------------------------------+
287                  |   Originating Router's IP Addr    |
288                  +-----------------------------------+

290     A Leaf A-D route is originated in response to a PMSI route, which
291     could be an Inclusive Multicast Ethernet Tag route, a per-region
292     I-PMSI A-D route, an S-PMSI A-D route, or some other type of route
293     that may be defined in the future to trigger Leaf A-D routes. The
294     Route Key is the "Route Type Specific" field of the route for which
295     this Leaf A-D route is generated.

297     The general procedures for Leaf A-D routes were first specified in RFC
298     6514 for MVPN. The principles apply to VPLS and EVPN as well. RFC
299     7117 has details for VPLS Multicast, and this document points out
300     some specifics for EVPN, e.g. in Section 5.

302  4. Selective Multicast

304     RFC 7117 specifies Selective Multicast for VPLS.
        Except that
305     different route types and formats are specified under the EVPN SAFI
306     for S-PMSI A-D and Leaf A-D routes (Section 3), all procedures in RFC
307     7117 with respect to Selective Multicast apply to EVPN as well,
308     including wildcard procedures.

310  5. Inter-AS Segmentation

312  5.1. Changes to Section 7.2.2 of RFC 7117

314     The first paragraph of Section 7.2.2.2 of RFC 7117 says:

316        "... The best route procedures ensure that if multiple
317        ASBRs, in an AS, receive the same Inter-AS A-D route from their EBGP
318        neighbors, only one of these ASBRs propagates this route in Internal
319        BGP (IBGP). This ASBR becomes the root of the intra-AS segment of
320        the inter-AS tree and ensures that this is the only ASBR that accepts
321        traffic into this AS from the inter-AS tree."

323     The above VPLS behavior requires complicated VPLS-specific procedures
324     for the ASBRs to reach agreement. For EVPN, a different approach is
325     used and the above quoted text is not applicable to EVPN.

327     Each ASBR that re-advertises into the AS uses the Leaf A-D based
328     procedure to discover the leaves on the segment rooted at itself.
329     This is the same as the procedures for S-PMSI in RFC 7117 itself.

331     The following text at the end of the second bullet:

333        "................................................... If, in order
334        to instantiate the segment, the ASBR needs to know the leaves of
335        the tree, then the ASBR obtains this information from the A-D
336        routes received from other PEs/ASBRs in the ASBR's own AS."

338     is changed to the following:

340        "................................................... If, in order
341        to instantiate the segment, the ASBR needs to know the leaves of
342        the tree, then the ASBR MUST set the LIR flag to 1 in the PTA to
343        trigger Leaf A-D routes from egress PEs and downstream ASBRs.
344        It MUST be (auto-)configured with an import RT, which controls
345        acceptance of leaf A-D routes by the ASBR."
347     Accordingly, the following paragraph in Section 7.2.2.4:

349        "If the received Inter-AS A-D route carries the PMSI Tunnel attribute
350        with the Tunnel Identifier set to RSVP-TE P2MP LSP, then the ASBR
351        that originated the route MUST establish an RSVP-TE P2MP LSP with the
352        local PE/ASBR as a leaf. This LSP MAY have been established before
353        the local PE/ASBR receives the route, or it MAY be established after
354        the local PE receives the route."

356     is changed to the following:

358        "If the received Inter-AS A-D route has the LIR flag set in its PTA,
359        then a receiving PE must originate a corresponding Leaf A-D route,
360        and a receiving ASBR must originate a corresponding Leaf A-D route
361        if and only if it received and imported one or more corresponding Leaf
362        A-D routes from its downstream IBGP or EBGP peers, or it has non-null
363        downstream forwarding state for the PIM/mLDP tunnel that instantiates
364        its downstream intra-AS segment. The ASBR that (re-)advertised the
365        Inter-AS A-D route then establishes a tunnel to the leaves discovered
366        by the Leaf A-D routes."

368  5.2. I-PMSI Leaf Tracking

370     An ingress PE does not set the LIR flag in its I-PMSI's PTA, even
371     with Ingress Replication or RSVP-TE P2MP tunnels. It does not rely
372     on the Leaf A-D routes to discover leaves in its AS, and Section 11.2
373     of RFC 7432 explicitly states that the LIR flag must be set to zero.

375     An implementation of RFC 7432 might have used the Originating
376     Router's IP Address field of the Inclusive Multicast Ethernet Tag
377     routes to determine the leaves, or might have used the Next Hop field
378     instead. Within the same AS, both will lead to the same result.

380     With segmentation, an ingress PE MUST determine the leaves in its AS
381     from the BGP next hops in all its received I-PMSI A-D routes, so it
382     does not have to set the LIR bit to request Leaf A-D routes. PEs
383     within the same AS will all have different next hops in their I-PMSI
384     A-D routes (hence will all be considered as leaves), and PEs from
385     other ASes will have the next hop in their I-PMSI A-D routes set to
386     addresses of ASBRs in this local AS, hence only those ASBRs will be
387     considered as leaves (as proxies for those PEs in other ASes). Note
388     that in case of Ingress Replication, when an ASBR re-advertises IBGP
389     I-PMSI A-D routes, it MUST advertise the same label for all of those
390     with the same Ethernet Tag ID and the same EVI. When an ingress PE
391     builds its flooding list, multiple routes may have the same (nexthop,
392     label) tuple and they will only be added as a single branch in the
393     flooding list.

395  5.3. Backward Compatibility

397     The above procedures assume that all PEs are upgraded to support the
398     segmentation procedures:

400     o  An ingress PE uses the Next Hop instead of Originating Router's IP
401        Address to determine leaves for the I-PMSI tunnel.

403     o  An egress PE sends Leaf A-D routes in response to I-PMSI routes,
404        if the PTA has the LIR flag set (by the re-advertising ASBRs).

406     o  In case of Ingress Replication, when an ingress PE builds its
407        flooding list, multiple I-PMSI routes may have the same (nexthop,
408        label) tuple and only a single branch for those will be added in
409        the flooding list.

411     If a deployment has legacy PEs that do not support the above, then
412     a legacy ingress PE would include all PEs (including those in remote
413     ASes) as leaves of the inclusive tunnel and try to send traffic to
414     them directly (no segmentation), which is either undesired or not
415     possible; a legacy egress PE would not send Leaf A-D routes so the
416     ASBRs would not know to send external traffic to them.
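
The flooding-list rule for upgraded ingress PEs (Section 5.2 and the
bullets above) can be sketched as follows. This is a minimal
illustration under stated assumptions, not a definitive implementation:
the IPmsiRoute record, its field names, and build_flooding_list are
hypothetical, and the addresses and labels are made-up example values.

```python
from typing import Iterable, List, NamedTuple, Tuple


class IPmsiRoute(NamedTuple):
    """Hypothetical, simplified view of a received I-PMSI A-D route."""
    originator: str  # Originating Router's IP Address
    next_hop: str    # BGP next hop (a local ASBR for routes from other ASes)
    label: int       # label advertised for Ingress Replication


def build_flooding_list(routes: Iterable[IPmsiRoute]) -> List[Tuple[str, int]]:
    """Return one branch per distinct (next hop, label) tuple, in arrival order."""
    seen = set()
    branches = []
    for route in routes:
        key = (route.next_hop, route.label)
        if key not in seen:
            seen.add(key)
            branches.append(key)
    return branches


routes = [
    IPmsiRoute("192.0.2.2", "192.0.2.2", 1002),        # local PE
    IPmsiRoute("192.0.2.3", "192.0.2.3", 1003),        # local PE
    IPmsiRoute("198.51.100.1", "192.0.2.254", 2000),   # remote PE via local ASBR
    IPmsiRoute("198.51.100.2", "192.0.2.254", 2000),   # remote PE via same ASBR
]
flooding_list = build_flooding_list(routes)
print(flooding_list)
```

In this example the two routes re-advertised by the same ASBR collapse
into one branch. Keying on the Originating Router's IP Address instead,
as a legacy ingress PE might, would yield four branches, including the
two remote PEs that are reachable only through the ASBR.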
418     To address this backward compatibility problem, the following
419     procedure can be used (see Section 6.2 for per-PE/AS/region I-PMSI
420     A-D routes):

422     o  An upgraded PE indicates in its per-PE I-PMSI A-D route that it
423        supports the new procedures. Details will be provided in a future
424        revision.

426     o  All per-PE I-PMSI A-D routes are restricted to the local AS and
427        not propagated to external peers.

429     o  The ASBRs in an AS originate per-region I-PMSI A-D routes and
430        advertise them to their external peers to announce the tunnels used
431        to carry traffic from the local AS to other ASes. Depending on the
432        types of tunnels being used, the LIR flag in the PTA may be set,
433        in which case the downstream ASBRs and upgraded PEs will send Leaf
434        A-D routes to pull traffic from their upstream ASBRs. In a
435        particular downstream AS, one of the ASBRs is elected, based on
436        the per-region I-PMSI A-D routes for a particular source AS, to
437        send traffic from that source AS to legacy PEs in the downstream
438        AS. The traffic arrives at the elected ASBR on the tunnel
439        announced in the best per-region I-PMSI A-D route for the source
440        AS, which the ASBR has selected from all those that it received over
441        EBGP or IBGP sessions. Details of the election procedure will be
442        provided in a future revision.

444     o  In an ingress AS, if and only if an ASBR has active downstream
445        receivers (PEs and ASBRs), which are learned either explicitly via
446        Leaf A-D routes or implicitly via PIM join or mLDP label mapping,
447        the ASBR originates a per-PE I-PMSI A-D route (i.e., a regular
448        Inclusive Multicast Ethernet Tag route) into the local AS, and
449        stitches incoming per-PE I-PMSI tunnels into its per-region I-PMSI
450        tunnel. With this, it gets traffic from local PEs and sends it to
451        other ASes via the tunnel announced in its per-region I-PMSI A-D
452        route.
454     Note that, even if there is no backward compatibility issue, the
455     above procedures have the benefit of keeping all per-PE I-PMSI A-D
456     routes in their local ASes, greatly reducing the flooding of the
457     routes and their corresponding Leaf A-D routes (when needed), and the
458     number of inter-AS tunnels.

460  6. Inter-Region Segmentation

462  6.1. Area vs. Region

464     RFC 7524 is for MVPN/VPLS inter-area segmentation and does not
465     explicitly cover EVPN. However, if "area" is replaced by "region"
466     and "ABR" is replaced by "RBR" (Regional Border Router) then
467     everything still works, and can be applied to EVPN as well.

469     A region can be a sub-area, or can be an entire AS including its
470     external links. Instead of automatic region definition based on IGP
471     areas, a region would be defined as a BGP peer group. In fact, even
472     with IGP area based region definition, a BGP peer group listing the
473     PEs and ABRs in an area is still needed.

475     Consider the following example diagram:

477             ---------           ------             ---------
478            /         \         /      \           /         \
479           /           \       /        \         /           \
480          | PE1 o    ASBR1 -- ASBR2    ASBR3 -- ASBR4    o PE2 |
481           \           /       \        /         \           /
482            \         /         \      /           \         /
483             ---------           ------             ---------
484              AS 100             AS 200              AS 300
485          |-----------|--------|---------|--------|------------|
486            segment1   segment2 segment3  segment4   segment5

488     The inter-AS segmentation procedures specified so far (RFC 6513/6514,
489     7117, and Section 5 of this document) require all ASBRs to be
490     involved, and Ingress Replication is used between two ASBRs in
491     different ASes.

493     In the above diagram, it's possible that ASBR1/4 do not support
494     segmentation, and the provider tunnels in AS 100/300 can actually
495     extend across the external link. In that case, the inter-region
496     segmentation procedures can be used instead - a region is the entire
497     (AS100 + ASBR1-ASBR2 link) or (AS300 + ASBR3-ASBR4 link). ASBR2/3
498     would be the RBRs, and ASBR1/4 would just be transit core routers
499     with respect to provider tunnels.

501     As illustrated in the diagram below, ASBR2/3 will establish a
502     multihop EBGP session with either an RR or directly with PEs in the
503     neighboring AS. I/S-PMSI A-D routes from ingress PEs will not be
504     processed by ASBR1/4. When ASBR2 re-advertises the routes into AS
505     200, it changes the next hop to its own address and changes the PTA to
506     specify the tunnel type/identification in its own AS. When ASBR3 re-
507     advertises I/S-PMSI A-D routes into the neighboring AS 300, it
508     changes the next hop to its own address and changes the PTA to specify
509     the tunnel type/identification in the neighboring region 3. Now the
510     segment is rooted at ASBR3 and extends across the external link to
511     PEs.

513             ---------           ------             ---------
514            /   RR....\.mh-ebgp /      \    mh-ebgp/....RR   \
515           /    :      \    `. /        \ .'      /      :    \
516          | PE1 o    ASBR1 -- ASBR2    ASBR3 -- ASBR4    o PE2 |
517           \           /       \        /         \           /
518            \         /         \      /           \         /
519             ---------           ------             ---------
520              AS 100             AS 200              AS 300
521          |-------------------|----------|---------------------|
522                segment 1      segment 2        segment 3

524  6.2. Per-region Aggregation

526     Notice that every I/S-PMSI route from each PE will be propagated
527     throughout all the ASes or regions. They may also trigger
528     corresponding Leaf A-D routes depending on the types of tunnels used
529     in each region. This may result in too many routes and corresponding
530     tunnels. To address this concern, the I-PMSI routes from all PEs in
531     an AS/region can be aggregated into a single I-PMSI route originated
532     from the RBRs, and traffic from all those individual I-PMSI tunnels
533     will be switched into the single I-PMSI tunnel. This is like the
534     MVPN Inter-AS I-PMSI route originated by ASBRs.

536     The MVPN Inter-AS I-PMSI A-D route is better called a per-AS
537     I-PMSI A-D route, to be compared against the (per-PE) Intra-AS I-PMSI
538     A-D routes originated by each PE.
In this document we call it a per-region I-PMSI A-D route, to cover
the case where the aggregation is applied at the regional level. The
per-PE I-PMSI routes will not be propagated to other regions. If
multiple RBRs are connected to a region, then each will advertise
such a route, with the same route key (Section 3.1). As with the
per-PE I-PMSI A-D routes, RBRs/PEs in a downstream region will each
select the best one from all those re-advertised by the upstream
RBRs, and hence will only receive traffic injected by one of them.

MVPN does not aggregate S-PMSI routes from all PEs in an AS as it
does for I-PMSI routes, because the number of PEs that will advertise
S-PMSI routes for the same (s,g) or (*,g) is small. This is also the
case for EVPN, i.e., there are no per-region S-PMSI routes.

Note that per-region I-PMSI routes can also be used to address the
backward compatibility issue discussed in Section 5.3.

The per-region I-PMSI route uses an EC embedded in the NLRI to
identify a region. Any EC is permitted as long as it uniquely
identifies the region and all RBRs for the same region use the same
EC. Where an AS number or area ID is needed, the following can be
used:

o  For a two-octet AS number, a Transitive Two-Octet AS-Specific EC
   of sub-type 0x09 (Source AS), with the Global Administrator
   sub-field set to the AS number and the Local Administrator
   sub-field set to 0.

o  For a four-octet AS number, a Transitive Four-Octet AS-Specific EC
   of sub-type 0x09 (Source AS), with the Global Administrator
   sub-field set to the AS number and the Local Administrator
   sub-field set to 0.

o  For an area ID, a Transitive IPv4-Address-Specific EC of any
   sub-type.

Uses of other particular ECs may be specified in other documents.

6.3. Use of S-NH-EC

[RFC7524] specifies the use of S-NH-EC because it does not allow ABRs
to change the BGP next hop when they re-advertise I/S-PMSI A-D routes
to downstream areas. That is only to be consistent with the MVPN
Inter-AS I-PMSI A-D routes, whose next hop must not be changed when
they are re-advertised by the segmenting ABRs, for reasons specific
to MVPN. For EVPN, it is perfectly fine to change the next hop when
RBRs re-advertise the I/S-PMSI A-D routes, instead of relying on
S-NH-EC. As a result, this document specifies that RBRs change the
BGP next hop when they re-advertise I/S-PMSI A-D routes and do not
use S-NH-EC. If a downstream PE/RBR needs to originate Leaf A-D
routes, it simply uses the BGP next hop in the corresponding I/S-PMSI
A-D routes to construct Route Targets.

The advantage of this is that neither ingress nor egress PEs need to
understand or use S-NH-EC, and a consistent procedure (based on the
BGP next hop) is used for both inter-AS and inter-region
segmentation.

6.4. Ingress PE's I-PMSI Leaf Tracking

[RFC7524] specifies that when an ingress PE/ASBR (re-)advertises a
VPLS I-PMSI A-D route, it sets the LIR flag to 1 in the route's PTA.
As in the inter-AS case, this is not needed for EVPN. To be
consistent with the inter-AS case, the ingress PE does not set the
LIR flag in the I-PMSI A-D routes it originates, and determines the
leaves based on the BGP next hops in the I-PMSI A-D routes it
receives, as specified in Section 5.2.

The same backward compatibility issue exists, and the same solution
as in the inter-AS case applies, as specified in Section 5.3.

7. Intra-region Segmentation and Assisted Ingress Replication

[I-D.rabadan-bess-evpn-optimized-ir] describes "Assisted Ingress
Replication", which reduces the burden on NVEs by having them
replicate to only one of a few designated replicators, which will
then replicate to other relevant NVEs. The tunnel segmentation
procedures can be extended to achieve the same, even with MPLS
encapsulation.

With inter-region segmentation, an RBR, which is a Route Reflector,
changes the BGP Next Hop to one of its own addresses when it
re-advertises an I/S-PMSI route to other regions, and sets the LIR
bit in the PTA Flags field when necessary, but it does not do so when
re-advertising to NVEs in its own region. If it does so even when
re-advertising to local NVEs, then it becomes a replicator as in
[I-D.rabadan-bess-evpn-optimized-ir]: NVEs will respond to each
individual I-PMSI route from other NVEs with a Leaf A-D route,
targeted to the RBR that re-advertised the selected best route (out
of all copies of the same route re-advertised by different RBRs), so
that the sending NVEs replicate only to the RBRs, which in turn
replicate to the NVEs.

In the case of MPLS encapsulation, for split-horizon purposes, NVEs
MUST set the LIR bit in their I-PMSI A-D routes to trigger
corresponding Leaf A-D routes from RBRs, with different labels
advertised in the Leaf A-D routes for different NVEs, so that RBRs
know the source NVE of each incoming packet and will not relay the
traffic back to that source NVE.

An RNVE (Regular NVE, i.e., a legacy NVE that does not support the
procedures discussed in this section) replicates traffic directly to
all NVEs/RNVEs. RNVEs can be identified by the absence, in their
I-PMSI A-D routes, of the indication discussed in Section 5.3.
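
The split-horizon behavior described above for MPLS encapsulation can
be illustrated with a minimal, non-normative sketch. It assumes a toy
model in which the RBR hands a distinct label to each source NVE (in
the Leaf A-D route it sends back) and keeps, per source NVE, the set
of NVEs that sent it Leaf A-D routes. The class name, method names,
and label values are hypothetical and chosen only for illustration;
they are not defined by this document.

```python
class Replicator:
    """Toy model of an RBR relaying BUM traffic inside its own region."""

    def __init__(self, first_label=1000):
        self._next_label = first_label  # hypothetical label space
        self._label_of = {}             # source NVE -> label given to it
        self._source_of = {}            # label -> source NVE
        self._leaves = {}               # source NVE -> NVEs that sent Leaf A-D

    def on_ipmsi_with_lir(self, source_nve):
        """Handle an I-PMSI A-D route with the LIR bit set.

        Returns the label carried in the Leaf A-D route sent back to
        source_nve; each source NVE gets its own label.
        """
        if source_nve not in self._label_of:
            label = self._next_label
            self._next_label += 1
            self._label_of[source_nve] = label
            self._source_of[label] = source_nve
        return self._label_of[source_nve]

    def on_leaf_ad(self, source_nve, leaf_nve):
        """Record a Leaf A-D route from leaf_nve for source_nve's route."""
        self._leaves.setdefault(source_nve, set()).add(leaf_nve)

    def relay_targets(self, incoming_label):
        """Return the NVEs to replicate to, never including the source."""
        source = self._source_of[incoming_label]
        return sorted(self._leaves.get(source, set()) - {source})


rbr = Replicator()
label_nve1 = rbr.on_ipmsi_with_lir("NVE1")
label_nve2 = rbr.on_ipmsi_with_lir("NVE2")
for leaf in ("NVE1", "NVE2", "NVE3"):
    rbr.on_leaf_ad("NVE1", leaf)
print(rbr.relay_targets(label_nve1))  # ['NVE2', 'NVE3']
```

Because the incoming label alone identifies the source NVE, the
sketch needs no MAC or IP lookup to apply split horizon.
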
In the case of MPLS encapsulation, NVEs and RNVEs advertise a label
in their I-PMSI A-D routes, and RBRs MUST NOT change that label when
they re-advertise the routes. Note that the label is advertised even
when an NVE sets the LIR bit.

An RNVE is not able to send Leaf A-D routes, so RBRs will not relay
received traffic to it. An ingress NVE (legacy or not) always sends
to RNVEs directly. By comparison, in the inter-AS scenario
(Section 5.3) an ASBR is elected to relay traffic, but in this
intra-region case it is reasonable for the ingress NVE to send to
RNVEs directly, which is both feasible and simpler.

7.1. Reducing Leaf A-D Routes

To address the possible concern about too many Leaf A-D routes (every
NVE responds with one to its selected RBR for each I-PMSI A-D route),
an RBR can clear the LIR bit when it re-advertises the I-PMSI routes,
so that no Leaf A-D routes will be triggered for the per-PE I-PMSI
routes. It also originates a per-region I-PMSI A-D route
(Section 6.2), but advertises it back into the same region instead of
into other regions. The route has the LIR bit set so that NVEs will
respond with a Leaf A-D route, allowing an RBR to determine the set
of NVEs to which it is responsible for relaying incoming traffic.

The per-region I-PMSI A-D routes from the RBRs and the corresponding
Leaf A-D routes from NVEs are comparable to the Replicator-AR and
Leaf-AR routes of the Optimized IR method (Selective Mode).

This is also comparable to the per-region aggregation discussed
earlier, except that the per-region I-PMSI A-D route is advertised
back to the same region instead of to other regions. Similarly, the
RBRs could terminate the per-PE I-PMSI A-D routes if there are no
RNVEs.

7.2. Mix of Inter-region and Intra-region Segmentation

Further details may need to be specified for the case where
intra-region segmentation is used for IR optimization while
inter-region segmentation is used at the same time, with RNVEs
present in different regions.

8. Multi-homing Support

If multi-homing does not span different ASes or regions, existing
procedures work with segmentation. If an ES is multi-homed to PEs in
different ASes or regions, additional procedures are needed to work
with segmentation. The procedures are well understood but omitted
here until the requirement becomes clear.

9. EVPN DCI

In addition to the inter-AS and inter-region use cases, EVPN Overlay
DC Interconnect is another important use case for EVPN tunnel
segmentation.

Sections 5.1.1.1 and 5.1.1.2 of [I-D.ietf-bess-evpn-overlay] discuss
two options for interconnecting EVPN Overlay DCs. With the GW option,
the DC EVPNs and the Interconnect EVPN (DCI) are independent and
terminate at the GWs. With the non-GW option, the DC EVPNs and the
Interconnect EVPN form an integral EVPN, just like EVPN inter-AS
option B. The GW option is discussed in detail in Section 3.4 of
[I-D.ietf-bess-dci-evpn-overlay].

The non-GW option can only be used when PEs can use VNIs/VSIDs that
have local significance (like MPLS labels); the GW option must be
used otherwise. With the GW option, MAC lookup must be performed on
traffic that arrives from a side where non-local VNIs/VSIDs are used.
Otherwise, label/VNI/VSID switching can be used (typical inter-AS
option B behavior).

Note that with either option, BUM traffic forwarding can be based on
tunnel stitching instead of MAC lookup (except if IR is used together
with non-local VNIs/VSIDs), because BUM traffic goes to all PEs on
the corresponding provider tunnels instead of to targeted PEs. The
following sections discuss some specific details for each option.

9.1. Non-GW Option

The non-GW option is readily comparable to the EVPN/MPLS inter-region
scenario in which a region spans an entire AS, assuming that each DC
is in its own AS, different from the DCI's and from those of the
other DCs.

Consider the following diagram:

                       +--------------+
           +---------+ |              | +---------+
   +----+  |        +----+          +----+        |  +----+
   |NVE1|--|        |RBR1|          |RBR3|        |--|NVE3|
   +----+  |        |    |          |    |        |  +----+
           |        +----+          +----+        |
           |    DC1    |     WAN      |    DC2    |
           |        +----+          +----+        |
           |        |RBR2|          |RBR4|        |
   +----+  |        |    |          |    |        |  +----+
   |NVE2|--|        +----+          +----+        |--|NVE4|
   +----+  +---------+ |              | +---------+  +----+
                       +--------------+

   |---EVPN-Overlay----|---EVPN-MPLS---|----EVPN-Overlay---|

             Data Center Interconnect without Gateway

The RBRs are WAN Edge routers. They re-advertise I/S-PMSI routes from
one side to the other, following the previously described
segmentation procedures. For example, the Inclusive Multicast Route
from NVE1 is re-advertised into the WAN side by both RBR1 and RBR2,
with the LIR flag bit set in the PTA, and then re-advertised into DC2
by RBR3 and RBR4. NVE3 and NVE4 could both choose the one
re-advertised by RBR3 or both choose the one re-advertised by RBR4,
or could each choose a different one (e.g., NVE3 chooses the one
re-advertised by RBR3 while NVE4 chooses the one re-advertised by
RBR4). Each either joins the advertised PIM tunnel or sends a
corresponding Leaf A-D route to the re-advertiser of the chosen best
route. RBR3 and/or RBR4 repeat the process, followed by RBR1 and/or
RBR2 doing the same. In the end, a segmented tunnel is established to
reach NVE3 and NVE4. When BUM traffic arrives on RBR1/2 from NVE1 via
the tunnel segment in DC1, the multicast VXLAN encapsulation is
removed and the traffic is switched directly into the segment in the
WAN without going through MAC lookup.

The per-region aggregation method (Section 6.2) can be used to keep
the per-PE I-PMSI A-D routes within each DC.

9.2. GW Option

Consider the following diagram, adapted from Section 3.4 of
[I-D.ietf-bess-dci-evpn-overlay]:

                       +--------------+
           +---------+ |              | +---------+
   +----+  |         +---+          +---+         |  +----+
   |NVE1|--|         |   |          |   |         |--|NVE3|
   +----+  |         |GW1|          |GW3|         |  +----+
           |         +---+          +---+         |
           |    DC1    |     WAN      |    DC2    |
           |         +---+          +---+         |
           |         |   |          |   |         |
   +----+  |         |GW2|          |GW4|         |  +----+
   |NVE2|--|         +---+          +---+         |--|NVE4|
   +----+  +---------+ |              | +---------+  +----+
                       +--------------+

   |---EVPN-Overlay----|---EVPN-MPLS---|----EVPN-Overlay---|

The GWs consume EVPN routes from the DC side and re-originate new
ones into the WAN side, and vice versa. All GWs will advertise their
own I-PMSI A-D route to the DC and WAN sides, but only the DF on an
internal ESI (I-ESI) for the local DC will forward BUM traffic from
one EVPN domain to the other. For example, BUM traffic from NVE1 will
reach both GW1 and GW2, but only the DF, say GW1, will forward it to
the WAN side. The traffic will then reach both GW3 and GW4, but again
only the DF (for the I-ESI for DC2, say GW4) will forward traffic
into DC2.

In [I-D.ietf-bess-dci-evpn-overlay], traffic forwarding by GWs is
based on MAC lookup: because VNIs have global significance in DCs,
the VXLAN encapsulation cannot indicate to which remote NVE a known
unicast packet should be forwarded. For BUM traffic, however, this is
not a problem, since a BUM packet only needs to be put onto the
appropriate tunnel. As a result, the DF GW on the I-ESI for a local
DC can stitch all incoming BUM tunnels from local NVEs to its tunnel
on the WAN side, and stitch all incoming BUM tunnels from remote GWs
in the DCI into its tunnel on the DC side. This way, BUM traffic will
be switched based on the label/VNI/VSID or the multicast VXLAN tunnel
destination, bypassing MAC lookup.
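
The stitching rule above can be illustrated with a minimal,
non-normative sketch. It assumes a toy model in which each GW keeps a
table from incoming BUM tunnel to outgoing tunnel and consults it
only when it is the DF on the I-ESI for its local DC. The class name,
method names, and tunnel identifiers are hypothetical and chosen only
for illustration.

```python
class Gateway:
    """Toy model of a DCI gateway that stitches BUM tunnels."""

    def __init__(self, name, is_df):
        self.name = name
        self.is_df = is_df               # DF on the I-ESI for the local DC
        self.wan_tunnel = f"wan:{name}"  # this GW's own tunnel on the WAN side
        self.dc_tunnel = f"dc:{name}"    # this GW's own tunnel on the DC side
        self._stitch = {}                # incoming tunnel -> outgoing tunnel

    def add_local_nve_tunnel(self, tunnel):
        """BUM tunnels from local NVEs are stitched to the WAN-side tunnel."""
        self._stitch[tunnel] = self.wan_tunnel

    def add_remote_gw_tunnel(self, tunnel):
        """BUM tunnels from remote GWs are stitched to the DC-side tunnel."""
        self._stitch[tunnel] = self.dc_tunnel

    def switch_bum(self, incoming_tunnel):
        """Return the outgoing tunnel, or None if this GW must not forward."""
        if not self.is_df:
            return None
        return self._stitch.get(incoming_tunnel)


gw1 = Gateway("GW1", is_df=True)
gw2 = Gateway("GW2", is_df=False)
for gw in (gw1, gw2):
    gw.add_local_nve_tunnel("dc:NVE1")

print(gw1.switch_bum("dc:NVE1"))  # wan:GW1
print(gw2.switch_bum("dc:NVE1"))  # None
```

The lookup key is the tunnel on which the packet arrived, so the
sketch never inspects a MAC address.
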
Note that this works only if Ingress Replication is not used for BUM
traffic in an EVPN Overlay DC, because with Ingress Replication the
only way to distinguish BUM traffic from known unicast traffic is to
check the MAC address of the packets.

Because the I-PMSI routes/tunnels are terminated in each DC/DCI, the
I-PMSI routes originated by GWs are somewhat similar to the
per-region I-PMSI routes discussed in the previous section. However,
the per-region I-PMSI routes from RBRs in the same DC have the same
route key, and NVEs will only receive traffic from one of the RBRs
based on best route selection, while the per-GW I-PMSI routes are
distinct, and all NVEs receive traffic from the same GW because only
the DF on the I-ESI can forward traffic.

10. Security Considerations

This document does not seem to introduce new security risks, though
this may be revised after further review and scrutiny.

11. Acknowledgements

The authors thank Eric Rosen, John Drake, and Ron Bonica for their
comments and suggestions.

12. References

12.1. Normative References

[I-D.ietf-bess-ir]
           Rosen, E., Subramanian, K., and J. Zhang, "Ingress
           Replication Tunnels in Multicast VPN",
           draft-ietf-bess-ir-00 (work in progress), January 2015.

[RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
           Requirement Levels", BCP 14, RFC 2119,
           DOI 10.17487/RFC2119, March 1997.

[RFC7117]  Aggarwal, R., Ed., Kamite, Y., Fang, L., Rekhter, Y., and
           C. Kodeboniya, "Multicast in Virtual Private LAN Service
           (VPLS)", RFC 7117, DOI 10.17487/RFC7117, February 2014.

[RFC7432]  Sajassi, A., Ed., Aggarwal, R., Bitar, N., Isaac, A.,
           Uttaro, J., Drake, J., and W. Henderickx, "BGP MPLS-Based
           Ethernet VPN", RFC 7432, DOI 10.17487/RFC7432,
           February 2015.

[RFC7524]  Rekhter, Y., Rosen, E., Aggarwal, R., Morin, T.,
           Grosclaude, I., Leymann, N., and S. Saad, "Inter-Area
           Point-to-Multipoint (P2MP) Segmented Label Switched Paths
           (LSPs)", RFC 7524, DOI 10.17487/RFC7524, May 2015.

12.2. Informative References

[I-D.ietf-bess-dci-evpn-overlay]
           Rabadan, J., Sathappan, S., Henderickx, W., Palislamovic,
           S., Balus, F., Sajassi, A., and D. Cai, "Interconnect
           Solution for EVPN Overlay networks",
           draft-ietf-bess-dci-evpn-overlay-00 (work in progress),
           January 2015.

[I-D.ietf-bess-evpn-overlay]
           Sajassi, A., Drake, J., Bitar, N., Isaac, A., Uttaro, J.,
           and W. Henderickx, "A Network Virtualization Overlay
           Solution using EVPN", draft-ietf-bess-evpn-overlay-01
           (work in progress), February 2015.

[I-D.rabadan-bess-evpn-optimized-ir]
           Rabadan, J., Sathappan, S., Henderickx, W., Sajassi, A.,
           and A. Isaac, "Optimized Ingress Replication solution for
           EVPN", draft-rabadan-bess-evpn-optimized-ir-00 (work in
           progress), October 2014.

[I-D.wijnands-bier-architecture]
           Wijnands, I., Rosen, E., Dolganow, A., Przygienda, T., and
           S. Aldrin, "Multicast using Bit Index Explicit
           Replication", draft-wijnands-bier-architecture-05 (work in
           progress), March 2015.

[RFC6513]  Rosen, E., Ed. and R. Aggarwal, Ed., "Multicast in
           MPLS/BGP IP VPNs", RFC 6513, DOI 10.17487/RFC6513,
           February 2012.

[RFC6514]  Aggarwal, R., Rosen, E., Morin, T., and Y. Rekhter, "BGP
           Encodings and Procedures for Multicast in MPLS/BGP IP
           VPNs", RFC 6514, DOI 10.17487/RFC6514, February 2012.

Authors' Addresses

Zhaohui Zhang
Juniper Networks, Inc.

EMail: zzhang@juniper.net

Wen Lin
Juniper Networks, Inc.

EMail: wlin@juniper.net

Jorge Rabadan
Alcatel-Lucent

EMail: jorge.rabadan@alcatel-lucent.com

Keyur Patel
Cisco Systems

EMail: keyupate@cisco.com