L2VPN Working Group                                        Radia Perlman
                                                               Intel Labs
Internet-draft                                          Bhargav Bhikkaji
Intended Status: Proposed Standard          Balaji Venkat Venkataswami
Expires: August 2013                            Ramasubramani Mahadevan
                                                     Shivakumar Sundaram
                                                  Narayana Perumal Swamy
                                                                    DELL
                                                       February 19, 2013

 Connecting Disparate TRILL-based Data Center/PBB/Campus sites using BGP
              draft-balaji-l2vpn-trill-over-ip-multi-level-03

Abstract

There is a need to connect (a) TRILL based data centers, (b) TRILL
based networks which provide Provider Backbone like functionality, or
(c) Campus TRILL based networks over the WAN using one or more ISPs
that provide regular IP+GRE or IP+MPLS transport.
Some of the proposed solutions, such as draft-ietf-l2vpn-trill-evpn,
do not describe in detail scalable methods for interconnecting
multiple TRILL sites while taking care of issues such as nickname
collisions for unicast and multicast. With extensions to BGP and a
scalable method on Provider Edge devices, the problem statement
defined below can be handled. In particular, dividing the nickname
into a site-ID and an RBridge-ID limits both the number of sites that
can be interconnected and the number of RBridges that can be
provisioned within each site. The proposal in this draft overcomes
these issues by not limiting the number of sites or the number of
RBridges within a TRILL site interconnect; the only limit is the
16-bit nickname space itself. MAC moves across and within TRILL sites
can also be realized. This document envisions the use of BGP-MAC-VPN
VRFs at the PE devices of the ISP cloud. We deal in depth with the
control plane and data plane particulars for unicast and multicast in
this scheme. Provider Backbone like functionality is also covered.

Status of this Memo

This Internet-Draft is submitted to IETF in full conformance with the
provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that other
groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/1id-abstracts.html

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html

Copyright and License Notice

Copyright (c) 2013 IETF Trust and the persons identified as the
document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.

Table of Contents

1 Introduction
  1.1 Acknowledgements
  1.2 Terminology
  1.3 Problem Statement
    1.3.1 TRILL Data Centers requiring connectivity over WAN
    1.3.2 Provider Backbone remote TRILL cloud requirements
    1.3.3 Campus TRILL network requirements
2. Architecture where the solution applies
  2.1 Proposed Solution
    2.1.1 Control Plane
      2.1.1.1 Nickname Collision Solution
      2.1.1.2 N-PE BGP-MAC-VPN-VRFs for Data Center and Campus
              networks
      2.1.1.3 Control Plane overview
    2.1.2 Corresponding Data plane for the above control plane
          example
      2.1.2.1 First phase of deployment for Campus and Data Center
              sites
      2.1.2.2 Other Data plane particulars
    2.1.3 Encapsulations
      2.1.3.1 IP + GRE
      2.1.3.2 IP + MPLS
  2.2 Novelty
  2.3 Uniqueness and advantages
    2.3.1 Multi-level IS-IS
    2.3.2 Benefits of the VPN mechanism
    2.3.3 Benefits of using Multi-level
  2.4 Comparison with OTV and VPN4DC and other schemes
  2.5 Multi-pathing
  2.6 TRILL extensions for BGP
    2.6.1 Format of the MAC-VPN NLRI
    2.6.2 BGP MAC-VPN MAC Address Advertisement
      2.6.2.1 Next hop field in MP_REACH_NLRI
      2.6.2.2 Route Reflectors for scaling
    2.6.3 Multicast Operations in Interconnecting TRILL sites
    2.6.4 Comparison with DRAFT-EVPN
      2.6.4.1 No nickname integration issues in our scheme
      2.6.4.2 Hierarchical Nicknames and their disadvantages in the
              DRAFT-EVPN scheme
      2.6.4.3 Load-Balancing issues with respect to DRAFT-EVPN
      2.6.4.4 Inter-operating with DRAFT-EVPN
    2.6.5 Table sizes in hardware
    2.6.6 The N-PE and its implementation
3 Security Considerations
4 IANA Considerations
5 References
  5.1 Normative References
  5.2 Informative References
Authors' Addresses
A.1 Appendix I

1 Introduction

There is a need to connect (a) TRILL based data centers, (b) TRILL
based networks which provide Provider Backbone like functionality, or
(c) Campus TRILL based networks over the WAN using one or more ISPs
that provide regular IP+GRE or IP+MPLS transport. Some of the
solutions proposed, as in [DRAFT-EVPN], do not describe in detail
scalable methods for interconnecting multiple TRILL sites while
taking care of issues such as nickname collisions for unicast and
multicast. With extensions to BGP and a scalable method on Provider
Edge devices, the problem statement defined below can be handled.
In particular, dividing the nickname into a site-ID and an RBridge-ID
limits both the number of sites that can be interconnected and the
number of RBridges that can be provisioned within each site. The
proposal in this draft overcomes these issues by not limiting the
number of sites or the number of RBridges within a TRILL site
interconnect; the only limit is the 16-bit nickname space itself. MAC
moves across and within TRILL sites can also be realized. This
document envisions the use of BGP-MAC-VPN VRFs at the PE devices of
the ISP cloud. We deal in depth with the control plane and data plane
particulars for unicast and multicast in this scheme. Provider
Backbone like functionality is also covered.

1.1 Acknowledgements

The authors would like to thank Janardhanan Pathangi, Anoop Ghanwani
and Ignas Bagdonas for their inputs for this proposal.

1.2 Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119].

Legend :

U-PE / ARB : User-near PE device or Access RBridge. U-PEs are edge
devices in the customer site or tier-2 site. In the Provider Backbone
use case, a U-PE holds a VRF instance for each tenant it is connected
to.

U-P / CRB : Core RBridge, a core device in the customer site that
does not directly interact with the customer's customer.

N-PE : Network Transport PE device. This is a device with RBridge
capabilities on the non-core-facing side. On the core-facing side it
is a Layer 3 device supporting IP+GRE and/or IP+MPLS. On the
non-core-facing side it has support for VRFs, one for each TRILL site
that it connects to. It runs BGP to convey the BGP-MAC-VPN VRF routes
referring to area nicknames to its peer N-PEs. On the core-facing
side it also runs a Layer 3 IGP such as OSPF or IS-IS and supports
IP+GRE and/or IP+MPLS as needed. A pseudo-interface representing the
N-PE's connection to the Pseudo Level 2 area is provided at each
N-PE, and a forwarding adjacency is maintained between the near-end
N-PE and the pseudo-interfaces of its remote participating N-PEs in
the common Pseudo Level 2 area, which is the IP+GRE or IP+MPLS core.

N-P : Network Transport core device. This is an IP and/or IP+MPLS
core device that is part of the ISP or ISPs providing the transport
network that connects the disparate TRILL networks together.

1.3 Problem Statement

1.3.1 TRILL Data Centers requiring connectivity over WAN

       ____[U-PE]____        ____________        ____[U-PE]____
      (              )      (            )      (              )
      ( TRILL Based  )      ( IP Core with)     ( TRILL Based  )
      ( Data Center  )      ( IP+GRE Encap)     ( Data Center  )
   [U-PEs] Site (A) [N-PE]   or IP+MPLS  [N-PE]  Site (B)  [U-PE]
      (              )      (Encap Tunnels)     (              )
      (              )      (between N-PEs)     (              )
      (___[U-PE]_____)      (____________)      (____[U-PE]____)

      Figure 1.0 : TRILL based Data Center sites inter-connectivity.

o Provide Layer 2 extension capabilities amongst disparate data
centers running TRILL.

o Recognize MAC moves across data centers and within data centers, so
that the disparate sites look and feel like one big Layer 2 cloud.
o Provide a solution agnostic to the technology used in the service
provider network.

o Provide a cost effective and simple solution to the above.

o Provide auto-configured tunnels instead of pre-configured ones in
the transport network.

o Provide additional facilities as part of the transport network,
e.g., TE, QoS, etc.

o Routing and forwarding state is to be maintained at the network
edges and not within the site or the core of the transport network.
This requires minimizing the state explosion required to provide this
solution.

o Connectivity for end customers is thus through a U-PE onto an
N-PE, onto a remote N-PE, and onto a remote U-PE.

1.3.2 Provider Backbone remote TRILL cloud requirements

       ____[U-PE]____        ____________        ____[U-PE]____
      (              )      (            )      (              )
      ( Provider     )      ( IP Core with)     ( Provider     )
      ( Backbone TRILL)     ( IP+GRE Encap)     ( Backbone TRILL)
   [U-PEs] Site (A) [N-PE]   or IP+MPLS  [N-PE]  Site (B)  [U-PE]
      (              )      (Encap Tunnels)     (              )
      (              )      (between N-PEs)     (              )
      (___[U-PE]_____)      (____________)      (____[U-PE]____)

      Figure 2.0 : TRILL based Provider Backbone sites
                   inter-connectivity

o Provide Layer 2 extension capabilities amongst different Provider
Backbone Layer 2 clouds that need connectivity with each other.

o Recognize MAC moves across Provider Backbone Layer 2 clouds and
within a single-site Layer 2 cloud, so that the disparate sites look
and feel like one big Layer 2 cloud.

o Provide a solution agnostic to the technology used in the service
provider network.

o Provide a cost effective and simple solution to the above.

o Provide auto-configured tunnels instead of pre-configured ones in
the transport network.

o Provide additional facilities as part of the transport network,
e.g., TE, QoS, etc.

o Routing and forwarding state is to be maintained at the network
edges and not within the site or the core of the transport network.
This requires minimizing the state explosion required to provide this
solution.

o These clouds could be part of the same provider but be far away
from each other. The customers of these clouds could demand
connectivity to their sites through these TRILL clouds, which could
offer Provider Layer 2 VLAN transport for each of their customers.
Hence, provide seamless connectivity wherever these sites are placed.

o Connectivity for end customers is thus through a U-PE onto an
N-PE, onto a remote N-PE, and onto a remote U-PE.

1.3.3 Campus TRILL network requirements

       ____[U-PE]____        ____________        ____[U-PE]____
      (              )      (            )      (              )
      ( Campus       )      ( IP Core with)     ( Campus       )
      ( TRILL Based  )      ( IP+GRE Encap)     ( TRILL Based  )
   [U-PEs] Site (A) [N-PE]   or IP+MPLS  [N-PE]  Site (B)  [U-PE]
      (              )      (Encap Tunnels)     (              )
      (              )      (between N-PEs)     (              )
      (___[U-PE]_____)      (____________)      (____[U-PE]____)

      Figure 3.0 : TRILL based Campus inter-connectivity

o Provide Layer 2 extension capabilities amongst disparate, distantly
located Campus Layer 2 clouds that need connectivity with each other.

o Recognize MAC moves across these Campus Layer 2 clouds and within a
single-site Campus cloud, so that the disparate sites look and feel
like one big Layer 2 cloud.

o Provide a solution agnostic to the technology used in the service
provider network.

o Provide a cost effective and simple solution to the above.
o Provide auto-configured tunnels instead of pre-configured ones in
the transport network.

o Provide additional facilities as part of the transport network,
e.g., TE, QoS, etc.

o Routing and forwarding state optimizations as in Sections 1.3.1 and
1.3.2.

o Connectivity for end customers is thus through a U-PE onto an
N-PE, onto a remote N-PE, and onto a remote U-PE.

2. Architecture where the solution applies

2.1 Proposed Solution

The following figure outlines the (a) Campus TRILL topology, (b)
TRILL Data Center topology, or (c) Provider Backbone network topology
for which the solution is intended.

       ____[U-PE]____          ____________          ____[U-PE]____
      (              )        (            )        (              )
      ( TRILL Based  )        ( IP Core with)       ( TRILL Based  )
      ( RBridges as  )        ( IP+GRE Encap)       ( RBridges as  )
      (  U-PEs       )        ( or IP+MPLS  )       (  U-PEs       )
   [U-PEs] RBridges [N-PE]    (Encap Tunnels) [N-PE] RBridges  [U-PE]
      (  as U-Ps     )        (between N-PEs)       (  as U-Ps     )
      (___[U-PE]_____)        (____________)        (____[U-PE]____)

      Figure 4.0 : Proposed Architecture

2.1.1 Control Plane

o Site network U-PEs still perform the learning function for source
MACs bridged through their PE-CE links. For Campus TRILL networks
(non-Provider-Backbone networks) the PE-CE links connect the regular
hosts / servers. In the case of a data center the PE-CE links connect
the servers in a rack to the U-PEs / Top of Rack switches.

o End customer MACs for a specific site are placed in BGP-MAC-VPN
VRFs in the N-PE facing that site. MAC learning on the N-PE is done
through regular ARP snooping of the source MAC address, and the
appropriate U-PE behind which that MAC sits is learnt as well.

o In Provider Backbone like situations the BGP-MAC-VPN VRFs are also
placed on the U-PEs, and the U-PEs in one specific site exchange this
information with the U-PEs of other sites.

2.1.1.1 Nickname Collision Solution

o The near-end N-PE for a site has a forwarding adjacency on its
Pseudo Level 2 area pseudo-interface, through which it obtains the
TRILL nicknames of the next-hop far-end N-PEs' Level 2
pseudo-interfaces. This forwarding adjacency is built up during the
course of BGP-MAC-VPN exchanges between the N-PEs; it is a kind of
targeted IS-IS adjacency through the IP+GRE or IP+MPLS core,
accomplished by tweaking BGP to connect the near-end N-PE with the
far-end N-PEs. Nickname election is done with the N-PE RBridge
pseudo-interfaces participating in nickname election in the Level 2
area, while their non-core-facing interfaces are Level 1 interfaces
in the site, which is considered a Level 1 area.

o The nicknames of each site need only be distinct within that site,
since the nickname election PDUs for a Level 1 area are NOT tunneled
across the transport network; each U-P, U-PE, or N-PE RBridge
interface has knowledge of the nickname election process only in its
own site / domain. If a new domain is connected as a site to an
already existing network, the election process NEED NOT be repeated
in the newly added site to keep nicknames distinct, as multi-level
IS-IS takes care of forwarding from one site / domain to another.
Only the pseudo-interface of the N-PE of the newly added site has to
take part in an election, to generate a new Pseudo Level 2 area
nickname for itself. A minimal model of these two election scopes is
sketched below.
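The following non-normative Python sketch illustrates the two
nickname scopes described above; all names and values are
hypothetical. Election PDUs never cross the core, so each Level 1
site resolves collisions only against its own nicknames, while only
the N-PE pseudo-interfaces contend for Pseudo Level 2 area nicknames.

   class Site:
       """A Level 1 TRILL site; its nickname space is purely local."""
       def __init__(self, name):
           self.name = name
           self.used = set()      # nicknames elected inside this site only

       def elect_rbridge(self, wanted):
           # Election PDUs never leave the site, so collisions are only
           # resolved against nicknames already in use locally.
           nick = wanted
           while nick in self.used:
               nick = (nick + 1) % 0xFFC0 or 1   # stay in valid nickname space
           self.used.add(nick)
           return nick

   class PseudoLevel2:
       """The Pseudo Level 2 area spanning the IP+GRE/IP+MPLS core."""
       def __init__(self):
           self.area_nicks = {}   # site name -> area nickname

       def attach_site(self, site, wanted):
           # Only the N-PE pseudo-interface of the newly attached site runs
           # an election here; the site's internal nicknames are untouched.
           nick = wanted
           while nick in self.area_nicks.values():
               nick += 1
           self.area_nicks[site.name] = nick
           return nick

   core = PseudoLevel2()
   a, b = Site("A"), Site("B")
   print(a.elect_rbridge(27), b.elect_rbridge(27))   # 27 in both: no clash
   print(core.attach_site(a, 15961), core.attach_site(b, 15961))  # distinct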
2.1.1.2 N-PE BGP-MAC-VPN-VRFs for Data Center and Campus networks

o The customer MACs are placed as routes in the MAC-VPN VRF on the
N-PE interface facing that site, with next hops being the nicknames
of the U-PEs to which these customer MAC addresses are connected, for
that specific site alone. For MAC routes within the Level 1 area the
nicknames are those of the local U-PEs, while the MAC routes from
other sites are NOT learnt at all. When source learning happens, the
BGP-MAC-VPN NLRI are NOT communicated to the participating U-PEs in
the other sites of the said customer, except for the exchange of the
nickname of each site, which is considered an area. Refer to section
A.1.1 in Appendix A.1 for more details on how forwarding takes place
between the sites through the multi-level IS-IS mechanism
orchestrated over the IP core network.

   Format of the BGP-MAC-VPN VRF on an N-PE
   +---------------------+------------------------+
   | MAC address         | U-PE Nickname          |
   +---------------------+------------------------+
   | 00:be:ab:ce:fa:9f   | <16-bit U-PE Nickname> |
   | (local)             |                        |
   +---------------------+------------------------+
   ....

o A VRF is allocated for each customer, who in turn may have multiple
VLANs in their end customer sites. So in theory a total of 4K VLANs
can be supported per customer.

o IS-IS for Layer 2 is run atop the RBridges in the site / tier-2
network.

o IS-IS for Layer 2 disseminates the MACs reachable via the TRILL
next-hop nicknames of the site / tier-2 network RBridges amongst the
RBridges in the network site.

o N-PEs have VRFs for each tier-2 access network that gains
connectivity through the IP+GRE or IP+MPLS core.

2.1.1.2.1 U-PE BGP-MAC-VPN VRFs for Provider Backbone Bridging

o The customer MACs are placed as routes in the MAC-VPN VRFs with
next hops being the area-number nicknames of the U-PEs to which these
customer MAC addresses are connected. For MAC routes within the Level
1 area the nicknames are those of the local U-PE itself, while the
MAC routes learnt from other sites carry the area number of the site
to which the remote U-PE belongs. When source learning happens, the
BGP-MAC-VPN NLRI are communicated to the participating U-PEs in all
the sites of the said customer. Refer to section A.1.1 in Appendix
A.1 for more details on how forwarding takes place between the sites
through the multi-level IS-IS mechanism orchestrated over the IP core
network.

o The N-PE requirements for the tier-1 network are the same as in
section 2.1.1.2.

   Format of the BGP-MAC-VPN VRF on a U-PE / ARB
   +---------------------+------------------------+
   | MAC address         | U-PE Nickname          |
   +---------------------+------------------------+
   | 00:be:ab:ce:fa:9f   | <16-bit U-PE Nickname> |
   | (local)             |                        |
   +---------------------+------------------------+
   | 00:ce:cb:fe:fc:0f   | <16-bit U-PE Area Num> |
   | (Non-local)         |                        |
   +---------------------+------------------------+
   ....

o A VRF is allocated for each customer, who in turn may have multiple
VLANs in their end customer sites. So in theory a total of 4K VLANs
can be supported per customer. The P-VLAN, or provider VLAN, in the
Provider Backbone category can also span 4K VLANs. In effect, up to
4K customers could be supported in this scheme if P-VLAN
encapsulation is used to differentiate between multiple customers. A
minimal sketch of these two VRF layouts follows.
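The following non-normative Python sketch mirrors the two VRF tables
shown above; all MAC addresses and nickname values are hypothetical.
Next hops are TRILL nicknames: a local U-PE nickname for MACs learnt
in this site, or a 16-bit area nickname for MACs learnt from remote
sites (the latter only in the PBB / U-PE case).

   # N-PE VRF for Customer A: only local MACs, next hop = U-PE nickname.
   NPE_VRF_CUSTOMER_A = {
       # MAC address          -> (next-hop nickname, scope)
       "00:be:ab:ce:fa:9f": (0x001B, "local-U-PE"),   # learnt by ARP snooping
   }

   # U-PE VRF for Customer A (PBB case): remote MACs carry area nicknames
   # learnt through the BGP-MAC-VPN exchange between U-PEs.
   UPE_VRF_CUSTOMER_A = {
       "00:be:ab:ce:fa:9f": (0x001B, "local-U-PE"),
       "00:ce:cb:fe:fc:0f": (0x3E59, "remote-area"),
   }

   def lookup(vrf, mac):
       nick, scope = vrf[mac]
       return f"{mac} -> nickname {nick:#06x} ({scope})"

   print(lookup(UPE_VRF_CUSTOMER_A, "00:ce:cb:fe:fc:0f"))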
o IS-IS for Layer 2 is run atop the RBridges in the site / tier-2
network.

o IS-IS for Layer 2 disseminates the MACs reachable via the TRILL
next-hop nicknames of the site / tier-2 network RBridges amongst the
RBridges in the network site.

o N-PEs have VRFs for each tier-2 access network that gains
connectivity through the IP+GRE or IP+MPLS core.

       ____[U-PE]____          ____________          ____[U-PE]____
      (              )        (            )        (              )
      ( TRILL Based  )        ( IP Core with)       ( TRILL Based  )
      ( RBridges as  )        ( IP+GRE Encap)       ( RBridges as  )
      (  U-PEs       )        ( or IP+MPLS  )       (  U-PEs       )
   [U-PEB] RBridges [N-PE]    (Encap Tunnels) [N-PE] RBridges [U-PEA]
     .(  as U-Ps  /  ).       (between N-PEs).      ( \ as U-Ps   ) .
     . (    (X)      ) .      (____________)  .     (    (Y)      ) .
     . (___[U-PE]____)  .                      .    (____[U-PE]___) .
     .                   .                      .      Other remote
    Other remote U-PEs    ... (BGP-MAC-VPN) ...        U-PEs known
    known through TRILL       MP-iBGP session          through TRILL
                  installing site MAC routes
           with NextHop as suitable RBridge Nicknames

   Legend :
   (X) - Customer A Site 1 MAC-VPN-VRF
   (Y) - Customer A Site 2 MAC-VPN-VRF

   U-PEs are edge devices a.k.a Access RBridges (ARBs).
   U-Ps a.k.a Core RBridges (CRBs) are core devices that interconnect
   U-PEs.

   Figure 5.0 : BGP-MAC-VPN VRFs amongst N-PEs (and U-PEs in PBB)

o N-PEs in the Campus and Data Center interconnect cases exchange
only the area nicknames. The MAC routes of a specific site are
contained within the N-PE for that site.

o N-PEs exchange BGP information through route targets for the
various customer sites with other N-PEs. This involves only the
exchange of the area-number nicknames of the sites being
interconnected.

o For Provider Backbone type networks the MAC routes for the various
customer sites are placed in the BGP-MAC-VPN VRF of each U-PE for
each customer site it connects to. The MAC routes placed in the VRFs
of the U-PEs indicate the MAC addresses of the various RBridges of
the remote tier-2 customer sites, with the respective next hops being
the nicknames of the Level 2 pseudo-interface of the far-end N-PE
through which these MAC routes are reachable.

o U-PE and U-P RBridge MACs and TRILL nicknames are placed in the
BGP-MAC-VPN VRF on the N-PEs.

o For Provider Backbone type networks, routes to the various end
customer MACs within a tier-2 customer's sites are exchanged through
BGP MAC-VPN sessions between U-PEs. IP connectivity is provided
through IP addresses on the same subnet for the participating U-PEs.

2.1.1.3 Control Plane overview

       ____[U-PE]____          ____________          ____[U-PE]____
      (              )        (            )        (              )
      ( TRILL Based  )        ( IP Core with)       ( TRILL Based  )
      ( RBridges as  )        ( IP+GRE Encap)       ( RBridges as  )
      (  U-PEs       )        ( or IP+MPLS  )       (  U-PEs       )
   [ B1 ] RBridges [ N1 ]     (Encap Tunnels) [ N2 ] RBridges [ B2 ]
     .(  as U-Ps  /  ).       (between N-PEs).      ( \ as U-Ps   ) .
     . ( (A1)  (X)   ) .      (____________)  .     ( (Y)  (A2)   ) .
     . (___[U-PE]____)  .                      .    (____[U-PE]___) .
    (H1)                 .                      .                 (H2)
                          ... (BGP-MAC-VPN) ...
                              MP-iBGP session
                  installing site MAC routes
           with NextHop as suitable RBridge Nicknames

   Legend :
   (X) - Customer A Site 1 MAC-VPN-VRF
   (Y) - Customer A Site 2 MAC-VPN-VRF

   U-PEs are edge devices a.k.a Access RBridges (ARBs).
   U-Ps a.k.a Core RBridges (CRBs) are core devices that interconnect
   U-PEs.
   Figure 6.0 : BGP-MAC-VPN VRFs amongst N-PEs

1) B1 and B2 learn that the MACs of H1 and H2 are reachable via the
ARP mechanism. For example, H2-MAC is reachable via B2-MAC through
area nickname A2. This is accomplished through ARP learning and
inspecting the area nickname in the ARP reply.

1.1) The ARP request goes out as a multicast-destination frame from
B1 on the default multicast distribution tree, set up as a spanning
tree that includes all U-PEs across the multiple TRILL sites for that
customer across the IP core.

1.2) The ARP reply comes back as unicast.

2) N1 and N2 exchange, via BGP, that A1 and A2 are reachable through
the N1 nickname and the N2 nickname respectively.

3) N1 and N2 need NOT exchange the MACs of U-PEs B1 and B2.

4) The routes in N1 and N2 need NOT be redistributed into the IS-IS
of the other site. So we end up with the following correlated routing
state.

The correlated route in B1 is that H2 -> reachable via A2 ->
reachable via N1 nickname.

The correlated route in B2 is that H1 -> reachable via A1 ->
reachable via N2 nickname.

The correlated route in N1 is that A2 -> reachable via nickname N2.

The correlated route in N2 is that A1 -> reachable via nickname N1.

2.1.2 Corresponding Data plane for the above control plane example

       ____[U-PE]____          ____________          ____[U-PE]____
      (              )        (            )        (              )
      ( TRILL Based  )        ( IP Core with)       ( TRILL Based  )
      ( RBridges as  )        ( IP+GRE Encap)       ( RBridges as  )
      (  U-PEs       )        ( or IP+MPLS  )       (  U-PEs       )
   [ B1 ] RBridges [ N1 ]     (Encap Tunnels) [ N2 ] RBridges [ B2 ]
     .(  as U-Ps  /  ).       (between N-PEs).      ( \ as U-Ps   ) .
     . ( (A1)  (X)   ) .      (____________)  .     ( (Y)  (A2)   ) .
     . (___[U-PE]____)  .                      .    (____[U-PE]___) .
    (H1)                 .                      .                 (H2)
                          ... (BGP-MAC-VPN) ...
                              MP-iBGP session
                  installing site MAC routes
           with NextHop as suitable RBridge Nicknames

   Legend :
   (X) - Customer A Site 1 MAC-VPN-VRF
   (Y) - Customer A Site 2 MAC-VPN-VRF

   U-PEs are edge devices a.k.a Access RBridges (ARBs).
   U-Ps a.k.a Core RBridges (CRBs) are core devices that interconnect
   U-PEs.

   Figure 7.0 : BGP-MAC-VPN VRFs amongst N-PEs

2.1.2.1 First phase of deployment for Campus and Data Center sites

For the first phase of deployment it is recommended that MP-BGP
sessions be constructed between N-PEs alone in the case of Data
Center and Campus sites. This suffices because PBB tunnels are not
involved: the exchanges remain between the N-PEs about the concerned
sites, and only with respect to the area nicknames of the other areas
(sites in the interconnect). Additional BGP peering sessions between
U-PEs are not needed, since connectivity is the key.

2.1.2.1.2 Control Plane in detail for Data Centers and Campus

1) N1 and N2 exchange, via BGP, that A1 and A2 are reachable through
the N1 nickname and the N2 nickname respectively.

2) N1 knows that B1 is within its site and N2 knows that B2 is within
its site. N1 and N2 know that H1 and H2 are attached to B1 and B2
respectively.

3) The corresponding ESADI protocol routes for end stations will also
be exchanged between N-PEs using BGP, for MAC moves.

The correlated route in B1 is that H2 -> reachable via A2 ->
reachable via N1 nickname.

The correlated route in B2 is that H1 -> reachable via A1 ->
reachable via N2 nickname.

The correlated route in N1 is that A2 -> reachable via nickname N2.

The correlated route in N2 is that A1 -> reachable via nickname N1.

A minimal sketch of this recursive resolution is shown below.
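The following non-normative Python sketch shows the recursive
resolution implied by the correlated routing state above; the table
contents are hypothetical. B1 resolves a host MAC to an area
nickname, then resolves the area nickname to the N-PE through which
that area is reachable.

   # Correlated state at B1 (hypothetical): MAC table learnt from the ARP
   # reply's area field, area table learnt from the BGP exchange in step 2.
   B1_MAC_TABLE  = {"H2-MAC": "A2"}
   B1_AREA_TABLE = {"A2": "N1"}

   def b1_egress_for(dst_mac):
       area = B1_MAC_TABLE[dst_mac]      # H2 -> reachable via A2
       npe  = B1_AREA_TABLE[area]        # A2 -> reachable via N1's nickname
       # B1 TRILL-encapsulates with egress nickname = A2; Level 1 IS-IS then
       # routes the frame toward N1, which has advertised reachability to A2.
       return area, npe

   print(b1_egress_for("H2-MAC"))        # ('A2', 'N1')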
2.1.2.1.3 Data Plane in detail for Data Centers and Campus

1) H1 sends a packet to B1 with source MAC H1-MAC, destination MAC
H2-MAC, and C-VLAN C1. This frame is named F1.

2) B1, being an RBridge, encapsulates F1 with a TRILL header whose
ingress RBridge is B1 and egress RBridge is A2. Call the resulting
frame F2.

3) F2 reaches N1, which preserves the TRILL header and sends the
frame inside an IP+GRE header with the GRE key set to Customer A's
VRF id.

4) The packet reaches N2, which looks up the GRE key to identify
which customer / VRF is to be consulted.

5) In that VRF table N2 looks up H2-MAC and encapsulates F1 with a
TRILL header whose ingress RBridge is A1 and egress RBridge is B2.

6) Finally the packet reaches B2, which decapsulates it and sends F1
to the host. A sketch of the egress N-PE behavior in steps 4 and 5
follows.
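The following non-normative Python sketch models steps 4 and 5 at the
egress N-PE; the GRE key, nicknames, and table contents are
hypothetical. The GRE key selects the customer VRF, and a MAC lookup
in that VRF yields the U-PE nickname written into the TRILL egress
field.

   # Hypothetical mapping from GRE key (the customer's VRF id) to the VRF.
   VRF_BY_GRE_KEY = {
       0xA001: {
           "H2-MAC": 0x0B2,              # H2 sits behind U-PE B2
       },
   }

   def n2_receive(gre_key, trill_ingress_area, inner_dst_mac):
       vrf = VRF_BY_GRE_KEY[gre_key]                  # step 4: key -> VRF
       egress_nick = vrf[inner_dst_mac]               # step 5: MAC -> U-PE
       # Re-encapsulate: ingress stays the source area nickname (A1);
       # egress becomes the local U-PE (B2). The frame then rides normal
       # Level 1 TRILL forwarding inside the destination site.
       return {"ingress": trill_ingress_area, "egress": egress_nick}

   print(n2_receive(0xA001, 0x0A1, "H2-MAC"))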
2.1.2.2 Other Data plane particulars

A default Dtree spanning all sites is set up for the P-VLAN of the
customer's customer CCA supported on all tier-2 sites. It is denoted
by ===, //.

       _____________          ____________         _____________
      (             )        (            )       (             )
      ( TRILL Based )        ( IP Core with)      ( TRILL Based )
      ( Customer A  )        ( IP+GRE Encap)      ( Customer A  )
      (   Site 1    )        ( or IP+MPLS  )      (   Site 2    )
   [U-PEA]============[N-PE]=============[N-PE]==============[U-PEB]
      (      /      )        (Encap Tunnels)      (  \     //  )
      (    (X)      )        (between N-PEs)      (  (Y)  //   )
      (___[U-PE]____)        (____________)       (____[U-PEC]__)

   Legend :
   (X) - Customer A Site 1 MAC-VPN-VRF
   (Y) - Customer A Site 2 MAC-VPN-VRF

   Figure 8.0 : Dtree spanning all U-PEs for unknown floods.

Forwarding for unknown frames uses the default Dtree spanning all
customer sites, their respective U-PEs, and onward to their
customers.

       _____________          ____________         _____________
      (             )        (            )       (             )
      ( TRILL Based )        ( IP Core with)      ( TRILL Based )
      ( Customer A  )        ( IP+GRE Encap)      ( Customer A  )
      (   Site 1    )        ( or IP+MPLS  )      (   Site 2    )
   [U-PEA]============[N-PE]=============[N-PE]==============[U-PEB]
      (      /      )        (Encap Tunnels)      (  \     //  )
      (    (X)      )        (between N-PEs)      (  (Y)  //   )
      (_____________)        (____________)       (____[U-PEC]__)

   Legend :
   (X) - Customer A Site 1 MAC-VPN-VRF
   (Y) - Customer A Site 2 MAC-VPN-VRF

   Figure 9.0 : Unknown floods through the Dtree spanning that P-VLAN

(1) The spanning tree (which could be a Dtree for that VLAN) carries
the packet through the site network switches all the way to the N-PEs
bordering that network site. U-PEs can drop the packet if no ports
exist for that customer VLAN on that U-PE. The spanning tree includes
auto-configured IP+GRE tunnels or MPLS LSPs across the IP+GRE and/or
IP+MPLS cloud, which are constituent parts of that tree; hence the
unknown flood is carried over to the remote N-PEs participating in
the said Dtree. The packet then heads to the remote-end (leaf) U-PEs
and on to the end customer sites. For purposes of connecting multiple
N-PE devices for a Dtree used for unknown floods, a mechanism such as
a PIM-Bidir overlay using the MVPN mechanism in the core of the IP
network can be used. This PIM-Bidir tree would stitch together all
the N-PEs of a specific customer.

(2) The BGP-MAC-VPN VRF exchanges between N-PEs DO NOT carry the
routes for the MACs of the near-end RBridges in the near-end site
network to the remote-end site network. The MPLS inner label or the
GRE key indicates which VRF to consult for an incoming encapsulated
packet at the ingress N-PE and at the outgoing N-PE in the IP core.

(3) Flooding occurs when the destination MAC is unknown. The flood
reaches all U-PEs and is forwarded to the customer devices (the
customer's customer devices).

       ___[U-PE]____          ____________         ____[U-PE]____
      (      .      )        (            )       (      .      )
      ( TRILL Based )        ( IP Core with)      ( TRILL Based )
      ( Customer A  )        ( IP+GRE Encap)      ( Customer A  )
      (   Site 1    )        ( or IP+MPLS  )      (   Site 2    )
      ( ............)        (.............)      (..............)
   [U-PEA]============[N-PE]=============[N-PE]==============[U-PEB]
      (  .   /      )        (Encap Tunnels)      (  \     //. )
      (  .  (X)     )        (between N-PEs)      (  (Y)  //.  )
      (___[U-PE]____)        (____________)       (____[U-PEC]__)

   Legend :
   (X) - Customer A Site 1 MAC-VPN-VRF
   (Y) - Customer A Site 2 MAC-VPN-VRF

   Figure 10.0 : Forwarding when DstMAC is unknown.

(4) When the destination MAC is known, the payload is carried in the
IP core as <outer IP header, GRE key><TRILL header><payload>. In PBB
like environments / interconnected sites, the payload consists of
P-VLAN headers encapsulating the actual payload. In Campus and Data
Center environments only the inner payload is carried; no P-VLAN
header is required.

       ___[U-PE]____          ____________         ____[U-PE]____
      (             )        (            )       (             )
      ( TRILL Based )        ( IP Core with)      ( TRILL Based )
      ( Customer A  )        ( IP+GRE Encap)      ( Customer A  )
      (   Site 1    )        ( or IP+MPLS  )      (   Site 2    )
      ( ............)        (.............)      (..............)
   [U-PEA]============[N-PE]=============[N-PE]==============[U-PEB]
      (      /      )        (Encap Tunnels)      (  \     //  )
      (    (X)      )        (between N-PEs)      (  (Y)  //   )
      (___[U-PE]____)        (____________)       (____[U-PEC]__)

   Legend :
   (X) - Customer A Site 1 MAC-VPN-VRF
   (Y) - Customer A Site 2 MAC-VPN-VRF

   Figure 11.0 : Forwarding when the DstMAC is known.

(5) The reverse path does the same for reachability of the near end
from the far end.

(6) Connectivity is thus established between end customer sites
through the site networks and through the IP+GRE and/or IP+MPLS core.

(7) End customer packets are carried through the access network site
to the near-end N-PE. The N-PE encapsulates them in auto-configured
IP+GRE tunnels or MPLS LSPs to the far-end N-PEs through the IP+GRE
and/or IP+MPLS core. The encapsulation is stripped at the far-end
N-PE and the inner frame continues to the far-end U-PE and onto the
customer.

2.1.3 Encapsulations

2.1.3.1 IP + GRE

(<Outer IP header>, <GRE header carrying the key that identifies the
customer VRF>, <TRILL header>, <Inner Ethernet frame, P-VLAN tagged
in PBB environments>, <payload>)

In non-PBB environments such as Campus and Data Center, the Ethernet
header with the P-VLAN header is not required.

2.1.3.2 IP + MPLS

(<Outer transport header with MPLS tunnel label>, <MPLS inner label
identifying the customer VRF>, <TRILL header>, <Inner Ethernet
frame>, <payload>)

A minimal sketch of packing the IP+GRE variant is given below.
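The following non-normative Python sketch packs the GRE and TRILL
portions of the IP+GRE stack from Section 2.1.3.1 using the struct
module. The outer IP header is omitted for brevity, and the key and
nickname values are hypothetical; the GRE protocol value 0x22F3 is
the TRILL Ethertype.

   import struct

   def gre_header(vrf_key, proto=0x22F3):
       # GRE flags word with the K bit (0x2000) set: a 4-byte key follows.
       # The key carries the customer's VRF id, as described above.
       return struct.pack("!HHI", 0x2000, proto, vrf_key)

   def trill_header(ingress_nick, egress_nick, hop_count=0x3F):
       # Base TRILL header per RFC 6325: a 16-bit word holding version,
       # flags, and the 6-bit hop count, then the egress and ingress
       # nicknames (version 0, no options assumed here).
       first = hop_count & 0x3F
       return struct.pack("!HHH", first, egress_nick, ingress_nick)

   payload = b"inner ethernet frame"     # P-VLAN tagged only in the PBB case
   packet = gre_header(0xA001) + trill_header(0x0A1, 0x0B2) + payload
   print(packet.hex())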
2.2 Novelty

o The MAC routes of a site are restricted to the BGP-MAC-VPN VRFs of
the N-PE facing that site.

o No nickname re-election needs to be done when attaching a new site.

o Thus the BGP-MAC-VPNs on N-PEs in the transport network contain MAC
routes with TRILL area nicknames as next hops.

o The customer edge RBridges / provider bridges also contain MAC
routes with TRILL nicknames as the associated next hops. This
proposal is an extension of the BGP-MAC-VPN I-D [3] to include MAC
routes with TRILL area nicknames as next hops.

2.3 Uniqueness and advantages

o Uses existing protocols such as IS-IS for Layer 2 and BGP to
achieve this. No changes to IS-IS are needed except for
redistribution into BGP at the transport core edge and vice versa.

o Multi-tenancy through the IP+GRE or IP+MPLS core is possible when
N-PEs at the edge of the L3 core place the various customer sites in
VRFs using the VPN mechanism. This is otherwise not possible in
traditional networks or with the other mechanisms suggested in recent
drafts.

o The VPN mechanism also provides the ability to use overlapping MAC
address spaces within distinct customer sites interconnected using
this proposal.

o Multi-tenancy within each data center site is possible by using
VLAN separation within the VRF.

o MAC moves can be detected when source learning / gratuitous ARP,
combined with the BGP-MAC-VPN update involving ESADI, triggers a
change in the concerned VRF tables.

o Uses regular BGP supporting MAC-VPN features between transport core
edge devices.

o When new TRILL sites are added, no re-election in the Level 1 area
is needed. Only the pseudo-interface of the N-PE has to be added to
the mix, with the transport of the election PDUs being done across
the transport network core.

2.3.1 Multi-level IS-IS

Akin to the TRILL IS-IS multi-level draft, each N-PE can be
considered an ABR, having one nickname in a customer site (which is a
Level 1 area) and a pseudo-interface facing the core of the transport
network (which belongs to the Level 2 area). The pseudo-interface
decapsulates the TRILL header of an incoming packet from the Level 1
area but does NOT throw the TRILL header away; it rewrites it with
area numbers within the Pseudo Level 2 area and transports the packet
across the Layer 3 core (IP+GRE and/or IP+MPLS) after encapsulating
it in IP+GRE or IP+MPLS. Thus the N-PE core-facing pseudo-interface
in the Level 2 pseudo-area performs the TRILL encapsulation and
decapsulation for outgoing and incoming packets, respectively, to and
from the transport core. Incoming packets from the Level 1 area are
encapsulated in IP+GRE or IP+MPLS by the sending N-PE's
pseudo-interface, and packets arriving from the transport core are
stripped of their IP+GRE or IP+MPLS headers by the pseudo-interface
on the receiving N-PE.

2.3.2 Benefits of the VPN mechanism

Using the VPN mechanism, MAC routes are placed in distinct VRFs in
the N-PEs, providing separation between customers. Assume customer A
and customer B each have several sites that need to be
interconnected. By isolating the routes within specific VRFs,
multi-tenancy across the L3 core can be achieved: customer A's sites
talk only to customer A's sites, and the same applies to customer B.

The same mechanism also allows overlapping MAC addresses amongst the
various customers: customer A could use the same MAC addresses as
customer B. This is otherwise not possible with other recently
proposed mechanisms.

2.3.3 Benefits of using Multi-level

The benefit of using multi-level is the ability to choose appropriate
multicast trees in other sites through the inter-area multicast
method proposed by Radia Perlman et al.
2.4 Comparison with OTV and VPN4DC and other schemes

o OTV requires a few proprietary changes to IS-IS. This scheme
requires fewer such changes to IS-IS than OTV.

o VPN4DC is a problem statement and is not yet as comprehensive as
the scheme proposed in this document.

o [4] deals with pseudo-wires being set up across the transport core,
and the control plane protocols for TRILL appear to be tunneled
through the transport core. The scheme we propose requires nothing
more than Pseudo Level 2 area number exchanges and those for the
pseudo-interfaces; BGP takes care of the rest of the routing. Also,
[4] does not address nickname collision detection, since control
plane TRILL is also tunneled; as a result, when a new site is brought
into the interconnection amongst existing TRILL sites, nickname
re-election may be required.

o [5] does not have a case for TRILL. It was intended for other types
of networks, and it does not propose TRILL nicknames as next hops for
MAC addresses.

2.5 Multi-pathing

By using different RDs to export the BGP-MAC routes with their
appropriate nickname next hops from more than one N-PE, we can
achieve multi-pathing over the transport IP+GRE and/or IP+MPLS core.

2.6 TRILL extensions for BGP

2.6.1 Format of the MAC-VPN NLRI

   +-----------------------------------+
   |    Route Type (1 octet)           |
   +-----------------------------------+
   |     Length (1 octet)              |
   +-----------------------------------+
   | Route Type specific (variable)    |
   +-----------------------------------+

The Route Type field defines the encoding of the rest of the MAC-VPN
NLRI (the Route Type specific MAC-VPN NLRI).

The Length field indicates the length in octets of the Route Type
specific field of the MAC-VPN NLRI.

This document defines the following Route Types:

   + 1 - Ethernet Tag Auto-Discovery (A-D) route
   + 2 - MAC advertisement route
   + 3 - Inclusive Multicast Ethernet Tag Route
   + 4 - Ethernet Segment Route
   + 5 - Selective Multicast Auto-Discovery (A-D) Route
   + 6 - Leaf Auto-Discovery (A-D) Route
   + 7 - MAC Advertisement Route with Nexthop as TRILL Nickname

Route Type 7 is the one used in this proposal.

2.6.2 BGP MAC-VPN MAC Address Advertisement

BGP is extended to advertise these MAC addresses using the MAC
advertisement route type in the MAC-VPN NLRI.

A MAC advertisement route type specific MAC-VPN NLRI consists of the
following:

   +---------------------------------------+
   |      RD (8 octets)                    |
   +---------------------------------------+
   |  MAC Address (6 octets)               |
   +---------------------------------------+
   |GRE key / MPLS Label rep. VRF(3 octets)|
   +---------------------------------------+
   |   Originating Rbridge's IP Address    |
   +---------------------------------------+
   |   Originating Rbridge's MAC address   |
   | (8 octets) (N-PE non-core interface)  |
   +---------------------------------------+
   |   TRILL Area Nickname                 |
   +---------------------------------------+

The RD MUST be the RD of the MAC-VPN instance that is advertising the
NLRI. The procedures for setting the RD for a given MAC VPN are
described in section 8 of [3].

The encoding of a MAC address is the 6-octet MAC address specified by
the IEEE 802 documents [802.1D-ORIG] [802.1D-REV].

If the IP+GRE and/or IP+MPLS core network is used, the GRE key or
MPLS label MUST be the downstream-assigned MAC-VPN GRE key or MPLS
label that is used by the N-PE to forward IP+GRE or IP+MPLS
encapsulated Ethernet packets received from remote N-PEs, where the
destination MAC address in the Ethernet packet is the MAC address
advertised in the above NLRI. The forwarding procedures are specified
in previous sections of this document. An N-PE may advertise the same
MAC-VPN label for all MAC addresses in a given MAC-VPN instance, or
it may advertise a unique MAC-VPN label per MAC address. These
methodologies have their tradeoffs.

Per-MAC-VPN-instance label assignment requires the smallest number of
MAC-VPN labels, but requires a MAC lookup in addition to a GRE key or
MPLS label lookup on the egress N-PE for forwarding. On the other
hand, a unique label per MAC allows the egress N-PE to forward a
packet that it receives from another N-PE to the connected CE after
looking up only the GRE key or MPLS label, without a MAC lookup.

The Originating RBridge's IP address MUST be set to an IP address of
the PE (N-PE). This address SHOULD be common for all the MAC-VPN
instances on the PE (e.g., this address may be the PE's loopback
address). A sketch of packing this NLRI is given below.
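The following non-normative Python sketch packs the Route Type 7 NLRI
according to the layout above. Field widths follow the figure (RD 8
octets, MAC 6, key/label 3); the figure shows the originating MAC as
8 octets, read here as a 6-byte MAC padded to 8, and the 4-byte IPv4
originator address and 2-byte nickname are assumptions, since the
draft leaves those widths open. All values are hypothetical.

   import struct

   ROUTE_TYPE_MAC_ADV_TRILL_NICK = 7

   def pack_nlri(rd, mac, vrf_label, orig_ip, orig_mac, area_nick):
       body = (
           rd                                  # 8-octet Route Distinguisher
           + mac                               # advertised customer MAC
           + vrf_label.to_bytes(3, "big")      # GRE key / MPLS label for VRF
           + orig_ip                           # originating N-PE IPv4 address
           + orig_mac + b"\x00\x00"            # 6-byte MAC padded to 8 octets
           + struct.pack("!H", area_nick)      # TRILL area nickname
       )
       # Outer MAC-VPN NLRI wrapper: Route Type (1 octet) + Length (1 octet).
       return struct.pack("!BB", ROUTE_TYPE_MAC_ADV_TRILL_NICK,
                          len(body)) + body

   nlri = pack_nlri(b"\x00" * 8, bytes.fromhex("00beabcefa9f"),
                    0xA001, bytes([192, 0, 2, 1]),
                    bytes.fromhex("02aabbccddee"), 15918)
   print(len(nlri), nlri.hex())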
2.6.2.1 Next hop field in MP_REACH_NLRI

The Next Hop field of the MP_REACH_NLRI attribute of the route MUST
be set to the nickname of the N-PE.

The BGP advertisement that advertises the MAC advertisement route
MUST also carry one or more Route Target (RT) attributes.

2.6.2.2 Route Reflectors for scaling

It is recommended that Route Reflectors (RRs) SHOULD be deployed to
mesh the U-PEs of a given customer's sites with one another, and that
the transport network likewise deploy RRs to mesh the N-PEs. This
addresses the scaling issues that would arise from a full mesh
amongst the U-PEs or the N-PEs.

2.6.3 Multicast Operations in Interconnecting TRILL sites

For the purpose of multicast, the IP core can run a Multicast-VPN
based PIM-Bidir tree (akin to Rosen or NGEN-MVPN) for each customer.
This tree connects all the N-PEs related to a customer and carries
the multicast traffic over the transport core, thus connecting the
site-to-site multicast trees. Each site connected to an N-PE has that
N-PE as a member of the MVPN PIM-Bidir tree connecting the site to
the other sites' chosen N-PEs. Thus only one N-PE from each site is
part of a given MVPN PIM-Bidir tree; if there exists more than one
N-PE per site, the other N-PE is part of a different MVPN PIM-Bidir
tree. Consider the following diagram, which represents three sites
that have connectivity to each other over a WAN. Site A has two N-PEs
connected to the WAN, while sites B and C have one each. Note that
two MVPN Bidir trees are constructed: one with site A's N-PE1 and the
N-PEs of sites B and C, and the other with site A's N-PE2 and the
respective N-PEs of sites B and C. It is thus possible to
load-balance multicast groups among the sites.
The method of interconnecting trees from the respective Level 1 areas
(that is, the sites) is akin to stitching together the Dtrees that
have the N-PEs as their stitch end-points in the Pseudo Level 2 area,
with the MVPN Bidir tree acting as the conduit for such stitching.
The tree-ids in each site are local and need not be distinct across
sites. It is only the N-PEs, which have one foot in the Level 1 area,
that are stitched together using the MVPN Bidir overlay in the Layer
3 core.

   --------------      ------------      --------------
   |            |      |          |      |            |
   |TRILL Campus|      |   WAN    |      |TRILL Campus|
   |   Site A   |      |          |      |   Site B   |
   |      N-PE1==|     |===N-PE4  |      |            |
  RB1           |      |    ||    |     RB2           |
   |      N-PE2==|     |    ||    |      |            |
   |            |      |    ||    |      |            |
   --------------      ------------      --------------
                            ||
                            ||N-PE3
                       ------------
                       |          |
                       |TRILL     |
                       | Campus   |
                       | Site C   |
                       |          |
                       -----RB3----

Here N-PE1, N-PE3 and N-PE4 form an MVPN Bidir tree amongst
themselves to link up the multilevel trees in the three sites, while
N-PE2, N-PE3 and N-PE4 form another MVPN Bidir tree amongst
themselves for the same purpose.

There thus exist two PIM-Bidir overlay trees, which can be used to
load-balance, say, group G1 on the first and G2 on the second. Say
the source of group G1 lies within site A and the first overlay tree
is chosen for multicasting the stream. When the packet hits the WAN
link on N-PE1, the packet is replicated to N-PE3 and N-PE4. A concept
like the Group Designated Border RBridge (GDBR) is applied here:
group assignments are made to specific N-PEs such that only one of
them is active for a particular group, and the others do not send
that group's traffic across the WAN on the respective MVPN PIM-Bidir
tree. Group G2 could then use the second MVPN PIM-Bidir tree for its
transport. The procedures for electing the Group Designated Border
RBridge within a site will be discussed in detail in future versions
of this draft, or may be taken to a separate document. VLAN based
load-balancing of multicast groups is also possible and feasible in
this scenario; it can also be based on VLAN plus multicast MAC-DA.
The GDBR scheme applies only to packets that N-PEs receive as TRILL
decapsulated MVPN PIM-Bidir tree frames from the Layer 3 core. If a
TRILL encapsulated multicast frame arrives at an N-PE, only the GDBR
for that group may decapsulate the TRILL header and send the frame
across the Layer 3 core. The other N-PEs can, however, forward
multi-destination frames coming from N-PEs across the core belonging
to a different site. A sketch of the GDBR check follows.
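The following non-normative Python sketch captures the GDBR check
described above; the site, group, and N-PE names are hypothetical.
Only the elected GDBR for a (site, group) pair forwards that group's
traffic between its site and the MVPN PIM-Bidir overlay, in either
direction.

   # Hypothetical GDBR election results: (site, group) -> elected N-PE.
   GDBR = {
       ("A", "G1"): "N-PE1",
       ("A", "G2"): "N-PE2",
   }

   def may_forward(npe, site, group):
       # Applied both when sending a site's TRILL multicast into the Layer 3
       # core and when injecting core traffic back into the site.
       return GDBR.get((site, group)) == npe

   print(may_forward("N-PE1", "A", "G1"))   # True: N-PE1 carries G1 over WAN
   print(may_forward("N-PE2", "A", "G1"))   # False: suppressed, avoids loops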
When the packet originates from the source host, the egress nickname
of the multicast packet is set to the Dtree root of the Level 1 area
from which the source is originating the stream. The packet flows
along the multicast distribution tree to all RBridges that are part
of the Dtree. The N-PE that provides connectivity to the Pseudo Level
2 area, and to the other sites beyond it, also receives the packet.
The MVPN PIM-Bidir tree is used by the near-end N-PE to send the
packet to all the other member N-PEs of the customer's sites.
Appropriate TRILL encapsulation is done at the ingress N-PE for this
multicast stream, with the TRILL header containing a local Dtree root
for the receiving site, and the packet is streamed to the receivers
in that site. Source suppression, so that the packet is not put back
onto the core, is done by consulting the Group Designated Border
RBridge information at the receiving site: if other N-PEs connecting
the site to the Layer 3 core receive the multicast packet sent into
the site by the GDBR for that group, they check whether they are
themselves the GDBR for the said group, and if not they do not
forward the traffic back into the core.

Note that the Group Address TLV is transported by BGP from the other
sites into a site, and it is the GDBR for that group on the remote
side that enables this transport. In this way the MVPN PIM-Bidir tree
is pointed to from within each site through the configured GDBR N-PEs
for a said group. The GDBR thus sits as one of the receivers in the
Dtree for a said group within the site where the multicast stream
originates.

2.6.4 Comparison with DRAFT-EVPN

Compared with the scheme outlined in [DRAFT-EVPN], the scheme
explained in this document has the following advantages.

2.6.4.1 No nickname integration issues in our scheme

Existing TRILL based sites can be brought into the interconnect
without any re-election / re-assignment of nicknames. One clear
benefit over DRAFT-EVPN is that adding a new site to a VPN, or
merging two distinctly nicknamed VPNs, does not cause nickname
clashes. This is a major advantage, since a new TRILL site can hit
the ground running without any interruption to the existing sites in
the interconnect.

2.6.4.2 Hierarchical Nicknames and their disadvantages in the
DRAFT-EVPN scheme

The [DRAFT-EVPN] scheme advocates the use of hierarchical nicknames,
where the nickname is split into a Site-ID and an RBridge-ID. This
use of the nickname space has the following disadvantages.

(a) The nickname is a 16-bit entity. In an interconnect with, for
example, 18 sites, the DRAFT-EVPN scheme has to use 5 bits of the
nickname space for the Site-ID, wasting (32 - 18) = 14 Site-IDs. The
number of sites is also limited, to at best around 255.

(b) The nickname is a 16-bit entity. In an interconnect with at least
4K RBridges in each site, the nickname space has to set aside at
least 12 bits for the RBridge-ID. This means that there cannot be
more than 2^4 = 16 sites.

Thus the hierarchical scheme limits both the number of Site-IDs and
the number of RBridges within a site. If we want more sites, we set
aside more bits for the Site-ID, sacrificing the maximum number of
RBridge-IDs within each site; if there are more RBridges within each
site, allocating more bits for the RBridge-ID sacrifices the maximum
number of Site-IDs possible.

For example, in a branch office scenario with 32 sites and more than
255 RBridges in each branch office, it would be difficult to
accommodate the set of sites along with the number of RBridges using
the hierarchical nickname scheme.

In the scheme outlined in this document, it is possible to set aside
1000, 2000, or even 200 nicknames, depending on the number of sites
(since this is a range of nicknames without hierarchy in the nickname
space), without compromising on the maximum number of RBridges within
each site. If M is the number of sites to be supported, then the
number of available RBridge nicknames is X = 2^16 - M, and this X is
available to every site, since RBridge nicknames are site-local and
not globally unique.

It is thus possible to set aside a sizeable number within the
nickname space for future expansion of sites without compromising on
the number of RBridges within each site. The worked comparison below
illustrates the tradeoff.
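The following non-normative Python sketch works through the
arithmetic above. With the hierarchical split, s site bits leave
16 - s bits per site; with the flat area-nickname range of this
draft, reserving M area nicknames leaves 2^16 - M nicknames usable
inside every site, since RBridge nicknames are site-local.

   def hierarchical(site_bits):
       # Returns (max sites, max RBridges per site) for a 16-bit split.
       return 2 ** site_bits, 2 ** (16 - site_bits)

   def flat(reserved_areas):
       # Returns (max sites, RBridges usable in *every* site).
       return reserved_areas, 2 ** 16 - reserved_areas

   print(hierarchical(5))   # (32, 2048): 18 sites wastes 14 Site-IDs
   print(hierarchical(4))   # (16, 4096): 4K RBridges/site caps sites at 16
   print(flat(1000))        # (1000, 64536): X = 2^16 - M, per site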
2.6.4.3 Load-Balancing issues with respect to DRAFT-EVPN

While DRAFT-EVPN allows for active/active load-balancing, its actual
method of distributing load pins a flow onto one of the multi-homed
N-PEs for a specific site, rather than using the multi-path
hashing-based scheme that is possible with our scheme.

2.6.4.4 Inter-operating with DRAFT-EVPN

Overall there are two original approaches:

a) Nicknames are hierarchically assigned; say, for example, the top
5 bits are the "site" and the remainder is used within the site.

b) A few (say 1000) site nicknames are reserved. Within a site, all
nicknames other than the 1000 area nicknames can be assigned to
individual RBridges.

With approach b), the TRILL header has to be rewritten when exiting
or entering a site. Suppose R3 is the border RB of site A1 and R4 is
the border RB of site A2, and suppose the source is attached to R1
(within site A1) and the destination is attached to R2 (within site
A2).

R1 will write the TRILL header as "ingress = my nickname", "egress =
A2's site nickname".

When the frame reaches R3, R3 has to replace the ingress with
"ingress = A1's site nickname". When it reaches R4, R4 has to replace
the egress with "R2's individual nickname".

If R4 does not know where the destination MAC is, then R4 has to
flood within A2.

2.6.4.4.1 Proposed merged proposal

When R1 advertises across the backbone to R2, it says:

1) whether the site is advertising an area prefix (approach a)) or an
area nickname (approach b)), and

2) the 16-bit number for the area (which is either a prefix or a
nickname).

If R3 (attached to site A1) advertises a prefix, say "15", to R4
(attached to site A2), then R4 must ensure that none of the nicknames
of the form <15.nickname> are assigned within site A2.

We suggest that TRILL provide a way for R4 to announce that it "owns"
a range of nicknames, so that when R4 hears from R3 that R3 is
claiming all nicknames of the form <15.nickname>, R4 advertises
(within site A2) that it owns all nicknames in the range <15.0> to
<15.1111111111> (in addition to all the area nicknames from other
areas, plus its own nickname).

Also, in the original example (source attached to R1 at site A1 with
border RB R3, entering the destination's site A2 at R4, with the
destination attached to R2):

If site A1 is using the prefix approach, then R3 does not rewrite. If
it is using the site nickname approach, then R3 needs to rewrite the
ingress nickname with the site nickname.

If site A2 is using the prefix approach, then R4 does not need to
rewrite. If it is using the site nickname approach, then R4 does need
to rewrite. These rules are sketched below.
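The following non-normative Python sketch encodes the merged-proposal
rewrite rules just stated; the site descriptors and nickname strings
are hypothetical. A border RB of a "site nickname" site rewrites the
ingress field on exit, a "prefix" site does not, and the destination
border RB rewrites the egress field to the individual RBridge when
its site uses site nicknames.

   def exit_rewrite(hdr, site):
       # At the source-site border RB (R3 in the example above).
       if site["mode"] == "nickname":
           hdr = dict(hdr, ingress=site["site_nick"])
       return hdr                          # prefix sites: no rewrite

   def enter_rewrite(hdr, site, dst_mac):
       # At the destination-site border RB (R4); an unknown MAC would flood.
       if site["mode"] == "nickname":
           hdr = dict(hdr, egress=site["mac_table"][dst_mac])
       return hdr

   A1 = {"mode": "nickname", "site_nick": "A1-nick"}
   A2 = {"mode": "nickname", "mac_table": {"D-MAC": "R2-nick"}}

   hdr = {"ingress": "R1-nick", "egress": "A2-nick"}   # as written by R1
   hdr = exit_rewrite(hdr, A1)                         # at R3, leaving A1
   hdr = enter_rewrite(hdr, A2, "D-MAC")               # at R4, entering A2
   print(hdr)   # {'ingress': 'A1-nick', 'egress': 'R2-nick'}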
2.6.5 Table sizes in hardware

   Table sizes in hardware will increase only to the extent of the
   locally conversational C-MACs.  There may be a concern that
   hardware table sizes could become a problem with respect to C-MAC
   scaling; with larger tables available in merchant silicon, this
   may no longer be an issue.

2.6.6 The N-PE and its implementation

   Although the N-PE is described as a single device acting as both
   the border RBridge and the PE router on its side of the L3 core,
   the actual implementation could take the form of two devices, one
   acting as the border RBridge and the other as a plain Provider
   Edge router, with the link between the two being an attachment
   circuit.

3 Security Considerations

   TBD.

4 IANA Considerations

   A few IANA considerations arise at this point.  A suitable
   AFI-SAFI indicator would have to be assigned to carry MAC
   addresses as NLRI with RBridge nicknames as next hops.  This one
   AFI-SAFI indicator could be used for both U-PE MP-iBGP sessions
   and N-PE MP-iBGP sessions.  For transporting the Group Address
   TLV, suitable extensions to BGP must be defined and appropriate
   type codes assigned for the transport of such TLVs in the
   BGP-MAC-VPN VRF framework.

5 References

5.1 Normative References

5.2 Informative References

   [DRAFT-EVPN]     draft-ietf-l2vpn-trill-evpn-00.txt, Sajassi
                    et al., 2012, Work in Progress.

   [1]              draft-xl-trill-over-wan-00.txt, XiaoLan Wan
                    et al., December 11, 2011, Work in Progress.

   [2]              draft-perlman-trill-rbridge-multilevel-03.txt,
                    Radia Perlman et al., October 31, 2011, Work in
                    Progress.

   [3]              draft-raggarwa-mac-vpn-01.txt, Rahul Aggarwal
                    et al., June 2010, Work in Progress.

   [4]              draft-yong-trill-trill-o-mpls, Yong et al.,
                    October 2011, Work in Progress.

   [5]              draft-raggarwa-sajassi-l2vpn-evpn, Rahul
                    Aggarwal et al., September 2011, Work in
                    Progress.

   [RadiaCloudlet]  draft-perlman-trill-cloudlet-00, Radia Perlman
                    et al., July 30, 2012, Work in Progress.

Authors' Addresses

   Radia Perlman
   Intel Labs
   2200 Mission College Blvd
   Santa Clara, CA
   USA

   Email: radia@alum.mit.edu

   Bhargav Bhikkaji
   Dell-Force10
   350 Holger Way
   San Jose, CA
   U.S.A

   Email: Bhargav_Bhikkaji@dell.com

   Balaji Venkat Venkataswami
   Dell-Force10
   Olympia Technology Park
   Fortius block, 7th & 8th Floor
   Plot No. 1, SIDCO Industrial Estate
   Guindy, Chennai - 600032
   TamilNadu, India
   Tel: +91 (0) 44 4220 8400
   Fax: +91 (0) 44 2836 2446

   Email: BALAJI_VENKAT_VENKAT@dell.com

   Ramasubramani Mahadevan
   Dell-Force10
   Olympia Technology Park
   Fortius block, 7th & 8th Floor
   Plot No. 1, SIDCO Industrial Estate
   Guindy, Chennai - 600032
   TamilNadu, India
   Tel: +91 (0) 44 4220 8400
   Fax: +91 (0) 44 2836 2446

   Email: Ramasubramani_Mahade@dell.com

   Shivakumar Sundaram
   Dell-Force10
   Olympia Technology Park
   Fortius block, 7th & 8th Floor
   Plot No. 1, SIDCO Industrial Estate
   Guindy, Chennai - 600032
   TamilNadu, India
   Tel: +91 (0) 44 4220 8400
   Fax: +91 (0) 44 2836 2446

   Email: Shivakumar_sundaram@dell.com

   Narayana Perumal Swamy
   Dell-Force10
   Olympia Technology Park
   Fortius block, 7th & 8th Floor
   Plot No. 1, SIDCO Industrial Estate
   Guindy, Chennai - 600032
   TamilNadu, India
   Tel: +91 (0) 44 4220 8400
   Fax: +91 (0) 44 2836 2446

   Email: Narayana_Perumal@dell.com

A.1 Appendix I

A.1.1 Extract from the multi-level IS-IS draft, made applicable to
      this scheme

   In the following picture, RB2 and RB3 are area border RBridges.  A
   source S is attached to RB1.  The two areas have nicknames 15961
   and 15918, respectively.  RB1 has a nickname, say 27, and RB4 has
   a nickname, say 44 (in fact, they could even have the same
   nickname, since an RBridge nickname is not visible outside its
   area).

                                 Pseudo
        Area 15961              level 2               Area 15918
   +-------------------+  +-----------------+   +--------------+
   |                   |  | IP Core network |   |              |
   | S--RB1---Rx--Rz----RB2---         ----RB3---Rk--RB4---D   |
   |     27            |  |  .           .  |   |      44      |
   |                   |  | Pseudo-Interface|   |              |
   +-------------------+  +-----------------+   +--------------+

   Here RB2 and RB3 are N-PEs; RB1 and RB4 are U-PEs.

   This sample topology could apply to campus and data-center
   topologies.  For Provider Backbone topologies, S would fall
   outside Area 15961 and RB1 would be the U-PE carrying the C-VLANs
   inside a P-VLAN for a specific customer.

   Let us say that S transmits a frame to destination D, which is
   attached to RB4, and that D's location has already been learned by
   the relevant RBridges.  They have learned the following:

   1) RB1 has learned that D is attached to nickname 15918.
   2) RB3 has learned that D is attached to nickname 44.

   The following sequence of events will occur:

   - S transmits an Ethernet frame with source MAC = S and
   destination MAC = D.

   - RB1 encapsulates it with a TRILL header with ingress RBridge =
   27 and egress = 15918.

   - RB2 has announced, in the Level 1 IS-IS instance in area 15961,
   that it is attached to all the area nicknames, including 15918.
   Therefore, IS-IS routes the frame to RB2.  (Alternatively, if a
   distinguished range of nicknames is used for Level 2, Level 1
   RBridges seeing such an egress nickname will know to route to the
   nearest border router, which can be indicated by the IS-IS
   attached bit.)

   In the original draft on multi-level IS-IS, the following happens:

   QUOTE...

   - RB2, when transitioning the frame from Level 1 to Level 2,
   replaces the ingress RBridge nickname with the area nickname,
   i.e. replaces 27 with 15961.  Within Level 2, the ingress RBridge
   field in the TRILL header will therefore be 15961, and the egress
   RBridge field will be 15918.  RB2 also learns that S is attached
   to nickname 27 in area 15961, to accommodate return traffic.

   - The frame is forwarded through Level 2 to RB3, which has
   advertised, in Level 2, reachability to the nickname 15918.

   - RB3, when forwarding into area 15918, replaces the egress
   nickname in the TRILL header with RB4's nickname (44).  So, within
   the destination area, the ingress nickname will be 15961 and the
   egress nickname will be 44.

   - RB4, when decapsulating, learns that S is attached to nickname
   15961, which is the area nickname of the ingress.
   Now suppose that D's location has not been learned by RB1 and/or
   RB3.  What will happen, as it would in TRILL today, is that RB1
   will forward the frame as a multi-destination frame, choosing a
   tree.  As the multi-destination frame transitions into Level 2,
   RB2 replaces the ingress nickname with the area nickname.  If RB1
   does not know the location of D, the frame must be flooded,
   subject to possible pruning, in Level 2 and, subject to possible
   pruning, from Level 2 into every Level 1 area that it reaches on
   the Level 2 distribution tree.

   UNQUOTE...

   In the proposal outlined in this document, by contrast, the TRILL
   header is preserved across the IP+GRE or IP+MPLS core.  A re-look
   into the inner headers after de-capsulation gives the N-PE the
   information needed to carry the frame towards the destination
   U-PE.
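   For concreteness, the following Python sketch (purely
   illustrative; the dictionaries merely stand in for TRILL headers)
   traces the nickname rewrites of the quoted multi-level example and
   then the behaviour of this document's proposal, where the header
   crosses the core unchanged:

      # Trace of the quoted multi-level example: area nicknames
      # 15961 (source) and 15918 (destination); RB1 = 27, RB4 = 44.
      hdr = {"ingress": 27, "egress": 15918}   # as written by RB1

      # RB2, border of area 15961: Level 1 -> Level 2 transition.
      hdr["ingress"] = 15961    # hide RB1 behind its area nickname

      # RB3, border of area 15918: Level 2 -> Level 1 transition.
      hdr["egress"] = 44        # RB4's individual nickname

      assert hdr == {"ingress": 15961, "egress": 44}

      # In this document's proposal there are no such rewrites: the
      # near-end N-PE encapsulates the original TRILL header in
      # IP+GRE (or IP+MPLS) and the far-end N-PE decapsulates it
      # unchanged before forwarding towards the destination U-PE.
      original = {"ingress": 27, "egress": 15918}
      core_pkt = {"outer": "IP+GRE", "payload": dict(original)}
      assert core_pkt["payload"] == original   # preserved end to end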