TRILL Working Group                                   Bhargav Bhikkaji
Internet-draft                              Balaji Venkat Venkataswami
Intended Status: Proposed Standard             Ramasubramani Mahadevan
Expires: September 2012                            Shivakumar Sundaram
                                                Narayana Perumal Swamy
                                                          DELL-Force10
                                                        March 26, 2012

   Connecting Disparate Data Center/PBB/Campus TRILL sites using BGP
               draft-balaji-trill-over-ip-multi-level-05

Abstract

   There is a need to connect (a) TRILL based data centers, (b) TRILL
   based networks which provide Provider Backbone like functionality,
   or (c) Campus TRILL based networks over the WAN using one or more
   ISPs that provide regular IP+GRE or IP+MPLS transport. A few
   solutions proposed in the recent past, such as [1], have not looked
   at the Provider-Backbone-like functionality. Neither have they
   dealt with the details of how these services could be provided so
   that multiple TRILL sites can be interconnected, with issues like
   nickname collisions for unicast and multicast taken care of. It has
   been found that with extensions to BGP the problem statement
   defined below can be handled. Both control plane and data plane
   operations can be built into the solution so that the entire set of
   TRILL sites appears seamlessly as a single entity, which can then
   be viewed as one single Layer 2 cloud. MAC moves across TRILL sites
   and within TRILL sites can be realized. This document envisions the
   use of BGP-MAC-VPN VRFs both at the IP cloud PE devices and at the
   peripheral PEs within a TRILL site providing Provider Backbone like
   functionality. We deal in depth with the control plane and data
   plane particulars for unicast and multicast, with nickname election
   taken care of as part of the solution.
Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with
   the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time. It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as "work in
   progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

Copyright and License Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document. Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1  Introduction . . . . . . . . . . . . . . . . . . . . . . . .  4
   1.1  Acknowledgements . . . . . . . . . . . . . . . . . . . . .  4
   1.2  Terminology  . . . . . . . . . . . . . . . . . . . . . . .  4
   1.3  Problem Statement  . . . . . . . . . . . . . . . . . . . .  5
   1.3.1  TRILL Data Centers requiring connectivity over WAN . . .  5
   1.3.2  Provider Backbone remote TRILL cloud requirements  . . .  6
   1.3.3  Campus TRILL network requirements  . . . . . . . . . . .  7
   2.  Architecture where the solution applies  . . . . . . . . . .  7
   2.1  Proposed Solution  . . . . . . . . . . . . . . . . . . . .  7
   2.1.1  Control Plane  . . . . . . . . . . . . . . . . . . . . .  8
   2.1.1.1  Nickname Collision Solution  . . . . . . . . . . . . .  8
   2.1.1.2  U-PE BGP-MAC-VPN VRFs  . . . . . . . . . . . . . . . .  9
   2.1.1.3  Control Plane explained in detail  . . . . . . . . . . 11
   2.1.2  Corresponding Data plane for the above control plane
          example  . . . . . . . . . . . . . . . . . . . . . . . . 12
   2.1.2.1  Control plane for regular Campus and Data center
            sites  . . . . . . . . . . . . . . . . . . . . . . . . 13
   2.1.2.2  Other Data plane particulars . . . . . . . . . . . . . 15
   2.1.3  Encapsulations . . . . . . . . . . . . . . . . . . . . . 20
   2.1.3.1  IP + GRE . . . . . . . . . . . . . . . . . . . . . . . 20
   2.1.3.2  IP + MPLS  . . . . . . . . . . . . . . . . . . . . . . 20
   2.2  Other use cases  . . . . . . . . . . . . . . . . . . . . . 20
   2.3  Novelty  . . . . . . . . . . . . . . . . . . . . . . . . . 20
   2.4  Uniqueness and advantages  . . . . . . . . . . . . . . . . 21
   2.4.1  Multi-level IS-IS  . . . . . . . . . . . . . . . . . . . 22
   2.4.2  Benefits of the VPN mechanism  . . . . . . . . . . . . . 22
   2.4.3  Inter-working with other VXLAN, NVGRE sites  . . . . . . 22
   2.4.4  Benefits of using Multi-level  . . . . . . . . . . . . . 22
   2.5  Comparison with OTV and VPN4DC and other schemes . . . . . 23
   2.6  Multi-pathing  . . . . . . . . . . . . . . . . . . . . . . 23
   2.7  TRILL extensions for BGP . . . . . . . . . . . . . . . . . 23
   2.7.1  Format of the MAC-VPN NLRI . . . . . . . . . . . . . . . 23
   2.7.2  BGP MAC-VPN MAC Address Advertisement  . . . . . . . . . 24
   2.7.2.1  Next hop field in MP_REACH_NLRI  . . . . . . . . . . . 25
   2.7.2.2  Route Reflectors for scaling . . . . . . . . . . . . . 25
   2.7.3  Multicast Operations in Interconnecting TRILL sites  . . 25
   3  Security Considerations . . . . . . . . . . . . . . . . . . . 29
   4  IANA Considerations . . . . . . . . . . . . . . . . . . . . . 29
   5  References  . . . . . . . . . . . . . . . . . . . . . . . . . 29
   5.1  Normative References . . . . . . . . . . . . . . . . . . . 29
   5.2  Informative References . . . . . . . . . . . . . . . . . . 29
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . . 30
   A.1  Appendix I . . . . . . . . . . . . . . . . . . . . . . . . . 31

1 Introduction

   There is a need to connect (a) TRILL based data centers, (b) TRILL
   based networks which provide Provider Backbone like functionality,
   or (c) Campus TRILL based networks over the WAN using one or more
   ISPs that provide regular IP+GRE or IP+MPLS transport. A few
   solutions proposed in the recent past, such as [1], have not looked
   at the Provider-Backbone-like functionality. Neither have they
   dealt with the details of how these services could be provided so
   that multiple TRILL sites can be interconnected, with issues like
   nickname collisions for unicast (multicast is still TBD) taken care
   of. It has been found that with extensions to BGP the problem
   statement defined below can be well handled. Both control plane and
   data plane operations can be built into the solution so that the
   entire set of TRILL sites appears seamlessly as a single entity,
   which can then be viewed as one single Layer 2 cloud. MAC moves
   across TRILL sites and within TRILL sites can be realized. This
   document envisions the use of BGP-MAC-VPN VRFs both at the IP cloud
   PE devices and at the peripheral PEs within a TRILL site providing
   Provider Backbone like functionality.
   We deal in depth with the control plane and data plane particulars
   for unicast (multicast is still TBD), with nickname election taken
   care of as part of the solution.

1.1 Acknowledgements

   The authors would like to thank Janardhanan Pathangi and Anoop
   Ghanwani for their inputs for this proposal.

1.2 Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
   this document are to be interpreted as described in RFC 2119
   [RFC2119].

   Legend :

   U-PE / ARB : User-near PE device or Access RBridge. U-PEs are edge
   devices in the Customer site or tier-2 site. This is an RBridge
   with BGP capabilities. It has a VRF instance for each tenant it is
   connected to in the case of the Provider Backbone functionality
   use case.

   U-P / CRB : Core RBridge, a core device in the Customer site that
   does not directly interact with the Customer's Customer.

   N-PE : Network Transport PE device. This is a device with RBridge
   capabilities on the non-core-facing side. On the core-facing side
   it is a Layer 3 device supporting IP+GRE and/or IP+MPLS. On the
   non-core-facing side it has support for VRFs, one for each TRILL
   site that it connects to. It runs BGP to convey the BGP-MAC-VPN VRF
   routes to its peer N-PEs. It also supports an IGP on the
   core-facing side, such as OSPF or IS-IS for Layer 3, and supports
   IP+GRE and/or IP+MPLS if need be. A pseudo-interface representing
   the N-PE's connection to the Pseudo Level 2 area is provided at
   each N-PE, and a forwarding adjacency is maintained between the
   near-end N-PE and the pseudo-interfaces of its remote participating
   N-PEs in the common Pseudo Level 2 area.

   N-P : Network Transport core device. This is an IP and/or IP+MPLS
   core device that is part of the ISP or ISPs providing the transport
   network that connects the disparate TRILL networks together.
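As a non-normative aid, the device roles in the legend above can be summarized in a small model. The protocol lists paraphrase the descriptions above; all identifiers and values are ours, purely illustrative:

```python
from dataclasses import dataclass
from enum import Enum

class Role(Enum):
    U_PE = "U-PE / ARB: access RBridge at the customer-site edge, runs BGP"
    U_P  = "U-P / CRB: core RBridge inside a site, no customer-facing role"
    N_PE = "N-PE: RBridge on the site side, IP/MPLS PE on the core side"
    N_P  = "N-P: pure IP and/or IP+MPLS device in the transport core"

@dataclass
class Node:
    name: str
    role: Role
    site_side: tuple = ()   # protocols on the non-core-facing side
    core_side: tuple = ()   # protocols on the core-facing side

# The N-PE is the only dual-personality role: TRILL toward the site,
# BGP/IGP plus IP+GRE or IP+MPLS toward the transport core.
n_pe = Node("N1-PE", Role.N_PE,
            site_side=("TRILL", "IS-IS Level 1", "VRF per site"),
            core_side=("BGP (MAC-VPN)", "OSPF or IS-IS", "IP+GRE or IP+MPLS"))
```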
1.3 Problem Statement

1.3.1 TRILL Data Centers requiring connectivity over WAN

       ____[U-PE]____         ____________         ____[U-PE]____
      (              )       (            )       (              )
      (  TRILL Based )       (IP Core with)       ( TRILL Based  )
      (Data Center Site)     (IP+GRE Encap)       (Data Center Site)
    [U-PEs]  (A)  [N-PE]      or IP+MPLS    [N-PE]   (B)    [U-PE]
      (              )       (Encap Tunnels)      (              )
      (              )       (between N-PEs)      (              )
      (___[U-PE]_____)       (____________)       (____[U-PE]____)

   Figure 1.0 : TRILL based Data Center sites inter-connectivity.

   o Provide Layer 2 extension capabilities amongst disparate data
   centers running TRILL.

   o Recognize MAC moves across data centers and within data centers,
   so that the disparate sites look and feel like one big Layer 2
   cloud.

   o Provide a solution agnostic to the technology used in the service
   provider network.

   o Provide a cost-effective and simple solution to the above.

   o Provide auto-configured tunnels instead of pre-configured ones in
   the transport network.

   o Provide additional facilities as part of the transport network,
   e.g., TE, QoS, etc.

   o Maintain routing and forwarding state at the network edges and
   not within the site or the core of the transport network. This
   requires minimizing the state needed to provide this solution.

   o Connectivity for end customers is thus U-PE to N-PE to remote
   N-PE to remote U-PE.
1.3.2 Provider Backbone remote TRILL cloud requirements

       ____[U-PE]____         ____________         ____[U-PE]____
      (              )       (            )       (              )
      (  Provider    )       (IP Core with)       (  Provider    )
      (Backbone TRILL)       (IP+GRE Encap)       (Backbone TRILL)
    [U-PEs] Site (A) [N-PE]   or IP+MPLS    [N-PE] Site (B) [U-PE]
      (              )       (Encap Tunnels)      (              )
      (              )       (between N-PEs)      (              )
      (___[U-PE]_____)       (____________)       (____[U-PE]____)

   Figure 2.0 : TRILL based Provider Backbone sites inter-connectivity

   o Provide Layer 2 extension capabilities amongst the different
   Provider Backbone Layer 2 clouds that need connectivity with each
   other.

   o Recognize MAC moves across Provider Backbone Layer 2 clouds and
   within a single-site Layer 2 cloud, so that the disparate sites
   look and feel like one big Layer 2 cloud.

   o Provide a solution agnostic to the technology used in the service
   provider network.

   o Provide a cost-effective and simple solution to the above.

   o Provide auto-configured tunnels instead of pre-configured ones in
   the transport network.

   o Provide additional facilities as part of the transport network,
   e.g., TE, QoS, etc.

   o Maintain routing and forwarding state at the network edges and
   not within the site or the core of the transport network. This
   requires minimizing the state needed to provide this solution.

   o These clouds could be part of the same provider but be far away
   from each other. The customers of these clouds could demand
   connectivity to their sites through these TRILL clouds, which could
   offer Provider Layer 2 VLAN transport for each of their customers.
   Hence, provide seamless connectivity wherever these sites are
   placed.

   o Connectivity for end customers is thus U-PE to N-PE to remote
   N-PE to remote U-PE.
1.3.3 Campus TRILL network requirements

       ____[U-PE]____         ____________         ____[U-PE]____
      (              )       (            )       (              )
      (   Campus     )       (IP Core with)       (   Campus     )
      ( TRILL Based  )       (IP+GRE Encap)       ( TRILL Based  )
    [U-PEs] Site (A) [N-PE]   or IP+MPLS    [N-PE] Site (B) [U-PE]
      (              )       (Encap Tunnels)      (              )
      (              )       (between N-PEs)      (              )
      (___[U-PE]_____)       (____________)       (____[U-PE]____)

   Figure 3.0 : TRILL based Campus inter-connectivity

   o Provide Layer 2 extension capabilities amongst disparate,
   distantly located Campus Layer 2 clouds that need connectivity with
   each other.

   o Recognize MAC moves across these Campus Layer 2 clouds and within
   a single-site Campus cloud, so that the disparate sites look and
   feel like one big Layer 2 cloud.

   o Provide a solution agnostic to the technology used in the service
   provider network.

   o Provide a cost-effective and simple solution to the above.

   o Provide auto-configured tunnels instead of pre-configured ones in
   the transport network.

   o Provide additional facilities as part of the transport network,
   e.g., TE, QoS, etc.

   o Provide routing and forwarding state optimizations as in 1.3.1
   and 1.3.2.

   o Connectivity for end customers is thus U-PE to N-PE to remote
   N-PE to remote U-PE.

2. Architecture where the solution applies

2.1 Proposed Solution

   The following section outlines the (a) Campus TRILL topology, (b)
   TRILL Data Center topology, or (c) Provider Backbone network
   topology for which the solution is intended.
       ____[U-PE]____         ____________         ____[U-PE]____
      (              )       (            )       (              )
      ( TRILL Based  )       (IP Core with)       ( TRILL Based  )
      (RBridges as U-PEs)    (IP+GRE Encap)       (RBridges as U-PEs)
    [U-PEs] RBridges [N-PE]   or IP+MPLS    [N-PE] RBridges  [U-PE]
      (  as U-Ps     )       (Encap Tunnels)      (  as U-Ps    )
      (              )       (between N-PEs)      (              )
      (___[U-PE]_____)       (____________)       (____[U-PE]____)

   Figure 4.0 : Proposed Architecture

2.1.1 Control Plane

   o Site network U-PEs still perform the learning function for source
   MACs bridged through their PE-CE links. For Campus TRILL networks
   (non-Provider-Backbone networks) the PE-CE links connect the
   regular hosts / servers. In the case of a data center the PE-CE
   links connect the servers in a rack to the U-PEs / Top of Rack
   switches.

   o End-customer MACs are placed in BGP-MAC-VPN VRFs at the U-PE ends
   of the customer PE-CE links (at tier 2).

2.1.1.1 Nickname Collision Solution

   o The near-end N-PE for a site has a forwarding adjacency on the
   Pseudo Level 2 area pseudo-interface, through which it obtains the
   TRILL nicknames of the next-hop far-end N-PEs' Level 2
   pseudo-interfaces. This forwarding adjacency is built up during the
   course of the BGP-MAC-VPN exchanges between the N-PEs. It is a kind
   of targeted IS-IS adjacency through the IP+GRE or IP+MPLS core, and
   the exchange is accomplished by extending BGP to connect the
   near-end N-PE with the far-end N-PEs. Nickname election is done
   with the N-PE RBridge pseudo-interfaces participating in nickname
   election in the Level 2 area, while their non-core-facing
   interfaces are Level 1 interfaces in the sites, each site being
   considered a Level 1 area.
   o The nicknames of each site are made distinct within the site,
   since the nickname election process PDUs for a Level 1 area are NOT
   tunneled across the transport network. This ensures that each U-P,
   U-PE, or N-PE RBridge interface has knowledge of the nickname
   election process only in its respective site / domain. If a new
   domain is connected as a site to an already existing network, the
   election process NEED NOT be repeated in the newly added site to
   keep the nicknames distinct, as multi-level IS-IS takes care of
   forwarding from one site / domain to another. Only the
   pseudo-interface of the N-PE of the newly added site has to partake
   in an election to generate a new Pseudo Level 2 area nickname for
   itself.

2.1.1.2 U-PE BGP-MAC-VPN VRFs

   o The customer MACs are placed as routes in the MAC-VPN VRFs, with
   the nexthop being either the nickname of the U-PE to which the
   customer MAC is attached or the area number of that U-PE's site.
   For MAC routes within the Level 1 area the nexthop is the nickname
   of the local U-PE itself, while MAC routes learnt from other sites
   carry the area number of the site to which the remote U-PE belongs.
   When source learning happens, the BGP-MAC-VPN NLRI are communicated
   to the participating U-PEs in all the sites of the said customer.
   Refer to section A.1.1 in Appendix A.1 for more details on how
   forwarding takes place between the sites through the multi-level
   IS-IS mechanism orchestrated over the IP core network.
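The per-customer VRF just described maps a MAC address either to the local U-PE's nickname or to the area number of the remote site. A minimal, non-normative sketch (the names and 16-bit values are ours, purely illustrative):

```python
from typing import Optional

LOCAL_NICKNAME = 0x00A1   # this U-PE's own nickname (illustrative value)
REMOTE_AREA = 0x00B2      # area number of a remote site (illustrative)

# One customer's U-PE MAC-VPN VRF: MAC address -> 16-bit TRILL nexthop.
# Local MACs resolve to the U-PE's own nickname; MACs learnt from other
# sites via BGP resolve to the remote site's area number.
vrf = {
    "00:be:ab:ce:fa:9f": LOCAL_NICKNAME,  # locally learnt MAC
    "00:ce:cb:fe:fc:0f": REMOTE_AREA,     # MAC advertised from a remote site
}

def nexthop(mac: str) -> Optional[int]:
    """Return the TRILL egress (nickname or area number) for a MAC.
    None means the destination is unknown, in which case the frame is
    flooded on the default distribution tree for that P-VLAN."""
    return vrf.get(mac)
```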
   Format of the BGP-MAC-VPN VRF on a U-PE / ARB

   +---------------------+------------------------+
   | MAC address         | U-PE Nickname          |
   +---------------------+------------------------+
   | 00:be:ab:ce:fa:9f   | <16-bit U-PE Nickname> |
   | (local)             |                        |
   +---------------------+------------------------+
   | 00:ce:cb:fe:fc:0f   | <16-bit U-PE Area Num> |
   | (Non-local)         |                        |
   +---------------------+------------------------+
   ....
   ....

   o A VRF is allocated for each customer, who in turn may have
   multiple VLANs in their end-customer sites. So in theory a total of
   4K VLANs can be supported per customer. The P-VLAN (provider VLAN)
   in the Provider Backbone case can also span 4K VLANs. So in effect,
   up to 4K customers could be supported in this scheme if P-VLAN
   encapsulation is used to differentiate between multiple customers.

   o IS-IS for Layer 2 is run atop the RBridges in the site / tier-2
   network.

   o IS-IS for Layer 2 disseminates the MACs reachable via the TRILL
   nexthop nicknames of site / tier-2 network RBridges amongst the
   RBridges in the network site.

   o N-PEs have a VRF for each tier-2 access network that gains
   connectivity through the IP+GRE or IP+MPLS core.

       ____[U-PE]____         ____________         ____[U-PE]____
      (              )       (            )       (              )
      ( TRILL Based  )       (IP Core with)       ( TRILL Based  )
      (RBridges as U-PEs)    (IP+GRE Encap)       (RBridges as U-PEs)
    [U-PEB] RBridges [N-PE]   or IP+MPLS    [N-PE] RBridges  [U-PEA]
     .( as U-Ps /    ).      (Encap Tunnels).     (  \ as U-Ps  ) .
     . (   (X)       ) .     (between N-PEs) .    (    (Y)      ) .
     . (___[U-PE]____)  .    (____________)   .   (___[U-PE]____) .
     .                   .                     .      Other remote
     Other remote U-PEs   ... (BGP-MAC-VPN) ...       U-PEs known
     known through TRILL      MP-iBGP session         through TRILL
                     installing site MAC routes
               with NextHop as suitable RBridge Nicknames

   Legend :
   (X) - Customer A Site 1 MAC-VPN-VRF
   (Y) - Customer A Site 2 MAC-VPN-VRF

   U-PEs are edge devices, a.k.a. Access RBridges (ARBs).
   U-Ps, a.k.a. Core RBridges (CRBs), are core devices that
   interconnect U-PEs.

   Figure 5.0 : BGP-MAC-VPN VRFs amongst N-PEs

   o N-PEs re-distribute the MAC routes in their respective VRFs into
   the IS-IS Level 1 area after the export / import amongst the N-PEs
   is done. The reverse re-distribution from IS-IS to BGP is also done
   at each N-PE for its tier-2 customer site.

   o N-PEs exchange BGP information through route targets for the
   various customer sites with other N-PEs. The MAC routes for the
   various customer sites are placed in the BGP-MAC-VPN VRF of each
   N-PE for each customer site it connects to, along the same lines as
   the U-PE MAC-VPN VRFs. The MAC routes placed in the VRFs of the
   N-PEs indicate the MAC addresses of the various RBridges of the
   remote tier-2 customer sites, with the respective nexthops being
   the nicknames of the Level 2 pseudo-interface of the far-end N-PE
   through which these MAC routes are reachable.

   o U-PE and U-P RBridge MACs and TRILL nicknames are placed in the
   BGP-MAC-VPN VRF on the N-PEs.

   o Routes to the various end-customer MACs within a tier-2
   customer's sites are exchanged through BGP MAC-VPN sessions between
   U-PEs. IP connectivity is provided through IP addresses on the same
   subnet for the participating U-PEs.

   (VRF-CCA)[U-PE]____        ____________        ____[U-PE]____
     .  (          )         (            )      (          (VRF-CCA)
     .  (          )         (  IP Core   )      (              )  .
     .(PBB-CustA-Site 1)     (            )      (PBB-CustA-Site 2).
    [U-PEA]  A1  [N1-PE]                  [N2-PE]   A2     [U-PEB]
     .(    /       )         (            )      (    \        )  .
     . (  (X)      )         (            )      (    (Y)      )  .
     . (__[U-PE-B1]_)        (____________)      (__[U-PE-B2]__)  .
     .      |                                          |          .
     .      H1                                         H2         .
     .                                             Customer's     .
     Customer's...............(BGP-MAC-VPN)........Customer CCA.
     Customer CCA             MP-iBGP session          Site 1
     Site 2       installing Customer's Customer site MAC routes
               with NextHop as suitable RBridge Area Nicknames

   Legend :
   A1, A2 - Area Nicknames of the customer sites in TRILL
   N1, N2 - The N-PEs connecting A1 and A2, running BGP sessions
   B1, B2 - U-PEs in A1 and A2 respectively, running BGP sessions
   H1, H2 - Hosts connected to the B1 and B2 U-PEs.

   Figure 6.0 : BGP-MAC-VPN VRFs between U-PEs amongst various sites

2.1.1.3 Control Plane explained in detail

   1) B1 and B2 exchange, via BGP, that the MACs of H1 and H2 are
   reachable. For example, H2-MAC is reachable via B2-MAC through area
   nickname A2.

   2) N1 and N2 exchange, via BGP, that A1 and A2 are reachable
   through N1's nickname and N2's nickname respectively.

   3) N1 and N2 also exchange the MACs of U-PEs B1 and B2.

   4) The routes in N1 and N2 are re-distributed into IS-IS to end up
   with the following correlated routing state.

   Now the correlated route in B1 is that H2 -> reachable via -> B2 ->
   reachable via A2 -> reachable via N1 Nickname.

   And the correlated route in B2 is that H1 -> reachable via -> B1 ->
   reachable via A1 -> reachable via N2 Nickname.

   And the correlated route in N1 is that B2 -> reachable via -> A2 ->
   reachable via Nickname N2.

   And the correlated route in N2 is that B1 -> reachable via -> A1 ->
   reachable via Nickname N1.

2.1.2 Corresponding Data plane for the above control plane example

   (VRF-CCA)[U-PE]____        ____________        ____[U-PE]____
     .  (          )         (            )      (          (VRF-CCA)
     .  (          )         (  IP Core   )      (              )  .
     .(PBB-CustA-Site 1)     (            )      (PBB-CustA-Site 2).
    [U-PEA]  A1  [N1-PE]                  [N2-PE]   A2     [U-PEB]
     .(    /       )         (            )      (    \        )  .
     . (  (X)      )         (            )      (    (Y)      )  .
     . (__[U-PE-B1]_)        (____________)      (__[U-PE-B2]__)  .
     .      |                                          |          .
     .      H1                                         H2         .
     .                                             Customer's     .
     Customer's...............(BGP-MAC-VPN)........Customer CCA.
     Customer CCA             MP-iBGP session          Site 1
     Site 2       installing Customer's Customer site MAC routes
               with NextHop as suitable RBridge Area Nicknames

   Legend :
   A1, A2 - Area Nicknames of the customer sites in TRILL
   N1, N2 - The N-PEs connecting A1 and A2, running BGP sessions
   B1, B2 - U-PEs in A1 and A2 respectively, running BGP sessions
   H1, H2 - Hosts connected to the B1 and B2 U-PEs.

   Figure 7.0 : BGP-MAC-VPN VRFs between U-PEs amongst various sites

   1) H1 sends a packet to B1 with SourceMac as H1-MAC, DestMac as
   H2-MAC, and C-VLAN as C1. This frame is named F1.

   2) B1 encapsulates this packet in a P-VLAN (Provider VLAN) packet
   with outer SourceMac as B1-MAC and DestMac as B2-MAC, with P-VLAN
   PV1. This frame is named F2.

   3) B1, being an RBridge, encapsulates a TRILL header on top of F2,
   with Ingress RBridge as B1 and Egress RBridge as A2.

   4) This reaches N1, where N1 decapsulates the TRILL header and
   sends frame F2 inside an IP+GRE header with the GRE key as
   Customer A's VRF id.

   5) The packet reaches N2, where N2 looks up the GRE key to identify
   which customer / VRF is to be looked into.

   6) In that VRF table N2 looks up B2 and encapsulates F2 with a
   TRILL header, with Ingress RBridge as A1 and Egress RBridge as B2.

   7) Finally the packet reaches B2, which decapsulates it and sends
   F1 to the host.

2.1.2.1 Control plane for regular Campus and Data center sites

   For non-PBB-like environments one could choose the same
   capabilities as a PBB-like environment, with all TORs (e.g., in a
   data center) having BGP sessions with other TORs through BGP Route
   Reflectors. By manipulating the Route Targets, specific TORs could
   be tied together in the topology within a site or even across
   sites.
   The easier way to go about the initial phase of deployment would be
   to restrict the MP-BGP sessions to the N-PEs alone within Campus
   networks and Data centers and let IS-IS do the job of
   re-distributing into BGP. Flexibility, however, can be achieved by
   letting the U-PEs in the Campus or data center networks also have
   MP-BGP sessions. Different logical topologies could be achieved as
   a result of the U-PE BGP sessions.

2.1.2.1.1 First phase of deployment for Campus and Data Center sites

   For the first phase of deployment it is recommended that MP-BGP
   sessions be constructed between N-PEs alone in the case of Data
   Center and Campus sites. This suffices, as PBB tunnels are not
   involved. The exchanges remain between the N-PEs about the
   concerned sites alone, and other BGP peering sessions are not
   needed since connectivity is the key. When TOR-silo-based
   topologies need to be realized, MP-BGP sessions between TORs on the
   near site and the remote sites can be considered. This will be
   explored in other documents in the future.

2.1.2.1.2 Control Plane for Data Centers and Campus

   1) N1 and N2 exchange, via BGP, that A1 and A2 are reachable
   through N1's nickname and N2's nickname respectively.

   2) N1 and N2 also exchange that B1 and B2 are within A1 and A2, and
   that H1 and H2 are attached to B1 and B2 respectively.

   3) N1 and N2 also exchange the MACs of ARBs B1 and B2.

   4) The routes in N1 and N2 are re-distributed into IS-IS to end up
   with the following correlated routing state.

   5) The corresponding ESADI protocol routes for end stations will
   also be exchanged between N-PEs using BGP. The nickname of the
   nexthop will be the area number from which the route originated.

   Now the correlated route in B1 is that H2 -> reachable via -> B2 ->
   reachable via A2 -> reachable via N1 Nickname.
   And the correlated route in B2 is that H1 -> reachable via -> B1 ->
   reachable via A1 -> reachable via N2 Nickname.

   And the correlated route in N1 is that B2 -> reachable via -> A2 ->
   reachable via Nickname N2.

   And the correlated route in N2 is that B1 -> reachable via -> A1 ->
   reachable via Nickname N1.

2.1.2.1.3 Data Plane for Data Centers and Campus

   1) H1 sends a packet to B1 with SourceMac as H1-MAC, DestMac as
   H2-MAC, and C-VLAN as C1. This frame is named F1.

   2) B1 encapsulates this packet with outer SourceMac as B1-MAC and
   DestMac as B2-MAC. This frame is named F2.

   3) B1, being an RBridge, encapsulates a TRILL header on top of F2,
   with Ingress RBridge as B1 and Egress RBridge as A2.

   4) This reaches N1, where N1 decapsulates the TRILL header and
   sends frame F2 inside an IP+GRE header with the GRE key as
   Customer A's VRF id.

   5) The packet reaches N2, where N2 looks up the GRE key to identify
   which customer / VRF is to be looked into.

   6) In that VRF table N2 looks up B2 and encapsulates F2 with a
   TRILL header, with Ingress RBridge as A1 and Egress RBridge as B2.

   7) Finally the packet reaches B2, which decapsulates it and sends
   F1 to the host.

2.1.2.2 Other Data plane particulars

   A default Dtree spanning all sites is set up for the P-VLAN of
   Customer's Customer CCA, supported on all tier-2 sites. Denoted by
   ===, //.

   (VRF-CCA)[U-PE]____        ____________        ____[U-PE]____
     .  (          )         (            )      (          (VRF-CCA)
     .  (TRILL Based)        (IP Core with)      ( TRILL Based  )  .
     .(Customer A Site 1)    (IP+GRE Encap)      (Customer A Site 2).
    [U-PEA]============[N-PE]=============[N-PE]==============[U-PEB]
     .(    /       )         (Encap Tunnels)     (   \    //   )  .
     . (  (X)      )         (between N-PEs)     (   (Y)  //   )  .
     . (__[U-PE]___)         (____________)      (__[U-PEC]....(VRF-CCA)
     .                                             Customer's     .
     Customer's...............(BGP-MAC-VPN)........Customer CCA.
Customer CCA               MP-iBGP session            Site 1
Site 2        installing Customer's Customer site MAC routes
              with NextHop as suitable RBridge Area Nicknames

Legend :
(X) - Customer A Site 1 MAC-VPN-VRF
(Y) - Customer A Site 2 MAC-VPN-VRF
(VRF-CCA) - MAC-VPN-VRF for Customer's Customer A (CCA) Site 1
(VRF-CCA) - MAC-VPN-VRF for Customer's Customer A (CCA) Site 2
(VRF-CCA) - MAC-VPN-VRF for Customer's Customer A (CCA) Site 3

Figure 8.0 : Dtree spanning all U-PEs for unknown floods.

(1) When a packet comes into a U-PE from the near end, the source
MAC is learned and placed in the near-end U-PE BGP-MAC-VPN VRF. This
is done in a sub-table depending on which of the end-customer's
VLANs the MAC belongs to. If the destination MAC is unknown, the
frame is flooded through a default spanning tree (which could be a
Dtree) constructed for the provider VLAN that is mapped to carry
traffic for the end-customer VLAN in the customer network sites
involved.

A default Dtree spanning all sites is set up for the P-VLAN for
Customer's Customer CCA supported on all Tier-2 sites.

Denoted by ===, //.

Forwarding of unknown frames uses the default Dtree spanning all
customer sites and their respective U-PEs, and onward to their
customers.

(VRF-CCA)[U-PE]____        ____________         ____[U-PE]____
.       (          )      (            )       (      (VRF-CCA)
.       ( TRILL Based )   ( IP Core with )     ( TRILL Based )    .
.( Customer A Site 1 )    ( IP+GRE Encap )    ( Customer A Site 2 ) .
.       (          )      (            )       (             )    .
[U-PEA]============[N-PE]=============[N-PE]==============[U-PEB]
.       (   /      )      ( Encap Tunnels )    (  \    //    )    .
.       (  (X)     )      ( between N-PEs )    (  (Y)  //    )    .
.       (___[U-PE]_____)  (____________)       (____[U-PEC]....(VRF-CCA)
.                          Customer's                             .
Customer's............... (BGP-MAC-VPN)............Customer CCA.
Customer CCA               MP-iBGP session            Site 1
Site 2        installing Customer's Customer site MAC routes
              with NextHop as suitable RBridge Area Nicknames

Legend :
(X) - Customer A Site 1 MAC-VPN-VRF
(Y) - Customer A Site 2 MAC-VPN-VRF
(VRF-CCA) - MAC-VPN-VRF for Customer's Customer A (CCA) Site 1
(VRF-CCA) - MAC-VPN-VRF for Customer's Customer A (CCA) Site 2
(VRF-CCA) - MAC-VPN-VRF for Customer's Customer A (CCA) Site 3

Figure 9.0 : Unknown floods through the Dtree spanning that P-VLAN

(2) The spanning tree (which could be a Dtree for that VLAN) carries
the packet through the site network switches all the way to the
N-PEs bordering that network site. U-PEs can drop the packet if
there exist no ports for that customer VLAN on the U-PE. The
spanning tree includes auto-configured IP-GRE tunnels or MPLS LSPs
across the IP+GRE and/or IP+MPLS cloud, which are constituent parts
of that tree, and hence the unknown flood is carried over to the
remote N-PEs participating in the said Dtree. The packet then heads
to the remote-end (leaf) U-PEs and on to the end-customer sites. For
purposes of connecting multiple N-PE devices for a Dtree that is
being used for unknown floods, a mechanism such as a PIM-Bidir
overlay using the MVPN mechanism in the core of the IP network can
be used. This PIM-Bidir tree would stitch together all the N-PEs of
a specific customer.

(3) BGP-MAC-VPN VRF exchanges between N-PEs carry the routes for
MACs of the near-end RBridges in the near-end site network to the
remote-end site network. At the remote-end U-PE a correlation
between the near-end U-PE and the customer MAC is made after
BGP-MAC-VPN VRF exchanges between near-end and far-end U-PEs. The
MPLS inner label or the GRE key indicates which VRF to consult for
an incoming encapsulated packet at an ingress N-PE and at the
outgoing N-PE in the IP core.
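The VRF selection just described, where the GRE key (or MPLS inner
label) picks the BGP-MAC-VPN VRF and the inner destination MAC is
then looked up inside it, can be sketched as follows. This is a
non-normative illustration: the table contents, key values and MAC
addresses are invented for the example, not taken from this document.

```python
# Hypothetical sketch: an N-PE demultiplexes an incoming IP+GRE
# packet into the right BGP-MAC-VPN VRF using the GRE key, then looks
# up the inner destination MAC in that VRF. All values are invented.

VRF_BY_GRE_KEY = {
    100: "Cust-A",   # GRE key 100 assumed to represent customer A's VRF
    200: "Cust-B",
}

MAC_VPN_VRFS = {
    # VRF name -> {destination MAC: next-hop TRILL Area nickname}
    "Cust-A": {"aa:bb:cc:00:00:02": "A2"},
    "Cust-B": {},
}

def lookup(gre_key, inner_dst_mac):
    """Return (vrf, next-hop area nickname) for an encapsulated frame.

    A next hop of None means the MAC is unknown in that VRF, which in
    the scheme above triggers the unknown flood over the Dtree.
    """
    vrf = VRF_BY_GRE_KEY[gre_key]
    nexthop = MAC_VPN_VRFS[vrf].get(inner_dst_mac)
    return vrf, nexthop
```

A known MAC resolves to the Area nickname used as the TRILL egress
toward the far site; an unknown MAC falls back to the flood path.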
(4) From there on, the source MAC so learned at the far end is
reachable just as in a hierarchical VPN in MPLS Carrier Supporting
Carrier. The only difference is that the nicknames of the far-end
U-PEs/U-Ps may be the same as the nicknames of the near-end
U-PEs/U-Ps. To overcome this, the MAC routes exchanged between the
U-PEs carry as next hop the Area nickname of the far-end U-PE, and
that Area nickname is then resolved to the near-end N-PE(s) in the
local site that provide connectivity to the far-end U-PE in
question.

srcMac is known at U-PEA, so it is advertised through BGP to the
other U-PEs in the other customer sites for Customer A, stating that
srcMAC is reachable via U-PEA. This is received at the BGP-MAC-VPN
VRFs in U-PEB and U-PEC.

(VRF-CCA)[U-PE]____        ____________         ____[U-PE]____
.       (          )      (            )       (      (VRF-CCA)
.       ( TRILL Based )   ( IP Core with )     ( TRILL Based )    .
.( Customer A Site 1 )    ( IP+GRE Encap )    ( Customer A Site 2 ) .
.       ( ............ )  ( ............. )    ( .............. ) .
[U-PEA]============[N-PE]=============[N-PE]==============[U-PEB]
.       (   /      )      ( Encap Tunnels )    (  \    //    )    .
.       (  (X)     )      ( between N-PEs )    (  (Y)  //    )    .
.       (___[U-PE]_____)  (____________)       (____[U-PEC]....(VRF-CCA)
.                          Customer's                             .
Customer's............... (BGP-MAC-VPN)............Customer CCA.
Customer CCA               MP-iBGP session            Site 1
Site 2        installing Customer's Customer site MAC routes
              with NextHop as suitable RBridge Area Nicknames

Legend :
(X) - Customer A Site 1 MAC-VPN-VRF
(Y) - Customer A Site 2 MAC-VPN-VRF
(VRF-CCA) - MAC-VPN-VRF for Customer's Customer A (CCA) Site 1
(VRF-CCA) - MAC-VPN-VRF for Customer's Customer A (CCA) Site 2
(VRF-CCA) - MAC-VPN-VRF for Customer's Customer A (CCA) Site 3

Figure 10.0 : Distributing MAC routes through BGP-MAC-VPN

Flooding when the DstMAC is unknown.
The flooding reaches all U-PEs and is forwarded to the customer
devices (the Customer's customer devices).

(VRF-CCA)[U-PE]____        ____________         ____[U-PE]____
.       (          )      (            )       (      (VRF-CCA)
.       ( TRILL Based )   ( IP Core with )     ( TRILL Based )    .
.( Customer A Site 1 )    ( IP+GRE Encap )    ( Customer A Site 2 ) .
.       ( ............ )  ( ............. )    ( .............. ) .
[U-PEA]============[N-PE]=============[N-PE]==============[U-PEB]
.       (   /      )      ( Encap Tunnels )    (  \    //.   )    .
.       (  (X)     )      ( between N-PEs )    (  (Y)  //.   )    .
.       (___[U-PE]_____)  (____________)       (____[U-PEC]....(VRF-CCA)
.                          Customer's                             .
Customer's............... (BGP-MAC-VPN)............Customer CCA.
Customer CCA               MP-iBGP session            Site 1
Site 2        installing Customer's Customer site MAC routes
              with NextHop as suitable RBridge Area Nicknames

Legend :
(X) - Customer A Site 1 MAC-VPN-VRF
(Y) - Customer A Site 2 MAC-VPN-VRF
(VRF-CCA) - MAC-VPN-VRF for Customer's Customer A (CCA) Site 1
(VRF-CCA) - MAC-VPN-VRF for Customer's Customer A (CCA) Site 2
(VRF-CCA) - MAC-VPN-VRF for Customer's Customer A (CCA) Site 3

Figure 11.0 : Forwarding when the DstMAC is unknown.

When the DstMAC is known, the payload is carried in the following
fashion in the IP core.

(,

In PBB-like environments / sites interconnected, the payload is
P-VLAN headers encapsulating the actual payload.

, )

In Campus and Data Center environments only the latter is carried.
There is no P-VLAN header required.

(VRF-CCA)[U-PE]____        ____________         ____[U-PE]____
.       (          )      (            )       (      (VRF-CCA)
.       ( TRILL Based )   ( IP Core with )     ( TRILL Based )    .
.( Customer A Site 1 )    ( IP+GRE Encap )    ( Customer A Site 2 ) .
.       ( ............ )  ( ............. )    ( .............. ) .
[U-PEA]============[N-PE]=============[N-PE]==============[U-PEB]
.       (   /      )      ( Encap Tunnels )    (  \    //    )    .
.       (  (X)     )      ( between N-PEs )    (  (Y)  //    )    .
.       (___[U-PE]_____)  (____________)       (____[U-PEC]....(VRF-CCA)
.
                           Customer's                             .
Customer's............... (BGP-MAC-VPN)............Customer CCA.
Customer CCA               MP-iBGP session            Site 1
Site 2        installing Customer's Customer site MAC routes
              with NextHop as suitable RBridge Area Nicknames

Legend :
(X) - Customer A Site 1 MAC-VPN-VRF
(Y) - Customer A Site 2 MAC-VPN-VRF
(VRF-CCA) - MAC-VPN-VRF for Customer's Customer A (CCA) Site 1
(VRF-CCA) - MAC-VPN-VRF for Customer's Customer A (CCA) Site 2
(VRF-CCA) - MAC-VPN-VRF for Customer's Customer A (CCA) Site 3

Figure 12.0 : Forwarding when the DstMAC is known.

(5) The reverse path does the same for reachability of the near end
from the far end.

(6) Connectivity is thus established between end-customer sites
through the site networks and through the IP+GRE and/or IP+MPLS
core.

(7) End-customer packets are carried in IP+GRE tunnels or IP+MPLS
LSPs through the access network site to the near-end N-PE. The N-PE
encapsulates them in auto-configured MPLS LSPs or IP+GRE tunnels to
the far-end N-PEs through the IP+GRE and/or IP+MPLS core. The label
is stripped at the far-end N-PE and the inner frame continues to the
far-end U-PE and on to the customer.

2.1.3 Encapsulations

2.1.3.1 IP + GRE

(,

In PBB like environments...

,
, )

In non-PBB environments such as Campus and Data Center, the Ethernet
header with the P-VLAN header is not required.

2.1.3.2 IP + MPLS

(,

In PBB like environments...

,
, )

In non-PBB environments such as Campus and Data Center, the Ethernet
header with the P-VLAN header is not required.

2.2 Other use cases

o Campus-to-Campus connectivity can also be achieved using this
solution. Multi-homing, where multiple U-PEs connect to the same
customer site, can also facilitate load-balancing if a site-id (the
ESI for the MAC-VPN NLRI can be used) is incorporated in the
BGP-MAC-VPN NLRI.
MAC moves can be detected if the site-id of an advertised MAC from a
U-PE differs from the older one available.

2.3 Novelty

o TRILL MAC routes and their associated next hops, which are TRILL
nicknames, are redistributed into BGP from IS-IS.

o Thus BGP-MAC-VPNs on N-PEs in the transport network contain MAC
routes whose next hops are TRILL Area nicknames.

o The customer-edge RBridges / provider bridges too contain MAC
routes with associated next hops that are TRILL nicknames. This
proposal is an extension of the BGP-MAC-VPN I-D to include MAC
routes with TRILL Area nicknames as next hops.

2.4 Uniqueness and advantages

o Uses existing protocols, IS-IS for Layer 2 and BGP, to achieve
this. No changes to IS-IS are needed except for redistribution into
BGP at the transport core edge and vice-versa.

o Uses BGP-MAC-VPNs for transporting MAC updates of customer devices
between edge devices only.

o Employs hierarchical hiding of MAC routes from the core RBridges
of the site, and a hierarchical-VPN-like solution to keep the
routing state of the sites out of the transport core.

o Multi-tenancy through the IP+GRE or IP+MPLS core is possible when
the N-PEs at the edge of the L3 core place the various customer
sites in VRFs using the VPN VRF mechanism. This is otherwise not
possible in traditional networks or with other mechanisms suggested
in recent drafts.

o The VPN mechanism also provides the ability to use overlapping MAC
address spaces within distinct customer sites interconnected using
this proposal.

o Multi-tenancy within each data center site is possible by using
VLAN separation within the VRF.

o MAC moves can be detected if source learning / Gratuitous ARP
combined with the BGP-MAC-VPN update triggers a change in the
concerned VRF tables.

o PBB-like functionality is supported, where the P-VLAN and the
customer VLAN are different spaces.
o Uses regular BGP with MAC-VPN support between the transport core
edge devices and the Tier-2 customer edge devices.

o When new TRILL sites are added, no re-election in the Level 1 area
is needed. Only the Pseudo-interface of the N-PE has to be added to
the mix, with the transport of the election PDUs being done across
the transport network core.

2.4.1 Multi-level IS-IS

As in the TRILL IS-IS multi-level draft, each N-PE can be considered
an ABR having one nickname in a customer site, which in turn is a
Level 1 area, and a Pseudo-Interface facing the core of the
transport network, which belongs to a Level 2 area. The
Pseudo-Interface decapsulates the TRILL header of a packet arriving
from the Level 1 area, discards that TRILL header within the Pseudo
Level 2 area, and transports the packet across the Layer 3 core
(IP+GRE and/or IP+MPLS) after encapsulating it in IP+GRE or IP+MPLS.
Thus the scheme has the N-PE's core-facing Pseudo-interface in the
Level 2 Pseudo-area doing the TRILL encapsulation and decapsulation
for outgoing and incoming packets respectively, from and to the
transport core. Incoming packets from the Level 1 area are
encapsulated in IP+GRE or IP+MPLS by the sending N-PE's
Pseudo-Interface, and outgoing packets from the transport core are
stripped of their IP+GRE or IP+MPLS headers by the Pseudo-Interface
on the receiving N-PE.

2.4.2 Benefits of the VPN mechanism

Using the VPN mechanism it is possible to place MAC routes in
distinct VRFs in the N-PEs, thus providing separation between
customers. Assume customer A and customer B each have several sites
that need to be interconnected. By isolating the routes within
specific VRFs, multi-tenancy across the L3 core can be achieved.
Customer A's sites talk to customer A's sites alone, and the same
applies to customer B.

The same mechanism also provides for overlapping MAC addresses
amongst the various customers: customer A could use the same MAC
addresses as customer B. This is otherwise not possible with other
mechanisms that have been recently proposed.

2.4.3 Inter-working with other VXLAN, NVGRE sites

Without the TRILL header it is possible to inter-work with STP
sites, VXLAN sites, NVGRE sites and with other TRILL sites.

For example, if a TRILL site has to inter-operate with VXLAN sites,
the VXLAN site has to have a VXLAN gateway that translates plain
Ethernet packets coming in from the WAN core into VXLAN packets,
with the VRF signifying the VXLAN-ID or VNI.

2.4.4 Benefits of using Multi-level

The benefit of using multi-level is the ability to choose
appropriate multicast trees in other sites through the inter-area
multicast method proposed by Radia Perlman et al.

2.5 Comparison with OTV and VPN4DC and other schemes

o OTV requires a few proprietary changes to IS-IS. This scheme
requires fewer such changes to IS-IS than OTV does.

o VPN4DC is a problem statement and is not yet as comprehensive as
the scheme proposed in this document.

o [4] deals with pseudo-wires being set up across the transport
core, with the control plane protocols for TRILL apparently tunneled
through the transport core. The scheme we propose does NOT require
anything more than Pseudo Level 2 area number exchanges and those
for the Pseudo-interfaces; BGP takes care of the rest of the
routing.
Also, [4] does not take care of nickname collision detection: since
the TRILL control plane is also tunneled, when a new site is sought
to be brought into the inter-connection amongst existing TRILL
sites, nickname re-election may be required.

o [5] does not have a case for TRILL. It was intended for other
types of networks, which exclude TRILL, since [5] has not yet
proposed TRILL nicknames as next hops for MAC addresses.

2.6 Multi-pathing

By using different RDs to export the BGP-MAC routes with their
appropriate nickname next hops from more than one N-PE, we can
achieve multi-pathing over the transport IP+GRE and/or IP+MPLS core.

2.7 TRILL extensions for BGP

2.7.1 Format of the MAC-VPN NLRI

+-----------------------------------+
|  Route Type (1 octet)             |
+-----------------------------------+
|  Length (1 octet)                 |
+-----------------------------------+
|  Route Type specific (variable)   |
+-----------------------------------+

The Route Type field defines the encoding of the rest of the MAC-VPN
NLRI (the Route Type specific MAC-VPN NLRI).

The Length field indicates the length in octets of the Route Type
specific field of the MAC-VPN NLRI.

This document defines the following Route Types:

+ 1 - Ethernet Tag Auto-Discovery (A-D) route
+ 2 - MAC advertisement route
+ 3 - Inclusive Multicast Ethernet Tag Route
+ 4 - Ethernet Segment Route
+ 5 - Selective Multicast Auto-Discovery (A-D) Route
+ 6 - Leaf Auto-Discovery (A-D) Route
+ 7 - MAC Advertisement Route with Nexthop as TRILL Nickname

Type 7 is the route type used in this proposal.

2.7.2 BGP MAC-VPN MAC Address Advertisement

BGP is extended to advertise these MAC addresses using the MAC
advertisement route type in the MAC-VPN NLRI.
A MAC advertisement route type specific MAC-VPN NLRI consists of the
following:

+---------------------------------------+
|  RD (8 octets)                        |
+---------------------------------------+
|  MAC Address (6 octets)               |
+---------------------------------------+
|GRE key / MPLS Label rep. VRF(3 octets)|
+---------------------------------------+
|  Originating Rbridge's IP Address     |
+---------------------------------------+
|  Originating Rbridge's MAC address    |
|  (8 octets)                           |
+---------------------------------------+

The RD MUST be the RD of the MAC-VPN instance that is advertising
the NLRI. The procedures for setting the RD for a given MAC-VPN are
described in section 8 of [3].

The encoding of a MAC address is the 6-octet MAC address specified
by the IEEE 802 documents [802.1D-ORIG] [802.1D-REV].

When using the IP+GRE and/or IP+MPLS core networks, the GRE key or
MPLS label MUST be the downstream-assigned MAC-VPN GRE key or MPLS
label that is used by the N-PE to forward IP+GRE or IP+MPLS
encapsulated Ethernet packets received from remote N-PEs, where the
destination MAC address in the Ethernet packet is the MAC address
advertised in the above NLRI. The forwarding procedures are
specified in previous sections of this document. An N-PE may
advertise the same MAC-VPN label for all MAC addresses in a given
MAC-VPN instance, or it may advertise a unique MAC-VPN label per MAC
address. Each of these methodologies has its tradeoffs.

Per-MAC-VPN-instance label assignment requires the smallest number
of MAC-VPN labels, but requires a MAC lookup in addition to a GRE
key or MPLS lookup on an egress N-PE for forwarding.
On the other hand, a unique label per MAC allows an egress N-PE to
forward a packet that it receives from another N-PE to the connected
CE after looking up only the GRE key or MPLS label, without having
to do a MAC lookup.

The Originating RBridge's IP address MUST be set to an IP address of
the PE (U-PE or N-PE). This address SHOULD be common for all the
MAC-VPN instances on the PE (e.g., this address may be the PE's
loopback address).

2.7.2.1 Next hop field in MP_REACH_NLRI

The Next Hop field of the MP_REACH_NLRI attribute of the route MUST
be set to the nickname of the N-PE or, in the case of a U-PE, the
Area nickname of the RBridge whose MAC address is carried in the
Originating Rbridge's MAC Address field.

The BGP advertisement that advertises the MAC advertisement route
MUST also carry one or more Route Target (RT) attributes.

It is to be noted that document [3] does not require N-PEs/U-PEs to
create forwarding state for remote MACs when they are learned in the
control plane; when this forwarding state is actually created is a
local implementation matter. The proposal in this document, however,
requires that forwarding state be established when these MAC routes
are learned in the control plane.

2.7.2.2 Route Reflectors for scaling

It is recommended that Route Reflectors (RRs) SHOULD be deployed to
mesh the U-PEs in the sites with the U-PEs at other sites (belonging
to the same customer), and that the transport network also have RRs
to mesh the N-PEs. This takes care of the scaling issues that would
arise if a full mesh were deployed amongst the U-PEs or the N-PEs.
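As a rough, non-normative illustration of sections 2.7.1 and 2.7.2,
the sketch below packs and parses the generic MAC-VPN NLRI header
(Route Type, Length, type-specific body) for the type 7 route. The
field widths follow the boxes above; the sample body contents (RD,
MAC, GRE key value) and helper names are assumptions for the
example only.

```python
import struct

# Route types from section 2.7.1; type 7 is the one this proposal uses.
MAC_ADV_WITH_TRILL_NICKNAME = 7

def pack_nlri(route_type, body):
    # 1-octet Route Type, 1-octet Length, then the type-specific body.
    return struct.pack("!BB", route_type, len(body)) + body

def parse_nlri(buf):
    route_type, length = struct.unpack_from("!BB", buf, 0)
    return route_type, buf[2:2 + length]

# Illustrative start of a type-7 body: RD (8 octets) + MAC Address
# (6 octets) + GRE key / MPLS label representing the VRF (3 octets).
body = bytes(8) + bytes.fromhex("aabbcc000002") + (100).to_bytes(3, "big")
nlri = pack_nlri(MAC_ADV_WITH_TRILL_NICKNAME, body)
rtype, parsed = parse_nlri(nlri)
```

Round-tripping the NLRI this way recovers the route type and the
type-specific body unchanged.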
2.7.3 Multicast Operations in Interconnecting TRILL sites

For the purpose of multicast, the IP core can have a Multicast-VPN
based PIM-Bidir tree (akin to Rosen or NGEN-MVPN) for each customer.
This tree connects all the N-PEs related to a customer and carries
the multicast traffic over the transport core, thus connecting
site-to-site multicast trees. Each site connected to an N-PE would
have that N-PE as a member of the MVPN PIM-Bidir tree connecting
that site to the other sites' chosen N-PEs. Thus only one N-PE from
each site is part of a given MVPN PIM-Bidir tree. If there exists
more than one N-PE per site, the other N-PE is part of a different
MVPN PIM-Bidir tree. Consider the following diagram, which
represents three sites that have connectivity to each other over a
WAN. Site A has two N-PEs connecting it to the WAN, and sites B and
C have one each. Note that two MVPN Bidir trees are constructed: one
with Site A's N-PE1 and Site B's and C's N-PEs respectively, while
the other MVPN Bidir tree is constructed with Site A's N-PE2 and
Site B's and C's respective N-PEs. Load-balancing of multicast
groups among the sites is thus possible. The method of
interconnecting trees from the respective Level 1 areas (that is,
the sites) to each other is akin to stitching together the Dtrees
that have the N-PEs as their stitch end-points in the Pseudo-Level 2
area, with the MVPN Bidir tree acting as the conduit for such
stitching. The tree-ids in each site are non-unique and need not be
distinct across sites. It is only the N-PEs, which have one foot in
the Level 1 area, that are stitched together using the MVPN Bidir
overlay in the Layer 3 core.
 --------------       ------------       --------------
|              |     |            |     |              |
| TRILL Campus |     |    WAN     |     | TRILL Campus |
|    Site A    |     |            |     |    Site B    |
|        N-PE1==     ===N-PE4     |     |              |
RB1            |     |            |     | RB2          |
|        N-PE2==     |            |     |              |
|              |     |            |     |              |
 --------------       ------------       --------------
                          ||
                          ||
                          ||N-PE3
                      ------------
                     |            |
                     |TRILL Campus|
                     |   Site C   |
                     |            |
                     |            |
                     |            |
                     |            |
                      -----RB3----

Here N-PE1, N-PE3 and N-PE4 form an MVPN Bidir tree amongst
themselves to link up the multi-level trees in the three sites,
while N-PE2, N-PE3 and N-PE4 form another MVPN Bidir tree amongst
themselves to link up the multi-level trees in the three sites.

There thus exist two PIM-Bidir overlay trees that can be used to
load-balance, say, group G1 on the first and G2 on the second. Let's
say the source of group G1 lies within Site A and the first overlay
tree is chosen for multicasting the stream. When the packet hits the
WAN link on N-PE1, the packet is replicated to N-PE3 and N-PE4. It
is important to understand that a concept like a Group Designated
Border RBridge (GDBR) is applied in this case: group assignments are
made to specific N-PEs such that only one of them is active for a
particular group, and the others do not send that group's traffic
across the WAN using the respective MVPN PIM-Bidir tree. Group G2
could then use the second MVPN PIM-Bidir based tree for its
transport. The procedures for election of the Group Designated
Border RBridge within a site will be discussed in detail in future
versions of this draft or may be taken to a separate document.
VLAN-based load-balancing of multicast groups is also possible and
feasible in this scenario; it can also be based on VLAN plus
multicast MAC-DA. The GDBR scheme is applicable only to packets that
N-PEs receive as TRILL-decapsulated MVPN PIM-Bidir tree frames from
the Layer 3 core.
If a TRILL-encapsulated multicast frame arrives at an N-PE, only the
GDBR for that group can decapsulate the TRILL header and send it
across the Layer 3 core. The other N-PEs can however forward such
multi-destination frames coming from N-PEs across the core belonging
to a different site.

When the packet originates from the source host, the Egress Nickname
of the multicast packet is set to the Dtree root of the Level 1 area
from which the source is originating the stream. The packet flows
along the multicast distribution tree to all RBridges which are part
of the Dtree. The N-PE that provides connectivity to the
Pseudo-Level 2 area, and to other sites beyond it, also receives the
packet. The MVPN PIM-Bidir tree is used by the near-end N-PE to send
the packet to all the other member N-PEs of the customer sites;
appropriate TRILL encapsulation is done at the ingress N-PE for this
multicast stream, with the TRILL header containing a local Dtree
root on the receiving site, and the packet is streamed to the said
receivers in that site. Source suppression, so that the packet is
not put back on the core, is done by looking at the Group Designated
Border RBridge information at the receiving site: if other N-PEs
which connect the site to the Layer 3 core receive the multicast
packet sent into the site by the GDBR for that group, those N-PEs
check whether they are indeed the GDBR for the said group, and if
not, they do not forward the traffic back into the core.

It is to be noted that the Group Address TLV is transported by BGP
from across the other sites into a site, and it is the GDBR for that
group on the remote side that enables this transport. In this way
the MVPN PIM-Bidir tree is pointed to from within each site through
the configured GDBR N-PEs for a given group.
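The GDBR checks described above reduce to one test applied in both
directions. The sketch below is illustrative only; the
group-to-GDBR assignments are invented, and the election procedure
itself is deferred by this document to future work.

```python
# Non-normative sketch of the Group Designated Border RBridge (GDBR)
# rules: only the GDBR for a group may move that group's traffic
# between a site and the Layer 3 core; the site's other N-PEs
# suppress it, in both the site-to-core and the source-suppression
# (do-not-put-back-on-the-core) checks. Assignments are invented.

GDBR_FOR_GROUP = {"G1": "N-PE1", "G2": "N-PE2"}

def is_gdbr(npe, group):
    """True if this N-PE is the designated border RBridge for group."""
    return GDBR_FOR_GROUP.get(group) == npe

def may_forward(npe, group):
    # Applied both to TRILL-encapsulated frames headed toward the
    # core and to frames the GDBR already injected into the site:
    # a non-GDBR N-PE must not forward them back into the core.
    return is_gdbr(npe, group)
```

With these assignments, N-PE1 carries G1 across the WAN and N-PE2
carries G2, giving the per-group load-balancing described above.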
The GDBR thus lies as one of the receivers in the Dtree for a given
group within the site where the multicast stream originates.

3 Security Considerations

TBD.

4 IANA Considerations

A few IANA considerations need to be taken into account at this
point. A proper AFI-SAFI indicator would have to be provided to
carry MAC addresses as NLRI with next hops as RBridge nicknames.
This one AFI-SAFI indicator could be used for both U-PE MP-iBGP
sessions and N-PE MP-iBGP sessions. For transporting the Group
Address TLV, suitable extensions to BGP must be made and appropriate
type codes assigned for the transport of such TLVs in the
BGP-MAC-VPN VRF framework.

5 References

5.1 Normative References

[KEYWORDS] Bradner, S., "Key words for use in RFCs to Indicate
           Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC1776]  Crocker, S., "The Address is the Message", RFC 1776,
           April 1 1995.

[TRUTHS]   Callon, R., "The Twelve Networking Truths", RFC 1925,
           April 1 1996.

5.2 Informative References

[1]        draft-xl-trill-over-wan-00.txt, XiaoLan Wan et al.,
           December 11, 2011, Work in Progress.

[2]        draft-perlman-trill-rbridge-multilevel-03.txt, Radia
           Perlman et al., October 31, 2011, Work in Progress.

[3]        draft-raggarwa-mac-vpn-01.txt, Rahul Aggarwal et al.,
           June 2010, Work in Progress.

[4]        draft-yong-trill-trill-o-mpls, Yong et al., October
           2011, Work in Progress.

[5]        draft-raggarwa-sajassi-l2vpn-evpn, Rahul Aggarwal
           et al., September 2011, Work in Progress.

[EVILBIT]  Bellovin, S., "The Security Flag in the IPv4 Header",
           RFC 3514, April 1 2003.

[RFC5513]  Farrel, A., "IANA Considerations for Three Letter
           Acronyms", RFC 5513, April 1 2009.

[RFC5514]  Vyncke, E., "IPv6 over Social Networks", RFC 5514,
           April 1 2009.
Authors' Addresses

Bhargav Bhikkaji,
Dell-Force10,
350 Holger Way,
San Jose, CA
U.S.A

Email: Bhargav_Bhikkaji@dell.com

Balaji Venkat Venkataswami,
Dell-Force10,
Olympia Technology Park,
Fortius block, 7th & 8th Floor,
Plot No. 1, SIDCO Industrial Estate,
Guindy, Chennai - 600032.
TamilNadu, India.
Tel: +91 (0) 44 4220 8400
Fax: +91 (0) 44 2836 2446

EMail: BALAJI_VENKAT_VENKAT@dell.com

Ramasubramani Mahadevan,
Dell-Force10,
Olympia Technology Park,
Fortius block, 7th & 8th Floor,
Plot No. 1, SIDCO Industrial Estate,
Guindy, Chennai - 600032.
TamilNadu, India.
Tel: +91 (0) 44 4220 8400
Fax: +91 (0) 44 2836 2446

EMail: Ramasubramani_Mahade@dell.com

Shivakumar Sundaram,
Dell-Force10,
Olympia Technology Park,
Fortius block, 7th & 8th Floor,
Plot No. 1, SIDCO Industrial Estate,
Guindy, Chennai - 600032.
TamilNadu, India.
Tel: +91 (0) 44 4220 8400
Fax: +91 (0) 44 2836 2446

EMail: Shivakumar_sundaram@dell.com

Narayana Perumal Swamy,
Dell-Force10,
Olympia Technology Park,
Fortius block, 7th & 8th Floor,
Plot No. 1, SIDCO Industrial Estate,
Guindy, Chennai - 600032.
TamilNadu, India.
Tel: +91 (0) 44 4220 8400
Fax: +91 (0) 44 2836 2446

Email: Narayana_Perumal@dell.com

A.1 Appendix I

A.1.1 Extract from Multi-level IS-IS draft made applicable to scheme

In the following picture, RB2 and RB3 are area border RBridges. A
source S is attached to RB1. The two areas have nicknames 15961 and
15918, respectively. RB1 has a nickname, say 27, and RB4 has a
nickname, say 44 (and in fact, they could even have the same
nickname, since the RBridge nickname will not be visible outside the
area).
                          Pseudo
     Area 15961           level 2            Area 15918
+-------------------+  +-----------------+  +--------------+
|                   |  | IP Core network |  |              |
| S--RB1---Rx--Rz----RB2---       ----RB3---Rk--RB4---D    |
|     27            |  |    .      .     |  |     44       |
|                   |  |Pseudo-Interface |  |              |
+-------------------+  +-----------------+  +--------------+

Here RB2 and RB3 are N-PEs. RB4 and RB1 are U-PEs.

This sample topology could apply to Campus and data-center
topologies. For Provider Backbone topologies, S would fall outside
Area 15961 and RB1 would be the U-PE carrying the C-VLANs inside a
P-VLAN for a specific customer.

Let's say that S transmits a frame to destination D, which is
connected to RB4, and that D's location has already been learned by
the relevant RBridges. The relevant RBridges have learned the
following:

1) RB1 has learned that D is connected to nickname 15918.
2) RB3 has learned that D is attached to nickname 44.

The following sequence of events will occur:

- S transmits an Ethernet frame with source MAC = S and destination
MAC = D.

- RB1 encapsulates with a TRILL header with ingress RBridge = 27,
and egress = 15918.

- RB2 has announced, in the Level 1 IS-IS instance in area 15961,
that it is attached to all the area nicknames, including 15918.
Therefore, IS-IS routes the frame to RB2. (Alternatively, if a
distinguished range of nicknames is used for Level 2, Level 1
RBridges seeing such an egress nickname will know to route to the
nearest border router, which can be indicated by the IS-IS attached
bit.)

In the original draft on multi-level IS-IS the following happens:

QUOTE...

- RB2, when transitioning the frame from Level 1 to Level 2,
replaces the ingress RBridge nickname with the area nickname, so
replaces 27 with 15961.
Within Level 2, the ingress RBridge field in the TRILL header will
therefore be 15961, and the egress RBridge field will be 15918. Also
RB2 learns that S is attached to nickname 27 in area 15961, to
accommodate return traffic.

- The frame is forwarded through Level 2 to RB3, which has
advertised, in Level 2, reachability to the nickname 15918.

- RB3, when forwarding into area 15918, replaces the egress nickname
in the TRILL header with RB4's nickname (44). So, within the
destination area, the ingress nickname will be 15961 and the egress
nickname will be 44.

- RB4, when decapsulating, learns that S is attached to nickname
15961, which is the area nickname of the ingress.

Now suppose that D's location has not been learned by RB1 and/or
RB3. What will happen, as it would in TRILL today, is that RB1 will
forward the frame as a multi-destination frame, choosing a tree. As
the multi-destination frame transitions into Level 2, RB2 replaces
the ingress nickname with the area nickname. If RB1 does not know
the location of D, the frame must be flooded, subject to possible
pruning, in Level 2 and, subject to possible pruning, from Level 2
into every Level 1 area that it reaches on the Level 2 distribution
tree.

UNQUOTE...

In the proposal outlined in this document, the TRILL header is done
away with completely in the IP+GRE or IP+MPLS core. A re-look into
the inner headers after decapsulation gives the appropriate
information to carry the frame from the N-PE towards the destination
U-PE.
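The nickname rewriting in the quoted multi-level walk-through can be
sketched as follows. This is a non-normative illustration of the
example above only (area nicknames 15961/15918, RBridge nicknames 27
and 44); TRILL headers are modelled as plain dicts, and it shows the
original multi-level behavior, not this document's scheme, in which
the TRILL header is removed in the L3 core.

```python
# Sketch of border-RBridge nickname rewriting in the quoted example.
AREA_SRC, AREA_DST = 15961, 15918   # area nicknames from the figure
RB1_NICK, RB4_NICK = 27, 44         # RBridge nicknames from the figure

def rb2_level1_to_level2(hdr):
    # RB2: replace the ingress RBridge nickname with its area nickname.
    return {"ingress": AREA_SRC, "egress": hdr["egress"]}

def rb3_level2_to_level1(hdr):
    # RB3: replace the egress area nickname with RB4's nickname.
    return {"ingress": hdr["ingress"], "egress": RB4_NICK}

hdr = {"ingress": RB1_NICK, "egress": AREA_DST}   # as sent by RB1
hdr = rb3_level2_to_level1(rb2_level1_to_level2(hdr))
# In the destination area: ingress 15961, egress 44
```

Because only area nicknames cross the Level 2 boundary, RBridge
nicknames (such as 27 and 44) need only be unique within their own
area, which is the collision property this proposal relies on.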