INTERNET-DRAFT                                              Linda Dunbar
Intended status: Proposed Standard                       Donald Eastlake
                                                                  Huawei
                                                           Radia Perlman
                                                                Dell/EMC
Expires: May 26, 2018                                  November 27, 2017


                 Directory Assisted TRILL Encapsulation
              draft-ietf-trill-directory-assisted-encap-06

Abstract

   This draft describes how data center networks can benefit from
   non-RBridge nodes performing TRILL encapsulation with assistance
   from a directory service.

Status of This Memo

   This Internet-Draft is submitted to IETF in full conformance with
   the provisions of BCP 78 and BCP 79.

   Distribution of this document is unlimited.
   Comments should be sent to the authors or the TRILL working group
   mailing list: trill@ietf.org

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html. The list of Internet-Draft
   Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Table of Contents

   1. Introduction
   2. Conventions Used in This Document
   3. Directory Assistance to Non-RBridge
   4. Source Nickname in Encapsulation by Non-RBridge Nodes
   5. Benefits of Non-RBridge Performing TRILL Encapsulation
      5.1. Avoid Nickname Exhaustion Issue
      5.2. Reduce MAC Tables for Switches on Bridged LANs
   6. Manageability Considerations
   7. Security Considerations
   8. IANA Considerations
   Normative References
   Informative References
   Acknowledgments
   Authors' Addresses

1. Introduction

   This document describes how data center networks can benefit from
   non-RBridge nodes performing TRILL encapsulation with assistance
   from a directory service and specifies a method for them to do so.

   [RFC7067] and [RFC8171] describe the framework and methods for edge
   RBridges to get MAC&VLAN <-> Edge RBridge mapping from a directory
   service instead of flooding unknown DAs across the TRILL domain. If
   it has the needed directory information, any node, even a
   non-RBridge node, can perform the TRILL data packet encapsulation.
   This document describes the benefits of and a scheme for non-RBridge
   nodes performing TRILL encapsulation.

2. Conventions Used in This Document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

   AF:       Appointed Forwarder RBridge port [RFC8139]

   Bridge:   IEEE 802.1Q compliant device. In this draft, Bridge is
             used interchangeably with Layer 2 switch.

   DA:       Destination Address

   ES-IS:    End System to Intermediate System [RFC8171]

   Host:     Application running on a physical server or a virtual
             machine. A host usually has at least one IP address and
             at least one MAC address.

   IS-IS:    Intermediate System to Intermediate System [RFC7176]

   SA:       Source Address

   TRILL-EN: TRILL Encapsulating node. It is a node that performs the
             TRILL encapsulation but doesn't participate in RBridge's
             IS-IS routing.

   VM:       Virtual Machine

3. Directory Assistance to Non-RBridge

   With directory assistance [RFC7067] [RFC8171], a non-RBridge node
   can be informed whether a data packet needs to be forwarded across
   the RBridge domain and, if so, of the corresponding egress RBridge.
   If the RBridge domain boundary starts at network switches (not at
   virtual switches embedded on servers), a directory can assist the
   virtual switches embedded on servers to encapsulate with a proper
   TRILL header by providing the nickname of the egress RBridge edge to
   which the destination is attached.
   The other information needed to encapsulate can be learned either by
   listening to TRILL ES-IS Hellos [RFC8171], which indicate the MAC
   address and nickname of appropriate edge RBridges, or by
   configuration.

   If a destination is not shown as attached to one or more RBridge
   edge nodes, based on the directory, the non-RBridge node can forward
   the data frames natively, i.e., without encapsulating them with a
   TRILL header. Or, if the directory is known to be complete, the
   non-RBridge node can discard such data frames.

    \              +-------+         +------+ TRILL Domain/
     \           +/------+ |       +/-----+ |            /
      \          | Aggr11| + ----- |AggrN1| +           /
       \         +---+---+/        +------+/           /
        \         /     \           /      \          /
         \       /       \         /        \        /
          \   +---+    +---+      +---+     +---+   /
           \- |T11|... |T1x|      |T21| ..  |T2y|---
              +---+    +---+      +---+     +---+
                |        |          |         |
              +-|-+    +-|-+      +-|-+     +-|-+
              |   |... | V |      | V | ..  | V |<- vSwitch
              +---+    +---+      +---+     +---+
              |   |... | V |      | V | ..  | V |
              +---+    +---+      +---+     +---+
              |   |... | V |      | V | ..  | V |
              +---+    +---+      +---+     +---+

         Figure 1. TRILL domain in typical Data Center Network

   When a TRILL encapsulated data packet reaches the ingress RBridge,
   the ingress RBridge simply forwards the pre-encapsulated packet to
   the RBridge that is specified by the egress nickname field of the
   TRILL header of the data frame. When the ingress RBridge receives a
   native Ethernet frame, it handles it as usual and may drop it if it
   has complete directory information indicating that the target is not
   attached to the TRILL campus. In such an environment with complete
   directory information, the ingress RBridge doesn't flood or forward
   the received data frames when the DA in the Ethernet data frames is
   unknown.

   When all nodes attached to an ingress RBridge can pre-encapsulate
   with a TRILL header for traffic across the TRILL domain, the ingress
   RBridge does not need to encapsulate any native Ethernet frames to
   the TRILL domain.
   The attached nodes can be connected to multiple edge RBridges by
   having multiple ports or by a bridged LAN. All RBridge edge ports
   connected to one bridged LAN can receive and forward
   pre-encapsulated traffic, which can greatly improve the overall
   network utilization. However, it is still necessary to designate AF
   ports, for example, to ensure that multi-destination packets from
   the TRILL campus are egressed through only one RBridge.

   The TRILL base protocol specification [RFC6325] Section 4.6.2 Bullet
   8 specifies that an RBridge port can be configured to accept TRILL
   encapsulated frames from a neighbor that is not an RBridge.

   When a TRILL frame arrives at an RBridge whose nickname matches the
   destination nickname in the TRILL header of the frame, the
   processing is exactly the same as normal, i.e., as specified in
   [RFC6325], the RBridge decapsulates the received TRILL frame and
   forwards the decapsulated frame to the target attached to its edge
   ports. When the DA of the decapsulated Ethernet frame is not in the
   egress RBridge's local MAC attachment tables, the egress RBridge
   floods the decapsulated frame to all attached links in the frame's
   VLAN, or drops the frame (if the egress RBridge is configured with
   that policy).

   We call a node that, as specified herein, only performs the TRILL
   encapsulation, but doesn't participate in RBridge's IS-IS routing, a
   TRILL Encapsulating node (TRILL-EN). The TRILL Encapsulating Node
   can get the MAC&VLAN <-> Edge RBridge mapping pulled from directory
   servers [RFC8171]. In order to do this, a TRILL-EN MUST support
   TRILL ES-IS [RFC8171].

   Upon receiving a native Ethernet frame, the TRILL-EN checks the
   MAC&VLAN <-> Edge RBridge mapping and performs the corresponding
   TRILL encapsulation if the mapping entry is found.
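
   As an illustration only, and not as part of this specification, the
   lookup step above can be sketched in Python. The names mapping,
   pull, handle_native_frame, and forward_on_miss are hypothetical.
   The 6-byte header layout follows [RFC6325], the use of the AF
   nickname as the ingress nickname follows Section 4, and the default
   hop count is an assumption. On a miss, the frame is dropped or
   forwarded in native form, chosen here by a local policy flag.

```python
import struct
from typing import Callable, Dict, Optional, Tuple

MAX_HOP_COUNT = 0x3F       # Hop Count is a 6-bit field [RFC6325]

Key = Tuple[str, int]      # (destination MAC, VLAN ID)


def trill_header(egress: int, ingress: int,
                 hops: int = MAX_HOP_COUNT) -> bytes:
    """Pack the 6-byte base TRILL header; V, R, M, Op-Length are zero."""
    first16 = hops & 0x3F  # low 6 bits of the first 16-bit word
    return struct.pack("!HHH", first16, egress, ingress)


def handle_native_frame(
    dst_mac: str,
    vlan: int,
    mapping: Dict[Key, int],
    af_nickname: int,
    pull: Optional[Callable[[str, int], Optional[int]]] = None,
    forward_on_miss: bool = True,
) -> Tuple[str, Optional[bytes]]:
    """Return (action, TRILL header or None) for one native frame."""
    key = (dst_mac.lower(), vlan)
    egress = mapping.get(key)
    if egress is None and pull is not None:
        egress = pull(*key)        # hypothetical Pull Directory query
        if egress is not None:
            mapping[key] = egress  # cache the positive reply
    if egress is None:
        # No mapping and no positive reply: local policy decides.
        return ("forward_native" if forward_on_miss else "drop", None)
    return ("encapsulate", trill_header(egress, af_nickname))


if __name__ == "__main__":
    table = {("aa:bb:cc:00:00:01", 10): 0x1234}
    print(handle_native_frame("AA:BB:CC:00:00:01", 10, table, 0x00AB))
    print(handle_native_frame("aa:bb:cc:00:00:02", 10, table, 0x00AB))
```

   The sketch builds only the TRILL header; the outer Ethernet header
   and the transmission of the frame are left out.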
   If the destination address and VLAN of the received Ethernet frame
   do not exist in the mapping table and there is no positive reply
   from pull requests to a directory, the Ethernet frame is dropped or
   forwarded in native form to an edge RBridge.

   +------------+--------+---------+---------+--+-------+---+
   |OuterEtherHd|TRILL HD| InnerDA | InnerSA |..|Payload|FCS|
   +------------+--------+---------+---------+--+-------+---+
            ^
            |            |                      |
            |
            |
            |      +-------+  TRILL    +------+
            |      |  R1   |-----------|  R2  |  Decapsulate
            |      +---+---+  domain   +------+  TRILL header
            |          |                   |
            +----------|                   |
                       |                   |
                    +-----+             +-----+
   Non-RBridge node:|T12  |             | T22 |
   Encapsulate TRILL+-----+             +-----+
   Header for data
   Frames to traverse
   TRILL domain.

                Figure 2. Data frames from TRILL-EN

4. Source Nickname in Encapsulation by Non-RBridge Nodes

   The TRILL header includes a Source RBridge's Nickname (ingress) and
   a Destination RBridge's Nickname (egress). When a TRILL header is
   added to a data packet by a TRILL-EN, the Ingress RBridge nickname
   field in the TRILL header is set to a nickname of the AF for the
   data packet's VLAN. The TRILL-EN determines the AF by listening to
   IS-IS Hellos from the edge RBridges on the link with the TRILL-EN in
   the same way that the RBridges on the link determine the AF
   [RFC8139]. The TRILL-EN is free to send the encapsulated data frame
   to any of the edge RBridges on its link.

5. Benefits of Non-RBridge Performing TRILL Encapsulation

5.1. Avoid Nickname Exhaustion Issue

   For a large Data Center with hundreds of thousands of virtualized
   servers, setting the TRILL boundary at the servers' virtual switches
   will create a TRILL domain with hundreds of thousands of RBridge
   nodes, which raises the issues of TRILL nickname exhaustion and
   challenges for IS-IS.
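
   The scale problem can be made concrete with a short calculation,
   offered as a sketch only. The 16-bit nickname space and the reserved
   values 0x0000 and 0xFFC0 through 0xFFFF come from [RFC6325]. The
   server count of 300,000, the figure of 40 servers per edge switch,
   and the simplification of one nickname per RBridge are illustrative
   assumptions.

```python
NICKNAME_BITS = 16
RESERVED_LOW = 1                   # 0x0000: nickname not specified
RESERVED_HIGH = 0x10000 - 0xFFC0   # 0xFFC0 through 0xFFFF are reserved


def usable_nicknames() -> int:
    """Number of nickname values an RBridge can actually hold."""
    return (1 << NICKNAME_BITS) - RESERVED_LOW - RESERVED_HIGH


def fits(rbridges: int) -> bool:
    """True if every RBridge can get one distinct nickname (assumed)."""
    return rbridges <= usable_nicknames()


servers = 300_000              # assumed: one vSwitch RBridge per server
servers_per_edge_switch = 40   # assumed fan-out of an edge switch

print(usable_nicknames())                        # 65471
print(fits(servers))                             # False
print(fits(servers // servers_per_edge_switch))  # True
```

   Under these assumptions, a boundary at the virtual switches cannot
   be numbered at all, while a boundary at 7,500 edge switches leaves
   most of the nickname space unused.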
   On the other hand, setting the TRILL boundary at aggregation
   switches that have many virtualized servers attached can limit the
   number of RBridge nodes in a TRILL domain, but introduces the issue
   of a very large MAC&VLAN <-> Edge RBridge mapping table to be
   maintained by RBridge edge nodes.

   Allowing non-RBridge nodes to pre-encapsulate data frames with a
   TRILL header makes it possible to have a TRILL domain with a
   reasonable number of RBridge nodes in a large data center. All the
   TRILL-ENs attached to one RBridge are represented by one TRILL
   nickname, which can avoid the nickname exhaustion problem.

5.2. Reduce MAC Tables for Switches on Bridged LANs

   When hosts in a VLAN (or subnet) span across multiple RBridge edge
   nodes and each RBridge edge has multiple VLANs enabled, the switches
   on the bridged LANs attached to the RBridge edge are exposed to all
   MAC addresses among all the VLANs enabled.

   For example, for an access switch with 40 physical servers attached,
   where each server has 100 VMs, there are 4000 hosts under the access
   switch. If hosts/VMs can indeed be moved anywhere, the worst case
   for the access switch is when all those 4000 VMs belong to different
   VLANs, i.e., the access switch has 4000 VLANs enabled. If each VLAN
   has 200 hosts, this access switch's MAC table potentially has
   200*4000 = 800,000 entries.

   If the virtual switches on servers pre-encapsulate the data frames
   destined for hosts attached to other RBridge edge nodes, the outer
   MAC DA of those TRILL encapsulated data frames will be the MAC
   address of a local RBridge edge, i.e., the ingress RBridge.
   Therefore, the switches on the local bridged LAN don't need to keep
   the MAC entries for remote hosts attached to other edge RBridges.

   But the traffic from nodes attached to other RBridges is
   decapsulated and has the true source and destination MACs.
   One simple way to prevent local bridges from learning remote hosts'
   MACs and adding them to their MAC tables, if that is a problem, is
   to disable this data plane learning on local bridges. The local
   bridges can be pre-configured with MAC addresses of local hosts with
   the assistance of a directory. The local bridges can always send
   frames with an unknown destination to the ingress RBridge. In an
   environment where a large number of VMs are instantiated in one
   server, the number of remote MAC addresses could be very large. If
   it is not feasible to disable learning and pre-configure MAC tables
   for local bridges, one effective method to minimize local bridges'
   MAC table size is to use the server's MAC address to hide the MAC
   addresses of the attached VMs. That is, the server acting as an edge
   node uses its own MAC address in the Source Address field of the
   packets originated from an embedded host (or VM). When the Ethernet
   frame arrives at the target edge node (the egress), the target edge
   node can send the packet to the corresponding destination host based
   on the packet's IP address. Very often, the target edge node
   communicates with the embedded VMs via a layer 2 virtual switch. In
   this case, the target edge node can construct the proper Ethernet
   header with the assistance of the directory. The information from
   the directory includes the proper host IP to MAC mapping
   information.

6. Manageability Considerations

   Directory assistance [RFC8171] is required for a non-TRILL node to
   pre-encapsulate packets destined towards remote RBridges.

7. Security Considerations

   For Pull Directory and TRILL ES-IS security considerations, see
   [RFC8171].

   For general TRILL security considerations, see [RFC6325].

8. IANA Considerations

   This document requires no IANA actions. RFC Editor: please remove
   this section before publication.

Normative References

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119,
             DOI 10.17487/RFC2119, March 1997.

   [RFC6325] Perlman, R., Eastlake 3rd, D., Dutt, D., Gai, S., and A.
             Ghanwani, "Routing Bridges (RBridges): Base Protocol
             Specification", RFC 6325, DOI 10.17487/RFC6325, July 2011.

   [RFC7176] Eastlake 3rd, D., Senevirathne, T., Ghanwani, A., Dutt,
             D., and A. Banerjee, "Transparent Interconnection of Lots
             of Links (TRILL) Use of IS-IS", RFC 7176,
             DOI 10.17487/RFC7176, May 2014.

   [RFC8139] Eastlake 3rd, D., Li, Y., Umair, M., Banerjee, A., and F.
             Hu, "Transparent Interconnection of Lots of Links (TRILL):
             Appointed Forwarders", RFC 8139, DOI 10.17487/RFC8139,
             June 2017.

   [RFC8171] Eastlake 3rd, D., Dunbar, L., Perlman, R., and Y. Li,
             "Transparent Interconnection of Lots of Links (TRILL):
             Edge Directory Assistance Mechanisms", RFC 8171,
             DOI 10.17487/RFC8171, June 2017.

Informative References

   [RFC7067] Dunbar, L., et al., "Directory Assistance Problem and
             High-Level Design Proposal", RFC 7067, November 2013.

Acknowledgments

   The following are thanked for their contributions:

      Igor Gashinsky

   The document was prepared in raw nroff. All macros used were defined
   within the source file.

Authors' Addresses

   Linda Dunbar
   Huawei Technologies
   5340 Legacy Drive, Suite 175
   Plano, TX 75024, USA

   Phone: +1-469-277-5840
   Email: linda.dunbar@huawei.com

   Donald Eastlake
   Huawei Technologies
   155 Beaver Street
   Milford, MA 01757 USA

   Phone: +1-508-333-2270
   Email: d3e3e3@gmail.com

   Radia Perlman
   Dell/EMC
   2010 256th Avenue NE, #200
   Bellevue, WA 98007 USA

   Email: Radia@alum.mit.edu

Copyright, Disclaimer, and Additional IPR Provisions

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document. Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License. The definitive
   version of an IETF Document is that published by, or under the
   auspices of, the IETF. Versions of IETF Documents that are published
   by third parties, including those that are translated into other
   languages, should not be considered to be definitive versions of
   IETF Documents. The definitive version of these Legal Provisions is
   that published by, or under the auspices of, the IETF. Versions of
   these Legal Provisions that are published by third parties,
   including those that are translated into other languages, should not
   be considered to be definitive versions of these Legal Provisions.
   For the avoidance of doubt, each Contributor to the IETF Standards
   Process licenses each Contribution that he or she makes as part of
   the IETF Standards Process to the IETF Trust pursuant to the
   provisions of RFC 5378. No language to the contrary, or terms,
   conditions or rights that differ from or are inconsistent with the
   rights and licenses granted under RFC 5378, shall have any effect
   and shall be null and void, whether published or posted by such
   Contributor, or included with or in such Contribution.