Network Working Group                                          H. Grover
Internet-Draft                                                    D. Rao
Intended status: Standards Track                            D. Farinacci
Expires: August 27, 2013                                       V. Moreno
                                                           Cisco Systems
                                                       February 23, 2013

                    Overlay Transport Virtualization
                          draft-hasmit-otv-04

Abstract

   In today's networking environment most enterprise networks span
   multiple physical sites. Overlay Transport Virtualization (OTV)
   provides a scalable solution for L2/L3 connectivity across different
   sites using the currently deployed service provider and enterprise
   networks. It is a cost-effective and simple solution requiring
   deployment of one or more OTV functional devices at each of the
   enterprise sites. This solution is agnostic to the technology used
   in the service provider network and to the connectivity between the
   enterprise and the service provider network. This document provides
   an overview of this technology.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on August 27, 2013.

Copyright Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.
   Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Overview
     1.1.  Terminology
   2.  Control Plane
     2.1.  Provider Control Plane
     2.2.  Overlay Control Plane
       2.2.1.  Edge Device Discovery and Adjacency setup
       2.2.2.  Extended VLANs
       2.2.3.  Multiple Instances
       2.2.4.  Advertising Unicast MAC Routes
       2.2.5.  Advertising Multicast Routes
       2.2.6.  Adjacency Server
     2.3.  Connecting an Edge Device to the Overlay
       2.3.1.  Edge Devices as MAC Routers
       2.3.2.  Internal Interface Behavior
       2.3.3.  Overlay Interface Behavior
   3.  Data Plane
     3.1.  Encapsulation
     3.2.  Forwarding Process
       3.2.1.  Forwarding between Internal Links
       3.2.2.  Forwarding from an Internal Link to the Overlay
       3.2.3.  Forwarding from the Overlay to an Internal Link
       3.2.4.  Unicast Packet Flows
       3.2.5.  Unknown Unicast Packet Handling
       3.2.6.  Multicast Packet Flows
       3.2.7.  Broadcast Packet Flows
     3.3.  STP BPDU Handling
   4.  MAC Address Mobility
   5.  Multi-homing
     5.1.  Authoritative Edge Device Selection
     5.2.  Site Identifier
   6.  IS-IS as an Overlay Control Protocol
   7.  Acknowledgements
   8.  Security Considerations
   9.  IANA Considerations
   10. Normative References
   Authors' Addresses

1.  Overview

   OTV is a new "MAC in IP" technique for supporting L2 VPNs over an
   L2/L3 infrastructure. OTV provides an "over-the-top" method of doing
   virtualization among a large number of sites where the routing and
   forwarding state is maintained at the network edges, but not within
   the site or in the core.

   OTV can be deployed incrementally and resides in a small number of
   devices at the edge between sites and the core. We call these
   devices "Edge Devices". They perform typical layer-2 learning and
   forwarding functions on their site-facing interfaces (internal
   interfaces) and IP-based virtualization functions on their
   core-facing interfaces (over which an overlay network is realized).

   Traditional L2VPN technologies rely heavily on tunnels. Rather than
   creating stateful tunnels, OTV encapsulates layer-2 traffic with an
   IP header ("MAC in IP") and does not create any fixed tunnels.
   Based on the IP header, traffic is forwarded natively in the core
   over which OTV is being deployed.
   This is an important feature, as
   the native IP treatment of the encapsulated packet allows optimal
   multi-point connectivity as well as optimal broadcast and multicast
   forwarding, plus any other benefits the routed core may provide to
   native IP traffic. OTV virtualization is independent of the
   technology deployed in the core; the core network may be a layer-2
   metro Ethernet core, a layer-3 IP network core, or an MPLS network
   core.

   Layer-2 traffic that must traverse the overlay to reach its
   destination is prepended with an IP header, which ensures the packet
   is delivered to the edge boxes that provide connectivity to the
   Layer-2 destination in the original MAC header. As shown in Figure
   1, if a destination is reachable via Edge Device X2 (with a
   core-facing IP address of IPB), other Edge Devices forwarding traffic
   to that destination will add an IP header with a destination IP
   address of IPB and forward the traffic into the core. The core will
   forward traffic based on IP address IPB. Once the traffic reaches
   Edge Device X2, it is stripped of the overlay IP header and
   forwarded into the site in the same way a regular bridge would
   forward a packet at layer-2. Broadcast or multicast traffic is
   encapsulated with a multicast header and follows a similar process.

   +----+                                                       +----+
   | H1 |-------               ------------              -------| H2 |
   +----+       \             /            \            /       +----+
                 \+----+IPA  /   L3 Core    \ IPB+----+/
         ---------| X1 |----<                >---| X2 |--------
                 /+----+     \   Network    /    +----+\
                /             \            /            \
                               ------------

                              +------------+
                              | DA = IPB   |
                              +------------+
                              | SA = IPA   |
   +-----------+              +------------+             +-----------+
   | DMAC = H2 |              | DMAC = H2  |             | DMAC = H2 |
   +-----------+              +------------+             +-----------+
   | SMAC = H1 |              | SMAC = H1  |             | SMAC = H1 |
   +-----------+              +------------+             +-----------+
   | VLAN-ID   |              | VLAN-ID    |             | VLAN-ID   |
   +-----------+              +------------+             +-----------+
   | Payload   |              | Payload    |             | Payload   |
   +-----------+              +------------+             +-----------+

   Figure 1. Traffic flow from H1 to H2 with encapsulation in the core.

   The key piece that OTV adds is the state to map a given destination
   MAC address in the L2 VPN to an IP address of the OTV Edge Device
   behind which that MAC address is located. OTV forwarding is a
   function of mapping a destination MAC address in the VPN site to an
   Edge Device IP address in the overlay network.

   To achieve all this, a control plane is required to exchange the
   reachability information among the different OTV Edge Devices. We
   will refer to this control plane as the oURP and oMRP (Overlay
   Unicast Routing Protocol and Overlay Multicast Routing Protocol).
   OTV does not flood unknown unicast traffic among Edge Devices and
   therefore precludes data-plane learning on the "overlay interface".
   Data-plane learning continues to happen on the "internal interfaces"
   to provide compatibility and transparency within the layer-2 sites
   connecting to the OTV overlay. The Edge Devices appear to each VPN
   site to be providing L2 switched network connectivity amongst those
   sites.
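
   The MAC-to-IP mapping described above can be pictured as a lookup
   table keyed by (VLAN, destination MAC). The following Python sketch
   is purely illustrative and is not part of OTV: the names NextHop,
   X1_TABLE and lookup, the interface names "E1" and "Overlay1", and the
   VLAN number 10 are all assumptions made for the example. Hosts H1 and
   H2 and the address IPB are taken from Figure 1. The sketch models
   only the decision itself: bridge locally, encapsulate toward a remote
   Edge Device, or (for an unknown destination) send nothing onto the
   overlay.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class NextHop(NamedTuple):
    """One MAC table entry (hypothetical structure for illustration)."""
    interface: str                 # internal interface or overlay interface name
    edge_ip: Optional[str] = None  # remote Edge Device IP; None if locally attached


# Assumed contents of the table at Edge Device X1 for a single VLAN.
# VLAN 10 and the interface names are arbitrary choices for this sketch.
X1_TABLE: Dict[Tuple[int, str], NextHop] = {
    (10, "H1"): NextHop("E1"),               # learned in the data plane
    (10, "H2"): NextHop("Overlay1", "IPB"),  # learned from the overlay control plane
}


def lookup(table: Dict[Tuple[int, str], NextHop],
           vlan: int, dmac: str) -> Optional[str]:
    """Return a description of the forwarding action, or None if unknown."""
    hop = table.get((vlan, dmac))
    if hop is None:
        # Unknown unicast is not flooded among Edge Devices.
        return None
    if hop.edge_ip is None:
        return f"bridge out {hop.interface}"
    return f"encapsulate to {hop.edge_ip} via {hop.interface}"
```

   For example, lookup(X1_TABLE, 10, "H2") selects encapsulation toward
   IPB, while a destination that was never advertised yields None and is
   therefore not sent over the overlay.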

   This document describes the use of IS-IS as an IGP capable of
   carrying MAC unicast and multicast addresses as well as IP multicast
   group addresses, thereby serving as both the oURP and oMRP. However,
   any other suitable routing protocol can be used as the OTV control
   protocol. The information carried in IS-IS LSPs will be MAC unicast
   addresses and multicast addresses with their associated VLAN IDs and
   IP next hops. The MAC addresses are those of the hosts connecting to
   the network, and the IP next hops are the addresses of the Edge
   Devices through which these are reachable in the core. Figure 2
   shows what the resulting tables would look like in a simple two-site
   example.

   +----+                                                       +----+
   | H1 |-------               ------------              -------| H2 |
   +----+       \             /            \            /       +----+
               E1\+----+IPA  /   L3 Core    \ IPB+----+/E1
         ---------| X1 |----<                >---| X2 |--------
                 /+----+     \   Network    /    +----+\
                / Overlay1    \            /Overlay1    \
                               ------------

               At X1                            At X2
   +----------------------------+  +----------------------------+
   | Destination | Interface/NH |  | Destination | Interface/NH |
   |----------------------------|  |----------------------------|
   | H1          | E1           |  | H1          | Overlay1:IPA |
   | H2          | Overlay1:IPB |  | H2          | E1           |
   +----------------------------+  +----------------------------+

                    Figure 2. OTV Forwarding Tables.

   Edge Devices will have an IP address reachable through their
   core-facing interface(s), and these nodes join a configured ASM/Bidir
   multicast group in the core transport network. The core or the
   provider network relies on a provider Unicast Routing Protocol (pURP)
   and a provider Multicast Routing Protocol (pMRP) to connect the Edge
   Devices to one another. It is not strictly required that the Edge
   Devices participate in the pURP/pMRP. They typically connect as
   hosts to the core network. This is compatible and consistent with
   today's interconnection policies.
   However, the solution also
   supports the scenario where the Edge Devices do actively participate
   at Layer-3 in the pURP/pMRP.

   The multicast group that the Edge Devices join is referred to as the
   "Provider Multicast Group (pMG)". The pMG will be used for Edge
   Devices to become adjacent with each other to exchange their IS-IS
   Hellos, LSPs and CSNPs. Thus, by virtue of the pMG, all Edge Devices
   will see each other as if they were directly connected to the same
   multi-access multicast-capable segment for the purposes of IS-IS
   peering. The pMG also defines a VPN; thus, when an Edge Device joins
   a pMG the site becomes part of a VPN. Multiple pMGs can be defined
   to define multiple VPNs.

   The pMG can also be used to broadcast data traffic to all Edge
   Devices when necessary. Broadcast transmission will not incur
   head-end replication overhead. OTV allows the pMRP to efficiently
   distribute broadcast traffic by means of the provider ASM/Bidir
   group.

   When forwarding of VPN multicast is required, new multicast state
   will be used in order to tailor the distribution trees to the optimal
   group of receivers; these multicast groups are to be created in the
   provider control plane (pMRP). For instance, SSM multicast can be
   used in the core by having the Edge Device join a {source, group}
   pair through IGMPv3/MLDv2.

   Edge Devices must combine data-plane learning on their bridged
   internal interfaces with control-plane learning on their overlay
   interfaces. The key to this combination is a series of rules through
   which data-plane events can trigger control-plane advertisements
   and/or learning events.

   OTV supports L2 multi-homing for sites where one or more of the
   bridge domains may be connected to multiple Edge Devices. It
   supports both active-backup and active-active multi-homing
   capabilities to sites.
   OTV provides loop elimination for multi-homed
   "sites" and does not require the extension of STP across sites. This
   means each site can run its own STP rather than having to create one
   large STP domain across sites.

1.1.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119.

   Site - A Site is a single or multi-homed connected network which
      is typically under the control of a single organization. Sites
      are connected together via Edge Devices that operate in an overlay
      network. The Edge Devices provide layer-2 connectivity among the
      sites. A site will not be used by IS-IS as a transit network. A
      layer-2 site is one that is mostly made up of hosts and switches.
      Routers may exist but the majority of the topology to the Edge
      Devices is L2 switched. The MAC addresses advertised on the
      overlay network are all the hosts and routers connected to the L2
      devices at the site. The site typically has several VLANs or
      bridging domains being actively used. A layer-3 site is one that
      is mostly made up of routers connecting to hosts via switches.
      The majority of the topology to the Edge Devices is L3 routed.
      The number of MAC addresses advertised on the overlay network is
      limited to the router devices at the site.

   VPN - A VPN is a collection of sites which are controlled by a
      single administration. The addressing plan and the router and
      switch configuration are consistent, as they would be if the sites
      were physically at the same location. There is one overlay network
      per VPN which connects all sites. Each VPN uses a dedicated
      ASM/Bidir provider multicast group allocated by the core network,
      which provides the separation from other VPNs for the control
      plane, as well as in the data plane.

   Edge Device - A modified L2 switch that performs OTV functions.
      It will run as an L2 device on the site side, but performs L3
      functions on the core-facing interfaces. When OTV functionality
      is described, this functionality only occurs in an Edge Device.

   Internal Interface - These are Layer-2 interfaces connected to
      site-based switches or site-based routers. The internal interface
      is layer-2 regardless of whether it connects to a switch or a
      router.

   Overlay Interface - This is a logical multi-access
      multicast-capable interface. The overlay interface can replicate
      broadcast and multicast packets efficiently. The overlay interface
      provides an IP unicast or multicast encapsulation for L2 frames
      transmitted from the site. The overlay interface is realized by
      one or more physical core-facing interfaces. The core-facing
      interfaces are assigned IP addresses out of the core provider's
      address space.

   MAC Table - This is a forwarding table of 48-bit MAC addresses.
      The table can contain unicast or multicast MAC addresses. The
      table is populated by two sources: traditional data-plane
      learning on internal interfaces, and the URP/MRP at the
      control plane on the overlay interface. A MAC table is
      scoped by VLAN, therefore allowing the same MAC address to be used
      in different VLANs, and potentially in different VPNs.

   Authoritative Edge Device (AED) - This is an Edge Device that
      forwards Layer-2 frames in and out of a site from and to the
      overlay interface. Depending on the multi-homing granularity in
      use, there will be a single AED in the site for a given VLAN or
      for a given MAC-level flow.

   Site-ID - Each Edge Device which resides in an OTV site will
      advertise over the overlay network the same site-id. The site-id
      may be determined dynamically or by static configuration.

   (VLAN, uMAC) - This is the designation of layer-2 network
      reachability information as encoded in the URP and as stored in
      the MAC table. This notation describes a given unicast MAC
      address within a particular VLAN.

   (VLAN, mMAC, mIP) - This is the designation of layer-2 network
      reachability information as encoded in the MRP and as stored in
      the MAC/IP table. This notation describes a given multicast
      MAC/IP address within a particular VLAN. The 'mIP' part of the
      3-tuple is provided so both Layer-2 switching and the SSM based
      tree joins can occur based on the IP group address (since 32-to-1
      aliasing can happen for IPv4 group address to MAC mappings and
      worse for IPv6).

2.  Control Plane

   This section discusses the control plane hierarchy. At the very base
   of the hierarchy we find the provider control plane, which enables
   unicast reachability among the edge boxes and also provides the
   multicast group that makes edge boxes adjacent from the overlay
   control plane perspective. The provider control plane also provides
   the multicast trees in the core that will be used for optimal
   forwarding of the layer-2 site data traffic.

   At the next level, the overlay control plane provides discovery of
   the Edge Devices that are part of the overlay and conveys
   client-MAC-address reachability and client-multicast group
   information between the edge devices.

   In general, the control planes are independent of each other.
   However, in order to optimize multicasting, multicast control-plane
   events (reports, joins, leaves) that occur in one MRP may initiate
   events in another MRP so that the optimal tree is always being used
   to forward traffic. Also, events in the overlay control plane are
   triggered by forwarding events in the client data plane (however both
   client and overlay control planes remain independent of each other).

   |<------------------------ cURP/cMRP ------------------------>|
   |                                                             |
   |                                                             |
   |              |<--------- oURP/oMRP --------->|              |
   |              |                               |              |
   |              |                               |              |
   |              |     |<--- pURP/pMRP -->|      |              |
   |              |     |      (pMG)       |      |              |
   |              |     |                  |      |              |
   |              |     |                  |      |              |
   +----+    +--+ |     |                  |      | +--+    +----+
   | R1 |----|S1| |     |   ------------   |      | |S2|----| R2 |
   +----+    +--+ |     |  /            \  |      | +--+    +----+
             \+----+IPA | /   L3 Core    \ | IPB+----+/
        ------| X1 |-----<                >-----| X2 |-----
             /+----+      \   Network    /      +----+\
                           \            /
                            ------------

                Figure 3. OTV Control Plane Hierarchy

2.1.  Provider Control Plane

   The provider control plane is the set of routing protocols which run
   in the core infrastructure to be able to deliver packets sourced from
   the site networks. There is no required coordination of routing
   protocols between the site and the core beyond what is typically
   necessary to connect to a core service. In terms of addressing, the
   Edge Device is allocated an IP address out of the core block of
   addresses.

   For each VPN the Edge Device is to support, at least one multicast
   group must be allocated from the provider core. This multicast
   group is typically ASM/BiDir. In addition, the multicast
   state created in the client site network will map to some amount of
   state in the core network. However, it is not required to provision
   a unique group for every client data group. The Edge Device takes a
   client multicast packet and encapsulates it in a core-deliverable
   multicast packet.

2.2.  Overlay Control Plane

   The overlay control plane provides auto-discovery of the Edge Devices
   that are members of an Overlay VPN. It also conveys Layer-2 unicast
   and multicast reachability information from a site to Edge Devices in
   other sites, along with the VLANs or layer-2 bridge domains being
   extended.

   The MAC addresses that are locally connected to an Edge Device are
   advertised in the overlay URP to other Edge Devices in the VPN.
   Thus, MAC learning on the overlay is not based on data plane
   flooding, but is based on explicit advertisements of MAC addresses
   done by the overlay control plane. Similarly, the multicast groups
   that a site has receivers or sources for are advertised in the
   overlay MRP to other Edge Devices in the VPN.

2.2.1.  Edge Device Discovery and Adjacency setup

   The overlay URP establishes adjacencies only between Edge Devices
   that are in the same VPN. Edge Devices become part of a VPN when
   they join a multicast group defined in the core (provider MRP);
   devices using the same group are members of the same VPN. Thus, the
   adjacency setup provides a very simple mechanism to automatically
   discover members of the VPN. The hellos and updates between
   overlay-URP peers travel over the multicast group defined in the
   pMRP. Thus, Edge Devices peer with each other as if they were
   directly connected at layer-2. This peering is possible as all the
   traffic for the oURP is encapsulated with the pMRP group address and
   sent into the core. Thus, all Edge Devices in a given VPN receive the
   oURP multicast traffic as if they were all on the same segment.
   Similarly, the overlay MRP packets are encapsulated with the pMRP
   group address corresponding to the VPN. The overlay MRP is used to
   inform all the Edge Devices that the subscribers to a particular
   group are reachable over the overlay network.

   An Edge Device can support multiple overlay VPNs. Each overlay has
   its own dedicated provider-multicast group address and a distinct set
   of adjacencies. There may be multiple overlay adjacencies between
   the same set of Edge Devices, or the membership may be disjoint for
   each overlay.

2.2.2.  Extended VLANs

   Each overlay extends a set of VLANs or layer-2 bridge
   domains among the member sites. On a given Edge Device, a set of
   VLANs is uniquely extended on a specific overlay. Other VLANs may be
   extended on other overlays. This entails both advertising and
   accepting information in the control plane such as VLANs and their
   associated MAC and group information, as well as forwarding unicast,
   multicast and broadcast traffic for these VLANs.

   To let large L2 sites be connected over the overlay in a scalable
   way, an Edge Device by default does not advertise information for
   any VLAN. To avoid inadvertent merging of VLANs among sites, the
   VLANs for which an Edge Device advertises reachability information
   must be configured explicitly on that Edge Device.

2.2.3.  Multiple Instances

   An Edge Device may support bridging of multiple distinct layer-2
   domains with overlapping VLANs which are to be treated as distinct.
   These VLANs may be extended on the overlay by treating them as
   separate instances both in advertising control plane information and
   while forwarding in the data plane. A single overlay VPN can support
   more than one instance among the Edge Devices in that overlay.

2.2.3.1.  VLAN to Instance Mapping

   The OTV encapsulation header as specified in this document contains a
   24-bit Instance ID. This Instance ID can be used in two different
   modes:

   1. It distinguishes between multiple distinct 12-bit VLAN domains
   being extended across the overlay from a site. Each such domain is
   assigned an Instance-ID. In this mode, the "inner" VLANs are
   preserved within the 802.1Q header in the OTV payload. The
   combination of the Instance-ID and the inner VLAN uniquely identifies
   a single Layer-2 broadcast domain.

   2. At the OTV Edge Device, a local mapping function maps a 12-bit
   VLAN to a unique 24-bit Instance-ID before sending the encapsulated
   packets on the overlay. In this case, the inner 802.1Q header is
   stripped before sending the encapsulated packets on the overlay. The
   Instance ID uniquely identifies a Layer-2 broadcast domain.

2.2.4.  Advertising Unicast MAC Routes

   When a MAC address is learned by arrival of a data packet on an
   internal interface, the Edge Device advertises the MAC address on the
   overlay URP. In addition to conveying the MAC address reachability
   to other Edge Devices, it also provides a mapping to one of the IP
   addresses of the advertising Edge Device; i.e., the IP next-hop and
   encapsulation for that MAC address. Typically, even if a site is
   multi-homed, a unicast MAC address is advertised by a single Edge
   Device, namely the Authoritative Edge Device. Hence, remote Edge
   Devices will see a single path to reach a given MAC address.
   However, when active-active multihoming is being used, there will be
   equal-cost paths to reach a MAC address in a site and the sender Edge
   Device will load-balance flows among the paths.

2.2.5.  Advertising Multicast Routes

   An Edge Device learns about the multicast groups that hosts in the
   site are interested in by snooping IGMP/MLD reports on the internal
   interfaces. When a multicast MAC or group address is learned, the
   Edge Device notifies other Edge Devices about it by placing a
   (VLAN, mMAC, mIP) entry in a multicast control PDU. Thus, the overlay
   MRP informs all the Edge Devices that the subscribers to a particular
   group are reachable over the overlay network. This information is
   used by Edge Devices to populate their multicast oif-list at the
   source site.
   As long as there is one site that has a receiver for a
   multicast group, the Edge Devices at the source site will forward
   traffic for that group onto the overlay. Edge Devices at the
   receiving sites will also join the corresponding multicast group in
   the provider plane (pMRP). Thus, multicast trees are built natively
   in the core, not on the overlay, and provide optimal delivery of
   multicast data.

2.2.5.1.  Delivery Groups

   Delivery groups are multicast groups used in the core network to
   transport site multicast traffic. Multicast data for various
   customer data groups is aggregated into a typically smaller set of
   core multicast trees, without requiring extensive coordination
   between OTV edge boxes. Delivery group selection is centralized at
   each source OTV Edge Device, which controls the mapping of a (S,G) to
   a (DS,DG). It exports this mapping to other Edge Devices so that
   they can join the (DS,DG) in the core. Link-local site multicast
   groups may also map to a specific delivery group instead of the
   provider multicast group used for control packets. Delivery group
   mapping gives the customer sites and the provider a fair amount of
   flexibility in deciding the tradeoff between control of state and
   bandwidth in the core.

   When a receiver site Edge Device learns a (S,G) to (DS,DG) mapping,
   it joins the (DS,DG) tree in the core. As an optimization, this
   join may be done only if there are local receivers for the group. It
   also installs a layer-3 multicast route for (DS,DG) to decapsulate
   incoming packets with the appropriate core uplink interface as the
   RPF interface.

2.2.5.2.  Active Source Discovery

   An OTV Edge Device will advertise a delivery group mapping for a
   (*,G) or (S,G) route only when there is an active source sending data
   in its site.
   To do so, the Edge Device learns the active sources
   by snooping multicast data received on the internal interfaces. If a
   remote receiver is interested in this group, a (VLAN, S,G) entry is
   installed with the overlay as an OIF and the (DS,DG) as outer
   encapsulation. When IGMP/MLD is being used on the core uplink, the
   (DS,DG) encapsulated packet may be emitted directly on the uplink
   interface. The first-hop router on the other end of the core uplink
   will then forward this packet along the core multicast tree.

2.2.6.  Adjacency Server

   In case the provider core does not support ASM/Bidir multicast, there
   is an alternate mechanism to discover the remote Edge Devices which
   are part of a VPN. In this scenario, an Edge Device is configured as
   an Adjacency Server. All other Edge Devices inform the Adjacency
   Server of their reachability and capability information via
   the overlay control plane. The Adjacency Server is responsible for
   informing all the other existing Edge Devices of the addition or
   loss of an Edge Device. Based on the reachability information, the
   Edge Devices can further communicate with one another directly using
   a unicast or multicast data path.

2.3.  Connecting an Edge Device to the Overlay

   In order to successfully connect to the overlay, the Edge Device has
   several functions on its different interfaces. These are summarized
   in this section.

2.3.1.  Edge Devices as MAC Routers

   The Edge Device need not participate in the provider URP (pURP) as a
   router, but can simply behave as a host. This keeps its requirements
   and functionality simple. In this mode, the Edge Device has an IP
   address which is significant in the core/provider addressing space.
   The Edge Device joins the multicast groups in the core by issuing
   IGMPv3/MLDv2 reports, just like a host would. Thus the Edge Device
   does not have an IGP relationship with the core.
   This allows for
   simpler insertion into any type of core network.

   The Edge Device does, however, participate in the overlay URP, and
   its IP address is used as a router ID and a next-hop address for
   unicast traffic by the overlay URP. The Edge Device does not build
   an IP routing table with the information received from the oURP, but
   rather builds a hybrid table where MAC address destinations are
   reachable via IP next-hop addresses. Such a device may be termed a
   MAC router because it can route packets based on MAC addresses.

   Thus, Edge Devices are IP hosts in the provider plane, MAC routers in
   the overlay plane and bridges in the client bridging plane. It
   should be noted that Edge Devices can also support full IP routing
   functionality and participate in the pURP/pMRP as routers.

2.3.2.  Internal Interface Behavior

   The internal interfaces on an Edge Device are bridged interfaces and
   are indifferent to whether the site itself is L2 or L3. These
   interfaces behave as regular switch interfaces and learn the source
   MAC addresses of traffic they receive. Spanning tree BPDUs are
   received, processed and sourced on internal interfaces as they would
   be on a regular 802.1D, 802.1s or 802.1w switch. IGMP/MLD and data
   snooping are enabled on internal interfaces to discover local
   receivers and sources in the site. Additionally, traffic received on
   internal interfaces may trigger oURP/oMRP advertisements and/or pMRP
   group joins as described earlier.

   Traffic received on an internal interface will be forwarded according
   to the MAC and multicast tables either onto other internal interfaces
   (regular bridging) or onto the overlay (OTV forwarding). This is
   explained in detail in the Forwarding section.

2.3.3.  Overlay Interface Behavior

   An overlay interface is a logical interface which is associated with
   an IP address in the provider/core address space.
Traffic out of these interfaces is encapsulated with an IP header, and
traffic received on these interfaces must be decapsulated to produce an
L2 frame. The encapsulated packets exit the Edge Device on one or more
underlying physical or logical L3 interfaces.

STP BPDUs are not sourced from overlay interfaces; therefore, no STP
BPDUs should appear in the core, and the overlay interfaces do not
participate in the Spanning Tree Protocol.

The IP addresses assigned to the overlay interfaces are used as
next-hop addresses by the oURP; therefore, the MAC table for the
overlay interface will include a remote IP address as the next-hop
information for remote MAC addresses.

3. Data Plane

3.1. Encapsulation

The overlay encapsulation format is a Layer-2 Ethernet frame
encapsulated in UDP inside of IPv4 or IPv6.

The format of OTV UDP IPv4 encapsulation is as follows:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Version|  IHL  |Type of Service|          Total Length         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Identification        |Flags|      Fragment Offset    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Time to Live | Protocol = 17 |         Header Checksum       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|             Source-site OTV Edge Device IP Address            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    Destination-site OTV Edge Device (or multicast) Address    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Source Port = xxxx      |        Dest Port = 8472       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           UDP length          |        UDP Checksum = 0       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|R|R|R|R|I|R|R|R|                  Overlay ID                   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                  Instance ID                  |    Reserved   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
|              Frame in Ethernet or 802.1Q Format               |
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The format of OTV UDP IPv6 encapsulation is as follows:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Version| Traffic Class |              Flow Label               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Payload Length        | Next Header=17|   Hop Limit   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+                                                               +
|                                                               |
+           Source-site OTV Edge Device IPv6 Address            +
|                                                               |
+                                                               +
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+                                                               +
|                                                               |
+    Destination-site OTV Edge Device (or multicast) Address    +
|                                                               |
+                                                               +
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Source Port = xxxx      |        Dest Port = 8472       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           UDP Length          |          UDP Checksum         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|R|R|R|R|I|R|R|R|                  Overlay ID                   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                  Instance ID                  |    Reserved   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
|              Frame in Ethernet or 802.1Q Format               |
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Outer IPv4 (or IPv6) Header:

Version: Set to value 4 (or 6) in decimal.

IHL: Set to value 5 in decimal, meaning that no IP options are present
in an OTV encapsulated packet.

Type of Service/Traffic Class: The 802.1P bits from the Ethernet Frame
are copied to this field.

Total Length: The total length of the IPv4 datagram in bytes. This
includes the IPv4 header, the UDP header, the OTV header, and the L2
frame without the preamble and CRC fields.

Payload Length: The length of the IPv6 payload in bytes. This includes
the UDP header, the OTV header, and the L2 frame without the preamble
and CRC fields.

Identification: Set randomly by the OTV Edge Device.

Flags: The DF bit should be set to 1.

Time to Live/Hop Limit: Set by the OTV Edge Device; the value is
configurable.

Protocol/Next Header: Since the packet is UDP encapsulated, this field
is set to 17 decimal.

Header Checksum: Must be computed by the OTV Edge Device over the IP
header fields.

Source Address: The IPv4 (or IPv6) address of the OTV Edge Device doing
the encapsulation of the L2 frame.

Destination Address: The IPv4 (or IPv6) unicast or multicast address
set by the OTV Edge Device which is encapsulating the L2 frame. The
Edge Device decides whether the address is set to a unicast or a
multicast address.

UDP Header:

Source Port: Chosen by the OTV Edge Device which is encapsulating the
L2 frame, based on a hash of the L2 frame. This allows packets to be
load-split evenly over LAGs and ECMP links on the routers in the core
that are responsible for delivering these IP encapsulated packets.

Destination Port: This is an IANA assigned well-known user port number.
Packets encapsulated by an OTV Edge Device carry the value 8472 in the
destination port field.

UDP Length: The length in bytes of the UDP header, the OTV header, and
the L2 frame without the preamble and CRC fields.

UDP Checksum: Set to 0 by the OTV Edge Device when doing encapsulation
and ignored by the OTV Edge Device which is decapsulating at the
destination site.

OTV Header:

Flags:

'I' - Instance ID bit. When set to 1, it indicates that the Instance ID
should be used in the forwarding lookup.

'R' - Reserved bits.

Overlay ID: Used only for control plane packets such as the URP/MRP
(IS-IS) to identify packets for a specific overlay.

Instance ID: Set by the OTV Edge Device doing the encapsulation to
specify a logical table that should be used for lookup by the OTV Edge
Device at the destination site.

L2 Ethernet Frame:

The L2 Frame minus the preamble and CRC received on an internal link by
an OTV Edge Device.

The addition of OTV encapsulation headers increases the size of an L2
packet received on an internal interface such that the core uplinks on
the Edge Device as well as the routers in the core need to support an
appropriately larger MTU. OTV encapsulated packets must not get
fragmented as they traverse the core, and hence the IP header is marked
to not fragment by the Edge Device. The Edge Device drops packets that
exceed the core uplink MTU.

The following table enumerates how MAC-level packets are encapsulated
in the OTV header.

MAC-level Frame               OTV IP Encapsulation
---------------               --------------------
Unicast Frame                 IP unicast packet
Broadcast Frame               ASM/Bidir IP multicast packet
Link-local Multicast Frame    ASM/Bidir IP multicast packet
Data Multicast Frame          SSM IP multicast packet

3.2. Forwarding Process

Most of the interesting forwarding cases happen when a packet comes
from the Overlay Link to be forwarded to an Internal Link, or vice
versa. For completeness, forwarding between internal links is also
described.

3.2.1. Forwarding between Internal Links

When an Edge Device has internal links, it operates like a traditional
L2 switch. That is, it will send unicast packets on the port where the
MAC was learned; it will send multicast packets on the ports it has
IGMP/MLD-snooped; and it will send broadcast packets out all ports for
a given VLAN or Layer-2 bridge domain.

3.2.2. Forwarding from an Internal Link to the Overlay

An Edge Device will decide to forward a Layer-2 unicast, multicast, or
broadcast packet over the overlay interface when the overlay control
plane has put the logical port of the overlay interface in the
forwarding table for the corresponding unicast or multicast address.
When a packet is sent over the overlay interface, it is first prepended
with an OTV header that includes the IP address of the overlay
next-hop. The packet as received from the internal interface is not
touched other than to remove the preamble and FCS from the frame. The
IP address, outer MAC address and other encapsulation information are
all installed in the forwarding hardware by the control plane so the
OTV header can be prepended and the packet forwarded at high rate.

The Edge Device has to be eligible to forward this packet as per the
control plane, such as being the Authoritative Edge Device.
Multi-homing of sites imposes additional rules on the forwarding of
traffic as described later in this document.

3.2.3. Forwarding from the Overlay to an Internal Link

When a packet is received on the overlay interface, it will need to be
IP decapsulated to reveal the inner MAC header for forwarding. The
inner MAC header SA and DA addresses and VLAN-ID will be used for
forwarding actions. A packet of any type received on the overlay
interface is accepted only if the Edge Device is the Authoritative Edge
Device, as determined by an inspection of the received packet header.

When a unicast packet is received on the overlay interface, the outer
OTV IP header is removed, and the VLAN-ID and the MAC DA from the inner
header are used to do the MAC table lookup. From here onward, this is a
regular bridging operation, whether the MAC address entry is present or
not.
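
The unicast receive path just described can be sketched as follows.
This is a minimal Python illustration and not part of the
specification: the function names (parse_otv_header, unicast_lookup),
the dictionary-based MAC table, and the mapping of untagged frames to
VLAN 0 are assumptions made for the example, and the 24-bit Overlay ID,
24-bit Instance ID and 8-bit Reserved widths are read from the header
layout in Section 3.1. The input is the UDP payload, that is, the
packet after the outer IP and UDP headers have been removed.

```python
import struct

OTV_HDR_LEN = 8      # two 32-bit words, per the layout in Section 3.1
I_FLAG = 0x08        # 'I' bit: fifth flag bit of |R|R|R|R|I|R|R|R|
TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q tag


def parse_otv_header(payload: bytes):
    """Split a UDP payload into (overlay_id, instance_id, inner_frame).

    instance_id is None when the I bit is clear.
    """
    if len(payload) < OTV_HDR_LEN:
        raise ValueError("payload shorter than OTV header")
    word0, word1 = struct.unpack("!II", payload[:OTV_HDR_LEN])
    flags = word0 >> 24
    overlay_id = word0 & 0xFFFFFF
    # The Instance ID is used for lookup only when the I bit is set.
    instance_id = (word1 >> 8) if flags & I_FLAG else None
    return overlay_id, instance_id, payload[OTV_HDR_LEN:]


def unicast_lookup(frame: bytes, mac_table: dict):
    """Return the internal port for (VLAN, MAC DA), or None if unknown.

    A None result means the frame is handled as in regular bridging
    for an unknown destination; that step is left to the caller.
    """
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    mac_da = frame[0:6]
    (ethertype,) = struct.unpack("!H", frame[12:14])
    vlan = 0  # assumption: untagged frames map to VLAN 0 in this sketch
    if ethertype == TPID_8021Q:
        if len(frame) < 16:
            raise ValueError("truncated 802.1Q tag")
        (tci,) = struct.unpack("!H", frame[14:16])
        vlan = tci & 0x0FFF  # low 12 bits of the tag carry the VLAN-ID
    return mac_table.get((vlan, mac_da))
```

A caller would first run parse_otv_header on the received UDP payload
and then pass the inner frame to unicast_lookup, using the Instance ID
(when present) to choose which hypothetical MAC table to consult.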

When a multicast packet is received on the overlay interface, the outer
OTV IP header is removed. The VLAN-ID and inner MAC header SA and DA or
inner IP header SA and DA are used to do a Layer-2 multicast table
lookup and forward the packet on the right internal interfaces. A
multicast packet received from the overlay will not be sent back out on
the overlay.

When a broadcast packet is received on the overlay interface, the outer
OTV IP header is removed and the packet is then flooded on all internal
interfaces.

3.2.4. Unicast Packet Flows

Hosts typically generate ARP requests and learn the MAC addresses of
other hosts from ARP requests and replies. Switches learn the source
MACs from packet headers and store this state to optimally forward
traffic destined to these MACs. The OTV Edge Devices will also learn
the MACs locally on their site-facing interfaces, and will install
remote MACs received over the overlay control plane into the local MAC
table with the appropriate remote Edge Devices as next-hops.

Once these actions take place, every switch will forward the L2 packet
based on the MAC table entry. The OTV Edge Device at the source site
will also do a MAC table lookup which will yield a next-hop entry
pointing to a remote Edge Device. Once the OTV header with the IP
address is prepended, the packet is then forwarded to the destination
Edge Device at Layer-3 as a regular IP packet.

The Edge Device as well as the core routers may load-balance these
encapsulated packets among equal-cost multiple Layer-3 paths, with
packets belonging to a single Layer-2 flow being hashed to a specific
equal-cost path.

3.2.5. Unknown Unicast Packet Handling

When the switched network at an OTV site has no state for a MAC
address, it will flood the unicast packet on the spanning tree
throughout the site.
The Edge Devices are on the spanning tree (like any other switch at the
site) so they will receive these unknown unicast packets.

It is imperative that the Edge Devices hold previously learned MAC
addresses for an extended period of time so that remote Edge Devices
can get reachability to these local MACs. The cache timers will
therefore be longer than the traditional MAC aging timers on switches.
In fact, the Edge Device MAC aging timers generally need to be greater
than the ARP request interval from any host. Either an unknown unicast
flood or a broadcast packet could cause an update of the MAC entries in
the Edge Device. When MACs go inactive, an Authoritative Edge Device
must withdraw the MAC address from the overlay control plane. Traffic
to these unknown destinations will not be forwarded onto the overlay.
Thus, OTV does not flood unknown unicasts. In an OTV network, unknown
destinations become known the moment the host emits at least one
packet. The assumption is that no host on the network is completely
silent.

3.2.6. Multicast Packet Flows

A multicast receiver host sends out IGMP/MLD reports for the multicast
groups it wants to join. The sites may use either IGMPv2 or IGMPv3. A
multicast capable switch will forward these reports to router ports and
querier ports. The OTV Edge Device behaves as either a querier or a
router in the network and hence receives these reports.

A host in a site may be a source for an (S,G) group and send data. This
data is flooded or forwarded along IGMP/MLD snooped links by the site
switches. When an Edge Device receives this packet, it does a Layer-2
multicast table lookup which may yield several OIFs. If the overlay
interface is part of the OIF-list, then the Edge Device encapsulates
the packet in an OTV IP header which includes the delivery group
(DS, DG) IP addresses.
It then emits the resulting IP multicast packet into the core, where it
is forwarded along a core multicast tree to the receiver site Edge
Devices.

The receiver site Edge Device also joins one or more (DS, DG) core
multicast trees as directed by various source site Edge Devices. This
allows it to receive data from other sites. The core multicast trees
may be either SSM or ASM, though this document focuses on the SSM case.

3.2.7. Broadcast Packet Flows

A broadcast packet originated at an OTV site needs to be delivered to
all sites of the same VPN. This is typically done with the ASM/Bidir
group encapsulation, using the same group (the pMG) that is used for
the oURP/oMRP. A different data group can also be used to forward
broadcast traffic.

A broadcast packet, sourced in a site, gets to all Edge Devices because
each Edge Device is on the site spanning tree. However, duplicates must
not be allowed to appear on the overlay network when there are multiple
Edge Devices, so the Authoritative Edge Device for the VLAN is the only
Edge Device that forwards the packet on the overlay network. All Edge
Devices at a remote site will receive the broadcast packet over the
core multicast group. To prevent duplicates going into the site, only
the Authoritative Edge Device in that site will forward the packet into
the site. Once sent into the site, the packet gets to all switches on
the site spanning tree. Because only the AED can forward broadcast
packets in or out of the site, broadcast loops are avoided.

Other types of packets such as link-local multicast packets and non-IP
Layer-2 packets may also be sent along the pMG or on a dedicated data
group.

3.3. STP BPDU Handling

Since the Edge Device acts as an L2 switch, it does participate in the
Spanning Tree Protocol if the site has been configured to use it.
However, there is no STP activity on the overlay interface.
The following are the rules an OTV Edge Device will follow:

o When STP is configured at a site, an Edge Device will send and
  receive BPDUs on internal interfaces. An OTV Edge Device will not
  originate or forward BPDUs on the overlay network.

o An OTV Edge Device can become a root of one or more spanning trees.

o An OTV Edge Device will take the typical action when receiving
  Topology Change Notification (TCN) messages.

o When an OTV Edge Device detects that another Edge Device in its site
  has come up or gone down, it may send a TCN so it can gather new
  state for when its authoritative status changes for a VLAN.

To allow the L2 switch network to scale to a larger number of nodes and
MAC addresses, it is considered a feature of OTV to maintain and keep
the spanning trees small and per site.

4. MAC Address Mobility

In a traditional Layer-2 switched network, mobility of a host is easily
achievable because each switch in the network tracks the source MAC
address in each packet and the interface the last packet was received
on. So if that MAC is later seen on another interface, the new
interface can be updated at the same time the packet is forwarded.
These fast MAC moves need to be achieved when a MAC moves from one OTV
site to another. The Authoritative Edge Device for a VLAN determines a
MAC move through a combination of traditional learning on the internal
interfaces and explicit MAC advertisements on the overlay.

If an Authoritative Edge Device has a MAC address stored in the MAC
forwarding table which points to the overlay interface, it means that
an Edge Device in another site has explicitly advertised the MAC as
being local to its site. Therefore, any packets coming from the MAC
will be coming from the overlay. Once that MAC is heard on an internal
interface, it has moved into the site.
Since it has moved into a new site, the Authoritative Edge Device in
the new site is responsible for advertising it.

When a MAC appears in a new site, the Authoritative Edge Device will
advertise the new MAC address with a metric value of 0. When the Edge
Device in the site the MAC has moved from hears the advertisement, it
will withdraw the MAC address that it had previously advertised. Once
the MAC address is withdrawn, the Edge Device where the MAC has moved
to will change the metric value to 1. All remote sites sending to this
MAC address will start using the new Edge Device as soon as they hear
its MAC advertisement with metric 0.

5. Multi-homing

A site typically will be multi-homed with multiple Edge Devices
connecting to the overlay. This provides the site with increased
network redundancy and resilience to failures.

When sites are multi-homed, there is a potential for loops to be
created between the OTV overlay and the Layer-2 domains at different
sites. One option to address such loops is to transport STP BPDUs on
the overlay and rely on STP to break any loops that may form when
multi-homed sites connect to the overlay. However, this is not
desirable as it leads to very large or complex STP domains. OTV
multi-homing avoids loops through a combination of techniques in the
control plane and data plane.

OTV does not transport STP BPDUs over the core. As a result, each site
will have its own STP domain, which is separate and independent from
the STP domains in other sites, even though all sites will be part of a
common broadcast or Layer-2 domain. OTV also does not flood unknown
unicast traffic on the overlay.

5.1. Authoritative Edge Device Selection

An Authoritative Edge Device is an Edge Device that forwards Layer-2
frames in and out of a site from and to the overlay network.
When a site is multi-homed to the overlay, a proper Authoritative Edge
Device selection ensures that traffic crossing the site-overlay
boundary does not get duplicated, create loops or cause any churn in
the MAC tables of switches within the local and remote sites.

The Authoritative Edge Device (AED) may be statically assigned or
determined via an election among the devices in the same site. A unique
AED may be selected for each VLAN or it may be on a finer MAC-level
granularity. In either case, for a given MAC-level flow, the data path
will be symmetric.

An Authoritative Edge Device has the primary responsibility to
advertise locally learned source MAC addresses and IGMP/MLD-snooped
multicast addresses in the oURP and oMRP.

When done per-VLAN, an AED will be authoritative for all unicast and
multicast addresses within a single VLAN. The authoritative
responsibility can be shared with other Edge Devices for other VLANs so
traffic can be load balanced among all Edge Devices across different
VLANs.

For the particular scenario of all-active multi-homing and load
balancing, AEDs may be elected on a finer granularity. Thus there may
be several AEDs in any given VLAN in this case and different flows can
use different Edge Devices.

Protocol adjacencies are set up among the Edge Devices in the same
site. The AED is selected from this list of Edge Devices in the same
site. The AED selection algorithm tries to ensure an even spread of
VLANs across the Edge Devices. A simple mechanism may be a hash of the
VLAN-ID. Alternatively, a static AED assignment may use a VLAN range
division among all Edge Devices in the site. The local VLAN/AED
specific information may be advertised to other Edge Devices.

Each Edge Device keeps track of the other Edge Devices in the same
site.
If an Edge Device has a failure such that it is incapable of forwarding
traffic for its authorized VLANs, other Edge Devices in the same site
will detect or be notified of this event and run the AED selection
procedure to reassign authority for the failed device's VLANs.

5.2. Site Identifier

All Edge Devices that belong to a single Layer-2 site will advertise a
Site-ID on the overlay control plane. This information is used by
remote Edge Devices to identify the members of the same site. The
Site-ID influences the AED election and path selection from remote Edge
Devices to the local site. The Site-ID may be statically assigned or
dynamically computed by the devices in the same site.

6. IS-IS as an Overlay Control Protocol

This section describes the use of the IS-IS protocol to serve as the
Overlay URP and MRP. The details of the IS-IS PDUs and TLVs defined for
OTV are described in [IS-IS-OTV].

It is highly desirable to leverage the native and existing IS-IS
protocol functionality where feasible. There are some protocol
extensions specific to OTV which are described in this document.

The overlay network serves as a logical multi-access Ethernet LAN
connecting the various Edge Devices. Hence, IS-IS hellos and LSPs can
be exchanged directly over the overlay network similar to IS-IS
operation on a LAN. These IS-IS packets are encapsulated in the OTV IP
multicast header and reach other Edge Devices on the core multicast
tree. In addition, OTV IS-IS packets use a distinct Layer-2 multicast
destination address. Therefore, OTV IS-IS packets do not conflict with
IS-IS packets used for other technologies even if they are sent over
the same links in the core or arrive at an Edge Device on the same core
uplink interfaces.

IS-IS packets belonging to different overlay VPNs are mutually isolated
and distinguished by the OTV control packet header and the use of
distinct multicast groups in the core. Standard IS-IS authentication
mechanisms may additionally be used to provide further isolation and
authentication of VPN membership.

OTV IS-IS employs IS-IS LAN procedures on the overlay network. It forms
IS-IS adjacencies with all other Edge Devices in the overlay and elects
a Designated Intermediate System (DIS). The IS-IS system ID uniquely
identifies an Edge Device in the IS-IS control plane.

IS-IS IIHs are sent and received on the overlay by all Edge Devices.
The IP addresses assigned to the overlay on an Edge Device are
advertised in the IIHs and provide the IP reachability information to
the Edge Device through the core.

CSNPs are sent on the overlay by the DIS and used to achieve reliable
delivery of the link state database. This link state database holds
LSPs that describe the Edge Device connectivity to the pseudo-node (or
the multi-access overlay network). The LSPs also hold the unicast MAC
information that is advertised by a site Edge Device. CSNPs are also
used to reliably deliver the Group Membership link state database that
holds LSPs describing the multicast MAC group addresses. OTV IS-IS only
maintains the Level-1 link state database.

Unicast MAC address information is carried in LSPs in the
MAC-Reachability (MAC-RI) TLV defined in [RFC6165]. All MAC addresses
are typically advertised with a metric of 1. When using the MAC move
procedures, the metric will be set to 0. Definition of the fields used
by OTV is specified in [IS-IS-OTV].

Multicast related information is carried in LSPs in several different
TLVs specified in [IS-IS-OTV]. The multicast groups that a site has
receivers for are carried in the sub-TLVs of the Group Address TLV.
Multicast sources discovered in a site are advertised in a Group
Membership Active Source TLV. This TLV includes the list of groups for
which the source is sending data along with the core Delivery Groups to
which the advertising Edge Device will map the site data groups.

When an Adjacency Server is being used, all Edge Devices inform the
Adjacency Server of their reachability and capability information by
including the Adjacency Server TLV in their hellos. The Adjacency
Server includes a list of all the Edge Devices it has heard from, and
their capabilities, in its hello PDUs.

The Site-ID information is contained in the Site Identifier TLV and
sent in IS-IS IIHs.

7. Acknowledgements

The authors would like to thank many for their careful review. They
include Venu Nair, Victor Moreno, Ashok Chippa, Sameer Merchant, Tony
Speakman, Raghava Sivaramu, Nataraj Batchu, Sreenivas Duvvuri, Gaurav
Badoni, Veena Raghavan, Marc Woolward and Tim Stevenson.

Many have received individual presentations of OTV and provided
critical feedback early in the design process. These reviewers include
Vince Fuller, Peter Lothberg, Dorian Kim, Peter Schoenmaker, Mark
Berly, Scott Kirby, Dana Blair, Tom Edsall, Dinesh Dutt, Parantap
Lahiri, and Jeff Jensen.

8. Security Considerations

The specifications in this document do not add any new security issues
to Layer-2 bridging technologies. Existing security mechanisms may be
used both in the control plane and in data forwarding to achieve any
security requirements.

This document specifies the use of IS-IS as a control protocol for OTV.
It adds no additional security risks to IS-IS, nor does it provide any
additional security for IS-IS.

9. IANA Considerations

New IS-IS PDUs and TLVs are being proposed for OTV; they are defined in
[IS-IS-OTV].

10. Normative References

[IS-IS]    ISO/IEC 10589, "Intermediate System to Intermediate System
           Intra-Domain Routing Exchange Protocol for use in
           Conjunction with the Protocol for Providing the
           Connectionless-mode Network Service (ISO 8473)", 2005.

[IS-IS-OTV]
           Rao, D., "IS-IS Extensions to support OTV", 2011.

[RFC6165]  Banerjee, A. and D. Ward, "Extensions to IS-IS for Layer-2
           Systems", RFC 6165, April 2011.

Authors' Addresses

Hasmit Grover
Cisco Systems
170 W Tasman Drive
San Jose, CA 95138
US

Email: hasmit@cisco.com

Dhananjaya Rao
Cisco Systems
170 W Tasman Drive
San Jose, CA 95138
US

Email: dhrao@cisco.com

Dino Farinacci
Cisco Systems
170 W Tasman Drive
San Jose, CA 95138
US

Email: dino@cisco.com

Victor Moreno
Cisco Systems
170 W Tasman Drive
San Jose, CA 95138
US

Email: vimoreno@cisco.com