Network Working Group                                          H. Grover
Internet-Draft                                                    D. Rao
Intended status: Standards Track                            D. Farinacci
Expires: October 15, 2010                                  Cisco Systems
                                                          April 13, 2010


                    Overlay Transport Virtualization
                          draft-hasmit-otv-00

Abstract

   In today's networking environment, most enterprise networks span
   multiple physical sites.  Overlay Transport Virtualization (OTV)
   provides a scalable solution for L2/L3 connectivity across different
   sites using the currently deployed service provider and enterprise
   networks.  It is a very cost-effective and simple solution that
   requires the deployment of one or more OTV functional devices at each
   of the enterprise sites.  This solution is agnostic to the technology
   used in the service provider network and to the connectivity between
   the enterprise and the service provider network.  This document
   provides an overview of this technology.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on October 15, 2010.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components
   extracted from this document must include Simplified BSD License text
   as described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Simplified BSD License.

Table of Contents

   1.  Overview
     1.1.  Terminology
   2.  Overlay Control Plane
     2.1.  Provider Control Plane
     2.2.  Overlay Control Plane
     2.3.  Advertising unicast and multicast information
     2.4.  Selecting next-hops for a MAC address entry
     2.5.  Edge Device connectivity
       2.5.1.  Edge Devices are IP hosts and MAC routers
       2.5.2.  Internal Interface Behavior
       2.5.3.  Overlay Interface Behavior
   3.  Forwarding Process and Rules
     3.1.  Forwarding Process
       3.1.1.  Forwarding between Internal Links
       3.1.2.  Forwarding from an Internal Link to an Overlay Link
       3.1.3.  Forwarding from an Overlay Interface to an Internal
               Interface
       3.1.4.  Overlay Forwarding and Native Forwarding Concurrently
     3.2.  STP BPDU Handling
   4.  Adjacency Server information
   5.  Authoritative Edge Device Election
   6.  Site Identifier
   7.  Acknowledgements
   8.  Security Considerations
   9.  IS-IS Code point Considerations
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Authors' Addresses

1.  Overview

   OTV is a new "MAC in IP" way of supporting L2 VPNs over an L2/L3
   infrastructure.  OTV provides an "over-the-top" method of doing
   virtualization, in contrast to traditional "in-the-network" style
   mechanisms where multiple routing and forwarding tables are
   maintained in every device from a source to a destination.  With
   "over-the-top" methods, state is maintained at the network edges, but
   not at the site or in the core.

   OTV can be incrementally deployed and reside in a small number of
   devices at the edge between the sites and the core.  We call these
   devices "Edge Devices".  They perform typical layer-2 learning and
   forwarding functions on their site-facing interfaces (internal
   interfaces) and IP-based virtualization functions on their
   core-facing interfaces (over which an overlay network is realized).

   Traditional L2VPN technologies rely heavily on tunnels.  Rather than
   creating stateful tunnels, OTV encapsulates layer-2 traffic with an
   IP header ("MAC in IP") but does not create any fixed tunnels.  Based
   on the IP header, traffic is forwarded natively in the core over
   which OTV is deployed.  This is an important feature, as the native
   IP treatment of the encapsulated packet allows optimal multi-point
   connectivity as well as optimal broadcast and multicast forwarding,
   plus any other benefits the routed core may provide to native IP
   traffic.

   Layer-2 traffic that must traverse the overlay to reach its
   destination is prepended with an IP header, which ensures that the
   packet is delivered to the Edge Devices that provide connectivity to
   the layer-2 destination in the original MAC header.  As shown in
   Figure 1, if a destination is reachable via Edge Device X2 (with a
   core-facing IP address of IPB), other Edge Devices forwarding traffic
   to that destination will add an IP header with a destination IP
   address of IPB and forward the traffic into the core.  The core
   forwards the traffic based on IP address IPB; once the traffic
   reaches Edge Device X2, the overlay IP header is stripped and the
   frame is forwarded into the site in the same way a regular bridge
   would forward a packet at layer 2.  Broadcast and multicast traffic
   is encapsulated with a multicast IP header and follows a similar
   process.

    +----+                                                        +----+
    | H1 |------               --------------               ------| H2 |
    +----+      \             /              \             /      +----+
                 \+----+IPA  /    L3 Core     \  IPB+----+/
         ---------| X1 |----<                  >----| X2 |---------
                 /+----+     \    Network     /     +----+\
                /             \              /             \
                               --------------

                             +------------+
                             | DA = IPB   |
                             +------------+
                             | SA = IPA   |
        +-----------+        +------------+        +-----------+
        | DMAC = H2 |        | DMAC = H2  |        | DMAC = H2 |
        +-----------+        +------------+        +-----------+
        | SMAC = H1 |        | SMAC = H1  |        | SMAC = H1 |
        +-----------+        +------------+        +-----------+
        | Payload   |        | Payload    |        | Payload   |
        +-----------+        +------------+        +-----------+

   Figure 1. Traffic flow from H1 to H2 with encapsulation in the core.

   The key piece that OTV adds is the state to map a given destination
   MAC address in the L2 VPN to the IP address of the OTV Edge Device
   behind which that MAC address is located.  OTV forwarding is a
   function of mapping a destination MAC address in the VPN site to an
   Edge Device IP address in the overlay network.
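The "MAC in IP" step in Figure 1 can be illustrated with a short Python sketch.  It prepends a minimal 20-byte IPv4 header (RFC 791) to an L2 frame whose preamble and FCS have already been removed, and strips it again at the receiving Edge Device.  This draft only says "an IP header", so the protocol number (253, one of the values RFC 3692 reserves for experimentation), the helper names, and the documentation addresses standing in for IPA and IPB are assumptions for illustration, not the OTV wire format.

```python
import socket
import struct

# Hypothetical: the draft does not define the outer IP protocol number,
# so this sketch borrows 253, reserved for experimentation (RFC 3692).
SKETCH_PROTO = 253


def ipv4_checksum(header: bytes) -> int:
    """One's-complement sum of the 16-bit header words (RFC 791)."""
    total = sum(word for (word,) in struct.iter_unpack("!H", header))
    while total > 0xFFFF:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def encapsulate(frame: bytes, src_ip: str, dst_ip: str, ttl: int = 64) -> bytes:
    """Prepend an option-less IPv4 header to an L2 frame ("MAC in IP")."""
    header = struct.pack(
        "!BBHHHBBH4s4s",
        0x45,                # version 4, header length of 5 32-bit words
        0,                   # DSCP/ECN
        20 + len(frame),     # total length
        0, 0,                # identification, flags/fragment offset
        ttl,
        SKETCH_PROTO,
        0,                   # checksum placeholder, filled in below
        socket.inet_aton(src_ip),
        socket.inet_aton(dst_ip),
    )
    csum = struct.pack("!H", ipv4_checksum(header))
    return header[:10] + csum + header[12:] + frame


def decapsulate(packet: bytes) -> bytes:
    """Strip the outer IPv4 header and return the original L2 frame."""
    header_len = (packet[0] & 0x0F) * 4
    return packet[header_len:]


# Usage: an H1 -> H2 frame sent by X1 (IPA) toward X2 (IPB).  The MACs,
# EtherType, and IP addresses are made-up stand-ins for Figure 1's names.
h2_mac = bytes.fromhex("0200000000b2")
h1_mac = bytes.fromhex("0200000000a1")
frame = h2_mac + h1_mac + b"\x08\x00" + b"payload"
packet = encapsulate(frame, "192.0.2.1", "198.51.100.2")
```

The inner frame is carried untouched; only the outer destination address (IPB here) steers the packet through the core, which is why no per-destination tunnel state is needed.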

   To achieve all this, a control plane is required to exchange
   reachability information among the different OTV Edge Devices.  We
   refer to this control plane as the oIGP and oMRP (Overlay IGP and
   Overlay Multicast Routing Protocol).  The control plane is needed
   because OTV does not flood unknown unicast traffic among Edge
   Devices, which precludes data-plane learning on the "overlay
   interface".  Data-plane learning continues to happen on the "internal
   interfaces" to provide compatibility and transparency within the
   layer-2 sites connecting to the OTV overlay.  To each VPN site, the
   Edge Devices appear to provide L2 switched connectivity among the
   sites.

   The required control plane uses IS-IS as an IGP capable of carrying a
   mix of MAC unicast and multicast addresses as well as IP addresses.
   The information carried in IS-IS LSPs consists of MAC unicast and
   multicast addresses with their associated VLAN IDs and IP next hops.
   The MAC addresses are those of the hosts connecting to the network,
   and the IP next hops are the addresses of the Edge Devices through
   which these hosts are reachable in the core.  Figure 2 shows what the
   resulting tables would look like in a simple two-site example.
   Because the MAC addresses at a site are advertised in IS-IS to all
   other sites, all Edge Devices will have knowledge of all MAC
   addresses for each VLAN in the VPN.
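The combination of data-plane learning on internal interfaces and control-plane learning from IS-IS can be sketched as a small Python class.  It keys entries by (VLAN, MAC) and stores either a local internal interface or an overlay next hop (the advertising Edge Device's IP address), and it lists the locally learned entries an Edge Device would place in its own advertisements.  The class and method names, the VLAN number, and the string identifiers are hypothetical; LSP encoding, aging, and the authoritative Edge Device rules are not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple, Union


@dataclass(frozen=True)
class LocalPort:
    """Entry learned in the data plane on an internal interface."""
    port: str


@dataclass(frozen=True)
class OverlayNextHop:
    """Entry learned from IS-IS: reach the MAC via a remote Edge Device."""
    overlay: str
    ed_ip: str


Entry = Union[LocalPort, OverlayNextHop]


class MacTable:
    """Hypothetical OTV MAC table, scoped by VLAN."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[int, str], Entry] = {}

    def learn_local(self, vlan: int, mac: str, port: str) -> None:
        # Data-plane learning: source MAC seen on an internal interface.
        self._entries[(vlan, mac)] = LocalPort(port)

    def install_remote(self, vlan: int, mac: str, overlay: str, ed_ip: str) -> None:
        # Control-plane learning: (VLAN, MAC) plus IP next hop from an LSP.
        self._entries[(vlan, mac)] = OverlayNextHop(overlay, ed_ip)

    def lookup(self, vlan: int, mac: str) -> Optional[Entry]:
        return self._entries.get((vlan, mac))

    def local_advertisements(self, own_ip: str) -> List[Tuple[int, str, str]]:
        # (VLAN, MAC, next hop) tuples this Edge Device would advertise.
        return sorted(
            (vlan, mac, own_ip)
            for (vlan, mac), entry in self._entries.items()
            if isinstance(entry, LocalPort)
        )


# Usage reproducing the tables of Figure 2: each Edge Device learns its
# local host on E1 and installs what the other one advertised.
x1, x2 = MacTable(), MacTable()
x1.learn_local(10, "H1", "E1")
x2.learn_local(10, "H2", "E1")
for vlan, mac, ip in x2.local_advertisements("IPB"):
    x1.install_remote(vlan, mac, "Overlay1", ip)
for vlan, mac, ip in x1.local_advertisements("IPA"):
    x2.install_remote(vlan, mac, "Overlay1", ip)
```

Keeping the VLAN in the key mirrors the per-VLAN scoping of the MAC table, so the same MAC can appear in different VLANs without conflict.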

    +----+                                                        +----+
    | H1 |------               --------------               ------| H2 |
    +----+      \             /              \             /      +----+
               E1\+----+IPA  /    L3 Core     \  IPB+----+/E1
         ---------| X1 |----<                  >----| X2 |---------
                 /+----+     \    Network     /     +----+\
                /  Overlay1   \              / Overlay1    \
                               --------------

               At X1                              At X2
   +----------------------------+     +----------------------------+
   | Destination | Interface/NH |     | Destination | Interface/NH |
   |----------------------------|     |----------------------------|
   |     H1      |      E1      |     |     H1      | Overlay1:IPA |
   |     H2      | Overlay1:IPB |     |     H2      |      E1      |
   +----------------------------+     +----------------------------+

   Figure 2. OTV Forwarding Tables.

   Edge Devices have an IP address on their core-facing interface, and
   they join a configured ASM/Bidir multicast group in the core
   transport network by sending IGMP/MLD reports, as any host joining a
   group would.  Relative to the core, the Edge Devices are therefore
   hosts subscribing to multicast groups that are created in the
   "provider network" and that rely on a provider IGP (pIGP) and a
   provider Multicast Routing Protocol (pMRP).

   Note: Only core devices participate in the pIGP/pMRP.  The Edge
   Devices connect as hosts to the core network and therefore do not
   participate in the pIGP/pMRP.  This is compatible and consistent with
   today's interconnection policies.

   We refer to the multicast group that the Edge Devices join as the
   "Provider Multicast Group (pMG)".  Edge Devices use the pMG to become
   adjacent with each other and to exchange their IS-IS LSPs, CSNPs, and
   Hellos.  Thus, by virtue of the pMG, all Edge Devices see each other
   as if they were directly connected to the same multi-access,
   multicast-capable segment for the purposes of IS-IS peering.  The pMG
   also defines a VPN; when an Edge Device joins a pMG, its site becomes
   part of that VPN.  Multiple pMGs can be defined to create multiple
   VPNs.
   IS-IS Hello authentication will be used to validate which Edge
   Devices can join the adjacency set.

   The pMG can also be used to broadcast data traffic to all Edge
   Devices when necessary.  Broadcast transmission will not incur
   head-end replication overhead: OTV allows the pMRP to efficiently
   distribute broadcast traffic using the provider ASM/Bidir group.

   When forwarding of VPN multicast traffic is required, new multicast
   state will be used to tailor the distribution trees to the optimal
   group of receivers.  These multicast groups are created in the
   provider control plane (pMRP).  That is, SSM multicast is used in the
   core, with the Edge Device issuing an IGMPv3/MLDv2 join for a
   {source, group} pair.

   Edge Devices must combine data-plane learning on their bridged
   internal interfaces with control-plane learning on their overlay
   interfaces.  The key to this combination is a series of rules through
   which data-plane events can trigger control-plane advertisements
   and/or learning events.

   OTV supports L2 multi-homing for sites where one or more of the
   bridge domains may be connected to multiple Edge Devices.  OTV
   provides loop elimination for multi-homed sites and does not require
   the extension of STP across sites.  This means each site can run its
   own STP rather than having to create one large STP domain across
   sites.

   OTV takes a conservative approach by employing an active-backup
   model for all MAC addresses within a VLAN, with a single Edge Device
   forwarding data in and out of a site.  However, an active-active
   capability is provided across VLANs, allowing multiple Edge Devices
   to load-balance traffic in and out of an L2 multi-homed site.

1.1.  Terminology

   The term "Hello" or "Hello PDU" in this document, when not further
   qualified, includes the TRILL IIH PDU, the LAN IIH PDU, and the P2P
   IIH PDU.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119.

   Site - A Site is a single-homed or multi-homed connected network that
      is typically under the control of a single organization.  Sites
      are connected together via Edge Devices that operate in an overlay
      network.  The Edge Devices provide layer-2 connectivity among the
      sites.  A site will not be used by IS-IS as a transit network.  A
      layer-2 site is one that is mostly made up of hosts and switches.
      Routers may exist, but the majority of the topology to the Edge
      Devices is L2 switched.  The MAC addresses advertised on the
      overlay network are those of all the hosts and routers connected
      to the L2 devices at the site.  A layer-3 site is one that is
      mostly made up of routers connecting to hosts via switches.  The
      majority of the topology to the Edge Devices is L3 routed.  The
      MAC addresses advertised on the overlay network are limited to
      those of the routers at the site.

   VPN - A VPN is a collection of sites that are controlled by a single
      administration.  The addressing plan and the router and switch
      configuration are as consistent as they would be if the sites were
      physically at the same location.  Each VPN uses a unique IS-IS
      authentication key and a dedicated ASM/Bidir multicast group (pMG
      - provider Multicast Group) allocated by the core network.  There
      is one overlay network per VPN.

   Edge Device - A modified L2 switch that performs OTV functions.  It
      will typically run as an L2 device but can be co-located in a
      device that performs L3 routing on other L3-enabled ports.  When
      OTV functionality is described, this functionality only occurs in
      an Edge Device.

   Internal Interface - These are layer-2 interfaces connected to
      site-based switches or site-based routers.
      The internal interface is layer 2 regardless of whether it
      connects to a switch or a router.

   Overlay Interface - This is a logical multi-access, multicast-capable
      interface.  The overlay interface can replicate broadcast and
      multicast packets efficiently.  The interface takes L2 frames and
      encapsulates them in IP unicast or multicast headers.  The overlay
      interface is realized by one or more physical core-facing
      interfaces.  The core-facing interfaces are assigned IP addresses
      out of the core provider's address space.

   MAC Table - This is a forwarding table of 48-bit MAC addresses.  The
      table can contain unicast or multicast MAC addresses.  The table
      is populated from two sources: traditional data-plane learning on
      internal interfaces, and the IS-IS protocol at the control plane
      on the overlay interface.  A MAC table is scoped by VLAN, which
      allows the same MAC address to be used in different VLANs, and
      potentially in different VPNs.

   Authoritative Edge Device (AE) - This is an Edge Device that forwards
      layer-2 frames in and out of a site from and to the overlay
      interface, respectively.  There is one and only one authoritative
      Edge Device for all MAC unicast and multicast addresses per VLAN.
      For other VLANs, another Edge Device can be authoritative.  A
      Non-Authoritative Edge Device is called a NAE.

   Site-ID - Each Edge Device that resides in an OTV site advertises the
      same site-id over the overlay network.  The Site-ID is determined
      dynamically by the IS-IS protocol and is the lowest system-id
      value among the Edge Devices within the site.

   (VLAN, uMAC) - This is the designation of layer-2 network
      reachability information as encoded in IS-IS and as stored in the
      MAC table.  This notation describes a given unicast MAC address
      within a particular VLAN.

   (VLAN, mMAC, mIP) - This is the designation of layer-2 network
      reachability information as encoded in IS-IS and as stored in the
      MAC table.  This notation describes a given multicast MAC address
      within a particular VLAN.  The 'mIP' part of the 3-tuple is
      provided so that the SSM-based tree is joined based on the IP
      group address (since 32-to-1 aliasing can happen for IPv4 group
      address to MAC mappings, and worse for IPv6).

2.  Overlay Control Plane

   In this section, we discuss the control plane hierarchy.  At the very
   base of the hierarchy is the provider control plane, which enables
   unicast reachability among the Edge Devices and also provides the
   multicast group that makes Edge Devices adjacent from the overlay
   control plane perspective.  At the next level, the overlay control
   plane conveys client MAC address reachability information between
   the Edge Devices.

   In general, the control planes are independent of each other.
   However, in order to optimize multicasting, multicast control-plane
   events (reports, joins, leaves) that occur in one MRP may initiate
   events in another MRP so that the optimal tree is always used to
   forward traffic.  Also, events in the overlay control plane are
   triggered by forwarding events in the client data plane (however,
   the client and overlay control planes remain independent of each
   other).

2.1.  Provider Control Plane

   The provider control plane is the set of routing protocols that run
   in the core infrastructure to deliver packets sourced from the site
   networks.  No coordination of routing protocols between the site and
   the core is required beyond what is typically necessary to connect to
   a core service.  In terms of addressing, the Edge Device is allocated
   an IP address out of the core block of addresses.

   For each VPN the Edge Device is to support, an ASM/Bidir multicast
   group is required.
   That is, only one group per VPN is needed.  The multicast state
   created in the client site network will map to some amount of state
   in the core network.  However, no group address allocation is needed
   for these data groups.  Since SSM is used in the overlay, uniqueness
   of the multicast state is achieved by the uniqueness of the address
   assigned to the Edge Device.  The Edge Device takes a client
   multicast packet and encapsulates it in a core-deliverable multicast
   packet.

2.2.  Overlay Control Plane

   The overlay control plane conveys MAC reachability information
   between Edge Devices.  The MAC addresses that are locally connected
   to an Edge Device are advertised in the overlay IGP to other Edge
   Devices in the VPN.  Thus, MAC learning on the overlay is not based
   on data-plane flooding, but on explicit advertisements of unicast and
   multicast MAC addresses.  This advertisement of MAC addresses is done
   by the overlay control plane.

   The overlay IGP establishes adjacencies only between Edge Devices
   that are in the same VPN.  As explained earlier, Edge Devices become
   part of a VPN when they join a multicast group defined in the core
   (provider MRP); members of the same group are members of the same
   VPN.  The Hellos and updates between overlay-IGP peers travel over
   the multicast group defined in the pMRP.  Thus, Edge Devices peer
   with each other as if they were directly connected at layer 2.  This
   peering is possible because all the traffic for the oIGP is
   encapsulated with the pMRP group address and sent into the core.  As
   a result, all Edge Devices in a given VPN receive the oIGP multicast
   traffic as if they were all on the same segment.

   Similarly, the overlay-MRP traffic is encapsulated with the pMRP
   group address corresponding to the VPN.
   The overlay-MRP is used to inform all the Edge Devices that the
   subscribers to a particular group are reachable over the overlay
   network.  The Edge Devices snoop IGMP/MLD reports, and the oMRP then
   notifies all Edge Devices in the VPN which group has been joined by
   sending an MCAST PDU containing the group MAC address.  The
   information conveyed by the oMRP is used solely for the Edge Devices
   to populate their outgoing interface lists (oif-lists) at the source
   site.  Edge Devices on the receiving sites will IGMPv3/MLDv2 join the
   corresponding (S,G) group in the provider plane (pMRP) when they
   snoop the IGMP/MLD traffic from the site.  Thus, multicast trees are
   built natively in the core, not on the overlay.

2.3.  Advertising unicast and multicast information

   IS-IS is used as the overlay IGP and MRP.  When a MAC address is
   learned by the arrival of a data packet on an internal interface, the
   MAC address is placed in an IS-IS LSP if and only if the Edge Device
   is authoritative for the VLAN the MAC resides in.  Likewise, when a
   multicast MAC address is learned by IGMP/MLD snooping, a
   (VLAN, mMAC, mIP) entry is placed in an MCAST PDU by an authoritative
   Edge Device.  When an Edge Device is authoritative, it advertises a
   unicast MAC address as soon as it learns the MAC on an internal
   interface.  The TLV used for this purpose is the Unicast MAC RI TLV
   defined in [IS-IS-Layer-2].

   As for multicast MAC addresses, when an IGMP/MLD report is received
   on an internal interface, the authoritative Edge Device will
   advertise the multicast MAC address in an MCAST PDU with metric 1.
   The TLV used for this purpose is the GADDR-TLV and its sub-TLVs, as
   defined in [IS-IS-Layer-2].
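The advertisement rules above can be sketched in Python as follows.  Only an Edge Device that is authoritative for the VLAN emits (VLAN, uMAC) entries for its LSP, or (VLAN, mMAC, mIP) entries with metric 1 for its MCAST PDU.  The multicast MAC is derived from an IPv4 group with the standard RFC 1112 mapping (01:00:5e followed by the low-order 23 bits of the group), which is why the group address mIP is carried alongside it.  The class name, method names, and the set of authoritative VLANs passed in are hypothetical, and the TLV encodings from [IS-IS-Layer-2] are not modeled.

```python
import ipaddress
from typing import List, Set, Tuple


def ipv4_group_to_mac(group: str) -> str:
    """RFC 1112 mapping: 01:00:5e + low-order 23 bits of the IPv4 group."""
    addr = ipaddress.IPv4Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not an IPv4 multicast group")
    low23 = int(addr) & 0x7FFFFF
    octets = (0x01, 0x00, 0x5E, low23 >> 16, (low23 >> 8) & 0xFF, low23 & 0xFF)
    return ":".join(f"{o:02x}" for o in octets)


class Advertiser:
    """Hypothetical model of which learning events an Edge Device advertises."""

    def __init__(self, authoritative_vlans: Set[int]) -> None:
        self.authoritative_vlans = authoritative_vlans
        self.lsp_unicast: List[Tuple[int, str]] = []          # (VLAN, uMAC)
        self.mcast_pdu: List[Tuple[int, str, str, int]] = []  # (VLAN, mMAC, mIP, metric)

    def on_mac_learned(self, vlan: int, mac: str) -> bool:
        # Data-plane learning on an internal interface; advertise only if AE.
        if vlan not in self.authoritative_vlans:
            return False
        self.lsp_unicast.append((vlan, mac))
        return True

    def on_igmp_report(self, vlan: int, group: str) -> bool:
        # IGMP snooping on an internal interface; the AE advertises with metric 1.
        if vlan not in self.authoritative_vlans:
            return False
        self.mcast_pdu.append((vlan, ipv4_group_to_mac(group), group, 1))
        return True


ed = Advertiser(authoritative_vlans={10})
ed.on_mac_learned(10, "02:00:00:00:00:a1")  # advertised: AE for VLAN 10
ed.on_mac_learned(20, "02:00:00:00:00:a2")  # not advertised: NAE for VLAN 20
ed.on_igmp_report(10, "239.1.1.1")          # placed in the MCAST PDU
```

Groups such as 224.1.1.1 and 225.1.1.1 map to the same MAC address, so a receiving Edge Device could not pick the right provider (S,G) join from mMAC alone; carrying mIP removes that ambiguity.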

   Essential to OTV is that IS-IS conveys not just MAC address
   information among the Edge Devices in a given VPN; it also implies
   that those MAC addresses may be mapped to the IP addresses of the
   advertising Edge Devices for the purposes of "MAC in IP" forwarding
   across the overlay.

   To allow large L2 sites to be connected together scalably via the
   overlay, by default an Edge Device will not report any (VLAN-ID, MAC)
   pairs.  To avoid inadvertent merging of VLANs among sites, the VLANs
   for which an Edge Device advertises MAC addresses must be explicitly
   configured on that Edge Device.

2.4.  Selecting next-hops for a MAC address entry

   When a site is multi-homed, a unicast MAC address is advertised by a
   single Edge Device, namely the Edge Device that is authoritative for
   the VLAN.  Therefore, remote Edge Devices should never see an
   equal-cost path to a MAC address at a remote site through multiple
   Edge Devices.

   The same applies to multicast MAC addresses, even though next-hop
   calculations are not necessary for them.  All a remote Edge Device
   cares about is whether at least one site is a member of a multicast
   group, so that it knows to forward packets addressed to the group
   onto the overlay network.  So only the authoritative Edge Device for
   a multicast MAC address will advertise the address in an MCAST PDU.

2.5.  Edge Device connectivity

   In order to successfully connect to the overlay, the Edge Device
   performs several functions on its different interfaces.  These are
   summarized in this section.

2.5.1.  Edge Devices are IP hosts and MAC routers

   The Edge Device does not participate in the provider IGP (pIGP) as a
   router, but as a host.  The Edge Device has an IP address that is
   significant in the core/provider addressing space.  The Edge Device
   joins the multicast groups in the core by issuing IGMPv3/MLDv2
   reports, just like a host would.
   Thus, the Edge Device does not have an IGP relationship with the
   core.

   However, the Edge Device does participate in the overlay IGP, and the
   overlay IGP uses its IP address as a router ID and as a next-hop
   address for unicast traffic.  The Edge Device does not build an IP
   routing table from the information received from the oIGP; rather,
   it builds a hybrid table in which MAC address destinations are
   reachable via IP next-hop addresses.  Such a device could be called a
   "MAC router", because it routes packets based on MAC addresses.

   Thus, Edge Devices are IP hosts in the provider plane, MAC routers in
   the overlay plane, and bridges in the client bridging plane.

2.5.2.  Internal Interface Behavior

   The internal interfaces on an Edge Device are bridged and are
   indifferent to whether the site itself is L2 or L3.  These interfaces
   behave as regular switch interfaces and learn the source MAC
   addresses of the traffic they receive.  Spanning tree BPDUs are
   received, processed, and sourced on internal interfaces as they
   would be on a regular 802.1D, 802.1s, or 802.1w switch.
   Additionally, traffic received on internal interfaces may trigger
   oIGP/oMRP advertisements and/or pMRP group joins according to the
   rules described earlier in Section 2.3 ("Advertising unicast and
   multicast information").

   Traffic received on an internal interface will be forwarded according
   to the MAC tables either onto another internal interface (regular
   bridging) or onto the overlay (OTV forwarding).  This is explained in
   detail in Section 3.

2.5.3.  Overlay Interface Behavior

   Overlay interfaces are interfaces that have an IP address in the
   provider/core address space.  Traffic sent out of these interfaces is
   encapsulated with an IP header, and traffic received on these
   interfaces must be decapsulated to produce an L2 frame.
   Forwarding is discussed in detail in Section 3.

   STP BPDUs are not sourced from overlay interfaces; therefore, there
   should be no STP BPDUs in the core, and the overlay interfaces do not
   participate in the spanning tree protocol.

   Even though the overlay interface has an IP address, it does not
   participate in the provider IGP or MRP; it behaves as a host
   connected to the core.  The IP addresses assigned to the overlay
   interfaces are used as next-hop addresses by the overlay IGP;
   therefore, the address table for the overlay interface will include
   a remote IP address as the next-hop information for remote MAC
   addresses.

3.  Forwarding Process and Rules

3.1.  Forwarding Process

   Most of the interesting forwarding cases occur when a packet arrives
   from the Overlay Link to be forwarded to an Internal Link, or vice
   versa.  For completeness, however, we also describe how forwarding
   between Internal Links occurs.

3.1.1.  Forwarding between Internal Links

   Between its internal links, an Edge Device operates like a
   traditional L2 switch.  That is, it sends unicast packets on the port
   where the destination MAC was learned, sends multicast packets on the
   ports it has IGMP/MLD-snooped, and sends broadcast packets out all
   ports.  This is done on a per-VLAN basis.

3.1.2.  Forwarding from an Internal Link to an Overlay Link

   An Edge Device will decide to forward a unicast, multicast, or
   broadcast packet over the overlay interface when the oIGP has put the
   logical port of the overlay interface in the MAC table for the
   corresponding unicast or multicast MAC address.  When a packet is
   sent over the overlay interface, it is first prepended with an IP
   header.  The packet as received from the internal interface is not
   touched other than to remove the preamble and FCS from the frame.
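The decision an Edge Device makes for a frame arriving on an internal interface can be sketched in Python.  The sketch assumes a MAC table whose entries are either a local port or a remote Edge Device IP address, that broadcast frames are sent to the VPN's pMG (which Section 1 says can carry broadcast data), that unknown unicast is flooded only on the other internal ports because OTV does not flood unknown unicast across the overlay, and that only the authoritative Edge Device for the VLAN sends frames onto the overlay.  Multicast (SSM) forwarding is not modeled, and all names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple, Union

BROADCAST = "ff:ff:ff:ff:ff:ff"


@dataclass(frozen=True)
class Local:
    port: str     # internal interface where the MAC was learned


@dataclass(frozen=True)
class Remote:
    ed_ip: str    # core-facing IP address of the advertising Edge Device


@dataclass
class Decision:
    internal_ports: List[str] = field(default_factory=list)
    outer_dst: Optional[str] = None   # outer IP destination, or None


def forward_from_internal(
    table: Dict[Tuple[int, str], Union[Local, Remote]],
    vlan: int,
    dmac: str,
    in_port: str,
    vlan_ports: List[str],
    pmg: str,
    authoritative: bool,
) -> Decision:
    others = [p for p in vlan_ports if p != in_port]
    if dmac == BROADCAST:
        # Flood locally; the AE also sends one encapsulated copy to the pMG.
        return Decision(others, pmg if authoritative else None)
    entry = table.get((vlan, dmac))
    if isinstance(entry, Local):
        # Regular bridging between internal links (nothing to do if same port).
        return Decision([entry.port] if entry.port != in_port else [])
    if isinstance(entry, Remote):
        # MAC in IP toward the advertising Edge Device, from the AE only.
        return Decision([], entry.ed_ip if authoritative else None)
    # Unknown unicast: flood on internal ports, never on the overlay.
    return Decision(others)
```

Returning the outer destination rather than a finished packet keeps the lookup separate from encapsulation, which matches the split between control-plane-provided state and the hardware prepend step.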

   The IP header, the outer MAC header, and the physical port out of
   which the packet is to be sent are all cached in hardware, so that
   all the information needed to physically forward the packet is kept
   together and can easily be prepended and sent at a high rate.

   The IP addresses and the outer MAC addresses are all provided and
   stored for the hardware by the control-plane software.  The
   multi-homing of sites imposes additional rules on the forwarding of
   traffic.

3.1.3.  Forwarding from an Overlay Interface to an Internal Interface

   When a packet is received on the overlay interface, it needs to be
   IP-decapsulated to reveal the inner MAC header for forwarding.  The
   inner MAC header SA and DA addresses will be used for the MAC table
   lookup.  When a unicast or multicast packet is received on the
   overlay interface, an Edge Device must determine whether it should
   forward it based on its authoritative status for the VLAN.

3.1.4.  Overlay Forwarding and Native Forwarding Concurrently

   There are cases where a VLAN will have some MACs that are advertised
   and forwarded over the overlay network and others whose packets are
   forwarded natively on physical interfaces.  This can be controlled by
   policy configuration on the Edge Device.

   By default, a VLAN is not advertised on the overlay network;
   therefore, forwarding cannot occur over the overlay network.  When a
   VLAN is enabled, an Edge Device will begin advertising locally
   learned MAC addresses in IS-IS.  If a MAC needs to be connected
   through the core natively, the network administrator must set up a
   route-filter-based access-list to deny advertising the MAC.

3.2.  STP BPDU Handling

   Since the Edge Device acts as an L2 switch, it does participate in
   the Spanning Tree Protocol if the site has been configured to use it.
   The OTV design does not depend on the Spanning Tree Protocol for any
   of the VPN connectivity functionality.
   An OTV Edge Device follows these rules:

   o  When STP is configured at a site, an Edge Device sends and
      receives BPDUs on internal interfaces.  An OTV Edge Device does
      not originate or forward BPDUs on the overlay network.

   o  An OTV Edge Device can become the root of one or more spanning
      trees.

   o  An OTV Edge Device takes the usual action when it receives
      Topology Change Notification (TCN) messages.

   o  When an OTV Edge Device detects that another Edge Device in its
      site has come up or gone down, it may send a TCN so that it can
      gather new state in case its authoritative status for a VLAN
      changes.

   To allow the L2 network to scale to a larger number of nodes and MAC
   addresses, OTV deliberately keeps spanning trees small and confined
   to individual sites.

4.  Adjacency Server Information

   If the provider core does not support ASM or Bidir multicast, an
   alternative mechanism is used to discover the remote Edge Devices
   that are part of a VPN.  In this scenario, one Edge Device is
   configured as an Adjacency Server.  All other Edge Devices report
   their reachability and capability information to the Adjacency
   Server using the Adjacency Server TLV defined in [IS-IS-Layer-2].
   The Adjacency Server is responsible for informing all other existing
   Edge Devices when an Edge Device is added or lost.  Based on this
   reachability information, the Edge Devices can then communicate with
   one another directly over a unicast or multicast data path.

5.  Authoritative Edge Device Election

   The Authoritative Edge Device for a VLAN is selected from the list of
   local Edge Devices in a site.  The AE Device selection algorithm aims
   to spread the VLANs evenly across all the Edge Devices that support a
   specific VPN.
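
   The selection algorithm itself is not specified here.  As a purely
   hypothetical illustration of an even spread, the Python sketch below
   orders a site's Edge Devices by identifier and assigns VLAN v to the
   device at position v mod N, so every Edge Device with the same view
   of site membership computes the same result.  The ordinal scheme and
   all names are assumptions.  The accept_from_overlay() helper shows
   one way the authoritative status could drive the check in Section
   3.1.3, assuming that only the AE Device for a VLAN forwards
   decapsulated overlay traffic for that VLAN onto internal links.

```python
from typing import Dict, Iterable, List


def elect_aed(site_devices: Iterable[str], vlan: int) -> str:
    """Pick the AE Device for `vlan` (hypothetical scheme: VLAN mod N)."""
    ordered = sorted(site_devices)  # same order on every Edge Device
    if not ordered:
        raise ValueError("a site needs at least one Edge Device")
    return ordered[vlan % len(ordered)]


def aed_assignment(site_devices: Iterable[str],
                   vlans: Iterable[int]) -> Dict[str, List[int]]:
    """Group VLANs by their AE Device to show how evenly they are spread."""
    devices = list(site_devices)
    assignment: Dict[str, List[int]] = {}
    for vlan in vlans:
        assignment.setdefault(elect_aed(devices, vlan), []).append(vlan)
    return assignment


def accept_from_overlay(me: str, site_devices: Iterable[str], vlan: int) -> bool:
    """Forward decapsulated overlay traffic for `vlan` only if `me` is its AED."""
    return elect_aed(site_devices, vlan) == me
```

   With two Edge Devices, ed-a and ed-b, VLANs 10 through 15 split as
   {"ed-a": [10, 12, 14], "ed-b": [11, 13, 15]}.  If ed-b goes down,
   recomputing over ["ed-a"] makes ed-a authoritative for all of them,
   which is the kind of authoritative-status change that the TCN rule in
   Section 3.2 anticipates.
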
   An Edge Device advertises the VPN-specific information using the Site
   Group TLV defined in [IS-IS-Layer-2].

6.  Site Identifier

   Site-ID information is sent to all remote Edge Devices using the Site
   Identifier TLV defined in [IS-IS-Layer-2].  Remote Edge Devices use
   this information to identify all the Edge Devices that belong to the
   same site.

7.  Acknowledgements

   The authors would like to thank the following people for their
   careful review: Venu Nair, Victor Moreno, Ashok Chippa, Sameer
   Merchant, Tony Speakman, Raghava Sivaramu, Nataraj Batchu, Sreenivas
   Duvvuri, Gaurav Badoni, Veena Raghavan, Marc Woolward, and Tim
   Stevenson.

   Many people received individual presentations of OTV and provided
   critical feedback early in the design process.  These reviewers
   include Vince Fuller, Peter Lothberg, Dorian Kim, Peter Schoenmaker,
   Mark Berly, Scott Kirby, Dana Blair, Tom Edsall, Dinesh Dutt,
   Parantap Lahiri, and Jeff Jensen.

8.  Security Considerations

   This document introduces no additional security risks to IS-IS, nor
   does it provide any additional security for IS-IS.

9.  IS-IS Code Point Considerations

   This document uses the new PDU types defined in [IS-IS-Layer-2],
   namely the MCAST PDU, the MCAST-CSNP PDU, and the MCAST-PSNP PDU.  It
   also uses the following new IS-IS TLVs defined in [IS-IS-Layer-2]:
   the MAC-Reachability TLV (type 141), the Group Address TLV (type 142)
   and its sub-TLVs, some sub-TLVs of the Port-Capability TLV (type
   143), and the Group Member Active Source TLV (type 146).

10.  References

10.1.  Normative References

   [IS-IS]    ISO/IEC 10589, "Intermediate System to Intermediate System
              Intra-Domain Routing Exchange Protocol for use in
              Conjunction with the Protocol for Providing the
              Connectionless-mode Network Service (ISO 8473)", 2005.

   [IS-IS-Layer-2]
              Banerjee, A., et al., "Extensions to IS-IS for Layer-2
              Systems", draft-ietf-isis-layer2 (work in progress), 2010.

10.2.  Informative References

   [IEEE 802.1aq]
              "Standard for Local and Metropolitan Area Networks /
              Virtual Bridged Local Area Networks / Amendment 9:
              Shortest Path Bridging, Draft IEEE P802.1aq/D1.5", 2008.

   [RBRIDGES]
              Perlman, R., Eastlake, D., Dutt, D., Gai, S., and A.
              Ghanwani, "RBridges: Base Protocol Specification", 2010.

Authors' Addresses

   Hasmit Grover
   Cisco Systems
   170 W Tasman Drive
   San Jose, CA 95138
   US

   Email: hasmit@cisco.com

   Dhananjaya Rao
   Cisco Systems
   170 W Tasman Drive
   San Jose, CA 95138
   US

   Email: dhrao@cisco.com

   Dino Farinacci
   Cisco Systems
   170 W Tasman Drive
   San Jose, CA 95138
   US

   Email: dino@cisco.com