Network Working Group                                         P. Marques
Internet-Draft                                          Contrail Systems
Intended status: Standards Track                                 L. Fang
Expires: December 01, 2012                                 Cisco Systems
                                                            D. Winkworth
                                                                     FIS
                                                                  Y. Cai
                                                             P. Lapukhov
                                                   Microsoft Corporation
                                                               June 2012

              Edge multicast replication for BGP IP VPNs
                   draft-marques-l3vpn-mcast-edge-01

Abstract

   In data-center networks it is common to use Clos network topologies
   [clos] in order to provide a non-blocking switched network.  In
   these topologies it is often not desirable to provide native IP
   multicast service.

   This document defines a multicast replication algorithm, along with
   its control and data forwarding procedures, that provides a
   multicast service for a BGP IP VPN network without assuming that
   the underlying infrastructure supports IP multicast.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.
   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on December 01, 2012.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Overview
   3.  VPN Forwarder behavior
   4.  Multicast tree management
   5.  BGP Protocol Extensions
       5.1.  Multicast Tree Route Type
       5.2.  Multicast Edge Discovery Attribute
       5.3.  Multicast Edge Forwarding Attribute
   6.  Security Considerations
   7.  References
       7.1.  Normative References
       7.2.  Informational References
   Authors' Addresses

1.  Introduction

   In wide-area networks, having native multicast service on a
   hop-by-hop basis allows for more efficient use of scarce link
   bandwidth.  In Clos network topologies [clos] the trade-offs are
   different.

   A Clos network is often used to provide full cross-sectional
   bandwidth between all the ports on the network.  When used in a
   switching infrastructure, it achieves this goal by spreading flows
   across multiple equal-cost paths.

   For Clos topologies with multiple stages, native multicast support
   within the switching infrastructure is both unnecessary and
   undesirable.  By definition, the Clos network has enough bandwidth
   to deliver a packet from any input port to any output port.  Native
   multicast support, however, would make the network no longer
   non-blocking, bringing with it the need to devise congestion
   management procedures.

   In this type of environment it is desirable to provide multicast
   service as an edge functionality on top of a unicast Clos fabric.
   Early versions of IP VPN multicast services have used ingress
   replication.  The drawback of that approach is the load imposed on
   the ingress node, which is especially relevant when the multicast
   group has a large number of receivers.  This document takes a
   different approach by leveraging the receivers in order to build an
   edge-based replication tree on a per-flow basis.

   Data-center networks often require network virtualization services
   such as the one described in [I-D.marques-l3vpn-end-system].
This 105 document defines a set of procedures to be implemented in a VPN 106 forwarder in order to provide multicast service for a BGP IP VPN. 108 It meets several important requirements: 110 Support for both source-specific and shared multicast trees. 112 Support for variable degrees of replication per tree node. 114 Loop-free forwarding topology. 116 The solution itself does not assume a specific topology on the 117 underlying infrastructure network. We simply assume that it is 118 undesirable to use native multicast service. This can be a result of 119 topology as per the CLOS example above or some other constraint that 120 makes it undesirable to create multicast groups based on the overlay 121 topology. 123 2. Overview 125 This document defines a mechanism to construct and manage multicast 126 distribution trees for overlay networks that does not rely on the 127 underlying physical network to provide multicast capabilities. The 128 solution places an upper bound on the number of copies that a 129 particular network node has to generate in contrast with ingress 130 replication in which the ingress node must generate one packet 131 replica for each receiver in the group. 133 Using this approach ingress node and link load is traded off for 134 additional packet replication steps in other nodes in the network. 135 This is achieved by building a K-ary tree where each node is 136 responsible to generate up-to K replicas. For a multicast group with 137 m receivers the height of the tree is approximately "log K(m)". 138 Where the height of the tree determines the maximum number of 139 forwarding hops required to deliver a packet to the receiver. 141 A separate overlay distribution tree is constructed for each 142 multicast group, using an MPLS label to identify the tree at each 143 hop. The nodes in the tree are VPN forwarders with local receivers 144 for the specific group. The tree uses a bi-directional forwarding 145 algorithm. A shared tree is used for all the sources in the group in 146 the case of an ASM group. 148 The distribution tree is constructed hierarchically: 150 1. Signaling Gateways build a tree the contains all locally 151 registered VPN forwarders with local multicast receivers, 152 observing the out-degree constraint K. 154 2. Each Signaling Gateway announces a collection of available edges 155 that can be used to join its local distribution tree with other 156 trees built by other Signaling Gateways. The number of such 157 edges also respects the out-degree constraint. 159 3. One of the Signaling Gateways that has been previously elected to 160 assume the role of "tree manager" for the specific group, assigns 161 the edges that connect the lowest level trees together and 162 advertises this information to the other Signaling Gateways. 164 IP hosts use IGMP [RFC3376]/MLD [RFC3810] to request the delivery of 165 multicast packets for a particular (*, g) or (s, g). Discovery 166 applications where the intent is to allow applications to discover 167 the group membership use (*, g) JOINs. Content delivery applications 168 may use an (s, g) JOIN after initially performing discovery either 169 via multicast or by other means. 171 In the context of end-system VPNs, the VPN Forwarder acts as an IGMP 172 querier on the virtual interfaces and receives IGMP/MLD Membership 173 Report packets. It uses this information to generate VPN-specific 174 multicast membership information. 
   This information is communicated to the Signaling Gateway as a
   triple (vrf-id, s/*, g) via an XMPP publish request.  This is
   similar to the process used to publish unicast IP addresses
   associated with virtual interfaces.

   This message also indicates the label range that can be used to
   assign 20-bit forwarding labels to this multicast traffic flow.
   The same label range can be shared between different multicast
   groups.  It is the responsibility of the Signaling Gateway to
   ensure that a given label is not used for multiple groups
   simultaneously.

   VPN Forwarders can choose to advertise a single label range for all
   multicast groups or different label ranges for different sets of
   multicast groups.  The set granularity can be as small as a single
   multicast group.

   The label range advertised by the VPN Forwarder should be larger
   than the expected number of active multicast groups within the set,
   plus an additional constant that ensures that a label is not reused
   before topology updates have had time to propagate.

   Signaling Gateways construct multicast distribution trees such that
   each node in the tree is a VPN Forwarder and each node in the tree
   has no more than K edges, where K is defined by configuration.  The
   parameter K may be different for different Signaling Gateways.

   The multicast distribution tree is an acyclic graph.  The Signaling
   Gateway assigns edges between nodes, ensuring that all nodes are
   connected and there are no cycles.  The resulting graph is a
   spanning tree.

   The Signaling Gateway can use any algorithm to manage the graph.
   In practice, we expect that the Signaling Gateway would attempt to
   minimize the cost of the tree subject to the out-degree constraint
   (at most K edges) while also minimizing the disruption caused by
   each individual node JOIN or LEAVE.

   The Signaling Gateway constructs an OLIST for each VPN Forwarder,
   where the OLIST consists of an upstream edge (for all nodes except
   the root) plus up to K downstream edges.  Each VPN Forwarder
   delivers traffic locally to the virtual interfaces that have JOINed
   the specific group and replicates the packet up to K times
   according to the OLIST.

   Whenever the OLIST for a given node changes, the Signaling Gateway
   MUST allocate a different label that corresponds to that version of
   the OLIST.  This is used to avoid forwarding loops.  The assumption
   is that at each run of its tree management algorithm the Gateway is
   capable of building an acyclic graph.  However, signaling updates
   from the Gateway to the VPN Forwarders are not synchronous.  Each
   modified OLIST has a different label assigned, which means that in
   a transient state traffic may be discarded if a VPN forwarder with
   information about an old edge sends traffic to a VPN forwarder that
   has already received information about the new topology.  However,
   this eliminates the possibility of forwarding loops.

   Traffic forwarding is done according to a bi-directional forwarding
   algorithm.  Packets flowing from the root are distributed to all
   the outgoing edges.  Traffic received from one of the leaves is
   sent to the root-facing interface plus the remaining descendants.
   This assumes that the VPN forwarder has the ability to determine
   the source of the traffic by examining the outer IP header of the
   packet.  The MPLS label contained in the packet identifies the
   multicast distribution tree, but it is not sufficient to determine
   the OLIST element from which the packet has been received.
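   The following Python sketch is a non-normative illustration of this
   bi-directional forwarding rule; the OLIST representation and names
   are assumptions made for the example only.

      class Olist(object):
          """Forwarding state for one version of a distribution tree.

          'parent' is the root-facing next-hop (None at the root) and
          'children' are up to K downstream next-hops; each next-hop
          is an (ip_address, outgoing_label) pair.  'local' lists the
          virtual interfaces with local receivers.
          """

          def __init__(self, incoming_label, parent, children, local):
              self.incoming_label = incoming_label
              self.members = ([parent] + list(children) if parent
                              else list(children))
              self.local = list(local)

          def copies(self, outer_source, from_virtual_interface=False):
              """Next-hops and local interfaces that receive a copy.

              Packets are accepted only from local virtual interfaces
              or from sources present in the OLIST; a copy is never
              sent back to the next-hop matching the outer IP source.
              """
              if not from_virtual_interface and \
                 outer_source not in [ip for ip, _ in self.members]:
                  return [], []        # drop: source not in the OLIST
              nexthops = [nh for nh in self.members
                          if nh[0] != outer_source]
              return nexthops, self.local

      # A mid-tree node: traffic arriving from one child is replicated
      # to the parent, the remaining child and the local receivers.
      node = Olist(10000, ('10.1.1.1', 20000),
                   [('10.1.1.2', 20001), ('10.1.1.3', 20002)], ['tap1'])
      print(node.copies(outer_source='10.1.1.2'))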
   Signaling Gateways communicate multicast membership information to
   each other using BGP L3VPN C-Multicast routes [RFC6514].
   Associated with each C-Multicast route, the Signaling Gateway also
   advertises up to K edges that can be used to interconnect the
   multicast distribution tree that it manages with other trees
   managed by its peers.  The C-Multicast routes are known to all
   Signaling Gateways that have local membership in the corresponding
   VPNs.

   A predefined hash function is used to determine a 32-bit value X
   associated with the specific multicast group.  This value is used
   to elect the multicast tree manager for the specific group.  The
   tree manager is the Signaling Gateway for which the value
   (RouterId - X) is lowest, using 32-bit unsigned arithmetic.

   As with Signaling Gateways managing distribution trees of VPN
   Forwarders, the tree manager is responsible for determining the
   edges between the several nodes in order to build an acyclic graph.
   In this case the nodes are themselves replication trees.

   The tree manager is responsible for assigning the forwarding labels
   used by each graph edge.  These labels are offered in the
   C-Multicast membership information as a list of available labels
   per edge.

3.  VPN Forwarder behavior

   VPN Forwarders act as IGMP/MLD queriers on the virtual interfaces
   that provide connectivity to end-systems.  They receive IGMP/MLD
   Membership Report packets on these point-to-point interfaces, which
   are then used to build the local per-VRF membership information.

   Each VRF may have a list of (s, g), (*, g) and (*, *) multicast
   routing entries associated with it.  These are the result of
   IGMP/MLD Membership Reports or Queries.  Routing entries can also
   be created as a result of detecting a local source on one of the
   virtual interfaces associated with the VRF.

   Multicast groups in the Source-Specific Multicast [RFC4607] address
   prefix use both (s, g) and (*, g) routing entries, while Any-Source
   Multicast (ASM) groups use (*, g) routing entries only.

   The forwarding table on a VPN Forwarder contains (vrf, *, g)
   entries for ASM groups and (s, g) entries for SSM groups.
   Multicast packets that do not match an existing forwarding entry
   SHALL result in the creation of a local routing entry, when
   received from a virtual interface.  The VPN Forwarder MAY decide to
   hold on to the first packet that triggers the creation of a routing
   entry.

   Locally-known multicast routes, whether the result of IGMP/MLD
   Membership Reports or of locally sourced traffic, are subject to
   expiration.

   When a multicast route is created locally, the VPN Forwarder
   generates an XMPP subscription message for the corresponding
   vrf-name, group and source.  When the source is not specified, a
   (*, g) is implied.  When a multicast router is detected on the
   virtual interface, via the receipt of IGMP/MLD Query messages, the
   VPN forwarder subscribes to the group 0.0.0.0.

   Group Join from VPN Forwarder to gateway:

      <iq type='set' to='network-control.domain.org' id='sub1'>
        [pubsub subscription request identifying the VRF, the group
         and, optionally, the source, and carrying the label range
         10000-20000 offered by the VPN Forwarder]
      </iq>
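   The following non-normative Python sketch illustrates how locally
   learned routes, their expiration and the corresponding gateway
   subscription could be tied together; the publish() and retract()
   callbacks stand in for the XMPP subscribe and 'delete' operations
   and are assumptions of the example, not part of this specification.

      import time

      ROUTE_LIFETIME = 260.0          # illustrative lifetime, seconds

      class LocalRoutes(object):
          """Locally learned (vrf, s/*, g) routes and subscriptions."""

          def __init__(self, publish, retract):
              self.expires = {}        # (vrf, source, group) -> expiry
              self.publish = publish   # assumed: XMPP subscribe
              self.retract = retract   # assumed: XMPP pubsub 'delete'

          def learn(self, vrf, group, source='*'):
              """Create or refresh a route from a report or source."""
              key = (vrf, source, group)
              if key not in self.expires:
                  self.publish(key)    # new route: subscribe
              self.expires[key] = time.time() + ROUTE_LIFETIME

          def age(self):
              """Expire routes whose local members are gone."""
              now = time.time()
              for key in [k for k, t in self.expires.items() if t < now]:
                  del self.expires[key]
                  self.retract(key)    # withdraw the subscription

      routes = LocalRoutes(publish=print, retract=print)
      routes.learn('vrf-blue', '239.1.1.1')
      routes.age()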
   Signaling Gateways build the multicast distribution tree for a
   specific group.  When the distribution tree is built, the Signaling
   Gateway will include as members all the (*, *) receivers of ASM
   groups and all (*, *) and (*, g) receivers of SSM groups.

   Once the subscription is received, the gateway sends XMPP event
   notifications that contain forwarding information for the specific
   group.  These messages contain an incoming label, assigned by the
   gateway, and a list of up to K+1 next-hops, where each next-hop
   consists of an IP destination address and an outgoing label.

   When the last local member of a multicast group leaves the group,
   either explicitly or as a result of an expiration timer, the VPN
   forwarder generates an XMPP pubsub 'delete' message to the
   Signaling Gateway.

   Multicast forwarding state update from gateway to VPN forwarder:

      [XMPP pubsub event notification carrying the incoming label
       assigned by the gateway and the list of up to K+1 next-hops,
       each consisting of an IP destination address and an outgoing
       label]

   The VPN forwarder updates its multicast forwarding table with the
   information received in this event notification.  Any label that
   was previously assigned to the (vrf, *, g) or (vrf, s, g)
   forwarding entry is implicitly withdrawn.

   Multicast packets are encapsulated in an IP tunnel that contains a
   20-bit label as well as the original multicast datagram.  This
   20-bit label uniquely identifies the multicast replication state as
   specified by the OLIST.

   The VPN Forwarder MUST drop an incoming multicast packet unless it
   is either received from a local virtual interface or the source is
   present in the OLIST.

   The VPN Forwarder MUST generate a copy of the incoming packet to
   all next-hops in the OLIST except the next-hop with the same IP
   address as the outer header source of the incoming packet.

   Additionally, the VPN Forwarder MUST generate additional copies to
   the virtual interfaces associated with the VRF that have expressed
   interest in the specific multicast group.

4.  Multicast tree management

   The multicast forwarding tree associated with a specific multicast
   group is built hierarchically.  At the lowest level, Signaling
   Gateways build an acyclic graph in which nodes are VPN Forwarders
   and where nodes have up to (K+1) edges.  Above this level, the
   graph nodes are multicast replication trees themselves.

   At the lowest level, VPN Forwarders implicitly select the Signaling
   Gateway responsible for managing their tree by subscribing to a
   single gateway.  At higher levels, the forwarding tree manager is
   elected by selecting the gateway with the smallest value of
   (RouterId - HashFunction(g)) in unsigned 32-bit arithmetic.

   While the multicast tree management algorithm is a local matter for
   the gateway implementation, the algorithm used SHOULD minimize the
   height of the multicast replication tree and attempt to minimize
   the number of state changes to the tree.  As an example, Prim's
   algorithm [prim] can be used to generate a minimum spanning tree.

   In this application all the nodes in the graph can have an edge to
   any other node, as long as the total number of edges does not
   exceed K + 1.  The implementation may choose to assign the same
   cost to all the edges or it may use external information to
   determine cost.  For instance, an implementation may choose to
   assign lower cost to edges between nodes in the same server rack
   versus nodes in different racks.
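   As a non-normative illustration, the Python sketch below grows a
   tree in the style of Prim's algorithm while refusing to give any
   node more than K + 1 edges.  The cost() function is an assumption
   standing in for whatever metric an implementation chooses, for
   instance rack locality.

      def build_tree(nodes, k, cost):
          """Grow a spanning tree with at most k+1 edges per node.

          'cost(a, b)' returns the cost of a candidate edge; any node
          may connect to any other node.  Returns (parent, child)
          edges.
          """
          degree = dict.fromkeys(nodes, 0)
          in_tree = set(nodes[:1])
          edges = []
          while len(in_tree) < len(nodes):
              candidates = [(cost(a, b), a, b)
                            for a in in_tree if degree[a] <= k
                            for b in nodes if b not in in_tree]
              if not candidates:
                  raise ValueError("degree constraint cannot be met")
              _, parent, child = min(candidates)
              edges.append((parent, child))
              degree[parent] += 1
              degree[child] += 1
              in_tree.add(child)
          return edges

      # Illustrative cost function: prefer edges within the same rack.
      def rack_cost(a, b):
          return 1 if a.split('-')[0] == b.split('-')[0] else 10

      print(build_tree(['rack1-a', 'rack1-b', 'rack2-a', 'rack2-b'],
                       k=2, cost=rack_cost))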
   Signaling Gateways assign forwarding labels from an interval
   provided by the VPN Forwarder.  Whenever the tree topology changes
   such that nodes with different versions of the topology could
   create a forwarding loop, the gateway MUST allocate a new label.
   When leaf nodes in the tree are added or deleted, these changes can
   be performed without concern for the formation of transient loops.
   However, in the case of tree rotations to rebalance the tree, there
   is a clear potential for forwarding loops.

   In generic terms, a transient forwarding loop can be formed if
   there exist multiple versions of the graph that are being executed
   by different nodes.  As an example, consider a graph with nodes
   (a, b, c) that goes through the following topology versions:

                         +---------+----------+
                         | Version | Edges    |
                         +---------+----------+
                         | 1       | a-b, a-c |
                         | 2       | a-b, b-c |
                         | 3       | a-c, b-c |
                         +---------+----------+

   In a scenario where node 'a' is at version 1, node 'b' at version 2
   and node 'c' at version 3, a transient loop will occur.  In this
   example, a packet that is injected at node 'a' will propagate to
   both 'b' and 'c'.  'b' accepts the packet since the edge (a-b) is a
   valid edge and propagates the traffic to 'c'.  'c' accepts packets
   from both 'a' and 'b'.  The packet 'c' receives from 'b' will be
   forwarded to 'a', which will accept it, creating a loop.  Likewise,
   the packet 'c' receives from 'a' will be forwarded to 'b', which
   will then forward it to 'a'.

   All of the topologies in the example above are loop-free.  However,
   the fact that the propagation of routing information is not
   synchronized allows for the formation of loops.

   Given that the propagation of forwarding entries to VPN Forwarders
   is asynchronous and that it would be undesirable to attempt to
   synchronize the process, we use the incoming label to break the
   potential forwarding loop.  For the loop to be broken it is
   necessary that the forwarding labels used on the edge (a-c) in the
   example above be different in configuration 1 versus
   configuration 3.

   As an example, forwarding loop avoidance can be implemented by
   keeping a list of edges that have previously been present at a node
   and modifying the label every time the tree management algorithm
   adds an edge that had been removed from the node.

   A Signaling Gateway that has received multicast routing information
   from locally connected VPN Forwarders shall advertise the
   corresponding multicast group as a C-Multicast route.  These
   C-Multicast routes shall include an Edge Discovery attribute that
   describes up to K + 1 multicast next-hops, each containing an IP
   address and a label range that can be used to assign forwarding
   labels.

   The tree manager election algorithm selects which of the Signaling
   Gateways is responsible for determining the topology of the
   multicast distribution tree.  At this level in the hierarchy, the
   distribution tree consists of graph nodes that are themselves
   distribution trees.  In the case where tree nodes are VPN
   Forwarders, the tree management algorithm can assign up to (K + 1)
   edges to a node (where K can potentially be configurable per node).
   In the case where tree nodes are distribution trees, the tree
   management algorithm is limited to the number of edges received in
   the C-Multicast route.

   The algorithm used to manage the lower and higher levels in the
   hierarchy can be the same.
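   A non-normative Python sketch of the loop-avoidance rule described
   earlier in this section is shown below: a per-node label allocator
   takes a fresh label from the range advertised by the VPN Forwarder
   whenever the node's edge set re-acquires an edge that was removed
   in an earlier version of the tree.  All names are illustrative.

      class LabelAllocator(object):
          """Incoming-label management for one node of one tree."""

          def __init__(self, label_range):
              self.free = iter(range(label_range[0], label_range[1] + 1))
              self.label = next(self.free)
              self.edges = set()
              self.past_edges = set()   # edges seen in earlier versions

          def update(self, new_edges):
              """Install a new edge set; return the label to use."""
              new_edges = set(new_edges)
              readded = new_edges & (self.past_edges - self.edges)
              if readded:
                  # An edge removed earlier is back: stale copies of the
                  # old topology may still use the old label, change it.
                  self.label = next(self.free)
              self.past_edges |= self.edges - new_edges
              self.edges = new_edges
              return self.label

      # Edge set of node 'c' across the three versions of the example.
      alloc = LabelAllocator((10000, 20000))
      print(alloc.update({('a', 'c')}))               # version 1: 10000
      print(alloc.update({('b', 'c')}))               # version 2: 10000
      print(alloc.update({('a', 'c'), ('b', 'c')}))   # version 3: 10001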
   When the tree manager modifies the tree topology, it shall generate
   BGP routes that describe the current topology.  These routes are
   encoded in the MCAST-VPN NLRI using the Multicast Tree Route Type
   defined below.

   Multicast Tree routes contain an Edge Forwarding attribute that
   describes the active edges between different nodes.

   Multicast Tree routes are interpreted only by the Signaling Gateway
   that is identified by the Router-Id contained in the NLRI prefix.
   On receipt of such a route, the Signaling Gateway connects its own
   multicast distribution tree with the edges contained in the Edge
   Forwarding attribute.

   In order to illustrate the operation of the hierarchical label
   management process, we present the following example.

   Consider a scenario where the out-degree constant K is 4 and where
   4 Signaling Gateways (A, B, C, D) are present.  The Signaling
   Gateways are the multicast tree managers for a set of VPN
   forwarders.  We use the notation a1 .. an for the VPN forwarders
   managed by node A.

   Assume that Signaling Gateway A has built a multicast distribution
   tree such that node a3 is the root.  This node has 2 descendants,
   a1 and a2.  Each of these has at most 3 locally connected edges.
   In this scenario, Signaling Gateway A has chosen to reserve edges
   at the top of its tree in order to connect to external trees.

   Signaling Gateway A advertises its state to the BGP network by
   generating a C-Multicast route containing a Multicast Edge
   Discovery attribute with the next-hops (a1, a1, a2, a3).  Although
   Signaling Gateway A may have many other VPN forwarders that are
   receivers of the specified group, this information is not
   propagated through BGP.

   Each of the next-hops in the list has an incoming label that is
   currently in use.  The Edge Discovery attribute contains an
   interval of free (unused) labels that is valid for each of the
   next-hops.

   In this example, we assume that Signaling Gateway B was elected as
   the tree manager for the higher-level tree.  At this stage, the
   tree manager has the following state:

              +-----------+------------------------------------+
              | Router-Id | Edges                              |
              +-----------+------------------------------------+
              | A         | (a1, 0), (a1, 0), (a2, 0), (a3, 0) |
              | B         | (b1, 0)                            |
              | C         | (c1, 0), (c2, 0), (c3, 0)          |
              | D         | (d1, 0), (d2, 0), (d3, 0)          |
              +-----------+------------------------------------+

   In the table above, each pair represents the IP address and
   assigned incoming label of a VPN forwarder.

   In this example all the Signaling Gateways decided to advertise
   fewer than K+1 edges.

   One possible assignment is to make node A's tree the root of the
   top-level distribution tree.  This can be accomplished by creating
   the edges (a1, b1), (a1, c1), (a2, d1).  The tree manager must
   allocate a label for each of the next-hops from their respective
   label space.
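   As a non-normative sketch of this step, the Python fragment below
   connects the per-gateway trees of the example and allocates a label
   for each edge endpoint, reusing a forwarder's label across the
   edges of the same tree as in the tables that follow.  The data
   layout, with a single first free label per next-hop instead of a
   full label range, is a simplification made for the example.

      def connect_trees(discovery, root):
          """Connect per-gateway trees into one top-level tree.

          'discovery' maps each gateway to the next-hops learned from
          its Multicast Edge Discovery attribute, as
          (forwarder, free_label) pairs.  The tree advertised by
          'root' becomes the top-level root.  Returns (inbound_node,
          inbound_label, outbound_node, outbound_label) edges.
          """
          assigned = {}                   # forwarder -> allocated label

          def label_for(forwarder, free_label):
              return assigned.setdefault(forwarder, free_label)

          edges = []
          root_hops = discovery[root]
          others = [gw for gw in discovery if gw != root]
          for index, gateway in enumerate(others):
              in_node, in_free = root_hops[index]
              out_node, out_free = discovery[gateway][0]
              edges.append((in_node, label_for(in_node, in_free),
                            out_node, label_for(out_node, out_free)))
          return edges

      # Next-hops advertised by the Signaling Gateways of the example.
      discovery = {
          'A': [('a1', 10000), ('a1', 10000), ('a2', 11000),
                ('a3', 12000)],
          'B': [('b1', 20000)],
          'C': [('c1', 21000), ('c2', 21100), ('c3', 21200)],
          'D': [('d1', 22000), ('d2', 22100), ('d3', 22200)],
      }
      print(connect_trees(discovery, root='A'))
      # [('a1', 10000, 'b1', 20000), ('a1', 10000, 'c1', 21000),
      #  ('a2', 11000, 'd1', 22000)]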
   As a result of this tree assignment, the multicast tree manager (B)
   generates the following Multicast Tree Route Type updates:

      +-----------+-------------------------------------------------+
      | Router-Id | Edges                                           |
      +-----------+-------------------------------------------------+
      | A         | (a1, b1, 10000, 20000), (a1, c1, 10000, 21000), |
      |           | (a2, d1, 11000, 22000)                          |
      | C         | (c1, a1, 21000, 10000)                          |
      | D         | (d1, a2, 22000, 11000)                          |
      +-----------+-------------------------------------------------+

   When A, C and D receive their respective routing updates, they will
   generate the corresponding XMPP event notification messages to the
   affected VPN forwarders.  In A's case this implies updating the
   state of a1 and a2.  The forwarding table of a1 now has a new
   incoming label (10000) and the next-hops (b1, a2, a3, c1).

5.  BGP Protocol Extensions

   This document defines an additional Route Type for the MCAST-VPN
   NLRI [RFC6514], called the Multicast Tree Route Type.

5.1.  Multicast Tree Route Type

   Multicast Tree Routes are used by multicast tree managers to
   advertise an acyclic graph topology of nodes which may themselves
   consist of multicast distribution trees.  A BGP UPDATE containing a
   Multicast Tree Route as part of the MP_REACH Path Attribute MUST
   also contain the Multicast Edge Forwarding Attribute.

   A Multicast Tree Route is encoded as an MCAST-VPN NLRI with Route
   Type 8 and consists of the following:

            +--------------------------------------+
            |  RD (8 octets)                       |
            +--------------------------------------+
            |  Router-Id (4 octets)                |
            +--------------------------------------+
            |  Multicast Source Length (1 octet)   |
            +--------------------------------------+
            |  Multicast Source (variable)         |
            +--------------------------------------+
            |  Multicast Group Length (1 octet)    |
            +--------------------------------------+
            |  Multicast Group (variable)          |
            +--------------------------------------+

   The Route Distinguisher (RD) is encoded as described in [RFC4364].

   The Router-Id field identifies the multicast tree node for which
   the edges are being advertised.

   The Multicast Source and Group fields specify the multicast group
   for which the multicast forwarding state is being advertised.

5.2.  Multicast Edge Discovery Attribute

   The Multicast Edge Discovery Path Attribute is associated with
   C-Multicast routes and contains one or more Next-hop information
   elements, where each information element follows the model
   described below:

            +------------------------------+
            |  IP addr Length (1 octet)    |
            +------------------------------+
            |  IP address (variable)       |
            +------------------------------+
            |  Label Range Length (1 octet)|
            +------------------------------+
            |  Start Label (4 octets)      |
            +------------------------------+
            |  End Label (4 octets)        |
            +------------------------------+
            |  ...                         |
            +------------------------------+
            |  Start Label (4 octets)      |
            +------------------------------+
            |  End Label (4 octets)        |
            +------------------------------+

   Each Next-hop information element identifies an incoming edge that
   can be used to connect the Signaling Gateway's locally managed
   replication tree with other replication trees for the same group.
   The 'IP address' value corresponds to the IP address of the VPN
   Forwarder that is a member of the local tree.
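   A non-normative Python sketch of the encoding of one such Next-hop
   element is shown below.  It assumes that the Label Range Length
   field carries the length of the label ranges in octets; the helper
   names are illustrative.

      import socket
      import struct

      def encode_next_hop(ip_address, label_ranges):
          """Pack one Next-hop element of the Edge Discovery attribute.

          'ip_address' is an IPv4 or IPv6 address and 'label_ranges'
          is a list of (start_label, end_label) tuples; labels are
          carried as 4-octet values per the layout above.
          """
          family = socket.AF_INET6 if ':' in ip_address else socket.AF_INET
          address = socket.inet_pton(family, ip_address)
          element = struct.pack('!B', len(address)) + address
          element += struct.pack('!B', 8 * len(label_ranges))
          for start, end in label_ranges:
              element += struct.pack('!II', start, end)
          return element

      # One next-hop offering labels 10000-20000 for edge assignment.
      print(encode_next_hop('192.0.2.1', [(10000, 20000)]).hex())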
   The same VPN Forwarder address can appear multiple times in the
   Discovery Path Attribute.  Signaling Gateways advertise up to K + 1
   Next-hop elements.

   Each attribute specifies one or more contiguous label ranges
   available for assignment at the specified VPN Forwarder.  If the
   VPN forwarder appears multiple times in the list, the label range
   advertisements SHOULD be the same.

   The Signaling Gateway SHALL ensure that the number of locally
   assigned edges on a VPN forwarder plus the number of Next-hop
   information elements that refer to that VPN forwarder does not
   exceed K + 1.

5.3.  Multicast Edge Forwarding Attribute

   The Multicast Edge Forwarding Path Attribute is associated with
   Multicast Tree Route Type NLRI routes and contains one or more edge
   information elements, where each information element follows the
   model described below:

            +------------------------------+
            |  Next-hop Length (1 octet)   |
            +------------------------------+
            |  Inbound Node (variable)     |
            +------------------------------+
            |  Inbound Label (4 octets)    |
            +------------------------------+
            |  Outbound Node (variable)    |
            +------------------------------+
            |  Outbound Label (4 octets)   |
            +------------------------------+

   Each edge element contained in this list contains the address of an
   inbound node that has been advertised via the Edge Discovery
   attribute, as well as a label assigned from the respective
   interval.  The outbound node address and label specify the
   destination node for this edge.
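   As a non-normative counterpart to the encoding sketch in
   Section 5.2, the Python fragment below unpacks one edge element.
   It assumes that the Next-hop Length field gives the size in octets
   of each node address; the helper names are illustrative.

      import socket
      import struct

      def decode_edge(data, offset=0):
          """Unpack one edge element of the Edge Forwarding attribute.

          Returns ((inbound_node, inbound_label, outbound_node,
          outbound_label), next_offset).
          """
          addr_len = data[offset]
          offset += 1
          family = socket.AF_INET if addr_len == 4 else socket.AF_INET6

          def read_addr(off):
              raw = data[off:off + addr_len]
              return socket.inet_ntop(family, raw), off + addr_len

          inbound_node, offset = read_addr(offset)
          inbound_label = struct.unpack_from('!I', data, offset)[0]
          outbound_node, offset = read_addr(offset + 4)
          outbound_label = struct.unpack_from('!I', data, offset)[0]
          return (inbound_node, inbound_label,
                  outbound_node, outbound_label), offset + 4

      edge = (struct.pack('!B', 4)
              + socket.inet_pton(socket.AF_INET, '10.0.0.1')
              + struct.pack('!I', 10000)
              + socket.inet_pton(socket.AF_INET, '10.0.0.2')
              + struct.pack('!I', 20000))
      print(decode_edge(edge))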
6.  Security Considerations

   It is helpful to differentiate between the control plane and data
   plane security aspects of the solution.

   The control plane assumes that XMPP sessions between VPN forwarders
   and the Signaling Gateway are authenticated such that the Signaling
   Gateway is able to verify the identity of the VPN Forwarder.

   BGP sessions between Signaling Gateways should also be subject to
   authentication.

   At the data plane, it is important to note that a compromised VPN
   forwarder is able to modify messages that traverse it.

7.  References

7.1.  Normative References

   [RFC3376]  Cain, B., Deering, S., Kouvelas, I., Fenner, B., and A.
              Thyagarajan, "Internet Group Management Protocol,
              Version 3", RFC 3376, October 2002.

   [RFC3810]  Vida, R. and L. Costa, "Multicast Listener Discovery
              Version 2 (MLDv2) for IPv6", RFC 3810, June 2004.

   [RFC4364]  Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
              Networks (VPNs)", RFC 4364, February 2006.

   [RFC4607]  Holbrook, H. and B. Cain, "Source-Specific Multicast for
              IP", RFC 4607, August 2006.

   [RFC6514]  Aggarwal, R., Rosen, E., Morin, T., and Y. Rekhter, "BGP
              Encodings and Procedures for Multicast in MPLS/BGP IP
              VPNs", RFC 6514, February 2012.

7.2.  Informational References

   [I-D.marques-l3vpn-end-system]
              Marques, P., Fang, L., Pan, P., Shukla, A., Napierala,
              M., and N. Bitar, "BGP-signaled end-system IP/VPNs",
              Internet-Draft draft-marques-l3vpn-end-system-05, March
              2012.

   [clos]     Clos, C., "A study of non-blocking switching networks",
              Bell System Technical Journal 32, March 1953.

   [prim]     Prim, R.C., "Shortest connection networks and some
              generalizations", Bell System Technical Journal 36,
              1957.

Authors' Addresses

   Pedro Marques
   Contrail Systems
   440 N. Wolfe Rd.
   Sunnyvale, CA 94085

   Email: roque@contrailsystems.com

   Luyuan Fang
   Cisco Systems
   111 Wood Avenue South
   Iselin, NJ 08830

   Email: lufang@cisco.com

   Derick Winkworth
   FIS

   Email: derick.winkworth@fisglobal.com

   Yiqun Cai
   Microsoft Corporation
   1065 La Avenida
   Mountain View, CA 94043

   Email: yiqunc@microsoft.com

   Petr Lapukhov
   Microsoft Corporation

   Email: petrlapu@microsoft.com