Network Working Group                                Eric C. Rosen (Editor)
Internet Draft                                           Yiqun Cai (Editor)
Expiration Date: November 2004                            IJsbrand Wijnands
                                                        Cisco Systems, Inc.

                                  May 2004

                       Multicast in MPLS/BGP IP VPNs

                        draft-rosen-vpn-mcast-07.txt

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC 2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Abstract

   In order for IP multicast traffic within a BGP/MPLS IP VPN (Virtual
   Private Network) to travel from one VPN site to another, special
   protocols and procedures must be implemented by the VPN Service
   Provider.  These protocols and procedures are specified in this
   document.

Table of Contents

   1      Specification of requirements
   2      Introduction
   2.1    Scaling Multicast State Info. in the Network Core
   2.2    Overview
   3      Multicast VRFs
   4      Multicast Domains
   4.1    Model of Operation
   5      Multicast Tunnels
   5.1    Ingress PEs
   5.2    Egress PEs
   5.3    Tunnel Destination Address(es)
   5.4    Auto-Discovery
   5.5    Which PIM Variant to Use
   5.6    Inter-AS MDT Construction
   5.7    Encapsulation
   5.7.1  Encapsulation in GRE
   5.7.2  Encapsulation in IP
   5.7.3  Encapsulation in MPLS
   5.7.4  Interoperability
   5.8    MTU
   5.9    TTL
   5.10   Differentiated Services
   5.11   Avoiding Conflict with Internet Multicast
   6      The PIM C-Instance and the MT
   6.1    PIM C-Instance Control Packets
   6.2    PIM C-instance RPF Determination
   7      Data MDT: Optimizing flooding
   7.1    Limitation of Multicast Domain
   7.2    Signaling Data MDT Trees
   7.3    Use of SSM for Data MDTs
   8      Packet Formats and Constants
   8.1    MDT TLV
   8.2    MDT Join TLV
   8.3    Constants
   9      Acknowledgments
   10     Normative References
   11     Informative References
   12     Authors' Addresses
   13     Intellectual Property Statement
   14     Full Copyright Statement

1. Specification of requirements

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

2. Introduction

   The base specification for BGP/MPLS IP VPNs [RFC2547bis] does not
   provide a way for IP multicast data or control traffic to travel
   from one VPN site to another.  This document extends that
   specification by specifying the necessary protocols and procedures
   for support of IP multicast.  Only IPv4 multicast is considered in
   this specification.

   This specification presupposes that:

   1. PIM [PIMv2] is the multicast routing protocol used within the
      VPN,

   2. PIM is also the multicast routing protocol used within the SP
      network, and

   3. the SP network supports native IP multicast forwarding.

   Familiarity with the terminology and procedures of [RFC2547bis] is
   presupposed.  Familiarity with [PIMv2] is also presupposed.

2.1. Scaling Multicast State Info. in the Network Core

   The BGP/MPLS IP VPN service of [RFC2547bis] provides a VPN with
   "optimal" unicast routing through the SP backbone, in that a packet
   follows the "shortest path" across the backbone, as determined by
   the backbone's own routing algorithm.  This optimal routing is
   provided without requiring the P routers to maintain any routing
   information which is specific to a VPN; indeed, the P routers do not
   maintain any per-VPN state at all.
   Unfortunately, optimal MULTICAST routing cannot be provided without
   requiring the P routers to maintain some VPN-specific state
   information.  Optimal multicast routing would require that one or
   more multicast distribution trees be created in the backbone for
   each multicast group that is in use.  If a particular multicast
   group from within a VPN is using source-based distribution trees,
   optimal routing requires that there be one distribution tree for
   each transmitter of that group.  If shared trees are being used, one
   tree for each group is still required.  Each such tree requires
   state in some set of the P routers, with the amount of state being
   proportional to the number of multicast transmitters.  The reason
   there needs to be at least one distribution tree per multicast group
   is that each group may have a different set of receivers; multicast
   routing algorithms generally go to great lengths to ensure that a
   multicast packet will not be sent to a node which is not on the path
   to a receiver.

   Given that an SP generally supports many VPNs, where each VPN may
   have many multicast groups, and each multicast group may have many
   transmitters, it is not scalable to have one or more distribution
   trees for each multicast group.  The SP has no control whatsoever
   over the number of multicast groups and transmitters that exist in
   the VPNs, and it is difficult to place any bound on these numbers.

   In order to have a scalable multicast solution for MPLS/BGP IP VPNs,
   the amount of state maintained by the P routers needs to be
   proportional to something which IS under the control of the SP.
   This specification describes such a solution.  In this solution, the
   amount of state maintained in the P routers is proportional only to
   the number of VPNs which run over the backbone; the amount of state
   in the P routers is NOT sensitive to the number of multicast groups
   or to the number of multicast transmitters within the VPNs.  To
   achieve this scalability, the optimality of the multicast routes is
   reduced.  A PE which is not on the path to any receiver of a
   particular multicast group may still receive multicast packets for
   that group, and if so, will have to discard them.  The SP does,
   however, have control over the tradeoff between optimal routing and
   scalability.

2.2. Overview

   An SP determines whether a particular VPN is multicast-enabled.  If
   it is, it corresponds to a "Multicast Domain".  A PE which attaches
   to a particular multicast-enabled VPN is said to belong to the
   corresponding Multicast Domain.  For each Multicast Domain, there is
   a default "Multicast Distribution Tree (MDT)" through the backbone,
   connecting ALL of the PEs that belong to that Multicast Domain.  A
   given PE may be in as many Multicast Domains as there are VPNs
   attached to that PE.  However, each Multicast Domain has its own
   MDT.  The MDTs are created by running PIM in the backbone, and in
   general an MDT also includes P routers on the paths between the PE
   routers.

   In a departure from the usual multicast tree distribution
   procedures, the Default MDT for a Multicast Domain is constructed
   automatically as the PEs in the domain come up.  Construction of the
   Default MDT does not depend on the existence of multicast traffic in
   the domain; it will exist before any such multicast traffic is seen.
   In BGP/MPLS IP VPNs, each CE router is a unicast routing adjacency
   of a PE router, but CE routers at different sites do NOT become
   unicast routing adjacencies of each other.  This important
   characteristic is retained for multicast routing -- a CE router
   becomes a PIM adjacency of a PE router, but CE routers at different
   sites do NOT become PIM adjacencies of each other.  Multicast
   packets from within a VPN are received from a CE router by an
   ingress PE router.  The ingress PE encapsulates the multicast
   packets and (initially) forwards them along the Default MDT to all
   the PE routers connected to sites of the given VPN.  Every PE router
   attached to a site of the given VPN thus receives all multicast
   packets from within that VPN.  If a particular PE router is not on
   the path to any receiver of that multicast group, the PE simply
   discards that packet.

   If a large amount of traffic is being sent to a particular multicast
   group, but that group does not have receivers at all the VPN sites,
   it can be wasteful to forward that group's traffic along the Default
   MDT.  Therefore, we also specify a method for establishing
   individual MDTs for specific multicast groups.  We call these "Data
   MDTs".  A Data MDT delivers VPN data traffic for a particular
   multicast group only to those PE routers which are on the path to
   receivers of that multicast group.  Using a Data MDT has the benefit
   of reducing the amount of multicast traffic on the backbone, as well
   as reducing the load on some of the PEs; it has the disadvantage of
   increasing the amount of state that must be maintained by the P
   routers.  The SP has complete control over this tradeoff.

   This solution requires the SP to deploy appropriate protocols and
   procedures, but is transparent to the SP's customers.  An enterprise
   which uses PIM-based multicasting in its network can migrate from a
   private network to a BGP/MPLS IP VPN service, while continuing to
   use whatever multicast router configurations it was previously
   using; no changes need be made to CE routers or to other routers at
   customer sites.  For instance, any dynamic RP-discovery procedures
   that are already in use may be left in place.

3. Multicast VRFs

   The notion of a "VRF", defined in [RFC2547bis], is extended to
   include multicast routing entries as well as unicast routing
   entries.

   Each VRF has its own multicast routing table.  When a multicast data
   or control packet is received from a particular CE device, multicast
   routing is done in the associated VRF.

   Each PE router runs a number of instances of PIM-SM, as many as one
   per VRF.  In each instance of PIM-SM, the PE maintains a PIM
   adjacency with each of the PIM-capable CE routers associated with
   that VRF.  The multicast routing table created by each instance is
   specific to the corresponding VRF.  We will refer to these PIM
   instances as "VPN-specific PIM instances", or "PIM C-instances".

   Each PE router also runs a "provider-wide" instance of PIM-SM (a
   "PIM P-instance"), in which it has a PIM adjacency with each of its
   IGP neighbors (i.e., with P routers), but NOT with any CE routers,
   and not with other PE routers (unless they happen to be adjacent in
   the SP's network).  The P routers also run the P-instance of PIM,
   but do NOT run a C-instance.
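   The following non-normative Python sketch illustrates the separation
   just described: one provider-wide P-instance per PE, plus one
   C-instance per multicast VRF, each with its own adjacencies and its
   own multicast routing table.  All class and field names are
   hypothetical and are used only for illustration.

      # Illustrative only; not part of the protocol specification.
      from dataclasses import dataclass, field

      @dataclass
      class PimInstance:
          name: str
          adjacencies: set = field(default_factory=set)   # PIM neighbors of this instance
          mroutes: dict = field(default_factory=dict)     # (S,G) or (*,G) -> outgoing interfaces

      @dataclass
      class MulticastVrf:
          name: str
          interfaces: set = field(default_factory=set)    # CE-facing interfaces bound to this VRF
          c_instance: PimInstance = None                  # the VPN-specific PIM instance

      @dataclass
      class PERouter:
          p_instance: PimInstance                         # adjacencies with P routers only
          vrfs: dict = field(default_factory=dict)        # VRF name -> MulticastVrf

          def instance_for_ce_packet(self, vrf_name: str) -> PimInstance:
              # A C-multicast packet received from a CE is processed in the
              # PIM C-instance of the VRF bound to the incoming interface.
              return self.vrfs[vrf_name].c_instance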
   In order to help clarify when we are speaking of the PIM P-instance
   and when we are speaking of a PIM C-instance, we will also apply the
   prefixes "P-" and "C-" respectively to control messages, addresses,
   etc.  Thus a P-Join would be a PIM Join which is processed by the
   PIM P-instance, and a C-Join would be a PIM Join which is processed
   by a C-instance.  A P-group address would be a group address in the
   SP's address space, and a C-group address would be a group address
   in a VPN's address space.

4. Multicast Domains

4.1. Model of Operation

   A "Multicast Domain (MD)" is essentially a set of VRFs associated
   with interfaces that can send multicast traffic to each other.  From
   the standpoint of a PIM C-instance, a multicast domain is equivalent
   to a multi-access interface.  The PE routers in a given MD become
   PIM adjacencies of each other in the PIM C-instance.

   Each multicast VRF is assigned to one MD.  Each MD is configured
   with a distinct multicast P-group address, called the "Default MDT
   group address".  This address is used to build the Default MDT for
   the MD.

   When a PE router needs to send PIM C-instance control traffic to the
   other PE routers in the MD, it encapsulates the control traffic,
   with its own address as source IP address and the Default MDT group
   address as destination IP address.  Note that the Default MDT is
   part of the P-instance of PIM, whereas the PEs that communicate over
   the Default MDT are PIM adjacencies in a C-instance.  Within the
   C-instance, the Default MDT appears to be a multi-access network to
   which all the PEs are attached.  This is discussed in more detail in
   section 5.

   The Default MDT does not only carry the PIM control traffic of the
   MD's PIM C-instance.  It also, by default, carries the multicast
   data traffic of the C-instance.  In some cases, though, multicast
   data traffic in a particular MD will be sent on a Data MDT rather
   than on the Default MDT.  The use of Data MDTs is described in
   section 7.

   Note that, if an MDT (Default or Data) is set up using PIM-SM or
   Bidirectional PIM, it must have a P-group address which is "globally
   unique" (more precisely, unique over the set of SP networks carrying
   the multicast traffic of the corresponding MD).  If PIM-SSM is used,
   the P-group address of an MDT only needs to be unique relative to
   the source of the MDT (though see section 5.4).

5. Multicast Tunnels

   An MD can be thought of as a set of PE routers connected by a
   "multicast tunnel (MT)".  From the perspective of a VPN-specific PIM
   instance, an MT is a single multi-access interface.  In the SP
   network, a single MT is realized as a Default MDT combined with zero
   or more Data MDTs.

5.1. Ingress PEs

   An ingress PE is a PE router that is either directly connected to
   the multicast sender in the VPN, or connected to it via a CE router.
   When the multicast sender starts transmitting, and if there are
   receivers (or a PIM RP) behind other PE routers in the common MD,
   the ingress PE becomes the transmitter of either the Default MDT
   group or a Data MDT group in the SP network.

5.2. Egress PEs

   A PE router with a VRF configured in an MD becomes a receiver of the
   Default MDT group for that MD.
   A PE router may also join a Data MDT group if it has a VPN-specific
   PIM instance in which it is forwarding traffic for a particular
   C-group to one of its attached sites, and that particular C-group
   has been associated with that particular Data MDT.  When a PE router
   joins any P-group used for encapsulating VPN multicast traffic, the
   PE router becomes one of the endpoints of the corresponding MT.

   When a packet is received from an MT, the receiving PE derives the
   MD from the destination address of the received packet, which is a
   P-group address.  The packet is then passed to the corresponding
   Multicast VRF and VPN-specific PIM instance for further processing.

5.3. Tunnel Destination Address(es)

   An MT is an IP tunnel for which the destination address is a P-group
   address.  However, an MT is not limited to using only one P-group
   address for encapsulation.  Based on the payload VPN multicast
   traffic, the ingress PE can choose to use the Default MDT group
   address, or one of the Data MDT group addresses (as described in
   section 7 of this document), allowing the MT to reach a different
   set of PE routers in the common MD.

5.4. Auto-Discovery

   Any of the variants of PIM may be used to set up the Default MDT:
   PIM-SM, Bidirectional PIM, or PIM-SSM.  Except in the case of
   PIM-SSM, the PEs need only know the proper P-group address in order
   to begin setting up the Default MDTs.  The PEs will then discover
   each other's addresses by virtue of receiving PIM control traffic,
   e.g., PIM Hellos, sourced (and encapsulated) by each other.

   However, in the case of PIM-SSM, the necessary MDTs for an MD cannot
   be set up until each PE in the MD knows the source address of each
   of the other PEs in that same MD.  This information needs to be
   auto-discovered.

   In [MDT-SAFI], a new BGP Address Family is defined.  The NLRI for
   this address family consists of an RD, an IPv4 unicast address, and
   a multicast group address.  A given PE router in a given MD
   constructs an NLRI in this family from:

   - Its own IPv4 address.  If it has several, it uses the one which it
     will be placing in the IP source address field of multicast
     packets that it will be sending over the MDT.

   - An RD which has been assigned to the MD.

   - The P-group address which is to be used as the IP destination
     address field of multicast packets that will be sent over the MDT.

   When a PE distributes this NLRI via BGP, it may include a Route
   Target Extended Communities attribute.  This RT must be an "Import
   RT" [RFC2547bis] of each VRF in the MD.  The ordinary BGP
   distribution procedures used by [RFC2547bis] will then ensure that
   each PE learns the MDT-SAFI "address" of each of the other PEs in
   the MD, and that the learned MDT-SAFI addresses get associated with
   the right VRFs.

   If a PE receives an MDT-SAFI NLRI which does not have an RT
   attribute, the P-group address from the NLRI has to be used to
   associate the NLRI with a particular VRF.  In this case, each
   multicast domain must be associated with a unique P-address, even if
   PIM-SSM is used.  However, finding a unique P-address for a
   multi-provider multicast group may be difficult.

   In order to facilitate the deployment of multi-provider multicast
   domains, this specification REQUIRES the use of the MDT-SAFI NLRI
   (even if PIM-SSM is not used to set up the default MDT).  This
   specification also REQUIRES that an implementation be capable of
   using PIM-SSM to set up the default MDT.
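   As a non-normative illustration of the NLRI construction just
   described, the following Python sketch packs the three components
   (the RD, the PE's own IPv4 address, and the P-group address) into a
   16-octet string.  The authoritative wire encoding is defined in
   [MDT-SAFI]; the layout assumed here (an 8-octet RD followed by two
   4-octet IPv4 fields), and all names and addresses shown, are
   illustrative only.

      # Illustrative only; see [MDT-SAFI] for the normative encoding.
      import socket
      import struct

      def build_mdt_safi_nlri(rd: bytes, pe_ipv4: str, p_group: str) -> bytes:
          """RD (8 octets) + originating PE address + Default MDT group."""
          if len(rd) != 8:
              raise ValueError("an RD is 8 octets")
          return rd + socket.inet_aton(pe_ipv4) + socket.inet_aton(p_group)

      # Example: a type-0 RD (AS 65000, assigned number 100), the address
      # the PE will use as the source of MDT packets, and the MD's
      # Default MDT group address.
      rd = struct.pack("!HHI", 0, 65000, 100)
      nlri = build_mdt_safi_nlri(rd, "192.0.2.1", "239.1.1.1")
      assert len(nlri) == 16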
5.5. Which PIM Variant to Use

   To minimize the amount of multicast routing state maintained by the
   P routers, the Default MDTs should be realized as shared trees, such
   as PIM Bidirectional trees.  However, the operational procedures for
   assigning P-group addresses may be greatly simplified, especially in
   the case of multi-provider MDs, if PIM-SSM is used.

   Data MDTs are best realized as source trees, constructed via
   PIM-SSM.

5.6. Inter-AS MDT Construction

   Standard PIM techniques for the construction of source trees
   presuppose that every router has a route to the source of the tree.
   However, if the source of the tree is in a different AS than a
   particular P router, it is possible that the P router will not have
   a route to the source.  For example, the remote AS may be using BGP
   to distribute a route to the source, but a particular P router may
   be part of a "BGP-free core", in which the P routers are not aware
   of BGP-distributed routes.

   What is needed in this case is a way for a PE to tell PIM to
   construct the tree through a particular BGP speaker, the "BGP next
   hop" for the tree source.  This can be accomplished with a PIM
   extension.

   If the PE has selected the source of the tree from the MDT SAFI
   address family, then it may be desirable to build the tree along the
   route to the MDT SAFI address, rather than along the route to the
   corresponding IPv4 address.  This enables the inter-AS portion of
   the tree to follow a path which is specifically chosen for multicast
   (i.e., it allows the inter-AS multicast topology to be
   "non-congruent" to the inter-AS unicast topology).  This too
   requires a PIM extension.

   The necessary PIM extension is described in [PIM-RPF-PROXY].

5.7. Encapsulation

5.7.1. Encapsulation in GRE

   GRE encapsulation is recommended when sending multicast traffic
   through an MDT.  The following diagram shows the progression of the
   packet as it enters and leaves the service provider network.

   Packets received        Packets in transit      Packets forwarded
   at ingress PE           in the service          by egress PEs
                           provider network

                           +---------------+
                           |  P-IP Header  |
                           +---------------+
                           |      GRE      |
   ++=============++       ++=============++       ++=============++
   || C-IP Header ||       || C-IP Header ||       || C-IP Header ||
   ++=============++ >>>>> ++=============++ >>>>> ++=============++
   ||  C-Payload  ||       ||  C-Payload  ||       ||  C-Payload  ||
   ++=============++       ++=============++       ++=============++

   The IPv4 Protocol Number field in the P-IP Header must be set to 47.
   The Protocol Type field of the GRE Header must be set to 0x0800.

   [GRE2784] specifies an optional GRE checksum, and [GRE2890]
   specifies optional GRE key and sequence number fields.

   The GRE key field is not needed because the P-group address in the
   delivery IP header already identifies the MD, and thus the VRF
   context in which the payload packet is to be further processed.

   The GRE sequence number field is also not needed because the
   transport layer services for the original application will be
   provided by the C-IP Header.

   The use of the GRE checksum field must follow [GRE2784].

   To facilitate high-speed implementations, this document recommends
   that the ingress PE routers encapsulate VPN packets without setting
   the checksum, key, or sequence number fields.
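   The following non-normative sketch shows, in Python, the composition
   described above: an outer P-IP header carrying protocol number 47, a
   basic 4-octet GRE header with Protocol Type 0x0800 and none of the
   optional checksum/key/sequence fields, and the unmodified C-packet.
   The addresses and the outer TTL value are illustrative; per sections
   5.8 and 5.9, the DF bit is not set and the outer TTL is a matter of
   local policy.

      # Illustrative only; not a normative encapsulation procedure.
      import socket
      import struct

      def ipv4_checksum(header: bytes) -> int:
          total = 0
          for i in range(0, len(header), 2):
              total += (header[i] << 8) | header[i + 1]
          while total > 0xFFFF:
              total = (total & 0xFFFF) + (total >> 16)
          return ~total & 0xFFFF

      def gre_encapsulate(c_packet: bytes, pe_src: str, p_group: str,
                          ttl: int = 64) -> bytes:
          gre = struct.pack("!HH", 0x0000, 0x0800)   # no C/K/S bits; Protocol Type = IPv4
          total_len = 20 + len(gre) + len(c_packet)
          p_ip = struct.pack(
              "!BBHHHBBH4s4s",
              0x45, 0x00, total_len,                 # version/IHL, DS field, total length
              0x0000, 0x0000,                        # identification; flags = 0, so DF not set
              ttl, 47,                               # outer TTL (local policy); protocol 47 = GRE
              0,                                     # checksum, filled in below
              socket.inet_aton(pe_src),              # PE's MDT source address
              socket.inet_aton(p_group))             # Default or Data MDT group address
          p_ip = p_ip[:10] + struct.pack("!H", ipv4_checksum(p_ip)) + p_ip[12:]
          return p_ip + gre + c_packet

      # Usage: packet = gre_encapsulate(c_packet, "192.0.2.1", "239.1.1.1")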
5.7.2. Encapsulation in IP

   IP-in-IP [IPIP1853] is also a viable option.  When it is used, the
   IPv4 Protocol Number field is set to 4.  The following diagram shows
   the progression of the packet as it enters and leaves the service
   provider network.

   Packets received        Packets in transit      Packets forwarded
   at ingress PE           in the service          by egress PEs
                           provider network

                           +---------------+
                           |  P-IP Header  |
   ++=============++       ++=============++       ++=============++
   || C-IP Header ||       || C-IP Header ||       || C-IP Header ||
   ++=============++ >>>>> ++=============++ >>>>> ++=============++
   ||  C-Payload  ||       ||  C-Payload  ||       ||  C-Payload  ||
   ++=============++       ++=============++       ++=============++

5.7.3. Encapsulation in MPLS

   An SP may choose MPLS encapsulation if a method described in
   [PIM-MPLS] is deployed.  The specification of the encapsulation, as
   well as the forwarding behavior of the PE routers, is out of scope
   for this document.

5.7.4. Interoperability

   PE routers in a common MD must agree on the method of encapsulation.
   This can be achieved either via configuration or by means of some
   discovery protocol.  To help reduce configuration overhead and
   improve multi-vendor interoperability, it is strongly recommended
   that GRE encapsulation be supported and enabled by default.

5.8. MTU

   Because multicast group addresses are used as tunnel destination
   addresses, existing Path MTU discovery mechanisms cannot be used.
   This requires that:

   1. The ingress PE router (the one that does the encapsulation) must
      not set the DF bit in the outer header, and

   2. If the "DF" bit is cleared in the IP header of the C-packet, the
      ingress PE fragments the C-packet before encapsulation if
      appropriate.  This is very important in practice, because on
      today's router implementations the performance of the reassembly
      function is significantly lower than that of decapsulating and
      forwarding packets.

5.9. TTL

   The ingress PE should not copy the TTL field from the payload IP
   header received from a CE router to the delivery IP header.  The
   setting of the TTL of the delivery IP header is determined by the
   local policy of the ingress PE router.

5.10. Differentiated Services

   By default, the setting of the DS field in the delivery IP header
   should follow the guidelines outlined in [DIFF2983].  An SP may also
   choose to deploy any of the additional mechanisms the PE routers
   support.

5.11. Avoiding Conflict with Internet Multicast

   If the SP is providing Internet multicast, distinct from its VPN
   multicast services, it must ensure that the P-group addresses which
   correspond to its MDs are distinct from any of the group addresses
   of the Internet multicasts it supports.  This is best done by using
   administratively scoped addresses [ADMIN-ADDR].

   The C-group addresses need not be distinct from either the P-group
   addresses or the Internet multicast addresses.
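   As a small, non-normative illustration of the address-separation
   rule above, the following Python check verifies that a configured
   Default or Data MDT P-group address lies within the IPv4
   administratively scoped multicast range 239.0.0.0/8 defined in
   [ADMIN-ADDR], and therefore cannot collide with group addresses used
   for the SP's Internet multicast service.  The example addresses are
   illustrative only.

      # Illustrative only; a deployment may apply narrower scoping policies.
      import ipaddress

      ADMIN_SCOPED = ipaddress.ip_network("239.0.0.0/8")

      def p_group_is_admin_scoped(p_group: str) -> bool:
          addr = ipaddress.ip_address(p_group)
          return addr.is_multicast and addr in ADMIN_SCOPED

      assert p_group_is_admin_scoped("239.1.1.1")          # usable as an MDT group
      assert not p_group_is_admin_scoped("233.252.0.1")    # outside the admin-scoped range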
6. The PIM C-Instance and the MT

   If a particular VRF is in a particular MD, the corresponding MT is
   treated by that VRF's VPN-specific PIM instance as a LAN interface.
   The PEs which are adjacent on the MT must execute the PIM LAN
   procedures, including the generation and processing of PIM Hello,
   Join/Prune, Assert, DF election and other PIM control packets.

6.1. PIM C-Instance Control Packets

   The C-instance PIM control packets are sent to ALL-PIM-ROUTERS
   (224.0.0.13) in the context of the VRF, but when in transit in the
   provider network, they are encapsulated using the Default MDT group
   configured for that MD.  This allows VPN-specific PIM routes to be
   extended from site to site without appearing in the P routers.

6.2. PIM C-instance RPF Determination

   Although the MT is treated as a PIM-enabled interface, unicast
   routing is NOT run over it, and there are no unicast routing
   adjacencies over it.  It is therefore necessary to specify special
   procedures for determining when the MT is to be regarded as the "RPF
   Interface" for a particular C-address.

   When a PE needs to determine the RPF interface of a particular
   C-address, it looks up the C-address in the VRF.  If the route
   matching it is not a VPN-IP route learned from MP-BGP as described
   in [RFC2547bis], or if that route's outgoing interface is one of the
   interfaces associated with the VRF, then ordinary PIM procedures for
   determining the RPF interface apply.

   However, if the route matching the C-address is a VPN-IP route whose
   outgoing interface is not one of the interfaces associated with the
   VRF, then PIM will consider the outgoing interface to be the MT
   associated with the VPN-specific PIM instance.

   Once PIM has determined that the RPF interface for a particular
   C-address is the MT, it is necessary for PIM to determine the RPF
   neighbor for that C-address.  This will be one of the other PEs that
   is a PIM adjacency over the MT.

   In [MDT-SAFI], the BGP "Connector" attribute is defined.  Whenever a
   PE router distributes a VPN-IPv4 address from a VRF that is part of
   an MD, it SHOULD distribute a Connector attribute along with it.
   The Connector attribute should specify the MDT address family, and
   its value should be the IP address which the PE router is using as
   its source IP address for multicast packets which are encapsulated
   and sent over the MT.  Then, when a PE has determined that the RPF
   interface for a particular C-address is the MT, it must look up the
   Connector attribute that was distributed along with the VPN-IPv4
   address corresponding to that C-address.  The value of this
   Connector attribute will be considered to be the RPF adjacency for
   the C-address.

   If a Connector attribute is not present, but the "BGP Next Hop" for
   the C-address is one of the PEs that is a PIM adjacency, then that
   PE should be treated as the RPF adjacency for that C-address.
   However, if the MD spans multiple Autonomous Systems, the BGP Next
   Hop might not be a PIM adjacency, and the RPF check will not succeed
   unless the Connector attribute is used.
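   The RPF decision described in this section can be summarized by the
   following non-normative Python sketch.  The data structures and
   names are hypothetical; the sketch only restates the rules above:
   ordinary PIM RPF procedures apply unless the best route is a VPN-IP
   route learned via MP-BGP whose outgoing interface is not a VRF
   interface, in which case the RPF interface is the MT and the RPF
   neighbor is taken from the Connector attribute, falling back to the
   BGP next hop when that next hop is a PIM adjacency over the MT.

      # Illustrative only; not a normative procedure.
      from dataclasses import dataclass
      from typing import Optional, Tuple

      @dataclass
      class VrfRoute:
          outgoing_interface: str
          learned_from_mp_bgp: bool
          connector: Optional[str] = None      # Connector attribute value, if any
          bgp_next_hop: Optional[str] = None

      def rpf_for_c_address(route: VrfRoute, vrf_interfaces: set,
                            mt_pim_adjacencies: set) -> Tuple[str, Optional[str]]:
          """Return (RPF interface, RPF neighbor) for a C-address."""
          if (not route.learned_from_mp_bgp
                  or route.outgoing_interface in vrf_interfaces):
              # Ordinary PIM RPF procedures apply.
              return route.outgoing_interface, None
          # Otherwise the RPF interface is the MT of the VPN-specific instance.
          if route.connector is not None:
              return "MT", route.connector
          if route.bgp_next_hop in mt_pim_adjacencies:
              return "MT", route.bgp_next_hop
          return "MT", None                     # RPF check cannot succeed (see above)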
7. Data MDT: Optimizing flooding

7.1. Limitation of Multicast Domain

   While the procedure specified in the previous sections requires the
   P routers to maintain multicast state, the amount of state is
   bounded by the number of supported VPNs.  The P routers do NOT run
   any VPN-specific PIM instances.

   In particular, the use of a single bidirectional tree per VPN scales
   well as the number of transmitters and receivers increases, but not
   so well as the amount of multicast traffic per VPN increases.

   The multicast routing provided by this scheme is not optimal, in
   that a packet of a particular multicast group may be forwarded to PE
   routers which have no downstream receivers for that group, and hence
   which may need to discard the packet.

   In the simplest configuration model, only the Default MDT group is
   configured for each MD.  The result of this configuration is that
   all VPN multicast traffic, control or data, will be encapsulated and
   forwarded to all PE routers that are part of the MD.  While this
   limits the amount of multicast routing state the provider network
   has to maintain, it also requires PE routers to discard multicast
   C-packets if there are no receivers for those packets in the
   corresponding sites.  In some cases, especially when the content
   involves high bandwidth but only a limited set of receivers, it is
   desirable that certain C-packets only travel to PE routers that do
   have receivers in the VPN, in order to save bandwidth in the network
   and reduce load on the PE routers.

7.2. Signaling Data MDT Trees

   A simple protocol is proposed to signal additional P-group addresses
   to encapsulate VPN traffic.  These P-group addresses are called Data
   MDT groups.  The ingress PE router advertises a different P-group
   address (as opposed to always using the Default MDT group) to
   encapsulate VPN multicast traffic.  Only the PE routers on the path
   to eventual receivers join the P-group, and therefore form an
   optimal multicast distribution tree in the service provider network
   for the VPN multicast traffic.  These multicast distribution trees
   are called Data MDT trees because they carry only data, not the PIM
   control packets exchanged by PE routers.

   The following documents the procedures for the initiation and
   teardown of Data MDT trees.  The definitions of the constants and
   timers can be found in section 8.

   - The PE router connected to the source of the content initially
     uses the Default MDT group when forwarding the content to the MD.

   - When one or more pre-configured conditions are met, it starts to
     periodically announce an MDT Join TLV at the interval
     [MDT_INTERVAL].  The MDT Join TLV is forwarded to all the PE
     routers in the MD.

     If a PE in a particular MD transmits a C-multicast data packet to
     the backbone by transmitting it onto the Default MDT, every other
     PE in that MD will receive it.  Any of those PEs which are not on
     a C-multicast distribution tree for the packet's C-multicast
     destination address (as determined by applying ordinary PIM
     procedures to the corresponding multicast VRF) will have to
     discard the packet.

     A commonly used condition is bandwidth.  When the VPN traffic
     exceeds a certain threshold, it is more desirable to deliver the
     flow only to the PE routers connected to receivers, in order to
     optimize the performance of the PE routers and the resource usage
     of the provider network.  However, other conditions can also be
     devised; they are purely implementation specific.

   - The MDT Join TLV is encapsulated in UDP.  The packet is addressed
     to ALL-PIM-ROUTERS (224.0.0.13) in the context of the VRF, and is
     encapsulated using the Default MDT group when sent to the MD.
     This allows all PE routers to receive the information.

   - Upon receiving an MDT Join TLV, PE routers connected to receivers
     will join the Data MDT group announced by the MDT Join TLV in the
     global table.
     When the Data MDT group is set up using PIM-SM or Bidirectional
     PIM, the PE routers build a shared tree toward the RP.  When the
     Data MDT group is set up using PIM-SSM, the PE routers build a
     source tree toward the PE router that is advertising the MDT Join
     TLV; the source address of that tree is the same as the source IP
     address used in the IP packet advertising the MDT Join TLV.

     PE routers which are not connected to receivers may wish to cache
     the state in order to reduce the delay when a receiver comes up in
     the future.

   - After [MDT_DATA_DELAY], the PE router connected to the source
     starts encapsulating traffic using the Data MDT group.

   - When the pre-configured conditions are no longer met, e.g., the
     traffic stops, the PE router connected to the source stops
     announcing the MDT Join TLV.

   - If the MDT Join TLV is not received for [MDT_DATA_TIMEOUT], PE
     routers connected to the receivers leave the Data MDT group in the
     global instance.

7.3. Use of SSM for Data MDTs

   The use of Data MDTs requires that a set of multicast P-addresses be
   pre-allocated and dedicated for use as the destination addresses for
   the Data MDTs.

   If SSM is used to set up the Data MDTs, then each MD needs to be
   assigned a set of these multicast P-addresses.  Each VRF in the MD
   needs to be configured with this set (i.e., all VRFs in the MD are
   configured with the same set).  If there are n addresses in this
   set, then each PE in the MD can be the source of n Data MDTs in that
   MD.

   If SSM is not used for setting up Data MDTs, then each VRF needs to
   be configured with a unique set of multicast P-addresses; two VRFs
   in the same MD cannot be configured with the same set of addresses.
   This requires the pre-allocation of many more multicast P-addresses,
   and the need to configure a different set for each VRF greatly
   complicates operations and management.  Therefore the use of SSM for
   Data MDTs is very strongly recommended.

8. Packet Formats and Constants

8.1. MDT TLV

   An "MDT TLV" has the following format.  It is carried in UDP, using
   port 3232.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |            Length             |     Value     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               .                               |
   |                               .                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Type (8 bits):

      the type of the MDT TLV.  Currently only one type, 1 (MDT Join
      TLV), is defined.

   Length (16 bits):

      the total number of octets in the TLV for this type, including
      both the Type and Length fields.

   Value (variable length):

      the content of the TLV.

8.2. MDT Join TLV

   An "MDT Join TLV" has the following format.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |            Length             |   Reserved    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            C-source                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            C-group                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            P-group                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Type (8 bits):

      as defined above.  For the MDT Join TLV, the value of the field
      is 1.

   Length (16 bits):

      as defined above.  For the MDT Join TLV, the value of the field
      is 16, including 1 byte of padding.

   Reserved (8 bits):

      for future use.

   C-Source (32 bits):

      the IPv4 address of the traffic source in the VPN.

   C-Group (32 bits):

      the IPv4 multicast destination address of the traffic in the VPN.

   P-Group (32 bits):

      the IPv4 group address that the PE router is going to use to
      encapsulate the flow (C-Source, C-Group).
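   As a non-normative aid to implementers, the following Python sketch
   packs and parses the 16-octet MDT Join TLV exactly as laid out
   above: Type (1 octet) = 1, Length (2 octets) = 16, Reserved
   (1 octet), and then the C-Source, C-Group, and P-Group IPv4
   addresses.  Per sections 7.2 and 8.1, the TLV travels in a UDP
   packet (port 3232) addressed to ALL-PIM-ROUTERS and sent over the
   Default MDT; the addresses in the example are illustrative only.

      # Illustrative only; the normative format is the one diagrammed above.
      import socket
      import struct

      MDT_JOIN_TLV_TYPE = 1
      MDT_JOIN_TLV_LENGTH = 16

      def encode_mdt_join_tlv(c_source: str, c_group: str, p_group: str) -> bytes:
          return struct.pack("!BHB4s4s4s",
                             MDT_JOIN_TLV_TYPE,
                             MDT_JOIN_TLV_LENGTH,
                             0,                         # Reserved
                             socket.inet_aton(c_source),
                             socket.inet_aton(c_group),
                             socket.inet_aton(p_group))

      def decode_mdt_join_tlv(data: bytes):
          t, length, _rsvd, c_src, c_grp, p_grp = struct.unpack("!BHB4s4s4s", data[:16])
          if t != MDT_JOIN_TLV_TYPE or length != MDT_JOIN_TLV_LENGTH:
              raise ValueError("not an MDT Join TLV")
          return (socket.inet_ntoa(c_src), socket.inet_ntoa(c_grp),
                  socket.inet_ntoa(p_grp))

      # Example: the ingress PE announces that flow (C-Source, C-Group) will
      # be encapsulated in P-group 239.2.2.2 after [MDT_DATA_DELAY].
      tlv = encode_mdt_join_tlv("10.1.1.1", "232.1.1.1", "239.2.2.2")
      assert decode_mdt_join_tlv(tlv) == ("10.1.1.1", "232.1.1.1", "239.2.2.2")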
8.3. Constants

   [MDT_DATA_DELAY]:

      the interval that the PE router connected to the source waits
      before switching to the Data MDT group.  The default value is 3
      seconds.

   [MDT_DATA_TIMEOUT]:

      the interval after which a PE router connected to receivers,
      having received no further MDT Join TLVs, times out the
      previously received MDT Join TLV and leaves the Data MDT group.
      The default value is 3 minutes.  This value must be consistent
      among PE routers.

   [MDT_DATA_HOLDOWN]:

      the interval during which the PE router, after it has started
      encapsulating packets using the Data MDT group, will not switch
      back to the Default MDT tree.  This is used to avoid oscillation
      when traffic is bursty.  The default value is 1 minute.

   [MDT_INTERVAL]:

      the interval at which the source PE router periodically sends the
      MDT Join TLV message.  The default value is 60 seconds.

9. Acknowledgments

   Major contributions to this work have been made by Dan Tappan and
   Tony Speakman.

   This document is based on a previous version which included
   additional material not covered here.  Yakov Rekhter and Dino
   Farinacci were co-authors of the previous version, and the current
   authors thank them for their contribution.

   The authors also wish to thank Arjen Boers, Robert Raszuk, Toerless
   Eckert, and Ted Qian for their help and their ideas.

10. Normative References

   [GRE2784] "Generic Routing Encapsulation (GRE)", Farinacci, Li,
   Hanks, Meyer, Traina, March 2000, RFC 2784.

   [MDT-SAFI] "MDT SAFI", Nalawade, Sreekantiah, February 2004,
   draft-nalawade-mdt-safi-00.txt.

   [MT-DISC] "MT Tunnel Discovery and RPF check", Wijnands, Nalawade,
   August 2004.

   [PIMv2] "Protocol Independent Multicast - Sparse Mode (PIM-SM)",
   Fenner, Handley, Holbrook, Kouvelas, October 2003.

   [PIM-RPF-PROXY] "PIM RPF Proxy", Wijnands, Boers, Rosen,
   forthcoming.

   [RFC2119] "Key words for use in RFCs to Indicate Requirement
   Levels", Bradner, March 1997, RFC 2119.

   [RFC2547bis] "BGP/MPLS VPNs", Rosen, et al., September 2003.

11. Informative References

   [ADMIN-ADDR] "Administratively Scoped IP Multicast", Meyer, July
   1998, RFC 2365.

   [BIDIR] "Bi-directional Protocol Independent Multicast", Handley,
   Kouvelas, Speakman, Vicisano, June 2003.

   [DIFF2983] "Differentiated Services and Tunnels", Black, October
   2000, RFC 2983.

   [GRE1701] "Generic Routing Encapsulation (GRE)", Farinacci, Li,
   Hanks, Traina, October 1994, RFC 1701.

   [GRE2890] "Key and Sequence Number Extensions to GRE", Dommety,
   September 2000, RFC 2890.

   [IPIP1853] "IP in IP Tunneling", Simpson, October 1995, RFC 1853.

   [PIM-MPLS] "Using PIM to Distribute MPLS Labels for Multicast
   Routes", Farinacci, Rekhter, Rosen, Qian, November 2000.

   [SSM] "Source-Specific Multicast for IP", Holbrook, Cain, October
   2003, draft-ietf-ssm-arch-04.txt.

12. Authors' Addresses

   Yiqun Cai (Editor)
   Cisco Systems, Inc.
   170 Tasman Drive
   San Jose, CA, 95134
   E-mail: ycai@cisco.com

   Eric C. Rosen (Editor)
   Cisco Systems, Inc.
   1414 Massachusetts Avenue
   Boxborough, MA, 01719
   E-mail: erosen@cisco.com

   IJsbrand Wijnands
   Cisco Systems, Inc.
   170 Tasman Drive
   San Jose, CA, 95134
   E-mail: ice@cisco.com

13. Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed
   to pertain to the implementation or use of the technology described
   in this document or the extent to which any license under such
   rights might or might not be available; nor does it represent that
   it has made any independent effort to identify any such rights.
   Information on the procedures with respect to rights in RFC
   documents can be found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use
   of such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository
   at http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.

14. Full Copyright Statement

   Copyright (C) The Internet Society (2004).  This document is subject
   to the rights, licenses and restrictions contained in BCP 78 and
   except as set forth therein, the authors retain all their rights.

   This document and the information contained herein are provided on
   an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
   REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE
   INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR
   IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.