Comments on draft-morin-l3vpn-mvpn-considerations-02
I have thoroughly read draft-morin-l3vpn-mvpn-considerations-02, and I
regret that I continue to feel that it is nowhere near ready to be accepted
as a WG document. There are too many places where the reasoning is
superficial or incomplete, or where the reasoning given just does not
support the putative conclusions. I will go into this in some detail below.
If the WG really wants to have a document which provides a sound technical
comparison of various options, I would suggest that a different sort of
"design team" is needed.
Anyway, specific comments below. I will extract text from the draft, and
begin my comments with ****.
1. Introduction
The current proposal for multicast in BGP/MPLS
[I-D.ietf-l3vpn-2547bis-mcast] includes multiple alternative
mechanisms for some of the required building blocks of the solution.
However, it does not identify the core set of mechanisms which must
be implemented in order to ensure interoperability. This may lead to
a situation where implementations may support different subsets of
the available optional mechanisms leading to implementations that do
not interoperate.
**** Since one size does not fit all, not all service providers will want
**** the same sets of options all the time. To ensure interoperability, one
**** needs to define a set of profiles, each of which contains a specific
**** set of options that work with each other. That ensures
**** interoperability within a profile. Then the question of whether any profile
**** should be mandatory can be addressed. Just providing a single
**** mandatory set of procedures does nothing to ensure interoperability
**** when non-mandatory options are used.
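**** To make the profile idea concrete, here is a minimal sketch. The
**** profile names and the option labels are hypothetical, chosen only for
**** illustration; they are not taken from any specification. Two
**** implementations are assumed to interoperate only if they support at
**** least one profile in common.

```python
# Hypothetical profiles: each one fixes a choice for every building block.
PROFILES = {
    "profile_bgp": {
        "auto_discovery": "bgp",
        "spmsi_signaling": "bgp",
        "c_multicast_routing": "bgp",
    },
    "profile_pim": {
        "auto_discovery": "pim",
        "spmsi_signaling": "udp",
        "c_multicast_routing": "pim",
    },
}


def common_profiles(impl_a, impl_b):
    """Return the known profiles that both implementations support."""
    return sorted(set(impl_a) & set(impl_b) & set(PROFILES))


# An implementation is described by the set of profiles it supports.
vendor_x = {"profile_bgp"}
vendor_y = {"profile_bgp", "profile_pim"}
vendor_z = {"profile_pim"}

print(common_profiles(vendor_x, vendor_y))
print(common_profiles(vendor_x, vendor_z))
```

**** An empty result means the two implementations have no profile in
**** common, which is the situation that a single mandatory option set
**** does not prevent once non-mandatory options are in use.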
3. Examining alternatives mechanisms for MVPN functions
3.1. MVPN auto-discovery
Section 5.2.10 of [RFC4834] states "The operation of a multicast VPN
solution SHALL be as light as possible and providing automatic
configuration and discovery SHOULD be a priority when designing a
multicast VPN solution. Particularly the operational burden of
setting up multicast on a PE or for a VR/VRF SHOULD be as low as
possible".
The current solution document [I-D.ietf-l3vpn-2547bis-mcast]
addresses this requirement by proposing two different mechanisms for
MVPN auto-discovery:
1. BGP-based auto-discovery (described in section 4).
2. Discovery using PIM running on a MI-PMSI implemented with a
shared tree using multicast ASM, or MP2MP LDP with the same
common tree identifier configured in all VRFs of an MVPN.
It is the recommendation of the authors that BGP-based auto-discovery
is the preferred solution for auto-discovery and should be supported
by all implementations while PIM/shared-tree based auto-discovery
should be optionally considered for migration purpose only.
**** I would agree that the spec should say that BGP-based discovery SHOULD
**** be used. However, the arguments below seem to be mostly specious to
**** me.
Part of the rationale for this recommendation is also based on
section 5.2.10 of [RFC4834] which states "as far as possible, the
design of a solution SHOULD carefully consider the number of
protocols within the core network: if any additional protocols are
introduced compared with the unicast VPN service, the balance between
their advantage and operational burden SHOULD be examined
thoroughly".
BGP is the auto-discovery protocol used in unicast (RFC4364) VPNs and
therefore the use of BGP-based auto-discovery within multicast VPNs
avoids the introduction of an additional auto-discovery protocol that
would require additional OAM processes and tools. Service providers
with deployed unicast (RFC4364) VPNs already have extensive
deployment and operations experience of using BGP as an auto-
discovery protocol including OAM processes and tools. Such processes
and tools will require modifications in order to support multicast
auto-discovery but those modifications are anticipated to be less
than those required to develop new processes and tools for a specific
auto-discovery protocol.
**** This seems to be directed against a strawman. In the absence of
**** BGP-based auto-discovery, the only auto-discovery protocol that needs
**** to be supported is PIM on the PEs, and PIM has to be supported on the
**** PEs anyway in order to interact with the CEs. Thus using PIM for
**** auto-discovery does not require any additional protocols.
**** Furthermore, the idea that by using BGP one avoids introducing any new
**** protocols is just a bit strange. Please note that it has taken upwards
**** of 50 pages (and three years) to specify the message formats and
**** procedures of this "no new protocol". The fact is that considerable
**** new protocol has been added to BGP: new address families, new sets of
**** interactions, new procedures.
Additionally, BGP supports MD5
authentication of its peers for additional security. In contrast,
there are no obvious authentication mechanisms to secure PIM
communications in any known implementation.
**** On the contrary, in a very widespread implementation, it is possible
**** to protect PIM control packets with IPsec, either with manual keying
**** or with the GDOI dynamic group key management protocol from the MSEC
**** WG.
Furthermore, PIM based discovery is only applicable to deployments
using a shared tree on an MI-PMSI, whereas BGP-based auto-discovery
does not place any restrictions on the type of multicast trees that
can be used. BGP-based auto-discovery is independent of the type of
P-multicast tree used thus satisfying the requirement in section
5.2.4.1 of [RFC4834] that "a multicast VPN solution SHOULD be
designed so that control and forwarding planes are not
interdependent".
**** I think this paragraph is confused. MI-PMSI is not a forwarding plane,
**** it is a service. The different forwarding planes are the different
**** kinds of multipoint LSPs, the PIM/GRE multicast tunnels, the unicast
**** tunnels used by ingress replication, etc. Any of these could be used
**** to instantiate the MI-PMSI service. I believe that only the unicast
**** tunnels really require BGP-based auto-discovery. So I find this
**** argument weak.
...
Last, the use of the BGP-based autodiscovery is expected to be less
prone to spoofing attacks (being based on a connection established
with a three-way handshake), to which the PIM Hello over MI-PMSI
procedures may be subject to (being datagram-based).
**** I don't believe that TCP's three-way handshake has ever been claimed to
**** be a security feature ;-)
( the authors note that, in order to support the coexistence of both
protocols (for example during migration scenarios), implementations
could support both alternatives by providing a per-VRF configuration
knob that would allow recognizing new PIM neighbors based on the
reception of PIM hellos on a shared P-multicast tree, even for
neighbors that did not advertise a BGP auto-discovery route )
**** Are you saying that PEs should refuse to accept PIM hellos in advance
**** of receiving the BGP auto-discovery routes? I don't think that
**** behavior is currently required by any specification. What would be the
**** advantage of imposing such a rule?
3.2. S-PMSI Signaling
**** Editorial comment: this text should probably be in a subsection, "3.2.1
**** BGP vs. UDP" or something like that.
The current solution document [I-D.ietf-l3vpn-2547bis-mcast] proposes
two mechanisms for S-PMSI Signaling:
1. A new UDP-based TLV protocol specifically for S-PMSI signaling
(described in section 7.2.1).
2. A BGP-based mechanism for S-PMSI signaling (described in section
7.2.2).
**** Please note that the adjective "new" has been applied to the wrong
**** solution ;-) A typo, no doubt.
It is the recommendation of the authors that BGP is the preferred
solution for S-PMSI signaling and should be supported by all
implementations while the UDP-based S-PMSI signaling protocol should
be considered optional.
Part of the rationale for this recommendation is similar to that for
BGP-based auto-discovery and is based on section 5.2.10 of [RFC4834]
and the desire to avoid introducing and deploying additional
protocols unless strictly necessary.
**** See my comment above about referring to the major additions to BGP as
**** "no new protocol".
Furthermore:
o The BGP-based S-PMSI signaling mechanism can be efficiently used
in an inter-AS option B deployment context while the use of the
UDP-based protocol does not preserve AS routing independence when
used in an inter-AS option B context (i.e. the decision by a PE in
an AS to use an S-PMSI for a given customer flow will impact
routing state in other ASes). Co-existence with unicast inter-AS
VPN options is strongly encouraged by section 5.2.6 of [RFC4834].
**** I don't understand what is being said here.
Therefore, it is the opinion of the authors that BGP is the preferred
solution for performing S-PMSI signaling.
**** Generally, if one lists the advantages of scheme A and the
**** disadvantages of scheme B, one finds that A looks to be better than B.
**** If one takes the trouble to also list some of the advantages of B and
**** disadvantages of A, one may draw a different conclusion.
**** For instance, note that the BGP-based S-PMSI signaling requires all PEs
**** of a particular MVPN to maintain state for each C-flow that has been
**** assigned to an S-PMSI. The PEs must maintain this state even if they
**** in fact have no receivers for that C-flow. If one uses a
**** datagram-based mechanism, a PE does not have to retain any state for a
**** flow unless it has receivers for that flow.
**** It is true, as the authors point out, that one can't use the
**** datagram-based procedure unless one has set up a tunnel to carry the
**** datagrams. If one doesn't want to set up tunnels that carry only
**** control messages, then an "out of band" signaling mechanism such as
**** that offered by BGP (or, perhaps, unicast PIM) is needed. On the other
**** hand, if one does have the tunnel available to carry the datagrams, why
**** should one be forced to maintain all the extra state imposed by the
**** BGP-based procedure?
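**** A rough count illustrates the state difference. This is only a sketch
**** under simplifying assumptions of my own: one binding entry per C-flow
**** per PE that must hold it, the same number of receiver PEs for every
**** flow, and the function and parameter names are made up for the
**** example.

```python
def spmsi_state_entries(num_pes, flows_on_spmsi, receiver_pes_per_flow):
    """Total S-PMSI binding entries held across one MVPN, per scheme.

    Returns (bgp_based, datagram_based).
    """
    # BGP-based: every PE of the MVPN holds the binding for every C-flow
    # that has been assigned to an S-PMSI, receivers or not.
    bgp_based = num_pes * flows_on_spmsi
    # Datagram-based: only PEs with receivers for a flow keep its binding.
    datagram_based = receiver_pes_per_flow * flows_on_spmsi
    return bgp_based, datagram_based


bgp, datagram = spmsi_state_entries(
    num_pes=100, flows_on_spmsi=50, receiver_pes_per_flow=5
)
print(bgp, datagram)  # 5000 250
```

**** The gap grows as the receivers for each flow become sparser relative
**** to the number of PEs in the MVPN.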
**** Further, in the absence of any well understood criteria for assigning
**** and/or removing particular C-flows to/from particular P-tunnels, we
**** don't know what the rate of change is likely to be or what impact this
**** is likely to have on the BGP route reflectors.
**** I do not believe that the authors have given full consideration to all
**** the pros and cons of each feature, and I therefore believe that their
**** conclusions have not been properly supported.
...
**** Editorial comment: I think the following should be a new section "3.2.2
**** Switching to S-PMSI"
Section 7.2.2.3 of [I-D.ietf-l3vpn-2547bis-mcast] proposes two
approaches for how a source PE can decide when to start transmitting
customer multicast traffic on a S-PMSI:
1. The source PE sends multicast packets for the <C-S, C-G> on both
the I-PMSI P-multicast tree and the S-PMSI P-multicast tree
simultaneously for a pre-configured period of time, letting the
receiver PEs select the new tree for reception, before switching
to only the S-PMSI.
2. The source PE waits for a pre-configured period of time after
advertising the <C-S, C-G> entry bound to the S-PMSI before fully
switching the traffic onto the S-PMSI-bound P-multicast tree.
....
For these reasons, it is the authors' recommendation to mandate the
implementation of the second alternative for switching to S-PMSI.
**** I believe this recommendation has been incorporated into the next
**** revision of the architecture spec.
3.3. PE-PE Transmission of C-Multicast Routing
The current solution document [I-D.ietf-l3vpn-2547bis-mcast] proposes
multiple mechanisms for PE-PE transmission of customer multicast
routing information:
1. Full per-MVPN PIM peering across an MI-PMSI (described in section
5.2.1).
2. Lightweight PIM peering across an MI-PMSI (described in section
5.2.2)
3. The unicasting of PIM C-Join/Prune messages (described in section
5.2.3)
4. The use of BGP for carrying C-Multicast routing (described in
section 5.3).
3.3.1. PE-PE signalling scalability
Scalability being one of the core requirements for multicast VPN, it
is useful to compare the proposed C-multicast routing mechanisms from
this perspective : Section 4.2.4 of [RFC4834] recommends that "a
multicast VPN solution SHOULD support several hundreds of PEs per
multicast VPN, and MAY usefully scale up to thousands" and section
4.2.5 states that "a solution SHOULD scale up to thousands of PEs
having multicast service enabled".
At such scales of multicast deployment, the first and third
mechanisms require the PEs to maintain a large number of PIM
adjacencies with other PEs of the same multicast VPN (which implies
the regular exchange PIM Hellos with each other) and to refresh
C-Join/Prune states, thus limiting the scalability of these
approaches.
**** This analysis is questionable, and appears to be based on unstated
**** assumptions.
**** If one assumes that the Join/Prune states don't change much, then of
**** course the refresh overhead is useless overhead. On the other hand, if
**** one assumes that the Join/Prune states change frequently, perhaps more
**** frequently than the refresh rate, the overhead due to the
**** refreshes of things that don't change is just in the noise.
**** Furthermore, BGP has never shown itself to be scalable in the face of
**** such rapid changes. Without an analysis of rate of change, the
**** conclusions above are unsupported.
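**** The dependence on rate of change can be sketched numerically. The
**** model below is a deliberate simplification of my own: every
**** Join/Prune state is refreshed once per interval, changes arrive at a
**** steady rate, and the numbers used are illustrative rather than
**** measured.

```python
def refresh_fraction(states, refresh_interval_s, changes_per_s):
    """Fraction of Join/Prune updates that are pure refreshes."""
    # Each state is refreshed once per refresh interval.
    refresh_rate = states / refresh_interval_s
    total_rate = refresh_rate + changes_per_s
    return refresh_rate / total_rate


# 6000 states refreshed every 60 s gives 100 refreshes per second.
quiet = refresh_fraction(states=6000, refresh_interval_s=60, changes_per_s=1)
busy = refresh_fraction(states=6000, refresh_interval_s=60, changes_per_s=10000)
print(f"quiet network: {quiet:.1%} of updates are refreshes")
print(f"busy network:  {busy:.1%} of updates are refreshes")
```

**** With few changes, nearly all of the signaling is refresh overhead;
**** with many changes, refreshes are a small fraction of the total. Which
**** regime applies is exactly what the draft does not establish.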
**** There is also no analysis to show that PE-PE PIM overhead is going to
**** be the bottleneck, rather than CE-PE PIM, which the authors appear
**** willing to tolerate.
**** Further, the assumption that we can eliminate much of the state that
**** constitutes the "PIM adjacencies" is questionable. The fact is that
**** whether one is doing PE-PE PIM or not, there is considerable state that
**** must be devoted to keeping track of who the upstream PE is for each
**** C-multicast flow. One has to keep track of unicast routing changes
**** that impact this, one has to make sure that packets one receives are
**** from the right upstream PE, etc. This is a good piece of what is
**** involved in maintaining the PIM adjacencies.
The third mechanism would reduce the amount of C-Join/Prune
processing for a given multicast flow for PEs that are not the
upstream neighbor for this flow, but would require "explicit
tracking" state to be maintained by the upstream PE, and would
require refresh-reduction mechanisms to be used to mitigate the fact
that PIM "Join suppression" cannot be used (what such a refresh-
reduction mechanism would be has not been described yet). For these
reasons, it seems that this approach is not suitable for higher scale
scenarios.
**** I'm not sure I understand the relationship between PIM Join Suppression
**** and refresh reduction.
**** By the way, the BGP-based signaling has an entire set of messages and
**** procedures whose purpose is to support explicit tracking: Leaf A-D
**** routes. These messages and procedures exist for two reasons: (a) to
**** support aggregation schemes that try to aggregate trees that are more
**** or less congruent, and (b) to support the use of RSVP-TE P2MP LSPs to
**** instantiate S-PMSIs. If the authors believe that explicit tracking is
**** inherently unscalable, they should go on to point out that neither the
**** use of RSVP-TE P2MP LSPs nor a strategy of "aggregation by congruence"
**** are scalable schemes.
The second mechanism would operate in a similar manner to full per-
MVPN PIM peering except that PIM hellos are not transmitted and PIM
C-Join/Prune refresh-reduction would be used, thereby improving
scalability, but this approach has been further developed and it is
unclear if it is applicable.
**** I can't argue with this, but the reason it's been hard to get further
**** development of the lightweight PIM peering is that few of the
**** principals in PIM WG agree that there is a practical scaling problem
**** with it.
The first and second mechanisms can leverage the "Join suppression"
behavior and thus improve the processing burden of an upstream PE,
sparing the processing of one Join message for each remote PE joined
to a multicast stream, but this improvement comes at the price of
requiring all PEs of a multicast VPN to process all PIM Joins sent by
any PE participating in the same multicast VPN whether they are the
upstream PE or not.
**** What one gains for this "price" though is that a PE won't send a Join
**** at all if it sees that someone else has already sent it. The price is
**** thus well worth it if there are lots of nodes with receivers for the
**** given group. The price is only high when there aren't lots of nodes
**** with receivers. So the analysis seems to assume that the receivers for
**** each group are sparsely distributed among the PEs of a particular VPN.
**** The authors have not provided any grounds to support that assumption.
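**** The trade-off can be counted for a single C-flow over one refresh
**** period. This sketch idealizes Join suppression as perfect (exactly one
**** Join is sent when any PE has receivers) and treats the alternative as
**** a scheme in which each Join reaches only the upstream PE. The names
**** are mine, for illustration only.

```python
def joins_per_period(receiver_pes, suppression):
    """Count Joins for one C-flow in one refresh period.

    Returns (joins_sent, joins_processed_by_upstream_pe,
             joins_processed_by_each_other_pe).
    """
    if receiver_pes == 0:
        return (0, 0, 0)
    if suppression:
        # One Join is multicast on the MI-PMSI; the rest are suppressed.
        # The upstream PE and every other PE each see that one Join.
        return (1, 1, 1)
    # No suppression: each receiver PE sends its own Join, and only the
    # upstream PE processes them.
    return (receiver_pes, receiver_pes, 0)


for receivers in (2, 80):
    print(receivers, joins_per_period(receivers, True),
          joins_per_period(receivers, False))
```

**** With 80 receiver PEs, suppression cuts the upstream PE's load from 80
**** Joins to one, at a cost of one Join processed by each other PE. With
**** two receiver PEs the saving is small and the cost is the same.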
The fourth mechanism (the use of BGP for carrying C-Multicast
routing) would have a comparable drawback of requiring all PEs to
process a BGP C-multicast route only interesting a specific upstream
PE. For this reason the C-multicast routing approach leverages the
Route-Target constraint mechanisms, which specifically allows only
the interested upstream PE to receive a BGP C-multicast route. When
RT constraints are used the fourth mechanism reduces the processing
load put on the provider infrastructure for customer multicast
routing to the minimum
**** I think this is a really nifty feature of the BGP method, but let's not
**** get carried away. One cannot say that this method keeps the processing
**** load at a "minimum", because this method causes all nodes with
**** receivers to send Joins, whereas a true Join suppression scheme would
**** allow those joins to be, err, suppressed. Further, the route
**** reflectors are processing all these joins, even if all the PEs are not.
**** The authors just have not provided any grounds for calling this
**** "reducing the processing load on the infrastructure to the minimum".
**** One also needs to look at related processing loads which are not
**** mentioned here. If one needs to do explicit tracking, say, because one
**** is using RSVP-TE P2MP LSPs as S-PMSIs, then not only does each PE with
**** receivers have to send a Join, each one has to send a Leaf A-D route as
**** well.
(by avoiding any processing by "unrelated"
PEs, that are nor the joining PE nor the upstream PE), and inherits
BGP features that are expected to improve scalability (through, for
instance, providing a means to offload some of the processing burden
associated with client multicast routing onto BGP route-reflectors),
and being based on TCP has no refresh-related scalability limit.
(Please refer to Handling the PIM routing processing load load, for a
detailed explanation of the differences in ways of handling the
C-multicast routing load, between the PIM-based approaches and the
BGP-based approach)
**** The savings gained by using route reflectors are not nearly as large in
**** multicast as in unicast.
**** In unicast, if there is no RR, then for a given VPN, each PE would need
**** to unicast a copy of the VPN routing info to every other PE that is
**** attached to the VPN. So by using an RR, you end up transmitting much
**** less info. But in multicast, you don't unicast the same info to each
**** PE, you multicast it, so you're only sending it once. Using the RR
**** provides no off-loading at all of the processing load to do the
**** transmissions.
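**** Counting the transmissions the originating PE makes per update shows
**** the asymmetry. The sketch assumes a single route reflector and a
**** full iBGP mesh in the no-RR unicast case; the scheme labels are
**** hypothetical names for this example.

```python
def originator_transmissions(num_pes, scheme):
    """Messages the originating PE sends for one routing update."""
    per_scheme = {
        # Unicast VPN routes without an RR: one copy to every other PE.
        "unicast_full_mesh": num_pes - 1,
        # Unicast VPN routes with an RR: one copy to the reflector.
        "unicast_with_rr": 1,
        # PIM Join multicast once on the MI-PMSI.
        "pim_on_mi_pmsi": 1,
        # BGP C-multicast route sent once to the reflector.
        "bgp_c_multicast_with_rr": 1,
    }
    return per_scheme[scheme]


for scheme in ("unicast_full_mesh", "unicast_with_rr",
               "pim_on_mi_pmsi", "bgp_c_multicast_with_rr"):
    print(scheme, originator_transmissions(100, scheme))
```

**** For unicast the RR reduces 99 transmissions to one; for multicast the
**** count is already one without it.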
**** In unicast, the RR runs the decision process, and only sends the
**** installed routes to the PEs. This prevents the PEs from getting
**** information about the uninstalled routes, thus reducing the
**** amount of info each PE has to get. It also off-loads the processing
**** needed to do the decision process. Whether this is really a good thing
**** or not is not so clear. It is common to adopt various tricks to force
**** certain routes through the RR (e.g., using different RDs for different
**** VRFs of the same VPN), and the BGP ADD_PATH idea never seems to die either.
**** But in any case, the effect on multicast is nowhere near as strong. In
**** the BGP-based C-multicast scheme, every PE has to choose its own
**** upstream PE for each C-multicast flow, track changes in the unicast
**** routing that might affect its choice, participate in the signaling to
**** construct the multicast trees, etc. The only processing load that the
**** RR really saves is the processing needed to do Join Suppression. And
**** if you have a need to do explicit tracking, it doesn't even do that.
However, it is to be noted that offloading customer multicast routing
processing onto BGP route-reflectors will increase the processing
load placed on the route-reflector infrastructure, which, in the
higher scale scenarios, is expected to call for adaptations such as:
o a separation of resources for unicast and multicast VPN routing :
using mvpn-dedicated BGP sessions and/or mvpn-dedicated BGP
instances on route-reflectors, and/or mvpn-dedicated route-
reflectors ;
o the deployment of additional route-reflectors resources :
increased processing resources on existing route reflectors or
additional route-reflectors.
**** So at the end of the section, we see a few understated sentences
**** saying, "by the way, the BGP scheme may not work at scale with the
**** existing route reflector infrastructure or even with existing route
**** reflector implementations". To me, that seems like a rather serious
**** disadvantage, which should not be dismissed or minimized with wishful
**** thinking about future "adaptations".
**** I do agree that C-multicast routing might call for major changes in the
**** RR implementations. I think one of the requirements we should set is
**** that mandatory features should not require such changes.
**** The fact that the authors make a statement about the need to change all
**** the route reflectors makes me worry that the BGP solution may not be
**** anywhere near ready for PS status. Perhaps it should be advanced as
**** experimental until we have a better understanding of its properties.
3.3.2. P-routers scalability
Mechanisms (1) and (2) are restricted to use within multicast VPNs
that use an MI-PMSI, thereby necessitating:
the use of a P-multicast tree technique that allows shared trees
(for example PIM-SM in ASM mode or MP2MP LDP)
or the use of one P-multicast tree per PE per VPN, even for PEs
that do not have sources in their directly attached sites for that
VPN.
By comparison, the fourth mechanism doesn't impose either of these
restrictions, and when P2MP trees are used only necessitates the use
of one tree per VPN per PE attached to a site with a multicast source
or RP (or with a candidate BSR, if BSR is used), thereby improving
the amount of state maintained by P-routers compared to the amount
required to build an MI-PMSI with P2MP trees.
**** I have to say I cannot follow the reasoning here.
**** Perhaps the point the authors are trying to make is the following. If
**** one uses multicast distribution trees to transmit control packets,
**** there is a chance that one will need to set up multicast distribution
**** trees that only carry control packets (and that are not needed for
**** carrying data). Each such tree requires state in the P routers. One
**** uses less state in the P routers if one only sets up the multicast
**** trees that are needed for carrying data packets, and doesn't set up any
**** just for carrying control packets.
**** This reasoning depends heavily upon unstated assumptions about how the
**** multicast sources are distributed around the customer sites. One might
**** also ask whether this is anywhere close to being a bottleneck.
**** Nevertheless, it is shown in draft-rosen-mvpn-profiles how one can use
**** MP2MP LSPs to avoid setting up tunnels that are not needed for data
**** packets, while still using PIM.
**** If MP2MP LSPs are made a mandatory feature, then the issue goes away
**** entirely. Since MP2MP LSPs are the only reasonable way to support
**** BIDIR-PIM C-flows, they are needed anyway.
**** (Note that the other ways of supporting BIDIR-PIM flows require either
**** upstream-assigned MPLS labels or the "PE as RP" scheme, neither of
**** which is regarded as a mandatory feature.)
3.3.3. Impact of C-multicast routing on Inter-AS deployments
Furthermore, co-existence with unicast inter-AS VPN options, and an
equal level of security for multicast and unicast including in an
inter-AS context, are specifically mentioned in sections 5.2.6, 5.2.8
and 5.2.12 of [RFC4834].
The first three mechanisms impose direct PE to PE communications :
this does not apply well to an inter-AS option B context, because of
security and robustness issues that are involved by such a level of
reachability and interaction between PEs in different ASes.
**** This is not a technical argument, just an unsupported claim. It would
**** be interesting to see this point developed in proper detail though.
Their use in an inter-AS context is possible, but not without
limitations or additional engineering design trade-offs depending
upon the interconnect types.
**** See previous comment.
By comparison, the fourth option (the use of BGP for carrying
C-Multicast routing) does not have any of the above limitations
related to inter-AS deployments, and also provides an additional
alternative to facilitate such deployments through the possibility of
using segmented inter-AS trees.
**** The claim here is just wrong. The use of segmented inter-AS trees does
**** not presuppose the use of BGP C-multicast routing.
3.3.4. Security and robustness
BGP supports MD5 authentication of its peers for additional security,
thereby possibly benefit directly to multicast VPN customer multicast
routing, whether for intra-AS or inter-AS communications. By
contrast, with a PIM-based approach, no mechanism providing a
comparable level of security to authenticate communications between
remote PEs has been yet fully described yet
[I-D.ietf-pim-sm-linklocal][], and in any case would require
significant additional operations for the provider to be usable in a
multicast VPN context.
**** See prior comment on security. Also, how common is it for SPs to use
**** MD5 on the BGP connections between PEs and RRs?
The robustness of the infrastructure, especially the existing
infrastructure providing unicast VPN connectivity, is key. The
C-multicast routing function, especially under load, will compete
with the unicast routing infrastructure. With the PIM-based
approaches, unicast and multicast VPN routing are expected to only
compete in the PE, for routing plane processing resources. In the
case of the BGP-based approach, they will compete on the PE for
processing resources, and in the route-reflector if they are used.
It is identified that in both cases, mechanisms will be required to
arbitrate resources (e.g. processing priorities). In the case of
PIM-based procedures, between the different control plane routing
instances in the PE. And in the case of the BGP-based approach, this
is likely to require using distinct BGP sessions for multicast and
unicast, possibly toward distinct route-reflectors.
Multicast routing is dynamic by nature, and multicast VPN routing has
to follow the VPN customers multicast routing events. The different
approaches can be compared on how they are expected to behave in
scenarios where multicast routing in the VPNs is subject to an
intense activity. Such a load would be comparable to the higher
scale scenarios described in xx (Section 3.3.1) and the fourth (BGP-
based) approach - when deployed to handle a significant multicast VPN
routing load - is expected to be the most efficient approach in a
such case.
**** I missed the argument which leads to the conclusion that "the fourth
**** approach ... is expected to be the most efficient approach"
**** Of course, if "expected" really just means "hoped", no additional
**** reasoning is needed ;-)
On the other hand, while the BGP-based approach is likely
to suffer a slowdown under a load raising beyond processing resources
(because of possibly congested TCP sockets), the PIM-based approaches
would react to such a load by dropping messages, with later failure
recovery through message refreshes, this being at the expense of some
predictability.
In fact both situations are problematic, and what seems important is
the ability for the VPN backbone operator to (a) limit the amount of
multicast routing activity that can be triggered by a multicast VPN
customer, and to (b) provide the best possible independence between
distinct VPNs. It seems that both of these can be addressed through
local implementation improvements, and that both the BGP-based and
PIM-based approach could be engineered to provide (a) and (b). It
can be noted though that the BGP approach proposes ways to dampen
C-multicast route withdrawals and/or advertisements, and thus already
describes a way to provide (a), while nothing hasn't been yet
described for the PIM-based approach, though these type of approaches
rely on a per VPN dataplane to carry the mvpn control plane, and thus
might naturally benefit from this first level of separation to solve
(b).
**** It seems to me that the reasoning here just doesn't provide a basis for
**** the conclusions.
3.3.5. C-multicast VPN join latency
Section 5.1.3 of [RFC4834] states that "the group join delay [...] is
also considered one important QoS parameter. It is thus RECOMMENDED
that a multicast VPN solution be designed appropriately in this
regard.". In a multicast VPN context, the "group join delay"of
interest is the time between a CE sending a PIM Join to its PE and
the first packet of the corresponding multicast stream being received
by the CE.
The different approaches proposed seem to have different
characteristics in how they are expected to impact join latency:
o the PIM-based approaches minimize the number of control plane
processing hops between the PE of a new receiver and the PE of the
multicast source, and being datagram based introduce minimal
delay, thereby possibly having a best-case join latency as good as
possible depending on implementation efficiency
o the BGP-based approach uses TCP exchanges, that may introduce an
additional delay depending on BGP implementation performances, but
are expected to control the worst-case join latency under load
o the BGP-based approaches is designed to allow the introduction of
route-reflectors which will introduce an additional processing
delay between the receiver-PE and the source-PE
o in higher scale scenarios, the BGP-based approach is expected to
provide some control of the worst-case join latency whereas the
PIM-based approaches may behave less efficiently if PIM messages
are lost
**** There's that "is expected to" again. What is the reasoning that would
**** support this conclusion? As far as I can tell, the BGP-based approach
**** provides no more control over the worst case than any other possible
**** approach.
o in higher scale scenarios, the introduction of route-reflectors in
the BGP architecture are expected to provide processing efficiency
which is expected to improve latency compared to the PIM-based
approaches
**** I don't see why. Please provide the reasoning that supports this
**** conclusion.
This qualitative comparison of approaches tend to highlight that the
BGP based approach is designed for controlling the "worst-case" join
latency
**** This claim has not been supported by any reasoning.
whereas for the PIM-based approaches seem to structurally be
able to reach the shorter "best-case" group join latency (especially
compared to deployment of the BGP-based approach where route-
reflectors are used).
**** In other words, PIM provides lower latency in the typical case, where
**** the environment is neither lossy nor congested.
Doing a quantitative comparison is not
possible without referring to specific implementations and
benchmarking procedures, and would possibly expose different
conclusions, especially for best-case group join latency for which
performance is expected to vary with implementations. We can also note
that improving a BGP implementation for reduced latency of route
processing would not only benefit multicast VPN group join latency,
but the whole BGP-based routing.
**** Improving BGP is always useful, but it's hard to see what that has to
**** do with any of the issues here.
Last, it is to be noted that the C-multicast routing procedures will
only impact the group join latency of a said multicast stream for the
first receiver that is located across the provider backbone from the
multicast source.
**** This is not the case. For one thing, different receiving PEs may
**** select different upstream PEs for the same C-source. PE1 may have
**** joined C-(S,G) via PE2, but when PE3 tries to join via PE4, join
**** latency is still an issue.
**** We might also take a closer look at the case where PE1 and PE2 have
**** both joined C-(S,G) via PE3. If an I-PMSI is being used, then
**** presumably PE1 and PE2 are already joined to it, and can immediately
**** begin receiving the traffic. But suppose an S-PMSI is being used, PE1
**** is already joined to it, but PE2 has not joined it yet. Since the
**** BGP-based S-PMSI procedures require PE2 to know already that C-(S,G) is
**** bound to a particular S-PMSI, then PE2 just has to join that S-PMSI.
**** So that does reduce latency, but at the expense of state; each PE has
**** to know, for ALL flows in the MVPN, which flows have been bound to
**** which S-PMSIs. Now suppose that the S-PMSI is instantiated by an
**** RSVP-TE P2MP LSP. In this case, PE2 cannot just join the S-PMSI.
**** First it must issue a BGP Leaf A-D route specifying that it is
**** interested in C-(S,G). When PE3 sees this, it initiates the signaling
**** to add PE2 to the LSP. So although PE3 doesn't have to get a join from
**** PE2, it does have to get a BGP message from PE2.
**** Increased prune latency, of course, will keep unwanted traffic coming
**** for a longer time.
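**** To make the cases above concrete, here is a rough sketch of which
**** control plane steps have to complete before a newly interested
**** receiver PE starts getting a C-(S,G) flow. It is an illustrative
**** simplification, not something taken from the specs: the Tunnel names,
**** control_steps() and its two boolean parameters are hypothetical, and
**** the step strings are only labels.

```python
from enum import Enum


class Tunnel(Enum):
    """Hypothetical labels for the three tunnel cases discussed above."""
    I_PMSI = "I-PMSI, every PE of the MVPN is already attached"
    S_PMSI_RECEIVER_JOINED = "S-PMSI whose tunnel the receiver PE joins itself"
    S_PMSI_RSVP_TE_P2MP = "S-PMSI on an RSVP-TE P2MP LSP, leaves added by the root"


def control_steps(tunnel, upstream_has_state, receiver_on_tunnel):
    """List the control plane steps that precede delivery to a new receiver PE.

    upstream_has_state: the selected upstream PE already has C-(S,G) state
        (some other PE joined via that same upstream PE earlier).
    receiver_on_tunnel: the receiver PE is already attached to the tunnel
        that carries the flow.
    """
    steps = []
    if not upstream_has_state:
        # E.g. PE3 joining via PE4 when all earlier joins went via PE2.
        steps.append("C-multicast join reaches the upstream PE")
    if tunnel is Tunnel.I_PMSI or receiver_on_tunnel:
        return steps
    if tunnel is Tunnel.S_PMSI_RSVP_TE_P2MP:
        # The root adds leaves, so the receiver must first announce itself.
        steps.append("receiver PE originates a BGP Leaf A-D route")
        steps.append("root PE signals RSVP-TE to add the receiver PE as a leaf")
    else:
        steps.append("receiver PE joins the S-PMSI tunnel")
    return steps


if __name__ == "__main__":
    # Second receiver behind an upstream PE that already has C-(S,G) state.
    for tunnel in Tunnel:
        print(tunnel.name, control_steps(tunnel, True, False))
```

**** In this model the list is empty only when the upstream PE already has
**** state and the receiver PE is already on the tunnel, which is narrower
**** than "every receiver after the first one".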
3.3.6. Architectural considerations
The fourth mechanism (the use of BGP for carrying C-Multicast
routing) would appear to fit well with the current unicast
architecture as BGP is the customer routing distribution protocol
used in unicast VPNs and therefore using BGP for customer routing
distribution within multicast VPNs avoids the introduction of an
additional protocol that would require additional OAM processes and
tools.
**** I imagine that was one of the arguments in favor of using MOSPF instead
**** of PIM ;-)
**** As long as PIM is still used for CE-PE multicast routing, it is false
**** to say that using PIM PE-PE requires an additional protocol or
**** additional "OAM processes and tools". I also do not understand why it
**** is simply presupposed that the new BGP address families (and the 50
**** pages of new specification added to BGP) do not impose requirements for
**** any new OAM procedures.
Service providers with deployed unicast (RFC4364) VPNs
already have extensive deployment and operations experience of using
BGP as a customer routing distribution protocol including OAM
processes and tools. Such processes and tools will require
modification in order to support customer multicast routing but those
modifications are anticipated to be less than those required to
develop new processes and tools for a distinct customer routing
protocol.
**** "are anticipated to be", is that yet another euphemism for "are hoped
**** to be"? Again, this paragraph seems to forget that PIM is needed in the
**** PEs anyway.
It should be noted that because PIM will be used as the CE-PE
customer routing distribution protocol, service providers will still
need OAM processes and tools in order to manage the PIM protocol, so
this rationale only applies to a subset of the tools and processes
already in place.
**** This "should have been noted" several paragraphs back where it would be
**** even clearer that it undercuts much of the authors' reasoning.
An illustrative example of the benefit brought by consistency with
unicast design is how the "extranet" feature can be implemented :
when BGP-based mechanisms are used, the already defined and well
understood BGP route target import/export semantics are just reused
and applied to BGP mVPN routes. By contrast, it is not specified how
implementing the same feature would be done in the context of other
alternative mechanisms, and unclear if this is possible without
significant engineering trade-offs given that their control plane is
tied to a specific MI-PMSI tunnel. Note that the support for the
Extranet feature is stated as a MUST in sections 5.1.6 of [RFC4834].
**** Even the existing MVPN deployments, available for years, provide
**** extranet support. So there's a proof in practice that the BGP-based
**** mechanisms are not needed for this purpose, notwithstanding the
**** authors' theoretical arguments about abstract qualities of
**** "consistency".
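**** For reference, the route target import/export semantics that the
**** quoted text wants to reuse boil down to a set intersection: a VRF
**** imports a route when the route carries at least one RT that is in the
**** VRF's import list. A minimal sketch follows; imports_route() and all
**** the RT values are made up for illustration, and how this would be
**** applied to MVPN routes is the draft's claim, not something shown here.

```python
def imports_route(vrf_import_rts, route_rts):
    """True if the route carries at least one RT the VRF is configured to import."""
    return not set(vrf_import_rts).isdisjoint(route_rts)


# Hypothetical route target values.
RT_VPN_A = "65000:1"
RT_VPN_B = "65000:2"
RT_EXTRANET = "65000:100"

# A route for an extranet source in VPN A is exported with both RTs;
# an ordinary VPN A route carries only the VPN A RT.
extranet_route_rts = {RT_VPN_A, RT_EXTRANET}
ordinary_route_rts = {RT_VPN_A}

# A VRF of VPN B that takes part in the extranet also imports the extranet RT.
vrf_b_import_rts = {RT_VPN_B, RT_EXTRANET}

if __name__ == "__main__":
    print(imports_route(vrf_b_import_rts, extranet_route_rts))
    print(imports_route(vrf_b_import_rts, ordinary_route_rts))
```

**** With these made-up values the VPN B VRF picks up the extranet route
**** and ignores the ordinary VPN A route.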
Section 5.2.10 of [RFC4834] states that "as far as possible, the
design of a solution SHOULD carefully consider the number of
protocols within the core network: if any additional protocols are
introduced compared with the unicast VPN service, the balance between
their advantage and operational burden SHOULD be examined
thoroughly". Considering that the recommendation of the authors
would be BGP for auto-discovery and S-PMSI signaling, the choice of
BGP for customer multicast routing would be consistent with the
protocol choice for unicast VPNs and would adequately address this
requirement.
**** Since PIM is needed in the PEs anyway, the choice of PIM addresses this
**** requirement equally well.
3.3.7. Conclusion on C-multicast routing
The fourth approach (BGP-based) for customer multicast routing
clearly presents some advantages over the PIM-based alternatives.
However it has yet to be deployed within an operational MVPN, and
only limited experience exists with its implementations. By
contrast, PIM-based mechanisms lack many of these benefits and have
identified limitations in how they can handle customer multicast
routing load in higher-scale scenarios. Despite these, experience
showed that the "Full PIM peering" approach is operationally viable.
Consequently, at the present time and until there is experience with
all of the proposed mechanisms it is not clear which of the above
mechanisms should be recommended as the preferred solution to
implementers. However, it would appear prudent for implementations
to consider supporting both the fourth (BGP-based) and first (full
per-MVPN PIM peering) mechanisms. Further experience on both
implementations is likely to be required before some best practice
can be defined.
**** Let's see, we have one mechanism which is known to be operationally
**** viable, and another that has only recently been invented, with which
**** there is zero operational experience, and which has only one partial
**** implementation. Which one is eligible to be called a "best practice"?
**** Which one should be called an "experiment"?
Moreover to improve the clarity of the proposed specifications,
considering that neither hello suppression nor refresh-reduction
procedures are currently specified or documented and that it is not
clear what the impact to the PIM state machine of these additional
procedures may be, the authors recommend that the proposals for
lightweight PIM peering across an MI-PMSI (the second mechanism) and
for the unicasting of PIM C-Join/Prune messages (the third mechanism)
be removed from the current solution document
[I-D.ietf-l3vpn-2547bis-mcast] (at least until they have been further
specified and both their impact and benefit on a multicast VPN
deployment is spelled out).
**** Those proposals appear only in the "framework" section, which seems
**** perfectly appropriate. If they do get developed at some later time, we
**** wouldn't want them ruled out without consideration as being
**** inconsistent with the framework, would we? (BTW, there is a draft for
**** TCP-based support of PIM now, being presented to the PIM WG.)
**** Serious comments about how to improve the "clarity of the proposed
**** specifications" would of course be welcome.
3.4. Encapsulation techniques for P-multicast trees
...
Current unicast VPN deployments use a variety of LDP, RSVP-TE and
GRE/IP-Multicast for encapsulating customer packets for transport
across the provider core of VPN services.
**** There are current unicast VPN deployments that also use MPLS-in-L2TPv3.
**** Why is this not in the list of things that have to be supported for
**** MVPN then?
It is recommended that
implementations support the three corresponding multicast tree
encapsulation techniques, namely: mLDP, P2MP RSVP-TE and GRE/
IP-multicast in order to allow the same encapsulations to be used for
unicast and multicast traffic as well as facilitating migration from
[I-D.rosen-vpn-mcast] to an MPLS label based encapsulation.
**** Given that there is no specification requiring support for all these
**** encapsulations in unicast, it hardly makes sense to require support for
**** all of them in multicast. Anyway, I'd say it is unreasonable to
**** REQUIRE support for every possible tunnel encapsulation (except L2TPv3,
**** of course ;-)) just because for each encaps there is someone who wants
**** it.
All three of the above encapsulation techniques support the building
of P2MP multicast trees. In addition mLDP and GRE/IP-ASM-Multicast
implementations may also support the building of MP2MP multicast
trees. The use of MP2MP trees may provide some scaling benefits to
the service provider as only a single MP2MP tree need be deployed per
VPN, thus reducing the amount of multicast state that needs to be
maintained by P routers. This gain in state is at the expense of
bandwidth optimization, since sites that do not have multicast
receivers for multicast streams sourced behind a said PE group will
still receive packets of such streams, leading to non-optimal
bandwidth utilization across the VPN core.
**** The use of MP2MP LSPs is perfectly compatible with the use of S-PMSIs
**** for optimizing the routing of individual C-flows. The claim that they
**** lead to non-optimal bandwidth utilization thus seems to be completely
**** unsupported.
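**** The state side of the quoted argument can be put in numbers. The
**** sketch below counts inclusive trees only, assumes no aggregation
**** across MVPNs, and assumes that with P2MP every PE with a sender site
**** roots its own tree. The function name and the PE counts are
**** hypothetical.

```python
def i_pmsi_tree_count(root_pes_per_vpn, use_mp2mp):
    """Number of inclusive trees in the core, without any aggregation.

    root_pes_per_vpn: for each MVPN, how many PEs would root a P2MP tree
        (assumed here to be every PE with a sender site).
    """
    if use_mp2mp:
        return len(root_pes_per_vpn)  # one shared MP2MP tree per MVPN
    return sum(root_pes_per_vpn)      # one P2MP tree per root PE per MVPN


root_pes = [10, 25, 4]  # hypothetical: three MVPNs

if __name__ == "__main__":
    print("P2MP :", i_pmsi_tree_count(root_pes, use_mp2mp=False))
    print("MP2MP:", i_pmsi_tree_count(root_pes, use_mp2mp=True))
```

**** For these numbers that is 39 trees against 3. Any S-PMSIs used to
**** optimize individual C-flows add their own state on top in either
**** case, so the choice of inclusive tree type does not by itself settle
**** the bandwidth question.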
One thing to consider is
that the use of MP2MP multicast tree will require configuring the
same tree identifier or multicast ASM group address in all PEs, and
will not provide the kind of autoconfiguration possible with P2MP
trees.
**** If you think that it would be better not to have to configure the ASM
**** group identifier at all the PEs, this can easily be avoided. You
**** really only have to configure it at one PE, and all the other PEs can
**** auto-discover it. This requires only a minor change to the mvpn-bgp
**** spec.
**** I'm not sure how important that is, though. You're already configuring
**** all the PEs with the RTs for the MVPN, so what's the marginal
**** difficulty in configuring them with a group identifier as well?
**** The text above seems like an example of stretching to try to find
**** something bad to say about a scheme which is really the technically
**** superior scheme.
MVPN services can also be supported over a unicast VPN core through
the use of ingress PE replication whereby the ingress PE replicates
any multicast traffic over the P2P tunnels used to support unicast
traffic. While this option does not require the service provider to
modify their existing P routers (in terms of protocol support) and
does not require maintaining multicast-specific state on the P
routers in order for the service provider to be able to deploy a
multicast VPN service, the use of ingress PE replication obviously
leads to non-optimal bandwidth utilization and it is therefore
unlikely to be the long term solution chosen by service providers.
However ingress PE replication may be useful during some migration
scenarios or where a service provider considers the level of
multicast traffic on their network to be too low to justify deploying
multicast specific support within their VPN core.
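**** The bandwidth cost of ingress PE replication is easy to put in
**** numbers. The sketch below assumes the worst case, where every copy
**** leaves the ingress PE over the same uplink; the function name, stream
**** rate and PE count are hypothetical.

```python
def ingress_uplink_load(stream_mbps, receiver_pes, ingress_replication):
    """Load that one C-flow puts on the ingress PE's uplink, in Mbit/s.

    With ingress replication the ingress PE sends one unicast copy per
    receiver PE. With a P-multicast tree it sends a single copy, which is
    replicated further downstream by the P routers.
    """
    if ingress_replication:
        copies = receiver_pes
    else:
        copies = min(receiver_pes, 1)  # nothing is sent if nobody listens
    return stream_mbps * copies


if __name__ == "__main__":
    # Hypothetical: a 5 Mbit/s stream with receivers behind 20 remote PEs.
    print(ingress_uplink_load(5, 20, ingress_replication=True))
    print(ingress_uplink_load(5, 20, ingress_replication=False))
```

**** For this made-up case that is 100 Mbit/s against 5 Mbit/s on the
**** uplink, which is negligible for a handful of low-rate flows and not
**** for much else.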
**** The spec also has a strange combination of ingress replication with
**** multicast trees, whereby one unicast tunnels the packets to the root of
**** a shared tree, and then decapsulates them and distributes them down the
**** tree. Senders then get their own transmissions back! I wonder why the
**** authors have not asked for this partially specified and not very useful
**** technique to be removed from the spec.
All proposed approaches for control plane and dataplane can be used
to provide aggregation amongst multicast groups within a VPN and
amongst different multicast VPNs, and potentially reduce the amount
of state to be maintained by P routers. However the latter -- the
aggregation amongst different multicast VPNs will require support for
upstream-assigned labels on the PEs. Support for upstream-assigned
labels may require changes to the data plane processing of the PEs
and this should be taken into consideration by service providers
considering the use of aggregate S-PMSI tunnels for the specific
platforms that the service provider has deployed.
3.5. Inter-AS deployments options
...
The segmented inter-AS solution would appear to offer the largest
degree of deployment flexibility to operators, however the non-
segmented inter-AS solution can simplify deployment in a restricted
number of scenarios and [I-D.rosen-vpn-mcast] only supports the non-
segmented inter-AS solution and therefore the non-segmented inter-AS
solution is likely to be required by some operators for backward
compatibility and during migration from [I-D.rosen-vpn-mcast] to
[I-D.ietf-l3vpn-2547bis-mcast].
The applicability of segmented or non-segmented inter-AS tunnels to a
given deployment or inter-provider interconnect will depend on a
number of factors specific to each service provider. However, due to
the additional deployment flexibility offered by segmented inter-AS
tunnels, it is the recommendation of the authors that all
implementations should support the segmented inter-AS model.
Additionally, the authors recommend that implementations should
consider supporting the non-segmented inter-AS model in order to
facilitate co-existence with existing deployments, and as a feature
to provide a lighter engineering in a restricted set of scenarios,
although it is recognized that initial implementations may only
support one or the other.
Additionally, the authors note that the proposed BGP-based approaches
for S-PMSI signaling and C-multicast routing information distribution
provide a good fit with both segmented and non-segmented inter-AS
tunnels. In contrast the UDP-TLV based approach for S-PMSI signaling
appears to be incompatible with segmented inter-AS tunnels, and it is
unclear if the proposed PIM-based approaches for C-multicast routing
information distribution would be fully applicable to segmented
inter-AS tunnels.
**** I don't see any reason why one couldn't instantiate an MI-PMSI service
**** with segmented inter-AS trees, so I don't see why there is any
**** incompatibility between the segmented inter-AS trees and the various
**** non-BGP approaches to signaling.
**** While segmented inter-AS trees do have the advantages cited, there are
**** some scalability concerns. The S-PMSIs are individuated on a
**** per-root-PE basis, not on a per-root-AS basis. To make efficient use
**** of these you have to have aggregation via upstream-assigned labels at
**** the border routers. Of course, upstream-assigned labels are not a
**** required feature. The border routers need to be aware of each MVPN
**** that passes traffic through them. While unsegmented trees are not a
**** particularly good solution in multi-provider scenarios, it is far from
**** clear that the segmented inter-AS trees provide a good solution either.
**** This is an area where I think more study and more review would be
**** appropriate.
...
6. Summary of recommendations
The following list summarizes the authors' recommendations. These
recommendations are not intended to prevent the implementation of
alternative solutions, rather they are the authors' recommendations
for the mechanisms that should be made mandatory in
[I-D.ietf-l3vpn-2547bis-mcast] and therefore be supported by all
implementations.
It is the authors' recommendation:
o that BGP-based auto-discovery be the mandated solution for auto-
discovery ;
o that BGP be the mandated solution for S-PMSI signaling ;
o that the mandated solution for S-PMSI switch-over be the mechanism
based on the source-connected PE switching traffic from the I-PMSI
tunnel to the S-PMSI tunnel, without transmitting traffic on both
at the same time ;
o that implementations support both the BGP-based and the full
per-MVPN PIM peering solutions for PE-PE transmission of customer
multicast routing until further operational experience is gained
with both solutions ;
**** I really don't think this provides a reasonable basis for requiring
**** support for BGP-based C-multicast routing. If one wants to require
**** support for something, the alternative that is already deployed
**** and known to work is the one that should be mandated, and the
**** alternative that is still experimental should be made optional.
**** As a wise man once said, "the proof of the pudding is in the eating,
**** not in the debate about the pudding." ;-)
o that implementations support the following multicast tree
encapsulations: mLDP, P2MP RSVP-TE and GRE/IP-Multicast ;
o that implementations support segmented inter-AS tunnels and
consider supporting non-segmented inter-AS tunnels (in order to
maintain backwards compatibility and for migration) ;
**** I really do like the segmented inter-AS tunnel idea, and in theory it
**** is a very good thing, but of course we don't know how it will pan out
**** in practice, and there are scalability concerns. I don't think this is
**** ready to be a "must implement" yet.
o implementations MUST support deployments when activation of a PIM
RP function (PIM Register processing and RP-specific PIM
procedures) or VRF MSDP instance is not required on any PE router.