Hi Eric, working group,

In your comments, only *one* is backed with a precise quote (it is discussed below, and I happen to agree that a better formulation can be found, though it doesn't show the analysis to be incorrect). But for the other (numerous) comments, you don't quote anything, which makes it hard to identify what exactly would be incorrect in Appendix A in your opinion. This makes your claim that the analysis is incorrect very weakly founded.

These other comments are discussed below, but let me highlight the following:
* some comments are just plain wrong or misleading, possibly due to a misunderstanding of the hypotheses made and of what is counted (see below)
* some comments are not related to what is looked at in Appendix A (such as CE-PE scaling)
* some comments relate to more complex stuff, and are debatable, but as far as I can tell they don't challenge the conclusions in Appendix A, nor the conclusions drawn elsewhere in the document

Reading between the lines of your comments, it seems to me that you may just disagree with how the counting of the amount of processing is done. You seem to prefer counting the "number of messages" rather than "how many times a message is processed", which is what we do in this analysis. Merely counting the number of messages would certainly be very favorable to PIM in the considered scenario, since PIM uses messages that are processed by all the PEs in the mVPN, but it would not reflect the load due to multicast routing on the different pieces of equipment, for the considered options. What is funny is that, for BGP S-A A-D routes, you seem to prefer counting "how many times a message is processed" rather than the "number of messages".

What we can do is insist on why we count "how many times a message is processed", to make sure that the reader doesn't miss that point. And we can make sure that in the different places we get the wording right to avoid misunderstandings (e.g.
I've spotted a table legend which is unclear).

In any case, in all of your comments, I don't find anything solid that questions the conclusions or the correctness of the analysis in Appendix A, and none of the few changes suggested appears to me to change the conclusions reused elsewhere in the document.

Please find below more detailed answers...

11/06/2009 17:52, Eric Rosen:
> [..] The analysis does not seem correct to me.

I'm not saying that there is not a single mistake; there might be. This document was written by humans. But if you think you have found something incorrect, please quote it precisely and explain.

> Using PE-PE PIM over an MI-PMSI, how many control messages over the MI-PMSI are needed to enable PE2, ..., PEn to join the (S,G) tree via PE1? Answer: 1. PE2, say, sends a PIM Join(S,G) over the MI-PMSI. PE1 receives this message, and as a result may send a PIM Join(S,G) out one of its VRF interfaces. PE3, ..., PEn also received PE2's Join(S,G), and as a result they do not send Joins themselves.

You are here just re-explaining to us what Join suppression is. I don't know if it is useful. You insist a lot in all your comments on how great Join suppression is. Join suppression is certainly a useful mechanism, and the analysis in Appendix A does not forget to take it into account. So I don't think that this comment calls the analysis into question.

> Appendix A says that the messaging cost of additional PEs after the first joining an (S,G) tree is the same as the cost of the first PE joining the tree. This is false. The messaging cost of each additional PE to join the tree is 0.

No, when a PE joins the tree it sends a Join(S,G). There is no "join suppression" effect for the first Join (just read p.55 of RFC4601). This is true for each PE. The first Join of each additional PE is received by all the PEs in the mVPN. Thus all the PEs of the mVPN process one Join for each PE joined to (S,G).
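
As a rough illustration of the counting used in this reply, here is a minimal Python sketch. It assumes, as argued in this mail, that the first Join(S,G) of each joining PE is always sent and is processed by every PE on the MI-PMSI (the sender is counted too, as a simplification), and it ignores periodic refreshes, where Join suppression applies. The function name and the example sizes are hypothetical.

```python
def simulate_first_joins(num_vpn_pes: int, num_joined_pes: int):
    """Count first Joins sent and processed when PEs join (S,G) one by one.

    Assumptions (illustrative model only): every joining PE sends one
    first Join, and every PE of the mVPN processes it. Periodic
    refreshes are ignored.
    """
    sent = 0
    processed = [0] * num_vpn_pes  # per-PE count of Joins processed
    for _ in range(num_joined_pes):
        sent += 1  # the first Join of a PE is not suppressed
        for pe in range(num_vpn_pes):
            processed[pe] += 1  # every PE on the MI-PMSI parses it
    return sent, processed


# Going from 3 to 4 joined PEs in a 10-PE mVPN (hypothetical sizes)
sent_3, processed_3 = simulate_first_joins(num_vpn_pes=10, num_joined_pes=3)
sent_4, processed_4 = simulate_first_joins(num_vpn_pes=10, num_joined_pes=4)
print(sent_4 - sent_3)                  # 1
print(processed_4[0] - processed_3[0])  # 1
```

Under these assumptions, one more joined PE means one more Join sent and one more Join processed by each PE, whichever of the two quantities is counted.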
So, if you increase the number of PEs joined to (S,G) by one, the number of messages processed by each PE increases by *one*. And it is the same if you count the number of Join(S,G) messages that have been sent: also increased by *one*, not by zero. Sorry, no free beer. ;-)

> Appendix A, noting the fact that the first Join must be received by every PE in the VPN, states that the message processing overhead is O(# PEs in the VPN). That is not really a meaningful measure of anything. The messaging itself is O(1), and

"Messaging" is vague: are you talking about "how many times a message is sent" or "how many times a message is processed"? You possibly challenge the fact that we count the total number of messages processed on all equipment of a kind, and do not merely look at one piece of equipment. If so, refer to my comment on this in the introduction of this mail.

> there is no router whose processing increases as O(# PEs in the VPN).

No, and we don't assert this, AFAIK. What is true is that the total number of messages processed by each PE in the mVPN increases as O(#R_PEs), the number of PEs that join the considered stream.

> In each PE router, the message processing per tree per VPN that is due to PE-PE interactions is O(1).

That statement is bogus. For each PE that joins a stream and leaves it later, the first Join and the Prune are always sent, and both are processed by each PE in the mVPN, so it is obvious that each PE processes 2 messages for each PE that joins and leaves the stream (ignoring refreshes). The amount of processing depends on the number of PEs joined to a stream and is not in O(1).

> It would be more illuminating to take the PE-CE interactions into account as well.

This statement related to PE-CE is in no way related to Appendix A, AFAICT.

> In PIM, whether a given interface is p2p or multiaccess, only a single Join(S,G) message is needed to enable all the PEs (except the upstream PE, of course) on that interface to join the (S,G) tree.
> At a given node, the PIM messaging overhead per tree is actually proportional to the number of interfaces, not to the number of PIM adjacencies. (Excluding Hellos, of course, just as Appendix A does.) That's why the PE-PE overhead per VPN tends to be dwarfed by the PE-CE overhead; for a given VPN, a PE may have lots of VRF interfaces, but it only has one PE-PE interface. In many cases, the PE-PE overhead is just in the noise.

(And again, you do not take into account that the first Join is always sent (RFC4601) and that a Prune is always sent by a PE that leaves the stream; thus the amount of processing is certainly not independent of the number of neighbors on an interface, as soon as they join a stream.)

You can choose to count the "number of messages" instead of the number of times "some message is processed by a node". But Appendix A counts the number of times "some message is processed by a node" (see comment in the intro of this mail).

> Anyway, let's look at the case where BGP C-multicast routing is used. Now each PE joining an (S,G) tree must send a C-multicast Source Tree Join to each RR. If n-1 PEs want to join the tree (with the nth PE being the upstream PE), the number of C-multicast Source Tree Joins sent is n times the number of RRs (most likely 2*n). In PIM, this would have been done with only one message; it is BGP that makes the number of messages needed proportional to the number of PEs in the VPN. The reason is that there is no way in BGP to prevent each PE from issuing a Join.

See comment in the intro of this mail. You are not challenging here the correctness of the analysis in Appendix A.

> PIM Join Suppression is a big savings when receivers are widely dispersed among the sites; if each PE is attached to a site with receivers, the need to process someone else's Join is a good tradeoff against the need to send your own Join. If receivers are concentrated in a few sites, this tradeoff is not so good. But many multicast applications have the model of a few sources with a lot of widely dispersed receivers.

As said above, Join suppression effects are accounted for in the analysis. Let's get to the point: how is the statement above supposed to challenge Appendix A? (Please quote the exact paragraph/line/counting that you disagree with.)

> Appendix A is also incorrect about the overhead related to PIM Prune(S,G) messages.

Given what I have already read, I feel that you just can't get away with such a blunt assertion.

The basic scenario considered, as far as Prunes are concerned, is that n PEs that had joined a stream now leave the stream, one by one:
- each joined PE, when it leaves the stream (because JoinDesired became false because the last CE left), sends a Prune
- each Prune is processed by each PE of the VPN

So, just considering the above, how many times did we have "a Prune is processed"? Answer: #PEs x #R_PEs

The conclusions are just drawn from such basic arithmetic. The numbers for the amount of processing for Prunes are nothing surprising, and are (in order of magnitude) similar to the numbers for Joins.
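
The Prune arithmetic above, and the difference between the two ways of counting, can be sketched as follows. This is only an illustrative model under the stated scenario (each of the #R_PEs joined PEs leaves once and sends one Prune, and each Prune is processed by each PE of the VPN); the function name and the sizes used are hypothetical.

```python
def prune_metrics(num_vpn_pes: int, num_receiver_pes: int):
    """Return (messages_sent, times_processed) for PEs leaving one by one.

    messages_sent counts Prunes put on the wire; times_processed counts
    every occurrence of "a Prune is processed by a PE", which is the
    quantity this mail says Appendix A counts.
    """
    messages_sent = 0
    times_processed = 0
    for _ in range(num_receiver_pes):
        messages_sent += 1              # the leaving PE sends one Prune
        times_processed += num_vpn_pes  # each PE of the VPN processes it
    return messages_sent, times_processed


sent, processed = prune_metrics(num_vpn_pes=100, num_receiver_pes=20)
print(sent)       # 20
print(processed)  # 2000
```

With these example sizes, counting messages gives #R_PEs = 20, while counting processing gives #PEs x #R_PEs = 2000.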
> It takes only one message to do a Prune, and one message to override it.

True, but everyone knows this. Everyone also knows that this PIM Prune message is processed by all the PEs: they all have to parse it, look up whether they have corresponding state, and reset a timer. If none of the PEs has state, nobody sends a Join to override; this "non-response" is the result of the collective work of all PEs, which have checked that they didn't have matching state. Only two routers sent a message, but every PE worked!

This is what Appendix A refers to by saying:

   The "did the last receiver leave?" question is thus *implicitly* replied to by all PE routers, for each PIM Prune message.

> Appendix A would lead one to believe that a Prune(S,G) immediately
> causes lots of Join(S,G) messages to be sent; this is not true.

We'll try and improve the sentence to avoid the misunderstanding, insisting on the collective work to produce a non-response in due time, letting the upstream node deduce that it can prune traffic.

> In BGP, if PE2 decides to prune itself from the (S,G) tree, it has to send each of the RRs a message withdrawing its Source Tree Join C-multicast route for (S,G). Then the RR has to use its BGP decision process to determine whether there is another Source Tree Join C-multicast route for the same (S,G). Then the RR has to distribute the latter route. The number of messages is no fewer.

What is counted in Appendix A is the amount of routing processing done by the routing equipment, not the number of messages. And in the example you take, in the PIM case the number of times "some message was processed by some equipment" is higher than in BGP. This comment does not expose an error in Appendix A.

> I think the distinction that Appendix A is really trying to get at is the following. Consider a given interface of a given node. Suppose that interface appears either as the upstream interface or as a downstream interface of n trees.
> Then let's say that interface contains n "branches". The total amount of state that PIM needs to maintain is roughly proportional to the sum, over all the node's interfaces, of the number of branches.

(You omit the processing of Prunes and first Joins on each branch!)

> Suppose that in a given PE, for a given VPN, there are i VRF interfaces, and that the average number of branches per interface is m. For each VPN, there is also a single PE-PE interface (a multiaccess interface to which all the PEs of the VPN belong). Let's suppose that this link contains k branches, where j of these branches belong to trees that the PE is a member of, and k-j belong to other trees in the VPN.
>
> If PE-PE PIM is used, the amount of state is proportional to i*m+k. If BGP C-multicast routing is used, the amount of state is proportional to i*m+j. The incremental state needed by PIM is thus proportional to k-j.
>
> For the state savings to be significant, one must assume not only that k is much larger than j, but also that k-j is significant when compared to i*m. Basically this means that you need to assume that each PE has receivers for only a small number of the VPN's trees, or else that there are a small number of VRF interfaces per VPN. Whether or not these assumptions are accurate depends of course on the set of multicast applications being used by the customers.

I don't get your point, and I don't see why with PE-PE PIM "the amount of state is proportional to i*m+k", since a PIM PE has to maintain state only for the streams it is joined to, thus rather i*m+j (it has to process messages for Joins for streams it is not joined to, but has no state to maintain, afaik).

But anyway, the conclusion on state maintenance in Appendix A is that, in order of magnitude, the amount of state is the same for all approaches. This does not go against the conclusion above, right?
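
To make the quoted state arithmetic concrete, here is a small Python sketch of the two estimates, i*m+k and i*m+j, with all proportionality constants set to 1. It only reproduces the formulas as quoted (the reply above disputes the i*m+k figure for PIM); the function name and the example values are hypothetical and not taken from Appendix A.

```python
def state_estimates(i: int, m: int, k: int, j: int) -> dict:
    """Evaluate the quoted per-PE, per-VPN state estimates (arbitrary units).

    i: number of VRF interfaces, m: average branches per VRF interface,
    k: branches on the PE-PE interface, j: those of the k branches that
    belong to trees this PE is a member of.
    """
    if i * m <= 0:
        raise ValueError("i*m must be positive to compare against it")
    pe_ce = i * m
    pim = pe_ce + k  # quoted estimate for PE-PE PIM
    bgp = pe_ce + j  # quoted estimate for BGP C-multicast routing
    return {
        "pim": pim,
        "bgp": bgp,
        "incremental": pim - bgp,               # equals k - j
        "incremental_vs_pe_ce": (k - j) / pe_ce,
    }


result = state_estimates(i=10, m=50, k=100, j=20)
print(result["pim"], result["bgp"], result["incremental"])  # 600 520 80
print(result["incremental_vs_pe_ce"])                       # 0.16
```

With these made-up values the difference k-j is 16% of i*m, so both estimates remain of the same order of magnitude.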
(Please be more explicit about what in Appendix A you challenge as "incorrect".)

> There are also a couple of important factors which Appendix A has omitted:
>
> - If you are using two different protocols to maintain each multicast state, then the total amount of state is effectively doubled. This will generally result in both a memory cost and a CPU cost. While this is a "mere factor of two," perhaps it deserves a mention in any document that compares a control plane that uses two protocols (e.g. PIM for PE-CE, BGP for PE-PE) to a control plane that uses only one.

This was pointed out by Maria and already fixed in -03. (State maintenance in A.3, multiplied by 2; it is a linear factor, not changing the conclusions of Appendix A in O(x).)

> - For sparse mode, which is the type of multicast most commonly found in the enterprises that buy the VPN service, there is quite a bit of additional messaging in BGP that has not been considered.
>
>   When PE1 receives a C-multicast Source Tree Join for a sparse mode group, it has to generate a Source Active A-D Route. It needs to send a BGP update for this route to each RR. Each RR needs to send it to each of PE2, ..., PE3. So we have more messaging which is O( # of PEs in VPN). Needless to say, PIM also has sparse mode overhead which hasn't been considered, but the point is that BGP overhead is not O(1).
A few comments:
- let me remind you that, because of the complexity of the comparison in a non-SSM scenario, where the PIM and BGP procedures are significantly different, it just wasn't done; the SPT/SSM part is common, and seems to me to be enough to guide the comparison
- let me highlight that, this time, you choose to count the number of times a message is processed, not how many messages are sent; you seem to change what you like to count depending on what you want to show :)
- you highlight above a scenario with a total amount of processing across all nodes in O(#mVPN_PE) and say that "the BGP overhead is not O(1)": this is correct, but not significantly different from what we have in the base scenario of Appendix A, for which the total amount of processing across all nodes is O(#R_PE) for BGP, and O(#mVPN_PE x #R_PE) for PIM

So, while it would be conceivable to complement the analysis with a non-SSM scenario along the lines of the SSM scenario already considered, we have not yet seen a compelling argument to do it, since the change wouldn't seem to fundamentally affect the comparison between PIM and BGP.

Now, there is another thing briefly looked at at the very end of Appendix A: the case of a more dynamic situation where PEs join and leave many times. In the non-SSM case, as you mention, BGP would also have to produce the S-A A-D routes, and they would have to be processed by all PEs. But for a given downstream PE joining the stream, the upstream PE will advertise a new S-A A-D route only if there was no PE already joined to the stream. The impact on BGP thus depends a lot on the Join/Prune dynamics... As you say, the overhead of BGP for the total amount of processing is not O(1) in that case, but will depend on the number of PEs.
But this is true for PIM too, except that for PIM we don't need to make strong hypotheses about the dynamics to know the number of times a message is processed (in the dynamic case, PIM loses the gains of Join suppression, and pays the full price of having each message processed by every PE in the mVPN).

So well, if you can explain that such a case would expose conclusions impacting what is in the document today, then it could be worth including.

> - One might get the impression that the BGP scheme eliminates any need for the C-PIM state machine to maintain states for the PMSIs that connect the PEs over the backbone. This is not true;

We don't state anything contradictory with the above, AFAIK. Taking into account the exact amount/volume of state information would be doable, but as far as I can tell it doesn't seem to change the orders of magnitude.

>   PEs must maintain Prune(S,G,rpt) for the PMSIs. PEs receiving multicast data over a PMSI must also do the RPF interface check for the arriving data, considering the PMSI as the "input interface". The backbone does not become transparent to the C-PIM interfaces just because PIM control packets do not flow over the PMSIs.

The above is not related to the routing/processing load, hence not to Appendix A.

> Since some of the conclusions of the draft depend on the analysis in Appendix A, I think those conclusions need to be removed until the underlying analysis is corrected.

Your comments will be useful to improve some points in Appendix A (thank you), but nothing changes the conclusions significantly enough to remove stuff elsewhere in the document. Or if you really have such a claim, I think that to make a decision we would need you to make it more explicit what you would remove.

> If and when a corrected version of the analysis is available, it should be last called by the PIM WG, which is where one might expect to find the most expertise in PIM.

And a last call by IDR too for the BGP part?
Let's be serious: first, let's see if a "correction" is needed, and delay the question until you really show how the analysis would be incorrect!

Cheers,

-Thomas