Comments on draft-morin-l3vpn-mvpn-considerations-01.txt



I have carefully  reread draft-morin-l3vpn-mvpn-considerations-01.txt, and I
have a number of comments.

In  general, I  think the  authors have  done  a good  job in  trying to  be
even-handed, but  I think there are some  errors in the details,  and I also
think that some of  the risks and overheads involved in the  use of BGP have
not been fully considered.  I also don't think that the different flavors of
MPLS multicast  LSPs have  been carefully considered.   I would like  to see
some of these issues addressed before adopting it as a WG draft.

On the general issue of comparing PIM vs. BGP, while the draft does not make
a recommendation, it  is heavy on PIM's disadvantages  and BGP's advantages.
So I  think it is  worth listing some  of the risk areas  of BGP.  I  do not
believe that  any of  these are "showstoppers"  for BGP, but  they certainly
should  be articulated  in  any document  that  purports to  be providing  a
balanced view. 

- As BGP  is generally  used, BGP churn  is a  function of real  or apparent
  topology changes.  BGP churn is NOT generally caused by enduser events.

  This changes when  BGP is used to distribute  multicast routes.  BGP churn
  can now  be initiated by PIM  Joins, and a PIM  Join may be  the result of
  some enduser clicking  a button on his PC (e.g.,  "receive video").  It is
  not clear that there is any limit on the rate at which such enduser events
  can occur.  We do not appear to  have adequate data from which we can draw
  conclusions as to the impact of enduser actions on BGP churn.

  It is easy to offer opinions about this, but facts are scarce.

- In some  multicast applications, "join  latency" is considered to  be very
  important.  No  one has ever claimed  BGP will provide  lower join latency
  than  PIM; the only  real question  is whether  latency will  be increased
  beyond  the  limits  that   some  applications  find  acceptable.   Again,
  different people have  different opinions about how much  this will matter
  in practice, but hard data is hard to come by.

- Churn  vs. latency:  To  control the  BGP  churn, we  have discussed  such
  features as  Join Dampening.   However, that can't  be expected to  have a
  salutary effect on the latency!

  There really should  be some recognition in the draft  that the effects on
  latency and churn are as yet poorly understood.

- Any time a new high volume address family is added to BGP, it is important
  to consider  the impact on  the route reflectors,  and whether it  will be
  necessary to add  new route reflectors, perhaps ones  dedicated to the new
  family.  One should also consider the impact (especially on the RRs) which
  one address family  may have on another, e.g.,  will an unexpectedly large
  amount  of churn  due  to PIM  cause  a slowdown  in  the unicast  routing
  convergence?  There should be some discussion of this.

- Transactional nature of PIM

  At first glance,  it may seem that having  PIM distribute multicast routes
  to BGP is no different than  having OSPF distribute unicast routes to BGP,
  i.e., that it is really  nothing new.  However, this isn't quite accurate.
  The issue  is that PIM isn't really  a "routing protocol" in  the sense of
  choosing  routes  based   on  topology.   It  is  a   protocol  that  uses
  router-to-router  transactions  to construct  paths  and  assign flows  to
  paths. In  some cases, especially when  Sparse Mode flows  are involved, a
  number of  transactions involving  multiple nodes may  need to  take place
  before the multicast "routing" stabilizes.  Each one of these transactions
  must  pass through  the RR,  placing more  load on  the RR  and increasing
  latency.

  In  some  cases, where  PIM-SM  has  data-driven  state changes,  we  have
  attempted  to avoid  putting the  data-driven state  changes into  BGP, by
  having BGP send extra  messages, including timer-driven messages.  While I
  tend to think that this is the right tradeoff, it does modify the dynamics
  of path construction,  and we don't have any hard data  to tell us whether
  we've made the right tradeoff,  or whether the "right tradeoff" depends on
  just how the applications are using multicast.
  
- Strange uses of PIM

  Talk  to  anyone  with a  lot  of  experience  in  PIM deployment  in  the
  enterprise, and you'll hear lots of  strange stories about how PIM is used
  in some enterprises.   Although PIM is optimized for  "many receivers, few
  sources", it isn't always used  that way.  I've also heard anecdotes about
  applications that make assumptions about the dynamic nature of PIM and may
  break if  anything changes.   As usual,  hard data is  hard to  find.  But
  there's always a risk that when you change the infrastructure underneath a
  crufty old application you will break the application.  There are people I
  respect, with  much more experience  in multicast deployment than  I have,
  who think it's  just foolish to do anything that changes  the way in which
  the infrastructure behaves for sparse mode.  While I am not convinced that
  they are  right, I think that  this does have  to be recognized as  a risk
  element.
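
To make the churn-versus-latency tension described above concrete, here is a
toy sketch in Python.  It is NOT the Join Dampening procedure from any draft;
it assumes a hypothetical model in which joins are held and advertised in one
BGP update at the next multiple of a hold-down interval.  The function name,
the parameter names, and the sample arrival times are all invented for
illustration.

```python
def simulate_dampening(join_times_ms, hold_down_ms):
    """Toy model: return (updates_sent, worst_added_delay_ms).

    join_times_ms: arrival times of C-multicast joins, in milliseconds.
    hold_down_ms:  hypothetical dampening interval; 0 means no dampening.
    """
    if hold_down_ms == 0:
        # No dampening: every join triggers its own update immediately.
        return len(join_times_ms), 0

    ticks = set()
    worst_delay = 0
    for t in join_times_ms:
        # Hold the join until the next multiple of the interval
        # (integer ceiling division).
        tick = -(-t // hold_down_ms) * hold_down_ms
        ticks.add(tick)
        worst_delay = max(worst_delay, tick - t)

    # One batched update per tick that has at least one pending join.
    return len(ticks), worst_delay


joins = [10, 20, 30, 1010, 1500, 2990]   # made-up arrival times
print(simulate_dampening(joins, 0))      # (6, 0)
print(simulate_dampening(joins, 1000))   # (3, 990)
print(simulate_dampening(joins, 5000))   # (1, 4990)
```

Even in this simplified model, each reduction in the number of updates is
paid for with added join latency.  How the real tradeoff behaves is exactly
the kind of question that needs data.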

Now let's look at some of the topics covered in the draft.

1. From section 3.1, "the operational burden of setting up multicast on a PE
   or for a VR/VRF SHOULD be as low as possible".

   I certainly  agree with  this!  However, I  don't think it  combines well
   with the  later recommendation to support outsourced  RPs.  Obviously the
   use of outsourced RPs increases the operational burden for the provider.

   I  note that  there doesn't  seem to  be any  requirement for  making the
   operational burden  of MVPN on the  SP's customer be as  low as possible,
   though such a requirement is  prominent in another SP requirements draft,
   viz.,  draft-mnapierala-mvpn-part-reqt-00.txt.  Such a  requirement would
   seem to militate against  the use  of the "outsourced  RP", at  least for
   MVPN customers who already use  multicast.  (Which I imagine would be the
   vast majority of them.)

2. Auto-discovery

   "BGP-based  auto-discovery is the  preferred solution  for auto-discovery
   ...  while  PIM/shared-tree  based  auto-discovery should  be  optionally
   considered for migration purposes only".

   There isn't  really a  dichotomy here.  The  facts are that  if PIM-based
   shared  trees  are  used,  BGP-based auto-discovery  doesn't  really  add
   anything,  but in  any  other circumstance,  BGP-based auto-discovery  is
   absolutely necessary.

   But  if the recommendation  here is  to provide  BGP-based auto-discovery
   even in the one case where  it isn't strictly needed, I certainly support
   that recommendation.  This  is, in fact, already true  of the majority of
   existing MVPN  deployments.  (Though  I have been  told that there  is at
   least one vendor which does not support it in existing deployments.)

   There is a suggestion in the  draft that if PIM-based shared trees are in
   use, BGP auto-discovery can  help detect misconfigurations.  I'm not sure
   about that, as I can imagine  transition scenarios in which more than one
   shared tree is in use at some time.

3. Number of protocols

   "if any additional protocols are introduced compared with the unicast VPN
   service,  the  balance between  their  advantage  and operational  burden
   SHOULD be examined thoroughly"

   Well, who  could disagree with that?   The pros/cons of  any new protocol
   should always be  examined thoroughly.  In fact, the  pros/cons of adding
   new  functionality  to  existing  protocols  should  always  be  examined
   thoroughly  as well.   As  should  the pros/cons  of  adding new  address
   families, new procedures,  new dynamics, new events and  states, etc., to
   existing protocols.

   It is better to focus on the overall complexity of the system, and on the
   overall  behavior  of  the system,  than  just  to  count the  number  of
   protocols. 

4. S-PMSI Signaling

   The BGP-based S-PMSI  signaling mechanism does have a  risk factor that I
   have mentioned above, but that is not considered in the draft.

   In current practice,  BGP changes are caused by  topology changes, not by
   end  user  activity.   However,   it  is  enduser  activity  that  causes
   invocation of the S-PMSI signaling  mechanism.  We simply do not have the
   data at  the present time  to determine how  much this will  increase the
   amount of churn  in BGP.  The S-PMSI signaling is  also something that is
   best done through a low latency  mechanism, and we do not at present have
   the data to determine whether BGP will significantly increase the latency
   when compared with the UDP-based proposal.

   In theory,  the use of BGP  S-PMSI signaling has a  number of advantages,
   but I  don't think  I would  recommend against the  use of  the UDP-based
   procedure unless and until I had some data about these pragmatic issues.

   By the  way, when I  say "data", I  don't mean "strongly held  and loudly
   voiced  opinions", even  though the  latter are  sometimes  confused with
   data.

   "The  UDP-based protocol  is  restricted  to use  within  MVPNs using  an
   MI-PMSI".

   This fact seems to me to be  neither here nor there.  It is true that the
   UDP-based protocol is unlikely to be useful unless PIM is the C-multicast
   routing control protocol, in which case  you have an MI-PMSI.  If you are
   using  BGP as  the C-multicast  routing protocol,  then you  have already
   determined that you don't care if BGP churn is caused by enduser events.

   The UDP-based S-PMSI signaling is best considered as an optional part of
   the  PIM C-multicast  routing control  plane, rather  than as  a separate
   protocol with its own independent set of pros and cons.

   "the  use  of  the  UDP-based  protocol  does  not  preserve  AS  routing
    independence  when  used in  an  inter-AS  option  B context  (i.e.  the
    decision by  a PE in an  AS to use an  S-PMSI for a  given customer flow
    will impact routing state in other ASes)"

   I  have no  idea what  this means,  so I  hope one  of the  authors will
   explain.   (I also  don't  know what  is  meant by  the  claim that  the
   BGP-based S-PMSI signaling  mechanism doesn't have whatever disadvantage
   this is.)

5. Off-loading processing onto route reflector

   The claim is made that, when using BGP to carry C-multicast routes, "some
   of  the processing burden  associated with  client multicast  routing [is
   offloaded] onto BGP route reflectors".

   I  would  like the  authors  to  specify  precisely what  this  offloaded
   processing burden is.  Some consideration of its effect on existing L3VPN
   route reflector systems would also be worthwhile.

   As the  text is now, it is  very difficult to understand  or evaluate the
   claim.

6. MI-PMSI issues

   "Moreover, mechanisms one and two are restricted to use within MVPNs
   using an MI-PMSI, thereby necessitating:

   a.  The use of a P-multicast tree technique that allows shared trees
       (for example PIM-SM in ASM mode or MP2MP LDP).

   b.  The use of one P-multicast tree per PE per VPN, even for PEs that
       do not have sources in their directly attached sites for that
       VPN."

   First, claim b is not true, as explained in
   draft-rosen-l3vpn-mvpn-profiles-00.txt, section 3.3.

   Second, the fact that a control  protocol requires that it be possible to
   instantiate a P-tunnel as a shared tree  of some sort does not seem to me
   to be  a "restriction" or disadvantage.   You might as well  say that the
   use  of  BGP  for  C-multicast  routing has  the  disadvantage  of  being
   "restricted"  to those networks  whose route  reflectors support  the BGP
   C-multicast routing address family.

   There are however some other things  that do seem to require the presence
   of an MI-PMSI:   autoRP and BSR.  Since many  SP customers use autoRP/BSR
   to discover  RPs, and since  it is not  clear how to provide  support for
   these without an  MI-PMSI, I am not convinced that one  can do without an
   MI-PMSI even if BGP is used for distributing C-multicast routes.


7. BGP doesn't eliminate PIM anyway
   
   "using BGP for customer routing distribution within multicast VPNs avoids
   the introduction of an  additional protocol that would require additional
   OAM processes and tools."

   It's not as if  PIM is eliminated by BGP.  As long as  PIM is used on the
   PE/CE links, the SP is still operating a PIM environment.  In fact, there
   may be significant  scaling problems due to PE-CE  PIM.  Some people with
   years of PIM  experience believe that the PE-CE  PIM scaling problems are
   actually worse  than the PE-PE PIM  scaling problems, and  hence that the
   attempt  to improve PE-PE  scaling by  using BGP  just attacks  the wrong
   problem. 

   In the absence of  actual data, the risk is that you  end up adding a lot
   of NEW stuff and don't even address the scaling bottlenecks.
 
   In  any  event, I'm  not  sure  I understand  how  adding  a  lot of  new
   functionality  to BGP  (multicast  path  creation) is  going  to be  done
   without the need for additional OAM processes and tools.

8. Consistency with unicast control mechanisms.

   "An illustrative example of the benefit brought by consistency with
   unicast design is how the "extranet" feature can be implemented :
   when BGP-based mechanisms are used, the well defined and well
   understood BGP route target import/export semantics are just reused."

   This seems like a red  herring.  Extranet capability is already supported
   in  the field  today  by existing  MVPN  deployments based  on PIM.   The
   primary impact is really on the auto-discovery phase, not the C-multicast
   routing phase.

9. Encapsulation techniques

    I notice that  there is one particular data plane  technique that is not
    addressed  by the  document, viz.,  the use  of ingress  replication and
    transmission through  unicast tunnels.  It would be  interesting to know
    whether SPs  have interest in this  technology; if not, maybe  it can be
    removed from the  drafts.  (I think it adds  complexity and frankly, I'm
    not sure it's fully and properly specified.)

    I think section 3.4 can be taken as a requirement for both a "BGP + GRE"
    profile  and a  "PIM +  MPLS" profile,  since it  seems to  advocate for
    independence between the control  plane and the data plane.  But I'm
    not sure if that is what is intended, and would like clarification.

    I would like to see a specific recommendation for the support of MP2MP
    LSPs, given their importance in C-bidir support (for either control
    plane, see draft-ietf-l3vpn-2547bis-mcast-05.txt, section 12.2) and in
    general MPLS multicast support for the PIM control plane (see
    draft-rosen-l3vpn-mvpn-profiles-00.txt, section 3.3).

10. Upstream-Assigned Labels, MI-PMSI, and Scalability
   
    The  WG drafts frequently  discuss aggregation,  and the  possibility of
    aggregation is frequently touted as a major scalability advantage.

    What isn't always clear is  that all forms of S-PMSI aggregation require
    a  major   new  MPLS  feature,   upstream-assigned  labels.   Supporting
    upstream-assigned labels requires significant  changes to the data plane
    processing  of  many  different  platforms,  and is  not  likely  to  be
    available for some period of time.

    So realistically, for  the foreseeable future, the use  of an MI-PMSI is
    going  to remain  the  only  method of  aggregation.   Unless a  service
    provider does not care if their  P-routers have to maintain an amount of
    multicast state  which is proportional to  the sum of the  states of all
    their customers, the service provider should be requiring support for an
    MI-PMSI. 

    As  a further  note,  it should  be  pointed out  that segmented inter-AS
    S-PMSIs are on a per-PE basis, not a per-AS basis.  Therefore, without the
    aggregation that  can be obtained  from upstream-assigned labels,  it is
    not clear that segmented inter-AS  S-PMSIs are scalable at all.  This is
    an issue which should be mentioned.

    I think the draft should have a clear recommendation that in the absence
    of upstream-assigned MPLS labels, MI-PMSIs should be used.
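
    The state argument above can be put as back-of-the-envelope arithmetic.
    The sketch below is only an illustration under simplifying assumptions
    that are mine and not the draft's: every tree transits the P-router in
    question, unaggregated S-PMSIs cost one P-tree per customer flow, and an
    MI-PMSI is instantiated as a single shared P-tree per VPN.  The function
    name, the mode strings, and the flow counts are hypothetical.

```python
def p_router_tree_state(flows_per_vpn, mode):
    """Rough upper bound on P-tree state entries at one P-router.

    flows_per_vpn: dict mapping a VPN name to its number of C-multicast flows.
    mode: "per-flow-s-pmsi" (no aggregation) or "mi-pmsi" (one shared tree
          per VPN); both labels are invented for this sketch.
    """
    if mode == "per-flow-s-pmsi":
        # State grows with the sum of all customers' flows.
        return sum(flows_per_vpn.values())
    if mode == "mi-pmsi":
        # State grows only with the number of VPNs.
        return len(flows_per_vpn)
    raise ValueError(f"unknown mode: {mode}")


vpns = {"vpn-a": 200, "vpn-b": 50, "vpn-c": 1000}  # made-up flow counts
print(p_router_tree_state(vpns, "per-flow-s-pmsi"))  # 1250
print(p_router_tree_state(vpns, "mi-pmsi"))          # 3
```

    The actual numbers depend on topology and on how the P-tunnels are
    instantiated.  The sketch only shows which quantity each approach scales
    with.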

11. Outsourced RPs

    I  would like  to see  the  draft offer  much stronger  support for  its
    recommendation to  implement outsourced RPs,  or else to  eliminate that
    recommendation. 

    Outsourced RPs  (i.e., PE  as RP) have  a number of  disadvantages which
    need to be given more emphasis:

    - More work  for the PE (do  many SPs really want  to give a  lot of new
      work to their PEs?)

    - Not  likely to  be  of much  interest  to customers  with an  existing
      multicast infrastructure

    - Require  increased coordination  between SP  and  enterprise customer,
      such as agreement on RP (or RPL) addresses.
    
    - Not completely clear  how this works in scenarios  where an enterprise
      gets VPN  service from two different providers,  particularly if there
      is a site multihomed to two different providers.

    I presume  that the extra  operational work for  SP and customer  is not
    going to be justified by the  fact that it simplifies things for some of
    the SP's vendors.


12. The draft's conclusion

   "Consequently, at the present time and until there is experience with
   all of the proposed mechanisms it is not clear which of the above
   mechanisms should be recommended as the preferred solution to
   implementers.  However, it would appear prudent for implementations
   to consider supporting both the fourth (BGP-based) and first (full
   per-MPVN PIM peering) mechanisms.  Further experience on both
   implementation is likely to be required before some best practice can
   be defined."

   I certainly agree with this  conclusion.  Perhaps it should be given more
   prominence. 

13. My conclusion

    While  I do  think this  draft may  be a  good start  on the  project of
    comparing the  different MVPN options, I  don't think it is  ready to be
    adopted as a WG document, and it  needs a lot more work to deal with the
    issues I have raised in this note.

    I  also hope  it  is clearly  understood  that this  document should  be
    evaluated by the same criteria as any other internet-draft, i.e., by its
    technical quality.