
Re: [bmwg] WGLC: draft-ietf-bmwg-protection term-06 and meth-05



Great.  We will maintain the 'shutdown' test case in the document.

Scott

-----Original Message-----
From: Rajiv Asati (rajiva) [mailto:rajiva at cisco.com] 
Sent: Monday, August 10, 2009 5:30 PM
To: Kris Michielsen (kmichiel); Scott Poretsky; Jay Karthik (jakarthi);
Al Morton; bmwg at ietf.org
Subject: RE: [bmwg] WGLC: draft-ietf-bmwg-protection term-06 and meth-05

I agree with Kris. In fact, one would reason that an interface failure
caused by a 'shutdown' (executed by the admin) may produce a different
result from an interface failure that happens on its own (a fiber cut,
for example).

It is a mere title change, IMO.

Cheers,
Rajiv

> -----Original Message-----
> From: bmwg-bounces at ietf.org [mailto:bmwg-bounces at ietf.org] On Behalf
> Of Kris Michielsen (kmichiel)
> Sent: Thursday, August 06, 2009 4:23 AM
> To: 'Scott Poretsky'; Jay Karthik (jakarthi); 'Al Morton';
> bmwg at ietf.org
> Subject: Re: [bmwg] WGLC: draft-ietf-bmwg-protection term-06 and
> meth-05
> 
> Scott,
> 
> I'm not arguing against testing performance when administratively taking
> a link out of service. It's indeed a valid real-world performance issue.
> But a network administrator who cares about customer network uptime
> usually doesn't do that with a plain interface shutdown; they follow
> procedures such as set overload bit, cost-out link, shut adjacency, ...
> 
> I have a problem with claiming that an interface shutdown is a _failure_
> scenario. Please reread my comment below with the example
> of implementations "A" and "B". Which of the two devices is the best
> performing (and hence expected to show the best benchmark
> metric) in your opinion?
> 
> The performance of taking an interface administratively out of service
> (using a real procedure/sequence of actions chosen by the user/tester,
> even a plain interface shutdown if that is what the user/tester cares
> about) should be measured, but under the title of Administrative
> Actions, without claiming it to be a failure.
> 
> Thanks,
> Kris
> 
> > -----Original Message-----
> > From: Scott Poretsky [mailto:sporetsky at allot.com]
> > Sent: 05 August 2009 18:52
> > To: Kris Michielsen; Jay Karthik (jakarthi); Al Morton;
> > bmwg at ietf.org
> > Subject: RE: [bmwg] WGLC: draft-ietf-bmwg-protection term-06
> > and meth-05
> >
> > Kris,
> >
> > The purpose of the BMWG is to develop informational standards
> > to benchmark the performance of different vendor implementations.
> > If every vendor had the same implementation, there would
> > be no need for the BMWG because every vendor would have the same
> > performance.  How a Cisco admin shut impacts layer 3
> > performance versus a Juniper admin shut is a very real-world
> > performance issue that is within the scope of the BMWG.
> >
> > Scott
> >
> > -----Original Message-----
> > From: bmwg-bounces at ietf.org [mailto:bmwg-bounces at ietf.org] On
> > Behalf Of Kris Michielsen
> > Sent: Wednesday, August 05, 2009 12:04 PM
> > To: 'Jay Karthik (jakarthi)'; 'Al Morton'; bmwg at ietf.org
> > Subject: Re: [bmwg] WGLC: draft-ietf-bmwg-protection term-06
> > and meth-05
> >
> > Jay,
> >
> > see below.
> >
> > > -----Original Message-----
> > > From: Jay Karthik (jakarthi) [mailto:jakarthi at cisco.com]
> > > Sent: 04 August 2009 19:33
> > > To: Kris Michielsen (kmichiel); Al Morton; bmwg at ietf.org
> > > Subject: RE: [bmwg] WGLC: draft-ietf-bmwg-protection term-06 and
> > > meth-05
> > >
> > > Hi Kris,
> > >
> > > Thanks for your detailed review. I have responded to your comments
> > > on the methodology draft here. We will get back with our response
> > > to your comments on the terminology draft.
> > >
> > > Please see inline.
> > >
> > > Cheers,
> > > Jay
> > >
> > > -----Original Message-----
> > > From: bmwg-bounces at ietf.org [mailto:bmwg-bounces at ietf.org]
> > > On Behalf Of Kris Michielsen (kmichiel)
> > > Sent: Monday, August 03, 2009 9:56 AM
> > > To: 'Al Morton'; bmwg at ietf.org
> > > Subject: Re: [bmwg] WGLC: draft-ietf-bmwg-protection term-06 and
> > > meth-05
> > >
> > > <Snip>
> > >
> > > comments on meth-05:
> > >
> > > * 5.1 I don't think shutdown is a good method to simulate a failure
> > > since the actions taken by a shutdown are very much implementation
> > > dependent. Shutdown should only be considered as an administrative
> > > action.
> > >
> > > Jay: Agreed that the shutdown is very much implementation dependent,
> > > which is the reason for including it as a failover trigger event. As
> > > you can see, this is one of the failure events, and we have listed a
> > > few other events as well.
> > >
> >
> > If the shutdown is implementation dependent, how can we
> > compare benchmarks of different vendors' implementations?
> > An example of an "Interface shutdown on remote side with POS
> > Alarm": One implementation "A" immediately stops receiving
> > traffic after an interface shutdown and sends AIS. Another
> > implementation "B" just sends AIS after interface shutdown but
> > keeps forwarding traffic coming in from the shutdown interface.
> > Implementation "B" is also very slow in triggering FRR after
> > receiving AIS (e.g. ~200ms), while implementation "A" reacts
> > swiftly to AIS (e.g. ~10ms). The benchmark test for
> > implementation "A" will see some packet loss (e.g. 10ms), while
> > implementation "B" will see zero packet loss, but for a _real_
> > failure implementation "B" may see more packet loss (e.g. ~200ms).
> >
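The "A" versus "B" example above can be put into a small toy model. The Python sketch below is only an illustration under stated assumptions: the class, field and function names are hypothetical, and the ~10 ms and ~200 ms values are the example figures from the text, not measurements.

```python
from dataclasses import dataclass


@dataclass
class Implementation:
    """Hypothetical description of a DUT's behaviour (illustration only)."""
    name: str
    stops_traffic_on_shutdown: bool  # does a remote 'shut' also stop traffic?
    frr_reaction_ms: float           # time from AIS receipt to FRR switchover


def observed_loss_ms(impl: Implementation, trigger: str) -> float:
    """Loss duration the tester would observe, in milliseconds (toy model)."""
    if trigger == "shutdown" and not impl.stops_traffic_on_shutdown:
        # Traffic keeps arriving over the 'shut' interface until FRR kicks in,
        # so the tester sees no loss at all.
        return 0.0
    # Traffic stops at the trigger; loss lasts until FRR has switched over.
    return impl.frr_reaction_ms


# Example values taken from the text above (assumed, not measured).
impls = [
    Implementation("A", stops_traffic_on_shutdown=True, frr_reaction_ms=10.0),
    Implementation("B", stops_traffic_on_shutdown=False, frr_reaction_ms=200.0),
]

results = {
    (impl.name, trigger): observed_loss_ms(impl, trigger)
    for impl in impls
    for trigger in ("shutdown", "real_failure")
}

for key, loss in sorted(results.items()):
    print(key, loss)
```

In this model the shutdown trigger ranks "B" ahead of "A", while a real failure reverses the ranking, which is the comparability problem described above.
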
> > To test the reaction to different kinds of layer1 failure
> > indications some additional equipment (same as used in real
> > networks) is required to inject these failure indications in
> > the DUTs. This can go with or without traffic loss, just as
> > in reality. Also higher layer failures may cause immediate
> > traffic loss or not. To be able to compare benchmarks the
> > characteristics of a Failover event need to be the same or at
> > least documented/reported.
> >
> > An interface shutdown is not to be considered as a real
> > "failure", but merely as an administrative action. Measuring
> > packet loss following such an administrative action is
> > useful, but a user/tester is free to decide how to take an
> > interface administratively out of service (which can be more
> > complicated than just doing a "shut" on an interface).
> >
> > > * 5.2 Can't the remote fault indication in the ethernet
> > > auto-negotiation scheme be considered as a type of link failure
> > > indicator for directly connected devices? In that configuration it
> > > doesn't need to rely on
> > > layer3 failure detection.
> > >
> > > Jay: Sure - we shall modify the text with your suggestion explicitly
> > > for directly connected devices.
> > >
> > > * 5.6 Shouldn't reoptimization (signal a new LSP while the failure
> > > still
> > > persists) be added to this section?
> > >
> > > Jay - The authors discussed this and based on the comments we
> > > received earlier, we have mentioned the make-before-break scenario
> > > towards the end of section 8. Would you prefer that we include this
> > > in section 5.6 as well?
> >
> > It needs to be explained that a reoptimization can take place
> > while the failure is still not recovered, but it doesn't need
> > to take place, also depending on the DUT configuration. Since
> > it doesn't fit the title/subject of 5.6 it may be better to
> > put it in a new section before 5.6.
> > In the Reporting format it needs to be indicated if this
> > reoptimization takes place or not.
> >
> > >
> > > * 5.7 If the failover is prefix/LSP dependent, choosing only 3 routes
> > > to measure is not a good idea. In that case as many prefixes as
> > > possible (limited by required accuracy and Throughput) are needed.
> > > Maybe a scheme to randomly select as many prefixes as possible from
> > > the total number could be considered. One can also first test if the
> > > failover is prefix/LSP independent and make use of that
> > > characteristic to reduce the number of traffic destinations that
> > > need to be measured. If there is a dependency on the number of
> > > prefixes, the number of LSPs, ... then how is this reflected in the
> > > measurements? Is only the fastest failover time important, or the
> > > slowest, or the average?
> > >
> > > Jay - Kris, we have not mandated 3 routes/streams. You probably
> > > picked the 3 routes from the example we had furnished. We also
> > > mention at the beginning of this section that more streams could be
> > > used as long as the flow is steady across all the traffic
> > > streams/routes.
> >
> > Guidelines are needed on how many and which streams need to
> > be used when benchmarking. This is required to avoid
> > misinterpretations of the benchmark metrics. If one picks
> > only a handful of stream destinations out of a total of e.g.
> > 5000, then in a prefix or tunnel dependent implementation one
> > cannot know what is being measured (unless maybe when using
> > white-box information). Are those traffic streams going to
> > the prefixes and/or tunnels that fail over first or last or
> > somewhere in between?
> >
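The random-selection idea raised for 5.7 can be sketched in a few lines of Python. This is a minimal illustration, not text from either draft: the function names, the 500-stream cap and the synthetic prefix list are all assumptions, and the per-prefix failover times would come from the Tester in a real run.

```python
import random
import statistics


def select_streams(prefixes, max_streams, seed=None):
    """Pick up to max_streams distinct prefixes uniformly at random.

    max_streams stands in for the limit imposed by the required accuracy
    and Throughput; its value here is an assumption.
    """
    rng = random.Random(seed)
    count = min(max_streams, len(prefixes))
    return rng.sample(prefixes, count)


def summarize(failover_ms_by_prefix):
    """Report fastest, slowest and average failover time over the sample."""
    values = list(failover_ms_by_prefix.values())
    return {
        "min_ms": min(values),
        "max_ms": max(values),
        "mean_ms": statistics.mean(values),
    }


# Synthetic list of 5000 destinations (hypothetical addresses).
prefixes = [f"10.{i // 256}.{i % 256}.0/24" for i in range(5000)]
chosen = select_streams(prefixes, max_streams=500, seed=1)
print(len(chosen), "streams selected out of", len(prefixes))
```

Reporting minimum, maximum and mean together avoids having to decide in advance which of the three matters, and recording the seed makes the selection repeatable.
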
> > >
> > > * 5.8 When tunnels are established from the Tester, it should be
> > > capable of resignaling/reoptimizing LSPs following the receipt of a
> > > path error from the PLR, just as real devices do
> > >
> > > Jay - Sure but we did not see much value in including benchmarking
> > > scenarios that would validate the Tester as opposed to the DUT. Btw,
> > > most if not all of the testers we know do not support reoptimization
> > > as of now.
> >
> > Then why is there a need for Tester capability 1. in section 5.8?
> > While we're at it, are Tester capabilities 2. and 5. really
> > needed? If so, when?
> >
> > >
> > > * 6 It is confusing to mark R1 as HE and R2 as MID while their
> > > actual roles may vary depending on the testcase
> > >
> > > Jay - Not sure where the confusion arises from. We believe all the
> > > different scenarios could be tested by positioning R1 as HE and R2
> > > as MID nodes
> >
> > I meant the indications in the topology figures, such as
> > Figure 2. below (indicated with =><=).
> >
> > #             -------    -------- PRI  --------
> > #            |  R1   |  |   R2   |    |   R3   |
> > #         TG-|=>HE<= |--|=>MID<= |----|    TE  |-TA
> > #            |       |  |  PLR   |----|        |
> > #             -------    -------- BKP  --------
> > #
> > #                          Figure 2.
> >
> > And e.g. the testcase section 7.2 where R2 is Headend. This
> > does not match with "HE" and "MID" in Figure 2, and leads to
> > confusion.
> >
> > # 7.2. Headend PLR with Link Failure
> > ...
> > #      1. Establish the primary LSP on R2 required by the topology
> > #         selected.
> > #      2. Establish the backup LSP on R2 required by the selected
> > #         topology.
> >
> > >
> > > * 8 "Packet Size" is it layer2, layer3 ? "Forwarding rate"
> > > should be "Offered Load"?
> > >
> > > Jay - Packet Size is w.r.t layer3. Forwarding Rate is used in "IGP
> > > dataplane convergence term" documents and we are consistent with
> > > that.
> >
> > Then it needs to be indicated in the draft it's layer3 packet size.
> > Forwarding Rate is not the correct term to use here. It
> > should be Offered Load. Forwarding Rate varies during the
> > test (e.g. drops to zero when pulling the cable), Offered
> > Load is constant/predefined.
> >
> > >
> > > * 8 BGP routes, is this only applicable for VPN PEs?
> > >
> > > Jay - We meant BGP routes from global table, just like we mentioned
> > > IGP routes. Please note, we have explicitly listed VPN routes as
> > > well
> >
> > In the IGP dataplane convergence draft, all other protocols
> > are banned from the DUT. If you permit BGP on the
> > DUT, many more things need to be specified. How many BGP
> > nexthops? How many BGP prefixes are routed on tunnel(s)? ...
> >
> > >
> > > * 8 What is a "FRR tunnel", the primary or the backup? I think it's
> > > important to indicate the scale of what is protected, besides the
> > > global network scale. The Benchmarks need to indicate how they are
> > > measured, which method.
> > >
> > > Jay - "FRR tunnel" as in "Protected tunnel". We have left the
> > > decision making on which method of the PBLM, TBLM and TBM to the
> > > individual measuring the benchmarks.
> >
> > Then it's better to indicate it as "Protected tunnel" in the draft.
> >
> > It needs to be reported which method is being used to derive
> > the benchmark metrics.
> >
> > >
> > > * 8, p 24, bullet 1. s/Rate/Loss/
> > >
> > > Jay - Not sure what you are referring to. I looked for this pattern
> > > and don't see any.
> >
> > This is the bullet I refer to.
> >
> > #      1. Packet-Based Loss method (PBLM): (Number of packets
> > #        dropped/packets per second * 1000) milliseconds. This method
> > #        could also be referred as Rate Derived method.
> >                                    ^^^^^^^^^^^^--> Loss-Derived
> >
> > I just noticed that the PBLM is called Packet-Loss Based
> > Method (PLBM) in term-06
> >
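The PBLM bullet quoted above reduces to a one-line calculation. The Python sketch below is an illustration under one assumption that the quoted text does not state: "packets per second" is taken to be the constant Offered Load rather than the time-varying Forwarding Rate, in line with the earlier comment on section 8. The function name is hypothetical.

```python
def pblm_failover_ms(packets_dropped: int, offered_load_pps: float) -> float:
    """Packet-loss-derived failover time in milliseconds.

    Implements (packets dropped / packets per second * 1000), with the
    rate taken as the constant offered load (an assumption, see above).
    """
    if offered_load_pps <= 0:
        raise ValueError("offered load must be a positive packet rate")
    # Multiply before dividing; mathematically the same as the quoted formula.
    return packets_dropped * 1000.0 / offered_load_pps


# 2000 packets lost at an offered load of 10000 packets per second.
print(pblm_failover_ms(2000, 10_000))
```

Because the result scales with the number of lost packets, the suggested name "Loss-Derived" describes the calculation more closely than "Rate Derived".
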
> > >
> > > * 8 The note at the end of the section should be elaborated on
> > > further (and may have implications throughout the document).
> > >
> > > Jay - We can elaborate on that.
> >
> > Thanks,
> > Kris
> >
> > >
> > > <Snip>
> > >
> > > Thanks,
> > > Jay
> > >
> >
> > _______________________________________________
> > bmwg mailing list
> > bmwg at ietf.org
> > https://www.ietf.org/mailman/listinfo/bmwg
> >