[bmwg] Mean vs Median

"GEORGESCU LIVIU MARIUS" <liviumarius-g@is.naist.jp> Tue, 03 November 2015 05:57 UTC

From: GEORGESCU LIVIU MARIUS <liviumarius-g@is.naist.jp>
To: bmwg@ietf.org
Message-ID: <6a50aab7bf13.5638cb72@naist.jp>
Date: Tue, 03 Nov 2015 14:57:54 +0900
In-Reply-To: <6a509431f711.56384c39@naist.jp>
Archived-At: <http://mailarchive.ietf.org/arch/msg/bmwg/jmAe1NrHh2OymnFicluJgr_71CI>
Cc: k.pentikousis@eict.de
Subject: [bmwg] Mean vs Median

Hello BMWG,

Following some of the discussion we had at IETF 93 about using either the mean or the median as the summarizing function for the results of multiple test iterations, I added the following section to draft-ietf-bmwg-ipv6-tran-tech-benchmarking-00 (http://tools.ietf.org/html/draft-ietf-bmwg-ipv6-tran-tech-benchmarking-00):

10. Summarizing function and repeatability
(http://tools.ietf.org/html/draft-ietf-bmwg-ipv6-tran-tech-benchmarking-00#section-10)

To ensure the stability of the benchmarking scores obtained using the tests presented in Sections 6-9 (http://tools.ietf.org/html/draft-ietf-bmwg-ipv6-tran-tech-benchmarking-00#section-6 through http://tools.ietf.org/html/draft-ietf-bmwg-ipv6-tran-tech-benchmarking-00#section-9), multiple test iterations are recommended. Following the recommendations of RFC 2544 (http://tools.ietf.org/html/rfc2544), the average was chosen as the summarizing function for the reported values. While the median can be an alternative summarizing function, a rationale for using one or the other is needed. The median can be useful for summarizing, especially when outliers are not a desired quantity. However, in the overall performance of a network device, the outliers can represent a malfunction or misconfiguration in the DUT, which should be taken into account. The average is a more inclusive summarizing function. Moreover, as underlined in [DeNijs] (http://tools.ietf.org/html/draft-ietf-bmwg-ipv6-tran-tech-benchmarking-00#ref-DeNijs), the average is less exposed to statistical uncertainty. These reasons make it the RECOMMENDED summarizing function for the results of different test iterations, unless stated otherwise.

To express the repeatability of the benchmarking tests through a number, the Margin of Error (MoE) can be used. Other functions, such as the standard error, could be employed as well. The advantage of the MoE is that it expresses an associated confidence interval through the alpha parameter. The recommended formula for calculating the MoE is presented in Section 6.3.1 (http://tools.ietf.org/html/draft-ietf-bmwg-ipv6-tran-tech-benchmarking-00#section-6.3.1).
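To make the trade-off above concrete, here is a minimal sketch that summarizes repeated iterations with the mean, the median, and a margin of error. The draft's actual MoE formula lives in its Section 6.3.1 and is not reproduced in this message, so the sketch assumes the common normal-approximation form MoE = z(1 - alpha/2) * s / sqrt(n). The function name `summarize` and the throughput values are hypothetical, chosen only for illustration.

```python
import math
import statistics
from statistics import NormalDist


def summarize(samples, alpha=0.05):
    """Summarize repeated test iterations with mean, median and MoE.

    Assumption (not taken from the draft): MoE = z_{1-alpha/2} * s / sqrt(n),
    where s is the sample standard deviation and z is a standard normal
    quantile. Section 6.3.1 of the draft may use a different formula,
    for example a Student's t quantile for a small number of iterations.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("at least two iterations are needed for a MoE")
    mean = statistics.mean(samples)
    median = statistics.median(samples)
    s = statistics.stdev(samples)            # sample standard deviation
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    moe = z * s / math.sqrt(n)
    return mean, median, moe


# Hypothetical throughput results from five iterations; the last one is an
# outlier, standing in for e.g. a DUT malfunction or misconfiguration.
iterations = [100, 101, 99, 100, 20]
mean, median, moe = summarize(iterations)
print(f"mean={mean:.1f} median={median:.1f} MoE={moe:.1f}")
```

With these made-up numbers, the single outlier pulls the mean down to 84 while the median stays at 100. This is the "inclusive" behaviour the section argues for, and the comparatively large MoE is what flags the instability.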

After discussing this rationale with Al (Morton) and Kostas (Pentikousis), I am leaning towards using the median. One of the reasons is the case of non-normal probability distributions (e.g. a bimodal distribution), where the mean might not mean much (to paraphrase Al). One could add a step to the procedure such as "analyze the probability distribution of the 20 measurements before deciding on the summarizing function", but this might be an undesired over-complication. In any case, I think a measure of variance should be provided with the summarized results, in order to express the stability/repeatability of the results.
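The bimodal case can be illustrated with a small sketch. The 20 values below are invented: 12 iterations near 50 and 8 near 150, loosely imagined as a DUT that alternates between two internal behaviours. The helper `describe` is hypothetical. It reports the mean and median together with two dispersion measures, the sample standard deviation and the interquartile range (IQR), in the spirit of providing a measure of variance alongside the summarized result.

```python
import statistics


def describe(samples):
    """Return central tendency and dispersion for a set of iterations."""
    q1, _, q3 = statistics.quantiles(samples, n=4)  # quartiles (Python 3.8+)
    return {
        "mean": statistics.mean(samples),
        "median": statistics.median(samples),
        "stdev": statistics.stdev(samples),
        "iqr": q3 - q1,
    }


# Hypothetical bimodal results: 12 iterations near 50, 8 iterations near 150.
low_mode = [49, 50, 51] * 4
high_mode = [149, 150, 151, 150] * 2
samples = low_mode + high_mode

summary = describe(samples)
# Distance from the mean to the closest measurement actually observed.
gap = min(abs(x - summary["mean"]) for x in samples)
print(summary, "closest sample to the mean is", gap, "away")
```

Here the mean (90) lies in the gap between the two modes, 39 units from the nearest observed value, while the median (51) falls inside the larger mode. Neither number alone reveals the two clusters. The IQR of 100 does, which is one argument for always reporting a dispersion measure next to whichever summarizing function is chosen.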

Since the rationale for using Mean or Median (or ...) could be reused in other documents produced by this WG, I would like to ask for more feedback on this subject.

Best regards,
Marius
