[bmwg] Mean vs Median

"GEORGESCU LIVIU MARIUS" <liviumarius-g@is.naist.jp> Tue, 03 November 2015 05:57 UTC

From: GEORGESCU LIVIU MARIUS <liviumarius-g@is.naist.jp>
To: bmwg@ietf.org
Message-ID: <6a50aab7bf13.5638cb72@naist.jp>
Date: Tue, 03 Nov 2015 14:57:54 +0900
Archived-At: <http://mailarchive.ietf.org/arch/msg/bmwg/jmAe1NrHh2OymnFicluJgr_71CI>
Cc: k.pentikousis@eict.de
Subject: [bmwg] Mean vs Median

Hello BMWG,

Following some of the discussion we had at IETF 93 about using either the mean or the median as the summarizing function for the results of multiple test iterations, I added the following section to <http://tools.ietf.org/html/draft-ietf-bmwg-ipv6-tran-tech-benchmarking-00>:

10. Summarizing function and repeatability

To ensure the stability of the benchmarking scores obtained using the tests presented in Sections 6-9, multiple test iterations are recommended. Following the recommendations of RFC2544, the average was chosen to be the summarizing function for the reported values. While median can be an alternative summarizing function, a rationale for using one or the other is needed.

The median can be useful for summarizing especially when outliers are not a desired quantity. However, in the overall performance of a network device the outliers can represent a malfunction or misconfiguration in the DUT, which should be taken into account.

The average is a more inclusive summarizing function. Moreover, as underlined in [DeNijs], the average is less exposed to statistical uncertainty. These reasons make it the RECOMMENDED summarizing function for the results of different test iterations, unless stated otherwise.

To express the repeatability of the benchmarking tests through a number, the Margin of error (MoE) can be used. Of course, other functions, such as standard error could be employed as well. The advantage the MoE has is expressing an associated confidence interval by using the alpha parameter. The recommended formula for calculating the MoE is presented in Section 6.3.1.
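
To make the MoE idea in the quoted section concrete, here is a minimal Python sketch. It uses the generic textbook formula (critical value times the standard error of the mean) with a normal approximation; this is an assumption for illustration and is not necessarily the formula given in Section 6.3.1 of the draft. The function name `margin_of_error`, the `confidence` parameter and the sample values are all hypothetical.

```python
import math
import statistics


def margin_of_error(samples, confidence=0.95):
    """Half-width of a normal-approximation confidence interval for the mean.

    Assumption: generic textbook formula, not taken from the draft.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    # Two-sided critical value of the standard normal distribution.
    z = statistics.NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Standard error of the mean, from the sample standard deviation.
    standard_error = statistics.stdev(samples) / math.sqrt(n)
    return z * standard_error


# Hypothetical latency results (ms) from 20 test iterations.
latencies_ms = [10.1, 9.9, 10.0, 10.2, 9.8] * 4
moe = margin_of_error(latencies_ms)
print(f"mean = {statistics.mean(latencies_ms):.2f} ms, MoE = {moe:.3f} ms")
```

With only 20 iterations, a Student's t critical value would give a somewhat wider (more conservative) interval than the normal approximation used here.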

After discussing this rationale with Al (Morton) and Kostas (Pentikousis), I am now leaning towards using the median. One of the reasons is the case of non-normal probability distributions (e.g. a bimodal distribution), where the mean might not mean much (trying to paraphrase Al). One could add a step to the procedure, such as "analyze the probability distribution of the 20 measurements before deciding on the summarizing function", but this might be an undesirable over-complication. In any case, I think a measure of variability should be reported together with the summarized results, in order to express their stability/repeatability.
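
As a rough illustration of the bimodal case, the following Python sketch compares the two summarizing functions on a made-up sample of 20 measurements. The data and variable names are hypothetical and are not taken from any real test run.

```python
import statistics

# Hypothetical throughput results (Mbps) from 20 iterations in which the
# DUT alternates between two operating modes, giving a bimodal sample.
throughput_mbps = [100.0] * 12 + [900.0] * 8

mean = statistics.mean(throughput_mbps)
median = statistics.median(throughput_mbps)

# Two candidate measures of variability to report next to the summary.
stdev = statistics.stdev(throughput_mbps)
q1, _, q3 = statistics.quantiles(throughput_mbps, n=4)
iqr = q3 - q1

print(f"mean = {mean}, median = {median}")
print(f"stdev = {stdev:.1f}, IQR = {iqr}")
```

In this sample the mean (420) is a value the DUT never produced, while the median (100) describes only the larger mode. It is the large standard deviation and interquartile range that signal that a single summary value hides the shape of the distribution.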

Since the rationale for using Mean or Median (or ...) could be reused in other documents produced by this WG, I would like to ask for more feedback on this subject.

Best regards,
Marius