Re: [Cfrg] RG Last Call on draft-irtf-cfrg-gcmsiv-06
Andy Polyakov <appro@openssl.org> Mon, 18 September 2017 17:51 UTC
Return-Path: <appro@openssl.org>
X-Original-To: cfrg@ietfa.amsl.com
Delivered-To: cfrg@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 3E4931243F6 for <cfrg@ietfa.amsl.com>; Mon, 18 Sep 2017 10:51:44 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -6.9
X-Spam-Level:
X-Spam-Status: No, score=-6.9 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, RCVD_IN_DNSWL_HI=-5] autolearn=ham autolearn_force=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id MmIyDxwK7ZMr for <cfrg@ietfa.amsl.com>; Mon, 18 Sep 2017 10:51:41 -0700 (PDT)
Received: from mta.openssl.org (mta.openssl.org [194.97.150.230]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id BEC05124B18 for <cfrg@irtf.org>; Mon, 18 Sep 2017 10:51:40 -0700 (PDT)
Received: from [127.0.0.1] (localhost [IPv6:::1]) by mta.openssl.org (Postfix) with ESMTP id 08E54E035C; Mon, 18 Sep 2017 17:51:37 +0000 (UTC)
To: cfrg@irtf.org
References: <EA4347BF-D26F-4303-9A8D-E7B28986DE56@isode.com>
From: Andy Polyakov <appro@openssl.org>
Message-ID: <71d10985-4c46-4a7c-e634-76a822102a61@openssl.org>
Date: Mon, 18 Sep 2017 19:51:36 +0200
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.3.0
MIME-Version: 1.0
In-Reply-To: <EA4347BF-D26F-4303-9A8D-E7B28986DE56@isode.com>
Content-Type: text/plain; charset="utf-8"
Content-Language: en-US
Content-Transfer-Encoding: 8bit
Archived-At: <https://mailarchive.ietf.org/arch/msg/cfrg/bJv0BHbp0O5LbYE1HjxUKRUBhdg>
Subject: Re: [Cfrg] RG Last Call on draft-irtf-cfrg-gcmsiv-06
X-BeenThere: cfrg@irtf.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Crypto Forum Research Group <cfrg.irtf.org>
List-Unsubscribe: <https://www.irtf.org/mailman/options/cfrg>, <mailto:cfrg-request@irtf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/cfrg/>
List-Post: <mailto:cfrg@irtf.org>
List-Help: <mailto:cfrg-request@irtf.org?subject=help>
List-Subscribe: <https://www.irtf.org/mailman/listinfo/cfrg>, <mailto:cfrg-request@irtf.org?subject=subscribe>
X-List-Received-Date: Mon, 18 Sep 2017 17:51:44 -0000
Hi, I apologize for chiming in so late, but I wasn't aware of this endeavour till very recently. The question I find myself struggling with is that I can't find introduction of new primitives justified, at least not in this context. I did my best sieving through https://mailarchive.ietf.org/arch/search/?q=gcmsiv) to locate relevant prior comments, so even with those in mind. One of arguments was that POLYVAL gets bit order right in comparison to GHASH. Well, comparing the two is actually boils down to one question: what is it you choose to bit-flip, input or polynomial? [Do note "bit-flip", not "byte-flip".] What I'm trying to say is that we have to recognize that POLYVAL is simply formulated in terms of GHASH polynomial in reverse bit order. On the other hand on more practical level, i.e. from software implementation viewpoint, answer to the question is self-obvious, it's more efficient to bit-flip single polynomial than each fragment of the input. This last thing effectively means that sensible GHASH implementation would actually use reverse-bit polynomial (and at least OpenSSL implementations all do). This in turn means that the only essential difference between GHASH and POLYVAL is byte order. Now, it was argued that little-endian byte order gives performance edge [on platform of popular choice]. But relevant question in the context is if the edge actually justifies introduction of new algorithm. It was asserted that it provides 20% improvement and it was attributed to the fact that PCLMULQDQ-based GHASH implementations use vector byte swap instructions, *one* per block. But we have to recognize that said improvement coefficient is actually specific to Skylake. For example on Haswell we'll observe ... 0% improvement, on Broadwell - 6%, on Ryzen - 3%. Let's even have a look at absolute cycles per processed byte values for GHASH: Haswell 0.41 Broadwell 0.29 Skylake 0.36 Do note that effectively we observe regression on Skylake, also note that it's ~20%. I mean if Skylake inherited Broadwell qualities, GHASH would have performed ~20% better. With all the numbers in mind it's only natural to conclude that vector byte swap "costs" unproportionally much specifically on Skylake [and in GHASH], and it's rather anomaly than rule. And it's not unlikely to be addressed in next processor generation. At least other alike cases were, just look at improvement from Haswell to Broadwell above. And if not, maybe it would still be more appropriate to demand that Intel fixes this in next-next generation, instead of introducing new algorithm. Because latter approach is not really sustainable. I mean how many algorithms would we have to implement and support if they were introduced on per specific-processor-pipeline-anomaly basis? Besides, even 20% improvement has to be considered in overall context, not by itself. I mean if the only difference between two otherwise equivalent proposals was POLYVAL vs. GHASH, then improvement for *whole* operation would be ~6% [on Skylake, and none to negligible on others]. Or in other words this proposal *could* just as well be formulated with GHASH instead of POLYVAL, there is no *compelling* reason to introduce another algorithm. And same can be said about suggested CTR tweak. I.e. suggested mode *could* just as well be formulated with standard CTR. Even more so, as there is no apparent difference in performance [on platform of popular choice]. Well, it's not like I actually reject the sentiment to favour little-endian platform[s], question rather is if *mode* specification is right place for it. Protocols are about *communication* and there are more factors in play, most notably using already available primitives would facilitate adoption. And if little-endian-centric primitives are deemed desirable, wouldn't it be more appropriate to define them separately to promote modularity and re-usability? Cheers.
- Re: [Cfrg] RG Last Call on draft-irtf-cfrg-gcmsiv… Paterson, Kenny
- Re: [Cfrg] RG Last Call on draft-irtf-cfrg-gcmsiv… Yehuda Lindell
- Re: [Cfrg] RG Last Call on draft-irtf-cfrg-gcmsiv… Stefano Tessaro
- [Cfrg] RG Last Call on draft-irtf-cfrg-gcmsiv-06 Alexey Melnikov
- Re: [Cfrg] RG Last Call on draft-irtf-cfrg-gcmsiv… Andy Polyakov
- Re: [Cfrg] RG Last Call on draft-irtf-cfrg-gcmsiv… Blumenthal, Uri - 0553 - MITLL
- Re: [Cfrg] RG Last Call on draft-irtf-cfrg-gcmsiv… Ted Krovetz
- Re: [Cfrg] RG Last Call on draft-irtf-cfrg-gcmsiv… Watson Ladd
- Re: [Cfrg] RG Last Call on draft-irtf-cfrg-gcmsiv… Adam Langley
- Re: [Cfrg] RG Last Call on draft-irtf-cfrg-gcmsiv… Andy Polyakov
- Re: [Cfrg] RG Last Call on draft-irtf-cfrg-gcmsiv… Andy Polyakov
- Re: [Cfrg] RG Last Call on draft-irtf-cfrg-gcmsiv… Andy Polyakov
- Re: [Cfrg] RG Last Call on draft-irtf-cfrg-gcmsiv… Andy Polyakov
- Re: [Cfrg] RG Last Call on draft-irtf-cfrg-gcmsiv… Shay Gueron
- Re: [Cfrg] RG Last Call on draft-irtf-cfrg-gcmsiv… Andy Polyakov
- Re: [Cfrg] RG Last Call on draft-irtf-cfrg-gcmsiv… Shay Gueron
- Re: [Cfrg] RG Last Call on draft-irtf-cfrg-gcmsiv… Andy Polyakov
- Re: [Cfrg] RG Last Call on draft-irtf-cfrg-gcmsiv… Andy Polyakov
- Re: [Cfrg] RG Last Call on draft-irtf-cfrg-gcmsiv… Stefano Tessaro