Re: [Simple] SIMPLE chat: Internationalization and normalization of nicknames

Peter Saint-Andre <stpeter@stpeter.im> Fri, 02 March 2012 21:23 UTC

Message-ID: <4F513A61.30103@stpeter.im>
Date: Fri, 02 Mar 2012 14:23:45 -0700
From: Peter Saint-Andre <stpeter@stpeter.im>
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.5; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2
MIME-Version: 1.0
To: Ben Campbell <ben@nostrum.com>
References: <4F4FF06F.6040906@ericsson.com> <79EDE431-3574-4BDF-82E0-4C803511A183@nostrum.com> <4F4FF778.8000402@stpeter.im> <03F68416-1641-4465-9913-9B2740ADBD53@nostrum.com> <4F4FFCD7.1060700@stpeter.im> <5D79B0DC-24F4-44A2-92F5-F3CB6D1BACC1@nostrum.com> <4F50F7BB.5030306@alum.mit.edu> <13674D46-6517-44C3-8172-3580158EE76B@nostrum.com>
In-Reply-To: <13674D46-6517-44C3-8172-3580158EE76B@nostrum.com>
X-Enigmail-Version: 1.3.5
OpenPGP: url=https://stpeter.im/stpeter.asc
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit
Cc: simple@ietf.org, Paul Kyzivat <pkyzivat@alum.mit.edu>
Subject: Re: [Simple] SIMPLE chat: Internationalization and normalization of nicknames
List-Id: SIP for Instant Messaging and Presence Leveraging Extensions <simple.ietf.org>

On 3/2/12 12:08 PM, Ben Campbell wrote:
> 
> On Mar 2, 2012, at 10:39 AM, Paul Kyzivat wrote:
> 
>> On 3/1/12 5:54 PM, Ben Campbell wrote:
>> 
>>>> OK. It seems that you might also be making some authorization
>>>> decisions based on the nickname, e.g., if you've reserved a
>>>> nickname then I can't get that nickname (can I join the
>>>> chatroom but just not with that nickname?).
>>>> 
>>> 
>>> Yes, that is true. We do need an equality operator in some sense.
>>> (And yes, assuming no other policy issue was blocking you, you
>>> could still use a different nickname)
>> 
>> And this presumably needs some notion of canonicalization, so that
>> you can't get two nicknames that compare as different but *look*
>> the same.
> 
> That's going to be a very hard problem to solve. I suspect we will
> end up with text that points out that this can happen, but leave it
> up to implementors to figure out what to do about it.
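One common way to build the equality operator discussed above is to compare canonical keys. The sketch below assumes NFKC normalization plus Unicode case folding. This is an assumption for illustration, not something the SIMPLE chat draft specifies, and `nickname_key` and `nicknames_equal` are hypothetical helpers. It catches width, case, and composition variants but, as Ben expects, not look-alike letters from different scripts.

```python
import unicodedata


def nickname_key(nick: str) -> str:
    """Hypothetical comparison key for a nickname.

    NFKC folds compatibility variants (such as fullwidth letters) and
    composes combining sequences; casefold() removes case distinctions.
    The outer NFKC pass re-normalizes anything case folding produced.
    """
    return unicodedata.normalize("NFKC", unicodedata.normalize("NFKC", nick).casefold())


def nicknames_equal(a: str, b: str) -> bool:
    """Hypothetical equality operator: nicknames match if their keys match."""
    return nickname_key(a) == nickname_key(b)


# Fullwidth "ＳｔＰｅｔｅｒ" folds to ASCII "stpeter".
print(nicknames_equal("\uFF33\uFF54\uFF30\uFF45\uFF54\uFF45\uFF52", "stpeter"))  # True
# Precomposed U+00E9 vs "e" followed by COMBINING ACUTE ACCENT.
print(nicknames_equal("caf\u00e9", "cafe\u0301"))  # True
# U+0435 CYRILLIC SMALL LETTER IE looks like "e" but is a different letter.
print(nicknames_equal("stp\u0435ter", "stpeter"))  # False
```

A reservation check built on this key would still let the Cyrillic variant through, which is the gap the rest of the thread is about.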

The problem of so-called confusable characters is impossible to solve.
See for instance:

http://tools.ietf.org/html/draft-ietf-precis-framework-01#section-10.3

The Unicode Consortium has gone a bit farther:

http://unicode.org/reports/tr39/#Confusable_Detection

You can nibble around the edges here, but foolproof solutions cannot be
had (in large measure because humans are so easily fooled).
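UTS #39 nibbles at the problem with a "skeleton" transform: two strings are flagged as confusable when their skeletons match. The sketch below follows that recipe (NFD, map each character through a confusables table, NFD again) but uses a tiny, hypothetical hand-picked table (`TOY_CONFUSABLES`) in place of the real confusables.txt data. It only catches what the table lists, which is the edge-nibbling limitation described above.

```python
import unicodedata

# Hypothetical, hand-picked look-alike mappings for illustration only.
# The real UTS #39 data file (confusables.txt) is far larger.
TOY_CONFUSABLES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043E": "o",  # CYRILLIC SMALL LETTER O
    "\u03BF": "o",  # GREEK SMALL LETTER OMICRON
}


def skeleton(s: str) -> str:
    """Toy UTS #39-style skeleton: NFD, map each character, NFD again."""
    decomposed = unicodedata.normalize("NFD", s)
    mapped = "".join(TOY_CONFUSABLES.get(ch, ch) for ch in decomposed)
    return unicodedata.normalize("NFD", mapped)


def looks_confusable(a: str, b: str) -> bool:
    """True if a and b differ but share a skeleton under the toy table."""
    return a != b and skeleton(a) == skeleton(b)


print(looks_confusable("p\u0430ypal", "paypal"))  # True: Cyrillic "а" maps to "a"
print(looks_confusable("ᏚᎢᎵᏋᎢᏋᏒ", "STPETER"))  # False: the toy table has no Cherokee entries
```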

> (Peter: Do you have your Cherokee example handy?)

http://stpeter.im/journal/1420.html

That is: ᏚᎢᎵᏋᎢᏋᏒ != STPETER (and Cherokee Letter Quv = Ꮛ is not defined
by the Unicode standard as confusable with, mapped to, or otherwise
equivalent to Latin Capital Letter E, despite their surface similarity).
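To see the distinction concretely, Python's standard `unicodedata` module can list the code point names behind the Cherokee string and confirm that NFKC plus case folding does not turn it into STPETER. This is an illustrative check only, not part of any nickname specification.

```python
import unicodedata

cherokee = "ᏚᎢᎵᏋᎢᏋᏒ"  # string from the message above
latin = "STPETER"

# Print the code point and Unicode name of each character.
for ch in cherokee:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# NFKC plus case folding does not turn the Cherokee letters into Latin ones.
folded = unicodedata.normalize("NFKC", cherokee).casefold()
print(folded == latin.casefold())  # False

# The look-alike pair from the message: distinct characters, distinct names.
print(unicodedata.name("Ꮛ"), "vs", unicodedata.name("E"))
```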

Peter

-- 
Peter Saint-Andre
https://stpeter.im/