Re: [EAI] UTF-8 in Message-IDs

"Charles Lindsey" <chl@clerew.man.ac.uk> Wed, 17 August 2011 10:31 UTC

Date: Wed, 17 Aug 2011 11:32:42 +0100
To: IMA <ima@ietf.org>
From: Charles Lindsey <chl@clerew.man.ac.uk>
Content-Type: text/plain; format="flowed"; delsp="yes"; charset="iso-8859-1"
MIME-Version: 1.0
References: <CAHhFybo47--0YjCRcvSO4asoV_R89+ULDB3tyij+ba=O_6gKsQ@mail.gmail.com> <01O4T11O8X4M00VHKR@mauve.mrochek.com> <op.vz8z3v0a6hl8nm@clerew.man.ac.uk> <01O4VFNKDGEE00VHKR@mauve.mrochek.com>
Content-Transfer-Encoding: 8bit
Message-ID: <op.v0cswsg76hl8nm@clerew.man.ac.uk>
In-Reply-To: <01O4VFNKDGEE00VHKR@mauve.mrochek.com>
User-Agent: Opera Mail/9.25 (SunOS)
Subject: Re: [EAI] UTF-8 in Message-IDs

On Mon, 15 Aug 2011 16:19:44 +0100, <ned+ima@mrochek.com> wrote:

>> On Sat, 13 Aug 2011 23:08:48 +0100, <ned+ima@mrochek.com> wrote:

>> Yes, but you have to consider all the other protocols using mail-like
>> formats and their use of the Message-ID.
>
> These protocols don't make use of email addresses?

Some do, some don't. In Netnews they are not used except to reply to the
author of an article by email.

>> For example, if EAI were to be
>> carried over into Netnews (quite a likely development) it would NOT be
>> regarded as a "new message format" since the transport paths for Netnews
>> are already 8-bit clean and it would simply be necessary for those who
>> wish to take advantage of the new facilities to ensure that their user
>> agents were suitably upgraded. There will never be a need for
>> "downgrading" except at gateways back into the email system.
>
> And all netnews applications properly handle the construction and
> display of utf-8 addresses in message headers? Just as one example,
> netnews already supports the proper form of Unicode normalization on
> input for this. Right?

No. If you want to participate in newsgroups where strange characters are  
likely to be used, then you need to ensure your MUA has the necessary  
capability. That's your problem.

> An 8-bit clean transport is nowhere near sufficient to accommodate the
> other changes to the format we're making here. Message ids are actually
> the least of it since they are machine generated.

It is sufficient for the existing Netnews transport to deliver articles  
with utf-8 headers to participants; it's their problem to get them
displayed. The only two headers that are necessary in the transport layer  
are the Newsgroups header and the Message-ID. It has already been  
demonstrated that a Newsgroups header with utf-8 will be delivered  
correctly, but a standard would be needed to ensure proper normalization.  
The Message-ID is crucial to the correct operation of the Netnews  
transport, and the effect of using utf-8 in it has not been investigated  
(but should be). It is expected that some ancient transport agents might  
not like it but even that might not be a show stopper, since Netnews is  
very good at navigating around such obstacles.

> I'm sorry, but the entire approach of this WG is to define a format that
> is NOT downgradeable to previous formats without substantial information
> loss.

Utter rubbish!

We are not designing a protocol that goes out of its way to make  
downgrading impossible. We are designing a protocol in which downgrading  
is not REQUIRED to be possible, but we should still avoid putting  
unnecessary obstacles in its way.

And, as I have said, in Netnews downgrading is not an issue (except at  
gateways out of Netnews).

Returning now to email, the Message-ID is only important when used in the  
References and In-Reply-To headers, where it is generally used by MUAs for
threading purposes. So the worst that can happen if Message-IDs get  
garbled in transport is that threading fails to work.

My suggestion is that we retain a requirement that Message-ID remain in  
pure ASCII but add a remark that this position is likely to be reviewed in  
a future version of the standard.
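
To make the "pure ASCII" requirement concrete, here is a minimal sketch in
Python of the kind of check an agent might apply. The helper name
is_pure_ascii_msgid and the second example Message-ID are made up for
illustration, and the test is deliberately looser than the full RFC 5322
msg-id grammar: it only rejects anything outside printable, non-space ASCII.

```python
def is_pure_ascii_msgid(msgid: str) -> bool:
    """Hypothetical check: "<...>" containing only printable, non-space ASCII.

    This is NOT the full RFC 5322 msg-id grammar, just the ASCII-only part.
    """
    if len(msgid) < 3 or not (msgid.startswith("<") and msgid.endswith(">")):
        return False
    # 0x21..0x7E is the printable ASCII range excluding space.
    return all(0x21 <= ord(ch) <= 0x7E for ch in msgid)


# The Message-ID of this very message passes.
print(is_pure_ascii_msgid("<op.v0cswsg76hl8nm@clerew.man.ac.uk>"))
# An invented Message-ID containing U+00E9 does not.
print(is_pure_ascii_msgid("<caf\u00e9.1@example.org>"))
```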

If, as is likely, non-ASCII Message-IDs then start to appear anyway, we will
see how implementations cope and decide how best to regulate them in a future
standard. In the meantime, they will do little harm. But when that future
standard comes, it will have to pay attention to exactly what sorts of things
can appear in them (as indeed RFC 5322 places quite a lot of restrictions) and,
in addition, it will have to pay careful attention to normalization if
threading is to continue to work.
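
To illustrate why normalization matters for threading, here is a small sketch
in Python. The two Message-IDs and the helper name thread_key are invented,
and the choice of NFC is only an assumption for the example; no standard
currently says which form (if any) should be used. The point is that two
byte-wise different spellings of the "same" identifier will not match in
References or In-Reply-To unless both sides normalize the same way.

```python
import unicodedata


def thread_key(msgid: str) -> str:
    """Hypothetical comparison key: the Message-ID normalized to NFC."""
    return unicodedata.normalize("NFC", msgid)


# The same visible identifier, spelled two ways:
composed = "<caf\u00e9.1@example.org>"     # U+00E9, precomposed e-acute
decomposed = "<cafe\u0301.1@example.org>"  # "e" followed by U+0301 combining acute

# A naive string comparison, as an MUA might do today, fails to match them.
print(composed == decomposed)
# After normalizing both sides the same way, they match.
print(thread_key(composed) == thread_key(decomposed))
```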

And I don't think we want to go into that sort of detail at this present  
stage of our work.

-- 
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131                          Web: http://www.cs.man.ac.uk/~chl
Email: chl@clerew.man.ac.uk      Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9      Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5