Re: [apps-discuss] RFC 6657 on Update to MIME regarding "charset" Parameter Handling in Textual Media Types

Ned Freed <ned.freed@mrochek.com> Wed, 11 July 2012 05:46 UTC

Message-id: <01OHP9J7BTNM0006TF@mauve.mrochek.com>
Date: Tue, 10 Jul 2012 22:44:54 -0700
From: Ned Freed <ned.freed@mrochek.com>
In-reply-to: "Your message dated Tue, 10 Jul 2012 23:58:29 +0100" <4FFCB395.9030400@zoo.ox.ac.uk>
MIME-version: 1.0
Content-type: TEXT/PLAIN; format="flowed"
References: <20120710000754.6BF59B1E006@rfc-editor.org> <4FFBE454.1020601@zoo.ox.ac.uk> <01OHOK4TIDIW0006TF@mauve.mrochek.com> <4FFCB395.9030400@zoo.ox.ac.uk>
To: Graham Klyne <Graham.Klyne@zoo.ox.ac.uk>
Cc: Ned Freed <ned.freed@mrochek.com>, apps-discuss@ietf.org
Subject: Re: [apps-discuss] RFC 6657 on Update to MIME regarding "charset" Parameter Handling in Textual Media Types

> On 10/07/2012 18:18, Ned Freed wrote:
> >> On 10/07/2012 01:07, rfc-editor@rfc-editor.org wrote:
> >> >
> >> > A new Request for Comments is now available in online RFC libraries.
> >> >
> >> >
> >> > RFC 6657
> >> >
> >> > Title: Update to MIME regarding "charset"
> >> > Parameter Handling in Textual Media Types
> >
> >> I didn't see this one coming.
> >
> > It was discussed at considerable length both here and on the IETF list.

> Sure, I just meant that I missed it.

> >> I'm a bit confused by the specification.
> >
> > You need to keep in mind that this only applies to subtypes of text.

> Ack.

> >> If we define a media type that is *always* UTF-8, does this count as
> >> transporting its own charset information?
> >
> > That's one approach you can use. The alternatives are to allow or require
> > a charset parameter, always with the value utf-8. The best approach depends
> > on the specifics of the type.
> >
> >> Should we say that the media type
> >> SHOULD NOT be included, or that it SHOULD be included with value UTF-8?
> >
> > Included where? Within the content? If so, that's up to the registration to
> > say. There are plenty of utf-8 based formats that don't provide for inclusion
> > of media type information - and that includes some that use XML syntax.

> Doh... I meant media type "charset" parameter.

> There's no character encoding information in the content.

> >> Section
> >> 3 implies the latter, but it also talks about media types defining their own
> >> default encoding.
> >
> > Relying on defaults is discouraged for historical reasons - they don't work
> > very well. As such, if it's possible for the type to explicitly say what the
> > charset is, that's probably the best way to do it. If the type isn't capable of
> > that for whatever reason, your options are to simply say it's always utf-8 or
> > alternately allow or require a charset parameter with utf-8 as the only value.
> > The best approach depends on the situation, which is why the document is full
> > of SHOULDs, not MUSTs.

> Yeah, one of those.  I expect it will come up for review very soon, so you can
> comment if we've made the wrong call.

> >> (This is not an academic question - a W3C group I'm involved with is about to
> >> submit a registration for a UTF-8 only text/... media type)
> >
> > Does this type actually meet the criteria for text specified in RFC 2046
> > section 4.1? I rather suspect it doesn't. If not, it really has no business
> > being a text subtype, and all of this is moot.

> I believe it does.  We're not talking XML or anything like that.  It's a textual
> notation for provenance information, intended for human and occasional machine
> consumption.

Then my suggestion would be to make the charset parameter mandatory, with 
the only legal value being utf-8. The alternative would be to omit
it and specify utf-8 as the default, but as I said, that's not likely
to interoperate well.
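
As a rough illustration of what "mandatory, utf-8 only" would mean for a
receiver, here is a minimal sketch in Python using the standard library's
email.message module. The media type name text/example and the function
name charset_is_acceptable are placeholders I made up for the example, not
anything taken from the registration under discussion.

```python
from email.message import Message

# Hypothetical media type name, used only for illustration.
MEDIA_TYPE = "text/example"


def charset_is_acceptable(content_type_header: str) -> bool:
    """Return True only if the charset parameter is present and is utf-8.

    A missing parameter is rejected rather than defaulted, which matches
    the "mandatory parameter" option and avoids relying on a default.
    """
    msg = Message()
    msg["Content-Type"] = content_type_header

    # get_content_type() lowercases the type/subtype for comparison.
    if msg.get_content_type() != MEDIA_TYPE:
        raise ValueError("not a " + MEDIA_TYPE + " body part")

    # get_content_charset() unquotes and lowercases the value,
    # and returns None when no charset parameter is given.
    return msg.get_content_charset() == "utf-8"


if __name__ == "__main__":
    for header in (
        'text/example; charset="UTF-8"',
        "text/example",
        "text/example; charset=iso-8859-1",
    ):
        print(header, "->", charset_is_acceptable(header))
```

Under the other option (omit the parameter and declare utf-8 the default),
the check would instead have to accept the header with no charset at all,
which is exactly the case where receivers tend to guess.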

				Ned