[apps-discuss] draft-ietf-appsawg-xml-mediatypes vs. JSON and BOM and UTF-8

Larry Masinter <masinter@adobe.com> Tue, 07 January 2014 19:01 UTC

From: Larry Masinter <masinter@adobe.com>
To: "\"Martin J. Dürst\"" <duerst@it.aoyama.ac.jp>, Tim Bray <tbray@textuality.com>
Date: Tue, 07 Jan 2014 19:01:31 +0000
Cc: IETF Apps Discuss <apps-discuss@ietf.org>

I have some discussion topics for draft-ietf-appsawg-xml-mediatypes, which I will send out one by one.
This first one has already been discussed, but here I'm making some specific suggestions.

It cannot be right that everyone specifying a text-based media type has to work out, independently, how to resolve conflicting sources of charset information: an initial BOM, external metadata, any "charset" parameter supplied with the media type, and internal embedded metadata such as that found in XML.

If the future is UTF-8, UTF-8, UTF-8, then the two documents should say so, right at the beginning.

http://tools.ietf.org/html/draft-ietf-appsawg-xml-mediatypes-06#section-3
      Going forward, XML producers SHOULD use UTF-8 exclusively, without any BOM. For compatibility with existing implementations, processing rules are given.

Then section 3.1, XML MIME producers:
    "XML MIME producers", those generating a MIME body (who SHOULD encode the XML as UTF-8 without a BOM, and SHOULD also include a UTF-8 encoding declaration)
and
    "XML MIME wrappers", those that are not re-encoding an XML body and just want to deliver it via MIME,
    who can follow the guidelines in 3.1.

I don't really understand what an "XML-unaware MIME producer" is. If they're "unaware", why are they reading this spec?
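
As a rough illustration of the producer rule suggested above (UTF-8, no BOM, plus a UTF-8 encoding declaration), here is a minimal Python sketch. The function name encode_xml_for_mime is hypothetical, and the sketch assumes XML 1.0 input with no standalone declaration that needs preserving; it is not text from the draft.

```python
import re

# Matches an XML declaration such as <?xml version="1.0" encoding="..."?>
# but not other processing instructions such as <?xml-stylesheet ...?>.
_XML_DECL = re.compile(r"<\?xml\s[^>]*\?>\s*")


def encode_xml_for_mime(xml_text: str) -> bytes:
    """Return xml_text as UTF-8 bytes with no BOM and a UTF-8 declaration.

    Assumption: XML 1.0 only; any existing declaration is replaced, so a
    standalone="..." pseudo-attribute would be lost.
    """
    # Drop a leading U+FEFF so the body never starts with EF BB BF.
    if xml_text.startswith("\ufeff"):
        xml_text = xml_text[1:]
    # Remove any existing declaration; it may name a different encoding.
    xml_text = _XML_DECL.sub("", xml_text, count=1) if _XML_DECL.match(xml_text) else xml_text
    return ('<?xml version="1.0" encoding="UTF-8"?>\n' + xml_text).encode("utf-8")
```

A producer would then label the result application/xml; whether to add charset=utf-8 as well is a separate question that the sketch does not settle.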

Larry
--
http://larry.masinter.net

-----Original Message-----
From: "Martin J. Dürst" [mailto:duerst@it.aoyama.ac.jp] 
Sent: Saturday, December 14, 2013 4:26 AM
To: Tim Bray
Cc: Larry Masinter; IETF Apps Discuss
Subject: Re: [apps-discuss] Unicode BOM detection in JSON vs. draft-ietf-appsawg-xml-mediatypes

On 2013/12/14 7:20, Tim Bray wrote:
> I actually think 4627bis is perfectly clear. You can use UTF-{8/16/32} in
> theory; in practice only UTF-8 works over the wire; you MUST NOT emit a BOM
> but if some dork does, it’s OK to forgive. There’s no charset parameter on
> the media type.
>
> If we were doing XML in 2013, I think we’d probably end up at the same
> place. Note that the original XML spec was written in ISO-8859-1 :)

Yes. On the surface, some of the issues in JSON and XML may look 
similar, but in actual practice, they are quite different. So a common 
model or a common spec doesn't make sense.
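
For comparison, the JSON rule Tim describes above (UTF-8 on the wire, never emit a BOM, forgive one on input) fits in a few lines. This is only a sketch; emit_json and parse_json are hypothetical names, and only the UTF-8 case is handled.

```python
import json

UTF8_BOM = b"\xef\xbb\xbf"


def emit_json(value) -> bytes:
    # Producer side: encode as UTF-8 and never prepend a BOM.
    return json.dumps(value, ensure_ascii=False).encode("utf-8")


def parse_json(body: bytes):
    # Consumer side: a leading BOM is not supposed to be there,
    # but it is tolerated and skipped rather than treated as an error.
    if body.startswith(UTF8_BOM):
        body = body[len(UTF8_BOM):]
    return json.loads(body.decode("utf-8"))
```

There is no charset parameter to consult and no internal declaration to parse, which is part of why the XML case needs more steps.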

We have had some kind of common model from around 1996 or so, which was 
that the outside (charset parameter) has precedence. This was based on 
the assumption that there were transcoding proxies that didn't look 
inside the document, and on statements from Netscape (when it was still 
the dominant browser) that this was what they did.

It turned out that there were few if any transcoding proxies, and 
external information didn't get preserved well in a file system, and so 
the practice moved more and more toward giving priority to internal 
information over external.
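
To make "internal over external" concrete, the following Python sketch resolves a charset by checking the BOM first, then the XML encoding declaration, and only then the external charset parameter. The ordering and the name resolve_charset are assumptions chosen for illustration; this is not the procedure specified in either draft, and the declaration check only works for ASCII-compatible encodings.

```python
import codecs
import re
from typing import Optional

# UTF-32 BOMs are listed before UTF-16 because the UTF-32 LE BOM
# (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# Encoding pseudo-attribute of an XML declaration at the very start of the body.
_DECL = re.compile(
    rb"""<\?xml\s[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def resolve_charset(body: bytes, charset_param: Optional[str] = None) -> str:
    """Pick a charset, preferring internal evidence over external metadata."""
    # 1. Internal: a BOM at the start of the body.
    for bom, name in _BOMS:
        if body.startswith(bom):
            return name
    # 2. Internal: the XML encoding declaration.
    match = _DECL.match(body)
    if match:
        return match.group(1).decode("ascii").lower()
    # 3. External: the charset parameter of the media type.
    if charset_param:
        return charset_param.lower()
    # 4. Nothing at all: assume UTF-8.
    return "utf-8"
```

Under the 1996 model, step 3 would come first; moving it below steps 1 and 2 is the whole difference.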

In ietf-appsawg-xml-mediatypes, we can see one specific example of what 
this history has resulted in: carefully defined specific steps, but not 
too pretty an overall picture.

The model of the future is much simpler: UTF-8, UTF-8, UTF-8. Once 
everything is UTF-8, the BOM will die a quiet death, too.

Regards,   Martin.

> On Fri, Dec 13, 2013 at 2:10 PM, Larry Masinter<masinter@adobe.com>  wrote:
>
>> Compare
>> http://tools.ietf.org/html/draft-ietf-json-rfc4627bis-09#section-8.1
>> and
>> http://tools.ietf.org/html/draft-ietf-appsawg-xml-mediatypes-06#section-3
>>
>> it’s hard to tell exactly what either is saying, much less whether they’re
>> saying the same thing. I don’t want to slow down either of these, but some
>> work on a model for how MIME types should deal with the relationship between
>>
>> - Charset parameter in content-type
>> - Possible BOM marker
>> - Other in-content character set indication (like in XML)
>> - Other protocol sources of charset information
>>
>> JSON, XML, HTML, IRIs, elements based on those, and likely many other
>> protocol elements are defined using BNF or other syntactic description
>> techniques over the space of sequence-of-Unicode-characters
>> (sequence-of-Unicode-code-point).
>>
>> Have there been any attempts to create a separate BCP that JSON and XML
>> could (in the future) reference? The JSON working group spent only
>> a few thousand emails on this.
>>
>> Larry
>> --
>> http://larry.masinter.net