Re: [apps-discuss] I-D Action: draft-ietf-appsawg-xml-mediatypes-05.txt
Bjoern Hoehrmann <derhoermi@gmx.net> Tue, 19 November 2013 15:03 UTC
From: Bjoern Hoehrmann <derhoermi@gmx.net>
To: ht@inf.ed.ac.uk
Date: Tue, 19 Nov 2013 16:03:00 +0100
Message-ID: <q8nm895dap8iefa6srlf1k5787j8fuc6n4@hive.bjoern.hoehrmann.de>
References: <20131119120919.12901.59046.idtracker@ietfa.amsl.com> <f5b1u2cr365.fsf@troutbeck.inf.ed.ac.uk>
In-Reply-To: <f5b1u2cr365.fsf@troutbeck.inf.ed.ac.uk>
Cc: apps-discuss@ietf.org
Subject: Re: [apps-discuss] I-D Action: draft-ietf-appsawg-xml-mediatypes-05.txt
* Henry S. Thompson wrote:
>My reasoning for not changing 3.6 in response to Bjoern Hoehrmann's
>objection to the treatment of a BOM as authoritative even in the
>presence of a charset parameter [1] is set out in [2].

I said that I think this is a serious and problematic change that needs
wide review prior to a two-week Last Call period, and that the change is
fine by me provided there is evidence of wide review. It would be easy,
for instance, to ask on a set of mailing lists, say

  * ietf-charsets@iana.org
  * www-international@w3.org
  * unicode@unicode.org

for endorsements of one part of the proposal, which would require XML
implementations to treat

  data:application/xml;charset=utf-32,%FF%FE%00%00...

as a malformed UTF-16 encoded document. If the question is not
misleading and there are endorsements by qualified individuals, then I'd
be satisfied that this part of the proposal has received adequate
review. You could also point me to messages in the archive of this list
where such individuals have provided a rationale for their support of
this change.

I note that the "Changes from RFC 3023" appendix of the document does
not mention the changes in question, and you need to have a good grasp
of the issues to notice them when carefully reading the document. If
the document had carefully explained the changes where appropriate, for
instance for the issue above (which is just one among several), e.g.

  NOTE: While appendix F.1 of the XML 1.0 Recommendation suggests
  that an initial byte sequence of FF FE 00 00 is indicative of the
  UTF-32 character encoding, the rules of this specification require
  such a sequence to be interpreted as a UTF-16 Byte Order Mark
  followed by a U+0000 character, rendering the document malformed.
  It is therefore impossible to use UTF-32 in application/xml, ...
  entities; a charset parameter cannot be used to prevent
  misinterpretation of documents encoded in UTF-32 when using the
  media types defined in this memo.
then there might be a plausible claim that people knew of this change
and some of its implications and signed off on it. The document does
not have that, so I find it reasonable to believe that people are
unaware of these changes, have not considered their implications, and
possibly would object to them if they did. I can't even tell myself
whether a note such as the one above accurately reflects what you mean
to propose.

I do not think it would be useful to continue the discussion between
only the two of us, and I find your misrepresentation of my position
most unhelpful. I do appreciate that you have now started to gather
some information on what running code actually does, but let's have a
look at your claims:

  Expat is a surprising case -- it provides a parameter which can be
  used to pass in a character encoding name, but it will ignore this if
  it detects a different encoding in its input byte-stream (tested via
  both the Perl and Python embeddings, and confirmed by examining the
  source). So in fact expat does explicitly to treat a BOM as
  authoritative.

Well then, let us examine the source:

  /* This is what detects the encoding. ... */
  static int
  initScan(const ENCODING * const *encodingTable,
  ...
      case 0xEFBB:  /* Maybe a UTF-8 BOM (EF BB BF) */
        /* If there's an explicitly specified (external) encoding
           of ISO-8859-1 or some flavour of UTF-16
           and this is an external text entity,
           don't look for the BOM,
           because it might be a legal data.
        */

What does that suggest about expat always looking for a BOM that
overrides anything else? Or how about this claim:

  (I'm not ignoring your reminder that you have built an add-on to
  perl's HTML::Parser module which treats the charset as authoritative,
  but since that module does not qualify as a conformant XML parser in
  any case, it's not really relevant to 3023bis).

The W3C Markup Validator uses my HTML::Encoding module to detect the
character encoding of HTML and XHTML documents.
XHTML documents use whatever rules there are for XML documents to
detect the encoding. The module is an implementation of "Given an HTTP
response, what is the encoding of the XML document in it?", which is
what most of the draft is about.

Inconsistent, ad-hoc, and changing character encoding detection rules
have been a long-standing concern of mine, and I have tried to reach out
to others to improve the situation, e.g. in

  http://www.unicode.org/mail-arch/unicode-ml/y2010-m10/0003.html

It is not too much to ask of the Applications Area Working Group to do
the same.

-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Phone: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
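The UTF-32 case described in the NOTE above can be made concrete with a short sketch. This is a minimal Python illustration, not code from the draft or from any XML parser. It assumes, as the NOTE does, that only the UTF-8 and UTF-16 Byte Order Marks are recognized and that a BOM overrides any charset parameter. The helper name `sniff_bom_utf8_utf16` is hypothetical.

```python
import codecs


def sniff_bom_utf8_utf16(data: bytes):
    """Hypothetical helper: return the encoding implied by a leading BOM,
    recognizing only the UTF-8 and UTF-16 BOMs and ignoring any charset
    parameter (i.e. the BOM is treated as authoritative)."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The UTF-32LE BOM (FF FE 00 00) also begins with FF FE,
        # so a UTF-32LE document ends up in this branch.
        return "utf-16"
    return None


# A tiny document encoded as UTF-32LE with a BOM, as it might be sent
# with charset=utf-32.
doc = codecs.BOM_UTF32_LE + "<a/>".encode("utf-32-le")

bom_encoding = sniff_bom_utf8_utf16(doc)
as_utf16 = doc.decode(bom_encoding)  # BOM rule wins; Python strips the BOM
as_utf32 = doc.decode("utf-32")      # charset parameter wins

print(bom_encoding)        # utf-16
print(repr(as_utf16[0]))   # '\x00' (U+0000 is not allowed in XML)
print(as_utf32)            # <a/>
```

Because the UTF-32LE BOM starts with the two bytes of the UTF-16LE BOM, a BOM-first rule that knows only UTF-8 and UTF-16 turns this input into text that begins with U+0000, while honouring charset=utf-32 yields `<a/>`.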