Re: [ietf-types] Request for review of text/{turtle,n3}

Eric Prud'hommeaux <eric@w3.org> Sat, 12 March 2011 13:45 UTC

Date: Sat, 12 Mar 2011 08:46:57 -0500
From: Eric Prud'hommeaux <eric@w3.org>
To: Nathan <nathan@webr3.org>
Message-ID: <20110312134656.GA9642@w3.org>
References: <20110311233325.GC7042@w3.org> <4D7ABDC0.6090509@webr3.org>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="azLHFNyN32YCQGCU"
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <4D7ABDC0.6090509@webr3.org>
Organization: World Wide Web Consortium (W3C) - http://www.w3.org/
User-Agent: Mutt/1.5.20 (2009-06-14)
Cc: ietf-types <ietf-types@ietf.org>
Subject: Re: [ietf-types] Request for review of text/{turtle,n3}

* Nathan <nathan@webr3.org> [2011-03-12 00:26+0000]
> possible bugs..

tx for vigilance

> Eric Prud'hommeaux wrote:
> >Believing all of the issues in
> ><http://www.w3.org/2008/01/rdf-media-types> to be resolved, I'd like a
> >review of the media type text/turtle . The Published specification
> >will be <http://www.w3.org/TeamSubmission/2011/SUBM-turtle-20110314/>,
> >which will be the same as
> ><http://www.w3.org/TeamSubmission/2008/SUBM-turtle-20080114/> with
> >modifications to the media type sections.
> >
> >Contact:
> >    Eric Prud'hommeaux
> >See also:
> >    How to Register a Media Type for a W3C Specification
> >    Internet Media Type registration, consistency of use
> >    TAG Finding 3 June 2002 (Revised 4 September 2002)
> >
> >The Internet Media Type / MIME Type for Turtle is "text/turtle".
> >
> >It is recommended that Turtle files have the extension ".ttl" (all lowercase) on all platforms.
> >
> >It is recommended that sparql query files stored on Macintosh HFS file systems be given a file type of "TEXT".
> 
> s/sparql/Turtle ??

fixed in attached


> >This information that follows has been submitted to the IESG for review, approval, and registration with IANA.
> >
> >Type name:
> >    text
> >Subtype name:
> >    turtle
> >Required parameters:
> >    None
> >Optional parameters:
> >    charset — this parameter is required when transfering non-ASCII data. If present, the value of charset is always UTF-8.
> 
> s/transfering/transferring

fixed in attached


> >Encoding considerations:
> >    The syntax of Turtle is expressed over code points in Unicode [UNICODE]. The encoding is always UTF-8 [RFC3629].
> >    Unicode code points may also be expressed using an \uXXXX (U+0 to U+FFFF) or \UXXXXXXXX syntax (for U+10000 onwards) where X is a hexadecimal digit [0-9A-F]
> >Security considerations:
> >    Turtle is a general-purpose assertion language; applications may evaluate given data to infer more assertions or to dereference URIs, invoking the security considerations of the scheme for that URI. Note in particular, the privacy issues in RFC3023 section 10 for HTTP URIs. Data obtained from an inaccurate or malicious data source may lead to inaccurate or misleading conclusions, as well as the dereferencing of unintended URIs. Care must be taken to align the trust in consulted resources with the sensitivity of the intended use of the data; inferences of potential medical treatments would likely require different trust than inferences for trip planning.
> >    Turtle is used to express arbitrary application data; security considerations will vary by domain of use. Security tools and protocols applicable to text (e.g. PGP encryption, MD5 sum validation, password-protected compression) may also be used on Turtle documents. Security/privacy protocols must be imposed which reflect the sensitivity of the embedded information.
> >    Turtle can express data which is presented to the user, for example, RDF Schema labels. Application rendering strings retrieved from untrusted Turtle documents must ensure that malignant strings may not be used to mislead the reader. The security considerations in the media type registration for XML ([RFC3023] section 10) provide additional guidance around the expression of arbitrary data and markup.
> >    Turtle uses IRIs as term identifiers. Applications interpreting data expressed in Turtle sould address the security issues of Internationalized Resource Identifiers (IRIs) [RFC3987] Section 8, as well as Uniform Resource Identifier (URI): Generic Syntax [RFC3986] Section 7.
> 
> s/sould/should

fixed in attached


> also, should it be referencing IRIs throughout, rather than URIs and IRIs?

I think the dereferencing section should talk about URIs, as HTTP
(and, I think, all) URL schemes follow RFC2718's recommendation¹ of
turning non-ASCII chars into bytes and %HH-encoding them. Thus, all
dereferencing happens via URIs. The media type *could* talk about
referencing IRIs and either explicitly call out the URI translation,
or leave it as an exercise for the reader. I'm interested in editorial
suggestions.

  ¹ http://tools.ietf.org/html/rfc2718#section-2.2.5
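
For illustration, the %HH step can be sketched in Python roughly as
follows. This is a minimal sketch, not part of the registration:
`iri_to_uri` is a hypothetical helper name, it assumes UTF-8 as the
byte encoding, and it ignores the special handling that RFC 3987 gives
to internationalized host names.

```python
from urllib.parse import quote

# Characters left untouched: URI reserved and unreserved punctuation plus '%'
# (so that existing %HH escapes are not encoded a second time). ASCII letters
# and digits are always left alone by quote().
_SAFE = ":/?#[]@!$&'()*+,;=%-._~"


def iri_to_uri(iri: str) -> str:
    """Hypothetical helper: turn non-ASCII chars into UTF-8 bytes, then %HH."""
    return quote(iri, safe=_SAFE, encoding="utf-8")
```

With this sketch, `iri_to_uri("http://example.org/café")` gives
`http://example.org/caf%C3%A9`, while an all-ASCII URI passes through
unchanged.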

The other place URI is used is in the template heading "Base URI". The
media type registration template effectively asks "how does your
language represent base URIs?" and I've answered it with "with @base,
but it may also be an IRI".

Should I change the template to include "Base IRI" instead of "Base URI"?


> >    Multiple IRIs may have the same appearance. Characters in different scripts may look similar (a Cyrillic "о" may appear similar to a Latin "o"). A character followed by combining characters may have the same visual representation as another character (LATIN SMALL LETTER E followed by COMBINING ACUTE ACCENT has the same visual representation as LATIN SMALL LETTER E WITH ACUTE). Any person or application that is writing or interpreting data in Turtle must take care to use the IRI that matches the intended semantics, and avoid IRIs that make look similar. Further information about matching of similar characters can be found in Unicode Security Considerations [UNISEC] and Internationalized Resource Identifiers (IRIs) [RFC3987] Section 8.
> >Interoperability considerations:
> >    There are no known interoperability issues.
> >Published specification:
> >    http://www.w3.org/TeamSubmission/2011/SUBM-turtle-20110314/
> >Applications which use this media type:
> >    No widely deployed applications are known to use this media type. It may be used by some web services and clients consuming their data.
> >Additional information:
> >Magic number(s):
> >    Turtle documents may have the strings '@prefix' or '@base' (case dependent) near the beginning of the document.
> 
> unsure this counts as magic numbers??

* Nathan <nathan@webr3.org> [2011-03-12 00:29+0000]
> as before, afaict this isn't a magic number that can reliably be
> used for sniffing and detection of content type.. possibly remove

I had the impression that this was a call for heuristics which would
allow one to sniff a reasonable subset of the documents in the wild.
I note that Turtle and Notation3 are indistinguishable by the supplied
"Magic number" discriminators, and that the real discriminator, a
lexical token '{', is hard to describe without rolling a lot of the
lexer into an outrageous regex:

  (   '<' ([^<>"{}|^`\]-[#x00-#x20])* '>'
    | '"' ( ([^#x22#x5C#xA#xD]) | '\' [tbnrf\"'] )* '"'
    | '"""' ( ( '"' | '""' )? ( [^"\] | '\' [tbnrf\"'] ) )* '"""'
  )* [#x20#x9#xD#xA]* '{'

Also, even this distinction doesn't discriminate n3 from trig. Oof.

Do others think I should say "Magic number: None"?
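
To make the limits of the '@prefix' / '@base' heuristic concrete, here
is a rough Python sketch of such a sniffer. The function name and the
1024-character window are assumptions made for illustration; nothing
in the registration text specifies them.

```python
import re

# Case-sensitive match for the two directives named in the registration text.
_DIRECTIVE = re.compile(r"@(?:prefix|base)\b")


def looks_like_turtle_or_n3(text: str, window: int = 1024) -> bool:
    """Hypothetical sniffer: True if a directive appears near the start.

    It cannot tell Turtle from Notation3, and it misses any document
    that uses neither directive.
    """
    return _DIRECTIVE.search(text[:window]) is not None
```

A document consisting only of full-IRI triples, with no '@prefix' or
'@base', is valid Turtle but is not detected by this check, which is
the sense in which the heuristic only covers a subset of documents.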


> >File extension(s):
> >    ".ttl"
> >Base URI:
> >    The Turtle '@base <IRIref>' term can change the current base URI for relative IRIrefs in the query language that are used sequentially later in the document.
> >Macintosh file type code(s):
> >    "TEXT"
> >Person & email address to contact for further information:
> >    Eric Prud'hommeaux <eric@w3.org>
> >Intended usage:
> >    COMMON
> >Restrictions on usage:
> >    None
> >Author/Change controller:
> >    The Turtle specification is the product of David Beckett and Tim Berners-Lee. A W3C Working Group may assume maintenance of this document; W3C reserves change control over this specifications.
> >
> >
> >I'll be submitting another form for text/n3 .
> >Boy I hope this is near the end.
> 
> likewise! fingers crossed
> 
> Best,
> 
> Nathan

-- 
-ericP