Re: [apps-discuss] URL definitions and draft-ruby-url-problem

"Martin J. Dürst" <duerst@it.aoyama.ac.jp> Mon, 22 December 2014 10:13 UTC

Message-ID: <5497EE9A.4030706@it.aoyama.ac.jp>
Date: Mon, 22 Dec 2014 19:12:42 +0900
From: "\"Martin J. Dürst\"" <duerst@it.aoyama.ac.jp>
Organization: Aoyama Gakuin University
To: Apps Discuss <apps-discuss@ietf.org>
References: <B53877D1-0996-448F-982D-4536805F2B1E@vpnc.org>
In-Reply-To: <B53877D1-0996-448F-982D-4536805F2B1E@vpnc.org>
Archived-At: http://mailarchive.ietf.org/arch/msg/apps-discuss/QAQu0zM3DS7nmyA2tC84Pf5R_kQ

I have the following comments on draft-ruby-url-problem-00:

Overall: I think it's good to have such a document because a lot of 
people keep mentioning problems related to URIs/IRIs/URLs but often 
without enough background or in fragments that then get forgotten. If 
this document can clearly call out these problems, then it will be very 
valuable. This is separate from whether this document should eventually 
be published; it could be published after things have settled down, or 
just be abandoned.

Abstract: I'd make sure that URI and IRI are also mentioned, because 
they clearly are the terminology used widely in the IETF.



Details:

1. Brief History of URL standards
(http://tools.ietf.org/html/draft-ruby-url-problem-00#section-1)

"Although it was quickly determined that it was desirable to allow 
non-ASCII characters, shoehorning utf-8 into ASCII-only systems was 
unacceptable; at the time Unicode was not so widely deployed."

I'd prefer a less emotional word than "shoehorning", and a more 
technical word than "unacceptable". Also, UTF-8 doesn't need to be 
mentioned here; what matters is Unicode.

Also, it would be good to know what "at the time" refers to. If it 
refers to 1994, the statement about Unicode is certainly true, but for 
that date, the statement "it was quickly determined that it was 
desirable to allow non-ASCII characters" is clearly false. Actually, 
even in 2005, and even after that, there were/are people who claim that 
IRIs should be just "user interface elements", or that ASCII only would 
be okay. Otherwise, the introduction in RFC 3987 (see 
http://tools.ietf.org/html/rfc3987#section-1.1) could have been a lot 
shorter.


"The IRI-to-URI transformation specified in [RFC3987] had options; it 
wasn't a deterministic path."

If this is seen as relevant, please move it to section 3, because it's 
way too small a detail to keep in a section entitled "Brief History of 
URL standards". Then please explain what the options are, and what the 
problems are that you see with these options.
(Please note that the choice between using ToASCII and UTF-8->%-encode 
in the conversion, which may be one of the options you refer to, stems 
from the fact that RFC 3986 allows both %-encoding and Punycode, and 
therefore if this is a problem, this should be mentioned for RFC 3986.)
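
[To make the two options concrete, here is a minimal Python sketch. It is 
only an illustration, not text from the draft or from RFC 3987; the 
function name host_to_uri_forms is made up, and Python's built-in "idna" 
codec implements IDNA2003 ToASCII, which is an assumption about which 
ToASCII variant is meant.]

```python
from urllib.parse import quote


def host_to_uri_forms(host):
    """Return (punycode_form, percent_encoded_form) for an IRI host.

    Hypothetical helper: shows the two conversions side by side.
    """
    # Option 1: ToASCII. The stdlib "idna" codec implements IDNA2003.
    punycode_form = host.encode("idna").decode("ascii")
    # Option 2: encode as UTF-8, then %-encode every non-ASCII byte.
    percent_form = quote(host, safe="")
    return punycode_form, percent_form


print(host_to_uri_forms("bücher.example"))
# ('xn--bcher-kva.example', 'b%C3%BCcher.example')
```

Both results are syntactically valid reg-names under RFC 3986, which is 
why the choice shows up in the IRI-to-URI conversion at all.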


"The URI-to-IRI transformation was also heuristic, since there was no 
guarantee that %xx-encoded bytes in the URI were actually meant to be 
%xx percent-hex-encoded bytes of a utf8 encoding of a Unicode string."

Again, if this is seen as relevant, please move it to section 3, because 
it's way too small a detail to keep in a section entitled "Brief History 
of URL standards". Also, please change "utf8" to "UTF-8". [As far as I'm 
aware, this isn't a problem; although it's a heuristic, it's one that 
works extremely well, and one that browsers use all the time.]
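
[A rough sketch of the heuristic in Python, under simplifying assumptions: 
a run of %xx escapes is converted only if the whole run is valid UTF-8, 
escapes for ASCII characters are kept escaped, and the additional 
exclusions in RFC 3987 (e.g. for bidi formatting characters) are omitted. 
The name uri_to_iri is made up for illustration; this is not how any 
particular browser implements it.]

```python
import re

# One or more consecutive percent-escapes, e.g. "%C3%A9".
_ESCAPED_RUN = re.compile(r"(?:%[0-9A-Fa-f]{2})+")


def _unescape_run(match):
    run = match.group(0)
    raw = bytes.fromhex(run.replace("%", ""))
    try:
        text = raw.decode("utf-8")  # strict: fails on anything that is not UTF-8
    except UnicodeDecodeError:
        return run  # probably a legacy encoding or binary data: leave untouched
    # Emit non-ASCII characters literally; keep ASCII characters escaped,
    # since unescaping them could change the meaning of the URI.
    return "".join(
        ch if ord(ch) > 0x7F else "%{:02X}".format(ord(ch)) for ch in text
    )


def uri_to_iri(uri):
    """Hypothetical helper: convert %-escaped UTF-8 runs to characters."""
    return _ESCAPED_RUN.sub(_unescape_run, uri)


print(uri_to_iri("http://example.org/r%C3%A9sum%C3%A9"))  # http://example.org/résumé
print(uri_to_iri("http://example.org/r%E9sum%E9"))        # unchanged (Latin-1 bytes)
```

The strict UTF-8 check is what makes the heuristic work in practice: 
byte sequences from legacy encodings rarely happen to be valid UTF-8.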


"... IDNA specs ([RFC3490] and [RFC5895]) did not fully addressed IRI 
processing."

It would be good to know what this is about. UTS-46 is about how to 
bridge the differences between IDNA2003 and IDNA2008.


2. Current Organizations and Specs in Development
(http://tools.ietf.org/html/draft-ruby-url-problem-00#section-2)

"Documents sitting needing update, abandoned now, are three drafts 
([iri-3987bis], [iri-comparison], and [iri-bidi-guidelines]), which were 
originally intended to obsolete [RFC3987]."

The problems with Bidi IRIs should definitely be mentioned in section 3. 
They are well known among experts. What's not known is the solution for 
these problems. The solution given in RFC 3987 has some obvious errors 
(how to handle combining marks); its general approach can probably also 
be improved on, but it's not clear how.

Leaving the bidi issues aside, the problem with the drafts as currently 
abandoned is that besides the very necessary work of fixing errors, ..., 
there was also an attempt to rewrite some of the more general text. My 
conclusion from that exercise is that this isn't productive at 
all. If I restart on these documents, I'll go back to the original RFC 
3987 and integrate errata. Based on this experience, I also strongly 
suggest that in the event of an attempt to update RFC 3986, updates are 
done on a strict "need-to" basis, rather than attempting to rewrite 
everything from scratch.


"This work is based on [UTS-46], and is intented to obsolete both 
[RFC3986] and [RFC3987]."

s/intented/intended/

UTS-46 doesn't fully cover [RFC3986] and [RFC3987], so I'd suggest 
splitting the sentence to avoid potential misunderstandings.

I have read the phrase "is intended to obsolete both [RFC3986] and 
[RFC3987]" before, but in the document at hand in particular, it would 
be good to say what that's actually supposed to mean. Very specifically, a 
non-IETF document cannot formally obsolete an IETF document. If (a) the 
intent is to bring this document to a level of completeness and 
consensus that would allow the IETF to obsolete the above RFCs with a 
stub RFC, then it would be good to say so explicitly. On the other 
hand, if (b) the intent is to allow browser makers and browser-related 
specs to not have to consult these RFCs, that should be said explicitly.
[My current impression is that the goal is supposed to be close to (a), 
but what's been done is mostly close to (b).]


4. Outline of Potential Solution

s/outout/output/

"Other than responsing to any feedback that may be provided, no changes 
to any Unicode Consortium product is required."

My understanding is that the Unicode Consortium plans to adapt UTS-46 as 
registries (e.g. DENIC) move from IDNA2003 to IDNA2008. So this sentence 
is not necessarily correct.


Regards,   Martin.


On 2014/12/19 01:20, Paul Hoffman wrote:
> This seems like an important document for us to look at, and possibly adopt. Section 3 is pretty scary, and section 4 seems like a very reasonable solution.
>
> --Paul Hoffman