On 2009-05-18 13.07, "Martin J. Dürst" <duerst at it.aoyama.ac.jp> wrote:
> When searching through
> http://tools.ietf.org/html/draft-ietf-ltru-4646bis-21, I found the word
> "canonical" all over the place. I would suggest that some people do a
> careful read of these places, checking whether they fit together with the
> new text. In particular, if there is text elsewhere in the document that
> assumes that canonicalization may be without or with extlangs, then we would
> have to revise the text proposal below.

Ok, below is my walkthrough for instances of "canonical" in 4646bis-21.
I did not find any issue of the nature you suggest, but I found one that
should be an easy fix ("canonical" -> "preferred"), and one that may be
more serious but has no clear fix, so I don't want to raise an issue for it.
There is also a possible omission in the definition of canonical form
w.r.t. extension subtags.
/kent k
==========================================
In order to avoid instability in the canonical form of tags, if a
two-character code is added to ISO 639-1 for a language for which a
three-character code was already included in either ISO 639-2 or ISO
639-3, the two-character code MUST NOT be registered. See
Section 3.4.
----------------------------
This seems to be excessive, since we don't require absolute stability
of canonical forms. It also seems unnecessary, since apparently (see next
entry just below) ISO has 'promised' not to add any more two-letter
language codes.
Anyway, this text is not problematic w.r.t. the current edits
re. canonical form definition.
==================================================
To avoid these problems with versioning and subtag choice (as
experienced during the transition between RFC 1766 and RFC 3066), as
well as to ensure the canonical nature of subtags defined by this
document, the ISO 639 Registration Authority Joint Advisory Committee
(ISO 639/RA-JAC) has included the following statement in
[iso639.prin]:
...
----------------------------
See previous entry (just above).
=====================================================
9. In the event that more than one extension appears in a single
tag, the tag SHOULD be canonicalized as described in Section 4.5,
by ordering the various extension sequences into case-insensitive
ASCII order.
----------------------------
This is about Extension Subtags (not to be confused with Extended
Language Subtags; I still find this naming needlessly confusing),
and there is no change to canonicalisation w.r.t. that in section 4.5.
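
As an illustration of the ordering rule quoted above, here is a minimal
Python sketch. The function name order_extensions is hypothetical and not
taken from the draft; the sketch assumes a well-formed tag and leaves any
private-use sequence (introduced by 'x') untouched at the end.

```python
def order_extensions(tag: str) -> str:
    """Reorder extension sequences by singleton, case-insensitively.

    Assumes the tag is well-formed; this is a sketch, not a validator.
    """
    subtags = tag.split("-")

    # Everything from the private-use singleton 'x' onwards is kept as-is.
    private = []
    for i, subtag in enumerate(subtags):
        if subtag.lower() == "x":
            subtags, private = subtags[:i], subtags[i:]
            break

    # Subtags before the first singleton form the prefix (language, script,
    # region, variants); each singleton starts a new extension sequence.
    prefix, extensions = [], []
    for subtag in subtags:
        if len(subtag) == 1:
            extensions.append([subtag])
        elif extensions:
            extensions[-1].append(subtag)
        else:
            prefix.append(subtag)

    # Sort whole sequences by their singleton, ignoring case.
    extensions.sort(key=lambda seq: seq[0].lower())

    flat = [subtag for seq in extensions for subtag in seq]
    return "-".join(prefix + flat + private)
```

With this sketch, order_extensions("en-b-ccc-a-aaa") gives "en-a-aaa-b-ccc".
Only the order of the sequences changes; the subtags inside each extension
are left alone.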
===============================================
* Preferred-Value's field body contains a canonical mapping from
this record's value to a modern equivalent that is preferred in
its place. Depending on the value of the 'Type' field, this
value can take different forms:
----------------------------
I see no problem with this text.
==============================================
The field 'Preferred-Value' contains a mapping between the record in
which it appears and another tag or subtag (depending on the record's
'Type'). The value in this field is used for canonicalization (see
Section 4.5). In cases where the subtag or tag also has a
'Deprecated' field, then the 'Preferred-Value' is RECOMMENDED as the
best choice to represent the value of this record when selecting a
language tag.
----------------------------
I see no problem with this text.
================================================
4. Extended language subtags always have a mapping to their
identical primary language subtag. For example, the extended
language subtag 'yue' (Cantonese) can be used to form the tag
"zh-yue". It has a Preferred-Value mapping to the primary
language subtag 'yue', meaning that a tag such as
"zh-yue-Hant-HK" can be canonicalized to "yue-Hant-HK".
----------------------------
This refers to what is the latest (and single) version of "canonical",
so no problem with this text.
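
To make the mapping concrete, here is a minimal Python sketch of the
extlang replacement described in the quoted text. The table
EXTLANG_PREFERRED and the function replace_extlang are hypothetical names;
the table holds only the one record mentioned above, whereas a real
implementation would read the records from the registry.

```python
# Hypothetical one-entry excerpt: extlang subtag -> (Prefix, Preferred-Value).
EXTLANG_PREFERRED = {
    "yue": ("zh", "yue"),
}


def replace_extlang(tag: str) -> str:
    """Replace 'prefix-extlang' by the extlang's Preferred-Value.

    Assumes a well-formed tag in which an extlang, if present, is the
    second subtag.
    """
    subtags = tag.split("-")
    if len(subtags) >= 2:
        entry = EXTLANG_PREFERRED.get(subtags[1].lower())
        # Only rewrite when the primary language matches the record's Prefix.
        if entry is not None and subtags[0].lower() == entry[0]:
            return "-".join([entry[1]] + subtags[2:])
    return tag
```

With this sketch, replace_extlang("zh-yue-Hant-HK") gives "yue-Hant-HK",
and a tag without a known extlang is returned unchanged.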
==========================================
Occasionally the deprecated code is preferred in certain contexts.
For example, both "iw" and "he" can be used in the Java programming
language, but "he" is converted on input to "iw", which is thus the
canonical form in Java.
----------------------------
This is a Java specific problem, not one with LTRU. Still, it would be
better to say "preferred" instead of "canonical" in this paragraph.
========================================
o The specification MUST specify a canonical representation.
----------------------------
This is a requirement on registrations of "Extension subtags".
One that is somehow NOT taken advantage of in the current definition
of canonicalisation. Forgotten point in the def. of canonicalisation?
=========================================
Extension authors are strongly cautioned that many (including most
well-formed) processors will be unaware of any special relationships
or meaning inherent in the order of extension subtags. Extension
authors SHOULD avoid subtag relationships or canonicalization
mechanisms that interfere with matching or with length restrictions
that sometimes exist in common protocols where the extension is used.
In particular, applications MAY truncate the subtags in doing
matching or in fitting into limited lengths, so it is RECOMMENDED
that the most significant information be in the most significant
(left-most) subtags and that the specification gracefully handle
truncated subtags.
----------------------------
I did not know that we classified *processors* (of any kind) as being
well-formed or not...
This text seems to refer to some other canonicalisation mechanisms
than that in section 4.5. And the current def. of canonicalisation
requires sorting the extension subtags. Maybe this refers to
canonicalisation *within* an extension subtag.
I think this text is problematical (in any case, quite regardless of
the changes to section 4.5), but I don't want to raise an issue on
this now.
==========================================
In some cases, the encompassed languages had tags registered for them
during the RFC 3066 era. Those grandfathered tags not already
deprecated or rendered redundant were deprecated in the registry upon
adoption of this document. As grandfathered values, they remain
valid for use and some content or applications might use them. As
with other grandfathered tags, since implementations might not be
able to associate the grandfathered tags with the encompassed
language subtag equivalents that are recommended by this document,
implementations are encouraged to canonicalize tags for comparison
purposes. Some examples of this include the tags "zh-hakka" (Hakka)
and "zh-guoyu" (Mandarin or Standard Chinese).
----------------------------
I see no problem with this text.
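
A minimal Python sketch of canonicalizing for comparison purposes, as the
quoted text encourages. The names are hypothetical, and the two
Preferred-Value entries ("hak" and "cmn") are my assumption of what the
registry records contain; they should be checked against the registry
itself.

```python
# Assumed Preferred-Values for the two grandfathered tags named above.
GRANDFATHERED_PREFERRED = {
    "zh-hakka": "hak",
    "zh-guoyu": "cmn",
}


def canonical_for_comparison(tag: str) -> str:
    """Lower-case the tag and replace a grandfathered tag if one matches."""
    lowered = tag.lower()
    return GRANDFATHERED_PREFERRED.get(lowered, lowered)


def same_language_tag(a: str, b: str) -> bool:
    """Compare two tags after canonicalizing both."""
    return canonical_for_comparison(a) == canonical_for_comparison(b)
```

Under these assumptions, same_language_tag("zh-Hakka", "hak") is true,
whereas comparing the raw strings would miss the match.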
=========================================
4.5. Canonicalization of Language Tags
----------------------------
Heading for the section on canonicalisation, a section which naturally has
many instances of the word "canonical" (and inflections of that), which I
don't list here.
===========================================