
Re: [Ltru] 4.5: canonicalization and subtags mapped to Preferred-Value



Server at inter-locale should now be up again.

This section, as noted, is a mess. I've altered it significantly just now in an attempt to fix Peter's issues, plus others.

I quote the text here for archival purposes, but probably you'll want to see it on the document in a few minutes.

--
<section title="Canonicalization of Language Tags" anchor="canonical">
                                <t>Since a particular language tag is sometimes used by many processes, language tags SHOULD always be created or generated in a canonical form.</t>
                                <t>A language tag is in canonical form when:
   <list style="numbers">
                                                <t>The tag is well-formed according to the rules in <xref target="syntax"/> and
      <xref target="sources"/>.</t>
                                                <t>Redundant or grandfathered tags that have a Preferred-Value mapping
      in the IANA registry (see <xref target="ianaformat"/>) MUST
      be replaced with their mapped value. These items either are
      deprecated mappings created before the adoption of this document
      (such as the mapping of "no-nyn" to "nn" or "i-klingon" to "tlh")
      or are the result of later registrations or additions to this
      document (for example, "zh-hakka" was deprecated in favor of the
      ISO 639-3 code 'hak' when this document was adopted). These mappings SHOULD be done before additional processing, since there can be additional changes to subtag values. The field-body of the Preferred-Value for grandfathered and redundant tags is an "extended language range" (<xref target="RFC4647"></xref>) and might consist of more than one subtag.</t>

                                                <t>Subtags of type 'extlang' SHOULD be mapped to their Preferred-Value. The field-body of the Preferred-Value for extlangs is an "extended language range" and typically maps to a primary language subtag. For example, the subtag sequence "zh-hak" (Chinese, Hakka) would be replaced with the tag "hak" (Hakka).</t><t>Other subtags that have a Preferred-Value field
      in the IANA registry (see <xref target="ianaformat"/>) MUST be
      replaced with their mapped value. Most of these are either Region subtags where the country name or designation has changed or clerical corrections to ISO 639-1.</t>
                                                <t>If more than one extension subtag sequence exists, the extension
      sequences are ordered into case-insensitive ASCII order by singleton
      subtag.</t>
                                        </list>
                                </t>
                                <t>Example: The language tag "en-a-aaa-b-ccc-bbb-x-xyz" is in canonical form,
while "en-b-ccc-bbb-a-aaa-X-xyz" is well-formed and potentially valid (extensions 'a' and 'b' are not defined as of the publication of this document) but not in canonical form (the extensions are not in alphabetical order).</t>
                                <t>Example: The language tag "en-BU" (English as used in Burma) is
not canonical because the 'BU' subtag has a canonical mapping to 'MM' (Myanmar), although the tag "en-BU" maintains its validity.</t>
                                <t>Canonicalization of language tags does not imply anything about the use of upper or lowercase letters when processing or comparing subtags (as described in <xref target="syntax"/>). All comparisons MUST be performed in a case-insensitive manner.</t>
                                <t>When performing canonicalization of language tags, processors MAY regularize the case of the subtags (that is, this process is OPTIONAL), following the case used in the registry (see <xref target="casing"></xref>). </t>

                                <t>Note: if the field 'Deprecated' appears in a registry record without an accompanying 'Preferred-Value' field, then that tag or subtag is deprecated without a replacement. Validating processors SHOULD NOT generate tags that include these values, although the values are canonical when they appear in a language tag.</t>
                                <t>An extension MUST define any relationships that exist between the
various subtags in the extension and thus MAY define an alternate
canonicalization scheme for the extension's subtags. Extensions MAY
define how the order of the
extension's subtags is interpreted. For example, an extension could
define that its subtags are in canonical order when the subtags are placed
into ASCII order: that is, "en-a-aaa-bbb-ccc" instead of
"en-a-ccc-bbb-aaa". Another
extension might define that the order of the subtags influences their
semantic meaning (so that "en-b-ccc-bbb-aaa" has a different value from
"en-b-aaa-bbb-ccc"). However, extension specifications SHOULD be designed so that they are tolerant of the typical processes described in <xref target="extensions"/>.</t>
                        </section>
--
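
For anyone who wants to see the steps quoted above in executable form, here is a rough Python sketch. It is only an illustration and not part of the draft. The three tables are tiny hand-written stand-ins for the IANA registry, and the names TAG_PREFERRED, EXTLANG_PREFERRED, SUBTAG_PREFERRED, subtag_type and canonicalize are made up for this example. The input is assumed to be well-formed already, and the subtag typing is a crude length-based guess rather than a real parser.

```python
# Hypothetical registry excerpts; the real registry is far larger.
# Whole grandfathered/redundant tags and their Preferred-Value.
TAG_PREFERRED = {"no-nyn": "nn", "i-klingon": "tlh", "zh-hakka": "hak"}
# Extlang subtags and the primary language subtag they map to.
EXTLANG_PREFERRED = {"hak": "hak"}
# Other subtags, keyed by (type, lowercase subtag).
SUBTAG_PREFERRED = {
    ("region", "bu"): "MM",
    ("language", "iw"): "he",
    ("language", "jw"): "jv",
}


def subtag_type(index, subtag):
    """Crude guess at a subtag's type from its position and shape."""
    if index == 0:
        return "language"
    if len(subtag) == 4 and subtag.isalpha():
        return "script"
    if (len(subtag) == 2 and subtag.isalpha()) or (
        len(subtag) == 3 and subtag.isdigit()
    ):
        return "region"
    return "variant"


def canonicalize(tag):
    # Whole-tag mappings come first, before any other processing.
    tag = TAG_PREFERRED.get(tag.lower(), tag)
    subtags = tag.split("-")

    # Split into the main subtags, extension sequences, and private use.
    main, extensions, private = [], [], []
    i = 0
    while i < len(subtags) and len(subtags[i]) > 1:
        main.append(subtags[i])
        i += 1
    while i < len(subtags):
        if subtags[i].lower() == "x":
            private = subtags[i:]  # private use runs to the end of the tag
            break
        seq = [subtags[i]]  # singleton that starts an extension
        i += 1
        while i < len(subtags) and len(subtags[i]) > 1:
            seq.append(subtags[i])
            i += 1
        extensions.append(seq)

    # Extlang: replace "language-extlang" with the extlang's Preferred-Value.
    if len(main) > 1 and len(main[1]) == 3 and main[1].isalpha():
        preferred = EXTLANG_PREFERRED.get(main[1].lower())
        if preferred is not None:
            main[0:2] = [preferred]

    # Other subtags with a Preferred-Value (case-insensitive lookup).
    main = [
        SUBTAG_PREFERRED.get((subtag_type(n, s), s.lower()), s)
        for n, s in enumerate(main)
    ]

    # Order extension sequences by singleton, ignoring case.
    extensions.sort(key=lambda seq: seq[0].lower())

    parts = main + [s for seq in extensions for s in seq] + private
    return "-".join(parts)
```

With these tables, canonicalize("en-BU") gives "en-MM" and canonicalize("zh-hak") gives "hak". The sketch deliberately leaves case alone, since regularizing case is OPTIONAL in the text above, so "en-b-ccc-bbb-a-aaa-X-xyz" comes back as "en-a-aaa-b-ccc-bbb-X-xyz".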

Addison Phillips
Globalization Architect -- Lab126

Internationalization is not a feature.
It is an architecture.


> -----Original Message-----
> From: ltru-bounces at ietf.org [mailto:ltru-bounces at ietf.org] On
> Behalf Of Peter Constable
> Sent: Tuesday, July 08, 2008 5:57 PM
> To: LTRU Working Group
> Subject: [Ltru] 4.5: canonicalization and subtags mapped to
> Preferred-Value
>
> In the current draft, Section 4.5 lists requirements for canonical
> form. In those requirements, point 2 states the need to map Region
> subtags to their preferred value, and point 3 states the need to
> map Grandfathered and Redundant tags to their preferred value. Then,
> point 4 states the following:
>
> ------------
> Other subtags that have a Preferred-Value mapping in the IANA
> registry (see Section 3.1 (Format of the IANA Language Subtag
> Registry)) MUST be replaced with their mapped value. These items
> consist entirely of clerical corrections to ISO 639-1 in which the
> deprecated subtags have been maintained for compatibility purposes.
> ------------
>
> "Other subtags" here would include any language, extlang, script or
> variant subtags. Section 3.1.2 indicates that any of these types of
> subtags potentially can have Preferred-Value mappings. Yet the text
> I've quoted states that the set of all these cases is limited to
> clerical corrections in 639-1 (the Javanese and Hebrew cases).
> That's definitely not the case (e.g. every extlang must have a
> Preferred-Value field.) Something's not right.
>
> (I tried to check if something has changed here since previous
> drafts, but the Inter-Locale site appears to be down.)
>
>
> Peter
>
> _______________________________________________
> Ltru mailing list
> Ltru at ietf.org
> https://www.ietf.org/mailman/listinfo/ltru



Note Well: Messages sent to this mailing list are the opinions of the senders and do not imply endorsement by the IETF.