
[Ltru] Ticket #45: updated editor's copy available



All,

I am back from a recent soggy camping trip and am catching up with this thread. I have just posted to inter-locale an editor's copy of the text discussed on this thread. Note that I have made some minor edits to the proposed text to ensure that it matches the style of the document, uses RFC 2119 keywords properly, and is grammatically correct.

Here are the links:

  Diff: http://tinyurl.com/cyrmju
  HTML: http://www.inter-locale.com/ID/draft-ietf-ltru-4646bis-22-ed-md.html
  TXT: http://www.inter-locale.com/ID/draft-ietf-ltru-4646bis-222-ed-mt.txt 

Here is the proposed text for Section 4.5:

--
4.5.  Canonicalization of Language Tags

   Since a particular language tag is sometimes used by many processes,
   language tags SHOULD always be created or generated in a canonical
   form.

   There are two canonical forms for language tags: the 'default'
   canonical form contains no extended language subtags, while the
   'extlang' canonical form contains extended language subtags where
   required.  Normally, the 'default' canonicalization is preferred.
   However, the 'extlang' canonical form can be useful in environments
   where the presence of the enclosing primary language subtag is
   considered beneficial to matching or selection (see Section 4.1.2).

   A language tag is in a canonical form, either 'default' or
   'extlang', when the tag is well-formed according to the rules in
   Section 2.1 and Section 2.2 and it has been canonicalized by applying
   each of the following steps in order, using data from the IANA
   registry (see Section 3.1):

   1.  Extension sequences are ordered into case-insensitive ASCII order
       by singleton subtag.

       *  That is, the subtag sequence '-a-babble' comes before
          '-b-warble'.

   2.  Redundant or grandfathered tags are replaced by their Preferred-
       Value, if there is one.

       *  These items are either deprecated mappings created before the
          adoption of this document (such as the mapping of "no-nyn" to
          "nn" or "i-klingon" to "tlh") or are the result of later
          registrations or additions to this document (for example, "zh-
          hakka" was deprecated in favor of the ISO 639-3 code 'hak'
          when this document was adopted).

       *  Note: The field-body of the Preferred-Value for grandfathered
          and redundant tags is an "extended language range" ([RFC4647])
          and might consist of more than one subtag.

   3.  Subtags are replaced by their Preferred-Value, if there is one.
       For extended language subtags, the original primary language
       subtag is also replaced if there is a primary language subtag in
       the Preferred-Value.

       *  The field-body of the Preferred-Value for extlangs is an
          "extended language range" and almost always consists of a
          single, primary language subtag.  For example, the subtag
          sequence "zh-hak" (Chinese, Hakka) would be replaced with the
          tag "hak" (Hakka).

       *  The field-body of the Preferred-Value for all other types of
          subtags consists of a subtag of the same type.  Most of these
          non-extlang subtags are either Region subtags where the
          country name or designation has changed or are clerical
          corrections to ISO 639-1.

   4.  In the 'extlang' canonical form (but not the 'default' canonical
       form), primary language subtags that are also extlang subtags are
       prepended with the extlang's Prefix.

       *  For example, "hak-CN" (Hakka, China) has a primary language
          subtag of 'hak', which also appears in the registry as an
          'extlang' record with a Prefix 'zh' (Chinese).  The 'extlang'
          canonical form would be "zh-hak-CN" (Chinese, Hakka, China).

       *  Note that this step can restore a subtag that was removed by
          the previous step.
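   The four steps above can be sketched as follows.  This is an
   illustrative sketch only: it assumes a tiny hand-made registry
   excerpt (the dictionaries TAG_PREFERRED, EXTLANG, LANGUAGE_PREFERRED,
   and REGION_PREFERRED and all function names are hypothetical), it
   covers only the record types needed for the examples in this section,
   and it does not check well-formedness or validity.

```python
# Hypothetical, hand-made registry excerpt; a real processor would load the
# IANA Language Subtag Registry instead. Keys are lowercase.
TAG_PREFERRED = {"i-klingon": "tlh", "no-nyn": "nn", "zh-hakka": "hak"}
EXTLANG = {"hak": ("zh", "hak")}     # extlang subtag -> (Prefix, Preferred-Value)
LANGUAGE_PREFERRED = {"iw": "he"}    # language subtag -> Preferred-Value
REGION_PREFERRED = {"bu": "MM"}      # region subtag -> Preferred-Value


def split_tag(tag):
    """Split a tag into (leading subtags, extension sequences, private-use tail)."""
    subtags = tag.split("-")
    head, exts, private, i = [], [], [], 0
    while i < len(subtags) and len(subtags[i]) != 1:
        head.append(subtags[i])
        i += 1
    while i < len(subtags):
        if subtags[i].lower() == "x":   # private use runs to the end of the tag
            private = subtags[i:]
            break
        seq = [subtags[i]]              # a singleton starts an extension sequence
        i += 1
        while i < len(subtags) and len(subtags[i]) != 1:
            seq.append(subtags[i])
            i += 1
        exts.append(seq)
    return head, exts, private


def join_tag(head, exts, private):
    return "-".join(head + [s for seq in exts for s in seq] + private)


def is_region(subtag):
    return (len(subtag) == 2 and subtag.isalpha()) or (len(subtag) == 3 and subtag.isdigit())


def canonicalize(tag, form="default"):
    # Step 1: order extension sequences by singleton, case-insensitively.
    head, exts, private = split_tag(tag)
    exts.sort(key=lambda seq: seq[0].lower())
    tag = join_tag(head, exts, private)

    # Step 2: replace a whole grandfathered or redundant tag.
    tag = TAG_PREFERRED.get(tag.lower(), tag)
    head, exts, private = split_tag(tag)

    # Step 3: replace subtags that have a Preferred-Value.
    if head:
        head[0] = LANGUAGE_PREFERRED.get(head[0].lower(), head[0])
    if len(head) > 1 and head[1].lower() in EXTLANG:
        prefix, preferred = EXTLANG[head[1].lower()]
        if head[0].lower() == prefix:
            # The extlang's Preferred-Value replaces the primary subtag too.
            head = [preferred] + head[2:]
    head = head[:1] + [REGION_PREFERRED.get(s.lower(), s) if is_region(s) else s
                       for s in head[1:]]

    # Step 4 ('extlang' form only): prepend the extlang's Prefix.
    if form == "extlang" and head and head[0].lower() in EXTLANG:
        head = [EXTLANG[head[0].lower()][0]] + head

    return join_tag(head, exts, private)
```

   For example, canonicalize("zh-hak-CN") yields "hak-CN", while
   canonicalize("hak-CN", form="extlang") yields "zh-hak-CN", which
   shows step 4 restoring the subtag removed in step 3.  Case is left
   untouched, since case regularization is a separate, optional step.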

   Example: The language tag "en-a-aaa-b-ccc-bbb-x-xyz" is in a
   canonical form, while "en-b-ccc-bbb-a-aaa-X-xyz" is well-formed and
   potentially valid (extensions 'a' and 'b' are not defined as of the
   publication of this document) but not in a canonical form (the
   extensions are not in alphabetical order).
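   A check for the first example can be sketched as follows (the
   function name is hypothetical).  It looks only at the order of the
   extension singletons, ignores case, and stops at the private-use
   singleton 'x'; it says nothing about the other canonicalization
   steps.

```python
def extensions_in_canonical_order(tag: str) -> bool:
    """True if extension singletons appear in case-insensitive ASCII order."""
    singletons = []
    for subtag in tag.split("-")[1:]:  # the first subtag is never an extension singleton
        if len(subtag) == 1:
            if subtag.lower() == "x":  # private use is not an extension; stop here
                break
            singletons.append(subtag.lower())
    # Python compares ASCII strings by code point, i.e. ASCII order.
    return singletons == sorted(singletons)
```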

   Example: Although the tag "en-BU" (English as used in Burma) remains
   valid, it is not in a canonical form because the 'BU' subtag has a
   canonical mapping to 'MM' (Myanmar).

   Canonicalization of language tags does not imply anything about the
   use of uppercase or lowercase letters when processing or comparing
   subtags (as described in Section 2.1).  All comparisons MUST be
   performed in a case-insensitive manner.
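   A minimal sketch of such a comparison, assuming only that tags are
   ASCII (as the syntax requires): mapping just A-Z to a-z avoids the
   surprises that locale-sensitive or full Unicode case mapping could
   introduce.  The name tags_equal is hypothetical.

```python
import string

# Map only ASCII A-Z to a-z; well-formed language tags contain no other letters.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def tags_equal(a: str, b: str) -> bool:
    """Compare two language tags (or subtags) case-insensitively."""
    return a.translate(_ASCII_LOWER) == b.translate(_ASCII_LOWER)
```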

   When performing canonicalization of language tags, processors MAY
   regularize the case of the subtags to follow the case used in the
   registry (see Section 2.1.1); this process is OPTIONAL.
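   Case regularization might look like the sketch below.  The rules it
   encodes are an assumption about the conventions referred to in
   Section 2.1.1: the first subtag lowercase, two-letter (region)
   subtags uppercase, four-letter alphabetic (script) subtags titlecase,
   and everything else, including every subtag from the first singleton
   onward, lowercase.  The name regularize_case is hypothetical.

```python
def regularize_case(tag: str) -> str:
    """Regularize subtag case (assumed conventions; see the lead-in)."""
    out = []
    after_singleton = False
    for i, subtag in enumerate(tag.split("-")):
        if len(subtag) == 1:
            after_singleton = True           # extensions/private use: lowercase from here on
        if i == 0 or after_singleton:
            out.append(subtag.lower())       # primary language, e.g. "en"
        elif len(subtag) == 2:
            out.append(subtag.upper())       # region, e.g. "US"
        elif len(subtag) == 4 and subtag.isalpha():
            out.append(subtag.title())       # script, e.g. "Latn"
        else:
            out.append(subtag.lower())       # extlang, numeric region, variant
    return "-".join(out)
```

   For example, regularize_case("EN-latn-us-A-BBB-X-ABC") returns
   "en-Latn-US-a-bbb-x-abc".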

   If more than one variant appears within a tag, processors MAY reorder
   the variants to obtain better matching behavior or more consistent
   presentation.  Reordering of the variants SHOULD follow the
   recommendations for variant ordering in Section 4.1.

   If the field 'Deprecated' appears in a registry record without an
   accompanying 'Preferred-Value' field, then that tag or subtag is
   deprecated without a replacement.  These values are canonical when
   they appear in a language tag.  However, tags that include these
   values SHOULD NOT be selected by users or generated by
   implementations.
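   A processor might classify registry records along these lines.  The
   Record type, its fields, and the sample entries are hypothetical
   simplifications included only for illustration: a deprecated value
   with a Preferred-Value is replaced during canonicalization, while one
   without a Preferred-Value stays in place but is flagged so that it is
   not offered to users or generated.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    """Hypothetical, simplified registry record (only the fields used here)."""
    subtag: str
    deprecated: bool = False
    preferred_value: Optional[str] = None


def generation_status(record: Record) -> str:
    """Classify a subtag for canonicalization and generation purposes."""
    if not record.deprecated:
        return "ok"
    if record.preferred_value is not None:
        return "replace"              # canonicalization maps it to its Preferred-Value
    return "canonical-but-avoid"      # stays as-is; SHOULD NOT be selected or generated


# Illustrative entries; 'YU' is assumed to be deprecated with no Preferred-Value.
REGIONS = {
    "bu": Record("BU", deprecated=True, preferred_value="MM"),
    "yu": Record("YU", deprecated=True),
    "mm": Record("MM"),
}
```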

   An extension MUST define any relationships that exist between the
   various subtags in the extension and thus MAY define an alternate
   canonicalization scheme for the extension's subtags.  Extensions MAY
   define how the order of the extension's subtags is interpreted.  For
   example, an extension could define that its subtags are in canonical
   order when the subtags are placed into ASCII order: that is, "en-a-
   aaa-bbb-ccc" instead of "en-a-ccc-bbb-aaa".  Another extension might
   define that the order of the subtags influences their semantic
   meaning (so that "en-b-ccc-bbb-aaa" has a different value from "en-b-
   aaa-bbb-ccc").  However, extension specifications SHOULD be designed
   so that they are tolerant of the typical processes described in
   Section 3.7.
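   To illustrate the two kinds of extension-defined ordering described
   above, the sketch below assumes two hypothetical extensions: 'a',
   whose subtags are canonical in ASCII order, and 'b', whose subtag
   order is significant and is therefore left untouched.  The
   SORT_SUBTAGS table and the function name are illustrative only, and
   the sketch assumes the tag begins with a language subtag.

```python
# Hypothetical per-extension rules: singleton -> True if the extension's
# subtags are canonicalized by sorting, False if their order carries meaning.
SORT_SUBTAGS = {"a": True, "b": False}


def canonicalize_extension_subtags(tag: str) -> str:
    """Apply each (hypothetical) extension's own ordering rule to its subtags."""
    subtags = tag.split("-")
    out, i = [subtags[0]], 1
    while i < len(subtags):
        s = subtags[i]
        if s.lower() == "x":                 # private use: copy the rest unchanged
            out.extend(subtags[i:])
            break
        if len(s) != 1:                      # script, region, variant, ...
            out.append(s)
            i += 1
            continue
        j = i + 1                            # find the end of this extension sequence
        while j < len(subtags) and len(subtags[j]) != 1:
            j += 1
        body = subtags[i + 1:j]
        if SORT_SUBTAGS.get(s.lower(), False):
            body = sorted(body, key=str.lower)   # case-insensitive ASCII order
        out.extend([s] + body)
        i = j
    return "-".join(out)
```

   With these assumed rules, "en-a-ccc-bbb-aaa" becomes
   "en-a-aaa-bbb-ccc", while "en-b-ccc-bbb-aaa" is returned unchanged.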
--

Addison Phillips
Globalization Architect -- Lab126

Internationalization is not a feature.
It is an architecture.


