Argh... one of those links is a typo.

TXT: http://www.inter-locale.com/ID/draft-ietf-ltru-4646bis-22-ed-md.txt

Addison Phillips
Globalization Architect -- Lab126

Internationalization is not a feature.
It is an architecture.

> -----Original Message-----
> From: ltru-bounces at ietf.org [mailto:ltru-bounces at ietf.org] On
> Behalf Of Phillips, Addison
> Sent: Tuesday, May 05, 2009 11:07 AM
> To: LTRU Working Group
> Subject: [Ltru] Ticket #45: updated editor's copy available
>
> All,
>
> I am back from a recent soggy camping trip and catching up with
> this thread. I have just now posted an editor's copy of the text
> that has been discussed on this thread to inter-locale. Note that I
> have made some minor edits to the proposed text (to ensure it
> matches the style of the document; uses RFC 2119 keywords properly;
> and is grammatically correct).
>
> Here are the links:
>
> Diff: http://tinyurl.com/cyrmju
> HTML: http://www.inter-locale.com/ID/draft-ietf-ltru-4646bis-22-ed-md.html
> TXT: http://www.inter-locale.com/ID/draft-ietf-ltru-4646bis-222-ed-mt.txt
>
> Here is the proposed text for Section 4.5:
>
> --
> 4.5.  Canonicalization of Language Tags
>
>    Since a particular language tag is sometimes used by many
>    processes, language tags SHOULD always be created or generated in
>    a canonical form.
>
>    There are two canonical forms for language tags: the 'default'
>    canonical form contains no extended language subtags, while the
>    'extlang' canonical form contains extended language subtags where
>    required.  Normally, the 'default' canonicalization is preferred.
>    However, the 'extlang' canonical form can be useful in environments
>    where the presence of the enclosing primary language subtag is
>    considered beneficial to matching or selection (see Section 4.1.2)
>
>    A language tag is in a canonical form, either default or extended,
>    when the tag is well-formed according the rules in Section 2.1 and
>    Section 2.2 and it has been canonicalized by applying each of the
>    following steps in order, using data from the IANA registry (see
>    Section 3.1):
>
>    1.  Extension sequences are ordered into case-insensitive ASCII
>        order by singleton subtag.
>
>        *  That is, the subtag sequence '-a-babble' comes before
>           '-b-warble'.
>
>    2.  Redundant or grandfathered tags are replaced by their
>        Preferred-Value, if there is one.
>
>        *  These items are either deprecated mappings created before
>           the adoption of this document (such as the mapping of
>           "no-nyn" to "nn" or "i-klingon" to "tlh") or are the result
>           of later registrations or additions to this document (for
>           example, "zh-hakka" was deprecated in favor of the ISO 639-3
>           code 'hak' when this document was adopted).
>
>        *  Note: The field-body of the Preferred-Value for
>           grandfathered and redundant tags is an "extended language
>           range" ([RFC4647]) and might consist of more than one
>           subtag.
>
>    3.  Subtags are replaced by their Preferred-Value, if there is one.
>        For extended language subtags, the original primary language
>        subtag is also replaced if there is a primary language subtag
>        in the Preferred-Value.
>
>        *  The field-body of the Preferred-Value for extlangs is an
>           "extended language range" and almost always consists of a
>           single, primary language subtag.  For example, the subtag
>           sequence "zh-hak" (Chinese, Hakka) would be replaced with
>           the tag "hak" (Hakka).
>
>        *  The field-body of the Preferred-Value for all other types of
>           subtags consists of a subtag of the same type.
>           Most of these non-extlang subtags are either Region subtags
>           where the country name or designation has changed or are
>           clerical corrections to ISO 639-1.
>
>    4.  In the 'extlang' canonical form (but not the 'default'
>        canonical form), primary language subtags that are also
>        extlang subtags are prepended with the extlang's Prefix.
>
>        *  For example, "hak-CN" (Hakka, China) has a primary language
>           subtag of 'hak', which also appears in the registry as an
>           'extlang' record with a Prefix 'zh' (Chinese).  The
>           'extlang' canonical form would be "zh-hak-CN" (Chinese,
>           Hakka, China).
>
>        *  Note that this step can restore a subtag that was removed by
>           the previous step.
>
>    Example: The language tag "en-a-aaa-b-ccc-bbb-x-xyz" is in a
>    canonical form, while "en-b-ccc-bbb-a-aaa-X-xyz" is well-formed and
>    potentially valid (extensions 'a' and 'b' are not defined as of the
>    publication of this document) but not in a canonical form (the
>    extensions are not in alphabetical order).
>
>    Example: Although the tag "en-BU" (English as used in Burma)
>    maintains its validity, the language tag "en-BU" is not in a
>    canonical form because the 'BU' subtag has a canonical mapping to
>    'MM' (Myanmar).
>
>    Canonicalization of language tags does not imply anything about the
>    use of upper or lowercase letters when processing or comparing
>    subtags (and as described in Section 2.1).  All comparisons MUST be
>    performed in a case-insensitive manner.
>
>    When performing canonicalization of language tags, processors MAY
>    regularize the case of the subtags (that is, this process is
>    OPTIONAL), following the case used in the registry (see
>    Section 2.1.1).
>
>    If more than one variant appears within a tag, processors MAY
>    reorder the variants to obtain better matching behavior or more
>    consistent presentation.  Reordering of the variants SHOULD follow
>    the recommendations for variant ordering in Section 4.1.
>
>    If the field 'Deprecated' appears in a registry record without an
>    accompanying 'Preferred-Value' field, then that tag or subtag is
>    deprecated without a replacement.  These values are canonical when
>    they appear in a language tag.  However, tags that include these
>    values SHOULD NOT be selected by users or generated by
>    implementations.
>
>    An extension MUST define any relationships that exist between the
>    various subtags in the extension and thus MAY define an alternate
>    canonicalization scheme for the extension's subtags.  Extensions
>    MAY define how the order of the extension's subtags are
>    interpreted.  For example, an extension could define that its
>    subtags are in canonical order when the subtags are placed into
>    ASCII order: that is, "en-a-aaa-bbb-ccc" instead of
>    "en-a-ccc-bbb-aaa".  Another extension might define that the order
>    of the subtags influences their semantic meaning (so that
>    "en-b-ccc-bbb-aaa" has a different value from "en-b-aaa-bbb-ccc").
>    However, extension specifications SHOULD be designed so that they
>    are tolerant of the typical processes described in Section 3.7.
> --
>
> Addison Phillips
> Globalization Architect -- Lab126
>
> Internationalization is not a feature.
> It is an architecture.
>
>
> _______________________________________________
> Ltru mailing list
> Ltru at ietf.org
> https://www.ietf.org/mailman/listinfo/ltru
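
For illustration only, the four numbered steps in the proposed Section 4.5 text can be sketched in Python. This is a minimal sketch and not part of the draft. The three lookup tables are a hand-written, hypothetical excerpt holding only the mappings mentioned in the quoted examples; a real implementation would load the IANA registry instead. The function name canonicalize and its extlang_form parameter are invented here. Input is assumed to be well-formed, and the OPTIONAL case regularization and variant reordering are left out.

```python
# Hypothetical excerpt of registry data, limited to the examples quoted above.
TAG_PREFERRED = {"no-nyn": "nn", "i-klingon": "tlh", "zh-hakka": "hak"}
REGION_PREFERRED = {"BU": "MM"}
EXTLANG_PREFIX = {"hak": "zh"}  # extlang subtag -> its Prefix


def canonicalize(tag, extlang_form=False):
    """Apply steps 1-4 to a well-formed tag (sketch only, no validation)."""
    # Step 2: replace a whole grandfathered or redundant tag.
    tag = TAG_PREFERRED.get(tag.lower(), tag)
    subtags = tag.split("-")
    if subtags[0].lower() == "x":
        return tag  # entirely private use; nothing to canonicalize

    # Split at the first singleton: the core (language, script, region,
    # variants) comes before it, extensions and private use after it.
    cut = next((i for i, s in enumerate(subtags) if i > 0 and len(s) == 1),
               len(subtags))
    core, rest = subtags[:cut], subtags[cut:]

    # Group the remainder into extension sequences; 'x' starts private use.
    extensions, private = [], []
    for i, s in enumerate(rest):
        if len(s) == 1 and s.lower() == "x":
            private = rest[i:]
            break
        if len(s) == 1:
            extensions.append([s])
        else:
            extensions[-1].append(s)

    # Step 1: order extension sequences by singleton, case-insensitively.
    extensions.sort(key=lambda ext: ext[0].lower())

    # Step 3 (extlang): "zh-hak" becomes "hak" when 'hak' has Prefix 'zh'.
    if (len(core) > 1 and len(core[1]) == 3 and core[1].isalpha()
            and EXTLANG_PREFIX.get(core[1].lower()) == core[0].lower()):
        core = core[1:]
    # Step 3 (regions): two-letter subtags after the primary language.
    core = [core[0]] + [
        REGION_PREFERRED.get(s.upper(), s) if len(s) == 2 and s.isalpha() else s
        for s in core[1:]
    ]

    # Step 4: in the 'extlang' form, prepend the extlang's Prefix.
    if extlang_form and core[0].lower() in EXTLANG_PREFIX:
        core = [EXTLANG_PREFIX[core[0].lower()]] + core

    return "-".join(core + [s for ext in extensions for s in ext] + private)
```

With these tables, canonicalize("en-BU") gives "en-MM", and canonicalize("hak-CN", extlang_form=True) gives "zh-hak-CN". Because case is not regularized in this sketch, "en-b-ccc-bbb-a-aaa-X-xyz" comes back as "en-a-aaa-b-ccc-bbb-X-xyz" with the uppercase 'X' preserved.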