Very good.
Now that I see the result, I think you are right about moving the examples. I'd suggest the following locations (yellow for those who can see it), and a slight shortening.
4.5. Canonicalization of Language Tags
Since a particular language tag can be used by many processes, language tags SHOULD always be created or generated in canonical form.
A language tag is in 'canonical form' when the tag is well-formed according to the rules in Section 2.1 (Syntax) and Section 2.2 (Language Subtag Sources and Interpretation) and it has been canonicalized by applying each of the following steps in order, using data from the IANA registry (see Section 3.1 (Format of the IANA Language Subtag Registry)):
1. Extension sequences are ordered into case-insensitive ASCII order by singleton subtag.
- For example, the subtag sequence '-a-babble' comes before '-b-warble'.
- For example, the language tag "en-b-ccc-bbb-a-aaa-X-xyz" is well-formed and potentially valid (extensions 'a' and 'b' are not defined as of the publication of this document) but not in a canonical form because the extensions are not in alphabetical order; the canonical form would be "en-a-aaa-b-ccc-bbb-X-xyz".
2. Redundant or grandfathered tags are replaced by their Preferred-Value, if there is one.
- The field-body of the Preferred-Value for grandfathered and redundant tags is an "extended language range" ([RFC4647] (Phillips, A. and M. Davis, “Matching of Language Tags,” September 2006.)) and might consist of more than one subtag.
- Preferred-Value fields in the registry provide mappings from deprecated tags to modern equivalents. Many of these were created before the adoption of this document (such as the mapping of "no-nyn" to "nn" or "i-klingon" to "tlh"). Others are the result of later registrations or additions to the registry as permitted or required by this document (for example, "zh-hakka" was deprecated in favor of the ISO 639-3 code 'hak' when this document was adopted).
3. Subtags are replaced by their Preferred-Value, if there is one. For extlangs, the original primary language subtag is also replaced if there is a primary language subtag in the Preferred-Value.
- For example, although the tag "en-BU" (English as used in Burma) is valid, it is not in canonical form because the 'BU' subtag has a canonical mapping to 'MM' (Myanmar).
- The field-body of the Preferred-Value for extlangs is an "extended language range" and typically maps to a primary language subtag. For example, the subtag sequence "zh-hak" (Chinese, Hakka) is replaced with the subtag 'hak' (Hakka).
- Most of the non-extlang subtags are either Region subtags where the country name or designation has changed or clerical corrections to ISO 639-1.
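To check that the examples above fall out of the three steps as worded, here is a rough Python sketch of the procedure. It is not taken from the draft: the names (TAG_PREFERRED, REGION_PREFERRED, EXTLANG_PREFERRED, split_tag, canonicalize) are made up for illustration, and the tables hold only the mappings quoted in the text above, where a real implementation would load them from the IANA registry. It also assumes the input tag is already well-formed.

```python
# Hypothetical excerpt of registry data: only the mappings quoted in the text.
TAG_PREFERRED = {            # grandfathered/redundant tag -> Preferred-Value
    "no-nyn": "nn",
    "i-klingon": "tlh",
    "zh-hakka": "hak",
}
REGION_PREFERRED = {"bu": "MM"}            # region subtag -> Preferred-Value
EXTLANG_PREFERRED = {"hak": ("zh", "hak")}  # extlang -> (Prefix, Preferred-Value)


def split_tag(tag):
    """Split a tag into (head subtags, extension sequences, private-use tail)."""
    subtags = tag.split("-")
    i = 0
    head = []
    # Everything before the first singleton is language/extlang/script/region/variant.
    while i < len(subtags) and len(subtags[i]) != 1:
        head.append(subtags[i])
        i += 1
    extensions, private = [], []
    while i < len(subtags):
        if subtags[i].lower() == "x":
            private = subtags[i:]  # private use runs to the end of the tag
            break
        seq = [subtags[i]]         # singleton that opens an extension sequence
        i += 1
        while i < len(subtags) and len(subtags[i]) != 1:
            seq.append(subtags[i])
            i += 1
        extensions.append(seq)
    return head, extensions, private


def join_tag(head, extensions, private):
    return "-".join(head + [s for seq in extensions for s in seq] + private)


def canonicalize(tag):
    # Step 1: order extension sequences by singleton, ignoring case.
    head, extensions, private = split_tag(tag)
    extensions.sort(key=lambda seq: seq[0].lower())
    tag = join_tag(head, extensions, private)

    # Step 2: replace a redundant or grandfathered tag by its Preferred-Value.
    tag = TAG_PREFERRED.get(tag.lower(), tag)

    # Step 3: replace individual subtags by their Preferred-Value.
    head, extensions, private = split_tag(tag)
    if len(head) > 1:
        entry = EXTLANG_PREFERRED.get(head[1].lower())
        if entry and head[0].lower() == entry[0]:
            # The extlang's Preferred-Value also replaces the primary language.
            head = [entry[1]] + head[2:]
    head = head[:1] + [REGION_PREFERRED.get(s.lower(), s) for s in head[1:]]
    return join_tag(head, extensions, private)


if __name__ == "__main__":
    for example in ("en-b-ccc-bbb-a-aaa-X-xyz", "en-BU", "zh-hak", "zh-hakka"):
        print(example, "->", canonicalize(example))
```

The sketch leaves the case of the input untouched, since the steps above say nothing about case beyond the case-insensitive ordering in step 1.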
Mark
On Tue, May 19, 2009 at 07:27, Phillips, Addison <addison at amazon.com> wrote:
A language tag is in 'canonical form' (or in the alternative 'extlang
form', see below), when the tag is well-formed according the rules in
Remove the parenthetical, since Steps 1-3 *don't* define the extlang form; the extlang form is defined separately. I've been trying for a while to get you to fix this ;-)
AP> DONE
according => according to
AP> DONE in response to Martin's note.
AP> Good points both. I did a more extensive edit to fix this. In the new document, it isn't "many" but rather "all" of the redundant/grandfathered mappings that fit the two categories provided: there are no other kinds of P-V values! To convey all this I put:
* Many of these Preferred-Value mappings are either deprecated tags
A mapping isn't a tag. You could add "for" after "either" to fix that.
Grammatically you need to structure this sentence as one of the following. I don't care which.
... mappings either are ... or are ...
or
... mappings are either ... or ...
--
Preferred-Value fields in the registry provide mappings from deprecated tags to modern equivalents. Many of these were created before the adoption of this document (such as the mapping of "no-nyn" to "nn" or "i-klingon" to "tlh"). Others are the result of later registrations or additions to the registry as permitted or required by this document (for example, "zh-hakka" was deprecated in favor of the ISO 639-3 code 'hak' when this document was adopted).
--
AP> The first one illustrates extension ordering. The second one illustrates preferred-value mapping. These are covered by the text now in a way that wasn't as apparent before. But I digress, since I already said I wasn't going to muck with them anymore.
--
I’d also be in favor of losing the two “Example” paragraphs as redundant (or incorporating them into the canonicalization rules). They detract from the overall flow, are confusing, and are already covered by other examples in this final text.
The examples do cover cases not otherwise covered.
Addison