"Phillips, Addison" <addison at amazon dot com> wrote:
> I think it a mistake to water down "deprecated".
If you want "deprecated" to have its original strong meaning of "SHOULD NOT," that is fine, but we MUST be consistent about it. We need to stop giving people the false impression, in drafts and mailing-list conversation, that it will be acceptable to use deprecated tags or subtags. We need to stop trying to placate people in the X-Y versus Y debate by saying it doesn't matter much which form is deprecated, that the difference is only one of matching and fallback. It will be a major difference.
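Here is one concrete way to see what is at stake in "matching and fallback". The Lookup scheme in RFC 4647 retries a tag by truncating subtags from the right, and it also drops any singleton left dangling at the end. The sketch below is simplified: it has no wildcards and no default value, and the function name `lookup_fallback_chain` is hypothetical. It shows that a request for "zh-cmn-Hans-CN" can fall back to content tagged "zh", while "cmn-Hans-CN" never reaches "zh" by truncation alone.

```python
def lookup_fallback_chain(tag: str) -> list[str]:
    """Return the ranges tried, in order, by simplified RFC 4647-style Lookup truncation.

    Hypothetical helper for illustration; wildcards and the default value are ignored.
    """
    subtags = tag.split("-")
    chain = []
    while subtags:
        chain.append("-".join(subtags))
        subtags.pop()  # drop the last subtag
        # Never leave a singleton (e.g. "x", "a") dangling at the end.
        while subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return chain


if __name__ == "__main__":
    for tag in ("zh-cmn-Hans-CN", "cmn-Hans-CN"):
        print(tag, "->", lookup_fallback_chain(tag))
```

Of the two chains printed, only the first ends in "zh". Whether that difference is a feature or a bug is exactly what the X-Y versus Y debate is about.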
> The problem we have with deprecation at present is that the use of the Preferred-Value field in 'extlang' records requires deprecation. If we don't want to deprecate the extlang form, then we don't deprecate it. The question is whether a tag of the form "X-Y" is in its canonical form or not. That is, is "zh-cmn-Hans-CN" canonical? Or is its canonical form "cmn-Hans-CN"? It isn't necessarily true that non-canonical forms are deprecated. For example, the tag "en-b-moo-a-foo" is not in canonical form, but isn't necessarily deprecated.
I thought it was clear in my proposal that the no-extlang version would be preferred. But you're right that nothing in the Registry would say so, which I concede is important.
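The distinction between "canonical" and "deprecated" can be sketched as a canonicalizer that rewrites tags without ever consulting a deprecation flag. For illustration only, this sketch assumes a registry in which the no-extlang form is canonical. The `EXTLANG_PREFIX` table is a hypothetical two-entry excerpt, not registry data, and `canonicalize` is likewise a hypothetical name. Step 2 applies the RFC 4646 canonicalization rule that extension sequences are ordered by singleton, which is why "en-b-moo-a-foo" is not canonical.

```python
# Hypothetical two-entry excerpt of extlang records: extlang subtag -> Prefix.
# Assumes, for illustration only, that the no-extlang form is canonical.
EXTLANG_PREFIX = {"cmn": "zh", "yue": "zh"}


def canonicalize(tag: str) -> str:
    """Sketch of canonicalization; no subtag is ever checked for deprecation."""
    subtags = tag.split("-")

    # Step 1 (assumed rule): "zh-cmn-..." -> "cmn-..." when the extlang's Prefix matches.
    if len(subtags) >= 2 and EXTLANG_PREFIX.get(subtags[1].lower()) == subtags[0].lower():
        subtags = subtags[1:]

    # Keep the subtags before the first singleton exactly as they are.
    head = [subtags[0]]
    i = 1
    while i < len(subtags) and len(subtags[i]) > 1:
        head.append(subtags[i])
        i += 1

    # Step 2: split the rest into extension sequences, keeping private use last.
    extensions, private = [], []
    while i < len(subtags):
        if subtags[i].lower() == "x":
            private = subtags[i:]  # "x" consumes everything after it
            break
        j = i + 1
        while j < len(subtags) and len(subtags[j]) > 1:
            j += 1
        extensions.append(subtags[i:j])
        i = j

    # Order extension sequences by singleton, case-insensitively.
    extensions.sort(key=lambda seq: seq[0].lower())
    return "-".join(head + [s for seq in extensions for s in seq] + private)


if __name__ == "__main__":
    print(canonicalize("zh-cmn-Hans-CN"))
    print(canonicalize("en-b-moo-a-foo"))
```

Neither rewrite implies that the input tag was deprecated. Both tags are simply mapped to a preferred spelling, which matches the point made above.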
> <AP> Canonicalization is not the same thing as permission to normalize. Permission to normalize is much weaker. If we say that the 'Y' form is canonical, it will save us having to modify aspects of matching (we may still have to say something about matching). </AP>
I have been using the two terms interchangeably, and probably shouldn't have been.
> <AP> It doesn't remove the arbitrariness. It is just a different set of arbitrariness. Personally, I prefer the smaller set of languages we have now. But we could solve this by only initially registering a few and allowing ietf-languages to register others. Or just register them all. </AP>
How is it "just a different set of arbitrariness" if we handle all of the encompassed languages the same way, instead of hand-selecting a few macrolanguages like "zh" and treating their encompassed languages differently from, say, the languages encompassed by "fa"?
Everyone seems to prefer the smaller set of languages, and in fact many want it to be smaller than what we have now. Something tells me that nobody has read the passage in draft-4645bis, Section 2.2, where I attempted to explain the arbitrariness we have now, so I'll repeat it here:
"If the language was encompassed by one of the [ISO639-3] macrolanguages ‘ar’ (Arabic), ‘kok’ (Konkani), ‘ms’ (Malay), ‘sw’ (Swahili), ‘uz’ (Uzbek), or ‘zh’ (Chinese), as determined by [iso-639-3-macrolanguages_20080218], an extended language subtag was also added, with the primary language subtag of the macrolanguage as the value for the Prefix field. These macrolanguage subtags were already present in the Language Subtag Registry and were determined by the LTRU Working Group to have been used to represent a single dominant language as well as the macrolanguage as a whole, making the extended language mechanism suitable for languages encompassed by the macrolanguage."
Is everyone satisfied with this explanation of why these particular languages were chosen? If we choose a different set, will we be able to explain that set in terms that are just as good or better?
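The rule quoted from Section 2.2 can be read as a filter over the ISO 639-3 macrolanguage mappings. The following is a minimal sketch that assumes a tiny hand-picked excerpt of those mappings; `MACROLANGUAGE_OF` and `extlang_records` are hypothetical names. Passing `selected=None` models treating every encompassed language the same way, as suggested above. Under that setting, the languages encompassed by "fa" get the same treatment as those encompassed by "zh".

```python
# Hypothetical, tiny excerpt of ISO 639-3 macrolanguage mappings
# (encompassed language -> macrolanguage); not the full table.
MACROLANGUAGE_OF = {
    "cmn": "zh",  # Mandarin Chinese
    "yue": "zh",  # Yue Chinese
    "arb": "ar",  # Standard Arabic
    "pes": "fa",  # Iranian Persian
    "prs": "fa",  # Dari
}

# The macrolanguages selected in draft-4645bis, Section 2.2.
SELECTED = {"ar", "kok", "ms", "sw", "uz", "zh"}


def extlang_records(macro_of, selected=None):
    """Return (Subtag, Prefix) pairs for the extlang records the rule would add.

    selected=None treats every macrolanguage the same way.
    """
    records = []
    for lang, macro in sorted(macro_of.items()):
        if selected is None or macro in selected:
            records.append((lang, macro))
    return records


if __name__ == "__main__":
    print(extlang_records(MACROLANGUAGE_OF, SELECTED))  # hand-selected set
    print(extlang_records(MACROLANGUAGE_OF))            # uniform treatment
```

The selection logic is a single membership test. All of the arbitrariness lives in the contents of `SELECTED`, which is what has to be justified.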
I withdraw the radical proposal, but I hope it got us thinking a little about some of these issues.
-- 
Doug Ewell * Arvada, Colorado, USA * RFC 4645 * UTN #14
http://www.ewellic.org
http://www1.ietf.org/html.charters/ltru-charter.html
http://www.alvestrand.no/mailman/listinfo/ietf-languages

_______________________________________________
Ltru mailing list
Ltru at ietf.org
https://www.ietf.org/mailman/listinfo/ltru
Note Well: Messages sent to this mailing list are the opinions of the senders and do not imply endorsement by the IETF.