---
#. Use specific language subtags or subtag sequences in preference to subtags for language collections. A "language collection" is a subtag derived from one of the ISO 639-2 codes that represents multiple related languages. For example, the code 'nai' represents "North American languages". The registry contains values for the specific languages represented by this collective code. For example, 'xxx' (language1) and 'yyy' (language2). Note that these languages are otherwise unrelated.
---

I wouldn't have a problem with deprecating these codes. Should we provide a list of Prefix values?
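A minimal sketch of how a tag checker might apply the draft rule above, assuming the collection subtags are supplied as a set. Here the set holds only 'nai' and 'dra', the examples from this thread; a real checker would load the full list from the registry. The function name is hypothetical.

```python
from typing import Iterable, Optional

# Hypothetical sample: only the collection subtags named in this thread.
# A real checker would load the full set of collection subtags from the registry.
COLLECTION_SUBTAGS = frozenset({"nai", "dra"})


def collection_warning(
    tag: str, collections: Iterable[str] = COLLECTION_SUBTAGS
) -> Optional[str]:
    """Warn if the primary language subtag of `tag` is a language collection."""
    # Language tags are case-insensitive; the primary language subtag comes first.
    primary = tag.lower().split("-", 1)[0]
    if primary in set(collections):
        return (
            f"'{primary}' is a language collection; use the subtag for "
            "the specific language if the registry has one"
        )
    return None


print(collection_warning("nai"))    # prints the warning text
print(collection_warning("en-US"))  # None
```

The check only inspects the primary subtag, so a tag such as "nai-US" is flagged as well.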
Addison

John Cowan wrote:
With the introduction of language subtags for most of the known languages of the world, it's time to consider deprecating ISO 639-2-derived subtags that represent collections of languages (as distinct from macrolanguages). Under RFC 4646, if you want to tag a text in a Native American language from North America that does not have its own subtag, the best you can do is use the collection subtag "nai". Now, however, all those languages will appear under their own names with their own subtags, and "nai" will never be necessary and rarely useful.

In addition, many of the collection subtags are for genetic groups, like 'dra', "Dravidian languages (Other)", and so are essentially unstable, because the list of which languages are considered Dravidian may change over time.

Therefore, I decided to investigate how many, and which, language collection subtags would be deprecated. There are 68 code elements that appear in 639-2 but not 639-3, excluding the bibliographic codes like 'ger', 'alb', and so on that are not used in either 639-3 or RFC 4646. One of these, 'sgn', is a special case, to be treated by 4646bis as a de facto macrolanguage. Of the remaining 67, 55 are uncontroversially collections; they include either the string "languages" (as in 'nai', "North American languages") or the string "(Other)" (as in 'dra', "Dravidian (Other)") in their 639-2 names. (Note that 'mul', "Multiple languages", contains the string "languages" in its name but is part of 639-3 and should not be deprecated; 'sgn' likewise should not be deprecated.)

The remaining 12 codes appear below. The 14th Edition of the Ethnologue is now obsolete, but still useful for explicating the Ethnologue view (and implicitly the 639-3 view) of these codes: see http://www.ethnologue.com/14/show_iso639.asp?code=xxx for information on 639-2 code xxx.
  'bad', "Banda"
  'bih', "Bihari"
  'btk', "Batak (Indonesia)"
  'day', "Dayak"
  'him', "Himachali"
  'ijo', "Ijo"
  'kar', "Karen"
  'kro', "Kru"
  'nah', "Nahuatl"
  'nqo', "N'ko"
  'son', "Songhai"
  'znd', "Zande"

Notes:

There is a 639-3 language called 'bnd', "Banda", which is a particular member of the 'bad' collection; and analogously for 'zne', "Zande (specific)".

The 14th edition says this about 'day': The term "Dayak" is particularly problematic. It is a cover term used to refer to all non-Muslim peoples of Borneo. It is not a linguistic identifier, nor does it even refer to a single ethnic identity. According to MARC information, this code has been used in reference to various languages from several distinct branches of Western Malayo-Polynesian, and thus it also does not correspond to some node in a genetic classification.

'nqo' does not appear in the 14th edition at all.

It is to be hoped that these 12 discrepancies are cleaned up in some way by the 639/RA-JAC before RFC 4646bis goes into effect. In any case, my proposal is that all the code elements of 639-2 that are not in 639-3, with the sole exception of 'sgn', should be deprecated as of Date C with a comment of "Collection of languages" or the like.
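The selection procedure described in this message can be sketched roughly as follows: take the 639-2 code elements absent from 639-3, drop the bibliographic codes and 'sgn', then split the rest by whether the 639-2 name contains "languages" or "(Other)". This is a sketch under stated assumptions: the sample data is only a handful of entries quoted in the thread (plus 'sgn' with its 639-2 name), not the real code lists, and the class and function names are hypothetical. Under the proposal both groups would be deprecated; the name test only reproduces the split between the 55 obvious collections and the 12 codes needing review.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Iso6392Entry:
    code: str  # three-letter ISO 639-2 code element
    name: str  # English name as given in ISO 639-2


def deprecation_candidates(
    iso639_2: Iterable[Iso6392Entry],
    iso639_3: Set[str],
    bibliographic: Set[str],
) -> List[Iso6392Entry]:
    """639-2 code elements absent from 639-3, minus bibliographic codes and 'sgn'."""
    return [
        e for e in iso639_2
        if e.code not in iso639_3
        and e.code not in bibliographic
        and e.code != "sgn"  # treated as a de facto macrolanguage instead
    ]


def split_by_name(
    entries: Iterable[Iso6392Entry],
) -> Tuple[List[Iso6392Entry], List[Iso6392Entry]]:
    """Separate obvious collections (name heuristic) from codes needing review."""
    obvious: List[Iso6392Entry] = []
    review: List[Iso6392Entry] = []
    for e in entries:
        if "languages" in e.name or "(Other)" in e.name:
            obvious.append(e)
        else:
            review.append(e)
    return obvious, review


# Illustrative sample only: a few entries quoted in this thread,
# not the complete ISO 639-2 or 639-3 code lists.
SAMPLE_639_2 = [
    Iso6392Entry("nai", "North American languages"),
    Iso6392Entry("dra", "Dravidian (Other)"),
    Iso6392Entry("mul", "Multiple languages"),  # in 639-3, so not a candidate
    Iso6392Entry("sgn", "Sign Languages"),
    Iso6392Entry("ger", "German"),              # bibliographic code
    Iso6392Entry("bad", "Banda"),
    Iso6392Entry("nah", "Nahuatl"),
]
SAMPLE_639_3 = {"mul", "bnd", "zne"}
SAMPLE_BIBLIOGRAPHIC = {"ger", "alb"}

candidates = deprecation_candidates(SAMPLE_639_2, SAMPLE_639_3, SAMPLE_BIBLIOGRAPHIC)
obvious, review = split_by_name(candidates)
print([e.code for e in obvious])  # ['nai', 'dra']
print([e.code for e in review])   # ['bad', 'nah']
```

Keeping the exclusions ('mul' via the 639-3 set, 'sgn' explicitly) separate from the name heuristic mirrors the order of the argument above: 'mul' would otherwise match the "languages" test.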
--
Addison Phillips
Globalization Architect -- Yahoo! Inc.

Internationalization is an architecture.
It is not a feature.

_______________________________________________
Ltru mailing list
Ltru at ietf.org
https://www1.ietf.org/mailman/listinfo/ltru