
Re: [Ltru] On deprecating 639-2 language collections



I'd prefer to tinker with the code lists as little as possible. I think that, with the introduction of 4646bis and 639-3, it might be wise to add some text in Section 4.1 (Choice) along the lines of:

---
#. Use specific language subtags or subtag sequences in preference to subtags for language collections. A "language collection" is a subtag derived from one of the ISO 639-2 codes that represents multiple related languages. For example, the code 'nai' represents "North American languages". The registry contains values for the specific languages represented by this collective code, for example 'xxx' (language1) and 'yyy' (language2). Note that, apart from belonging to the collection, these languages are otherwise unrelated.
---
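As an editorial illustration (not part of the proposed text), a tagging tool could flag tags whose primary language subtag is a collection, prompting the user to pick a more specific subtag. This is a minimal sketch assuming a hand-maintained set of collection subtags; the function name `uses_collection` and the two-entry set are hypothetical and are not taken from the registry.

```python
# Hypothetical, illustrative subset: collection subtags mentioned in this thread.
COLLECTION_SUBTAGS = frozenset({"nai", "dra"})


def uses_collection(tag: str, collections: frozenset = COLLECTION_SUBTAGS) -> bool:
    """Return True if the tag's primary language subtag is a collection.

    Subtags are case-insensitive, so the first subtag is lowercased
    before the lookup.
    """
    primary = tag.split("-", 1)[0].lower()
    return primary in collections


for t in ["nai", "NAI-CA", "en-US"]:
    print(t, uses_collection(t))
```

A real checker would load the collection list from the registry rather than hard-coding it.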


I wouldn't have a problem with deprecating these codes. Should we provide a list of Prefix values?

Addison

John Cowan wrote:
With the introduction of language subtags for most of the known languages
of the world, it's time to consider deprecating ISO 639-2-derived subtags
that represent collections of languages (as distinct from macrolanguages).

Under RFC 4646, if you want to tag a text in a Native American language
from North America that does not have its own subtag, the best you
can do is use the collection subtag "nai".  Now, however, all those
languages will appear under their own names with their own subtags, and
"nai" will never be necessary and rarely useful.  In addition, many of
the collection subtags are for genetic groups, like 'dra' "Dravidian
languages (Other)" and so are essentially unstable, because the list of
which languages are considered Dravidian may change over time.

Therefore, I decided to investigate how much, and which, language
collection subtags would be deprecated.

There are 68 code elements that appear in 639-2 but not 639-3, excluding
the bibliographic codes like 'ger', 'alb', and so on that are not used
in either 639-3 or RFC 4646.  One of these, 'sgn', is a special case,
to be treated by 4646bis as a de facto macrolanguage.

Of the remaining 67, 55 are uncontroversially collections; they include
either the string "languages" (as in 'nai', "North American languages")
or the string "(Other)" (as in 'dra', "Dravidian (Other)") in their
639-2 names.  (Note that 'mul', "Multiple languages", contains the string
"languages" in its name but is part of 639-3 and should not be deprecated;
'sgn' likewise should not be deprecated.)
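The name-based test described above can be sketched as a small filter. This is a hypothetical illustration, not the script actually used for the count: it assumes the code lists are available as a mapping of 639-2 code to English name and a set of 639-3 codes, and the sample data below is a toy subset built from examples in this message, not the real lists.

```python
from typing import Dict, List, Set, Tuple

# Codes the message says must not be treated as deprecatable collections.
EXCEPTIONS = {"mul", "sgn"}


def classify(iso639_2: Dict[str, str], iso639_3: Set[str],
             bibliographic: Set[str]) -> Tuple[List[str], List[str]]:
    """Split 639-2 codes absent from 639-3 into (collections, needs_review).

    A code counts as an uncontroversial collection when its 639-2 name
    contains "languages" or "(Other)"; everything else is left for review.
    """
    collections: List[str] = []
    needs_review: List[str] = []
    for code, name in sorted(iso639_2.items()):
        if code in iso639_3 or code in bibliographic or code in EXCEPTIONS:
            continue  # still in 639-3, a B-code only, or special-cased
        lowered = name.lower()
        if "languages" in lowered or "(other)" in lowered:
            collections.append(code)
        else:
            needs_review.append(code)
    return collections, needs_review


# Toy data from the examples in this message, not the real code lists.
sample_639_2 = {
    "nai": "North American languages",
    "dra": "Dravidian (Other)",
    "bad": "Banda",
    "nah": "Nahuatl",
    "sgn": "Sign languages",
    "mul": "Multiple languages",
    "ger": "German",
    "eng": "English",
}
sample_639_3 = {"eng", "deu", "mul", "bnd"}
print(classify(sample_639_2, sample_639_3, bibliographic={"ger"}))
# (['dra', 'nai'], ['bad', 'nah'])
```

Codes such as 'bad' and 'nah' fall into the review bucket, which corresponds to the 12 codes listed below.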

The remaining 12 codes appear below.  The 14th Edition of the
Ethnologue is now obsolete, but still useful for explicating the
Ethnologue view (and implicitly the 639-3 view) of these codes:  see
http://www.ethnologue.com/14/show_iso639.asp?code=xxx for information
on 639-2 code xxx.

'bad', "Banda"
'bih', "Bihari"
'btk', "Batak (Indonesia)"
'day', "Dayak"
'him', "Himachali"
'ijo', "Ijo"
'kar', "Karen"
'kro', "Kru"
'nah', "Nahuatl"
'nqo', "N'ko"
'son', "Songhai"
'znd', "Zande"

Notes:

There is a 639-3 language called 'bnd', "Banda", which is a particular
member of the 'bad' collection; and analogously for 'zne', "Zande
(specific)".

The 14th edition says this about 'day':

	The term "Dayak" is particularly problematic. It is a cover term
	used to refer to all non-Muslim peoples of Borneo. It is not a
	linguistic identifier, nor does it even refer to a single ethnic
	identity. According to MARC information, this code has been
	used in reference to various languages from several distinct
	branches of Western Malayo-Polynesian, and thus it also does
	not correspond to some node in a genetic classification.

'nqo' does not appear in the 14th edition at all.

It is to be hoped that these 12 discrepancies are cleaned up in some way
by the 639/RA-JAC before RFC 4646bis goes into effect.  In any case, my
proposal is that all the code elements of 639-2 that are not in 639-3,
with the sole exception of 'sgn', should be deprecated as of Date C with
a comment of "Collection of languages" or the like.
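A rough sketch of what this proposal would do to registry records follows. It is an editorial illustration under stated assumptions: records are modeled as simple field-to-value dicts, the function names `apply_proposal` and `format_record` are hypothetical, and "DATE-C" is a placeholder because the message does not fix Date C. Field names follow the RFC 4646 record format ("%%"-separated records of "Field: value" lines).

```python
from typing import Dict, Iterable, List, Set

Record = Dict[str, str]

# Display order for known fields; any other fields follow afterwards.
FIELD_ORDER = ["Type", "Subtag", "Description", "Added", "Deprecated",
               "Preferred-Value", "Prefix", "Comments"]


def apply_proposal(records: Iterable[Record], iso639_2: Set[str],
                   iso639_3: Set[str], date_c: str,
                   comment: str = "Collection of languages") -> List[Record]:
    """Deprecate every language subtag that is a 639-2 code missing from
    639-3, except 'sgn'. All other records are returned unchanged."""
    result: List[Record] = []
    for rec in records:
        subtag = rec.get("Subtag", "")
        if (rec.get("Type") == "language" and subtag in iso639_2
                and subtag not in iso639_3 and subtag != "sgn"):
            rec = dict(rec)  # copy so the caller's data is untouched
            rec["Deprecated"] = date_c
            # Append to an existing comment rather than overwriting it.
            old = rec.get("Comments")
            rec["Comments"] = f"{old}; {comment}" if old else comment
        result.append(rec)
    return result


def format_record(rec: Record) -> str:
    """Render one record as 'Field: value' lines preceded by '%%'."""
    fields = [f for f in FIELD_ORDER if f in rec]
    fields += [f for f in rec if f not in FIELD_ORDER]
    return "\n".join(["%%"] + [f"{f}: {rec[f]}" for f in fields])


records = [
    {"Type": "language", "Subtag": "nai",
     "Description": "North American languages"},
    {"Type": "language", "Subtag": "sgn", "Description": "Sign languages"},
    {"Type": "language", "Subtag": "en", "Description": "English"},
]
updated = apply_proposal(records, iso639_2={"nai", "sgn", "eng"},
                         iso639_3={"eng"}, date_c="DATE-C")
print(format_record(updated[0]))
```

Only 'nai' is marked deprecated here: 'sgn' is exempt, and 'en' is a 639-1-derived subtag, so it never appears in the 639-2 set.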


--
Addison Phillips
Globalization Architect -- Yahoo! Inc.

Internationalization is an architecture.
It is not a feature.

_______________________________________________
Ltru mailing list
Ltru at ietf.org
https://www1.ietf.org/mailman/listinfo/ltru



