
Re: [Ltru] ISO 639 language code addition rules...



Doug Ewell scripsit:

> Should we consider incorporating the 639-5 code elements into 
> draft-4645bis?  

I think we should do so, despite 639-5 being technically off-charter.

> I know some WG members will argue that collection codes are Evil, in 
> which case I would respond that perhaps we need to think beyond Web 
> servers and spell-checkers.  Our intent is for language tags to be 
> useful for a wide variety of applications, and as far as I know --  
> despite the horror stories -- collection codes haven't caused major 
> tagging problems since 2001 (publication date of RFC 3066, which first 
> allowed them).

+1 for this reasoning.

By my highly unofficial count, there are 49 new code elements in the
current 639-5 registry, which is embedded in the International Standard.
There are no exact definitions of either the old 639-2 or the new language
collections, but if we identify them with their obvious counterparts
from the Ethnologue, I think these six cases require special consideration:

1) 'euq' represents "Basque (family)".  The Ethnologue lists three languages
in this family, but ISO 639-3 has folded two of them into 'eu', Basque
proper.  The next edition of the Ethnologue will presumably follow.
So 'euq' and 'eu' have the same denotation even though they have
different scopes.  Strong recommendation: exclude 'euq'.

2) 'hyx' represents "Armenian (family)".  Ethnologue shows only one
language in this family.  So 'hyx' and 'hy' have the same denotation even
though they have different scopes.  Strong recommendation: exclude 'hyx'.

3) 'jpx' represents "Japanese (family)".  Ethnologue shows 12 languages in
this family, of which Japanese proper is dominant.  Strong recommendation:
include 'jpx'.

4) 'qwe' represents "Quechuan (family)".  Ethnologue shows 46 languages
in this family, of which all but two (Inga 'inb' and Jungle Inga 'inj')
are included in the macrolanguage 'qu'.  (Presumably these two are
excluded because they do not have "Quechua" or "Quichua" in their names.)
Weak recommendation: exclude 'qwe'.

5) 'sqj' represents "Albanian (family)".  Ethnologue shows four languages
(aln, aae, aat, als) in this family; however, the macrolanguage 'sq'
encompasses these four languages.  So 'sqj' and 'sq' have the same
denotation even though they have different scopes.  Strong recommendation:
exclude 'sqj'.

6) 'zhx' represents "Chinese (family)".  Ethnologue shows 14 languages
in this family; however, the macrolanguage 'zh' encompasses all but one
of these (Dungan 'dng').  (Dungan is excluded because it does not have
"Chinese" in its name.)  Weak recommendation: exclude 'zhx'.
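
One way to read the pattern in the six cases above is as a rule of thumb
on how many family members the existing code fails to cover.  The sketch
below is only an illustration of that reading, not a proposed procedure:
the names CollectionCase, recommend and max_stragglers are hypothetical,
the cutoff of two stragglers is an assumption chosen to fit these cases,
and the count of 11 for 'jpx' assumes that 'ja' covers Japanese proper
only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CollectionCase:
    code: str         # proposed ISO 639-5 collection code
    existing: str     # existing two-letter code with the same name
    family_size: int  # languages the Ethnologue lists in the family
    uncovered: int    # family members the existing code does not cover


# Counts taken from the six cases above.  For 'euq', the two languages
# folded into 'eu' are treated as covered.  For 'jpx', 11 uncovered is an
# assumption: 'ja' is taken to mean Japanese proper and nothing else.
CASES = [
    CollectionCase("euq", "eu", 3, 0),
    CollectionCase("hyx", "hy", 1, 0),
    CollectionCase("jpx", "ja", 12, 11),
    CollectionCase("qwe", "qu", 46, 2),   # all but 'inb' and 'inj'
    CollectionCase("sqj", "sq", 4, 0),
    CollectionCase("zhx", "zh", 14, 1),   # all but 'dng'
]


def recommend(case: CollectionCase, max_stragglers: int = 2) -> str:
    """Hypothetical rule of thumb fitted to the six cases above."""
    if case.uncovered == 0:
        # Same denotation as the existing code; only the scope differs.
        return "strong exclude"
    if case.uncovered <= max_stragglers:
        # Nearly the same denotation; a few languages fall outside.
        return "weak exclude"
    # The collection denotes substantially more than the existing code.
    return "include"


if __name__ == "__main__":
    for case in CASES:
        print(f"{case.code} (vs. {case.existing}): {recommend(case)}")
```

A fixed cutoff is the crudest possible choice; a ratio of uncovered
members to family size would serve equally well for these six codes.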

> At the very least, if we are going to continue to include the 639-2-only
> collection codes such as 'bad' and 'gem' but exclude the 639-5 ones,
> we should explain this decision in RFC 4646bis so it doesn't look like
> an oversight.

+1 to that too.

-- 
John Cowan    http://ccil.org/~cowan  cowan at ccil.org
The Penguin shall hunt and devour all that is crufty, gnarly and
bogacious; all code which wriggles like spaghetti, or is infested with
blighting creatures, or is bound by grave and perilous Licences shall it
capture.  And in capturing shall it replicate, and in replicating shall
it document, and in documentation shall it bring freedom, serenity and
most cool froodiness to the earth and all who code therein.  --Gospel of Tux
_______________________________________________
Ltru mailing list
Ltru at ietf.org
https://www.ietf.org/mailman/listinfo/ltru



Note Well: Messages sent to this mailing list are the opinions of the senders and do not imply endorsement by the IETF.