Doug Ewell scripsit:

> Should we consider incorporating the 639-5 code elements into
> draft-4645bis?

I think we should do so, despite 639-5 being technically off-charter.

> I know some WG members will argue that collection codes are Evil, in
> which case I would respond that perhaps we need to think beyond Web
> servers and spell-checkers. Our intent is for language tags to be
> useful for a wide variety of applications, and as far as I know --
> despite the horror stories -- collection codes haven't caused major
> tagging problems since 2001 (publication date of RFC 3066, which first
> allowed them).

+1 for this reasoning.

By my highly unofficial count, there are 49 new code elements in the current 639-5 registry, which is embedded in the International Standard. There are no exact definitions of either the old 639-2 collections or the new ones, but if we identify them with their obvious counterparts in the Ethnologue, I think these six cases require special consideration:

1) 'euq' represents "Basque (family)". The Ethnologue lists three languages in this family, but ISO 639-3 has folded two of them into 'eu', Basque proper. The next edition of the Ethnologue will presumably follow. So 'euq' and 'eu' have the same denotation even though they have different scopes. Strong recommendation: exclude 'euq'.

2) 'hyx' represents "Armenian (family)". The Ethnologue shows only one language in this family. So 'hyx' and 'hy' have the same denotation even though they have different scopes. Strong recommendation: exclude 'hyx'.

3) 'jpx' represents "Japanese (family)". The Ethnologue shows 12 languages in this family, of which Japanese proper is dominant. Strong recommendation: include 'jpx'.

4) 'qwe' represents "Quechuan (family)". The Ethnologue shows 46 languages in this family, all but two of which (Inga 'inb' and Jungle Inga 'inj') are included in the macrolanguage 'qu'. (Presumably these two are excluded because they do not have "Quechua" or "Quichua" in their names.) Weak recommendation: exclude 'qwe'.

5) 'sqj' represents "Albanian (family)". The Ethnologue shows four languages (aln, aae, aat, als) in this family, and the macrolanguage 'sq' encompasses all four. So 'sqj' and 'sq' have the same denotation even though they have different scopes. Strong recommendation: exclude 'sqj'.

6) 'zhx' represents "Chinese (family)". The Ethnologue shows 14 languages in this family; however, the macrolanguage 'zh' encompasses all but one of them (Dungan 'dng'). (Dungan is excluded because it does not have "Chinese" in its name.) Weak recommendation: exclude 'zhx'.

> At the very least, if we are going to continue to include the 639-2-only
> collection codes such as 'bad' and 'gem' but exclude the 639-5 ones,
> we should explain this decision in RFC 4646bis so it doesn't look like
> an oversight.

+1 to that too.

-- 
John Cowan    http://ccil.org/~cowan    cowan at ccil.org
The Penguin shall hunt and devour all that is crufty, gnarly and
bogacious; all code which wriggles like spaghetti, or is infested with
blighting creatures, or is bound by grave and perilous Licences shall it
capture. And in capturing shall it replicate, and in replicating shall
it document, and in documentation shall it bring freedom, serenity and
most cool froodiness to the earth and all who code therein.
        --Gospel of Tux

_______________________________________________
Ltru mailing list
Ltru at ietf.org
https://www.ietf.org/mailman/listinfo/ltru