
[Ltru] RE: New item in ISO 639-2 - Zaza



> From: ietf-languages-bounces at alvestrand.no [mailto:ietf-languages-
> bounces at alvestrand.no] On Behalf Of Peter Constable


> > The fallback problem has been considered paramount to this point, and
> > was the reason for inventing extended language subtags in the first
> > place.
> 
> My recollection is that the idea of extended language subtags was my idea, and that
> fallback was not the paramount reason why it occurred to me that we do that.

Someone asked me off-list for further explanation; I thought I'd provide that for the list.

The general, underlying issue behind this was the need to deal, on the one hand, with users who prefer to "clump" and, on the other, with users who like to split in certain cases involving very closely related varieties. That was specifically manifested in that 639-2 had for some time included identifiers for single entities it considered "individual" languages (not collections) that corresponded, in the source data sets for 639-3, to multiple languages.

Early in our work toward 639-3 (actually before it was an approved work item), Gary Simons and I established a set of guidelines for how to map between entries in ISO 639-2 and those in Ethnologue. For cases that had these 1-to-many mappings, we initially adopted a principle that, if within the cluster of languages there was one variety that was dominant -- with a significantly larger population and a significant difference in the degree of language development (use in media, education, etc.) -- then we would equate the item in 639-2 with that particular variety.

(That principle has thus far been applied in one case: Western Frisian.)

In the course of working on 3066bis, however, I realized that that left a problem. This struck me particularly in the case of Chinese: there is long-established usage of "zh", and that usage is ambiguous: it doesn't differentiate between distinct languages like Cantonese and Mandarin. Plus, the IANA registry had already accepted tags of the form "zh-yue" etc.: the initial thought was to treat them as grandfathered anomalies, but of course there's always some preference for a model in which everything is systematic rather than one with ad-hoc anomalies.

Considering those, it suddenly struck me that there was a way to bridge the clumping / legacy usage with the distinctions we wanted to add in 639-3: rather than try to force the clumped items in 639-2 into a more granular inventory, we could accept them as they are: ambiguous clumps -- "macrolanguages" (I couldn't think of any better term); and if we simply provided a normative mapping between a macrolanguage and the individual languages it encompasses, then that could provide a basis for creating a model in 3066bis that treated things like "zh-yue" as systematic rather than as anomalies.
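
A minimal sketch of what such a macrolanguage mapping might look like in code. The table below is a deliberately tiny, illustrative subset (only "zh" with Mandarin "cmn" and Cantonese "yue"), not the normative ISO 639-3 data, and the names MACROLANGUAGE_MEMBERS, extlang_tags and macrolanguage_of are hypothetical helpers invented for this example.

```python
# Illustrative subset only; the real mapping is maintained as part of ISO 639-3.
# Hypothetical table name: macrolanguage code -> individual language codes.
MACROLANGUAGE_MEMBERS = {
    "zh": ("cmn", "yue"),  # Chinese: Mandarin, Cantonese (many more exist)
}


def extlang_tags(macro):
    """Build tags of the form 'zh-yue' from a macrolanguage code."""
    return [f"{macro}-{member}" for member in MACROLANGUAGE_MEMBERS.get(macro, ())]


def macrolanguage_of(code):
    """Return the macrolanguage that encompasses `code`, or None if there is none."""
    for macro, members in MACROLANGUAGE_MEMBERS.items():
        if code in members:
            return macro
    return None


print(extlang_tags("zh"))
print(macrolanguage_of("yue"))
```

With a mapping of this kind, a tag such as "zh-yue" can be derived systematically from the table rather than being registered as a one-off exception.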

The fallback behaviour is a by-product of the multi-subtag form of those tags and the fallback mechanism that already existed in RFC 3066. That wasn't the main purpose in my mind for proposing the addition of extlang in the syntax for RFC 3066bis. Rather, the reason was simply to provide a bridge between users that like to clump and those that like to split; and a bridge between legacy usage in 639-2 and the IANA registry and what was coming in 639-3.
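
To make the by-product concrete, here is a rough sketch of fallback by progressive truncation of subtags. It is a simplification for illustration, not the wording of RFC 3066, and the function names fallback_chain and best_match are hypothetical.

```python
def fallback_chain(tag):
    """List a tag followed by its successively shorter prefixes.

    'zh-yue' yields ['zh-yue', 'zh'] simply because the tag has two subtags.
    """
    subtags = tag.split("-")
    return ["-".join(subtags[:n]) for n in range(len(subtags), 0, -1)]


def best_match(requested, available):
    """Return the first available tag found along the fallback chain, or None.

    Comparison is case-insensitive, since language tags are not case-sensitive.
    """
    by_lowercase = {tag.lower(): tag for tag in available}
    for candidate in fallback_chain(requested):
        hit = by_lowercase.get(candidate.lower())
        if hit is not None:
            return hit
    return None


print(fallback_chain("zh-yue"))
print(best_match("zh-yue", ["en", "zh"]))
```

Under this kind of matching, content requested as "zh-yue" falls back to content tagged "zh" without any extra rule, purely as a consequence of the tag's shape.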


Peter Constable
