Whenever I read a standard, for better or for worse, the question I am asking myself isn't, "What does this mean?" or "What purpose does this serve?" I am always asking myself, "How do I write a program that does this?" This is why RFC 3066, for all that it's a BCP, is simply inadequate. I am interested enough in this working group to come here and express solid support for the current direction, because a language tag which is parsable according to constructible rules greatly reduces the amount of work any programmer has to do when developing for compliance.
As such, I read your first point, regarding characterizing the use of a document, and it tells me something interesting about myself. The fact is, for all its irony, I'm not typically even remotely interested in how a document is used; I tend to focus on how a document is /filed/ by the people who use it. If a client tells me, "I want a list of all documents sorted alphabetically by the third sentence of the second chapter," I might ask, "Why? Are you sure?" but if the client insists, I'll dutifully begin writing the appropriate algorithms.
I admit that issues of defining "What is a language?" and its philosophical correlates are perhaps of coffee-table interest to me, but not of professional interest as a programmer. Instead I merely want to know how to serve any random end-user the appropriate document, following whatever language /he/ thinks he's speaking. This means I need my language tags to be /descriptive,/ not prescriptive, and they have to be extensible in a logical way.
For better or for worse, if we are describing human languages, we must deal with the reality that human beings /do/ invent languages. It /is/ possible to find websites written in Klingon and poetry written in "Yodish." A bit disconcerting, to be sure, but possible. Certainly, if someone in my living room was trying to speak to me in Klingon, I'd probably request that he get a life, but /personally,/ I can do that. Professionally I can't; I can't use the fact that a man who speaks Esperanto, an equally artificial language, is more likely to have a girlfriend than a man who speaks Klingon as a justification for design limitations. Again, human beings /do/ invent languages and any tagging standard which does not account for this reality is inadequate for the task of classifying human languages.
Effectively, I look at the language tag in this fashion: if, for any two random given people, I must use two different syntaxes to say the same thing, then I need a different tag. en-US and en-GB are different not just because the British like throwing in superfluous u's, but also because the word "fanny" gets me in more trouble in one place than in the other. Same with the word "mantequilla" in es-MX versus es-AR. Anyone interested in providing global content must be able to navigate these differences /and/ similar unpredictable differences which arise in the future. What happens, for example, when significant language-use differences are based on social class in the exact same region in the exact same tongue? No existing tag accounts for this; can we be sure that 35 possibilities for extension protect us against future social and creative inventions? Extensibility and modularity must be incorporated into the system from its inception or the system will fail. Human creativity and pop culture move altogether faster than specification revision committees.
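To make the "navigate these differences" point concrete, here is a minimal Python sketch of how a program might pick a document for a requested tag by falling back from the most specific tag to progressively shorter ones. The function name `best_match` and the truncation strategy are assumptions for illustration only; nothing here is taken from RFC 3066 or from the draft, and a real matcher would need more care (for instance with single-character subtags).

```python
def best_match(requested, available):
    """Return the available tag that best fits `requested`, or None.

    Hypothetical helper: tries the full requested tag first, then drops
    trailing subtags one at a time (es-MX -> es). Comparison ignores case.
    """
    # Map lowercased tags back to the spelling the caller supplied.
    available_by_key = {tag.lower(): tag for tag in available}
    subtags = requested.lower().split("-")
    while subtags:
        candidate = "-".join(subtags)
        if candidate in available_by_key:
            return available_by_key[candidate]
        subtags.pop()  # fall back to a less specific tag
    return None


documents = ["en-GB", "es-AR", "es"]
print(best_match("en-gb", documents))  # en-GB
print(best_match("es-MX", documents))  # es
print(best_match("en-US", documents))  # None
```

Note what the sketch cannot do: a reader asking for en-US gets nothing rather than en-GB, and nothing in it knows about distinctions that have no subtag yet, which is exactly where extensibility matters.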
Ultimately, these tags are not, and /cannot/ be, prescriptive for how people are /allowed/ to classify languages; you can't program humans like a computer. Instead, they need to be descriptive of how human beings /use/ language--"use" here in both senses, of how they actually speak, write or signal it, and also of how they already classify it for their own purposes of transmission--and that descriptiveness must share the same capacity for growth as the objects it describes. In other words, the tags used to describe languages must themselves be like languages: if the language changes from region to region, so must the tags. If languages divide or combine over time, so must the tags. And if languages can spring wholesale from the minds of hack science-fiction screenwriters, so must the tags.
Further, if languages can be analyzed for factors important to one organization but irrelevant to another, so must the tags. The reading-level example, for instance, is intrinsically part of how language is used within a culture; the very educational institutions teaching the language divide material in this fashion. The regional press example points to how material is requested and provided, though it is still analyzed on purely linguistic factors--word choice and level of abstraction. And since it would be daunting for a registration body to try to track and describe the myriad human possibilities for interpreting a language, best we put that charge directly in the hands of the people who do it for a living. Certainly this means that corporations will also use their namespace for less germane reasons. Fortunately, parsing agents can ignore their tags and still remain completely in compliance: a small price to pay for effectively making the entire world a de facto but organized registration authority for how language is used worldwide.
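As a rough illustration of why ignoring someone else's tags is cheap, here is a minimal Python sketch. It assumes a simplified grammar in which any single-character subtag after the first position opens a new section and "x" opens a private-use section that runs to the end of the tag. The function names `split_tag` and `core_only` are hypothetical, and the sketch does no validation of the subtags themselves.

```python
def split_tag(tag):
    """Split a tag into (core subtags, {singleton: [subtags]}).

    Assumption: a one-character subtag (not in first position) starts a
    section; once an "x" section starts, everything after it belongs to it.
    """
    core, sections, current = [], {}, None
    for i, sub in enumerate(tag.lower().split("-")):
        if current != "x" and i > 0 and len(sub) == 1:
            current = sub            # a new section begins here
            sections[current] = []
        elif current is None:
            core.append(sub)         # still in the language/region part
        else:
            sections[current].append(sub)
    return core, sections


def core_only(tag, understood=()):
    """Rebuild the tag, keeping only the sections this agent understands."""
    core, sections = split_tag(tag)
    kept = list(core)
    for singleton, subs in sections.items():
        if singleton in understood:
            kept += [singleton] + subs
    return "-".join(kept)


print(core_only("en-US-a-foo-bar-x-acme-k"))                    # en-us
print(core_only("en-US-a-foo-bar-x-acme-k", understood=("a",)))  # en-us-a-foo-bar
```

An agent that has never heard of a given organization's section still recovers "en-us" and can serve a sensible document, while an agent that does understand the section keeps the extra detail.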
I've been informed both here and privately that perhaps a more appropriate approach would be to wait until this document becomes a standard and then propose organizational namespace as a new Internet Draft. Certainly, you guys know better than I do what we're up against and I'll defer to your best judgment. But the entire reason I so strongly support this project is that we undeniably /need/ a parsable internationalization architecture (as a programmer, /I/ need it, and I have yet to speak to a colleague who accuses RFC 3066 of being sufficient) and it needs to speak to all the ways in which languages are used, distributed, selected, and even (perhaps I'm making enemies here) invented.
Extensibility can be anything from a lifesaver to a mere buzzword. For any descriptor to be extensible in a way which has value, it must be extensible in exactly the same ways in which the object it describes is extensible. If it is not, time and human creativity will obsolete it.
Sincerely, Dylan N. Pierce