Look, I'm happy *enough* with my low-cost, two-piece solution.

What turned you off? Was it that you found my logic below uncompelling
or just that you didn't agree with the design below? If it is just the
design, I can go with the original proposal. If it is the logic, then
help me out here: what works?

Consensus is good. I'm happy to take a design I don't like for some
personal reason (I might be wrong in my reasoning, after all). But it
should be the right solution for the problem, if possible.

Addison

Addison P. Phillips
Globalization Architect, Quest Software
Chair, W3C Internationalization Core Working Group

Internationalization is not a feature.
It is an architecture.

> -----Original Message-----
> From: McDonald, Ira [mailto:imcdonald at sharplabs.com]
> Sent: Sunday, April 17, 2005 12:14
> To: Addison Phillips; Doug Ewell; LTRU Working Group
> Subject: RE: [Ltru] Re: Proposed Text for Moving Forward
>
> -1
>
> Ah, I see, we _are_ all wasting our time.
>
> Ira McDonald (Musician / Software Architect)
> Blue Roof Music / High North Inc
> PO Box 221 Grand Marais, MI 49839
> phone: +1-906-494-2434
> email: imcdonald at sharplabs.com
>
> > -----Original Message-----
> > From: ltru-bounces at lists.ietf.org
> > [mailto:ltru-bounces at lists.ietf.org] On Behalf Of Addison Phillips
> > Sent: Sunday, April 17, 2005 3:08 PM
> > To: Doug Ewell; LTRU Working Group
> > Subject: RE: [Ltru] Re: Proposed Text for Moving Forward
> >
> > I agree with you 100%, Doug, because I think that we have it 180
> > degrees reversed.
> >
> > The problem with default script, as you point out, is that it puts
> > the registry maintainers into a position of having to adjudicate
> > about a language's script. "en" is a very poor test case because it
> > is nowhere near an edge case.
> >
> > The problem is that registering a default script for a language is
> > like trying to prove a negative. Let's pick on Serbian or
> > Azerbaijani for a minute. Let's say someone tried to register a
> > default script for one of these. You could point to all of the
> > sr-Cyrl or az-Latn texts you want. That doesn't prove the
> > non-existence or non-importance of other scripts for that language.
> > I'm concerned that later registrations might cause a need for
> > general retagging and that is BAD.
> >
> > For the registry process to have meaning, I think that we should
> > stick to what is actually demonstrable: you can register the fact
> > that a language is commonly written in two or more scripts because
> > you can demonstrate that. This would make the rules and design
> > simpler and *also* make the entries more robust. The rules would
> > then be:
> >
> > 1. Script subtags SHOULD NOT be used to form a language tag unless
> > they add some distinguishing information. For example, the 'Latn'
> > subtag is generally unnecessary with the primary language 'en'
> > because nearly all English documents are written in Latin script.
> >
> > 2. The script subtag SHOULD always be used to form a language tag
> > when the script of the tagged content matches a 'Required_Script'
> > field for the associated primary language. For example, you should
> > use the 'Hant' script subtag to form the language tag "zh-Hant-TW",
> > even though most Chinese language documents in Taiwan are written
> > in the Traditional Chinese script. The 'Required_Script' field
> > generally appears in primary languages that are written in more
> > than one script and which might otherwise be ambiguous.
> >
> > IOW, I added script suppression as a sop to default script, a
> > concept that is easy to understand and seems desirable, but which
> > may be difficult to maintain in practice (see all the deprecated
> > primary language registrations for examples of the registration
> > process being used in unproductive ways).
> >
> > Best Regards,
> >
> > Addison
> >
> > Addison P. Phillips
> > Globalization Architect, Quest Software
> > Chair, W3C Internationalization Core Working Group
> >
> > Internationalization is not a feature.
> > It is an architecture.
> >
> > > -----Original Message-----
> > > From: ltru-bounces at lists.ietf.org
> > > [mailto:ltru-bounces at lists.ietf.org] On Behalf Of Doug Ewell
> > > Sent: Sunday, April 17, 2005 01:05
> > > To: LTRU Working Group
> > > Subject: [Ltru] Re: Proposed Text for Moving Forward
> > >
> > > Addison Phillips <addison dot phillips at quest dot com> wrote:
> > >
> > > > I agree, Mark, that the full effect can be achieved with only
> > > > one field and that your proposal is superior in a number of
> > > > regards (fewer moving parts, ease of maintenance, ease of
> > > > application).
> > > >
> > > > I proposed two, though, for a reason. One of the objections was
> > > > that we didn't document when a particular script really ought
> > > > to be used (i.e. that you really should start to use zh-HanX-XX
> > > > in preference to zh-XX).
> > >
> > > I know I said I would back off and not get involved in the
> > > default-script issue. OK, so I lied. Sorry about that.
> > >
> > > AFAICT, this whole issue started with the concern that people
> > > would use a script subtag in cases where it was generally thought
> > > to be (a) unnecessary, because the intended script would be
> > > obvious, and (b) undesirable, because it would interfere with
> > > left-prefix (RFR) matching.
> > >
> > > The standard example was "en-Latn-US." The case was made that the
> > > overwhelming majority of written U.S. English text is written in
> > > the Latin script, so the added flexibility of being able to
> > > specify the script would be largely unnecessary, and in
> > > particular it would be overshadowed by the inability of existing
> > > left-prefix matching algorithms to match "en-Latn-US" with
> > > "en-US" (sometimes generalized to "broken backward
> > > compatibility").
> > >
> > > This was the foundation of "default script": certain languages
> > > like English could be listed as having a default script of Latin,
> > > so that tag generators could avoid creating tags like "en-Latn"
> > > or "en-Latn-US" whose disadvantages would outweigh their
> > > advantages.
> > >
> > > Of course, in certain circumstances you might have English
> > > written in Braille, or even in Cyrillic, and most if not all
> > > seemed to agree that in these rare circumstances it would be
> > > acceptable to generate "en-Brai-whatever" or "en-Cyrl-whatever."
> > >
> > > The standard counterexamples were "zh-Hans" and "zh-Hant." The
> > > case was made that Chinese is commonly written in both of these
> > > script variants, and it would often be beneficial to include
> > > script information, to the point of perhaps being more important
> > > than strict compatibility with left-prefix matching algorithms.
> > > Languages like Chinese and Azerbaijani and Serbian, after all,
> > > were the major use cases for the introduction of script subtags
> > > in the first place.
> > >
> > > So unlike "en-Latn", it would not be discouraged to write
> > > "zh-Hans" or "zh-Hant", or either of these followed by a region
> > > subtag. Of course, just like English, Chinese could also be
> > > written in a less obvious script like Braille or Cyrillic, and so
> > > "zh-Brai" and "zh-Cyrl" ought to be allowed as well.
> > >
> > > I concede that because of the stated predominance of processes
> > > that use left-prefix matching, it might be beneficial to define a
> > > default script for common languages that are written in a single
> > > script 99.9% of the time. I still don't know where the authority
> > > comes from to decide which languages and which scripts get marked
> > > in this way -- definitely not from ISO or documented
> > > registrations or deterministic rules, like everything else in the
> > > registry -- but I assume that would be worked out in due course.
> > >
> > > What I do NOT understand is how this has expanded to telling
> > > people when they SHOULD use script subtags, and how the set of
> > > allowable subtags should be limited in some way.
> > >
> > > There may well be cases where "zh" or "zh-CN" or "zh-TW" is all
> > > that is needed, and there is certainly existing data that uses
> > > such tags. I don't see any justification for discouraging such
> > > usage, even if we have defined a way to tag Chinese data more
> > > precisely. Likewise, if there is no prohibition against writing
> > > "en-Brai" or "en-Cyrl", then I see no reason to prohibit or
> > > discourage "zh-Brai" or "zh-Cyrl" either. A "required-script"
> > > field would do exactly this, by listing 'Hans' and 'Hant' but not
> > > others.
> > >
> > > This is too prescriptive. It tells people how they SHOULD tag
> > > data, not just in terms of "tag content wisely" or "don't be
> > > excessively precise," but on a specific language-by-language
> > > basis. It assumes, implicitly, that this group or ietf-languages
> > > has the expertise and authority to make this judgment. Unlike
> > > default-script, required-script does nothing to solve the
> > > left-prefix matching problem, and as such, I don't think it's
> > > within the scope of the charter.
> > >
> > > I propose the following:
> > >
> > > 1. An optional, informative default-script field that would
> > > suggest to tag generators that they not use that particular
> > > script subtag together with that particular language subtag. This
> > > field could be added, changed, or removed at any time. (It
> > > doesn't matter much what the field is called, and I renew my
> > > suggestion that we not try to inject too much deep meaning into
> > > the names of fields, or assume that users will derive deep
> > > meaning from them.)
> > >
> > > 2. NO requirement within the draft that tag generators "must not"
> > > use script subtags in any given scenario. The text in draft-01
> > > that discourages the use of a script subtag "unless it conveys
> > > additional information" should be adequate.
> > >
> > > 3. NO mechanism to tell tag generators that they "should" use a
> > > script subtag together with any particular language subtag, and
> > > *especially* not one that lists the "expected" script subtags
> > > while excluding others. If tag generators opt to create a tag
> > > such as "zh-TW" that "may be ambiguous without script
> > > information," that should be up to them.
> > >
> > > -Doug Ewell
> > > Fullerton, California
> > > http://users.adelphia.net/~dewell/
> > >
> > > _______________________________________________
> > > Ltru mailing list
> > > Ltru at lists.ietf.org
> > > https://www1.ietf.org/mailman/listinfo/ltru
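
For anyone following the thread without the matching rules in their head, here is a rough Python sketch of the two mechanisms being argued about. It is only an illustration under stated assumptions: `prefix_match` follows the usual left-prefix rule (the range equals the tag, or is a prefix of it ending at a hyphen), and `DEFAULT_SCRIPT` and `make_tag` are hypothetical names standing in for the proposed informative default-script field and a tag generator that honors it. None of this is text from the draft or data from the registry.

```python
# Hypothetical default-script data, invented for this example only.
# Key: primary language subtag; value: script a generator would omit.
DEFAULT_SCRIPT = {"en": "Latn"}


def prefix_match(language_range, tag):
    """Left-prefix match: the range equals the tag, or is a prefix of
    the tag that ends exactly at a hyphen. Comparison ignores case."""
    r, t = language_range.lower(), tag.lower()
    return t == r or t.startswith(r + "-")


def make_tag(language, script=None, region=None):
    """Build a tag, leaving out the script subtag when it is the
    (hypothetical) default script for the language."""
    parts = [language]
    default = DEFAULT_SCRIPT.get(language.lower())
    if script and (default is None or script.lower() != default.lower()):
        parts.append(script)
    if region:
        parts.append(region)
    return "-".join(parts)


if __name__ == "__main__":
    # The backward-compatibility problem: a request for "en-US" does
    # not match content tagged "en-Latn-US".
    print(prefix_match("en-US", "en-Latn-US"))
    # With a default script, the generator emits "en-US" instead.
    print(make_tag("en", "Latn", "US"))
    # Unusual scripts and multi-script languages keep the subtag.
    print(make_tag("en", "Brai", "US"))
    print(make_tag("zh", "Hant", "TW"))
    # The cost on the Chinese side: "zh-TW" no longer matches.
    print(prefix_match("zh-TW", "zh-Hant-TW"))
```

Note that the sketch only suppresses a script; it has nothing corresponding to a required-script field, which is the piece Doug objects to.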
Note Well: Messages sent to this mailing list are the opinions of the senders and do not imply endorsement by the IETF.