Mark Davis wrote:
> Peter's addressed some of the questions. Back to your original question.
>
> For backward compatibility, we'll continue to represent Mandarin as
> "zh", Standard Arabic as "ar", and so on. Note that this is
> independent of whether extlang is used or not. That is, if extlang
> exists, we'll treat incoming "zh-cmn" as if it were "zh"; if it
> doesn't, we'll treat "cmn" as if it were "zh". And under either
> scenario it is conformant to tag Mandarin as 'zh'.
>
> Why?
>
> Although 639-3 now specifies clearly that "de" means (for example)
> just Standard German, while "zh" means any Chinese, this clarity of
> specification was not present earlier. The code "zh" has been used
> in the past for Mandarin, overwhelmingly so; not just 99% or 99.9%,
> but many 9's. As you said, the tendency was to use illegal (or
> private-use) codes for non-Mandarin content. All of our internal
> software and any external software that we talk to will expect
> Mandarin to be tagged as 'zh' for the foreseeable future. Of course,
> we recognize that others may end up using 'zh-cmn' / 'cmn', so we're
> prepared to deal with that.
>
> Note also that really the whole premise of extlang is that 'zh'
> continues to normally map to Mandarin. After all, if 'zh' really
> meant that you were as likely to get Gan or Hakka as Mandarin,
> then having "zh-yue" in order to get some kind of automatic
> fallback wouldn't make any sense.
>
> Other comments below.
>
> On Thu, May 29, 2008 at 6:04 PM, Broome, Karen
> <Karen_Broome at spe.sony.com> wrote:
>
>     Mark,
>
>     One thing I think you aren't acknowledging is that "treat as
>     synonyms" means something very different to the vast numbers of
>     content creators who use this standard than it does to the
>     handful of search engines that use the fuzzy logic associated
>     with companion standards.
>     As you note in your document, "It is clear that
>     companies like Google or Yahoo can work around the problems with
>     extlang." How many other users need and can afford to implement
>     the extended fallback and filtering logic? Enough that this logic
>     should be the primary driver behind the chosen solution?
>
>     Before I spend too much time picking apart your lengthy screed
>     involving a scenario where the BBC presents its web site in
>     Sudanese Creole Arabic with rotating language-code logic for each
>     day of the week ... (ahem) ... here's my real-world Chinese
>     language list:
>
>     Chinese (Variant Unknown)
>     Chinese (Cantonese, Spoken)
>     Chinese (Cantonese, Written)
>     Chinese (Mandarin, Spoken)
>     Chinese (Mandarin, Spoken Taiwanese)
>     Chinese (Mandarin, Simplified)
>     Chinese (Mandarin, Traditional)
>     Chinese (Taiwanese, Spoken)
>     Chinese (Taiwanese, Written)
>
> Sorry you consider it a screed. I realize that the emails have
> sometimes gotten heated -- email really is a poor substitute for audio
> discussions on controversial issues; I've seen many, many issues in
> Unicode and other standards flare for months in email, and be resolved
> in a few hours of discussion.

Which leads me to think that a meeting of a few hours, maybe even
face-to-face, between the two of you and potentially others could be
very helpful.

Felix

> My real point is that if a query for 'ar' really means "give me any
> kind of Arabic", then a query for 'ar' would be almost meaningless,
> since it could return any of a number of mutually incomprehensible
> alternatives. Although 639-3 now defines it to be "any Arabic", in
> practice what users will expect to get back is Standard Arabic, and
> they would be unpleasantly surprised to get back other varieties. And
> our purpose should be to avoid our users' getting unpleasant surprises.
>
>     (Apologies, this is hard to represent in ASCII. I have a
>     mini-spreadsheet if someone wants it.)
>
>         1            2         3        4
>     a.  zh           zh        zh       zh
>     b.  zh-yue       yue       yue      yue
>     c.  zh-yue       yue       yue      yue
>     d.  zh-cmn       cmn       zh       cmn
>     e.  zh-cmn-TW    cmn-TW    zh-TW    cmn-TW
>     f.  zh-cmn-Hans  cmn-Hans  zh-Hans  zh-Hans
>     g.  zh-cmn-Hant  cmn-Hant  zh-Hant  zh-Hant
>     h.  zh-min-nan   nan       nan      nan
>     i.  zh-min-nan   nan       nan      nan
>
> Above modified slightly to add row references.
>
>     * Option #1 (RFC 4646) contains the codes as I have them today.
>
> Note that this is not actually RFC 4646 conformant: zh-cmn-TW is not valid.
>
>     * Option #2 (RFC 4646bis) contains the codes if I choose to go
>       against the grain and use "cmn".
>     * Option #3 (RFC 4646bis) treats "zh" and "cmn" as synonyms;
>       avoids using "cmn" for compatibility.
>     * Option #4 (RFC 4646bis) contains the codes "cmn" for spoken
>       context (where distinction is essential) and "zh" for written
>       context.
>
>     Comments:
>
>     * Option #1 is unambiguous and shows that there is a relationship
>       between these languages. It also preserves the legacy "zh" tag so
>       developers that aren't hip to later versions of BCP 47 or 639-3
>       will have some idea what these tags mean. The tags are maybe
>       longer than they need to be, but if I need a fixed-length tag, I
>       can wait for 639-6. The languages may not be mutually intelligible
>       in some contexts, but they are related.
>
>     * Option #2 is unambiguous, but Microsoft, Google, and Amazon
>       won't be using the same tags for Chinese that I do. Even if I
>       don't follow their lead, others likely will. This worries me.
>       Also, the rules for #2 must include fuzzy guidelines such as, "use
>       the 'zh' tag except when you think it's a bad idea" and "use the
>       shortest tag except when you don't want to." This presents
>       complications in trying to explain some sort of consistent method
>       to the LTRU madness to others. Given this, I start to wish ISO
>       639-6 a safe and speedy passage.
>
>     * Option #3 is what I believe you might suggest, but for me,
>       that's the worst list of all. There are five ambiguous "zh"
>       categories on that list.
>       It follows the "always use the shortest
>       tag" rule and respects history, but it's useless to me from an
>       identification perspective.
>
> Your list is already ambiguous for columns 1 and 2; you are using
> "yue" for two different things (written and spoken). The only change
> it really makes is that you don't have a term for "any Chinese".
>
> RFC 4646 lacks terms for many, many combinations of things: a term for
> "any German" (including de, gsw, ...), "any French", "any
> Scandinavian", or any one of the countless other possible sets of
> languages that people consider to be important for some particular
> purpose. That's why lists of languages are really the appropriate vehicle.
>
>     * Option #4 has three ambiguous tags and means I have to explain
>       to people who aren't in this industry why I use different
>       tags for the same language. This strategy is less ambiguous than
>       #3, but I'm not sure I can explain it to other content creators,
>       for the same reasons as #2, and it presents the spoken/written
>       complication others may not want. In the long run, this seems
>       messy and unclear enough that it will result in bad tagging.
>
>     * Options #2,3,4: In general, it worries me that RFC 4646bis
>       offers so many "preferred" options for the same thing. I really
>       can't see how this simplifies things for anyone.
>
>     I don't have a need for fuzzy fallback scenarios. I need precise
>     tags and mostly simple lookup. I think if you take the fallback
>     scenarios and absurdities out of the document you reference, I
>     don't think there's much left.
>
> The only purpose I have heard for extlang *is* for fallback; that's
> why the document goes into (painful) depth on that topic. For
> identification alone, "zh" and "zh-cmn" really mean just the same
> thing.
> It is only in the context of matching (filtering and lookup)
> that they differ in semantics *because of their behavior*: where "cmn"
> means simply Mandarin, "zh-cmn" effectively means "Mandarin but
> fallback to any Chinese".
>
>     Regards,
>
>     Karen Broome
>
>     >-----Original Message-----
>     >From: ltru-bounces at ietf.org [mailto:ltru-bounces at ietf.org]
>     >On Behalf Of Mark Davis
>     >Sent: Thursday, May 29, 2008 4:00 PM
>     >To: debbie at ictmarketing.co.uk
>     >Cc: LTRU Working Group
>     >Subject: Re: [Ltru] Consensus call: extlang
>     >
>     >What would be useful is to hear from the extlangistas what their
>     >concerns are specifically; many have not given reasons for favoring
>     >encompassed languages into extlang instead of into the primary
>     >language subtag. It would be useful for them to give the scenarios
>     >where they think extlang is an improvement. It would be useful to
>     >find out why they think the scenarios such as in
>     >http://docs.google.com/Doc?docid=dfqr8rd5_676kxxxjhd&hl=en are not a
>     >problem.
>     >
>     >Clearly people think that using the extlang model solves more
>     >problems than it causes, so it would be useful to examine specific
>     >cases and see if that is, in fact, true.
>     >
>     >Mark
>
> --
> Mark
> ------------------------------------------------------------------------
>
> _______________________________________________
> Ltru mailing list
> Ltru at ietf.org
> https://www.ietf.org/mailman/listinfo/ltru
Note Well: Messages sent to this mailing list are the opinions of the senders and do not imply endorsement by the IETF.
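
For readers following the thread, here is a minimal Python sketch of the two behaviours being argued about. It is an illustration under stated assumptions, not anyone's actual implementation: the function names `normalize_mandarin` and `lookup` are hypothetical, the first mirrors the policy Mark describes (treat incoming "zh-cmn" or "cmn" as "zh"), and the second follows the progressive-truncation idea of RFC 4647 lookup in simplified form.

```python
from typing import Iterable, Optional


def normalize_mandarin(tag: str) -> str:
    """Rewrite a leading 'cmn' or 'zh-cmn' to 'zh', keeping later subtags.

    Hypothetical helper: it only models the "treat cmn / zh-cmn as zh"
    policy from the thread, not a full BCP 47 canonicalization.
    """
    parts = tag.split("-")
    lowered = [p.lower() for p in parts]
    if lowered[0] == "cmn":
        parts[0] = "zh"
    elif lowered[:2] == ["zh", "cmn"]:
        del parts[1]
    return "-".join(parts)


def lookup(range_tag: str, available: Iterable[str],
           default: Optional[str] = None) -> Optional[str]:
    """Simplified lookup by truncation, in the spirit of RFC 4647.

    Subtags are dropped from the right until a supported tag matches.
    A single-character subtag left at the end is dropped as well.
    """
    supported = {t.lower(): t for t in available}
    parts = range_tag.lower().split("-")
    while parts:
        candidate = "-".join(parts)
        if candidate in supported:
            return supported[candidate]
        parts.pop()
        if parts and len(parts[-1]) == 1:
            parts.pop()
    return default


if __name__ == "__main__":
    # With extlang, Cantonese content falls back to generic Chinese.
    print(lookup("zh-yue", ["zh", "en"]))   # zh
    # Without extlang, the bare 'yue' tag has nothing to truncate to.
    print(lookup("yue", ["zh", "en"]))      # None
    # Normalizing first lets legacy 'zh-Hans' content still match.
    print(lookup(normalize_mandarin("zh-cmn-Hans"), ["zh-Hans"]))  # zh-Hans
```

The first two calls show the difference in matching behaviour that Mark points to: "zh-yue" truncates to "zh", while "yue" alone never reaches it. Whether that fallback is desirable is exactly what Karen's list of precise tags calls into question.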