One problem with scripts is that scenarios fall along one or more continua rather than into a few distinct buckets. English is written in Latin script 99.99% of the time, with certain limited orthographic variations. Arabic is similar. At the other extreme, we know that Hans and Hant are both viable and widely used written forms for zh.

Now consider Somali: any new text is written in Latin, but just a few decades ago this was transitional. Or consider Balkan languages: a language might be written in more than one script, with one clearly the majority case though the other can't be disregarded as rare. These and other cases fall somewhere between the two extremes.

For the cases at the two extremes, whether we declare a known script (en uses Latn; zh uses Hans, Hant, Hani) or declare scripts to be suppressed (don't use Latn with en) might not make a huge difference. It's the in-between cases that present the biggest problems.

Even at the two extremes, though, we need to consider how to deal with the less common usages: e.g. English in shorthand, Chinese in Bopomofo. Clearly tags for these should have a script subtag; it's just not clear what we ought to declare about the languages in the registry. If we're declaring scripts, then for en do we just list Latn? If so, then we're not really declaring all scripts used for that language; we're only declaring commonly used scripts.

Maybe that's another alternative to consider: we have two data fields that can be used, one to declare scripts in common usage for modern orthographies, and another to declare other scripts that are known to be used (but only in limited, uncommon scenarios). If the former field has only one value, then we can assume it is safe to suppress that script.

Just thinking out loud...

Peter Constable
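To make the two-field idea concrete, here is a rough sketch in Python. Everything in it is hypothetical: the `LanguageRecord` type, the field names `common_scripts` and `other_scripts`, and the helper functions are illustration only, not registry syntax. The sample values are assumptions too; in particular, Dsrt (Deseret) stands in for an uncommon script for English.

```python
from dataclasses import dataclass, field


@dataclass
class LanguageRecord:
    """Hypothetical registry record; the field names are illustrative only."""
    subtag: str
    # Scripts in common usage for modern orthographies.
    common_scripts: list = field(default_factory=list)
    # Scripts known to be used, but only in limited, uncommon scenarios.
    other_scripts: list = field(default_factory=list)


def suppressible_script(record):
    """Return the script that is safe to omit from tags, or None.

    Rule from the proposal: if exactly one script is declared as
    common, that script can be assumed and therefore suppressed.
    """
    if len(record.common_scripts) == 1:
        return record.common_scripts[0]
    return None


def format_tag(record, script):
    """Build a language tag, dropping the script subtag when suppressible."""
    if script == suppressible_script(record):
        return record.subtag
    return f"{record.subtag}-{script}"


# Sample data (assumed values, for illustration only).
EN = LanguageRecord("en", common_scripts=["Latn"], other_scripts=["Dsrt"])
ZH = LanguageRecord("zh", common_scripts=["Hans", "Hant"], other_scripts=["Bopo"])

print(format_tag(EN, "Latn"))
print(format_tag(EN, "Dsrt"))
print(format_tag(ZH, "Hans"))
```

With these sample records, English in Latin comes out as a bare language subtag, while the uncommon English case and every Chinese case keep an explicit script subtag. The in-between cases are still a judgment call: someone has to decide which field each script belongs in.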
_______________________________________________ Ltru mailing list Ltru at ietf.org https://www1.ietf.org/mailman/listinfo/ltru