
[Ltru] Great Script Debate "the Next Generation"... (long)



All,

Taking up the gauntlet flung down by John Cowan :-), herewith my
proposal for fixing Suppress-Scripts.

----
First, some recap of the problem. Suppress-Scripts are meant to identify
languages that are written predominantly in a single script. This is to
warn users and implementers not to form language tags using a script
subtag that is usually redundant, for compatibility with tagging
practices prior to RFC 4646.
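
To make the intent concrete, here is a rough sketch of how an
implementation might consult S-S when forming a tag. The table below is
a small illustrative subset typed in by hand, not a copy of the
registry, and the function name is made up for this example. It only
handles tags whose second subtag is the script (no extended language
subtags).

```python
# Illustrative subset only; a real implementation would read these
# values from the language subtag registry.
SUPPRESS_SCRIPT = {"en": "Latn", "ru": "Cyrl"}


def drop_redundant_script(tag: str) -> str:
    """Remove the script subtag if it matches the language's Suppress-Script."""
    subtags = tag.split("-")
    language = subtags[0].lower()
    # A script subtag is four letters; this sketch assumes it directly
    # follows the primary language subtag.
    if len(subtags) > 1 and len(subtags[1]) == 4 and subtags[1].isalpha():
        if subtags[1].title() == SUPPRESS_SCRIPT.get(language):
            return "-".join([subtags[0]] + subtags[2:])
    return tag


print(drop_redundant_script("en-Latn-US"))  # script is redundant for "en"
print(drop_redundant_script("sr-Latn"))     # no S-S entry here, tag kept
```

Note that a language with no entry is simply left alone, which is part
of the problem discussed below: the absence of S-S tells the
implementer nothing.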

S-S poses a number of interesting problems.

I've previously cited the registration problems. The main problem here
is that most languages fit the pattern of "wanting" some form of S-S
field. Even languages that might not have a clear relationship to a
specific script subtag (cf. Doug's research on Korean) probably should
not use a script subtag.

In particular, creating accurate values for even the ISO 639-1 and ISO
639-2 set of languages would require significant knowledge about the
current and recent historical writing traditions of each given language,
plus, possibly, some knowledge of public policy and/or potential
suppression or abuse of minority tradition in regard to that language.

S-S indicates that a given language is written predominantly in a
specific script, so the burden of proof for a less common language might
be very difficult to meet, since the presence of many texts in a
specific script does not "prove the negative", that is, that a
significant body of texts or a specific writing tradition does not exist
that uses a separate script.

The main alternative we've dealt with in the past would be an
"Accept-Script" or "Recommend-Script" approach. That design, which was
not adopted in RFC 4646, involves documenting the known cases in which a
script subtag *should* be used, and, in effect, recommending that
languages that do not have an "A-S" field not use a script subtag except
when indicating a specific difference important within a given group of
information items.

A-S avoids the "proving the negative" problem of S-S. Since it applies
to a much smaller set of languages, it probably requires less
registration overhead. However, the burden of proof may be just as
difficult to meet, and it is encumbered with essentially the same
problems that attend
S-S, since assertions about multiple script usage are just as
potentially disruptive as assertions about the "single scriptness" of a
language.

Removing script information from the registry altogether is appealing as
an alternative. As Mark points out, script subtags are entirely
voluntary and entirely valid. The informational nature of the S-S field
is merely to help guide implementers and users to try and do the right
things. If maintenance is a nightmare, why persist in maintaining
somewhat fictional information?

I do think that guidance for users/implementers is a valid goal here. My
experience as an "eminence grise" for language tags over the past couple
of years is that the level of ill-informedness and mythology surrounding
language tags is pretty deep. Anything we can do to help speed proper
implementation of language tags will help.

The problem here is that I think we're miscasting the role of the script
advisory field or fields in the registry. If we only document the "do
not use" case, users and implementers will remain ignorant of what to do
for languages without the S-S field. Having explained the Chinese issue
several times, it's clear to me that many implementers will not stumble
over the right subtags by accident... and certainly not for languages
such as Serbian, Uzbek, or Azerbaijani.

If we only document the "do use" cases, though, users and implementers
may not notice the warnings against use of scripts elsewhere, leading to
the problem we initially sought to prevent.

Thus, my proposal for solving the problem:

1. Include the strongest possible warnings about not using script
subtags in 4646bis. This is probably embodied in Karen Broome's
suggested texts.

2. Replace "Suppress-Script" with "Script". If no script field is
supplied, language tags/ranges should still not use a script subtag
unless one is warranted by the information item or request. If the
script field is present and contains a single item, the language is
known to use that script predominantly. If two or more items are
present, the language is commonly written in more than one script. Users
are advised *not* to use a script subtag unless the language has more
than one item in the Script field.
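
As a rough sketch of the rule in item 2, assuming a hypothetical
"Script" field that holds a list of script subtags per language: the
table contents and the function name below are invented for
illustration and are not proposed registry values.

```python
# Hypothetical "Script" field values, for illustration only.
SCRIPT_FIELD = {
    "en": ["Latn"],          # one item: predominantly a single script
    "sr": ["Cyrl", "Latn"],  # two items: commonly written in both
    # "ko" deliberately has no entry in this hypothetical table
}


def script_subtag_suggested(language: str) -> bool:
    """Return True only if the language lists more than one script.

    A missing field and a single-item field give the same advice:
    do not use a script subtag unless the content or request
    specifically warrants one.
    """
    scripts = SCRIPT_FIELD.get(language.lower(), [])
    return len(scripts) > 1


for lang in ("en", "sr", "ko"):
    print(lang, script_subtag_suggested(lang))
```

The single-item case carries information for readers of the registry
(which script is predominant) but does not change the advice.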

Potential issues:

1. A Script field with no items and one with a single item produce the
same behavior. Acquiring a second script, however, requires extra
scrutiny by ietf-languages because it changes the potential default
behavior for tag formation for that language.


Reactions?



--
Addison Phillips
Globalization Architect -- Yahoo! Inc.

Internationalization is an architecture.
It is not a feature.

