Re: [Ltru] Great Script Debate "the Next Generation"... (long)
Thoughts:
I wonder whether assigning only commonly used scripts to a "script"
designator -- as opposed to a "suppress-script" designator -- might lead
people to assume that other, more obscure scripts not entered into the
registry are invalid. (Granted, this problem may exist with today's
suppress-script field as well, but allowing more than one value might
present new problems.)
If a user had a Somali document written
in Osmanya and didn't find Osma as a choice for "script" because
it's not as commonly used (today), would the classifier assume that so-Osma
and so-Arab were invalid or deprecated under RFC 4646bis? To avoid this
confusion, I fear the theoretical Army of Charitable Linguists might be
required to do the script research to populate the registry with additional
script variations.
On the other hand, it would be very
useful to be able to see all common script options for a particular language
in the registry when setting up systems.
Regards,
Karen Broome
Metadata Systems Designer
Sony Pictures Entertainment
310.244.4384
Addison Phillips <addison at yahoo-inc.com>
10/05/2006 11:58 AM
To: "'LTRU Working Group'" <ltru at ietf.org>
Subject: [Ltru] Great Script Debate "the Next Generation"... (long)
All,
Taking up the gauntlet flung down by John Cowan :-), herewith my
proposal for fixing Suppress-Scripts.
----
First, some recap of the problem. Suppress-Scripts are meant to identify
languages that are written predominantly in a single script. This is to
warn users and implementers not to form language tags using a script
subtag that is usually redundant, for compatibility with tagging
practices prior to RFC 4646.
S-S poses a number of interesting problems.
I've cited previously the registration problems. Mainly the problem here
is that most languages fit the pattern of "wanting" some form of S-S
field. Even languages that might not have a clear relationship to a
specific script subtag (cf. Doug's research on Korean) probably should
not use a script subtag.
In particular, creating accurate values for even the ISO 639-1 and ISO
639-2 set of languages would require significant knowledge about the
current and recent historical writing traditions of each given language,
plus, possibly, some knowledge of public policy and/or potential
suppression or abuse of minority tradition in regard to that language.
S-S indicates that a given language is written predominantly in a
specific script, so the burden of proof for a less common language might
be very difficult to meet: the presence of many texts in a specific
script does not "prove the negative", that is, that no significant body
of texts or specific writing tradition exists that uses a separate
script.
The main alternative we've dealt with in the past would be an
"Accept-Script" or "Recommend-Script" approach. That design, which was
not adopted in RFC 4646, involves documenting the known cases in which a
script subtag *should* be used, and, in effect, recommending that
languages that do not have an "A-S" field not use a script subtag except
when indicating a specific difference important within a given group of
information items.
A-S avoids the "proving the negative" problem of S-S. Since it applies
to a much smaller set of languages, it probably requires less
registration overhead. The burden of proof may be just as difficult to
meet, however, and is encumbered with essentially the same problems that
attend S-S, since assertions about multiple script usage are just as
potentially disruptive as assertions about the "single scriptness" of a
language.
Removing script information from the registry altogether is appealing as
an alternative. As Mark points out, script subtags are entirely
voluntary and entirely valid. The informational nature of the S-S field
is merely to help guide implementers and users to try and do the right
things. If maintenance is a nightmare, why persist in maintaining
somewhat fictional information?
I do think that guidance for users/implementers is a valid goal here. My
experience as an "eminence grise" for language tags over the past couple
of years is that the level of ill-informedness and mythology surrounding
language tags is pretty deep. Anything we can do to help speed proper
implementation of language tags will help.
The problem here is that I think we're miscasting the role of the script
advisory field or fields in the registry. If we only document the "do
not use" case, users and implementers will remain ignorant of what to do
for languages without the S-S field. Having explained the Chinese issue
several times, it's clear to me that many implementers will not stumble
over the right subtags by accident... and certainly not for languages
such as Serbian, Uzbek, or Azerbaijani.
If we only document the "do use" cases, though, users and implementers
may not notice the warnings against use of scripts elsewhere, leading to
the very problem we initially sought to prevent.
Thus, my proposal for solving the problem:
1. Include the strongest possible warnings about not using script
subtags in 4646bis. This is probably embodied in Karen Broome's
suggested texts.
2. Replace "Suppress-Script" with "Script". If no script field is
supplied, language tags/ranges should still not use a script subtag
unless one is warranted by the information item or request. If the
script field is present and contains a single item, the language is
known to use that script predominantly. If two or more items are
present, the language is commonly written in more than one script. Users
are advised *not* to use a script subtag unless the language has more
than one item in the Script field.
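As a rough illustration of point 2, the default advice could be sketched
as below. This is only a sketch under stated assumptions: the SCRIPT_FIELD
table, the function names, and the "warranted" flag are hypothetical and
not part of any registry format or draft text, and the two entries are
illustrative examples rather than actual registry data.

```python
from typing import Dict, List, Optional

# Hypothetical excerpt: language subtag -> values of the proposed "Script"
# field. Illustrative entries only, not actual registry data.
SCRIPT_FIELD: Dict[str, List[str]] = {
    "en": ["Latn"],          # one item: written predominantly in that script
    "sr": ["Cyrl", "Latn"],  # two items: commonly written in both scripts
}


def script_advice(language: str) -> str:
    """Return "use" or "omit" as the default advice for a script subtag."""
    scripts = SCRIPT_FIELD.get(language, [])
    # Zero or one item behave the same: the default is to omit the subtag.
    return "use" if len(scripts) >= 2 else "omit"


def default_tag(language: str, script: Optional[str] = None,
                warranted: bool = False) -> str:
    """Form a tag, adding the script subtag only when advised or warranted.

    "warranted" stands for the case where the information item or request
    itself calls for a script subtag; such tags remain valid.
    """
    if script and (warranted or script_advice(language) == "use"):
        return f"{language}-{script}"
    return language
```

With these assumptions, default_tag("sr", "Latn") yields "sr-Latn" and
default_tag("en", "Latn") yields "en". A language with no Script field,
such as Somali written in Osmanya, still gets "so-Osma" when the caller
passes warranted=True, since the absence of a field does not make the
subtag invalid.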
Potential issues:
1. Zero or one item in the Script field produce the same behavior.
Acquiring a second script, however, requires extra scrutiny by
ietf-languages because it changes the potential default behavior for tag
formation for that language.
Reactions?
--
Addison Phillips
Globalization Architect -- Yahoo! Inc.
Internationalization is an architecture.
It is not a feature.
_______________________________________________
Ltru mailing list
Ltru at ietf.org
https://www1.ietf.org/mailman/listinfo/ltru
Note Well: Messages sent to this mailing list are the opinions
of the senders and do not imply endorsement by the IETF.