
Re: [Ltru] Re: script tag for IPA



Martin Hosken wrote:

The proper "repair" to this issue is to fix ISO 15924. Multiple script
subtags would be very difficult for users to understand and use
consistently. And we'd have to deal with canonical ordering rules,
prefix checking, and all sorts of other nastiness---all to figure out
which Latin transcription was used? Bah.

The registrar of ISO 15924 has indicated that he has no intention of
ever giving IPA a script code and that it is a variant of Latn. Perhaps
you can get him to change his mind, but I doubt it. So where does that
leave me? How do I tag text in the IPA script that can be in any
language? You are asking me to live between a rock and a hard place.

Very few ISO standards are entirely under the thumb of one person's opinion. Just because Michael indicated his opposition on an unaffiliated email list doesn't mean that a carefully prepared request to ISO 15924 would fail.

I'm not asking you to live between Scylla and Charybdis. I'm asking you to carefully consider what you're asking for. The idea you presented would break a number of principles and commitments in the formation of language tags. Furthermore, I note that ietf-languages' history as a registry is somewhat uneven, something that was reined in under RFC 4646 on purpose. I also note that adding more complex and arcane structures to language tags strikes me as questionable. The "finger of doubt" points at 15924 as the source of scripts. I would exhaust that angle first.


As Mark has stated, we need something to indicate that a script variant
is more significant than a region.

For starters, I don't believe in "script variants". I think there is a disconnect between the granularity of "script" as currently embodied in 15924 and "script" as needed in language tags. Mark and I were of the (possibly mistaken) impression that 15924 would be a bit more expansive in their interpretation of "script".

For example, please prioritise the
aspects of "UK Glaswegian English written in IPA" in terms of the
components that have the most significance on the text and you will find
that UK comes last and Glaswegian second to last. But if IPA is marked
by an extension, it will come last.

So? Language tags cannot do everything. At some point one must look at the fact that two bodies of text have different tags and conclude that they are different, possibly mutually unintelligible, entities. RFC 4646 even says this in Section 4.2. While the relative importance of various subtags is a key feature, there may be times that the subtags cannot entirely reflect the real relationships of the variations.
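To make the point about subtag order concrete, here is a minimal sketch of the end-first truncation used by lookup-style matching (as I understand the Lookup scheme in RFC 4647). The function name `fallback_chain` is my own invention for illustration. Whatever sits last in the tag is the first thing a fallback discards, which is why position implies significance.

```python
def fallback_chain(tag):
    """Return the tag followed by its progressively truncated forms.

    Illustrative only: this does no validation and assumes a
    hyphen-separated tag.
    """
    subtags = tag.split("-")
    chain = []
    while subtags:
        chain.append("-".join(subtags))
        subtags.pop()
        # A single-character subtag (such as "x") cannot end a tag,
        # so it is dropped together with the subtag that followed it.
        while subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return chain


if __name__ == "__main__":
    for candidate in fallback_chain("en-Latn-GB-glasgow"):
        print(candidate)
```

With "en-Latn-GB-glasgow" the variant is lost first and the script last; a transcription marker placed at the very end of the tag would be the first thing dropped.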

And I think the correct resolution to the problem would be to get 15924 to register additional codes or to get a source for codes that mirrors 15924 while adding additional codes to suit the needs of (say) language and locale identification. "en-Lipa-GB-glasgow" would be the best solution to your conundrum.
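As a rough illustration of how such a tag would decompose, the sketch below classifies subtags purely by their shape, following my reading of the RFC 4646 syntax. It is deliberately simplified: it ignores extended language subtags, extensions, and grandfathered tags, and it does not consult any registry. The function name `split_tag` is hypothetical, and "Lipa" is the made-up code from this message, not an ISO 15924 code.

```python
import re


def split_tag(tag):
    """Classify subtags by shape only (simplified, no registry lookup)."""
    subtags = tag.split("-")
    result = {"language": subtags[0].lower(), "script": None,
              "region": None, "variants": [], "private": []}
    i = 1
    # Script: exactly four letters, directly after the language.
    if i < len(subtags) and re.fullmatch(r"[A-Za-z]{4}", subtags[i]):
        result["script"] = subtags[i].title()
        i += 1
    # Region: two letters or three digits.
    if i < len(subtags) and re.fullmatch(r"[A-Za-z]{2}|[0-9]{3}", subtags[i]):
        result["region"] = subtags[i].upper()
        i += 1
    # Variants: 5-8 alphanumerics, or 4 characters starting with a digit.
    variant_shape = r"[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}"
    while i < len(subtags) and re.fullmatch(variant_shape, subtags[i]):
        result["variants"].append(subtags[i].lower())
        i += 1
    # Private use: everything after the singleton "x".
    if i < len(subtags) and subtags[i].lower() == "x":
        result["private"] = [s.lower() for s in subtags[i + 1:]]
        i = len(subtags)
    if i != len(subtags):
        raise ValueError("unexpected subtag: " + subtags[i])
    return result


if __name__ == "__main__":
    print(split_tag("en-Lipa-GB-glasgow"))
    print(split_tag("en-x-ipa-glasgow"))
```

Because the check is by shape alone, any four-letter subtag directly after the language lands in the script slot, whether or not 15924 has registered it.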


In discussions with the ISO 15924 registrar on this, he seemed open to
the idea of extending the private use script code space. In addition, I
agree that since a script variant (in my 4 character scheme) would
always occur after a real script, there is no need to worry about
codespace overlap.

Script "variants" are just scripts in another container. In a language tag, I see very little benefit to having "el-Grek-mono" instead of "el-mono", or "en-Latn-Lipa" instead of "en-Lipa".

Not to mention: if script variations aren't registered in 15924, where
will they come from? What rules will be applied to their registration?
Why does anyone think ietf-languages will be a good arbiter of said
variations?

ISO 15924 hasn't scored too highly for us so far. Addressing what a
script variant really is will need some discussion, of course.

Yes, and from hard experience I don't believe that ietf-languages is a better solution. A few email exchanges with ISO 15924 folks do not indicate utter failure, and other ideas might be useful before creating new language subtag fauna.


Remember that ISO 15924 isn't our standard to control. It's coming from
TC46.

"Controlling a standard" implies some measure of responsibility. In this case, RFC 4646 strives to *avoid* making up its own rules and "working around" the underlying ISO standards. In fact, registration of language tags has a long and dubious history in this regard: many existing registrations were later regretted when (for example) ISO 639 took action at a later date.

The most likely possibility is that ISO 15924, by its definition, does not define all of the orthographic variations that are needed in language tags. While you (or I) might not like the 15924 JAC's decisions, I must admit that they have a certain logic.

However, I also note that modifying language tags is probably not the best method of overcoming this deficiency. There is precedent for creating a project such as an ISO 15924 Part 2 "Codes for the representation of scripts and script variations". Such a project, if it shared a codespace with 15924-1, would be a more methodical and consistent way to register the values you seek.


In the meantime, please send me the form to request 7000 language
variants or extensions (since both are registered by language).

Extensions are NOT AT ALL associated with particular languages (unless the RFC that defines them says they are).

Variants are NOT required to be associated with specific languages, although there is stern resistance by some to "generic" variants. If you were to propose (say) 'phonipa' as a variant with no Prefix field, I would probably support it.
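To show what "no Prefix field" means in practice, here is a small sketch assuming a registry modeled as a dictionary from variant subtag to its list of Prefix values. Both records are hypothetical: 'phonipa' is only the proposal floated above, and 'glasgow' with a Prefix of "en" is invented for contrast. The name `matches_prefix` is likewise my own. Note that, as I read RFC 4646, Prefix is a recommendation rather than a hard constraint, so a failed check signals an unusual tag, not an ill-formed one.

```python
# Hypothetical registry excerpt: variant subtag -> list of Prefix values.
# An empty list models a record with no Prefix field (a "generic" variant).
HYPOTHETICAL_VARIANTS = {
    "phonipa": [],
    "glasgow": ["en"],
}


def matches_prefix(tag, variant, registry=HYPOTHETICAL_VARIANTS):
    """Return True if the part of `tag` before `variant` fits a Prefix.

    Raises ValueError if the variant does not occur in the tag, and
    KeyError if the variant is not in the registry.
    """
    subtags = tag.lower().split("-")
    variant = variant.lower()
    head = "-".join(subtags[:subtags.index(variant)])
    prefixes = registry[variant]
    if not prefixes:
        # No Prefix field: the variant may follow any language.
        return True
    return any(head == p or head.startswith(p + "-") for p in prefixes)


if __name__ == "__main__":
    print(matches_prefix("en-GB-phonipa", "phonipa"))
    print(matches_prefix("gd-phonipa", "phonipa"))
    print(matches_prefix("fr-glasgow", "glasgow"))
```

A generic record passes for every language, which is exactly what a transcription-style variant needs and why one registration could cover all 7000 cases.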

And the form is in RFC 4646.


I would encourage folks to think about how the language tag can be made
productive. If every possible language tag in effect has to be
registered, it will push many tags underground and the x- extension
space will become far more popular than we might want it to be.

Not really. Hardly anyone uses the registered stuff. And the x- private use space was made productive with subtags to facilitate those with specialized needs. I *hope* that more people can derive use from tags such as en-x-ipa-glasgow (oh, hey! there's your tag :-) ).

I, for
example, have to deal with emerging writing systems and storing data in
them for archival purposes long before such writing systems are well
established (in some cases). If every such instance needs to be
registered, RFC 4646 will be seen as a bottleneck to be worked around
rather than something to collaborate with.


Your use case is explicitly what private-use subtags are for: idiosyncratic or specialized requirements.

Yes, every subtag you want to use has to be registered if you want it to be non-private-use. But previously you needed to register entire tags (including every variation you wanted), so 4646 greatly improves your situation.

But I would start by trying harder with 15924, followed by looking at creating a registry for "scripts and script variations", preferably within TC 46. No obstacle that does not already confront us today will prevent us from adding "script variants" in the future if they are truly necessary. But I don't see them as strictly necessary yet.

Addison

--
Addison Phillips
Globalization Architect -- Yahoo! Inc.

Internationalization is an architecture.
It is not a feature.

_______________________________________________
Ltru mailing list
Ltru at ietf.org
https://www1.ietf.org/mailman/listinfo/ltru




Note Well: Messages sent to this mailing list are the opinions of the senders and do not imply endorsement by the IETF.