
Re: [Ltru] Uniqueness of variant subtags



Phillips, Addison 2008-10-03 00.58:

You here use the word "romanization" - aka latinisation -
instead of transcription. Can we rule out that UNGEGN will
not specify e.g. Cyrillic transcriptions? (But if they should
add Cyrillic transcriptions, then one could just add e.g.
"en-Cyrl" to the list of prefixes.)

Mark is talking about a generic variant: one with no prefixes
at all.


Mark said: 'ungegn' [...]  productively used with 'ru-Latn' [...]
But in an earlier message he said:

Type: variant
Subtag: 2003       [ snip ]
Prefix: be-ungegn

I think Prefix: be-Latn-ungegn would have been more consistent.

Others have pointed out the fallacy in this example, which does
not invalidate the point. Registrations SHOULD be specific
about script or region or even variant (in addition to
language) when they are actually confined to specific usages.
See, for example, the registry record for '1994'.


It is interesting to compare the prefixes allowed for "1996",

	Prefix: de

to those allowed for "1994":

	Prefix: sl-rozaj
	Prefix: sl-rozaj-biske
	Prefix: sl-rozaj-njiva
	Prefix: sl-rozaj-osojs
	Prefix: sl-rozaj-solba

Although "sl-rozaj-lipaw" is not listed, "sl-rozaj" is. So how can we know that "sl-rozaj-lipaw-1994" is not permitted? The only answer is: by applying the logic that the Prefix fields list *all* the possible "minimum prefixes".
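To make the "minimum prefix" reading concrete, here is a minimal sketch. It assumes that a variant is acceptable whenever the subtags preceding it begin with one of the listed Prefix values. The function names are hypothetical, and this is a simplified left-anchored comparison, not the normative procedure from the RFC.

```python
def subtags(tag):
    """Split a language tag into lowercase subtags."""
    return tag.lower().split("-")


def allowed_by_minimum_prefix(tag, variant, prefixes):
    """Return True if the subtags before `variant` start with a listed prefix.

    Assumption: Prefix values are treated as left-anchored minimum prefixes,
    so extra subtags between the prefix and the variant are tolerated.
    """
    parts = subtags(tag)
    variant = variant.lower()
    if variant not in parts:
        return False
    before = parts[:parts.index(variant)]
    for prefix in prefixes:
        wanted = subtags(prefix)
        if before[:len(wanted)] == wanted:
            return True
    return False


# Prefix values copied from the registry record for "1994" quoted above.
PREFIXES_1994 = [
    "sl-rozaj",
    "sl-rozaj-biske",
    "sl-rozaj-njiva",
    "sl-rozaj-osojs",
    "sl-rozaj-solba",
]

print(allowed_by_minimum_prefix("sl-rozaj-lipaw-1994", "1994", PREFIXES_1994))
print(allowed_by_minimum_prefix("sl-1994", "1994", PREFIXES_1994))
```

Under this reading "sl-rozaj-lipaw-1994" passes because "sl-rozaj" is listed, while "sl-1994" does not, because no listed prefix is as short as "sl".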

But then, following the same logic, it should also be possible to write "de-NO-1996". In other words, I think the 1996 registration should have listed these "minimum prefixes":

	Prefix: de
	Prefix: de-DE
	Prefix: de-AT
	Prefix: de-CH
	Prefix: de-LI

Only then would "1996" and "1994" appear to follow the same rules.

"1996" and "1994" have much in common: I don't think that 1996 lays down a single, 100% identical norm for all four countries. Just as with the 1994 norm, 1996 has room for some region/location-specific variants. And, just as for the Lipovaz/lipaw dialect of Resian, there are some locations (Luxembourg/"LU") that were not formally covered by the reform.

Hence one should think that "de-LU-1996" would be wrong, just as "de-NO-1996" would be.

The taggers would then have to know the Suppress-Script rules
in order to understand that in real life tagging, they ought
to write "de-DE-1996" and not "de-DE-Latn-1996". And they
would also have

Uh... taggers already need to know when to use subtags. They
are specifically advised not to use scripts unless it adds
something to the tag,


Right. My proposal was not to change that advisory.

a specific example of the more general
advisory not to use subtags that add nothing to the overall
tag. Most German applications are actually fine with "de". In
some cases "de-DE" or "de-DE-1996" are appropriate. Rarely
"de-Latn-DE-1996" is appropriate---probably when the German in
question is also rendered with other scripts in the same
document/collection.


to know the rules in order to know that they can/should/could
also often drop "DE" and just write "de-1996".

RTFRFC.

Of course. However, in order to know that the 1996 norm applies to the German as used in the regions "DE", "CH", "AT" and "LI", one must read Wikipedia - not the RFC.

Just because a language subtag has (or hasn't) a
Suppress-Script field, does not necessarily imply that the
variant subtag has the same - or the same lack of - script
association. Therefore I think

Yes it does.


OK. But there are some language subtags for which there is no Suppress-Script field, e.g. the Turkmen language. So, for the hypothetical 2008 reform of the Turkmen language, one would have to include "Latn" as part of the prefix in order to make clear what the reform relates to:

	Prefix: tk-latn

The question then is: How do you distinguish this from the transliteration variants you discuss below, where the "latn" is a *required* prefix (which it probably should not be for "tk")?

Possible answer: in order to allow tk-1998, yet at the same time make clear in the registry that it relates to Turkmen in Latin script as used in Turkmenistan, the following prefixes would have to be listed, so that all "minimum prefix" variants are covered:

	Prefix: tk
	Prefix: tk-latn
	Prefix: tk-TM

This would rule out adding "1998" after any region subtag other than -TM or after any script subtag other than -latn.
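A minimal sketch of one strict reading of that list, for illustration only: it assumes that whatever precedes the variant must be exactly one of the listed Prefix values. The function name is hypothetical, and the "1998" record is the hypothetical Turkmen registration from this message, not a real registry entry.

```python
def allowed_by_exact_prefix(tag, variant, prefixes):
    """Return True if the part of `tag` before `variant` equals a listed prefix.

    Assumption: the Prefix list is exhaustive, so nothing may sit between a
    listed prefix and the variant. Comparison is case-insensitive.
    """
    parts = tag.lower().split("-")
    if len(parts) < 2 or parts[-1] != variant.lower():
        return False
    before = "-".join(parts[:-1])
    return before in {p.lower() for p in prefixes}


# Hypothetical Prefix values proposed above for a Turkmen "1998" variant.
PREFIXES_1998 = ["tk", "tk-latn", "tk-TM"]

for candidate in ["tk-1998", "tk-Latn-1998", "tk-TM-1998",
                  "tk-Cyrl-1998", "tk-AF-1998", "tk-Latn-TM-1998"]:
    print(candidate, allowed_by_exact_prefix(candidate, "1998", PREFIXES_1998))
```

Note that under this strict reading "tk-Latn-TM-1998" is also rejected unless "tk-latn-TM" is listed as well, which shows how quickly the list of prefixes grows.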

Note that transliteration variants, such as
'wadegile', have a Prefix (zh-Latn, in that case) to convey the
fact that the script is needed. Although 'zh' does not suppress
a script, the same thing can be said of (for example)
romanizations of other languages. "be" requires no script and
suppresses "Cyrl", but "be-Latn-ungegn" would probably be a
good choice if 'ungegn' were a valid subtag representing a
Latin transcription scheme.


OK. So the current tradition is that the Prefix field lists the *minimum prefix* that must be present before the variant subtag in question can be used.

If one cannot use the Prefix field to do this, then one
should invent a new field for this particular purpose.
Perhaps a Relates-to field.

I think this is overkill. At some point we have to let the LSR
register subtags and at some point we have to let people tag
stuff. It is difficult enough--and maybe too difficult--for
people to understand the information there today.


I agree. But then we must commit overkill by adding more prefix fields instead, I think, such as I proposed for Turkmen above.
--
leif halvard silli


