
Re: [Ltru] Preferred-Value cycles



+1 generally

Frank Ellermann wrote:
Doug Ewell wrote:

Okay, my first impression is "nothing special or unexpected":
there's no A -> C shortcut for A -> B -> C yet, because that's
not yet in the rules (4646bis), correct?

"Yet" is an understatement.  There is no consensus for making this
destabilizing change.

The version without shortcuts is worse: applications are forced to
follow Preferred-Value chains until they (hopefully) arrive at a
tag or subtag without a Preferred-Value.  Or, if the registry is in
a dubious state with cycles, they have to implement their own
sanity check.
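
A minimal sketch of the chain-following and sanity check described above. The `preferred` dict (tag or subtag mapped to its Preferred-Value) and the `xx-AA` style tags are made-up placeholders, not a real registry format or real entries.

```python
def resolve_preferred(tag, preferred):
    """Follow Preferred-Value links from `tag` until a tag without one is reached.

    `preferred` is a hypothetical dict mapping a tag or subtag to its
    Preferred-Value; entries without a Preferred-Value are simply absent.
    Raises ValueError if the chain loops back on itself.
    """
    seen = [tag]
    while tag in preferred:
        tag = preferred[tag]
        if tag in seen:
            # We have been here before, so the registry contains a cycle.
            raise ValueError("Preferred-Value cycle: " + " -> ".join(seen + [tag]))
        seen.append(tag)
    return tag


# Placeholder entries forming the chain A -> B -> C.
registry = {"xx-AA": "xx-BB", "xx-BB": "xx-CC"}
print(resolve_preferred("xx-AA", registry))  # prints xx-CC
```

With A -> C shortcuts in the registry the loop body would run at most once per lookup, but the cycle check is still cheap insurance.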

I think it is a useful policy to administer the registry such that A -> C is always the case rather than A -> B -> C. But forbidding chains outright might not make sense in some cases. In particular, maintaining "stale" pointers allows implementations to see the progression of mappings. This allows a registry processor to reconstruct (within some limits) the state of the registry at various intervening dates.

That is:

  on Date A: xx-AA is the canonical tag
  on Date B: xx-BB is the canonical tag
  on Date C: xx-CC is the canonical tag
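
One way such a reconstruction might look, as a rough sketch. It assumes each record keeps its original ("stale") Preferred-Value together with the date from its Deprecated field, and that the mapping took effect on that date; the `HISTORY` table, its tags, and its dates are invented for illustration.

```python
from datetime import date

# Hypothetical records: tag -> (Preferred-Value, date the tag was deprecated).
# The tags and dates are placeholders, not real registry entries.
HISTORY = {
    "xx-AA": ("xx-BB", date(2006, 1, 1)),
    "xx-BB": ("xx-CC", date(2008, 1, 1)),
}


def canonical_on(tag, when, history=HISTORY):
    """Return the tag that was canonical for `tag` on the date `when`."""
    seen = {tag}
    while tag in history:
        target, deprecated = history[tag]
        if deprecated > when:
            break  # this mapping did not exist yet on `when`
        if target in seen:
            raise ValueError("Preferred-Value cycle at " + target)
        seen.add(target)
        tag = target
    return tag


print(canonical_on("xx-AA", date(2005, 6, 1)))  # prints xx-AA
print(canonical_on("xx-AA", date(2007, 6, 1)))  # prints xx-BB
print(canonical_on("xx-AA", date(2009, 6, 1)))  # prints xx-CC
```

If xx-AA had been rewritten to point straight at xx-CC, the middle answer could no longer be recovered from the registry alone.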

Addison


Of course we don't expect that the registry is ever in a dubious
state; we have "registry validators" for this purpose.  But then
it's again simpler to verify that a Preferred-Value always arrives
at a tag or subtag that has no further Preferred-Value.
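
With shortcuts as the rule, that verification could be a single pass, roughly as sketched here. The `preferred` dict is again a hypothetical stand-in for parsed registry records.

```python
def non_terminal_targets(preferred):
    """Return (source, target) pairs whose target itself has a Preferred-Value.

    `preferred` is a hypothetical dict mapping a tag or subtag to its
    Preferred-Value. An empty result means every Preferred-Value arrives
    at a final tag in one step, which also rules out cycles.
    """
    return sorted((src, dst) for src, dst in preferred.items() if dst in preferred)


# Placeholder entries: a chain is reported, a registry with shortcuts is clean.
print(non_terminal_targets({"xx-AA": "xx-BB", "xx-BB": "xx-CC"}))
print(non_terminal_targets({"xx-AA": "xx-CC", "xx-BB": "xx-CC"}))
```

The first call reports the pair (xx-AA, xx-BB); the second returns an empty list.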

Last but not least it would eliminate the "hakka" reference in a
Preferred-Value, where "hakka" is the only "non-subtag" in any
reference.

There is no "collision" between 5- to 8-letter language subtags
(if any existed) and 5- to 8-letter variant subtags, any more
than there is a collision between "ar" for Arabic and "AR" for
Argentina.

Of course there is for naive implementations.  The theory of the
4646 structure is that you can decompose a tag into subtags and
still tell what's what from each subtag's length and character
class, plus its position:

3 digits => region (UN M.49 number)
3 alpha  => language (for 4646)
4 alpha at begin => language (reserved)
4 alpha later => script
4 alnum starting with a digit => variant
2 alpha at begin => language
2 alpha later => region
5..8 alpha at begin => language
5..8 alnum later => variant

For that a 4646 decomposer only needs to note "at begin" for the
disambiguation of 2 alpha or 4..8 alpha.  For 4646bis it's in
essence the same idea adding:

3 alpha at begin => language (for 4646bis)
3 alpha later => extlang
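
The rules above, including the two 4646bis additions, can be sketched as a small decomposer. This is an illustration only: `classify` is a made-up name, and the sketch ignores extensions, private use, and grandfathered tags, and does not check subtag order or consult the registry.

```python
def classify(tag):
    """Label each subtag of `tag` by length, character class and position."""
    labels = []
    for index, sub in enumerate(tag.split("-")):
        n = len(sub)
        if not (sub.isascii() and sub.isalnum()) or not 2 <= n <= 8:
            kind = "invalid"
        elif index == 0:
            # "at begin": 2..8 alpha is always a language
            kind = "language" if sub.isalpha() else "invalid"
        elif n == 2:
            kind = "region" if sub.isalpha() else "invalid"
        elif n == 3:
            if sub.isalpha():
                kind = "extlang"
            elif sub.isdigit():
                kind = "region"
            else:
                kind = "invalid"
        elif n == 4:
            if sub.isalpha():
                kind = "script"
            elif sub[0].isdigit():
                kind = "variant"
            else:
                kind = "invalid"
        else:
            kind = "variant"  # 5..8 alnum later
        labels.append((sub, kind))
    return labels


print(classify("zh-yue-Hant-CN"))
print(classify("de-1996"))
```

The only context the loop carries is `index == 0`, which is the "at begin" note mentioned above.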

In my old 4646 validator I implemented this by adding a dummy
hyphen to anything that's not "at begin"; after that it was, as
you said, obvious that "AR" (case insensitive) is not "-AR".

For 4646bis that isn't good enough anymore, because language and
extlang share the same namespace: as soon as there is a language
"ang" there can't be an extlang "-ang".

Therefore I took the canonical case without adding a dummy hyphen.
With that I'd catch an erroneous extlang "ang", and case-sensitive
"ar" and "AR" are different.  That's of course how I found the
old yi-latn instead of yi-Latn.

But that's not yet enough for a registered language "abcde" and a
registered variant "abcde"; I still have to implement some logic
for the variants (and references to variants).
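
To illustrate the keying problem, here is a rough sketch, not the validator itself. The record list is invented (the extlang "ang" and the "abcde" pair are hypothetical), and `canonical_case` and `shared_keys` are made-up helper names.

```python
from collections import defaultdict


def canonical_case(kind, subtag):
    """Apply the conventional case for a record type."""
    if kind == "region":
        return subtag.upper()   # e.g. AR
    if kind == "script":
        return subtag.title()   # e.g. Latn
    return subtag.lower()       # language, extlang, variant


def shared_keys(records):
    """Map each canonical-case key used by more than one record type to those types."""
    types_by_key = defaultdict(set)
    for kind, subtag in records:
        types_by_key[canonical_case(kind, subtag)].add(kind)
    return {key: sorted(kinds) for key, kinds in types_by_key.items() if len(kinds) > 1}


records = [
    ("language", "ar"), ("region", "AR"),
    ("language", "ang"), ("extlang", "ang"),      # hypothetical extlang
    ("language", "abcde"), ("variant", "abcde"),  # hypothetical pair
]
print(shared_keys(records))
# prints {'ang': ['extlang', 'language'], 'abcde': ['language', 'variant']}
```

Case alone keeps "ar" and "AR" apart, and the shared "ang" key is the error the validator should catch. The shared "abcde" key is a legitimate pair, which is why variants need the extra logic.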

With a registry validator anybody including IANA can check that a
registry is in a plausible state.  Or verify that a proper subset
is complete (all references resolved).

Frank



_______________________________________________
Ltru mailing list
Ltru at ietf.org
https://www1.ietf.org/mailman/listinfo/ltru

--
Addison Phillips
Globalization Architect -- Yahoo! Inc.

Internationalization is an architecture.
It is not a feature.

Note Well: Messages sent to this mailing list are the opinions of the senders and do not imply endorsement by the IETF.