+1 generally.

Frank Ellermann wrote:
Doug Ewell wrote:

Okay, my first impression is "nothing special or unexpected": there's no A -> C shortcut for A -> B -> C yet, because that's not yet in the rules (4646bis), correct?

"Yet" is an understatement. There is no consensus for making this destabilizing change.

The version without shortcuts is worse. Applications are forced to follow Preferred-Value chains until they (hopefully) arrive at a tag or subtag without a Preferred-Value. Or, if the registry is in a dubious state with cycles, they have to implement their own sanity check.
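A minimal sketch of the chain-following that applications would need without shortcuts. It assumes a hypothetical in-memory registry modeled as a dict from tag or subtag (already in canonical case) to its Preferred-Value, or None when there is none; parsing the real registry file is out of scope. The cycle check plays the role of the "sanity check" for a registry in a dubious state.

```python
from typing import Dict, Optional

# Hypothetical model: tag or subtag -> its Preferred-Value (None if it has none).
Registry = Dict[str, Optional[str]]


def resolve_preferred_value(registry: Registry, tag: str) -> str:
    """Follow Preferred-Value links from `tag` to an entry with no Preferred-Value.

    Raises KeyError if a link points outside the registry, ValueError on a cycle.
    """
    path = [tag]     # chain followed so far, used in the error message
    visited = {tag}  # fast membership test for cycle detection
    current = tag
    while True:
        if current not in registry:
            raise KeyError(f"unresolved reference: {current!r}")
        target = registry[current]
        if target is None:
            return current  # end of the chain
        if target in visited:
            raise ValueError("Preferred-Value cycle: " + " -> ".join(path + [target]))
        path.append(target)
        visited.add(target)
        current = target


# Usage with a hypothetical A -> B -> C chain.
example: Registry = {"xx-AA": "xx-BB", "xx-BB": "xx-CC", "xx-CC": None}
print(resolve_preferred_value(example, "xx-AA"))  # xx-CC
```

With an A -> C shortcut in every entry the loop runs at most once, which is the simplification argued for above.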
I think it is a useful policy to administer the registry such that A -> C is always the case rather than A -> B -> C. But forbidding A -> B -> C chains might not make sense in some cases. In particular, maintaining "stale" pointers allows implementations to see the progression of mappings, which lets a registry processor reconstruct (within some limits) the state of the registry at various intervening dates.
That is:

  on Date A: xx-AA is the canonical tag
  on Date B: xx-BB is the canonical tag
  on Date C: xx-CC is the canonical tag

Addison
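One way a registry validator might check the A -> C policy is to require that every Preferred-Value target exists and has no Preferred-Value of its own. This sketch reports rather than rejects violations, since stale A -> B -> C pointers may be kept deliberately to record history. It assumes the same hypothetical dict model (tag or subtag -> Preferred-Value or None); the function name and message wording are illustrative only.

```python
from typing import Dict, List, Optional

# Hypothetical model: tag or subtag -> its Preferred-Value (None if it has none).
Registry = Dict[str, Optional[str]]


def check_preferred_values(registry: Registry) -> List[str]:
    """Report Preferred-Values that are dangling or point at another mapped entry."""
    problems: List[str] = []
    for key, target in registry.items():
        if target is None:
            continue  # entry has no Preferred-Value
        if target not in registry:
            problems.append(f"{key}: Preferred-Value {target!r} is not registered")
        elif registry[target] is not None:
            # A -> B -> C instead of A -> C
            problems.append(
                f"{key}: Preferred-Value {target!r} itself maps to {registry[target]!r}"
            )
    return problems


chained: Registry = {"xx-AA": "xx-BB", "xx-BB": "xx-CC", "xx-CC": None}
flattened: Registry = {"xx-AA": "xx-CC", "xx-BB": "xx-CC", "xx-CC": None}
print(check_preferred_values(chained))    # ["xx-AA: Preferred-Value 'xx-BB' itself maps to 'xx-CC'"]
print(check_preferred_values(flattened))  # []
```

The same pass also covers "all references resolved" for Preferred-Value, because a dangling target is reported too.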
Of course we don't expect the registry ever to be in a dubious state; we have "registry validators" for this purpose. But then again it's simpler to verify that Preferred-Value always arrives at a tag or subtag that has no further Preferred-Value. Last but not least, it would eliminate the "hakka" reference in a Preferred-Value, where "hakka" is the only "non-subtag" in any reference.

there is no "collision" between 5- to 8-letter language subtags (if any existed) and 5- to 8-letter variant subtags, any more than there is a collision between "ar" for Arabic and "AR" for Argentina.

Of course there is, for naive implementations. The theory of the 4646 structure is that you can decompose a tag into subtags and still tell what's what based on length alone:

  3 digits                         => UN number
  3 alpha                          => language (for 4646)
  4 alpha at begin                 => language (reserved)
  4 alpha later                    => script
  4 alnum starting with a digit    => variant
  2 alpha at begin                 => language
  2 alpha later                    => region
  5..8 alpha at begin              => language
  5..8 alnum later                 => variant

For that, a 4646 decomposer only needs to note "at begin" to disambiguate 2 alpha or 4..8 alpha. For 4646bis it's essentially the same idea, adding:

  3 alpha at begin                 => language (for 4646bis)
  3 alpha later                    => extlang

In my old 4646 validator I implemented this by adding a dummy hyphen to anything that's not "at begin", and after that it was, as you said, obvious that "AR" (case-insensitive) is not "-AR". For 4646bis that isn't good enough anymore, because language and extlang share the same namespace: as soon as there is a language "ang" there can't be an extlang "-ang". Therefore I used the canonical case without adding a dummy hyphen. With that I'd catch an erroneous extlang "ang", and case-sensitive "ar" and "AR" are different. That's of course how I found the old yi-latn instead of yi-Latn.
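The length-and-position rules above can be written as a small naive classifier. This is a sketch assuming 4646bis semantics (3 alpha later => extlang) and ASCII-only subtags; it deliberately ignores singletons (extensions and private use) and subtag ordering, and it does not consult the registry, so it cannot catch namespace clashes such as a language "ang" versus an extlang "ang". The function names are hypothetical.

```python
from typing import List, Tuple


def _alpha(s: str) -> bool:
    return s.isascii() and s.isalpha()  # ASCII letters only


def _digit(s: str) -> bool:
    return s.isascii() and s.isdigit()


def _alnum(s: str) -> bool:
    return s.isascii() and s.isalnum()


def classify_subtags(tag: str) -> List[Tuple[str, str]]:
    """Naively label each subtag by its length and whether it is 'at begin'."""
    labelled: List[Tuple[str, str]] = []
    for i, sub in enumerate(tag.split("-")):
        n = len(sub)
        if i == 0:  # "at begin"
            if _alpha(sub) and n in (2, 3):
                kind = "language"
            elif _alpha(sub) and n == 4:
                kind = "language (reserved)"
            elif _alpha(sub) and 5 <= n <= 8:
                kind = "language"
            else:
                raise ValueError(f"bad first subtag: {sub!r}")
        else:  # "later"
            if n == 2 and _alpha(sub):
                kind = "region"
            elif n == 3 and _digit(sub):
                kind = "region (UN number)"
            elif n == 3 and _alpha(sub):
                kind = "extlang"  # 4646bis only
            elif n == 4 and _alpha(sub):
                kind = "script"
            elif n == 4 and _alnum(sub) and sub[0].isdigit():
                kind = "variant"
            elif 5 <= n <= 8 and _alnum(sub):
                kind = "variant"
            else:
                raise ValueError(f"unclassifiable subtag: {sub!r}")
        labelled.append((sub, kind))
    return labelled


print(classify_subtags("zh-yue-HK"))
# [('zh', 'language'), ('yue', 'extlang'), ('HK', 'region')]
print(classify_subtags("de-1996"))
# [('de', 'language'), ('1996', 'variant')]
```

Case is preserved rather than folded here, so a separate canonical-case check (as in the yi-latn example) would still be needed.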
But that's not yet enough for a registered language "abcde" and a registered variant "abcde"; I have to implement some logic for the variants (and references to variants). With a registry validator anybody, including IANA, can check that a registry is in a plausible state, or verify that a proper subset is complete (all references resolved).

Frank

_______________________________________________
Ltru mailing list
Ltru at ietf.org
https://www1.ietf.org/mailman/listinfo/ltru
--
Addison Phillips
Globalization Architect -- Yahoo! Inc.

Internationalization is an architecture. It is not a feature.
Note Well: Messages sent to this mailing list are the opinions of the senders and do not imply endorsement by the IETF.