
Re: [Ltru] Canonical variants




Mark

On Tue, Jul 15, 2008 at 9:06 AM, John Cowan <cowan at ccil.org> wrote:
Mark Davis scripsit:

> I'm just confused by this. If order is supposed to be semantically
> relevant,

It's not that it's necessarily semantically relevant.  It's that it's a
bad idea, considering the state of our ignorance, to *assume* it's going
to be semantically irrelevant in all possible future cases (modulo the
ones involving prefixes that Addison discussed).

It's like asking "Is the order of XML child elements semantically
relevant?"  Sometimes it is, sometimes it isn't.  Some schemas for
documents where the order is irrelevant prescribe a fixed order so as
to prevent user-specified order from becoming a covert information
channel; others don't prescribe a fixed order.  Not having a general
scheme for canonicalizing child-element order costs something, but the
gain in flexibility is generally considered to outweigh it.

It's bad to leave it open, because then we have no idea whether rearrangement is supposed to determine meaning or not.


> *what* is it exactly that establishes what each order means (V1-V2 vs
> V2-V1), if not BCP47? Google? Microsoft? the DOD?

Ietf-languages, and after them, the community of usage.

But then what you are calling for is a mechanism we don't have. Remember, what we are talking about is the ordering of variant subtags that do not have a prefix relationship to one another. (If they do, then we have a canonical order.)

Do you really expect Ietf-languages to specify that V1 means X if it is before V2, and means Y if it is after? Or that V1 needs to come before V2 although V1 is not a prefix of V2? What would be an example of what you think that ietf-languages would do? Would this be in the description?

Or the "community of usage"? How am I, on the parsing end, supposed to know that some nebulous "community of usage" says that the ordering of V1 and V2 means Y? And frankly, the ordering of subtags is, in the vast majority of cases, due to some GUI -- the way in which the GUI presents how to build the tags has more to do with the ordering than anything the users choose. And what is that GUI developer supposed to do?

My take-away from all this is that multiple variants without prefix orderings are so nebulous that they are nearly not worth supporting until some later version clarifies them.
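
To make the distinction concrete, here is a rough Java sketch of the one case where an order can be derived: variants whose registered prefixes mention other variants. The class name, the MUST_FOLLOW table, and the canonicalize method are all made up for illustration, and the table is hand-written from my recollection of the Resian entries rather than read from the registry. Variants with no prefix relationship to one another simply stay in the order they arrived in, which is exactly the case under discussion.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class PrefixOrderSketch {
    // Assumption: for each variant, the other variants named in its Prefix
    // fields. Hand-written illustration, not loaded from the registry.
    static final Map<String, Set<String>> MUST_FOLLOW = Map.of(
            "biske", Set.of("rozaj"),
            "1994", Set.of("rozaj", "biske"));

    // Stable reordering: a variant is emitted only once none of the variants
    // it must follow are still waiting. Unrelated variants keep input order.
    // Duplicate variants in the input are collapsed.
    static List<String> canonicalize(List<String> variants) {
        Set<String> remaining = new LinkedHashSet<>(variants);
        List<String> result = new ArrayList<>();
        while (!remaining.isEmpty()) {
            String next = null;
            for (String v : remaining) {
                boolean blocked = false;
                for (String p : MUST_FOLLOW.getOrDefault(v, Set.of())) {
                    if (remaining.contains(p)) {
                        blocked = true;
                        break;
                    }
                }
                if (!blocked) {
                    next = v;
                    break;
                }
            }
            if (next == null) {
                // Only reachable if the table contains a cycle; fall back to
                // input order rather than looping forever.
                next = remaining.iterator().next();
            }
            result.add(next);
            remaining.remove(next);
        }
        return result;
    }

    public static void main(String[] args) {
        // Prefix-related variants get one order regardless of input order.
        System.out.println(canonicalize(List.of("1994", "biske", "rozaj"))); // [rozaj, biske, 1994]
        // No prefix relationship: both input orders survive unchanged.
        System.out.println(canonicalize(List.of("1996", "fonipa"))); // [1996, fonipa]
        System.out.println(canonicalize(List.of("fonipa", "1996"))); // [fonipa, 1996]
    }
}
```

The last two calls show the gap: nothing in the table says which of the two results is "the" tag, so a parser sees two different sequences and has no rule for deciding whether they mean the same thing.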




> Unicode ordering is not necessarily a good model, since it applies
> to characters that reflect real usage, whereas BCP47 is completely
> responsible for the meaning and syntax of these subtags.

But it is not responsible for the fact that '1996' and 'fonipa' are
self-contradictory: it's real-world facts that determine that.

> It sounds to me like what people are saying is that multiple variants
> are not important enough to worry about -- if so, then I'm fine with
> that. My main concern is whether to have an API that returns a set
> or a list for variants; if BCP47 leaves it up in the air, then we can
> just use a set, since the implementation is simpler, and we won't then
> return false inequalities.

Preserve the order; that way you may lose some rare opportunities for
canonicalization, but you don't throw away information that's potentially
valuable.  LinkedHashSet is your friend.

Of course one can use something like LinkedHashSet; the issue is whether one is preserving a material ordering or not.
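
A minimal Java sketch of what is at stake in the API choice (the class and method names are invented for illustration, and "v1"/"v2" are placeholders rather than registered variants): a LinkedHashSet remembers insertion order for iteration, but its equals() is the ordinary Set equality and ignores order, while a List comparison is order-sensitive.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class VariantOrderDemo {
    // Hypothetical helper: keeps variants in the order they appeared in the tag.
    static Set<String> variantsOf(String... subtags) {
        return new LinkedHashSet<>(List.of(subtags));
    }

    // Set semantics: order is treated as immaterial.
    static boolean equalAsSets(Set<String> a, Set<String> b) {
        return a.equals(b);
    }

    // List semantics: order is treated as material.
    static boolean equalAsLists(Set<String> a, Set<String> b) {
        return new ArrayList<>(a).equals(new ArrayList<>(b));
    }

    public static void main(String[] args) {
        Set<String> a = variantsOf("v1", "v2");
        Set<String> b = variantsOf("v2", "v1");

        System.out.println(a);                  // [v1, v2]  (insertion order kept)
        System.out.println(b);                  // [v2, v1]
        System.out.println(equalAsSets(a, b));  // true
        System.out.println(equalAsLists(a, b)); // false
    }
}
```

So the container preserves the order either way; what the spec has to tell us is which of the two comparisons is the right one.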


--
A witness cannot give evidence of his           John Cowan
age unless he can remember being born.          cowan at ccil.org
 --Judge Blagden                               http://www.ccil.org/~cowan
