
Re: [Ltru] extlang & deprecation (was draft updated)



Peter wrote:
>
> I also see another potential concern wrt #2: I think there is an
> implicit assumption that X-Y and Y be considered semantically
> equivalent in terms of what they denote (although they may have
> slightly different behaviours in certain matching scenarios). Yet,
> with one or the other deprecated, there's some likelihood that some
> implementations will assume that the deprecated form can be largely
> disregarded in designing matching behaviour.

"Disregarded" may not be the right word. In my implementation, I canonicalize the deprecated form away before matching. A "well-formed" RFC 4646 or RFC 3066 implementation might not do that, and so could arrive at different results. Besides, what you're proposing below is that both forms be treated as semantically equivalent, which is tantamount to "disregarding" one of the forms.
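A minimal Java sketch of that difference, assuming a hypothetical one-entry table in which "zh-yue" has the Preferred-Value "yue" (the class and method names are illustrative, not from any library). Comparing the raw strings treats the two forms as different; canonicalizing both sides first makes them equal without changing the comparison itself.

```java
import java.util.Locale;
import java.util.Map;

class CanonicalizeBeforeMatch {
    // Hypothetical deprecation table: deprecated form -> Preferred-Value.
    // A real implementation would build this from the registry.
    static final Map<String, String> PREFERRED = Map.of("zh-yue", "yue");

    // Replaces a deprecated leading form with its preferred value.
    // The result is lowercased because it is only used for comparison.
    static String canonicalize(String tag) {
        String lower = tag.toLowerCase(Locale.ROOT);
        for (Map.Entry<String, String> e : PREFERRED.entrySet()) {
            String from = e.getKey();
            if (lower.equals(from) || lower.startsWith(from + "-")) {
                return e.getValue() + lower.substring(from.length());
            }
        }
        return lower;
    }

    // Comparison with no registry knowledge at all.
    static boolean rawMatch(String a, String b) {
        return a.equalsIgnoreCase(b);
    }

    // The same comparison, applied after canonicalizing both sides.
    static boolean canonicalMatch(String a, String b) {
        return canonicalize(a).equals(canonicalize(b));
    }

    public static void main(String[] args) {
        System.out.println(rawMatch("zh-yue", "yue"));       // false
        System.out.println(canonicalMatch("zh-yue", "yue")); // true
    }
}
```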

>
> Btw, it should be noted that both #1 and #2 lead to consideration
> of cherry picking.

I don't see why.

What we're doing here is recommending one form over another for equivalent tags. Let me draw a parallel. I work with a group of developers who code in Java. Like most development teams, we have coding style guidelines, and one of them concerns brace placement in "if" statements. The following forms (a) and (b) are completely equivalent, but one of them is preferred:

a) if (condition) {
      // code
   } else {
      // code
   }

b) if (condition)
   {
      // code
   }
   else
   {
      // code
   }

It doesn't matter which one we choose, and our choice doesn't affect what you might choose, because the compiler reduces both to the same thing. Here, however, the parallel breaks down: X-Y and Y produce different results in non-validating implementations, so choosing one over the other is a Good Thing. And it isn't cherry picking if we ALWAYS do it the same way.

>
> Now, let me propose an elaboration of #3 for adoption. This
> elaboration is captured by three points:
>
> (a) that both X-Y and Y are freely allowed,

Both #2 and #3 provide this.

> (b) that at the level of the language production X-Y and Y must
> always be considered a match (regardless of which is part of a tag
> or of a language range), but

This is where #3 is a non-starter for me: it requires us to change all of the matching schemes in ways that are incompatible with our previous tenets. In my implementation (which is validating), I didn't have to change the matching code, because I canonicalize tags and ranges before matching. With a deprecation, the registry provides all of the information needed for this, using the same mechanism I already use for mapping. See section 4.1.2 in the current draft, which says, in part:

--
As with other grandfathered tags, since implementations might not be able to associate the grandfathered tags with the encompassed language subtag equivalents that are recommended by this document, implementations are encouraged to canonicalize tags for comparison purposes.
--

I would make this recommendation more general.
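A Java sketch of that more general recommendation: every tag and every range passes through one canonicalization step that consults a single registry field, Preferred-Value, and the existing matching code is left unchanged. The sample entries ("iw" to "he", region "BU" to "MM", "i-klingon" to "tlh", "zh-min-nan" to "nan", and the extlang records "yue" and "cmn") are assumed to have been copied from the registry; the class name is hypothetical, and the sketch assumes its input is already a valid tag.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class PreferredValueCanonicalizer {
    // Sample Preferred-Value entries, assumed to be copied from the registry.
    // Whole-tag records (grandfathered tags):
    static final Map<String, String> TAGS = Map.of("i-klingon", "tlh", "zh-min-nan", "nan");
    // Primary language subtags:
    static final Map<String, String> LANGUAGES = Map.of("iw", "he");
    // Extlang subtags: the Preferred-Value replaces the whole "prefix-extlang" pair.
    static final Map<String, String> EXTLANGS = Map.of("yue", "yue", "cmn", "cmn");
    // Region subtags:
    static final Map<String, String> REGIONS = Map.of("bu", "mm");

    // Returns a lowercased canonical form, intended only for comparison.
    static String canonicalize(String tag) {
        String lower = tag.toLowerCase(Locale.ROOT);
        // A whole-tag record takes precedence (this covers grandfathered tags).
        String whole = TAGS.get(lower);
        if (whole != null) {
            return whole;
        }
        String[] parts = lower.split("-");
        List<String> out = new ArrayList<>();
        int i = 1;
        String lang = LANGUAGES.getOrDefault(parts[0], parts[0]);
        // Collapse "language-extlang" to the extlang's Preferred-Value.
        // The input is assumed valid, so the extlang's Prefix is not re-checked.
        if (parts.length > 1 && EXTLANGS.containsKey(parts[1])) {
            lang = EXTLANGS.get(parts[1]);
            i = 2;
        }
        out.add(lang);
        boolean afterSingleton = false;
        for (; i < parts.length; i++) {
            String p = parts[i];
            if (p.length() == 1) {
                afterSingleton = true; // extensions and private use are left alone
            }
            if (!afterSingleton && p.length() == 2) {
                p = REGIONS.getOrDefault(p, p);
            }
            out.add(p);
        }
        return String.join("-", out);
    }
}
```

The same lookup table serves whole tags and individual subtags, which is the point being made: one field, one code path, run before matching.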

In lookup, the main problem with #3 is that it doesn't give us a specific canonical form to use for our fallbacks. If we don't choose one, people will have to implement additional code and keep track of other registry fields ("Macrolanguage", "Prefix") to equate X-Y and Y for fallbacks. Currently "Preferred-Value" is the only field you have to look at for canonicalization purposes, and you look at the same field for *all* subtags and tags. This code can also be executed once, separately from the lookup process, since it is a canonicalization step.
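The lookup case might be sketched in Java as follows. Canonicalization runs once over the ranges and the available tags, and the RFC 4647 Lookup fallback (truncate subtags from the end, also dropping a trailing singleton, until something matches) then runs on canonical strings only. The mapping table and all names are hypothetical, and wildcard ranges are not handled.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class CanonicalLookup {
    // Hypothetical Preferred-Value mapping for deprecated leading forms.
    static final Map<String, String> PREFERRED = Map.of("zh-yue", "yue", "zh-cmn", "cmn");

    static String canonicalize(String tag) {
        String lower = tag.toLowerCase(Locale.ROOT);
        for (Map.Entry<String, String> e : PREFERRED.entrySet()) {
            String from = e.getKey();
            if (lower.equals(from) || lower.startsWith(from + "-")) {
                return e.getValue() + lower.substring(from.length());
            }
        }
        return lower;
    }

    // Lookup over already-canonical inputs: progressively truncate each range
    // until it exactly matches an available tag; otherwise return the default.
    static String lookup(List<String> ranges, Set<String> available, String defaultTag) {
        for (String range : ranges) {
            String r = range;
            while (!r.isEmpty()) {
                if (available.contains(r)) {
                    return r;
                }
                int cut = r.lastIndexOf('-');
                if (cut < 0) {
                    break;
                }
                r = r.substring(0, cut);
                // Also drop a trailing singleton such as "x" in "en-x".
                if (r.length() >= 2 && r.charAt(r.length() - 2) == '-') {
                    r = r.substring(0, r.length() - 2);
                }
            }
        }
        return defaultTag;
    }

    public static void main(String[] args) {
        // Canonicalize once, separately from the lookup itself.
        Set<String> available = Stream.of("yue", "cmn-Hans", "en")
                .map(CanonicalLookup::canonicalize).collect(Collectors.toSet());
        List<String> ranges = Stream.of("zh-yue-HK")
                .map(CanonicalLookup::canonicalize).collect(Collectors.toList());
        System.out.println(lookup(ranges, available, "en")); // prints yue
    }
}
```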

I'm more concerned about filtering, though. In filtering, for example, #3 would require a range like "zh-yue" to match a tag such as "yue-HK". That requires additional processing that no existing implementation or specification has required. One of the delights of RFC 4646 was that it didn't break existing tag matching schemes.
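The filtering problem can be seen with RFC 4647 basic filtering, which is a simple prefix match on subtag boundaries. In this Java sketch the raw range "zh-yue" does not select "yue-HK", while canonicalizing the range first does, and the matching function itself is untouched. The one-entry canonicalizer and the class name are hypothetical.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

class BasicFiltering {
    // Basic filtering: the range matches if it equals the tag or is a prefix
    // of the tag ending at a subtag boundary; "*" matches every tag.
    static boolean basicMatch(String range, String tag) {
        String r = range.toLowerCase(Locale.ROOT);
        String t = tag.toLowerCase(Locale.ROOT);
        return r.equals("*") || t.equals(r) || t.startsWith(r + "-");
    }

    static List<String> filter(String range, List<String> tags) {
        return tags.stream().filter(t -> basicMatch(range, t)).collect(Collectors.toList());
    }

    // Hypothetical one-entry canonicalizer: "zh-yue..." becomes "yue...".
    static String canonicalize(String tag) {
        String t = tag.toLowerCase(Locale.ROOT);
        return (t.equals("zh-yue") || t.startsWith("zh-yue-")) ? t.substring(3) : t;
    }

    public static void main(String[] args) {
        List<String> tags = List.of("yue-HK", "zh-Hant-TW");
        System.out.println(filter("zh-yue", tags));               // []
        System.out.println(filter(canonicalize("zh-yue"), tags)); // [yue-HK]
    }
}
```

Under #3, by contrast, `basicMatch` itself would have to know that "zh-yue" and "yue" are equivalent.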


>
> This proposal frees us entirely from having to decide whether "zh-
> cmn"/"zh-yue" or "cmn"/"yue" is better.

"Better" may not be the right word; "preferred" describes the situation more accurately. Either form may be better for *your* application (or mine), depending on circumstances. What the deprecation says, in this case, is that implementations are free, and encouraged, to eliminate the differences in a specific way when processing tags.

I prefer form Y to form X-Y for two main reasons:

1. It simplifies the tags.
2. It can be done without looking at the registry.

Even an implementation that only checks well-formedness can canonicalize the extlang form away using a regular expression (it does need the 4646bis ABNF to do this). But you can't add the macrolanguage without a glance at the registry.
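A hedged Java sketch of the registry-free direction: in the 4646bis ABNF, a two- or three-letter primary language subtag followed by a subtag of exactly three letters can only be language plus extlang (scripts have four letters, regions two letters or three digits, variants five to eight characters or four starting with a digit), so a regular expression can strip the prefix. The class name is hypothetical; the sketch assumes at most one extlang and excludes grandfathered tags whose second subtag also has three letters (such as "zh-min-nan" and "no-bok"; check the registry for the full list), which need the registry instead.

```java
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExtlangRegex {
    // 2-3 letter language, then exactly three letters ending at a subtag boundary.
    static final Pattern EXTLANG = Pattern.compile("^[a-z]{2,3}-([a-z]{3})(?=-|$)");

    // Grandfathered tags that the pattern would otherwise misread as extlang forms
    // (assumed list; these are resolved through the registry, not syntax).
    static final Set<String> GRANDFATHERED = Set.of("no-bok", "no-nyn", "zh-min", "zh-min-nan");

    // Replaces "prefix-extlang" with the extlang subtag alone; result is lowercased.
    static String dropExtlangPrefix(String tag) {
        String t = tag.toLowerCase(Locale.ROOT);
        if (GRANDFATHERED.contains(t)) {
            return t;
        }
        Matcher m = EXTLANG.matcher(t);
        return m.find() ? m.group(1) + t.substring(m.end()) : t;
    }

    public static void main(String[] args) {
        System.out.println(dropExtlangPrefix("zh-yue-HK"));  // yue-hk
        System.out.println(dropExtlangPrefix("zh-Hant-TW")); // zh-hant-tw
    }
}
```

Going the other way, from "yue" to "zh-yue", has no such syntactic rule: the macrolanguage relationship is recorded only in the registry.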

> It would also mean there's
> no particular reason to cherry pick: the IETF-Language can discuss
> when it may or may not be beneficial to use the extlang formulation,
> but users can ultimately decide for themselves and (because of
> requirement b) they are assured of some degree of interoperability
> no matter which they choose.

Uh... ietf-languages can discuss the benefits of using extlang or not for a given new registration with a macrolanguage, and then register subtags to match. But users only have a choice when ietf-languages gifts us with subtags of both types.

>
> Of course, this would probably need some revision to RFC4647, but I
> think we have been heading toward that regardless.

We may need to revise RFC 4647 anyway, but we break fewer things if we deprecate one form, especially if that form is the 'X-Y' form.

Addison


Addison Phillips
Globalization Architect -- Lab126

Internationalization is not a feature.
It is an architecture.



_______________________________________________
Ltru mailing list
Ltru at ietf.org
https://www.ietf.org/mailman/listinfo/ltru



Note Well: Messages sent to this mailing list are the opinions of the senders and do not imply endorsement by the IETF.