
[Simple] Re: [Geopriv] Domain identifier in common policy



I'm probably well beyond my I18N/IDN depth here, so I agree that external advice is called for.

My understanding of RFC 3987, Section 5, is that these steps are performed in sequence, starting with the low-cost step in 5.3.1 and progressing to steps requiring more work if there's no match.

First, I think we can restrict ourselves to discussing URIs appearing in specific protocols, such as part of a SIP From URI or an XMPP URI.

There are probably two cases: protocols that are clearly specified as using IRIs already (XMPP, say) and those that require discussion (SIP, say). [Discussion is required for SIP since SIP itself allows UTF-8, and RFC 3987, Section 6.3, alludes to the fact that most schemes do not have to be upgraded to support IRIs.]

I'll stick to the IRI case. If the IRI shows up in the protocol request, the steps in Section 5.3 would be executed, starting with 5.3.1, until either a match occurs or the process falls off the comparison ladder described in the spec. Clearly, some of the comparison steps do not apply to a domain identifier, since they concern the port number or path components.
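To make the ladder concrete, here is a minimal Python sketch of how the domain part alone might be compared. It is only an illustration under several assumptions: the function names `to_ace` and `ladder_match` are made up for this example, Python's built-in `idna` codec (which implements the older IDNA 2003 rules) stands in for ToASCII, and only three rungs are modelled (simple string comparison, case and NFC normalization, and IDN-to-punycode equivalence).

```python
import unicodedata
from typing import Optional


def to_ace(host: str) -> str:
    """Hypothetical helper: ToASCII via Python's built-in IDNA 2003 codec."""
    return host.encode("idna").decode("ascii")


def ladder_match(a: str, b: str) -> Optional[str]:
    """Return the name of the first rung at which a and b match, else None."""
    # Rung 1 (cf. 5.3.1): simple string comparison, the cheapest test.
    if a == b:
        return "simple"

    # Rung 2 (cf. 5.3.2): syntax-based normalization. Only case and
    # character (NFC) normalization are relevant to a bare domain name.
    na = unicodedata.normalize("NFC", a).lower()
    nb = unicodedata.normalize("NFC", b).lower()
    if na == nb:
        return "syntax"

    # Rung 3 (cf. 5.3.3): scheme-based normalization, treating an IDN and
    # its punycode form as equivalent by comparing the ASCII forms.
    try:
        if to_ace(na) == to_ace(nb):
            return "scheme"
    except UnicodeError:
        # Labels that cannot be converted simply fail to match here.
        pass

    # Fell off the ladder: no match.
    return None
```

With this sketch, `ladder_match("xn--99zt52a", "納豆")` only matches on the third rung, which is the extra work the ladder is meant to defer.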

From my reading of 3987, the punycode version would be compared as well during the ladder, presumably by converting the IDN to punycode. (I suspect that the conversion from UTF-8 to punycode is deterministic, i.e., a given string always yields exactly one punycode form, while I suspect that the mapping is not one-to-one. In other words, multiple UTF-8 strings could generate the same punycode.)
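The suspicion about the many-to-one mapping can be checked with a small Python sketch. It assumes Python's built-in `idna` codec, which implements the IDNA 2003 ToASCII step including nameprep (case folding and NFKC normalization); the helper name `to_ace` and the sample label are chosen only for illustration.

```python
def to_ace(label: str) -> str:
    """Hypothetical helper: ToASCII via Python's built-in IDNA 2003 codec."""
    return label.encode("idna").decode("ascii")


# Three different Unicode strings for the same name:
variants = [
    "bücher",         # lowercase, precomposed u-umlaut (U+00FC)
    "Bücher",         # differs only in case
    "bu\u0308cher",   # 'u' followed by combining diaeresis (U+0308)
]

# Nameprep folds case and normalizes, so all variants map to one ASCII form.
ace_forms = {to_ace(v) for v in variants}

# Converting back yields only a single, normalized Unicode string.
round_trip = ace_forms.copy().pop().encode("ascii").decode("idna")
```

Here `ace_forms` ends up with a single element, and `round_trip` is the lowercase precomposed spelling, so the original case and decomposition of the input are not recoverable from the punycode.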

This is all a bit messier than ASCII comparison, but I don't think we want users to edit punycode into their XML rule files.

Henning



At the very least, common-policy ought to point out which comparison is being done. I would assume that, since these are being used as identifiers, section 5.3.1 is the section that is relevant here.

And it might be that I'm just not smart enough to understand something as trivial as internationalized character comparison, but what happens with punycode-encoded domain names? To take the example from that RFC: if I receive it as domain="xn--99zt52a", do I convert it to domain="納豆" for the comparison? As the RFC states:

   Implementations with scheme-specific knowledge MAY convert
   punycode-encoded domain name labels to the corresponding characters
   by using the ToUnicode procedure.

Again, I'm not an expert, but shouldn't something specify what is to happen here for doing this identity comparison?
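For illustration only, one way a specification could pin this down is to bring both sides to the same form before the identity comparison. The Python sketch below assumes the ToUnicode direction and uses Python's built-in `idna` codec (IDNA 2003); the function names `to_unicode_domain` and `same_domain` are hypothetical and not taken from any draft.

```python
def to_unicode_domain(domain: str) -> str:
    """Hypothetical helper: bring a domain to its Unicode form.

    The input is first passed through ToASCII so that punycode and
    native-character input follow the same path, then through ToUnicode.
    """
    ace = domain.encode("idna")   # e.g. b"xn--99zt52a"
    return ace.decode("idna")     # back to native characters


def same_domain(rule_domain: str, request_domain: str) -> bool:
    """Compare two domain identifiers after converting both to Unicode."""
    return to_unicode_domain(rule_domain) == to_unicode_domain(request_domain)
```

Under these assumptions, `same_domain("xn--99zt52a", "納豆")` is true regardless of which form appears in the rule and which in the request.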

-andy

_______________________________________________ Simple mailing list Simple at ietf.org https://www1.ietf.org/mailman/listinfo/simple