On Tue, Nov 10, 2009 at 10:40:13PM +0900, Alexey Melnikov wrote:
> Not really slides, just a collection of issues/wishes for now...
A few comments:
- SASLprep is SASLprep -- we can only make changes where those changes
wouldn't break anything:
Q: Can non-hashed query strings be made unnormalized?
A: Not if servers expect them to be normalized
Q: Can we switch to a different NF?
A: No.
SASLprep is pretty much written in stone for the mechanisms that use
it _now_. We could hold up SCRAM for a new stringprep profile, but
I doubt we'll want to (we can live with SASLprep, no?).
- Clearly RFC3454 will require some updates if we're to have a new
profile that we want to call a "stringprep" profile but which does
things outside RFC3454, such as NF-insensitive string comparison, or
use of NFs other than KC.
I don't think this is an obstacle. But it will slow things down if
we choose to update/obsolete RFC3454.
- IMO query strings should be sent unnormalized -- normalization should
always be delayed as long as possible, or avoided altogether where
NF-insensitive string comparison will do (see more below).
Normalization should only happen at certain very well defined points,
such as:
- as an internal detail of normalization-insensitive string cmp
- prior to hashing a string (of any kind, query or storage)
- prior to storing a string (which is thus a storage string)
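To make the three points above concrete, here is a minimal Python sketch, not a proposal. The helper names (ni_equal, hash_string, to_storage_string) are made up for illustration, and NFKC is only a placeholder for whatever form a profile would actually pick. The query string itself stays unnormalized until one of these points is reached.

```python
import hashlib
import unicodedata

# Assumption: NFKC stands in for whichever form a profile would choose.
FORM = "NFKC"

def ni_equal(a: str, b: str) -> bool:
    """Normalization-insensitive compare; normalization is an internal detail."""
    return unicodedata.normalize(FORM, a) == unicodedata.normalize(FORM, b)

def hash_string(s: str) -> str:
    """Normalize immediately before hashing (query or storage string alike)."""
    normalized = unicodedata.normalize(FORM, s)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def to_storage_string(s: str) -> str:
    """Normalize immediately before storing; the result is a storage string."""
    return unicodedata.normalize(FORM, s)

composed = "caf\u00e9"      # e-acute as a single code point
decomposed = "cafe\u0301"   # 'e' followed by a combining acute accent

print(composed == decomposed)                             # False
print(ni_equal(composed, decomposed))                     # True
print(hash_string(composed) == hash_string(decomposed))   # True
```

A real implementation would likely avoid building two full normalized copies inside the compare, but the observable behavior should be the same.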
- I've not made up my mind re: compatibility mappings. Evidently new
compatibility mappings can be added at any time (or so I'm told),
which complicates Unicode version agility, which argues against using
them.
But use of K mappings removes some confusables, thus seems likely to
be worthwhile (though, to be sure, we'll never have zero
confusables).
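A small Python illustration of that trade-off, using only the standard unicodedata module. The sample strings are my own picks, not anything from the slides: K mappings fold the ligature and the fullwidth letter, but a Cyrillic look-alike survives any normalization form.

```python
import unicodedata

def fold(s: str, form: str) -> str:
    """Thin wrapper so the form is the only thing that varies below."""
    return unicodedata.normalize(form, s)

ligature = "\ufb01le"    # starts with U+FB01 LATIN SMALL LIGATURE FI
fullwidth = "\uff21"     # U+FF21 FULLWIDTH LATIN CAPITAL LETTER A
cyrillic_a = "\u0430"    # U+0430 CYRILLIC SMALL LETTER A, looks like Latin 'a'

# NFC leaves compatibility characters alone; NFKC maps them away.
print(fold(ligature, "NFC") == ligature)    # True
print(fold(ligature, "NFKC") == "file")     # True
print(fold(fullwidth, "NFKC") == "A")       # True

# Cross-script confusables are untouched by any normalization form.
print(fold(cyrillic_a, "NFKC") == "a")      # False
```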
- NF negotiation may be appropriate in some protocols, but I can't
think of which. Leaving the NF completely unspecified can certainly
be done, as it was, e.g., for NFSv4, but only if there is a single
entity that will be doing hashing, and preferably only if a single
entity will be doing normalization-insensitive string comparisons.
(In the NFSv4 case there's no hashing, save as an implementation
detail of the server, and all n-i object name string cmps happen in
the server. Therefore leaving the NF completely unspecified was
reasonable, and IMO fortuitous.)
- I don't know enough about Unicode version evolution, but if all we
had to worry about were unassigned codepoints, then we'd be done,
wouldn't we? OTOH, if K mappings can be added at any time, then we
have a problem. (NFC is closed to new compositions, for example.)
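For the unassigned-codepoint half of this, a rough Python sketch. It relies on the standard stringprep module, whose tables are fixed at Unicode 3.2 (the version RFC3454 is tied to); the helper name unassigned_in_3_2 is made up here. U+0221 is a handy example: it is in table A.1 (unassigned in 3.2) but was assigned in a later Unicode version.

```python
import stringprep
import unicodedata

def unassigned_in_3_2(s: str) -> list:
    """Return the characters of s that RFC3454 table A.1 lists as unassigned."""
    return [c for c in s if stringprep.in_table_a1(c)]

sample = "a\u0221"

# The Unicode version of the running interpreter's own tables (varies).
print(unicodedata.unidata_version)

# Unassigned as far as stringprep (Unicode 3.2) is concerned ...
print(unassigned_in_3_2(sample) == ["\u0221"])   # True

# ... but assigned in the newer tables: category is no longer 'Cn'.
print(unicodedata.category("\u0221") != "Cn")    # True
```

This only shows the unassigned-codepoint side; it says nothing about whether new K mappings can appear for code points that are already assigned.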
- NFD makes the most sense to me, and should to anyone who has a
generic Unicode normalization library: NFD is a step in the NFC path
(decompose, then recompose), so NFD is faster than NFC. Plus, NFC ==
NFD, asymptotically :)
However, some implementors may not have such a generic Unicode
normalization library, and today may only be capable of NFC, or even
NFKC. That wouldn't sway me, but then, the choice of C or D is not
all that consequential -- not as consequential as the to-K-or-not-K
choice, for example.
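To illustrate the "NFD is a step in the NFC path" point, a short Python check using the standard unicodedata module. The sample character is my own choice; the point is only that composing after decomposing gives the same result as NFC directly, so either form can be derived from the other.

```python
import unicodedata

def nfd(s: str) -> str:
    return unicodedata.normalize("NFD", s)

def nfc(s: str) -> str:
    return unicodedata.normalize("NFC", s)

s = "\u212b"  # U+212B ANGSTROM SIGN

print(nfd(s) == "A\u030a")     # True: fully decomposed, A + combining ring
print(nfc(s) == "\u00c5")      # True: recomposed to the preferred precomposed form
print(nfc(nfd(s)) == nfc(s))   # True: decompose-then-compose is NFC
print(nfd(nfc(s)) == nfd(s))   # True: and NFD is recoverable from NFC
```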
Nico