PRECIS WG, IETF 83
Meeting notes
2012-03-29, Paris

Chairs: Marc Blanchet, Yoshiro Yoneya

Legend:
PR = Pete Resnick
JH = Joe Hildebrand
AS = Andrew Sullivan
MB = Marc Blanchet
PSA = Peter Saint-Andre
DB = David Black
AM = Alexey Melnikov
YY = Yoshiro Yoneya
LM = Larry Masinter
JL = Jonathan Lennox
PK = Paul Kyzivat

1. Problem Statement

http://www.ietf.org/proceedings/83/slides/slides-83-precis-1.pdf

PR: With glyph similarity, if it's in the problem statement, is it a problem we are saying needs to be addressed?
JH: We could put in some text on why it's not solvable.
PR: Is this the one place where we have a serious problem that we don't expect protocols to deal with? Can we have a statement that this is something that protocols are not expected to solve?
AS: There are some things that protocols can do to limit the damage, possibly by limiting the acceptable range of characters.  But there are protocols where this is not possible.
PR: Do we want to remove the section because it's unique, or do we want to go a little further and say this is a problem that you can't really solve? Either is fine, but if this is a unique thing in the doc we should flag it.
AS: I'm not sure this is unique.  If you are using NFKC and you're not willing to ditch it, you have this problem.
PR: <missed>
MB: This is something I put in, and thought it was appropriate to include.
JH: I prefer that we not remove it. People thought stringprep did this, but it didn't, so it's better to make that clear.
MB: We solved it (-:
JH: Mention the Unicode confusables work (http://unicode.org/reports/tr39/#Confusable_Detection)?
MB: This means we need to start referring to Unicode labels, and there's nothing else that does that yet.
/JH waves hands
JH: Perhaps it would help registries avoid confusables.
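For reference, UTS #39 detects confusables by comparing "skeletons": as we understand it, a string is converted to NFD, each character is replaced by its prototype from the published confusables data, and the result is converted to NFD again. A minimal Python sketch of that idea follows. The four-entry SAMPLE_PROTOTYPES table is a hypothetical stand-in for the full confusables.txt data, and the helper names are illustrative only.

```python
import unicodedata

# Hypothetical four-entry excerpt; a real check would load the full
# confusables.txt data file published with UTS #39.
SAMPLE_PROTOTYPES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
}

def skeleton(s: str) -> str:
    """NFD, map each character to its prototype, then NFD again."""
    decomposed = unicodedata.normalize("NFD", s)
    mapped = "".join(SAMPLE_PROTOTYPES.get(ch, ch) for ch in decomposed)
    return unicodedata.normalize("NFD", mapped)

def confusable(a: str, b: str) -> bool:
    """Distinct strings whose skeletons match are treated as confusable."""
    return a != b and skeleton(a) == skeleton(b)

# "paypal" spelled with Cyrillic er and a: looks the same, different code points.
print(confusable("paypal", "\u0440\u0430ypal"))  # True
```

A registry could refuse a new name whose skeleton matches an existing one. The check only flags pairs; it cannot by itself say which spelling is the "right" one.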
MB: I see two people that will provide text
PR: This has not been through WGLC, so these can be LC comments, and that's fine.
MB: Appendix D is an attempt to copy-paste the reviews of stringprep profiles. It's many pages of informative text, and it would be great if we could ask the authors of these sections to review it.
PSA: I remember going over the wiki page trying to make them consistent, so that might be something else we need to do.
MB: That's part of what we did in the document but further review might be helpful.
MB: Are we ready for WGLC?
AS: I noted that only one person acknowledged reading the draft, so it might not be ready.
MB: WGLC might be a good motivation to read it.
PR: As chairs, you might want to ask certain people to read this soon.

Chairs action: do WGLC on the document and ask specific people for review.

2. PRECIS Framework (PSA)

http://www.ietf.org/proceedings/83/slides/slides-83-precis-2.pdf

AS: I just want to point out a risk about things like symbols and other characters that are big classes of not-fully-understood stuff. We are starting to understand the flaw in this approach, because a bunch of things in certain classes are critical for things like vowels in Hindi. We were going to restrict these, but then we end up excluding more than intended. I'm really nervous about prohibiting any of these. The FreeClass was supposed to be permissive, and we might want it to be really, really permissive, but you MUST be really sure you want to use it.
PSA: <missed>
AS: Be aware that FreeClass will have things you don't understand.
MB: This is a class, and a profile can further restrict it.
JH: +1 for all before.
PSA: We need to note in the draft that FreeClass could be problematic for all its inclusiveness.

MB: I think the assumption is that if we're talking about identifiers, then identifiers are over multiple lines if included.
DB: Even if we can't figure out how to use line separators, space separators do seem useful for certain things.
PSA: The definition is just space, but we want to keep this limit.

AS: I think the reason these are disallowed (in IDNA) is because you can't tell the difference between them and other things.  I might be wrong, but I think it's things like 'X' versus 'ROMAN 10', and you can't tell just by looking at them.  The advantage of disallowing is you have a whole class of problem you skip over.
AS: Do all of these have compatibility mappings?  All the title-case do, and all the letter-number do, but are there others?
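PSA's check could start from a scan of the Unicode data bundled with an implementation. A rough Python sketch, assuming that "has a compatibility mapping" can be approximated as "NFKC output differs from NFC output"; the helper name is hypothetical, and the printed lists depend on the Unicode version of the Python build.

```python
import sys
import unicodedata

def without_compat_mapping(category: str) -> list[str]:
    """Characters in a general category that NFKC leaves the same as NFC."""
    found = []
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        if unicodedata.category(ch) != category:
            continue
        # If compatibility normalization changes nothing beyond canonical
        # normalization, treat the character as having no compatibility mapping.
        if unicodedata.normalize("NFKC", ch) == unicodedata.normalize("NFC", ch):
            found.append(ch)
    return found

# ROMAN NUMERAL TEN (a letter-number, Nl) maps to plain "X" under NFKC.
print(unicodedata.decomposition("\u2169"))      # <compat> 0058
print(unicodedata.normalize("NFKC", "\u2169"))  # X

# Title-case (Lt) and letter-number (Nl) characters NFKC does not change.
for cat in ("Lt", "Nl"):
    leftovers = without_compat_mapping(cat)
    print(cat, [f"U+{ord(c):04X}" for c in leftovers])
```

Whatever the loop prints is the list that would need a closer look before relying on NFKC to catch every character in these classes.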
PSA: Good point, will check.

AM: Where did allowing space in NameClass come from?
PSA: We closed that at IETF 82.
AM: Then we had this discussion in SASLprep, and we may need to reopen this issue.
PSA: We certainly allow space in FreeClass, but we do not in NameClass. I thought when we talked to security folk, we would want space in NameClass ...
PSA: Allow ASCII-SPACE in NameClass, but limit by profile
MB: Currently, space is disallowed in NameClass.
MB: What do the customers think?
AM: I say no, because SASLPrep will need space?  Spaces are needed for LDAP and IMAP.  We can agree people should not use it, but existing protocols already use it.
MB: We prefer everyone use the same base class, but if a specific protocol cannot, then we have it more open and each profile can restrict it more.
PSA: We can include it, and profiles will need to say "don't use spaces".
MB: Question: what do we want to do in the base class?
AS: I am informed that in some input methods, 'space bar' inserts a zero-width non-joiner, and do we really want to allow that?  If we want protocols to include this, we need to do some analysis.
PR: I think AM and AS just said what I needed to hear: there are certain protocols that use FreeClass in their "name class", and use things that we really don't want to allow. I'm starting to worry that what we have is a protocol using something beyond a standard name for login IDs, and we want them to use a different class for their identifiers.
AM: I think we just need ASCII-SPACE, because we don't want control characters, and other ASCII-range characters are fine (punctuation, etc.).
PR: Do we want to allow non-ASCII punctuation in the NameClass?
JH: You could, if you were not doing certain mappings (e.g. compatibility) you could want to use them as delimiters. Are there places where we can give protocols some i18n recommendations to prevent some of these problems?
MB: We could go with something we could consider "safer", that is a subset of FreeClass.
PSA: It's incumbent upon us to exercise responsibility; I'd be fine coming up with a separate class if they need wacky stuff.
AS: The NameClass cannot be something that covers everybody's name, but is something that is safe and that most expect to be a name. The more I think about it, putting in spaces is a bad idea and we shouldn't do it.
JH: We could also make one other decision, which is "spaces are bad and don't use them".
AM: So are you allowing them, but noting that they should not be used?
JH: No; we could say they're always disallowed, or we could say they're allowed but recommend against it, or we could say here's the base, but you can add more.
PR: This group came to the conclusion that spaces et al. are problematic, and we should be careful. Things like LDAP can superclass, but there can be dragons; if we allow people to superclass, we allow those to appear.
PSA: We don't currently allow superclassing.
AS: Looking through the confusables, there's no way to tell the difference between some inputs.  It seems like if people want it, they can have a superclassing.  But since people can do bad things, we need to have explicit text that tells them they're about to shoot their own feet off.
JH: There's a fourth approach: have enough AD people and tell SASL "thou shalt not ..."
AM: Then we'll go somewhere else.
JH: Well, is this actually used, or can you make a change now and have people deal with the edges?
MB: It appears that people will need space, but it is not safe. So we'll need an UnsafeNameClass and big text that says "EXPERIMENTAL: USE AT YOUR OWN RISK".
PR: Something that just came up: we should go to the Sec Area and say we're planning to disallow spaces in i18n usernames -- what's going to break? If they come back with "we MUST have spaces", then we'll have to do some dancing. But I hear the recommendation here is to disallow spaces and strongly recommend against using them.
DB: If we ever get around to NFS, then we'll have to do filenames. However, NFS filenames are truly weird, because it depends on what the FS thinks it is you're doing, there's too much running code that does things differently, and clients are already having to fudge things. On one hand, these are not part of iSCSI; on the other, NFS filenames are sufficiently troublesome not to start with.
PSA: Action item is to go back to Sec Area to discuss disallowing spaces.
YY: I'd like to discuss if the point mapping topic is something to address.


3. PRECIS Mapping

http://www.ietf.org/proceedings/83/slides/slides-83-precis-3.pdf

PSA: One of the things we got with stringprep was this, but we lost it with the latest. It's something we want in XMPP, but is this something we want in the framework or in a separate doc?
AM: I think adding this to the framework might resolve the NFKC vs NFC debate.
TH: I was thinking the same thing, why not in the framework document?
TH: Also, if superclassing were introduced, we could have multiple inheritance. (group grins, various groans)
AS: I agree with people that this *COULD* go in the framework, but the suggestion offers two places, which is a BAD IDEA. It needs to be one place. The reason mapping was removed is that IDNA2003 had it, but we are in a country now that has capitals with accents in them, and people don't always use them. There is debate about whether such letters have accents or not, and that created ambiguities. Some expect things to map with or without accents, and we don't know who. We removed it from IDNA2008 because we can't handle locale-sensitive issues. Maybe this argument doesn't hold for the framework, but it did for IDNA2008.
DB: In iSCSI, output is always lowercase. Would be useful to have a form of casemapping that could be reused by iSCSI.
MB: I think we have consensus for mapping somewhere. It's a question of whether we want one or two docs. If we have two docs, we need to cross-reference, but it might be easier to have two documents that allow them to progress at different rates.
PSA: Which we might want to put together later.
AS: In the problem statement, we say you might not want to allow casemapping (e.g., protocol doesn't need it). If in the framework, might be harder to tease it out.
PR: We need to say something about casemapping. But it might cause problems for the user. Whatever the mapping document says, it might make life easier for the implementer but more difficult for the user, so you need to provide some advice to your users.
DB: Would the following have potential? Would it be possible to define a baseline casing that gets it right for all but these troublemakers, and leave the input method above the framework to deal with them?
AS: The only reliable case-folding is ASCII and nothing else.  We have this because of how typewriters work.  We have a problem because we have ambiguous mappings, and there's no way around that.
DB: iSCSI used the case-folding table from stringprep, so we still have a need for this. Throwing it out completely is not going to help things get done.
MB: When we did the survey, most of the stringprep profiles do casemapping. We can't ignore that problem. Customers will need to transition. IDNA had a different context.
JH: I think we need at least one or more locales that influence the algorithm that don't solve all problems, but point out the caveats.
AS: Unicode provides a casefold operation. So you could do this. I strongly support keeping this in a separate document, and it needs to contain implementation advice that the results are not predictable and are locale-specific. See what IDNA says as one approach to improved reliability (e.g., don't use uppercase).
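To illustrate the gap between default (locale-independent) case folding and what a given locale expects, here is a Python sketch of a canonical caseless match, NFD(casefold(NFD(x))), which is how we understand the Unicode Standard defines it. Python's str.casefold() applies Unicode full case folding; the helper names are illustrative.

```python
import unicodedata

def caseless_key(s: str) -> str:
    # Canonical caseless matching: NFD, default full case folding, NFD again.
    # No locale is consulted, so Turkish/Azerbaijani dotted and dotless i
    # are NOT handled the way speakers of those languages expect.
    return unicodedata.normalize("NFD", unicodedata.normalize("NFD", s).casefold())

def caseless_equal(a: str, b: str) -> bool:
    return caseless_key(a) == caseless_key(b)

print("Straße".lower())                       # straße (lowercasing keeps ß)
print(caseless_equal("Straße", "STRASSE"))    # True   (folding maps ß to ss)
print(caseless_equal("KAYIT", "kay\u0131t"))  # False  (I folds to i, not dotless ı)
```

A Turkish user who expects KAYIT and kayıt to match will be surprised, which is the locale problem described above; restricting input (for example, no uppercase, as in IDNA) sidesteps that rather than solving it.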
DB: I think I mostly agree with everything said. As part of putting this together, we need guidance for moving forward from the casefolding in an RFC: something that points out the dragons.
YY: As a co-chair, I recommend we keep this separate, and we do some modifications to point out the problems.
PSA: It depends on where the checks are done, and it can be different for different protocols.  For instance, the XMPP server does the checks, which does not have the locale knowledge.
JH: We also don't necessarily have access to everything you'd expect, e.g., in the browser. We might want to coordinate with the W3C.
AS: Just to illustrate what browsers have done, some say "if you support this language, then you are using this set of characters and display these as U-labels and not A-labels".  Some have a bunch of other rules, but some just look at the reported supported languages, for IDNA.  Some browsers tell you this page is unavailable because you can't read it.
MB: Consensus for a separate document, and there is consensus this is a WG item.
PSA: This is a good starting point; we obviously have more work to do, but it's a start.
MB: The next draft will be a WG item.

Consensus summary:
- agree that mapping needs to be done and this document is a good base to start with.
- next rev as draft-ietf-precis-mapping
- separate document for now. May reconsider merging with the framework later, but not likely.
- more work to be done and especially warnings regarding case folding.

4. SASLprep Bis

http://www.ietf.org/proceedings/83/slides/slides-83-precis-5.pdf

PR: In the space cases, do zero-width joiners and non-joiners count?
AM: I need to check; I don't remember. The attempt was to minimize what we break when we change.
JH: The nonjoiners are Cf.
PR: As you're going through this, is there a reason to not map all the spaces to nothing?  Is that false-positive bad?
JH: You will break existing hashes in your databases if you included usernames and they had spaces.
PSA: I think if we're disallowing this stuff, then it's not in the output.
PR: Mapped to nothing means remove it before you compare.  Disallow means don't allow the comparison at all.
JH: There are a bunch of things that are Zs that are non-ASCII space.
PSA: But those are "non-ASCII space", not zero-width joiners etc.
PR: Are there things that are spaces that have no width? I think the answer is no.
AS: The reason I raised the ZWNJ is because you input "space" and it actually puts in something else, but that's a separate problem.
PSA: From a protocol perspective, we're not going to allow that thing even if you think that's what you typed.
AS: People who use those input methods understand space to mean that. The usual case is certain Arabic input methods. They use "space" for zero-width joiner and zero-width non-joiner.
PR: Mapped to nothing is a worse choice if you can avoid it.  If you can get away with making them disallowed, that's better.
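The difference between "mapped to nothing" and "disallowed" can be sketched in Python. The character set and the policy (ASCII space permitted, other Zs spaces and Cf/Cc characters rejected, echoing AM's earlier ASCII-SPACE suggestion) are assumptions for illustration, not what any draft specifies; the function names are hypothetical.

```python
import unicodedata

# Zero-width characters; all three have general category Cf.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d"}  # ZWSP, ZWNJ, ZWJ

def map_to_nothing(s: str) -> str:
    """Silently delete zero-width characters before comparison."""
    return "".join(ch for ch in s if ch not in ZERO_WIDTH)

def check_disallowed(s: str) -> str:
    """Reject the string if it has a non-ASCII space, format, or control character."""
    for ch in s:
        cat = unicodedata.category(ch)
        if (cat == "Zs" and ch != " ") or cat in ("Cf", "Cc"):
            raise ValueError(f"disallowed U+{ord(ch):04X} ({cat})")
    return s

# Mapping makes two different inputs compare equal (a possible false positive):
print(map_to_nothing("ab\u200ccd") == map_to_nothing("abcd"))  # True

# Disallowing refuses the input instead of guessing what the user meant:
try:
    check_disallowed("ab\u200ccd")
except ValueError as err:
    print(err)  # disallowed U+200C (Cf)
```

The disallow path fails loudly, so a user whose input method inserted a ZWNJ for "space" learns about it, whereas the mapping path quietly merges two distinct identifiers.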
PSA: +1 to PR.  If we need to provide advice, I'll help formulate the text.
JS: How much trouble will we get into regarding hashes by changing normalization?
AM: We might, but we don't know. If these characters are not easy to enter using existing tools, it might not be a problem.
JH: I might suggest you might want to consider a tag to mark this is normalized (during transition) so you know what happens during upgrade.
JS: That might be fine in the server, but it might not work for the client.
JH: If the UA includes a hint of how it normalized, this might help.  Some SASL mechanisms allow for signaling the normalization, but others might require a new mechanism name.
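The idea of tagging stored credentials with the normalization that produced them might look like the following Python sketch. The record layout, the "NFKC"/"NFC" tag values, the iteration count, and the upgrade-on-login step are all hypothetical; real SASL mechanisms such as SCRAM define their own storage formats.

```python
import hashlib
import hmac
import os
import unicodedata

ITERATIONS = 100_000  # illustrative work factor only

def derive(password: str, salt: bytes, form: str) -> bytes:
    # Normalize with the form recorded alongside the hash, then run PBKDF2.
    prepared = unicodedata.normalize(form, password)
    return hashlib.pbkdf2_hmac("sha256", prepared.encode("utf-8"), salt, ITERATIONS)

def enroll(password: str, form: str) -> dict:
    salt = os.urandom(16)
    return {"form": form, "salt": salt, "key": derive(password, salt, form)}

def verify_and_upgrade(record: dict, password: str, new_form: str):
    """Check the password using the record's own form; on success, re-derive
    with new_form so the database migrates one successful login at a time."""
    candidate = derive(password, record["salt"], record["form"])
    if not hmac.compare_digest(candidate, record["key"]):
        return False, record
    if record["form"] != new_form:
        record = enroll(password, new_form)
    return True, record

record = enroll("pass", "NFKC")                                # stored under old rules
ok, record = verify_and_upgrade(record, "\uff50ass", "NFC")    # fullwidth p
print(ok, record["form"])  # True NFC
```

Note that the upgrade stores whatever the user typed this time, so inputs that only matched under the old form (here, plain "pass") stop matching afterwards; that is exactly the kind of breakage a transition plan has to account for.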
PSA: We might have some data about this in XMPP because we've got plenty of people that have created plenty of names in all sorts of things, and we can do some analysis.
MB: That would be good input.
PSA: I plan to provide that.
MB: I would postulate that customers that move from stringprep to PRECIS might bump their protocol versions. Versioning will need to be handled by using protocols.
MB: Could you redo the same slide deck with the things you talked about together?
PSA: We will take out the slides we did not present.

5. XMPP Nicknames

http://www.ietf.org/proceedings/83/slides/slides-83-precis-4.pdf

JH: Is the thought that the UAs will do this, and they can't have mixed-case names?
PSA: I did this for comparison purposes, and it's something that is done on the server for comparison purposes.
PR: As long as this is documented as the compare on the server, then this is good.  The reason this works is because it avoids two people that look the same from getting confused.  The first thing I thought was why are passwords invoking the bidi rule, which are not usually a user-visible thing.
PR: You might want to remove bidi from the SASL recommendations.
AM: We will look into that.
TH: There are some user interfaces where there is a checkbox to show the password while typing it in.
PSA: This is a physical security issue and not an input issue.
PR: There are funny things that happen if you do not follow the bidi rule, for display purposes only. For labels et al. that can be necessary, but password fields are less about order of display.
JH: Do we have a confusable mapping yet, because this would be a great place for it.  I would like to recommend we add it.
PSA: Yes, we should add a confusable mapping, but we don't want to hold up the SIMPLE WG until we're done.
JL: stpeter / "Peter Saint-Andre" are confusable, so human moderators have to be involved.
PSA: Maybe we don't do that at the IETF.
AS: I wanted to ask about NFKC. What is the rationale for NFKC vs NFC?
JH: In this case, we're trying to get as many collisions as possible, so NFKC is better.  This is also why we want the confusable table.
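The claim that NFKC yields more collisions than NFC can be seen in a small Python sketch that groups nicknames by their normalized form. The sample nicknames and the helper name are illustrative.

```python
import unicodedata
from collections import defaultdict

def collisions(nicks: list[str], form: str) -> list[list[str]]:
    """Group nicknames that become identical under the given normalization form."""
    groups = defaultdict(list)
    for nick in nicks:
        groups[unicodedata.normalize(form, nick)].append(nick)
    return [group for group in groups.values() if len(group) > 1]

nicks = [
    "Alice", "\uff21lice",       # ASCII A vs FULLWIDTH LATIN CAPITAL LETTER A
    "fish", "\ufb01sh",          # "fi" vs LATIN SMALL LIGATURE FI
    "Jos\u00e9", "Jose\u0301",   # precomposed é vs e + COMBINING ACUTE ACCENT
]
print(len(collisions(nicks, "NFC")))   # 1 (only the two encodings of José)
print(len(collisions(nicks, "NFKC")))  # 3 (also the fullwidth and ligature forms)
```

NFC only merges canonically equivalent spellings, while NFKC also folds compatibility variants; neither catches cross-script look-alikes such as Cyrillic "а" for Latin "a", which is the gap a confusables table would fill.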
MB: Question for the AD: I think of PRECIS as an advisor, since most of the work is done in other places. Are these separate documents owned by PRECIS or by the other WGs?
PK: SIMPLE chat just wants to refer to this, but we've been puttering along, and we're almost done.
PSA: We need to finish the framework document before we can finish this.
PK: This taking "too long" might cause problems with SIMPLE.
PR: Your drafts might get stuck in the IESG if this is a normative reference.

6. WG Future / Structure

PR: My sense is that precis will be an odd collection.  The only other way this works is to have a bunch of individual submissions from a bunch of other places, so I think we're just going to be an odd duck.
PR: We can think of other models, say as a "Precis Directorate" versus a working group.
MB: The first outcome is, these other documents are not WG items, but are individual submissions.
PR: I agree that these are separate for now, but we finish our core drafts first.  Then work with the other chairs to figure it out, and we might need to leave this WG open to work on other documents from closed working groups.  And maybe discuss having a Precis Directorate.
MB: We should continue the discussion about how PRECIS works.
PR: We'll figure it out as we go, but for now we'll leave nicknames (and similar) as individual submissions.

Consensus:
- profile documents are individual documents (or non-PRECIS WG documents) for now.
- will revisit when problem statement and framework are done or almost done.


7. IRI Coordination?

LM: I'm in the IRI WG, and we split out processing of IRIs from how to compare IRIs. One of the issues is that using IRIs as security tokens is problematic, and an IRI comparison document would be another customer.
PR: This brings up slightly different take on the last document. PRECIS was chartered to deal with the people who were using stringprep. Wasn't always about comparison but about normalization. The SIMPLE document we just talked about was about comparison. That's a much more specific topic. As a WG we might want to revisit whether that's currently in our charter or whether we take on product of that kind.
LM: The other approach is to reconcile the IRI document with RFC<missed>.
PSA: I see the two as different cases. In the chat nicknames, we are customers of the framework. The IRI thing we would put off to the side because they're not a customer of the framework.
MB: I agree.
MB: Yoneya-san found another customer.
YY: A document in DRINKS is working on comparisons, and they may need the PRECIS framework.
PSA: I actually talked to the DRINKS folks, and I agree we need to figure out the model going forward.  During my IESG term, it comes up and will come up more and more.