IDNAbis WG minutes

Meeting: IETF72, Monday, July 28, 1520-1720, Tuesday, July 29, 1520-1720
Place: Rathcoole room, Citywest hotel, Dublin
Chair: Vint Cerf <vint@google.com>
Minutes: Andrew Sullivan <ajs@commandprompt.com>
Version: 1.0

========================================================================

Core documents

http://www.ietf.org/internet-drafts/draft-ietf-idnabis-rationale-01.txt
http://www.ietf.org/internet-drafts/draft-ietf-idnabis-tables-02.txt
http://www.ietf.org/internet-drafts/draft-ietf-idnabis-protocol-02.txt
http://www.ietf.org/internet-drafts/draft-ietf-idnabis-bidi-01.txt



FIRST SESSION (July 28, 2008 1520-1720, RATHCOOLE ROOM)

1.  Administrivia
==========================


Scribe appointed, agenda modified.



2. Outstanding issues:
==========================


2a. Protocol document
--------------------------

John Klensin presented an overview of the contextual rules registry.
There was some debate about whether or not to state the contextual
rules as regular expressions. A group of interested individuals,
including Mark Davis, John Klensin, and Patrik Fältström, agreed to discuss
and make a decision prior to session 2.

Mark Davis raised two additional matters: 

        - Mapping issue 

        - Normative parts of rationale should be moved into protocol or
          tables or bidi.

The Chair asked that these be deferred until the discussion of the
rationale document.


2b. Bidi
--------------------------

Harald Alvestrand presented an overview.  He noted in particular that
while there were issues hanging over from Philadelphia, he hadn't seen
discussion that appeared to converge on consensus.

There are two main issues open:

        1. Can we accept strings that mix RTL and LTR contexts?
        2. Do we need inter-label tests?

Pete Resnick argued that the current document is too strict, and the
rules should be relaxed.

John Klensin argued that it would be a bad thing to perform
inter-label checks.  Harald observed that this entailed disallowing
numbers at both ends of a string (candidate label) with RTL
characters.  John is willing to accept this restriction.
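
The restriction John accepted can be illustrated with a minimal Python
sketch.  It is not text from the bidi draft: the function names are
hypothetical, and it assumes that "numbers at both ends" means a digit
in either the first or the last position of a label that contains at
least one RTL character.  Only the label itself is examined, so no
inter-label test is involved.

```python
import unicodedata

# Unicode bidirectional classes, as reported by unicodedata.bidirectional().
RTL_CLASSES = {"R", "AL"}      # right-to-left letters (e.g. Hebrew, Arabic)
DIGIT_CLASSES = {"EN", "AN"}   # European and Arabic-Indic digits


def has_rtl(label: str) -> bool:
    """Return True if the label contains at least one RTL character."""
    return any(unicodedata.bidirectional(ch) in RTL_CLASSES for ch in label)


def is_digit(ch: str) -> bool:
    """Return True if the character has a digit bidirectional class."""
    return unicodedata.bidirectional(ch) in DIGIT_CLASSES


def violates_digit_restriction(label: str) -> bool:
    """Assumed reading of the restriction: a label with RTL characters
    must not start or end with a digit.  Labels without RTL characters
    are never flagged by this check."""
    if not label or not has_rtl(label):
        return False
    return is_digit(label[0]) or is_digit(label[-1])
```

Under this reading, a label such as "1" followed by Arabic letters
would be rejected on its own, without reference to neighbouring labels.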

Paul Hoffman argued that the document has too much justification, and
should be reduced to rules that work without the reasons why.  Harald
argued in response that the lack of justification caused problems in
IDNA2003.

Ted Hardie argued that the proposed changes were "clinically insane"
and made many other amusing loud noises.  The purpose for this was to
note that the proposed changes led to extreme instability of labels,
where characters in one label could "jump over the dot".  This would
be bad.  John Klensin agreed, and argued that this was also a
justification for keeping rationale for rules in the document,
because without reasons for a strange rule, implementers will ignore
it.

Mark Davis spoke in favor of including rationale text (although
possibly moving it to the rationale document).  He also argued for the
text as it currently exists, without the proposed modifications.

Pete Resnick suggested that people would have to get over the
"sacrosanct dot", because any application that is going to deal with
non-ASCII characters will have to do work before talking to DNS
anyway.  Ted Hardie replied that this was the invention of a new
delimiter to solve a problem created by the proposed change.

Alirezah Saleh suggested that testing was needed.

Vint Cerf noted that there is a problem with testing, because many of
the examples were being tested in, e.g., word processors, which are
not treating the strings label by label.  Harald replied that domain
names often occur in text, and if text processing software messes them
up after they come out of the domain name context, then there will
continue to be practical difficulties.

Paul Hoffman pointed out that there remained a problem in the text,
which turned out to be an inconsistency between sections 1.1 and 6.1
that needed to be fixed.

John Klensin pointed out that many things depend on the convention
$string1.$string2 to identify domain names, and many applications will
break if "." does not remain the only separator.  Also, the security
community will be angry.

Alirezah Saleh observed that the problem is not just with "." because
"@" has the same problem.  If you start substituting ".", it will
cause problems for RTL readers in a new way.  Restrictions on numbers
at the end of a label are less problematic.  Users have already
adapted to IDNA2003 anyway.

Mark Davis put an example on the flip-chart, which showed that
1AB2.X.3CD4 displays as 1AB2.3.XCD4, where "X" is any Arabic
character.

The Chair noted at this point that there was not enough for a
conclusion in favor of any approach.  Discussion continued.  

Phillip Hallam Baker argued against inter-label checks, because they
won't work.

Andrew Sullivan felt that inter-label checking rules were not
enforceable.

The Chair asked for a sense of the room.  Mark Davis and Harald
Alvestrand felt strongly against removing inter-label comparisons.
Nobody claimed not to care.  Many people supported elimination of
inter-label checking.  Harald said he would add text describing the
residual dangers if that approach was adopted.


2c. Tables
--------------------------


Patrik Fältström presented.  The document had been stable, but a
recent email from Korea asked for a large number of characters to be
changed to DISALLOWED.  Patrik argued that the change should be
included because of the justification for the change (it's not a
character-by-character analysis).

Mark Davis argued that allowing the change essentially put the group
on the road of going character by character through Unicode, because
it depended on an analysis of whether people used the characters.  The
argument here is similar to the argument, already rejected on the
list, to eliminate some 30 scripts as obsolete.

Vint Cerf noted that it might be bad to reject the advice from a group
of native language speakers.

Patrik suggested looking more carefully at the request and discussing
in the next session.

The second issue was an objection to the IANA considerations section,
because it's not clear.  The document contains non-normative tables.
There is a request to IANA to keep the tables up to date; it does not
ask IANA to keep track of the rules.  This is a source of confusion.
Patrik suggested a new approach, which is that IANA keeps track of the
table of code points using an appointed expert, but the document is
clear that any change to the normative rules needs IESG action.
Patrik asked people to think about the new suggestion until the next
session.

The meeting adjourned at 17:22 local time.





SECOND SESSION (July 29, 2008, 1520-1720, RATHCOOLE ROOM)


1.  Administrivia
==========================


Scribe appointed, agenda modified.


2. Items from previous day
==========================


2a. Protocol
--------------------------

John Klensin reported on the results of the discussions within the small
group continuing the previous day's consideration of the format for the
contextual rules.  Although there was no clear preference, they decided
not to use regular expressions since a significant segment of the
audience for the document might find them difficult to read. The next
version of the document will reflect the results of this discussion.

2b. Tables
--------------------------

Patrik Fältström noted that his personal impressions on first
reading were not echoed in the list discussion.

A representative of the National Internet Development Agency of Korea
(NIDA) [name not understood at mic] rose to clarify the Korean
submission, because it was the result of consensus within that
community.  They wish to restrict the entire Hangul Jamo block at the
protocol level, because restriction by policy cannot be guaranteed in
all registries, and the risk of user confusion is otherwise significant.

Mark Davis rose to argue that, if the WG wants to restrict historic
scripts generally, then the Korean proposal would be okay.  Otherwise,
the Korean proposal is inconsistent with what's been done in other
cases.

The Chair noted that he is uncomfortable overriding advice taken from
experts on Korean, and wanted the WG to have time to study the
submission, so said that no decision would be taken immediately.

Alirezah Saleh asked whether a security problem was a good reason to
disallow characters.  Patrik replied that it was, except that it had
to be weighed against the cost of evaluating character by character:
block by block evaluations are okay.

John Klensin added that there are different definitions of the
security problem: visual confusability, for instance, isn't enough,
whereas invisible joiners probably are.

Paul Hoffman noted that the charter of the WG explicitly excludes
phishing and confusing similarity.

On another issue, Mark Davis noted that the tables document is
structured such that the lists are not normative.  The experience with
Unicode is that people will just follow the lists and not the rules.
Patrik replied that the tables need to be taken out of the document.

Paul Hoffman asked whether there would be a non-normative table
maintained by IANA.  Otherwise, everyone would have to do a
complete implementation.  Stephane Bortzmeyer agreed, saying that
the lists of characters are useful at least during the I-D phase.
Patrik agreed to leave them alone for now.


2c. Bidi
--------------------------

Harald Alvestrand said that he had added text to the document stating
with regard to labels containing RTL characters, "Here's what will fail
when the conditions are met," and that the resolver MAY refuse to look
up such domains.

Andrew Sullivan asked for clarification of the role played by the
resolver, being unhappy about the suggested action.

Lisa Dusseault said this wording made her nervous. Harald observed that
one way of getting fewer of these cases displayed is to refuse to look
them up.  Suzanne Wolf suggested finding and using a term other than
"resolver", since the term is obviously being used in a sense that
differs from the one it has in specialized discourse about the DNS.

Several additional people joined the discussion at this point, adding
further perspectives to the consideration of terminology. Andrew
Sullivan volunteered to draft specific wording. Harald asked for
additional contributions to that action, with Suzanne Wolf responding
affirmatively, and for further volunteers to pre-check his proposal
before sending the draft to the repository.


The Chair summarized the discussion with two points: first, that clearer
language was needed about the context in which the specified action was
to take place, and second, that inter-label testing would be removed
from the bidi rules, also to be clarified by explanatory rewording.


3. Rationale
==========================

John Klensin began a discussion of "critical path" issues for the
rationale document.  Slides are at
http://www3.ietf.org/proceedings/08jul/slides/idnabis-0.pdf.  

Mark Davis noted that there were two issues he was worried about.  The
first is the stability of labels.  The second is the non-stability of
non-labels.  The current goal in the documents is that a current
non-label remain a non-label forever.  John replied that the current
rules of the context registry require standards action, but that's
because we don't know whether there is a safer approach, and in
future the rule might be relaxed.  Mark noted that this might mean
changes weren't done in time for a new version of Unicode.  Patrik
Fältström noted that there will be inter-operation problems in any
case, because of the need to support different Unicode libraries in
the field; moreover, an uncontroversial action should take less than 6
months (and a controversial one, while taking longer than 6 months,
likely needs the additional deliberation).

The next issue is the policy statements about zone administrators.
There was some discussion of whether it is useful to have text that
says, "You must have a policy," since one policy could be "everything
in".  Yoshiro Yoneya proposed a rule that children MUST adopt their
parent's policy, but John replied that it wasn't a practical answer.
Mark Davis suggested that the statements as they stand are not
terribly meaningful, and just make the document set harder to read.
Andrew Sullivan suggested doing this in a BCP document rather than in
the protocol.

After further discussion with several participants, the Chair felt
that a body of information needed to be collated but that it was not
certain how this should best be put forward.

Marcos Sanz argued that the rules are not implementation-vacuous,
because they have implications.  He asked the WG not to issue a BCP
on best registration policies, but said that if anything is important
enough it ought to be treated in protocol.


4. Local Mapping and Preprocessing
==================================

IDNA2003 specifies an approach that is lossy in some cases.  IDNA2008
drops the mappings that cause this. This opens the question of how to
treat the affected characters, and how to handle moving from IDNA2003
when dependent on the IDNA2003 mappings.  One solution would be to
specify new rules that are compatible with IDNA2003, but in that case
one might as well stick with mapping. Alternatively, we could specify
mandatory preprocessing, preprocessing in just some circumstances, or
no mandatory preprocessing.
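
The lossiness can be seen with Python's built-in "idna" codec, which
implements IDNA2003 (RFC 3490 with Nameprep).  The two inputs below
are illustrative choices, not examples recorded at the meeting, and
the helper name to_idna2003 is hypothetical.

```python
def to_idna2003(name: str) -> bytes:
    """Convert a domain name using Python's IDNA2003 ("idna") codec."""
    return name.encode("idna")


# LATIN SMALL LETTER SHARP S (U+00DF) is mapped to "ss", so the
# original spelling cannot be recovered from the result.
sharp_s = to_idna2003("stra\u00dfe")
plain = to_idna2003("strasse")

# ZERO WIDTH NON-JOINER (U+200C) is mapped to nothing.
with_zwnj = to_idna2003("a\u200cb")
without_zwnj = to_idna2003("ab")

print(sharp_s == plain)
print(with_zwnj == without_zwnj)
```

In each pair, two distinct inputs collapse to one looked-up name.
Dropping the mappings therefore changes which name such inputs denote,
which is the migration question described above.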

Yoshiro Yoneya said he thinks there is a need for a good table, or
else implementers will do nothing.

Mark Davis suggested that part of the problem has to do with what
people mean by "on the wire".  He is concerned that the current
document leaves it open for a client to do more or less anything it
wants.  He would like text that says, "Here is how the mapping works
for IDNA2003," so that everyone could do it that way.

Pete Resnick suggested that the proposals being raised would probably
entail big topics like, "If a user types something _like_ a domain
name, there are transformations you should do, and they are these: "
and that this WG is probably the wrong group to write such a document.

Edmon Chung observed that any case in which two end users go to two
different destinations due to local mappings, despite having typed the
same characters on the keyboard, is obviously bad.

Thomas Roessler mentioned that there are other groups who will step in
if the WG doesn't do something, and those others may not do what the
WG would like.

The meeting adjourned at 17:28 local time.