IDNAbis WG minutes

Meeting: IETF72, Monday, July 28, 1520-1720, Tuesday, July 29, 1520-1720
Place: Rathcoole room, Citywest hotel, Dublin
Chair: Vint Cerf
Minutes: Andrew Sullivan
Version: 1.0

========================================================================

Core documents

http://www.ietf.org/internet-drafts/draft-ietf-idnabis-rationale-01.txt
http://www.ietf.org/internet-drafts/draft-ietf-idnabis-tables-02.txt
http://www.ietf.org/internet-drafts/draft-ietf-idnabis-protocol-02.txt
http://www.ietf.org/internet-drafts/draft-ietf-idnabis-bidi-01.txt

FIRST SESSION (July 28, 2008 1520-1720, RATHCOOLE ROOM)

1. Administrivia
==========================

Scribe appointed, agenda modified.

2. Outstanding issues:
==========================

2a. Protocol document
--------------------------

John Klensin presented an overview of the contextual rules registry. There was some debate about whether or not to state the contextual rules as regular expressions. A group of interested individuals, including Mark Davis, John Klensin, Patrik Fältström, agreed to discuss and make a decision prior to session 2.

Mark Davis raised two additional matters:

- Mapping issue
- Normative parts of rationale should be moved into protocol or tables or bidi.

The Chair asked that these be deferred until the discussion of the rationale document.

2b. Bidi
--------------------------

Harald Alvestrand presented an overview. He noted in particular that while there were issues hanging over from Philadelphia, he hadn't seen discussion that appeared to converge on consensus. There are two main issues open:

1. Can we accept strings that mix RTL and LTR contexts?
2. Do we need inter-label tests?

Pete Resnick argued that the current document is too strict, and the rules should be relaxed.

John Klensin argued that it would be a bad thing to perform inter-label checks. Harald observed that this entailed disallowing numbers at both ends of a string (candidate label) with RTL characters.
John is willing to accept this restriction.

Paul Hoffman argued that the document has too much justification, and should be reduced to rules that work without the reasons why. Harald argued in response that the lack of justification caused problems in IDNA2003.

Ted Hardie argued that the proposed changes were "clinically insane" and made many other amusing loud noises. The purpose of this was to note that the proposed changes led to extreme instability of labels, where characters in one label could "jump over the dot". This would be bad. John Klensin agreed, and argued that this was also a justification for keeping rationale for rules in the document, because without reasons for a strange rule, implementers will ignore it.

Mark Davis spoke in favor of including rationale text (although possibly moving it to the rationale document). He also argued for the text as it currently exists, without the proposed modifications.

Pete Resnick suggested that people would have to get over the "sacrosanct dot", because any application that is going to deal with non-ASCII characters will have to do work before talking to DNS anyway. Ted Hardie replied that this was the invention of a new delimiter to solve a problem created by the proposed change.

Alirezah Saleh suggested that testing was needed. Vint Cerf noted that there is a problem with testing, because many of the examples were being tested in, e.g., word processors, which are not treating the strings label by label. Harald replied that domain names often occur in text, and if text processing software messes them up after they come out of the domain name context, then there will continue to be practical difficulties.

Paul Hoffman pointed out that there remained a problem in the text, which turned out to be an inconsistency between sections 1.1 and 6.1 that needed to be fixed.

John Klensin pointed out that many things depend on the convention $string1.$string2 to identify domain names, and many applications will break if "."
does not remain the only separator. Also, the security community will be angry.

Alirezah Saleh observed that the problem is not just with "." because "@" has the same problem. If you start substituting ".", it will cause problems for RTL readers in a new way. Restrictions on numbers at the end of a label are less problematic. Users have already adapted to IDNA2003 anyway.

Mark Davis put an example on the flip-chart, which showed that 1AB2.X.3CD4 displays as 1AB2.3.XCD4, where "X" is any Arabic character.

The Chair noted at this point that there was not enough for a conclusion in favor of any approach. Discussion continued.

Phillip Hallam Baker argued against inter-label checks, because they won't work. Andrew Sullivan felt that inter-label checking rules were not enforceable.

The Chair asked for a sense of the room. Mark Davis and Harald Alvestrand felt strongly against removing inter-label comparisons. Nobody claimed not to care. Many people supported elimination of inter-label checking. Harald said he would add text describing the residual dangers if that approach was adopted.

2c. Tables
--------------------------

Patrik Fältström presented. The document had been stable, but a recent email from Korea asked for a large number of characters to be changed to DISALLOWED. Patrik argued that the change should be included because of the justification for the change (it's not a character-by-character analysis).

Mark Davis argued that allowing the change essentially put the group on the road of going character by character through Unicode, because it depended on an analysis of whether people used the characters. The argument here is similar to the argument, already rejected on the list, to eliminate some 30 scripts as obsolete.

Vint Cerf noted that it might be bad to reject the advice from a group of native language speakers.

Patrik suggested looking more carefully at the request and discussing in the next session.
The second issue was an objection to the IANA considerations section, because it's not clear. The document contains non-normative tables. There is a request to IANA to keep the tables up to date; it does not ask IANA to keep track of the rules. This is a source of confusion. Patrik suggested a new approach, which is that IANA keeps track of the table of code points using an appointed expert, but the document is clear that any change to the normative rules needs IESG action. Patrik asked people to think about the new suggestion until the next session.

The meeting adjourned at 17:22 local time.

SECOND SESSION (July 29, 2008, 1520-1720, RATHCOOLE ROOM)

1. Administrivia
==========================

Scribe appointed, agenda modified.

2. Items from previous day
==========================

2a. Protocol
--------------------------

John Klensin reported on the results of the discussions within the small group continuing the previous day's consideration of the format for the contextual rules. Although there was no clear preference, they decided not to use regular expressions since a significant segment of the audience for the document might find them difficult to read. The next version of the document will reflect the results of this discussion.

2b. Tables
--------------------------

Patrik Fältström noted that his personal first impressions on first reading were not echoed in the list discussion.

A representative of the National Internet Development Agency of Korea (NIDA) [name not understood at mic] rose to clarify the Korean submission, because it was the result of consensus within that community. They wish to restrict the entire Hangul Jamo block at the protocol level, because restriction by policy cannot be guaranteed in all registries, and the risk of user confusion is otherwise significant.

Mark Davis rose to argue that, if the WG wants to restrict historic scripts generally, then the Korean proposal would be okay.
Otherwise, the Korean proposal is inconsistent with what's been done in other cases.

The Chair noted that he is uncomfortable overriding advice taken from experts on Korean, and wanted the WG to have time to study the submission, so said that no decision would be taken immediately.

Alirezah Saleh asked whether a security problem was a good reason to disallow characters. Patrik replied that it was, except that it had to be weighed against the cost of evaluating character by character: block by block evaluations are okay. John Klensin added that there are different definitions of the security problem: visual confusability, for instance, isn't enough, whereas invisible joiners probably are. Paul Hoffman noted that the charter of the WG explicitly excludes phishing and confusing similarity.

On another issue, Mark Davis noted that the tables document is structured such that the lists are not normative. The experience with Unicode is that people will just follow the lists and not the rules. Patrik replied that the tables need to be taken out of the document. Paul Hoffman asked whether there would be a non-normative table maintained by IANA. Otherwise, everyone would have to do a complete implementation. Stephane Bortzmeyer agreed, saying that the lists of characters are useful at least during the I-D phase. Patrik agreed to leave the tables alone for now.

2c. Bidi
--------------------------

Harald Alvestrand said that he had added text to the document stating, with regard to labels containing RTL characters, "Here's what will fail when the conditions are met," and that the resolver MAY refuse to look up such domains.

Andrew Sullivan asked for clarification of the role played by the resolver, being unhappy about the suggested action. Lisa Dusseault said this wording made her nervous. Harald observed that one way of getting fewer of these cases displayed is to refuse to look them up.
Suzanne Wolf suggested finding and using a term other than "resolver", since the term is obviously being used in a sense that differs from the one it has in specialized discourse about the DNS. Several additional people joined the discussion at this point, adding further perspectives to the consideration of terminology. Andrew Sullivan volunteered to draft specific wording. Harald asked for additional contributions to that action, with Suzanne Wolf responding affirmatively, and for further volunteers to pre-check his proposal before sending the draft to the repository.

The Chair summarized the discussion with two points: first, that clearer language was needed about the context in which the specified action was to take place, and second, that inter-label testing would be removed from the bidi rules, also to be clarified by explanatory rewording.

3. Rationale
==========================

John Klensin began a discussion of "critical path" issues for the rationale document. Slides are at http://www3.ietf.org/proceedings/08jul/slides/idnabis-0.pdf.

Mark Davis noted that there were two issues he was worried about. The first is the stability of labels. The second is the non-stability of non-labels. The current goal in the documents is that a current non-label remain a non-label forever. John replied that the current rules of the context registry require standards action, but that's because we don't know whether there is a safer approach, and in future the rule might be relaxed. Mark noted that this might mean changes weren't done in time for a new version of Unicode. Patrik Fältström noted that there will be inter-operation problems in any case, because of the need to support different Unicode libraries in the field; moreover, an uncontroversial action should take less than 6 months (and a controversial one, while taking longer than 6 months, likely needs the additional deliberation).

The next issue is the policy statements about zone administrators.
There was some discussion of whether it is useful to have text that says, "You must have a policy," since one policy could be "everything in". Yoshiro Yoneya proposed a rule that children MUST adopt their parent's policy, but John replied that it wasn't a practical answer. Mark Davis suggested that the statements as they stand are not terribly meaningful, and just make the document set harder to read. Andrew Sullivan suggested doing this in a BCP document rather than in the protocol. After further discussion with several participants, the Chair felt that a body of information needed to be collated but that it was not certain how this should best be put forward.

Marcos Sanz argued that the rules are not implementation-vacuous, because they have implications. He asked the WG not to issue a BCP on best registration policies, but said that if anything is important enough it ought to be treated in protocol.

4. Local Mapping and Preprocessing
==================================

IDNA2003 specifies an approach that is lossy in some cases. IDNA2008 drops the mappings that cause this. This opens the question of how to treat the affected characters, and how to handle moving from IDNA2003 when dependent on the IDNA2003 mappings.

One solution would be to specify new rules that are compatible with IDNA2003, but in that case one might as well stick with mapping. Alternatively, we could specify mandatory preprocessing, preprocessing in just some circumstances, or no mandatory preprocessing.

Yoshiro Yoneya said he thinks there is a need for a good table, or else implementers will do nothing.

Mark Davis suggested that part of the problem has to do with what people mean by "on the wire". He is concerned that the current document leaves it open for a client to do more or less anything it wants. He would like text that says, "Here is how the mapping works for IDNA2003," then everyone could do it that way.
Pete Resnick suggested that the proposals being raised would probably entail big topics like, "If a user types something _like_ a domain name, there are transformations you should do, and they are these: " and that this WG is probably the wrong group to write such a document.

Edmon Chung observed that any case in which two end users go to two different destinations due to local mappings, despite having typed the same characters on the keyboard, is obviously bad.

Thomas Roessler mentioned that there are other groups who will step in if the WG doesn't do something, and those others may not do what the WG would like.

The meeting adjourned at 17:28 local time.