Minutes from IDN BOF, IETF 71
Philadelphia, PA, 2008-03-12
Chair: Harald Alvestrand.

The Chair brought the meeting to order.

A.  Administration; Agenda; Plan of the Meeting.

Administrative matters.  The Chair asked for additional
agenda items.  There were no additions.

The Chair reviewed the agenda.

The first part of the meeting: document review.

    The Chair noted that list discussion tends to focus on very
    specific, technical issues.  In order to cover the items on the
    agenda, he asked that specific discussion be deferred until later
    in the meeting, so that the current documents could be reviewed
    broadly.

Following document review: charter review.  

After the charter review: certain linguistic and cultural issues that
might need addressing at the protocol level.  Note that this item was
removed late in the meeting because of lack of time.

[A note on minutes convention: items mentioned on the slide for which
there was no substantive expansion are not mentioned here.]

B.  Document review.

1.  Introduction to an IDN revision.  John Klensin

    slides: http://www3.ietf.org/proceedings/08mar/slides/idn-3.pdf


    	How many people had looked at recent versions of the relevant
    	documents?  Many.  John said he would proceed quickly to cover
    	the high points, relying on the fact that many people had read
    	the documents.

	"History".  Emphasise that some of these problems are
	simply not possible to solve: 
	      
	      - domain names are at best a subset of a language
	      ("can't write literature in domain names")

	      - Unicode not perfect (inventing a completely new
	      	encoding scheme is not a likely solution)

	"Evolution".  RFC 4690 might be different if it were written
	today.  This is an open effort, and there is evidence of that
	openness.

	"Key Issues I".  It's important to understand how and why the
	things left out of IDNA2003 are important.  The problems are
	real problems to users of those languages, even if there are
	only small numbers of such speakers.  
	     That said, John noted that it is important to avoid the
	trap of thinking everything can fit in the DNS.  Emphasise
	that the IETF does not have a consensus mechanism for solving
	orthographic or linguistic disputes.  Emphasise also the
	serious problem of characters that are dangerous in
	themselves, but without which certain words are impossible to
	write at all.

	"Key Issues II".  Emphasise that it doesn't matter what one
	thinks of the linguistic foundation of complaints about
	IDNA2003's handling of certain scripts.  The IETF has no way
	to resolve such disputes, but must listen and try to address
	the complaints.

	"Key Issues III".  John noted that some of the problems are
	because of natural analogies with the way "traditional" DNS
	works (e.g. the relationship between ASCII lower case and
	upper case, and the relationship between certain characters
	in some other script).  John also emphasised the importance
	of libraries in the apparent issues: IDNA2003 specified a
	certain version of Unicode, but applications don't know, and
	can't learn, which version they're actually using (and even
	if they could, it wouldn't help).  He also emphasised that the
	"stable list" approach in IDNA2003 turned out not to work.

	"Key Goals".  (no expansion)

	"Current Structure".  John emphasised that the point of a
	"why" document is to prevent people from making up their own
	rules and principles because they don't understand why the
	rules are as they are.

	"DNS Internationalization".  John mentioned that the "common
	sense" that users need to have may require some education
	in order to develop it.

QUESTIONS FROM THE FLOOR

	a.  Ted Hardie asked whether a document is needed outlining
	which other protocols need (or need not) to pay attention to
	this new work.  John replied that in some sense, the answer is
	that the need is restricted to issues of localization.  On the
	other hand, because of the possibility of new label separators
	and the ubiquity of DNS label parsing, everything is going to
	need to know something about it, so the idea that IDNs can be
	implemented entirely on the client side is dead.

	b.  Phillip Hallam-Baker suggested a caveat that the group
	will not undo what has already happened: existing
	registrations, even those not conforming to the new approach,
	will remain legal.  John replied that in this case it is more
	important to design the new approach to work well than to
	address current pathologies.

	c.  An unnamed speaker (Lisa Dusseault?) asked why the
	introduction of a new label separator appears to have been
	assumed, rather than argued for.  John replied that the basic
	problem is an analogy drawn by non-ASCII users: the ASCII
	period/full stop doubles as the DNS label separator, so a
	non-ASCII sentence separator should likewise serve as a label
	separator.  There is no way to be sure the list of such
	candidate separators is bounded.  Finally, there's the
	practical issue that some people complain they don't have the
	dot on their keyboard.  An unnamed speaker (Paul Hoffman?)
	noted also that in IDNA2003, the "dot mapping" went into all
	mappings, and that is not to be the case under the new approach.
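	As background to the separator discussion: IDNA2003 (RFC 3490,
	section 3.1) already treats four characters as label
	separators: U+002E FULL STOP, U+3002 IDEOGRAPHIC FULL STOP,
	U+FF0E FULLWIDTH FULL STOP and U+FF61 HALFWIDTH IDEOGRAPHIC
	FULL STOP.  The minimal Python sketch below (the helper name
	split_labels is hypothetical, not from any draft) illustrates
	why everything that parses domain names has to know the
	separator list: a parser that knows only the ASCII dot sees a
	single opaque label where an IDNA-aware parser sees several.

```python
import re

# Label separators recognized by IDNA2003 (RFC 3490, section 3.1).
IDNA2003_SEPARATORS = ("\u002E", "\u3002", "\uFF0E", "\uFF61")


def split_labels(name, separators=IDNA2003_SEPARATORS):
    """Split a domain name into labels on any of the given separators.

    Hypothetical helper for illustration only; not an API from the IDNA drafts.
    """
    pattern = "[" + "".join(re.escape(s) for s in separators) + "]"
    return re.split(pattern, name)


name = "www\u3002example\uFF0Ecom"
print(split_labels(name))          # IDNA-aware parser: three labels
print(split_labels(name, (".",)))  # ASCII-only parser: one opaque label
```

	Whether any further separators are added, the same function
	only needs a longer separator tuple; the harder problem noted
	above is that every parser must be updated to use it.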

2.  Issues and Rationale.  John Klensin.

    slides: continue in same deck

    	"Issues and Rationale".  (no expansion)
	
	"Address Primary Issues".  (no expansion)

	"New Terminology".  John emphasised the difference of approach
	here: the IDNA2003 documents were about labels, and not
	FQDNs.  

	"The Front End".  (no expansion)

	"Summary".  (no expansion)

QUESTIONS FROM THE FLOOR

        a. [Name not clear: Yoshiro YONEYA?] asked about the issues of
	usability with different label separators.  John replied that
	he didn't have a complete answer, but he noted that there is a
	difference between mapping label separators and other parts of
	the FQDN, because different targets need to know these maps.
	It's really a perspective issue, because what goes on the wire
	doesn't change that much between IDNA2003 and 200x; but what
	goes where changes.

	b.  Stuart [name not clear] from Apple expressed surprise that
	people collapsed "dot" and "full stop".  John made some
	remarks about the historical name of the "." character, and
	the decimal-number separator in some European usage.

	c.  Harald Alvestrand observed that the main issues seem to be
	some protocol issues plus the matter of separators.  John
	replied that the most important differences lie in trying to
	define rules and mechanisms to which one can conform rather
	than an algorithm one can implement.  In some ways it's
	simpler, because the mappings are gone (in contrast to
	IDNA2003).

	d.  An unnamed person noted that there seems to be a
	difference where in IDNA2003 the dots were "in the mapping",
	whereas now the mapping has to be "in before the protocol".
	 
3.  Tables.  Patrik Fältström

    slides: http://www3.ietf.org/proceedings/08mar/slides/idn-1.pdf

    	"Abstract".  (no expansion)
	  
	"What is this".  Patrik emphasised that he's trying to ensure
	that there is more than one algorithm to generate the same
	tables; he's had some success with this.  John mentioned that
	doing it this way may help eventual users understand the way
	it works.

	"Algorithm/tables".  Note which parts are normative.

	"Property values". These will become clearer later in the
	presentation.

	"Category A".  Each category is determined by rules; the rules
	decide whether some code point belongs to a category.  This
	category encompasses the "good code points"; in practice it is
	mostly where graphic characters and the like get thrown away,
	because they fall outside it.

	"Category B".  It's important that there is "stability" in
	normalization and casefolding.  John noted that it's important
	to tease out different meanings of "stability", and then
	promptly refused to say exactly what it means.

	"Category C".  (no expansion)

	"Category D".  Whereas C calls out individual characters, D
	calls out whole blocks.  John noted that this means you can't
	write music in DNS either.

QUESTIONS FROM THE FLOOR

	a.  [Unnamed] asked whether this is a stable list.  Patrik
	replied that this is what's in version 5 of the tables
	document; the rules have changed in different versions.

	b.  Ted Hardie asked whether, if a new code point is added to
	a list, the list has to change.  The answer is yes.  John
	also noted that several of these characters are likely to
	have been picked up already by other rules.

	c.  Phillip Hallam-Baker noted that the current approach does
	not specify the attacks to which it is responding.  Patrik
	replied that the point is to extend the
	"LDH rule" to international contexts, so all this is doing is
	picking out from all the possible characters the ones that are
	not in "internationalized LDH".  John said that the work is
	not motivated by security.  Phillip pressed, though, asking
	why these ones are the ones that are "out".  Patrik made an
	analogy with control characters or NULL in hostnames.  Harald
	suggested the LDH principle: for IDNs to be useful, they have
	to be a subset of all language.

	"Category E".  (no expansion)

	"Category F".  These are important exceptions, called out
	individually.  Notice that they have individually assigned
	properties.  John mentioned that, except for hyphen-minus
	(which is historic), these are all mismatches in the way DNS
	and Unicode are optimized.

	"Category G".  This is a placeholder category for things that
	somehow get missed.  Let's hope it stays empty.

	"Category H".  (no expansion)

	"Category I".  H and I are worded carefully to avoid cases
	where "can't happen" cases happen.  If some unanalyzed script
	turns up that causes problems, this is an out ("banned unless
	a special rule").
	 
	"Category J". (no expansion)

	"Algorithm explanation".  Roughly, evaluate the categories in
	the order FGEHIBCDJA.
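	The ordered evaluation can be sketched as a first-match rule
	list.  This Python sketch is illustrative only and rests on
	stated assumptions: the minutes do not give the contents of
	every category, so C, E, H, I and J are empty placeholders;
	category A's "good code points" are assumed to be letters,
	digits and combining marks; category B is assumed to mean
	"changes under NFKC and case folding"; category D lists only
	the Musical Symbols block; the category F entries and their
	values are hypothetical examples; and the DISALLOWED default
	when no rule fires is also an assumption.

```python
import unicodedata

PVALID, DISALLOWED, CONTEXTO = "PVALID", "DISALLOWED", "CONTEXTO"

# Category F (hypothetical entries): individual exceptions, each with its own
# assigned property value.
EXCEPTIONS = {
    0x002D: PVALID,    # HYPHEN-MINUS
    0x00DF: PVALID,    # LATIN SMALL LETTER SHARP S
    0x03C2: PVALID,    # GREEK SMALL LETTER FINAL SIGMA
    0x00B7: CONTEXTO,  # MIDDLE DOT
}
# Category G: placeholder for things that somehow get missed; expected empty.
BACKWARD_COMPATIBLE = {}
# Category D (assumed contents): whole blocks, here only Musical Symbols.
EXCLUDED_BLOCKS = [(0x1D100, 0x1D1FF)]
# Category A (assumed): letters, digits and combining marks.
GOOD_CATEGORIES = {"Ll", "Lu", "Lo", "Lm", "Nd", "Mn", "Mc"}


def is_unstable(cp):
    """Category B (assumed): the code point changes under NFKC + case folding."""
    ch = chr(cp)
    folded = unicodedata.normalize("NFKC", unicodedata.normalize("NFKC", ch).casefold())
    return folded != ch


def in_excluded_block(cp):
    return any(lo <= cp <= hi for lo, hi in EXCLUDED_BLOCKS)


# Each rule returns a property value, or None when its category does not apply.
# None in place of a function marks a category the minutes do not describe.
RULES = [
    ("F", lambda cp: EXCEPTIONS.get(cp)),
    ("G", lambda cp: BACKWARD_COMPATIBLE.get(cp)),
    ("E", None),
    ("H", None),
    ("I", None),
    ("B", lambda cp: DISALLOWED if is_unstable(cp) else None),
    ("C", None),
    ("D", lambda cp: DISALLOWED if in_excluded_block(cp) else None),
    ("J", None),
    ("A", lambda cp: PVALID if unicodedata.category(chr(cp)) in GOOD_CATEGORIES else None),
]


def derive_property(cp):
    """Try the categories in the order F G E H I B C D J A; first match wins."""
    for _category, rule in RULES:
        if rule is None:
            continue
        value = rule(cp)
        if value is not None:
            return value
    return DISALLOWED  # assumed default when no category applies


# Regenerating a table for a new Unicode version is just a loop.
table = {cp: derive_property(cp) for cp in range(0x20, 0x250)}
```

	Because F is tried before B, a character such as U+00DF, which
	changes under case folding, can still be allowed by an explicit
	exception.  Since every value is computed from Unicode
	properties, the table can be regenerated mechanically, which is
	the point Harald makes in the discussion below.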

QUESTIONS FROM THE FLOOR
	
	a.  [Unnamed]: Has anyone come up with a mnemonic for the
	table order?  Patrik noted that there've been various
	suggestions about how to name the categories, but each has
	disadvantages.
 
	b.  Tony Hansen said he couldn't tell which ones are positive
	categories and which negative.  Patrik replied that PVALID is
	"ok", DISALLOWED is "bad", and "table lookup" means that it's
	dependent on the codepoint itself.  Harald noted that the idea
	is simply to be able to crank out the table values over again
	for the next version of Unicode, with no human judgement
	involved.  Tony asked about CONTEXTO.  John replied that this
	is a dirty engineering trick, and asked for a better answer.
	The problem here is that some characters cause problems in
	some contexts, but are needed in others.  So there are two
	meta-rules: if no rule, then the character is prohibited; if
	there's a rule, then follow that rule.  He made the analogy
	with regular expressions: "Is this codepoint in that script?"
	     There is a possible difference between CONTEXTO and
	CONTEXTJ, but not everyone agrees.  Some things are really
	problematic and need to be checked at lookup (J): even if they
	somehow get registered, they should never get looked up.
	Others are less troublesome.  If the distinction is removed,
	then everything has to be checked at lookup time.  So this
	distinction is really an application optimisation.
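	The two meta-rules and the lookup-time optimisation can be
	sketched in Python as follows.  The property assignments and
	both context rules are hypothetical simplifications, modelled
	loosely on rules later published in RFC 5892 for ZERO WIDTH
	NON-JOINER (allowed after a virama) and MIDDLE DOT (allowed
	between two "l"s, as in Catalan); they are not the contents of
	any draft discussed at this meeting, and the function and
	table names are invented for illustration.

```python
import unicodedata

CONTEXTJ, CONTEXTO = "CONTEXTJ", "CONTEXTO"

# Hypothetical property assignments for a few contextual code points.
PROPERTY = {
    0x200C: CONTEXTJ,  # ZERO WIDTH NON-JOINER
    0x200D: CONTEXTJ,  # ZERO WIDTH JOINER (deliberately given no rule below)
    0x00B7: CONTEXTO,  # MIDDLE DOT
}


def zwnj_after_virama(label, i):
    """Simplified rule: ZWNJ only directly after a virama (combining class 9)."""
    return i > 0 and unicodedata.combining(label[i - 1]) == 9


def middle_dot_between_ls(label, i):
    """Simplified rule: MIDDLE DOT only between two 'l's, as in Catalan 'l·l'."""
    return 0 < i < len(label) - 1 and label[i - 1] == "l" and label[i + 1] == "l"


CONTEXT_RULES = {0x200C: zwnj_after_virama, 0x00B7: middle_dot_between_ls}


def contextual_ok(label, at_lookup):
    """Meta-rules: no rule means prohibited; otherwise the rule must hold.

    At lookup time only CONTEXTJ code points are checked; CONTEXTO code
    points are assumed to have been checked when the name was registered.
    """
    for i, ch in enumerate(label):
        prop = PROPERTY.get(ord(ch))
        if prop is None or (prop == CONTEXTO and at_lookup):
            continue
        rule = CONTEXT_RULES.get(ord(ch))
        if rule is None or not rule(label, i):
            return False
    return True
```

	Dropping the CONTEXTO/CONTEXTJ distinction would amount to
	ignoring at_lookup and running every rule at lookup time too.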

4.  Bidi.  Harald Alvestrand

    slides: none
    	
	In IDNA2003, a label can be RTL only if both its first and its
	last character are RTL.  The problem with
	this is non-spacing or combining marks (most combining marks
	are non-spacing, but two aren't).  Some languages (2 have been
	mentioned in a draft) have words with a combining mark at the
	end, and the combining mark has no direction.  So under
	IDNA2003, you can't use that language at all.  So Cary Karp
	and Harald proposed a new rule: a nonspacing mark may occur at
	the end of a label.  After hacking up some Perl to check the
	way this might work, it turned out that some ASCII labels next
	to some RTL labels will break.  It also turns out that Arabic
	numbers cannot be mixed with European numbers.  Nothing can
	start with "-" or numbers.
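	The IDNA2003 rule and the proposed relaxation can be compared
	with a short Python sketch using Unicode bidirectional classes
	(R and AL for right-to-left characters, L for left-to-right,
	NSM for nonspacing marks, EN and AN for European and Arabic
	digits).  The IDNA2003 check follows RFC 3454, section 6: a
	label containing R/AL characters has no L characters and
	starts and ends with R/AL.  The relaxed check is only one
	assumed reading of the proposal, in which trailing nonspacing
	marks are skipped when checking the last character, plus the
	ban on mixing Arabic and European digits mentioned above.  The
	function names are hypothetical.

```python
import unicodedata

RTL = {"R", "AL"}


def _classes(label):
    return [unicodedata.bidirectional(ch) for ch in label]


def bidi_ok_idna2003(label):
    """IDNA2003 rule: an RTL label has no L characters and starts/ends with R/AL."""
    dirs = _classes(label)
    if not any(d in RTL for d in dirs):
        return True  # no right-to-left characters: the rule does not apply
    return "L" not in dirs and dirs[0] in RTL and dirs[-1] in RTL


def bidi_ok_proposed(label):
    """Assumed reading of the proposal: trailing nonspacing marks are allowed."""
    dirs = _classes(label)
    if not any(d in RTL for d in dirs):
        return True
    if "L" in dirs or ("EN" in dirs and "AN" in dirs):
        return False  # no LTR letters; no mixing of European and Arabic digits
    end = len(dirs)
    # Skip trailing NSMs; this stops because at least one R/AL character exists.
    while dirs[end - 1] == "NSM":
        end -= 1
    return dirs[0] in RTL and dirs[end - 1] in RTL
```

	A Hebrew label ending in a vowel point such as U+05B8 fails the
	IDNA2003 check but passes the relaxed one.  An Arabic-script
	"server5" label still fails, because it ends in a European
	digit; this is the case raised in question (a) below.  The
	breakage involving ASCII labels next to RTL labels is a
	cross-label effect that this per-label sketch does not model.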

QUESTIONS FROM THE FLOOR

	a.  [Unnamed] So server5.3com.com is bad?  Harald: yes, if
	"server5" is in Arabic.

	b.  Pete Resnick said that the rules seem good, but seem to
	break things that "should work"; so the principle must be
	wrong.  Harald replied that he worries about what happens when
	a user gets email in an RTL script with an IRI in it
	containing a domain name: will they blame their application?
	Pete thought that meant the application was broken.  John
	mentioned also the problem that no modern language is strictly
	RTL, because everyone uses decimal numbers.  This also
	highlights the problems with "foo123" as a single label or a
	label made up of "f" "o" "o" "1" "2" "3".

	c.  [Unnamed: Paul Hoffman?] noted that there are names
	currently allowed in IDNA2003 that are in use, that will
	become illegal under the new rules.  Harald asked for an
	example, but also noted that he's willing to take the hit if
	it will make things better (but one can only do it once).

5.  Charter review.  Lisa Dusseault, in role as AD.

    slides: http://www3.ietf.org/proceedings/08mar/slides/idn-2.ppt
    [note: this segment of the minutes is not keyed to individual
    slides]

	AD said that she is hoping for a working group that can
	resolve some of the issues, call for consensus, and answer the
	call in a reasonable amount of time.  There was some pressure
	not to have a working group, but there seemed not to be a
	consensus in external review.  She asked that people identify
	whether they are opposed to the WG, and also offer comments in
	favour (even if, in the latter case, with scope restrictions).

	She asked for a show of who had seen the charter.  There
	seemed to be many who had.

QUESTIONS AND DISCUSSION FROM THE FLOOR

	a.  Stephane Bortzmeyer asked that the requirements of the
	work be spelled out more clearly.  In particular, it seems
	that removing references to RFC 4690 from the charter is
	needed.  For instance, 4690 discusses phishing extensively,
	but John's presentation explicitly called it one of the
	problems that can't be solved by this effort.
	
	*** AD called for hum on leaving phishing out of charter.  
	    +++ result: consensus  
	[scribe's note: see also below, item e]

	b.  Phillip Hallam-Baker argued that the charter needs
	specific reasons for changes.  AD asked for a suggestion
	for the charter.  Patrik Fältström noted a need for explicit
	examples.  John Klensin argued that the point is to try to
	support languages, not look at Unicode and figure out "what we
	can't have".  He opposed the latter approach.  

	*** No proposed charter text on this item; no sense of room test

	c.  Ted Hardie said that the milestones in the proposed
	charter were not practical.  AD noted that actual months had
	been removed from the slides because of a similar comment in
	external review.

	*** No proposed charter text on this item; no sense of room test

	d.  Ted Hardie noted that there is a significant need for
	tutorial material, and asked whether this should be added to
	the charter, or left for the IAB.  John Klensin agreed, but
	worried that this would lead to specifying user interface
	details.  AD asked for a volunteer to write a tutorial.  Dave
	Crocker noted that tutorials are good and probably important,
	but audience considerations might yield many possible
	tutorials, which expands the scope [beyond a tractable WG
	charter?].  Phillip Hallam-Baker noted an apparent absurdity
	in writing an ASCII-only [RFC] tutorial for
	internationalization.  

	*** AD called for hum on adding tutorial matter to charter
	    +++ result: indistinct.  Take to list

	e.  [Name not clear: Marcos Sanz?] noted that the earlier
	argument was in favour of removing references to RFC 4690, but
	that that was not the question posed at the hum in (a).  Stephane
	Bortzmeyer argued that he wanted to mention explicitly in the
	charter that solving phishing was not a goal, because that's
	been one of the "hottest" issues with IDNA2003.  Phillip
	Hallam-Baker suggested that it would be better to exclude the
	subset of IDNA-related phishing cases than phishing itself;
	nobody ever thought all phishing was IDNA-based.  Harald
	Alvestrand objected to removing 4690, because he'd heard no
	rationale.  [Unnamed: Paul Hoffman?] replied that 4690 is a
	laundry list; and if it's mentioned, the chartered WG would
	have to answer every item on the list.  [Original speaker:
	Marcos?] said he did not want to talk about perceived
	inadequacies of the current system, because the issue is to
	address what to do (positively).

	*** AD called for hum on keeping or removing reference to RFC
            4690.
	    +++ result: remove from charter

	f.  In response to foregoing, [Unnamed: Paul Hoffman?]
	suggested that the issue was with specific documents listed.
	It would be better to talk about goals, not specific
	documents.  AD stated that such is a matter of charter
	clarity, not scoping, so it's a topic for the list.

	*** No possible sense of room test.

	g.  [Name not clear] noted that the earlier IDN working group
	was established in the Internet area, and this group is
	contemplated in the Applications area.  He wanted to know why.
	AD responded that the earlier effort had to be in Internet
	area because there was an option to change (radically) the DNS
	in order to achieve the goals.  In this case, the relevant
	work is already in the Applications area.  If the result of
	the working group is in fact a consensus that a deeper change
	is necessary, a recharter under the Internet area could be
	required.  That said, this is clearly work that affects other
	areas, and the group cannot work in isolation from those
	areas.  We're still part of the IETF.

	*** Clarification question; no sense of room test.

	h.  Jelte Jansen for Simon Josefsson (on Jabber): Is this work
	going to obsolete IDNA2003, Stringprep, Nameprep, &c?  Which
	things?  AD responded that (as with f) this is part of charter
	clarification, and a topic for clarification on the list.

	*** No sense of room possible.

	i.  Stephane Bortzmeyer observed that some technical decisions
	appear to have been prejudged in proposed charter;
	specifically, what is checked at registration, vs. what is
	checked at resolution.  AD asked whether it would be ok if the
	distinction remains in the charter, as long as it is clear that
	the difference does not necessarily entail different classes
	in the ultimate work result.  Stephane
	replied that the charter needs to be clear that the goal is
	not to dictate what is valid at registration time.  [Name not
	clear: Michael?] supported Stephane, and suggested a rewording
	using a distinction between stored labels as opposed to
	lookups instead of registration and resolution.  Patrik
	Fältström also expressed strong agreement with mentioning
	registration and resolution as different things.

	*** AD concluded that this is complicated and needs to move to
            the list.

	j.  Tony Hansen expressed concern with whether the continued
	use of xn-- as a prefix is set in stone.  Paul Hoffman argued
	that the current charter restriction must remain.  The BOF was
	out of time, and this discussion appeared inconclusive.

	*** No determination of status of xn-- prefix.

	k.  Phillip Hallam-Baker suggested that the charter should
	emphasise the goal of solving "badness" rather than technical
	perfection.  Several respondents disagreed.

	*** No text or specific charter items proposed, so no sense of
            room test possible.