IETF December 2000 Proceedings


2.3.9 Internationalized Domain Name (idn)

NOTE: This charter is a snapshot of the 49th IETF Meeting in San Diego, California. It may now be out-of-date. Last Modified: 12-Oct-00

Chair(s):

James Seng <jseng@pobox.org.sg>
Marc Blanchet <Marc.Blanchet@viagenie.qc.ca>

Internet Area Director(s):

Thomas Narten <narten@raleigh.ibm.com>
Erik Nordmark <nordmark@eng.sun.com>

Internet Area Advisor:

Erik Nordmark <nordmark@eng.sun.com>

Technical Advisor(s):

John Klensin <klensin@jck.com>
Harald Alvestrand <alvestrand@cisco.com>

Mailing Lists:

General Discussion: idn@ops.ietf.org
To Subscribe: idn-request@ops.ietf.org
Archive: ftp://ops.ietf.org/pub/lists/idn*

Description of Working Group:

The goal of the group is to specify the requirements for internationalized access to domain names and to specify a standards track protocol based on the requirements.

The scope of the group is to investigate the possible means of doing this and what methods are feasible given the technical impact they will have on the use of such names by humans as well as application programs, as well as the impact on other users and administrators of the domain name system.

A fundamental requirement in this work is to not disturb the current use and operation of the domain name system, and for the DNS to continue to allow any system anywhere to resolve any domain name.

The group will not address the question of what, if any, body should administer or control usage of names that use this functionality.

The group must identify consequences to the current deployed DNS infrastructure, the protocols and the applications as well as transition scenarios, where applicable.

The WG will actively ensure good communication with interested groups who are studying the problem of internationalized access to domain names.

The Action Item(s) for the Working Group are

1. An Informational RFC specifying the requirements for providing Internationalized access to domain names. The document should provide guidance for developing solutions to this problem, taking localized (e.g. writing order) and related operational issues into consideration.

2. An Informational RFC or RFCs documenting the various proposals and implementations of Internationalization (i18n) of Domain Names. The document(s) should also provide a technical evaluation of the proposals by the Working Group.

3. A standards track specification on access to internationalized domain names including specifying any transition issues.

Goals and Milestones:

Done     First draft of the requirements document
Done     Presentation and discussion at IETF-Adelaide
Done     Second version of the requirement document
Done     Final discussion on the requirement document
Aug 00   Req document wg last call
Sep 00   First draft of comparison document
Dec 00   Final discussion of comparison document
Dec 00   Protocol RFC first draft
Jan 01   Comparison document wg last call
Mar 01   Protocol RFC second draft
Mar 01   Transition RFC first draft
Jun 01   Protocol RFC wg last call
Jun 01   Transition RFC second draft
Sep 01   Transition RFC wg last call

Internet-Drafts:

· Requirements of Internationalized Domain Names

· Comparison of Internationalized Domain Name Proposals

· RACE: Row-based ASCII Compatible Encoding for IDN

· Internationalized domain names using EDNS (IDNE)

· Preparation of Internationalized Host Names

· Using the Universal Character Set in the Domain Name System (UDNS)

· Architecture of Internationalized Domain Name System

· The DNSII Multilingual Domain Name Protocol

· Internationalized Host Names Using Resolvers and Applications (IDNRA)

· DNSII Multilingual Domain Name Resolution

· Simple ASCII Compatible Encoding (SACE)

· Internationalizing Host Names In Applications (IDNA)

· Han Ideograph (CJK) for Internationalized Domain Names

· DNSII Transitional Reflexive ASCII Compatible Encoding (TRACE)

· BRACE: Bi-mode Row-based ASCII-Compatible Encoding for IDN version 0.1.2

· Handling versions of internationalized domain names protocols

· LACE: Length-based ASCII Compatible Encoding for IDN

· Virtually Internationalized Domain Names (VIDN)

· Internationalized PTR Resource Record (IPTR)

· Japanese characters in multilingual domain name label

· Proposal for a determining process of ACE identifier

· DUDE: Differential Unicode Domain Encoding

· UTF-6 - Yet Another ASCII-Compatible Encoding for IDN

No Request For Comments

Current Meeting Report

IDN WG Meeting San Diego

* Working group update, Marc Blanchet

20 new documents since Pittsburgh
most are going to be presented this week
not: UDNS, BRACE, SACE, CJK (last presented in Pittsburgh).

Status of documents
- requirements: one last revision, then WG last call after San Diego

An ACE prefix registry, if needed by the WG, can be set up on a temporary basis

* More radical solutions: directories and basic DNS changes, John Klensin

Internationalized access to Domain Names: Some Radical Solutions

Worried for some months about too narrow a focus. We're looking for quick solutions. The problem with quick solutions is that they are usually not as quick as they're supposed to be, especially in deployment. As a result, we have to get rid of them. This issue is very important since its effects are very extensive, affecting past, present, and future.

Started exploring a different question: what should we have done if we had started from the beginning? This is a different question from how we start from where we are today.

Short term question: how to get i18n domain names now, i.e., quick deployment. Steps taken for short term are with us forever.

Design for long term: the problem is figuring out how to get there while preserving the installed base

Designing for i18n:
- not fitting i18n into the network
- not just a DNS problem

- applications need to work and make sense
- need to support presentation of non-ASCII characters everywhere users would expect
- extreme use case is a user whose keyboard doesn't have ASCII

- rethinking several assumptions

IDN may require a presentation layer
- IETF tradition usually

- keeps users close to protocol interface
- often done because we are lazy and it is easy
- or to keep things simple (may be the same thing)

- internationalization may not permit this
- difference between

- unique strings stored in Internet-wide databases
- appearance of names to users
- selection and choice mechanisms

two hypotheses / strawmen
- not a DNS problem

- we are looking in the wrong place
- very restricted protocol element, not ASCII
- still the right choice
- solve UI problems at another level

- DNS should be fully international
- content is names, not protocol elements

- names should reflect international usage

DNS is being pushed in a direction where it doesn't belong
- symptoms showing up in many places, not just i18n
- DNS doesn't do

- search
- ambiguity
- nearest server

directories
- not a technology or protocol but a class of them

- typically good at handling problems for which the DNS is weak

- multiple forms of lookup
- attributes with values
- tolerance for ambiguity

- IETF has never succeeded in widespread deployment of a directory protocol

- but maybe this issue is important enough

Directory protocol modules
- applications don't call DNS with user supplied names

- call directory with names or keywords
- involve user in evaluation if needed
- get results and call DNS
- DNS labels remain "protocol elements"

For many applications, replace gethostbyname()
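
The flow on this slide can be sketched roughly as follows. This is only an illustration, not anything presented at the meeting: the DIRECTORY table, the function names, and the example host names are all hypothetical, and a real directory service would be a network protocol rather than a dictionary.

```python
import socket
from typing import Callable, Dict, List

# Hypothetical in-memory "directory": keyword -> candidate ASCII host names.
DIRECTORY: Dict[str, List[str]] = {
    "bücher": ["buecher.example", "books.example"],
    "café": ["cafe.example"],
}


def directory_search(keyword: str) -> List[str]:
    """Return candidate DNS names for a user-supplied name or keyword."""
    return DIRECTORY.get(keyword.casefold(), [])


def resolve_via_directory(
    keyword: str,
    choose: Callable[[List[str]], str] = lambda names: names[0],
    dns_lookup: Callable[[str], str] = socket.gethostbyname,
) -> str:
    """Directory first, DNS second: a stand-in for a bare gethostbyname() call."""
    candidates = directory_search(keyword)
    if not candidates:
        raise LookupError(f"no directory entry for {keyword!r}")
    # Involve the user only when the directory returns more than one match.
    host = candidates[0] if len(candidates) == 1 else choose(candidates)
    # The DNS only ever sees a plain ASCII "protocol element".
    return dns_lookup(host)
```

The choose callback stands in for the "involve user in evaluation" step, and dns_lookup is injectable so the sketch can be exercised without network access.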

Directory advantages
- well adapted to user selection of target from a list
- use with keywords and descriptors not just rigid names
- attributes can identify
- types of use and business location (may help with trademarks)
- multiple server

Disadvantages
- the Internet has never deployed one
- reaching an agreement on protocol and schema
- too many options

Legacy application code:
- don't call directory, hence don't see internationalized names
- can use only a subset of Internet until upgraded
- transition similar to host table to DNS

Other hypothesis
- assume it should have contained non-ASCII (names not protocol elements) from the beginning
- would look like IS 10646 everywhere
- Mixing old and new conventions causes problems
- A new class? Obsolete "class=IN"
Everything going forward goes into the new class

Changing classes
- in principle, obsoletes everything

- no requirement to share server tree, delegations, etc. with class=IN

- Some advantages of preserving models

Transition issues
- copy existing records or search
- pointers from the new to old class
- political nightmares
- opportunity to not carry rubbish forward

Legacy applications and Code
- Don't know about new class so "see" only Class=IN
- non-ASCII names are invisible until they are upgraded

Shared problems
- DNS ASCII is convenient

- case independence and matching rules are easy
- very small number of characters

- Almost any international character set will require canonicalization and matching rules

- and may require localization

- ISO 10646 requires non-obvious coding decisions; UTF-8 may not be the right choice
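
To make the contrast concrete, here is a small sketch of why ASCII matching is easy and why a larger character set needs explicit canonicalization. The unicode_match rule (NFKC plus case folding) is only one possible choice, assumed here for illustration; it is not a rule the presentation proposed.

```python
import unicodedata


def ascii_match(a: str, b: str) -> bool:
    """Classic DNS-style comparison: ASCII letters compare case-insensitively."""
    return a.encode("ascii").lower() == b.encode("ascii").lower()


def unicode_match(a: str, b: str) -> bool:
    """One possible rule (an assumption): normalize, case-fold, normalize again."""
    def canon(s: str) -> str:
        folded = unicodedata.normalize("NFKC", s).casefold()
        return unicodedata.normalize("NFKC", folded)
    return canon(a) == canon(b)


composed = "caf\u00e9"      # e with acute accent as a single code point
decomposed = "cafe\u0301"   # plain e followed by a combining acute accent

print(ascii_match("Example", "EXAMPLE"))    # True
print(composed == decomposed)               # False: same appearance, different code points
print(unicode_match(composed, decomposed))  # True once both are canonicalized
```

Case folding here is locale-independent, which is exactly where the "may require localization" bullet bites: some languages expect different case pairs than the default folding provides.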

Documents
- Directory

- draft-klensin-dns-role-00.txt

- New Class

- draft-klensin-i18n-class-00.txt

Eliot Lear: the problems I face: we don't have a successful hierarchical directory. I don't think we have the luxury of time and we don't have the technology. I don't see anything that can scale as well as the DNS.

JK: let me lay part of that question aside. I can argue either side. The question of time is an interesting one. Our hope was that a solution would be converged upon fast enough that junk wouldn't be deployed. We lost. But problems with the trash are becoming obvious. Our only opportunity is to do it right. Half-right and quick won't do it. Taking the extra time to do it right is the right thing to do as it will help get rid of the trash. Some better than trash, some worse, average is trash.

Michael Mealling: you said there were components to directory service. Have you given any thought to what the components might be and whether we already have some of the components?

JK: If I say "yes" and duck, would that be an appropriate answer? Directories are complicated. Talking about that as a design principle is better than getting bogged down in "whois++ or LDAP is better" arguments. After posting my draft I received two groups of comments. First group: a first attempt to view it as an architectural issue. Second group: my schema is better than yours, LDAP better than .... If the WG comes to a decision that a directory approach is the best way forward, then we sit down and focus.

Keith Moore: in response to the new class proposal, it seems that what you're doing is partitioning the applications in the Internet into those that can deal with i18n and those that can't. The on-the-wire protocol doesn't matter; it is what the applications do with the protocol. The other thing is that if we use this to de-cruft the DNS, trying to clean things up gets out of hand.

JK: No intent to change protocols. The rest is a bar discussion

Hideyo Imazu: Directory based solution is mostly for WWW. ACE based IDN solution can be applied to email. Among countries like Japan, China, Korea, internationalization of entire email address should be considered

JK: Directory works as well for email as for the web. Doesn't work for non-interactive, non-online applications. Architecturally, it feels like the right solution.

David Chadwick: Support the directory notion, users need the search facility. Once we have multiple character sets, we must have a way to search.

James Seng: John's proposal is a very theoretical proposal. Fairly close to NGDNS. Internationalization of NGDNS is probably one of the problems that can be fixed. Some of the comments are very true. Good starting point for NGDNS. Not talking about long term deployment of NGDNS, looking at RACE for quick deployment. I'm afraid the chaos during transition to NGDNS would be disastrous.

JK: I believe it will take 10 years to deploy anything this WG comes out with.

JS: Yes and no. To do a directory would require a comprehensive re-architecture. Removing stuff from current DNS has not really been discussed.

Ted Hardie: Eliot was wrong, you aren't being radical enough. We have never had a presentation layer. Putting a directory layer between application and DNS is a presentation layer. Perhaps coming up with a presentation layer.

Bob Hinden: Both of these choices have plusses and minuses. There are harder issues and dealing with them scares me. Solve the important problems first.

JK: I think this is the most interesting and hardest problem since deploying IPv4. We need to take this very seriously.

* IDNA, Patrik Faltstrom

ACE from the Applications

Not so nervous about changing applications. Really nervous about backwards compatibility.

This proposal is to change the presentation layer in applications. Not change the application layer protocols, the DNS protocols, or the DNS servers. We only change the presentation and input mechanisms.

When a user inputs a domain name into the application, the information will be stored in whatever local script the application is using. The application must do a transformation anyway. Many applications must be better regardless of what we choose. All applications must transform from the local character set into something that is entirely backwards compatible.

Changes to applications:
- host names must be nameprep'd
- an ACE is applied
- the encoded name is sent to the resolver
Display of host names
+++
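
As a rough illustration of the application-side steps above, the sketch below uses Python's built-in "idna" codec. That codec implements the later RFC 3490 scheme (nameprep plus Punycode with the "xn--" prefix), which did not exist at the time of this meeting; it is used here purely as a stand-in for whichever ACE the WG would choose. The function names are hypothetical.

```python
def to_resolver_form(host: str) -> bytes:
    """Nameprep the host name and apply an ACE; the result is plain ASCII."""
    return host.encode("idna")


def to_display_form(wire: bytes) -> str:
    """Reverse the ACE so the user sees the name in its native script."""
    return wire.decode("idna")


wire = to_resolver_form("Bücher.example")
print(wire)                   # b'xn--bcher-kva.example'
print(to_display_form(wire))  # bücher.example
```

An application that has not been updated would show the first form to the user, which is the leakage discussed in the side effects.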

Known side effects
- un-updated applications will display obscure ACE format (leakage)
- moving names from updated programs to un-updated programs might cause more leakage
- Non-IDN names that use the ACE prefix or suffix will either be considered illegal or will appear as nonsense characters
- The IDN WG must choose an ACE
- Doesn't internationalize text records in the DNS zone files

Updating name servers
- administrative interfaces for DNS servers must all check IDN names
- probably done with automated scripts converting from and to the preferred native format (which will be different for different users)
- Will probably be important to check all names with nameprep

No major comments (Yet!) to -00 draft
No planned revision

Pete Resnick: I work in a protocol that has lots of layer violations. Email embeds DNS names. Do we need sendmail to change headers to ACE-encoded things?

PF: one of the good things about this proposal is that it is an application layer issue.

Paul Hoffman: No. What we're talking about is a presentation layer change. If you've changed your application to handle IDNA, then all applications should handle IDNA. We don't want to touch things that IDNA will be looking at layer. You must change your paste function.

PR: I update Eudora to do IDNA, I encode for presentation, decode for transmission.

PF: Yes.

Yoneya: It should apply to any application protocol

PH: Applies to any protocol that has a presentation layer.

Rick Wesson: encoding and decoding leakage was experienced in the VGRS testbed. Broke lots of things. Leaks happen.

PF: leakage is a bad thing. If leakage occurs and if you know the ACE version, you can still figure out where things should go.

Hideo Imazu: Considering the fact that there are very few fonts that have all Unicode characters in it, even if a platform has full support for ACE encoding, in some cases, it can't encode/decode an ACE sequence for some characters. Even on a good implementation, leakage can happen.

PF: Yes. It might be possible for user to look at ACE encoded name.

PH: Decoding will be possible, but display may not be possible.

JS: another leakage not considered: user assuming application is internationalized and it's not.

PF: Yes.

* Virtually internationalized domain names, Shim

Allows the use of i18n domain names, but without creating i18n names.

** Description

Most domain names in regions where English is not widely spoken are created by transliterating the characters of the local language into those of the English language.
VIDN formalizes and uses this knowledge of transliteration
At one end, we have local characters, at the other end, we have regular DNS characters
1. local characters changed to phonemes
2. each phoneme is matched with English phoneme
3. English phonemes are united to form the transliterated name

provides one to many mappings

For one-to-one reversible mapping
a) each server is pre-assigned a unique code (e.g., Unicode)
b) the code is also generated by VIDN on the client whenever the corresponding virtual domain name is typed.
c) the code is compared to the code retrieved from the server
d) VIDN matches the code
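
A toy sketch of the idea, under heavy assumptions: the transliteration table works per character rather than per phoneme, the "code" is simply the hex Unicode code points of the local-script name, and SERVER_CODES is a hypothetical registry. None of these details come from the VIDN draft; they only illustrate how a code can disambiguate names that transliterate to the same ASCII string.

```python
from typing import Dict, List

# Hypothetical per-character transliteration table (real VIDN works on phonemes).
TRANSLIT: Dict[str, str] = {"한": "han", "국": "guk", "韓": "han", "國": "guk"}


def transliterate(local_name: str) -> str:
    """Steps 1-3 collapsed: local characters -> ASCII transliteration."""
    return "".join(TRANSLIT[ch] for ch in local_name)


def code_for(local_name: str) -> str:
    """Hypothetical code: the hex Unicode code points of the local name."""
    return "-".join(f"{ord(ch):04x}" for ch in local_name)


# Hypothetical registry: ASCII host name -> code pre-assigned to that server.
SERVER_CODES: Dict[str, str] = {
    "hanguk.example": code_for("한국"),
    "hanguk.example.net": code_for("韓國"),
}


def vidn_lookup(local_name: str, candidates: List[str]) -> List[str]:
    """Keep only hosts whose label and pre-assigned code both match."""
    wanted = code_for(local_name)
    ascii_label = transliterate(local_name)
    return [
        host for host in candidates
        if host.split(".")[0] == ascii_label and SERVER_CODES.get(host) == wanted
    ]
```

Both spellings map to the same ASCII label, so only the code comparison separates them.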

** Implementation and Administration

Development of VIDN software for each local language may be done by a local standards body
- the codes for one-to-one mapping may be administered by an international standards body such as IANA
- or the codes may be administered by a local standards body

** Key features

VIDN does not require creating and registering additional domain names in local scripts

VIDN does not make any change to the current DNS infrastructure

VIDN does not need separate name server/resolver

** Testing results

Keywords and ACE require conversion, VIDN already there.

Web browser add-on:
- Korean-English conversion
- 800 KB

Comments/ Suggestions
- Comments and responses on the IETF IDN discussion mail archive
- Contact me

Itashiro: different ideographic script can have same pronunciation, how do you convert?

Shim: include one-to-one mapping scheme

Paul Hoffman: this can be used today, but only when a code isn't needed. If a code is needed, they'd be non-obvious to users, so your assertion that this is easier/faster to deploy isn't true. Might be true for Korean or maybe Japanese, so same as ACE or IDNA.

Shim: 30%-40% will need the codes. ACE/IDNA will be 100%

Maynard Kang: if you add the addition of the code, then it is the same as ACE.

* Internationalized PTR RR, Hong Bo Shi

Why not PTR:
- Current PTR and its mapping method can't support IDNs as well as traditional ASCII domain names
- It is impossible to let the client choose which one it wants from those PTR records without an additional selection mechanism
- It makes no sense to return an unreadable domain name
- But no PTR record is worse.

Why IPTR:
- IPTR can give a language selection to client/end-user
- according to the language tag, the client/end-user can get what they want from a list of corresponding values

IPTR Format
4.3.1.1.in-addr.arpa. IPTR "LANGUAGE" "name-in-utf8"
- the LANGUAGE field should be treated in a case insensitive manner and must follow the conventions defined in RFC 1766

EDNS0 is required to implement IPTR
IPTR format
Language: 2 octets, an argument for IPTR to define the kind of language used in the following IDN label

IPTR query/response
If qtype=IPTR then all of the corresponding IPTR RRs should be returned in one response
Transport: use UDP first; if the TC bit is set, then use TCP. In future, EDNS0 is required.
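
The record layout and the client-side language selection can be sketched as below. The IPTRRecord class, the select_by_language helper, and the sample host names are hypothetical; only the field layout (owner, language tag, UTF-8 name) and the case-insensitive treatment of the language tag are taken from the slides.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class IPTRRecord:
    owner: str      # reverse-map name, e.g. "4.3.1.1.in-addr.arpa."
    language: str   # two-letter language tag
    name: str       # internationalized host name (UTF-8 on the wire)


def select_by_language(records: List[IPTRRecord], language: str) -> List[str]:
    """Pick the names whose language tag matches, ignoring case."""
    want = language.lower()
    return [r.name for r in records if r.language.lower() == want]


# A response carrying all IPTR RRs for one address, as the slides describe.
response = [
    IPTRRecord("4.3.1.1.in-addr.arpa.", "JA", "ホスト.example"),
    IPTRRecord("4.3.1.1.in-addr.arpa.", "ko", "호스트.example"),
]

print(select_by_language(response, "ja"))
```

Returning every record and filtering on the client is the behaviour questioned under the open issues; the alternative is for the query itself to carry the language tag so the server returns a single name.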

PTR extension for the IDN ONLY case
it is very difficult to avoid IDN ONLY
IDN ONLY means a host only has its IDN but not any traditional ASCII domain name
- in above case PTR RR must not be null in a response message
- PTR RR must contain a domain name in ACE to co-op with IDN unaware systems

- else a syntax error message should be sent back when an administrator configures the zone

Open issues:
- do you think we should return all corresponding IPTR records to a resolver
- nameserver should only send back one IDN in each language

- this kind of function has already been implemented, for example in BIND-9.x.x

To be or not to be
- it is said that the proposal that returns all the corresponding IPTR records increases the complexity of implementing a resolver
- according to the suggestion it is better to let a server just feed back

The above suggestion is in the belief that the IPTR design is to introduce a language tag. It should be used in queries from the client.

Demchenko: it is not enough just language, need character set + language

Jiang: intention was just language, not character set. the character set is supposed to be UTF-8

Hideo Imazu: specifying language is not enough in some cases.

Jiang: mixture of languages?

HI: some languages have two scripts

J: right. we're just trying to capture language, not script.

* Protocol Design Team Report, David Lawrence:

Primary task: categorize protocol proposals and make recommendation to WG
Members: DL, Olafur Gudmundsson, Dan Oscarsson, Paul Hoffman
Observers/commentators: Marc Blanchet, Harald Alvestrand, John Klensin, Rob Austein, James Seng

Output of the design team
- report to the mailing list (didn't make it before this meeting, will have it soon afterwards)
- comparison of protocols
- possibly other reports or recommendations

- updating RFC 2825 to take a much deeper look
- looking at different ACEs

Categories of protocols, based on expected implementation
- Do a long term architecturally clean fix that requires upgrading to the whole naming infrastructure
- Change only the applications in the short term and possibly the application protocols, but change none of the DNS infrastructure
- change the applications for short-term gain, but transition in the long term to a clearer solution that requires upgrades to the whole naming infrastructure (two-phase approach)

IDNA- application
IDNE- infrastructure

Big picture
- we cannot simply put binary characters into the current DNS without breaking many applications and some DNS servers
- none of the solutions at this point is a comprehensive solution that considers all the effects of the changes proposed
- the design team has not looked at all impacts on applications

Where the solutions fit
- long term solutions mostly involve changes to the current DNS infrastructure, although there is also a proposal for using a directory layer with internationalized input to find resources
- short term solutions are based on using ASCII compatible encoding in applications
- Two phase solutions are a mixture of the two

Infrastructure solutions
- long term proposals require that all DNS servers in a request path be updated before the user can get a correct answer to a query for an internationalized host name
- different proposals and different legacy DNS servers will cause different error messages to get back to the user if their query traverses a server that was not upgraded
- a user can get different results for the same query
- maximum breakage of applications

ACE solutions, positive
- they are easier to implement than the long term solutions, but they are not without problems
- the obvious advantage is that updating just applications will go faster than updating applications and the entire naming infrastructure
- they work on the presentation layer which means that they don't even require changes to any application protocols.

ACE solutions, negative
- ASCII-type ugly name leakage in non-updated applications
- non-IDN names that use the ACE prefix or suffix will either be considered illegal or will appear as nonsense characters
- the ACE solutions only internationalize host names, not textual material that appears in some types of DNS records
- IDN must choose an ACE
- versioning is harder in ACEs than in other proposals
- there are probably other ACE-specific implications that we haven't thought about yet

Two phase solution
- request that the applications be updated to handle an ACE encoding in the case the query using the long-term solution fails
- every application must work correctly with the short term solution and the long term part of the solution has all of the problems listed earlier
- it is likely that only the short term solution would be implemented unless the long term solution had other notable advantages

ACE considerations
- ACE must be designed to have one-to-one mapping and versioning
- they must continue to use 63 octet max for name parts, while other proposals could extend the length of the name parts
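
The 63-octet constraint can be illustrated with a short check. Python's "punycode" codec and the "xn--" prefix are used only as stand-ins (they come from the later RFC 3492/3490 work, not from any ACE under discussion here), nameprep is skipped, and the helper names are hypothetical.

```python
MAX_LABEL_OCTETS = 63
ACE_PREFIX = b"xn--"  # stand-in prefix; the WG had not chosen one at this point


def ace_label(label: str) -> bytes:
    """Encode one label; pure-ASCII labels pass through unchanged."""
    if label.isascii():
        return label.encode("ascii")
    return ACE_PREFIX + label.encode("punycode")


def fits_in_label(label: str) -> bool:
    """True if the encoded label still fits the DNS limit for one name part."""
    return len(ace_label(label)) <= MAX_LABEL_OCTETS


print(ace_label("bücher"))             # b'xn--bcher-kva'
print(fits_in_label("bücher"))         # True
print(fits_in_label("日本語" * 30))    # False: 90 characters cannot fit in 63 octets
```

Because the prefix and the encoding overhead count against the same 63 octets, the usable length of a non-ASCII label is noticeably shorter than that of an ASCII one.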

Changes to applications
- all scenarios will require that all applications that use the new names be updated; RFC 2825 lists many protocols for which i18n may be very difficult.

Go with DNS infrastructure Change
- the decision about which of the solutions is chosen should be made by people in the DNS community and Application area with internationalization community

Directory infrastructure
- WG needs to work closely with the directory community for both protocol and schema interoperability
- no successful operational experience in an Internet wide directory service

Suggestion: go with application-only solution,
- focusing on the negative attributes of each solution, the design team considers the short-term "use ACE from the application"

Looking at costs:
- we note that the more architecturally desirable infrastructure solutions are very costly in terms of new protocol work needed and upgrading deployed name servers
- predicting when any of the solutions might actually be useful is impossible, making them very difficult to sell to the Internet community

An ACE solution does not prevent an infrastructure solution
- fortunately choosing the ACE solution now does not preclude the IETF

What's next:
- design team finishes its report and sends it to the WG
- design team finishes the comparison document
- somebody should study the impacts of the chosen solution more in-depth
- WG decides

J Kim: forgot one negative -- two sided business card problem. Amazon.com in English, Amazon.com in ACE is a tour site. That is a serious problem to solve.

DL: which aspect of the two-sided business card are you talking about? This is a registry issue.

JK: but the problems arise from a technical basis

OG: the ace encoded name will be long and ugly

RW: Verisign has posted their resolution proposal: UTF-8 on the wire. How did the design team feel about it?

OG: no comment.
PH: UTF-8 on the wire was a non-starter
DL: we're avoiding recommending a two phase solution

EL: operational infrastructure change is not mutually exclusive from presentation change. Refers back to John's proposal. Two problems: representing domain names and transliteration. There is value add.

RB: DNS protocol supports 8 bit binary. Does not break DNS but might break other protocols.
Design Team: agree

MM: directory approach not feasible due to schema is a red herring as DNS RRs define schema. If you constrain your space, makes the problem more easily solved.

DL: Yup.

* Proposal for a determining process of ACE identifier, Naomasa Murayama

Requirements of IDN technology:
- unified root
- interoperability
- compatibility to BIND

RACE: row based ACE
- prefix bq--
Brace: bimode row based ACE

Why ACE?

1. permitting 8-bit domain names and modify name server software to 8 bit clean
2. partitioning the current domain name space to accommodate MDNs using some kind of ACE

problems for approach 1
- difficulty in inter-operability
- needs a change of DNS protocol

problem for 2
- needs a consensus among all domain name registries in partitioning the DN space

How can we negotiate for a partitioning of the DN space

- how we can select ace identifiers
ACE identifiers ::= ACE prefixes | ACE suffixes

What is happening in the testbed of MDN in .com,.net,.org

registration started from Nov. 10 but <nihongo>.com is encoded and taken by ns.bulkregister.com on Nov. 2

ACE identifier candidates
- prefixes: AA--, AB--, ..., 99--
- suffixes: --AA, --AB, ..., --99

Relevant domain names:
aa--a.com, aa--b.org, ..., 99--zzzz.net, aa--x.co.jp, etc.
a--aa.com, b--aa.org, ..., zzzzz--99.or.kr, etc.

Proposal
step 1: tentative suspension of registering relevant domain names for ACE identifier candidates
step 2: conduct a survey of relevant domain names already registered
step 3: select about 10 to 20 identifiers one of which is for test and others for real use, based on the survey
step 4: permanent blocking of registrations of domain names relevant to the selected identifiers (except for registrations compliant to MDN semantics).
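
The candidate space and the survey in step 2 could be sketched like this. The is_relevant helper is hypothetical, and checking every label of a name (rather than only the registered label) is an assumption made for the sketch.

```python
import re
from itertools import product
from string import ascii_lowercase, digits

ALNUM = ascii_lowercase + digits

# All two-character identifiers: aa, ab, ..., 99 (36 * 36 = 1296 of them).
IDS = ["".join(pair) for pair in product(ALNUM, repeat=2)]
PREFIXES = [ident + "--" for ident in IDS]
SUFFIXES = ["--" + ident for ident in IDS]

_PREFIX_RE = re.compile(r"^[a-z0-9]{2}--.")
_SUFFIX_RE = re.compile(r".--[a-z0-9]{2}$")


def is_relevant(domain: str) -> bool:
    """True if any label of the name collides with a candidate identifier."""
    return any(
        _PREFIX_RE.search(label) or _SUFFIX_RE.search(label)
        for label in domain.lower().rstrip(".").split(".")
    )


print(len(PREFIXES), PREFIXES[0], PREFIXES[-1])  # 1296 aa-- 99--
print(is_relevant("aa--x.co.jp"))                # True
print(is_relevant("example.com"))                # False
```

Running a check like this over an existing zone is one way a registry could count the already registered names that each candidate identifier would collide with.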

when writing an ACE proposal
author should either
- describe the ACE identifier as "to be decided" (which must be decided by the IDN WG or other organ when it is published as an Internet draft), or
- use an ACE identifier

When a proposal becomes an Internet standard
- when a specific ACE proposal is accepted as an Internet standard, the experimental ACE identifier should be replaced by one for real use (hopefully decided by IANA)

Important change from -00 to -01

excluded suffixes of one hyphen followed by the alpha numerics from the candidates

Among 227,852 registrations of .JP domain names, 23,921 were relevant to these suffixes

Need cooperation of IETF, ICANN, and domain name registries.

* Handling revisions of IDN, Marc Blanchet

Problem statement: Unicode is going to have revisions because of characters, languages, scripts that change

Nameprep is going to have revisions. Nameprep should not necessarily be sync'd with Unicode revisions. We might fix bugs in nameprep, etc.

Protocol will have revisions. We should include versioning in the requirement document

Patrik Faltstrom: I think the requirement should be able to handle changes in Unicode. I'm not sure we need versioning. I have some ideas on how to handle this in ACE which would not affect nameprep. Not fully baked. Might not need versioning. Don't want to have versioning as the requirement. Unicode and nameprep will change.

MB: will work together on how to specify the wording for the requirements

Versioning with ACEs and IDNA-like approach: there is no protocol. One way: have a different prefix for each version. But: no negotiation is handled. Needs the same domain registered with different prefixes.
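
A small sketch of what prefix-per-version implies for a client. The version-to-prefix table is hypothetical, and Python's "punycode" codec is only a stand-in ACE body; in practice a new version could also change the encoded body, which this sketch ignores.

```python
from typing import Dict, List

# Hypothetical version -> prefix assignment; no such registry existed.
VERSION_PREFIXES: Dict[int, str] = {1: "aa--", 2: "ab--", 3: "ac--"}


def ace_body(label: str) -> str:
    """Stand-in ACE body using Python's punycode codec."""
    return label.encode("punycode").decode("ascii")


def queries_for_all_versions(label: str, zone: str) -> List[str]:
    """With no negotiation, a client must try one name per known version."""
    body = ace_body(label)
    return [
        f"{prefix}{body}.{zone}"
        for _, prefix in sorted(VERSION_PREFIXES.items())
    ]


print(queries_for_all_versions("bücher", "example"))
# ['aa--bcher-kva.example', 'ab--bcher-kva.example', 'ac--bcher-kva.example']
```

Every version multiplies both the registrations a name holder needs and the queries a client may have to send.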

Versioning with DNS extensions

Version numbers: simple, increment by 1. More complex major.minor. minor changes table lookup, major being changes in lookup algorithm.

Table format simplified Unicode table.

Conclusion: we need versioning in IDN. We can do it in different ways and need to think about it.

Harald Alvestrand: versioning in the ACE means you'll either see every i18n domain name disappear each time you upgrade client or you'll need to do queries for each version. Having different labels is too broken for words.

Mark Welter: what would drive the versioning is characters forbidden at the application layer. Can push this to the registration layer.

MB: I'd prefer it if we don't need it at all. But we should think about it before doing the protocol.

Randy Prezen: You only care what happens after the ACE transform is applied.

MB: Just think about it. If we don't need it, good.

* Japanese characters in multilingual domain name label, Yoneya

Definition of characters to be used as JP characters in MDN label
Definition of JP characters to be normalized.

Def. of JP characters:
- idntabjp10.txt

- does not include NAMEPREP prohibited characters

- usual characters for JP names
- selection of chars is based on JIS
- table consists of code points in Unicode and corresponding JIS code

- does not mean specifying chars

6531 characters in table
Kanji: 6355
Hiragana: 83
Katakana: 86
Graphic: 7

Definition of normalization:
table of compatible characters to be canonicalized
idntabjpcanon10.txt
- one character must be added, will correct in next version
compatible characters prohibited in NAMEPREP but widely used in PCs, PDA, etc.
- half width katakana
- full width alphabets, numerics, hyphen
- table consists of code points in Unicode to be canonicalized
-- half width katakana and full width katakana must be canonicalized to the same thing

Def. of normalization (cont)
Table of characters to be composed
idntabjpcomp10.txt
composition of kana and voiced sound mark varieties
- the table consists of code point sequences in Unicode to be composed

- ka-tenten -> ga

Definition of normalization rules:
1. canonicalize compatible characters

- adopt idntabjpcanon conversion

2. compose voiced sound marks

- adopt idntabjpcomp conversion

Example:
1. canonicalization idntabjpcanon
2. composition idntabjpcomp
3. NAMEPREP
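
As a rough illustration of the two table-driven steps above, the sketch below uses tiny hypothetical excerpts in place of idntabjpcanon10.txt and idntabjpcomp10.txt (the real table contents are not reproduced in these minutes) and compares the result with Unicode normalization form KC from Python's unicodedata module. The names CANON, COMPOSE and jp_normalize are illustrative only.

```python
import unicodedata

# Hypothetical excerpt standing in for idntabjpcanon: compatible -> canonical.
CANON = {
    "\uFF76": "\u30AB",  # halfwidth katakana KA -> fullwidth katakana KA
    "\uFF9E": "\u3099",  # halfwidth voiced sound mark -> combining voiced sound mark
    "\uFF21": "A",       # fullwidth Latin capital A -> ASCII A
    "\uFF0D": "-",       # fullwidth hyphen-minus -> ASCII hyphen
}

# Hypothetical excerpt standing in for idntabjpcomp: sequences to compose.
COMPOSE = {
    "\u30AB\u3099": "\u30AC",  # KA + voiced sound mark -> GA ("ka-tenten -> ga")
}

def jp_normalize(label):
    """Step 1: canonicalize compatible characters. Step 2: compose voiced marks."""
    canon = "".join(CANON.get(ch, ch) for ch in label)
    for sequence, composed in COMPOSE.items():
        canon = canon.replace(sequence, composed)
    return canon

halfwidth_ga = "\uFF76\uFF9E"  # halfwidth KA followed by halfwidth voiced sound mark
table_result = jp_normalize(halfwidth_ga)
nfkc_result = unicodedata.normalize("NFKC", halfwidth_ga)
print(table_result == nfkc_result)
```

For this single sample the table-driven steps and form KC give the same character. The sketch says nothing about the cases where the JIS-based tables and KC differ.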

Why:
- convenience for users and implementors

- explicit definition of usable characters and normalization
- Unicode KC is insufficient

- differences exist between VGRS and JPNIC

Mark Davis: the changes you are talking about are combining certain forms. I don't see the requirement for additional steps.

YY: difference is between JIS and Unicode KC.

MD: if you map these together in the folding step then KC takes care of this for you in the normalization step.

* Nameprep design team report, Paul Hoffman

This is a summary of what we posted to the mailing list last week.

Overview:
- make it easy for user to enter names.

- we don't want to make it hard.

- prohibit as little as possible

- not all domain names are entered by typing.

- keep names sensible

- there are plenty of chars (such as backspace) that are bad/dangerous.

- linguistic juggling.

- don't over-restrict. not a protocol discussion. "yes, good character. no, bad character"

Proposed changes from -00:
- fewer prohibitions on input

- may limit on output
- make it so that input programs do not need to follow the output rules

- make it easier to implement

- every application has to do nameprep regardless of protocol approach taken
- give tables for 2 of 3 steps

prohibit less:
- it is difficult and probably not useful to try to limit confusion

- e.g., should 'O' be prohibited because it looks like '0'?

- get out of the business of disallowing because they look alike

-01 will have much smaller list of prohibited characters
-00 prohibited compatibility characters. -01 says that if you can algorithmically change them, then accept them on input
- many examples in Arabic and Asian scripts

Change order of steps:
- ordering was prohibit -> fold (case mapping) -> normalize
- ordering is now map -> normalize -> prohibit

- prohibit on output

Many edge cases in ISO-10646, so doing mapping first can be very clean.

Currently we just do case mapping. The new version will do additional mapping, such as mapping all hyphen characters to the normal hyphen. There are some special cases for case mapping that need to be added so that all characters case-map as expected. This won't change the semantic meaning of characters (in JP, hyphen and lengthener characters would be treated differently). At the end of the process, we won't have surprising characters.

Have option of mapping into nothing instead of prohibiting. Haven't specified which we would do this to, but Arabic and Hebrew vowels could be mapped to nothing (as per discussion on mailing list).

Use a couple of hundred line list of mappings to be done with the first step of nameprep. We think this would simplify things
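
The new ordering (map, then normalize, then prohibit on output) can be sketched as follows. This is only an illustration under stated assumptions: the mapping table and prohibited set are hypothetical stand-ins for the design team's lists, str.lower() stands in for mappings derived from the Unicode case mapping file, normalization is assumed to be form KC, and the function name is made up.

```python
import unicodedata

# Hypothetical mapping table; the design team's actual list is not given here.
MAPPINGS = {
    "\u2010": "-",  # HYPHEN -> ASCII hyphen
    "\u2011": "-",  # NON-BREAKING HYPHEN -> ASCII hyphen
    "\u05B4": "",   # a Hebrew vowel point mapped to nothing (an undecided option)
}

# Hypothetical prohibited set, checked on output only.
PROHIBITED = {"\u0008"}  # backspace

def nameprep_sketch(label):
    # Step 1: map. Table lookups first, otherwise lowercase.
    mapped = "".join(MAPPINGS[ch] if ch in MAPPINGS else ch.lower() for ch in label)
    # Step 2: normalize (assumed here to be form KC).
    normalized = unicodedata.normalize("NFKC", mapped)
    # Step 3: prohibit, applied to the output of the earlier steps.
    for ch in normalized:
        if ch in PROHIBITED:
            raise ValueError("prohibited character U+%04X" % ord(ch))
    return normalized
```

Because prohibition runs last, an input program can accept compatibility forms such as fullwidth letters and leave it to the mapping and normalization steps to clean them up.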

New case folding:
- mapping to lowercase will be derived from Unicode case mapping file

Non-character code points:
- non-character codepoints will be listed as prohibited characters
- Unicode already has code points assigned as non-characters
Make everyone look outside plane 0.
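
Non-character code points are not confined to plane 0: the last two code points of each of the 17 planes (U+FFFE and U+FFFF, U+1FFFE and U+1FFFF, and so on) are non-characters. A minimal check might look like the sketch below; the function name is chosen for illustration, and the U+FDD0..U+FDEF range is included on the assumption that an implementation follows later Unicode versions, which designate it as non-characters as well.

```python
def is_noncharacter(code_point):
    """Return True for Unicode non-character code points."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    # Last two code points of every plane: U+nFFFE and U+nFFFF.
    if code_point & 0xFFFE == 0xFFFE:
        return True
    # Block in plane 0 designated as non-characters by later Unicode versions.
    return 0xFDD0 <= code_point <= 0xFDEF
```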

Remove location of nameprep
- this is a protocol issue. the protocol must say where nameprep is done.
- different protocol proposals need nameprep in different places

Change canonicalize to normalize

Next steps:
- WG reviews design team recommendations
- Marc and Paul produce -01 based on WG consensus
- Design team keeps working on remaining issues

- what is still prohibited on output vs. what is mapped to nothing on input
- a few specific characters need attention

PF: sent a proposal about taking unmapped characters and calling them prohibited or pass through depending on usage (registration or lookup). If you want to talk, I'll forward to nameprep design team.

PH: we'd love to see a specification

Mark Welter: did you find a solution to the dotted capital i problem?

PH: yes, we picked one.

MW: Greek capital gamma looks like a capital Russian character, so there is room to masquerade one name as another. One way to fix this is to have the user look very hard. A Greek user will most likely not use Russian chars.

PH: the design group saw this as the '0'/'O' problem. We didn't want to go there. This will go into the security

Rick Wilson: are some of these font characters?

PH: yes.

RW: Can you characterize where you lightened up? E.g., can you now put in Zapf Dingbats?

PH: yes. We also allow the compatibility characters

Eric Brunner(?): does the change of ordering allow us to fix problems introduced by authors of 10646?

PH: yes. If you have a list of errors, send them to the nameprep design team.

Chris Neuman: on the versioning issue, having a recommendation that deployed software be able to load mapping table without a new version of the software, would be reasonable and sufficient.

PH: please send the suggestion to the design team.

* DNSII transitional reflexive ASCII compatible encoding, Edmund Chang

Trace is not another ACE. It is a deployment/implementation strategy.

Trace format is a zone file management system. We've put a control character into the ACE so people won't be able to register before things have been finalized.

Transition:
- ASCII to multilingual
- local encoding to ISO 10646
- ace to long term solution
- ace to ace

reflexive:
- deployed at the server end and activated only when certain criteria are met
- utilizes existing RR types as ad hoc records: CNAME, DNAME

- using DNAME is probably better

Long term solution:
- ACE

- strength: easy to deploy
- weakness: version control

- Protocol approach

- expandability
- weakness: more difficult to deploy

- Phased hybrid approach

- ACE approach as immediate and fallback, protocol approach as next generation (eliminating ACE versioning requirement)

Bit-flag based:
- DNSII-TRace format
- \127\127ILET-Hex
Possible implementation:
- \12701--UTF8inhex
- \12701--acestring
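
Read literally, and assuming that \127 is the zone-file decimal escape for the single octet 127 (the control character mentioned above), the first possible implementation could be assembled as in the sketch below. The meaning of the "01" field is not explained in these minutes, so the code copies it verbatim; both function names are hypothetical.

```python
def trace_label_sketch(name):
    """Build octet 127, the literal "01--", then the UTF-8 of `name` in hex."""
    control = bytes([127])  # written as \127 in a zone file
    payload = name.encode("utf-8").hex().encode("ascii")
    return control + b"01--" + payload

def zone_file_form(label):
    """Render label octets for a zone file, escaping all but LDH as \\DDD."""
    out = []
    for octet in label:
        ch = chr(octet)
        if ch.isascii() and (ch.isalnum() or ch == "-"):
            out.append(ch)
        else:
            out.append("\\%03d" % octet)  # decimal escape
    return "".join(out)

print(zone_file_form(trace_label_sketch("\u00e9")))
```

Note that hex doubles the size of the UTF-8 form, which matters given the 63-octet limit on a DNS label.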

Quasi-directory based
- DNS directory hybrid

- utilizing the DNS wildcard: *

- *.domain in zonefile
- employ separate server for lookup and sub-delegation

OpenIDN
- open-sourced NeDNS
- Current implementations:
- RACE, simple hex dump DNSII & TRACE
- Contemplated Additions:

- IDNE, DNS CLASS (John Klensin)

- Invite all I-D authors and interested parties to contribute to the IDN server experiments
- http://www.openidn.org

Paul Hoffman: have you done a draft on this?

EC: Yes, TRACE.

PH: Have you released the IPR for your patents?

EC: I send the info to the secretariat when we put the site up.

JK: For those whose proposals or plans require changing or doing tricky things with the DNS, please remember that it is a complex protocol. Wildcards will get you. DNS is UDP, it has timeout properties. Not many tries possible. Please understand the protocol before you go

EC: that's exactly why we want to create a working prototype and see what happens.

JK: the interesting thing about the IETF is that we have exactly one problem: scaling. A prototype can often tell us exactly nothing.

* LACE: Length based ACE for IDN, Mark Davis

Goal: simple design with good compression. Uses run length encoding. Uses base 32 encoding

Input is UTF-16 after nameprep. Take each sequence of common top bytes. If total length happens to be longer than the original, then you just quote the UTF-16 and base32 encode

Same compression as RACE for incompressible characters.

Simple. All code points are equal. No quoting necessary, except when no compression possible.
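
The idea described above can be sketched as follows. The byte layout assumed here (a count octet, the shared high octet, then the low octets of the run, with a leading 0xFF marking the quoted form) and the use of Python's standard Base32 alphabet are assumptions made for illustration; the LACE draft is the authority on the actual format, and the function names are made up.

```python
import base64
from itertools import groupby

def lace_octets(label):
    """Run-length compress UTF-16 code units that share a high octet."""
    raw = label.encode("utf-16-be")
    units = [(raw[i], raw[i + 1]) for i in range(0, len(raw), 2)]
    out = bytearray()
    for high, run in groupby(units, key=lambda unit: unit[0]):
        lows = bytes(low for _, low in run)
        # One count octet per run; fine for DNS labels, which are short.
        out += bytes([len(lows), high]) + lows
    if len(out) > len(raw):
        # Compression did not help: quote the UTF-16 as is.
        return b"\xff" + raw
    return bytes(out)

def lace_sketch(label):
    """Base32-encode the compressed octets (lowercase, padding stripped)."""
    encoded = base64.b32encode(lace_octets(label)).decode("ascii")
    return encoded.rstrip("=").lower()
```

In this layout each run costs two octets of overhead, so a label that mixes two scripts, such as three Latin letters followed by three Greek letters, still comes out shorter than the raw UTF-16, while a label whose characters all have different high octets falls back to the quoted form.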

RW: I like this. You should update your draft

JS: Can you explain why LACE would have better compression than RACE?

MD: If you have two different scripts, LACE will be better than RACE.

MW: Any comments on how it handles names outside plane 0?

MD: It handles it as UTF-16

YY: LACE is simple and efficient for Japanese

* Designing an ACE for IDN, Mark Welter

We have had a bunch of ACEs proposed, which explore the design space well.

Using Unicode as a base is good. Simplicity and efficiency are where the differences lie.

Encoding algorithm should be straightforward.
If possible, making it a pencil-and-paper algorithm would be nice.

My two schemes are nibble based.

Efficiency:
- should have uniform treatment for the various scripts
- CJK are pre-compressed, can't expect too much better than hex

Handling surrogates
- in our UTF-6 proposal, we treated surrogates as 16-bit quantities and closed our eyes to the issue
- we should be dealing with surrogates expanded

DUDE:
- encoding based on radix 16, representation of initial code point followed by encoded hex diffs of subsequent codepoints

What about surrogates:
- if you don't expand surrogates, the worst case limits are half to two thirds of claimed name lengths
- DUDE handles expanded surrogates gracefully
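
To make the "initial code point followed by hex diffs" idea concrete, here is a toy sketch. It is not the DUDE wire format: using XOR as the difference and '-' as a separator are assumptions made only to keep the example short, and the function names are hypothetical. It works on full code points, so characters outside plane 0 are handled in expanded form rather than as surrogate pairs.

```python
def hex_diff_encode(label):
    """First code point in hex, then the XOR of each code point with its predecessor."""
    code_points = [ord(ch) for ch in label]  # full code points, not UTF-16 units
    if not code_points:
        return ""
    parts = [format(code_points[0], "x")]
    for previous, current in zip(code_points, code_points[1:]):
        parts.append(format(previous ^ current, "x"))
    return "-".join(parts)

def hex_diff_decode(text):
    """Invert hex_diff_encode."""
    if not text:
        return ""
    values = [int(part, 16) for part in text.split("-")]
    code_points = [values[0]]
    for diff in values[1:]:
        code_points.append(code_points[-1] ^ diff)
    return "".join(chr(cp) for cp in code_points)
```

Neighbouring characters from the same script differ only in their low bits, so the differences are short hex strings whether the script sits in plane 0 or above it.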

Ways to separate IDNs from ordinary DNs
- add a per segment redirector
- add a once per name redirector
- combination of above

MD: Characters above FFFF will be extremely rare, so trying to compress is a waste of effort.

MW: focus is handling full Unicode. Matter of taste.

* Update on RACE, Paul Hoffman

Major changes in -03 draft
- added the need to check for all-STD13 names before encoding and after decoding
- added many error conditions in both the ACE and the Base32 encoding and decoding

- didn't change anything on the wire

- Changed all the examples to use lowercase characters on input

- nameprep is going to change everything to lowercase
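
A check for names made up entirely of STD 13 host name characters might look like the sketch below. What an encoder or decoder does when the check succeeds is defined by the RACE draft, not by this example; the function name and the exact label pattern are assumptions.

```python
import re

# Letters, digits and hyphen; a label may not start or end with a hyphen.
_STD13_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?")

def is_all_std13(name):
    """Return True if every label of `name` uses only STD 13 host name characters."""
    labels = name.split(".")
    return all(_STD13_LABEL.fullmatch(label) for label in labels)
```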

What I didn't change:
- left the prefix the same because the bits on the wire are the same

Verisign is using RACE in their testbed. They provided the first 1500 rejections. Some were based on Verisign's prohibited characters. Most were errors in RACE encoding. This speaks poorly of RACE's ease of implementation.

RW: for those that are implementing RACE, there is a mailing list for developers. Send mail to Rick <rick@ar.com>.

* Judging ACEs, Paul Hoffman

Going to have to pick an ACE. What we should look at.

Features of a good ACE:
- it has compression

- least restriction for total name length
- shorter transmissions

- simplicity

- easy to code
- easy to find bugs in code
- few special cases

Compression:
- prohibiting sensible long names is bad
- all ACEs allow different length for different scripts
- shorter transmissions are useful but not nearly as important as not restricting long names

- if we have a trademark, go with longer names

Simplicity:
- RACE has shown that implementors can easily get it wrong
- Even if compression step is easy, if decompressing is hard, display and security errors will be made

Complexity:
- special cases get missed or are misunderstood
- bit stuffing achieves better compression but is very difficult for many programmers

- even base32 seems hard for some

Mandatory Features for all ACEs:
- encoding a string of characters can have only one result
- decoding an ACE string can have only one result
- There must be a way to indicate a version number

Summary
- compression
- simplicity
- mandatory features
- what else?

MW: in terms of decoding, it is complex enough that the only way you can guarantee is to take the resulting decoded Unicode, encode it and decode it again to see it is the same.

PH: I agree.
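
The round-trip check described above can be sketched with a toy codec. The hex-of-UTF-16 encoding below is only a stand-in for a real ACE, and all three function names are hypothetical. Comparing the re-encoded string with the input goes slightly beyond what was said at the microphone, but it is what catches strings that decode successfully without being the single canonical encoding.

```python
def toy_encode(label):
    """Stand-in ACE: lowercase hex of the UTF-16 form."""
    return label.encode("utf-16-be").hex()

def toy_decode(ace):
    """Inverse of toy_encode; deliberately lenient, like a sloppy decoder."""
    return bytes.fromhex(ace).decode("utf-16-be")

def is_round_trip_clean(ace):
    """Decode, re-encode, decode again, and require everything to match."""
    try:
        decoded = toy_decode(ace)
        re_encoded = toy_encode(decoded)
        return re_encoded == ace and toy_decode(re_encoded) == decoded
    except ValueError:  # malformed hex or invalid UTF-16
        return False
```

Here the lenient decoder accepts uppercase hex, so "03B1" decodes to the same character as "03b1"; only the round trip shows that one of the two is not the encoder's output.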

* Discussion and WG Next Steps, Marc Blanchet

We think we have a lot of solutions. We should narrow the solution space. We've had two design teams, one on nameprep one on protocol.

Is the nameprep work the right thing to do as a major WG orientation?

RW: yes, I agree with what you are doing and you should continue to do it.

MB: Nameprep right direction?

MB: on the protocol, we have a recommendation to use an ACE (not disallowing something different in the future). Do we have enough information?

RW: I think some of the things John Klensin proposed should be looked into further. I don't think we should pick RACE.

Dongman Lee: I'd like more information about 8-bit approach problems. A web page or report would be helpful.

MB: 8-bit on the wire or 8-bit tagged.

Dongman Lee: more interested in the problems

RW: there will be a report kicked to the DNSO on operational considerations for the registrar constituency

Erik Nordmark: Are you asking to choose an ACE today?

MB: The question was are we going forward with the design team recommendation.

Rob Austein: I believe continuing to figure out which ACE to choose is good. Both of JK's drafts came in late. I don't think we've had time to consider them properly. My major concern is that we may never move on from an ACE if we deploy it.

Olafur Gudmundsson: it was very hard to recommend an ACE while on the protocol design team. I'm not sure it will work, but I can't find a place in applications where it will fail. We're looking at a 7-10 year deployment period. All current proposals fail in one way or another.

Paul Hoffman: if we go with a short-term (application only ACE), we are not preventing longer term solutions. I want a long term directory solution.

David Lawrence: as a member of the protocol design team, I don't want the IETF/IDN to be marginalized if we don't respond in a timeframe that matches the market demand.

Ted Hardie: getting something out of the Internet infrastructure is hard. Effort should be focused on the more general solution. There might be other areas of the presentation side of things that need to be looked at.

Patrik Faltstrom: as an area director, seeing what works and what doesn't work doesn't depend on the applications; we need to look at other protocols. Regarding the ACE encodings, I think the path we are going down is a much more layered design than we normally do. This doesn't preclude inserting a directory layer.

John Klensin: this working group has two choices regarding market forces and competitive ideas. Do it hastily or do it right. Too late to do it hastily. If the good solution solves the problem better than the market, the right solution wins.

Harald Alvestrand: before we leave this room today we should have an idea of whether we should go with ACE or not.

Mark Davis: I agree with the last speaker. People have been talking about domain names as if what you see is what you get, but the stuff over the wire is just bits. Using one of the ACEs gives a different set of bits.

<>: Comfortable with what the protocol design team did. I don't think we should have a final sign off unless we have a transition strategy for the long term solution.

MB: Many said we should choose ACE. Many said we should think more.

JS: Do we have enough information to say the protocol design team is working the right direction?

PH: We thought we were done.

MB: Is there enough information to make a decision?

JK: Of the people who think they have enough information, do you feel you know enough about the DNS?

DL: is the consensus of the working group to be focusing on an application area approach? Should we stop looking at infrastructure approaches?

HA: this does not mean the infrastructure need replacing. just that this WG is not the one to do it.

MB: Next step is choosing an ACE.

Slides

None received.