idnits 2.17.1 

draft-ietf-idn-compare-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate.  This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** Missing expiration date.  The document expiration date should appear on
     the first and last page.

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == The page length should not exceed 58 lines per page, but there was 1
     longer page, the longest (page 1) being 688 lines


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an Authors' Addresses Section.

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.

  ** There is 1 instance of too long lines in the document, the longest one
     being 1 character in excess of 72.

  ** The abstract seems to contain references ([IDN-REQ]), which it
     shouldn't.  Please replace those with straight textual mentions of the
     documents in question.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (July 11, 2000) is 8690 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'RFC1034' is mentioned on line 122, but not defined

  == Missing Reference: 'UTR-15' is mentioned on line 446, but not defined

  == Missing Reference: 'HOFFMAN' is mentioned on line 640, but not defined

  == Missing Reference: 'OSCARSSON' is mentioned on line 658, but not defined

  == Unused Reference: 'UTR15' is defined on line 636, but no explicit
     reference was found in the text

  -- Possible downref: Non-RFC (?) normative reference: ref. 'BLOCK-NAMES'

  == Outdated reference: A later version (-04) exists of
     draft-duerst-i18n-norm-03

  -- Possible downref: Normative reference to a draft: ref. 'DUERST' 

  == Outdated reference: A later version (-10) exists of
     draft-ietf-idn-requirements-02

  -- Possible downref: Normative reference to a draft: ref. 'IDN-REQ' 

  == Outdated reference: A later version (-02) exists of
     draft-ietf-idn-idne-01

  -- Possible downref: Normative reference to a draft: ref. 'IDNE' 

  == Outdated reference: A later version (-06) exists of
     draft-skwan-utf8-dns-03

  -- Possible downref: Normative reference to a draft: ref. 'KWAN' 

  == Outdated reference: A later version (-03) exists of
     draft-ietf-idn-race-00

  -- Possible downref: Normative reference to a draft: ref. 'RACE' 

  ** Obsolete normative reference: RFC 2279 (Obsoleted by RFC 3629)

  ** Obsolete normative reference: RFC 2671 (Obsoleted by RFC 6891)

  == Outdated reference: A later version (-02) exists of draft-jseng-utf5-01

  -- Possible downref: Normative reference to a draft: ref. 'SENG' 

  -- Possible downref: Non-RFC (?) normative reference: ref. 'UDNS'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'UTR15'


     Summary: 8 errors (**), 0 flaws (~~), 13 warnings (==), 11 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	Internet Draft                                         Paul Hoffman
2	draft-ietf-idn-compare-01.txt                            IMC & VPNC
3	July 11, 2000
4	Expires in six months

6	Comparison of Internationalized Domain Name Proposals

8	Status of this memo

10	This document is an Internet-Draft and is in full conformance with all
11	provisions of Section 10 of RFC 2026.

13	Internet-Drafts are working documents of the Internet Engineering Task
14	Force (IETF), its areas, and its working groups. Note that other groups
15	may also distribute working documents as Internet-Drafts.

17	Internet-Drafts are draft documents valid for a maximum of six months
18	and may be updated, replaced, or obsoleted by other documents at any
19	time. It is inappropriate to use Internet-Drafts as reference material
20	or to cite them other than as "work in progress."

22	     The list of current Internet-Drafts can be accessed at
23	     http://www.ietf.org/ietf/1id-abstracts.txt

25	     The list of Internet-Draft Shadow Directories can be accessed at
26	     http://www.ietf.org/shadow.html.

28	Abstract

30	The IDN Working Group is working on proposals for internationalized
31	domain names that might become a standard in the IETF. Before a single
32	full proposal can be made, competing proposals must be compared on a
33	wide range of requirements and desired features. This document compares
34	the many parts of a comprehensive protocol that have been proposed. It
35	is the companion document to "Requirements of Internationalized Domain
36	Names" [IDN-REQ], which lays out the requirements for the
37	internationalized domain name protocol.

39	1. Introduction

41	As the IDN Working Group has discussed the requirements for IDN,
42	suggestions have been made for various candidate protocols that might
43	meet the requirements. These proposals have been somewhat helpful in
44	bringing up real-world needs for the requirements.

46	It became clear no single proposal had wide agreement from the working
47	group. In fact, the authors of various proposals found themselves taking
48	some features from other proposals as they revised their drafts. At the
49	same time, working group participants were making suggestions for
50	incremental changes that might affect more than one proposal.

52	Because of this mixing and matching, it was decided that this IDN
53	comparisons document should compare features that might end up in the
54	final protocol, not full protocol suggestions themselves. The features
55	that had been discussed in the working group were divided by function,
56	and appear in this document in separate sections. For each function,
57	there are multiple suggestions for protocol elements that might meet the
58	requirements that are described in [IDN-REQ].

60	This document is being discussed on the "idn" mailing list. To join the
61	list, send a message to <majordomo@ops.ietf.org> with the words
62	"subscribe idn" in the body of the message. Archives of the mailing list
63	can also be found at ftp://ops.ietf.org/pub/lists/idn*.

65	1.1 Format of this document

67	Each section covers one feature that has been discussed as being part of
68	the final IDN solution. Within each section, alternate proposals are
69	listed with the major perceived pros and cons of the proposal. Also,
70	each proposal is given a label to make discussion of this document (and
71	of the proposals themselves) easier.

73	References to the numbered requirements in [IDN-REQ] are from version
74	-02 of that document. These numbers are expected to change as the
75	requirements document evolves. In this draft, the requirements are shown
76	as "[#n-02]", where "n" is the requirement number from draft -02 of
77	[IDN-REQ]. This document only lists where particular proposals don't
78	meet particular requirements from [IDN-REQ], not the ones that they
79	fulfill.

81	Note that this document is supposed to reflect the discussion of all
82	proposed alternatives, not just the ones that fully match the
83	requirements in [IDN-REQ]. It will serve as a summary of the discussion
84	in the IDN WG for readers in the future who may want to know why certain
85	alternatives were not chosen for the eventual protocol.

87	The proposal drafts covered in this document are:

89	[DUERST] Character Normalization in IETF Protocols,
90	draft-duerst-i18n-norm-03

92	[IDNE] Internationalized domain names using EDNS (IDNE),
93	draft-ietf-idn-idne-01

95	[KWAN] Using the UTF-8 Character Set in the Domain Name System,
96	draft-skwan-utf8-dns-03

98	[RACE] RACE: Row-based ASCII Compatible Encoding for IDN,
99	draft-ietf-idn-race-00

101	[SENG] UTF-5, a transformation format of Unicode and ISO 10646,
102	draft-jseng-utf5-01

104	[UDNS] Using the Universal Character Set in the Domain Name System
105	(UDNS), draft-ietf-idn-udns-00

107	2. Architecture

109	One of the biggest questions raised early in the IDN discussion was what
110	the format of internationalized name parts would be on the wire, that
111	is, between the user's computer and the DNS resolvers. It was agreed
112	that the DNS protocols certainly allow non-ASCII octets in domain name
113	parts and resource records, but there was also acknowledgement that many
114	protocols that rely on the DNS could not handle non-ASCII names due to
115	the design of the protocol. Section 3.1 of this document describes the
116	proposed encodings for the non-ASCII name parts.

118	Because of requirement [#2-02], there were proposals for
119	ASCII-compatible encodings (ACEs) of non-ASCII characters. Different
120	ACEs were proposed (and are discussed in Section 4 of this document),
121	but they all have the same goal: to allow non-ASCII characters to be
122	represented in host names that conform to RFC 1034 [RFC1034].

124	2.1 arch-1: Just send binary

126	[KWAN] proposes beginning to send characters outside the range allowed
127	in RFC 1034.

129	Pro: Easiest to describe. Only changes host name syntax, not any of the
130	related DNS protocols.

132	Con: Doesn't work with many existing protocols that rely on DNS.
133	Violates requirement [#9-02].

135	2.2 arch-2: Send binary or ACE

137	[UDNS] (and, later, [IDNE]) proposes using both binary and ACE formats
138	on the wire.

140	Pro: Allows protocols that can handle binary name parts to use them
141	directly, while allowing protocols that cannot use binary name parts to
142	also handle names without conversion. Allows domain names in free text
143	to be displayed in binary even in systems that require ACE-formatted
144	names on the wire.

146	Con: Requires all software that uses domain names to handle both
147	formats. Requires processing time for conversion of ACE formats into the
148	format most likely used internally to the software.

150	2.3 arch-3: Just send ACE

152	[RACE] and [SENG] propose that host naming rules remain the same and
153	that all internationalized domain names be sent in ACE format.

155	Pro: No changes at all to current DNS protocols.

157	Con: Requires all software to recognize ACE domain names and convert
158	them to human-readable for display. This is true not only in domain
159	names used on the wire but also domain names used in free text.

161	3. Names in binary

163	Both arch-1 and arch-2 include domain name parts that are represented on
164	the wire in a binary format. This section describes some of the features
165	of such names.

167	3.1 bin-1: Format

169	There are many different charsets and encodings for the scripts of the
170	world. The WG has discussed which binary encoding should be used on the
171	wire.

173	3.1.1 bin-1.1: UTF-8

175	The IETF policy on character sets [RFC2277] states that UTF-8 [RFC2279]
176	is the preferred charset for IETF protocols. UTF-8 encodes all
177	characters in the ISO 10646 repertoire.

179	Pro: Well-supported in other IETF protocols. Compact for most scripts.
180	Wide implementation in programming languages. US-ASCII characters have
181	the same encoding in UTF-8 as they do in US-ASCII. Because it is based
182	on ISO 10646, expansion of the repertoire comes from respected
183	international standards bodies.

185	Con: Asian scripts require three octets per character.
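
As a rough illustration of the size trade-off noted above, the following
Python sketch counts the octets UTF-8 needs for one character from each
of several scripts. The sample characters are chosen here for
illustration only and do not come from the working group discussion.

```python
# Octets needed by UTF-8 for one sample character per script.
# The sample characters are illustrative choices, not from the proposals.
SAMPLES = {
    "Basic Latin": "a",                 # U+0061
    "Latin-1 Supplement": "\u00e9",     # U+00E9
    "Cyrillic": "\u044f",               # U+044F
    "Han": "\u4e2d",                    # U+4E2D
    "Hangul syllable": "\ud55c",        # U+D55C
}


def utf8_octets(ch: str) -> int:
    """Return the number of octets in the UTF-8 encoding of ch."""
    return len(ch.encode("utf-8"))


for script, ch in SAMPLES.items():
    print(f"{script}: U+{ord(ch):04X} takes {utf8_octets(ch)} octet(s)")
```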

187	3.1.2 bin-1.2: Labelled charsets

189	Mailing list discussion mentioned using multiple charsets for the binary
190	representation. Each name part would be labelled with the charset used.

192	Pro: Allows users to specify names in the charsets they are most
193	familiar with.

195	Con: All resolvers would have to know all charsets. Thus, the number of
196	charsets would probably have to be limited and never expand. Mapping of
197	characters between charsets would have to be exact and not change over
198	time.

200	3.2 bin-2: Distinguishing binary from current format

202	Software built for current domain names might give unexpected results
203	when dealing with non-ASCII characters in domain names. For example, it
204	was reported on the mailing list that some software crashes when a
205	non-ASCII domain name is returned for in-addr.arpa requests. Thus, there
206	may be a need for IDN to prevent software that is not binary-aware from
207	receiving domain names with binary parts. This would only apply to an
208	IDN that used arch-2, not arch-1.

210	3.2.1 bin-2.1: Don't mark binary

212	[KWAN] does not specify any way of changing requests to prevent binary
213	name parts from being transmitted.

215	Pro: No changes to current DNS requests and responses.

217	Con: Likely to cause disruption in software that is not binary-aware.
218	Likely to cause systems to misread names and possibly (and incorrectly)
219	convert them to ASCII names by stripping off the high bit in octets;
220	this in turn would lead to security problems due to mistaken identities.
221	Returning binary host names to DNS queries is known to break some
222	current software.
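
The high-bit stripping problem described above can be sketched in
Python. The function name and the sample name part are assumptions made
for this illustration; the sketch only shows how software that is not
binary-aware could turn a UTF-8 name part into a different, valid-looking
ASCII name part.

```python
def strip_high_bit(octets: bytes) -> bytes:
    """Clear the top bit of every octet, as a 7-bit-only system might."""
    return bytes(b & 0x7F for b in octets)


# Illustrative name part: LATIN SMALL LETTER N WITH TILDE (U+00F1).
binary_part = "\u00f1".encode("utf-8")     # two octets: 0xC3 0xB1
misread_part = strip_high_bit(binary_part)

# The two octets become 0x43 and 0x31, which are the ASCII characters
# "C" and "1", so the name part is silently read as a different name.
print(binary_part, "->", misread_part)
```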

224	3.2.2 bin-2.2: Mark binary with IN bit

226	[UDNS] describes using a bit from the header of DNS queries to mark the
227	query as possibly containing a binary name part and indicating that the
228	response to the query can contain binary name parts.

230	Pro: This bit is currently unused and must be set to zero, so current
231	software won't use it accidentally. No changes to any other part of the
232	query or RRs.

234	Con: It's the last unused bit in the header and DNS folks have indicated
235	that they are very hesitant to give it up.

237	3.2.3 bin-2.3: Mark binary with new QTYPEs

239	[UDNS] describes using new QTYPEs to mark the query as possibly
240	containing a binary name part and indicating that the response to the
241	query can contain binary name parts. QTYPEs are two octets long, and no
242	QTYPEs to date use more than the lower eight bits, so one of the bits
243	from the upper octet could be used to indicate binary names.

245	Pro: These bits are currently unused and must be set to zero, so current
246	software won't use them accidentally. No changes to any other part of
247	the query or RRs. Uses a bit that isn't as prized as the IN bit.

249	Con: Software must pay more attention to the QTYPEs than it might have
250	previously.
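
The idea of reserving one bit of the upper QTYPE octet can be sketched
as follows. Which bit would be used is not stated in this document, so
the flag value 0x8000 and all function names below are hypothetical.

```python
import struct

# Hypothetical choice: the top bit of the 16-bit QTYPE field marks a query
# that may carry binary name parts. The actual bit is not specified here.
BINARY_NAME_FLAG = 0x8000
QTYPE_A = 1  # existing QTYPE for a host address


def mark_binary(qtype: int) -> int:
    """Return the QTYPE with the (hypothetical) binary-name bit set."""
    return qtype | BINARY_NAME_FLAG


def encode_qtype(qtype: int) -> bytes:
    """Encode a QTYPE as two octets in network byte order."""
    return struct.pack("!H", qtype)


def decode_qtype(wire: bytes):
    """Return (binary_flag, base_qtype) from two octets on the wire."""
    (value,) = struct.unpack("!H", wire)
    return bool(value & BINARY_NAME_FLAG), value & 0x7FFF


wire = encode_qtype(mark_binary(QTYPE_A))
print(wire.hex(), decode_qtype(wire))
```

Software that only compares the whole QTYPE value against the types it
knows would treat the marked value as an unknown type, which is the
extra attention to QTYPEs noted in the Con above.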

252	3.2.4 bin-2.4: Mark binary with EDNS

254	[IDNE] uses EDNS [RFC2671] to mark the query and response as containing
255	a binary name part.

257	Pro: There is little use of EDNS at this point, so it is very unlikely
258	to have bad interactions with old software. EDNS allows longer name
259	parts, and allows additional information (such as IDN version number)
260	in each name part.

262	Con: There is little use of EDNS and this might make implementation
263	harder.

265	4. Names in ASCII-compatible encoding (ACE)

267	Both arch-2 and arch-3 include domain name parts that are represented on
268	the wire in an ASCII-compatible encoding (ACE). This section describes
269	some of the features of such names.

271	4.1 ace-1: Format

273	A variety of formats for ACE have been proposed. Each
274	proposal has different features, such as how many characters can be
275	encoded within the 63 octet limit for each name part. The length
276	descriptions in this section assume that there is no distinguishing of
277	ACE from current names; this is not a likely outcome of the WG work.

279	The descriptions of lengths are based on script block names from
280	[BLOCK-NAMES].

282	4.1.1 ace-1.1: UTF-5

284	[SENG] describes UTF-5, which is a fairly direct encoding of ISO 10646
285	characters using a system similar to UTF-8. Characters from Basic Latin
286	and Latin-1 Supplement take 2 octets; Latin Extended-A through Tibetan
287	take 3 octets; Myanmar through the end of BMP take 4 octets; non-BMP
288	characters take 5 octets. This means that names using all characters
289	in the Myanmar through the end of BMP are limited to 15 characters.

291	Pro: Extremely simple.

293	Con: Poor compression, particularly for Asian scripts.
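
The length limits above follow from dividing the 63 octet limit by the
per-character cost. The Python sketch below simply transcribes the
octet counts quoted in this section; it does not implement the UTF-5
encoding itself, and the function names are chosen for this example.

```python
LABEL_LIMIT = 63  # octet limit for each name part


def utf5_octets(code_point: int) -> int:
    """Octets per character, using the figures quoted in this section."""
    if code_point < 0x0100:      # Basic Latin and Latin-1 Supplement
        return 2
    if code_point < 0x1000:      # Latin Extended-A through Tibetan
        return 3
    if code_point < 0x10000:     # Myanmar through the end of the BMP
        return 4
    return 5                     # non-BMP characters


def max_chars(code_point: int, limit: int = LABEL_LIMIT) -> int:
    """Longest name part made only of characters costing this much."""
    return limit // utf5_octets(code_point)


print(max_chars(0x4E2D))  # a Han character costs 4 octets
```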

295	4.1.2 ace-1.2: RACE

297	[RACE] describes RACE, which is a two-step algorithm that first
298	compresses the name part, then converts the compressed string into an
299	ACE. Name parts in all scripts other than Han, Yi, Hangul syllables,
300	Ethiopic, and non-BMP take up ceil(1.6*(n+1)) octets; name parts in
301	those scripts and any name that mixes characters from different rows in
302	ISO 10646 take up ceil(3.2*(n+1)) octets. This means that names using
303	Han, Yi, Hangul syllables, or Ethiopic, are limited to 18 characters.
304	(Note: this document used to be called CIDNUC.)

306	Pro: Best compression for most scripts, and similar compression for the
307	scripts where it is not the best.

309	Con: More complicated than UTF-5. Not well optimized for names that have
310	mixed scripts, such as non-Latin names that use hyphen or ASCII digits.
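
The length formulas above can be checked with a short Python sketch. It
only evaluates ceil(1.6*(n+1)) and ceil(3.2*(n+1)) against the 63 octet
limit; it does not implement RACE, and the function names are chosen
for this example. Integer arithmetic is used to avoid floating-point
rounding.

```python
LABEL_LIMIT = 63  # octet limit for each name part


def race_octets(n: int, single_row: bool = True) -> int:
    """ACE length for n characters, per the formulas in this section."""
    factor = 16 if single_row else 32      # 1.6 or 3.2, scaled by ten
    return -(-factor * (n + 1) // 10)      # integer ceiling division


def max_chars(single_row: bool, limit: int = LABEL_LIMIT) -> int:
    """Largest n whose encoded length still fits within the limit."""
    n = 0
    while race_octets(n + 1, single_row) <= limit:
        n += 1
    return n


print(max_chars(single_row=False))  # Han, Yi, Hangul syllables, Ethiopic
```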

312	4.1.3 ace-1.3: Hex of UTF-8

314	An early draft described "hex of UTF-8", which is a straight-forward
315	hexadecimal encoding of UTF-8. Characters in Basic Latin (other than
316	non-US-ASCII and hyphen) take 3 octets; Latin Extended-A through Tibetan
317	take 5 octets; Myanmar through end of BMP take 7 octets; non-BMP
318	characters take 9 octets. This means that names using all characters
319	in the Myanmar through the end of BMP are limited to 9 characters.

321	Pros: Very simple to describe.

323	Cons: Very poor compression for all scripts.

325	4.1.4 ace-1.5: SACE

327	A message on the mailing list pointed to code for SACE, an ASCII
328	encoding that purports to compact to about the same size as UTF-8.

330	Pros: Similar compression to UTF-8.

332	Cons: No description of how the algorithm works.

334	4.2 ace-2: Distinguishing ACE from current names

336	Software that finds ACE name parts in free text probably should
337	display the name part using the actual characters, not the ACE
338	equivalent. Thus, software must be able to identify which ASCII name
339	parts are ACE and which are non-ACE ASCII parts (such as current names).
340	This would only apply to an IDN proposal that used arch-2, not arch-3.

342	4.2.1 ace-2.1: Currently legal names

344	Name parts that are currently legal in RFC 1034 can be tagged to
345	indicate the part is encoded with ACE.

347	4.2.1.1 ace-2.1.1: Add hopefully-unique legal tag

349	[RACE] proposes adding a hopefully-unique legal tag to the beginning
350	of the name. The proposal would also work with such a tag at the end of
351	the name part, but it is easier for most people to recognize at the
352	beginning of name parts.

354	Pros: Easy for software (and humans) to recognize.

356	Cons: There is no way to prevent people from beginning non-ACE names
357	with the tag. Unless the tag is very unlikely to appear in any name in
358	any human language, non-ACE names that begin with the tag will display
359	oddly or be rejected by some systems.
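
Tag recognition of this kind might look like the Python sketch below.
The tag value "xq--" is purely hypothetical and is not taken from
[RACE]; the function names are likewise chosen for this example.

```python
# Hypothetical tag; the real tag would be fixed by the protocol document.
ACE_TAG = "xq--"


def looks_like_ace(label: str) -> bool:
    """True if the name part begins with the tag, ignoring ASCII case."""
    return label.lower().startswith(ACE_TAG)


def ace_parts(name: str) -> list:
    """Return the name parts of a dotted name that carry the tag."""
    return [label for label in name.split(".") if looks_like_ace(label)]


print(ace_parts("www.xq--abc123.example"))
```

Note that the check cannot tell a genuine ACE name part from a current
name that happens to begin with the same characters, which is the
weakness described in the Cons above.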

361	4.2.1.2 ace-2.1.2: Add a checksum

363	Off-list discussion has mentioned the possibility of creating a checksum
364	mechanism where the checksum would be added to the beginning (or end) of
365	ACE name parts.

367	4.2.2 ace-2.2: Currently illegal names

369	Instead of creating names that are currently legal, another proposal is
370	to create names that use the current ASCII characters but are illegal.

372	4.2.2.1 ace-2.2.1: Add trailing hyphen

374	An earlier draft described using a trailing hyphen as a signifier of an
375	ACE name.

377	Pros: It is surmised that most current software does not reject names
378	that are illegal in this fashion. Thus, there would be little disruption
379	to current systems. This mechanism takes up fewer characters than any
380	proposed in ace-2.1.

382	Cons: Some current software will probably break with this mechanism.
383	It goes against some current protocols that match the rules in RFC 1034.

385	5. Prohibited characters

387	There was a short but active discussion on the mailing list about which
388	characters from the ISO 10646 character set should never appear in host
389	names. To date, there are no Internet Drafts on the subject. This
390	section summarizes some of the suggestions.

392	5.1 prohib-1: Identical and near-identical characters

394	Some characters are visually identical or incredibly similar to other
395	characters, thus making it impossible to accurately enter host names
396	that are seen in print.

398	5.2 prohib-2: Separators

400	Horizontal and vertical spacing characters would make it unclear where a
401	host name begins and ends. Also, allowing periods and period-like
402	characters as characters within a name part would also cause similar
403	confusion.

405	5.3 prohib-3: Non-displaying and non-spacing characters

407	There are many characters that cannot be seen in the ISO 10646 character
408	set. These include control characters, non-breaking spaces, formatting
409	characters, and tagging characters. These characters would certainly
410	cause confusion if allowed in host names.

412	5.4 prohib-4: Private use characters

414	Private use characters from ISO 10646 inherently have no specified
415	visual form (and in fact can be used for non-displaying characters).
416	Thus, there could be no visual interoperability for characters in the
417	private use areas.

419	5.5 prohib-5: Punctuation

421	Some punctuation characters are disallowed in URLs because they are used
422	in URL syntax.

424	5.6 prohib-6: Symbols

426	Some mailing list discussion stated that characters that do not normally
427	appear in human or company names should not be allowed in host names.
428	This includes symbols and non-name punctuation.
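
Several of the categories above (prohib-2 through prohib-6) line up
loosely with the general categories of the Unicode character database.
The Python sketch below is one possible reading, not a rule set from
the working group: the mapping from general category to reason is an
assumption made here, and hyphen is exempted because it is legal in
current host names. Look-alike characters (prohib-1) cannot be found
this way.

```python
import unicodedata


def prohibition_reason(ch: str):
    """Return a reason the character might be prohibited, or None."""
    category = unicodedata.category(ch)
    if ch == "-":
        return None                      # legal in current host names
    if category.startswith("Z"):
        return "separator"               # prohib-2: spacing characters
    if category in ("Cc", "Cf"):
        return "non-displaying"          # prohib-3: control, formatting
    if category == "Co":
        return "private use"             # prohib-4
    if category.startswith("P"):
        return "punctuation"             # prohib-5
    if category.startswith("S"):
        return "symbol"                  # prohib-6
    return None


for ch in ["a", "-", " ", "\u00a0", "\u200d", "\ue000", ".", "\u00a9"]:
    print(f"U+{ord(ch):04X}: {prohibition_reason(ch)}")
```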

430	6. Canonicalization

432	The working group had a spirited discussion on the need for
433	canonicalization. [IDN-REQ] describes many requirements for when and what
434	type of canonicalization might be performed.

436	6.1 canon-1: Type of canonicalization

438	The Unicode Consortium's recommendations and definitions of
439	canonicalization [UTR-15] describe many forms of canonicalization that
440	can be performed on character strings. [DUERST] covers much of the same
441	ground but makes more focused requirements for canonicalization on the
442	Internet.

444	6.1.1 canon-1.1: Normalization Form C

446	[DUERST] recommends Normalization Form C, as described in [UTR-15], for
447	use on the Internet. This form is a canonical decomposition, followed by
448	canonical composition.

450	6.1.2 canon-1.2: Normalization Form KC

452	Discussion on the mailing list recommended Normalization Form KC. This
453	form is a compatibility decomposition, followed by canonical
454	composition. Compatibility decomposition makes characters that have
455	compatibility equivalence the same after decomposing.
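
The difference between the two forms can be seen with the Python
standard library, which names them "NFC" and "NFKC". The sample strings
are chosen for this illustration: a letter followed by a combining
accent, and a ligature that has a compatibility equivalent.

```python
import unicodedata

# "e" followed by COMBINING ACUTE ACCENT (U+0301).
decomposed = "e\u0301"
# LATIN SMALL LIGATURE FI (U+FB01), compatibility-equivalent to "fi".
ligature = "\ufb01"

# Both forms compose the accented letter into the single code point U+00E9.
print(unicodedata.normalize("NFC", decomposed) == "\u00e9")
print(unicodedata.normalize("NFKC", decomposed) == "\u00e9")

# Only Form KC replaces the ligature with its compatibility equivalent.
print(unicodedata.normalize("NFC", ligature) == ligature)
print(unicodedata.normalize("NFKC", ligature) == "fi")
```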

457	6.2 canon-2: Other canonicalization

459	Host names may have special canonicalization needs that can be added to
460	those given in canon-1.

462	6.2.1 canon-2.1: Case folding in ASCII

464	RFC 1034 specifies that there is no difference between host names that
465	have the same letters but the letters have different case. Thus, the
466	name part "example" is considered the same as "Example" and "EXamPLe".
467	Neither uppercase nor lowercase is specified as being canonical.
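
A comparison that follows this rule can be sketched in Python. The
sketch folds to lowercase purely as an internal convenience, since
neither case is canonical, and it deliberately leaves non-ASCII
characters untouched. The function names are chosen for this example.

```python
def ascii_fold(label: str) -> str:
    """Map only the ASCII letters A-Z to a-z; leave everything else."""
    return "".join(
        chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in label
    )


def same_name_part(a: str, b: str) -> bool:
    """Compare two name parts without regard to ASCII letter case."""
    return ascii_fold(a) == ascii_fold(b)


print(same_name_part("example", "EXamPLe"))
```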

469	6.2.2 canon-2.2: Case folding in non-ASCII

471	Discussion on the mailing list has raised the issue of whether or not
472	non-ASCII Latin characters should have the same case-folding rules as
473	ASCII. Such rules would match the expectations of native speakers of
474	some languages, but would go counter to the expectations of native
475	speakers of other languages.
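
Two well-known cases show why a single rule cannot match every
language. The Python sketch below uses the language-independent case
mappings built into Python strings; the sample characters are chosen
for this illustration. In Turkish, the lowercase of "I" is the dotless
U+0131 rather than "i", and German sharp s (U+00DF) has no single
uppercase letter in these mappings.

```python
DOTLESS_I = "\u0131"      # LATIN SMALL LETTER DOTLESS I
DOTTED_CAP_I = "\u0130"   # LATIN CAPITAL LETTER I WITH DOT ABOVE
SHARP_S = "\u00df"        # LATIN SMALL LETTER SHARP S

# The default mapping lowercases "I" to "i", not to the dotless form
# that a Turkish reader would expect.
print("I".lower() == "i", "I".lower() == DOTLESS_I)

# Lowercasing the dotted capital yields two code points:
# "i" followed by COMBINING DOT ABOVE (U+0307).
print(len(DOTTED_CAP_I.lower()))

# Sharp s uppercases to two letters, so the round trip does not
# restore the original spelling.
print(SHARP_S.upper(), SHARP_S.upper().lower() == SHARP_S)
```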

477	6.2.3 canon-2.3: Han folding

479	Discussion on the mailing list has raised the issue of equivalences in
480	some languages' use of Han characters. For example, in Chinese, there are
481	many traditional characters that have equivalent simplified characters.
482	Similarly, there are some Han ideographs for which there are multiple
483	representations in ISO 10646. There are no well-established rules for
484	such folding, and some of the proposed folding would be locale-specific.

486	6.3 canon-3: Location of canonicalization

488	Canonicalization can be performed in any system in the DNS. Because it
489	is not a trivial operation and can require large tables, the location of
490	where canonicalization is performed is important.

492	6.3.1 canon-3.1: Canonicalize only in the application

494	Early canonicalization is a cleaner architecture design. Spending the
495	cycles on the end systems puts less burden on resolvers or servers in
496	the DNS service. When IDN is first adopted, the applications need to be
497	updated anyway to handle the new format for the names. It is easier for
498	people to upgrade their applications than their resolvers if they need a
499	new IDN feature.

501	6.3.2 canon-3.2: Canonicalize only in the resolver

503	Updating a single resolver provides new service to a large number of
504	applications and (possibly) users. It is easier to find canonicalization
505	bugs in resolvers than in applications because the resolver has
506	predictable programmatic interfaces. IDN will probably be revised often
507	as new characters are added to ISO 10646, so updating a smaller number
508	of resolvers is better than revising more applications. When an end
509	user has a problem with resolving an IDN name, it is much easier to test
510	if the problem is in the resolver than in the user's application.

512	6.3.3 canon-3.3: Canonicalize in the DNS service

514	Canonicalization should happen as late as possible so that changes in
515	the canonicalization algorithm don't orphan all applications and
516	resolvers. Some canonicalization discards information and so should be
517	delayed as long as possible. Canonicalization is practically free,
518	computationally (although it involves some large tables). Because adding
519	IDN to the DNS will happen over time, canonicalizing at the server will
520	minimize the number of things that need to be changed, and simplify and
521	centralize the process of change.

523	7. Transitions

525	Early in the working group discussion, there was active debate about how
526	the transition from the current host name rules to IDN would be handled.
527	Given requirement [#1-02], this transition is quite important to
528	deciding which proposals might be feasible.

530	7.1 trans-1: Always do current plus new architecture

532	In this proposal, IDN will be used at the same time as the current DNS
533	forever. That is, IDN will be in addition to the current DNS.

535	7.2 trans-2: Transition period

537	In this proposal, IDN will be used at the same time as the current DNS
538	for a specified period of time, after which only IDN will exist. That
539	is, IDN will replace the current DNS.

541	8. Root server considerations

543	DNS root servers receive all requests for top-level domains that are not
544	in the local DNS cache. They are critical to the Internet.  Care must be
545	taken to ensure that root servers will not be affected by new mechanisms
546	introduced.

548	Any IDN proposal that includes a binary encoding will have an impact on
549	the root servers. The binary requests will affect the root servers
550	because the current root server software is designed to handle current
551	host names. Further, the root zone files which contain ccTLDs and gTLDs
552	would have to support binary domain names and possibly binary host names
553	for NS records. Because all the root servers are equivalent, they would
554	have to be synchronized to support the binary domain names at the same
555	time.

557	Proposals that only use ACE and use tagging with currently-legal names
558	would, by definition, not affect the root servers.

560	9. Security considerations

562	All security considerations listed in [IDN-REQ] apply to this document.
563	Further, all security considerations listed in each of the IDN proposals
564	must be considered when comparing the proposals.

566	Some proposals described in this document may create new security
567	considerations. However, these considerations will have to be addressed
568	in the eventual protocol document. All the proposals described here are
569	still incomplete and security considerations may be added to them as
570	they are revised. All the proposals listed in this document use the ISO
571	10646 character set, so the proposals inherit any security
572	characteristics of that character set.

574	Many protocols and applications rely on domain names to identify the
575	parties involved in a network transaction. For example, a user who
576	connects to a web site by entering or selecting a URL expects that their
577	software will select the web site named in the URL. The uniqueness of
578	domain names is crucial to ensure identification of Internet entities.

580	To allow round-trip translation between local charsets and ISO 10646,
581	the ISO 10646 specification has assigned multiple code points to
582	individual glyphs. Moreover, some glyphs might look similar to some
583	users, but look clearly different to other users. This means that it
584	would be simple for an attacker to mimic a domain name by using
585	similar-looking but different glyphs and guessing that some users will
586	not see the difference in their user interface.
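
As an illustration of the mimicry risk, the Python sketch below compares
two name parts that differ only in one letter: a Latin "a" in one and a
Cyrillic letter with the same appearance in the other. The name parts
are invented for this example. Compatibility normalization does not
make the two strings equal.

```python
import unicodedata

LATIN_A = "a"             # U+0061
CYRILLIC_A = "\u0430"     # U+0430, usually drawn identically to Latin "a"

genuine = "ex" + LATIN_A + "mple"
mimic = "ex" + CYRILLIC_A + "mple"

print(unicodedata.name(LATIN_A))
print(unicodedata.name(CYRILLIC_A))

# The strings are different even after Normalization Form KC.
print(genuine == mimic)
print(unicodedata.normalize("NFKC", genuine)
      == unicodedata.normalize("NFKC", mimic))
```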

588	Some IDN protocols may have denial of service attacks, such as by using
589	non-identified chars, exception characters, or under-specified behavior
590	in using some special characters.

592	10. IANA considerations

594	This document does not create any new IANA registries. However, it is
595	possible that a character property registry may need to be set up when
596	the IDN protocol is created in order to list prohibited characters
597	(section 5) and canonicalization mappings (section 6).

599	11. Acknowledgements

601	James Seng and Marc Blanchet gave many helpful suggestions on the
602	pre-release versions of this document.

604	12. References

606	[BLOCK-NAMES] Unicode Consortium,
607	<ftp://ftp.unicode.org/Public/UNIDATA/Blocks.txt>.

609	[DUERST] Character Normalization in IETF Protocols,
610	draft-duerst-i18n-norm-03

612	[IDN-REQ] Requirements of Internationalized Domain Names,
613	draft-ietf-idn-requirements-02

615	[IDNE] Internationalized domain names using EDNS (IDNE),
616	draft-ietf-idn-idne-01

618	[KWAN] Using the UTF-8 Character Set in the Domain Name System,
619	draft-skwan-utf8-dns-03

621	[RACE] RACE: Row-based ASCII Compatible Encoding for IDN,
622	draft-ietf-idn-race-00

624	[RFC2277] IETF Policy on Character Sets and Languages, RFC 2277

626	[RFC2279] UTF-8, a transformation format of ISO 10646, RFC 2279

628	[RFC2671] Extension Mechanisms for DNS (EDNS0), RFC 2671

630	[SENG] UTF-5, a transformation format of Unicode and ISO 10646,
631	draft-jseng-utf5-01

633	[UDNS] Using the Universal Character Set in the Domain Name System
634	(UDNS), draft-ietf-idn-udns-00

636	[UTR15] Unicode Normalization Forms, Unicode Technical Report #15

638	A. Differences Between -00 and -01 Drafts

640	Throughout: Changed references from [HOFFMAN] to [RACE].

642	Throughout: Changed references from [OSCARSSON] to [UDNS].

644	Throughout: Added [IDNE].

646	Removed section 1.2.

648	3.2.3: Updated to mention [UDNS].

650	3.2.4: Updated with [IDNE], changed "EDNS0" to "EDNS", and reworded.

652	4.1.2: Added Ethiopic to the list of scripts that require two octets per
653	character.

655	4.1.3: Removed reference to [OSCARSSON] because that is no longer in the
656	[UDNS] draft.

658	4.2.2.1: Removed reference to [OSCARSSON] because that is no longer in
659	the [UDNS] draft.

661	6.1.1: Reworded first sentence.

663	6.3: Added entire section and subsections.

665	8: Fixed typo in first sentence.

667	B. Author Contact

669	Paul Hoffman
670	IMC & VPNC
671	127 Segre Place
672	Santa Cruz, CA  95060
673	phoffman@imc.org or paul.hoffman@vpnc.org